Difference between revisions of "Schema based code generation"
Rene spronk (talk | contribs) |
Revision as of 13:40, 28 January 2010
Summary
When it comes to code generation, one of the options is XML Schema based code generation. Schema based code generators exist for most programming languages.
- An alternative to schema based code generation is MIF based code generation.
- NOTE: this wiki page assumes that one uses the XML ITS 1.1 schema (and not the new RIM ITS schema).
Analysis
- Note: schema based code generation is discussed in detail in the following tutorial: Implementation Mechanics (PPT). The tutorial has a Creative Commons license.
Code Generation (as a generic mechanism) has several advantages over manual coding. MIF based code generation has advantages over schema based code generation. The advantages and disadvantages are discussed below.
The XML schemas (as published by HL7) aren't optimized for code generation. Before generating code, one should transform the schemas to optimize them for code generation and code re-use. See below for details.
Advantages and disadvantages
Code Generation versus manual coding:
- Code Generation is an example of Model Driven Software development. The main advantages are flexibility and rapid development; the main disadvantage is that it may lead to a larger code base, and lower re-use of code for overlapping parts of models. See MDD for a full list of advantages.
Schema-based code generation versus MIF based code generation.
- An advantage of schema based code generation is the wide availability of tools. MIF based code generators exist for Java and .NET, but the choice of tools is much more limited.
- A serious disadvantage of schema based code generation is that the XML schema language isn't powerful enough to express all of the constraints contained in the MIF. The MIF contains the full details of the HL7 v3 model. The XML schema of the HL7 v3 model is derived from the MIF, with a significant loss of detail.
Optimization of the schema
The XML schemas (as published by HL7) have to be optimized for code generation. The following are the main optimization methods used prior to the code generation process:
- Flatten the schema. Remove all includes from the schema and create one single schema file.
- Simplify the data types.
  - Simplify the datatypes.xsd schema by removing all hierarchies from the definition that are unnecessary for code generation, and by removing all features from the data type definition that won't be used in the context of a particular code generation project.
  - Replace all HL7 v3 data types that have a direct (functional) equivalent in the XML schema language with their equivalents. The generated code will be smaller, and won't reference the hierarchical data type definition as defined by HL7.
  - Examples: replace ST and CS with xsd:string, and TS with xsd:date.
- Replace element names and attribute names with more 'readable' names. The more 'meaningful' the schema, the more 'meaningful' the generated code. The disadvantage of this step is that, upon serialization of an object tree, one has to transform the element/attribute names back to the original names as present in the published HL7 v3 schema.
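The renaming step implies a mapping that has to be applied in both directions. The following is a minimal sketch of that idea, not a prescribed implementation: the readable names, the `person` element and the function name `rename_elements` are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from published HL7 v3 element names to 'readable'
# names. The readable names are invented for this example.
READABLE = {
    "addr": "address",
    "telecom": "telecommunicationAddress",
    "birthTime": "dateOfBirth",
}
# Reverse mapping, needed when serializing back to the published names.
ORIGINAL = {readable: original for original, readable in READABLE.items()}


def rename_elements(root, mapping):
    """Rename elements in place, keeping any namespace part of the tag."""
    for node in root.iter():
        # ElementTree stores namespaced tags as "{namespace}localname".
        ns, sep, local = node.tag.rpartition("}")
        if local in mapping:
            node.tag = f"{ns}{sep}{mapping[local]}"
    return root


if __name__ == "__main__":
    # An instance as it might be serialized from code generated from the
    # renamed schema (the 'person' element is assumed for illustration).
    generated = ET.fromstring(
        '<person xmlns="urn:hl7-org:v3">'
        '<dateOfBirth value="19700101"/><address/></person>'
    )
    # Map the readable names back before sending the message.
    rename_elements(generated, ORIGINAL)
    print([child.tag for child in generated])
```

Keeping the mapping in one table and deriving the reverse direction from it avoids the two directions drifting apart. Attribute names would need the same treatment and are left out here.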
Many of the above optimization steps can be automated, e.g. by means of an XSLT transform of the XML schema as published by HL7.
Suppose one has to generate code for 10 different (but related) HL7 v3 interactions. Each of those interactions consists of two wrappers (Transmission Wrapper and ControlAct Wrapper) and may reference a number of CMETs. If one doesn't optimize for code re-use, each and every interaction schema will produce its own code for the wrappers. Optimization of the schema for code generation is mostly based on the following principles:
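As an illustration of automating one of these steps, the sketch below replaces references to ST, CS and TS with their XML schema equivalents. It uses Python's standard library rather than XSLT, and the schema fragment, the `TYPE_MAP` table and the function name are assumptions made for this example. It ignores target namespace handling: ElementTree does not preserve namespace declarations that are only used inside attribute values, so a real schema would need more careful treatment.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Assumed mapping, taken from the examples given in the list above.
TYPE_MAP = {"ST": "xs:string", "CS": "xs:string", "TS": "xs:date"}

# Hypothetical schema fragment; not taken from the published HL7 schema.
SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="Patient">
    <xs:sequence>
      <xs:element name="name" type="ST"/>
      <xs:element name="birthTime" type="TS"/>
      <xs:element name="id" type="II"/>
    </xs:sequence>
    <xs:attribute name="classCode" type="CS"/>
  </xs:complexType>
</xs:schema>"""


def replace_simple_types(xsd_text: str) -> str:
    """Return the schema text with mapped HL7 type references replaced."""
    # Serialize XSD elements with the conventional "xs" prefix.
    ET.register_namespace("xs", XSD_NS)
    root = ET.fromstring(xsd_text)
    declarations = (f"{{{XSD_NS}}}element", f"{{{XSD_NS}}}attribute")
    for node in root.iter():
        if node.tag in declarations:
            type_name = node.get("type")
            if type_name in TYPE_MAP:
                node.set("type", TYPE_MAP[type_name])
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(replace_simple_types(SAMPLE))
```

Types without an entry in the table (II in the fragment) are left untouched, so the mapping can be extended one data type at a time.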
- xxxx