Schema based code generation

From HL7Wiki

Revision as of 15:49, 23 April 2010

Summary

Code generation is a process whereby source code (in a particular programming language) is generated automatically. It is an example of Model Driven Software Development. The best (most complete) code is obtained by generating it from the MIF. Code generation tends to be the mechanism by which most RIMBAA applications are created.

The advantages and disadvantages of schema-based code generation versus MIF-based code generation include:

  • An advantage of schema-based code generation is the wide availability of tools. MIF-based code generators exist for Java and .NET, but the choice is much more limited.
  • A serious disadvantage of schema-based code generation is that the XML schema language isn't powerful enough to express all of the constraints contained in the MIF. The MIF contains the full details of the HL7 v3 model. The XML schema of the HL7 v3 model is derived from the MIF, with a significant loss of detail.

Choice of ITS

When it comes to XML schema one has the choice between two ITSs:

  1. The XML ITS (v1.1), in use since the inception of HL7 v3, where clone-name based schema are generated for each and every R-MIM/CIM.
  2. The RIM ITS, defined in 2010, where one single RIM-based schema (with about 50 classes) is used for all RIM-based object instances.

From the viewpoint of code generation the XML ITS schema are much more specific than the RIM ITS schema. Currently (2010) most code generation is based on the XML ITS. Neither set of schema is able to perform a complete validation.

XML ITS 1.1

This section assumes that one uses the XML ITS 1.1 schema (and not the new RIM ITS schema).

  • Note: schema based code generation is discussed in detail in the following tutorial: Implementation Mechanics (PPT). The tutorial has a Creative Commons license.

Optimization of the schema

The XML schema (as published by HL7) aren't optimized for code generation. Prior to performing the code generation process one should transform the schema to optimize them for code generation and code re-use.

The following are the main optimization methods used prior to the code generation process:

  1. Flatten the schema. Remove all includes from the schema and create one single schema file.
  2. Simplify the data types.
    • Simplify the datatypes.xsd schema, by removing all unnecessary [for code generation] hierarchies from the definition, and removing all features from the data type definition that won't be used in the context of a particular [code generation] project.
    • Replace all HL7 v3 data types that have a direct (functional) equivalent in the XML schema language with their equivalents. The generated code will be smaller, and won't reference the hierarchical data type definition as defined by HL7.
    • Examples: replace ST with xsd:string, and CS with xsd:token, and TS with xsd:date (use very carefully for partial dates!).
  3. Replace element names and attribute names with more 'readable' names, notably the names of the type definitions, which are automatically generated by HL7 tooling and cryptic in nature. The more 'meaningful' the schema, the more 'meaningful' the generated code. A disadvantage of this step is that, upon serialization of an object tree, one may have to transform the element/attribute names back to their original names as present in the published HL7 v3 schema.
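
Step 1 (flattening) could be sketched roughly as follows. This is a minimal illustration in Python, not HL7 tooling: the schema documents are held in a hypothetical in-memory dictionary (`DOCUMENTS`) whose contents are heavily reduced stand-ins for the published files, and the function names are made up for this example. Note that ElementTree does not preserve namespace declarations that are only used inside attribute values, so a real tool would have to carry those over explicitly before writing the flattened schema to disk.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
INCLUDE = f"{{{XS}}}include"


def collect_components(name, documents, seen):
    """Return the top-level components of `name`, with includes inlined."""
    if name in seen:  # each schema document contributes only once
        return []
    seen.add(name)
    components = []
    for child in ET.fromstring(documents[name]):
        if child.tag == INCLUDE:
            components.extend(
                collect_components(child.get("schemaLocation"), documents, seen))
        else:
            components.append(child)
    return components


def flatten(name, documents):
    """Build one include-free schema element for the entry document `name`."""
    root = ET.fromstring(documents[name])
    flat = ET.Element(root.tag, dict(root.attrib))
    flat.extend(collect_components(name, documents, set()))
    return flat


# Hypothetical, heavily reduced stand-ins for the published schema files.
DOCUMENTS = {
    "datatypes.xsd":
        f'<xs:schema xmlns:xs="{XS}">'
        '<xs:simpleType name="ST"><xs:restriction base="xs:string"/></xs:simpleType>'
        '</xs:schema>',
    "wrapper.xsd":
        f'<xs:schema xmlns:xs="{XS}">'
        '<xs:include schemaLocation="datatypes.xsd"/>'
        '<xs:complexType name="Wrapper"/>'
        '</xs:schema>',
    "interaction.xsd":
        f'<xs:schema xmlns:xs="{XS}">'
        '<xs:include schemaLocation="wrapper.xsd"/>'
        '<xs:include schemaLocation="datatypes.xsd"/>'
        '<xs:element name="Message" type="Wrapper"/>'
        '</xs:schema>',
}

flat_schema = flatten("interaction.xsd", DOCUMENTS)
```

The `seen` set ensures that a schema file included from several places (datatypes.xsd in this example) ends up in the flattened result only once.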

Many of the above optimization steps can be dealt with by an automated process, e.g. by means of an XSLT transform of the XML schema as published by HL7.
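
As an illustration of automating the renaming step and its reverse, here is a minimal sketch written in Python rather than XSLT. The mapping table, the 'readable' names and the function names are all hypothetical; the published names are merely shaped like HL7 v3 artefact names. The inverse table is what one would apply to a serialized object tree to get back to the names used in the published schema.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical mapping from published names to more readable ones.
READABLE = {
    "PRPA_MT201301UV02.Patient": "Patient",
    "subject1": "patientSubject",
}
# Inverse mapping, needed to restore the published names on serialization.
PUBLISHED = {readable: published for published, readable in READABLE.items()}


def rename_schema(schema, mapping):
    """Rewrite definitions and the references to them, in place."""
    for node in schema.iter():
        for attr in ("name", "type", "ref", "base"):
            if node.get(attr) in mapping:
                node.set(attr, mapping[node.get(attr)])
    return schema


def rename_instance(instance, mapping):
    """Rename elements of a serialized instance, keeping their namespace."""
    for node in instance.iter():
        prefix, brace, local = node.tag.rpartition("}")
        if local in mapping:
            node.tag = prefix + brace + mapping[local]
    return instance


SCHEMA = (
    f'<xs:schema xmlns:xs="{XS}">'
    '<xs:complexType name="PRPA_MT201301UV02.Patient"/>'
    '<xs:element name="subject1" type="PRPA_MT201301UV02.Patient"/>'
    '</xs:schema>'
)

schema = rename_schema(ET.fromstring(SCHEMA), READABLE)
element = schema.find(f"{{{XS}}}element")
readable_pair = (element.get("name"), element.get("type"))

# Output of code generated from the renamed schema, mapped back on the way out.
serialized = ET.fromstring('<patientSubject xmlns="urn:hl7-org:v3"><id/></patientSubject>')
restored = rename_instance(serialized, PUBLISHED)
```

Keeping the mapping in a single table means the forward transform (schema) and the backward transform (instances) cannot drift apart, at the cost of requiring that readable names be unique.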

Improve level of code re-use

Suppose one has to generate code for 10 different (but related) HL7 v3 interactions. Each of those interactions consists of two wrappers (Transmission Wrapper and ControlAct Wrapper) and may reference a number of CMETs. If one doesn't optimize for code re-use, each and every interaction schema will produce its own code for the wrappers. Optimization of schema for code generation is mostly based on the following principles:

  1. xxxx
  • Tooling hint: in order to detect similarities/differences between different (versions of) schema, see this description of a Schema Diff tool.
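
One possible way to find candidates for re-use (this is not the Schema Diff tool mentioned above, just a sketch under stated assumptions) is to fingerprint every named type definition and report those that occur identically in more than one interaction schema. The schema texts and type names below are hypothetical and heavily reduced; the comparison relies on `xml.etree.ElementTree.canonicalize`, available in Python 3.8 and later.

```python
import hashlib
import xml.etree.ElementTree as ET
from collections import defaultdict

XS = "http://www.w3.org/2001/XMLSchema"
TYPE_TAGS = (f"{{{XS}}}complexType", f"{{{XS}}}simpleType")


def fingerprints(schema_text):
    """Map each top-level type name to a hash of its canonical XML form."""
    result = {}
    for node in ET.fromstring(schema_text):
        name = node.get("name")
        if name and node.tag in TYPE_TAGS:
            canonical = ET.canonicalize(
                ET.tostring(node, encoding="unicode"), strip_text=True)
            result[name] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return result


def shared_definitions(schemas):
    """Return {type name: [schema names]} for types defined identically
    in more than one schema (same name and same canonical content)."""
    seen = defaultdict(list)
    for schema_name, text in schemas.items():
        for type_name, digest in fingerprints(text).items():
            seen[(type_name, digest)].append(schema_name)
    return {name: files
            for (name, _digest), files in seen.items() if len(files) > 1}


# Hypothetical interaction schemas sharing one wrapper definition.
SCHEMA_A = (
    f'<xs:schema xmlns:xs="{XS}">'
    '<xs:complexType name="Wrapper"><xs:sequence>'
    '<xs:element name="id" type="II"/>'
    '</xs:sequence></xs:complexType>'
    '<xs:complexType name="PayloadA"/>'
    '</xs:schema>'
)
SCHEMA_B = f"""<xs:schema xmlns:xs="{XS}">
  <xs:complexType name="Wrapper">
    <xs:sequence>
      <xs:element name="id" type="II"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="PayloadB"/>
</xs:schema>"""

shared = shared_definitions({"a.xsd": SCHEMA_A, "b.xsd": SCHEMA_B})
```

Types reported here (typically the wrappers and CMETs) could be moved into one common schema so that code is generated for them only once. Types that share a name but differ in content are deliberately not reported; those are the cases where a diff tool is more useful.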

RIM ITS

This section assumes that one uses the RIM ITS schema (and not the older XML ITS schema).

The schema associated with the RIM ITS is very generic, which means one has to rely on the presence of template ids in the instance in order to do additional validation (using Schematron or some other non-schema-based mechanism). One advantage of the RIM ITS is that all models other than the RIM become templates (and this includes R-MIMs).
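
The idea of template-driven validation can be sketched as follows. This is a plain-Python stand-in for Schematron, under several assumptions: template ids are taken to appear as `templateId` child elements with a `root` attribute in the `urn:hl7-org:v3` namespace (the exact RIM ITS serialization may differ), and the OID, the rule and the instance are made up for illustration.

```python
import xml.etree.ElementTree as ET

HL7 = "urn:hl7-org:v3"  # assumed instance namespace

VALIDATORS = {}  # hypothetical registry: template OID -> check function


def validates(oid):
    """Decorator that registers a check function for one template id."""
    def register(check):
        VALIDATORS[oid] = check
        return check
    return register


@validates("1.2.3.4.5")  # made-up OID, not a real HL7 template
def observation_needs_code(element):
    if element.find(f"{{{HL7}}}code") is None:
        return ["template 1.2.3.4.5 requires a code element"]
    return []


def template_ids(element):
    """Template ids declared directly on this element."""
    return [t.get("root") for t in element.findall(f"{{{HL7}}}templateId")]


def validate(root):
    """Run every registered check whose template id an element claims."""
    errors = []
    for element in root.iter():
        for oid in template_ids(element):
            check = VALIDATORS.get(oid)
            if check is not None:
                errors.extend(check(element))
    return errors


INSTANCE = (
    f'<act xmlns="{HL7}">'
    '<templateId root="1.2.3.4.5"/>'
    '<id root="9.9.9"/>'
    '</act>'
)
errors = validate(ET.fromstring(INSTANCE))
```

The generic schema accepts this instance; only the template-specific check notices that the element required by the (hypothetical) template is missing.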

Optimization of the schema

The XML schema (as published by HL7) aren't optimized for code generation. Prior to performing the code generation process one should transform the schema to optimize them for code generation and code re-use.

There is essentially one core schema, which means there's a high level of re-use - one and the same code applies for all message types.

The following are the main optimization methods used prior to the code generation process:

  1. Simplify the data types.
    • Simplify the datatypes.xsd schema, by removing all unnecessary [for code generation] hierarchies from the definition, and removing all features from the data type definition that won't be used in the context of a particular [code generation] project.
    • Replace all HL7 v3 data types that have a direct (functional) equivalent in the XML schema language with their equivalents. The generated code will be smaller, and won't reference the hierarchical data type definition as defined by HL7.
    • Examples: replace ST with xsd:string, and CS with xsd:token, and TS with xsd:date (use very carefully for partial dates!).
  2. Replace element names and attribute names with more 'readable' names, notably the names of the type definitions, which are automatically generated by HL7 tooling and cryptic in nature. The more 'meaningful' the schema, the more 'meaningful' the generated code. A disadvantage of this step is that, upon serialization of an object tree, one may have to transform the element/attribute names back to their original names as present in the published HL7 v3 schema.

Many of the above optimization steps can be dealt with by an automated process, e.g. by means of an XSLT transform of the XML schema as published by HL7.
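
The data type replacement from step 1 could, for example, be automated along these lines. The sketch is in Python rather than XSLT; the schema fragment is a hypothetical, heavily reduced stand-in, and the function name is made up. It only rewrites the schema: it does not address whether existing instances still match the simplified types, and it does nothing about the partial date problem noted above for TS.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Replacements taken from the examples in the list above.
REPLACEMENTS = {"ST": "xs:string", "CS": "xs:token", "TS": "xs:date"}


def replace_datatypes(schema_text, replacements=REPLACEMENTS):
    """Drop the replaced type definitions and repoint references to them."""
    # Serialize with the same 'xs' prefix that the replacement values use.
    ET.register_namespace("xs", XS)
    root = ET.fromstring(schema_text)
    for definition in list(root):  # copy: we remove while looping
        if definition.get("name") in replacements:
            root.remove(definition)
    for node in root.iter():
        if node.get("type") in replacements:
            node.set("type", replacements[node.get("type")])
    return root


# Hypothetical, heavily reduced schema fragment.
SCHEMA = (
    f'<xs:schema xmlns:xs="{XS}">'
    '<xs:complexType name="ST"/>'
    '<xs:complexType name="Observation"><xs:sequence>'
    '<xs:element name="text" type="ST"/>'
    '<xs:element name="statusCode" type="CS"/>'
    '<xs:element name="effectiveTime" type="TS"/>'
    '<xs:element name="id" type="II"/>'
    '</xs:sequence></xs:complexType>'
    '</xs:schema>'
)

simplified = replace_datatypes(SCHEMA)
types = [e.get("type") for e in simplified.iter(f"{{{XS}}}element")]
```

Types without an entry in the table (II in this example) are left untouched, so the table can be limited to the data types a particular project actually wants to simplify.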