Software Implementation of CDA
Notes:
- this page contains a draft deliverable for RIMBAA project 550
- this is not a "pattern" (it's too wide ranging for that), but a "best practices" document
Summary
This paper addresses the creation of a software application that has to support the CDA R2 model. It discusses the application architecture and various approaches to code generation and persistence.
Although it is tempting to use XML techniques to support the creation, validation and parsing of CDA documents, this paper shows that this approach is associated with a high risk of non-conformant CDA instances. A model-driven class code generator should be used if one wants to ensure compliance with the CDA standard as well as the appropriate implementation guide and associated templates.
Note: this paper assumes the application has to support one HL7 version 3 model (CDA) only. There are drawbacks to locking down an application to a specific SIM (a.k.a. R-MIM). The use-case whereby one needs to support multiple version 3 SIMs is covered in these discussion pages: Schema based code generation and MIF based code generation.
Introduction
The HL7 e-Document standard (Clinical Document Architecture or CDA) is part of the HL7 version 3 standard. The current release of that standard (Release 2) was published in 2005. CDA documents are used in a large number of projects, quite often in combination with HL7 version 3 messages or services. This article covers the development of software applications that support the CDA standard. The primary audience consists of application architects and software developers.
The implementation of the CDA standard and the validation of CDA-conformant XML instances are based on two types of specifications:
- The CDA class model, a refinement of the HL7 Reference Information Model (RIM). The class model is expressed in MIF (Model Interchange Format), the meta model format used by HL7 for all version 3 artefacts, or in derivations thereof such as UML or XML Schema. The CDA class model references HL7 version 3 data types and coding systems.
- Context-specific constraints (Templates) on the generic CDA model, as defined in a CDA implementation guide for a specific document type and one specific context (e.g. country or project). Templates could express constraints on the class model itself, the use of data types, the values defined by coding systems, or they could be expressions of business rules. An example of the latter category is a template which defines that documents of the type “Natal report” SHALL be created within 7 days after the birth of the child.
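The business rule in the "Natal report" example can be sketched in code. The following is a minimal illustration, not a definitive implementation. It assumes that the creation time is taken from ClinicalDocument/effectiveTime and the date of birth from recordTarget/patientRole/patient/birthTime, that both carry a @value of at least day precision, and that "within 7 days" is inclusive. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

NS = {"hl7": "urn:hl7-org:v3"}  # HL7 version 3 XML namespace


def ts_to_date(value):
    """Read the date part (YYYYMMDD) of a TS value; time and zone are ignored."""
    return datetime.strptime(value[:8], "%Y%m%d").date()


def created_within_days_of_birth(cda_xml, max_days=7):
    """Return True if the document was created 0..max_days days after birth.

    Assumption: a missing element, a missing @value, or a value with less
    than day precision makes the rule fail rather than pass.
    """
    root = ET.fromstring(cda_xml)
    created = root.find("hl7:effectiveTime", NS)
    birth = root.find(
        "hl7:recordTarget/hl7:patientRole/hl7:patient/hl7:birthTime", NS
    )
    if created is None or birth is None:
        return False
    try:
        created_date = ts_to_date(created.attrib["value"])
        birth_date = ts_to_date(birth.attrib["value"])
    except (KeyError, ValueError):
        return False
    days = (created_date - birth_date).days
    return 0 <= days <= max_days
```

A rule of this kind compares two values found in different parts of the document, which is why it is expressed as a template constraint and not in the class model itself.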
At this point in time Templates are mostly defined in textual form as part of a CDA implementation guide; these can be transformed into software-processable specifications such as OCL or Schematron. An increasing number of implementation guides are made available with Schematron-based versions of the templates.
The management aspect of templates is a major issue: a single CDA implementation guide may define hundreds of templates – which are quite often defined in terms of templates defined in other (more generic) CDA implementation guides.
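The layering of templates on top of other templates can be illustrated with a small sketch. It assumes a simplified registry in which each template has at most one parent template and constraints are plain strings; real implementation guides may combine several templates, and all names and identifiers below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Template:
    template_id: str
    constraints: List[str]
    parent_id: Optional[str] = None  # the more generic template, if any


def effective_constraints(template_id: str, registry: Dict[str, Template]) -> List[str]:
    """Collect all constraints that apply, most generic template first."""
    chain = []
    seen = set()
    current = template_id
    while current is not None:
        if current in seen:
            raise ValueError(f"circular template definition at {current}")
        if current not in registry:
            raise KeyError(f"unknown template {current}")
        seen.add(current)
        chain.append(registry[current])
        current = registry[current].parent_id
    result = []
    for template in reversed(chain):  # walk from generic to specific
        result.extend(template.constraints)
    return result
```

Even in this reduced form, validating against one template means resolving every template it builds on, which is where dedicated template management tooling becomes useful.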
MIF and XML schema
An HL7 MIF definition of the CDA class model is provided with the HL7 v3 standard. The CDA MIF file can be transformed into less "rich" expressions such as UML and XML schema. Parts of the requirements as expressed by the MIF are lost during the transformation process.
CDA instances are based on XML and the standard requires that all CDA instances validate (at a minimum) against a published CDA XML schema. This is the main reason why a lot of CDA implementations are based on the CDA XML schema. The wide availability of XML tools is a definite advantage; there are disadvantages as well. The XML schema language is not rich enough by far to express all of the requirements as present in the original CDA class model. A CDA document instance that validates against the XML schema is not guaranteed to be a valid CDA instance - to be a valid CDA instance one has to create XML that conforms to the requirements that are expressed in the CDA class model.
Examples of the limited capabilities of the XML schema language to express the model requirements include the use of conditional XML attributes with an HL7 v3 data type: a CD data type should either use both of the attributes @code and @codeSystem, or the attribute @nullFlavor. This requirement simply can't be expressed in XML schema. As a consequence, a CDA instance that only contains @code will be considered to be a valid document instance if validated against the CDA XML schema. Another example is the use of empty XML elements (<element/>); these are not allowed in any HL7 version 3 instance (see Footnote 1). This can't be specified in XML schema. There are complex workarounds for some of the above limitations of the XML schema language; these however lead to large schema definitions. Nictiz, the Dutch NHIN provider which specifies HL7 v3 artefacts for use in the Netherlands, has resorted to publishing a large set of Schematron files (mainly for data types and coding systems) to deal with the 'incomplete' validation as supported by XML schema.
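The two rules mentioned above can be checked with a few lines of code once the instance has been parsed. The sketch below is an illustration under stated assumptions, not a complete validator: it treats every element named "code" as CD-typed (a real implementation would take the data type from the class model), it applies the CD rule exactly as worded above, and it defines an empty element as one without attributes, child elements or non-whitespace text. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tags."""
    return tag.rsplit("}", 1)[-1]


def cd_rule_ok(elem):
    """CD rule as stated above: @code together with @codeSystem, or @nullFlavor."""
    has_both = "code" in elem.attrib and "codeSystem" in elem.attrib
    return has_both or "nullFlavor" in elem.attrib


def is_empty(elem):
    """True for <element/>: no attributes, no children, no meaningful text."""
    has_text = bool(elem.text and elem.text.strip())
    return not elem.attrib and len(elem) == 0 and not has_text


def find_problems(cda_xml):
    """Return human-readable findings that XML schema validation would miss."""
    problems = []
    for elem in ET.fromstring(cda_xml).iter():
        name = local_name(elem.tag)
        # Assumption: elements named 'code' carry a CD-like data type.
        if name == "code" and not cd_rule_ok(elem):
            problems.append(f"{name}: needs @code with @codeSystem, or @nullFlavor")
        if is_empty(elem):
            problems.append(f"{name}: empty element")
    return problems
```

For an instance that contains one "code" element with only @code and one empty "title" element, find_problems reports two findings, whereas schema validation alone does not reject the incomplete code as described above.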
In order to fulfill all requirements as expressed by the CDA class model the starting point for all CDA implementations would have to be the CDA MIF. MIF has the disadvantage that it is an HL7-specific format which is only supported by a limited number of tools.
Software development approaches
The current implementations of CDA can be divided into two categories: a group which uses XML technologies and tools, and another group which is based on the CDA class model (MIF or UML).
CDA implementation using XML techniques
The main reason for using XML techniques when implementing CDA is the fact that CDA is based on the XML standard: CDA instances are XML documents and there is a published XML schema for CDA documents. Template definitions are generally made available in the form of Schematron files. Schematron is part of the XML family of specifications; a Schematron compiler is based on a generic XSLT engine.
Class generators are commonly used next to other well known XML techniques such as XPath and DOM/SAX. JAXB is an example of a class generator: a tool which transforms XML schema to corresponding Java classes.
A number of online CDA validation tools are based on XML techniques (Schema and Schematron); examples include NIST (http://xreg2.nist.gov/cda-validation/validation.html), Alschuler Associates (http://www.alschulerassociates.com/validator/), and IHE (http://gazelle.ihe.net/EVSClient/cda/validator.seam?cid=9). These tools can be used to test for validity of CDA instances. An XML document that is considered to be valid by these tools need not be a valid CDA instance.
Model driven CDA implementation
The basis for model driven implementations is the CDA class model as documented in the CDA MIF. Because CDA is essentially limited to an information model, one has the option of creating a very solid mapping from CDA MIF to UML, which in turn allows for the use of UML based tools.
The CDA MIF (or the UML equivalent thereof) can be used by class generators to create a set of classes (in e.g. Java or C#). At this point in time (November 2010) there are a couple of freely available class generators which one could consider when implementing CDA:
- MDHT (http://www.cdatools.org/), a CDA specific class generator. This tool generates Java classes based on a UML representation of the CDA class model and on an OCL representation of applicable templates. The tool also supports the management and definition of templates to deal with the multitude of template definitions in CDA implementation guides. MDHT allows for the definition of templates in a table-like structure. Templates can be defined to be additional constraints on other templates. Templates defined in this way are automatically transformed into formal OCL statements.
- MARC-HI Everest (http://everest.marc-hi.ca/), an HL7 version 3 (not just CDA) MIF-based class generator. This tool doesn't support CDA templates at this point in time; one could use Schematron versions of those templates as an interim solution.
- Java SIG (http://aurora.regenstrief.org/javasig), a (unfortunately not recently updated) MIF-based toolkit which generates Java classes. Unlike the tools listed above this toolkit has been available to the HL7 v3 implementation community for years. It has been successfully used in multiple projects, e.g. DRV Bund (a hospital group, Germany) and PCS (leading HIS vendor, Austria). This tool doesn't support CDA templates; one could use Schematron versions of those templates as an interim solution.
There is an online validation tool based on MDHT (http://cdatools.org/validation/). A MIF based validation tool for the Eclipse platform can be found via (http://hl7book.net/index.php?title=Eclipse_Instance_Editor).
Persistence
The CDA standard contains a number of requirements when it comes to the persistence of CDA documents. One of the requirements is that one should be able to reproduce the exact same XML instance which was originally received. In a relational database this requires that one stores the CDA document as a blob; the alternative is to store it in a native XML database.
If one uses a class generator it is recommended that one uses an ORM (Object-Relational Mapper) toolkit to persist the data as present in the CDA instance. The CDA document is 'shredded' into its constituent data components. If data is extracted from a CDA document one should ensure that the relationship between the data and its source document is persisted as well - it could be that the document (as a whole) is replaced or nullified at a later point in time, in which case one should also have the ability to designate the data as either nullified or replaced.
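The two persistence requirements above (keeping the original instance byte for byte, and keeping every extracted data item linked to its source document) can be sketched as follows. This is a minimal illustration that uses Python's built-in sqlite3 module with hand-written SQL in place of an ORM toolkit. The table names, column names and status values are assumptions made for the example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE document (
    id      INTEGER PRIMARY KEY,
    raw_xml BLOB NOT NULL,                  -- the instance exactly as received
    status  TEXT NOT NULL DEFAULT 'active'  -- 'active', 'replaced' or 'nullified'
);
CREATE TABLE observation (
    id          INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES document(id),  -- link to the source
    code        TEXT NOT NULL,
    value       TEXT
);
"""


def store_document(conn, raw_xml, observations):
    """Store the untouched bytes plus the data 'shredded' from the document."""
    cur = conn.execute("INSERT INTO document (raw_xml) VALUES (?)", (raw_xml,))
    doc_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO observation (document_id, code, value) VALUES (?, ?, ?)",
        [(doc_id, code, value) for code, value in observations],
    )
    conn.commit()
    return doc_id


def original_xml(conn, doc_id):
    """Return the stored instance, byte for byte."""
    row = conn.execute("SELECT raw_xml FROM document WHERE id = ?", (doc_id,)).fetchone()
    return row[0]


def set_document_status(conn, doc_id, status):
    """Mark a whole document as replaced or nullified."""
    conn.execute("UPDATE document SET status = ? WHERE id = ?", (status, doc_id))
    conn.commit()


def active_observations(conn):
    """Extracted data inherits the status of the document it came from."""
    return conn.execute(
        "SELECT o.code, o.value FROM observation o "
        "JOIN document d ON d.id = o.document_id "
        "WHERE d.status = 'active' ORDER BY o.id"
    ).fetchall()
```

Because the status lives on the document row, nullifying or replacing a document affects all data extracted from it without touching the individual rows.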
Summary and Recommendations
The diagram below shows the relationships between the various artefacts touched upon in this whitepaper. A CDA document has to conform to the requirements as defined in a CDA Implementation Guide. It has to conform to both the formal CDA class model and the templates. The CDA class model can be expressed in either MIF, or derived formats such as UML or XML schema. Templates can be expressed in Schematron, or in OCL, or in MIF with OCL annotations. The actual validation of CDA instances is based on the expressions of the CDA class model and the applicable templates.
The use of XML techniques alone carries a high risk of creating CDA documents that are not valid instances of the CDA standard. If one is forced to rely solely on XML techniques one should pay particular attention to the HL7 version 3 data types and coding systems. Those are the areas that most often lead to issues because of the lack of expressivity/richness of the XML schema language.
Sources at two of the organizations responsible for online validation tools confirm the above: if they were to develop such a tool from scratch they would not base it on XML techniques. The current online tools produce too many false positives - documents which are erroneously declared to be valid CDA instances. The management of templates in the form of a set of Schematron files is also reported to be problematic.
A software application will have to be based on the CDA class model if one wishes to ensure that one creates valid CDA instances. Applications that are based on the CDA XML schema can't guarantee that the documents are valid CDA instances. The MDHT tool is currently the best tool available to support CDA model based application development. MDHT also offers support for the management and validation of templates.
Footnotes
- On the use of empty XML elements (<element/>): in certain rather exotic circumstances empty XML elements may occur in HL7 version 3 instances. For example: if a model were to have a mandatory participation linked to a Role which has no required/mandatory attributes the Role could be present in the instance as an empty XML element. The CDA model doesn't contain any such requirements; a particularly exotic CDA template could have this type of requirement as well, resulting in an empty XML element. The statement by the authors of this white paper that "no v3 instance SHALL contain an empty XML element" leads to far fewer "false positives" and a very small number of "false negatives".