Software Implementation of CDA

This whitepaper is one of a series of whitepapers created by the AID Work Group. The whitepaper is based on actual CDA implementation experiences and aims to document a best practice or an implementation pattern.

The contents of this whitepaper (this particular version) were approved by the AID WG on 2015-05-11 as a reflection of current best practice. The whitepaper will be up for approval again in May 2016. This is a "living" document: it may be updated by any person at any point in time.

Short URL: http://j.mp/gDwZKm - See also: CDA Implementation Tools

Summary

This paper addresses the creation of a software application that has to support the CDA R2 model. It discusses the application architecture as well as various approaches to code generation and persistence.

Although it is tempting to use XML techniques to support the creation, validation and parsing of CDA documents, this paper shows that this approach is associated with a high risk of non-conformant CDA instances. A model-driven class code generator should be used if one wants to ensure compliance with the CDA standard as well as the appropriate implementation guide and associated templates.

Note: this paper assumes the application has to support one HL7 version 3 model (CDA) only. The use-case whereby one needs to support multiple version 3 SIMs is covered in these discussion pages: Schema based code generation and MIF based code generation.

Introduction

The HL7 e-Document standard (Clinical Document Architecture or CDA) is part of the HL7 version 3 standard. The current release of that standard (Release 2) was published in 2005. CDA documents are used in a large number of projects, quite often in combination with HL7 version 3 messages or services. This article covers the development of software applications that have to support the CDA standard. The primary audience consists of application architects and software developers.

The implementation of the CDA standard and the validation of CDA-conformant XML instances are based on two types of specifications:

The CDA class model, a refinement of the HL7 Reference Information Model (RIM). The class model is expressed in MIF (Model Interchange Format), the meta model format used by HL7 for all version 3 artefacts, or in derivations thereof such as UML or XML Schema. The CDA class model references HL7 version 3 data types and coding systems.
Context-specific constraints (Templates) of the generic CDA model, as defined in a CDA implementation guide for a specific document type and one specific context (e.g. country or project). Templates could express constraints on the class model itself, on the use of data types, on the values defined by coding systems, or they could be expressions of business rules. An example of the latter category is a template which defines that the "creation date" of "Natal report" documents SHALL be no more than 7 days after the birth of the child.
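The business-rule example above can be sketched in a few lines of code. This is only an illustration of what such a template asks an implementation to check, not the way any particular template tool expresses it. The function name is hypothetical, and the lower bound (a document cannot be created before the birth) is an assumption that the rule as quoted does not state.

```python
from datetime import date, timedelta

# The limit quoted in the example template: no more than 7 days after birth.
MAX_DELAY = timedelta(days=7)


def natal_report_date_ok(birth_date: date, creation_date: date) -> bool:
    """Check the 'creation date' rule of the example template.

    Assumption (not in the quoted rule): a creation date that lies
    before the birth date is rejected as well.
    """
    delay = creation_date - birth_date
    return timedelta(0) <= delay <= MAX_DELAY


if __name__ == "__main__":
    birth = date(2015, 5, 1)
    for created in (date(2015, 5, 8), date(2015, 5, 9)):
        print(created.isoformat(), natal_report_date_ok(birth, created))
```

A rule of this kind depends on two values found in different parts of the document, which is why it is expressed as a template constraint rather than as part of the class model.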

At this point in time Templates are defined either in

textual form as part of a CDA implementation guide; these can be (manually) transformed into software processable specifications such as OCL or Schematron. Many implementation guides are being published jointly with Schematron-based versions of the templates.
electronic form as supported by template design tools. The underlying electronic format is (as of yet) proprietary in the case of Lantana’s Trifolia (http://www.lantanagroup.com/newsroom/press-releases/trifolia-workbench-hl7-web-edition/), or is based on the HL7 Templates DSTU (HL7 Templates Standard: Specification and Use of Reusable Information Constraint Templates, Release 1) in the case of ART-DECOR (http://www.art-decor.org).

Recently (2014) HL7 has published a standard for the expression of template definitions (the DSTU HL7 Templates Standard). This format is in the process of being adopted by Template Editors, Template Repositories, as well as Schematron and code generation tools. The ART-DECOR tool has already incorporated the DSTU: it can create, edit and manage templates, generate Schematron out of the definitions, and validate instances, and it acts as both a registry and a repository for templates and accompanying value sets. ART-DECOR also acts as a reference tool for the Templates DSTU. IHE’s testing and validation suite Gazelle (ObjectsChecker) has the ability to consume ART-DECOR templates in order to allow model-driven validation of CDA instances.

The management aspect of templates is a major issue: a single CDA implementation guide may define hundreds of templates – which are quite often defined in terms of templates defined in other (more generic) CDA implementation guides. This issue is also illustrated by the creation of a US-Realm Consolidated CDA implementation guide (a.k.a. CCDA) - the number of templates (e.g. as defined by IHE, HL7, and HITSP) and the incompatibilities between them made it necessary to consolidate a number of template definitions.

MIF and XML schema

An HL7 MIF definition of the CDA class model is provided with the HL7 v3 standard. The CDA MIF file can be transformed into less "rich" expressions such as UML and XML schema. Some of the requirements expressed by the MIF are lost during the transformation process.

CDA instances are based on XML and the standard requires that all CDA instances validate (at a minimum) against a published CDA XML schema. This is the main reason why a lot of CDA implementations are based on the CDA XML schema. The wide availability of XML tools is a definite advantage; there are disadvantages as well. The XML schema language is not rich enough by far to express all of the requirements as present in the original CDA class model. A CDA document instance that validates against the XML schema is not guaranteed to be a valid CDA instance - to be a valid CDA instance one has to create XML that conforms to the requirements that are expressed in the CDA class model.

Examples of the limited capabilities of the XML schema language to express the model requirements include the use of conditional XML attributes with an HL7 v3 data type: a CD data type should use either both of the attributes @code and @codeSystem, or the attribute @nullFlavor. This requirement simply can't be expressed in XML schema. As a consequence, a CDA instance that only contains @code will be considered to be a valid document instance if validated against the CDA XML schema. Another example is the use of empty XML elements (<element/>): these are not allowed in any HL7 version 3 instance (see Footnote 1), but this can't be specified in XML schema. There are complex workarounds for some of the above limitations of the XML schema language; these, however, lead to large and unwieldy schema definitions.
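To make the two examples concrete, the following minimal sketch shows the kind of check that has to be added on top of schema validation. It uses only the Python standard library. The function names are hypothetical, and the CD rule is deliberately simplified to the wording used above (the real data type specification has more nuances), so this should be read as an illustration rather than a complete data type validator.

```python
import xml.etree.ElementTree as ET


def cd_attributes_ok(elem: ET.Element) -> bool:
    """Simplified CD rule: @nullFlavor, or both @code and @codeSystem."""
    if "nullFlavor" in elem.attrib:
        return True
    return "code" in elem.attrib and "codeSystem" in elem.attrib


def is_empty(elem: ET.Element) -> bool:
    """An element with no attributes, no children and no text content."""
    has_text = bool((elem.text or "").strip())
    return not elem.attrib and len(elem) == 0 and not has_text


def find_empty_elements(root: ET.Element) -> list:
    """Return the tags of all empty elements in the tree, in document order."""
    return [elem.tag for elem in root.iter() if is_empty(elem)]


if __name__ == "__main__":
    # A CD element carrying only @code passes the XML schema,
    # but fails the rule described in the text.
    only_code = ET.fromstring('<code xmlns="urn:hl7-org:v3" code="29463-7"/>')
    print(cd_attributes_ok(only_code))

    fragment = ET.fromstring(
        '<observation xmlns="urn:hl7-org:v3">'
        '<code nullFlavor="UNK"/><text/></observation>'
    )
    print(find_empty_elements(fragment))
```

In a schema-based implementation, checks like these are typically delivered as Schematron rules; in a model-driven implementation they are part of the generated classes.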

Note: (September 2012) XML Schema 1.1, a yet-to-be-finalized W3C specification, does support many of the desired features. It has yet to be determined whether or not most XML tools support version 1.1; that would be a prerequisite for HL7 to start generating XML Schema 1.1.

Nictiz, the Dutch NHIN provider which specifies HL7 v3 artefacts for use in the Netherlands, has resorted to publishing a large set of Schematron files (mainly for data types and coding systems) to deal with the 'incomplete' validation as supported by XML schema. The limitations of XML schema are also illustrated by the "Common issues found in implementations of the HL7 Clinical Document Architecture (CDA)" paper (http://www.ringholm.de/docs/03020_en_HL7_CDA_common_issues_error.htm) written in 2008, and the "Model-based Analysis of HL7 CDA R2 Conformance and Requirements Coverage" paper (http://www.ejbi.org/img/ejbi/2015/2/Boufahja_en.pdf) written by IHE in 2015.

In order to fulfill all requirements as expressed by the CDA class model, the starting point for all CDA implementations would have to be the CDA MIF. MIF, however, has the disadvantage that it is an HL7-specific format which is only supported by a limited number of tools.

Software development approaches

The current implementations of CDA can be divided into two categories: a group which uses XML technologies and tools, and another group which is based on the CDA class model (MIF or UML).

CDA implementation using XML techniques

The main reason for using XML techniques when implementing CDA is the fact that CDA is based on the XML standard: CDA instances are XML documents and there is a published XML schema for CDA documents. Template definitions are generally made available in the form of Schematron files. Schematron is part of the XML family of specifications; a Schematron compiler is based on a generic XSLT engine. The choice to use XML techniques is therefore an obvious one.

Class generators are commonly used alongside other well-known XML techniques such as XPath and DOM/SAX. JAXB is an example of a class generator: a tool which transforms XML schema to corresponding Java classes.

A number of online CDA validation tools are based on XML techniques (Schema and Schematron); examples include NIST (http://xreg2.nist.gov/cda-validation/validation.html), and Lantana (https://www.lantanagroup.com/validator/). These tools can be used to test for validity of CDA instances. An XML document that is considered to be valid by these tools is not necessarily a valid CDA instance, as XML-based tools are not capable of validating all aspects of a CDA.

Model driven CDA implementation

The basis for model-driven implementations is the CDA class model as documented in the CDA MIF. Because CDA is essentially an information model without any associated behavioral aspects, one has the option of creating a very solid mapping from CDA MIF to UML, which in turn allows for the use of UML-based tools.

The CDA MIF (or the UML equivalent thereof) can be used by class generators to create a set of classes (in e.g. Java or C#). At this point in time (November 2010) there are a couple of freely available class generators which one could consider when implementing CDA:

MDHT (http://www.cdatools.org/), a CDA specific class generator. This tool generates Java classes based on a UML representation of the CDA class model and on an OCL representation of applicable templates. The tool also supports the management and definition of templates to deal with the multitude of template definitions in CDA implementation guides. MDHT allows for the definition of templates in a table-like structure. Templates can be defined to be additional constraints on other templates. Templates defined in this way are automatically transformed into formal OCL statements; these OCL statements are in turn used when generating the Java classes.
MARC-HI Everest (http://everest.marc-hi.ca/), an HL7 version 3 (not just CDA) MIF-based class generator. This tool has recently (2014) embraced the HL7 Template Definition standard. A new add-on module (Sherpas) has been developed to compile the HL7 templates (e.g. as produced by ART-DECOR) and to generate code. Sherpas is available in the development branch of Everest and will be included in the 1.4 release of Everest (which will be released in the summer of 2015). The current (2014) Everest 1.3 release has only partial support for templates. See https://www.youtube.com/watch?v=p5oasVIQaNE&feature=youtu.be for a recorded presentation related to Everest and Sherpas.

There is an online validation tool based on MDHT (http://cdatools.org/validation/), and IHE hosts an online 2-step (schema as well as model based) validation tool (http://gazelle.ihe.net/EVSClient/cda/validator.seam?cid=9).

Green CDA

The GreenCDA Project added the concept of a simplified XML format that can be transformed to and from the full normative CDA XML format. This has the implementation advantage that simplified schemas exist for GreenCDA versions of a CDA document, which is an improvement over the generic normative CDA schema in terms of validation strength as well as suitability as a basis for code generation. Simplified schemas, optimized for human readability, lead to readable generated code.

In principle each and every implementer could define their own GreenCDA format: GreenCDA describes a process and not a format. When the process of 'Greening' is applied to Entry-level CDA templates there can be a significant level of re-use between various CDA document types. Generally such an approach will only be possible if one has to implement multiple document types based on one common set of templates (e.g. in the context of US Meaningful Use requirements, or when implementing a set of IHE PCC implementation guides for CDA).

When implemented in an XML-centric fashion the same caveats and limitations as described above (see section CDA implementation using XML techniques) apply; the biggest one being that XML-based tools are simply not capable of validating all aspects of a CDA.
However, one could also view the Green CDA format as a (simplified) class model, and use it for code generation. The normative full CDA format is then transformed to/from the Green CDA XML format, which in turn is generated/processed by the generated code.
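The round trip between a simplified format and the normative format can be sketched as follows. The simplified <bodyWeight> element is a made-up 'green' format invented purely for this illustration (GreenCDA describes a process, so no particular format is implied), the function names are hypothetical, and the normative side is reduced to a bare observation fragment rather than a complete CDA entry.

```python
import xml.etree.ElementTree as ET

HL7_NS = "urn:hl7-org:v3"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
LOINC_OID = "2.16.840.1.113883.6.1"   # OID of the LOINC code system
BODY_WEIGHT = "29463-7"               # LOINC code for body weight


def green_to_cda(green: ET.Element) -> ET.Element:
    """Expand the hypothetical <bodyWeight unit="..."> element into a CDA observation."""
    obs = ET.Element(f"{{{HL7_NS}}}observation",
                     {"classCode": "OBS", "moodCode": "EVN"})
    ET.SubElement(obs, f"{{{HL7_NS}}}code",
                  {"code": BODY_WEIGHT, "codeSystem": LOINC_OID})
    ET.SubElement(obs, f"{{{HL7_NS}}}value",
                  {f"{{{XSI_NS}}}type": "PQ",
                   "value": (green.text or "").strip(),
                   "unit": green.get("unit", "")})
    return obs


def cda_to_green(obs: ET.Element) -> ET.Element:
    """Collapse the CDA observation back into the simplified element."""
    value = obs.find(f"{{{HL7_NS}}}value")
    green = ET.Element("bodyWeight", {"unit": value.get("unit", "")})
    green.text = value.get("value")
    return green


if __name__ == "__main__":
    simple = ET.fromstring('<bodyWeight unit="kg">72.5</bodyWeight>')
    full = green_to_cda(simple)
    print(ET.tostring(full, encoding="unicode"))
    print(ET.tostring(cda_to_green(full), encoding="unicode"))
```

Everything that is fixed by the template (class code, mood code, the observation code) lives in the transformation, which is what keeps the simplified format small.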

Persistence

The CDA standard contains a number of requirements when it comes to the persistence of CDA documents. One of the requirements is that one should be able to reproduce the exact same XML instance which was originally received. In a relational database this requires that one stores the CDA document as a BLOB; alternatively, one could use a native XML database.
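A minimal sketch of the BLOB approach is shown below, using SQLite from the Python standard library as a stand-in for whatever relational database is actually used. The table and column names are hypothetical. The key point is that the received bytes are stored untouched (not parsed and re-serialized), so the original instance can be reproduced byte for byte; the checksum column is an optional extra added for this illustration.

```python
import hashlib
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a (hypothetical) table that holds the raw document bytes."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cda_document ("
        " id TEXT PRIMARY KEY,"
        " sha256 TEXT NOT NULL,"
        " content BLOB NOT NULL)"
    )


def store_document(conn: sqlite3.Connection, doc_id: str, raw_xml: bytes) -> None:
    """Store the document exactly as received, together with a checksum."""
    digest = hashlib.sha256(raw_xml).hexdigest()
    conn.execute(
        "INSERT INTO cda_document (id, sha256, content) VALUES (?, ?, ?)",
        (doc_id, digest, raw_xml),
    )
    conn.commit()


def load_document(conn: sqlite3.Connection, doc_id: str) -> bytes:
    """Return the original bytes, verifying them against the stored checksum."""
    row = conn.execute(
        "SELECT sha256, content FROM cda_document WHERE id = ?", (doc_id,)
    ).fetchone()
    if row is None:
        raise KeyError(doc_id)
    digest, content = row[0], bytes(row[1])
    if hashlib.sha256(content).hexdigest() != digest:
        raise ValueError("stored document no longer matches its checksum")
    return content
```

Typical use would be to call store_document with the bytes read from the wire or from disk, before any parsing takes place.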

If one uses a class generator it is recommended that one uses an ORM (Object-Relational Mapper) toolkit to persist the data as present in the CDA instance. The CDA document is 'shredded' into its constituent data components. If data is extracted from a CDA document one should ensure that the relationship between the data and its source document is persisted as well: the document (as a whole) could be replaced or nullified at a later point in time, in which case one should also have the ability to designate the extracted data as either nullified or replaced.
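The link between shredded data and its source document can be sketched as follows. Plain SQL via the Python standard library is used here instead of an ORM to keep the example self-contained; the table names, column names and the status values 'active', 'replaced' and 'nullified' are assumptions made for this illustration. Because every extracted row points back to its document, a change of document status is visible on all data derived from it without touching the individual rows.

```python
import sqlite3

SCHEMA = """
CREATE TABLE cda_document (
    id     TEXT PRIMARY KEY,
    status TEXT NOT NULL DEFAULT 'active'
);
CREATE TABLE observation (
    id          INTEGER PRIMARY KEY,
    document_id TEXT NOT NULL REFERENCES cda_document(id),
    code        TEXT NOT NULL,
    value       TEXT NOT NULL
);
"""

ALLOWED_STATUS = {"active", "replaced", "nullified"}


def add_document(conn: sqlite3.Connection, doc_id: str) -> None:
    conn.execute("INSERT INTO cda_document (id) VALUES (?)", (doc_id,))


def add_observation(conn: sqlite3.Connection, doc_id: str, code: str, value: str) -> None:
    """Persist one shredded data item together with a reference to its source document."""
    conn.execute(
        "INSERT INTO observation (document_id, code, value) VALUES (?, ?, ?)",
        (doc_id, code, value),
    )


def set_document_status(conn: sqlite3.Connection, doc_id: str, status: str) -> None:
    """Mark the document as a whole as active, replaced or nullified."""
    if status not in ALLOWED_STATUS:
        raise ValueError(f"unknown status: {status}")
    conn.execute("UPDATE cda_document SET status = ? WHERE id = ?", (status, doc_id))


def active_observations(conn: sqlite3.Connection) -> list:
    """Return only the data whose source document is still active."""
    return conn.execute(
        "SELECT o.code, o.value FROM observation o"
        " JOIN cda_document d ON d.id = o.document_id"
        " WHERE d.status = 'active' ORDER BY o.id"
    ).fetchall()
```

With an ORM the same design would appear as a mandatory many-to-one relationship from each extracted entity to a document entity.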

Processing

CDA documents may be based on a specific implementation guide and a series of templates. These provide context which may be used when processing the contents of the document. Knowledge of the context (the underlying use case) does allow one to reuse code.

In general it is advisable to use the available context for processing. As an alternative one could base the entire processing logic on the data itself, irrespective of the implementation guide and templates used.
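One way to use the available context is to select the processing code based on the templateId elements present in the instance, and to fall back to generic, data-driven processing when no known template is found. The sketch below illustrates that idea only: the OID, the handler names and the return values are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

NS = {"hl7": "urn:hl7-org:v3"}

# Maps a template OID to the function that knows how to process it.
HANDLERS = {}


def handles(template_root: str):
    """Decorator that registers a handler for one template OID."""
    def register(func):
        HANDLERS[template_root] = func
        return func
    return register


def template_ids(elem: ET.Element) -> list:
    """Return the @root values of the templateId children of an element."""
    return [t.get("root") for t in elem.findall("hl7:templateId", NS)]


@handles("1.2.3.4.5.6.7")  # hypothetical OID, for illustration only
def process_vital_signs(section: ET.Element) -> str:
    return "vital-signs"


def process_generic(section: ET.Element) -> str:
    """Fallback: base the processing on the data itself."""
    return "generic"


def process(section: ET.Element) -> str:
    """Dispatch on the first known template, else use the generic path."""
    for root in template_ids(section):
        handler = HANDLERS.get(root)
        if handler is not None:
            return handler(section)
    return process_generic(section)
```

Handlers registered this way can be shared between document types that reuse the same templates, which is where the code reuse mentioned above comes from.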

Summary and Recommendations

The diagram below shows the relationships between the various artefacts discussed in this whitepaper. A CDA document has to conform to the requirements as defined in a CDA Implementation Guide. It has to conform to both the formal CDA class model as well as the templates. The CDA class model can be expressed in either MIF, or in a derived format such as UML or XML schema. Templates can be expressed in Schematron, or in OCL, or in MIF with OCL annotations. The actual validation of CDA instances is based on the expressions of the CDA class model and the applicable templates.

The use of XML techniques carries a high risk of creating CDA documents that are not valid instances of the CDA standard. If one is forced to solely rely on XML techniques (and up to about two years ago there really wasn’t any other available option) one should pay particular attention to the HL7 version 3 data types and coding systems. Those are the areas that most often lead to issues because of the lack of expressivity/richness of the XML schema language.

Sources at two of the organizations responsible for CDA online validation tools confirm the above: if they were to develop such a tool from scratch they would not base it on XML techniques. The current online tools produce too many false positives - documents which are erroneously declared to be valid CDA instances. The management of templates in the form of a set of Schematron files is also reported to be problematic.

A software application will have to be based on the CDA class model if one wishes to ensure that one creates valid CDA instances. Applications that are based on the CDA XML schema can't guarantee that the documents are valid CDA instances. Both the MDHT and the Everest toolkits support templates; they are the best candidates for a model-based implementation.

Footnotes

On the use of empty XML elements (<element/>): in certain rather exotic circumstances empty XML elements may occur in HL7 version 3 instances. For example: if a model were to have a mandatory participation linked to a Role which has no required/mandatory attributes, the Role could be present in the instance as an empty XML element. The CDA model itself doesn't contain any such requirements; a particularly exotic CDA template could, however, have this type of requirement, resulting in an empty XML element. The statement by the authors of this white paper that "no v3 instance SHALL contain an empty XML element" leads to a lot less "false positives" and a very small number of "false negatives".
