Testing SMD for HL7 Use - Beeler
- 1 Introduction
- 2 Issues Summary
- 2.1 Failure to "Round-Trip" MIF Content
- 2.2 Inability of SMD to accept and to "persist" new constraint content
- 2.3 The time to re-diagram existing content
- 2.4 Unexplained conversion errors
- 3 Test Notes
Introduction
This is a summary of findings encountered while testing the SMD Release 1.1 for use in HL7. Tests were run in late October, 2010. The basic test plan was previously outlined on the Wiki.
To assure a clean test, I installed all of the elements listed: newer Java JRE, MozillaXULRunner, and SMD_1.1RC_201009221500-win32.win32.x86.zip. The Java JRE updated my existing Java (had been 1.6.0_18), but the Mozilla and SMD packages were installed on the D: partition of my hard drive. This caused no apparent issues.
I downloaded the "org.openhealthtools.hl7.smd.2010-may-ballot-test" package and ran pre-tests to be sure things operated as expected. (Some errors encountered here are summarized below.)
Issues Summary
The following is a summary of the issues reported below. It focuses on two issues that are paramount to the adoption of this tool for use in HL7 International, one that is very worrisome, and a group of nuisance issues:
- failure to "round-trip" MIF content in many cases;
- inability to accept and "persist" new or even incorrect constraint content;
- time to re-diagram existing content coupled with a somewhat clumsy (COTS) interface for class definition; and
- unexplained conversion errors coupled with major difficulty adding new MIF files to a project.
These are discussed in the following sub-sections.
Failure to "Round-Trip" MIF Content
From the inception of the HL7 Model Interchange Format (MIF), a core tenet of the Tooling Work Group is that any tool that "consumes" a MIF file and outputs an updated MIF file must be able to return the original content unchanged in the new MIF file. That is, the "static model design content" that is imported by a tool from a MIF file and is returned as an "exported" MIF file must be unaltered, except for intentional changes made by the users of that tool.
Unfortunately, as currently presented, SMD fails in this regard owing to a combination of errors and design decisions. This is the easiest possible test to run, and it is surprising that the MIF export was provided with no apparent attempt to reconcile issues of content loss.
Content Loss Through Errors
As noted below, the suite of files used for this test had NO MIF validation errors as tested both by the SMD and by validation against the MIF schemas.
The simplest example is to transform PRPA_MT402001UV02.mif (Active Inpatient Encounter) to the internal representation. This file has "line-item" documentation of all classes and attributes, painstakingly added by Gregg Seppala. None of these "definitions" appears in the internal SMD format (xhl7 files). Needless to say, they are therefore also missing when the MIF is re-exported. Lloyd McKenzie has documented numerous places where valid annotations in the source MIF content are dropped, not always in a repeatable fashion.
Content Loss Through Design
The SMD relies on the availability of a complete set of content files, both the "coremif" files that provide model design and validation constraints, as well as the related designs that are used to link models together such as by CMET-reference and derivation.
The problem arises in that a "reference" within Model-A to an element of Model-B is only documented as an "id" link to the element in Model-B, and this reference cannot be established unless Model-B is also in the same "project" in the tool and the element being referred to is present in that model. This is apparently because the reference link (a form of GUID) is generated by an algorithm that appears to require the presence of the target at the time it is established, since the "id" would appear to have been generated when the target was converted.
In other HL7 tools, these references were made using a reference notation that can be uniquely resolved if and when Model-B is present, but that persists as a valid reference in the absence of a file for Model-B.
- In MIF the derivation relationship for the "classCode" of a clone derived from the class "Act" in the rim is stated as:
<mif:derivedFrom staticModelDerivationId="1" className="Act" attributeName="classCode"/>
- Where the staticModelDerivationId of "1" was previously linked to a particular RIM
- In SMD the internal expression of this derivation relationship is stated as:
<target xmi:type="static:HL7Attribute" href="coreModels/rim.xhl7#__MW5Z-EaEd-OHvnJ_3ZNdw"/>
- Note that this references a "local" RIM file (coreModels/rim.xhl7) and the internal GUID for Act.classCode
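The difference between the two reference styles can be illustrated with a minimal Python sketch. The class names `SymbolicRef` and `IdRef`, their methods, and the dictionary layouts are hypothetical, invented here for illustration; they are not part of the SMD or of any MIF tooling. The sketch only shows that a name-based reference can still be written back out when the target model is absent, whereas an id-based link carries nothing beyond an opaque id.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SymbolicRef:
    """Name-based reference, in the spirit of the MIF derivedFrom element."""
    model_key: str        # e.g. the staticModelDerivationId
    class_name: str
    attribute_name: str

    def resolve(self, models: dict) -> Optional[str]:
        # `models` is a hypothetical {model_key: {(class, attribute): id}} map.
        model = models.get(self.model_key)
        if model is None:
            return None   # unresolved, but the reference itself survives
        return model.get((self.class_name, self.attribute_name))

    def export(self) -> str:
        # Can always be written back out, resolved or not.
        return (f'<mif:derivedFrom staticModelDerivationId="{self.model_key}" '
                f'className="{self.class_name}" '
                f'attributeName="{self.attribute_name}"/>')


@dataclass(frozen=True)
class IdRef:
    """Id-based link: only meaningful while the target file is present."""
    href: str             # e.g. "coreModels/rim.xhl7#<generated id>"

    def resolve(self, files: dict) -> Optional[str]:
        # `files` is a hypothetical {path: {id: description}} map.
        path, _, fragment = self.href.partition("#")
        ids = files.get(path)
        if ids is None or fragment not in ids:
            return None   # nothing is left to say what was referenced
        return ids[fragment]


ref = SymbolicRef("1", "Act", "classCode")
print(ref.resolve({}))   # None: no RIM loaded
print(ref.export())      # still a complete, re-exportable reference
```

When resolution fails, the symbolic form still records which class and attribute were intended; the id form does not.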
Consequences of this design paradigm
Some of the consequences of this design include:
- HL7 MIF files for "message types" have referenced derivation relationships from: the RIM, an RMIM, and an HMD. In practice, MIF files have never been generated for RMIMs, and in the case of most current designs, the HMD MIF is a redundant artifact. When a package of MT and HD files is transformed into SMD, the "derivation" link to the RMIM file cannot be resolved, and the information identifying this RMIM is simply dropped. When the model is re-exported, an invalid MIF representation is created owing to the absence of data to populate the reference fields.
- Similarly, vocabulary constraints that are not currently in the Vocabulary MIF file are also simply dropped because there is nowhere to link them. These may be valid references to code systems and codes not maintained by HL7, but no record of the binding syntax is preserved as the MIF is transformed to "xhl7" and therefore is also not re-exported to MIF.
Likely Fragility of the SMD Tool
Limited testing suggests that this whole reference scheme is vulnerable to the accidental loss (deletion or corruption) of a single file. The "test" ran as follows:
- I had two projects ("A" and "B"), both of which had imported the same RIM MIF file.
- I copied one rim.xhl7 file from Project-A to Project-B, thereby overwriting the rim.xhl7 file for Project-B.
- When I open a diagram from Project-B, it appears normal, but when I try to drop a new "class" onto the diagram, nothing happens. (Designer is broken).
- When I try to export one of the xhl7 static models to MIF 2.1.6, the export appears to run OK, but the actual MIF file content is mostly "empty" of meaningful data.
This is not a good circumstance. With a "complete" project involving upwards of 1000 files, having the integrity of all dependent on the presence of a few is "chancy." At minimum, the "coremif" files in "xhl7" format should be designated as "read only" so that casual overwriting or deletion cannot occur.
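As a stopgap outside the tool, the "read only" suggestion can be approximated at the file-system level. The following Python sketch clears the write bits on every file with a given suffix under a directory. The function name `make_read_only` is hypothetical, and the directory to pass (for example a project's coreModels folder) is an assumption based on the path shown earlier; the SMD itself offers no such function as far as this testing showed. This guards against casual overwriting; whether it also prevents deletion depends on the operating system.

```python
import os
import stat
from pathlib import Path


def make_read_only(directory: str, suffix: str = ".xhl7") -> list:
    """Clear the write bits on every matching file under `directory`.

    Returns the paths whose permissions were changed.
    """
    changed = []
    for path in sorted(Path(directory).rglob(f"*{suffix}")):
        if not path.is_file():
            continue
        mode = path.stat().st_mode
        read_only = mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH)
        if read_only != mode:
            os.chmod(path, read_only)
            changed.append(str(path))
    return changed
```

A call such as `make_read_only("coreModels")` would be run once after the core files have been imported, and would need to be reversed before a new RIM or Vocabulary release is loaded.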
Implications appear to include:
- Risk of losing an entire project if one of the "coremif" models is removed.
- It appears that every component must be imported under the auspices of the project in which it is used. That is, one cannot move e-core XMI (xhl7) files from one project to another.
- When a new RIM or Vocabulary is released (at least six times a year in HL7), EVERY project will need to be saved and re-assembled with the new coremif content.
Inability of SMD to accept and to "persist" new constraint content
HL7 International static model development is a widely distributed, loosely coordinated activity. It depends on reliable tools to "enforce" the methodology without requiring a strict "top-down" design paradigm. That is, individual modelers must have the freedom to define new constraints that are not yet part of the Vocabulary, Data types or RIM specifications. The largest set of such "new" constraints is the creation of "proposed" Concept Domains, Concepts or Value Sets.
The present set of HL7-provided tools "locks" the modeler to the RIM class and attribute structure, but allows the expression of new constraints for Vocabulary, data type flavors, alternate attribute cardinality etc. These "new" constraints will show up as validation errors, but the expression of them will be documented in the tool and represented in the MIF files for the designs.
Unfortunately, the SMD tool provides no means to document these constraints, and it treats them as Null when a MIF source file is transformed for SMD use. One might edit the Vocabulary file to add these concepts, but the tool would then no longer flag them as invalid representations, and the editing would be both a burden and a risk of corrupting other elements of the model.
My conclusion is that some means for documenting these bindings using the textual Static Model BNF Grammar agreed to by MnM earlier this year must be provided and the string constraint expression must be persisted by the design tool and then used to express the constraint when the design is re-expressed as MIF. Rendering these elements as Null fields is not acceptable.
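A minimal Python sketch of the behavior asked for here: the constraint string is kept verbatim whether or not it can be resolved, validity is reported separately, and export re-emits the original text. All names (`VocabularyBinding`, `import_binding`, `export_binding`, and the sample strings) are hypothetical and do not reflect the SMD's internal model or the actual BNF grammar.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VocabularyBinding:
    """Hypothetical holder for one vocabulary constraint on an attribute."""
    raw_text: str                       # constraint exactly as the modeler wrote it
    resolved_id: Optional[str] = None   # link into the Vocabulary, if one was found

    @property
    def is_valid(self) -> bool:
        return self.resolved_id is not None


def import_binding(raw_text: str, known: dict) -> VocabularyBinding:
    """Keep the text even when it cannot be resolved against `known`."""
    return VocabularyBinding(raw_text, known.get(raw_text))


def export_binding(binding: VocabularyBinding) -> str:
    """Always re-emit the original text; validity is reported separately."""
    return binding.raw_text


known = {"KnownDomain": "id-1"}                 # placeholder vocabulary content
proposed = import_binding("ProposedDomain", known)
print(proposed.is_valid)                        # False: reported as a validation error
print(export_binding(proposed))                 # ProposedDomain: nothing is lost
```

The design choice is simply that "invalid" and "absent" are different states: an unresolved binding is flagged, not replaced by Null.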
The time to re-diagram existing content
The SMD has no utility to mimic a previous layout when converting a file from MIF, and graphical "round-tripping" appears to be far in the future.
- [Note: Without the ability to import graphical data, it would appear that each static model designed with the SMD can only ever be used within the Eclipse project in which it was first defined. There does not even appear to be any capability to "migrate" a design (in xhl7 format) from one RIM release to the next!!]
In that circumstance laying out the static models in new projects or with a new RIM release becomes a labor-intensive task. In my testing, it took 2-3 hours to lay out a large HL7 static model, even when I used the previous layout as a visual pattern.
Unexplained conversion errors
While converting from MIF to SMD I ran into several "fatal errors" for which critical data was missing (such as which file was being processed when the conversion stopped!). Moreover, in my initial experience, "transforming" new MIF content with the right-click import was significantly (20-fold) slower than with the new project wizard. In HL7, each facilitator must be able to create her or his own projects. Errors such as these only serve to reduce one's trust in the product.
What is almost as bad, the process of semantic validation (as distinct from schema validation of the MIF source) leads to long lists of errors and warnings that are not "real." In my sample suite of 6 static models, all passed MIF schema validation and yet produced:
- 2,184 ERROR or WARNING listings of which:
- 1,963 are "Derivation mapping errors" due to the absence of RMIM files (see "Content Loss Through Design" above)
- 194 are for CMETs listed in IFC but for which no model is provided ("incomplete" project)
- 18 are "non-errors" logged against the Vocabulary MIF file, and
- 9 are warnings about vocabulary constraints that are not in the Vocabulary MIF file.
The final nine, less than one-half of one percent of the total, are the only set of real interest in terms of model quality.
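The arithmetic behind these figures can be checked with a few lines of Python. The dictionary keys are shortened labels for the four categories listed above; the numbers are those reported.

```python
# Tallies as reported above for the six-model sample suite.
tallies = {
    "derivation mapping errors": 1963,
    "CMETs listed in IFC without a model": 194,
    "non-errors against the Vocabulary MIF": 18,
    "vocabulary constraints not in the Vocabulary MIF": 9,
}

total = sum(tallies.values())
real = tallies["vocabulary constraints not in the Vocabulary MIF"]
share = 100 * real / total

print(total)             # 2184
print(f"{share:.2f}%")   # 0.41%
```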
Test Notes
Although I used the material provided on OHT for my first test, my primary test suite was a set of material drawn directly from Normative Edition 2010. This included:
- DEFN=UV=DT=1.0.coremif (Data types R1)
- DEFN=UV=IFC=1.9.3.coremif (Cmet list)
- DEFN=UV=RIM=0229R1.coremif (RIM using DT R1)
- DEFN=UV=VO=913-20091020.coremif (Vocabulary)
- Static Models:
- COCT_MT030000UV09.mif (E_LivingSubject universal )
- COCT_MT050000UV01.mif (R_Patient universal )
- COCT_MT530000UV.mif (A_SupportingClinicalStatement universal )
- PORX_MT010120UV01.mif (Medication Order)
- PRPA_MT202301UV02.mif (Service Delivery Location Activate )
- PRPA_MT402001UV02.mif (Active Inpatient Encounter )
These were selected to be relatively complete AND complex and to have available renderings in the Normative Edition for comparison. The test package is on Gforge.
Pre-run file verification
The initial pass showed a single MIF2 validation error in these files. (A CMET name was too long in DEFN=UV=IFC=1.9.3.coremif.) This error has been corrected in the test suite in order that there be no source validation errors.
File preparation - Import to SMD from MIF
The first time I executed this process I used the entire content of Normative Edition 2010. This is about 450 static models. (I did not include the "HD" files, only the "MT" files.) The process revealed that many of the MIF files did not pass schema validation, but I elected to fire off the new Project Wizard conversion anyway.
The converter percolated along just fine, and then stopped with the following error box.
This message is useless as it gives NO CLUE as to which file it was working on when it died.
The last listed validation results (in the console display) were for PORR_MT049011UV01.mif, and that MIF has a correctly formed xhl7 file to go with it. The next in sequence (and therefore the culprit) is PORR_MT049012UV01.mif.
I used the right-click "MIF to HL7 Import" to try to transform the suspect model (PORR_MT049012UV01.mif) and got a much more useful and explicit error, as shown below. This is what probably should have appeared instead of the error above, BUT even this STILL lacks any identification of the rogue model.
At this point I was forced to use the right-click "MIF to HL7 Import" and tried transforming five files: PORR_*150101, 020001, 030001, 090001, and 090002. These appeared to run fine until I got the following Error.
In this case, it is apparently a bogus error as I cannot see that ANY of the seven files failed to transform. NOTE the transform runs TWENTY TIMES SLOWER in this mode than when using the new project wizard to convert a bulk set of models.
- (Perhaps regrettably, I did NOT close the SMD after the first error and re-open it. Lloyd McKenzie notes that a lot of errors went away if he restarted the SMD after EVERY error message. When I repeated this process a day later after just opening the SMD, the conversion speed seemed similar to the wizard, and the terminating error message did not appear.)
- (Note: Once the MIF-XML Transformation is started using the right-click "MIF to HL7 Import", even on a single file, you cannot cancel it. It plods along for several minutes before giving up.)
On my initial pass, however, I set up a new dummy project to convert the final set of models. This ran quickly, and I copied the remaining models into the old project. It was 24 hours later that I discovered that this was a useless step, as the files were not bound to the RIM and Vocabulary content in the original project!
Diagram layout is a daunting task, given that the initial layout does not produce a satisfactory result. I used visual reference to the previous Visio models in order to determine where to block the content and which elements belonged together. (Note that this task could be done in 1/10th the time if only the initial layout could be automated based on the location of the "center" of each class in the old diagrams.)
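The kind of automation suggested in the note above could be as simple as the following Python sketch. It assumes that the class centers from the old diagram are available as a name-to-coordinate map; how they would be extracted from Visio, and how positions would be pushed into the SMD, is not addressed. The function `seed_layout` and its parameters are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float]


def seed_layout(classes: Iterable[str], old_centers: Dict[str, Point],
                grid_origin: Point = (0.0, 0.0),
                grid_step: float = 200.0, columns: int = 4) -> Dict[str, Point]:
    """Place each class at its old center; park the rest on a simple grid."""
    layout = {}
    unplaced = []
    for name in classes:
        if name in old_centers:
            layout[name] = old_centers[name]
        else:
            unplaced.append(name)
    # Classes with no counterpart in the old diagram go on a fallback grid.
    for i, name in enumerate(unplaced):
        row, col = divmod(i, columns)
        layout[name] = (grid_origin[0] + col * grid_step,
                        grid_origin[1] + row * grid_step)
    return layout
```

The fallback grid makes no attempt to avoid classes already placed from the old diagram, so the modeler would still adjust new classes by hand; the saving is in not re-placing everything else.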
The layout tools are accurate and relatively easy to learn. The "outline" view (thumb-nail overview) is extremely useful for this task. I encountered no errors or "clumsiness" in layout, but the process required 2 or more hours for each complex diagram.
For an example, see the following two thumbnails for a Clinical Statement CMET - the original Visio and the transformed one from the Static Model Designer.
Dealing with specific initial errors: Invalid vocabulary constraints
My initial activity focused on layout (above) and the correction of invalid vocabulary constraints. My first shock came when I saw that the "invalid" vocabulary constraints showed up as NULL. This should not be!!! Not only is the "proposed" value lost in the process, but the user has to go back and ferret out the original intent from the MIF file. (Which is actually easier to do in MIF than in "xhl7". :-) )
I spent a while "banging on" the attribute editing window until I was convinced that there was no way to enter a "free-hand" binding statement.
I did discover the "Code" column in the attribute editor, which provides a place to enter a "Code" and "CodeSystem" value pair. I considered using this with code systems named "CD" for new concept domain names (in the code field), "VS" for new value set names, and new "codes" in existing code systems for new concepts. Although clumsy, this could be made to work, except for one fatal flaw:
- The CodeSystem/Code pair are persisted in the SMD static model and "xhl7" file, but they are discarded when the "xhl7" file is returned to MIF.
Thus the content is discarded on the route from MIF to SMD, and, if re-entered this way, it is discarded again on the way back out. A double-whammy!
File reversion to MIF - Export from SMD to MIF
This process is easy, and quick. This test took less than 10 minutes to create, but HOURS to analyze and understand. The next test was to validate the MIF, but this failed owing to the treatment of the derivation relationships.
The more complete and useful test was simply to compare the MIF file that started the process with the one that came out of it. Two files, PRPA_MT402001UV02-RT.mif and PRPA_MT402001UV02.mif, have been posted in a ZIP archive on Gforge. The former is the result of round-tripping the latter, a PA MIF file, through the SMD. Simple file comparison with XML SPY reveals the variety of changes cited earlier.
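For a quick first pass before a line-by-line comparison, content loss can also be surfaced by counting elements. The Python sketch below is not the method used in this test (which was a file comparison in XML SPY); it is an assumption-laden alternative that reports which element names occur less often after the round trip. It knows nothing about MIF semantics and will not detect changed attribute values or text.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_counts(xml_content) -> Counter:
    """Count elements by (namespace-qualified) tag name."""
    root = ET.fromstring(xml_content)
    return Counter(elem.tag for elem in root.iter())


def lost_elements(original_xml, exported_xml) -> dict:
    """Return {tag: number lost} for tags that occur less often after export."""
    before = tag_counts(original_xml)
    after = tag_counts(exported_xml)
    return {tag: n - after[tag] for tag, n in before.items() if after[tag] < n}
```

The two arguments would be the contents of the original and the round-tripped file, for instance read with `Path(...).read_bytes()`. An empty result does not prove a clean round trip; a non-empty one points to where to look first.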