Design Principles for Alignment, Review and Constraint of V3 Publishing Content

From HL7Wiki

Background

The HL7 V3 design process is predicated on four inherently conflicting objectives:

  1. To allow individual Work Groups to focus on the material or subject matter in which they are expert;
  2. To provide coherent standards derived from singular high-level models for RIM, Vocabulary and Data Types;
  3. To re-use common specifications in order to reduce development effort and increase efficiency; and
  4. To provide parallel development in order to provide standards that are responsive to the evolving needs of the implementers.

Over the years, processes to support these objectives have been developed, including:

  • Harmonization to provide singular models for RIM, Vocabulary and Data types;
  • Definition of shared common structures for CDA, CMETs and selected message types; and
  • Tools to support the parallel development within a Work Group of both static model designs and the documentation of the domain framework in which those designs are used.

Where this all comes together is in "publishing." Four times a year (three ballots and a Normative Edition), the HL7 Director of Technical Publications receives material from myriad Work Groups that must be gathered, analyzed, corrected, and packaged to produce either a reliable ballot or a formal Normative Edition. This must be done, in collaboration with the facilitators who make up the V3 Publishing Work Group, in a matter of a few weeks, starting with the initial content deadlines and ending when the ballot opens.

This document attempts to lay out the rules and principles under which this activity proceeds, with a goal to "automating" as much of the process as possible.

Content Sources and Source Types

As noted, the content providers are the Work Groups, of which there may be many. The material they provide and the types of content provided differ, and the treatment of this content may differ depending upon whether its primary intended use is as "common" shared content or as an implementable package.

The table at the end of this section (which is followed by a legend for the table) displays for each different grouping of submissions, whether those submissions primarily are for common use, the form(s) that the submissions are provided in, and the dependencies that these submissions have upon other submissions. (This table ignores "graphics" used to augment textual documentation in the submissions.)

For a typical ballot in May-2010, V3 Publishing processed:

  • 35 PubDbs with
    • 6 for common content and
    • 29 for general domains
  • 397 StaticModels (all but 17 of them as VisioXml) with
    • 233 for common content and
    • 164 for general domains
  • 50 non-domain specifications (of which roughly 30 were in PubXml)
  • 6 MIF or PubXml files for common content (RIM, Vocab, etc.)

Needless to say, the analysis and management of this content must be (and is) supported by automated analysis, and it needs more linkages between the analysis and the automated processing steps. After processing, this source came together to produce a web-site with about 17,000 files. This "explosion" of source is somewhat controlled. For example, a single static model will result in 16 separate files: 4 MIF files (MT and HMD in each of MIF 1.1 and MIF 2.x), 3 graphics-related files (overlay, thumbnail and large), 2 table views (MT and HMD), 1 Excel view, 4 schema files (HMD and MT in both XSD and HTML), and (on average) 2 interaction schema files (XSD and HTML).
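
The per-model file count quoted above can be checked with a few lines of Python. This is only an arithmetic sketch: the dictionary keys are informal labels taken from the sentence, not Generator identifiers, and the final product assumes that every one of the 397 static models yields the average count.

```python
# Files produced per static model, as itemized in the text above.
# The keys are informal labels, not names used by the Generator.
files_per_static_model = {
    "MIF (MT and HMD, in MIF 1.1 and MIF 2.x)": 4,
    "graphics (overlay, thumbnail, large)": 3,
    "table views (MT and HMD)": 2,
    "Excel view": 1,
    "schema (HMD and MT, in XSD and HTML)": 4,
    "interaction schema (XSD and HTML, on average)": 2,
}

total = sum(files_per_static_model.values())
print(total)        # 16
print(397 * total)  # 6352 files for the 397 static models of May 2010
```

On that estimate, the static models alone account for well over a third of the roughly 17,000 published files.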

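Submission | Common? | Source Forms: Primary | Source Forms: Design | Source Forms: Supplemental | Dependencies: RIM/Vocab/DT | Dependencies: Wrappers | Dependencies: CMETs | Dependencies: CommonMsgs
RIM and DTs | Y | RimXml or PubXml |  |  | Y |  |  |
Vocabulary | Y | MIF |  |  | Y |  |  |
CMETs | Y | PubDb | StaticModel | PubDb | Y |  | Y |
Wrappers | Y | PubDb | StaticModel | PubDb | Y | Y | Y |
CommonMsgs | Y | PubDb | StaticModel | PubDb | Y | Y | Y | Y
Domains |  | PubDb | StaticModel | PubDb or PDF | Y | Y | Y | Y
Non-Domain Specs |  | MIF or PubXml or PDF |  | PDF | ? |  |  |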

A legend for the above table follows:

  • CMETs - Common Model Element Types
  • Common - Material whose primary intent is to provide re-usable specifications for use in defining RIM-derived static models or for use in assembling "Domain" content.
  • CommonMsgs - Shared messages and interactions, primarily for acknowledgment transmittals
  • Domains - Every content specification that is not intended primarily as "Common", that includes RIM-derived, implementable static models, and that relies on a defined behavioral framework.
  • DTs - Data Types Specifications
  • MIF - File in Model Interchange Format (replacing PubXml and VisioXml formats)
  • Non-DomainSpecs - Content specifications that are not intended primarily as "Common content" and that are not "Domains"
  • PDF - Portable Document Format files for document publication.
  • Primary - This column under Source Forms lists the primary content format of a specification.
  • PubDb - Publication data base, which is expressed as PubXml
  • PubXml - File in HL7's XML document type definition defined in 2000.
  • RIM - Reference Information Model
  • RimXml - File in HL7's XML document type definition for RIM defined in 2000.
  • StaticModel - Form for expressing static models. In 2010 this is principally VisioXML and Visio "vsd" files. In future this will be MIF with both graphics and design content.
  • VisioXML - File in HL7's XML type used to represent the design content of a static model defined in HL7's RMIM Designer in Visio.
  • Wrappers - Transmission, Query, and Control Act Wrappers

Content Alignments

The critical step in content alignment is to assure that references from one component to another connect correctly. This is principally a task of assuring that the identifier assigned to an element as it is defined is the same as the identifier asserted in any references to that element. The critical kinds of identifier are listed in the table below, along with the designation of the source document that acts as the "source of truth" for that identifier, the likely cause(s) of mis-alignment that may occur, and how alignment is effected.

Identifier | "Source of Truth" | Causes of Mis-Alignment | How aligned?
All Artifact IDs in a PubDb | Are governed by the HL7_artfactBallotStatus table in the PubDb. Responsibility for these rests with the Publishing Facilitator. | Failure to maintain the current status in the "HL7_artfactBallotStatus" table. | Automatically. The IDs are assigned as the content is extracted from the PubDb into PubXml format.
Static Model - BaseID | SHARED between the PubDb in which it is defined and the static model design document. | These sources are developed independently in PubDb and Visio and either of two different authors may err. | These MUST be aligned by the facilitator for proper publishing. Accomplished by manually editing one or both sources.
Static Model - Version | The PubDb in which the static model is defined rules on this version portion of the static model designer. | This is determined by the ballot status that is documented in the "master" PubDB. | The version on the static models is automatically updated to match the data from the defining PubDb.
CMET Name | CmetInfoExport.txt is the master file of CMET Name/Identifier coordination. The file is maintained by MnM and incorporated as generator input. | CMETs are bound to their using models by name, drawing data from the CmetInfo.txt files. However, tooling allows manual entry which may not be represented in CmetInfo. | The Static Model Name is forced to align with CmetInfo, and all CMET references in other static models "look up" the correct name from CmetInfo and use that. Neither the name in the defining PubDb nor in the source design rules, but both SHOULD agree with CmetInfo.
RIM Content | The rim.xml file placed in CommonSourceFiles sub-directory of Generator/InputFiles determines both the RIM version to be used and which data types release (determined by the binding in rim.xml) to use in Generation. | As presently established, the only way to make an error is to place the wrong rim.xml file in the Generator. This should be changed in future. There should be a Generator parameter(s) that determines RIM and DT versions and then selects the appropriate RIM.xml file to use for generation. | Any errant reference must be fixed manually. All references are to codes or names that are guaranteed to be unique within their name space.
Vocabulary and Data Type Content | The "CoreMif" file for the vocabulary and PubXml for Data Types. | Erroneous manual entry in a PubDb or static model design. | Any errant reference must be fixed manually. All references are to codes or names that are guaranteed to be unique within their name space.
Graphic File Names | The reference should be to the name of the graphic file itself. | Erroneous manual entry in a PubDb or static model design. | Any errant reference must be fixed manually. The facilitator must assure uniqueness of these file names within their domain.
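
At its core, every row of this table describes the same check: each identifier asserted in a reference must equal an identifier assigned where the element is defined. A minimal sketch of that check follows, assuming the definitions and references have already been extracted into plain Python collections. The function name, the data shapes and the sample identifiers are all hypothetical; the actual tooling works on XML source files.

```python
def find_misaligned_references(definitions, references):
    """Return the references whose identifier matches no definition.

    definitions: set of identifiers as assigned where elements are defined.
    references:  iterable of (source, identifier) pairs, one per reference.
    Both structures are hypothetical stand-ins for the XML sources.
    """
    return [(source, ident) for source, ident in references
            if ident not in definitions]


# Illustrative identifiers only; they are not taken from a real ballot.
defined = {"PRPA_MT000001UV01", "COCT_MT000002UV01"}
refs = [
    ("PRPA_IN000001UV01", "PRPA_MT000001UV01"),  # resolves to a definition
    ("PRPA_IN000002UV01", "PRPA_MT000001UV02"),  # version does not match
]
print(find_misaligned_references(defined, refs))
# [('PRPA_IN000002UV01', 'PRPA_MT000001UV02')]
```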

Review And Correction of Alignment Issues

The detection and reporting of content alignment issues is performed by the Quality Assurance Preview process that is the first step in all Generator runs and that can be invoked using publishing target 03.10...Process_Initial_QA.

The Publishing process first assures that all PubXml, MIF, and Static Model VisioXml are positioned in the InputFiles directories of the Generator. Next it "kicks off" the Generator "Preview Reference Integrity".

Assemble Ballot Cross-reference File

The first set of steps performed under "Preview Reference Integrity" are processes that build a single "cross-reference" file of all content submitted for publication in the Ballot. The file, MessageReferences.xml, is created in the following sequence:

  • (as part of ScenarioSetup)
    • Working from InputFiles/CommonSourceFiles it analyzes rim.xml and vocabulary.mif to determine:
      • the rimVersion to be used by the Generator (the release or version of rim.xml);
      • the rimDatatypeVersion to be used (determined by the data types used in rim.xml); and
      • the vocabRealm and vocabUVVersion identifier (determined from vocabulary.mif)
      (Although these can be over-ridden as other targets are invoked, doing so is likely to lead to an invalid rim.coremif in which the declared data types binding may disagree with the actual data types used in the RIM.)
    • Loads the config.txt properties and prepares all directories
  • (as part of BuildModelDefinitionReferenceSource)
    • Analyzes the individual content of all domain-created PubXml files to produce "statusPackage" files that hold the definition (type, name, and combined Id) of artifacts - RMIMs, HMDs, MTs and interactions (for their references to other MTs).
      [Suspect that this should be extended to include ALL artifacts - Story Boards, Application Roles, Trigger Events, the references from interactions to TEs and ARs, and the references from interaction/receiverResponsibility to other, potentially external TEs and interactions.]
    • Assembles (concatenates) the status packages into "intermediateDynamicModelReferences.xml" file that holds all of the PubDb definitions.
  • (as part of BuildApprovalStatusPackage) done but not used
  • (as part of BuildCMETReferenceSource)
    • Builds CmetInfo.xml from CmetInfoExport.txt
    • Assembles a Cmet "statusPackage" to provide reference access to the CMET listings
  • (as part of BuildNonModelReferenceSource)
    • Analyzes the individual content of all non-domain PubXml files to produce "statusPackage" files that hold the references these files use, such as graphic references, spec references and URL links.
  • (as part of BuildMessageReferenceSources)
    • Assembles a "statusPackage" for each staticModelPackage submitted listing:
      • the model that they are the design for;
      • all CMET references that they contain;
      • every vocabulary reference used that is unique within that model;
      • fatal serialization errors, principally associations that are not blocked in one direction (done only for VisioXml models); and
      • anticipated MIF validation errors, principally invalid class and association names or names that are too long (done only for VisioXml models).
      In this process, the "intermediateDynamicModelReferences" and "CMET" status packages are used when analyzing VisioXml models in order to correct the reference version numbers and to correct the CMET name references in the message statusPackages. This correction is not applied to the analysis of designs submitted as HMD files, but could be.
  • Assembles a minimal vocabulary file (for speed of access) that contains only the elements that can be referenced from models - Concept Domains (by name), Concepts (by CodeSystem Name and Code), and Value Sets (by name).
  • Concatenates all of the "statusPackage" (dynamic, non-domain, cmet, and static model) files into a single MessageReferences.xml file. This file is the foundation for detailed error analysis, alignment as source files are converted to MIF, and quality assurance reporting.
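
The final concatenation step can be pictured as merging the entries of several small XML documents under one root. The sketch below is illustrative only: the Generator performs this step with its own transforms, and the root element name, the statusPackage wrapper element and the name attribute are assumptions. Only msgElement and its elemType values appear elsewhere in this document.

```python
import xml.etree.ElementTree as ET


def concatenate_status_packages(package_xml_strings, root_tag="messageReferences"):
    """Merge the children of several statusPackage documents under one root.

    The root tag name and the shape of the input documents are assumptions
    made for illustration; the real layout is defined by the Generator.
    """
    merged = ET.Element(root_tag)
    for xml_text in package_xml_strings:
        package = ET.fromstring(xml_text)
        merged.extend(list(package))  # carry over every entry of this package
    return merged


# Two tiny hypothetical packages: one PubDb definition and one design.
dynamic = '<statusPackage><msgElement elemType="msgdef" name="A"/></statusPackage>'
static = '<statusPackage><msgElement elemType="msg" name="A"/></statusPackage>'
merged = concatenate_status_packages([dynamic, static])
print(ET.tostring(merged, encoding="unicode"))
```

Keeping everything in one document is what allows the later analysis steps to compare definitions and designs with simple lookups.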

Note: When the Generator converts all static models in CMETandWrapperHMDs and VisioModelXmlFiles to MIF format, and/or converts pubXml to MIF (for the dynamic model), the processes invoked use MessageReferences as follows:

  • VisioToMif.xslt uses both packageApprovalStatus.mif and IntermediateDynamicModelReferences.xml to convert the "version" element of the identifiers of the static model itself and of all its CMET references to agree with the content of the "pubDb" definitions.
  • pubDbXmlToMif.xslt uses MessageReferences.xml to convert the "version" element of the identifier references to static models in the interactions (transmission wrapper, control act wrapper, payload, and query) to agree with their PubDb definitions.
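
Both transforms apply the same idea: the version recorded in the PubDb definition overrides whatever version a reference carries. A minimal sketch of that rule, assuming references are represented as (base id, version) pairs, is shown here. The function name, the pair representation and the sample identifiers are hypothetical; the real work is done in XSLT.

```python
def align_reference_versions(references, defined_versions):
    """Rewrite the version of each reference to match its definition.

    references:       list of (base_id, version) pairs found in a design.
    defined_versions: dict mapping base_id to the version in the PubDb.
    References to undefined base ids are returned unchanged so that a
    later analysis step can report them.
    """
    return [(base_id, defined_versions.get(base_id, version))
            for base_id, version in references]


# Illustrative values only.
pubdb_versions = {"COCT_MT000001": "02", "PRPA_MT000002": "01"}
design_refs = [("COCT_MT000001", "01"), ("PRPA_MT000002", "01"), ("POCG_MT000003", "01")]
aligned = align_reference_versions(design_refs, pubdb_versions)
print(aligned)
# [('COCT_MT000001', '02'), ('PRPA_MT000002', '01'), ('POCG_MT000003', '01')]
```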

Analysis of Ballot Cross-Reference File

The analysis of the Ballot reference data proceeds in two stages. The first of these is contained within the V3 Generator, and functions in support of the Generator independently of the publishing process. The second step provides a more comprehensive analysis of the data from the generator and is an element of the publishing process. The following sub-sections document each of these.

Step 1: Generator Analysis of MessageReferences File

Thus the MessageReferences.xml file represents all of the content that has been submitted for generation in a single package that can be analyzed for errors, lost dependencies, etc. The analysis steps applied to this include (not in the order executed):

  • Creates visioHref.ini and msgToHmdRefs.xml files that support Visio in producing html graphic overlays for use with the few older models still maintained as "hmd" files.
  • Creates a "global" CMET usage file (Reports/CMET-Usage.xml) that is used by publishing to list all CMETs used and/or defined in a given domain.
  • Creates MessageReferenceTestResults.xml. This file is a detailed analysis of the errors exposed from the Generator perspective. It includes findings on: duplicate static models, non-serializable models, CMETs (missing references, unlisted, etc.), static models not in a PubDb and vice-versa, any unsatisfied message reference (such as from an interaction), MIF validity errors (mostly name anomalies), vocabulary bindings that are not satisfied in vocabulary MIF, and binding failures. This transform (AnalyzeConcatenatedMsgRefSource.xsl) uses the MessageReferences.xml as its primary source, but also uses the CommonSourceFile rim.xml and the previously generated "minimal vocabulary" files as reference sources.
  • Creates console summary of contents of MessageReferenceTestResults.xml.
  • Builds a Proposed Replacement for cmetInfoExport
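
Several of the findings listed for MessageReferenceTestResults.xml reduce to set comparisons between designs and definitions. The sketch below illustrates three of them (duplicate static models, designs with no PubDb definition, and definitions with no design). It is not the logic of AnalyzeConcatenatedMsgRefSource.xsl: the id attribute and the root element name are assumptions, while msgElement and the elemType values 'msg' and 'msgdef' are taken from this document.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def analyze_references(xml_text):
    """Report duplicate designs and design/definition mismatches."""
    root = ET.fromstring(xml_text)
    # elemType 'msg' marks a static model design, 'msgdef' a PubDb definition.
    designs = [e.get("id") for e in root.iter("msgElement")
               if e.get("elemType") == "msg"]
    definitions = {e.get("id") for e in root.iter("msgElement")
                   if e.get("elemType") == "msgdef"}
    counts = Counter(designs)
    return {
        "duplicate_designs": sorted(i for i, n in counts.items() if n > 1),
        "designs_without_definition": sorted(set(designs) - definitions),
        "definitions_without_design": sorted(definitions - set(designs)),
    }


# Hypothetical input: A is designed twice, B has no design, C no definition.
sample = """
<messageReferences>
  <msgElement elemType="msgdef" id="A"/>
  <msgElement elemType="msgdef" id="B"/>
  <msgElement elemType="msg" id="A"/>
  <msgElement elemType="msg" id="A"/>
  <msgElement elemType="msg" id="C"/>
</messageReferences>
"""
findings = analyze_references(sample)
print(findings)
```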

Builds a Proposed Replacement for cmetInfoExport

The same transform that generates the CMET-Usage.xml file (see above) also computes a proposed replacement file for cmetInfoExport.txt. This proposed file is not incorporated into the Generator unless the user specifically uses it to replace the cmetInfoExport.txt in CommonSourceFiles. (The option to allow a property in configuration.txt to make the swap is worth considering, but if this happened at this point in the transformations, other, previously generated content would need to be regenerated.) The key goal of this replacement file is to remove content from cmetInfoExport.txt that is not used at all, and to add content that should be there owing to a CMET reference in a static model that does not come close to matching an entry.

The primary issue is to determine the correct name/attributionLevel combination for this listing. The process proceeds as:

  • Sources: MessageReferences.xml (as primary); CMETrefSource.xml - a proper xml rendition of cmetInfo (as parameter); and CmetNameOverRides.xml - a CommonSourceFile to enforce name changes on previously published CMETs, mostly from non-COCT domains (as parameter).
  • Assembles a node set of mergedSources - message designs that should be listed based either on their being referenced in other designs or being fully listed in COCT domain. The set contains:
    • All message designs (msgElement[@elemType='msg']) where:
      • the message design has been referenced as a CMET in another design or
      • the message design is defined in CMET pubDB but not referenced as a CMET in another design
    • (and) All CMETs defined in CMET PubDb (msgElement[@elemType='msgdef']) and referenced as a CMET in another design, BUT for which there is no message design (msgElement[@elemType='msg'])
  • Assembles a node set theResult of <cmet/> items, one for each entry in mergedSources with most attributes as in the mergedSources entry, but with the name and attribution assembled as follows:
    • Sources:
      • definitionName - @name from the matching msgElement[@elemType='msgdef'] (the PubDb entry)
      • designName - @name from the static model design (from visioXml)
      • starter name - from the matching entry in the previous cmetInfo (CMETrefSource.xml) file
    • Chooses selectedName from following choose/when structure
      • use @name|@attributionLevel from starter name, when starter name exists.
      • use designName when designName contains a space (presumed to be attribution separator)
      • use designName when designName contains underscore (_) as second character
      • use definitionName when definitionName contains a space (presumed to be attribution separator)
      • use definitionName when definitionName contains underscore (_) as second character
      • use designName when it exists
      • use definitionName when it exists
      • use nothing otherwise
        Note: I question the preference (above) for design name (from the VisioXML file) over the definition name (from the PubDb). I would have thought the latter was more reliable, but I made the choice, so ... GWBeeler
    • constructs preferredName from selectedName as: selectedName, unless ((baseId is not from COCT) and (selectedName does not contain either a "|" or a "_")) in which case it prepends the first character of rimRoot (root class of the design) and an underscore (_) (like "R_") to selectedName.
    • finally, it constructs splitName (name|attributionLevel) from preferredName from following choose/when structure
      • when preferred name contains a pipe (|) use preferredName.
      • when an entry exists in CmetNameOverRides.xml for the given base identifier use name from CmetNameOverRides.xml.
      • when preferred name contains a space, use preferredName with first space replaced by "|", and everything after the split - (a) changed to lower case, (b) hyphen converted to "/" and (c) parentheses removed.
      • otherwise use preferredName with a pipe (|) appended.
    • There is also logic to assign a correct Role type for R_ CMETs (defaults to "performed only") and to determine the root classCode from the design or the previous cmetInfo, otherwise will default to the root code for the class (like "ACT" for Act).
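
The three naming stages above (selectedName, preferredName and splitName) can be expressed compactly in Python. This is a sketch of the rules as written in this section, not a port of the transform. The function names are hypothetical, the override file is modeled as a plain dictionary keyed by base identifier, and "baseId is from COCT" is assumed to mean that the identifier starts with "COCT". The role-type and classCode logic of the last bullet is not covered.

```python
def select_name(starter, design_name, definition_name):
    """Pick selectedName following the choose/when order described above.

    starter is a (name, attribution_level) pair from the previous cmetInfo,
    or None. design_name and definition_name may be empty strings.
    """
    def looks_attributed(name):
        # A space is presumed to separate name and attribution; an
        # underscore as second character marks a prefix such as "R_".
        return " " in name or name[1:2] == "_"

    if starter:
        return "|".join(starter)
    if looks_attributed(design_name):
        return design_name
    if looks_attributed(definition_name):
        return definition_name
    return design_name or definition_name  # empty string means "nothing"


def preferred_name(selected, base_id, rim_root):
    """Prefix names from non-COCT domains that carry no '|' or '_'."""
    if not base_id.startswith("COCT") and "|" not in selected and "_" not in selected:
        return rim_root[:1] + "_" + selected
    return selected


def split_name(preferred, base_id, overrides):
    """Build the name|attributionLevel string from preferredName."""
    if "|" in preferred:
        return preferred
    if base_id in overrides:
        return overrides[base_id]
    if " " in preferred:
        name, attribution = preferred.split(" ", 1)
        attribution = attribution.lower().replace("-", "/")
        attribution = attribution.replace("(", "").replace(")", "")
        return name + "|" + attribution
    return preferred + "|"


# Illustrative values only; the base id and names are made up.
selected = select_name(None, "Patient universal", "")
preferred = preferred_name(selected, "PRPA_MT000001", "Role")
print(split_name(preferred, "PRPA_MT000001", {}))  # R_Patient|universal
```

Because the first two tests in select_name both return the design name, the six name-based bullets collapse to four branches here without changing the order of preference.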

Step 2: Publishing Detailed Quality Analysis and Manifest Generation

This section documents the remaining Analysis steps undertaken as part of 03...V3 PubProcess - Align, Check, Revise and Constrain Source Content.

Build Manifests (in Publishing)

The first step is to build a pair of manifest files that can subsequently be used by Publishing to "package" domain content for submission, but also can be used (with additional processing) to constrain the content submitted to the Generator or, arguably, could be the basis for selective generation.

Arguably, this step should be part of the Generator tools rather than part of the Publishing tools. The rationale for this argument is that the only source data comes from the Generator-provided MessageReferences.xml, and the step stores its result files in the Generator OutputFiles/Reports and places temporary material in TemporaryFiles/specificManifests. On the other hand, a key feature of this functionality is the ability to test for the existence of source files in the "publishing" input directories. For now, it will remain on the publishing side.

BuildManifests uses OutputFiles/Reports/MessageReferences.xml and a set of controlling parameters, including the names of its output files and their target directories, and two optional comma-delimited lists of domain codes (like "pa,ct") and non-domain codes (like "rim,vocabulary,its_iso_r2").

In sequence, it analyzes the MessageReference entries (from domain or non-domain pubXML) for every domain or non-domain in the two lists. (If the lists are empty, all of the entries in MessageReferences of the given type are analyzed.) For each such specification, it creates a <manifest/> entry with:

  • attributes for the name (like "uvab"); the directory name (also "uvab") and the directory parent (like "domains")
  • one <file/> element for each of the following:
    • "manifest" file (to be stored)
    • "specification" file (like "uvab.xml")
    • possible "databases" files (like "databases/*_PB*.mdb") (Note use of wild cards)
    • expected static "models" files (like both "sourcegraphics/POCG_*000011*.vsd" and "sourcegraphics/POCG_*000011*")
    • expected "other" file (like "otherdistribution/Final Clinical-Genomics-DAM.zip")
    • each expected graphic file (like "outputgraphics/ExpressionProperty.code.gif")
    • attributes of the <file/> element include:
      • type like "manifest", "models", "graphics", etc.
      • file which is the path string to the file relative to the directory name in the manifest element, with wild cards
      • fileExists a Boolean indicating whether the manifest builder was able to identify a candidate file that matches this entry. This may be "false" even when the file is present, because the transform cannot try all possible combinations of the wild cards.
      • note an optional attribute that conveys information from the transform like "Existence of database file untested."
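
The attributes of a <file/> element can be illustrated with a small Python sketch that tests a wild-card pattern against a directory. It is an analogy rather than the manifest builder itself: the function name and the sample file name are hypothetical, and Python's glob expands wild cards fully, so it does not share the limitation noted for fileExists above.

```python
import glob
import os
import tempfile


def build_file_entry(domain_dir, file_type, pattern):
    """Return a dict mirroring the attributes of a <file/> manifest element."""
    matches = glob.glob(os.path.join(domain_dir, pattern))
    return {"type": file_type, "file": pattern, "fileExists": bool(matches)}


# Build a throw-away domain directory holding one hypothetical model file.
with tempfile.TemporaryDirectory() as domain_dir:
    os.makedirs(os.path.join(domain_dir, "sourcegraphics"))
    model_path = os.path.join(domain_dir, "sourcegraphics", "POCG_RM000011UV.vsd")
    open(model_path, "w").close()

    found = build_file_entry(domain_dir, "models", "sourcegraphics/POCG_*000011*.vsd")
    missing = build_file_entry(domain_dir, "databases", "databases/*_PB*.mdb")

print(found["fileExists"], missing["fileExists"])  # True False
```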

Create Ballot Quality Analysis Report (in Publishing)

The final analytic step is to create the Ballot Quality Analysis Report (saved as output/welcome/issues/ballotqareview.htm) using the ballotqareview target. This analysis uses the following sources: MessageReferences.xml, MessageReferenceTestResults.xml, manifestData.xml, cmetinfo.coremif, and MIFMinimalVocabulary.coremif - all copied to the Publishing directories from the Generator. Its documentation and sequencing are provided by the source file ballotqareview-publication.mif.

This report is best interpreted from the documentation that is part of the report. In brief, it documents errors (or suspected errors) involving:

  • Vocabulary Reference Errors;
  • Dynamic Model (interaction) Binding Errors;
  • CMET Reference and Listing Errors;
  • Static Model Design and Definition Errors, including:
    • Static models that cannot be serialized
    • Designs with no PubDb definition (or vice-versa),
    • Models with Invalid names for MIF representation and
    • Duplicate models (with two or more design files);
  • Apparently missing "graphic" or "other" content files that are linked from the specification; and
  • Hyper-links from the specification that cannot be automatically verified.

Correcting Errors found in Ballot Quality Analysis Report

The Ballot Quality Analysis Report documents "what" was found that is an error, but does not document the steps needed to "correct" those errors. A "guide" to correcting these errors will be provided in the future.

Constraining Published Content Based on Content Review

Proposed Strategy

The preceding analysis of the publication or ballot content had an initial purpose of allowing the editors and designers to correct the errors found. However, it can also be used to constrain the content being published in one (or both) of two ways:

  1. To remove unused content from the package. If a specification has extra graphics or static model designs that are not defined or referenced in the specification itself, then these should be "dropped" from the Generation and Publication, as they are simply useless "noise" in the publication.
  2. In the circumstance where one is publishing a single domain, or a small set of domain content, the analysis can be used to remove all of the "common" content (CMETs, wrappers, common messages) that is not used by the domain content to be published. (Conceivably, this pruning could be extended to include all concept domains, value sets and code systems that are not referenced directly in any of the RIM, data types, or the specific designs needed to satisfy the domain content.)

The planned strategy to undertake such pruning is to take advantage of the references file and the manifests file. The manifests were designed to be used to define a "file set" in ANT that will package (zip) a particular domain. With that in mind, these manifests can be used as follows:

  • Use the manifest created for each domain to define a "file set" of that domain's content that will be moved to the "generator" input directories, and also that will be moved as support material to the "output" directories. This will take care of the desire to "remove unused content from the package".
  • Use the reference analysis to determine, for all of the non-common domain content, which common domain message types are being used. (This must include an analysis of CMETs used by CMETs that are referenced in the non-common domains, as well as the CMETs needed to support the common messages (wrappers) that are used by the non-common domains.) This analysis will allow one to further "prune" the manifest files for the "common" domains. The "pruned" manifests for the "common" domains can then be used to:
    • define the "file set" for that portion of the "common" domains' content that will be moved to the "generator" input directories, and that will be moved as support material to the "output" directories; and
    • prune the unreferenced material from the "pubXml" from the "common" domains when it is preprocessed for publication. (The content for each "domain" is converted to a temporary qaEnhancedDomain.xml file before it is published. This conversion is done with transform v3mAddQaToDomain.xsl which could be augmented to perform the pruning.) This pruning avoids publishing content for the "common" domains that references message designs, etc. that have been dropped because they are "unused" in the non-common domains.
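
The reference analysis described in the second bullet amounts to a reachability computation: start from the models of the non-common domains and follow references until nothing new is found, so that CMETs used by CMETs and CMETs used by wrappers are all retained. A minimal sketch, assuming the references have been loaded into a dictionary, follows. The function name, the data shape and the sample identifiers are hypothetical stand-ins for data held in MessageReferences.xml.

```python
from collections import deque


def common_content_in_use(domain_models, references):
    """Return every model reachable from the given domain models.

    references maps a model identifier to the identifiers it references
    (CMETs, wrappers, common messages).
    """
    in_use = set(domain_models)
    queue = deque(domain_models)
    while queue:
        model = queue.popleft()
        for target in references.get(model, ()):
            if target not in in_use:
                in_use.add(target)
                queue.append(target)
    return in_use


references = {
    "domain_model": ["cmet_a", "wrapper_x"],
    "cmet_a": ["cmet_b"],         # a CMET used by a CMET
    "wrapper_x": ["cmet_c"],      # a CMET needed to support a wrapper
    "cmet_unused": ["cmet_b"],    # referenced by nothing that is published
}
used = common_content_in_use(["domain_model"], references)
prunable = set(references) - used
print(sorted(prunable))  # ['cmet_unused']
```

Everything outside the returned set is a candidate for removal from the "common" manifests.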

Note: A method for pruning the domain content was implemented in the early release of the "combined" Publishing tool. The earlier method involves "zipping" all unused domain content to an "unused" archive, thereby hiding it from use in Generation or Publication. Yet another alternative for the "Generator" is to include the "pruning" in the Generator build logic. The method proposed in this sub-section has the following advantages and disadvantages when compared to the previously considered methods:

  • Advantage: it is easier to manage and "safer" than the archiving process because it requires no change to the content of the "input" (publishing source) directories;
  • Disadvantage: it requires two "preliminary" (PreviewReferenceIntegrity) Generator runs - one to establish the manifests, and a second to re-generate using the manifests.
  • Advantage: it isolates the "source" of the manifests to a single pass, and these can be re-used for subsequent passes so long as the source domain does not need "additional" content.
  • Advantage: it allows one to publish a special-purpose package for a single domain, something that is needed for DSTUs and for submission to ISO.