
Design Principles for Alignment, Review and Constraint of V3 Publishing Content

Revision as of 22:33, 31 October 2010

Background

The HL7 V3 design process is predicated on four inherently conflicting objectives:

  1. To allow individual Work Groups to focus on the material or subject matter with which they are experts;
  2. To provide coherent standards derived from singular high-level models for RIM, Vocabulary and Data Types;
  3. To re-use common specifications in order to reduce development effort and increase efficiency; and
  4. To provide parallel development in order to provide standards that are responsive to the evolving needs of the implementers.

Over the years, processes to support these objectives have been developed, including:

  • Harmonization to provide singular models for RIM, Vocabulary and Data types;
  • Definition of shared common structures for CDA, CMETs and selected message types; and
  • Tools to support the parallel development within a Work Group of both static model designs and the documentation of the domain framework in which those designs are used.

Where this all comes together is in "publishing." Four times a year (three ballots and a Normative Edition), the HL7 Director of Technical Publications receives material from myriad Work Groups that must be gathered, analyzed, corrected, and packaged to produce either a reliable ballot or a formal Normative Edition. This must be done, in collaboration with the facilitators who make up the V3 Publishing Work Group, in a matter of a few weeks, starting with the initial content deadlines and ending when the ballot opens.

This document attempts to lay out the rules and principles under which this activity proceeds, with a goal to "automating" as much of the process as possible.

Content Sources and Source Types

As noted, the content providers are the work groups, of which there may be many. The material they provide and the types of content provided differ, and the treatment of this content may differ depending upon whether its primary intended use is as "common" shared content or as an implementable package.

The table at the end of this section (which is followed by a legend for the table) displays for each different grouping of submissions, whether those submissions primarily are for common use, the form(s) that the submissions are provided in, and the dependencies that these submissions have upon other submissions. (This table ignores "graphics" used to augment textual documentation in the submissions.)

For a typical ballot in May-2010, V3 Publishing processed:

  • 35 PubDbs with
    • 6 for common content and
    • 29 for general domains
  • 397 StaticModels (all but 17 of them as VisioXml) with
    • 233 for common content and
    • 164 for general domains
  • 50 non-domain specifications (of which roughly 30 were in PubXml)
  • 6 MIF or PubXml files for common content (RIM, Vocab, etc.)

Needless to say, the analysis and management of this content must be (and is) supported by automated analysis, and it needs more linkages between the analysis and the automated processing steps. After processing, this source came together to produce a web site with about 17,000 files. This "explosion" of source is somewhat controlled. For example, a single static model results in 16 separate files: 4 MIF files (MT and HMD in each of MIF 1.1 and MIF 2.x), 3 graphics-related files (overlay, thumbnail and large), 2 table views (MT and HMD), 1 Excel view, 4 schema files (HMD and MT in both XSD and HTML), and (on average) 2 interaction schema files (XSD and HTML).
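The per-model file count quoted above can be checked with a small tally. This is only an illustrative sketch: the category labels are informal shorthand invented here, and the counts are the ones given in the text.

```python
# Files produced from one static model, using the counts quoted in the text.
# The dictionary keys are informal labels, not names used by the tooling.
files_per_static_model = {
    "MIF files (MT and HMD, in MIF 1.1 and MIF 2.x)": 4,
    "graphics (overlay, thumbnail, large)": 3,
    "table views (MT and HMD)": 2,
    "Excel view": 1,
    "schema files (HMD and MT, in XSD and HTML)": 4,
    "interaction schema files (XSD and HTML, on average)": 2,
}

total_files = sum(files_per_static_model.values())
print(total_files)
```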

Submission Common? Source Forms Dependencies
Primary Design Supplemental RIM/Vocab/DT Wrappers CMETs CommonMsgs
RIM and DTs Y RimXml or PubXml Y
Vocabulary Y MIF Y
CMETs Y PubDb StaticModel PubDb Y Y
Wrappers Y PubDb StaticModel PubDb Y Y Y
CommonMsgs Y PubDb StaticModel PubDb Y Y Y Y
Domains PubDb StaticModel PubDb or PDF Y Y Y Y
Non-Domain Specs MIF or PubXml or PDF PDF ?

A legend for the above table follows:

  • CMETs - Common Model Element Types
  • Common - Material whose primary intent is to provide re-usable specifications for use in defining RIM-derived static models or for use in assembling "Domain" content.
  • CommonMsgs - Shared messages and interactions, primarily for acknowledgment transmittals
  • Domains - Every content specification that is not intended primarily as "Common", that includes RIM-derived, implementable static models, and that relies on a defined behavioral framework.
  • DTs - Data Types Specifications
  • MIF - File in Model Interchange Format (replacing PubXml and VisioXml formats)
  • Non-DomainSpecs - Content specifications that are not intended primarily as "Common content" and that are not "Domains"
  • PDF - Portable Document Format files for document publication.
  • Primary - This column under Source Forms lists the primary content format for a specification.
  • PubDb - Publication data base, which is expressed as PubXml
  • PubXml - File in HL7's XML document type definition defined in 2000.
  • RIM - Reference Information Model
  • RimXml - File in HL7's XML document type definition for RIM defined in 2000.
  • StaticModel - Form for expressing static models. In 2010 this is principally VisioXML and Visio "vsd" files. In future will be MIF with both graphics and design content.
  • VisioXML - File in HL7's XML type used to represent the design content of a static model defined in HL7's RMIM Designer in Visio.
  • Wrappers - Transmission, Query, and Control Act Wrappers

Content Alignments

The critical step in content alignment is to assure that references from one component to another connect correctly. This is principally a task of assuring that the identifier assigned to an element as it is defined is the same as the identifier asserted in any references to that element. The critical kinds of identifier are listed in the table below, along with the source document that acts as the "source of truth" for that identifier, the likely cause(s) of mis-alignment, and how alignment is effected.

Identifier "Source of Truth" Causes of Mis-Alignment How aligned?
All Artifact IDs in a PubDb Are governed by the HL7_artfactBallotStatus table in the PubDb. Responsibility for these rests with the Publishing Facilitator. Failure to maintain the current status in the "HL7_artfactBallotStatus" table. Automatically. The IDs are assigned as the content is extracted from the PubDb into PubXml format.
Static Model - BaseID SHARED between the PubDb in which it is defined and the static model design document. These sources are developed independently in PubDb and Visio and either of two different authors may err. These MUST be aligned by the facilitator for proper publishing. Accomplished by manually editing one or both sources.
Static Model - Version The PubDb in which the static model is defined rules on this version portion of the static model designer. This is determined by the ballot status that is documented in the "master" PubDB. The version on the static models is automatically updated to match the data from the defining PubDb.
CMET Name CmetInfoExport.txt is the master file of CMET Name/Identifier coordination. The file is maintained by MnM and incorporated as generator input. CMETs are bound to their using models by name, drawing data from the CmetInfo.txt files. However, tooling allows manual entry which may not be represented in CmetInfo. The Static Model Name is forced to align with CmetInfo, and all CMET references in other static models "look up" the correct name from CmetInfo and use that. Neither the name in the defining PubDb nor in the source design rules, but both SHOULD agree with CmetInfo.
RIM Content The rim.xml file placed in CommonSourceFiles sub-directory of Generator/InputFiles determines both the RIM version to be used and which data types release (determined by the binding in rim.xml) to use in Generation. As presently established, the only way to make an error is to place the wrong rim.xml file in the Generator. This should be changed in future. There should be one or more Generator parameters that determine the RIM and DT versions and then select the appropriate rim.xml file to use for generation. Any errant reference must be fixed manually. All references are to codes or names that are guaranteed to be unique within their name space.
Vocabulary and Data Type Content The "CoreMif" file for the vocabulary and PubXml for Data Types. Erroneous manual entry in a PubDb or static model design. Any errant reference must be fixed manually. All references are to codes or names that are guaranteed to be unique within their name space.
Graphic File Names The reference should be to the name of the graphic file itself Erroneous manual entry in a PubDb or static model design. Any errant reference must be fixed manually. The facilitator must assure uniqueness of these file names within their domain.

Review And Correction of Alignment Issues

The detection and reporting of content alignment issues is performed by the Quality Assurance Preview process that is the first step in all Generator runs and that can be invoked using publishing target 03.10...Process_Initial_QA.

The Publishing process first assures that all PubXml, MIF, and Static Model VisioXml are positioned in the InputFiles directories of the Generator. Next it "kicks off" the Generator "Preview Reference Integrity".

Assemble Ballot Cross-reference File

The first set of steps performed under "Preview Reference Integrity" are processes that build a single "cross-reference" file of all content submitted for publication in the Ballot. The file, MessageReferences.xml, is created in the following sequence:

  • (as part of ScenarioSetup)
    • Working from InputFiles/CommonSourceFiles it analyzes rim.xml and vocabulary.mif to determine:
      • the rimVersion to be used from the generator (the release or version of rim.xml);
      • the rimDatatypeVersion to be used (determined by the data types used in rim.xml); and
      • the vocabRealm and vocabUVVersion identifier (determined from vocabulary.mif)
      (Although these can be over-ridden as other targets are invoked, doing so is likely to lead to an invalid rim.coremif in which the declared data types binding may disagree with the actual data types used in the RIM.)
    • Loads the config.txt properties and prepares all directories
  • (as part of BuildModelDefinitionReferenceSource)
    • Analyzes the individual content of all domain-created PubXml files to produce "statusPackage" files that hold the definition (type, name, and combined Id) of artifacts - RMIMs, HMDs, MTs and interactions (for their references to other MTs).
      [Suspect that this should be extended to include ALL artifacts - Story Boards, Application Roles, Trigger Events, the references from interactions to TEs and ARs, and the references from interaction/receiverResponsibility to other, potentially external TEs and interactions.]
    • Assembles (concatenates) the status packages into "intermediateDynamicModelReferences.xml" file that holds all of the PubDb definitions.
  • (as part of BuildApprovalStatusPackage) done but not used
  • (as part of BuildCMETReferenceSource)
    • Builds CmetInfo.xml from CmetInfoExport.txt
    • Assembles a Cmet "statusPackage" to provide reference access to the CMET listings
  • (as part of BuildNonModelReferenceSource)
    • Analyzes the individual content of all non-domain PubXml files to produce "statusPackage" files that hold the references these files use: graphic references, spec-references and URL links.
  • (as part of BuildMessageReferenceSources)
    • Assembles a "statusPackage" for each staticModelPackage submitted listing:
      • the model that they are the design for;
      • all CMET references that they contain;
      • every vocabulary reference used that is unique within that model
      • fatal serialization errors, principally associations that are not blocked in one direction (done only for VisioXml models); and
      • anticipated MIF validation errors, principally invalid class and association names or names that are too long (done only for VisioXml models).
      In this process, the "intermediateDynamicModelReferences" and "CMET status package" are used when analyzing VisioXml models in order to correct the reference version numbers and to correct the CMET name references in the message statusPackages. This correction is not applied to the analysis of designs submitted as HMD files, but could be.
  • Assembles a minimal vocabulary file (for speed of access) that contains only the elements that can be referenced from models - Concept Domains (by name), Concepts (by CodeSystem Name and Code), and Value Sets (by name).
  • Concatenates all of the "statusPackage" (dynamic, non-domain, cmet, and static model) files into a single MessageReferences.xml file. This file is the foundation for detailed error analysis, alignment as source files are converted to MIF, and quality assurance reporting.
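The final concatenation step can be pictured with a minimal sketch. The msgElement and elemType names appear elsewhere in this document; the root element name, the "kind" and "id" attributes, and the overall statusPackage layout are assumptions made for illustration, and the Generator itself performs this step with its own transforms rather than with Python.

```python
import xml.etree.ElementTree as ET


def concatenate_status_packages(package_xml_strings, root_tag="messageReferences"):
    """Place several statusPackage documents under one new root element.

    root_tag and the statusPackage layout are assumptions for illustration.
    """
    merged_root = ET.Element(root_tag)
    for xml_text in package_xml_strings:
        package = ET.fromstring(xml_text)
        # Keep each package as a child so that its origin stays visible.
        merged_root.append(package)
    return merged_root


# Hypothetical input: one dynamic (PubDb) package and one static model package.
dynamic = '<statusPackage kind="dynamic"><msgElement elemType="msgdef" id="A"/></statusPackage>'
static = '<statusPackage kind="static"><msgElement elemType="msg" id="A"/></statusPackage>'

merged = concatenate_status_packages([dynamic, static])
print(ET.tostring(merged, encoding="unicode"))
```

Keeping each package intact under the new root, rather than flattening the entries, lets later analysis tell a PubDb definition apart from a static model design for the same identifier.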

Analysis of Ballot Cross-Reference File

The analysis of the Ballot reference data proceeds in two stages. The first of these is contained within the V3 Generator, and functions in support of the Generator independently of the publishing process. The second step provides a more comprehensive analysis of the data from the generator and is an element of the publishing process. The following sub-sections document each of these.

Step 1: Generator Analysis of MessageReferences File

Thus the MessageReferences.xml file represents all of the content that has been submitted for generation in a single package that can be analyzed for errors, lost dependencies, etc. The analysis steps applied to this include (not in the order executed):

  • Creates visioHref.ini and msgToHmdRefs.xml files that support Visio in producing html graphic overlays for use with the few older models still maintained as "hmd" files.
  • Creates a "global" CMET usage file (Reports/CMET-Usage.xml) that is used by publishing to list all CMETs used and/or defined in a given domain.
  • Creates MessageReferenceTestResults.xml. This file is a detailed analysis of the errors exposed from the Generator perspective. It includes findings on: duplicate static models, non-serializable models, CMETs (missing references, unlisted, etc.), static models not in a PubDb and vice-versa, any unsatisfied message reference (such as from an interaction), MIF validity errors (mostly name anomalies), vocabulary bindings that are not satisfied in vocabulary MIF, and binding failures. This transform (AnalyzeConcatenatedMsgRefSource.xsl) uses the MessageReferences.xml as its primary source, but also uses the CommonSourceFile rim.xml and the previously generated "minimal vocabulary" files as reference sources.
  • Creates console summary of contents of MessageReferenceTestResults.xml.
  • Builds a Proposed Replacement for cmetInfoExport
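A few of the findings listed above (duplicate static models, static models not in a PubDb and vice-versa, and unsatisfied references) reduce to set comparisons. The sketch below assumes the identifiers have already been pulled out of MessageReferences.xml into plain lists; the function name and the result keys are hypothetical, and the real analysis is done by AnalyzeConcatenatedMsgRefSource.xsl, not by this code.

```python
from collections import Counter


def analyze_references(defined_ids, designed_ids, referenced_ids):
    """Return simple findings from three lists of artifact identifiers.

    defined_ids:    static models defined in a PubDb
    designed_ids:   static models submitted as designs
    referenced_ids: identifiers referenced from interactions or other models
    """
    design_counts = Counter(designed_ids)
    defined, designed = set(defined_ids), set(designed_ids)
    return {
        # The same design submitted more than once.
        "duplicate_designs": sorted(i for i, n in design_counts.items() if n > 1),
        # Static models not in a PubDb, and vice-versa.
        "design_without_definition": sorted(designed - defined),
        "definition_without_design": sorted(defined - designed),
        # References that no submitted design satisfies.
        "unsatisfied_references": sorted(set(referenced_ids) - designed),
    }


findings = analyze_references(
    defined_ids=["A", "B"],
    designed_ids=["A", "A", "C"],
    referenced_ids=["A", "D"],
)
print(findings)
```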

Builds a Proposed Replacement for cmetInfoExport

The same transform that generates the CMET-Usage.xml file (see above) also computes a proposed replacement file for cmetInfoExport.txt. This proposed file is not incorporated into the Generator unless the user specifically uses it to replace the cmetInfoExport.txt in CommonSourceFiles. (The option to allow a property in configuration.txt to make the swap is worth considering, but if this happened at this point in the transformations, other, previously generated content would need to be regenerated.) The key goal of this replacement file is to remove content from cmetInfoExport.txt that is not used at all, and to add content that should be there owing to a CMET reference in a static model that does not come close to matching an entry.

The primary issue is to determine the correct name/attributionLevel combination for this listing. The process proceeds as:

  • Sources: MessageReferences.xml (as primary); CMETrefSource.xml - a proper xml rendition of cmetInfo (as parameter); and CmetNameOverRides.xml - a CommonSourceFile to enforce name changes on previously published CMETs, mostly from non-COCT domains (as parameter).
  • Assembles a node set of mergedSources - message designs that should be listed based either on their being referenced in other designs or being fully listed in COCT domain. The set contains:
    • All message designs (msgElement[@elemType='msg']) where:
      • the message design has been referenced as a CMET in another design or
      • the message design is defined in CMET pubDB but not referenced as a CMET in another design
    • (and) All CMETs defined in CMET PubDb (msgElement[@elemType='msgdef']) and referenced as a CMET in another design, BUT for which there is no message design (msgElement[@elemType='msg'])
  • Assembles a node set theResult of <cmet/> items, one for each entry in mergedSources with most attributes as in the mergedSources entry, but with the name and attribution assembled as follows:
    • Sources:
      • definitionName - @name from the matching msgElement[@elemType='msgdef'] (the PubDb entry)
      • designName - @name from the static model design (from visioXml)
      • starter name - from the matching entry in the previous cmetInfo (CMETrefSource.xml) file
    • Chooses selectedName from following choose/when structure
      • use @name|@attributionLevel from starter name, when starter name exists.
      • use designName when designName contains a space (presumed to be attribution separator)
      • use designName when designName contains underscore (_) as second character
      • use definitionName when definitionName contains a space (presumed to be attribution separator)
      • use definitionName when definitionName contains underscore (_) as second character
      • use designName when it exists
      • use definitionName when it exists
      • use nothing otherwise
        Note: I question the preference (above) for design name (from the VisioXML file) over the definition name (from the PubDb). I would have thought the latter was more reliable, but I made the choice, so ... GWBeeler
    • constructs preferredName from selectedName as: selectedName, unless ((baseId is not from COCT) and (selectedName does not contain either a "|" or a "_")) in which case it prepends the first character of rimRoot (root class of the design) and an underscore (_) (like "R_") to selectedName.
    • finally, it constructs splitName (name|attributionLevel) from preferredName from following choose/when structure
        • when preferred name contains a pipe (|) use preferredName .
        • when an entry exists in CmetNameOverRides.xml for the given base identifier use name from CmetNameOverRides.xml .
        • when preferred name contains a space, use preferredName with first space replaced by "|", and everything after the split - (a) changed to lower case, (b) hyphen converted to "/" and (c) parentheses removed.
        • otherwise use preferredName with a pipe (|) appended.
    • There is also logic to assign a correct Role type for R_ CMETs (defaults to "performed only") and to determine the root classCode from the design or the previous cmetInfo, otherwise will default to the root code for the class (like "ACT" for Act).
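The name-selection rules above can be restated as three small functions. This is a sketch of the logic as described here, not the transform itself: the function names are hypothetical, the test for "from COCT" is assumed to be a prefix check on the base identifier, and an override entry is assumed to hold the complete replacement name.

```python
def has_attribution_hint(name):
    """True when a name has a space, or an underscore as its second character."""
    return bool(name) and (" " in name or name[1:2] == "_")


def select_name(starter, design_name, definition_name):
    """Apply the choose/when order; starter is a (name, attributionLevel) pair or None."""
    if starter:
        return f"{starter[0]}|{starter[1]}"
    # Both design-name tests precede both definition-name tests in the text,
    # so checking each candidate for either hint preserves the stated order.
    for candidate in (design_name, definition_name):
        if has_attribution_hint(candidate):
            return candidate
    return design_name or definition_name or ""


def preferred_name(selected, base_id, rim_root):
    """Prefix non-COCT names that carry neither '|' nor '_' (like 'R_')."""
    from_coct = base_id.startswith("COCT")  # assumption: prefix test
    if not from_coct and "|" not in selected and "_" not in selected:
        return f"{rim_root[:1]}_{selected}"
    return selected


def split_name(preferred, base_id, overrides):
    """Build name|attributionLevel; overrides maps base id to a replacement name."""
    if "|" in preferred:
        return preferred
    if base_id in overrides:
        return overrides[base_id]
    if " " in preferred:
        name, attribution = preferred.split(" ", 1)
        attribution = attribution.lower().replace("-", "/")
        attribution = attribution.replace("(", "").replace(")", "")
        return f"{name}|{attribution}"
    return preferred + "|"


# Hypothetical design name and base identifier, used only to exercise the rules.
selected = select_name(None, "Patient (identified-confirmable)", "")
preferred = preferred_name(selected, "PRPA_MT000001", "Role")
print(split_name(preferred, "PRPA_MT000001", {}))
```

The Role type and root classCode defaults mentioned in the last bullet are left out of this sketch.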

Step 2: Publishing Detailed Quality Analysis and Manifest Generation

This section documents the remaining Analysis steps undertaken as part of 03...V3 PubProcess - Align, Check, Revise and Constrain Source Content.

Build Manifests (in Publishing)

The first step is to build a pair of manifest files that can subsequently be used by Publishing to "package" domain content for submission, but that can also be used (with additional processing) to constrain the content submitted to the Generator or, arguably, could be the basis for selective generation.

Arguably, this step should be part of the Generator tools rather than part of the Publishing tools. The rationale for this argument is that the only source data comes from the Generator-provided MessageReferences.xml, and that it stores its results files in the Generator OutputFiles/Reports and places temporary material in TemporaryFiles/specificManifests. On the other hand, a key feature of this functionality is the ability to test for the existence of source files in the "publishing" input directories. For now, it will remain on the publishing side.

The BuildManifests uses OutputFiles/Reports/MessageReferences.xml and a set of controlling parameters, including the names of its output files and their target directories, and two optional comma-delimited lists of domain codes (like "pa,ct") and non-domain codes (like "rim,vocabulary,its_iso_r2").

In sequence, it analyzes the MessageReference entries (from domain or non-domain pubXML) for every domain or non-domain in the two lists. (If the lists are empty, all of the entries in MessageReferences of the given type are analyzed.) For each such specification, it creates a <manifest/> entry with:

  • attributes for the name (like "uvab"); the directory name (also "uvab") and the directory parent (like "domains")
  • one <file/> element for each of the following:
    • "manifest" file (to be stored)
    • "specification" file (like "uvab.xml")
    • possible "databases" files (like "databases/*_PB*.mdb") (Note use of wild cards)
    • expected static "models" files (like both "sourcegraphics/POCG_*000011*.vsd" and "sourcegraphics/POCG_*000011*")
    • expected "other" file (like "otherdistribution/Final Clinical-Genomics-DAM.zip")
    • each expected graphic file (like "outputgraphics/ExpressionProperty.code.gif")
    • attributes of the <file/> element include:
      • type like "manifest", "models", "graphics", etc.
      • file which is the path string to the file relative to the directory name in the manifest element, with wild cards
      • fileExists a Boolean indicating whether the manifest builder was able to identify a candidate file that matches this entry. This may be "false" even when the file is present, because the transform cannot try all possible combinations of the wild cards.
      • note an optional attribute that conveys information from the transform like "Existence of database file untested."
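The shape of a manifest entry can be sketched as follows. The type, file and fileExists attributes follow the list above; the attribute names on the manifest element itself (name, directory, parent) and the function name are assumptions, and BuildManifests does this work in its own tooling, not with this code. Unlike the transform described above, this sketch uses Python's glob to expand the wild cards, so it does not reproduce the case where fileExists is "false" for a file that is present.

```python
import glob
import os
import tempfile
import xml.etree.ElementTree as ET


def build_manifest(name, parent, base_dir, expected_files):
    """Create a <manifest/> element for one specification.

    expected_files is a list of (type, relative_pattern) pairs.
    The attribute names on <manifest/> are assumptions for illustration.
    """
    manifest = ET.Element("manifest", {"name": name, "directory": name, "parent": parent})
    spec_dir = os.path.join(base_dir, parent, name)
    for file_type, pattern in expected_files:
        # Escape the directory part so only the pattern's wild cards are expanded.
        matches = glob.glob(os.path.join(glob.escape(spec_dir), pattern))
        ET.SubElement(manifest, "file", {
            "type": file_type,
            "file": pattern,
            "fileExists": "true" if matches else "false",
        })
    return manifest


# Build a throwaway directory tree holding only the specification file.
with tempfile.TemporaryDirectory() as base_dir:
    spec_dir = os.path.join(base_dir, "domains", "uvab")
    os.makedirs(spec_dir)
    open(os.path.join(spec_dir, "uvab.xml"), "w").close()
    manifest = build_manifest("uvab", "domains", base_dir, [
        ("specification", "uvab.xml"),
        ("databases", "databases/*_PB*.mdb"),
    ])

print(ET.tostring(manifest, encoding="unicode"))
```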

It starts by initiating a further CMET refinement process in the Generator, a step that includes converting all static models in CMETandWrapperHMDs and VisioModelXmlFiles to MIF format (initial conversion). In these processes:

  • VisioToMif.xslt uses both packageApprovalStatus.mif and IntermediateDynamicModelReferences.xml to convert the "version" element of the identifiers of the static model itself and of all its CMET references to agree with the content of the "pubDb" definitions.
  • pubDbXmlToMif.xslt uses MessageReferences.xml to convert the "version" element of the identifier references to static models in the interactions (transmission wrapper, control act wrapper, pay load, and query) to agree with their PubDb definitions.
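The version correction performed by both transforms amounts to looking up the defining PubDb's version for an identifier and substituting it. The sketch below assumes an identifier layout of base id, two-letter realm and two-digit version (for example "PRPA_MT201301UV02"); that pattern, the function name and the lookup dictionary are illustrative assumptions, and the real work is done in VisioToMif.xslt and pubDbXmlToMif.xslt.

```python
import re

# Assumed identifier layout: base id, two-letter realm, two-digit version.
# This pattern is an illustration only.
ARTIFACT_ID = re.compile(
    r"^(?P<base>[A-Z]{4}_[A-Z]{2}\d{6})(?P<realm>[A-Z]{2})(?P<version>\d{2})$"
)


def align_version(artifact_id, pubdb_versions):
    """Return artifact_id with its version replaced by the PubDb version.

    pubdb_versions maps base id -> version string from the defining PubDb.
    Identifiers that do not parse, or have no PubDb entry, are returned unchanged.
    """
    match = ARTIFACT_ID.match(artifact_id)
    if not match:
        return artifact_id
    version = pubdb_versions.get(match["base"])
    if version is None:
        return artifact_id
    return f"{match['base']}{match['realm']}{version}"


# Hypothetical PubDb definition: the static model is at version 02.
pubdb_versions = {"PRPA_MT201301": "02"}
print(align_version("PRPA_MT201301UV01", pubdb_versions))
```

Returning unmatched identifiers unchanged keeps the sketch conservative: a reference with no PubDb definition is left for the quality analysis to report rather than being silently rewritten.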