VML Processing Widget
Summary
One day soon, this page will document the VML Processing Widget. This is an ANT-scripted process that:
- consumes proposals for change expressed in the HL7 Vocabulary Maintenance Language (VML),
- splits these proposals into two source streams targeted at the Access repository in which the HL7 Vocabulary content is stored:
- an SQL update stream for updating files of properties assigned to vocabulary objects, and
- an XML stream processed in Java to update the primary Access tables
- together, these streams update the vocabulary content stored in the Access repository,
- uses a Java process to "clean up" the Access content,
- invokes RoseTree to express the complete vocabulary content in HL7 Model Interchange Format (MIF), and
- combines this MIF content from the Access repository with an externally maintained MIF extension file to arrive at a final, complete expression of HL7's vocabulary in MIF.
Background
Management of HL7 Vocabulary Content
In brief, the HL7 Vocabulary Content:
- Is published and released as a formal expression of the content in Model Interchange Format (MIF) files. This format:
- Documents the data that define each of the artifacts (Concept Domains, Code Systems, Value Sets, etc.)
- Documents the relationships between the artifacts (such as bindings);
- Provides a vehicle for using XML X-path logic to query and/or analyze the content;
- Provides the source for transforms used to publish the content in HL7 Ballots and Normative Editions.
- Is archived and persisted in two complementary forms:
- an Access data base, designed over 13 years ago, that serves as the "repository" for this content; plus
- “Extension files,” expressed in MIF, that primarily document those Value Sets that are defined against external code systems.
- Vocabulary Content is updated through the Vocabulary_and_RIM_Harmonization_Process, the results of which drive updates to the Repository and/or extension files.
Prior to the development of this widget, the means for processing updates to the HL7 Vocabulary Content involved:
- Creating update files expressed in the HL7 Vocabulary Maintenance Language (VML).
- Posting the data in these files to the repository data base using the Java-encoded procedures compiled as “VocMaint_and_dependencies.jar”. Weaknesses of this process include:
- Some updates require manual changes (table/row updates) in Access when VML cannot be used. This requirement arose primarily when it was necessary to update concept properties and “object properties” on Concept Domains and Value Sets.
- The original VML schema encapsulates “descriptive markup” (html formatting of descriptions, such as the definition of a concept or the description of a value set's content and purpose) in a CDATA wrapper. This encapsulation precluded validation of the markup before it was posted to the data base, and too often a subsequent correction update had to be created.
- Once the content was posted to the database, the definitive “coremif” was assembled from the Access data by RoseTree, with the “Extensions” merged in by transform.
Capabilities of the New Widget
What does this Widget provide?
- A revision of the VML Schema and its placement in the “mif namespace”, that:
- Allows descriptive markup to be validated as the VML files are created
- Extends the VML to include the ability to “update” and “delete” concept properties
- Extends the VML to add the ability to “add”, “update”, and “remove/replace” object properties
- Extends the VML to enable the creation of Value Sets using "extensional definition" against External Code Systems from VML
These extensions are not processed through the extant Java process. Rather, the widget relies upon pre-defined queries and code (activated from the widget) that perform the necessary data base updates without interfering with the ability to use the original VML for its intended purpose. This is accomplished by:
- Using XSLT transforms to convert and split “revised VML files” into:
- “Standard” VML that can be posted to the data base using the current vocabulary maintenance Java programs, AND
- A standard data import table (in pipe-delimited format) that defines the data and control codes needed to perform the “extension” queries,
- Establishing queries and logic in the Access repository (using Visual Basic for Applications - VBA - in Access) to post these changes.
- Processes, scripted in the Java-based ANT scripting language and controlled from a Working List, that automate all non-editorial steps
System Requirements and Installation
System requirements
- Windows 7 or 8, 32- or 64-bit: Although the core elements of the script SHOULD work in a Unix or Mac environment, they have not been tested other than in Windows, and other key elements will NOT work outside of Windows. Specifically:
- RoseTree, a Windows application, is needed to complete the extraction from the repository and the expression in MIF
- Access is required in order to post the VML "extension" content into Access.
- Memory should exceed 2 GB (preferably 4 GB), as the RoseTree memory demands while converting to MIF are sizable.
- 32-bit Java JRE in environment: Even if your installation is a 64-bit machine, there must be a 32-bit JRE installed in order for the Java VML posting process to connect to the data base. The 32-bit JRE can be installed side-by-side with a 64-bit JRE.
- XML editor that will validate from schema: highly beneficial when creating VML source files for processing. The author uses XML Spy.
- Microsoft Access: Access is required to process the SQL updates that implement the "VML Extensions". This can be either a 32-bit or 64-bit Office installation.
- RoseTree III: This application performs the final extraction and MIF expression step. It can be downloaded from the executables in the RoseTree project on Gforge.
- Text Editor: (your choice) for editing the "working list" that governs the "widget processes." The ANT script will activate whichever text editor is the default for the "*.txt" file type on your system.
Widget Installation
The Widget is distributed as a release of Vocabulary Maintenance Widget on HL7 Gforge, and is available as a hyperlink under "Quick Downloads" on the Gforge home page. The widget is distributed in a ZIP archive named like hl7_vocabulary-vmlwidget-m.n.o.zip, where m.n.o is the current release identifier.
Base Installation
- Download the archive
- Place the archive in a home directory of your choosing, but do not place it inside “Program Files”
- The archive includes its own “root” which will appear in the target directory as it is extracted.
- Extract the files, maintaining the directory structure; the root directory will appear, named vmlwidget-m.n.o (where, as before, m.n.o is the release identifier).
- In the root directory find the file 00.00_1st_INSTALL_ALL_Change_to_dot_bat_and_RUN_this
- Rename it by adding the “.bat” extension at the end.
- Run the renamed ".bat" file (by double-clicking)
- The installation will list the license terms and then pause for you to accept them. Accept the license agreement by responding YES
Directories and Initial Content Post-installation
The initial installation results in a file and directory structure as seen in the "screen-shot" shown at right. A summary of these follows, and a more complete explanation of the directory structures is found on a separate VML Processing Widget - Directory Structure page.
Specific Sub-directories
The specific Sub-directories used by the tool include:
- output - as its name implies, it contains further sub-directories into which the resulting vocabulary repositories and MIF files will be placed. It is worth noting that although the directories are empty (newly created) at the beginning, they are not cleared out by the Widget. Rather it writes new files in, perhaps over-writing old.
- source - has sub-directories that represent
- user-generated source, such as new VML files
- intermediate results such as delimited SQL source tables, and "traditional" VML files, and
- MIF Extension files that are source content to be updated for value sets defined against external code systems
- support - whose sub-directories hold the ANT scripts, schema files for the VML content, XSLT transforms for converting and preparing files, etc.
- working - whose database sub-directory holds both the initial source of truth repository for the prior vocabulary releases, and the new repository being built by the widget.
- zip archives - these may be treated as directories, but here, they simply hold the source material for the various "batch" files that invoke the widget processes.
Batch (*.bat) Files
The function of this widget is invoked entirely by "running" one of the four "batch" files (ending in ".bat") whose names begin with an integer. (The exact function of these is covered in detail in a separate section of this manual.) Executing or running these files is done by: "double-clicking" on the file in Windows Explorer; "right-clicking" and selecting "Open" from the right-click drop down menu; or selecting the file in Windows Explorer by clicking on it, and then pressing "enter".
The two "batch" files (ending in ".bat") whose names begin with "Run" are required intermediate files that are "called" by one of the primary batch files noted above.
The remaining "batch" file (supp_00.00...ExtractSupportingBatFiles.bat) is a utility that will unzip and present additional "supplemental" batch files that are used in maintaining the widget, but are not needed for ordinary use.
Remaining Files
- defined-environment.properties This file is a "configuration" file for the tool whose function and settings are discussed under section Configuring and Providing Sources for the Widget
- installation_...log - is a log of the installation steps
- InstallationGuide.txt is a quick-start guide, just like section Base Installation above.
- LICENSE.txt Lists the terms you agreed to when you accepted the license.
Configuring and Providing Sources for the Widget
Whenever the VML Processing Widget is taken up for use, it is necessary to establish the parameters that define the content to be defined and the environment in which it will be used. This involves three considerations:
- Determining the Release Identifier and Release Date for the vocabulary release towards which the changes are targeted
- Collecting the Source of Truth Repository and the latest Vocabulary EXTENSION files from which to start the changes and placing these files in the appropriate directories, and
- Setting any necessary properties in the defined-environment.properties file.
Each of these topics is covered in a sub-section below.
Determining the Release Identifier and Release Date
Each set of vocabulary proposals to be implemented (by preparing and posting VML files) is processed within the context of a planned vocabulary "release". By custom, HL7 processes three vocabulary releases a year, each resulting from the Harmonization meetings scheduled between Working Group Meetings.
Using this tool suite, the Release Identifier is designated in a pattern of yyyyTn where yyyy is the calendar year, and n is the trimester number, with "1" starting at the beginning of the January Working Group Meeting and continuing to the start of the May Working Group Meeting when trimester "2" begins. If there is a need for extra releases between two releases, they will be designated as yyyyTnCm where yyyyTn is the identifier for the most recently completed trimester release, and m is the sequence number of the special (usually corrective) releases.
The Release Date is a secondary, but important parameter for any release. For value set versions and code system released versions that are dated, this is the "version date" that will be assigned. In the case of the planned trimester releases, this date will be the day before the ballot opening for the ballot held at the end of that trimester. For any extra releases, a "close date" will need to be determined and assigned prior to the initiation of processing for this release.
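The identifier pattern described above can be checked mechanically. The following is a minimal sketch in Python, not part of the widget: the names RELEASE_ID, ReleaseId, and parse_release_id are hypothetical, and restricting the trimester number to 1 through 3 is an assumption drawn from the statement that there are three releases a year.

```python
import re
from typing import NamedTuple, Optional

# Pattern from the text: yyyyTn for a trimester release, yyyyTnCm for an extra release.
# Assumption: n is limited to 1-3 because HL7 processes three releases a year.
RELEASE_ID = re.compile(r"(?P<year>\d{4})T(?P<trimester>[1-3])(?:C(?P<extra>\d+))?")


class ReleaseId(NamedTuple):
    year: int
    trimester: int
    extra: Optional[int]  # sequence number m of an extra release, or None


def parse_release_id(text: str) -> ReleaseId:
    """Split a Release Identifier into its year, trimester, and optional sequence number."""
    match = RELEASE_ID.fullmatch(text)
    if match is None:
        raise ValueError(f"not a release identifier: {text!r}")
    extra = match.group("extra")
    return ReleaseId(
        int(match.group("year")),
        int(match.group("trimester")),
        int(extra) if extra is not None else None,
    )
```

For example, parse_release_id("2014T2C1") yields year 2014, trimester 2, and extra-release sequence number 1.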
Collecting and Placing Initial Repository and MIF EXTENSION files
Since each vocabulary release builds upon its predecessor, the next critical step is to collect the source of truth for the vocabulary repository (in Access) and for the vocabulary EXTENSION file. Together these are the authoritative sources from which to begin.
Both of these files can be found by downloading the latest hl7_rimRepos....zip file from the rimRepos releases in the Design Repository project on Gforge. The full release file name will be something like hl7_rimRepos-2.44.7.zip (where the 2.44.7 reflects point releases for RIM release 2.44). The contents of the zip archive will contain a variety of files, but two among them are needed here:
- Vocabulary repository in Access named like rim_none_Voc1287_20140516_Repository20140610.mdb where the release numbers (like 1287_20140516) and dates (like 20140610) vary from release to release.
- Place a copy of this file in the Widget sub-directory working\database
- AND Rename the file SourceOfTruthRepository.mdb
- (renaming is optional, but if the file is not renamed the property file.source.of.truth.database will need to be set in the defined-environment.properties file)
- Vocabulary EXTENSION file in MIF is contained within a zip archive named like EXTENSION=UV=VO=1287-20140516.zip where the release number (like 1287-20140516) varies from release to release. Inside the archive is a single *.coremif file with the same name as the archive.
- Extract the coremif file and place a copy of it in the Widget sub-directory source\extensionCoremif
- AND Rename the file StartingEXTENSION.coremif
- (renaming is optional, but if the file is not renamed the property file.initial.extension.coremif will need to be set in the defined-environment.properties file)
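The copy-and-rename steps above can also be scripted. The following Python sketch is only an illustration under stated assumptions: the function name stage_sources and its parameters are hypothetical, the two input files are assumed to have already been taken out of the hl7_rimRepos archive, and the target names are the defaults quoted above.

```python
import shutil
import zipfile
from pathlib import Path


def stage_sources(repository_mdb: Path, extension_zip: Path, widget_root: Path) -> None:
    """Copy the two source-of-truth files into the widget under their default names."""
    database_dir = widget_root / "working" / "database"
    extension_dir = widget_root / "source" / "extensionCoremif"
    database_dir.mkdir(parents=True, exist_ok=True)
    extension_dir.mkdir(parents=True, exist_ok=True)

    # Copy the Access repository and give it the default name.
    shutil.copyfile(repository_mdb, database_dir / "SourceOfTruthRepository.mdb")

    # The EXTENSION archive is described as holding a single *.coremif file.
    with zipfile.ZipFile(extension_zip) as archive:
        coremif_names = [
            name for name in archive.namelist() if name.lower().endswith(".coremif")
        ]
        if len(coremif_names) != 1:
            raise ValueError(f"expected one .coremif file, found {len(coremif_names)}")
        target_path = extension_dir / "StartingEXTENSION.coremif"
        with archive.open(coremif_names[0]) as source, open(target_path, "wb") as target:
            shutil.copyfileobj(source, target)
```

Because the files are copied rather than moved, the downloaded originals stay untouched if a release has to be restarted.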
Setting properties in the defined-environment.properties file
The final preparatory step is to set several critical elements in the file defined-environment.properties in the root directory of the Widget. This file is, in effect, a configuration file for the Widget determining a number of critical properties. The following shows the opening lines of this file.
##############################
# Users Software Environment #
##############################
#env.hl7.tools.directory - the root directory UNDER the 32 bit "program Files" directory, and
#======================= in which RoseTreeIII and other HL7 programs are installed
#Default: HL7

########################
# Vocabulary Release ID #
########################
#Default: releaseId=2014T2_2014-08-07

########################
# DATA BASE FILE NAMES #
########################
#file.source.of.truth.database - File name for source-of-truth data base
#=============================
#Default: file.source.of.truth.database=SourceOfTruthRepository.mdb
As is customary in most property files, any row beginning with the "#" (hash mark) will be ignored, and thus is used for documentation. Blank lines are similarly ignored. True property lines start with the property name (like file.initial.extension.coremif) followed by an "=" (equal sign) and the value of the property.
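The reading rules just described are simple enough to sketch in a few lines of Python. This is an illustration only, not the widget's own loader: the function name parse_properties is hypothetical, ignoring lines that lack an "=" is an assumption, and Java's own property-file loader accepts more syntax than this (for example other separators and escape sequences).

```python
def parse_properties(text: str) -> dict:
    """Read name=value lines, ignoring blank lines and lines that start with '#'."""
    properties = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # documentation or blank line
        name, separator, value = line.partition("=")
        if not separator:
            continue  # assumption: a line without "=" is not a property
        properties[name.strip()] = value.strip()
    return properties
```

Applied to a file in which only the releaseId line is uncommented, the result is a dictionary with that single entry.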
In this particular file, if the two "source of truth" files were renamed as suggested in section Collecting and Placing Initial Repository and MIF EXTENSION files, the default values for file.source.of.truth.database and file.initial.extension.coremif are correct. Otherwise these properties will need to be set to the actual file names.
Thus, the only property that will almost certainly need to be changed here is the releaseId property. The value of this property is the Release Identifier (described above) concatenated with the Release Date (expressed as an XML date yyyy-mm-dd) with an underscore ("_") separating the two elements. The default value is 2014T2_2014-08-07 which is correct for the summer trimester of 2014.
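As a small worked example of this concatenation, the following Python sketch builds and splits a releaseId value. The function names are hypothetical and the sketch is not taken from the widget; it only restates the rule given in the paragraph above.

```python
from datetime import date
from typing import Tuple


def make_release_id_property(release_identifier: str, release_date: date) -> str:
    """Join the Release Identifier and the Release Date (yyyy-mm-dd) with an underscore."""
    return f"{release_identifier}_{release_date.isoformat()}"


def split_release_id_property(value: str) -> Tuple[str, date]:
    """Split a releaseId value back into its identifier and date."""
    identifier, _, date_text = value.partition("_")
    return identifier, date.fromisoformat(date_text)
```

With the identifier 2014T2 and the date 7 August 2014, make_release_id_property returns the default value quoted above, 2014T2_2014-08-07.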
Creating VML Files From Proposals
The task of creating VML files based on the Harmonization proposals is straightforward, albeit tedious and challenging when one discovers that the data in the proposal is incomplete.
Editing and Validating VML Files
The extended VML language is documented in detail elsewhere. The complete schema for the VML is distributed with the widget in the file: support/xsd/VocabularyRevisionMif.xsd.
It is strongly recommended that VML authors use an XML editor that can continually validate the file against the VML schema and can "prompt" for the XML elements and attributes that might be used at any point in the file. XML Spy is frequently used to this end, but other validating XML editors are equally usable.
VML Example Template
An example file that can be used as a template for building VML files is distributed here as file support\templates\VML-MIF starter template.vmif. This template serves two primary purposes:
The screen-shot at right shows the top elements from a Harmonization proposal, as an example of the header data required for VML posting. The elements marked with green highlighter are those that will be positioned within the actual VML file (as seen in the figure below).
Specifically, the elements and attributes of <editDescription/> come from:
- attr:creationDate is the date on which the VML file is created
- attr:primaryContact is the name of the first person in the proposal "Editor/Author" field
- attr:proposalId is the "Recommendation ID" at upper right on the proposal header
- attr:committee is the Work Group listed as "Sponsored by:" in the second box on the left of the proposal header
- elem:proposalName holds the "PROPOSAL NAME" field of the proposal header
- elem:description holds the text (optionally with html markup) that appears under "SUMMARY RECOMMENDATION" in the proposal
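To make the mapping above concrete, the following Python sketch assembles an editDescription element from proposal header values. It is a rough illustration only: the element and attribute names come from the list above, but the function name, the sample values, the date format, and the omission of the mif namespace are all assumptions, and the result has not been validated against VocabularyRevisionMif.xsd.

```python
import xml.etree.ElementTree as ET


def build_edit_description(creation_date: str, primary_contact: str, proposal_id: str,
                           committee: str, proposal_name: str,
                           description: str) -> ET.Element:
    """Build an <editDescription/> element from Harmonization proposal header fields."""
    # Assumption: no namespace is applied here; the real schema sits in the mif namespace.
    edit = ET.Element("editDescription", {
        "creationDate": creation_date,
        "primaryContact": primary_contact,
        "proposalId": proposal_id,
        "committee": committee,
    })
    ET.SubElement(edit, "proposalName").text = proposal_name
    ET.SubElement(edit, "description").text = description
    return edit


# Hypothetical values, for illustration only.
example = build_edit_description(
    "2014-06-10", "Example Author", "EXAMPLE-001", "Example Work Group",
    "Example proposal name", "Example summary recommendation.",
)
example_xml = ET.tostring(example, encoding="unicode")
```

A validating XML editor, as recommended earlier, remains the reliable way to confirm element order and required content.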
Working List to Manage Posting Process
The sequence of tasks that is undertaken by the Widget is determined by a working list file. This file is a simple ASCII text file that is stored as support\manifests\WorkingList.txt. (Under the covers, the working list is maintained in XML (documented elsewhere), but the entire user interface centers around the text version.) The following is an example Working List:
#> Manifest: 201406102313
# ResetDb
#> Start example second trimester proposals
#> ==============================
# VML-MIFstarterTemplate.vmif
# VML-MIFstep2Template.vmif
# VML-MIFcloseTemplate.vmif
#> end t2
#> Start example third trimester proposals
#> =============================
VML-MIFstep3Template.vmif
VML-MIFclose2014T3.vmif
#> end T3
MakeNested
#> ## The End ##
Rules for Working List Entries
- The first row SHALL open with the string #> Manifest: followed by a space and a date time stamp
- The last row SHOULD be #> ## The End ##
- The remaining (intervening) rows are of one of four types:
- blank row is recognized by its being blank or empty. These may be used to segregate or group content in the other rows;
- annotation row is a row that starts with "#>". These may be used to document the working list;
- file row starts with a File Pattern (defined below) that is followed by the file name for an extended VML file.
- command row starts with a Command Pattern (defined below) that is followed by one of two command strings:
- ResetDb - SHOULD be the first non-blank, non-annotation row; or
- MakeNested - SHOULD be the last non-blank, non-annotation row
- These commands initiate processes to either reset the data base to the "source of truth", or to complete processing of the repository, including extracting coremif files.
- With the exception of the preferred locations for the "command" lines, the other types can be in any sequence or number that the author prefers.
File Pattern
The file pattern that may be used to open a line of type file provides values for two Boolean properties, skipProcessing and tested. The default value (signified by the absence of an indicator in the pattern) for both properties is false.
Specifically, the pattern may include spaces and further may include:
- # (if present) indicates that skipProcessing is "true", and if present, must be placed in the first non-space position on the line
- $ (if present) indicates that tested is "true", and must be placed after # (if that is present) and before the file name
Thus, the following patterns are interpreted:
- fileName - Untested file, and do not skip processing
- $fileName - Tested file to be processed
- # fileName - Untested file, and skip processing
- # $ fileName - Tested, but skip processing.
- Note that extra spaces, as in the last example, are optional at any place in the opening; they will be dropped when the line is “normalized”
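The File Pattern rules above can be expressed as a short Python sketch. This is not the widget's own parser (which is driven from ANT); the names FILE_ROW, FileRow, and parse_file_row are hypothetical, and the sketch assumes that annotation rows (those starting with "#>") have already been set aside.

```python
import re
from typing import NamedTuple

# Optional "#" (skipProcessing), then optional "$" (tested), then the file name.
# Spaces are allowed around the indicators; the name may itself contain spaces.
FILE_ROW = re.compile(r"\s*(?P<skip>#)?\s*(?P<tested>\$)?\s*(?P<name>[^#$\s].*?)\s*$")


class FileRow(NamedTuple):
    skip_processing: bool
    tested: bool
    file_name: str


def parse_file_row(line: str) -> FileRow:
    """Interpret the opening pattern of a file row in the working list."""
    if line.lstrip().startswith("#>"):
        raise ValueError("annotation row, not a file row")
    match = FILE_ROW.match(line)
    if match is None:
        raise ValueError(f"not a file row: {line!r}")
    return FileRow(
        match.group("skip") is not None,
        match.group("tested") is not None,
        match.group("name"),
    )
```

Both properties default to false when their indicator is absent, which matches the four interpreted patterns listed above.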
Command Pattern
The command pattern may be used to open a line of type command and provides a value for the Boolean property skipProcessing. The default value (signified by the absence of an indicator in the pattern) for this property is false.
Specifically, the pattern may include spaces and further may include:
- # (if present) indicates that skipProcessing is "true", and if present, must be placed in the first non-space position on the line
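Taken together, the row rules can be sketched as a small classifier in Python. The sketch below is illustrative and is not the widget's implementation: the names COMMANDS and classify_row are hypothetical, and matching the two command strings case-sensitively is an assumption.

```python
# The two command strings named in the rules for working list entries.
COMMANDS = {"ResetDb", "MakeNested"}


def classify_row(line: str):
    """Return (kind, skip_processing) for one working list row.

    kind is one of "blank", "annotation", "command", or "file".
    """
    text = line.strip()
    if not text:
        return ("blank", False)
    if text.startswith("#>"):
        return ("annotation", False)
    # A leading "#" that is not part of "#>" sets skipProcessing to true.
    skip = text.startswith("#")
    body = text[1:].strip() if skip else text
    if body in COMMANDS:
        return ("command", skip)
    # Anything else is treated as a file row; its "$" indicator is not examined here.
    return ("file", skip)
```

Run against the example Working List, "# ResetDb" is reported as a skipped command, while "MakeNested" is a command that will be processed.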