
VML Processing Widget


Summary

One day soon, this page will document the VML Processing Widget. This is an ANT-scripted process that:

  • consumes proposals for change expressed in the HL7 Vocabulary Maintenance Language (VML),
  • splits these proposals into two source streams targeted at the Access repository in which the HL7 Vocabulary content is stored:
    • an SQL update stream for updating files of properties assigned to vocabulary objects, and
    • an XML stream processed in Java to update the primary Access tables
    together, these streams update the vocabulary content stored in the Access repository,
  • uses a Java process to "clean up" the Access content,
  • invokes RoseTree to express the complete vocabulary content in HL7 Model Interchange Format (MIF), and
  • combines this MIF content from the Access repository with an externally maintained MIF extension file to arrive at a final, complete expression of HL7's vocabulary in MIF.

Background

Management of HL7 Vocabulary Content

In brief, the HL7 Vocabulary Content:

  • Is published and released as a formal expression of the content in Model Interchange Format (MIF) files. This format:
    • Documents the data that define each of the artifacts (Concept Domains, Code Systems, Value Sets, etc.);
    • Documents the relationships between the artifacts (such as bindings);
    • Provides a vehicle for using XML X-path logic to query and/or analyze the content;
    • Provides the source for transforms used to publish the content in HL7 Ballots and Normative Editions.
  • Is archived and persisted in two complementary forms:
    • an Access data base, designed over 13 years ago, that serves as the "repository" for this content; plus
    • “Extension files,” expressed in MIF, that primarily document those Value Sets that are defined against external code systems.
  • Is updated through the Vocabulary_and_RIM_Harmonization_Process, the results of which drive updates to the Repository and/or extension files.

Prior to the development of this widget, the means for processing updates to the HL7 Vocabulary Content involved:

  • Creating update files expressed in the HL7 Vocabulary Maintenance Language (VML).
  • Posting the data in these files to the repository data base using the Java-encoded procedures compiled as “VocMaint_and_dependencies.jar”. Weaknesses of this process include:
    • Some updates require manual changes (table/row updates) in Access, but only when VML cannot be used. This requirement arose primarily when it was necessary to update concept properties and “object properties” on Concept Domains and Value Sets.
    • The original VML schema encapsulates “descriptive markup” (HTML formatting of descriptions, such as the definition of a concept or the description of a value set's content and purpose) in a CDATA wrapper. This encapsulation precluded validation of the markup before it was posted to the data base, which too frequently made a subsequent correction update necessary.
  • Once the content was posted to the database, the definitive “coremif” was assembled from the Access data by RoseTree, with the “Extensions” merged in by transform.

Capabilities of the New Widget

What does this Widget provide?

  • A revision of the VML Schema, and its placement in the “mif namespace”, that:
    • Allows descriptive markup to be validated as the VML files are created
    • Extends the VML to include the ability to “update” and “delete” concept properties
    • Extends the VML to add the ability to “add”, “update”, and “remove/replace” object properties
    • Extends the VML to enable the creation of Value Sets using "extensional definition" against External Code Systems from VML

These extensions are not processed through the extant Java process. Rather, the widget relies upon pre-defined queries and code (activated from the widget) that perform the necessary data base updates without interfering with the ability to use the original VML for its intended purpose. This is accomplished by:

  • Using XSLT transforms to convert and split “revised VML files” into:
    • “Standard” VML that can be posted to the data base using the current vocabulary maintenance Java programs, AND
    • A standard data import table (in pipe-delimited format) that defines the data and control codes needed to perform the “extension” queries.
  • Establishing queries and logic in the Access repository (using Visual Basic for Applications, VBA, in Access) to post these changes.
  • Controlling the processes with scripts in the Java-based ANT scripting language; these scripts are driven from a Work List and automate all non-editorial processes.
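The split described above can be illustrated with a minimal sketch. The widget itself performs this step with XSLT transforms; the Python below is only a stand-in for the idea. The element names (`addConcept`, `updateConceptProperty`, and so on) and the row layout are hypothetical, since the revised VML schema and the real import-table format are not reproduced on this page.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names standing in for the VML "extension" actions;
# the real revised VML schema is not shown on this page.
EXTENSION_TAGS = {"updateConceptProperty", "deleteConceptProperty", "addObjectProperty"}


def split_revised_vml(xml_text):
    """Split a revised-VML-like document into (standard XML text, pipe-delimited rows)."""
    root = ET.fromstring(xml_text)
    rows = []
    for child in list(root):  # copy the child list because we remove while iterating
        if child.tag in EXTENSION_TAGS:
            # Assumed row layout: the action name as control code,
            # followed by the attribute values in sorted attribute-name order.
            fields = [child.tag] + [child.attrib[key] for key in sorted(child.attrib)]
            rows.append("|".join(fields))
            root.remove(child)
    standard = ET.tostring(root, encoding="unicode")
    return standard, rows


sample = """<vml>
  <addConcept code="A1" name="Example"/>
  <updateConceptProperty code="A1" property="status" value="retired"/>
</vml>"""

standard_vml, import_rows = split_revised_vml(sample)
print(standard_vml)
print(import_rows)
```

A real import table would also need a rule for values that themselves contain the pipe character; this sketch does not handle that case.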

System Requirements and Installation

System requirements

  • Windows 7 or 8, 32- or 64-bit: Although the core elements of the script SHOULD work in a Unix or Mac environment, they have not been tested other than in Windows, and other key elements will NOT work outside of Windows. Specifically:
    • RoseTree, a Windows application, is needed to complete the extraction from the repository and the expression in MIF
    • Access is required in order to post the VML "extension" content into Access.
  • Memory should exceed 2 GB (preferably 4 GB), as the RoseTree memory demands while converting to MIF are sizable.
  • 32-bit Java JRE in environment: Even if your installation is a 64-bit machine, there must be a 32-bit JRE installed in order for the Java VML posting process to connect to the data base. The 32-bit JRE can be installed side-by-side with a 64-bit JRE.
  • XML editor that will validate from schema: This is highly beneficial when creating VML source files for processing. The author uses XML Spy.
  • Microsoft Access: Access is required to process the SQL updates that implement the "VML Extensions". This can be either a 32-bit or 64-bit Office installation.
  • RoseTree III: This application performs the final extraction and MIF expression step. It can be downloaded from the executables in the RoseTree project on Gforge.
  • Text Editor: (your choice) for editing the "working list" that governs the "widget processes." The ANT script will activate whichever text editor is the default for the "*.txt" file type on your system.

Widget Installation

The Widget is distributed as a release of Vocabulary Maintenance Widget on HL7 Gforge, and is available as a hyperlink under "Quick Downloads" on the Gforge home page. The widget is distributed in a ZIP archive named like hl7_vocabulary-vmlwidget-m.n.o.zip, where m.n.o is the current release identifier.

Base Installation

  1. Download the archive
  2. Place the archive in a home directory of your choosing, but do not place it inside “Program Files”
    The archive includes its own “root” which will appear in the target directory as it is extracted.
  3. Extract the files, maintaining the directory structure; the root directory will be named vmlwidget-m.n.o (where, as before, m.n.o is the release identifier).
  4. In the root directory find the file 00.00_1st_INSTALL_ALL_Change_to_dot_bat_and_RUN_this
  5. Rename it by adding the “.bat” extension at the end.
  6. Run the renamed “.bat” file (by double-clicking).
  7. The installation will list the license terms and then pause for you to accept them. Accept the license agreement by responding YES.

Directories and Initial Content Post-installation

Post-installation Directory and File Structure

The initial installation results in a file and directory structure as seen in the "screen-shot" shown at right. A summary of these follows, and a more complete explanation of the directory structures is found on a separate VML Processing Widget - Directory Structure page.

Specific Sub-directories

The specific Sub-directories used by the tool include:

  • output - as its name implies, it contains further sub-directories into which the resulting vocabulary repositories and MIF files will be placed. Note that although these directories are empty (newly created) at the beginning, the Widget never clears them out. Rather, it writes new files into them, possibly overwriting old ones.
  • source - has sub-directories that represent
    • user-generated source, such as new VML files
    • intermediate results such as delimited SQL source tables, and "traditional" VML files, and
    • MIF Extension files that are source content to be updated for value sets defined against external code systems
  • support - whose sub-directories hold the ANT scripts, schema files for the VML content, XSLT transforms for converting and preparing files, etc.
  • working - whose database sub-directory holds both the initial source of truth repository for the prior vocabulary releases, and the new repository being built by the widget.
  • zip archives - these may be treated as directories, but here, they simply hold the source material for the various "batch" files that invoke the widget processes.
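As a quick sanity check after installation, the directory layout listed above can be verified with a few lines of script. This is a minimal sketch and not part of the Widget; it assumes only the four top-level sub-directory names given in the list, and the function name `missing_directories` is made up for illustration.

```python
from pathlib import Path

# Top-level sub-directories named in the list above.
EXPECTED_DIRS = ["output", "source", "support", "working"]


def missing_directories(widget_root):
    """Return the expected sub-directories that do not exist under widget_root."""
    root = Path(widget_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


# Check the current directory; an empty list means all four are present.
print(missing_directories("."))
```

To check an actual installation, pass the extracted root directory (named like vmlwidget-m.n.o) instead of ".".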

Batch (*.bat) Files

The function of this widget is invoked entirely by "running" one of the four "batch" files (ending in ".bat") whose names begin with an integer. (The exact function of these is covered in detail in a separate section of this manual.) Executing or running these files is done by: "double-clicking" on the file in Windows Explorer; "right-clicking" and selecting "Open" from the right-click drop down menu; or selecting the file in Windows Explorer by clicking on it, and then pressing "enter".

The two "batch" files (ending in ".bat") whose names begin with "Run" are required intermediate files that are "called" by one of the primary batch files noted above.

The remaining "batch" file (supp_00.00...ExtractSupportingBatFiles.bat) is a utility that will unzip and present additional "supplemental" batch files that are used in maintaining the widget, but are not needed for ordinary use.

Remaining Files

  • defined-environment.properties - is a "configuration" file for the tool; its function and settings are discussed in the section Configuring and Providing Sources for the Widget
  • installation_...log - is a log of the installation steps
  • InstallationGuide.txt - is a quick-start guide, just like section Base Installation above.
  • LICENSE.txt - lists the terms you agreed to when you accepted the license.

Configuring and Providing Sources for the Widget

Whenever the VML Processing Widget is taken up for use, it is necessary to establish the parameters that define the content and the environment in which the widget will be used. This involves three considerations:

  1. Determining the Release Identifier and Release Date for the vocabulary release towards which the changes are targeted
  2. Collecting the Source of Truth Repository and the latest Vocabulary EXTENSION files from which to start the changes and placing these files in the appropriate directories, and
  3. Setting any necessary properties in the defined-environment.properties file.

Each of these topics is covered in a sub-section below.

Determining the Release Identifier and Release Date

Each set of vocabulary proposals to be implemented (by preparing and posting VML files) is processed within the context of a planned vocabulary "release". By custom, HL7 processes three vocabulary releases a year, each resulting from the Harmonization meetings scheduled between Working Group Meetings.

Using this tool suite, the Release Identifier is designated in a pattern of yyyyTn where yyyy is the calendar year, and n is the trimester number, with "1" starting at the beginning of the January Working Group Meeting and continuing to the start of the May Working Group Meeting when trimester "2" begins. If there is a need for extra releases between two releases, they will be designated as yyyyTnCm where yyyyTn is the identifier for the most recently completed trimester release, and m is the sequence number of the special (usually corrective) releases.
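The identifier pattern just described can be checked mechanically. The following is a minimal sketch, not part of the Widget: it assumes the trimester number is 1, 2, or 3 and that the correction sequence number m is a positive integer without leading zeros; the function name `parse_release_identifier` is made up for illustration.

```python
import re

# yyyyTn with an optional Cm suffix for extra (usually corrective) releases.
# Assumptions: n is 1-3, m is a positive integer without leading zeros.
RELEASE_IDENTIFIER = re.compile(
    r"^(?P<year>\d{4})T(?P<trimester>[1-3])(?:C(?P<correction>[1-9]\d*))?$"
)


def parse_release_identifier(text):
    """Return (year, trimester, correction) where correction is None for a regular release."""
    match = RELEASE_IDENTIFIER.match(text)
    if match is None:
        raise ValueError(f"not a valid Release Identifier: {text!r}")
    correction = match.group("correction")
    return (
        int(match.group("year")),
        int(match.group("trimester")),
        int(correction) if correction is not None else None,
    )


print(parse_release_identifier("2014T2"))    # (2014, 2, None)
print(parse_release_identifier("2014T2C1"))  # (2014, 2, 1)
```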

The Release Date is a secondary, but important parameter for any release. For value set versions and code system released versions that are dated, this is the "version date" that will be assigned. In the case of the planned trimester releases, this date will be the day before the ballot opening for the ballot held at the end of that trimester. For any extra releases, a "close date" will need to be determined and assigned prior to the initiation of processing for this release.

Collecting and Placing Initial Repository and MIF EXTENSION files

Since each vocabulary release builds upon its predecessor, the next critical step is to collect the source of truth for the vocabulary repository (in Access) and for the vocabulary EXTENSION file. Together these are the authoritative sources from which to begin.

Both of these files can be found by downloading the latest hl7_rimRepos....zip file from the rimRepos releases in the Design Repository project on Gforge. The full release file name will be something like hl7_rimRepos-2.44.7.zip (where the 2.44.7 reflects point releases for RIM release 2.44). The contents of the zip archive will contain a variety of files, but two among them are needed here:

  1. Vocabulary repository in Access named like rim_none_Voc1287_20140516_Repository20140610.mdb where the release numbers (like 1287_20140516) and dates (like 20140610) vary from release to release.
    Place a copy of this file in the Widget sub-directory working\database
    AND Rename the file SourceOfTruthRepository.mdb
    (renaming is optional, but if the file is not renamed the property file.source.of.truth.database will need to be set in the defined-environment.properties file)
  2. Vocabulary EXTENSION file in MIF is contained within a zip archive named like EXTENSION=UV=VO=1287-20140516.zip where the release number (like 1287-20140516) varies from release to release. Inside the archive is a single *.coremif file with the same name as the archive.
    Extract the coremif file and place a copy of it in the Widget sub-directory source\extensionCoremif
    AND Rename the file StartingEXTENSION.coremif
    (renaming is optional, but if the file is not renamed the property file.initial.extension.coremif will need to be set in the defined-environment.properties file)
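The copy-and-rename steps above could be scripted along the following lines. This is a minimal sketch under stated assumptions, not part of the Widget: the function name `place_sources` and its parameters are made up for illustration, and the caller is assumed to have already extracted the two files from the downloaded archives.

```python
import shutil
from pathlib import Path


def place_sources(widget_root, repository_mdb, extension_coremif):
    """Copy the two source-of-truth files into the Widget under their default names."""
    root = Path(widget_root)
    targets = {
        Path(repository_mdb): root / "working" / "database" / "SourceOfTruthRepository.mdb",
        Path(extension_coremif): root / "source" / "extensionCoremif" / "StartingEXTENSION.coremif",
    }
    for source, destination in targets.items():
        # The installer should already have created these directories;
        # creating them here is harmless if they exist.
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, destination)  # copy, leaving the downloaded original in place
    return list(targets.values())
```

Because the files are copied under the default names, the two file-name properties can be left at their defaults.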

Setting properties in the defined-environment.properties file

The final preparatory step is to set several critical elements in the file defined-environment.properties in the root directory of the Widget. This file is, in effect, a configuration file for the Widget determining a number of critical properties. The following shows the opening lines of this file.

##############################
# Users Software Environment #
##############################

#env.hl7.tools.directory  - the root directory UNDER the 32 bit "program Files" directory, and 
#=======================    in which RoseTreeIII and other HL7 programs are installed
#Default: HL7

########################
# Vocabulary Release ID #
########################
#Default: releaseId=2014T2_2014-08-07

########################
# DATA BASE FILE NAMES #
########################

#file.source.of.truth.database - File name for source-of-truth data base 
#=============================
#Default: file.source.of.truth.database=SourceOfTruthRepository.mdb

As is customary in most property files, any row beginning with "#" (hash mark) is ignored, and thus is used for documentation. Blank lines are similarly ignored. True property lines start with the property name (like file.initial.extension.coremif) followed by an "=" (equal sign) and the value of the property.
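The line rules just described can be sketched as a small parser. This is an illustration only: the Widget reads the file through ANT, and real Java-style property files support further syntax (other separators, line continuations, escapes) that this sketch deliberately ignores.

```python
def parse_properties(text):
    """Parse simple name=value lines, skipping blank lines and '#' comment lines."""
    properties = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # documentation or blank line
        name, separator, value = stripped.partition("=")
        if separator:  # keep only lines that actually contain "="
            properties[name.strip()] = value.strip()
    return properties


sample = """\
#Default: releaseId=2014T2_2014-08-07
releaseId=2014T2_2014-08-07

#Default: file.source.of.truth.database=SourceOfTruthRepository.mdb
file.source.of.truth.database=SourceOfTruthRepository.mdb
"""

print(parse_properties(sample))
```

Note that the "#Default:" lines are skipped even though they contain an "=" sign; only the uncommented lines set a property.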

In this particular file, if the two "source of truth" files were renamed as suggested in section Collecting and Placing Initial Repository and MIF EXTENSION files, the default values for file.source.of.truth.database and file.initial.extension.coremif are correct. Otherwise these properties will need to be set to the actual file names.

Thus, the only property that will almost certainly need to be changed here is the releaseId property. The value of this property is the Release Identifier (listed above) concatenated with the Release Date (expressed as an XML date yyyy-mm-dd) with an underscore ("_") separating the two elements. The default value is 2014T2_2014-08-07 which is correct for the summer trimester of 2014.
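Composing the releaseId value is a simple concatenation, sketched below. The helper name `build_release_id` is made up for illustration; the example reproduces the default value quoted above.

```python
from datetime import date


def build_release_id(release_identifier, release_date):
    """Join the Release Identifier and the Release Date (as yyyy-mm-dd) with an underscore."""
    return f"{release_identifier}_{release_date.isoformat()}"


print(build_release_id("2014T2", date(2014, 8, 7)))  # 2014T2_2014-08-07
```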
