Proposed Subversion Structure

From HL7Wiki

Background

Over ten days ago, we discussed appropriate directory structures for managing Publishing content in the Subversion version control system. At that time, Lloyd McKenzie proposed a structure, and I undertook to analyze it and propose a variant that I felt we could use both for publishing in the near future and as the foundation for long-term versioning of V3 content and publishing.

This took longer than I had anticipated, in part because I undertook a detailed analysis of the transforms and scripts currently used to publish the content, in order to assure Don Lloyd and myself that the conversion would not mandate a line-by-line review of each transform. My methodology was to do a global search for each of the directory and sub-directory names in the scripts, batch (.bat) files, and transforms, and then to place each occurrence in one of three categories: (1) those that are managed as "properties" or XML "entities", and can therefore be redefined; (2) those that appear as in-line text and therefore must be changed in order to preserve current operations; and (3) those that are part of comments and can either be handled by search/replace or ignored. A spreadsheet listing all such occurrences by file is available to guide conversion.
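The three-way classification described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure actually used: the directory names in `DIRECTORY_NAMES`, the comment markers, and the `<property ...>` syntax are all assumptions, and the check works one line at a time, so multi-line comments would be misclassified.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical directory names to search for; the real list would come
# from the publishing directory tree.
DIRECTORY_NAMES = ["domains", "infrastructure", "support"]


def classify_line(line: str, name: str) -> Optional[str]:
    """Classify one occurrence of a directory name, or return None if absent."""
    # Whole-word match, so "domains" does not match inside "domainsDir".
    if not re.search(r"\b" + re.escape(name) + r"\b", line):
        return None
    stripped = line.strip()
    # Category 3: the name appears in a comment (XML comment, or REM / :: in a bat file).
    if stripped.startswith("<!--") or re.match(r"(rem\b|::)", stripped, re.IGNORECASE):
        return "comment"
    # Category 1: the name is defined once as an entity or property (assumed syntax)
    # and can therefore be redefined in one place.
    if stripped.startswith("<!ENTITY") or stripped.startswith("<property"):
        return "redefinable"
    # Category 2: everything else is in-line text that must be edited by hand.
    return "inline"


def summarize(text: str, names=DIRECTORY_NAMES) -> Counter:
    """Count occurrences per (directory name, category) for one file's text."""
    counts = Counter()
    for line in text.splitlines():
        for name in names:
            category = classify_line(line, name)
            if category is not None:
                counts[(name, category)] += 1
    return counts
```

Running `summarize` over each script, bat file, and transform and writing the counts out by file would give a table comparable in spirit to the spreadsheet mentioned above.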

Objectives

With that information in hand, I undertook to revise Lloyd's proposal while seeking to balance the principles that his structure supports, and the requirements I saw for continuing publication. The primary objectives I sought to meet were:

  1. Use version control for "source" artifacts, NOT for documents that are generated from the source. [Note: "source" may be ambiguous depending on whether you define it from the perspective of the author (that for which the author is the source) or from the perspective of the publishing or generator process (that which is source to these processes). The ambiguity arises PRIMARILY with regard to the two HL7 databases - PubDb and DesignRepository - that hold the source from the author, but whose content needs to be externally expressed as source for the Generator.]
  2. Preserve publishing's "two-up, two-down" directory structure, under which navigation from one topic (domain or specification) to another is expressed as "../../grouper/otherTopic".
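As an illustration of why the "two-up, two-down" rule depends on every topic sitting at the same depth, here is a minimal Python sketch. The function names are hypothetical, and the placement of "rim" under "infrastructure" is assumed purely for the example.

```python
import posixpath


def two_up_two_down(grouper: str, topic: str) -> str:
    """Build the relative link to another topic: up two levels, down two."""
    return f"../../{grouper}/{topic}"


def resolve(current_topic_dir: str, link: str) -> str:
    """Resolve a relative link as seen from the current topic's directory."""
    return posixpath.normpath(posixpath.join(current_topic_dir, link))


if __name__ == "__main__":
    link = two_up_two_down("infrastructure", "rim")
    # From a topic at the expected depth, the link lands on the other topic.
    print(resolve("domains/uvab", link))
    # From a directory one level deeper, the same link misses its target.
    print(resolve("domains/uvab/extra", link))
```

The same link text works from any topic only as long as every topic directory sits exactly two levels below a common root, which is the property the new structure needs to preserve.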

The secondary objectives were to:

  1. Avoid change for change's sake; that is, preserve existing directory names where they are already "pretty good".
  2. Provide for the future MIF-based publishing which separates definitional content (static, vocabulary and dynamic models) from the publication packaging documents that contain a large fraction of the overall textual content of a domain.
  3. Document carefully wherever the changes will cause a major overhaul of a collection of content.
  4. Provide a discussion on where the new proposal, the original publishing structure, and Lloyd's proposal diverge.

Results

The results are displayed in an Excel workbook, available as a downloadable ZIP archive. NOTE: There are two displays of the same content on different "sheets". The first, named "StructureDiagrammed", should come up first. It has a fixed row height to emphasize the structure and the mapping from left to right. The second display, named "StructureDocumented", is designed to make it easy to read the notes and discussions for each row.

The columns in both sheets are grouped as three columns on the left titled "Current Publication Directories", and three on the right titled "Future Publication Directories".

The three columns under "Current Publication Directories" are:

  • Directory lists the directory names with their hierarchy shown as indented below. Note that there are several patterns (labels A, B, C, and D) that are reused underneath specific topics. The structure has three primary levels, and numerous secondary levels below that:
    • General level is things like "input", "output" and "lib" that segregate major categories;
    • Categories within the general (domains, infrastructure, help, support, etc) separate major categories of subject areas that are distinguished by how they are managed and published; and
    • Subject areas within categories (uvab, uvct, rim, vocabulary) that represent a single specification or domain managed by a single Work Group. The secondary levels below this classify different kinds of information within the specification.
  • Notes annotate the content of the directories and note where specific sub-directory patterns may occur.
  • Map left-to-right is a mapping of each directory in the old structure to its future location in the new structure. These are color-coded (see the legend at the top of the sheet) to indicate the "level of challenge" in converting the old to the new.

The three columns under "Future Publication Directories" are:

  • Directory lists the directory names with their hierarchy shown as indented below. Note that there are several patterns (labels A, B, C, and D) that are reused underneath specific topics. The structure has three primary levels, and numerous secondary levels below that:
    • General level is things like "input", "output" and "workgroups" that segregate major categories;
    • Categories within the general (domains, otherspecifications, packagedocuments, support, etc) separate major categories of subject areas that are distinguished by how they are managed and published; and
    • Subject areas within categories (uvab, uvct, rim, vocabulary) that represent a single specification or domain managed by a single Work Group. The secondary levels below this classify different kinds of information within the specification. Secondary levels are listed only once, but may appear in any of the specifications.
  • Notes specify the content of the directories and note where specific sub-directory patterns may occur.
  • Discussion is the place where issues arose in creating the new structure. Issues may have to do with uncertainty about treatment, conflict with Lloyd's proposals, or conflicts with prior use.
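To make the three primary levels concrete, the following Python sketch splits a directory path into its general level, category, subject area, and any secondary levels. The `Location` type, the `split_levels` function, and the example path "input/domains/uvab/models" are illustrative assumptions; the spreadsheet remains the authority on which directories actually exist.

```python
from typing import NamedTuple, Tuple


class Location(NamedTuple):
    general: str                # e.g. "input", "output"
    category: str               # e.g. "domains", "support"
    subject: str                # e.g. "uvab", "rim"
    secondary: Tuple[str, ...]  # any levels below the subject area


def split_levels(path: str) -> Location:
    """Split a slash-separated path into the three primary levels plus the rest."""
    parts = [p for p in path.split("/") if p]
    if len(parts) < 3:
        raise ValueError(f"expected general/category/subject, got: {path!r}")
    return Location(parts[0], parts[1], parts[2], tuple(parts[3:]))
```

For example, `split_levels("input/domains/uvab/models")` would report "input" as the general level, "domains" as the category, "uvab" as the subject area, and "models" as a secondary level.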

Remaining Issues

The "Discussion" column lists a number of "known issues" that should be considered. Notable among these are the following:

  1. Retention of "databases" sub-directory - the "databases" sub-directory has been retained as one of the sub-directories beneath a domain. The intention is that these will be dropped as soon as the Publication Database (PubDb) is replaced and "design repositories" are no longer used to store static models (HMDs). This directory has not and will not hold any other content.
  2. Extraction of database content to source directories - The intention is to use "pre-publication" tools (centered around RoseTree) to extract the content from the PubDb and the design repositories to other sub-directories of the domain (specifically, "behavioral" and "models"). This ensures that similar content is in one location for the domain, and in the place where the content will reside when the databases themselves are dropped.
  3. Addition of "publicationpackage" directory - Future MIF2 publishing will split the publication content into two parts. The first part is a "publication package" file that organizes the content and provides information about authors and scope, the topics for the domain, and textual material that characterizes the domain and topics. The second part is a set of "definition" files that are referenced from the publication package. In this structure, the publication package(s) will be in one sub-directory, while the definition files are in other directories such as "models" and "behavior". An alternative would be to place the package files at the domain root, which is where the primary domain content is stored today.
  4. Retention of "configuration" directory - I chose to create a separate configuration directory, as Lloyd had proposed, but it would be equally logical to combine this with the "publicationpackage" directory under a name like "publicationconfiguration".
  5. Name of "models" directory - Lloyd proposed calling this "staticmodels", but I made it more generic in that this is the directory where I would also place the vocabulary model and datatypes model, which are the "definitional" files for their associated specifications.