Alignment between the CTS2 SFM Concept Classes and the CTS2 PIM Entity Classes
This section describes the alignment between the portion of the CTS2 SFM that focuses on "concepts", "association types" and "defined entity properties" and their analogs in the CTS2 PIM. We begin by applying a number of simplifying transformations to the existing SFM model and then aligning the result with the CTS2 PIM analog.
Refactoring the SFM Model
Original SFM Model
We begin with the class layout of the PIM as defined in the SFM.
Step 1: Factoring out CodeSystemEntity
The SFM model includes the notion of CodeSystemEntity, "an abstract class that represents either a node (CodeSystemNode) or an association (CodeSystemEntityVersionAssociation) within an overall ontology for a code system". As this class has only one (optional) attribute, id, the PIM authors concluded that its primary intent was to assert the following invariants:
- Two different concepts cannot have the same id within different versions of the same code system
- Two different associations cannot have the same id within different versions of the same code system
Unfortunately, declaring id as optional effectively defeats the first invariant. Changing it to required (as it is shown in the original SFM UML diagram) also ends up asserting that every association in a code system must have a unique id, which is not the case in a significant number of code systems that exist today. The CTS2 PIM authors decided that the intent of the class should be represented differently. Making note of the invariants that it was attempting to preserve, they eliminated CodeSystemEntity from the PIM, moving the requirement for an id into CodeSystemNode and addressing the association id separately.
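The first invariant can be read as a check that runs across several versions of one code system. The following is a minimal sketch, assuming each version is a plain mapping from node id to a key that says which concept the id denotes. The function name `find_id_conflicts`, the `concept_key` strings and the data layout are hypothetical and are not taken from the SFM or the PIM.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: one dict per code system version, mapping a node id
# to a key that identifies the concept the id denotes in that version.
VersionContent = Dict[str, str]


def find_id_conflicts(
    versions: Dict[str, VersionContent]
) -> List[Tuple[str, str, str]]:
    """Return (id, first_version, conflicting_version) for every id that
    denotes different concepts in different versions."""
    first_seen: Dict[str, Tuple[str, str]] = {}  # id -> (version, concept_key)
    conflicts: List[Tuple[str, str, str]] = []
    for version_name, content in versions.items():
        for node_id, concept_key in content.items():
            if node_id not in first_seen:
                first_seen[node_id] = (version_name, concept_key)
            elif first_seen[node_id][1] != concept_key:
                conflicts.append((node_id, first_seen[node_id][0], version_name))
    return conflicts


# Illustrative data only: C2 is reused for a different concept in v2.
versions = {
    "v1": {"C1": "appendicitis", "C2": "apnea"},
    "v2": {"C1": "appendicitis", "C2": "pneumonia"},
}
conflicts = find_id_conflicts(versions)
```

A node without an id never appears as a key in this layout, so it cannot take part in the check at all, which is one way to see why an optional id defeats the invariant.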
Step 2: Fold CodeSystemEntityVersion into CodeSystemVersion
The SFM model explicitly states that "some kinds of changes at the lower 'entity' level [may occur] that do not require the release of a new overall Code System Version, so potentially there may be several versions at the CodeSystemEntity level associated with a single CodeSystemVersion". All attempts to model this in the CTS2 PIM resulted in an explosion of model complexity. This assertion means that you can no longer ask "What is the definition of Appendicitis according to the SNOMED CT core release on 20120131?" A person (or service) would have to ask, "What version of the definition of 'Appendicitis' do you want?". Determining which version of SNOMED CT an organization was using would require both the base edition and an enumeration of all of the concept versions (e.g. SNOMED CT Core 20120131 + Appendicitis V17 + Apnea V3 + Pneumonia V22), and queries would have to say "What does SNOMED CT Core 20120131 say about Appendicitis V16?"
The PIM authors decided that there was simply no way to make sense of this approach as it was modeled in the SFM. Instead, they proposed the following:
- A code system version can either be in a "FINAL" or "OPEN" state.
- The contents of a code system version in a "FINAL" state cannot undergo further change. Knowledge of the version identifier and release format should be sufficient to know everything about its content.
- The contents of a code system version in an "OPEN" state are unknown. The contents are in a state of flux and the only way to ask meaningful questions is to fix a date (what did the OPEN SNOMED CT Core 20120731 release look like as of March 1, 2012?) or to identify a particular group of changes.
These two assertions allow a second simplification:
- From a logical perspective, a CodeSystemVersion contains (determines) the description of a given CodeSystemEntityVersion
- The id attribute for any given description must be unique within a containing CodeSystemVersion
A logical consequence of this simplification is the elimination of the CodeSystemNodeVersionMembership class, as the relationship is no longer many to many. If needed, the isConceptInitiator attribute can be assigned to the CodeSystemEntityVersion objects themselves. The diagram below shows the result of this set of simplifications:
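The FINAL/OPEN proposal can be sketched in a few lines. This is an illustration under stated assumptions and not the PIM's actual class model: the `State` enum, the change log held as (date, entity id, description) tuples, and the methods `record`, `finalize` and `describe` are all hypothetical names. The entity ids and description texts are made-up labels, not real SNOMED CT content.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List, Optional, Tuple


class State(Enum):
    OPEN = "OPEN"
    FINAL = "FINAL"


@dataclass
class CodeSystemVersion:
    name: str
    state: State = State.OPEN
    # Assumed change log: (effective date, entity id, description text).
    _changes: List[Tuple[date, str, str]] = field(default_factory=list)

    def record(self, when: date, entity_id: str, description: str) -> None:
        """Add a change; only allowed while the version is OPEN."""
        if self.state is State.FINAL:
            raise ValueError(f"{self.name} is FINAL and cannot change")
        self._changes.append((when, entity_id, description))

    def finalize(self) -> None:
        self.state = State.FINAL

    def describe(self, entity_id: str, as_of: Optional[date] = None) -> Optional[str]:
        """Return the single description of entity_id in this version.

        An OPEN version is in flux, so the caller must fix a date.
        A FINAL version can be queried without one.
        """
        if self.state is State.OPEN and as_of is None:
            raise ValueError("an OPEN version can only be queried as of a date")
        result = None
        for when, eid, description in sorted(self._changes, key=lambda c: c[0]):
            if eid == entity_id and (as_of is None or when <= as_of):
                result = description  # the latest applicable change wins
        return result


release = CodeSystemVersion("SNOMED CT Core 20120731")
release.record(date(2012, 2, 1), "appendicitis", "draft definition")
release.record(date(2012, 4, 1), "appendicitis", "revised definition")
draft = release.describe("appendicitis", as_of=date(2012, 3, 1))
release.finalize()
final = release.describe("appendicitis")
```

Because each entity id resolves to exactly one description inside a version, no separate membership class is needed to relate entity versions to code system versions in this sketch.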
Step 3: Define AssociationType and DefinedEntityProperty as first class objects
We now observe that we have a couple of "orphan" classes - AssociationType and DefinedEntityProperty. While the SFM model clearly states that AssociationType is "not a part of a code system, but a separate entity which is used to further characterize associations.", this runs counter to almost all of the terminology and ontology models examined by the PIM authors:
- SNOMED CT defines its typeid values as first class concepts within the SNOMED CT coding system.
- All RDF and OWL ontologies define "properties" ("predicates") as first class ontology components, and always (with the exception, perhaps, of RDF itself) in terms of "properties" defined in other ontologies
- The NCI Thesaurus defines all of its properties as first class members of the ontology
- The UMLS defines its relationship types with CUIs
While we could proceed with this list, the bottom line is that the PIM authors disagree with the notion that AssociationTypes should (or even can) exist without being "a part of a [some] code system".
A similar argument applies to DefinedEntityProperty. While some code systems do not explicitly treat properties as first class entities, this practice is becoming less common: OWL2 has enabled subtyping of AnnotationProperty, the NCI Thesaurus and UMLS already treat properties as first class entities, and the RF2 version of SNOMED CT has acknowledged this as well. The PIM authors believe that all properties should have identifiers and some sort of definition - that properties should be represented as first class entities or "concepts". When a definition isn't present, the property is still represented as a URI (http://id.my.org/codesystem/property/propx) and a local identifier (propx). Hopefully someone will come along and define the intent and purpose of "propx" using (surprise)... a code system.
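The "URI plus local identifier" representation of an as yet undefined property might look like the sketch below. The `PropertyReference` class and the `property_reference` helper are hypothetical names introduced for illustration; only the example URI and the local identifier `propx` come from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PropertyReference:
    uri: str                          # globally unique identifier
    local_id: str                     # identifier within its namespace
    definition: Optional[str] = None  # None until some code system defines it


def property_reference(namespace_uri: str, local_id: str) -> PropertyReference:
    """Build a reference by appending the local id to the namespace URI."""
    uri = namespace_uri.rstrip("/") + "/" + local_id
    return PropertyReference(uri=uri, local_id=local_id)


# The property can be referenced even though nobody has defined it yet.
propx = property_reference("http://id.my.org/codesystem/property", "propx")
```

The reference is usable as soon as the URI exists, and a definition can be attached later without changing how the property is identified.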
The final adjustment that the PIM authors made to the SFM model involved the CodeSystemConceptCode node. The model, as it stands, indicates that a CodeSystem may consist of nothing more than a collection of codes. Each code necessarily has at least one CodeSystemEntityVersion which, in turn, has at least one Designation. If a CodeSystemConcept were associated with, say, two CodeSystemConceptCodes, there would be three (distinct?) ids, and a minimum of three designations - one for each code and one for the concept itself.
It appears that this element was introduced to address the assertion that "Designations have no such restriction or intent [with respect to uniqueness] and are merely alternate display formats." The AG believes that this requirement can be accomplished by adjusting the SFM model by:
- adding an optional code attribute to the CodeSystemEntityVersion itself and
- asserting that code must be unique within the context of the (indirectly) containing CodeSystemVersion
The final CTS2 PIM weakens the second assertion slightly, as it is one of many possible business rules and the introduction of namespaces allows a more refined approach.
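The stricter form of the rule, in which code is optional but must be unique within one code system version, can be sketched as a simple check. The function name `duplicate_codes` and the (entity id, code) pair layout are assumptions made for illustration. As noted above, the final CTS2 PIM treats this as one of several possible business rules, so this shows the SFM adjustment rather than PIM behavior.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def duplicate_codes(
    entities: Iterable[Tuple[str, Optional[str]]]
) -> Dict[str, List[str]]:
    """Return every code shared by more than one entity.

    entities: (entity id, optional code) pairs taken from ONE code
    system version. Entities without a code are skipped, because the
    code attribute is optional.
    """
    by_code: Dict[str, List[str]] = defaultdict(list)
    for entity_id, code in entities:
        if code is not None:
            by_code[code].append(entity_id)
    return {code: ids for code, ids in by_code.items() if len(ids) > 1}


# Illustrative data only: e1 and e4 share the code A01, e3 has no code.
version_entities = [("e1", "A01"), ("e2", "A02"), ("e3", None), ("e4", "A01")]
duplicates = duplicate_codes(version_entities)
```

An empty result means the version satisfies the uniqueness rule. A namespace-aware variant would key the grouping on a (namespace, code) pair instead of the bare code.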
SFM / PIM Comparison
It is this final image that we then compare to the CTS2 PIM analog:
|CodeSystemEntityVersion||EntityDescriptionBase||This was renamed because:
|CodeSystemConcept||ClassDescription or NamedEntityDescription||Which this corresponds to depends on the specific definition of "Concept". Some code systems treat classes, individuals, properties, etc. as "Concepts". Others view "Class" and "Concept" as synonymous.|
|AssociationType||ObjectPropertyDescription||The CTS2 PIM chose the name because it corresponded more closely with the OWL and ODM models.|
|DefinedEntityProperty||AnnotationPropertyDescription||The containment relationship with CodeSystem is not directly present in the CTS2 PIM. The CTS2 PIM provides no direct way to determine what properties are used in a given code system (version). Instead, it maintains a list of properties that are known to the service. Note that this does not preclude implementations from providing this information, but such capabilities were not included because it was thought that they added complexity while delivering limited value.|