Context SMIRFs for RIM based relational databases
Original Word version: Context SMIRFs for RIM based relational databases (File: RIM DB SMIRF.doc)
Wiki version (incomplete)
Author: Peter Hendler MD, Kaiser Permanente/HL7 RIMBAA Co-chair
Storing and Querying Large Interconnected HL7 V3 RIM Object Nets - SMIRFs to the rescue
Large transactional instances of RIM based clinical information, such as an instance of a CDA clinical summary document, can be stored in relational databases using Object to Relational Mapping (ORM) tools such as Hibernate. This is the approach taken by various applications that use a RIM-based persistence layer (e.g. the JAVA SIG reference API). All incoming RIM based documents or messages are stored "as is". In this paper we will use the example of a relational database with table names such as Act, Participation, Role and Entity. This is how the current JAVA SIG persists RIM data. No preprocessing is done on the CDA documents; each document can be easily recreated from the database.
In a CDA document, the Subject of Record and various other parts of the contextual information are established in the root Act (a.k.a. the "Entry Point", the class that's used as the starting point for serialization of a model) of "ClinicalDocument". If many CDA documents related to a given patient are accumulated and persisted in a database, it becomes difficult to query for all of the Observations of a given kind for a given patient. This is because the contextual information related to the Observation may be distant from the Observation itself, and it may have been overridden somewhere between the "Entry Point" of the instance and the Observation we are focusing on.
The problem is that it's hard to establish the contextual information for an Act that is contained somewhere in a CDA document.
We can illustrate the problem using the example of a relational database system that will store multiple CDA documents. In this example, the entry point is a child of Act called "ClinicalDocument". Attached to this entry point are multiple relationships that set the "context" of the entire structure (see Figure 1, left oval). These include Record Target (i.e. Patient), Author, (Legal) Authenticator and Custodian. In most cases this context remains valid as you navigate distally from the entry point through many Act to ActRelationship connections. The context may apply to the Structured Body, to Sections, to Entries, as well as to the Clinical Statement Acts that carry the software processable data.
The problem is that as we traverse this tree distally, the context can be changed or overridden at various points. For example, a section might be called Family History and the Subject might change from the Patient of Record to a family member (see Figure 1, middle oval). Or perhaps a section or entry might be authored by a different provider. Certainly the Performer of various procedures (see Figure 1, right oval) will not always be the same person who authored the document. The Act Relationships will also change. The components of organizers (such as the individual tests in a lab battery like a CBC), the reference ranges, the authorizations, the related documents, and the inFulfillmentOf relationships will all differ as the tree is walked.
Assume that we have stored an entire CDA into a RIM database with table names such as Act, Participation, Role and Entity. Now assume that we want to find all of the Glucose measurements for a given patient within a given date range.
The Observation of the Glucose measurement (Figure 1, right oval) is an entry in the Clinical Statement, and it is separated from the Patient ID (in most cases this would be a Medical Record Number or Patient Id) by at least seven table joins: there are seven associations between RIM classes to navigate from the Observation in the CDA Entry up to the Patient role associated with the ClinicalDocument act. It is even harder than this, because we can't just navigate to the root ClinicalDocument. How do we know whether, and where, the Subject of Record, Author, Performer or other important context has been overridden in a Section (Figure 1, middle oval)? You would have to check each Act as you navigated towards the root to see if context had been overridden. This, along with the large number of joins, is both time consuming and difficult.
It would be beneficial if there were a nearby, reliable structure to inform us of all the correct context.
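The query-time cost described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Act` class, its `parent` link (standing in for a chain of ActRelationship joins) and the flat `participations` dictionary are hypothetical simplifications, not the JAVA SIG data model. The point is that every lookup has to visit each ancestor Act in turn, because any of them may have overridden the context.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Act:
    """Hypothetical in-memory stand-in for a row in the Act table."""
    act_id: str
    parent: Optional["Act"] = None  # stands in for an ActRelationship join
    participations: Dict[str, str] = field(default_factory=dict)  # type -> role id


def resolve_context(act: Act, participation_type: str) -> Optional[str]:
    """Walk towards the root; the nearest Act that carries the participation wins."""
    current = act
    while current is not None:
        if participation_type in current.participations:
            return current.participations[participation_type]
        current = current.parent  # one more join in a relational store
    return None


# A tiny document: the header sets subject and author, one section overrides subject.
document = Act("doc-1", participations={"subject": "patient-123", "author": "dr-a"})
body = Act("body-1", parent=document)
labs = Act("section-labs", parent=body)
family_history = Act("section-fh", parent=body,
                     participations={"subject": "mother-of-patient-123"})
glucose = Act("obs-glucose", parent=labs)
mother_diabetes = Act("obs-fh-diabetes", parent=family_history)

print(resolve_context(glucose, "subject"))
print(resolve_context(mother_diabetes, "subject"))
```

In a relational store each step of the `while` loop corresponds to additional joins, which is what makes this approach slow and error prone at query time.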
Option 1: SMIRFs
In order to solve this issue we'll use the concept of SMIRFs. A SMIRF is a "Small Isolated RIM Fragment". Examples include: a Clinical Statement, an Observation, an Encounter or a Substance Administration. A "Context SMIRF" is a special kind of SMIRF in that its sole purpose is to encapsulate the context of a SMIRF. For example: a ControlAct with an associated Author and RecordTarget. The entire structure of a Context SMIRF is self contained and does not rely on any connections to other structures.
Each SMIRF (core clinical data; red rectangles on the left hand side of Figure 2) is associated with exactly one Context SMIRF (context of that data; red rectangles and associations shown on the right hand side of Figure 2).
The use of SMIRFs and Context SMIRFs requires that we perform the following processing steps upon receipt of a CDA document:
1. We use the classic computer science idea of a stack (last in, first out) of Context SMIRFs. As we enter the CDA we create a Context SMIRF with all the ActRelationships and Participations found attached to the entry point (the ClinicalDocument class). We attach these to a stub Act (probably a ControlAct) whose sole purpose is to have an identifier and be the "stand in" for the Act we are parsing at the moment. We put this Context SMIRF on top of a "context stack". Every Act we come to will be stored in the database along with a link to the Context SMIRF that is currently on the top of the "context stack". In Figure 2 the original Context SMIRF is the one at the bottom right of the figure.
2. Every time we parse a new Act (and I am referring to all children of Acts here), we check to see if any context is overridden. (Note that the methodology described here will work regardless of the style of Context Conduction that's being used in the model.) If it is, we make a copy of the Context SMIRF on the top of the stack and modify it according to the new relationships (ActRelationships and Participations). We then put this new, modified Context SMIRF on the top of the context stack. Any Act (or child of Act) that is stored in the database distal to this branch is linked in the database to this new Context SMIRF on the top of the context stack. This is applied every step of the way down to the leaves of the CDA tree. The leaves are often the individual clinical statement entries.
3. In the case where the Query first "selects" observations of a particular type, for example glucose measurements, we will be selecting "leaves" of the CDA tree. These leaves will not be connected directly to their context.
In other words, the Subject of Record may be attached to the root Act of clone type "ClinicalDocument", or the subject of record may have been changed somewhere between the root and the leaf, for example in a section Act as illustrated by the middle oval of Fig 2. In these cases the tree has to be navigated towards the root while checking for context. This is both time consuming and difficult. If instead Context SMIRFs are linked to the leaves (observations, patient encounters, substance administrations etc.), then this entire process can be avoided. The context is found in the Context SMIRF that was linked at the time of parsing and persisting, rather than leaving a difficult problem to the time of querying.
The final result of this technique is that the entire CDA can be stored in the relational database just as before, but with one very important improvement. Now when we want to query for all of the glucose measurements for a given Patient ID (e.g. MRN), we know that the correct Patient Id can be found by following the link to the Context SMIRF that was in effect when the entry was parsed.
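The context stack described in the processing steps can be sketched as follows. This is a minimal Python illustration, not a reference implementation: `ActNode`, its `overrides` dictionary and the function `persist_with_context` are hypothetical names, context is reduced to a flat mapping of participation type to role id, and the "database" is a pair of dictionaries. The sketch also assumes that a Context SMIRF is popped from the stack once the subtree that overrode the context has been fully parsed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ActNode:
    """Hypothetical parsed Act: its own context overrides plus its child Acts."""
    act_id: str
    overrides: Dict[str, str] = field(default_factory=dict)
    children: List["ActNode"] = field(default_factory=list)


def persist_with_context(root: ActNode) -> Tuple[Dict[str, int], Dict[int, Dict[str, str]]]:
    """Return (act_id -> context SMIRF id, context SMIRF id -> context)."""
    smirfs: Dict[int, Dict[str, str]] = {}   # stands in for stored Context SMIRFs
    act_links: Dict[str, int] = {}           # stands in for the link stored on each Act
    stack: List[int] = []                    # the context stack (last in, first out)

    def visit(node: ActNode) -> None:
        pushed = False
        if not stack or node.overrides:
            # Copy the context on top of the stack, then apply the overrides.
            context = dict(smirfs[stack[-1]]) if stack else {}
            context.update(node.overrides)
            smirf_id = len(smirfs) + 1
            smirfs[smirf_id] = context
            stack.append(smirf_id)
            pushed = True
        act_links[node.act_id] = stack[-1]   # link the Act to the current context
        for child in node.children:
            visit(child)
        if pushed:
            stack.pop()                      # leaving the subtree restores the old context

    visit(root)
    return act_links, smirfs


doc = ActNode("doc-1", {"subject": "patient-123", "author": "dr-a"}, [
    ActNode("section-labs", children=[ActNode("obs-glucose"), ActNode("obs-hba1c")]),
    ActNode("section-fh", {"subject": "mother-of-patient-123"},
            [ActNode("obs-fh-diabetes")]),
])
links, contexts = persist_with_context(doc)
print(contexts[links["obs-glucose"]]["subject"])
print(contexts[links["obs-fh-diabetes"]]["subject"])
```

With this layout the subject of any leaf is one lookup away, and all Acts that share a context point at the same Context SMIRF.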
Option 2: Decompose conducted context
There is a known alternative way of solving the issue, which is to decompose the conducted context prior to persisting the document contents. Let's again consider the example where the Author and Subject participations, as well as the Act Relationship with the encompassing encounter, are present in the CDA header. They may be overridden at any point in the distal object structure.
Decomposing the conducted context requires that we perform the following processing steps upon receipt of a CDA document:
1. Starting at the root Act (ClinicalDocument) we determine which associations are conducted to the children of that Act. Let's call those the "contextual associations".
2. Every time we parse a new Act (and I am referring to all children of Acts here), we check to see if any contextual association is overridden. If the association is not overridden, the contextual association will be copied (duplicated) on the new Act.
3. In the case where the Query first "selects" observations of a particular type, for example glucose measurements, we will be selecting "leaves" of the CDA tree. These leaves will be connected directly to their context.
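The decomposition steps can be sketched in the same spirit. This is an illustrative Python sketch under assumptions: the nested-dictionary representation of the parsed document and the function name `decompose_context` are hypothetical, and contextual associations are reduced to a flat mapping of participation type to role id. Every Act ends up with its own copy of the effective context.

```python
from typing import Dict, Optional


def decompose_context(act: dict,
                      inherited: Optional[Dict[str, str]] = None,
                      out: Optional[Dict[str, Dict[str, str]]] = None
                      ) -> Dict[str, Dict[str, str]]:
    """Return act id -> that Act's own copy of its effective context."""
    if out is None:
        out = {}
    # Start from a fresh copy of the conducted context, then apply local overrides.
    effective = dict(inherited or {})
    effective.update(act.get("participations", {}))
    out[act["id"]] = effective
    for child in act.get("children", []):
        decompose_context(child, effective, out)
    return out


cda = {
    "id": "doc-1",
    "participations": {"subject": "patient-123", "author": "dr-a"},
    "children": [
        {"id": "section-labs", "children": [{"id": "obs-glucose"}, {"id": "obs-hba1c"}]},
        {"id": "section-fh",
         "participations": {"subject": "mother-of-patient-123"},
         "children": [{"id": "obs-fh-diabetes"}]},
    ],
}
by_value = decompose_context(cda)
print(by_value["obs-glucose"])
print(by_value["obs-fh-diabetes"])
```

Here the two lab observations carry equal but separate context records, which is the by-value duplication that distinguishes this option from linking to a shared Context SMIRF.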
Option 1 vs Option 2: By reference vs by value
Option 1 and option 2 are quite similar. In both cases you conduct the context as you are parsing and prior to persisting the data. The difference becomes important with large structures where there are many clinical statements (leaves) that all share similar context. In option 1, there is only one copy of the Context SMIRF in the database, which is pointed to (referred to) by many individual leaves (by reference). In option 2, each individual clinical statement (leaf) is directly attached to its own copy of the context (by value). The result is a larger and more denormalized database, because there are multiple copies of the same context: for each leaf (clinical statement) that has the same context as at least one other leaf, there is an "extra copy" of that context. In option 1 this redundancy is eliminated; for any given context there is only one copy in the database, pointed to by multiple leaves. Option 2 is not any more "safe" than option 1; option 1 is simply cleaner to implement and has less redundancy.
Dr Bob Dolin has expressed the concern that some implementations are not "safe" in that they do not preserve correct context in some of the cases where it is overridden. Although neither of the above techniques can guarantee complete safety, both can decrease the errors created by misinterpreting the Subject of Record and other key contextual pieces of information associated with any clinical statement.
The SMIRF option is preferable to the decomposition option because of its simplicity and the decreased redundancy in the database. Option 1 and option 2 are equivalent as far as context interpretation is concerned. The actual context elements that are chosen to be included in a Context SMIRF are a design decision. Since the most common type of clinical query is something along the lines of "SELECT <<Some Observation or Procedure>> WHERE MRN = 1234567", we can address the most common safety issues simply. By ensuring that the Context SMIRF has the correct "Subject of Record", and perhaps making sure it records negation indicators and significant context such as limiting the Acts to EVN mood, we can avoid the most likely and most dangerous errors in interpreting the clinical statements. A "personal history of", a "family history of", a "negated" or an alternate "subject of record" are probably the most significant types of context errors one can make while performing this common type of clinical query. (Note: "personal history of" and "family history of" may also be addressed at the vocabulary level, but this is another subject.)
By making sure the context SMIRFs at a minimum take these context items into account we can eliminate or very significantly reduce these dangerous types of context errors that might otherwise lead to clinically relevant mistakes in interpreting clinical statements.
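To make the common query concrete, here is a small sketch using Python's built-in sqlite3 module. The two-table schema is a deliberately simplified assumption made for illustration, not the Act, Participation, Role and Entity layout of a real RIM database: the table names `context_smirf` and `act`, their columns, and the function `find_observations` are all hypothetical. It shows a single join to the linked Context SMIRF filtering on subject of record, negation and mood.

```python
import sqlite3

SCHEMA = """
CREATE TABLE context_smirf (
    id INTEGER PRIMARY KEY,
    subject_of_record TEXT NOT NULL,   -- e.g. the patient's MRN, or a relative
    negation_ind INTEGER NOT NULL,     -- 1 if the statement is negated
    mood_code TEXT NOT NULL            -- e.g. EVN for events that occurred
);
CREATE TABLE act (
    id INTEGER PRIMARY KEY,
    code TEXT NOT NULL,
    value REAL,
    context_smirf_id INTEGER NOT NULL REFERENCES context_smirf(id)
);
"""

QUERY = """
SELECT a.id, a.value
FROM act AS a
JOIN context_smirf AS c ON c.id = a.context_smirf_id
WHERE a.code = ?
  AND c.subject_of_record = ?
  AND c.negation_ind = 0
  AND c.mood_code = 'EVN'
ORDER BY a.id
"""


def find_observations(conn: sqlite3.Connection, code: str, mrn: str) -> list:
    """Select observations of one kind for one patient with a single join."""
    return conn.execute(QUERY, (code, mrn)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO context_smirf VALUES (?, ?, ?, ?)", [
    (1, "1234567", 0, "EVN"),             # the patient's own results
    (2, "mother-of-1234567", 0, "EVN"),   # family history section
    (3, "1234567", 1, "EVN"),             # negated statement
])
conn.executemany("INSERT INTO act VALUES (?, ?, ?, ?)", [
    (1, "glucose", 5.4, 1),
    (2, "glucose", 6.1, 1),
    (3, "glucose", 9.9, 2),
    (4, "glucose", None, 3),
    (5, "hba1c", 6.0, 1),
])
print(find_observations(conn, "glucose", "1234567"))
```

The family history value and the negated statement are excluded without walking the document tree, because the deciding context sits in the linked Context SMIRF row.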
The SMIRF technique needs to be implemented and tested but would likely reduce the problem of unsafe storage and querying of large nested RIM structures such as CDA documents.