
Implementation aspects of RIM based database models

From HL7Wiki
Revision as of 19:01, 13 August 2007 by Rene spronk

Although the RIM wasn't designed to be used this way, some implementers opt to reuse RIM-based models as the basis for their relational databases. These implementations tend to take one of two approaches. Either they use all of the RIM classes as the basis for the database, or they use D-MIM structures (from a set of domains) as its basic model. This page discusses some of the implementation aspects of making that choice.

Also see this IBM paper on the use of RIM databases.

RIM vs D-MIM approach

Should one use a database based on RIM classes, or one based on D-MIMs?

RIM

From a software architecture perspective, the key difference between a pure RIM-based approach and a D-MIM-based approach is reuse. A D-MIM is made up of nothing more than RIM components. One can therefore safely assume that a complete RIM implementation will cover any possible D-MIM. The reverse is not true. So, in the bigger picture, a RIM approach is more likely to address what the next project needs, whereas a D-MIM approach probably will not.

The RIM-based model has virtually unlimited flexibility. One can say: "If you define a valid R-MIM, I can store and query it in my RIM-based database." In other words, if you crack the RIM problem, you're done.
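Here is a minimal sketch of the "store and query any R-MIM" idea, written in Python with SQLite only because that combination is self-contained. It assumes a heavily simplified subset of the RIM backbone (Entity, Role, Act, Participation, ActRelationship), and the table and column names are hypothetical. A real design has many more attributes, and most of them are complex datatypes rather than plain text. The class, mood, and type codes are HL7 V3 vocabulary, used here for illustration. The sketch shows acts from different domains sharing the same tables and being queried together.

```python
import sqlite3

# Hypothetical, heavily simplified subset of the RIM backbone classes.
# Real RIM classes carry many more attributes, most of them complex
# datatypes (CD, IVL<TS>, PQ, ...) rather than plain text columns.
SCHEMA = """
CREATE TABLE entity (
    id              INTEGER PRIMARY KEY,
    class_code      TEXT NOT NULL,   -- e.g. 'PSN' (person)
    determiner_code TEXT NOT NULL,   -- e.g. 'INSTANCE'
    name            TEXT
);
CREATE TABLE role (
    id         INTEGER PRIMARY KEY,
    class_code TEXT NOT NULL,        -- e.g. 'PAT' (patient)
    player_id  INTEGER REFERENCES entity(id),
    scoper_id  INTEGER REFERENCES entity(id)
);
CREATE TABLE act (
    id             INTEGER PRIMARY KEY,
    class_code     TEXT NOT NULL,    -- e.g. 'OBS', 'SBADM', 'ENC'
    mood_code      TEXT NOT NULL,    -- e.g. 'EVN' (event), 'RQO' (request)
    code           TEXT,
    effective_time TEXT
);
CREATE TABLE participation (
    act_id    INTEGER NOT NULL REFERENCES act(id),
    role_id   INTEGER NOT NULL REFERENCES role(id),
    type_code TEXT NOT NULL          -- e.g. 'RCT' (record target)
);
CREATE TABLE act_relationship (
    source_id INTEGER NOT NULL REFERENCES act(id),
    target_id INTEGER NOT NULL REFERENCES act(id),
    type_code TEXT NOT NULL          -- e.g. 'COMP' (has component)
);
"""


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO entity VALUES (1, 'PSN', 'INSTANCE', 'Jane Doe')")
    conn.execute("INSERT INTO role VALUES (10, 'PAT', 1, NULL)")
    # Acts that would belong to different D-MIMs share one table.
    conn.executemany(
        "INSERT INTO act VALUES (?, ?, ?, ?, ?)",
        [
            (100, "ENC", "EVN", "AMB", "2007-08-01"),
            (101, "OBS", "EVN", "GLUCOSE", "2007-08-01"),
            (102, "SBADM", "RQO", "INSULIN", "2007-08-02"),
        ],
    )
    conn.executemany(
        "INSERT INTO participation VALUES (?, 10, 'RCT')",
        [(100,), (101,), (102,)],
    )
    # The observation is a component of the encounter.
    conn.execute("INSERT INTO act_relationship VALUES (100, 101, 'COMP')")
    return conn


def acts_for_patient(conn: sqlite3.Connection, role_id: int) -> list:
    """One query answers 'everything recorded about this patient',
    whatever domain model the acts originally came from."""
    return conn.execute(
        """SELECT a.id, a.class_code, a.mood_code, a.code
           FROM act a JOIN participation p ON p.act_id = a.id
           WHERE p.role_id = ? AND p.type_code = 'RCT'
           ORDER BY a.id""",
        (role_id,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for row in acts_for_patient(conn, 10):
        print(row)
```

The same query works for acts from any domain. Whether it stays fast at production volumes is the tuning question discussed below.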

(AMS??): I have used both the RIM and D-MIMs as a starting point for the design of relational databases. The RIM offers greater flexibility to persist data from a variety of domains, but at the cost of a very abstract structure. D-MIMs provide less flexibility, but because they are more domain-specific they are a better fit for their limited domain. The main point is that neither the RIM nor a D-MIM should be used as-is for a relational design. Performance, coding complexity, and functional requirements all have to be taken into account.

(??:) Based on my experiments with an OLTP performance objective of "sub-second" for queries, a normalized RIM-based data model does quite well. Now, I must point out that coming up with that data model requires some serious experience and a deep understanding of the RIM. It's not just a transcription of the RIM into an ER diagram. But then again, a D-MIM approach doesn't mitigate this issue. In either approach, you'll need to face cardinality, datatype, and many other challenges. More recently, I decided to up the ante and put my performance requirements at 20-50ms for queries (above 100ms is now considered a bug). This level is needed for more advanced application requirements (basically more data in less time). At this level, a pure relational RIM begins to break down. But I did not jump to the D-MIM approach because that would be no better, just more work.

(Lloyd:) I think the fundamental question is what you want to do with it. If you're looking at running a production, 1000+ transactions/minute system, using a relational database structured around RIM tables seems like suicide. However, if you're creating a datamart on the side for ad-hoc queries, mining and exploration of all sorts of data from all sorts of different sources, then a RIM structure might give you power and flexibility to compare and analyze across domains that a D-MIM structure wouldn't.

(Yeb:) We've been using a RIM-variant-based model for three years now for 3 clinical applications with over 75M records.

    • Advantage: fewer tables, so less programming work. The model is also easier to learn, and new functionality is easier to add.
    • Disadvantage: more data in each table. The moment at which the database administrator must tune queries and tables, and perhaps introduce a database cluster, will therefore come sooner (though in a successful application this moment will come eventually anyway).

D-MIM

The D-MIM approach is "focused". That means that if you crack a particular CMET, you still have dozens more to go. The D-MIM approach encourages the status quo of needing to figure out how to get different applications, databases and schemas to talk to one another. The "divide and conquer" approach that worked so well at the functional design level creates an implementation nightmare (and there you have the state of healthcare IT today).

If you consider abstraction a bad thing, a D-MIM approach is closer to what you seek. If you consider abstraction a good thing, a RIM-based approach is probably the choice for you. In any case, you will need to tune the model. If you would rather tune a few tables, go for a RIM-based design. If you would rather tune a lot of tables, a D-MIM-based approach is going to appeal.

But what about a single, focused project (if there were such a thing)? One might assume that a D-MIM uses a "subset" of the RIM. I don't agree. Most D-MIMs touch pretty much all of the RIM from a structural perspective. So I don't think there is any such thing as "limiting" an implementation to a D-MIM. Even a single D-MIM will reuse the same RIM object over and over. Starting with a D-MIM design will ensure a lifetime of "cut-and-paste" programming, making it hard to achieve a maintainable, durable design.

Think of a D-MIM or R-MIM as a relational "view" on the underlying RIM structures. This approach is more in line with the V3 philosophy: Complex elements are worked out once and reused as needed. For example, how many times would you want to have to figure out the best way to store and access a physical quantity (PQ) or entity name (EN) or to relate two acts via an act relationship?
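Here is a minimal sketch of the "view" idea, again in Python with SQLite. PQ storage is designed once, as a numeric value plus a unit string, in a hypothetical `act_value_pq` table. A hypothetical `lab_result` view then exposes only the shape a lab-result R-MIM might need. All table names, view names, and codes are illustrative assumptions, not taken from any published D-MIM.

```python
import sqlite3

# Hypothetical generic storage: one act table plus a reusable PQ
# (physical quantity) table that any act class can point to.
SCHEMA = """
CREATE TABLE act (
    id         INTEGER PRIMARY KEY,
    class_code TEXT NOT NULL,
    mood_code  TEXT NOT NULL,
    code       TEXT
);
CREATE TABLE participation (
    act_id    INTEGER NOT NULL REFERENCES act(id),
    role_id   INTEGER NOT NULL,
    type_code TEXT NOT NULL
);
-- PQ is worked out once: a numeric value plus a unit string.
CREATE TABLE act_value_pq (
    act_id INTEGER PRIMARY KEY REFERENCES act(id),
    value  REAL NOT NULL,
    unit   TEXT NOT NULL
);
-- A model-specific "view" over the generic structures: only the shape a
-- hypothetical lab-result R-MIM cares about is exposed.
CREATE VIEW lab_result AS
SELECT a.id      AS result_id,
       p.role_id AS patient_role_id,
       a.code    AS test_code,
       v.value   AS value,
       v.unit    AS unit
FROM act a
JOIN participation p ON p.act_id = a.id AND p.type_code = 'RCT'
JOIN act_value_pq v  ON v.act_id = a.id
WHERE a.class_code = 'OBS' AND a.mood_code = 'EVN';
"""


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO act VALUES (?, ?, ?, ?)",
        [
            (1, "OBS", "EVN", "GLUCOSE"),
            (2, "OBS", "EVN", "GLUCOSE"),
            (3, "SBADM", "EVN", "INSULIN"),  # also uses PQ, but is not a lab result
        ],
    )
    conn.executemany(
        "INSERT INTO participation VALUES (?, ?, 'RCT')",
        [(1, 10), (2, 20), (3, 10)],
    )
    conn.executemany(
        "INSERT INTO act_value_pq VALUES (?, ?, ?)",
        [(1, 5.4, "mmol/L"), (2, 6.1, "mmol/L"), (3, 0.5, "mL")],
    )
    return conn


def lab_results(conn: sqlite3.Connection, patient_role_id: int) -> list:
    """Query through the R-MIM-shaped view, not the generic tables."""
    return conn.execute(
        """SELECT test_code, value, unit FROM lab_result
           WHERE patient_role_id = ? ORDER BY result_id""",
        (patient_role_id,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(lab_results(conn, 10))
```

The PQ table and the act/participation joins are written once. Another model-specific view can reuse them rather than re-deciding how to store a quantity.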

Despite the particular design issues faced when using the RIM and D-MIMs as the basis for a relational database design, I feel that the benefits derived from the thoroughness of the requirements analysis that goes into the RIM and the D-MIMs are invaluable.

Data type aspects

At a lower level, the devil is in the datatypes and cardinality. It is important to understand the nature of modern database technology and application servers, and to tell the difference between marketing hype and what actually works. In other words, theory needs to be backed up by measurements.

Let's take the performance angle. Years ago, it was popular to arrange the physical layout of a database to match specific application needs. Of course that limited the usefulness of that data leading to multiple copies of the same data in different databases. The database vendors have worked hard to allow a "third-normal-form" schema design to work acceptably fast. And to a large degree, they've done that.

The problem is that the RIM (i.e., healthcare) really pushes that point, and nowhere more so than in the datatypes. Amazingly, at the datatype level there's virtually no difference between a RIM and a D-MIM approach. For example, say 9 out of 10 R-MIMs avoid the particularly complex CD datatype by using CE instead. You're still faced with implementing the CD datatype to cover the one place it is needed. Such a system therefore needs to support both the CE and CD datatypes, which is not a simplification at all! Personally, I'd rather spend twice the time making CD fast and efficient than half the time on each of two separate datatypes.
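The sketch below shows one way to get CE "for free" once CD works. Everything is stored as CD, and a CE is treated as a CD that carries no qualifiers, roughly following the HL7 V3 Data Types R1 definitions. Several details are assumptions for illustration:

- the Python classes and field names;
- the generic row layout produced by `flatten`;
- the decision to also forbid qualifiers on translations.

The codes and code system names are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CR:
    """Concept role: a qualifier of the form name = value (both coded)."""
    name: "CD"
    value: "CD"


@dataclass
class CD:
    """Simplified concept descriptor (hypothetical field subset)."""
    code: Optional[str]
    code_system: Optional[str]
    display_name: Optional[str] = None
    original_text: Optional[str] = None
    translations: List["CD"] = field(default_factory=list)
    qualifiers: List[CR] = field(default_factory=list)


def is_valid_ce(value: CD) -> bool:
    """A CE is stored as a CD without qualifiers. This sketch also
    requires every translation to be qualifier-free (an assumption)."""
    if value.qualifiers:
        return False
    return all(is_valid_ce(t) for t in value.translations)


def flatten(value: CD, parent: Optional[int] = None,
            relation: str = "root", rows: Optional[list] = None) -> list:
    """Flatten a CD tree into rows for one hypothetical generic table:
    (row_id, parent_row_id, relation, code, code_system).
    A qualifier value hangs off its qualifier name so pairs stay intact."""
    if rows is None:
        rows = []
    row_id = len(rows)
    rows.append((row_id, parent, relation, value.code, value.code_system))
    for t in value.translations:
        flatten(t, row_id, "translation", rows)
    for q in value.qualifiers:
        name_id = len(rows)  # the row the name is about to occupy
        flatten(q.name, row_id, "qualifier-name", rows)
        flatten(q.value, name_id, "qualifier-value", rows)
    return rows


if __name__ == "__main__":
    plain = CD("GLU", "example-labs", translations=[CD("G1", "local-labs")])
    post = CD("FRACTURE", "example-terms",
              qualifiers=[CR(CD("SITE", "example-terms"),
                             CD("FEMUR", "example-terms"))])
    print(is_valid_ce(plain), is_valid_ce(post))
    for row in flatten(post):
        print(row)
```

Because both types share one storage path, a CE column is just a CD column plus a validation rule. There is no second datatype implementation to tune.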

Cardinality is a smaller issue than datatypes, but interesting anyway. As you know, R-MIMs have 1:1 relationships that could just as easily be implemented in a single record, which I would call an "optimization". Tempting, but is it worth it? To find out, one must be able to measure both ways. Guesswork is really not that helpful in this arena.
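Measuring both ways can be as simple as the following sketch. The same observations are stored twice: once as an act table with a 1:1 observation table, and once folded into a single hypothetical `act_obs` table. Both queries must return the same answer before their timings are compared. Python with an in-memory SQLite database is used only to keep the example self-contained. The absolute numbers say nothing about a production DBMS; only the method carries over.

```python
import random
import sqlite3
import time


def build(n: int, seed: int = 42) -> sqlite3.Connection:
    """Build both hypothetical layouts in one in-memory database:
    split:  act(id, class_code, mood_code) 1:1 observation(act_id, value)
    merged: act_obs(id, class_code, mood_code, value)"""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE act (id INTEGER PRIMARY KEY, class_code TEXT, mood_code TEXT);
        CREATE TABLE observation (act_id INTEGER PRIMARY KEY REFERENCES act(id),
                                  value REAL);
        CREATE TABLE act_obs (id INTEGER PRIMARY KEY, class_code TEXT,
                              mood_code TEXT, value REAL);
    """)
    rows = [(i, "OBS", "EVN", rng.uniform(0.0, 20.0)) for i in range(n)]
    conn.executemany("INSERT INTO act VALUES (?, ?, ?)", [r[:3] for r in rows])
    conn.executemany("INSERT INTO observation VALUES (?, ?)",
                     [(r[0], r[3]) for r in rows])
    conn.executemany("INSERT INTO act_obs VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


SPLIT_QUERY = """SELECT COUNT(*) FROM act a JOIN observation o ON o.act_id = a.id
                 WHERE a.class_code = 'OBS' AND o.value > ?"""
MERGED_QUERY = "SELECT COUNT(*) FROM act_obs WHERE class_code = 'OBS' AND value > ?"


def timed(conn: sqlite3.Connection, sql: str, params: tuple, repeats: int = 20):
    """Return (query result, best wall-clock seconds over `repeats` runs)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = conn.execute(sql, params).fetchone()[0]
        best = min(best, time.perf_counter() - start)
    return result, best


if __name__ == "__main__":
    conn = build(100_000)
    for label, sql in (("split", SPLIT_QUERY), ("merged", MERGED_QUERY)):
        count, secs = timed(conn, sql, (10.0,))
        print(f"{label:6s} count={count} best={secs * 1000:.2f} ms")
```

Taking the best of several runs reduces noise from caching and scheduling. A real comparison would also vary data volume, indexes, and the query mix.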

Enforcement of Business Rules

The biggest issue I have had has been the enforcement of business rules. A relational database structure can provide fairly good enforcement of data integrity rules. The abstractness of the RIM, however, makes such enforcement very difficult. One therefore has to rely on stored procedures and application logic to enforce the rules. Using D-MIMs reduces this problem but does not eliminate it. The biggest issues faced when using D-MIMs are how to implement choice boxes and CMETs. I have found that CMETs often include baggage not relevant to the particular D-MIM. When more than one CMET is used in a D-MIM, there is often overlap in content. This makes it necessary to be very careful when analyzing the D-MIM in preparation for the database design. Choice boxes in D-MIMs are not much different from the inheritance structures in the RIM. In both cases, design decisions have to be made regarding the enforcement of referential integrity.
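The sketch below shows one way to keep some of this enforcement in the database, using assumed table names in SQLite via Python. Each specialization table (mirroring the RIM's Act inheritance, or the options of a choice box) pins its class_code with a CHECK constraint. It then references the act table through the composite key (id, class_code), so an observation row cannot attach to, say, a substance administration. A rule that spans tables falls back to a trigger, which is the kind of procedural logic described above. The rule used here, that an event-mood observation must carry a value, is hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE act (
    id         INTEGER PRIMARY KEY,
    class_code TEXT NOT NULL,
    mood_code  TEXT NOT NULL,
    UNIQUE (id, class_code)          -- target for the subtype foreign keys
);
-- Each specialization pins its class_code with a CHECK and references act
-- through (id, class_code), so it can only attach to an act of that class.
CREATE TABLE observation (
    act_id     INTEGER PRIMARY KEY,
    class_code TEXT NOT NULL DEFAULT 'OBS' CHECK (class_code = 'OBS'),
    value      REAL,
    FOREIGN KEY (act_id, class_code) REFERENCES act (id, class_code)
);
CREATE TABLE substance_administration (
    act_id        INTEGER PRIMARY KEY,
    class_code    TEXT NOT NULL DEFAULT 'SBADM' CHECK (class_code = 'SBADM'),
    dose_quantity REAL,
    FOREIGN KEY (act_id, class_code) REFERENCES act (id, class_code)
);
-- A cross-table rule cannot be a plain constraint, so it becomes a trigger.
-- Hypothetical rule: an observation in EVN mood must carry a value.
CREATE TRIGGER obs_event_requires_value
BEFORE INSERT ON observation
WHEN NEW.value IS NULL
     AND (SELECT mood_code FROM act WHERE id = NEW.act_id) = 'EVN'
BEGIN
    SELECT RAISE(ABORT, 'an observation in EVN mood requires a value');
END;
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO act VALUES (?, ?, ?)",
        [(1, "OBS", "EVN"), (2, "SBADM", "EVN"), (3, "OBS", "EVN"), (4, "OBS", "RQO")],
    )
    conn.commit()
    return conn


def try_insert(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Return True if the row was accepted, False if an integrity rule
    (CHECK, foreign key, or trigger) rejected it."""
    try:
        conn.execute(sql, params)
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()
        return False


if __name__ == "__main__":
    conn = build_db()
    insert_obs = "INSERT INTO observation (act_id, value) VALUES (?, ?)"
    print(try_insert(conn, insert_obs, (1, 5.4)))   # OBS act with a value
    print(try_insert(conn, insert_obs, (2, 5.4)))   # act 2 is SBADM: FK rejects
    print(try_insert(conn, insert_obs, (3, None)))  # EVN without value: trigger rejects
    print(try_insert(conn, insert_obs, (4, None)))  # RQO without value: allowed
```

The trigger only covers inserts. A matching UPDATE trigger, or equivalent application logic, would be needed to keep the rule from being bypassed.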