ORM best practices
This page documents a number of best practices when it comes to persisting RIM-based objects, and *O-CP/RP cell transitions.
ER database
This section discusses some of the best practices for those applications that map RIM objects to an E-R database structure.
Note to readers: the reader is assumed to be aware of the characteristics of the object data model, the entity-relationship (ER) data model and the entity-attribute-value (EAV) data model.
- See ORM for references.
Specific References
Introduction
The purpose of this white paper is to describe a process for transforming the Health Level Seven (HL7) Reference Information Model (RIM) into a Logical Data Model (LDM) and a Physical Data Model (PDM). The HL7 RIM is a conceptual data model built and maintained by HL7 to be used as a source of the data content portion of HL7 standards. Because it is a conceptual data model, the RIM is not well suited to be used as the design of a relational database schema. The RIM contains abstract classes, complex datatypes, and generic business rules that need to be resolved and reconciled to form a LDM. LDM can be mapped to its PDM via Hibernate o-r mapping tool for the target databases MySQL and Oracle.
A series of refinement steps is required to transform the RIM into a LDM that would be useful as input to design of a database schema. The steps are designed to render a less abstract model with normalized data structures and realm specific business rules. The resulting LDM is well suited for use in database design activities. The LDM is technology independent and to some extent application independent. As application design begins the LDM will be used as input and further transformed into a PDM. The PDM will consider the technological implications of the database schema including items such as the capabilities of the target database management system (DBMS), tuned for a typical OLTP application with 300 concurrent end users logged in.
Scope
The classes and datatypes in the HL7 RIM are grouped into logical packages. There are six core packages in the RIM, other packages are used for administrative and model management purposes only. The Entities, Roles, and Acts packages contain the respective entity, role, and act generalization hierarchies and dependent classes. The Datatypes package contains all of the defined version 3.0 datatypes. The Structured Documents and Message Control packages contain classes that represent artifacts peculiar to HL7 messaging and clinical document architecture standards. For the purpose of our example model only a subset of classes of Acts, Datatype, Entities, and Roles packages will be considered within scope of the paper.
To keep this paper simple, let's select only the relevant attributes:
Resolving Generic Datatypes
These Data Types are included into the JavaSIG package: org.hl7.types.impl The relative java wrappers (WrappingBagAccessor, WrappingSetAccessor, WrappingListAccessor) are included into the package: org.hl7.hibernate
“IVL”, “EIVL” or “PIVL” Datatype
- Entity.existence_time : IVL <TS>
- Entity.existence_start_time : TS
- Entity/existence_end_time : TS
- Manufactured_material.stability_time : IVL <TS>
- Manufactured_material.stability_time_start : TS
- Manufactured_material.stability_time_stop : TS
- Act.repeat_number : IVL<TS>
- Act.repeat_number_start : TS
- Act.repeat_number_stop : TS
- Supply.expected_use_time : IVL<TS>
- Supply.expected_use_time_start : TS
- Supply.expected_use_time_stop : TS
- SubstanceAdmnistration.rate_qty: IVL<PQ>
- SubstanceAdmnistration.rate_qty_start : PQ
- SubstanceAdmnistration.rate_qty_stop : PQ
- SubstanceAdmnistration.dose_qty: IVL<PQ>
- SubstanceAdmnistration.dose_qty_start : PQ
- SubstanceAdmnistration.dose_qty_stop : PQ
- Account.allowed_balance_qty: IVL<MO>
- Account.allowed_balance_qty_start: MO
- Account.allowed_balance_qty_stop: MO
- RoleLink.effective_time: IVL<TS>
- RoleLink.effective_time_start: TS
- RoleLink.effective_time_stop: TS
- Role.effective_time: IVL<TS>
- Role.effective_time_start: TS
- Role.effective_time_stop: TS
- Participation.time:IVL<TS>
- Participation.time_start: TS
- Participation.time_stop: TS
“SET”, “DSET”, “QSET” or “List” Datatypes
- Entity.id : SET<II> ==> Enity_Identifier.
- Entity.status_cd : SET<CS> ==> Entity.status_cd : CS
- Person.ethnic_group_cd : SET<CE> ==> Person.ethnic_group_cd : CE
- Person.race_cd : SET<CE> ==> Person_Race.
- Entity.qty : SET<PQ> ==> Entity.qty PQ
- Person.religious_affiliation : SET<CE> ==> Person.religious_affiliation : CE
- Act.id: SET<II> ==> Act_Identifier
- Act.priority: SET<CE> ==> Act_Priority
- Act.confidentiality_cd: SET<CE> ==> Act_Confidentiality
- Act.reason_cd: SET<CE> ==> Act_Reason
- PatientEncounter.special_courtesies_cd: SET<CE> ==> PatientEncounter_Courtesies
- PatientEncounter.special_arrangement_cd: SET<CE> ==> PatientEncounter_Arrangement
- Procedure.method_cd: SET<CE> ==> Procedure_method
- Procedure.approach_site_cd: SET<CE> ==> Procedure_approach_site
- Procedure.target_site_cd: SET<CE> ==> Procedure_target_site
- Observation.interpretation_cd: SET<CE> ==> Observation_interpretation
- Observation.method_cd: SET<CE> ==> Observation_method
- Observation.target_site_cd: SET<CD> ==> Observation_target
- SubstanceAdministration.approach_site_cd: SET<CD> ==> SubstanceAdministration_approach_site
- SubstanceAdministration.dose_check_qty: SET<RTO> ==> SubstanceAdministration_dose_check
- SubstanceAdministration.max_dose_qty: SET<RTO> ==> SubstanceAdministration_max_dose
- SubstanceAdministration.method_cd: SET<CD> ==> SubstanceAdministration_method
- InvoiceElement.modifier_cd: SET<CE> ==> InvoiceElement_modifier
- DeviceTask.parameter_value: LIST<ANY> ==> DeviceTask_parameters
- Role.id: SET<II> ==> Role_identifier
- Role.confidentiality_cd: SET<CE> ==> Role_confidentiality
- Role.position_number: LIST<INT> ==> Role_position
- ManagedParticipation.id: SET<II> ==> ManagedParticipation_identifier
- Document.bibliographic_designation_txt: SET<ED> ==> Document_Designation
Note: SET<CS> type for Entity.status_cd has been simplified to a single CS value (except Person.race_cd).
“BAG” Datatypes
- Entity.nm : BAG<EN> ==> Enity_Name.
- Entity.telecom : BAG<TEL> ==> Entity_Telecom_Address.
- Organization.addr : BAG<AD> ==> Organization_Address.
- Person.addr : BAG<AD> ==> Person_Address.
- Role.name BAG<EN> ==> Role_name
- Role.addr: BAG<AD> ==> Role_address
Add Association Attributes
- Entity_identifier./identified_entity : ENTITY
- Entity_name./named_entity : ENTITY
- Entity_telcom_address./addressed_entity : ENTITY
- Organization_address./addressed_organization : ORGANIZATION
- Person_address./addressed_person : PERSON
- Person_race./attributed_person : PERSON
- Act_reason.act: Act
- Act_Confidentiality: Act
- Act_Priority: Act
- Act_identifier: Act
- PatientEncounter_Courtesies.pencounter: PatientEncounter
- PatientEncounter_Arragement.pencounter: PatientEncounter
- Procedure_method.procedure: Procedure
- Procedure_approach_site.procedure: Procedure
- Procedure_target_site.procedure: Procedure
- Observation_interpretation.observation: Observation
- Observation_target.observation: Observation
- Observation_method.observation: Observation
- SubstanceAdministration_approach_site.sub_admin: SubstanceAdministration
- SubstanceAdministration_dose_check.sub_admin: SubstanceAdministration
- SubstanceAdministration_method.sub_admin: SubstanceAdministration
- SubstanceAdministration_max_dose.sub_admin: SubstanceAdministration
- Role_identifier. identified_role : Role
- Role_confidentiality.confidentiality_role : Role
- Role_position. positioned_role : Role
- Role_name.named_role: Role
- Role_address.addressed_role: Role
- Role_telcom.addressed_role: Role
- ManagedParticipation.identified_mparticipation: ManagedParticipation
- Document_Designation.document: Document
Delete Generalization Relationships
- Entity § Living_subject
- Entity § Organization
- Entity § Place
- Entity § Material
- Entity § Group
- Living_subject § Person
- Living_subject § Non_Person_Living_subject
- Material § Manufactured_material
- Manufactured_material § Container
- Manufactured_material § Device
- Device § Imaging_modality
- Act § PatientEncounter
- Act § ControlAct
- Act § Supply
- Act § WorkingList
- Act § Procedure
- Act § Observation
- Act § DeviceTask
- Act § SubstanceAdministration
- Act § FinancialContract
- Act § Account
- Act § FinancialTransaction
- Act § InvoiceElement
- Diet § Supply
- Observation § PublicHealthCase
- Observation § DiagnosticImage
- Role § Employee
- Role § Access
- Role § QualifiedEntity
- Role § LicensedEntity
- Role § Patient
- Participation § ManagedParticipation
- ContextStructure § Document
Propagate Inherited Attributes
Propagate Inherited Associations
Simplify datatypes
- Person.nm: EN ==> PN
- Organization.nm: EN ==> ON
- All attributes of type CE ==> CV
Datatype expansion
All classes with composed attributes are expanded in the composing parts. The types interested in this process are PN, ON, EN, AD, TEL, II, PQ.
EN
Every attribute of type EN is expanded as the following schema.
This schema models the compositional nature of the attribute EN expanding in a container relationship.
Tables that have a EN attribute, create a table of name tablename_Name_Part table then this table refers the Entity_Name table.
ON and EN
ON (Organization Name) and PN (Person Name) are specializations of EN that doesn’t add any attribute, so they are expanded in the same way as EN.
AD
AD represents the concept of address and defines it in terms of parts and subparts.
It is expanded like in the in schema:
The Person entity then refer the Person_Person_Address.
TEL
The TEL datatypes represents a set of telephone reference, so it is modeled as a multipart relation that expands in a schema like the following, in this case is showed the relation applied to an Organization entity:
II
All attributes of type II are expanded in the three compositing sub attributes:
- root: UID
- extension: ST
- authority: ST
PQ
This type of attribute specify a physical quantity, and has many sub attributes as shows in the following schema:
Every table with a PQ attribute has reference to the table PhysicalQuantity.
Datatype Promotion
In this phase the CV (Coded Value) and CS (Coded Simple Value) data types are expanded in the corresponding UML class schema, so the all attributes of this types implicitly refers to this inserted element.
LDM-2-PDM
Given a clean LDM, this guide assumes that Hibernate is used to generate the PDM. Hibernate offers 3 different O-R strategies.
We have chosen:
- table per hierarchy for the 6 foundation classes (and relative sub-classes) and code values: Act, Entity, Role,Participation,ActRelationship & CodeValue.
- table per concrete class for all other related classes
- There is a group of 5 tables dedicated to Vocabolaries and DataTypes
if you are going to use JavaSIG to persist RIM data into the PDM mind that you will have to adapt the wrappers.
PDM 4 MySQL
DATATYPE Mapping
- REAL = DECIMANL(9,3)
- ST = VARCHAR(5000)
- UID = INT(11)
- ED = VARCHAR(5000)
- GTS = DATETIME
- TS = TIMESTAMP
- BL = BIT(1)
- INT = INT(11)
PDM 4 ORACLE 11gR1
DATATYPE Mapping
- REAL = NUMBER
- ST = VARCHAR(5000)
- UID = NUMBER
- ED = VARCHAR(5000)
- GTS = DATE
- TS = DATE
- BL = BOOLEAN
- INT = NUMBER
WARNINGS
- Object names cannot be longer than 30 characters, thus some table and index names have to be shorten.
- "auto increment" capabilities are implemented via the creation of SEQUENCES, incremented by PRE-INSERT db triggers.
- Mind that SORTing clause is managed differently by the 2 DB technologies:
SELECT * FROM table ORDER BY col1 is allowed in MySQL but forbidden in Oracle, which asks for a more precise statement: SELECT col1, col2 FROM table ORDER BY col1
- MySQL is CASE INSENSITIVE while Oracle is CASE SENSITIVE, that means re-writing in Oracle the WHERE clause and ORDER BY by making use of UPPER() function in order to make the indexed search retrieve data stored in mixed case.
- set properly the characterset
CHARACTER SET AL32UTF8; NATIONAL CHARACTER SET AL16UTF16;
ORACLE ENVIRONMENT
Oracle environment sul server Linux: EDITOR=vi export EDITOR ORACLE_SID=phidb export ORACLE_SID ORACLE_BASE=/oracle/app export ORACLE_BASE ORACLE_HOME=$ORACLE_BASE/product/11.1.0/db_1 export ORACLE_HOME ORA_CRS_HOME=$ORACLE_BASE/product/11.1.0/crs_1 export ORA_CRS_HOME LD_LIBRARY_PATH=$ORACLE_HOME/lib export LD_LIBRARY_PATH PATH=$ORACLE_BASE/admin/rman/bin:$ORACLE_HOME/bin: $ORA_CRS_HOME/bin:/bin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/X11R6/bin:. export PATH umask 022
Place all DATAFILES in $ORACLE_BASE/oradata/phidb Place ".bash_profile" and ".oracle_profile" into /home/oracle
# .bash_profile # Get the aliases and functions if [ -f ~/.bashrc ]; then . ~/.bashrc fi # User specific environment and startup programs PATH=$PATH:$HOME/bin export PATH if [ -f ~/.oracle_profile ]; then . ~/.oracle_profile fi
.oracle_profile PS1="`/bin/hostname `-> " export PS1 EDITOR=vi export EDITOR ORACLE_SID=phidb export ORACLE_SID ORACLE_BASE=/oracle/app export ORACLE_BASE ORACLE_HOME=$ORACLE_BASE/product/11.1.0/db_1 export ORACLE_HOME ORA_CRS_HOME=$ORACLE_BASE/product/11.1.0/crs_1 export ORA_CRS_HOME LD_LIBRARY_PATH=$ORACLE_HOME/lib export LD_LIBRARY_PATH PATH=$ORACLE_BASE/admin/rman/bin:$ORACLE_HOME/bin:$ORA_CRS_HOME/bin:/bin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/X11R6/bin:. export PATH umask 022
$ORACLE_BASE/admin has the following structure: phidb |-- adump |-- ddl | |-- arrdb.sql | |-- dropall.sql | |-- jbpm.sql | |-- mkall.log | |-- mkall.sql | |-- mktablespaces.sql | |-- mkusers.sql | |-- qrtz.sql | `-- rimpdm2.sql |-- dpdump | `-- dp.log |-- pfile | |-- crspfile.sql | `-- init.ora `-- scripts |-- CreateDB.sql |-- CreateDBCatalog.sql |-- CreateDBFiles.sql |-- JServer.sql |-- apex.sql |-- context.sql |-- cwmlite.sql |-- emRepository.sql | -- init.ora |-- interMedia.sql |-- lockAccount.sql |-- ordinst.sql |-- owb.sql |-- phidb.sh |-- phidb.sql |-- postDBCreation.sql |-- spatial.sql |-- ultraSearch.sql |-- ultraSearchCfg.sql `-- xdb_protocol.sql
XML database
This section discusses some of the best practices for those applications that map RIM objects to a XML database.
Note to readers: the reader is assumed to have basic knowledge of XML Databases.
- It has the advantage that full fidelity (of the XML) is maintained. The use of an XML database makes sense if one only has to support CDA, in which case the XML version has to persisted, even if the data contained therein is stored in a structured fashion.
- It has the advantage that one doesn't need any joins (depending on how big of an XML snippet one persists, could be the entire message/CDA message type, or it could just be a series of derived XML snippets, e.g. the CDA part which identifies the Patient).
- It has the disadvantage that XQuery is a standard unfamiliar to most, one implementation had to export data to ER-based Datamarts - these Datamarts where queried using SQL to do statistical/clinical analysis of the data.
Best practices:
- If one received data according to different Message Types, the clone names used for classes will be different. To use XQuery, one has to transform the XML (as received) into an alternative representation which is based on names of RIM classes. This allows for XQuerying across all data received, regardless of the clone names used in any particular model. If persistence isn't limited to one single message type (e.g. the CDA R-MIM for some applications) the use of the RIM ITS should be seen as a necessity.