Database with native ISO datatypes

From HL7Wiki
Jump to navigation Jump to search

Summary

The ISO (or: Datatypes R2) data types can be implemented as 'native datatypes' at the database level. This may be an attractive option for RIMBAA applications.

  • We're not talking about "simple" UDT's. They can be declared using SQL and are based on the standard SQL types. E.g. you could define a TelephoneNumber-type that is a "VARCHAR(30) NOT NULL". We're talking about CLR User Defined Types (see [1]). This is also possible in some other database, e.g. Postgres with C++.

Levels

Level 1: dumb, II table
Level 2: UDT without functionality
Level 3: UDT with functionality, e.g. CD with SNOMED, II with OID check, or PQ with UCUM

You could have a Level 1 database and have a Level 3 Object Model, for example. It would be nice to add this to the Technology Matrix pages. Level 3 would be very interesting if you plan to do research on the clinical data. But if you just use the database to store forms (DCM's), than maybe Level 1 of Level 2 is sufficient.

Discussion

  • It would be great to have an Ad Hoc query and don't bother with the details of e.g. a PQ.
  • Advantage: can express constraints at the database level
  • Performance is an issue.
    • SET operations will be much faster(optimized indexing)
  • There might be a problem with current ORM implementations witch cannot currently handle UDT's.
    • Need to change serialization format used by Hibernate, instead use UDT datatype literal format.

MySQL/MariaDB

  • Very similar experience to others with trying to do a native data type model with fidelity.
  • Attempted by using different tables linked by an internal identifier to implement polymorphism
    • One column contained the name of the table that held the "variable" part of the data element's content
    • One column contained a unique identifier (in this case, used an unsigned BIGINT, so things would have broken with the 18,446,744,073,709,551,615ᵗʰ entry)
    • Needed to write queries to use a local variable to hold the name of the table
    • Lots and lots of joins-->performance was poor, and queries were torture to write, even worse to read.
  • Alternatives (not tested)
    • MariaDB supports Cassandra tables
    • CONNECT engine
    • Use dynamic columns (from MariaDB documentation)
      • Dynamic columns allow one to store different sets of columns for each row in a table. It works by storing a set of columns in a blob and having a small set of functions to manipulate it.
      • Dynamic columns should be used when it is not possible to use regular columns.
      • A typical use case is when one needs to store items that may have many different attributes (like size, color, weight, etc), and the set of possible attributes is very large and/or unknown in advance. In that case, attributes can be put into dynamic columns.

SQLServer implementation

  • Grahame tried to implement UDT's in SQL-Server, but he did not like it because
    • no support for inheritence/substituability
    • the data is stored serialized as strings. --Michael 12:38, 14 February 2009 (UTC) Is this really true? I serialized to a binary stream.
    • very difficult to address nested content (lists/sets)
    • Dennis Nguyen: I look at CLR User defined types as suggested, and attempt to do a prototype with IsoANY and IsoBL. Unfortunately, the result doesn't look very good. The biggest hurdle is that SQL server doesn't support polymorphism. In SQL server 2005, I created a table with two columns, IsoANY and IsoBL. But when insert an IsoBL value into IsoANY, it either causes an error (when using Transact-SQL), or remove the boolean value of IsoBL (via table view in SQL server management studio), effectively turn it into a basic IsoANY. As I've been told, this can be a significant problem for ISO datatypes, as IsoANY and IsoQTY is used as place holder for a lot of thing.
  • Ravi Luthra (Axolotl): They are not really native in the DB just have hooks to the datatype's definition, and SQL Server and .NETs data access layer provides a facade at runtime. It's the same level of abstraction you already get with Hibernate (or NHibernate if you are using .Net), just a form of ORM implemented in the wrong place - because it would only be supported by one DB rather than all.

PostgreSQL implementation

  • Yeb Havinga and Willem Dijkstra implemented UDT's in the PostgreSQL database management system, after a feasibility test with a few types in september '09 that turned out positive.
  • As of yet we did not perform comparative tests, though we expect the PostgreSQL implementation to outperform implementations in the application layer.
  • They've studied the use of Hibernate as the ORM on top of the 'database with Datatypes R2 UDTs'. For the object representation itself the Java SIG code is used. In order to use the datatype definitions it is sufficient to create a Hibernate mapping which maps the javasig data types to our Postgres literal representation. This doesn't just involve changing the hibernate configuration, one also has to develop a mapping in Java.
  1. (From the RIMBAA minutes of the 20091027 meeting) Keynote: Native support for Datatypes R2 as user defined database types (Willem Dijkstra, MGRID, The Netherlands). See http://www.ringholm.de/persist/20091027_RIMBAA_MGRID_HL7v3_Datatypes.pdf for slides.
    • Background: see Database_with_native_ISO_datatypes. Yeb Havinga (not present today due to illness) and his colleagues have implemented (almost) all of Datatypes R2 in Postgres. Most of it created in C.
    • Willem: Fast database access, ability for database to reason about content. Could use ORM instead (e.g. Hibernate) - could lead to full tablescans (i.e. scaling and performance issues).
    • Davide: Hibernate suggests various approaches to ORM, he's now using table per hierarchy. It all depends on your requirements what the best approach is. ORM performs well.
    • Willem: what we're proposing is to include datatypes in database, and use ORM of top of that. What you get is native support for HL7 datatypes in SQL.
      • We put a lot a time in implementation of CV (allows for SNOMED support, subsumption queries on SNOMED codes). With support for versioning of 'concept domains'.
      • Shows examples of use (in SQL) of v3 native datatypes. Full support for PQ & UCUM. Supports intervals and a 'contains' operator for intervals. The support for units is done in such a way that they can be easily converted, and that sameness as well as literal equality can be tested for. For interval datatypes they introduced a new operator in SQL to test if an item of data, opr an interval of data, is contained within another (set of) interval of data. PQ (because of the units) and IVL<T> are especially nice, those are relatively hard to accomplish using standard SQL.
      • Support for datatype flavors as a means to validate user input. Has additional constraints on top of the standard R2 datatypes.
      • Support for NullFlavors. Logic with NullFlavors has lead to lots of questions about the Datatypes R2 specification and e-mail exchanges with Grahame Grieve, the main editor of the HL7 datatypes specification.
    • Henk: tried to use different approaches to storing physical quantities. (slide 19) Using a combination of the JavaSIG materials on top of the Postgress UDTs is easy.
      • All UDTs are indexed, did a test, performance gain [during his simple benchmark testing] because of indexing vs non-indexed tables (all UDTs are indexed), 1:100. Probably more dramatic differences in really big data sets. Lots of gains expected because of logic on time intervals.
      • Literal form = textual string version of a ISO R2 dataype as defined in the Abstract datatypes spec.
    • Question for the attendees: is the use of UDTs (with ORM on top of it) a good idea? Davide: yes. Bertil: would be nice to have a standard SQL definition for use of HL7 datatypes. Could attempt to standardize SQL across the board. Hans: performance gain is nice, but as a programmer I don't want to deal with the database level, usink Link. Andy: nail being hit squarily on the head - PQ and IVL are the important bits when doing RIMBAA implementations. Ewout: if it's there I'd use this - but I don't use it enough to merit to create it myself.
    • Other databases?
      • Andy: tried this in SQL server? Others: Grahame tried this, SQL server doesn't have all features required to define all datatypes. Something like PQ is possible in SQL server though. Ewout: requires deserializing in SQL server.
      • Bertil: tried this in Oracle? They have a fairly advanced datatype system. Andy: HTB doesn't use this, they have three columns to represent GTS.
  • 2010-03-17: Whitepaper on their implementation: [2]
  • I'm interested to know, as one of the authors of a database-R2 datatype implementation with SNOMED support - http://arxiv.org/abs/1003.3370, where I believe we have solved the most common SNOMED classification problem, the 'implies' relation. We can perform 'implies' classifications with log(n) time complexity, where n is the size of the relation being queried: I do not have a mathematical proof available, but I have a strong suspicion it is theoretically impossible to classify faster than this, and it is so fast that interactive setups/user interfaces can perform querying / constraint checking of expressions containing SNOMED terms. It also neatly integrates with a a RIM database, query for observations of any kind of Diabetes mellitus is SELECT * FROM observation WHERE code << '73211009|Diabetes Mellitus'::cv('SNOMED-CT');

NCI

  • John Koisch (NCI): For what it is worth, the NCI's adoption of the ISO datatypes has been focused on them as a wire format standard with implications about persistence, and not as a persistence model per se. That is, our service interfaces expose ISO datatypes with the contractual understanding that what ultimately resides in the persistence layer MUST be able to map to those. We arrived at this strategy in part because we encountered issues as well with modeling the ISO's in RDBMS. Depending on this group, we may modify that strategy in certain circumstances, but we are just as comfortable having the ISO's being the wire format of choice as we are having it be a feature of our persistence layers.
  • [15/09/2008 21:24:44] Grahame Grieve says: NCI is also working on this. They are struggling with the recursion of translations. Recursion is something they otherwise work hard to keep out of their persistent stores. There's recognised patterns for handling that, so I don't have much to say about that. NCI have also struggled with how to do nullFlavor. This is because they don't want to do V3 as a primary construct, rather than inherently because of persistence issues. So I would think this is less of a problem here.

Skype chat

(extract to be made for relevant content)

  • [15/09/2008 22:48:22] Grahame Grieve says: Javasig have their idiosyncratic data types working with hibernate. I don't know what trickery was required, but it seems as if there was some required
  • [15/09/2008 22:48:44] Grahame Grieve says: the ISO data types would map directly to hibernate - they are normal UML class models with attribute
  • [16/09/2008 03:05:52] Michael van der Zel says: Yes, but some datatypes have logic behind them. For example de PQ with ucum. It "understands" that 1g is the same as 1000mg.
  • [16/09/2008 03:06:15] Grahame Grieve says: this has nothing to do with persistence - that's just an operation you add
  • [16/09/2008 03:07:15] Michael van der Zel says: Exactly. This discussion is not only about persisting, but in general about v3 (ISO) datatypes in a database.
  • [16/09/2008 03:07:54] Grahame Grieve says: well, the database won't generally understand the semantics like PQ.canonical unless you specifically choose to persist them.
  • [16/09/2008 03:08:21] Grahame Grieve says: I can't speak for javaSIG, but the ISO data types won't persist PQ.canonical unless you make special arrangments
  • [16/09/2008 03:08:48] Grahame Grieve says: unless you do, you can't use canonical values in sql joins
  • [16/09/2008 18:55:05] Peter Hendler says: Gunther has hand written most of our datatypes and has hand mapped them with Hibernate.