
Rationale for datatypes requirements

From HL7Wiki
Revision as of 09:56, 14 March 2007 by MatthewStephens (talk | contribs)

ED

I have been looking at the datatypes schema, and I am now less clear on the intended use of ED. While ED can contain mixed content, there is no provision in the schema for it to contain a flavour of "any", whereas I had assumed that if the content of an ED was well-formed XML it could be included without having to escape the mark-up. Indeed this is what would be required to support processing of inline XML signatures, which are described in the schema comments as appropriate content.

What have I missed? [Charlie McCay, 20020702]

> Gunther
> >
> > An unqualified xsd:any after the other elements gives an ambiguous content
> > model (no-no for xsd) -- xsd:any with a namespace="##other" works -- but
> > rules out hl7 defined content -- which is probably not what we want.
>
> STOP! Charlie, isn't this precisely what we want? I mean, in a message
> we are in a namespace that's specific for that one message, right? So,
> then we can put any other namespace inside and then this ##other would
> be *precisely* what we want. A good example would be a CDA document,
> a DICOM XML thingie, an HL7 v2.x thingie or even another HL7 v3 snippet.
> All can, and most likely will anyway, live in its separate namespace!
>
> For the signatures, we can easily put those in a different namespace
> too.
>
> THANK YOU for pointing out the perfect solution (even if you didn't
> believe it was so good :-).
>
> > So I would go for an unqualified xsd:any with reference and thumbnail
> > defined as global elements in the HL7 namespace with processing rules to say
> > that they must be the first children of an ED. Not pretty but could be
> > worse:
>
> an acceptable fallback
>
> > The other options are:
> > 1 the element that you understandably do not want -- ST to ED would
> >   be ugly -- but this is second choice
> > 2 CDATA - lose lots of ease of use -- very ugly
> > 3 get rid of the child elements: put the reference info in as attributes
> >   (but this breaks the TEL datatype), and the thumbnail as an xref to
> >   somewhere else in the message - don't know where. This would be REALLY
> >   ugly, but included for completeness.
>
> very good differential therapy approach: always have at least 5
> alternatives on hand to choose from :-). [Gunther Schadow, 20020703]
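As a rough illustration of what the namespace="##other" wildcard discussed above would let through, here is a minimal Python sketch. It only mimics the XSD 1.0 rule (any namespace except the target namespace, and no un-namespaced elements); it is not a schema validator. The sample instance, the element names text and reference, and the signature namespace URI are assumptions made up for the example; urn:hl7-org:v3 is the namespace that appears in the PQ example on this page.

```python
import xml.etree.ElementTree as ET

HL7_NS = "urn:hl7-org:v3"  # namespace used in the PQ example on this page


def namespace_of(element):
    """Return the namespace URI of an element, or None if it has none."""
    tag = element.tag
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return None


def foreign_children(ed_element, target_ns=HL7_NS):
    """Children that a namespace="##other" wildcard would accept.

    In XSD 1.0, ##other admits elements in any namespace except the
    target namespace; elements with no namespace are also excluded.
    """
    return [
        child for child in ed_element
        if namespace_of(child) not in (None, target_ns)
    ]


# Hypothetical ED instance: 'text', 'reference' and the signature
# namespace URI are illustrative names only.
SAMPLE = """
<text xmlns="urn:hl7-org:v3" xmlns:sig="urn:example:signature">
  <reference value="http://example.org/doc"/>
  <sig:Signature>abc</sig:Signature>
</text>
"""

ed = ET.fromstring(SAMPLE)
accepted = foreign_children(ed)
print([child.tag for child in accepted])
```

Only the sig:Signature child is reported; the HL7-namespace reference element is exactly the "hl7 defined content" that Charlie notes ##other would rule out.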

The sample messages and schemas that Paul has produced show a single hl7 namespace with everything in it. This shows that (subject to resolution of the ED content issue) we can do it that way, and now must choose whether to have one big namespace, one per message or per dmim (or both), maybe one for datatypes etc.

I have done a good deal of XSLT against the ballot1 messages, and the need to use local-name() all over the place makes the code ugly; it is the price paid for writing stylesheets that work across multiple namespaces. One namespace per message will improve this, but only a bit. It may in a similar vein restrict the options for the representation of templates and conformance profiles.
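The same cost shows up outside XSLT. The sketch below is a Python analogue of the local-name() workaround, not the stylesheets Charlie wrote; the two message namespaces and the id element are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(element):
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return element.tag.rsplit("}", 1)[-1]


def find_by_local_name(root, name):
    """Return all descendants (and root) whose local name matches."""
    return [el for el in root.iter() if local_name(el) == name]


# Two hypothetical messages carrying the same element in different namespaces.
MSG_A = '<msg xmlns="urn:example:message-a"><id root="1.2.3"/></msg>'
MSG_B = '<msg xmlns="urn:example:message-b"><id root="4.5.6"/></msg>'

for source in (MSG_A, MSG_B):
    root = ET.fromstring(source)
    print([el.get("root") for el in find_by_local_name(root, "id")])
```

A plain root.find("id") returns None for both messages, because the element lives in a namespace; every lookup has to go through a helper like this (or, in Python 3.8 and later, the "{*}id" wildcard), which is the clutter being complained about.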

Having a message in its own namespace may help the processor to identify the schema and would allow folk to say which messages are permitted in ##any sections. However if the root element of the message identifies it uniquely within the HL7 namespace, then this could be used without using ##any at all.


I know that this choice was one of the reasons that Paul has been repeating his request for us to say what processing we believe should be supported by the messages, so that a rational decision can be made on this.

Since we do not appear to need them to avoid name collisions we should have a very clear rationale for introducing segmentation of the hl7 space. [Charlie McCay, 20020703]


II

In our efforts in the Conformance SIG, specifically with trying to define a unique identifier for HL7 V2.x conformance profiles, we have come up with the question of using UUIDs rather than OIDs. There are a couple of reasons for this:

1. UUIDs are easy to generate; there are many programs that can generate these, making it easier for message profile developers to create unique identifiers.
2. There is the issue of the 'politics' for determining the 'official' OID for a particular concept (e.g. Alberta Provincial Health Number). There is a desire by some jurisdictions to see 'their' identifiers registered under 'their' hierarchy. This substantially increases the wait time/discussion time needed to get an identifier for a particular concept.
3. Registries such as ebXML (UDDI?) use UUIDs as registry object identifiers already; having the ability to use UUIDs over OIDs, for example in registering conformance profiles, would align us with registry efforts like ebXML.
4. The Templates group is also looking at using UUIDs to uniquely identify templates and for registering them in a registry.
5. The HL7 HD (and EI) datatypes already allow for GUIDs [John Silva, 20020906]

I have recently spent too much time on a "which OID tree should an identifier scheme be named under" discussion, and so would support a move to allow for the use of GUIDs/UUIDs in their place -- although I see that you are here limiting your proposal to using UUIDs rather than OIDs in the new area of registering conformance profiles / templates.

As an aside - what is the distinction that you are implicitly making between GUIDs and UUIDs -- are they the same? Do we need only one or the other? Or should UUIDs as well as GUIDs be allowed in the datatypes? Most of the references that I have found appear to treat them as interchangeable. [Charlie McCay, 20020909]

Do we know *why* HL7 has chosen the OID route as preferable to the GUID/UUID route? (I believe GUID and UUID are interchangeable, with GUIDs being the Windows implementation of the generic UUID concept.) The only rationale I can come up with is the extremely remote possibility of a collision between two independently generated GUIDs. However, seeing as GUIDs are generated by pre-coded libraries, while OIDs must be created using custom code (and human-designed hierarchies), I would expect that the real-world chances of duplicates would be higher for OIDs. [Lloyd Mckenzie, 20020909]

I do not know - I recall a number of long conversations with Gunther and Mark Shafarman when we proposed allowing G/UUIDs in the II datatype. I suspect that the answer is the weight of history, and the fact that the UUID algorithms were not as well trusted in the past as they are now. I would agree that UUIDs are a lower risk identifier [Charlie McCay, 20020910]
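To make the "easy to generate" and "GUID versus UUID" points concrete, here is a small Python sketch using the standard uuid module. It assumes random (version 4) UUIDs, which is only one of the possible generation schemes, and the brace-wrapped upper-case spelling is simply the form commonly seen for Windows GUIDs.

```python
import uuid

# Generate a random (version 4) UUID; no registry or hierarchy is consulted.
profile_id = uuid.uuid4()
print(profile_id)

# Windows-style GUID text (braces, upper case) parses to the same kind of value.
guid_text = "{" + str(profile_id).upper() + "}"
same_id = uuid.UUID(guid_text)
print(same_id == profile_id)
```

Both spellings denote the same 128-bit value, which fits the view above that the two terms are interchangeable.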

There is a strong proposal being pushed forwards (that Paul Biron, David Markwell, Bob Dolin and I are supporting) that there should be one or two DSTU subtypes of II that allow you to specify that the II should be treated as a reference or as a target. The proposal is to add a datatypes component that extends II. The component would be of type CS, and would indicate use, with a default value of not specified.

Lloyd and Gunther are not against this suggestion.

This would not prevent the rest of the document moving forwards to normative status -- but would also allow us to deal with a number of the referencing issues. [Charlie McCay, 20040123]

New subelement of data type II proposed to meet major issues in the NPfIT.

idtype

Values

"ref" = Reference to an information identifier; subtypes may indicate scope (e.g. message, community or global)

"target" = Id of either a real world instance or a data instance

"real" = Real world identifier

"data" = Identifies the specific data instance [David Markwell, 20040709]

Should be camelCase -- ie idType

I will sleep on the question "should real be a child of ref?" -- since only idType="data" identifiers can be pointed to -- so I would suggest

ref
  real
  dataRef
    message
    community
    global
target
  data


Lloyd's suggestion in one corridor chat was to have a flag that indicated "More info available". This has a similar usability to a reference, without the implicit instruction as to how the information should be used. This is closer to the "ActRef" classCode approach -- and does not involve breaking the datatypes -- just adding something to infrastructureRoot :)

Values would be:

Final (no more info anywhere -- I am giving you all that there is to know -- used to assign information item identifiers)
Ref - there is more info -- abstract entry (ie should not be used)
  real - loads more info -- but leave the computer and look in the real world
  dataRef - more info somewhere in cyberspace
    message - more info in this message
    MessageSpec - source of further info defined by HL7 (or affiliate) in normative document (eg with query message)
    Community - look in the implementation profile (see message.ProfileId) for where to find more info
    global - well-known identifier scheme, message designer / sender not providing any further guidance as to how to dereference

Default value is "No Information" -- ie there may be more info somewhere but this does not need to be communicated. In practice this may be acceptable for when information instance identifiers are used -- ie there is not a requirement to send "Final"

Attribute name: Completeness

Name issue -- children of dataRef do not work well -- eg Completeness="message" does not read well

Note that this works on the class, not the identifier -- which is where the problem of identifier scope comes in -- this can be a property of the class iff all identifiers have the same scope -- but for a lab test or a patient role the RealWorldIdentifier and the Information Item Identifier may have different identity criteria -- ie you may have a different information item identifier with the same real world identifier.

For the NHS usecase the class level is probably fine. This is probably worth considering [Charlie McCay, 20040709]

I think real identifiers can also be pointed to. It's just that the pointer is to whatever the most recent instances of information about that are.

What is an "nhs number"? You can certainly point to information that is associated with it, but what about when you supply information about it? Is it still a reference, and if so to what?

It seems to me that if only "data" can be pointed to then a real identifier cannot really be a "ref", since it does not point to "data" in that sense.

I suspect we really have:

Identifiers of immutable data - when the data changes so does the identifier. Issues of containment apply here.

Identifiers of mutable data - e.g. the information about a person may change while the person is still the same and thus the person identification is the same (applies similarly to patient as a role).

Then we can have references to either type. If the reference is to an immutable type then it is clear what it refers to. If the reference is to a mutable type then there is a need for some rules on what it applies to.

It could apply to the most recent information at any point in time.

The most recent information at the time the reference was created.

The most recent information known to the referencer at the time the reference was created (very tricky).

OR

It could apply to all information in all its stages of mutation. Either ...

At any point in time when it is read

Up to the time of reference creation

Either globally or as known at the time of reference creation.

---

Not sure whether this helps except that we should be careful looking from one perspective and perhaps should limit ourselves to answering the three questions immediately relevant to the NHS

A) Is it a reference to information

B) Is it a unique identifier of contained information

C) Is it an identifier that exists in the real world

Perhaps the hierarchy will cause more problems than it solves since I think A B and C are orthogonal. I think trying to argue that an identifier that exists in the real world is either a reference or a target is a distraction - since neither is unequivocally true. In SNOMED terms it fails the URU test.

BTW: The URU test is that unless a rule is "Understandable, Reproducible and Useful" it should not be applied. [David Markwell, 20040709]

Many thanks for this -- I find it agreeable, understandable, reproducible, and useful

It addresses the first part of my email

The second deals with the question as to whether it is the id attribute or the class that is the reference. This is where having identifiers of different scopes becomes interesting -- the NHS usecase only needs a class to be a pointer to another class.

Since it is the class that we want to be further populated from the class that is pointed to, there needs to be an unambiguous (ie understandable/reproducible) and testable way to determine how to dereference the pointers

However if the identifier contains the flag indicating that it is a pointer what are the rules: If the NHS number and the UUID are both flagged as pointers, but the schemes have different dereferencing rules there is a problem

Somewhere the rules for dereferencing need to be written out, and they can only depend upon the value of the idType attribute (or RefType if it becomes an attribute of the class -- ie part of infrastructureRoot) and the idScheme used. I do not think that there will be a single set of rules for all schemes. I am not certain that it will be possible to have a global set of rules for every idType/idScheme combination -- but for now and the NHS I am sure that that will be enough (esp if DATA UUIDs are the only ones used)

I think that there is a remaining question as to whether we should put the attribute into infrastructureRoot rather than ii -- I would support such a move. [Charlie McCay, 20040712]


AD

In the datatypes schema that is distributed with the 1.6 schema generator there is no way to represent an ADXP without an AddressPartType attribute within an address.

This is how we have been representing unstructured address lines, and is preferable to having text nodes separated by tags, as elements are easier to map with some tools than mixed content.

I would therefore like another option in the choice within AD that should be called "UnstructuredAddressLine" and have no addressPartType attribute. [Charlie McCay, 20030120]

Hmm, I want to be nice. I agree with the idea of the most lightweight form of unstructured addresses. But I don't understand: when you say that having text separated by tags (presumably you mean , then how is that different from mixed content? I thought AD was mixed content, giving you an optional chance to mark up certain parts specially or just let them be text nodes. Is this not how you can interpret it? [Gunther Schadow, 20030121]


Gunther

What we did for the GP2GP datatypes profile last year was define AD as containing <ADXP> elements of type ADXP, without a partType attribute (which is optional in the full HL7 datatypes schema). I realise in hindsight that this was not a valid subset of the HL7 AD datatype, since despite partType being optional in the ADXP type definition it is fixed in all the address parts defined in AD. I do not want to define the address as a string of text separated by tags, as it makes the xPaths less intuitive. In particular we have a requirement to maintain addressline1-4 in transfers, where address line 3 may be blank. Here it is more important that it is Line3 than that it is a City.

eg -- if WorkAddress is of type AD

<WorkAddress>
<ADXP>23D Dogpole</ADXP>
<ADXP null="UNK"/>
<ADXP>Shrewsbury</ADXP>
<ADXP>Shropshire</ADXP>
</WorkAddress>

is better to my mind than

<WorkAddress> 23D Dogpole Shrewsbury Shropshire </WorkAddress>

I now need 4 xPaths -- one to point to each address line

WorkAddress/ADXP[1]
WorkAddress/ADXP[2]
WorkAddress/ADXP[3]
WorkAddress/ADXP[4]

is better than

WorkAddress/text()[1]
WorkAddress/text()[2]
WorkAddress/text()[3]
WorkAddress/text()[4]

which is less intuitive to a newbie developer (and I tested this one but not the first, showing a lower level of comfort); further, it would not be guaranteed to work, as the text nodes may get split by the xPath processor [Charlie McCay, 20030127]
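The element-per-line form can be exercised with very little code. The following Python sketch uses the standard library's ElementTree, which supports the positional predicate ADXP[n]; the instance is the WorkAddress example above, and treating the null="UNK" line as an empty string is an assumption made for the illustration.

```python
import xml.etree.ElementTree as ET

WORK_ADDRESS = """
<WorkAddress>
  <ADXP>23D Dogpole</ADXP>
  <ADXP null="UNK"/>
  <ADXP>Shrewsbury</ADXP>
  <ADXP>Shropshire</ADXP>
</WorkAddress>
"""

address = ET.fromstring(WORK_ADDRESS)

# ElementTree supports the positional predicate ADXP[n] (1-based).
lines = []
for n in range(1, 5):
    part = address.find(f"ADXP[{n}]")
    lines.append(part.text or "")  # the null="UNK" line has no text

print(lines)
```

Line 3 stays line 3 even though line 2 is blank, which is the requirement stated for addressline1-4.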

Based on Charlie's definition, I'll assume the intention is to explicitly leave a blank line on the envelope. If that's not the intention, then we're not really dealing with lines.

I agree that at present, the 'correct' way to represent Charlie's scenario is:
<someElement>
23D Dogpole<DEL/>
<DEL/>
Shrewsbury<DEL/>
Shropshire
</someElement>

(The empty DEL element is always interpreted as a line break)

However Charlie's proposed xPath should actually be:

line1 = someElement/node()[count(preceding-sibling::DEL[normalize-space(.)=''])=0]
line2 = someElement/node()[count(preceding-sibling::DEL[normalize-space(.)=''])=1]
line3 = someElement/node()[count(preceding-sibling::DEL[normalize-space(.)=''])=2]
line4 = someElement/node()[count(preceding-sibling::DEL[normalize-space(.)=''])=3]

This would ensure compatibility with applications that actually did use some of the other tags. However, I expect Charlie's vendors would be even less impressed :>
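For comparison, here is roughly what a receiver has to do to recover lines from the mixed-content form. This is a Python sketch, not the XPath above; it assumes that an empty DEL child marks a line break and that any other child element is an address part whose text belongs on the current line. The sample instance, the placement of the DEL elements, and the CTY element name are illustrative only.

```python
import xml.etree.ElementTree as ET


def split_lines(address):
    """Split an address's mixed content into lines at empty DEL elements.

    Text of other child elements (any part-type tags) is kept on the
    current line, so marked-up and plain addresses are handled alike.
    """
    lines = [""]
    lines[-1] += address.text or ""
    for child in address:
        content = "".join(child.itertext())
        if child.tag == "DEL" and not content.strip():
            lines.append("")           # empty DEL: start a new line
        else:
            lines[-1] += content       # part element: keep its text
        lines[-1] += child.tail or ""  # text following the child
    # Collapse the whitespace that surrounds text in mixed content.
    return [" ".join(line.split()) for line in lines]


# Assumed instance: the placement of the DEL elements and the CTY
# element name are illustrative, not taken from the schema.
SAMPLE = (
    "<someElement>23D Dogpole<DEL/><DEL/>"
    "<CTY>Shrewsbury</CTY><DEL/>Shropshire</someElement>"
)

print(split_lines(ET.fromstring(SAMPLE)))
```

Even this small helper has to deal with leading text, element tails and whitespace normalisation, which gives some feel for why suppliers might balk at it.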

An alternative would be to create a new Address Part called "Address Line". While it could be argued that line breaks are merely delimiters, creating an explicit part type would suggest that there are semantics to something being on a particular line. This is what Charlie's use-case seems to suggest. In the vocabulary hierarchy, everything would be a 'potential component of' AddressLine (ADL). I.e. ADL would be the root vocabulary term.

If we do this, then the XML would look like this:

<someElement>
<ADXP type="ADL">23D Dogpole</ADXP>
<ADXP type="ADL"/>
<ADXP type="ADL">Shrewsbury</ADXP>
<ADXP type="ADL">Shropshire</ADXP>
</someElement>

[Lloyd McKenzie, 20030128]

You are both right that I would be embarrassed to take to my suppliers the complex processing that would be required to rely on the DELs, or parsing for newlines. This really is something where they are saying that international standards are adding error-inducing baggage.

I can live with Lloyd's suggestion, esp if AddressLine was the element name within AD, with a fixed partType attribute as is done for all the current components of that datatype. This would leave me with:

<someElement>
<AddressLine>23D Dogpole</AddressLine>
<AddressLine></AddressLine>
<AddressLine>Shrewsbury</AddressLine>
<AddressLine>Shropshire</AddressLine>
</someElement>

However this is still a bit of a kludge in the Vocab for partType, since one of the named address parts could go over multiple lines. I think that it would be better to allow an address to either contain AddressLines OR identified address parts, using the DEL delimiter to identify line breaks in the latter case. In the former case the delimiter is implied after every AddressLine, so I have:

<someElement>
<AddressLine>23D Dogpole</AddressLine>
<AddressLine></AddressLine>
<AddressLine>Shrewsbury</AddressLine>
<AddressLine>Shropshire</AddressLine>
</someElement>

It can then be specified in the conformance profile whether only one of those choices is permitted.


To go beyond my local use-case -- I do not think that the "label view" and "Data view" of addresses should (always) be merged -- if I have a system with 4 address lines in the database, or with named address parts that have come from some address look-up I may have no information as to the best way to display that address -- so the sending system is no better informed than the receiving one as to how the address parts should be printed. Thus always sending layout information (in the form of tags) is at best redundant, and at worst implies a "preferred layout" has been selected when there is no such thing.

So the minimal change would be to add an attribute to AD to indicate whether or not layout information is included, and my preference would be for it to default to "not included" as I think that that reflects the case for most systems which are used here -- but I would certainly not fight over the default. Where it is not included there is no need for AD or any of its parts to be mixed content, hence removing the bug-welcoming significant whitespace. For this reason it may be worth going a step further and defining a different type, but I would not go that far (in a schema for a profile that only allows "Data view" of addresses this could be done anyway) [Charlie McCay, 20030128]

Can you give an example of an address part that would ever span multiple lines? [Lloyd McKenzie, 20030129]

Joe Bloggs
Sourby Centre for Healthcare Informatics
Newcastle University
Newcastle

The first line of this address would be wrapped onto two lines by a system that has address lines of length 30, and I would not have to look much further for an organisation name that would wrap on a line length of 40 (and would be included as a delivery point in a person's postal address) -- esp for systems that collect addresses to print on labels and so are more worried about getting untruncated information than about maintaining the integrity of address parts. I have certainly written word-wrap algorithms for addresses in clinical system interfaces that have done line wraps (and concatenation) within address parts received, rather than truncation.
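The wrapping behaviour is easy to reproduce. A minimal Python sketch with the standard textwrap module, taking the organisation name from the example address; the fit_to_lines helper is a name invented here, and real interfaces may well wrap differently.

```python
import textwrap

ADDRESS_PART = "Sourby Centre for Healthcare Informatics"  # 40 characters


def fit_to_lines(part, width):
    """Word-wrap one address part to the receiver's line length."""
    return textwrap.wrap(part, width=width)


print(fit_to_lines(ADDRESS_PART, 30))
print(fit_to_lines(ADDRESS_PART, 40))
```

At width 30 the single part comes back as two lines; at width 40 it just fits on one. So a part-to-line mapping depends on the receiving system, not on the address.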

That said -- in the interests of stability and consensus I would rather go at this stage for your suggestion of having AddressLine as one of the permitted parts, and then worry in 3.1 or 4.0 about the nicety of whether it should be an alternative to the other parts. Whether we create a new partType code for this as a parent or sibling of the current partType codes I have no strong feelings (or even allow it to be blank, which is what I did in the GP2GP profile)

I would like the flag to indicate whether there is layout information in the address, but could even live without that if you called stability on me. [Charlie McCay, 20030129]

There is a requirement in the UK to be able to add a description to an address such as "holiday home for summer months only", or "Used in school holidays".

There does not seem to be a way to add this to the current address type -- though it could be seen as an original text for either addressUse or for validTime, and I know that there is some talk of expanding the use of original text in the next release. [Charlie McCay, 20040211]

>There is a related problem with address.usableperiod -- this is a GTS -- but
>"UK school summer holiday" is not something that can be expressed in GTS --
>unless we do a UK extension to the codes used in GTS. Even if we did that I
>am not sure that we would get the right strings in there -- and I would
>rather be able to support textual usablePeriod values.

we can - and should - add this to GTS - though just "Summer Holiday". If you want to differentiate between different countries' summer holidays, the whole GTS would become (more) unwieldy.

>The NPfIT models have a single (non-approved extension) attribute
>"description" that merges the concepts of address.use and
>address.usableperiod to meet the usecase of "text advising the user which
>address in a set of like addresses to select for a given purpose at a given
>time"

Is there more to the definition? An ISO 11179 type definition that you can send? The problem with this definition is that it overlaps with the coded values in use and useablePeriod. The problem with them is that there is no support for non-coded values. And it's a reasonable request to allow non-coded values here. In the current structure, for instance, I do not see how we can say that an address is a known past address, but we do not know the time range during which this was a past address (or possibly we only know a single point in time when it was valid). We can continue extending AD in a formal, structured fashion, but we will end up chasing the wind as more and more esoteric use cases come up. But if we allow free text comments, then the whole structure will be less processible [Grahame Grieve, 20040219]

The description that I gave was taken from the language of the HL7 abstract datatypes -- I leave it to those in HL7UK who have done the business analysis to describe the requirements clearly -- and then to work them through at the HL7UK meeting next week before I create any more wind on the HL7.org lists.

My expectation is that extending the structure in a considered way will increase usability at the affordable price of some processability -- but I wait for the business analysis to give us the full rationale for this. [Charlie McCay, 20040219]


PQ

How is this supposed to be done in the current datatypes? I know that the translation component has to be used, but since it is only the text for the units, what should I put in for the codeSystem oid and for the code attribute? Is there a scheme identifier for the empty coding scheme? Also, do I duplicate these in the translation sub-element?

I know that there was a ballot comment about this -- but cannot find the proposed representation [Charlie McCay, 20040222]

In case the earlier question was too open -- here is my reading of what needs to be done:

<value value="6" unit="1" xmlns="urn:hl7-org:v3">
<translation value="6" code="1" codeSystem="2.16.840.1.113883.6.8">
<originalText>packets each containing 21 tablets of A and 7 tablets of B</originalText>
</translation>
</value>

Where 2.16.840.1.113883.6.8 is the registered OID for UCUM.
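To show how a receiver might consume this representation, here is a hedged Python sketch that reads the instance above and falls back to the translation's originalText when the unit is just "1". The display rule is an assumption made for the example, not something the datatypes specification prescribes.

```python
import xml.etree.ElementTree as ET

NS = {"hl7": "urn:hl7-org:v3"}

PQ_XML = """
<value value="6" unit="1" xmlns="urn:hl7-org:v3">
  <translation value="6" code="1" codeSystem="2.16.840.1.113883.6.8">
    <originalText>packets each containing 21 tablets of A and 7 tablets of B</originalText>
  </translation>
</value>
"""

pq = ET.fromstring(PQ_XML)
translation = pq.find("hl7:translation", NS)
unit_text = translation.findtext("hl7:originalText", namespaces=NS)

# Assumed display rule: show the original unit text when the unit is just "1".
unit = unit_text if pq.get("unit") == "1" else pq.get("unit")
display = f'{pq.get("value")} {unit}'
print(display)
```

The receiver has to descend two levels and special-case unit="1" to get a printable quantity, which is part of why the representation feels a little ugly.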

This is a little ugly, so I would appreciate either confirmation that this is as pretty as it gets, or an alternative representation of the same information.

Note that since the originalText of the translation is defined as "The text or phrase used as the basis for the coding." in the abstract datatypes, I take it that this should not include the value -- ie it is the original text of the unit, not the original text of the whole translation

Note also that this is similar to the question from Stan in his ballot comment (line 48 in the combined spreadsheet) where he is concerned about representing an original text that is mapped into a UCUM value -- in which case the "1" would be replaced by the UCUM code. [Charlie McCay, 20040203]



TS

I have a customer asking me why HL7 did not go with the W3C datetime datatype format. Can anyone direct me to some reasons/documents discussing why this was chosen? (Beyond or elaborating on the "NOTE" in the DT ITS, that is.) [Wendy Ver Hoef, 20050214]

The reason was purely one of the need to support varying levels of precision. In v3, we needed to be able to support:

YYYY
YYYYMM
YYYYMMDD
YYYYMMDDhh
YYYYMMDDhhmm
YYYYMMDDhhmmss
YYYYMMDDhhmmss.s
YYYYMMDDhhmmss.ss
YYYYMMDDhhmmss.sss
YYYYMMDDhhmmss.ssss

All of the above with or without timezone (though timezone is of limited usefulness for the first 3). The XML Schema types did not support this flexibility. While in theory we could have created our own type and retained the XML Schema approach to separators, we figured that would only lead to increased confusion. Therefore we decided to stick with the same format as was used in v2. (I believe the format is used in a few non-HL7 places as well, but memory fails as to where . . .) [Lloyd Mckenzie, 20050214]

Lloyd is correct -- it was a long and uncomfortable learning exercise -- some of the earlier versions of the datatypes included a union of all the W3C schema date-time types -- but this still did not give the level of imprecision that much healthcare data is recorded at (YYYYMMDDhhmm) -- so we were left with either having to have an imprecision flag and sending xs:dateTime (YYYYMMDDhhmmss.s*) over the wire, or sticking with ISO -- after much discussion the choice was to use the short ISO format rather than a constructed type on top of the W3C type. [Charlie McCay, 20050215]
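A short Python sketch of how the variable-precision forms listed above can be recognised, and the precision recovered from the literal itself rather than from a separate flag. The +/-ZZZZ timezone suffix is an assumption about the offset syntax (the thread only says "with or without timezone"), the function names are invented for the example, and no range checking (month 13, hour 25) is attempted.

```python
import re

# YYYY, then each finer component optional, then an optional +/-ZZZZ offset.
TS_PATTERN = re.compile(
    r"^(?P<year>\d{4})"
    r"(?:(?P<month>\d{2})"
    r"(?:(?P<day>\d{2})"
    r"(?:(?P<hour>\d{2})"
    r"(?:(?P<minute>\d{2})"
    r"(?:(?P<second>\d{2})"
    r"(?:\.(?P<fraction>\d{1,4}))?"
    r")?)?)?)?)?"
    r"(?P<zone>[+-]\d{4})?$"
)


def parse_ts(text):
    """Return the components present in a TS literal, or None if invalid."""
    match = TS_PATTERN.match(text)
    if match is None:
        return None
    return {name: value for name, value in match.groupdict().items()
            if value is not None}


def precision(text):
    """Name the finest component present (ignoring the timezone)."""
    parts = parse_ts(text)
    if parts is None:
        return None
    order = ["year", "month", "day", "hour", "minute", "second", "fraction"]
    return [name for name in order if name in parts][-1]


print(precision("2007"))
print(precision("200703141530"))
print(parse_ts("20070314095600.25-0500"))
```

A value recorded to the minute stays a twelve-digit literal and is reported as minute precision, whereas an xs:dateTime would have to carry seconds that were never recorded.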