Datatypes R2 Issue 110

From HL7Wiki

BIN/ED/ST Problem

Introduction

There is a problem with BIN/ED/ST related to the concept of list. BIN is a list of bits (BN). ED specialises BIN, and ST in turn specialises ED.

So the length of the ST is the number of bits in a representation of the string.


However.....

This is from the javasig code:

in class STjlStringAdapter (implements ST)

 private String _data;
 
 public INT length() {
   return INTlongAdapter.valueOf(this._data.length());
 }

and then this, from class BINbyteArrayImpl (implements ED)

 byte[] _bits;
 
 public INT length() {
   return INTlongAdapter.valueOf(this._bits.length);
 }


There are two ways to look at this. The first is that the return value of STjlStringAdapter.length is wrong: it's not the number of bits needed to represent the string.

The problem with this is that it's obvious that the length of a string is the number of characters, not the number of bits to represent it. There's a little more to this: the number of bits needed to represent the string depends on the character set, but the Java implementation does not know the original character set.

Either way, the length of a string will depend on implementation questions.
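
To make the dependence on character set concrete, here is a minimal sketch using only the standard Java library. The class and method names (StringLengthDemo, charLength, bitLength) are made up for illustration and are not part of javasig.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not javasig code: compares the character count of a
// string with the number of bits needed to encode it in a given charset.
final class StringLengthDemo {

    // What java.lang.String.length() reports: the number of UTF-16 code units.
    static int charLength(String s) {
        return s.length();
    }

    // The number of bits needed to represent the string in the given charset.
    static int bitLength(String s, Charset charset) {
        return s.getBytes(charset).length * 8;
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo"; // "hello" with an e-acute as the second letter
        System.out.println(charLength(s));                             // 5
        System.out.println(bitLength(s, StandardCharsets.ISO_8859_1)); // 40
        System.out.println(bitLength(s, StandardCharsets.UTF_8));      // 48
        System.out.println(bitLength(s, StandardCharsets.UTF_16BE));   // 80
    }
}
```

The same five characters need 40, 48 or 80 bits depending on the encoding, so a String alone cannot answer the "number of bits" question.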

Gunther's take on this is that the length of an ED is the number of "things" in the ED, where the type of thing is defined by the mediatype. (Which becomes really interesting for jpegs...)

There's a corollary here. The length of bits depends on the representation, but the equality definition doesn't - in the equality definition, it says that the character set must be "accounted for" in equality.

Which leads to the following somewhat paradoxical invariant:

invariant(ED x, ED y) {
  x.equal(y).implies(x.length.equal(y.length)).not;
}

(ok, that's not valid, you can't not an implies, but the intent is obvious)

That's the intent of the ED equality definition, however.
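
A small Java sketch of the situation the invariant describes: two values that are equal once the character set is accounted for, but whose raw lengths differ. TextEd is a hypothetical stand-in for a text-valued ED, and contentEquals is an assumed reading of "accounted for"; neither is taken from javasig or the datatypes specification.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EdEqualityDemo {

    // Hypothetical stand-in for a text-valued ED: raw bytes plus their charset.
    static final class TextEd {
        final byte[] data;
        final Charset charset;

        TextEd(String text, Charset charset) {
            this.data = text.getBytes(charset);
            this.charset = charset;
        }

        // Length in the BIN sense: size of the raw representation (in bytes here).
        int rawLength() {
            return data.length;
        }

        // Assumed interpretation of equality that accounts for the charset:
        // decode both sides and compare the resulting characters.
        boolean contentEquals(TextEd other) {
            String mine = new String(data, charset);
            String theirs = new String(other.data, other.charset);
            return mine.equals(theirs);
        }
    }

    public static void main(String[] args) {
        TextEd x = new TextEd("h\u00e9llo", StandardCharsets.ISO_8859_1);
        TextEd y = new TextEd("h\u00e9llo", StandardCharsets.UTF_8);
        System.out.println(x.contentEquals(y));                // true
        System.out.println(x.rawLength() == y.rawLength());    // false (5 vs 6)
    }
}
```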

Subsetting is difficult too. The subset of a ST is an obvious logical entity, but for some character sets (JIS in particular) you simply can't meaningfully extract a series of bytes - you must extract the series of characters and re-encode in a character set.
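
The following sketch illustrates the subsetting problem. It uses UTF-8 as a stand-in for JIS, because JIS encodings are not guaranteed to be available in every Java runtime; the effect of cutting through a multi-byte character is the same in kind. SubsetDemo and its methods are illustrative names only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class SubsetDemo {

    // Naive subset: take bytes [start, end) of the encoded form and decode them.
    static String byteSubset(String s, Charset charset, int start, int end) {
        byte[] all = s.getBytes(charset);
        return new String(Arrays.copyOfRange(all, start, end), charset);
    }

    // Character-level subset: take characters [start, end), re-encode later if needed.
    static String charSubset(String s, int start, int end) {
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        String s = "\u65e5\u672c\u8a9e"; // three Japanese characters, 3 bytes each in UTF-8

        // Cutting after 4 bytes splits the second character; the decoder
        // substitutes U+FFFD for the dangling byte.
        System.out.println(byteSubset(s, StandardCharsets.UTF_8, 0, 4));

        // Extracting characters and re-encoding gives a well-formed result.
        String firstTwo = charSubset(s, 0, 2);
        System.out.println(firstTwo.getBytes(StandardCharsets.UTF_8).length); // 6
    }
}
```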

The upshot of this is that ED isn't really a BIN. While it has contents, it's really a facade around the notion of BIN, and offers smart operations that depend on the mediatype, whereas the BIN operations, which have overlapping definitions, do not depend on mediatype.

Proposal

The effective implication is that ED should not specialise BIN, but should contain a property data:BIN, and should specialise ANY. It will need to add a few properties that it usefully inherits from LIST and wouldn't in the new arrangement (empty, sublist...). The existing invariants about emptiness still apply.

So the definition would look like this:


type EncapsulatedData alias ED specializes ANY {
  BIN     data;
  BL      empty;
  BL      notEmpty;
  INT     length;
  CS      mediaType;
  CS      charset;
  CS      language;
  CS      compression;
  TEL.Url reference;
  BIN     integrityCheck;
  CS      integrityCheckAlgorithm;
  ST      description;
  ED      thumbnail;
  DSET<ED> translation;
  BL      equal(ANY x);
  ED      subPart(INT start, INT end);
};


This has several definitional advantages.

  • overcomes the problems that subList has in relation to ED
  • allows for a clean definition of equality without creating difficulties between ED and BIN
  • means that ST length implementation in javasig becomes correct
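
As a rough illustration of the arrangement proposed above, the sketch below shows an ED-like class that contains a BIN-like value instead of extending it. Bin and Ed are simplified, hypothetical stand-ins (text content only, most properties omitted), and treating length as a character count is an assumption based on the "number of things" reading; none of this is javasig code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EdCompositionSketch {

    // Hypothetical stand-in for BIN: raw data with no knowledge of media type.
    static final class Bin {
        private final byte[] bits;
        Bin(byte[] bits) { this.bits = bits.clone(); }
        int length() { return bits.length; }      // size of the raw representation
        byte[] toBytes() { return bits.clone(); }
    }

    // Hypothetical stand-in for a text ED: has a Bin rather than being one.
    static final class Ed {
        private final Bin data;
        private final Charset charset;

        Ed(Bin data, Charset charset) {
            this.data = data;
            this.charset = charset;
        }

        static Ed ofText(String text, Charset charset) {
            return new Ed(new Bin(text.getBytes(charset)), charset);
        }

        Bin data() { return data; }
        boolean isEmpty() { return data.length() == 0; }
        String text() { return new String(data.toBytes(), charset); }

        // Length counted in characters, independent of the encoding.
        int length() { return text().length(); }

        // subPart works on characters and re-encodes, never on raw bytes.
        Ed subPart(int start, int end) {
            return ofText(text().substring(start, end), charset);
        }
    }

    public static void main(String[] args) {
        Ed e = Ed.ofText("h\u00e9llo", StandardCharsets.UTF_8);
        System.out.println(e.length());                       // 5 characters
        System.out.println(e.data().length());                // 6 bytes of raw data
        System.out.println(e.subPart(1, 3).data().length());  // 3 bytes after re-encoding
    }
}
```

With this shape, Ed.length() and Bin.length() can disagree without contradiction, because they are operations on two different types.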

Alternative

An alternative is to break ST off ED, redefine some of these things on ST, and leave ED as it is, but change the definition of ED equality so it is based on BIN equality.

And have a promotion from ST to ED, so ST can be used in place of ED.


Backwards Compatibility

This has no effect; it's a redefinition with the same meaning.


Status

Proposed