Latest revision as of 11:31, 20 June 2007

Data Types Issue 8: Add STT datatype

Introduction

Create a “translatable string” datatype. This is a follow-on to (and dependent upon) the proposal to add the “translation” property to ED.

Several realms need to be able to send multiple translations of the same text field, for example English and French translations of a title.

Backward compatibility: This element will be semantically backward compatible. However, any schema change has the potential to break implementations that use tight schema validation.

HL7 Canada

Discussion

Translatable String (STT) specializes ED

Definition: The character string data type stands for text data, primarily intended for machine processing (e.g., sorting, querying, and indexing). It is used for names, symbols, and formal expressions. Translatable strings permit the exchange of multiple versions of the string for multiple languages.

The character string is a restricted encapsulated data type (ED), whose type property is fixed to text/plain, and whose data must be inlined and not compressed. Thus, the properties compression, reference, integrity check, algorithm, and thumbnail are not applicable. The character string data type is used when the appearance of text does not bear meaning, which is true for formalized text and all kinds of names.

Table 10: Property Summary of Character String

Name | Type | Description
mediaType | CS | Identifies the type of the encapsulated data and identifies a method to interpret or render the data.
charset | CS | For character-based encoding types, this property specifies the character set and character encoding used. The charset shall be identified by an Internet Assigned Numbers Authority (IANA) Charset Registration in accordance with RFC 2978.

The character string (ST) data type interprets the encapsulated data as character data (as opposed to bits), depending on the charset property of the encapsulated data type.

Definition 43:

type CharacterString alias ST specializes ED {
   INT   length;
   ST    headCharacter;
   ST    tailString;
};


NOTE: Because many of the properties of the encapsulated data are bound to a default value, an ITS need not represent these properties at all. In fact, if the character encoding is also fixed, the ITS only represents the encoded character data.

The headCharacter and tailString properties define ST as a sequence of entities, each of which uniquely identifies one character from the joint set of all characters known by any language of the world. The length of a translatable character string is the sum of the number of characters in the string and the lengths of its translations.

The head of a string is a string of only one character. A character string must have at least one character or else it is NULL. A zero-length string is an exceptional value (NULL), not a proper character string value.

Definition 44:

invariant(STT x) where x.nonNull {
 x.headCharacter.notEmpty;
 x.headCharacter.length.equal(1);
 x.headCharacter.tailString.isEmpty;
 x.tailString.isEmpty.implies(x.length.equal(1));
 x.tailString.notEmpty.implies(x.length
          .equal(x.tailString.length.successor));
};
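The clauses of Definition 44 can be illustrated with a minimal Python sketch. The class `CharString` and the function `satisfies_definition_44` are hypothetical names introduced here for illustration and are not part of any HL7 specification or library. The sketch models only the primary string; it does not model translations or the statement that length includes the lengths of translations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharString:
    """Hypothetical stand-in for a string value (not an HL7 API)."""
    value: str

    @property
    def length(self) -> int:
        # Counts Python code points of the primary string only.
        return len(self.value)

    @property
    def head_character(self) -> "CharString":
        # The head is the string made of the first character, if any.
        return CharString(self.value[:1])

    @property
    def tail_string(self) -> "CharString":
        # The tail is everything after the first character.
        return CharString(self.value[1:])

    @property
    def is_empty(self) -> bool:
        return self.length == 0


def satisfies_definition_44(x: CharString) -> bool:
    """Check the five clauses of the invariant for a candidate value."""
    head, tail = x.head_character, x.tail_string
    return (
        not head.is_empty                                    # headCharacter.notEmpty
        and head.length == 1                                 # headCharacter.length.equal(1)
        and head.tail_string.is_empty                        # headCharacter.tailString.isEmpty
        and (not tail.is_empty or x.length == 1)             # empty tail implies length 1
        and (tail.is_empty or x.length == tail.length + 1)   # length is successor of tail length
    )
```

Under these assumptions a zero-length value fails the first clause, which matches the rule that a zero-length string is NULL rather than a proper value.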


The length of a string is the number of characters, not the number of encoded bytes. Byte encoding is an ITS issue and is not relevant on the application layer.
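A short Python sketch of the difference between character count and encoded byte count follows. The function names are hypothetical, and treating one Python code point as one character is an assumption made for illustration; the source does not define how combining sequences are counted.

```python
def character_length(text: str) -> int:
    """Number of characters (Python code points), independent of encoding."""
    return len(text)


def encoded_length(text: str, charset: str) -> int:
    """Number of bytes once the text is encoded with the given charset."""
    return len(text.encode(charset))


# "caf\u00e9" is "café" with a precomposed e-acute.
word = "caf\u00e9"
print(character_length(word))             # character count, same for every charset
print(encoded_length(word, "utf-8"))      # byte count under UTF-8
print(encoded_length(word, "utf-16-le"))  # byte count under UTF-16 (little endian, no BOM)
```

The character count stays the same whichever charset the ITS uses, while the byte count varies with the encoding.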

The following rules apply to whitespace contained within character strings:

• TAB, space and end-of-line are all considered whitespace characters.
• Both leading and trailing whitespace are significant.
• Different whitespace characters are not interchangeable.
• Different representations of end-of-line are normalised according to the method described in the XML specification [http://www.w3.org/TR/2000/REC-xml-20001006#sec-line-ends].
• Sequences of whitespace cannot be compressed to shorter sequences.
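The whitespace rules above can be sketched in Python as follows. The function names are hypothetical. The end-of-line handling follows the XML 1.0 rule of translating CR LF pairs and lone CR characters to a single LF; nothing else is trimmed, collapsed, or substituted.

```python
def normalize_line_ends(text: str) -> str:
    """Translate CR LF and lone CR to LF, as XML 1.0 line-end handling does."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def strings_equal(a: str, b: str) -> bool:
    """Compare two strings with only end-of-line normalisation applied.

    Leading and trailing whitespace, the kind of whitespace character,
    and the number of whitespace characters all remain significant.
    """
    return normalize_line_ends(a) == normalize_line_ends(b)


print(strings_equal("a\r\nb", "a\nb"))  # differs only in end-of-line representation
print(strings_equal(" a", "a"))         # differs in leading whitespace
print(strings_equal("a\tb", "a b"))     # TAB versus space
print(strings_equal("a  b", "a b"))     # two spaces versus one
```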

Requirement: ST is a specialization of ED so that any RIM attribute which has the type ED can be constrained to an ST. The most important case is Act.text, which is an ED to cater for the use of references and multimedia data, but is often constrained to plain text.

Character String (ST) specializes ED

A string with no translations permitted.

invariant(ST x) where x.nonNull {
 x.translation.isEmpty;
}
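The distinction between STT and ST can be sketched in Python under a few stated assumptions. The class `TranslatableText`, its fields, and the function `is_valid_st` are hypothetical names chosen for illustration; they only mirror the idea that an STT may carry translations while an ST must have none.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TranslatableText:
    """Hypothetical model of a string value with optional translations."""
    value: str
    language: str
    translations: Tuple["TranslatableText", ...] = ()


def is_valid_st(x: TranslatableText) -> bool:
    """Mirror of the ST invariant: a non-null ST has no translations."""
    return len(x.translations) == 0


# An STT-style value: an English title with a French translation.
title = TranslatableText("Title", "en", (TranslatableText("Titre", "fr"),))

# An ST-style value: the same text with no translations.
plain = TranslatableText("Title", "en")

print(is_valid_st(title))  # carries a translation, so it is not a valid ST
print(is_valid_st(plain))  # no translations, so it satisfies the ST invariant
```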

<xsd:complexType name="ST" mixed="true">
   <xsd:complexContent>
       <xsd:restriction base="ED">
           <xsd:sequence>
               <xsd:element name="reference" type="TEL" FIXED />
               <xsd:element name="thumbnail" type="ED" FIXED />
               <xsd:element name="translation" type="SET<ED>" />
           </xsd:sequence>
           <xsd:attribute name="representation" type="cs_BinaryDataEncoding" FIXED />
           <xsd:attribute name="mediaType" type="cs" FIXED />
           <xsd:attribute name="language" type="cs" ... />
           <xsd:attribute name="compression" type="cs_CompressionAlgorithm" FIXED />
           <xsd:attribute name="integrityCheck" type="bin" FIXED />
           <xsd:attribute name="integrityCheckAlgorithm" type="cs_IntegrityCheckAlgorithm" FIXED />
       </xsd:restriction>
   </xsd:complexContent>
</xsd:complexType>

SC should extend STT, not ST.

II.assigningAuthority should probably be STT, not “st”.


No RIM changes, provided that SC extends STT rather than ST.

<There are a few SCs where translations probably aren’t appropriate but codes are (AttentionLine.keyWortText, Batch.name, RelationalExpression.elementName, SortControl.elementName). If deemed appropriate, we could add an SCT (string with code, translatable) instead and change several of the existing RIM attributes from SC to SCT. (Device.manufacturerModelName, Device.softwareName, Employee.jobTitleName)>

Discussion:

Paul Biron: The proposal contains the following:

A string with no translations permitted.

invariant(ST x) where x.nonNull {
 x.translation.isEmpty;
}

I agree that it should be permitted to not have translations, but that invariant says that *all* translations are "empty" *if* the value is non-null. Wait... now I see, this distinguishes ST from STT. However, the schema fragment in "XML changes" includes translation in ST... shouldn't it be restricted out?


Lloyd: Yes. ST should restrict out "translation". Must have left it in place as a typo.


Disposition

Status

INM call Mon Jun 18 2007: Add this with a note about its status, and see what happens at ballot.
