caBIG Flavors of Null
Some of the code sets being considered by the caBIG VCDE working group included concept codes that didn't fit the same category as the rest of the codes. As an example, the Sex/Gender Candidate Data Standard included the code Unknown, which meant "unknown, not observed, not recorded, or refused". Rather than being a possible value representing gender, Unknown really represented a bit of metadata that:
- indicated that a "meaningful" code was not present
- gave a fairly generic reason why the code wasn't there
Among the disadvantages of the approach above:
- software has to be specialized to recognize the absence of information on an element-by-element basis: Unknown in gender, perhaps NA in a second data element, UNK in a third.
- this approach only works with coded fields. How should someone represent the fact that an address is missing because they didn't ask for it? How would they differentiate this from an address that is missing because the patient refused to supply it?
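To make the first disadvantage concrete, here is a minimal Python sketch of the per-element bookkeeping that consuming software ends up doing. The element names (gender, ethnicity, race) and the marker table are hypothetical, chosen only to mirror the Unknown/NA/UNK example above.

```python
from typing import Optional

# Hypothetical per-element "missing" markers. Each code set invented its own,
# so every consumer has to carry this table around.
MISSING_MARKERS = {
    "gender": {"Unknown"},
    "ethnicity": {"NA"},
    "race": {"UNK"},
}


def is_missing(element: str, value: Optional[str]) -> bool:
    """Return True if `value` means "no meaningful value" for `element`.

    Works only for coded elements listed in MISSING_MARKERS; for a free-text
    element such as an address, None is the only signal, and it carries no
    reason for the absence.
    """
    if value is None:
        return True
    return value in MISSING_MARKERS.get(element, set())
```

Note that is_missing("race", "Unknown") returns False, because "Unknown" is only the marker for gender, and is_missing("address", None) returns True with no way to say whether the address was never asked for or was refused.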
This gives rise to a more general question - should this type of metadata be included in individual code sets at all, or should it be generalized in a way that it could apply to any data element? The focus of the "Flavors of Null" working group is to address the following questions:
- Should the fact that information is not supplied and the reason for its absence be treated differently from other types of information?
- If so, how should such information be treated?
Health Level Seven (HL7) has developed one possible solution to this issue. The Version 3 Data Type Specification includes both a test for whether a value is "null", isNull, which "Indicates that a value is an exceptional value, or a NULL-value. A null value means that the information does not exist, is not available or cannot be expressed in the data type's normal value set", and an additional field named nullFlavor that "If a value is an exceptional value (NULL-value), this specifies in what way and why proper information is missing." The specification also provides a table of the possible nullFlavor values.
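A rough Python analogue of the HL7 approach might wrap any value together with an optional null flavor, so that the absence test is the same for every data element, coded or not. This is a minimal sketch, not HL7's actual API: the Value class, the is_null property, and the null_flavor field are illustrative names, and NullFlavor lists only a subset of the HL7 V3 codes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class NullFlavor(Enum):
    # A subset of HL7 V3 NullFlavor codes, with their usual meanings.
    NI = "no information"
    NA = "not applicable"
    UNK = "unknown"
    ASKU = "asked but unknown"
    NAV = "temporarily unavailable"
    NASK = "not asked"
    MSK = "masked"
    OTH = "other"


@dataclass(frozen=True)
class Value(Generic[T]):
    """Any data element's value, or a null flavor explaining its absence."""

    value: Optional[T] = None
    null_flavor: Optional[NullFlavor] = None

    def __post_init__(self) -> None:
        # Exactly one of the two must be supplied.
        if (self.value is None) == (self.null_flavor is None):
            raise ValueError("supply exactly one of value or null_flavor")

    @property
    def is_null(self) -> bool:
        return self.null_flavor is not None
```

With this wrapper, an address that was never requested becomes Value(null_flavor=NullFlavor.NASK), and the same is_null check applies whether the element is a gender code or a free-text address.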
Separation of Flavors of Null (FoN) from Valid Values: David submits that we should delete Proposed Decision 1. Harold's email of 4/6/05 lays out a proposed coding policy based on the view that the so-called Flavors of Null are metadata, not data. I disagree with this assessment. "Unknown" or "Not reported" are, in my view, legitimate values for many data elements. They don't overlap with other values (say, of shoe color), and they provide a vehicle to represent the highest quality of information known about shoe color at the time of data collection. I don't see them as "data about data"; they are the actual data, and they should be available in valid value lists as the curator of the CDE deems appropriate.
- Flavors of null should be separate from coded values. Coded values should define "legitimate" value meanings for a data element. Nullability and nullFlavor should be defined as attributes of any data element.
- EVS should be "robustly stocked with terms that cover the different flavors of null".
- All of the flavors should be rooted at a single domain
- All data elements should be "nullable" and should have an optional(?) attribute that allows the reason for omission to be specified
- The data element definition process should allow the reasons for omission to be constrained (restricted) and, potentially, the "nullable" attribute to be set to false
- These characteristics should be specified at the "composite element" or "message building" level rather than the individual element level to avoid having to provide double definitions.
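The proposals above can be sketched together in Python: a flavor hierarchy with a single root, a nullable flag on each element definition, and a per-element set of permitted flavors. Everything here is hypothetical. The parent table is loosely modeled on the HL7 V3 NullFlavor hierarchy (rooted at NI) but includes only a few codes, and the constraints are attached to individual elements for simplicity even though the last proposal suggests specifying them at the composite or message level.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Single-rooted flavor hierarchy: each flavor maps to its parent.
ROOT = "NI"  # no information
PARENT = {
    "UNK": "NI",    # unknown
    "ASKU": "UNK",  # asked but unknown
    "NAV": "ASKU",  # temporarily unavailable
    "NASK": "UNK",  # not asked
    "MSK": "NI",    # masked
    "NA": "NI",     # not applicable
}


def is_a(flavor: str, ancestor: str) -> bool:
    """True if `flavor` equals `ancestor` or lies beneath it in the hierarchy."""
    if flavor != ROOT and flavor not in PARENT:
        raise ValueError(f"unknown null flavor: {flavor}")
    while flavor != ancestor:
        if flavor == ROOT:
            return False
        flavor = PARENT[flavor]
    return True


@dataclass(frozen=True)
class DataElementDef:
    """Hypothetical data element definition carrying null-handling rules."""

    name: str
    nullable: bool = True
    # A flavor is accepted if it falls under any entry; default: anything.
    allowed_flavors: FrozenSet[str] = frozenset({ROOT})

    def accepts_null(self, flavor: str) -> bool:
        return self.nullable and any(is_a(flavor, a) for a in self.allowed_flavors)
```

For example, DataElementDef("gender", allowed_flavors=frozenset({"UNK"})) accepts ASKU and NAV (both fall under UNK) but rejects MSK, while an element defined with nullable=False rejects every flavor.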
- White paper with recommendations on how caBIG should address Flavors of NULL. This would include:
- a description of the problem
- what other standards bodies have done
- use cases in caBIG
- outstanding issues
- A recommendation on the process to have the white paper acted upon by caBIG
Examples of null flavors
From ISO 639-2
Section 4.1.2, Special situations: The language code mul (for multiple languages) should be applied when several languages are used and it is not practical to specify all the appropriate language codes.
The language code und (for undetermined) is provided for those situations in which a language or languages must be indicated but the language cannot be identified.
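The ISO 639-2 special codes are flavors of null embedded directly in the code set. A minimal Python sketch of how a data-entry tool might fall back to them follows. The function name is illustrative, and the MAX_LISTED cutoff, standing in for "not practical to specify all the appropriate language codes", is a hypothetical assumption, since the passage gives no number.

```python
from typing import List, Sequence

# Hypothetical cutoff: beyond this many distinct languages, listing them all
# is treated as "not practical" and the special code mul is used instead.
MAX_LISTED = 3


def language_codes(identified: Sequence[str]) -> List[str]:
    """Return the ISO 639-2 codes to record for a resource.

    `identified` holds the language codes that could be identified.
    """
    if not identified:
        # A language must be indicated but none could be identified.
        return ["und"]
    unique = list(dict.fromkeys(identified))  # de-duplicate, keep order
    if len(unique) > MAX_LISTED:
        return ["mul"]
    return unique
```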
Examples and Use cases
Here are several examples of so-called Flavors of Null from Ardais clinical information collection systems.
1. Default values for structured data fields from Ardais:
- Not sought
- Not reported
2. Histochemistry result set from Ardais:
- Positive
- Negative
- Equivocal
- Reported as Performed or Reviewed
These four values are the controlled vocabulary offered to abstractors of pathology reports in which a pathologist has dictated histochemistry results. After analyzing several hundred pathology reports, we decided that these values appropriately represent the information a genomics researcher wants or needs to know about histochemistry tests, and we rolled up some vague or parochial interpretations into this set. A free-text note field is available for copying and pasting other descriptive information.
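The histochemistry vocabulary and its companion note field could be modeled along the following lines. This is a hypothetical Python sketch, not the Ardais implementation; the class and member names are illustrative, while the four labels are the ones listed above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class HistochemistryResult(Enum):
    """The four values offered to abstractors (labels as listed above)."""

    POSITIVE = "Positive"
    NEGATIVE = "Negative"
    EQUIVOCAL = "Equivocal"
    PERFORMED_OR_REVIEWED = "Reported as Performed or Reviewed"


@dataclass
class HistochemistryFinding:
    result: HistochemistryResult
    # Free-text note for other descriptive information from the report.
    note: Optional[str] = None
```

An abstractor's selection can be looked up by its label, e.g. HistochemistryResult("Equivocal"), which raises ValueError for anything outside the controlled vocabulary.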