caBIG Flavors of Null
Some of the code sets being considered by the caBIG VCDE working group included concept codes that didn't fit the same category as the rest of the codes. As an example, the Sex/Gender Candidate Data Standard included the code Unknown, which meant "unknown, not observed, not recorded, or refused". Rather than being a possible value representing gender, Unknown really represented a bit of metadata that:
- indicated that a "meaningful" code was not present
- gave a fairly generic reason why the code wasn't there
Among the disadvantages of the approach above:
- software has to be specialized to recognize the absence of information on an element-by-element basis: Unknown in gender, perhaps NA in a second data element, UNK in a third.
- this approach only works with coded fields. How should someone represent the fact that an address is missing because they didn't ask for it? How would they differentiate this from an address that is missing because the patient refused to supply it?
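The first disadvantage can be illustrated with a minimal sketch. The element names `second_element` and `third_element` and the lookup table are hypothetical, chosen only to echo the Unknown/NA/UNK example above; they are not taken from any caBIG code set.

```python
# Hypothetical per-element "no information" codes, echoing the examples above.
SENTINELS = {
    "gender": {"Unknown"},
    "second_element": {"NA"},
    "third_element": {"UNK"},
}


def is_missing(element: str, value: str) -> bool:
    """Return True if value is the 'no information' code for this element.

    Every new data element needs its own entry in SENTINELS, which is the
    element-by-element specialization described above.
    """
    return value in SENTINELS.get(element, set())
```

A free-text element such as an address has no entry in the table, so `is_missing("address", "")` returns `False`, and nothing in this scheme can say whether the address was never requested or was refused.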
This gives rise to a more general question - should this type of metadata be included in individual code sets at all, or should it be generalized in a way that it could apply to any data element? The focus of the "Flavors of Null" working group is to address the following questions:
- Should the fact that information is not supplied and the reason for its absence be treated differently from other types of information?
- If so, how should such information be treated?
Health Level Seven (HL7) has developed a possible approach to these issues. The Version 3 Data Type Specification includes both a method, isNull, that tests whether a value is "null" (it "Indicates that a value is an exceptional value, or a NULL-value. A null value means that the information does not exist, is not available or cannot be expressed in the data type's normal value set") and an additional field named "nullFlavor" ("If a value is an exceptional value (NULL-value), this specifies in what way and why proper information is missing."). The specification includes a table of the possible values.
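As a rough illustration of the HL7 idea, a value can carry either real content or a reason for its absence. The sketch below is not the HL7 data type itself: the `Value` class and its field names are assumptions made for this example, and the codes shown are only a subset of the HL7 table, recalled here with paraphrased meanings that should be checked against the specification.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class NullFlavor(Enum):
    # Subset of HL7 V3 nullFlavor codes; descriptions are paraphrased.
    NI = "no information"
    UNK = "unknown"
    ASKU = "asked but unknown"
    NASK = "not asked"
    NA = "not applicable"


@dataclass(frozen=True)
class Value(Generic[T]):
    """Hypothetical wrapper: holds a proper value or a null flavor, never both."""

    value: Optional[T] = None
    null_flavor: Optional[NullFlavor] = None

    def __post_init__(self) -> None:
        # Exactly one of the two fields must be populated.
        if (self.value is None) == (self.null_flavor is None):
            raise ValueError("set exactly one of value or null_flavor")

    @property
    def is_null(self) -> bool:
        # Mirrors the isNull test: True when the value is exceptional.
        return self.null_flavor is not None
```

Under this sketch, an address that was never requested could be recorded as `Value(null_flavor=NullFlavor.NASK)`, while one that was requested but not obtained could use `NullFlavor.ASKU`. Treating ASKU as the closest match for a refusal is an assumption, not something the quoted text states.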
Separation of FoN from Valid Values - David submits that we should delete Proposed Decision 1. Harold's email of 4/6/05 lays out a proposed coding policy based on a view that the so-called Flavors of Null are metadata, not data itself. I disagree with this assessment. "Unknown" or "Not reported" are, in my view, legitimate values for many data elements. They represent the highest quality of information known about, say, shoe color at the time of data collection. I don't see them as "data about data"; they are the actual data themselves, and should be available in valid value lists as deemed appropriate by the curator of the CDE.
4/27/05 - Having just entered several use cases from the Ardais experience, I would attempt to draw some conclusions.
First, I like the HL7 definitions of "Null" and "NullFlavor": "a null value means that the information does not exist, is not available or cannot be expressed in the data type's normal value set" and a NullFlavor "specifies in what way and why proper information is missing." I use the "Null" definition in the context of populating cells in a relational database, so a NULL means nothing could be posted to that cell.
- Flavors of null should be separate from coded values. Coded values should define "legitimate" value meanings for a data element. Nullability and nullFlavor should be defined as attributes of any data element.
- EVS should be "robustly stocked with terms that cover the different flavors of null".
- All of the flavors should be rooted at a single domain
- All data elements should be "nullable" and should have an optional(?) attribute that allows the reason for omission to be specified
- The data element definition process should allow the reasons for omission to be constrained (restricted) and, potentially, the "nullable" attribute to be set to false
- These characteristics should be specified at the "composite element" or "message building" level rather than the individual element level to avoid having to provide double definitions.
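The proposals above (every element nullable by default, an optional reason for omission, and a definition process that can restrict the reasons or switch nullability off) might be sketched as follows. The `DataElement` class, its field names, and the reason strings are all hypothetical. For brevity the attributes sit on the element itself, although the last proposal would place them at the composite or message-building level.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Hypothetical reason vocabulary; the proposals do not fix one.
ALL_REASONS = frozenset(
    {"not asked", "asked but unknown", "not applicable", "unknown"}
)


@dataclass(frozen=True)
class DataElement:
    name: str
    valid_values: FrozenSet[str]
    nullable: bool = True  # nullable unless the definition says otherwise
    allowed_reasons: FrozenSet[str] = ALL_REASONS  # may be restricted

    def check(self, value: Optional[str], reason: Optional[str] = None) -> bool:
        """Accept a legitimate value, or an omission this definition permits."""
        if value is not None:
            # A proper value must come from the coded list and carry no reason.
            return reason is None and value in self.valid_values
        if not self.nullable:
            return False
        # The reason is optional; when given, it must be one allowed here.
        return reason is None or reason in self.allowed_reasons
```

With `gender = DataElement("gender", frozenset({"Female", "Male"}))`, the call `gender.check("Unknown")` is rejected because "Unknown" is no longer a coded value, while `gender.check(None, "not asked")` is accepted as an omission with a stated reason.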
- White paper with recommendations on how caBIG should address Flavors of NULL. This would include:
  - a description of the problem
  - what other standards bodies have done
  - use cases in caBIG
  - outstanding issues
- A recommendation on the process to have the white paper acted upon by caBIG
Examples of null flavors
From ISO 639-2
4.1.2 Special situations The language code mul (for multiple languages) should be applied when several languages are used and it is not practical to specify all the appropriate language codes.
The language code und (for undetermined) is provided for those situations in which a language or languages must be indicated but the language cannot be identified.
Examples and Use cases
Here are several examples of so-called Flavors of Null from Ardais clinical information collection systems. In one of our teleconferences I noted that we collect data by abstraction of clinical documents, without access to patients or health care providers, leading to document-centric rather than patient-centric needs.
1. "Default" values for structured data fields:
- Not sought
- Not reported
We avoid NULL values (meaning no posting to the data table cell) for most structured fields. When new records are instantiated, structured data fields are automatically defaulted to the value "Not sought". A data entry person can override this default with "Not reported" or any value from the field-specific controlled vocabulary (CV) pick list. I call "Not reported" a "Default" for our present discussion because it, like "Not sought", is offered for essentially all structured fields.
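The defaulting behavior described above can be sketched in a few lines. This is an illustration only, not the Ardais implementation; the `StructuredField` class and its method names are hypothetical.

```python
NOT_SOUGHT = "Not sought"
NOT_REPORTED = "Not reported"


class StructuredField:
    """Hypothetical structured field that never holds a database NULL."""

    def __init__(self, name, vocabulary):
        self.name = name
        # Both "default" values are offered alongside the field-specific CV.
        self.allowed = set(vocabulary) | {NOT_SOUGHT, NOT_REPORTED}
        # New records start at "Not sought" rather than NULL.
        self.value = NOT_SOUGHT

    def set(self, value):
        # Data entry may override the default with any offered value.
        if value not in self.allowed:
            raise ValueError(f"{value!r} is not offered for {self.name}")
        self.value = value
```

A field created as `StructuredField("exam_summary", ["Normal", "Abnormal"])` reads "Not sought" until an abstractor sets it to "Not reported" or to a value from its pick list.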
2. Histochemistry result set:
- Positive
- Negative
- Equivocal
- Reported as Performed or Reviewed
These 4 values are the CV offered to abstractors of pathology reports (in addition to "Not sought" and "Not reported") wherein a pathologist has dictated histochemistry results. After analyzing several hundred pathology reports, we decided that these values appropriately represent the information a genomics researcher wants or needs to know regarding histochemistry tests, and we rolled up some vague and/or parochial interpretations into this set. A text note field is available for copy/paste of other descriptive information.
3. Normal-Abnormal-Not Done result set:
- Normal
- Abnormal
- Not done
These 3 values are the CV offered to abstractors for physical exam summary findings (in addition to "Not sought" and "Not reported"). To the so-called Flavors of Null in #1 above, we have added "Not done" to reflect documentation that the health care provider did not do a specific part of an exam.
4. Treatment Response result set:
- Disease stable
- Disease progression
- Disease in remission
- Uncertain
These values represent a high-level assessment of response to treatment. Sometimes the response is documented as uncertain, for which we offer "Uncertain". If this is acceptable in this domain, why not in a domain such as Gender?
5. Clinical Test Result result set:
- Normal
- Abnormal High
- Abnormal Low
- Not specified
This value set is intended for either an automated lab data feed or a human abstractor to document the abnormal result flags that some lab systems provide. "Not specified" is provided in case the data source does not include such an indicator.