CaBIG Flavors of Null
caBIG Missing Value Reason (MVR) (formerly: Flavors of Null)
Some of the code sets being considered by the caBIG VCDE working group included concept codes that didn't fit the same category as the rest of the codes. As an example, the Sex/Gender Candidate Data Standard included the code Unknown, which meant "unknown, not observed, not recorded, or refused". Rather than being a possible value representing gender, Unknown really represented a bit of metadata that:
- indicated that a "meaningful" code was not present
- gave a fairly generic reason why the code wasn't there
Among the disadvantages of the approach above:
- software has to be specialized to recognize the absence of information on an element-by-element basis: Unknown in gender, perhaps NA in a second data element, UNK in a third.
- this approach only works with coded fields. How should someone represent the fact that an address is missing because they didn't ask for it? How would they differentiate this from an address that is missing because the patient refused to supply it?
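The first disadvantage can be illustrated with a minimal Python sketch. Only the codes Unknown, NA and UNK come from the discussion above; the element names `tumor_grade` and `smoking_status` and the `is_missing` helper are hypothetical and exist only to show the shape of the problem.

```python
# Per-element "no information" codes. Only "Unknown", "NA" and "UNK" are taken
# from the text; the element names are invented for illustration.
MISSING_CODES = {
    "gender": {"Unknown"},
    "tumor_grade": {"NA"},
    "smoking_status": {"UNK"},
}


def is_missing(element: str, value: str) -> bool:
    """Return True if `value` is the 'no information' code for `element`.

    Every new data element needs its own entry in MISSING_CODES, which is
    the element-by-element specialization described above.
    """
    return value in MISSING_CODES.get(element, set())


record = {"gender": "Unknown", "tumor_grade": "2", "smoking_status": "UNK"}
missing = sorted(name for name, value in record.items() if is_missing(name, value))
print(missing)
```

An element that has no entry in the lookup table, such as a free-text address, can never be reported as missing by this scheme, which is the second disadvantage in the list.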
This gives rise to a more general question - should this type of metadata be included in individual code sets at all, or should it be generalized in a way that it could apply to any data element? The focus of the "Flavors of Null" working group is to address the following questions:
- Should the fact that information is not supplied and the reason for its absence be treated differently from other types of information?
- If so, how should such information be treated?
Health Level Seven (HL7) has developed a possible approach and solution to these issues. The Version 3 Data Type Specification includes both a method to test whether the value is "null", isNull, which "Indicates that a value is an exceptional value, or a NULL-value. A null value means that the information does not exist, is not available or cannot be expressed in the data type's normal value set", and an additional field named "nullFlavor" that "If a value is an exceptional value (NULL-value), this specifies in what way and why proper information is missing." The table of possible values can be found here.
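As a rough illustration of the HL7 idea, the sketch below pairs any value with an optional null flavor. It is not the HL7 data type itself: the `Element` class and its attribute names are hypothetical, and the enumeration lists only a small subset of the HL7 NullFlavor codes. Which code best fits a case such as "patient refused" is a modeling decision this sketch does not settle.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class NullFlavor(Enum):
    """Illustrative subset of HL7 V3 NullFlavor codes (not the full table)."""
    NI = "no information"
    UNK = "unknown"
    ASKU = "asked but unknown"
    NASK = "not asked"
    NA = "not applicable"


@dataclass(frozen=True)
class Element(Generic[T]):
    """Hypothetical wrapper: either a proper value or a reason it is absent."""
    value: Optional[T] = None
    null_flavor: Optional[NullFlavor] = None

    def __post_init__(self) -> None:
        # Exactly one of value / null_flavor must be populated.
        if (self.value is None) == (self.null_flavor is None):
            raise ValueError("supply exactly one of value or null_flavor")

    @property
    def is_null(self) -> bool:
        """Counterpart of the isNull test: True when no proper value exists."""
        return self.null_flavor is not None


address_known = Element("12 Main St")
address_not_asked = Element(null_flavor=NullFlavor.NASK)
address_asked = Element(null_flavor=NullFlavor.ASKU)

print(address_known.is_null, address_not_asked.is_null)
print(address_not_asked.null_flavor != address_asked.null_flavor)
```

Because the reason travels alongside the value rather than inside its code set, the same mechanism applies to a free-text address as readily as to a coded field.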
Separation of FoN from Valid Values - David submits that we should delete Proposed Decision 1. Harold's email of 4/6/05 lays out a proposed coding policy based on a view that the so-called Flavors of Null are metadata, not data itself. I disagree with this assessment. "Unknown" or "Not reported" are, in my view, legitimate values for many data elements. I don't see them as "data about data"; they are the actual data themselves, and should be available in valid value lists as deemed appropriate by the curator of the CDE.
Meaningfulness of "FoN" 4/27/05 - Having just entered several use cases from the Ardais experience, I (David) would attempt to draw some conclusions.
I like the HL7 definitions of "Null" and "NullFlavor": "a null value means that the information does not exist, is not available or cannot be expressed in the data type's normal value set" and a NullFlavor "specifies in what way and why proper information is missing." I use the "Null" definition in the context of populating cells in a relational database, so a NULL means nothing could be, or was, posted to that cell.
The issue seems to be whether or not the following are considered "meaningful" values - "Not specified", "Uncertain", "Not done", "Reported as performed or reviewed", "Equivocal", "Not reported", "Not sought". I think that, in and of themselves, they often, maybe always, are meaningful. They allow representation of the highest quality information known about a domain at a specific time.
Each of these values should be available in EVS to be included as Valid Values in caDSR CDE construction, just as values such as "Normal", "Positive" and "Overexpresses" are.
I would ask, again (I know), that we reconsider calling the illustrative values "Flavors of Null", as cool sounding as that is. PS: the link to the HL7 flavors does not work now.
Update on HL7 Flavors
- Missing Value Reasons are defined only in terms of the 'Grid level' and not in terms of application design and databases
- Define a core MVR terminology for inclusion in EVS which can be added to and extended as needed.
- EVS should be "robustly stocked with terms that cover the different flavors of null".
- ? Should we start with HL7 NullFlavor as the core?
- Make recommendations to the architecture group regarding:
- Linking MVR's to the corresponding CDE field
- Representational forms
- Recommendation to CDE developers: depending on the application context, either allow specification of the reason for the missing value, or do not so specify. If specification is desired, either of these approaches may be used for individual CDEs:
- Include MVR values as Valid Values for that individual enumerated CDE
- Create an associated CDE that enumerates MVR Valid Values for that individual CDE
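The two approaches for individual CDEs can be sketched as follows. This is a minimal illustration under stated assumptions: the exam-finding element, its values "Normal" and "Abnormal", and both validation functions are hypothetical, and the MVR terms are simply two of the values discussed on this page.

```python
from typing import Optional

# Two of the values discussed on this page, used here as a stand-in MVR list.
MVR_TERMS = frozenset({"Not sought", "Not reported"})

# Approach 1: MVR terms are listed among the Valid Values of the CDE itself.
EXAM_FINDING_VALID_VALUES = frozenset({"Normal", "Abnormal"}) | MVR_TERMS


def validate_inline(value: str) -> bool:
    """A single field holds either a substantive value or an MVR term."""
    return value in EXAM_FINDING_VALID_VALUES


# Approach 2: the CDE keeps only substantive values, and an associated CDE
# carries the reason whenever the main value is absent.
EXAM_FINDING_VALUES = frozenset({"Normal", "Abnormal"})


def validate_paired(value: Optional[str], reason: Optional[str]) -> bool:
    """Accept a substantive value with no reason, or no value with a reason."""
    if value is not None:
        return value in EXAM_FINDING_VALUES and reason is None
    return reason in MVR_TERMS


print(validate_inline("Not reported"))
print(validate_paired(None, "Not sought"))
print(validate_paired("Normal", "Not sought"))
```

The first approach keeps one field but mixes reasons into the value list; the second keeps the value list clean at the cost of a second element to define and populate.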
Other Decision Proposals that need to be agreed on or discarded
- Flavors of null should be separate from coded values. Coded values should define "legitimate" value meanings for a data element. Nullability and nullFlavor should be defined as an attribute of any data element. (Addressed by 2.1 above)
- All of the flavors should be rooted at a single domain. (Will be decided along w/ 1. above)
- All data elements should be "nullable" and should have an optional(?) attribute that allows the reason for omission to be specified. (Discard - out of scope)
- The data element definition process should allow the reasons for omission to be constrained (restricted) and, potentially, the "nullable" attribute to be set to false. (Absorbed under 3. above; nullability is not relevant)
- These characteristics should be specified at the "composite element" or "message building" level rather than the individual element level to avoid having to provide double definitions. (Architectural decision - out of scope)
- White paper with recommendations on how caBIG should address MVR's. This would include:
- a description of the problem
- what other standards bodies have done
- use cases in caBIG
- outstanding issues
- A recommendation on the process to have the white paper acted upon by caBIG
Examples of null flavors
From ISO 639-2
4.1.2 Special situations: The language code mul (for multiple languages) should be applied when several languages are used and it is not practical to specify all the appropriate language codes.
The language code und (for undetermined) is provided for those situations in which a language or languages must be indicated but the language cannot be identified.
Examples and Use cases
Here are several examples of so-called Flavors of Null from Ardais clinical information collection systems. In one of our teleconferences I noted that we collect data by abstraction of clinical documents, without access to patients or health care providers, leading to document-centric needs rather than patient-centric ones.
1. "Default" values for structured data fields:
Not sought, Not reported
We avoid NULL values (meaning no posting to the data table cell) for most structured fields. When new records are instantiated, structured data fields are automatically defaulted to the value "Not sought". A data entry person can override this default with "Not reported" or any value from the field-specific controlled vocabulary (CV) pick list. I call "Not reported" a "Default" for our present discussion because it, like "Not sought", is offered for essentially all structured fields.
- If (a) the focus is on the 'currency of the exchange' and (b) we view it from the redefined scope of "Missing Value Reason", both of these fields are MVR's.
- The fact that they are represented in the same table cell as the real values and the fact that there is a default value are both outside the scope of this group.
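The defaulting behavior described in this use case might look like the sketch below. It is not the Ardais implementation: the `StructuredField` class, its `record` method, and the `exam_finding` field are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

NOT_SOUGHT = "Not sought"
NOT_REPORTED = "Not reported"


@dataclass
class StructuredField:
    """Hypothetical structured field that is never left as a database NULL."""
    name: str
    controlled_vocabulary: frozenset
    value: str = NOT_SOUGHT  # default assigned when the record is instantiated

    def record(self, new_value: str) -> None:
        """Override the default with "Not reported" or a field-specific CV term."""
        allowed = self.controlled_vocabulary | {NOT_SOUGHT, NOT_REPORTED}
        if new_value not in allowed:
            raise ValueError(f"{new_value!r} is not valid for {self.name}")
        self.value = new_value


finding = StructuredField("exam_finding", frozenset({"Normal", "Abnormal"}))
initial_value = finding.value
finding.record(NOT_REPORTED)
print(initial_value, "->", finding.value)
```

Here the two "default" values are added to every field's pick list automatically, which mirrors the statement that they are offered for essentially all structured fields.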
2. Histochemistry result set:
Positive, Negative, Equivocal, Reported as Performed or Reviewed
These 4 values are the CV offered to abstractors of pathology reports (in addition to "Not sought" and "Not reported") wherein a pathologist has dictated histochemistry results. After analyzing several hundred path reports, we decided these appropriately represent the information a genomics researcher wants/needs to know regarding histochemistry tests, and rolled up some vague and/or parochial interpretations into this set. A text note field is available for copy/paste of other descriptive information.
- David clarified the meaning of the fourth value in that it implied that the results were normal or unremarkable. Based on this, it would not be considered a valid MVR.
3. Normal-Abnormal-Not Done result set:
Normal, Abnormal, Not done
These 3 values are the CV offered to abstractors for physical exam summary findings (in addition to "Not sought" and "Not reported"). To the so-called Flavors of Null in #1 above, we have added "Not done" to reflect documentation that the health care provider did not do a specific part of an exam.
- Not done is probably an MVR, although the example of 'abdominal palpation not done' might be a little fuzzy.
4. Treatment Response result set:
Disease stable, Disease progression, Disease in remission, Uncertain
These values represent a high level assessment of response to treatment. Sometimes it is documented as uncertain, for which we offer "Uncertain". If this is acceptable in this domain, why not in a domain such as Gender?
- The consensus was that "uncertain", when used in this context, was a value, not an MVR. "Unknown", however, isn't as clear and would need to be decided on a case-by-case basis by the CDE author.
5. Clinical Test Result result set:
Normal, Abnormal High, Abnormal Low, Not specified
This value set is intended for either an automated lab data feed or a human abstractor to document abnormal result flags that some lab systems provide. "Not specified" is provided in case the data source does not include such an indicator.
- It was clarified that "Not specified" actually meant that the information wasn't supplied by the source system.
- The consensus was that this was definitely an MVR, although the data element designer would probably want to try to be more specific than just plain "not specified".
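For the automated lab feed case, the mapping onto this value set could be sketched as below. The single-letter source flags "N", "H" and "L" and the function name are assumptions made for illustration; an actual feed would define its own flag codes.

```python
from typing import Optional

# Hypothetical source-system flags mapped onto the value set above.
FLAG_TO_VALUE = {
    "N": "Normal",
    "H": "Abnormal High",
    "L": "Abnormal Low",
}


def abnormal_flag_value(source_flag: Optional[str]) -> str:
    """Map a lab feed flag to the value set.

    "Not specified" is returned only when the source supplied no indicator.
    A flag that is present but unrecognized is a different problem and is
    reported as an error instead of being folded into the MVR.
    """
    if source_flag is None or not source_flag.strip():
        return "Not specified"
    flag = source_flag.strip().upper()
    if flag not in FLAG_TO_VALUE:
        raise ValueError(f"unrecognized abnormal flag: {source_flag!r}")
    return FLAG_TO_VALUE[flag]


print(abnormal_flag_value("h"))
print(abnormal_flag_value(None))
```

Keeping "no indicator supplied" separate from "indicator not understood" is one way to make the MVR more specific than plain "not specified", as the consensus above suggests.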
CDC Common Data Elements
File:CDC - Data Elements.pdf
You may find pages 13 through 17 useful. Also, if you press Ctrl+F and search for “unknown”, “not specified”, etc., there will be additional information that you might find useful.
File:AE SPORES (FON-Unkown) Form2171260 v3 0(1).xls
File:CaBIG Lung Study (FON-Varied) Form2193149 v1 0(1).xls
File:CTEP Bone Sarcoma (FON-Unknown) Form0 v3 0(1).xls
File:CTEP Brain Tumor (FON-Unknown) Form0 v3 0(1).xls
File:CTEP Waldenstroms (FON-Varied) Form0 v3 0(2).xls
Self Reported Sex Type
|Data Elements||Definition||Data Type||Permissible Values|
|Self Reported Person Sex Text Type 2200602||The text that describes the assemblage of physical properties or qualities by which male is distinguished from female; the physical difference between male and female within a person.
Explanatory Comment: Identification of sex is usually based upon self-report and may come from a form, questionnaire, interview, etc