Canonical Pedigree Project
- 2011-10-01: The Canonical Pedigree Project is being suspended as a standalone effort. It is now being merged into a larger project to update the existing HL7 pedigree standard: the so-called Family History revision 2 project.
Call For Participation
As of January 2011, the Canonical Pedigree Project has the blessing of the HL7 Technical Steering Committee. It's time to kick it into high gear.
If you would like to participate, on either the technical end -- the aspects dealing with XML, HL7 messages, XML tools -- or the clinical end -- creating relevant family health histories, and dealing with the inevitable ambiguity -- there are two things you need to do. First, subscribe to HL7's clingenomics (short for Clinical Genomics) mailing list. Second, send an email to Scott Bolte, the project leader, and indicate your area of interest. In addition to the mailing list, we will periodically use the Clinical Genomics weekly conference call.
To give you a feel for what type of assistance will be needed, here is a partial list:
- Repository of clinical examples: These will be reference messages that can be used for pedigree creation or pedigree exchange testing.
  - Text narratives for constructing simple family histories, each with its corresponding XML message and optional pedigree drawing.
  - Narratives for unusual, but clinically relevant, family structures. Once again, each narrative will have its corresponding XML message and pedigree drawing.
- Technical how-to guides: Detailed instructions on how to use specific applications to perform a common task. Sample tasks include:
- Pedigree system inventory: A list of applications that do, or plan to, create, exchange, or analyze pedigree messages.
- Clinical procedures: Document clinical practices involving family histories that may not be obvious to the non-practitioner.
Canonical Pedigree Project Overview
The Canonical Pedigree Project (CPP) was conceived by the HL7 Clinical Genomics work group in 2010. At the January 2011 work group meeting in Sydney, the Technical Steering Committee approved its project scope statement. The project is intended to improve adoption of the standard V3 pedigree message. It has three aspects:
- Reference Pedigrees: Provide reference pedigree messages with corresponding text descriptions of the family history. These are intended as a resource for verifying family history collection software.
- Interoperability Testing: The internal storage of a pedigree is up to the host system. Furthermore, some interoperability standards want to represent a pedigree using alternate formats (e.g. CCD, vMR, clinical statements, etc.). The canonical pedigree project shall provide test guidance to verify that host systems and alternate formats are able to accurately maintain the relationships in the reference pedigrees. If full fidelity cannot be maintained, the guidance will help quantify the loss of fidelity.
- Clinical Power: Many systems provide support only for simplified family histories. For example, they will capture that there were two instances of aunts with breast cancer. That simplified perspective is in contrast with one that maintains maternal vs. paternal line, the number of available aunts, and clinical details such as age of onset. The intent behind this facet of CPP is to quantify the clinical benefits of improving the granularity of family and clinical histories.
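The contrast described under Clinical Power can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the `Relative` record, its field names, and the sample ages are hypothetical and are not taken from the HL7 pedigree message or from any real patient. It shows two quite different detailed histories collapsing to the same simplified summary.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Relative:
    """Hypothetical detailed record for one relative."""
    relation: str                 # e.g. "aunt"
    line: str                     # "maternal" or "paternal"
    condition: Optional[str]      # None if unaffected
    age_of_onset: Optional[int]   # None if unaffected or unknown


def simplify(relatives):
    """Collapse a detailed history to counts of affected relatives per relation."""
    return Counter((r.relation, r.condition) for r in relatives if r.condition)


# Illustrative data only: both onsets early, both on the maternal line.
history_a = [
    Relative("aunt", "maternal", "breast cancer", 41),
    Relative("aunt", "maternal", "breast cancer", 44),
]

# Illustrative data only: later onsets, split across lines, with unaffected aunts.
history_b = [
    Relative("aunt", "maternal", "breast cancer", 67),
    Relative("aunt", "paternal", "breast cancer", 72),
    Relative("aunt", "paternal", None, None),
    Relative("aunt", "paternal", None, None),
]

# Both histories reduce to "two aunts with breast cancer".
same_summary = simplify(history_a) == simplify(history_b)
```

Once simplified, the family line, the number of unaffected aunts, and the ages of onset cannot be recovered, which is the information loss this facet of CPP aims to quantify.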
To learn more about the goals or status of CPP contact the project leader, currently Scott Bolte of GE Healthcare.
The adoption of the HL7 Pedigree standard message is inhibited by a lack of interoperability testing. That has led to multiple systems that are generating incompatible messages. While wider use of the HL7-provided message schema would partially address that problem, there is still a problem verifying when messages are equivalent. The following examples illustrate the problem without requiring detailed knowledge of XML, the markup language used to capture a pedigree message.
- Well-Formed: An XML message is said to be well-formed if it conforms to the high-level XML syntax rules. For people unfamiliar with XML, here is a more familiar example of a street address for a letter that would be considered well-formed:
80 Old Faithful John Ranger
- Valid: An XML message is valid if it conforms to a schema definition that dictates the allowed content and ordering of elements. In the previous example, though a person may be able to guess how to send the letter to Mr. Ranger with its well-formed address, the address is actually invalid. The street element improperly comes before the person name element, and the state and ZIP code elements are missing. An address that is both well-formed and valid according to generally accepted schema rules in the United States is:
John Ranger 80 Old Faithful Trail Yellowstone National Park, WY, 82190
- Equivalent: For interoperability, it is not sufficient that pedigree messages are valid. It is critical to be able to test if two pedigree messages are equivalent. Here are two valid addresses that are subtly different:
John Ranger Old Faithful Visitor Center 80 Old Faithful Trail Yellowstone National Park, WY, 82190
John Ranger 80 Old Faithful Trail Yellowstone National Park, WY, 82190
It is central to the canonical pedigree project to be able to verify that generated pedigree messages are equivalent to the reference messages.
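For readers who do work with XML, the difference between well-formed and valid can be sketched in Python. This is a minimal sketch under stated assumptions: the `address` element and its child element names are hypothetical, and the ordered-children check merely stands in for real schema validation, which the Python standard library does not provide.

```python
import xml.etree.ElementTree as ET

# Hypothetical address "schema": the required child elements, in order.
REQUIRED_ORDER = ["name", "street", "city", "state", "zip"]


def is_well_formed(text):
    """True if the text parses as XML at all."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def is_valid_address(text):
    """True if the text is well-formed and its children match REQUIRED_ORDER."""
    if not is_well_formed(text):
        return False
    root = ET.fromstring(text)
    return root.tag == "address" and [child.tag for child in root] == REQUIRED_ORDER


# Well-formed but invalid: street before name, and city, state, zip missing.
well_formed_only = (
    "<address><street>80 Old Faithful</street>"
    "<name>John Ranger</name></address>"
)

# Well-formed and valid against the hypothetical schema.
valid = (
    "<address><name>John Ranger</name><street>80 Old Faithful Trail</street>"
    "<city>Yellowstone National Park</city><state>WY</state>"
    "<zip>82190</zip></address>"
)

# Not well-formed: the name element is never closed.
broken = "<address><name>John Ranger</address>"
```

Neither check says anything about whether two valid messages describe the same thing, which is why equivalence needs a separate test.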
Degrees of Equivalence
There are three degrees of equivalence between two pedigree messages:
- Full equivalence: Two pedigree messages are fully equivalent if they have identical XML elements and attributes. It may be necessary to sort elements before comparing them -- one pedigree may have the youngest individuals first and the other the oldest. It also will likely be necessary to standardize each XML document's treatment of white space (e.g. spaces, carriage returns) and character encoding (e.g. '&lt;' vs. '<'). However, if the two messages are identical after those two transformations, they are considered fully equivalent.
- Potential equivalence: The complexity of human relationships, and the numerous terms to describe them, will result in two messages that potentially are equivalent. For example, in message A Jane may be declared Bob's sister's daughter while in message B Jane is Bob's niece. It is possible that the two pedigrees apply to the same individuals, but Jane might also be Bob's brother's daughter. Therefore, unless there is evidence to the contrary, the two pedigrees are potentially equivalent.
In an ideal world, there would not be ambiguity. However, clinical reality is that sometimes imprecise relationships will be reported by people, and that's fine. What is not acceptable is for a computer system to needlessly degrade unambiguous relationships to be ambiguous, or to convert ambiguous to unambiguous without supporting evidence.
- Partial equivalence: Sometimes pedigrees are simplified. For example, pedigree B may have the same individuals and their relationships as pedigree A, but it may lack age of onset of a disease or demographic information (e.g. date of birth). As long as the information that is in both pedigrees is in agreement, they are partially equivalent. However, the simplified one will have diminished clinical value.
Having three types of equivalence is not a matter of splitting hairs. It will be crucial to demonstrate full equivalence when a pedigree is converted to and from an alternate representation (e.g. clinical statement). It also is expected that legacy systems may simplify detailed family histories, and quantifying that loss of information is also important.
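The three degrees can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's test procedure: the `pedigree`, `person`, and `note` elements, the `id`, `sex`, and `onset` attributes, and the `GENERAL_TERMS` vocabulary are all hypothetical and are not taken from the HL7 V3 pedigree schema.

```python
import xml.etree.ElementTree as ET


def canonical(elem):
    """Nested tuple that ignores child order, attribute order, and surrounding white space."""
    children = tuple(sorted(canonical(child) for child in elem))
    attrs = tuple(sorted(elem.attrib.items()))
    return (elem.tag, attrs, (elem.text or "").strip(), children)


def fully_equivalent(xml_a, xml_b):
    """Full equivalence: identical elements and attributes after sorting and normalizing."""
    return canonical(ET.fromstring(xml_a)) == canonical(ET.fromstring(xml_b))


def partially_equivalent(xml_full, xml_simple, key="id"):
    """Partial equivalence: every person in the simplified message appears in the
    full one, and no attribute present in both disagrees."""
    people = {p.get(key): p.attrib for p in ET.fromstring(xml_full).iter("person")}
    for person in ET.fromstring(xml_simple).iter("person"):
        match = people.get(person.get(key))
        if match is None:
            return False
        if any(match.get(name, value) != value for name, value in person.attrib.items()):
            return False
    return True


# Hypothetical vocabulary: a general term and the specific relationships it may stand for.
GENERAL_TERMS = {"niece": {"sister's daughter", "brother's daughter"}}


def compare_relationship(term_a, term_b):
    """Return 'full', 'potential', or 'none' for two relationship terms."""
    if term_a == term_b:
        return "full"
    if term_b in GENERAL_TERMS.get(term_a, set()) or term_a in GENERAL_TERMS.get(term_b, set()):
        return "potential"
    return "none"


message_a = (
    '<pedigree><person id="1" sex="F" onset="42"><note>onset &lt; 50</note></person>'
    '<person id="2" sex="M"/></pedigree>'
)

# Same content as message_a: different element order, attribute order,
# white space, and character reference.
message_b = """
<pedigree>
  <person sex="M" id="2"/>
  <person onset="42" id="1" sex="F"><note>onset &#60; 50</note></person>
</pedigree>
"""

# Simplified: same individuals, but no age of onset and no note.
message_c = '<pedigree><person id="1" sex="F"/><person id="2" sex="M"/></pedigree>'
```

With these samples, `message_a` and `message_b` compare as fully equivalent, `message_c` is only partially equivalent to `message_a`, and `compare_relationship("sister's daughter", "niece")` reports a potential match. Parsing the XML before comparing is what removes the differences in character encoding; the sorting step removes differences in ordering.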
The following sample is an elaborated pedigree that supplements the standard specification: Patient has two sisters, a husband, a daughter, and a mother and a father (each has two parents): Media:PedigreeSampleElaborated.doc
The need for detailed family health history can easily be demonstrated. For example, the chart for a 30-year-old woman may have the statement "family history of breast cancer -- mother and two aunts". Unfortunately, that simple statement is insufficient to distinguish between normal and increased risk.
All three of the following scenarios fit the original statement. Pedigrees and risk scores courtesy of Hughes Risk Apps.
If you compare scenarios 1 and 2, it is quite clear that the age of onset of the disease shifted from the 60s to around 40. Similarly, scenario 1 had the incidences of cancer split across family lines, and amongst a large group of siblings. In contrast, scenario 2 was all on one side, and 100% disease penetrance included both the three sisters and their brother. A knowledgeable clinician will recognize that the first scenario is likely just sporadic cancers that are not terribly surprising given the patients' ages, while the second shouts out that there is a genetic component.
Now, the first two scenarios were crafted to highlight the hazard of drawing conclusions from a simplified family history. However, it is the third situation that is more likely, and more difficult to assess without decision support. By the time third-degree relatives are included, it becomes quite difficult to assess risk by hand. However, the high clinical value to patients makes it clear that collecting, analyzing, and acting upon a detailed structured family history is worth improving.