201801 Automated Profiling From Domain Models

From HL7Wiki

Track Name

Automated Profiling From Domain Models

FHIR Version

FHIR Release 3 (STU) with 1 technical errata, aka, FHIR 3.0.1

Submitting WG/Project/Implementer Group

Clinical Interoperability Council, Cancer Interoperability Group


There is a practical gap between documenting the clinical content in a healthcare domain via a domain analysis model, and the ability to deliver a corresponding FHIR implementation. While it is possible with existing manual tools to create individual FHIR Profiles in small numbers, clinical domains may require large numbers of profiles. There are over 100 clinical specialties and many more sub-specialties, and 50,000 human diseases that may require profiling in some fashion. Moreover, healthcare domains are not wholly separable; if done correctly, the same set of FHIR profiles for wound care could apply in nursing, geriatric medicine, podiatry, diabetes, and emergency care. The inevitable conclusion is that much more efficient, distributed, governed, and content-coordinated methods of profile development are required.

This track will test and compare multiple methodologies for creating FHIR profiles that make it possible to convert a domain analysis model (a so-called logical model) into a set of FHIR profiles in a relatively simple, efficient manner. For this Connectathon, the focus will be on modeling breast cancer staging. Focusing on one domain will make the modeling efforts easier to compare. However, if you have a different domain you wish to model, and have a complete workflow for going from input to FHIR profiles, you are also welcome to join this track. This track will also attempt to make some hypotheses about what constitutes a "good profile".

NOTE: This Connectathon track is agnostic to the approaches to logical modeling and FHIR Profiling. It is being supported by the MITRE Standard Health Record team, but any and all methods of modeling and profiling are welcomed and encouraged.

Known Approaches to Capturing FHIR profiles

Who | Logical Model Capture | Logical Model Representation | Mapping Language
FHIR | Excel Spreadsheets | Structure Definition | FHIR Mapping Language
MITRE Standard Health Record (SHR) Open Source Tools | Clinical Information Modeling and Profiling Language (CIMPL) | Clinical Information Model COmputable Representation (CIMCORE) | CIMPL
FHIM | Any UML development tool | XMI | Open Health Tools Model Driven Message Interoperability (MDMI)
Intermountain Healthcare | Clinical Element Modeling Language (CEML) | ? | ?
CIMI | MagicDraw and ADL Workbench | Archetype Description Language (ADL) and Basic MetaModel (BMM) | FHIR Mapping Language

Proposed Track Lead

Mark Kramer, mkramer@mitre.org
Michael O'Keefe, mokeefe@mitre.org
See Connectathon_Track_Lead_Responsibilities

Expected participants

Participants that we have reached out to, or who have expressed interest in the connectathon topic:

  • Clinical Interoperability Council (CIC)
  • Clinical Information Modeling Initiative (CIMI)
  • Federal Health Architecture (FHA) and Federal Health Information Model (FHIM)
  • Model-Driven Health Tools (MDHT)
  • College of American Pathologists (CAP)
  • Wayne Kubick (HL7 CTO)
  • Integrating the Healthcare Enterprise (IHE)
  • HL7 Cancer Interoperability Group

Profiling Problem Description

The goal is to create FHIR profiles representing a simplified version of breast cancer staging. This section contains a high-level description of breast cancer staging. It is not intended to be 100% clinically accurate or complete. The following five data structures are in scope for the Connectathon:

  1. TNM Staging Elements
  2. ER Status and subelements
  3. PR Status and subelements
  4. HER2 Status and subelements
  5. Tumor Grade and subelements

The challenge can be viewed in two steps:

  1. Formalize the domain description as an information model
  2. Produce profiles from that description


The treatment of breast cancer is driven by several factors influencing the prognosis of each patient. The process of aggregating these factors is called staging. Staging is ultimately represented by a single overall score (0, IA, IB, IIA, IIB, IIIA, IIIB, IIIC, or IV), but many macroscopic, microscopic, and biologic factors go into staging. Note that the goal of the current exercise is not to calculate the overall stage, nor validate consistency among the staging categories, but only to define FHIR profiles to contain staging information.

The American Joint Committee on Cancer 8th Edition Staging Manual (AJCC-8) defines the elements used in staging breast cancer. These include the traditional TNM staging components (anatomic staging), as well as other factors that are known to influence the prognosis of breast cancer patients: tumor grade, hormone receptor status (progesterone and estrogen), and human epidermal growth factor receptor 2 (HER2) status.

TNM Staging Elements

There are three staging components:

  • T (tumor): defines categories for the primary tumor based on the tumor size.
  • N (regional nodes): defines categories for lymph node involvement in the cancer, based on the status of lymph nodes in proximity to the breast.
  • M (distant metastases): defines categories for presence or absence of distant metastases.

Each component has a value, referred to as a “category,” whose allowable values are given below.

TNM staging can be done at several points over the patient’s course of care:

  • Clinical (c) refers to staging done before tissue samples have been obtained from the tumor. The T, N and M components are prefixed with “c” to denote clinical staging (e.g., cT, cN).
  • Pathologic (p) applies to staging in patients who had surgery as their initial course of treatment. It includes all data used for clinical staging, data from surgical exploration and resection, and results of pathological examination (gross and microscopic) of the primary carcinoma, regional lymph nodes and metastatic sites. The T, N, and M components are used with the “p” prefix to denote pathologic staging (e.g., pT, pN).
  • Post-neoadjuvant (y) refers to staging done after the patient has completed therapy such as chemotherapy, radiation, or hormone therapy. The T and N components are prefixed with “y” to denote post-treatment staging with or without surgery (e.g., ycT or ypT).

Conceptually, the staging logical data structure might look something like this:

StageGroup [1..1]: either 0, IA, IB, IIA, IIB, IIIA, IIIB, IIIC, or IV
StageTiming: either clinical (c), pathologic (post-surgical) (p), post-neoadjuvant clinical (yc), or post-neoadjuvant pathologic (yp)
T [1..1]: either TX, Tis (DCIS), Tis (Paget), T1, T1mi, T1a, T1b, T1c, T2, T3, T4, T4a, T4b, T4c, or T4d
N* [1..1]: either cNX, cN0, cN1, cN2, cN2a, cN2b, cN3, cN3a, cN3b, cN3c (for clinical staging) 
or pNX, pN0, pN1, pN1mi, pN1a, pN1b, pN1c, pN2, pN2a, pN2b, pN3 (for pathologic staging)
M [1..1]: either M0 or M1

*For post-neoadjuvant staging, the N categories are preceded by “y” (e.g. ycNX, ypN1)
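The conceptual structure above can be sketched as a small validated data type. This is only an illustration under stated assumptions, not a methodology endorsed by the track: the class name BreastCancerStage, the helper allowed_n, and the choice to reject mismatched timing/N combinations are all hypothetical. The value sets are copied from the listing above, and the "y" prefix rule follows the footnote.

```python
from dataclasses import dataclass

# Value sets copied from the conceptual structure above.
STAGE_GROUPS = {"0", "IA", "IB", "IIA", "IIB", "IIIA", "IIIB", "IIIC", "IV"}
STAGE_TIMINGS = {"c", "p", "yc", "yp"}
T_CATEGORIES = {
    "TX", "Tis (DCIS)", "Tis (Paget)", "T1", "T1mi", "T1a", "T1b", "T1c",
    "T2", "T3", "T4", "T4a", "T4b", "T4c", "T4d",
}
CLINICAL_N = {"cNX", "cN0", "cN1", "cN2", "cN2a", "cN2b",
              "cN3", "cN3a", "cN3b", "cN3c"}
PATHOLOGIC_N = {"pNX", "pN0", "pN1", "pN1mi", "pN1a", "pN1b", "pN1c",
                "pN2", "pN2a", "pN2b", "pN3"}
M_CATEGORIES = {"M0", "M1"}


def allowed_n(stage_timing: str) -> set:
    """Return the N categories permitted for a given stage timing."""
    if stage_timing not in STAGE_TIMINGS:
        raise ValueError(f"unknown stage timing: {stage_timing!r}")
    base = CLINICAL_N if stage_timing in ("c", "yc") else PATHOLOGIC_N
    # Post-neoadjuvant N categories are preceded by "y" (e.g. ycNX, ypN1).
    prefix = "y" if stage_timing.startswith("y") else ""
    return {prefix + category for category in base}


@dataclass(frozen=True)
class BreastCancerStage:
    """Hypothetical container for the staging elements; no stage calculation."""
    stage_group: str
    stage_timing: str
    t: str
    n: str
    m: str

    def __post_init__(self) -> None:
        checks = [
            ("stage_group", self.stage_group, STAGE_GROUPS),
            ("stage_timing", self.stage_timing, STAGE_TIMINGS),
            ("t", self.t, T_CATEGORIES),
            ("n", self.n, allowed_n(self.stage_timing)),
            ("m", self.m, M_CATEGORIES),
        ]
        for field_name, value, allowed in checks:
            if value not in allowed:
                raise ValueError(f"{field_name}: {value!r} is not an allowed value")
```

For example, BreastCancerStage("IIA", "c", "T2", "cN0", "M0") is accepted, while pairing stage timing "p" with "cN0" raises ValueError. Consistent with the scope stated earlier, the sketch does not check that the stage group agrees with the T, N, and M categories.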

Estrogen receptor (ER) status

Estrogen receptor alpha is the predominant estrogen receptor expressed in breast tissue and is overexpressed in around 50% of breast carcinomas. The determination of ER status depends on several observations:

  • Nuclear positivity: the percentage of cells that test (stain) positive
  • Average staining intensity
  • Primary antibody reacting with the estrogen receptor

A logical data structure for ER status conceptually could look like this:

EstrogenReceptorStatus [1..1]: either positive, indeterminate, or negative
NuclearPositivity [0..1]: percentage
AverageStainingIntensity [0..1]: either None/Negative (0), Weak (1), Moderate (2), or Strong (3)
PrimaryAntibody [0..1]: either SP1, 6F11, or 1D5

Progesterone receptor (PR) status

Progesterone receptor (PR) is expressed in 65% of breast carcinomas. Oncologists use PR status and related details to inform their treatment decision. The determination of PR status depends on several observations:

  • Nuclear positivity: the percentage of cells that test (stain) positive
  • Average staining intensity
  • Primary antibody reacting with the progesterone receptor

Conceptually, a data structure for PR status could look like this:

ProgesteroneReceptorStatus [1..1]: either positive, indeterminate, or negative
NuclearPositivity [0..1]: percentage
AverageStainingIntensity [0..1]: either None/Negative (0), Weak (1), Moderate (2), or Strong (3)
PrimaryAntibody [0..1]: either IE2, 636, 16, SP2, 1A6, 1294, or 312
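Because the ER and PR structures share the same shape, one way to model them is a single type parameterized by receptor. The sketch below is an assumption made for illustration: the class name HormoneReceptorStatus and its field names are hypothetical, and reading the percentage as a number from 0 to 100 is also an assumption. The allowed values are taken from the two listings above.

```python
from dataclasses import dataclass
from typing import Optional

STATUS_VALUES = {"positive", "indeterminate", "negative"}
# Average staining intensity codes and their labels.
INTENSITY_LABELS = {0: "None/Negative", 1: "Weak", 2: "Moderate", 3: "Strong"}
# Primary antibody value sets differ between the two receptors.
PRIMARY_ANTIBODIES = {
    "ER": {"SP1", "6F11", "1D5"},
    "PR": {"IE2", "636", "16", "SP2", "1A6", "1294", "312"},
}


@dataclass(frozen=True)
class HormoneReceptorStatus:
    """Hypothetical shared structure for ER and PR status."""
    receptor: str                                      # "ER" or "PR"
    status: str                                        # [1..1]
    nuclear_positivity: Optional[float] = None         # [0..1], percent 0-100
    average_staining_intensity: Optional[int] = None   # [0..1], 0-3
    primary_antibody: Optional[str] = None             # [0..1]

    def __post_init__(self) -> None:
        if self.receptor not in PRIMARY_ANTIBODIES:
            raise ValueError(f"unknown receptor: {self.receptor!r}")
        if self.status not in STATUS_VALUES:
            raise ValueError(f"unknown status: {self.status!r}")
        if self.nuclear_positivity is not None and not (
            0 <= self.nuclear_positivity <= 100
        ):
            raise ValueError("nuclear_positivity must be between 0 and 100")
        if (self.average_staining_intensity is not None
                and self.average_staining_intensity not in INTENSITY_LABELS):
            raise ValueError("average_staining_intensity must be 0, 1, 2, or 3")
        if (self.primary_antibody is not None
                and self.primary_antibody not in PRIMARY_ANTIBODIES[self.receptor]):
            raise ValueError(
                f"{self.primary_antibody!r} is not a listed {self.receptor} antibody"
            )
```

Only the status is required; the three supporting observations default to None, mirroring their [0..1] cardinality.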

Human epidermal growth factor receptor 2 (HER2) status

Human epidermal growth factor receptor 2 (HER2) status can impact the prognosis of breast cancer patients. The determination of HER2 status can be achieved through immunohistochemistry (IHC) or in situ hybridization (ISH). Each method results in a different set of observations used to determine HER2 receptor status:

  • HER2 by IHC:
    • Score: 0, 1+, 2+, or 3+. A score of 0 or 1+ translates to a negative HER2 status, a score of 2+ is considered equivocal, and a score of 3+ is considered positive
    • Percentage of cells with complete staining: 0-100%
  • HER2 by single-probe ISH:
    • Average number of HER2 signals per cell
  • HER2 by dual-probe ISH:
    • Average number of HER2 signals per cell
    • Average number of CEP17 signals per cell
    • HER2/CEP17 ratio: ratio of HER2 signals per cell over CEP17 signals per cell
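The two derivations described in this list, the IHC score interpretation and the dual-probe ratio, can be written as small functions. The function names are hypothetical, and rejecting a non-positive CEP17 count is an assumption added to avoid division by zero. No ISH positivity thresholds are encoded, since none are given here.

```python
# IHC score interpretation as described in the list above.
IHC_SCORE_TO_STATUS = {
    "0": "negative",
    "1+": "negative",
    "2+": "equivocal",
    "3+": "positive",
}


def her2_status_from_ihc(score: str) -> str:
    """Translate a HER2 IHC score (0, 1+, 2+, 3+) into a status."""
    try:
        return IHC_SCORE_TO_STATUS[score]
    except KeyError:
        raise ValueError(f"unknown IHC score: {score!r}") from None


def her2_cep17_ratio(her2_signals_per_cell: float,
                     cep17_signals_per_cell: float) -> float:
    """HER2 signals per cell divided by CEP17 signals per cell (dual-probe ISH)."""
    if cep17_signals_per_cell <= 0:
        raise ValueError("CEP17 signals per cell must be greater than zero")
    return her2_signals_per_cell / cep17_signals_per_cell
```

For instance, her2_status_from_ihc("2+") gives "equivocal", and an average of 6.0 HER2 signals against 3.0 CEP17 signals gives a ratio of 2.0.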

A data structure for HER2 status conceptually could look like:

HER2ReceptorStatus [1..1]: either positive, negative, or indeterminate
Method [0..1]: either IHC, ISH single probe, or ISH dual probe
HER2byIHC [0..1]:
    Score [1..1]: either 0, 1+, 2+, or 3+
    CompleteStaining [0..1]: percentage
HER2byISH [0..1]:
    AverageHER2SignalsPerCell [1..1]: decimal
    AverageCEP17SignalsPerCell [0..1]: decimal
    HER2CEP17Ratio [0..1]: decimal

Tumor Grade

All invasive breast carcinomas should be assigned a histologic grade. The College of American Pathologists (CAP) stipulates the Nottingham combined histologic grade should be used. The grade is determined by assessing morphologic features (tubule formation, nuclear pleomorphism, and mitotic count), and assigning a score of 1 to 3 to each feature. The combined score of the three categories is then divided into categories to produce the overall histologic grade.

Possible data structure for histologic grade:

HistologicGrade: either grade cannot be assessed (GX), low combined histologic grade (G1), intermediate combined histologic grade (G2), 
or high combined histologic grade (G3)
TubuleFormation [0..1]: either 1, 2, or 3
NuclearPleomorphism [0..1]: either 1, 2, or 3
MitoticCount [0..1]: either 1, 2, or 3
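As an illustration of how the three feature scores combine, the sketch below sums them and bins the total. The cut points used (3 to 5 for G1, 6 to 7 for G2, 8 to 9 for G3) are the commonly cited Nottingham thresholds and are an assumption here, since the description above does not state them. The function name is hypothetical.

```python
def nottingham_grade(tubule_formation: int,
                     nuclear_pleomorphism: int,
                     mitotic_count: int) -> str:
    """Combine three feature scores (each 1-3) into a histologic grade."""
    scores = (tubule_formation, nuclear_pleomorphism, mitotic_count)
    for score in scores:
        if score not in (1, 2, 3):
            raise ValueError(f"feature score must be 1, 2, or 3, got {score!r}")
    total = sum(scores)  # ranges from 3 to 9
    # Assumed cut points: 3-5 -> G1, 6-7 -> G2, 8-9 -> G3.
    if total <= 5:
        return "G1"
    if total <= 7:
        return "G2"
    return "G3"
```

GX (grade cannot be assessed) is not produced by this function; a caller would record it when the feature scores are unavailable.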


Profile Creator

The Profile Creator will come to the Connectathon with a domain analysis model of breast cancer staging. (Alternatively, participants may bring a model from a different healthcare domain.)

Prior to the Connectathon, the Profile Creator will model the example using their own methodology, and execute the complete set of steps necessary to generate FHIR Profiles.

At the Connectathon, each Profile Creator will summarize their pre-work, from start to finish, for all other participants. We’ll look together at the simplicity, extensibility, repeatability and accuracy of the various workflows.

The Profile Creators will then take what they learn from fellow participants to iterate their own workflows, redo the exercise and share the outcomes.

If time permits, the Profile Creator will then upload these profiles to a FHIR server, either a publicly available one controlled by the Creator or one of the servers listed on the Connectathon page.

Resources for Profile Creators

Profile Server

The Profile Server is the actual FHIR server that accepts the FHIR profiles. As a stretch goal, the Profile Creator will generate and upload FHIR patients that conform to the generated FHIR Profiles to the Profile Server.

http://clinfhir.com/ is an option for building patients to upload to the Profile Server, as it allows for building linked resources that conform to profiles.

Patient Endpoint (optional)

The Patient Endpoint is a FHIR client that requests the FHIR patients from the Profile Server, and validates their adherence to the FHIR profiles generated by the Profile Creator. This may be done in a manual or automated fashion.


Develop Clinical Domain Model

Action: Profile Creator creates a logical representation (SHR or otherwise) of their clinical model
Precondition: Profile Creator must be familiar with the breast cancer staging model prior to the Connectathon (or must have developed an unstructured clinical model in another domain)
Success Criteria: tooling formally represents the clinical model
Bonus point:

Create Mapping Logic

Action: Relate the logical model components to FHIR resources
Precondition: Profile Creator must have developed the logical model
Success Criteria: Formal representation of mapping logic
Bonus point:

Run Tooling and Create FHIR Profiles

Action: Profile Creator creates structure definitions for FHIR Profiles corresponding to the domain model being built
Precondition: Profile Creator should have a formal domain model and mapping logic
Success Criteria: FHIR Profiles produced
Bonus point:
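To make the expected output of this step concrete, the sketch below assembles a skeletal StructureDefinition as a Python dictionary and serializes it to JSON. It is a hand-written illustration, not the output of any tool in this track. The canonical base http://example.org/fhir, the profile name EstrogenReceptorStatus, the choice of Observation as the base resource, and the single constraint on Observation.value[x] are all assumptions. A real profile would carry many more elements and terminology bindings.

```python
import json


def make_observation_profile(name: str, canonical_base: str) -> dict:
    """Build a minimal constraint-style StructureDefinition on Observation."""
    return {
        "resourceType": "StructureDefinition",
        "url": f"{canonical_base}/StructureDefinition/{name}",
        "name": name,
        "status": "draft",
        "fhirVersion": "3.0.1",
        "kind": "resource",
        "abstract": False,
        "type": "Observation",
        "baseDefinition": "http://hl7.org/fhir/StructureDefinition/Observation",
        "derivation": "constraint",
        "differential": {
            "element": [
                {
                    # Assumed constraint: the status value is required and coded.
                    "id": "Observation.value[x]",
                    "path": "Observation.value[x]",
                    "min": 1,
                    "max": "1",
                    "type": [{"code": "CodeableConcept"}],
                }
            ]
        },
    }


# Hypothetical profile name and canonical base, for illustration only.
profile = make_observation_profile("EstrogenReceptorStatus",
                                   "http://example.org/fhir")
profile_json = json.dumps(profile, indent=2)
```

The resulting JSON is the kind of artifact that would then be checked with a FHIR validator or loaded into a server in the following step.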

Create Implementation Guide and Validate FHIR Profiles

Action: Generate a FHIR Implementation Guide and validate the FHIR profiles
Precondition: FHIR profiles generated
Success Criteria: IG generated and visually validated
Bonus point: Profiles loaded into a FHIR server and computationally validated