
HL7 Data Provenance Project Space


Meeting Info

  • Call start date: Jul 7, 2014 to Dec 29, 2014 3:00 PM - 4:00 PM EST
  • To join the calls, please register at:
  • Once registered you will receive an email confirming your registration with information you need to join the Webinar. VOIP & Local Access Numbers provided.
  • NOTE - we will not be using the HL7 call line that is provided by default for this call.


This page provides a focal point for the HL7 Data Provenance CDA Project: a hub for connecting to its artifacts, discussions, status and links to related projects and work groups. Most of the technical content for this project will be contained within its artifacts which will be linked via this page and stored on GForge.

Overview and Status

The US Office of the National Coordinator (ONC) in HHS, Office of the Chief Privacy Officer, has initiated the data provenance project within the ONC Standards and Interoperability Framework (S&I) to develop the standards and guidance required to establish the means by which health information technology can be used to support clinical, organizational, and jurisdictional requirements to capture, manage, and convey provenance about health information. A key driver for this initiative cited by ONC is the recommendation in the President’s Council of Advisors on Science and Technology (PCAST) report “Realizing the Full Potential of Health Information Technology to Improve Healthcare for Americans: The Path Forward” (December 2010) that US healthcare more fully leverage the value of provenance to optimize patient safety, privacy, and trust in nationwide health information exchange.

We recognize that existing literature and projects lend themselves to alignment and harmonization. This project will create a reusable building block that may serve as a core component for these projects going forward. The HL7 Data Provenance project has begun development of an implementation guide that constrains the CDA in support of the ONC Data Provenance Initiative use cases and that may be used in conjunction with other HL7 CDA Implementation Guides. The project will consider and support future development of additional FHIR artifacts and HL7 codes and value sets as needed. Much of the technical development work, including use case artifacts, requirements documentation, vocabulary harmonization and new value sets, and CDA modeling, will be conducted under the auspices of the ONC Data Provenance Initiative within the Standards and Interoperability Framework (S&I). Artifacts, project calendar, call logistics, and other resources are available on the S&I project page.

HL7 Data Provenance CDA IG DSTU Publication Version

HL7 Data Provenance CDA IG DSTU Pre-Publication Version

HL7 Data Provenance CDA IG September 2014 Ballot Draft

HL7 Data Provenance Project Scope Statement

HL7 Data Provenance Project September 2014 Ballot Material

HL7 DPROV Resources: Presentations, Tutorials, & Data Provenance Papers and Articles

DPROV CDA IG September 2014 Ballot Spreadsheet

S&I Data Provenance All-Hands and Tiger Team calls

Use Cases

Related HL7 Projects and Work Groups

DPROV Relationship of Provenance and Privacy

  • Helen Nissenbaum's paper "Privacy as Contextual Integrity" is foundational for the relationship between provenance and privacy. It specifies a privacy framework that ties societal norms for the appropriateness and disclosure of information to the context in which the information was generated, together with the normative expectation of actors that the domain and purpose for which the information was generated will dictate its further collection, access, use, and disclosure: "The central thesis of this Article is that the benchmark of privacy is contextual integrity; that in any given situation, a complaint that privacy has been violated is sound in the event that one or the other types of the informational norms has been transgressed." "Norms of appropriateness dictate what information about persons is appropriate, or fitting, to reveal in a particular context. Generally, these norms circumscribe the type or nature of information about various individuals that, within a given context, is allowable, expected, or even demanded to be revealed." Norms of flow or distribution are the expectations that actors in a context hold about the appropriate flow, distribution, and re-disclosure of information; departing from them is a violation of privacy. The paper underscores the importance of provenance in determining whether the collection, access, use, and disclosure of information maintain contextual integrity, the benchmark for privacy.

W3C PROV Project

Provenance refers to the sources of information, such as entities and processes, involved in producing or delivering an artifact. The provenance of information is crucial in deciding whether information is to be trusted, how it should be integrated with other diverse information sources, and how to give credit to its originators when reusing it. In an open and inclusive environment such as the Web, users find information that is often contradictory or questionable. People make trust judgments based on provenance that may or may not be explicitly offered to them. Reasoners in the Semantic Web would benefit from explicit representations of provenance to make informed trust judgments about the information they use. With the arrival of massive amounts of Semantic Web data (e.g., Linked Open Data), information about the origin of that data, i.e., provenance, becomes an important factor in developing new Semantic Web applications. Therefore, a crucial enabler of Semantic Web deployment is the ability to explicitly express provenance that is accessible and understandable to machines and humans.

Examples of what provenance can record:

  • Who played what role in creating the data.
  • Who owned the data.
  • Who contributed to the data.
  • How data was modified from its first revision.
  • How other data affected the current data.
  • What tools were used to generate each version of the data.
  • etc.

W3C Provenance Definition: "Provenance of a resource is a record that describes entities and processes involved in producing and delivering or otherwise influencing that resource. Provenance provides a critical foundation for assessing authenticity, enabling trust, and allowing reproducibility. Provenance assertions are a form of contextual metadata and can themselves become important records with their own provenance."
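
As a rough illustration of the questions listed above, the following sketch models provenance as a few plain records. The relation names borrow W3C PROV vocabulary (used, generated, associated with), but the `Activity` class, the identifiers, and the two helper functions are assumptions made for this example and are not part of any W3C or HL7 specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Activity:
    """A process that consumed some entities and produced others."""
    id: str
    used: List[str] = field(default_factory=list)        # ids of input entities
    generated: List[str] = field(default_factory=list)   # ids of output entities
    associated_with: Dict[str, str] = field(default_factory=dict)  # agent id -> role


def who_played_what_role(activities: List[Activity], entity_id: str) -> List[Tuple[str, str]]:
    """Return (agent, role) pairs for every activity that generated the entity."""
    return [
        (agent, role)
        for act in activities
        if entity_id in act.generated
        for agent, role in act.associated_with.items()
    ]


def influenced_by(activities: List[Activity], entity_id: str) -> Set[str]:
    """Return ids of all entities that directly or indirectly fed into entity_id."""
    sources: Set[str] = set()
    frontier = [entity_id]
    while frontier:
        current = frontier.pop()
        for act in activities:
            if current in act.generated:
                for src in act.used:
                    if src not in sources:
                        sources.add(src)
                        frontier.append(src)
    return sources


# Hypothetical example: a lab result is transcribed into a note, then summarized.
activities = [
    Activity("transcribe", used=["lab-result"], generated=["note-v1"],
             associated_with={"dr-smith": "author", "ehr-system": "tool"}),
    Activity("summarize", used=["note-v1"], generated=["summary"],
             associated_with={"nurse-lee": "author"}),
]

print(who_played_what_role(activities, "note-v1"))
print(sorted(influenced_by(activities, "summary")))
```

Keeping the relations on the activity, rather than on the data, is one possible design choice: it lets "who", "with what tool", and "from which inputs" be answered from the same record.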

Dissemination Exemplar Use Cases: privacy, engineering

  • M-Diss-UR 1: Verify that data, disseminated to some entity for processing, was processed for a purpose which was valid under some generally applied rules of validity, or as stated by the entity upon requesting the data. (privacy)
  • M-Diss-TR 1.1: Represent purposes of using data in a way which can be compared against the provenance of its usage.
  • M-Diss-TR 1.2: Represent the provenance of disseminated data in a way which allows its usage to be checked against pre-stated purposes.
  • M-Diss-TR 1.3: Provide mechanisms to examine data's provenance to check for correct usage according to pre-stated purpose.
  • M-Diss-TR 1.4: Make the provenance representation non-forgeable (non-repudiation, no man-in-the-middle attack).
  • M-Diss-UR 2: Verify that data, disseminated to some entity for processing, was processed only by that entity. (privacy)
  • M-Diss-TR 2.1: Ensure that any data's provenance information includes the verifiable identity of the entities by which it has been received or has been processed.
  • M-Diss-TR 2.2: Provide mechanisms to verify that data was processed by a particular entity only, by examining the provenance of that data.
  • M-Diss-UR 3: Verify that all of a set of data, disseminated to some entity for processing, was used in that processing. (privacy)
  • M-Diss-TR 3.1: Provide mechanisms to parse provenance information and determine whether all of a set of data was used in a well-defined process.
  • Use Case private data use
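
The three user requirements above can be read as checks over a provenance log. The sketch below is a minimal illustration, assuming each log entry records the processing entity, the data items used, and the stated purpose. The `UsageRecord` type, the function names, and the sample values are hypothetical and are not defined by the use case itself.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class UsageRecord:
    """One provenance assertion: who processed which data items, and for what purpose."""
    processor: str
    data_items: FrozenSet[str]
    purpose: str


def purposes_respected(log: List[UsageRecord], stated_purposes: Set[str]) -> bool:
    """M-Diss-UR 1: every recorded use carries one of the pre-stated purposes."""
    return all(rec.purpose in stated_purposes for rec in log)


def processed_only_by(log: List[UsageRecord], entity: str) -> bool:
    """M-Diss-UR 2: no record names a processor other than the given entity."""
    return all(rec.processor == entity for rec in log)


def unused_items(log: List[UsageRecord], disseminated: Iterable[str]) -> Set[str]:
    """M-Diss-UR 3: return the disseminated items that no record ever used."""
    used: Set[str] = set()
    for rec in log:
        used |= rec.data_items
    return set(disseminated) - used


# Hypothetical log for data handed to a single service.
log = [
    UsageRecord("service-a", frozenset({"name", "address"}), "delivery"),
    UsageRecord("service-a", frozenset({"name"}), "billing"),
]

print(purposes_respected(log, {"delivery", "billing"}))
print(processed_only_by(log, "service-a"))
print(sorted(unused_items(log, {"name", "address", "date_of_birth"})))
```

These checks are only as trustworthy as the log they read, which is why M-Diss-TR 1.4 and M-Diss-TR 2.1 ask for non-forgeable provenance and verifiable identities.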

Background and Current Practice Scenario

Many on-line facilities offer personalised services by requesting private information from their users. Such private information must be used under a set of rules that describe which processing can and cannot be performed over such data. If these usage rules are not followed, personal data could be exposed and used against the interest of its owner.

Evidence of the importance of this issue can be seen in legislative frameworks related to the use of private information, such as the Data Protection Act in the UK, the European Directive on Private Data, and HIPAA and Safe Harbor in the US.


Here, we adopt Weitzner's notion of accountability [1]: "accountability must become a primary means through which society addresses appropriate use..." Information accountability means the use of information should be transparent so it is possible to determine whether a particular use is appropriate under a given set of rules, and that the system enables individuals and institutions to be held accountable for misuse.

The goal of this use case is to audit previous usage of private data and check that such usage complies with the rules regulating the use of private information. Inspired by the UK Data Protection Act, we identified the following specific tasks:

  • Legal Purpose: To verify that a set of data was processed for a valid purpose.
  • Declared Purpose Compliant: To verify that a set of data was used in processing that is compatible with the purpose for which it was collected.
  • Authentication: To verify that a set of data, which was collected from a user, was used by the processes that initiated such collection.
  • Minimal Set: To verify that all the data that was collected from a user was used at some point.

Use Case Scenario

The general scenario structure is as follows.

1. Alice wants to interact with an online service. In order to do so, she needs to provide personal information.

2. The online service uses that personal information for a particular, pre-stated purpose.

3. Later, Alice suspects that the personal information was used in a way other than the pre-stated purpose.

4. Upon request, an independent authority determines that Alice's doubts are founded and performs an equivalent check across many individuals who have used the service.

This structure can be applied to the particular domain below, which inspired this use case.

1. Alice wants to buy some medicine from an on-line pharmacy. In order to get her medicine, she needs to provide her name, address, date of birth, gender, social security number, the number of her clinic and her doctor’s name.

2. The pharmacy collects that set of data for the purpose of "on-line sales". So her name, address, date of birth, social security number, the number of her clinic and her doctor’s name are used to register the sale of that medicine with the Health Service. The name and address are used to send the medicine to Alice.

3. Later, the pharmacy creates a record of the monthly sales, which includes the medicine’s name and the quantity sold.

What if the pharmacy decides to include Alice's name next to the medicine she bought in the record of monthly sales? Alice did not provide her name to be used in a record that could be used to find specific individuals who suffer from certain illnesses related to the medicines they bought. How can Alice be sure that her information was used in a way compatible with the purpose for which she initially sent it?

In practice, independent institutions, such as the Information Commissioner in the UK, carry out audits to verify that individuals or institutions that manage personal information follow the data protection rules, so that they can be held accountable for information misuse.

If the pharmacy creates a register containing the information that it plans to collect from Alice, the processes to be performed over it and the purpose of such information collection, then we can use that register as a set of rules that the pharmacy should follow when using Alice’s information.

If, at the same time, Alice and the pharmacy assert provenance information related to their actions, that provenance information can later be compared against the registered set of rules to verify whether the pharmacy actually used Alice’s information in the right way.

Thus, if the pharmacy registers the creation of a record of monthly sales that includes the medicines' names and the quantity sold, related to the on-line sales purpose, then it can create the record but it cannot use Alice's name in it. If, despite this, the pharmacy does so, we can find out by checking the provenance information related to that activity and later hold the pharmacy accountable for misusing Alice’s information.
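
The comparison described above can be sketched in a few lines. This is an illustrative assumption about how the register and the provenance assertions might be encoded, not a format defined by this use case: the process names, field names, and the `audit` function are all hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical register: for the "on-line sales" purpose, each declared process
# is mapped to the data fields the pharmacy stated it would use.
REGISTER: Dict[str, Set[str]] = {
    "register_sale": {"name", "address", "date_of_birth",
                      "social_security_number", "clinic_number", "doctor_name"},
    "ship_medicine": {"name", "address"},
    "monthly_sales_record": {"medicine_name", "quantity_sold"},
}


def audit(provenance: Iterable[Tuple[str, Set[str]]],
          register: Dict[str, Set[str]]) -> List[Tuple[str, List[str]]]:
    """Return (process, offending fields) for every use not covered by the register."""
    violations = []
    for process, fields_used in provenance:
        declared = register.get(process)
        if declared is None:
            # The process itself was never registered, so every field is a violation.
            violations.append((process, sorted(fields_used)))
            continue
        extra = set(fields_used) - declared
        if extra:
            violations.append((process, sorted(extra)))
    return violations


# Hypothetical provenance assertions: the monthly record wrongly includes Alice's name.
provenance = [
    ("ship_medicine", {"name", "address"}),
    ("monthly_sales_record", {"medicine_name", "quantity_sold", "name"}),
]

print(audit(provenance, REGISTER))
```

With this sample input the audit reports the `monthly_sales_record` process and the field `name`, which is the misuse Alice was worried about.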

Many alternative on-line scenarios can be considered in this use case, such as universities, Facebook, Google, and governmental services.

Problems and Limitations

The main technical challenges in this use case are described here.

Institutions or individuals that manage personal information (in this case the pharmacy) should register, in a well-defined fashion, the purposes for which and the way in which they plan to collect and use users’ information. This process is similar to the Notification Process established by the Information Commissioner's Office (see [2]). This registered information will be treated as the rules that such institutions should follow while processing personal information. An example of the document produced during the notification by a pharmacy can be found in [3]. This problem can be addressed by using semantic web technologies to represent purposes of collection, tasks performed over users' information and the set of information that will be collected from users. This is a metadata representation issue.
All the entities involved (in this case the on-line pharmacy and Alice) need to capture, in a standard way, the provenance information related to their actions. That way, the analysis of the entities' actions can be automated. This is a provenance content and management issue.
To effectively hold entities accountable for misuse of information, we need to guarantee that the provenance information created by the involved entities implements some form of entity identification and provenance integrity. Then, if a problem is found in the processing of personal information, the right entity can be held accountable by checking its identity. At the same time, if provenance integrity is guaranteed, entities can be sure that the actions they asserted are represented in the provenance information and that no other entity was able to change it. This problem can be addressed by the use of cryptographic techniques, such as signatures to verify the entities’ identity and cryptographic hashes to check the integrity of provenance chains. This is a provenance content and management issue.
Provenance information created by the entities involved in processing can be compared against the registered rules to verify whether they used personal information in the right way. This is a provenance use issue.
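
The identification and integrity challenge above can be illustrated with a hash-linked log. The sketch below is a simplification under stated assumptions: it uses HMAC tags from the Python standard library as a stand-in for real digital signatures (HMAC relies on a shared secret, so the verifier must hold each agent's key, and it does not provide non-repudiation), and the entry layout, key values, and function names are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Dict, List


def _digest(body: dict) -> str:
    """Hash a canonical JSON encoding of the entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(chain: List[dict], agent: str, action: str, key: bytes) -> None:
    """Append an entry linked to the previous one and tagged by the asserting agent."""
    body = {
        "agent": agent,
        "action": action,
        "prev_hash": chain[-1]["hash"] if chain else "",
    }
    entry_hash = _digest(body)
    # Stand-in for a signature: an HMAC over the entry hash with the agent's key.
    tag = hmac.new(key, entry_hash.encode("utf-8"), hashlib.sha256).hexdigest()
    chain.append({**body, "hash": entry_hash, "tag": tag})


def verify_chain(chain: List[dict], keys: Dict[str, bytes]) -> bool:
    """Return True only if every link and every agent tag is intact."""
    prev_hash = ""
    for entry in chain:
        body = {k: entry[k] for k in ("agent", "action", "prev_hash")}
        if entry["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
            return False  # entry altered, removed, or reordered
        key = keys.get(entry["agent"])
        if key is None:
            return False  # unknown agent, identity cannot be checked
        expected = hmac.new(key, entry["hash"].encode("utf-8"), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, entry["tag"]):
            return False  # tag was not produced with this agent's key
        prev_hash = entry["hash"]
    return True


# Hypothetical keys and actions for the two parties in the scenario.
keys = {"alice": b"alice-secret", "pharmacy": b"pharmacy-secret"}
chain: List[dict] = []
append_entry(chain, "alice", "provided name and address", keys["alice"])
append_entry(chain, "pharmacy", "registered sale", keys["pharmacy"])

print(verify_chain(chain, keys))
```

Because each entry stores the hash of its predecessor, changing or deleting an earlier assertion invalidates every later link. A deployment that needs non-repudiation would replace the HMAC tags with public-key signatures.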

Existing Work (optional)

Aldeco-Pérez, R. & Moreau, L. Provenance-based Auditing of Private Data Use. International Academic Research Conference, Visions of Computer Science, 2008 [4]
