201801 Bulk Data

From HL7Wiki

Latest revision as of 21:20, 22 January 2018


Track Name

Bulk Data Access (see blog post)

Track Overview

Please review: Slide Presentation from Kick-Off and Orientation Meeting Recording

Please fill out: Track Registration Spreadsheet

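Bulk Data API reference implementations:

  • Bulk Data Server (NodeJs): https://github.com/smart-on-fhir/bulk-data-server (online demo: https://bulk-data.smarthealthit.org)
  • Sample command line client (NodeJs): https://github.com/smart-on-fhir/sample-apps-stu3/tree/master/fhir-downloader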

Submitting WG/Project/Implementer Group

FHIR-I

Justification

This track is created at the request of the ONC. ONC provided this justification:

  • Ecosystem outcome expected to enable many specific use case/business needs: Providers and organizations accountable for managing the health of populations can efficiently access large volumes of information on a specified group of individuals without having to access one record at a time. This population-level access would enable these stakeholders to: assess the value of the care provided, conduct population analyses, identify at-risk populations, and track progress on quality improvement.
  • Technical Expectations: There would be a standardized method built into the FHIR standard to support access to and transfer of a large amount of data on a specified group of patients and that such method could be reused for any number of specific business purposes.
  • Policy Expectations: All existing legal requirements for accessing identifiable patient information via other bulk methods (e.g., ETL) used today would continue to apply (e.g., through HIPAA BAAs/contracts, Data Use Agreements, etc.).

Proposed Track Lead

Dan Gottlieb and Josh Mandel with support from Grahame Grieve (see Connectathon Track Lead Responsibilities)

Expected participants

  • Grahame Grieve
  • Dan Gottlieb
  • Josh Mandel
  • Cerner
  • Epic
  • CARIN health alliance
  • Ken Kawamoto (along with CDS Hooks)

Roles

  • Data Provider: provides data in the manner specified by the bulk data API
  • Data Consumer: consumes data in the manner specified by the bulk data API and displays/processes the data

Scenarios

The bulk data track is divided into the following scenarios:

  1. Full bulk data export, open server without security
  2. Targeted bulk data export, open server without security
  3. Secured bulk data export using the SMART Backend Services specification

Scenario 1: Full Bulk Data Export (Open Endpoint)

See http://www.healthintersections.com.au/?p=2689 for a description of the workflow.

Action

1. Data Consumer requests a bulk data export

GET [base]/Patient/$everything
Accept: application/fhir+ndjson
Prefer: respond-async
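Step 1 can be sketched with Python's standard library. This is a minimal illustration rather than part of the track: the base URL https://example.org/fhir and the helper name build_kickoff_request are hypothetical, and the request is only constructed here, not sent.

```python
from urllib.request import Request

# Hypothetical FHIR base URL, for illustration only.
BASE_URL = "https://example.org/fhir"


def build_kickoff_request(base_url: str) -> Request:
    """Build (but do not send) the asynchronous bulk export kick-off request."""
    return Request(
        url=f"{base_url}/Patient/$everything",
        method="GET",
        headers={
            # Ask for newline-delimited FHIR JSON output.
            "Accept": "application/fhir+ndjson",
            # Ask the server to process the export asynchronously.
            "Prefer": "respond-async",
        },
    )


req = build_kickoff_request(BASE_URL)
print(req.get_method(), req.full_url)
print(req.get_header("Accept"), req.get_header("Prefer"))
```

Passing the request to urllib.request.urlopen would send it; a 202 response should then carry the Content-Location used in the following steps.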

2. Data Provider responds with a location for progress updates

HTTP/1.1 202 Accepted
Content-Location: [url for subsequent progress requests]

3. Data Consumer requests a progress update

GET [url for subsequent progress requests]

4. Data Provider responds with the operation's interim status (optional)

HTTP/1.1 202 Accepted
X-Progress: [optional <100 character string indicating progress]

Note: a 5xx error indicates that preparing the response has failed. The response body is an OperationOutcome describing the error.
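Steps 2 to 4, together with the 5xx failure case, amount to a simple polling loop. The sketch below is illustrative, not the track's reference client: the fetch callable (returning status, headers, and body), the name poll_export, the fixed poll interval, and the attempt limit are all hypothetical, and canned responses stand in for a real server so the logic can run offline.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical transport: takes a URL and returns (status, headers, body).
# A real client would issue an HTTP GET here and normalise header names,
# since HTTP header names are case-insensitive.
Fetch = Callable[[str], Tuple[int, Dict[str, str], str]]


class ExportFailed(Exception):
    """The server reported that preparing the export failed (5xx)."""


def poll_export(fetch: Fetch, status_url: str, interval: float = 1.0,
                max_attempts: int = 60) -> Dict[str, str]:
    """Poll the Content-Location URL until the export completes.

    Returns the headers of the final 200 response, which carry the Link header.
    """
    for _ in range(max_attempts):
        status, headers, body = fetch(status_url)
        if status == 202:
            # Still running; X-Progress is optional and only informational.
            print("in progress:", headers.get("X-Progress", "(no progress reported)"))
            time.sleep(interval)
        elif status == 200:
            return headers
        elif 500 <= status <= 599:
            # The body is an OperationOutcome describing the error.
            raise ExportFailed(body)
        else:
            raise RuntimeError(f"unexpected status {status}")
    raise TimeoutError("export did not finish within max_attempts polls")


# Offline usage with canned responses standing in for the server.
responses: List[Tuple[int, Dict[str, str], str]] = [
    (202, {"X-Progress": "50% complete"}, ""),
    (200, {"Link": "<https://data/file/location/0001.Patient.ndjson>"}, ""),
]
final = poll_export(lambda url: responses.pop(0), "https://example.org/status/1",
                    interval=0)
print(final["Link"])
```

A production client would likely also back off between polls rather than use a fixed interval; that choice is left open here.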

5. Data Provider responds with links to the generated data files

HTTP/1.1 200 OK
Content-Type: application/fhir+ndjson
X-FHIR-Links-Require-Authorization: false
Expires: Wed, 21 Oct 2018 07:28:00 GMT 
Link: <https://data/file/location/0001.Patient.ndjson>,
      <https://data/file/location/0001.Observation.ndjson>,
      <https://data/file/location/0002.Observation.ndjson>,
      <...>

Note: newlines are for formatting purposes only and are not valid in HTTP headers. See https://tools.ietf.org/search/rfc5988#page-6 for details on Link header formatting. The Data Provider may choose the server path, but file names must end in [FHIR Type].ndjson (e.g., 0001.Patient.ndjson).
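A client has to split the Link header into file URLs and, given the naming rule in the note above, can recover each file's FHIR resource type from its name. This is a minimal sketch that assumes the simple "<url>, <url>" form shown in step 5, with no link parameters such as rel; anything richer would need a full RFC 5988 parser. The function names are hypothetical.

```python
import re
from typing import Dict, List

# Matches each <...> target in a Link header value.
LINK_TARGET = re.compile(r"<([^>]*)>")
# File names must end in [FHIR Type].ndjson, e.g. 0001.Patient.ndjson.
TYPE_SUFFIX = re.compile(r"([A-Za-z]+)\.ndjson$")


def parse_link_header(value: str) -> List[str]:
    """Return the URLs in a Link header such as '<a>, <b>'."""
    return LINK_TARGET.findall(value)


def group_by_type(urls: List[str]) -> Dict[str, List[str]]:
    """Group file URLs by the FHIR resource type encoded in the file name."""
    groups: Dict[str, List[str]] = {}
    for url in urls:
        match = TYPE_SUFFIX.search(url)
        if match is None:
            raise ValueError(f"file name does not end in [FHIR Type].ndjson: {url}")
        groups.setdefault(match.group(1), []).append(url)
    return groups


header = ("<https://data/file/location/0001.Patient.ndjson>, "
          "<https://data/file/location/0001.Observation.ndjson>, "
          "<https://data/file/location/0002.Observation.ndjson>")
print(group_by_type(parse_link_header(header)))
```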

6. Data Consumer requests each of the generated files

GET https://data/file/location/0001.Patient.ndjson
GET https://data/file/location/0001.Observation.ndjson
GET https://data/file/location/0002.Observation.ndjson

7. Optionally, Data Consumer may ETL and process these files.
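Each downloaded file is newline-delimited JSON, one FHIR resource per line. As a minimal sketch of the processing step (the function name and sample data are hypothetical, and the file is read from an in-memory string rather than over HTTP), a consumer can stream the lines and check each resource's type against the type in the file name:

```python
import io
import json
from typing import Dict, Iterator, TextIO


def read_ndjson(stream: TextIO, expected_type: str) -> Iterator[Dict]:
    """Yield one FHIR resource per non-blank line, checking its resourceType."""
    for line_no, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate a trailing newline or blank lines
        resource = json.loads(line)
        if resource.get("resourceType") != expected_type:
            raise ValueError(
                f"line {line_no}: expected {expected_type}, "
                f"got {resource.get('resourceType')}")
        yield resource


sample = io.StringIO(
    '{"resourceType": "Patient", "id": "p1"}\n'
    '{"resourceType": "Patient", "id": "p2"}\n'
)
patients = list(read_ndjson(sample, "Patient"))
print([p["id"] for p in patients])
```

Because the function is a generator, large files can be processed line by line without loading the whole file into memory.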

Scenario 2: Targeted Bulk Data Export (Open Endpoint)

See http://www.healthintersections.com.au/?p=2689 for a description of the workflow.

Action

1. Data Consumer issues one or more of the following requests:

GET [base]/Patient/$everything?start=[date-time]&_type=[FHIR Resource Type],[FHIR Resource Type]
Accept: application/fhir+ndjson
Prefer: respond-async

GET [base]/Group/[id]/$everything
Accept: application/fhir+ndjson
Prefer: respond-async

GET [base]/Group/[id]/$everything?start=[date-time]&_type=[FHIR Resource Type],[FHIR Resource Type]
Accept: application/fhir+ndjson
Prefer: respond-async

2. The remaining workflow proceeds as in Scenario 1 (steps 2 to 7).
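The targeted requests differ from Scenario 1 only in the endpoint and in the optional start and _type query parameters. A minimal sketch of building those URLs follows; the base URL, group id, and the helper name everything_url are hypothetical, and the sketch keeps commas and colons unescaped to match the form shown above.

```python
from typing import Dict, Optional, Sequence
from urllib.parse import urlencode


def everything_url(base: str, group_id: Optional[str] = None,
                   start: Optional[str] = None,
                   types: Sequence[str] = ()) -> str:
    """Build a Patient- or Group-level $everything URL with optional filters."""
    if group_id:
        path = f"{base}/Group/{group_id}/$everything"
    else:
        path = f"{base}/Patient/$everything"
    params: Dict[str, str] = {}
    if start:
        params["start"] = start  # a date-time, e.g. 2018-01-01T00:00:00Z
    if types:
        params["_type"] = ",".join(types)  # comma-separated resource types
    if not params:
        return path
    # safe=",:" leaves commas and colons readable; other reserved characters
    # (such as "+" in a timezone offset) are still percent-encoded.
    return f"{path}?{urlencode(params, safe=',:')}"


print(everything_url("https://example.org/fhir", start="2018-01-01T00:00:00Z",
                     types=["Patient", "Observation"]))
print(everything_url("https://example.org/fhir", group_id="123"))
```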

Scenario 3: Secured Bulk Data Export (SMART Backend Services Protected Endpoint)

Action

1. Data Consumer registers itself with Data Provider and obtains an access token as described in the SMART Backend Services specification.

2. Data Consumer and Data Provider follow the workflows described in Scenarios 1 and 2, with the addition of an Authorization header carrying the access token in each request. If the X-FHIR-Links-Require-Authorization header in the final async response is not set to true, the Data Consumer should not include the access token in the file download requests.
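The rule in step 2 about the X-FHIR-Links-Require-Authorization header can be captured in a small helper. This sketch assumes the access token has already been obtained via SMART Backend Services and is sent as an OAuth 2.0 Bearer token; the function name file_request_headers and the case-insensitive comparison of the header value are illustrative assumptions.

```python
from typing import Dict, Mapping


def file_request_headers(final_response_headers: Mapping[str, str],
                         access_token: str) -> Dict[str, str]:
    """Headers for downloading the generated files in Scenario 3.

    The token is sent only when the final async response explicitly sets
    X-FHIR-Links-Require-Authorization to true; otherwise it is withheld
    (for example, when the links are pre-signed URLs).
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v for k, v in final_response_headers.items()}
    flag = lowered.get("x-fhir-links-require-authorization", "")
    headers: Dict[str, str] = {}
    if flag.strip().lower() == "true":
        # Assumed Bearer scheme, as used for the other requests in this scenario.
        headers["Authorization"] = f"Bearer {access_token}"
    return headers


print(file_request_headers({"X-FHIR-Links-Require-Authorization": "true"}, "abc"))
print(file_request_headers({"X-FHIR-Links-Require-Authorization": "false"}, "abc"))
```

Defaulting to no token when the header is absent follows the step 2 wording ("not set to true"), which avoids leaking the token to third-party file hosts.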

TestScript(s)

This is an API extension and will require extensions to the TestScript resource in order to be tested.

Security and Privacy Considerations

  • In production, access to APIs like this requires both authentication and consent.
  • Scenario 3 tests application authentication.
  • For now, it is assumed that consent is managed elsewhere, though extensions may be added to the stream for this (see the Bulk Data Access discussion at https://chat.fhir.org/#narrow/stream/implementers/subject/Bulk.20Data.20Access).
  • Audit: for now, it is assumed that applications will audit the initial FHIR retrieval and the SMART on FHIR login, but there are no rules about this.
  • The X-FHIR-Links-Require-Authorization header is a proposal to support both servers that use SMART authentication to secure the generated files and those that leverage other techniques (e.g., S3 signed URLs).