201801 Bulk Data
Bulk Data Access (see blog post)
Submitting WG/Project/Implementer Group
This track is created at the request of the ONC. ONC provided this justification:
- Ecosystem outcome expected to enable many specific use case/business needs: Providers and organizations accountable for managing the health of populations can efficiently access large volumes of information on a specified group of individuals without having to access one record at a time. This population-level access would enable these stakeholders to assess the value of the care provided, conduct population analyses, identify at-risk populations, and track progress on quality improvement.
- Technical Expectations: There would be a standardized method built into the FHIR standard to support access to and transfer of a large amount of data on a specified group of patients, and such a method could be reused for any number of specific business purposes.
- Policy Expectations: All existing legal requirements for accessing identifiable patient information via other bulk methods (e.g., ETL) used today would continue to apply (e.g., through HIPAA BAAs/contracts, Data Use Agreements, etc.).
Proposed Track Lead
Dan Gottlieb and Josh Mandel, with support from Grahame Grieve (see Connectathon Track Lead Responsibilities)
Expected participants
- Grahame Grieve
- Dan Gottlieb
- Josh Mandel
- CARIN health alliance
Roles
- Data Provider: provides data in the manner specified by the bulk data API
- Data Consumer: consumes data in the manner specified by the bulk data API and displays/processes the data
The bulk data track is divided into the following scenarios:
- Full bulk data export, open server without security
- Targeted bulk data export, open server without security
- Secured bulk data export using SMART (backend services specification)
Scenario 1: Full Bulk Data Export (Open Endpoint)
See http://www.healthintersections.com.au/?p=2689 for a description of the workflow.
1. Data Consumer requests a bulk data export
GET [base]/Patient/$everything
Accept: application/fhir+ndjson
Prefer: respond-async
2. Data Provider responds with a location for progress updates
HTTP/1.1 202 Accepted
Content-Location: [url for subsequent progress requests]
3. Data Consumer requests a progress update
GET [url for subsequent progress requests]
4. Data Provider responds with the operation's interim status (optional)
HTTP/1.1 202 Accepted
X-Progress: [optional <100 character string indicating progress]
Note: a 5xx error indicates that preparing the response has failed; the response body is an OperationOutcome describing the error.
5. Data Provider responds with links to the generated data files
HTTP/1.1 200 OK
Content-Type: application/fhir+ndjson
X-FHIR-Links-Require-Authorization: false
Expires: Sun, 21 Oct 2018 07:28:00 GMT
Link: <https://data/file/location/0001.Patient.ndjson>,
<https://data/file/location/0001.Observation.ndjson>,
<https://data/file/location/0002.Observation.ndjson>,
<...>
Note: newlines are for formatting purposes only and are not valid in HTTP headers. See https://tools.ietf.org/search/rfc5988#page-6 for details on Link header formatting. The Data Provider may choose the server path, but file names must end in [FHIR Type].ndjson.
6. Data Consumer requests each of the generated files
GET https://data/file/location/0001.Patient.ndjson
GET https://data/file/location/0001.Observation.ndjson
GET https://data/file/location/0002.Observation.ndjson
7. Optionally, Data Consumer may ETL and process these files.
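The consumer side of steps 3 to 7 can be sketched in Python as below. This is a minimal, hypothetical sketch rather than part of the track definition: the function names are illustrative, no HTTP library is assumed (status codes and headers are passed in as plain values), and header lookup is made case-insensitive by lower-casing names. It shows how a progress response might be classified, how the Link header could be split into file URLs grouped by the [FHIR Type] part of each file name, and how an NDJSON file splits into one resource per line.

```python
import json
import re
from typing import Any, Dict, List, Optional, Tuple

# Matches each <url> target in a Link header value (RFC 5988 style).
LINK_TARGET = re.compile(r"<([^>]*)>")


def classify_status(status_code: int, headers: Dict[str, str]) -> Tuple[str, Optional[str]]:
    """Classify a response to a progress request (steps 3-5).

    Hypothetical helper. Returns ("in-progress", progress text or None),
    ("complete", None) or ("error", None).
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    if status_code == 202:
        # X-Progress is optional, so it may be missing.
        return "in-progress", lowered.get("x-progress")
    if status_code == 200:
        return "complete", None
    if 500 <= status_code <= 599:
        # The body would be an OperationOutcome describing the failure.
        return "error", None
    raise ValueError(f"unexpected status code {status_code}")


def parse_file_links(link_header: str) -> Dict[str, List[str]]:
    """Group the file URLs in a Link header by FHIR resource type."""
    files: Dict[str, List[str]] = {}
    for url in LINK_TARGET.findall(link_header):
        # The last path segment must look like [prefix.]<FHIR Type>.ndjson
        parts = url.rsplit("/", 1)[-1].split(".")
        if len(parts) < 2 or parts[-1] != "ndjson" or not parts[-2]:
            raise ValueError(f"file name must end in [FHIR Type].ndjson: {url}")
        files.setdefault(parts[-2], []).append(url)
    return files


def parse_ndjson(text: str) -> List[Dict[str, Any]]:
    """Parse an NDJSON file body: one JSON resource per non-blank line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Applied to the Link header from step 5, parse_file_links would yield one Patient URL and two Observation URLs, each of which would then be fetched as in step 6 and fed to parse_ndjson.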
Scenario 2: Targeted Bulk Data Export (Open Endpoint)
1. Data Consumer issues one or more of the following requests:
GET [base]/Patient/$everything?start=[date-time]&_type=[FHIR Resource Type],[FHIR Resource Type]
Accept: application/fhir+ndjson
Prefer: respond-async

GET [base]/Group/[id]/$everything
Accept: application/fhir+ndjson
Prefer: respond-async

GET [base]/Group/[id]/$everything?start=[date-time]&_type=[FHIR Resource Type],[FHIR Resource Type]
Accept: application/fhir+ndjson
Prefer: respond-async
2. Subsequent workflow proceeds as in Scenario 1
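The kick-off URLs for Scenarios 1 and 2 differ only in the path and the optional start and _type parameters, so a client might assemble them with a small helper. The sketch below is hypothetical (the function name and the example base URL are not part of the track); it uses Python's urllib.parse.urlencode and leaves commas and colons unescaped so the output matches the request examples in this track.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode

# Headers shared by every kick-off request in Scenarios 1 and 2.
KICKOFF_HEADERS = {"Accept": "application/fhir+ndjson", "Prefer": "respond-async"}


def build_kickoff_url(base: str, group_id: Optional[str] = None,
                      start: Optional[str] = None,
                      types: Sequence[str] = ()) -> str:
    """Build a Patient- or Group-level $everything URL (hypothetical helper)."""
    base = base.rstrip("/")
    if group_id:
        path = f"{base}/Group/{group_id}/$everything"
    else:
        path = f"{base}/Patient/$everything"
    params = []
    if start:
        params.append(("start", start))  # only resources changed since this date-time
    if types:
        params.append(("_type", ",".join(types)))  # restrict to these resource types
    if not params:
        return path
    # Keep ',' and ':' readable; other reserved characters are percent-encoded.
    return f"{path}?{urlencode(params, safe=',:')}"
```

For example, build_kickoff_url("https://example.org/fhir", start="2018-01-01T00:00:00Z", types=["Patient", "Observation"]) gives https://example.org/fhir/Patient/$everything?start=2018-01-01T00:00:00Z&_type=Patient,Observation, which would be sent with KICKOFF_HEADERS.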
Scenario 3: Secured Bulk Data Export
1. Data Consumer registers itself with the Data Provider and obtains an access token as described in the SMART Backend Services specification.
2. Data Consumer and Data Provider follow the workflows described in Scenarios 1 and 2, with the addition of an Authorization header in each request. If the X-FHIR-Links-Require-Authorization header in the final async response is not set to true, the Data Consumer should not include the authorization token in the file download requests.
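The header rule in step 2 can be expressed as two small functions. This is a hedged sketch: the names are hypothetical, obtaining the access token (the SMART Backend Services exchange) is out of scope and the token is simply passed in, the Bearer scheme is assumed, and comparing the header value case-insensitively is an assumption rather than something the track specifies.

```python
from typing import Dict, Optional


def with_bearer(headers: Dict[str, str], access_token: str) -> Dict[str, str]:
    """Return a copy of the request headers with the access token added.

    Used for the kick-off and progress requests, which always carry it.
    """
    out = dict(headers)
    out["Authorization"] = f"Bearer {access_token}"
    return out


def file_download_headers(access_token: str,
                          links_require_authorization: Optional[str]) -> Dict[str, str]:
    """Headers for the step 6 file downloads.

    The token is sent only when the final async response set
    X-FHIR-Links-Require-Authorization to true; otherwise (header absent
    or any other value) the file URLs are assumed to authorize themselves,
    e.g. signed URLs, and no token is sent.
    """
    value = (links_require_authorization or "").strip().lower()
    if value == "true":
        return {"Authorization": f"Bearer {access_token}"}
    return {}
```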
This is an API extension and will require extensions to the TestScript resource in order to be tested.
Security and Privacy Considerations
- Obviously, access to APIs like this in production requires both authentication and consent
- Scenario 3 tests application authentication
- For now, it is assumed that consent is managed elsewhere, though extensions may be added to the stream for this (see [])
- Audit: For now, it is assumed that applications will audit the initial FHIR retrieval and the SMART on FHIR login, but there are no rules about this
The X-FHIR-Links-Require-Authorization header is a proposal to support both servers that use SMART authentication to secure the generated files and those that leverage other techniques (e.g., S3 signed URLs).
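On the Data Provider side, the step 5 completion headers, including this proposed header, might be assembled as in the following hypothetical sketch. The function name and its files_need_token flag are illustrative; Expires is formatted as an HTTP-date with email.utils.format_datetime from the Python standard library, and the Link header is emitted as a single line because newlines are not valid in HTTP headers.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict, Sequence


def completion_headers(file_urls: Sequence[str], files_need_token: bool,
                       expires: datetime) -> Dict[str, str]:
    """Headers for the step 5 "200 OK" response (hypothetical helper)."""
    # Convert to UTC; a naive datetime would be interpreted as local time.
    expires_utc = expires.astimezone(timezone.utc)
    return {
        "Content-Type": "application/fhir+ndjson",
        # "true" when the file URLs are protected by SMART authorization,
        # "false" when they authorize themselves (e.g. S3 signed URLs).
        "X-FHIR-Links-Require-Authorization": "true" if files_need_token else "false",
        # HTTP-date in GMT, e.g. "Sun, 21 Oct 2018 07:28:00 GMT".
        "Expires": format_datetime(expires_utc, usegmt=True),
        # One header line: comma-separated <url> entries, no newlines.
        "Link": ", ".join(f"<{url}>" for url in file_urls),
    }
```

A server handing out signed URLs would pass files_need_token=False, so that a consumer following Scenario 3 would not attach its token to the downloads.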