
Difference between revisions of "201801 Bulk Data"

From HL7Wiki
(Streamlined scenarios to focus on ndjson bulk data workflows)
Line 4: Line 4:
 
=Track Name=
 
=Track Name=
  
Bulk Data Access
+
Bulk Data Access ([http://www.healthintersections.com.au/?p=2689 see blog post])
  
 
==Submitting WG/Project/Implementer Group==
 
==Submitting WG/Project/Implementer Group==
Line 25: Line 25:
  
 
* Grahame Grieve
 
* Grahame Grieve
 +
* Dan Gottlieb
 +
* Josh Mandel
 
* Cerner  
 
* Cerner  
 
* Epic
 
* Epic
Line 34: Line 36:
 
Data Consumer: consumes data in the manner specified by the bulk data API and displays/processes the data
 
Data Consumer: consumes data in the manner specified by the bulk data API and displays/processes the data
  
 +
==Scenarios==
 +
 +
The bulk data track is divided into the following scenarios:
  
==Scenarios==
+
# Full bulk data export, open server without security
 +
# Targeted bulk data export, open server without security
 +
# Secured bulk data export using SMART ([http://docs.smarthealthit.org/authorization/backend-services/ backend services specification])
  
The bulk data track is divided into 3 scenarios:
+
==Scenario 1: Full Bulk Data Export (Open Endpoint)==
  
# asynchronous access
+
====Action====
# nd-json support
 
# smart back-end services
 
  
Participants are encouraged to support either of the following initiation points:
+
1. Data Consumer requests a bulk data export
* GET [base]/Patient/[id]/$everything
+
<pre>
* GET [base]/Patient/$everything
+
GET [base]/Patient/$everything
 +
Accept: application/fhir+ndjson
 +
Prefer: respond-async
 +
</pre>
  
However for the purposes of this connectathon track, data providers can nominate any other URL to initiate the process, as long as it returns a FHIR Bundle
+
2. Data Provider responds with a location for progress updates
 +
<pre>
 +
HTTP/1.1 202 Accepted
 +
Content-Location: [url for subsequent progress requests]
 +
</pre>
  
===Scenario Step 1: Asynchronous Access===
+
3. Data Consumer requests a progress update
 +
<pre>
 +
GET [url for subsequent progress requests]
 +
</pre>
  
====Action====
+
4. Data Provider responds with the operation's interim status (optional)
Data Consumer makes a request of the nominated URL, with the Prefer header set to 'respond-async', and a format of xml or json as negotiated with the server. Data provider sees this Prefer header, and returns a 202 Accepted status and a Content-Location header that the client can use to access the response. Subsequently, the client repeatedly queries this content location using GET [content-location]. The response can be one of 3 outcomes:
+
<pre>
** a 202 Accepted that indicates that processing is still happening. This may have an “X-Progress header” that provides some indication of progress to the user (displayed as is to the user – no format restrictions but should be <100 characters in length).
+
HTTP/1.1 202 Accepted
* a 5xx Error that indicates that preparing the response has failed. The body is an OperationOutcome describing the error
+
X-Progress: [optional <100 character string indicating progress]
* a 200 OK with the response for the original request. This response has one or more Link: headers (see rfc 5988) that list the files that are available for download as a result of servicing the request. The response can also carry a X-Available-Until header to indicate when the response will no longer be available
+
</pre>
  
The client repeats this request periodically until it gets either a 200 or a 5xx
+
''Note: a 5xx Error indicates that preparing the response has failed. The body is an OperationOutcome describing the error.''
  
Precondition: Use the nominated URL for the particular Data Provider
+
5. Data Provider responds with links to the generated data files
Success Criteria: client is able to retrieve the bundle
+
<pre>
 +
HTTP/1.1 200 OK
 +
Content-Type: application/fhir+ndjson
 +
X-FHIR-Links-Require-Authorization: false
 +
Expires: Sun, 21 Oct 2018 07:28:00 GMT
 +
Link: <https://data/file/location/0001.Patient.ndjson>,
 +
      <https://data/file/location/0001.Observation.ndjson>,
 +
      <https://data/file/location/0002.Observation.ndjson>,
 +
      <...>
 +
</pre>
  
Note: Data providers and Consumers can skip this step and only implement step #2
+
''Note: newlines are for formatting purposes only and are not valid in HTTP headers. See https://tools.ietf.org/search/rfc5988#page-6 for details on Link header formatting. The data provider may specify the server path, but files must end in '''[FHIR Type].ndjson'''.''
  
 +
6. Data Consumer requests each of the generated files
 +
<pre>
 +
GET https://data/file/location/0001.Patient.ndjson
 +
GET https://data/file/location/0001.Observation.ndjson
 +
GET https://data/file/location/0002.Observation.ndjson
 +
</pre>
  
 +
7. Optionally, Data Consumer may ETL and process these files.
  
===Scenario Step 2: ND-json support===
+
==Scenario 2: Targeted Bulk Data Export (Open Endpoint)==
  
 
====Action====
 
====Action====
Data Consumer specifies the content-type in the request as 'application/fhir+ndjson'.
 
  
Instead of a single bundle (per step 1), the Data provider returns a set of links in the link header(s), one for each type of resource in the return bundle.
+
1. Data Consumer issues one or more of the following requests:
 +
<pre>
 +
GET [base]/Patient/$everything?start=[date-time]&_type=[FHIR Resource Type],[FHIR Resource Type]
 +
Accept: application/fhir+ndjson
 +
Prefer: respond-async
 +
</pre>
  
Each link in the return header contains a set of resource of the same type, in order to make processing the content easy using existing tools. Since data consumers may need to specify import scripts etc and/or route the content for processing before loading, the consumer needs to know the resource type for each link. For now, in order to support this, each link shall end with the string [type].ndjson, so that the type can be determined without loading the content.
+
<pre>
 +
GET [base]/Group/[id]/$everything
 +
Accept: application/fhir+ndjson
 +
Prefer: respond-async
 +
</pre>
  
==== Criteria ====
+
<pre>
 +
GET [base]/Group/[id]/$everything?start=[date-time]&_type=[FHIR Resource Type],[FHIR Resource Type]
 +
Accept: application/fhir+ndjson
 +
Prefer: respond-async
 +
</pre>
  
* Precondition: specify the correct content-type
+
2. Subsequent workflow proceeds as in Scenario 1
* Success Criteria: Client can retrieve the links. In addition, at least one link SHALL contain the bundle itself, and the bundle itself shall have the id 'start-here'. This identifies the bundle that contains the actual result set, and may be used by the data consumer to navigate through the set of data that is returned
 
  
===Scenario Step 3: Smart Backend Services Support ===
+
==Scenario 3: Secured Bulk Data Export==
  
 
====Action====
 
====Action====
Data Consumer uses the Smart backend services spec to create a session prior to the calls in steps #1/#2
 
 
See [[http://docs.smarthealthit.org/authorization/backend-services/]]. For the purpose of this connectathon, servers need only support the scope system/*.read
 
 
 
==== Criteria ====
 
  
* Precondition: Application is registered with the server
+
1. Data Consumer registers itself with Data Provider and obtains an access token as described in the SMART ([http://docs.smarthealthit.org/authorization/backend-services/ backend services specification])
* Success Criteria: Client can contain an access token that grants them the right to make the calls in step #1 and Step #2
 
  
Success criteria: the download links are only available to clients that have the same set of scopes as the original request
+
2. Data Consumer and Provider follow the workflows described in Scenario 1 and 2 with the addition of an authorization header in each request. If the <code>X-FHIR-Links-Require-Authorization</code> header in the final async response is not set to <code>true</code>, the Data Consumer should not include the authorization token in the file download requests.
  
 
==TestScript(s)==
 
==TestScript(s)==
Line 103: Line 138:
 
* Obviously, access to APIs like this in production require both authentication and consent
 
* Obviously, access to APIs like this in production require both authentication and consent
 
* Step 3 tests out application authentication  
 
* Step 3 tests out application authentication  
* for now, it is assumed that consent is managed elsewhere, though extensions may be added to the stream for this (see [[https://chat.fhir.org/#narrow/stream/implementers/subject/Bulk.20Data.20Access]])
+
* For now, it is assumed that consent is managed elsewhere, though extensions may be added to the stream for this (see [[https://chat.fhir.org/#narrow/stream/implementers/subject/Bulk.20Data.20Access]])
* Audit: for now, it is assumed that applications will audit the initial FHIR retrieval, and a smart on fhir login, but there are no rules about that
+
* Audit: For now, it is assumed that applications will audit the initial FHIR retrieval, and a smart on fhir login, but there are no rules about that
 +
* The <code>X-FHIR-Links-Require-Authorization</code> header is a proposal to support both servers that use SMART authentication to secure the generated files and those that leverage other techniques (eg S3 signed URLs).

Revision as of 21:04, 6 November 2017


Track Name

Bulk Data Access (see blog post: http://www.healthintersections.com.au/?p=2689)

Submitting WG/Project/Implementer Group

FHIR-I

Justification

This track is created at the request of the ONC. ONC provided this justification:

  • Ecosystem outcome expected to enable many specific use case/business needs: Providers and organizations accountable for managing the health of populations can efficiently access large volumes of information on a specified group of individuals without having to access one record at a time. This population-level access would enable these stakeholders to: assess the value of the care provided, conduct population analyses, identify at-risk populations, and track progress on quality improvement.
  • Technical Expectations: There would be a standardized method built into the FHIR standard to support access to and transfer of a large amount of data on a specified group of patients and that such method could be reused for any number of specific business purposes.
  • Policy Expectations: All existing legal requirements for accessing identifiable patient information via other bulk methods (e.g., ETL) used today would continue to apply (e.g., through HIPAA BAAs/contracts, Data Use Agreements, etc).

Proposed Track Lead

Dan Gottlieb and Josh Mandel with support from Grahame Grieve (Connectathon_Track_Lead_Responsibilities)

Expected participants

  • Grahame Grieve
  • Dan Gottlieb
  • Josh Mandel
  • Cerner
  • Epic
  • CARIN health alliance

Roles

Data Provider: provides data in the manner specified by the bulk data API

Data Consumer: consumes data in the manner specified by the bulk data API and displays/processes the data

Scenarios

The bulk data track is divided into the following scenarios:

  1. Full bulk data export, open server without security
  2. Targeted bulk data export, open server without security
  3. Secured bulk data export using the SMART backend services specification (http://docs.smarthealthit.org/authorization/backend-services/)

Scenario 1: Full Bulk Data Export (Open Endpoint)

Action

1. Data Consumer requests a bulk data export

GET [base]/Patient/$everything
Accept: application/fhir+ndjson
Prefer: respond-async

2. Data Provider responds with a location for progress updates

HTTP/1.1 202 Accepted
Content-Location: [url for subsequent progress requests]

3. Data Consumer requests a progress update

GET [url for subsequent progress requests]

4. Data Provider responds with the operation's interim status (optional)

HTTP/1.1 202 Accepted
X-Progress: [optional <100 character string indicating progress]

Note: a 5xx Error indicates that preparing the response has failed. The body is an OperationOutcome describing the error.
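
The polling exchange in steps 3 and 4 can be sketched in Python as a small classifier for each progress response. This is a minimal sketch, not part of the track definition: the function name `classify_status_response` and the state labels are hypothetical, and the headers are assumed to arrive in a plain mapping keyed exactly as written above (real HTTP clients usually expose case-insensitive header mappings).

```python
from typing import Mapping, Optional, Tuple

# Headers for the kick-off request in step 1, as listed in the scenario.
KICKOFF_HEADERS = {
    "Accept": "application/fhir+ndjson",
    "Prefer": "respond-async",
}


def classify_status_response(
    status: int, headers: Mapping[str, str]
) -> Tuple[str, Optional[str]]:
    """Map a progress response to (state, detail).

    state is "in-progress", "complete", or "failed" (hypothetical labels).
    detail carries the optional X-Progress text while the export is running.
    """
    if status == 202:
        # Still processing; X-Progress is optional, so it may be None.
        return "in-progress", headers.get("X-Progress")
    if status == 200:
        # Export finished; the Link header lists the generated files.
        return "complete", None
    if 500 <= status <= 599:
        # Preparation failed; the body is an OperationOutcome.
        return "failed", None
    # Other status codes are not described by the track.
    raise ValueError(f"unexpected status {status}")
```

A Data Consumer would call this after each GET to the Content-Location URL and repeat the request while the state is "in-progress".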

5. Data Provider responds with links to the generated data files

HTTP/1.1 200 OK
Content-Type: application/fhir+ndjson
X-FHIR-Links-Require-Authorization: false
Expires: Sun, 21 Oct 2018 07:28:00 GMT
Link: <https://data/file/location/0001.Patient.ndjson>,
      <https://data/file/location/0001.Observation.ndjson>,
      <https://data/file/location/0002.Observation.ndjson>,
      <...>

Note: newlines are for formatting purposes only and are not valid in HTTP headers. See https://tools.ietf.org/search/rfc5988#page-6 for details on Link header formatting. The data provider may specify the server path, but files must end in [FHIR Type].ndjson.
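
Because each file name ends in the resource type followed by .ndjson, a Data Consumer can work out what each link contains before downloading it. The following Python sketch illustrates one way to do that; the function name `parse_export_links` is hypothetical, and it assumes the simple Link header form shown above (angle-bracketed URLs separated by commas, with no link parameters that themselves contain angle brackets).

```python
import re
from typing import List, Tuple


def parse_export_links(link_header: str) -> List[Tuple[str, str]]:
    """Return (url, resource_type) pairs from a Link header value."""
    pairs = []
    # Each link target is wrapped in angle brackets: <url>
    for url in re.findall(r"<([^>]*)>", link_header):
        # The resource type is the alphabetic segment right before ".ndjson".
        match = re.search(r"([A-Za-z]+)\.ndjson$", url)
        if match is None:
            # Skip entries that do not follow the naming convention.
            continue
        pairs.append((url, match.group(1)))
    return pairs
```

For the example response above, this would yield one pair for the Patient file and two for the Observation files, which lets the consumer route each download to a type-specific import step.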

6. Data Consumer requests each of the generated files

GET https://data/file/location/0001.Patient.ndjson
GET https://data/file/location/0001.Observation.ndjson
GET https://data/file/location/0002.Observation.ndjson

7. Optionally, Data Consumer may ETL and process these files.
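
As an illustration of step 7, a downloaded file can be read one resource at a time, since ndjson places one JSON document on each line. This is a minimal sketch; the function name `iter_ndjson` is hypothetical, and it assumes the file body has already been decoded to text.

```python
import json
from typing import Any, Dict, Iterator


def iter_ndjson(text: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed resource per non-empty line of an ndjson body."""
    # Split on "\n" only: JSON strings may legally contain other Unicode
    # line separators, which str.splitlines() would also break on.
    for line in text.split("\n"):
        if line.strip():
            yield json.loads(line)
```

Reading line by line means the consumer never has to hold a whole export in memory as a single parsed structure, which suits existing line-oriented ETL tools.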

Scenario 2: Targeted Bulk Data Export (Open Endpoint)

Action

1. Data Consumer issues one or more of the following requests:

GET [base]/Patient/$everything?start=[date-time]&_type=[FHIR Resource Type],[FHIR Resource Type]
Accept: application/fhir+ndjson
Prefer: respond-async

GET [base]/Group/[id]/$everything
Accept: application/fhir+ndjson
Prefer: respond-async

GET [base]/Group/[id]/$everything?start=[date-time]&_type=[FHIR Resource Type],[FHIR Resource Type]
Accept: application/fhir+ndjson
Prefer: respond-async

2. Subsequent workflow proceeds as in Scenario 1
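
The targeted request URLs above can be assembled programmatically. The Python sketch below is one possible approach; the function name `build_export_url`, its parameter names, and the example base URL are hypothetical, and it leaves commas and colons unescaped in the query string so the URL resembles the examples in this scenario.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def build_export_url(
    base: str,
    group_id: Optional[str] = None,
    start: Optional[str] = None,
    types: Optional[Sequence[str]] = None,
) -> str:
    """Build a $everything export URL for all patients or for one Group."""
    root = base.rstrip("/")
    if group_id:
        path = f"{root}/Group/{group_id}/$everything"
    else:
        path = f"{root}/Patient/$everything"

    params = {}
    if start:
        params["start"] = start
    if types:
        # Resource types are sent as one comma-separated _type value.
        params["_type"] = ",".join(types)
    if not params:
        return path
    # Keep "," and ":" readable; other reserved characters are escaped.
    return path + "?" + urlencode(params, safe=",:")
```

For example, `build_export_url("https://example.org/fhir", start="2017-01-01", types=["Patient", "Observation"])` produces a Patient-level request restricted to two resource types. The request would still carry the Accept and Prefer headers shown above.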

Scenario 3: Secured Bulk Data Export

Action

1. Data Consumer registers itself with Data Provider and obtains an access token as described in the SMART backend services specification (http://docs.smarthealthit.org/authorization/backend-services/)

2. Data Consumer and Data Provider follow the workflows described in Scenarios 1 and 2, adding an Authorization header to each request. If the X-FHIR-Links-Require-Authorization header in the final async response is not set to true, the Data Consumer should not include the access token in the file download requests.
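
The rule for file downloads in step 2 can be sketched as a small helper that decides whether to attach the token. This is an illustrative sketch only: the function name `download_headers` is hypothetical, the bearer-token form of the Authorization header is an assumption based on common OAuth 2.0 usage, and the status headers are assumed to be in a plain mapping keyed exactly as written in this document.

```python
from typing import Dict, Mapping


def download_headers(
    status_headers: Mapping[str, str], access_token: str
) -> Dict[str, str]:
    """Return the headers to send when fetching a generated file."""
    flag = status_headers.get("X-FHIR-Links-Require-Authorization", "")
    headers: Dict[str, str] = {}
    # Only send the token when the server explicitly asks for it; a missing
    # header or any value other than "true" means the token is withheld.
    if flag.strip().lower() == "true":
        headers["Authorization"] = f"Bearer {access_token}"
    return headers
```

Withholding the token by default avoids leaking it to file locations secured by other means, such as signed URLs hosted outside the FHIR server.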

TestScript(s)

This is an API extension and will require extensions to the TestScript resource in order to be tested.

Security and Privacy Considerations

  • Access to APIs like this in production requires both authentication and consent
  • Scenario 3 tests application authentication
  • For now, it is assumed that consent is managed elsewhere, though extensions may be added to the stream for this (see https://chat.fhir.org/#narrow/stream/implementers/subject/Bulk.20Data.20Access)
  • Audit: For now, it is assumed that applications will audit the initial FHIR retrieval and the SMART on FHIR login, but there are no rules about that
  • The X-FHIR-Links-Require-Authorization header is a proposal to support both servers that use SMART authentication to secure the generated files and those that rely on other techniques (e.g., S3 signed URLs).