Data Submission Flows

Submitting Datasets to the Dataset Exchange API

There are two options for submitting datasets through the Dataset Exchange API:

  • One for users whose dataset files are already hosted at existing pre-signed URLs

  • One for users who want to request a pre-signed URL where they can upload a dataset file

These flows are available after Authentication to the Dataset Exchange API is completed.

📘

Option 1: A dataset in JSON or CSV format is already available at a pre-signed URL to load into the Dataset Exchange API. Learn more here!

If a dataset (formatted as either JSON or CSV) is already available at a pre-signed URL on AWS S3 or GCP Cloud Storage, it can be loaded directly into the Dataset Exchange API:

  1. Hit POST /datasets to create the new dataset in the Dataset Exchange (DDx) API's database and define the schema.

  2. Once an id is returned for the newly created dataset in the response body of Step 1, hit POST /datasets/:id/records:load with the pre-signed URL in the body of the request.

  3. Done!
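
The two calls in Option 1 can be sketched roughly as follows. This is a minimal illustration, not the official client: the base URL, the bearer-token header, the shape of the dataset definition, and the "url" field name in the load request are all assumptions that should be checked against the API reference. Only the endpoint paths and the id returned by Step 1 come from the steps above.

```python
import json
import urllib.request

# Hypothetical placeholders: the real base URL and auth scheme are not stated here.
BASE_URL = "https://api.example.com"
TOKEN = "YOUR_ACCESS_TOKEN"


def post_json(path, payload, opener=urllib.request.urlopen):
    """POST a JSON payload to the API and return the decoded JSON response."""
    request = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + TOKEN,  # assumed auth scheme
        },
        method="POST",
    )
    with opener(request) as response:
        return json.loads(response.read().decode("utf-8"))


def load_from_presigned_url(dataset_definition, presigned_url,
                            opener=urllib.request.urlopen):
    """Create a dataset, then ask the API to load records from a pre-signed URL."""
    # Step 1: create the dataset and define its schema.
    created = post_json("/datasets", dataset_definition, opener)
    dataset_id = created["id"]

    # Step 2: pass the pre-signed URL in the request body.
    # The "url" field name is an assumption, not a documented name.
    return post_json(
        f"/datasets/{dataset_id}/records:load",
        {"url": presigned_url},
        opener,
    )
```

The opener parameter exists only so the flow can be exercised without network access; in normal use it can be left at its default.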

📘

Option 2: If not already available, request a pre-signed URL to load a dataset into the Dataset Exchange API. Learn more here!

If the dataset is not yet hosted at a pre-signed URL, request an upload URL from the Dataset Exchange API and load the dataset file there.

In order to generate a pre-signed URL and load a dataset:

  1. Hit POST /datasets to create the new dataset in our database and define the schema.

  2. Once an id is returned for the newly created dataset in the response body of Step 1, hit POST /datasets/:id/uploadUrl with a string specifying the content type of the dataset. Content type options are CSV or JSON. The /uploadUrl endpoint will return a pre-signed URL to a GCS bucket where the dataset can be loaded. Please note pre-signed URLs are only valid for 1 hour.

  3. Load the dataset to the pre-signed URL received in Step 2.

  4. Done!
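
The Option 2 steps can be sketched in the same way. Again, this is only an illustration under stated assumptions: the base URL, the bearer-token header, the "contentType" request field, the "uploadUrl" response field, and the use of HTTP PUT with a matching Content-Type header for the upload are guesses to verify against the API reference. The endpoint paths, the CSV/JSON options, and the one-hour validity window come from the steps above.

```python
import json
import urllib.request

# Hypothetical placeholders: the real base URL and auth scheme are not stated here.
BASE_URL = "https://api.example.com"
TOKEN = "YOUR_ACCESS_TOKEN"

# Assumed mapping from the API's content type strings to upload MIME types.
MIME_TYPES = {"CSV": "text/csv", "JSON": "application/json"}


def post_json(path, payload, opener=urllib.request.urlopen):
    """POST a JSON payload to the API and return the decoded JSON response."""
    request = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + TOKEN,  # assumed auth scheme
        },
        method="POST",
    )
    with opener(request) as response:
        return json.loads(response.read().decode("utf-8"))


def upload_via_requested_url(dataset_definition, content_type, file_bytes,
                             opener=urllib.request.urlopen):
    """Create a dataset, request a pre-signed URL, and upload the file to it."""
    # Step 1: create the dataset and define its schema.
    created = post_json("/datasets", dataset_definition, opener)
    dataset_id = created["id"]

    # Step 2: request a pre-signed upload URL for the given content type.
    # "contentType" and "uploadUrl" are assumed field names.
    upload = post_json(
        f"/datasets/{dataset_id}/uploadUrl",
        {"contentType": content_type},
        opener,
    )
    upload_url = upload["uploadUrl"]

    # Step 3: upload the file before the URL expires (valid for 1 hour).
    # PUT is an assumption; pre-signed URLs are commonly written to this way.
    put_request = urllib.request.Request(
        upload_url,
        data=file_bytes,
        headers={"Content-Type": MIME_TYPES[content_type]},
        method="PUT",
    )
    with opener(put_request) as response:
        response.read()
    return dataset_id
```

Because the pre-signed URL expires after an hour, it is safest to request it immediately before the upload rather than caching it.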

🤔

Questions? We're here to help! Reach out to us at [email protected].