An API reference for the user-facing storage service APIs
The storage service provides four APIs:
The create ingest API lets you request that a new bag be stored in the storage service
The get ingest API lets you track the progress of a bag through the storage service, until it's successfully stored or fails to store
The get bag API lets you retrieve a storage manifest that describes the contents of a successfully stored bag
The get bag versions API lets you see all the versions of an external identifier stored in a given space
The storage service does not provide an API for retrieving the content of stored bags -- instead, you should retrieve content directly from the storage providers.
After you upload a bag to S3 as a gzip-compressed tar archive, you call the "Create Ingest" API to ask the storage service to store the bag. This is a POST request, with the JSON-formatted ingest in the request body.
Request:
Parameters:
{externalIdentifier} -- the identifier of the bag within this space. This must match the External-Identifier in the bag-info.txt file.
{ingestType} -- if this is the first bag with this external identifier in this space, use create. Otherwise, use update.
{uploadedTarGzS3Bucket} and {uploadedTarGzS3Key} -- the bucket and key where you uploaded the gzip-compressed bag.
{callbackUrl} -- the URL you want the storage service to send a callback to when the ingest completes. If there's no callback, omit the "callback" object.
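As a minimal sketch of how these parameters could be assembled into a request body: the JSON layout below (the key names "space", "externalIdentifier", "ingestType" and "sourceLocation", and the helper name build_ingest_body) is hypothetical and chosen only for illustration, so check it against the real request template before use. The only details taken from the text are the create/update values and the rule that the "callback" object is omitted when there is no callback.

```python
import json
from typing import Optional


def build_ingest_body(
    space: str,
    external_identifier: str,
    ingest_type: str,
    bucket: str,
    key: str,
    callback_url: Optional[str] = None,
) -> dict:
    """Assemble a JSON-serialisable ingest. Key names are assumptions."""
    if ingest_type not in ("create", "update"):
        raise ValueError(
            f"ingestType must be 'create' or 'update', got {ingest_type!r}"
        )

    body = {
        "space": space,
        "externalIdentifier": external_identifier,
        "ingestType": ingest_type,
        # Hypothetical nesting for the uploaded archive's bucket and key.
        "sourceLocation": {"bucket": bucket, "key": key},
    }

    # Leave the "callback" object out entirely when no callback is wanted.
    if callback_url is not None:
        body["callback"] = {"url": callback_url}

    return body


if __name__ == "__main__":
    example = build_ingest_body(
        space="digitised",
        external_identifier="b12345678",
        ingest_type="create",
        bucket="example-uploads",  # placeholder bucket name
        key="b12345678.tar.gz",
    )
    print(json.dumps(example, indent=2))
```

Validating ingestType before sending means a typo is caught locally rather than by the storage service.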
Response:
Parameters:
Once an ingest has been created, you can use the "Get Ingest" API to track its progress through the storage service. It tells you which processing steps the bag has been through, and whether it was successfully stored or rejected by the storage service.
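Tracking an ingest usually means polling until it finishes. The sketch below is one way to do that, under several assumptions: fetch_ingest is a hypothetical callable you supply that performs the "Get Ingest" request and returns the parsed JSON as a dict; the status is assumed to be available as a plain string under a "status" key; and any status other than accepted or processing is treated as finished.

```python
import time
from typing import Callable

# The two in-progress statuses named in this reference.
IN_PROGRESS = {"accepted", "processing"}


def wait_for_ingest(
    fetch_ingest: Callable[[], dict],
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the ingest leaves the in-progress statuses.

    Assumes the ingest dict has a string "status" key (hypothetical).
    """
    for attempt in range(max_attempts):
        ingest = fetch_ingest()
        if ingest.get("status") not in IN_PROGRESS:
            return ingest
        # Don't sleep after the final attempt; we're about to give up.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(
        f"ingest still in progress after {max_attempts} attempts"
    )
```

Passing the fetch and sleep functions in as arguments keeps the loop independent of any particular HTTP client and makes it easy to exercise without network access.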
Request:
Response:
Some of these fields repeat values from the "create ingest" request. The following fields are added by the storage service as it processes the ingest:
{id} -- the ID of the ingest. This is a UUID.
{version} -- the version of the bag with this external identifier in the space, if assigned. This will be a string like v1, v2, v3, and so on.
Callback status -- the status of the callback, if the storage service has been asked to send one after the ingest completes.
Status -- the status of the ingest. The ingest starts as accepted, and moves into processing when the bag is picked up by the storage service.
Events -- each ingest event describes a processing step in the storage service. The timestamps are UTC, and the descriptions are a single sentence explaining the start/finish of a processing step. Examples (each line is a separate event):
Unpacking started
Unpacking succeeded - Unpacked 168 KB from 9 files
Detecting bag root started
Detecting bag root succeeded
Verification (pre-replicating to archive storage) started
These events are meant to be human-readable, and to help a user understand where their bag is in the process. They should be suitable for display in a dashboard or workflow tool.
If the bag can't be ingested correctly, the events should explain why, for example:
Unpacking started
Unpacking failed - Error reading s3://bucket/b12345678.tar.gz: either it doesn't exist, or the unpacker doesn't have permission to read it
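Because the events are intended for display, a dashboard might render an ingest roughly as follows. This is a sketch only: the key names "id", "status", "events", "createdDate" and "description" are assumptions about the response shape, not taken from this reference, and should be adjusted to match the real response.

```python
from typing import List


def describe_ingest(ingest: dict) -> List[str]:
    """Turn an ingest (as a parsed JSON dict) into display lines.

    All key names used here are hypothetical.
    """
    status = ingest.get("status", "unknown")
    lines = [f"Ingest {ingest.get('id', '?')}: {status}"]

    # One line per event: UTC timestamp, then the one-sentence description.
    for event in ingest.get("events", []):
        lines.append(f"{event['createdDate']}  {event['description']}")

    return lines


if __name__ == "__main__":
    sample = {
        "id": "00000000-0000-0000-0000-000000000000",  # placeholder UUID
        "status": "processing",
        "events": [
            {"createdDate": "2020-01-01T12:00:00Z",
             "description": "Unpacking started"},
            {"createdDate": "2020-01-01T12:00:05Z",
             "description": "Unpacking succeeded - Unpacked 168 KB from 9 files"},
        ],
    }
    print("\n".join(describe_ingest(sample)))
```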
This API lets you retrieve a storage manifest, which describes the contents of a successfully stored bag.
Request:
By default this API returns the latest version of a bag, but you can retrieve a specific version by passing the version query parameter.
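The following sketch shows how the version query parameter might be attached to a request URL. The base URL and the /bags/{space}/{externalIdentifier} path are assumptions made for illustration, as is the helper name bag_url; only the version parameter and the v1, v2, v3 format come from this reference.

```python
import re
from typing import Optional
from urllib.parse import quote, urlencode


def bag_url(
    base_url: str,
    space: str,
    external_identifier: str,
    version: Optional[str] = None,
) -> str:
    """Build a URL for fetching a bag. The path layout is hypothetical."""
    # Percent-encode each path segment so unusual characters are safe.
    url = "{}/bags/{}/{}".format(
        base_url.rstrip("/"),
        quote(space, safe=""),
        quote(external_identifier, safe=""),
    )

    if version is not None:
        # Versions are described as strings like v1, v2, v3.
        if not re.fullmatch(r"v\d+", version):
            raise ValueError(f"expected a version like 'v2', got {version!r}")
        url += "?" + urlencode({"version": version})

    return url


if __name__ == "__main__":
    base = "https://api.example.org/storage"  # placeholder base URL
    print(bag_url(base, "digitised", "b12345678"))
    print(bag_url(base, "digitised", "b12345678", version="v2"))
```

Omitting the version argument leaves the query string off, which corresponds to asking for the latest version.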
Example response:
This API lets you see all the versions of an external identifier stored in a given space.
Request:
The storage service doesn't provide an API for retrieving the content of stored bags because:
It already exists in the storage providers, and inserting ourselves in the middle would create an unnecessary dependency on the storage service. This is particularly important for ensuring long-term access to the content.
Any intermediary service would likely be less performant and less full-featured than using the underlying APIs directly.
We can push the problem of permissions management onto the storage providers.
{space} -- the broad category of the bag, say digitised or born-digital.
The Location header returns a URL you can use with the "Get Ingest" API.
The storage service does not provide an API for retrieving the content of stored bags -- instead, you should use the storage provider APIs.