
Using multiple storage tiers for cost-efficiency (A/V, TIFFs)


Last updated 2 years ago


Within our warm replica, we can store content as a mixture of Standard-IA and Glacier; this is primarily for cost efficiency. Storing objects in Glacier is approximately 3.5x cheaper than storing them in Standard-IA.
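To give a feel for the saving, here is a rough sketch of the arithmetic in Python. It uses only the approximate 3.5x price ratio quoted for Glacier versus Standard-IA; the file sizes are hypothetical, prices are in relative units rather than real AWS prices, and it ignores retrieval fees and minimum storage durations.

```python
# Relative price per GB-month; only the ratio between the tiers matters here.
STANDARD_IA_PRICE = 1.0
GLACIER_PRICE = STANDARD_IA_PRICE / 3.5  # "approximately 3.5x cheaper"


def monthly_cost(size_gb: float, price_per_gb: float) -> float:
    """Storage cost for one month, in relative units."""
    return size_gb * price_per_gb


# Hypothetical sizes for one digitised video (not measured values).
mxf_gb = 100.0  # preservation copy
mp4_gb = 5.0    # access copy

# Option 1: keep both files in Standard-IA.
all_standard_ia = monthly_cost(mxf_gb + mp4_gb, STANDARD_IA_PRICE)

# Option 2: access copy in Standard-IA, preservation copy in Glacier.
tiered = monthly_cost(mp4_gb, STANDARD_IA_PRICE) + monthly_cost(mxf_gb, GLACIER_PRICE)

saving = 1 - tiered / all_standard_ia
print(f"all Standard-IA: {all_standard_ia:.1f}, tiered: {tiered:.1f}, saving: {saving:.0%}")
```

Because the preservation copy dominates the total size, most of the bill moves to the cheaper tier even though the access copy stays in Standard-IA.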

Use case

At time of writing (May 2023), there are two use cases for this feature:

  • Digitised A/V. Our digitised A/V workflow produces both a high-resolution MXF and a lower-resolution MP4.

    • The MP4 is the "access copy" – if somebody is watching the video through DLCS, it’s being transcoded from the MP4.

    • The MXF is the "preservation copy" – it's considered the canonical copy of the video and we could use it to create new access copies in the future, but it's too big to serve in a sensible way (some of the files are >100GB). We don't need immediate access to it.

    We store the MP4 in Standard-IA and the MXF in Glacier.

  • Digitised manuscripts. In our digitised manuscripts workflow, we keep both the original TIFF and the edited JP2 from LayoutWizzard.

    • The JP2 is the "access copy" used by DLCS to serve images on the web.

    • The TIFF is the "preservation copy" that we don't access on a day-to-day basis.

    We store the JP2s in Standard-IA and the TIFFs in Glacier.

You can see the current set in the TagRules object in the bag tagger.
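As an illustration only, a rule set of this kind could be modelled as a lookup from file suffix to tags. This is a Python sketch, not the bag tagger's actual code: `TAG_RULES` and `tags_for` are hypothetical names, and the real rules may take other properties of a bag into account. Only the MXF tag is shown, because that is the one given as an example on this page.

```python
from pathlib import PurePosixPath

# Hypothetical rule table: file suffix -> tags to apply to matching objects.
TAG_RULES = {
    ".mxf": {"Content-Type": "application/mxf"},  # example tag from this page
}


def tags_for(key: str) -> dict[str, str]:
    """Return the tags for an object key, or an empty dict if no rule matches."""
    suffix = PurePosixPath(key).suffix.lower()
    # Copy the rule so callers cannot mutate the shared table.
    return dict(TAG_RULES.get(suffix, {}))


# Preservation copies match a rule; access copies are left untagged.
print(tags_for("example/bag/v1/data/objects/video.mxf"))
print(tags_for("example/bag/v1/data/objects/video.mp4"))
```

Objects that match no rule get no tags, so they stay in Standard-IA.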

How it works

  • When the bag register finishes storing a bag, it sends a notification: "We've successfully stored a new bag in space X with identifier Y and version Z."

  • The bag tagger picks up this message, and applies key-value tags to certain objects in the newly stored bag, e.g. we add Content-Type: application/mxf for our high-resolution MXF video files.

  • We set up S3 lifecycle configuration rules on our storage buckets to transition objects with certain tags into the Glacier storage tier, e.g. "Move any object with the tag Content-Type: application/mxf to Glacier 90 days after it was created."
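The last two steps can be sketched in Python as follows. This is a minimal sketch, not the storage service's own implementation: the helper names and the rule ID are hypothetical, and `s3` is assumed to be a boto3 S3 client (for example `boto3.client("s3")`). The payload shapes follow the S3 `PutObjectTagging` and `PutBucketLifecycleConfiguration` APIs.

```python
def tag_set(tags: dict[str, str]) -> dict:
    """Convert {"key": "value"} into the Tagging payload shape S3 expects."""
    return {"TagSet": [{"Key": k, "Value": v} for k, v in tags.items()]}


def glacier_rule(tag_key: str, tag_value: str, days: int = 90) -> dict:
    """Build a lifecycle rule: move objects with this tag to Glacier
    `days` days after they were created."""
    return {
        "ID": f"to-glacier-{tag_value.replace('/', '-')}",  # hypothetical rule name
        "Status": "Enabled",
        "Filter": {"Tag": {"Key": tag_key, "Value": tag_value}},
        "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
    }


def tag_object(s3, bucket: str, key: str, tags: dict[str, str]) -> None:
    """Apply tags to one stored object.

    Note: put_object_tagging replaces the object's whole tag set.
    """
    s3.put_object_tagging(Bucket=bucket, Key=key, Tagging=tag_set(tags))


def configure_bucket(s3, bucket: str, rules: list[dict]) -> None:
    """Install lifecycle rules on a bucket.

    Note: this call replaces the bucket's existing lifecycle configuration.
    """
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": rules},
    )


# The rule for the MXF example above.
mxf_rule = glacier_rule("Content-Type", "application/mxf")
```

Tagging happens once per newly stored bag, while the lifecycle rules are set up once per bucket; S3 then moves tagged objects without any further action from the bag tagger.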
