
Key technologies


Last updated 2 years ago

This document provides a brief overview of some of the key technologies we use in the storage service.

AWS services

S3 – petabyte-scale object storage. We use it for permanent storage of assets, for temporary/working storage, and to hold storage "manifests" (our JSON representation of a bag).

DynamoDB – a NoSQL database that we use as a key-value store. We use it to track ingests, to store information about versions, and to hold locks around certain processes.
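This page doesn't describe how the locking works. One common way to build a lock on DynamoDB is a conditional write: a process creates a lock item only if no live lock with that ID already exists (for example a `PutItem` with a condition expression such as `attribute_not_exists(id)`), and deletes it only if it still owns it. The sketch below models that pattern with an in-memory stand-in for the table so it runs without AWS. The `id`/`owner`/`expires_at` attributes, the expiry rule and all class and function names are hypothetical, not the storage service's actual schema or code.

```python
import time
import uuid


class ConditionalCheckFailed(Exception):
    """Stands in for DynamoDB's ConditionalCheckFailedException."""


class InMemoryTable:
    """Toy stand-in for a DynamoDB table keyed on a single 'id' attribute
    (hypothetical layout, for illustration only)."""

    def __init__(self):
        self.items = {}

    def put_if_absent_or_expired(self, item, now):
        # Mimics a conditional PutItem along the lines of
        # "attribute_not_exists(id) OR expires_at < :now".
        existing = self.items.get(item["id"])
        if existing is not None and existing["expires_at"] >= now:
            raise ConditionalCheckFailed(item["id"])
        self.items[item["id"]] = item

    def delete_if_owner(self, key, owner):
        # Mimics a conditional DeleteItem with "owner = :owner".
        existing = self.items.get(key)
        if existing is None or existing["owner"] != owner:
            raise ConditionalCheckFailed(key)
        del self.items[key]


def acquire_lock(table, lock_id, ttl_seconds=60, now=None):
    """Try to take the lock; return an owner token, or None if it is held."""
    now = time.time() if now is None else now
    owner = str(uuid.uuid4())
    try:
        table.put_if_absent_or_expired(
            {"id": lock_id, "owner": owner, "expires_at": now + ttl_seconds},
            now,
        )
    except ConditionalCheckFailed:
        return None
    return owner


def release_lock(table, lock_id, owner):
    """Release the lock only if we still own it; return True on success."""
    try:
        table.delete_if_owner(lock_id, owner)
        return True
    except ConditionalCheckFailed:
        return False
```

For example, `acquire_lock(table, "bag/b1234", now=0)` returns a token, and a second call at `now=10` returns `None` until the first holder releases the lock or its 60-second expiry passes. The expiry means a crashed process can't hold a lock forever, at the cost of needing the TTL to be longer than the work it protects.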

SNS/SQS – notification topics and message queues for inter-app messaging. Our apps form a pipeline: an app receives a message, does some work, then sends another message to the next app in line. These inter-app messages are passed using SNS and SQS; for more detail, see the page on inter-app messaging with SQS and SNS.
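As a rough illustration of this pipeline, the sketch below models SNS topics and SQS queues in memory. It assumes the common fan-out arrangement, where an app publishes its output to a topic and each downstream app reads from its own queue subscribed to that topic; this page doesn't spell out that wiring, so treat it as an assumption. The topic, queue and stage names, and the `Topic`/`run_app` helpers, are hypothetical, not the storage service's code.

```python
from collections import deque


class Topic:
    """Toy stand-in for an SNS topic: every published message is copied to
    each subscribed queue (fan-out)."""

    def __init__(self, name):
        self.name = name
        self.subscribers = []

    def subscribe(self, queue):
        self.subscribers.append(queue)

    def publish(self, message):
        for queue in self.subscribers:
            queue.append(dict(message))  # each subscriber gets its own copy


def run_app(input_queue, work, output_topic=None):
    """Drain one app's input queue (standing in for an SQS queue), do the
    app's work on each message, and publish each result to the next topic."""
    results = []
    while input_queue:
        # A real SQS consumer receives a message and deletes it only after
        # processing succeeds; popleft() collapses both steps into one.
        message = input_queue.popleft()
        result = work(message)
        if output_topic is not None:
            output_topic.publish(result)
        results.append(result)
    return results


# Example wiring; the topic, queue and stage names are hypothetical.
ingests = Topic("ingests")
unpacked = Topic("unpacked")
unpacker_queue, verifier_queue, audit_queue = deque(), deque(), deque()

ingests.subscribe(unpacker_queue)
unpacked.subscribe(verifier_queue)
unpacked.subscribe(audit_queue)  # a second consumer of the same messages

ingests.publish({"ingest_id": "ingest-1"})
run_app(unpacker_queue, lambda m: {**m, "unpacked": True}, unpacked)
verified = run_app(verifier_queue, lambda m: {**m, "verified": True})
```

Because each app only knows its input queue and output topic, stages can be added or scaled independently, and a slow app just lets messages build up in its queue rather than blocking the app before it.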

ECS/Fargate – a serverless container runtime. We package our services as Docker images, and Fargate runs the containers without us having to provision VMs/servers for them. Fargate is a launch type within Amazon Elastic Container Service (ECS).

Tools

Scala – a JVM-based language with an emphasis on functional programming. Most of the storage service applications are written in Scala.

Terraform – an infrastructure-as-code tool that we use to manage our resources: AWS services, Elastic Cloud clusters, Azure storage containers, and so on. Using Terraform makes it easier for us to track changes to our infrastructure, and to run multiple identical copies of the storage service (one for real content, one for testing).

Python – a scripting language that we use as "glue" code between certain applications, and for local debugging scripts.
