Archivematica @ Wellcome Collection

Using Wellcome catalogue identifiers


Last updated 2 years ago

When we store a bag in the storage service, we identify it by the External-Identifier in the BagIt bag (see notes on identifiers). By default, Archivematica uses the ingest UUID as the External-Identifier for the AIPs it creates, but this UUID has no meaning outside Archivematica. We want to store bags under our own identifiers (reference numbers from CALM) instead.
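To illustrate what the storage service keys on, here is a minimal Python sketch that reads the External-Identifier tag from a bag's bag-info.txt. It assumes simple `Label: value` lines, and it folds indented continuation lines into the previous value. The function names and the sample UUID are made up for illustration; this is not code from the storage service.

```python
from __future__ import annotations


def parse_bag_info(text: str) -> dict[str, list[str]]:
    """Parse bag-info.txt into {label: [values]}.

    BagIt lets a label repeat, so every label maps to a list. A line that
    starts with whitespace continues the previous value.
    """
    tags: dict[str, list[str]] = {}
    last_label: str | None = None
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line[0] in " \t" and last_label is not None:
            # Continuation line: fold it into the most recent value.
            tags[last_label][-1] += " " + line.strip()
            continue
        label, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed bag-info.txt line: {line!r}")
        last_label = label.strip()
        tags.setdefault(last_label, []).append(value.strip())
    return tags


def external_identifier(text: str) -> str:
    """Return the single External-Identifier tag value from bag-info.txt."""
    values = parse_bag_info(text).get("External-Identifier", [])
    if len(values) != 1:
        raise ValueError(f"expected one External-Identifier, found {len(values)}")
    return values[0]


# Made-up example of a bag as Archivematica creates it by default,
# with the ingest UUID as the External-Identifier.
default_bag_info = (
    "Bagging-Date: 2020-01-01\n"
    "External-Identifier: 8f3a2c1e-5b7d-4e09-9a61-2d4c8b0f7e35\n"
)
print(external_identifier(default_bag_info))  # the UUID, not a CALM reference
```

Left as is, every bag would be stored under an opaque UUID like this one, which is why we replace it with the CALM reference.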

We achieve this in two steps:

  1. When users upload bags, they include a metadata.csv file that records our reference number as the Dublin Core identifier (the dc.identifier column). For example:

    filename,dc.identifier
    objects/,archivematica-dev/TEST/1

    sets the reference to archivematica-dev/TEST/1. Archivematica writes this identifier to its METS file.

  2. In our fork of Archivematica, before we store the AIP, we unpack the bag, extract the reference from the METS file, and write it to bag-info.txt as the External-Identifier. The Archivematica UUID moves to the Internal-Identifier field.

    We record this reference in the Archivematica database so that Archivematica can retrieve the bag later (although we don't actually retrieve bags in Archivematica).

    You can see the code for this in storage_service/locations/models/wellcome.py.
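The two steps above can be sketched in Python. This is a minimal illustration, not the code in wellcome.py. The function names are hypothetical. It assumes the METS file contains exactly one distinct dc:identifier value, which it finds by searching the whole document rather than a specific dmdSec. It also assumes bag-info.txt uses one `Label: value` line per tag.

```python
from __future__ import annotations

import csv
import io
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core elements, including dc:identifier.
DC_NS = "http://purl.org/dc/elements/1.1/"


def make_metadata_csv(reference: str) -> str:
    """Step 1: a metadata.csv that gives the whole transfer a dc.identifier."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["filename", "dc.identifier"])
    writer.writerow(["objects/", reference])
    return buf.getvalue()


def extract_reference(mets_xml: bytes) -> str:
    """Step 2a: return the dc:identifier value from a METS document.

    Searches the whole document rather than one particular dmdSec, and
    requires exactly one distinct non-empty value.
    """
    root = ET.fromstring(mets_xml)
    values = {
        el.text.strip()
        for el in root.iter(f"{{{DC_NS}}}identifier")
        if el.text and el.text.strip()
    }
    if len(values) != 1:
        raise ValueError(f"expected one dc:identifier, found {sorted(values)}")
    return values.pop()


def swap_identifiers(bag_info: str, reference: str) -> str:
    """Step 2b: make `reference` the External-Identifier in bag-info.txt and
    keep the old value (the Archivematica UUID) as Internal-Identifier.

    Assumes one "Label: value" line per tag, with no continuation lines.
    """
    lines = bag_info.splitlines()
    for i, line in enumerate(lines):
        label, sep, value = line.partition(":")
        if sep and label.strip() == "External-Identifier":
            lines[i] = f"External-Identifier: {reference}"
            lines.append(f"Internal-Identifier: {value.strip()}")
            return "\n".join(lines) + "\n"
    raise ValueError("bag-info.txt has no External-Identifier tag")
```

For example, make_metadata_csv("archivematica-dev/TEST/1") returns exactly the two CSV lines shown in step 1. Editing bag-info.txt changes its checksum, so a real implementation would also need to regenerate the bag's tagmanifest-*.txt files. This sketch omits that step, and it also omits recording the reference in the Archivematica database.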
