Archivematica @ Wellcome Collection

Introduction


Last updated 2 years ago

We use Archivematica to process and store our born-digital archives.

This processing includes:

  • Analysing files in the archive, like virus scanning, file format identification, and fixity checking

  • Creating a metadata description of the archive that can be read by downstream applications

  • Uploading the archive to our permanent cloud storage
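Fixity checking, the last analysis step above, comes down to recomputing a checksum for each file and comparing it with a previously recorded value. The Python below is a minimal sketch of that idea. It assumes a hypothetical manifest that maps relative paths to SHA-256 hex digests, loosely modelled on a BagIt manifest. The names `sha256_of_file` and `check_fixity` are illustrative only. Archivematica's own implementation and manifest formats differ.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks
    so large archive files don't have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(manifest: dict, root: Path) -> list:
    """Compare files under `root` against expected checksums.

    `manifest` is a hypothetical mapping of relative path -> expected
    SHA-256 hex digest. Returns the relative paths that are missing or
    whose recomputed checksum does not match.
    """
    failures = []
    for rel_path, expected in manifest.items():
        path = root / rel_path
        if not path.is_file() or sha256_of_file(path) != expected:
            failures.append(rel_path)
    return failures


if __name__ == "__main__":
    # Demo: one file that matches its recorded checksum, one that is missing.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "letter.txt").write_bytes(b"hello")
        manifest = {
            "letter.txt": hashlib.sha256(b"hello").hexdigest(),
            "missing.pdf": "0" * 64,
        }
        print(check_fixity(manifest, root))  # ['missing.pdf']
```

A real deployment would also record which algorithm produced each checksum, so the comparison uses the same one.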

Archivematica is an open-source application created by Artefactual.

Requirements

  • Allow archivists to manage our born-digital collections

  • Ensure our born-digital collections are processed consistently and stored safely

  • Provide metadata in a consistent format that we can (eventually) use to display born-digital archives on wellcomecollection.org

  • Avoid "reinventing the wheel" when processing born-digital archives

Documentation

This GitBook space is intended to help staff at Wellcome Collection understand how our Archivematica deployment works, so they can use it, debug issues, and administer it.

This includes:

  • How-to guides for common operations, e.g. creating a new transfer package

  • Reference material explaining how Archivematica works

  • Notes for developers who want to administer or debug our Archivematica deployment

Repo

All our Archivematica-related code is in https://github.com/wellcomecollection/archivematica-infrastructure

The READMEs in the repo have instructions for specific procedures, e.g. how to create new Docker images. This GitBook is meant to be a bit higher-level.

This GitBook should be read alongside the first-party Archivematica docs, because it mostly covers information specific to Wellcome.