Archivematica @ Wellcome Collection

Microservices, tasks and jobs


Last updated 2 years ago

There are three units of "work" in Archivematica, which you can see in the Archivematica dashboard:

The top level unit is the microservice, for example Create SIP from Transfer.

Each microservice runs a number of jobs. Each job performs a different action – for example, Check transfer directory for objects or Load options to create SIPs.

Each job may spawn one or more tasks, which are the Python scripts that run under the hood. You can see the tasks by clicking the gear icon. Often tasks run on a per-file basis: if there are 100 files in a transfer package and you need to perform an action on each file, there would be 100 tasks.

In short: microservices contain jobs, and jobs spawn tasks.
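The hierarchy described above can be sketched as a small data model. This is only an illustration of the microservice/job/task relationship, not Archivematica's own code: the class names, the `spawn_per_file_tasks` helper, the file names, and the job name are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """One run of a script, often for a single file (hypothetical model)."""
    file_name: str


@dataclass
class Job:
    """One action within a microservice; it may spawn many tasks."""
    name: str
    tasks: List[Task] = field(default_factory=list)


@dataclass
class Microservice:
    """The top-level unit of work shown in the dashboard."""
    name: str
    jobs: List[Job] = field(default_factory=list)


def spawn_per_file_tasks(job: Job, file_names: List[str]) -> None:
    """Add one task to the job for each file in the transfer package."""
    job.tasks.extend(Task(file_name=name) for name in file_names)


# A transfer package with 100 files (file names are made up).
files = [f"file_{i}.txt" for i in range(100)]

# "Example per-file job" is a placeholder, not a real Archivematica job name.
job = Job(name="Example per-file job")
spawn_per_file_tasks(job, files)

microservice = Microservice(name="Create SIP from Transfer", jobs=[job])
task_count = len(microservice.jobs[0].tasks)
```

With 100 files in the package, the per-file job ends up with 100 tasks, which matches the behaviour described above.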

Sometimes actions get stuck and need to be restarted; the only way I know to do this is to restart the Archivematica containers (see "Restarting services if a task is stuck"). Restarting can have side effects:

  • When the job is re-run, it gets scheduled twice, which can cause unexpected results downstream. For example, in one ingest that had failed at the Prepare AIP step, I restarted the containers, and every job in and after Prepare AIP was run twice.

  • Not all tasks tolerate being run twice. For example, a task may try to create a directory and fail because the directory already exists from a previous run.
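The directory failure described above can be reproduced in a few lines. This is a minimal sketch using the Python standard library, not code taken from Archivematica: the function names and the `aip_output` directory name are hypothetical. It contrasts a strict create, which fails on a second run, with a create that tolerates an existing directory.

```python
import os
import tempfile


def create_dir_strict(path):
    """Create a directory; raises FileExistsError if it already exists."""
    os.mkdir(path)


def create_dir_rerunnable(path):
    """Create a directory, succeeding even if a previous run already made it."""
    os.makedirs(path, exist_ok=True)


with tempfile.TemporaryDirectory() as working_dir:
    # "aip_output" is a made-up name used only for this illustration.
    target = os.path.join(working_dir, "aip_output")

    # First run of the task: the directory does not exist yet.
    create_dir_strict(target)

    # Second run of the same task: the strict version fails.
    try:
        create_dir_strict(target)
        strict_second_run_ok = True
    except FileExistsError:
        strict_second_run_ok = False

    # The rerunnable version tolerates the existing directory.
    create_dir_rerunnable(target)
    rerunnable_second_run_ok = os.path.isdir(target)
```

A task written in the second style would survive being scheduled twice, at least for this step. Whether the rest of a task is safe to re-run depends on what else it does.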