Archivematica @ Wellcome Collection

Gearman, ElastiCache, and the MCP server/client

Last updated 2 years ago

Archivematica also has microservices in the sense we use them in the rest of the platform: independent containers running in ECS.

The MCP server is a scheduler written as part of Archivematica. It decides what tasks (in the sense described above) need to be run. It tells the Gearman server about these tasks.

Gearman is an open-source framework for distributing tasks between machines. It uses Redis as a database to track in-flight tasks; in our case, the Redis instance is hosted in Amazon ElastiCache.

The MCP client picks up tasks from Gearman and actually does the work, for example moving a file from A to B. It then reports the results back to Gearman. You can have multiple instances of the MCP client, and the computational resources available to each client are a dominant factor in the speed of processing in Archivematica. At the time of writing (March 2020), we run two instances of the MCP client.

So the lifecycle of a task is as follows:

  • The MCP server schedules a task, and sends it to Gearman

  • Gearman forwards the task to an MCP client

  • The MCP client performs the task, and reports the result back to Gearman

  • Gearman forwards the result to the MCP server, which then displays the result in the dashboard, and decides what task to run next
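The lifecycle above can be sketched as a toy, in-process model. This is not Archivematica's code and does not use a real Gearman library: every class and function name here (`ToyGearman`, `ToyMCPServer`, `ToyMCPClient`, `run_workflow`) is hypothetical, the task names are made up, and handing tasks to clients in turn is a simplifying assumption rather than a description of how Gearman picks a worker.

```python
import queue
from dataclasses import dataclass


@dataclass
class Task:
    name: str


@dataclass
class Result:
    task: Task
    ok: bool
    handled_by: str


class ToyGearman:
    """Hypothetical stand-in for the Gearman server: it only queues and forwards."""

    def __init__(self):
        self.pending = queue.Queue()   # tasks waiting for an MCP client
        self.finished = queue.Queue()  # results waiting for the MCP server

    def submit(self, task):
        self.pending.put(task)

    def next_task(self):
        return self.pending.get_nowait()

    def report(self, result):
        self.finished.put(result)

    def next_result(self):
        return self.finished.get_nowait()


class ToyMCPClient:
    """Hypothetical stand-in for an MCP client: it does the actual work."""

    def __init__(self, name):
        self.name = name

    def work_once(self, gearman):
        task = gearman.next_task()
        # A real client would move files, run tools, and so on.
        gearman.report(Result(task=task, ok=True, handled_by=self.name))


class ToyMCPServer:
    """Hypothetical stand-in for the MCP server: schedules tasks, reads results."""

    def __init__(self, workflow):
        self.remaining = list(workflow)
        self.completed = []

    def schedule_next(self, gearman):
        # Send the next task to Gearman, if there is one left.
        if not self.remaining:
            return False
        gearman.submit(Task(self.remaining.pop(0)))
        return True

    def collect(self, gearman):
        # Receive a result from Gearman before deciding what to run next.
        self.completed.append(gearman.next_result())


def run_workflow(workflow, client_names=("client-1", "client-2")):
    gearman = ToyGearman()
    server = ToyMCPServer(workflow)
    clients = [ToyMCPClient(name) for name in client_names]
    step = 0
    while server.schedule_next(gearman):
        # Assumption: clients take turns; real Gearman hands work to any free worker.
        clients[step % len(clients)].work_once(gearman)
        server.collect(gearman)
        step += 1
    return server.completed
```

Calling `run_workflow(["extract", "move"])` returns one `Result` per task, in the order the server scheduled them. The point of the sketch is the shape of the flow: the server and the clients never talk to each other directly, only through the queue in the middle.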

These services write the results of their processing to a MySQL database, which is hosted in Amazon RDS.
