Archivematica @ Wellcome Collection

SSH into the Archivematica container hosts

Last updated 1 year ago

It can be useful to SSH into the Archivematica container hosts for debugging.

There's an unmaintained script, or you can follow the instructions below.

The Archivematica container hosts aren't connected directly to the Internet; instead, you have to go through a bastion host. There are only a handful of EC2 instances in the workflow account, including a container host and a bastion for each of the staging and prod environments.

Steps:

  1. Download the wellcomedigitalworkflow SSH key from Secrets Manager in the platform account.

  2. Identify the container/bastion host pair you want to SSH into. Let's suppose I want to log into the staging instance.

  3. Select the bastion instance, then the "Security" tab. There should be two security groups:

    • full egress (which allows all outbound traffic from the instance)

    • SSH controlled ingress (which filters inbound traffic to the instance)

  4. Select the SSH controlled ingress security group. In the security group console, add an inbound rule that allows SSH from your current IP address. Add your name and the current date to provide an audit trail.

  5. Find the DNS names of the instances:

    • the public DNS name of the bastion instance

    • the private DNS name of the container instance

  6. SSH through the instances. I feel like there's probably a way to do this with a single tunneling command, but I find it easier to move keys around:

    # Upload the SSH key to the bastion instance
    scp -i key_on_local key_on_local ec2-user@BASTION_HOST:key_on_bastion
    
    # SSH into the bastion instance
    ssh -i key_on_local ec2-user@BASTION_HOST
    
    # SSH from the bastion instance into the private instance
    # (on the bastion)
    ssh -i key_on_bastion ec2-user@CONTAINER_HOST
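
If you would rather not copy the key onto the bastion, step 6 can probably be collapsed into one command with OpenSSH's ProxyCommand option. The following is a minimal sketch, not something this deployment documents: the helper name bastion_ssh_command is made up for illustration, and it assumes the bastion's SSH daemon allows TCP forwarding (the OpenSSH default). The helper only prints the command so you can review it before running it.

```shell
# Print (rather than run) a single SSH command that hops through the bastion.
# Arguments: local key path, bastion public DNS name, container host private DNS name.
# bastion_ssh_command is a hypothetical helper name, not part of any tool.
bastion_ssh_command() {
  local key="$1" bastion="$2" container="$3"
  # "-W %h:%p" asks the bastion to forward the connection to the target
  # host and port; ssh fills in %h and %p itself when it runs ProxyCommand.
  printf 'ssh -i %s -o ProxyCommand="ssh -i %s -W %%h:%%p ec2-user@%s" ec2-user@%s\n' \
    "$key" "$key" "$bastion" "$container"
}

# Example: same placeholder names as the commands above
bastion_ssh_command key_on_local BASTION_HOST CONTAINER_HOST
```

With this approach the key stays on your machine; the inner ssh process authenticates to the bastion and the outer one authenticates to the container host through the forwarded connection.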

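Steps 4 and 5 can also be done from the command line instead of the console. This is a rough sketch using the AWS CLI, under several assumptions: the CLI is configured with credentials for the workflow account, the instance names shown in the console are stored as Name tags, and you look up the ID of the SSH controlled ingress security group yourself. The function names are hypothetical.

```shell
# Build the --ip-permissions value for an SSH-only inbound rule.
# $1: your public IP address, $2: description used as the audit trail.
ssh_ingress_permission() {
  printf 'IpProtocol=tcp,FromPort=22,ToPort=22,IpRanges=[{CidrIp=%s/32,Description="%s"}]\n' "$1" "$2"
}

# Step 4: allow SSH from your current IP address.
# $1: ID of the "SSH controlled ingress" security group (you must look this up),
# $2: your name; the current date is appended for the audit trail.
allow_my_ip() {
  local sg_id="$1" who="$2" my_ip
  # checkip.amazonaws.com returns the public IP your request came from
  my_ip="$(curl -s https://checkip.amazonaws.com)"
  aws ec2 authorize-security-group-ingress \
    --group-id "$sg_id" \
    --ip-permissions "$(ssh_ingress_permission "$my_ip" "$who $(date +%F)")"
}

# Step 5: print the public and private DNS names of an instance.
# $1: the instance name, assumed to be stored in its Name tag.
instance_dns_names() {
  aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=$1" \
    --query 'Reservations[].Instances[].[PublicDnsName,PrivateDnsName]' \
    --output text
}

# Example usage (not run here):
#   allow_my_ip SECURITY_GROUP_ID "Your Name"
#   instance_dns_names "Archivematica staging bastion"
#   instance_dns_names "Archivematica staging container host"
```
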
Interesting locations on the file system

If you are trying to fix an issue with failing ingests, you may wish to look at these locations:

  • /ebs/pipeline-data/: The folders containing "processing storage" for Archivematica (including currentlyProcessing)

  • /ebs/var/archivematica/storage_service/: The archivematica-storage-service working storage
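
When looking through these locations for a failing ingest, it can help to see what changed most recently. A small sketch, assuming you are already logged in to the container host and your user can read the directories; recent_entries is a hypothetical helper name, and the currentlyProcessing path in the example is inferred from the description above.

```shell
# List the most recently modified entries in a directory, newest first.
# $1: directory to inspect, $2: how many entries to show (default 10).
recent_entries() {
  local dir="$1" count="${2:-10}"
  ls -1t -- "$dir" | head -n "$count"
}

# Example usage on the container host (not run here):
#   recent_entries /ebs/pipeline-data/currentlyProcessing 5
#   recent_entries /ebs/var/archivematica/storage_service
```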

Image: A list of EC2 instances in the console. Two of them are named Goobi; the others are "Archivematica staging container host", "Archivematica prod container host", "Archivematica prod bastion" and "Archivematica staging bastion". The two instances named "staging" are highlighted with pink arrows.

Image: The "Security" tab of the EC2 Console. There's a pink hand-drawn circle highlighting the two security groups.

Image: Adding an inbound rule with type "SSH" and source "My IP".