Bootstrapping a new Archivematica stack

At time of writing, we run two Archivematica instances: a production instance and a staging instance.

These are the steps for creating a new stack.

1. Create a new ACM certificate (maybe)

You need two hostnames for an Archivematica instance:

  • Dashboard, e.g. https://archivematica.wellcomecollection.org or https://archivematica-stage.wellcomecollection.org

  • Storage service, e.g. https://archivematica-storage-service.wellcomecollection.org or https://archivematica-storage-service-stage.wellcomecollection.org

The existing certificate is defined in the infra Terraform stack.

If you're adding a new hostname, you'll need to create a certificate that covers these hostnames -- be careful not to accidentally delete the existing certificate, which would break the production Archivematica instance.

2. Create a new Terraform stack

Each Archivematica instance has two Terraform stacks:

  • The "critical_NAME" stack creates S3 buckets and databases. Anything stateful goes in here.

  • The "stack_NAME" stack creates the services, load balancers, and so on, that read from the databases.

You need to create the critical_NAME stack first, then stack_NAME. Make sure to change the config values before you plan/apply!
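The apply order can be sketched as below. The directory names critical_${NAME} and stack_${NAME} are assumptions based on the stack names above, so adjust to the repo's actual layout:

```shell
NAME="staging"   # hypothetical stack name

# Stateful resources first (S3 buckets, databases)...
(cd "critical_${NAME}" && terraform init && terraform plan && terraform apply)

# ...then the services, load balancers, etc. that read from them.
(cd "stack_${NAME}" && terraform init && terraform plan && terraform apply)
```

Running `terraform plan` before each apply is the point at which to double-check you changed the config values.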

3. Format the EBS volume

When you first create a Terraform stack, you'll create a large EBS volume -- this is where Archivematica stores any currently-processing packages. This volume needs to be formatted before it can be used.

To format the volume:

  1. SSH into the EC2 container host.

  2. Run the command df -h. You should see output something like:

    $ df -h
    Filesystem      Size  Used Avail Use% Mounted on
    devtmpfs         16G     0   16G   0% /dev
    tmpfs            16G     0   16G   0% /dev/shm
    tmpfs            16G  1.9M   16G   1% /run
    tmpfs            16G     0   16G   0% /sys/fs/cgroup
    /dev/nvme0n1p1   30G   15G   15G  51% /
    tmpfs           3.1G     0  3.1G   0% /run/user/1000

    If you see an entry "Mounted on: /ebs" then this task has already been completed, and you can move to creating the Archivematica databases.

  3. Run sudo bash /format_ebs_volume.sh, then reboot the instance by running sudo reboot.

  4. When the instance has rebooted, SSH back in and run df -h again. This time you should see an entry "Mounted on: /ebs", for example:

    $ df -h
    Filesystem      Size  Used Avail Use% Mounted on
    devtmpfs         16G     0   16G   0% /dev
    tmpfs            16G     0   16G   0% /dev/shm
    tmpfs            16G  1.9M   16G   1% /run
    tmpfs            16G     0   16G   0% /sys/fs/cgroup
    /dev/nvme0n1p1   30G   15G   15G  51% /
    /dev/nvme1n1    246G  320K  234G   1% /ebs
    tmpfs           3.1G     0  3.1G   0% /run/user/1000

    This means the volume has been successfully formatted and mounted.
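The check-and-format steps above can be condensed into one idempotent snippet (a sketch, assuming the same /format_ebs_volume.sh helper is present on the host):

```shell
# Only format and reboot if /ebs is not already mounted.
if mountpoint -q /ebs; then
  echo "/ebs is already mounted; skipping format"
else
  sudo bash /format_ebs_volume.sh
  sudo reboot
fi
```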

4. Create the Archivematica databases

When you first create your Archivematica stack, you'll notice that none of the tasks stay up for very long. If you look in the logs, you'll see them crashing with this error:

OperationalError: (1049, "Unknown database 'MCP'")

To fix this:

  1. SSH into one of the EC2 container hosts. This gets you inside the security group that connects to RDS.

  2. Start a Docker container and install MySQL:

    $ docker run -it alpine sh
    # apk add --update mariadb-client

  3. Open a MySQL connection to the RDS instance, using the outputs from the critical_NAME stack:

    mysql \
      --host=$HOSTNAME \
      --user=archivematica \
      --password=$PASSWORD

    Run the following MySQL commands:

    CREATE DATABASE SS;
    CREATE DATABASE MCP;
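The same thing as a single non-interactive command, run from inside the Alpine container (after `apk add mariadb-client`); the IF NOT EXISTS clause makes it safe to re-run:

```shell
# $HOSTNAME and $PASSWORD come from the critical_NAME stack outputs.
mysql \
  --host="$HOSTNAME" \
  --user=archivematica \
  --password="$PASSWORD" \
  --execute="CREATE DATABASE IF NOT EXISTS SS; CREATE DATABASE IF NOT EXISTS MCP;"
```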

5. Run the Django database migrations

Once the databases have been created, you need to run the Django migrations.

To do this:

  1. SSH into the EC2 container hosts.

  2. Run the Django migrations in the dashboard:

    docker exec -it $(docker ps | grep dashboard | grep app | awk '{print $1}') python /src/src/dashboard/src/manage.py migrate

    It might take a couple of attempts before this finishes successfully. The dashboard can't start until the database is set up correctly, which means it fails load balancer healthchecks -- ECS will be continually restarting the container until you successfully run the database migrations.

  3. Look for a Docker container running the storage service. Similar to above:

    docker exec -it $(docker ps | grep storage-service | grep app | awk '{print $1}') python /src/storage_service/manage.py migrate
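Because ECS keeps restarting the unhealthy container, the container ID changes between attempts; a small retry loop that re-resolves the ID each time saves some manual repetition (a sketch, using the same grep/awk lookup as above):

```shell
# Retry the dashboard migrations until they succeed, looking up the
# container ID afresh on every attempt.
until docker exec -it \
    "$(docker ps | grep dashboard | grep app | awk '{print $1}')" \
    python /src/src/dashboard/src/manage.py migrate; do
  echo "Migration failed (container may have been restarted); retrying in 10s..."
  sleep 10
done
```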

6. Create initial users

Once the dashboard and storage service are both running (you'll get a login page when you visit their URLs), you can create the initial users.

Create a storage service user:

docker exec -it $(docker ps | grep storage-service | grep app | awk '{print $1}') \
    python /src/storage_service/manage.py \
    create_user \
    --username="admin" \
    --password="PASSWORD" \
    --email="wellcomedigitalworkflow@wellcome.ac.uk" \
    --api-key="SS_API_KEY" \
    --superuser

Create a dashboard user:

docker exec -it $(docker ps | grep dashboard | grep app | awk '{print $1}') \
    python /src/src/dashboard/src/manage.py install \
    --username="admin" \
    --password="PASSWORD" \
    --email="wellcomedigitalworkflow@wellcome.ac.uk" \
    --org-name="wellcome" \
    --org-id="wellcome" \
    --api-key="API_KEY" \
    --ss-url="SS_HOSTNAME" \
    --ss-user="admin" \
    --ss-api-key="SS_API_KEY" \
    --site-url="DASHBOARD_HOSTNAME"

7. Connect to the Wellcome Archival Storage

This step tells Archivematica how to write to the Wellcome Archival Storage.

  1. Log in to the Archivematica Storage Service (e.g. at https://archivematica-storage-service.wellcomecollection.org/).

  2. Select "Spaces" in the top tab bar. Click "Create new space".

  3. Select the following options:

    Access Protocol: Wellcome Storage Service

    Path: /

    Staging path: /var/archivematica/sharedDirectory/wellcome-storage-service -- used as a temporary area for transfers to/from the remote service.

    Token url / Api root url / App client id / App client secret: details of the Wellcome storage service.

    Access Key ID / Secret Access Key / Assumed AWS IAM Role: AWS auth details; these shouldn't be needed when running with ECS task roles.

    Bucket: wellcomecollection-archivematica-ingests (prod) or wellcomecollection-archivematica-staging-ingests (staging). This is where the Wellcome storage plugin will place files for the WSS. It will then notify the storage service of this location so it can pick them up.

    Callback host: https://archivematica-storage-service.wellcomecollection.org/ (prod) or https://archivematica-storage-service-stage.wellcomecollection.org/ (staging).

    Callback username / api key: a username and API key for the AMSS, so the callback from the WSS can authenticate.

  4. Click "Create location here".

    The purpose is AIP Storage and the relative path is /born-digital.

    (This will be concatenated onto the space path to produce a full path to which files should be uploaded. This does not correspond to a filesystem path, but maps to a location on the eventual storage. e.g. /born-digital/ will map to the born-digital space in the Archival Storage.)


8. Connect to the transfer source bucket

This step tells Archivematica how to read uploads from the S3 transfer bucket.

  1. Log in to the Archivematica Storage Service (e.g. at https://archivematica-storage-service.wellcomecollection.org/).

  2. Select "Spaces" in the top tab bar. Click "Create new space".

  3. Select the following options:

    Access protocol: S3

    Path: /

    Staging path: /var/archivematica/sharedDirectory/s3_transfers -- used as a temporary area for transfers to/from S3.

    S3 Bucket: wellcomecollection-archivematica-transfer-source (prod) or wellcomecollection-archivematica-staging-transfer-source (staging). This is the bucket from which Archivematica picks up newly uploaded transfers.

  4. Click "Create new location here".

    The purpose is Transfer Source.

    Give it a description of "S3 transfer source" or similar.

    The relative path corresponds to the name of the drop directory (within the root path) into which files should be dropped to start an automated transfer in Archivematica. It must match the name of a workflow on Archivematica, with dashes replaced by underscores -- e.g. the born-digital directory will trigger a transfer using the born_digital workflow.

    You need to create locations for /born-digital and /born-digital-accessions.
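The directory-to-workflow naming rule above can be expressed as a small (hypothetical) helper:

```shell
# Map a transfer-source directory name to the Archivematica workflow it
# triggers: dashes become underscores.
workflow_for_directory() {
  printf '%s\n' "$1" | tr '-' '_'
}

workflow_for_directory "born-digital"   # prints: born_digital
```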

9. Configure the local filesystem storage

  1. Log in to the Archivematica Storage Service (e.g. at https://archivematica-storage-service.wellcomecollection.org/).

  2. Select "Spaces" in the top tab bar. The first space should have "Access Protocol: Local Filesystem". Click "Edit Space".

  3. Select the following options:

    Path: /

    Staging path: /

If these are not set, you may get "No space left on device" errors when trying to process larger packages; see archivematica-infrastructure#128.

10. Set up the default processing configuration

  1. Log in to the Archivematica Dashboard (e.g. at https://archivematica.wellcomecollection.org/).

  2. Select "Administration" in the top tab bar. Select "Processing configuration" in the sidebar.

  3. Set the following settings in the "Default" configuration:

    Scan for viruses: Yes

    Assign UUIDs to directories: No

    Generate transfer structure report: No

    Perform file format identification (Transfer): Yes

    Perform policy checks on originals: No

    Examine contents: Examine contents

    Perform file format identification (Ingest): No, use existing

    Generate thumbnails: No

    Perform policy checks on preservation derivatives: No

    Perform policy checks on access derivatives: No

    Bind PIDs: No

    Document empty directories: No

    Transcribe files (OCR): No

    Perform file format identification (Submission documentation & metadata): No

    Select compression algorithm: Gzipped tar

    Select compression level: 1 - fastest mode

    Store AIP location: Wellcome AIP storage

    Upload DIP: Do not upload DIP

    All other fields should be "None".

  4. Create a "born_digital" config, with the settings above and additionally:

    Extract packages: No

    Perform policy checks on originals: No

    Create SIP(s): Create single SIP and continue processing

    Normalize: Do not normalize

    Add metadata if desired: Continue

    Store AIP: Yes

  5. Create a "b_dig_accessions" config, with the default settings above and additionally:

    Extract packages: No

    Perform policy checks on originals: No

    Create SIP(s): Create single SIP and continue

    Normalize: Do not normalize

    Add metadata if desired: Continue

    Store AIP: Yes
