Bootstrapping a new Archivematica stack

At time of writing, we run two Archivematica instances: a production instance and a staging instance.

These are the steps for creating a new stack.

1. Create a new ACM certificate (maybe)

You need two hostnames for an Archivematica instance:

  • Dashboard, e.g. https://archivematica.wellcomecollection.org or https://archivematica-stage.wellcomecollection.org

  • Storage service, e.g. https://archivematica-storage-service.wellcomecollection.org or https://archivematica-storage-service-stage.wellcomecollection.org

The existing certificate is defined in the infra Terraform stack.

If you're adding a new hostname, you'll need to create a certificate that covers these hostnames -- be careful not to accidentally delete the existing certificate, which would break the production Archivematica instance.

2. Create a new Terraform stack

Each Archivematica instance has two Terraform stacks:

  • The "critical_NAME" stack creates S3 buckets and databases. Anything stateful goes in here.

  • The "stack_NAME" stack creates the services, load balancers, and so on, that read from the databases.

You need to create the critical_NAME stack first, then stack_NAME. Make sure to change the config values before you plan/apply!
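The apply order can be sketched as below. The directory names critical_${NAME} and stack_${NAME} are assumptions based on the stack names above, so adjust to the repo's actual layout:

```shell
NAME="staging"   # hypothetical stack name

# Stateful resources first (S3 buckets, databases)...
(cd "critical_${NAME}" && terraform init && terraform plan && terraform apply)

# ...then the services, load balancers, etc. that read from them.
(cd "stack_${NAME}" && terraform init && terraform plan && terraform apply)
```

Running `terraform plan` before each apply is the point at which to double-check you changed the config values.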

3. Format the EBS volume

When you first create a Terraform stack, you'll create a large EBS volume -- this is where Archivematica stores any currently-processing packages. This volume needs to be formatted before it can be used.

To format the volume:

  1. SSH into the EC2 container host.

  2. Run the command df -h. You should see output something like:

    $ df -h
    Filesystem      Size  Used Avail Use% Mounted on
    devtmpfs         16G     0   16G   0% /dev
    tmpfs            16G     0   16G   0% /dev/shm
    tmpfs            16G  1.9M   16G   1% /run
    tmpfs            16G     0   16G   0% /sys/fs/cgroup
    /dev/nvme0n1p1   30G   15G   15G  51% /
    tmpfs           3.1G     0  3.1G   0% /run/user/1000

    If you see an entry "Mounted on: /ebs" then this task has already been completed, and you can move to creating the Archivematica databases.

  3. Run sudo bash /format_ebs_volume.sh, then reboot the instance by running sudo reboot.

  4. When the instance has rebooted, SSH back in and run df -h again. This time you should see an entry "Mounted on: /ebs", for example:

    $ df -h
    Filesystem      Size  Used Avail Use% Mounted on
    devtmpfs         16G     0   16G   0% /dev
    tmpfs            16G     0   16G   0% /dev/shm
    tmpfs            16G  1.9M   16G   1% /run
    tmpfs            16G     0   16G   0% /sys/fs/cgroup
    /dev/nvme0n1p1   30G   15G   15G  51% /
    /dev/nvme1n1    246G  320K  234G   1% /ebs
    tmpfs           3.1G     0  3.1G   0% /run/user/1000

    This means the volume has been successfully formatted and mounted.
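The check-and-format steps above can be condensed into one idempotent snippet (a sketch, assuming the same /format_ebs_volume.sh helper is present on the host):

```shell
# Only format and reboot if /ebs is not already mounted.
if mountpoint -q /ebs; then
  echo "/ebs is already mounted; skipping format"
else
  sudo bash /format_ebs_volume.sh
  sudo reboot
fi
```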

4. Create the Archivematica databases

When you first create your Archivematica stack, you'll notice that none of the tasks stay up for very long. If you look in the logs, you'll see them crashing with this error:

OperationalError: (1049, "Unknown database 'MCP'")

To fix this:

  1. SSH into one of the EC2 container hosts. This gets you inside the security group that connects to RDS.

  2. Start a Docker container and install MySQL:

    $ docker run -it alpine sh
    # apk add --update mariadb-client

  3. Open a MySQL connection to the RDS instance, using the outputs from the critical_NAME stack:

    mysql \
      --host=$HOSTNAME \
      --user=archivematica \
      --password=$PASSWORD

    Run the following MySQL commands:

    CREATE DATABASE SS;
    CREATE DATABASE MCP;
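The same thing as a single non-interactive command, run from inside the Alpine container (after `apk add mariadb-client`); the IF NOT EXISTS clause makes it safe to re-run:

```shell
# $HOSTNAME and $PASSWORD come from the critical_NAME stack outputs.
mysql \
  --host="$HOSTNAME" \
  --user=archivematica \
  --password="$PASSWORD" \
  --execute="CREATE DATABASE IF NOT EXISTS SS; CREATE DATABASE IF NOT EXISTS MCP;"
```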

5. Run the Django database migrations

Once the databases have been created, you need to run the Django migrations.

To do this:

  1. SSH into the EC2 container hosts.

  2. Run the Django migrations in the dashboard:

    docker exec -it $(docker ps | grep dashboard | grep app | awk '{print $1}') python /src/src/dashboard/src/manage.py migrate

    It might take a couple of attempts before this finishes successfully. The dashboard can't start until the database is set up correctly, which means it fails load balancer healthchecks -- ECS will be continually restarting the container until you successfully run the database migrations.

  3. Look for a Docker container running the storage service. Similar to above:

    docker exec -it $(docker ps | grep storage-service | grep app | awk '{print $1}') python /src/storage_service/manage.py migrate
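Because ECS keeps restarting the unhealthy container, the container ID changes between attempts; a small retry loop that re-resolves the ID each time saves some manual repetition (a sketch, using the same grep/awk lookup as above):

```shell
# Retry the dashboard migrations until they succeed, looking up the
# container ID afresh on every attempt.
until docker exec -it \
    "$(docker ps | grep dashboard | grep app | awk '{print $1}')" \
    python /src/src/dashboard/src/manage.py migrate; do
  echo "Migration failed (container may have been restarted); retrying in 10s..."
  sleep 10
done
```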

6. Create initial users

Once the dashboard and storage service are both running (you'll get a login page when you visit their URLs), you can create the initial users.

Create a storage service user:

docker exec -it $(docker ps | grep storage-service | grep app | awk '{print $1}') \
    python /src/storage_service/manage.py \
    create_user \
    --username="admin" \
    --password="PASSWORD" \
    --email="wellcomedigitalworkflow@wellcome.ac.uk" \
    --api-key="SS_API_KEY" \
    --superuser

Create a dashboard user:

docker exec -it $(docker ps | grep dashboard | grep app | awk '{print $1}') \
    python /src/src/dashboard/src/manage.py install \
    --username="admin" \
    --password="PASSWORD" \
    --email="wellcomedigitalworkflow@wellcome.ac.uk" \
    --org-name="wellcome" \
    --org-id="wellcome" \
    --api-key="API_KEY" \
    --ss-url="SS_HOSTNAME" \
    --ss-user="admin" \
    --ss-api-key="SS_API_KEY" \
    --site-url="DASHBOARD_HOSTNAME"

7. Connect to the Wellcome Archival Storage

This step tells Archivematica how to write to the Wellcome Archival Storage.

  1. Log in to the Archivematica Storage Service (e.g. at https://archivematica-storage-service.wellcomecollection.org/).

  2. Select "Spaces" in the top tab bar. Click "Create new space".

  3. Select the following options:

    Access Protocol: Wellcome Storage Service

    Path: /

    Staging path: /var/archivematica/sharedDirectory/wellcome-storage-service -- used as a temporary area for transfers to/from the remote service.

    Token url / Api root url / App client id / App client secret: details of the Wellcome storage service.

    Access Key ID / Secret Access Key / Assumed AWS IAM Role: AWS auth details; these shouldn't be needed when running with ECS task roles.

    Bucket: wellcomecollection-archivematica-ingests (prod) or wellcomecollection-archivematica-staging-ingests (staging). This is where the Wellcome storage plugin will place files for the WSS. It will then notify the storage service of this location so it can pick them up.

    Callback host: https://archivematica-storage-service.wellcomecollection.org/ (prod) or https://archivematica-storage-service-stage.wellcomecollection.org/ (staging).

    Callback username / api key: a username and API key for the AMSS, so the callback from the WSS can authenticate.

  4. Click "Create location here".

    The purpose is AIP Storage and the relative path is /born-digital.

    (This will be concatenated onto the space path to produce a full path to which files should be uploaded. This does not correspond to a filesystem path, but maps to a location on the eventual storage. e.g. /born-digital/ will map to the born-digital space in the Archival Storage.)


8. Connect to the transfer source bucket

This step tells Archivematica how to read uploads from the S3 transfer bucket.

  1. Log in to the Archivematica Storage Service (e.g. at https://archivematica-storage-service.wellcomecollection.org/).

  2. Select "Spaces" in the top tab bar. Click "Create new space".

  3. Select the following options:

    Access protocol: S3

    Path: /

    Staging path: /var/archivematica/sharedDirectory/s3_transfers -- used as a temporary area for transfers to/from S3.

    S3 Bucket: wellcomecollection-archivematica-transfer-source (prod) or wellcomecollection-archivematica-staging-transfer-source (staging). This is the bucket from which Archivematica picks up newly uploaded transfers.

  4. Click "Create new location here".

    The purpose is Transfer Source.

    Give it a description of "S3 transfer source" or similar.

    The relative path corresponds to the name of the drop directory (within the root path) into which files should be dropped to start an automated transfer in Archivematica. It must match the name of a workflow on Archivematica, with dashes replaced by underscores -- e.g. the born-digital directory will trigger a transfer using the born_digital workflow.

    You need to create locations for /born-digital and /born-digital-accessions.
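The directory-to-workflow naming rule above can be expressed as a small (hypothetical) helper:

```shell
# Map a transfer-source directory name to the Archivematica workflow it
# triggers: dashes become underscores.
workflow_for_directory() {
  printf '%s\n' "$1" | tr '-' '_'
}

workflow_for_directory "born-digital"   # prints: born_digital
```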

9. Configure the local filesystem storage

  1. Log in to the Archivematica Storage Service (e.g. at https://archivematica-storage-service.wellcomecollection.org/).

  2. Select "Spaces" in the top tab bar. The first space should have "Access Protocol: Local Filesystem". Click "Edit Space".

  3. Select the following options:

    Path: /

    Staging path: /

If these are not set, you may get "No space left on device" errors when trying to process larger packages; see archivematica-infrastructure#128.

10. Set up the default processing configuration

  1. Log in to the Archivematica Dashboard (e.g. at https://archivematica.wellcomecollection.org/).

  2. Select "Administration" in the top tab bar. Select "Processing configuration" in the sidebar.

  3. Set the following settings in the "Default" configuration:

    Scan for viruses: Yes

    Assign UUIDs to directories: No

    Generate transfer structure report: No

    Perform file format identification (Transfer): Yes

    Perform policy checks on originals: No

    Examine contents: Examine contents

    Perform file format identification (Ingest): No, use existing

    Generate thumbnails: No

    Perform policy checks on preservation derivatives: No

    Perform policy checks on access derivatives: No

    Bind PIDs: No

    Document empty directories: No

    Transcribe files (OCR): No

    Perform file format identification (Submission documentation & metadata): No

    Select compression algorithm: Gzipped tar

    Select compression level: 1 - fastest mode

    Store AIP location: Wellcome AIP storage

    Upload DIP: Do not upload DIP

    All other fields should be "None".

  4. Create a "born_digital" config, with the settings above and additionally:

    Extract packages: No

    Perform policy checks on originals: No

    Create SIP(s): Create single SIP and continue processing

    Normalize: Do not normalize

    Add metadata if desired: Continue

    Store AIP: Yes

  5. Create a "b_dig_accessions" config, with the default settings above and additionally:

    Extract packages: No

    Perform policy checks on originals: No

    Create SIP(s): Create single SIP and continue

    Normalize: Do not normalize

    Add metadata if desired: Continue

    Store AIP: Yes
