Bootstrapping a new Archivematica stack
At time of writing, we run two Archivematica instances:
The production instance writes into the production Wellcome Archival Storage. This is accessible at . You should use this for everything you want to keep.
The staging instance writes into the staging Wellcome Archival Storage. This is accessible at . This is used for experiments, testing Archivematica, and so on. Don't use this for anything you want to keep long-term.
These are the steps for creating a new stack.
You need two hostnames for an Archivematica instance:
Dashboard, e.g. https://archivematica.wellcomecollection.org
or https://archivematica-stage.wellcomecollection.org
Storage service, e.g. https://archivematica-storage-service.wellcomecollection.org
or https://archivematica-storage-service-stage.wellcomecollection.org
The existing certificate is defined in the infra Terraform stack.
If you're adding a new hostname, you'll need to create a certificate that covers these hostnames -- be careful not to accidentally delete the existing certificate, which would break the production Archivematica instance.
Each Archivematica instance has two Terraform stacks:
The "critical_NAME" stack creates S3 buckets and databases. Anything stateful goes in here.
The "stack_NAME" stack creates the services, load balancers, and so on, that read from the databases.
You need to create the critical_NAME stack first, then stack_NAME. Make sure to change the config values before you plan/apply!
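As a rough sketch of the order of operations (the directory names below are placeholders, not the real paths in this repo):

```
# Placeholder paths -- substitute the actual critical_NAME / stack_NAME directories
cd terraform/critical_staging
terraform init && terraform plan   # review the plan before applying
terraform apply

cd ../stack_staging
terraform init && terraform plan
terraform apply
```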
When you first create a Terraform stack, you'll create a large EBS volume -- this is where Archivematica stores any currently-processing packages. This volume needs to be formatted before it can be used.
To format the volume:
SSH into the EC2 container host.
Run the command df -h. You should see output something like:
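(Illustrative output only; device names and sizes will differ. The large EBS volume doesn't appear yet because it isn't formatted or mounted.)

```
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        3.9G     0  3.9G   0% /dev
tmpfs           3.9G     0  3.9G   0% /dev/shm
/dev/xvda1       30G  2.1G   28G   7% /
```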
If you see an entry "Mounted on: /ebs", then this task has already been completed, and you can move on to creating the Archivematica databases.
Run sudo bash /format_ebs_volume.sh, then reboot the instance by running sudo reboot.
When the instance has rebooted, SSH back in and run df -h again. This time you should see an entry "Mounted on: /ebs", for example:
This means the volume has been successfully formatted and mounted.
When you first create your Archivematica stack, you'll notice that none of the tasks stay up for very long. If you look in the logs, you'll see them crashing with this error:
OperationalError: (1049, "Unknown database 'MCP'")
To fix this:
SSH into one of the EC2 container hosts. This gets you inside the security group that connects to RDS.
Start a Docker container and install a MySQL client:
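One way to do this (the image and client package are just examples; any image with a MySQL client will do):

```
# Run a throwaway container on the container host
docker run -it --rm ubuntu:22.04 bash

# Inside the container, install the MySQL client
apt-get update && apt-get install -y mysql-client
```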
Open a MySQL connection to the RDS instance, using the outputs from the critical_NAME stack:
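Something like the following, substituting the hostname, username and password from the critical_NAME Terraform outputs:

```
# Values in angle brackets come from the critical_NAME stack outputs
mysql --host=<rds_hostname> --user=<admin_username> --password
```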
Run the following MySQL command:
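The exact statement isn't reproduced here; as a sketch, assuming the dashboard database is called MCP (per the error above) and the storage service database is called SS (check the service environment variables for the real names):

```
mysql> CREATE DATABASE IF NOT EXISTS MCP;
mysql> CREATE DATABASE IF NOT EXISTS SS;
```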
Once the databases have been created, we need to run Django migrations.
To do this:
SSH into the EC2 container hosts.
Run the Django migrations in the dashboard:
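Roughly as follows, assuming the containers match the upstream Archivematica image layout (the manage.py path and container name are assumptions; use docker ps to find the real container):

```
# Find the dashboard container
docker ps --filter "name=dashboard"

# Run the migrations inside it
docker exec -it <dashboard-container-id> \
  python /src/dashboard/src/manage.py migrate
```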
It might take a couple of attempts before this finishes successfully. The dashboard can't start until the database is set up correctly, which means it fails load balancer healthchecks -- ECS will be continually restarting the container until you successfully run the database migrations.
Then look for a Docker container running the storage service, and run its migrations similarly:
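(As above, the path and container name are assumptions based on the upstream storage service image.)

```
# Find the storage service container
docker ps --filter "name=storage"

# Run the storage service migrations
docker exec -it <storage-service-container-id> \
  python /src/storage_service/manage.py migrate
```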
When the dashboard and storage service are both running (you get a login page if you visit their URLs), you can create the initial users.
Create a storage service user:
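One way to do this, assuming the standard Django management command is available in the storage service container:

```
docker exec -it <storage-service-container-id> \
  python /src/storage_service/manage.py createsuperuser
```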
Create a dashboard user:
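And similarly for the dashboard (again assuming the standard Django command; the first user can also be created through the dashboard's welcome page on first visit):

```
docker exec -it <dashboard-container-id> \
  python /src/dashboard/src/manage.py createsuperuser
```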
This step tells Archivematica how to write to the Wellcome Archival Storage.
Select "Spaces" in the top tab bar. Click "Create new space".
Select the following options:
Access Protocol: Wellcome Storage Service
Path: /
Staging path: /var/archivematica/sharedDirectory/wellcome-storage-service (used as a temporary area for transfers to/from the remote service)
Token url / Api root url / App client id / App client secret: details of the Wellcome storage service
Access Key ID / Secret Access Key / Assumed AWS IAM Role: AWS auth details; shouldn't be needed when running with ECS task roles
Bucket: wellcomecollection-archivematica-ingests (production) or wellcomecollection-archivematica-staging-ingests (staging). This is where the Wellcome storage plugin will place files for the Wellcome Storage Service (WSS); it will then notify the WSS of this location so it can pick them up.
Callback username / api key: a username and API key for the Archivematica Storage Service (AMSS), so the callback from the WSS can authenticate
Click "Create location here".
The purpose is AIP Storage and the relative path is /born-digital.
(This will be concatenated onto the space path to produce a full path to which files should be uploaded. This does not correspond to a filesystem path, but maps to a location in the eventual storage: e.g. /born-digital/ will map to the born-digital space in the Archival Storage.)
Here's what a successfully configured space looks like:
and location:
This step tells Archivematica how to read uploads from the S3 transfer bucket.
Select "Spaces" in the top tab bar. Click "Create new space".
Select the following options:
Access protocol: S3
Path: /
Staging path: /var/archivematica/sharedDirectory/s3_transfers (used as a temporary area for transfers to/from S3)
S3 Bucket: wellcomecollection-archivematica-transfer-source (production) or wellcomecollection-archivematica-staging-transfer-source (staging). This is the bucket Archivematica watches for uploads: files dropped here are picked up and an automated transfer is started.
Click "Create new location here".
The purpose is Transfer Source.
Give it a description of "S3 transfer source" or similar.
The relative path corresponds to the name of the drop directory (within the root path) into which files should be dropped to start an automated transfer in Archivematica. It must match the name of a workflow on Archivematica, with dashes replaced by underscores: e.g. the born-digital directory will trigger a transfer using the born_digital flow.
You need to create locations for /born-digital and /born-digital-accessions.
Select "Spaces" in the top tab bar. The first space should have "Access Protocol: Local Filesystem". Click "Edit Space".
Select the following options:
Path: /
Staging path: /
Select "Administration" in the top tab bar. Select "Processing configuration" in the sidebar.
Set the following settings in the "Default" configuration:
Assign UUIDs to directories: No
Generate transfer structure report: No
Perform file format identification (Transfer): Yes
Perform policy checks on originals: No
Examine contents: Examine contents
Perform file format identification (Ingest): No, use existing
Generate thumbnails: No
Perform policy checks on preservation derivatives: No
Perform policy checks on access derivatives: No
Bind PIDs: No
Document empty directories: No
Transcribe files (OCR): No
Perform file format identification (Submission documentation & metadata): No
Select compression algorithm: Gzipped tar
Select compression level: 1 - fastest mode
Store AIP location: Wellcome AIP storage
Upload DIP: Do not upload DIP
All other fields should be "None".
Create a "born_digital" config, with the settings above and additionally:
Perform policy checks on originals: No
Create SIP(s): Create single SIP and continue processing
Normalize: Do not normalize
Add metadata if desired: Continue
Store AIP: Yes
Create a "b_dig_accessions" config, with the default settings above and additionally:
Perform policy checks on originals: No
Create SIP(s): Create single SIP and continue
Normalize: Do not normalize
Add metadata if desired: Continue
Store AIP: Yes
Log in to the Archivematica Storage Service (e.g. at ).
Callback host:
If these are not set, you may get "No space left on device" errors when trying to process larger packages; see .
Log in to the Archivematica Dashboard (e.g. at ).