Bootstrapping a new Archivematica stack
At time of writing, we run two Archivematica instances:
The production instance writes into the production Wellcome Archival Storage. This is accessible at https://archivematica.wellcomecollection.org. You should use this for everything you want to keep.
The staging instance writes into the staging Wellcome Archival Storage. This is accessible at https://archivematica-stage.wellcomecollection.org. This is used for experiments, testing Archivematica, and so on. Don't use this for anything you want to keep long-term.
These are the steps for creating a new stack.
1. Create a new ACM certificate (maybe)
You need two hostnames for an Archivematica instance:
Dashboard, e.g. https://archivematica.wellcomecollection.org or https://archivematica-stage.wellcomecollection.org
Storage service, e.g. https://archivematica-storage-service.wellcomecollection.org or https://archivematica-storage-service-stage.wellcomecollection.org
The existing certificate is defined in the infra Terraform stack.
If you're adding a new hostname, you'll need to create a certificate that covers these hostnames -- be careful not to accidentally delete the existing certificate, which would break the production Archivematica instance.
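The two hostnames follow a regular pattern, so a small helper can list what a certificate has to cover. This is only a sketch: the function name and the suffix parameter are hypothetical, and the pattern is inferred from the production and staging examples above.

```python
DOMAIN = "wellcomecollection.org"


def archivematica_hostnames(suffix=""):
    """Return the dashboard and storage service hostnames for one instance.

    `suffix` is "" for production and "-stage" for staging, matching the
    examples in this section. Any other value is a hypothetical new instance.
    """
    return {
        "dashboard": f"archivematica{suffix}.{DOMAIN}",
        "storage_service": f"archivematica-storage-service{suffix}.{DOMAIN}",
    }
```

For example, archivematica_hostnames("-stage") gives the two staging hostnames listed above.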
2. Create a new Terraform stack
Each Archivematica instance has two Terraform stacks:
The "critical_NAME" stack creates S3 buckets and databases. Anything stateful goes in here.
The "stack_NAME" stack creates the services, load balancers, and so on, that read from the databases.
You need to create the critical_NAME stack first, then stack_NAME. Make sure to change the config values before you plan/apply!
3. Format the EBS volume
When you first create a Terraform stack, you'll create a large EBS volume -- this is where Archivematica stores any currently-processing packages. This volume needs to be formatted before it can be used.
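Whether the volume is already formatted and mounted can be read from the output of df -h: a mounted volume shows /ebs in the "Mounted on" column. The sketch below checks for that. The sample output is made up for illustration (device names and sizes will differ on a real host), and the function name is hypothetical.

```python
# Illustrative df -h output only; real device names and sizes will differ.
SAMPLE_DF_OUTPUT = """\
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1       30G  2.1G   28G   8% /
/dev/xvdb       246G   61M  234G   1% /ebs
"""


def is_mounted(df_output, mount_point="/ebs"):
    """Return True if `mount_point` appears in the 'Mounted on' column."""
    lines = df_output.strip().splitlines()
    # Skip the header row; the mount point is the last field on each line.
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[-1] == mount_point:
            return True
    return False
```

On a real host you could feed it the text returned by subprocess.run(["df", "-h"], capture_output=True, text=True).stdout.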
To format the volume:
SSH into the EC2 container host.
Run the command df -h to list the mounted filesystems. If you see an entry "Mounted on: /ebs" then this task has already been completed, and you can move to creating the Archivematica databases.
Run sudo bash /format_ebs_volume.sh, then reboot the instance by running sudo reboot.
When the instance has rebooted, SSH back in and run df -h again. This time you should see an entry "Mounted on: /ebs". This means the volume has been successfully formatted and mounted.
4. Create the Archivematica databases
When you first create your Archivematica stack, you'll notice that none of the tasks stay up for very long. If you look in the logs, you'll see them crashing with this error:
OperationalError: (1049, "Unknown database 'MCP'")
To fix this:
SSH into one of the EC2 container hosts. This gets you inside the security group that connects to RDS.
Start a Docker container and install MySQL:
Open a MySQL connection to the RDS instance, using the outputs from the critical_NAME stack:
Run the following MySQL command:
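As a rough sketch of what this step amounts to, the helper below generates statements that create the missing databases. MCP is the database named in the error above; SS is an assumed name for the storage service database and may differ in your setup. The function name is hypothetical.

```python
def create_database_statements(names=("MCP", "SS")):
    """Build one CREATE DATABASE statement per database name.

    "MCP" comes from the error message in this section; "SS" is an
    assumption about the storage service database name.
    """
    statements = []
    for name in names:
        # Only allow simple names so nothing unexpected ends up in the SQL.
        if not name.isalnum():
            raise ValueError(f"unexpected database name: {name!r}")
        statements.append(f"CREATE DATABASE IF NOT EXISTS `{name}`;")
    return statements
```

IF NOT EXISTS makes the statements safe to run again if one of the databases has already been created.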
5. Run the Django database migrations
Once the databases have been created, we need to run Django migrations.
To run them:
SSH into one of the EC2 container hosts.
Run the Django migrations in the dashboard:
It might take a couple of attempts before this finishes successfully. The dashboard can't start until the database is set up correctly, which means it fails load balancer healthchecks -- ECS will be continually restarting the container until you successfully run the database migrations.
Then look for a Docker container running the storage service and run its Django migrations in the same way.
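Because ECS keeps restarting the containers until the migrations succeed, the migration command may need several attempts. One way to handle that is a small retry loop, sketched below. The helper name and its parameters are hypothetical, and the migration command itself is whatever you would otherwise run by hand, passed in as a list of arguments.

```python
import subprocess
import time


def run_until_success(command, max_attempts=5, delay_seconds=10,
                      runner=subprocess.run):
    """Run `command` until it exits with status 0.

    Returns the number of attempts used, or raises RuntimeError if the
    command is still failing after `max_attempts` tries.
    """
    for attempt in range(1, max_attempts + 1):
        result = runner(command)
        if result.returncode == 0:
            return attempt
        # Give ECS time to start a fresh container before trying again.
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    raise RuntimeError(
        f"{command!r} still failing after {max_attempts} attempts"
    )
```

The runner parameter exists only so the loop can be exercised without a real container; by default it calls subprocess.run.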
6. Create initial users
When you see the dashboard and storage service are both running (you get a login page if you visit their URLs), you can create the initial users.
Create a storage service user:
Create a dashboard user:
7. Connect to the Wellcome Archival Storage
This step tells Archivematica how to write to the Wellcome Archival Storage.
Log in to the Archivematica Storage Service (e.g. at https://archivematica-storage-service.wellcomecollection.org/).
Select "Spaces" in the top tab bar. Click "Create new space".
Select the following options:
Access Protocol: Wellcome Storage Service
Path: /
Staging path: /var/archivematica/sharedDirectory/wellcome-storage-service (used as a temporary area for transfers to/from the remote service)
Token url / Api root url / App client id / App client secret: details of the Wellcome storage service
Access Key ID / Secret Access Key / Assumed AWS IAM Role: AWS auth details, shouldn't be needed when running with ECS task roles
Bucket: wellcomecollection-archivematica-ingests or wellcomecollection-archivematica-staging-ingests. This is where the Wellcome storage plugin will place files for the WSS. It will then notify the storage service of this location so it can pick them up.
Callback host: https://archivematica-storage-service.wellcomecollection.org/ or https://archivematica-storage-service-stage.wellcomecollection.org/
Callback username / api key: a username and API key for the AMSS so the callback from WSS can authenticate
Click "Create location here".
The purpose is AIP Storage and the relative path is /born-digital. (This will be concatenated onto the space path to produce a full path to which files should be uploaded. This does not correspond to a filesystem path, but maps to a location on the eventual storage, e.g. /born-digital/ will map to the born-digital space in the Archival Storage.)
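The concatenation of space path and relative path can be pictured with a short sketch. It only illustrates the idea described above; how the storage service actually joins the two values is not shown in this document, and the function name is hypothetical.

```python
def full_location_path(space_path, relative_path):
    """Join a space path and a location's relative path with single slashes."""
    parts = [
        part.strip("/")
        for part in (space_path, relative_path)
        if part.strip("/")
    ]
    return "/" + "/".join(parts)
```

With the values in this step, a space path of / and a relative path of /born-digital produce /born-digital.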
Here's what a successfully configured space looks like:
and location:
8. Connect to the transfer source bucket
This step tells Archivematica how to read uploads from the S3 transfer bucket.
Log in to the Archivematica Storage Service (e.g. at https://archivematica-storage-service.wellcomecollection.org/).
Select "Spaces" in the top tab bar. Click "Create new space".
Select the following options:
Access protocol: S3
Path: /
Staging path: /var/archivematica/sharedDirectory/s3_transfers (used as a temporary area for transfers to/from S3)
S3 Bucket: wellcomecollection-archivematica-transfer-source or wellcomecollection-archivematica-staging-transfer-source. This is the bucket that Archivematica reads uploads from.
Click "Create new location here".
The purpose is Transfer Source.
Give it a description of "S3 transfer source" or similar.
The relative path corresponds to the name of the drop directory (within the root path) into which files are dropped to start an automated transfer on Archivematica. It must match the name of a workflow on Archivematica (with dashes replaced by underscores, e.g. the born-digital directory will trigger a transfer using the born_digital flow).
You need to create locations for /born-digital and /born-digital-accessions.
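The naming rule for drop directories (dashes replaced by underscores) can be sketched as a one-line mapping. The function name is hypothetical, and the sketch only applies the rule as stated above; it does not check that a workflow with that name exists.

```python
def workflow_for_drop_directory(relative_path):
    """Map a drop directory's relative path to the workflow name it triggers."""
    # Drop the surrounding slashes, then apply the dash-to-underscore rule.
    return relative_path.strip("/").replace("-", "_")
```

For example, workflow_for_drop_directory("/born-digital") returns "born_digital".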
9. Configure the local filesystem storage
Log in to the Archivematica Storage Service (e.g. at https://archivematica-storage-service.wellcomecollection.org/).
Select "Spaces" in the top tab bar. The first space should have "Access Protocol: Local Filesystem". Click "Edit Space".
Select the following options:
Path: /
Staging path: /
If these are not set, you may get "No space left on device" errors when trying to process larger packages; see archivematica-infrastructure#128.
10. Set up the default processing configuration
Log in to the Archivematica Dashboard (e.g. at https://archivematica.wellcomecollection.org/).
Select "Administration" in the top tab bar. Select "Processing configuration" in the sidebar.
Set the following settings in the "Default" configuration:
Scan for viruses: Yes
Assign UUIDs to directories: No
Generate transfer structure report: No
Perform file format identification (Transfer): Yes
Perform policy checks on originals: No
Examine contents: Examine contents
Perform file format identification (Ingest): No, use existing
Generate thumbnails: No
Perform policy checks on preservation derivatives: No
Perform policy checks on access derivatives: No
Bind PIDs: No
Document empty directories: No
Transcribe files (OCR): No
Perform file format identification (Submission documentation & metadata): No
Select compression algorithm: Gzipped tar
Select compression level: 1 - fastest mode
Store AIP location: Wellcome AIP storage
Upload DIP: Do not upload DIP
All other fields should be "None".
Create a "born_digital" config, with the settings above and additionally:
Extract packages: No
Perform policy checks on originals: No
Create SIP(s): Create single SIP and continue processing
Normalize: Do not normalize
Add metadata if desired: Continue
Store AIP: Yes
Create a "b_dig_accessions" config, with the default settings above and additionally:
Extract packages: No
Perform policy checks on originals: No
Create SIP(s): Create single SIP and continue processing
Normalize: Do not normalize
Add metadata if desired: Continue
Store AIP: Yes
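The named configurations are the default settings with a few extra values layered on top. A minimal sketch of that layering is below, using only a subset of the default settings for brevity. The dictionary and function names are hypothetical; this is a way to picture the result, not how Archivematica stores its processing configurations.

```python
# A subset of the "Default" settings listed in this step.
DEFAULT_SETTINGS = {
    "Scan for viruses": "Yes",
    "Perform policy checks on originals": "No",
    "Select compression algorithm": "Gzipped tar",
    "Store AIP location": "Wellcome AIP storage",
}

# The additional settings for the "born_digital" config.
BORN_DIGITAL_EXTRAS = {
    "Extract packages": "No",
    "Perform policy checks on originals": "No",
    "Create SIP(s)": "Create single SIP and continue processing",
    "Normalize": "Do not normalize",
    "Add metadata if desired": "Continue",
    "Store AIP": "Yes",
}


def layered_config(base, extras):
    """Return a new dict: the base settings with the extras applied on top."""
    return {**base, **extras}


born_digital = layered_config(DEFAULT_SETTINGS, BORN_DIGITAL_EXTRAS)
```

Settings that appear in both places take the value from the extras, and the default dictionary itself is left unchanged.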