Why we forked Archivematica
We fork Archivematica to add support for our storage service. We've considered adding support to the upstream code (and deleting our forks), but this is non-trivial:
It means adding a new dependency to Archivematica (our storage service client library), which Artefactual are understandably reluctant to do.
Archivematica is designed to work with a variety of storage backends (e.g. S3, DuraCloud, Fedora), and our storage service is a bit of an "odd one out".
Most of the storage backends can store packages very quickly, whereas our storage service is asynchronous and can sometimes take multiple hours to successfully store a package. We've had to change some of the code around timeouts and waiting for the storage backend.
Previously we maintained separate forks of the two Archivematica repositories (artefactual/archivematica and archivematica-storage-service). Because we only modify a handful of files, we've replaced those forks with "overlays" that live in this repository.
The overlay works as follows:
Clone the upstream Artefactual repository
Copy our "overlay" files into the clone
Run the docker build command inside the clone-plus-overlay
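The three steps above can be sketched in Python. This is only an illustration, not the actual build script: the function names `copy_overlay` and `build_commands`, the repository URL, the version ref and the image tag are all hypothetical, and the sketch assumes the overlay directory mirrors the relative paths of the upstream repository.

```python
import shutil
from pathlib import Path


def copy_overlay(overlay_dir: Path, clone_dir: Path) -> list[Path]:
    """Copy every file under overlay_dir onto the same relative path in clone_dir.

    Assumes (hypothetically) that the overlay mirrors the upstream layout,
    so a file replaces its upstream counterpart if one exists.
    """
    copied = []
    for src in sorted(overlay_dir.rglob("*")):
        if src.is_file():
            rel = src.relative_to(overlay_dir)
            dst = clone_dir / rel
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(src, dst)
            copied.append(rel)
    return copied


def build_commands(
    repo_url: str, ref: str, clone_dir: Path, image_tag: str
) -> list[list[str]]:
    """Return the clone and build commands; they are not executed here.

    All four arguments are placeholders chosen for this sketch.
    """
    return [
        # Step 1: clone the upstream repository at a pinned tag or branch.
        ["git", "clone", "--depth", "1", "--branch", ref, repo_url, str(clone_dir)],
        # Step 3: build the image from the clone-plus-overlay.
        ["docker", "build", "--tag", image_tag, str(clone_dir)],
    ]
```

A caller would run the first command (for example with `subprocess.run(cmd, check=True)`), then call `copy_overlay`, then run the second command.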
The overlay is designed to balance a few competing concerns:
We only want to diverge from the upstream Artefactual code in a handful of places
We don't want the overhead of a separate Archivematica fork
We want to be able to update to new versions of Archivematica
The overlay is best explained with an example:
This represents a Wellcome-specific version of the file src/archivematicaCommon/lib/storageService.py in the core Archivematica repo. When we build the Docker image, these files replace the upstream versions.
We keep both the upstream and Wellcome-specific copy in the tree so that we can easily see how we've diverged. This also allows us to maintain the divergence if the upstream code changes, because we can see what our changes from the original were.
Because we only fork in a handful of places, we should be able to update to newer Archivematica versions relatively easily.
It should be sufficient to bump the version of the Artefactual repo that we clone.
When you bump the version, you may get errors from the copy_overlay_files.py script warning of a mismatch with upstream. This means that there have been changes in Archivematica that need to be mirrored to our repo.
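To illustrate what such a mismatch check might look like, here is a minimal sketch. It is not the code in copy_overlay_files.py, whose implementation is not shown here. It assumes that our saved copies of the upstream files live in one directory (`reference_dir`, a hypothetical name) and are compared by hash against the freshly cloned repository.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_mismatches(reference_dir: Path, clone_dir: Path) -> list[Path]:
    """Return relative paths whose saved upstream copy differs from the clone.

    reference_dir holds our saved copies of the upstream files (assumed layout);
    clone_dir is the fresh checkout of the new upstream version.
    A file that no longer exists upstream also counts as a mismatch.
    """
    mismatched = []
    for ref in sorted(reference_dir.rglob("*")):
        if not ref.is_file():
            continue
        rel = ref.relative_to(reference_dir)
        upstream = clone_dir / rel
        if not upstream.is_file() or sha256_of(ref) != sha256_of(upstream):
            mismatched.append(rel)
    return mismatched
```

An empty result would mean the files we overlay are unchanged upstream, so the version bump needs no further work for them.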
To fix these errors:
Diff the artefactual/wellcome copies of the file, to determine what changes we've made.
Copy the latest file from the artefactual repo into our codebase, replacing both the artefactual/wellcome copies of the file.
Reapply any changes from the wellcome copy which you saw in step 1.
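Step 1 can be done with any diff tool. As a small sketch using Python's standard `difflib` module, the helper below (the name `overlay_diff` is hypothetical) prints our changes as a unified diff, which you can keep at hand while reapplying them in step 3.

```python
import difflib
from pathlib import Path


def overlay_diff(artefactual_copy: Path, wellcome_copy: Path) -> str:
    """Return a unified diff from the upstream copy to the Wellcome copy.

    An empty string means the two copies are identical.
    """
    upstream_lines = artefactual_copy.read_text().splitlines(keepends=True)
    wellcome_lines = wellcome_copy.read_text().splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(
            upstream_lines,
            wellcome_lines,
            fromfile=str(artefactual_copy),
            tofile=str(wellcome_copy),
        )
    )
```

Lines starting with `-` exist only in the upstream copy and lines starting with `+` are our additions.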