RFC 007: Goobi Upload
Last updated: 02 November 2018.
Problem statement
There are currently four mechanisms in use for uploading assets to Goobi workflows:
IA harvesting: Fully automated download from IA, coordinated by Goobi
FTP: Bulk upload mechanism, automatically matches to existing processes
Home directory: Bulk upload mechanism, requires manual matching to existing processes
Hot folder: Bulk upload for editorial photography, automatically creates new process
Two of these (home directories and hot folders) rely on SMB network shares, and a third (FTP) relies on an insecure, outdated protocol that we don't want to run in AWS.
We want to rationalise this to the following:
Web upload: Built in Goobi web upload for small numbers of files
S3: A new bulk upload mechanism, which automatically matches or creates processes
This allows us to replace the three existing bulk upload mechanisms, one of which is semi-manual, with one that is fully automated and works regardless of network location.
Suggested solution
Web
This is already available in Goobi; no changes are required.
S3
Package format
Packages should be uploaded to S3 as zip files, one per process. All assets and metadata should be at the root level of the archive, in a single flat directory. Bundling each package into a single file ensures that a package is only processed once it has been completely uploaded.
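To make the format concrete, here is a minimal sketch of assembling a compliant package in Python; the directory and file names are hypothetical, not part of the RFC.

```python
import zipfile
from pathlib import Path

def build_package(source_dir: Path, zip_path: Path) -> None:
    """Bundle all assets and metadata from source_dir into a single
    zip file, keeping every entry at the root of the archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(source_dir.iterdir()):
            if path.is_file():
                # arcname=path.name flattens the entry to the archive
                # root, so there are no intermediate directories.
                zf.write(path, arcname=path.name)

# Hypothetical usage: one package per process.
build_package(Path("b12345678"), Path("b12345678.zip"))
```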
S3 layout
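The layout follows directly from the prefixes described under Processing below:

digitised/: packages to be matched to existing digitisation processes
editorial/: packages that create new editorial photography processes
failed/: packages that could not be processed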
Processing
Initiation
Processing should be triggered automatically by S3 event notifications.
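One way to wire this up is an AWS Lambda function subscribed to the bucket's ObjectCreated notifications. The sketch below shows the routing only; process_digitised and process_editorial are hypothetical stand-ins for the matching and creation logic described in the next two sections.

```python
import urllib.parse

def process_digitised(bucket: str, key: str) -> None:
    ...  # match the package to an existing process (see below)

def process_editorial(bucket: str, key: str) -> None:
    ...  # create a new editorial photography process (see below)

def handler(event, context):
    """Lambda entry point for S3 ObjectCreated event notifications."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in event notifications are URL-encoded.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        if key.startswith("digitised/"):
            process_digitised(bucket, key)
        elif key.startswith("editorial/"):
            process_editorial(bucket, key)
```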
Digitised content
Packages placed in the digitised/ prefix should be automatically matched to an existing process.
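If package files are named after the process identifier (an assumed convention; the RFC does not specify one), the matching step might begin by deriving that identifier from the object key:

```python
import os

def process_identifier(key: str) -> str:
    """Derive a process identifier from an object key,
    e.g. 'digitised/b12345678.zip' -> 'b12345678'."""
    filename = os.path.basename(key)
    identifier, _ = os.path.splitext(filename)
    return identifier
```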
Editorial photography
Packages placed in the editorial/ prefix should automatically create an editorial photography process.
Completion
Successfully processed packages should be deleted from the upload bucket. Failed packages should be moved to the failed/ prefix.
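S3 has no atomic move, so relocating a failed package means a copy followed by a delete. A minimal boto3 sketch, assuming the failed/ prefix lives in the same upload bucket:

```python
import boto3

s3 = boto3.client("s3")

def complete(bucket: str, key: str, succeeded: bool) -> None:
    """Delete a successfully processed package; move a failed one
    under the failed/ prefix (copy, then delete the original)."""
    if not succeeded:
        failed_key = "failed/" + key.split("/", 1)[1]
        s3.copy_object(
            Bucket=bucket,
            Key=failed_key,
            CopySource={"Bucket": bucket, "Key": key},
        )
    s3.delete_object(Bucket=bucket, Key=key)
```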