# RFC 007: Goobi Upload

This RFC proposes a new mechanism for uploading assets to Goobi workflows, replacing existing mechanisms with a more efficient and automated solution.

**Last modified:** 2018-11-02T16:46:57+00:00

## Problem statement

There are currently four mechanisms in use for uploading assets to Goobi workflows:

* **IA harvesting**: Fully automated download from IA, coordinated by Goobi
* **FTP**: Bulk upload mechanism, automatically matches to existing processes
* **Home directory**: Bulk upload mechanism, requires manual matching to existing processes
* **Hot folder**: Bulk upload for editorial photography, automatically creates new process

Two of these (home directories and hot folders) rely on SMB network shares, and a third (FTP) relies on an insecure, outdated technology that we don't want to run in AWS.

We want to rationalise this to the following:

* **Web upload**: Built-in Goobi web upload, for small numbers of files
* **S3**: A new bulk upload mechanism, which automatically matches or creates processes

This allows us to replace the three existing bulk upload mechanisms (one of which is semi-manual) with a single mechanism that is fully automated and works regardless of network location.

## Suggested solution

### Web

This is already available in Goobi; no changes are required.

### S3

#### Package format

Packages should be uploaded to S3 as zip files, one per process. All assets and metadata should sit at the root of the zip file, with no subdirectories. Bundling each package into a single file is required to ensure that a package is only processed once it has been completely uploaded.
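
As an illustration, the package rules above could be checked before any Goobi processing starts. The following is a minimal sketch using only Python's standard `zipfile` module; the function name `validate_package` and the sample file names are hypothetical, and it assumes "root level" means no entry name contains a `/` separator.

```python
import io
import zipfile


def validate_package(data: bytes) -> list[str]:
    """Return a list of problems with an uploaded package (empty if valid).

    Assumes the package rules: a zip file whose assets and metadata all
    sit at the root, with no subdirectories.
    """
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = archive.namelist()
    except zipfile.BadZipFile:
        return ["not a valid zip file"]

    problems = []
    if not names:
        problems.append("package is empty")
    for name in names:
        # Zip entry names use "/" as the separator, so any "/" means a
        # directory entry or a file nested inside a directory.
        if "/" in name:
            problems.append(f"not at root level: {name}")
    return problems


# Hypothetical package with one correctly placed file and one nested file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("metadata.xml", "<mets/>")
    archive.writestr("images/0001.jp2", b"...")

print(validate_package(buffer.getvalue()))
# ['not at root level: images/0001.jp2']
```

Returning a list of problems rather than raising on the first one means a failed package could be reported with every issue at once.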

#### S3 layout

```
s3://wellcomecollection-workflow-upload
|
| /digitised
| /editorial
| /failed
```
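
In S3 these "directories" are key prefixes, so an uploaded package would have a key such as `digitised/<name>.zip` (no leading slash). A minimal sketch of routing a key to its workflow follows; the `Workflow` enum and `workflow_for_key` function are hypothetical names, and treating every other prefix (including `failed`) as "ignore" is an assumption.

```python
from enum import Enum
from typing import Optional


class Workflow(Enum):
    """Upload prefixes that trigger processing (names taken from the bucket layout)."""
    DIGITISED = "digitised"
    EDITORIAL = "editorial"


def workflow_for_key(key: str) -> Optional[Workflow]:
    """Return the workflow an uploaded object belongs to, or None if it should be ignored.

    Objects under failed/ or any other unknown prefix return None,
    so a failed package is never picked up again.
    """
    prefix, sep, rest = key.partition("/")
    if not sep or not rest:
        # A bare prefix such as "digitised/" or a key outside any prefix.
        return None
    try:
        return Workflow(prefix)
    except ValueError:
        return None


print(workflow_for_key("editorial/shoot-2018-10.zip"))  # Workflow.EDITORIAL
print(workflow_for_key("failed/b1234.zip"))             # None
```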

#### Processing

**Initiation**

Processing should be triggered automatically by S3 event notifications.
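
The RFC does not specify how notifications are delivered (for example to a Lambda function or an SQS queue), but the notification body itself has a fixed shape: a `Records` list whose entries carry `eventName`, `s3.bucket.name` and a URL-encoded `s3.object.key`. A minimal sketch of extracting the uploaded objects, where `uploaded_objects` is a hypothetical helper and the example event is trimmed to the fields it reads:

```python
from urllib.parse import unquote_plus


def uploaded_objects(event: dict) -> list[tuple[str, str]]:
    """Extract (bucket, key) pairs for newly created objects from an S3 event notification."""
    objects = []
    # S3 test events have no "Records" entry, so default to an empty list.
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        # Object keys in event notifications are URL-encoded (spaces arrive as "+").
        key = unquote_plus(s3["object"]["key"])
        objects.append((s3["bucket"]["name"], key))
    return objects


example_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:CompleteMultipartUpload",
            "s3": {
                "bucket": {"name": "wellcomecollection-workflow-upload"},
                "object": {"key": "digitised/my+package.zip"},
            },
        }
    ]
}

print(uploaded_objects(example_event))
# [('wellcomecollection-workflow-upload', 'digitised/my package.zip')]
```

Because each package is a single object, a creation event only fires once the whole zip has been written, which is what makes the single-file package format safe to trigger on.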

**Digitised content**

Packages placed in the `digitised` prefix should be automatically matched to an existing process.
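
The RFC does not say how a package is matched to its process. One possible convention, shown here purely as an assumption, is that the zip file name (without `.zip`) equals the process title. In this sketch `package_identifier` and `match_process` are hypothetical, and the `processes` mapping stands in for a lookup against Goobi, whose API is not shown.

```python
import posixpath
from typing import Mapping, Optional


def package_identifier(key: str) -> str:
    """Hypothetical convention: the zip file name, minus .zip, identifies the process."""
    stem, ext = posixpath.splitext(posixpath.basename(key))
    if ext.lower() != ".zip":
        raise ValueError(f"not a zip package: {key}")
    return stem


def match_process(key: str, processes: Mapping[str, int]) -> Optional[int]:
    """Return the ID of the existing process for this package, or None if none matches.

    `processes` maps process titles to IDs and is a stand-in for a Goobi lookup.
    """
    return processes.get(package_identifier(key))


existing = {"b1234": 42}
print(match_process("digitised/b1234.zip", existing))  # 42
print(match_process("digitised/b9999.zip", existing))  # None
```

A package with no matching process would presumably be treated as a failure under the completion rules below.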

**Editorial photography**

Packages placed in the `editorial` prefix should automatically create an editorial photography process.

**Completion**

Successfully processed packages should be deleted from the upload bucket. Failed packages should be moved to the `failed` prefix.
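
S3 has no native move or rename operation, so "moving" a failed package means copying it to the new key and then deleting the original (with boto3, `copy_object` followed by `delete_object`). The sketch below models the bucket as an in-memory mapping so it runs without AWS; `complete` and `failed_key` are hypothetical names, and keeping the original prefix under `failed/` is an assumption.

```python
from typing import MutableMapping


def failed_key(key: str) -> str:
    """Destination for a failed package (assumption: keep the original prefix under failed/)."""
    return f"failed/{key}"


def complete(bucket: MutableMapping[str, bytes], key: str, succeeded: bool) -> None:
    """Apply the completion rules to one package in an in-memory stand-in for the bucket."""
    if succeeded:
        del bucket[key]
        return
    # No move operation exists: copy to the new key first, then delete the original.
    bucket[failed_key(key)] = bucket[key]
    del bucket[key]


bucket = {"digitised/b1234.zip": b"...", "editorial/shoot.zip": b"..."}
complete(bucket, "digitised/b1234.zip", succeeded=True)
complete(bucket, "editorial/shoot.zip", succeeded=False)
print(sorted(bucket))  # ['failed/editorial/shoot.zip']
```

Copying before deleting means a crash between the two steps leaves a duplicate rather than losing the package. S3 event notifications can be filtered by key prefix, so configuring them only for `digitised/` and `editorial/` would stop packages written to `failed/` from triggering processing again.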

