# RFC 005: Reporting Pipeline

This RFC proposes a reporting pipeline for the Wellcome Collection data, allowing for analytics and reporting on data from various sources.

**Last modified:** 2018-11-02T16:46:57+00:00

## Background

The collection aggregates data from a number of sources:

* Archival records
* Library systems
* Digital asset metadata

Data will flow from these systems into our data store.

## Problem Statement

In order to make decisions about collection data, we need to run analytics and reporting on data from various sources.

## Proposed Solution

We propose to add a simple reporting pipeline, powered by Lambda functions that feed an Elasticsearch cluster.

![overview](/files/ihrJG4NFJ5FcK45gjQCT)

**Note:** [Kinesis Firehose](https://aws.amazon.com/blogs/aws/amazon-kinesis-firehose-simple-highly-scalable-data-ingestion/) is unsuitable for this purpose at the time of writing, as it cannot perform Elasticsearch document updates.

### Process flow

The event stream from the SourceData "Versioned Hybrid Store" triggers:

* A transformation lambda, which performs a custom transformation on the source data to make it suitable for ingestion into Elasticsearch.
  * This lambda will pass a JSON object and an index identifier to SNS.
* An ingestion lambda, which PUTs the object it is passed into the specified index.

It is intended that there may be multiple transformation lambdas, each providing a custom transform. There will be a single ingestion lambda, which attempts to PUT any object it receives into whichever index is specified.
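The transformation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline's actual code: `transform_record`, `build_message`, `handle_source_event`, the `id` and `title` fields, and the injected `publish` callable (standing in for an SNS publish call) are all hypothetical names. The message it produces has an index identifier under `index` and the transformed object under `object`.

```python
import json
from typing import Any, Callable, Dict


def transform_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical custom transform: keep only the fields wanted for reporting."""
    return {
        "id": record["id"],
        "title": record.get("title", "").strip(),
    }


def build_message(index: str, obj: Dict[str, Any]) -> str:
    """Serialise the transformed object together with its target index."""
    return json.dumps({"index": index, "object": obj})


def handle_source_event(
    record: Dict[str, Any],
    publish: Callable[[str], None],
    index: str = "my-index-1",
) -> str:
    """Transform one source record and hand the message to the publisher.

    `publish` is an assumption standing in for whatever sends the
    message to SNS; injecting it keeps the transform easy to test.
    """
    message = build_message(index, transform_record(record))
    publish(message)
    return message
```

Keeping the transform separate from the publishing step means each additional transformation lambda only needs to supply its own `transform_record` and target index.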

#### Ingestion Lambda proposed message format

The ingestion lambda takes a message that specifies which index the object should be added to.

```json
{
  "index": "my-index-1",
  "object": {
    "foo": "bar"
  }
}
```
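A minimal sketch of how an ingestion lambda might unpack this message, assuming it is delivered through an SNS-triggered invocation (where the payload arrives as a string under `Records[].Sns.Message`). The function names and the injected `put_document` callable are hypothetical; the actual PUT to Elasticsearch is deliberately left to the caller, since this RFC does not specify how document identifiers are chosen.

```python
import json
from typing import Any, Callable, Dict, Tuple


def parse_ingest_message(raw: str) -> Tuple[str, Dict[str, Any]]:
    """Validate the proposed message format and return (index, object)."""
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("message must be a JSON object")
    index = payload.get("index")
    obj = payload.get("object")
    if not isinstance(index, str) or not index:
        raise ValueError("message must contain a non-empty string 'index'")
    if not isinstance(obj, dict):
        raise ValueError("message must contain a JSON object under 'object'")
    return index, obj


def handle_sns_event(
    event: Dict[str, Any],
    put_document: Callable[[str, Dict[str, Any]], None],
) -> int:
    """Ingest every SNS record in the event; return how many were handled.

    `put_document(index, obj)` is an assumption standing in for the
    call that PUTs the object into the Elasticsearch index.
    """
    handled = 0
    for record in event["Records"]:
        index, obj = parse_ingest_message(record["Sns"]["Message"])
        put_document(index, obj)
        handled += 1
    return handled
```

Rejecting malformed messages before any request is made keeps the single ingestion lambda generic: it needs no knowledge of the object's contents beyond the two top-level keys.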

#### Elasticsearch mappings

It is not intended that strict mappings will be provided. It will instead be the job of the transformation to provide representative data.
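Without strict mappings, Elasticsearch infers field types dynamically from the documents it receives, so a transformation benefits from emitting each field with a consistent JSON type. The sketch below shows one way a transform might normalise values before publishing; `normalise_value` and `normalise_object` are hypothetical helpers, and the choice to drop `None` values and emit dates as ISO 8601 strings is an assumption rather than a requirement of this RFC.

```python
from datetime import date, datetime
from typing import Any, Dict


def normalise_value(value: Any) -> Any:
    """Coerce a value into a JSON type that stays stable across documents."""
    if isinstance(value, (date, datetime)):
        # ISO 8601 strings serialise cleanly and are recognisable as dates.
        return value.isoformat()
    if isinstance(value, dict):
        return normalise_object(value)
    if isinstance(value, (list, tuple)):
        return [normalise_value(item) for item in value]
    return value


def normalise_object(obj: Dict[str, Any]) -> Dict[str, Any]:
    """Normalise every field, omitting fields whose value is None."""
    return {
        key: normalise_value(value)
        for key, value in obj.items()
        if value is not None
    }
```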

