RFC 005: Reporting Pipeline
Last updated: 02 November 2018.
The collection aggregates data from a number of sources:

- Archival records
- Library systems
- Digital asset metadata
Data will flow from these systems into our data store.
To make decisions about collection data, we need to run analytics and reporting across these sources.
We propose to add a simple reporting pipeline: Lambda functions feeding an Elasticsearch cluster.
Note: Kinesis Firehose is unsuitable for this purpose at the time of writing, as it cannot perform Elasticsearch document updates.
The event stream from the SourceData "Versioned Hybrid Store" triggers:

1. A transformation Lambda, which applies a custom transformation to the source data, making it suitable for ingest into Elasticsearch.
2. This Lambda publishes a JSON object and an index identifier to SNS.
3. An ingestion Lambda PUTs the object to the specified index.
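The transformation step above might look like the following sketch. The source record shape, field names, and topic ARN are assumptions for illustration, not part of this RFC:

```python
# Hypothetical SNS topic ARN; in the real Lambda this would come from
# environment configuration.
INGEST_TOPIC_ARN = "arn:aws:sns:eu-west-1:123456789012:es-ingest"


def transform(source_record):
    """Example custom transform: reshape a source record into an
    Elasticsearch-friendly document (field names are illustrative)."""
    return {
        "id": source_record["id"],
        "title": source_record.get("title", ""),
    }


def build_sns_message(source_record, index):
    """Build the message the transformation Lambda publishes to SNS:
    the transformed JSON object plus the target index identifier."""
    return {
        "index": index,
        "document": transform(source_record),
    }

# In the real Lambda, boto3's sns.publish(TopicArn=INGEST_TOPIC_ARN,
# Message=json.dumps(message)) would hand this to the ingestion Lambda.
```

Keeping the index identifier alongside the document in the message is what lets a single ingestion Lambda serve many transformation Lambdas.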
There may be multiple transformation Lambdas, each providing a custom transform. There will be a single ingestion Lambda, which will attempt to PUT any object to any specified index.
The message received by the ingestion Lambda must specify which index to attempt to add the object to.
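A minimal sketch of that ingestion Lambda, assuming the SNS message carries an `index` and a `document` field and that the cluster endpoint comes from configuration (all names here are assumptions):

```python
import json

# Hypothetical cluster endpoint; the real value would come from
# Lambda environment configuration.
ES_ENDPOINT = "https://example-es-cluster:9200"


def handler(event, context=None):
    """Sketch of the ingestion Lambda: unwrap the SNS envelope, read the
    target index and document, and build the Elasticsearch PUT request."""
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    index = message["index"]
    doc = message["document"]
    url = f"{ES_ENDPOINT}/{index}/_doc/{doc['id']}"
    # In the real Lambda an HTTP client would PUT `doc` to `url`.
    # PUT with an explicit document id creates or replaces the document,
    # which is what gives us the update behaviour Firehose lacks.
    return {"url": url, "body": doc}
```

Note that PUTting by explicit id is an upsert at the document level: re-sending a transformed record simply replaces the previous version in the index.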
Strict index mappings will not be provided. It will instead be the job of each transformation to emit representative data, from which Elasticsearch can infer suitable field types.