Tracking changes to the Miro data
The Miro data was originally exported as a collection of XML files, one per letter prefix. We have split these XML files into a series of JSON files, one per image. These JSON files are stored in S3, with a pointer in DynamoDB:
Although the Miro data is static, we may change how we want to use it. For example:
A contributor may ask us to remove an image from the site
We may change the license on an image
We may make an image available that we were previously unsure about
We need a way to apply these changes and record them.
Principles
We should be able to override specific values in the Miro data/transformer. We should assume we will be asked to make changes on an ongoing basis: this isn't a one-off operation.
We should keep a record of our changes: who made them, when, and why
Our changes should be separate from the Miro exports
Proposal
We extend the MiroSourcePayload model with two optional fields:
The MiroUpdateEvent model will track our changes to the data:
The description will be an automatically generated description of the change, e.g. 'Change license override from "None" to "cc-by"', and the message will be a human-written explanation of why we made the change, e.g. 'We realised we could make this available under a more permissive licence.' The date and user will be automatically populated.
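One possible shape for this model, as a minimal Python sketch. The four field names come from the description above; the types, the use of a dataclass, and the describe_change helper are assumptions made for illustration.

```python
import getpass
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class MiroUpdateEvent:
    # Automatically generated summary of what changed.
    description: str
    # Human-written explanation of why the change was made.
    message: str
    # Populated automatically when the caller leaves them unset.
    # Assumption: "user" is the local login name of whoever runs the script.
    date: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    user: str = field(default_factory=getpass.getuser)


def describe_change(name: str, old, new) -> str:
    """Hypothetical helper: build the automatic description for an override."""
    return f'Change {name} override from "{old}" to "{new}"'
```

Making the event immutable is a design choice in this sketch: once a change is recorded, the record of it should not be edited.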
The MiroSourceOverride model will allow us to track overrides:
This model can be extended to add new overrides as necessary.
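As a rough sketch of how the override model and the extended payload might fit together: the names of the two optional payload fields (events and overrides), the id field, and the effective_license helper are all assumptions, not the actual definitions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MiroSourceOverride:
    # None means "no override": fall back to whatever the Miro data implies.
    license: Optional[str] = None
    # New overrides would be added here as further optional fields.


@dataclass
class MiroSourcePayload:
    # Hypothetical identifier field; the real payload has other fields too.
    id: str
    # Assumed names for the two optional fields.
    events: List[dict] = field(default_factory=list)
    overrides: Optional[MiroSourceOverride] = None


def effective_license(payload: MiroSourcePayload, derived_license: str) -> str:
    """Return the override if one is set, else the license derived from Miro."""
    if payload.overrides is not None and payload.overrides.license is not None:
        return payload.overrides.license
    return derived_license
```

Because both fields default to "nothing", existing records without overrides would keep behaving exactly as before.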
When we make changes to a Miro record, we add a new MiroUpdateEvent to record the change, and we update the DynamoDB record. This gives us change tracking that preserves the integrity of the original Miro data in S3.
Python API
As part of this change, there will be a collection of Python functions that you can use to write scripts for modifying the Miro data.
You could use these to, for example, write a script to suppress three images:
These functions will send a message to the Miro updates topic, so the record gets re-transformed by the Miro transformer.
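A script of this kind could look roughly like the sketch below. Everything here is hypothetical: the suppress_image name and signature, the "suppressed" override, the image IDs, and the in-memory dict and list that stand in for DynamoDB and the Miro updates topic.

```python
from datetime import datetime, timezone
from typing import Callable, Dict


def suppress_image(
    records: Dict[str, dict],
    image_id: str,
    message: str,
    user: str,
    publish: Callable[[dict], None],
) -> None:
    """Mark one image as suppressed, record why, and request a re-transform."""
    record = records[image_id]
    overrides = record.setdefault("overrides", {})
    old_value = overrides.get("suppressed")
    overrides["suppressed"] = True
    record.setdefault("events", []).append(
        {
            "description": f'Change suppressed override from "{old_value}" to "True"',
            "message": message,
            "date": datetime.now(timezone.utc).isoformat(),
            "user": user,
        }
    )
    # Stand-in for sending a message to the Miro updates topic.
    publish({"id": image_id})


if __name__ == "__main__":
    # Hypothetical image IDs and an in-memory table, for illustration only.
    records = {"A0000001": {}, "A0000002": {}, "A0000003": {}}
    sent = []
    for image_id in records:
        suppress_image(
            records,
            image_id,
            message="Removed at the contributor's request",
            user="example-user",
            publish=sent.append,
        )
    print(sent)
```

Passing the publisher in as a callable keeps the sketch testable without AWS credentials; a real helper would presumably wrap the topic client instead.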
Worked example
Suppose we have the following Miro record:
The data in the S3 metadata means this is mapped to an "in-copyright" license.
We get an email from the contributor, who tells us we can release it under the CC-BY-NC license. We call the Python helper:
The helper will add an appropriate MiroUpdateEvent and MiroSourceOverride:
Later we get another email from the contributor, saying we can now use CC-BY. We call the helper a second time:
And the record gets updated again:
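The two updates in this example can be replayed with a small in-memory sketch. The set_license_override name, its signature, the record layout, the image ID, and the messages are assumptions for illustration; only the description format follows the example given earlier.

```python
from datetime import datetime, timezone


def set_license_override(record: dict, license_id: str, message: str, user: str) -> dict:
    """Hypothetical helper: set the license override and log an update event."""
    overrides = record.setdefault("overrides", {})
    old = overrides.get("license")
    overrides["license"] = license_id
    record.setdefault("events", []).append(
        {
            "description": f'Change license override from "{old}" to "{license_id}"',
            "message": message,
            "date": datetime.now(timezone.utc).isoformat(),
            "user": user,
        }
    )
    return record


# Hypothetical record; the original Miro data in S3 is never touched.
record = {"id": "A0000001"}

set_license_override(
    record, "cc-by-nc",
    message="The contributor told us we can release this under CC-BY-NC.",
    user="example-user",
)
set_license_override(
    record, "cc-by",
    message="The contributor told us we can now use CC-BY.",
    user="example-user",
)

for event in record["events"]:
    print(event["description"])
# Change license override from "None" to "cc-by-nc"
# Change license override from "cc-by-nc" to "cc-by"
```

Only the latest override is kept in the overrides block, while the events list preserves the full history of who changed what and why.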