
MIRO: Our image collections


Last updated 1 year ago

Miro is the name of the asset management software used for Wellcome Images, an online picture service that Wellcome used to run. This service included both openly licensed images and images that required a paid licence.

Although wellcomeimages.org has now been subsumed by the image search on wellcomecollection.org, we still have the images and metadata, which we refer to as the "Miro" data.

Not all the wellcomeimages.org data came to the new site; only the subset which could be made available under an open licence.

Identifiers in Miro

A Miro identifier has up to three parts:

  • A prefix of one or two letters, which identifies the broad collection. From a Slack thread in 2020:

    A – Animal Images
    AS – Family Life. Child Development
    B – Biomedical supplied by external contributors
    C – Corporate images relating to the Wellcome Trust
    D – Footage
    F – Microfilm
    FP – Family life child development
    L – Images of items held in the Library. Historical.
    M – Images of items held in the library. Historical.
    N – Clinical
    S – Slide collection
    V – Iconographic/works of art
    W – Publishing group International Health

  • An image number, which identifies this image within the collection. This is usually seven digits, e.g. 0029572.

  • Optional suffix characters. For example, ER and EL identify the right- and left-hand images of the same page in a book. Not all Miro identifiers have a suffix.

Examples of Miro identifiers: V0029572, V0018563ER.
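The three-part structure above can be captured with a regular expression. This is a minimal sketch that assumes only what is described here: a one- or two-letter prefix, a run of digits (usually seven), and optional trailing suffix letters. `MiroId` and `parse_miro_id` are hypothetical names for illustration, not part of the pipeline code.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed shape (based only on the description above): a one- or two-letter
# prefix, a run of digits, then optional suffix letters such as "ER" or "EL".
MIRO_ID_PATTERN = re.compile(r"(?P<prefix>[A-Z]{1,2})(?P<number>\d+)(?P<suffix>[A-Z]*)")


@dataclass(frozen=True)
class MiroId:
    prefix: str            # broad collection, e.g. "V" (iconographic/works of art)
    number: str            # image number, kept as a string to preserve leading zeros
    suffix: Optional[str]  # e.g. "ER" or "EL"; None when the identifier has no suffix


def parse_miro_id(identifier: str) -> MiroId:
    """Split a Miro identifier into its prefix, image number and optional suffix."""
    match = MIRO_ID_PATTERN.fullmatch(identifier)
    if match is None:
        raise ValueError(f"Not a recognised Miro identifier: {identifier!r}")
    return MiroId(
        prefix=match.group("prefix"),
        number=match.group("number"),
        suffix=match.group("suffix") or None,
    )


print(parse_miro_id("V0029572"))    # MiroId(prefix='V', number='0029572', suffix=None)
print(parse_miro_id("V0018563ER"))  # MiroId(prefix='V', number='0018563', suffix='ER')
```

Keeping the image number as a string matters: converting it to an integer would drop the leading zeros in numbers like 0029572.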

The Miro data

We have XML exports of the Miro data from before Wellcome Images was turned off; these are kept in the storage service. We also have a JSON copy of this data in the platform account.

When we got the Miro data, we sorted the images into three buckets:

  • Open access – anything where we had positive confirmation from the original contributor that we could keep using their image, and make it available under a permissive licence

  • Staff access – anything we wanted to keep but couldn't put on the public website (this includes a lot of our in-house photography)

  • Cold store – anything we didn't want to keep, or weren't sure whether we could keep

Only the open access images are available on the website and to the catalogue pipeline; we don't use the other images.
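As a minimal sketch of that rule, assume each image carries a label for the group it was sorted into. `AccessBucket`, `MiroImage` and `images_for_pipeline` are hypothetical names and the identifiers are made up; this illustrates the rule, not how the pipeline actually stores or filters the data.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class AccessBucket(Enum):
    # The three groups described above; this enum is hypothetical.
    OPEN_ACCESS = "open-access"
    STAFF_ACCESS = "staff-access"
    COLD_STORE = "cold-store"


@dataclass(frozen=True)
class MiroImage:
    miro_id: str
    bucket: AccessBucket


def images_for_pipeline(images: Iterable[MiroImage]) -> List[MiroImage]:
    """Keep only open access images; staff access and cold store images are never used."""
    return [image for image in images if image.bucket is AccessBucket.OPEN_ACCESS]


# Made-up identifiers and bucket assignments, for illustration only.
images = [
    MiroImage("V0000001", AccessBucket.OPEN_ACCESS),
    MiroImage("N0000002", AccessBucket.STAFF_ACCESS),
    MiroImage("B0000003", AccessBucket.COLD_STORE),
]
print([image.miro_id for image in images_for_pipeline(images)])  # ['V0000001']
```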

How we update records

It is extremely rare for us to update Miro data; usually the Collections team are responsible for keeping data up-to-date, and there's no easy way for them to edit the Miro data exports.

We can override a small number of fields, including the "licence" field, but we try to do so as little as possible. All the Miro images we're keeping are gradually being ingested through Goobi, after which they'll have a METS file and an editable Sierra record, which will replace the legacy Miro record.
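The override mechanism isn't described in detail here. As a sketch under stated assumptions, one way to apply overrides without editing the exports is to layer them on top of a copy of the exported record, and to permit only a fixed set of fields. `OVERRIDABLE_FIELDS`, `apply_overrides`, the `miro_id` field and the licence values are all hypothetical.

```python
from typing import Any, Dict, Mapping

# Hypothetical allow-list. The page only names "licence"; the other overridable
# fields aren't listed here, so this set is an assumption.
OVERRIDABLE_FIELDS = frozenset({"licence"})


def apply_overrides(record: Mapping[str, Any], overrides: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy of a Miro record with permitted field overrides applied.

    The input record is not modified, and any attempt to override a field
    outside the allow-list raises ValueError.
    """
    disallowed = set(overrides) - OVERRIDABLE_FIELDS
    if disallowed:
        raise ValueError(f"Cannot override fields: {sorted(disallowed)}")
    updated = dict(record)
    updated.update(overrides)
    return updated


record = {"miro_id": "V0000001", "licence": "in-copyright"}  # made-up record
print(apply_overrides(record, {"licence": "cc-by"}))
# {'miro_id': 'V0000001', 'licence': 'cc-by'}
```

Because the export itself is never touched, an override can be removed later (for example, once the Sierra record takes over) simply by deleting it.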

How we delete records

Occasionally Collections will ask us to take down a Miro image; we have a script for doing this in the pipeline repo.

All the Miro takedowns are recorded here: https://github.com/wellcomecollection/private/blob/main/miro-suppressions.md

More information at:

https://github.com/wellcomecollection/private/blob/rk/takedown-requests/takedown-requests.md
https://github.com/wellcomecollection/storage-service/blob/main/docs/wellcome/completely-deleting-bags-from-the-storage-service.md