CALM: Our archive catalogue

Last updated 2 years ago

CALM is the database used by Collections Information to manage our archives catalogue.

Identifiers in CALM

A record in CALM has three identifiers:

  • A RecordID, which is a UUID used by the CALM database, e.g. 002c5acf-a977-4f1e-ae6f-bcf84143ec05

  • A RefNo, which is a slash-separated string that tells us the position of a record in an archive hierarchy, e.g. PPMLV/C/7/6/6.

    All the records in the same archive will have the same prefix before the first slash, e.g. everything with a RefNo that starts PPMLV/ is part of the Dr Marthe Louise Vogt archive.

    PP is a common prefix that stands for "Personal Papers"; SA is another that stands for "Societies and Associations".

  • An AltRefNo, which is the display version of the RefNo, e.g. PP/MLV/C/7/6/6.

    This may be formatted slightly differently, and should not be analysed as a structured string.
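
As a rough illustration of how a RefNo encodes a record's position, the Python sketch below splits a RefNo on slashes to recover the archive prefix and the records above it. The function names are hypothetical and not part of the pipeline, and the sketch assumes each slash-separated segment corresponds to one level of the hierarchy. It works on the RefNo only, because the AltRefNo should not be parsed as a structured string.

```python
from typing import List, Optional


def archive_prefix(ref_no: str) -> str:
    """Return the part of a RefNo before the first slash (the archive)."""
    return ref_no.split("/", 1)[0]


def parent_ref_no(ref_no: str) -> Optional[str]:
    """Return the RefNo one level up, or None for a top-level record.

    Assumption: dropping the last segment gives the parent's RefNo.
    """
    head, sep, _ = ref_no.rpartition("/")
    return head if sep else None


def ancestor_ref_nos(ref_no: str) -> List[str]:
    """Return every RefNo above this one, from the top of the archive down."""
    parts = ref_no.split("/")
    return ["/".join(parts[:i]) for i in range(1, len(parts))]
```

For the example RefNo PPMLV/C/7/6/6, `archive_prefix` gives PPMLV and `parent_ref_no` gives PPMLV/C/7/6.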

The CALM API

There's no publicly available documentation, but there's a CALM API guide in one of our private S3 buckets.

How we get updates

  • We poll CALM on a fixed interval, and retrieve any records which have been created or modified since the last poll.

  • We don't pull complete CALM records into the pipeline; we suppress a handful of fields, e.g. those which contain personally identifiable information (PII). We'd never present that data in the catalogue API, so creating another copy is unnecessary. See suppressed fields in CALM for more detail.

  • When records are deleted from CALM, they disappear immediately and no longer appear in the API.

    We have a separate app that looks for deleted records by comparing the records CALM knows about with the records we know about. For example, if CALM thinks there are 9 records and we think there are 10, we know that we need to remove 1 record from the catalogue pipeline.
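
The three behaviours above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the pipeline's actual code: the function names, the `Modified` key, and the `ExamplePiiField` entry are all hypothetical stand-ins, and no real CALM API calls are made.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Set

# Hypothetical placeholder; the real list of suppressed fields is
# documented separately.
SUPPRESSED_FIELDS = {"ExamplePiiField"}


def changed_since(records: Iterable[Dict], last_poll: datetime) -> List[Dict]:
    """Keep records created or modified after the previous poll.

    Assumption: each record carries a datetime under the key "Modified".
    """
    return [r for r in records if r["Modified"] > last_poll]


def suppress_fields(record: Dict, suppressed: Set[str] = SUPPRESSED_FIELDS) -> Dict:
    """Return a copy of the record without the suppressed fields."""
    return {k: v for k, v in record.items() if k not in suppressed}


def find_deleted(calm_ids: Iterable[str], pipeline_ids: Iterable[str]) -> Set[str]:
    """Return IDs the pipeline still holds but CALM no longer knows about."""
    return set(pipeline_ids) - set(calm_ids)
```

With 9 IDs reported by CALM and 10 held by the pipeline, `find_deleted` returns the single ID that needs removing. A set difference is used here because deleted records leave no trace in CALM, so the only way to spot them is by comparing the two sets of IDs.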
