Catalogue pipeline

What is an adapter?


Last updated 2 years ago

An adapter is an application that copies records from a source catalogue into a catalogue pipeline database. We have one adapter per source catalogue.

Why do we have adapters?

  • Each adapter absorbs the complexity of its source catalogue's API. For example, one catalogue might use a REST API, another SOAP XML, and another might send SNS notifications. The adapter uses this API to copy the data into the catalogue pipeline's own storage (more specifically, a mix of DynamoDB and S3).

    Downstream code can then read from the pipeline's copy of the data, and not worry about how the source catalogue works.

    (This gives adapters their name: compare to plug adapters.)

  • Adapters isolate the catalogue pipeline from problems in the source catalogues.

    For example, if CALM is down for upgrades, the pipeline is unaffected because it has a complete copy of the CALM data.

  • Adapters allow us to reprocess data at speed, without impacting the source catalogues.

    If we want to do some batch processing, we can process our copy of the data and run as fast as we like, without sending extra traffic to the source catalogues. This reduces the risk of running an expensive query that accidentally overwhelms an upstream system and breaks an application that other teams rely on.
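The three points above can be illustrated with a minimal sketch. This is not the pipeline's actual code: every name here (`SourceRecord`, `FakeRestCatalogue`, `InMemoryStore`, `Adapter`, `reprocess`) is hypothetical, and an in-memory dictionary stands in for the DynamoDB and S3 storage. It only shows the shape of the idea: the adapter is the one piece that talks to the source, and everything downstream reads the copy.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Protocol


@dataclass(frozen=True)
class SourceRecord:
    """One record copied from a source catalogue (hypothetical shape)."""
    id: str
    payload: dict


class SourceCatalogue(Protocol):
    """Anything an adapter can fetch records from, whatever its API style."""
    def fetch_records(self) -> Iterable[SourceRecord]: ...


class FakeRestCatalogue:
    """Pretend source catalogue; counts requests so we can see the traffic."""
    def __init__(self, rows: List[dict]) -> None:
        self.rows = rows
        self.available = True
        self.request_count = 0

    def fetch_records(self) -> Iterable[SourceRecord]:
        if not self.available:
            raise ConnectionError("source catalogue is down")
        self.request_count += 1
        return [SourceRecord(id=row["id"], payload=row) for row in self.rows]


@dataclass
class InMemoryStore:
    """Stand-in for the pipeline's own copy of the data."""
    records: Dict[str, SourceRecord] = field(default_factory=dict)

    def put(self, record: SourceRecord) -> None:
        self.records[record.id] = record

    def scan(self) -> Iterator[SourceRecord]:
        return iter(list(self.records.values()))


class Adapter:
    """Copies every record from one source catalogue into the store."""
    def __init__(self, source: SourceCatalogue, store: InMemoryStore) -> None:
        self.source = source
        self.store = store

    def sync(self) -> int:
        count = 0
        for record in self.source.fetch_records():
            self.store.put(record)
            count += 1
        return count


def reprocess(store: InMemoryStore) -> List[str]:
    """Downstream batch job: reads only the copy, never the source."""
    return [record.payload["title"].upper() for record in store.scan()]


source = FakeRestCatalogue([
    {"id": "a1", "title": "Letters"},
    {"id": "a2", "title": "Diaries"},
])
store = InMemoryStore()
copied = Adapter(source, store).sync()

# Simulate the source going down for upgrades; reprocessing still works
# and sends no further requests to the source.
source.available = False
titles = reprocess(store)
```

In this sketch, swapping `FakeRestCatalogue` for a class that speaks a different protocol would not change `reprocess` at all, which is the isolation the list above describes.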
