Catalogue pipeline

Introduction

The catalogue pipeline populates the search index for our online catalogue search.

This includes:

  • fetching records from source catalogues and keeping them up-to-date
  • transforming records into a single, common model
  • combining records from multiple sources, where appropriate
  • creating an Elasticsearch index which can be queried by the catalogue API
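
The four stages above can be pictured as a chain of small functions. The following is a minimal sketch only, not the pipeline's actual code: the names `SourceRecord`, `Work`, `fetch`, `transform`, `combine`, `build_index` and `run_pipeline` are hypothetical, the matching rule (equal titles, ignoring case) is an assumption made purely for illustration, and a plain dictionary stands in for the Elasticsearch index.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceRecord:
    """A raw record as fetched from one source catalogue (hypothetical shape)."""
    source: str
    source_id: str
    payload: dict


@dataclass
class Work:
    """Stand-in for the single, common model (fields are illustrative only)."""
    title: str
    source_ids: list = field(default_factory=list)


def fetch(catalogues):
    """Stage 1: pull every record out of each source catalogue."""
    for source, records in catalogues.items():
        for source_id, payload in records.items():
            yield SourceRecord(source, source_id, payload)


def transform(record):
    """Stage 2: map a source-specific payload onto the common model."""
    # Assumption: each source names its title field differently.
    title = record.payload.get("title") or record.payload.get("caption", "")
    return Work(title=title.strip(),
                source_ids=[f"{record.source}/{record.source_id}"])


def combine(works):
    """Stage 3: merge works that appear to describe the same thing.

    Assumption: two works match when their titles are equal ignoring case.
    The real matching rules are not described on this page.
    """
    merged = {}
    for work in works:
        key = work.title.lower()
        if key in merged:
            merged[key].source_ids.extend(work.source_ids)
        else:
            merged[key] = Work(work.title, list(work.source_ids))
    return list(merged.values())


def build_index(works):
    """Stage 4: build a searchable index (a dict standing in for Elasticsearch)."""
    return {str(i): {"title": w.title, "sourceIds": w.source_ids}
            for i, w in enumerate(works)}


def run_pipeline(catalogues):
    """Run all four stages in order over in-memory catalogues."""
    return build_index(combine(transform(r) for r in fetch(catalogues)))


if __name__ == "__main__":
    # Made-up example data: two sources, one overlapping record.
    example = {
        "archive": {"a1": {"title": "Letters of a physician"}},
        "images": {"i1": {"caption": "letters of a physician"},
                   "i2": {"caption": "Anatomical drawing"}},
    }
    for doc_id, doc in run_pipeline(example).items():
        print(doc_id, doc)
```

Keeping each stage as a separate function mirrors the separation described above: a new source catalogue only needs a new fetch and transform step, while combining and indexing operate on the common model alone.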

Documentation

This GitBook space provides a high-level overview of the catalogue pipeline and its design. It is aimed at Wellcome Collection developers who want to learn about the project, and at colleagues at other institutions who want to build something similar.

It does not cover operational details, such as how to deploy individual services. Those are kept inside the code repository.

Repo

The catalogue pipeline code is in https://github.com/wellcomecollection/catalogue-pipeline