
RFC 055: Content API



Status: Draft

Last updated: 2023-03-06

This RFC outlines a new set of API endpoints which will allow wellcomecollection.org users to search and filter content which is stored in Prismic.

Background

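We use Prismic to edit and store information about our exhibitions, events, stories, and other pieces of non-catalogue content on wellcomecollection.org.

Recently, we've allowed users to search for stories on wellcomecollection.org using Prismic's GraphQL API.

That MVP implementation has demonstrated that Prismic's search functionality isn't good enough to produce relevant results on its own. See the Slack threads about Prismic's GraphQL search and attempts to augment Prismic's search results with third-party libraries.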

Prismic matches documents to a user's search terms using a very loose, fuzzy query on all text-like fields, but, unlike Elasticsearch, does not assign each document a score corresponding to its relevance. Instead of sorting by relevance, users are limited to sorting the retrieved documents by date or by title, which often makes the results appear irrelevant (e.g. weak matches appearing at the top of the list due to recency). Prismic's GraphQL API is also unsuitable for filtering content by arbitrary fields, which further limits our users' ability to find the content they're looking for.
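To illustrate why date ordering can make results look irrelevant, here is a minimal sketch. The two hits, their titles, dates, and scores are all invented for illustration; Prismic does not expose a score, so the `score` field stands in for the kind of relevance value a scoring search engine would assign.

```typescript
// Hypothetical search hits; every value here is illustrative only.
interface RelevanceDemoHit {
  title: string;
  published: string; // ISO date, so string comparison orders chronologically
  score: number; // stand-in for a relevance score
}

const relevanceDemoHits: RelevanceDemoHit[] = [
  { title: "Weak match, recent", published: "2023-02-01", score: 0.2 },
  { title: "Strong match, older", published: "2019-06-15", score: 4.1 },
];

// Newest first: the only kind of ordering available without a score.
const hitsByDate = [...relevanceDemoHits].sort((a, b) =>
  b.published.localeCompare(a.published)
);

// Highest score first: the ordering a relevance-scored index makes possible.
const hitsByRelevance = [...relevanceDemoHits].sort((a, b) => b.score - a.score);
```

Sorting by date puts the weak but recent match first, while sorting by score puts the strong match first.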

We'd like to replace our queries to the Prismic API with something more configurable, like the system we have for the catalogue.

We're building a pipeline which will ingest content from Prismic into a set of Elasticsearch indices (see RFC 056: Prismic to Elasticsearch ETL pipeline). To allow users to search and filter that content, we also need a new set of API endpoints which will query those Elasticsearch indices. The primary purpose of these endpoints will be to serve our Search. We might use them later for content list pages, but for now the focus is solely on making them useful for Search.

This API will live at https://api.wellcomecollection.org/content/v0/, with endpoints for /articles, /exhibitions, and /events.

We won't consider the way that documents are scored as part of this RFC. Relevance requirements should be developed iteratively and independently from the development of the API.

Purposes

The /content API should allow users to:

  • request a single exhibition, event, or article by ID;

  • query articles, exhibitions and events, retrieving relevant results based on their search terms. The focus for v0 will be on articles; the other two might be explored further in a future version of this API;

  • filter and aggregate lists of articles by a set of predefined filters and aggregations. For v0 of the Content API, exhibitions and events will only support the query parameter.
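The three purposes above can be sketched as request URLs. This is a minimal sketch, not the API's actual client: the base URL, the endpoint names, and the `query` parameter come from this RFC, while the helper names and the `format` filter used in the usage note are hypothetical.

```typescript
// Base URL as stated in this RFC.
const CONTENT_API_BASE = "https://api.wellcomecollection.org/content/v0";

type ContentType = "articles" | "exhibitions" | "events";

// Hypothetical helper: URL for requesting a single document by ID.
function singleDocumentUrl(contentType: ContentType, id: string): string {
  return `${CONTENT_API_BASE}/${contentType}/${encodeURIComponent(id)}`;
}

// Hypothetical helper: URL for a search. `extraParams` stands in for the
// predefined filters and aggregations, whose names this RFC does not define.
function searchUrl(
  contentType: ContentType,
  query: string,
  extraParams: Array<[string, string]> = []
): string {
  const pairs: Array<[string, string]> = [["query", query], ...extraParams];
  const queryString = pairs
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return `${CONTENT_API_BASE}/${contentType}?${queryString}`;
}
```

For example, `searchUrl("articles", "heart surgery", [["format", "comic"]])` would produce a URL ending in `/articles?query=heart%20surgery&format=comic`, where `format` is an assumed filter name used only for illustration.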

Further requirements

  • In v0, we will follow the Prismic content model rather than the Works model. Should that model prove unsatisfactory, we should consider making the changes in Prismic directly and adjusting the content.

  • The API should only return enough information for users to determine whether a result is relevant, and provide a link to the relevant page on wellcomecollection.org.

  • Even though we will be making [contentType]/[id] endpoints, the content of the pages themselves, and the content type list pages, should still be fetched from Prismic directly for the time being.

  • The API's URL structure should be consistent with what appears on wellcomecollection.org's front-end. For example, if an article on the site appears at /articles/{id}, the API equivalent should be at /content/v0/articles/{id}.
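The URL consistency requirement above means a front-end path can be mapped to its API equivalent mechanically. A minimal sketch, assuming the three content types named in this RFC; the function name `toApiPath` is hypothetical:

```typescript
// Path prefix of the content API, as given in this RFC.
const CONTENT_API_PREFIX = "/content/v0";

// Hypothetical helper: maps a front-end path such as /articles/{id} to the
// equivalent API path. Returns undefined for paths outside the content API.
function toApiPath(sitePath: string): string | undefined {
  const match = /^\/(articles|exhibitions|events)\/([^/]+)$/.exec(sitePath);
  if (match === null) {
    return undefined;
  }
  return `${CONTENT_API_PREFIX}/${match[1]}/${match[2]}`;
}
```

Because the ID is carried over unchanged, no lookup table between site URLs and API URLs is needed.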

Notes on implementation

  • Though the content API will share code with the concepts API, it should be built as a separate service.

  • The Elasticsearch index mapping should represent the contract between the pipeline and the API. The API shouldn't need to know anything about the structure of the data in Prismic, and any substantial data augmentation should be done by the pipeline.
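One way to picture the index mapping as a contract is a shared type that the pipeline writes and the API reads. The sketch below is an assumption throughout: the field names (`display`, `query`, `title`, `url`, `body`) and the type and function names are hypothetical and are not defined by this RFC.

```typescript
// Hypothetical shape of an article document as written by the pipeline.
interface IndexedArticle {
  id: string; // the Prismic document ID, used directly
  display: { title: string; url: string }; // fields returned to API users
  query: { title: string; body: string }; // fields used only for searching
}

// Hypothetical API result: just enough to judge relevance and link to the page.
interface ArticleResult {
  id: string;
  title: string;
  url: string;
}

// The API reads only the indexed shape and needs no knowledge of how the
// data is structured in Prismic.
function toArticleResult(doc: IndexedArticle): ArticleResult {
  return { id: doc.id, title: doc.display.title, url: doc.display.url };
}
```

Under this split, any reshaping or augmentation of Prismic data happens before indexing, and the API's transformation stays trivial.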

Proposed endpoints

Articles

https://api.wellcomecollection.org/content/v0/articles

Exhibitions

https://api.wellcomecollection.org/content/v0/exhibitions

Events

https://api.wellcomecollection.org/content/v0/events

The new endpoints should fit as seamlessly as possible into the rest of the wellcomecollection.org API suite from a user's point of view, following as many of the existing conventions as they can.

The new API service should be written in TypeScript, following patterns set by the concepts API for filtering, pagination, error handling, etc.

We shouldn't mint new IDs for exhibitions, events, stories, etc. Articles, exhibitions, and events are only stored in Prismic, and won't need to be merged with other sources. /content objects should therefore use the document IDs directly from Prismic. This is also consistent with the way that these objects are referenced in wellcomecollection.org's URLs (e.g. /articles/Y_M_xhQAACcAqmjW, /exhibitions/Y0QhIxEAAA__0sMb). It was flagged that the case-sensitivity of Prismic IDs might eventually become an issue, but it is not a big enough problem at the moment to warrant the work that minting new IDs would require.
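To make the case-sensitivity concern concrete, the sketch below groups IDs that would become indistinguishable if something downstream treated them case-insensitively. The function name is hypothetical, and the lowercased ID in the usage note is invented purely to show a collision; it is not a real document.

```typescript
// Hypothetical check: returns groups of IDs that differ only by letter case.
function findCaseCollisions(ids: string[]): string[][] {
  const groups = new Map<string, string[]>();
  for (const id of ids) {
    const key = id.toLowerCase();
    const group = groups.get(key) ?? [];
    group.push(id);
    groups.set(key, group);
  }
  // Keep only the groups where more than one distinct ID shares a key.
  return Array.from(groups.values()).filter((group) => group.length > 1);
}
```

Calling `findCaseCollisions(["Y_M_xhQAACcAqmjW", "y_m_xhqaaccaqmjw", "Y0QhIxEAAA__0sMb"])` returns a single group containing the first two IDs, which is the situation any case-insensitive layer would have to avoid.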

Read more about the structure of the articles endpoint

Read more about the structure of the exhibitions endpoint

Read more about the structure of the events endpoint