Query design

Each intention is mapped to a query within Elastic.

We then boost these to sort the results in relation to people's expectations and priorities.

For example, someone searches for Honor Fell. This matches two intentions: works with Honor Fell as a contributor, and works with Honor Fell in the title.

We would then boost the contributor query by 2000 and the title query by 1000.

This would surface works by Honor Fell first, and then works with Honor Fell in the title.
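The boosting described above can be sketched as an Elasticsearch `bool` query with one `should` clause per intention. This is a minimal illustration, not the production query: the function name is hypothetical, the field names are borrowed from the data features listed on this page, and the use of `match` with `operator: "and"` is an assumption.

```python
def boosted_intentions_query(search_terms: str) -> dict:
    """Build one clause per intention, each with its own boost (illustrative only)."""
    return {
        "bool": {
            "should": [
                # Contributor intention: works by the person get the larger boost.
                {
                    "match": {
                        "contributors.label": {
                            "query": search_terms,
                            "operator": "and",
                            "boost": 2000,
                        }
                    }
                },
                # Title intention: works with the name in the title get the smaller boost.
                {
                    "match": {
                        "data.title": {
                            "query": search_terms,
                            "operator": "and",
                            "boost": 1000,
                        }
                    }
                },
            ],
            # A work has to satisfy at least one intention to be returned.
            "minimum_should_match": 1,
        }
    }


query = boosted_intentions_query("Honor Fell")
```

Because `should` clause scores are summed, a work matching both intentions would score above one matching only the contributor clause, which seems consistent with the ordering described above.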

Each query has:

  • Intentions: What a person is trying to achieve with their search

  • Data features: Parts of the data we think are relevant

  • Status: Where we are with developing this query

    • TODO: We know it's something we need to do

    • Testing: The initial query has been created and is running as a test

    • Stable: The current implementation of the query meets the expectations and has been measured in the world as doing so.

Intentions & Expectations

Phrase matching

Status: TODO

Query name: PhraseMatchQuery

Ranking evaluation test: TBD

Intentions

Searching for an exact, ordered set of tokens, in quotation marks

Data features

  • data.title

  • data.alternativeTitle

  • physicalDescription

  • subjects.label

  • genres.label

  • contributors.label

  • description

Expectations

  • Phrases are defined by tokens wrapped in quotation marks and should be matched exactly

  • Works matching individual tokens in the phrase should not be matched by this query
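One way to meet these expectations is to pull the quoted phrases out of the search string and send each one as a `multi_match` query of type `phrase`. The sketch below is an assumption about how that might look, not the implemented PhraseMatchQuery; the helper names are hypothetical and the field list is copied from the data features above.

```python
import re
from typing import List, Optional

# Field names copied from the "Data features" list above.
PHRASE_FIELDS = [
    "data.title",
    "data.alternativeTitle",
    "physicalDescription",
    "subjects.label",
    "genres.label",
    "contributors.label",
    "description",
]


def extract_phrases(search_terms: str) -> List[str]:
    """Return the non-empty strings wrapped in double quotation marks."""
    return [p.strip() for p in re.findall(r'"([^"]+)"', search_terms) if p.strip()]


def phrase_match_query(search_terms: str) -> Optional[dict]:
    """Build a phrase query, or None when the search contains no quoted phrase."""
    phrases = extract_phrases(search_terms)
    if not phrases:
        return None
    return {
        "bool": {
            # Every quoted phrase must appear, in order, in at least one field.
            "must": [
                {"multi_match": {"query": phrase, "type": "phrase", "fields": PHRASE_FIELDS}}
                for phrase in phrases
            ]
        }
    }
```

Note that Elasticsearch matches phrases against analysed tokens, so "exactly" here means the same tokens in the same order after analysis, which may ignore case or punctuation depending on the analyser.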

Examples and analysis of 3-word searches (quoted percentages of total 653)

  • CITY OF WESTMINSTER: place names (8%)

  • glasgow royal infirmary: institutions + place names (2%)

Synonymous names and subjects

Status: TODO

Query name: SynonymQuery

Ranking evaluation test: TBD

Intentions

Searching for names of people, scientific concepts, places, or other subjects which have known synonyms or alternative names

Data features

  • subjects.label

  • contributors.label

Expectations

  • Searching for a name should return works by the author, even if the structure of the name is recorded differently in the catalogue

  • Searching for the current or scientific name of a disease should return works about the same disease recorded with its original or common name(s)

  • Searching for a scientific or medical concept should return works about the subject, even if described using different language in the catalogue, including "narrower terms" and "related terms"
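A possible starting point for these expectations is a search-time synonym filter. The sketch below builds index analysis settings that use Elasticsearch's `synonym_graph` token filter. The filter and analyser names are hypothetical, and the two synonym groups are only the flu/influenza and phytology/botany pairs mentioned elsewhere on this page; a real list would need to come from a curated source.

```python
from typing import Dict, List


def synonym_rules(groups: List[List[str]]) -> List[str]:
    """Format each group of equivalent terms as a comma-separated synonym rule."""
    return [", ".join(group) for group in groups]


# Illustrative groups only.
SYNONYM_GROUPS = [
    ["flu", "influenza"],
    ["phytology", "botany"],
]

# Hypothetical analysis settings for a search-time analyser on subject labels.
index_settings: Dict = {
    "analysis": {
        "filter": {
            "subject_synonyms": {
                "type": "synonym_graph",
                "synonyms": synonym_rules(SYNONYM_GROUPS),
            }
        },
        "analyzer": {
            "subject_search": {
                "type": "custom",
                "tokenizer": "standard",
                # Lowercase first so the rules match regardless of input case.
                "filter": ["lowercase", "subject_synonyms"],
            }
        },
    }
}
```

This only covers term-level synonyms. Name inversions such as a forename and surname recorded in a different order would likely need separate handling.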

Examples

Shingled titles

Status: TODO

Query name: ShingledTitleQuery

Ranking evaluation test: TBD

Intentions

Searching for works with structurally important features in the query. Most useful when looking for specific titles, where the order of tokens matters most.

Data features

  • data.title

  • data.alternativeTitle

Expectations

  • Searching for an exact title should show that title at the top of the list

  • Occurrences of ordered tokens matching the query should appear before the matches which occur in a different order
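To illustrate why shingles capture token order, the sketch below generates word shingles in plain Python. It is a simplified stand-in for what a shingle token filter does at index time (Elasticsearch provides one called `shingle`); the function name and the default sizes are assumptions.

```python
from typing import List


def shingles(text: str, min_size: int = 2, max_size: int = 3) -> List[str]:
    """Return every run of min_size..max_size consecutive lowercased tokens."""
    tokens = text.lower().split()
    result = []
    for size in range(min_size, max_size + 1):
        for start in range(len(tokens) - size + 1):
            result.append(" ".join(tokens[start:start + size]))
    return result


in_order = shingles("ORIGIN OF SPECIES")
# in_order == ["origin of", "of species", "origin of species"]

reordered = shingles("species of origin")
# The reordered title shares no shingles with the query, so a shingle-based
# clause would only reward titles that keep the tokens in the searched order.
shared = set(in_order) & set(reordered)
```

A title containing the same tokens in a different order can still match a plain token query, but it earns no extra score from the shingle clause, which is the ordering behaviour described in the expectations.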

Examples and analysis of 3-word searches (quoted percentages of total 653)

  • ORIGIN OF SPECIES: titles without quotation marks (9%)

  • "MEDICAL TIMES GAZETTE": titles with quotation marks (0.5%)

Titles

Status: Stable

Query name: TitleQuery

Ranking evaluation test: TBD

Intentions

Searching for a work by its title.

Data features

  • data.title

  • data.alternativeTitles

Expectations

  • When the exact title is searched for, that work is the first result

  • When the search partially matches a title, that work is the first result
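The "first result" expectation lends itself to a ranking evaluation test. As a hedged sketch, the function below builds a request body for Elasticsearch's Ranking Evaluation API (`_rank_eval`) using the `precision` metric with `k` set to 1, so the score is 1.0 only when the top hit is the expected work. The index name, work ID, and plain `match` query are placeholders, not the real TitleQuery.

```python
def exact_title_rank_eval(index: str, title: str, expected_id: str) -> dict:
    """Build a _rank_eval body checking that expected_id is the top hit for title."""
    return {
        "requests": [
            {
                "id": "exact_title",
                # Placeholder query; the real test would use the query under evaluation.
                "request": {"query": {"match": {"data.title": title}}},
                # The only document rated as relevant is the expected work.
                "ratings": [{"_index": index, "_id": expected_id, "rating": 1}],
            }
        ],
        # Precision at 1: did the single top result meet the relevance threshold?
        "metric": {"precision": {"k": 1, "relevant_rating_threshold": 1}},
    }


# Hypothetical index name and work ID, for illustration only.
body = exact_title_rank_eval("works", "Origin of species", "example-work-id")
```

The same body shape could be reused for partial-title searches by changing `title` while keeping the expected ID.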

Examples

TBD

What's next

  • How to handle fuzziness?

IDs

Status: TODO

Query name: IdQuery

Ranking evaluation test: TBD

Intentions

Searching for a work based on local and external identifiers, e.g. Catalogue API IDs or Sierra IDs.

Data features

  • canonicalIds

  • sourceIdentifiers

  • otherIdentifiers

Expectations

  • Searching for an identifier, I get the result back

  • Searching for a list of identifiers, I get all the results back

  • Searches should be case insensitive

  • If the search query contains an ID and other input, we should match both the ID and the other terms, with the ID match at the top of the list.
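A rough sketch of how these expectations might translate into a query: lowercase every token, try each one against the identifier fields with a high boost, and fall back to a text match for the remaining input. This assumes the identifier fields are indexed in lowercase (for example as keyword fields with a lowercase normaliser). The boost value, text fields, and function name are hypothetical.

```python
from typing import List

# Field names copied from the "Data features" list above.
ID_FIELDS = ["canonicalIds", "sourceIdentifiers", "otherIdentifiers"]


def id_query(search_terms: str, id_boost: int = 1000) -> dict:
    """Match any token as an identifier, boosted above ordinary text matches."""
    # Lowercasing makes the lookup case insensitive, given lowercase-indexed IDs.
    tokens: List[str] = [token.lower() for token in search_terms.split()]
    id_clauses = [
        {"terms": {field: tokens, "boost": id_boost}} for field in ID_FIELDS
    ]
    # Unboosted fallback so the non-ID input can still match (hypothetical fields).
    text_clause = {
        "multi_match": {"query": search_terms, "fields": ["data.title", "description"]}
    }
    return {"bool": {"should": id_clauses + [text_clause], "minimum_should_match": 1}}


query = id_query("V1234567 i1234567 aTrf569")
```

A `terms` clause matches a work if any listed token equals one of its identifiers, so a list of identifiers returns every matching work, and the boost keeps those works above text-only matches.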

Examples

  • V1234567

  • V1234567 i1234567 aTrf569

Contributors

Status: Testing

Query name: ContributorsQuery

Ranking evaluation test: TBD

Intentions

Searching for works that have certain contributors associated with them.

Data features

  • contributors.label

Expectations

  • Searching for the exact name of a contributor, works they have contributed towards are the first results

Examples and analysis of 3-word initial searches (quoted percentages of total 653)

  • A K JOHNSON: contributor consists of 3 words (25%)

  • "Holloway, Thomas, 1748-1827.": contributor combined with date (18%)

  • john snow map: contributor combined with other entity (3%)

Subjects

Status: Testing

Query name: SubjectsQuery

Ranking evaluation test: TBD

Intentions

Searching for works that have certain subjects associated with them.

Data features

  • subjects.label

Expectations

  • Searching for the exact name of a subject, works about that subject are the first results

Examples and analysis of 3-word initial searches (quoted percentages of total 653)

  • Royal Astronomical Society: 3-word subjects with no quotation marks used, so no phrase matching (24%)

  • cholera, london, patient: 3 distinct subjects but expectation is that first term will be boosted

  • "Colour vision defects.": 2 subjects but spanning unequal word counts (4%)

Genres

Status: Testing

Query name: GenresQuery

Ranking evaluation test: TBD

Intentions

Searching for works that have certain genres associated with them.

Data features

  • genres.label

Expectations

  • Searching for the exact name of a genre, works in that genre are the first results

Examples and analysis of 3-word initial searches (quoted percentages of total 653)

  • Lithographs human anatomy: genres combined with other entities (3%)

  • paintings still life: queries including genres (4%)

General

Status: Testing

Query name: GeneralQuery

Ranking evaluation test: TBD

Intentions

Searching the catalogue for general information

Data features

  • title

  • alternativeTitles

  • physicalDescription

  • language

  • edition

  • subjects.label

  • genres.label

  • contributors.label

  • description

Expectations

  • Relevant and interesting results are returned in order of relevance and interest
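As a loose illustration of a catch-all query over the data features above, the sketch below builds a `multi_match` across every listed field, with optional per-field boosts written in the `field^boost` syntax. The function name and the example boost are assumptions; the page does not say how GeneralQuery weights its fields.

```python
from typing import Dict, Optional

# Field names copied from the "Data features" list above, without the duplicate entry.
GENERAL_FIELDS = [
    "title",
    "alternativeTitles",
    "physicalDescription",
    "language",
    "edition",
    "subjects.label",
    "genres.label",
    "contributors.label",
    "description",
]


def general_query(search_terms: str, field_boosts: Optional[Dict[str, float]] = None) -> dict:
    """Search all general fields, applying any per-field boosts that are supplied."""
    boosts = field_boosts or {}
    fields = [
        f"{name}^{boosts[name]}" if name in boosts else name
        for name in GENERAL_FIELDS
    ]
    return {"multi_match": {"query": search_terms, "fields": fields}}


# Hypothetical weighting: favour title matches over the other fields.
query = general_query("cholera london", {"title": 3})
```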

Examples

TBD


Last updated 10 months ago

The ocean as a health resort : a practical handbook of the sea for the use of tourists and health-seekers - should be the only result returned by this query

A set of look-up-able synonyms obtained from variant terms, narrower terms and related terms

william smellie should match results tagged with Smellie, William, eg

flu should match results tagged with influenza, eg

phytology should match results tagged with botany, eg

east london - above

The ocean as a health resort : a practical handbook of the sea for the use of tourists and health-seekers - at the top of the list, with other works further down

https://wellcomecollection.org/works/uxxaqdkg
LCSH
https://wellcomecollection.org/works/nswqv96z
https://wellcomecollection.org/works/kfneqvdx
https://wellcomecollection.org/works/eqqmtzca
https://wellcomecollection.org/works/ufw89pqr
https://wellcomecollection.org/works/pabxvfqu
https://wellcomecollection.org/works/uxxaqdkg