
Test 8 - BoolBoosted vs PhaserBeam


Last updated 11 months ago

Overview

Currently we are matching with a simple AND query.

This means that if something has the title of Treatise on Radioactivity, all of the below will match with equal scoring:

  • Treatise on Radioactivity

  • Radioactivity on Treatise

  • on Radioactivity Treatise

  • etc
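The difference between the two behaviours can be illustrated with a toy sketch. This is not how Elasticsearch scores documents; it only shows, assuming simple lowercase whitespace tokenisation and hypothetical helper names, why an AND match accepts every ordering while a phrase match accepts only the original one.

```python
def tokenize(text: str) -> list[str]:
    # Assumption: lowercase + whitespace split stands in for a real analyser.
    return text.lower().split()


def and_match(query: str, title: str) -> bool:
    """True when every query term appears in the title, in any order."""
    title_terms = set(tokenize(title))
    return all(term in title_terms for term in tokenize(query))


def phrase_match(query: str, title: str) -> bool:
    """True when the query terms appear in the title consecutively and in order."""
    q, t = tokenize(query), tokenize(title)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))


title = "Treatise on Radioactivity"
queries = [
    "Treatise on Radioactivity",
    "Radioactivity on Treatise",
    "on Radioactivity Treatise",
]

# Every ordering satisfies the AND match; only the first satisfies the phrase match.
and_results = {q: and_match(q, title) for q in queries}
phrase_results = {q: phrase_match(q, title) for q in queries}

for q in queries:
    print(f"{q!r}: AND={and_results[q]}, phrase={phrase_results[q]}")
```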

This tests whether phrase matching is a better fit for this. You would think so, given that is what it is for.

We wrap the query in a multi_match query, matching on the text type and keyword type fields, and boosting the keyword field higher as a match there would imply an exact match.

The phrase type of multi_match then takes the higher of the two field scores and uses it as the score for the document.

Because we then lose the niceness of the AND search, we've added that as a tier, applying it similarly to the BaseQuery but boosting it by 2.
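A minimal sketch of what such a query body could look like, built as a Python dictionary. The field names (title, title.keyword), the keyword boost of 3.0, and the shape of the AND tier are assumptions made for illustration; only the phrase type, the higher keyword boost, and the boost of 2 on the AND tier come from the description above. The actual query used in the test may be structured differently.

```python
import json


def build_phrase_query(
    search_terms: str,
    text_fields: list[str],
    keyword_fields: list[str],
    keyword_boost: float = 3.0,   # hypothetical value; the source only says "higher"
    and_tier_boost: float = 2.0,  # the AND tier is boosted by 2
) -> dict:
    """Build an Elasticsearch request body with a phrase tier and an AND tier."""
    # Keyword fields get a per-field boost using the "field^boost" syntax.
    phrase_fields = list(text_fields) + [
        f"{field}^{keyword_boost}" for field in keyword_fields
    ]
    return {
        "query": {
            "bool": {
                # With only "should" clauses, at least one of them must match.
                "should": [
                    {
                        # Phrase tier: scored by the best-matching field.
                        "multi_match": {
                            "query": search_terms,
                            "type": "phrase",
                            "fields": phrase_fields,
                        }
                    },
                    {
                        # AND tier (assumed shape): all terms required, any order.
                        "multi_match": {
                            "query": search_terms,
                            "fields": list(text_fields),
                            "operator": "and",
                            "boost": and_tier_boost,
                        }
                    },
                ]
            }
        }
    }


query = build_phrase_query(
    "Treatise on Radioactivity",
    text_fields=["title"],
    keyword_fields=["title.keyword"],
)
print(json.dumps(query, indent=2))
```

Keeping both tiers as should clauses means a document matching only the AND tier is still returned, while a document that also matches the phrase tier scores higher.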

We think this should give us much better matching of items people know the name of, but retain the fetching of things that are loosely relevant.

Known unknowns

After running this query through the explain API, the boosts applied appear similar to, but not exactly the same as, the values we put in the query. We'll be exploring further why that is the case.

Glossary

Named feature: A named feature is a piece of data in which the whole phrase carries semantic meaning, e.g. subjects, genres, titles, and people and organisation names.

Results

TBD

Click through rate

|                   | ConstScore | BoolBoosted |
|-------------------|------------|-------------|
| first page only   | TBD        | TBD         |
| beyond first page | TBD        | TBD         |

Click distribution

TBD

Conclusions

TBD
