Developing with rank

Setting up

  • If you need an up-to-date index, replicate one from the production cluster as documented here.

  • Copy the query config across from the search API application: yarn copyQueries.

Queries

Queries are the easiest part of the search-relevance puzzle to modify and test.

  • Make your changes to WorksMultiMatcherQuery.json or ImagesMultiMatcherQuery.json in the public directory (these were copied there by yarn copyQueries above). There's a sketch of the kind of structure involved after this list.

  • Use the candidate queryEnv on /dev or /search to see the results.

  • When you're happy with the effect of your changes on the rank tests, you'll need to make the Scala used by the API match the JSON used by rank. Edit the images and/or works Scala files until the tests pass.
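The query files are Elasticsearch search templates. As a minimal, illustrative sketch of the kind of structure you'll be editing (the field names and boosts below are invented for illustration, the real production queries are considerably more involved, and {{query}} is assumed to be the placeholder the template substitutes the user's search terms into):

```json
{
  "multi_match": {
    "query": "{{query}}",
    "fields": ["title^10", "contributors.label^5", "description"],
    "type": "cross_fields",
    "operator": "and"
  }
}
```

The ^n suffixes are standard Elasticsearch per-field boosts, so adjusting the relative weighting of fields is one of the simplest relevance changes to experiment with.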

Mappings and settings

We often want to test against indices that have new or altered analyzers, mappings, or settings. To create and populate a new index:

  • Run yarn getIndexConfig to fetch mappings and other config from existing indices in the rank cluster. The config for your chosen indices will be written to ./data/indices/.

  • Edit the file(s) in data/indices to your needs, using existing mappings as a starting point (there's a sketch of a typical edit after this list).

  • Run yarn createIndex to create the new index in the rank cluster from the edited mappings. This will also give you an option to start a reindex.

  • If you need to monitor the state of a reindex, run yarn checkTask.

  • If you need to delete a candidate index, run yarn deleteIndex.

  • If you need to update a candidate index, run yarn updateIndex.
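As a hedged sketch of a typical edit, this adds a custom analyzer under settings and exposes it on a subfield of an existing field (the analyzer and field names are invented for illustration; the real configs fetched by yarn getIndexConfig will be much larger):

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "english_stemmed": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "porter_stem"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "stemmed": { "type": "text", "analyzer": "english_stemmed" }
        }
      }
    }
  }
}
```

Exposing the new analyzer on a subfield, rather than replacing the field's main analyzer, lets a query combine the existing behaviour with the new one.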

To see the results of your changes, select your new index on /dev or /search.

You might need to edit the query to fit the new mapping, following these instructions.
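For example, if you'd added the hypothetical title.stemmed subfield sketched above, the candidate query won't use the new analyzer until its fields list points at it:

```json
{
  "multi_match": {
    "query": "{{query}}",
    "fields": ["title^10", "title.stemmed^5", "description"]
  }
}
```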

Before deploying your changes, you'll need to make sure the Scala version of the config used by the pipeline matches the JSON version you've been testing. Copy your JSON config over to the catalogue pipeline repo, and edit the Scala until the tests pass.

Test cases

We collect test cases directly from stakeholders and from feedback channels for wellcomecollection.org.

Each test should represent a search intention: a class of search which we see real users performing. For example, a researcher might search for a work by its exact title, or for everything by a particular contributor.

Tests should be grouped according to the following structure:

  • id, label, and description - describing what each group of cases is testing

  • metric - an Elasticsearch rank evaluation metric to run the cases against

  • eval - an optional, alternative evaluation method to apply to the metric score returned by Elasticsearch

  • searchTemplateAugmentation - an optional augmentation to the query, e.g. a filter

  • cases - the list of search terms and corresponding results to be tested

Each test case in that list should contain:

  • query - the search terms a researcher uses

  • ratings - IDs of documents that we want to evaluate against the results

  • description - a description of the search intention which is embodied by the test
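Putting that together, here's a hedged sketch of what a test group might look like (the group ID, metric parameters, filter, search terms, and document ID are all invented for illustration, and the optional eval field is omitted; check the existing test files for the exact shape):

```json
{
  "id": "works-exact-titles",
  "label": "Exact titles",
  "description": "Searching for an exact title should return that work at the top",
  "metric": {
    "precision": { "k": 1, "relevant_rating_threshold": 1 }
  },
  "searchTemplateAugmentation": {
    "filter": { "term": { "type": "Visible" } }
  },
  "cases": [
    {
      "query": "the art of medicine",
      "ratings": ["abcd1234"],
      "description": "An exact title search for a single known work"
    }
  ]
}
```

Here precision is one of the metrics offered by Elasticsearch's rank evaluation API: k controls how many of the top results are scored, and relevant_rating_threshold sets the rating at which a document counts as relevant.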
