
Collecting data

Collecting feedback and data on how our services are being used helps us iterate on and improve them over time.

While the insight we get from behavioural data is valuable, we don't believe that bigger data is necessarily better. We think it would be foolish to start collecting data without first establishing which questions we want to answer, and wrong to collect data that we don't need. For example, we see no need to personalise users' search results, so our search logs are kept entirely anonymous.

We restrict the data we collect to what is needed to answer specific questions that we have. This allows us to iterate quickly while limiting risks to the people using our services.

What we track falls primarily into two interactions:

  • What a person has searched for

  • How they have interacted with the results

Examples of the data we store for these are:

// A search request
{
  "event": "Search",
  "anonymousId": "e1f51e69-d17e-4c06-a93f-7e280910b534", // This is an anonymous ID created when a session starts, and is used across all interactions of this session
  "timestamp": "2020-05-18T09:39:18.544Z",
  "network": "Staff", // If the interaction was on a staff network, we tag it with this
  "toggles": ["10-per-page"], // A list of A/B tests a person is in
  "query": { // This information is what was requested from the API
    "page": 1,
    "production.dates.from": "1900",
    "production.dates.to": "1950",
    "query": "Zodiac sign gemini",
    "format": ["books", "manuscripts", "images"]
  }
}

// Search result selected
{
  "event": "Search result selected",
  "anonymousId": "2104fa2d-59d3-423d-8958-3f254bc2bf62",
  "timestamp": "2020-05-18T09:39:23.529Z",
  "network": null,
  "toggles": ["availableOnline:false"],
  "query": { // The API request for the results
    "page": 1,
    "production.dates.from": "2000",
    "production.dates.to": "2010",
    "query": "Nurse",
    "format": ["journals"]
  },
  "data": { // Extra data about the interaction and work that was selected
    "id": "mruzf9kx",
    "position": 13,
    "resultIdentifiers": [
      "L0043772",
      "Museum No A96087"
    ],
    "resultSubjects": [
      "Nurse"
    ],
    "resultFormat": "Digital Images"
  }
}
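
As a rough illustration of restricting collected data to known fields, the sketch below filters an incoming event against an allow-list before it is stored. The field names come from the example events above; treating them as an exhaustive allow-list, and the names `restrict_event`, `ALLOWED_TOP_LEVEL`, and `ALLOWED_QUERY`, are assumptions made for this example and not a description of the actual pipeline.

```python
from typing import Any, Dict

# Field names taken from the example events above. Treating them as an
# exhaustive allow-list is an assumption made for this sketch.
ALLOWED_TOP_LEVEL = {
    "event", "anonymousId", "timestamp", "network", "toggles", "query", "data",
}
ALLOWED_QUERY = {
    "page", "production.dates.from", "production.dates.to", "query", "format",
}


def restrict_event(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `raw` that keeps only allow-listed fields."""
    event = {key: value for key, value in raw.items() if key in ALLOWED_TOP_LEVEL}
    if isinstance(event.get("query"), dict):
        # Build a new dict so the caller's original event is left unchanged.
        event["query"] = {
            key: value for key, value in event["query"].items() if key in ALLOWED_QUERY
        }
    return event


raw_event = {
    "event": "Search",
    "anonymousId": "e1f51e69-d17e-4c06-a93f-7e280910b534",
    "timestamp": "2020-05-18T09:39:18.544Z",
    "network": "Staff",
    "toggles": ["10-per-page"],
    "query": {"page": 1, "query": "Zodiac sign gemini", "userAgent": "example"},
    "ipAddress": "192.0.2.10",  # hypothetical extra field, dropped by the filter
}

clean_event = restrict_event(raw_event)
```

Here `clean_event` keeps the search terms and session ID but drops the hypothetical `ipAddress` and `userAgent` fields, so anything not explicitly expected never reaches storage.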

Identification and anonymisation

We store no personally identifiable information with any interaction we collect.

We do store whether a request was made from within Wellcome's network.
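
One way such a network tag could be derived is sketched below, using Python's standard `ipaddress` module. The function name `network_tag` and the CIDR range are hypothetical (the range is a reserved documentation block, not a real staff network); the "Staff" and null values mirror the `network` field in the example events. How the tag is actually computed is not described in this document.

```python
import ipaddress
from typing import Optional, Sequence, Union

IPNetwork = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Hypothetical range from the reserved documentation block 192.0.2.0/24.
# The real staff network ranges are not given in this document.
STAFF_NETWORKS: Sequence[IPNetwork] = [ipaddress.ip_network("192.0.2.0/24")]


def network_tag(ip: str, staff_networks: Sequence[IPNetwork] = STAFF_NETWORKS) -> Optional[str]:
    """Return "Staff" if `ip` falls inside a staff network, otherwise None."""
    address = ipaddress.ip_address(ip)
    if any(address in network for network in staff_networks):
        return "Staff"
    return None


staff_tag = network_tag("192.0.2.15")
public_tag = network_tag("198.51.100.7")
```

Only the resulting tag would need to be attached to the event, so in this sketch the address itself is never stored.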

Storage

We currently retain anonymised data in perpetuity.


Last updated 10 months ago

We label each interaction with an anonymous ID from Segment.

Data is collected on the frontend via Segment's analytics.js, sent to a Kinesis stream, and then stored in Elasticsearch.

This document does not cover general data collection across wellcomecollection.org, only the data collected for work on the catalogue search.