# Incident retro - works search errors

**Incident from:** 2022-05-24

**Incident until:** 2022-05-24

**Retro held:** 2022-05-26

* [Timeline](#timeline)
* [Analysis of causes](#analysis-of-causes)
* [Actions](#actions)

## Timeline

24 May 2022

See <https://wellcome.slack.com/archives/C01FBFSDLUA/p1653408854658319>

15.03 alerts for errors in the catalogue API, followed by a continuous stream of alerts for cloudfront

15.15 (approx) CCR of works index started from pipeline cluster into rank cluster

16.00 (approx) HP noticed 500 and asked if search was broken in #wc-incident-response, later deleted that message as reran search and it worked as expected

17.14 AC <https://wellcomecollection.org/works?page=1\\&query=conosceva> looks like the API is unhappy

Saw api was flaky

Cleared index caches which didn’t help

Suggested check refresh interval seen in previous issues - wasn’t that

A rolling restart did fix the issue

PB That link works for me in production but not if I switch my toggle to staging api

17.43 AC We seem to have fixed the issue

## Analysis of causes

Combination of different factors:

* CCR was the trigger which depleted some resource
* Increased document size was a contributing factor
* Not being able to clear out the circuit breakers

## Actions

**Harrison**

* Add notes to rank documentation as reminders for:
  * Increase cluster size when doing a CCR as that is memory intensive and remember to turn down the size afterwards
  * More communication with colleagues when you’re about to use CCR
  * CCR works and images separately

**Alex**

* Improve alerts in Slack so it’s easier to triage and spot new errors - DONE

**Team**

* Fix known errors in Slack as they arise


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.wellcomecollection.org/incident-retros/2022-05-24_works-search-errors.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
