Incident retro - search not available

Incident from: 2021-09-20

Incident until: 2021-09-20

Retro held: 2021-09-22

Timeline

20 September 2021

See https://wellcome.slack.com/archives/C01FBFSDLUA/p1632152123024700

16.32 Updown alert for: Front End Works Search (Origin), Front End Works Search (Cache)

16.35 DM: I’ve just deployed, but server routing for works is broken. I’ve got the fix and think we should be able to roll forward

16.39 JP yeah tempted to say we roll back for now

16.40 NP put a message about the search issue in #wc-platform-feedback

16.41 DM has a PR https://github.com/wellcomecollection/wellcomecollection.org/pull/7062/files

16.43 JP I’ve started a rollback here btw; can’t do any harm even if we get that PR in

16.44 JP But I do think we should wait to check stage before promoting [the PR fix]

16.45 Updown recovery for: Front End Works Search (Origin), Front End Works Search (Cache)

16.46 NP Okay, looks like I can search again. [Fixed by the rollback, not the PR] NP also reported in #wc-platform-feedback that search was working

Analysis of causes

A deployment broke server routing for works, and the end-to-end tests didn’t catch the problem on stage.

Actions

DM

  • Create a runbook for front end incidents

JG

  • Investigate why end-to-end tests didn’t break on all pages

Add render tests for the catalogue app top-level pages: DONE (#7063)
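
The render-test action above could be approximated by a smoke check that requests each top-level page on stage and reports any that do not return HTTP 200. The sketch below is an illustration in Python, not the tests referenced in #7063; the stage URL, the path list, and the function names find_broken_pages and http_status are assumptions made for the example.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List

# Hypothetical list of top-level catalogue paths; the real routes may differ.
TOP_LEVEL_PATHS = ["/works", "/images"]


def find_broken_pages(
    base_url: str,
    paths: List[str],
    get_status: Callable[[str], int],
) -> Dict[str, int]:
    """Return {url: status} for every page that did not respond with HTTP 200."""
    broken = {}
    for path in paths:
        url = base_url.rstrip("/") + path
        status = get_status(url)
        if status != 200:
            broken[url] = status
    return broken


def http_status(url: str) -> int:
    """Fetch url and return its HTTP status code (0 if the host is unreachable)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx responses; the status code is on the exception.
        return err.code
    except urllib.error.URLError:
        return 0


# Example call (assumed stage URL, makes real network requests):
# broken = find_broken_pages("https://stage.example.org", TOP_LEVEL_PATHS, http_status)
# A non-empty result would mean the build should not be promoted.
```

Running a check like this against stage before promoting a build would flag a broken works route, provided that route is in the path list. Passing the status function as a parameter keeps the check testable without network access.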