Incident retro - concept pages not available
Incident from: 2022-11-03
Incident until: 2022-11-03
Retro held: 2022-10-04
Timeline
3 November 2022
See https://wellcome.slack.com/archives/CQ720BG02/p1667487956149609 and https://wellcome.slack.com/archives/C01FBFSDLUA/p1667488779997449
14.44 AC doing the 2022-03-11 reindex to:
Fix the snapshot reporter
Flush out some pipeline issues
I’m going to kick off by deleting the 2022-10-14 pipeline, which seems like it was never used
AC thought it was never used based on https://api.wellcomecollection.org/catalogue/v2/_elasticConfig
15.05 AC Was concepts looking at the 2022-10-14 index? yes gonna roll it back
15.19 AC FYI the concepts pages are broken, because the concepts API is broken For some reason we had a mixture of API indices:
/works in prod = 2022-10-03
/concepts in prod = 2022-10-14
/works in stage = 2022-10-14
15:20 So I thought it was safe to delete 2022-10-14 and spin up 11-03, turns out that was wrong I’m rolling everything back to 10-03, and preparing 11-03 for rolling forward
15.24 JP suggests something went wrong with deployment?
AC yeah
errr https://buildkite.com/wellcomecollection/catalogue-api-deploy-prod/builds/373#0183e5ce-fcaa-487a-9d99-7c32ff37c073
concepts
ae604a2
-
-
items
-------
No image found!
search
-------
No image found!
snapshot_generator
-------
No image found!
I’m having to deploy locally with weco-deploy, because Buildkite is unhappy at the missing secrets
It wants to see 2022-10-14 because that’s what the staging API has, but I’ve deleted that
I’m deploying the latest images to staging
I wonder if something between our build short-circuiting and weco-deploy has gone wrong, e.g. there is no image tagged ae604a2 because it was only a change to the concepts API
15.34 AC seems to be okay, deploying to prod
15.43 AC Should be back up now
Analysis of causes
Thought same index was being used in works and concepts API
Something went wrong with the deployment of the 2022-10-14 index - bug in we-co deploy?
Actions
Alex
Remove all the “clever” build logic from all the scala repos #5626
Simplify the deployment logic in weco deploy so it always deploys a complete set of images
Last updated