Incident retro - search not available



Incident from: 2024-05-07

Incident until: 2024-05-10

Retro held: 2024-05-13

Timeline

Site was checked against Prismic staging

Thursday 2 May

Decision not to make changes ahead of a bank holiday weekend

Tuesday 7 May

DM: Pushed Slice Machine types to prod and migrated content after checking with Editorial. Told Editorial about changes to Quote slices and Image gallery slices.

1st "There was an error in the content-pipeline-2023-03-24 Lambda" alert in the #wc-platform-alert channel. CloudWatch logs indicate the issue is around fetching the data from the Prismic API:

        "",
        "A shared slice can only contains  `variation`."

14:30ish: DM, RC, RK and AG get together to debug. The issue is identified as the content-api’s articles graph query being incompatible with the new Prismic data model. The graph query is modified to work with the new Prismic data model.
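The error message logged earlier suggests the query was selecting fields directly under a shared slice instead of going through `variation`. The following is a minimal sketch of the shape Prismic's GraphQuery expects for shared slices, not the actual content-api query. The helper `sharedSliceFragment`, the slice name `quote`, the variation `default`, the document type `articles` and the slice zone `body` are all assumptions made for illustration.

```typescript
// Hypothetical helper: builds the GraphQuery fragment for one shared slice.
// Shared slices select their fields through `variation`, with one
// `...on <variationId>` block per variation.
function sharedSliceFragment(sliceType: string, variationIds: string[]): string {
  const variations = variationIds
    .map(
      (id) => `
        ...on ${id} {
          primary { ...primaryFields }
          items { ...itemsFields }
        }`
    )
    .join("");
  return `
    ...on ${sliceType} {
      variation {${variations}
      }
    }`;
}

// Assumed document type (`articles`) and slice zone name (`body`).
const articlesGraphQuery = `{
  articles {
    title
    body {${sharedSliceFragment("quote", ["default"])}
    }
  }
}`;
```

Generating the fragment per slice keeps the `variation` wrapper in one place, so a newly added shared slice cannot be queried without it.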

(15:00-16:00 break for review meeting)

16:00: Fix is being implemented and tested locally. PR open for review

16:39: Last alert pops up in #wc-platform-alert

16:50: Fix is approved, merged and deployed. Content-pipeline-2023-03-24 alerts stop

Wednesday 8 May

  1. Promo images aren’t always using the 16:9 version

  2. ‘In Pictures’ articles aren’t showing all images by default

  3. Old ‘webcomics’ (Body Squabbles) are rendering as image galleries with one image

  4. One article from 2017 was somehow listed as the most recently published thing. I have delisted the old article.

I’ve got a PR on the way that deals with 2. and 3. The impact of 1. is that images aren’t all the same aspect ratio in cards currently (don’t think this is a huge deal?). [Aspect ratio via contentUrlSuffix didn’t get taken over in the migration. Reported to Prismic as a bug with migrating assets.]

Change expected URL for meta image #10859

DM: I’ve just removed and re-added a promo image for one of the cards on the /stories page and it appears to have fixed it for that one, so I think it might be a case of doing that for the ones on landing pages. I think this is something we can report to Prismic as a bug with migrating assets.

Thursday 9 May

Friday 10 May

Analysis of causes

What happened that we didn’t anticipate?

  • Content API affected

  • Opening times weren’t displaying “name” and “image” as they didn’t have a “label” anymore

  • Promo images aren’t always using the 16:9 version

  • ‘In Pictures’ articles aren’t showing all images by default

  • Old ‘webcomics’ (Body Squabbles) are rendering as image galleries with one image

  • One article from 2017 was somehow listed as the most recently published thing

  • Drafts weren’t migrated

NB Alerts caught the content API issue

Why didn’t our safeguards catch this?

  • Mostly these were subtle changes to the site

  • No end to end tests with the migrated material

Actions

DM

  • Migrate any remaining docs in draft

  • Investigate how to find drafts

Prismic

  • To investigate why image weight label crops didn't migrate

NP

  • Take to planning: Toggle for data from Prismic staging / run e2es on Prismic staging to enable sharing of the site with a wider group before migrating

Future migrations

  • Share with more people before migrating e.g. devs and editorial

  • Kickoff with reps from other teams

AG: could it be possible that the slice machine changes are affecting the way we're fetching prismic articles and events to write them into the content index? We're seeing errors in the content pipeline where it's trying to fetch prismic docs. Articles weren’t modified yet

DM: Things I’ve noticed after remapping content:

Removed Bodies of Knowledge symposium to help with a bug fix.

Draft articles not migrated in the same way as published articles, and not easy to find.

DM: You should be good to add/edit in Prismic now, but there are still old events that you won’t currently be able to see the body content for.

17.07 Email reported that the opening times page was missing the location in the building - collection venues weren’t migrated (collection venues weight/label needs to be added)

RK: Am I right in thinking if we'd been using the content API to front all Prismic content we'd have avoided this migration issue? The pipeline would have broken but we'd have maintained user-facing content.

RC: Hmmm that's a good question. We still would have had to migrate the content in order for it to render in the CMS editing side?

[Images] RC: It looks like just the width is now getting passed and anything else is ignored, which explains the difference in height. I feel like it has to do with missing contentUrlSuffix. They should probably contain w=, h= and rect= for crops and they're just empty strings?
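One way to check RC's hypothesis is to inspect image URLs for the crop parameters mentioned above. This is a minimal sketch only: the function name `missingCropParams` and the example URLs are hypothetical, and it assumes the crop is carried as `w`, `h` and `rect` query parameters as described in the discussion.

```typescript
// Reports which of the crop-related query parameters are absent or empty
// on an image URL. The parameter names come from the discussion above.
function missingCropParams(imageUrl: string): string[] {
  const params = new URL(imageUrl).searchParams;
  return ["w", "h", "rect"].filter((name) => !params.get(name));
}

// Hypothetical URLs for illustration only.
const cropped =
  "https://images.example.org/photo.jpg?rect=0,0,1600,900&w=1600&h=900";
const widthOnly = "https://images.example.org/photo.jpg?w=1600";

console.log(missingCropParams(cropped)); // nothing missing
console.log(missingCropParams(widthOnly)); // height and rect missing
```

Running a check like this over the promo images on landing pages would list which ones need the crop re-applied, rather than finding them by eye.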

Fix for opening times page deployed

content for draft event publishing this afternoon (JWM perspective tour and workshop)

Fix for change expected URL for meta image

Add publishDate to articles graphQuery deployed; reindex run Monday 13 May [fixed issue with old article that had to be delisted]. Override date bug already existed but wasn’t known about - that date wasn’t fetched by the content API.
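The effect of fetching the override date can be sketched as follows. This is an illustration under assumptions, not the content API's code: the `ArticleDoc` shape and both function names are hypothetical, and it assumes ordering should prefer the editorial `publishDate` override and otherwise fall back to Prismic's `first_publication_date`.

```typescript
// Minimal assumed shape of a Prismic article as seen by the content pipeline.
interface ArticleDoc {
  id: string;
  first_publication_date: string; // set by Prismic
  data: { publishDate?: string | null }; // editorial override, if any
}

// Prefer the override date; fall back to Prismic's own publication date.
function effectivePublishDate(doc: ArticleDoc): Date {
  return new Date(doc.data.publishDate ?? doc.first_publication_date);
}

// Returns a new array sorted newest first by effective publish date.
function newestFirst(docs: ArticleDoc[]): ArticleDoc[] {
  return [...docs].sort(
    (a, b) => effectivePublishDate(b).getTime() - effectivePublishDate(a).getTime()
  );
}
```

If the override is never fetched, an old article whose Prismic publication date is recent sorts to the top, which is consistent with the 2017 article appearing as the most recently published item.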

Times: 14.15, 07.58, 13.15, 14.39, 18.30, 18.36, 10.41, 11:53

Links: Opening times page, #10856, Migrated, #10859, #128