Incident retro - requests not showing in account

Incident from: 2022-05-13

Incident until: 2022-05-13

Retro held: 2022-05-13

Timeline

13 May 2022

See https://wellcome.slack.com/archives/C01FBFSDLUA/p1652434124540789

01.33 Viewing, making requests, seeing if something is requestable not available

08.13 AC notices errors in the identity (frontend) app

10.24 AC notices 500s from identity API (can’t view item requests)

10.28 JP: can’t view item requests on prod, nor make a request

10.31 JP checked the API, the authorizer. Neither suspicious

10.37 SSL is the root cause. Attempting to fix by creating a new certificate in the console.

10.47 Cert validation record does exist but not liked by AWS

10.51 AWS Certificate Manager cert validation most likely the underlying cause. JP created the record set.

10.52 “One or more domain names has failed validation due to a certificate authority authentication (CAA) error. Learn more.”

10.52 AC: At some point AWS “forgets” your validation records and stops renewing certs

10.59 JP Fixed. Confirmed by AC/NP

Analysis of causes

SSL certificate was out of date

SSL certificate wasn’t automatically renewed

Also to be looked at: noisy alerts channel

Actions

Jamie

  • Handle identity API proxy errors which don’t have a response #7970 - DONE

Alex

  • Add Cloudwatch log URL to alerts to take you to the right account with text added to help with debugging

  • Turn on CloudFront logging (with filtering)

Last updated