APM
Some guidance on what you can do with the catalogue's APM (Application Performance Monitoring).
Some things you can track with APM:
Errors (with nice stack traces). You can see an incident (elevated 500s rate) that we had here. There's an alerting tool (Watcher) that can take a Slack webhook - we should probably use this!
JVM stats - good for discovering memory leaks and also fun for garbage collection enthusiasts. An example.
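As a rough illustration of the alerting idea, the sketch below checks an error rate against a threshold and builds the JSON body that a Slack incoming webhook accepts. It does not use Watcher itself. The threshold, the function names and the sample counts are all hypothetical, and the webhook URL is a placeholder for whoever sets this up to supply.

```python
import json
import urllib.request

# Hypothetical threshold: alert when more than 5% of responses are 5xx.
ERROR_RATE_THRESHOLD = 0.05


def error_rate(total_requests, server_errors):
    """Fraction of requests that returned a 5xx status."""
    if total_requests == 0:
        return 0.0
    return server_errors / total_requests


def build_slack_payload(rate, threshold=ERROR_RATE_THRESHOLD):
    """Return a Slack message body if the rate is elevated, else None."""
    if rate <= threshold:
        return None
    return {"text": f"Elevated 500s rate: {rate:.1%} (threshold {threshold:.0%})"}


def post_to_slack(webhook_url, payload):
    """POST the payload as JSON to a Slack incoming webhook URL."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A caller would compute `error_rate(...)` from whatever counts it has, pass the result to `build_slack_payload`, and only call `post_to_slack` when the payload is not `None`.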
The meat of APM is transaction monitoring: for us, that's monitoring the performance of endpoints. For example, for /works we can see quite a lot of data:
The average duration of a request is about 50ms
The 99th percentile is usually around 150ms, but is quite noisy
We have some outlier requests where it looks like the API stalled and/or the network connection to Elastic was very slow - might be worth looking into these!
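To make these three figures concrete, here is a minimal sketch of how an average, a 99th percentile and a list of stalled-looking requests could be derived from raw request durations. The sample durations are made up, the nearest-rank percentile is just one common definition (APM may compute it differently), and the "10x the median" rule for spotting stalls is an arbitrary assumption.

```python
import statistics


def summarise_durations(durations_ms, stall_factor=10):
    """Summarise request durations (in ms) as mean, p99 and likely stalls."""
    ordered = sorted(durations_ms)
    n = len(ordered)
    # Nearest-rank 99th percentile: the value at rank ceil(0.99 * n),
    # computed with integer arithmetic to avoid float rounding.
    rank = (99 * n + 99) // 100
    p99 = ordered[rank - 1]
    # Assumption: a request "stalled" if it took more than
    # stall_factor times the median duration.
    cutoff = stall_factor * statistics.median(ordered)
    return {
        "mean_ms": statistics.mean(ordered),
        "p99_ms": p99,
        "outliers_ms": [d for d in ordered if d > cutoff],
    }


# Made-up sample: mostly fast requests, a few slow ones, one stall.
sample = [40] * 50 + [50] * 30 + [60] * 15 + [150] * 4 + [5000]
summary = summarise_durations(sample)
```

Note how a single stalled request drags the mean well above the typical request time while leaving the 99th percentile untouched, which is one reason to look at both numbers and at the outliers separately.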
All APM data is stored in Elastic and we can do our own analyses - here's a dashboard for comparing the performance of aggregations and "normal" queries.
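As a starting point for this kind of analysis, the sketch below builds an Elasticsearch search body that asks for the average and selected percentiles of transaction duration for one endpoint. The field names (`processor.event`, `transaction.name`, `transaction.duration.us`) are assumptions based on the field layout Elastic APM commonly uses and should be checked against the actual index mapping. The function name, the `"GET /works"` transaction name and the time window are hypothetical.

```python
def build_latency_query(transaction_name, since="now-24h"):
    """Build a search body for duration stats of one APM transaction."""
    return {
        # We only want the aggregations, not the matching documents.
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    # Assumed field names; verify against the index mapping.
                    {"term": {"processor.event": "transaction"}},
                    {"term": {"transaction.name": transaction_name}},
                    {"range": {"@timestamp": {"gte": since}}},
                ]
            }
        },
        "aggs": {
            "avg_duration_us": {"avg": {"field": "transaction.duration.us"}},
            "duration_percentiles_us": {
                "percentiles": {
                    "field": "transaction.duration.us",
                    "percents": [50, 95, 99],
                }
            },
        },
    }


# Hypothetical transaction name for the /works endpoint.
query = build_latency_query("GET /works")
```

The resulting dictionary can be serialised to JSON and sent to the search endpoint of whichever index holds the APM data, either from the Kibana dev console or from a client library.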