APM
Some guidance on what you can do with the catalogue's APM (Application Performance Monitoring).
Some things you can track with APM:
Errors (with nice stack traces). For example, APM captured an incident we had (an elevated rate of 500s). There's also an alerting tool (Watcher) which can take a Slack webhook; we should probably use this!
JVM stats - good for discovering memory leaks and also fun for garbage collection enthusiasts.
The meat of APM is transaction monitoring: for us, that's monitoring the performance of endpoints. For example, for /works we can see:
The average duration of a request is about 50ms
The 99th percentile is usually around 150ms, but is quite noisy
There are some outlier requests where it looks like the API stalled and/or the network connection to Elastic was very slow. These might be worth looking into!
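To make the difference between these three numbers concrete, here is a minimal sketch using made-up request durations (not real APM data). It assumes a simple nearest-rank percentile, which may differ from the approximation APM uses internally. The sample is chosen so that one stalled request sits beyond the 99th percentile: it nudges the average up but does not change the p99 at all.

```python
import math


def mean(durations_ms):
    """Arithmetic mean of a list of durations."""
    return sum(durations_ms) / len(durations_ms)


def percentile(durations_ms, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(durations_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up sample: 980 typical requests, 19 slower ones, and 1 stalled request.
durations_ms = [48] * 980 + [150] * 19 + [5000]

print("mean:", mean(durations_ms))            # a little above the typical 48ms
print("p99: ", percentile(durations_ms, 99))  # 150, set by the handful of slower requests
print("max: ", max(durations_ms))             # 5000, the stalled request
```

Because the p99 is decided by only the slowest 1% of requests in each time bucket, a few extra slow requests can move it a lot, which is one plausible reason it looks noisy.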
All APM data is stored in Elastic, so we can do our own analyses, for example comparing the performance of aggregations and "normal" queries.
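As a starting point for that kind of analysis, the sketch below builds an Elasticsearch search body that computes the average and percentile durations for one transaction. The field names (`processor.event`, `transaction.name`, `transaction.duration.us`) are assumptions based on the standard Elastic APM mapping and may differ between APM versions. The helper name `duration_stats_query` and the transaction name in the example are hypothetical. The body would be sent to the APM indices with whichever Elasticsearch client you prefer.

```python
import json


def duration_stats_query(transaction_name, percents=(50, 99)):
    """Build a search body that aggregates durations for one transaction name."""
    return {
        "size": 0,  # we only want the aggregations, not individual documents
        "query": {
            "bool": {
                "filter": [
                    # Assumed field names from the standard APM mapping.
                    {"term": {"processor.event": "transaction"}},
                    {"term": {"transaction.name": transaction_name}},
                ]
            }
        },
        "aggs": {
            "avg_duration_us": {"avg": {"field": "transaction.duration.us"}},
            "duration_percentiles_us": {
                "percentiles": {
                    "field": "transaction.duration.us",
                    "percents": list(percents),
                }
            },
        },
    }


# Hypothetical transaction name; check the APM UI for the real ones.
body = duration_stats_query("GET /works")
print(json.dumps(body, indent=2))
```

Running the same query for two different transaction names (or adding another filter that separates aggregation requests from "normal" ones) gives numbers that can be compared side by side.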