# Observability

## Logging

So that application issues can be found and diagnosed quickly, logs should be easy to search and discover. We should collect logs in a single service, in a consistent format.

We should:

* Send all logs to the `logging.wellcomecollection.org` ELK (Elasticsearch, Logstash, Kibana) stack.
* Log in a consistent format to make the timestamp, originating service and environment easily searchable.
* Remove all CloudWatch logging.
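
As an illustration of what a consistent format could look like, here is a minimal sketch using Python's standard `logging` module. The field names (`timestamp`, `service`, `environment`, `level`, `message`) and the `JsonFormatter` and `build_logger` helpers are assumptions made for this example, not an agreed schema; shipping the output to the ELK stack is out of scope here.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with fixed field names.

    The field names are illustrative assumptions, not an agreed schema.
    """

    def __init__(self, service: str, environment: str) -> None:
        super().__init__()
        self.service = service
        self.environment = environment

    def format(self, record: logging.LogRecord) -> str:
        # Use UTC so timestamps from different services sort consistently.
        timestamp = datetime.fromtimestamp(record.created, tz=timezone.utc)
        return json.dumps(
            {
                "timestamp": timestamp.isoformat(),
                "service": self.service,
                "environment": self.environment,
                "level": record.levelname,
                "message": record.getMessage(),
            }
        )


def build_logger(service: str, environment: str) -> logging.Logger:
    """Return a logger that writes one JSON line per record to stderr."""
    logger = logging.getLogger(service)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter(service, environment))
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A service would call something like `build_logger("example-service", "staging").info("started")`, where both names are hypothetical. Because every line carries the same keys, the originating service and environment can be filtered on directly in Kibana.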

## Tracing

So that application issues can be found and diagnosed quickly, we should be able to follow work as it passes through multiple services.

We should:

* Implement a tracing solution in our pipelines that allows us to visualise the flow of work through our services.
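
To make the idea concrete, the sketch below shows the core mechanism most tracing solutions rely on: a trace identifier that is created once and copied onto every message derived from the original, so the work can be stitched back together afterwards. It uses only the standard library. The `Message`, `Span` and `handle` names, and the service names in the usage note, are hypothetical; this is not a proposal for a particular tracing product.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """One unit of work done by one service on behalf of a trace."""

    trace_id: str
    service: str
    started_at: float
    finished_at: float


@dataclass
class Message:
    """A message passed between services; it carries the trace ID along."""

    body: str
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def handle(message: Message, service: str, collected: List[Span]) -> Message:
    """Process a message, record a span, and forward the same trace ID."""
    started_at = time.time()
    # The service's real work would happen here.
    outgoing = Message(body=message.body, trace_id=message.trace_id)
    collected.append(Span(message.trace_id, service, started_at, time.time()))
    return outgoing
```

Passing one `Message` through `handle` for `"service_a"` and then `"service_b"` leaves two spans in `collected` that share a `trace_id`. Grouping spans by that ID and ordering them by start time is what lets a tracing tool draw the flow of work through the services.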

## Metrics

So that application issues can be found and diagnosed quickly, we should be able to view metrics from our applications easily.

We should:

* Decide on appropriate metric collectors for each of our products.
* Decide on appropriate platforms for visualising application metrics across our products.

## Alerting

So that we can react quickly to application issues, we should be notified when issues requiring our attention arise.

We should:

* Be able to track our response to alerts and the actions taken to resolve them.
* Decide on what constitutes a critical issue (i.e. one that requires immediate action) and provide a separate channel to deliver critical alerts.
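
The separate channel for critical alerts could be as simple as routing on a severity field. The following is a minimal sketch under stated assumptions: the two severity levels and the channel names `alerts` and `alerts-critical` are placeholders, and what actually counts as critical is the decision left open above.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    WARNING = "warning"
    CRITICAL = "critical"  # assumed to mean "requires immediate action"


@dataclass(frozen=True)
class Alert:
    service: str
    summary: str
    severity: Severity


# Placeholder channel names; the real destinations are still to be decided.
CHANNELS = {
    Severity.WARNING: "alerts",
    Severity.CRITICAL: "alerts-critical",
}


def channel_for(alert: Alert) -> str:
    """Return the channel an alert should be delivered to."""
    return CHANNELS[alert.severity]
```

Keeping the mapping in one place means that changing where critical alerts go does not require touching the services that raise them.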

