
RFC 053: Logging in Lambdas



In RFC 022, we identified that our application logs - which were then stored in CloudWatch - were costing us money, were hard to query, and were inconsistent. We proposed (and went on to implement) an architecture in which ECS services contained a logging sidecar container that used Fluent Bit to stream logs directly to an Elasticsearch cluster. This continues to serve us well.

However, we've now got a non-trivial number of production applications that are run as AWS Lambdas, rather than as Docker containers in ECS: namely, the Identity APIs and the concepts pipeline services. To have better visibility over these applications, as well as for consistency, we now want to get our Lambda logs into the logging cluster in the same format as our other application logs.

Two possible approaches

We want a flexible approach that doesn't require extra application configuration: it should be language-agnostic and it should capture stdout/stderr rather than providing an API for logs. We should be able to have the same schema for logs as we do for our ECS applications.

1. Streaming from Cloudwatch

This is the simplest approach: we would still use CloudWatch, but we wouldn't retain the logs there; instead, we would use a Lambda to stream them to Elasticsearch. This approach has been used successfully elsewhere: for example, at the BBC.

Furthermore, Elastic provide a CloudFormation template for a "serverless forwarder" to do this job for us. If this proves insufficient, we can implement our own transformer/streamer quite easily.
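
Elastic publish the forwarder in the AWS Serverless Application Repository, so provisioning it with Terraform might look something like the sketch below. The application ID, region, config path and parameter name are placeholders rather than the real values - they'd need to be looked up in Elastic's documentation.

```hcl
resource "aws_serverlessapplicationrepository_cloudformation_stack" "log_forwarder" {
  name           = "elastic-serverless-forwarder"
  application_id = "arn:aws:serverlessrepo:eu-west-1:000000000000:applications/elastic-serverless-forwarder" # placeholder ARN

  capabilities = [
    "CAPABILITY_IAM",
    "CAPABILITY_AUTO_EXPAND",
  ]

  # The forwarder reads its inputs (log groups) and outputs (our logging
  # cluster) from a YAML config file in S3; the parameter name is illustrative.
  parameters = {
    ElasticServerlessForwarderS3ConfigFile = "s3://example-logging-config/config.yaml"
  }
}
```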

We would need to provide a Terraform module (as we do with the existing ECS logging solution) to ensure that a subscription is created automagically for each application.
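
A minimal sketch of the resources such a module might create for each application, assuming the forwarder is a Lambda whose ARN is passed in (the names, region and account ID below are placeholders):

```hcl
variable "forwarder_arn" {
  description = "ARN of the log-forwarding Lambda"
  type        = string
}

# Stream every log line from the application's log group to the forwarder.
resource "aws_cloudwatch_log_subscription_filter" "to_forwarder" {
  name            = "ship-to-elasticsearch"
  log_group_name  = "/aws/lambda/example-application"
  filter_pattern  = "" # an empty pattern matches everything
  destination_arn = var.forwarder_arn
}

# CloudWatch Logs needs permission to invoke the forwarder.
resource "aws_lambda_permission" "allow_cloudwatch_logs" {
  statement_id  = "AllowExecutionFromCloudWatchLogs"
  action        = "lambda:InvokeFunction"
  function_name = var.forwarder_arn
  principal     = "logs.eu-west-1.amazonaws.com"
  source_arn    = "arn:aws:logs:eu-west-1:123456789012:log-group:/aws/lambda/example-application:*"
}
```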

Optionally, we might choose to send the logs via a Kinesis stream which could be processed by a Lambda (rather than direct to a Lambda). While this introduces complexity and cost, it offers resilience in case of Elasticsearch downtime, Lambda issues or connection failures, and also makes cross-account streaming quite straightforward. Lambda invocations would also operate on batches of log lines.
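
If we did put a Kinesis stream in the middle, the subscription would point at the stream rather than at a Lambda, and the log-shipping Lambda would consume batches of records from it. A rough sketch, with the IAM role for CloudWatch Logs and the shipping Lambda itself assumed to exist elsewhere and passed in as variables:

```hcl
variable "cloudwatch_to_kinesis_role_arn" {
  description = "Role that lets CloudWatch Logs put records onto the stream"
  type        = string
}

variable "log_shipper_arn" {
  description = "ARN of the Lambda that ships batches of log events to Elasticsearch"
  type        = string
}

resource "aws_kinesis_stream" "lambda_logs" {
  name        = "lambda-logs"
  shard_count = 1
}

# CloudWatch Logs writes log events into the stream rather than invoking a
# Lambda directly.
resource "aws_cloudwatch_log_subscription_filter" "to_kinesis" {
  name            = "ship-to-kinesis"
  log_group_name  = "/aws/lambda/example-application"
  filter_pattern  = ""
  destination_arn = aws_kinesis_stream.lambda_logs.arn
  role_arn        = var.cloudwatch_to_kinesis_role_arn
}

# The log-shipping Lambda consumes batches of records from the stream.
resource "aws_lambda_event_source_mapping" "logs_to_shipper" {
  event_source_arn  = aws_kinesis_stream.lambda_logs.arn
  function_name     = var.log_shipper_arn
  starting_position = "LATEST"
  batch_size        = 500
}
```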

We would need to pay for the following (a rough worked example follows the list):

  • CloudWatch ingest ($0.57/GB)

  • Lambda invocation (duration assumed trivial, $0.20/million log lines)

  • Kinesis (optional): $0.04/hr + $0.12/GB

  • Network egress (approximately constant)
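
Putting rough, purely illustrative numbers on this: if our Lambdas produced 20 GB of logs and 50 million log lines in a month, CloudWatch ingest would cost about 20 × $0.57 ≈ $11.40 and the Lambda invocations about 50 × $0.20 = $10; the optional Kinesis stream would add roughly 730 × $0.04 ≈ $29.20 in shard-hours plus 20 × $0.12 = $2.40 for its own ingest. The volumes here are assumptions for the sake of the arithmetic, not measurements of our actual log traffic.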

(Figure: architecture for streaming from CloudWatch)

2. Lambda extensions

Lambda extensions allow code to be integrated into the Lambda execution environment, hooking into the Lambda lifecycle and allowing access to logs and other telemetry. Streaming logs elsewhere is exactly their intended use case - there are already 3rd party extensions for doing this (unfortunately, not for Elasticsearch).

If we built our own Lambda extension, we would expose a Terraform module for provisioning Lambdas that included the extension and the necessary config, IAM roles etc. We could write it in a language of our choice and use it with any native (ie non-Docker) Lambda. It could potentially use Fluent Bit for streaming the logs, instead of implementing that manually, although it looks like others have struggled with losing logs for short-lived invocations (probably due to buffering, and possibly resolvable with use of Lambda lifecycle hooks).

(Figure: native Lambda extensions architecture)

There is a caveat with this option - it doesn't work transparently for containerised (ie non-native) Lambdas. Instead, the packaged extension has to be added to the container image. We use containerised Lambdas in the concepts pipeline: we would either have to switch to using native Java runtimes, or provide our own base images which include the extension code.

There would be no additional cost associated with using a Lambda extension, other than the necessary network egress.
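
To make this concrete, here is a sketch of how the provisioning module might attach a home-grown extension to a native Lambda, assuming the extension is packaged as a Lambda layer. Every name, path and runtime below is illustrative, and the execution role is passed in rather than defined here.

```hcl
variable "execution_role_arn" {
  description = "Execution role for the Lambda being provisioned"
  type        = string
}

# The logging extension, packaged and published as a Lambda layer.
resource "aws_lambda_layer_version" "log_shipper_extension" {
  layer_name          = "elasticsearch-log-shipper"
  filename            = "extension.zip" # built elsewhere; path is a placeholder
  compatible_runtimes = ["java11", "python3.9"]
}

# Any native Lambda provisioned by the module gets the extension via `layers`.
# The extension hooks into the Lambda lifecycle, receives the function's log
# lines, and ships them to Elasticsearch in our standard log schema.
resource "aws_lambda_function" "example" {
  function_name = "example-application"
  role          = var.execution_role_arn
  runtime       = "java11"
  handler       = "example.Handler::handleRequest"
  filename      = "function.zip" # placeholder artefact

  layers = [aws_lambda_layer_version.log_shipper_extension.arn]

  environment {
    variables = {
      ES_ENDPOINT = "https://logging.example.org:9243" # placeholder
    }
  }
}
```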

A pragmatic path forward

Option (1) does not preclude option (2) as further work: we would already be providing a Terraform module for consumers, and if the Elastic Serverless Forwarder proved insufficient, we would already have written a module to stream the correct format of logs to Elasticsearch.

This makes option (1) - streaming logs from CloudWatch - the better starting point. If it proves insufficient or impractical, most of the work we will have done will still be useful for option (2).

The work plan might look something like this:

  • Provision an Elastic serverless forwarder

  • Create a log group subscription and verify that the forwarder works

  • Create a Terraform module for Lambdas with log groups that subscribe to the forwarder (sketched after this list)

  • Roll this out across our Lambdas - if we don't use a cross-account Kinesis stream, we'll want to have a log-streaming Lambda per account, like we have with Slack alerts.
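
For the module in the third step, the interface might look something like the call below; the module name, source path and retention period are assumptions rather than decisions. Owning the log group and its retention inside the module also speaks to the first caveat in the next section.

```hcl
# Hypothetical interface for the per-Lambda logging module: it would own the
# function's log group (with an explicit retention period) and create the
# subscription filter to the forwarder described above.
module "lambda_with_logging" {
  source = "../modules/lambda_with_logging" # hypothetical path

  function_name     = "example-application"
  forwarder_arn     = var.forwarder_arn
  retention_in_days = 7 # Elasticsearch, not CloudWatch, is the long-term home for logs
}
```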

Caveats and questions

  • We've seen CloudWatch logs for Lambdas go missing before - is this a risk? We believe this might be due to (mis)configuration of log retention periods and/or IAM permissions, but we're not sure. Using a consistent Terraform module across our Lambdas will hopefully resolve these issues.

  • Should we use Kinesis or not? Some reasons for using it are given above: I think it would be a good idea.

  • What index should Lambda logs go into? The ECS logs go into firelens-<yyyy.mm.dd> indices - we probably shouldn't do the same for these, as they aren't using firelens and may have a different schema. On the other hand, we would like all of our logs to fall under the same index pattern.
