RFC 059: Splitting the catalogue pipeline Terraform

This RFC proposes a change to how we manage the Terraform for instances of the catalogue pipeline.

Last modified: 2023-07-03T10:39:33+01:00

Background

We run multiple instances of the catalogue pipeline. We always have at least one pipeline which is populating the currently-live search index, but we may have more than one pipeline running at a time. Running multiple pipelines means we can try experiments or breaking changes in a new pipeline, and their data is isolated from the live search index (and the public API).

However, we manage every pipeline in a single Terraform configuration, and this breaks some of the isolation. If you're trying to change one pipeline, you can inadvertently change another pipeline -- and this has caused a broken API on several occasions.

We'd like to extend the isolation to Terraform, and keep each pipeline in a different configuration. That way, you can only modify one pipeline at a time, and you know exactly what you're changing.

This RFC describes what that looks like in slightly more detail.

Desirable properties

It should be easy to find the pipeline you want to work in.
It should be easy to create new pipelines.
Each pipeline should have its own Terraform state file and label. This should be specified in as few places as possible, to reduce the risk of inconsistency, e.g. a pipeline whose label is 2001-01-01 but the state file is named 2002-02-02.
This should make sense to somebody who's already an experienced Terraform user – commands like plan, apply, import should all work as they expect (modulo our run_terraform.sh wrapper). You shouldn't need to learn a new tool to work with the pipeline Terraform.

The proposal

We create per-pipeline folders in the terraform directory, something like:

.
└─ terraform/
    ├─ 2001-01-01/
    │   ├─ main.tf
    │   └─ run_terraform.sh
    ├─ 2002-02-02/
    │   ├─ main.tf
    │   └─ run_terraform.sh
    └─ modules/
        └─ ...

Each per-pipeline folder contains:

A single Terraform file that refers to a shared pipeline module in the modules directory
A run_terraform.sh wrapper which injects the Elastic Cloud secrets, and the name of the current pipeline as a Terraform variable. This gets read from the folder name, so it should be possible to use the same copy of main.tf across different pipelines.

What this should look like for developers:

For a pre-existing pipeline, you cd into the folder and run run_terraform.sh as usual.
If you want to create a new pipeline, copy an existing folder and rename it. When you cd into that folder, you run run_terraform.sh init to start the new pipeline and then plan/apply as usual.

Implementation details

Partial configuration. We can use partial configuration for Terraform backends to create a different state file for each pipeline. The run_terraform.sh script can read the name of the current folder, and pass the pipeline date in as a variable.
Auto-deleting old copies of the .terraform folder. When you copy an existing pipeline folder, you'll get the .terraform folder.
We'll want to delete this folder in the run_terraform.sh script if it points to the wrong state file, to avoid Terraform trying to be "helpful" and inadvertently breaking a different pipeline. This is best understood with a worked example:
I create a pipeline 2001-01-01, then I copy the folder to create 2002-02-02.
When I run run_terraform.sh init in the 2002-02-02 folder, it notices that the state in .terraform points at s3://…/2001-01-01.tfstate, but now I'm trying to keep state in s3://…/2002-02-02.tfstate.
It offers to migrate the state to the new location; if I say yes, it will copy the state and then try to destroy the resources labelled 2001-01-01 on the next plan/apply.
It would be better if our script knew that this was a different pipeline, and that we can't reuse the .terraform folder.

The per-pipeline state files should be in a dedicated prefix, so they can be easily identified, e.g.

s3://…/catalogue-pipeline/pipeline/2001-01-01.tfstate
s3://…/catalogue-pipeline/pipeline/2002-02-02.tfstate
s3://…/catalogue-pipeline/pipeline/2003-03-03.tfstate

Rejected approaches

Creating an entirely custom tool. We don't want developers to have to learn a new tool just to work on the catalogue pipeline Terraform -- it should be possible to create a familiar workflow with run_terraform.sh that's easier for everyone to learn.
Terraform features: using workspaces, running a script with -chdir. These are both Terraform features that might enable the sort of process we want, but we don't use them anywhere in the platform so they're not a familiar workflow for us. Using separate folders should be easier.

PreviousExamples of rank CLI usage NextRFC 060: Service health-check principles

Last updated 1 month ago