RFC 071: Python Building and Deployment

Building and deploying Python projects

Last modified: 2025-03-13T18:14:09+00:00

Background

Over time, we have created a number of different Python projects: some as the main or only application within a repository, others as part of a larger whole alongside Scala applications.

Each time, we have written a new build and deployment process for them. For the most part, the process a project is born with has stuck, despite our finding better tools or methods when working on later projects. As a result, we have ended up with multiple diverse mechanisms to check, build and deploy them, each with its own standards and rules.

This RFC proposes to harmonise those projects. The projects are listed in the Appendix, alongside some description of their differences.

Problems

The differences between the projects give rise to various challenges:

  • keeping things up to date - Python itself, dependencies, common settings

  • different standards - type checking, introducing previously ignored warnings if we harmonise linting rules

  • sharing code in different projects within the same repository

  • ensuring all required code and dependencies get deployed

  • running an equivalent environment locally, on CI, and in production (PYTHONPATH, installed dependencies, Python versions)

  • different ways to specify production vs development dependencies

Out of Scope

Exactly how the applications are run (Lambda, ECS etc.) is a decision to be made at the per-application level. This RFC does make recommendations on how to ensure that a Python project and all its dependencies are included, but whether the end result is a container or a zip is up to the target.

What Can and Cannot Be DRY

Because of the way configuration resolution works in the various build tools (e.g. Ruff), we do need to define a certain amount of configuration in each project rather than sharing it across projects, and that duplicated configuration could become out of date.
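For example, Ruff can pull in shared rules via its `extend` setting, but that reference still has to be declared in every project's own configuration, so each pyproject.toml carries at least a stub like this (the path is illustrative):

```toml
# pyproject.toml in an individual project
[tool.ruff]
# Pull in shared rules; relative paths are resolved from this file.
# This line must be repeated (and kept correct) in every project.
extend = "../../shared/ruff.toml"
```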

Linting/Formatting

Ruff provides both linting and formatting together, and it has already been used successfully in the Concepts Pipeline.
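A minimal Ruff setup in a project's pyproject.toml might look like this (the line length and rule selection are illustrative, not an agreed standard):

```toml
[tool.ruff]
line-length = 100

[tool.ruff.lint]
# Example rule sets: pycodestyle errors, Pyflakes, import sorting
select = ["E", "F", "I"]
```

Both checks then run as `uv run ruff check .` and `uv run ruff format --check .`.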

Testing

pytest is currently in use in all projects, and there is no need to consider changing that.

Dependency Management

UV.

The most pertinent advantages of UV over other package/dependency managers are how it simplifies:

  1. Treating shared code in the same repository as though it is an installable library

  2. Keeping the Python version up to date, and sharing that with other developers
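On the first point, uv lets a project depend on shared code elsewhere in the repository as a normal installable package via `tool.uv.sources` (the package name and path here are illustrative):

```toml
# pyproject.toml of a project that uses a shared in-repo library
[project]
name = "my_app"
version = "0.1.0"
dependencies = ["shared_lib"]

[tool.uv.sources]
# Resolve shared_lib from a sibling directory rather than an index;
# editable means local changes are picked up without reinstalling
shared_lib = { path = "../shared_lib", editable = true }
```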

CI

GitHub Actions. (This has been discussed elsewhere, and is already an assumed requirement for new projects.)

Build and deployment

Although harmonising how applications are run is out of scope, there are two methods we currently use: Docker images running on ECS, and zipped packages running on AWS Lambda. Packages for these environments can be built according to the UV documentation and example code, which show best practice for building Docker images and Lambda zips. This is to be harmonised via some common GitHub Actions (see below).

Specific usage

Individual Configuration

Each project is configured as much as possible in its pyproject.toml, including defining paths to shared modules elsewhere in the repository using pythonpath.
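For instance, pytest's `pythonpath` option can make shared modules importable during tests (the path is illustrative):

```toml
[tool.pytest.ini_options]
# Make shared in-repo code importable when running tests
pythonpath = ["../shared/src"]
```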

In a UV-based project, a .python-version file defines the Python version in use. This should be read and reused where possible, rather than relying on developers to keep the version number consistent everywhere in the project by hand.

This also allows docker images and build scripts to read the version, meaning that we can use common scripts to achieve these tasks.
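As a small sketch of reading the version in a script (the version number and the temporary-directory setup are illustrative; in a real project uv maintains the file):

```shell
# Work in a throwaway directory so the example is self-contained
workdir=$(mktemp -d)
# In a real project, uv writes and maintains this file
echo "3.12.4" > "$workdir/.python-version"

PYTHON_VERSION=$(cat "$workdir/.python-version")
# Strip the patch component, e.g. to select a python:3.12 base image
MAJOR_MINOR=${PYTHON_VERSION%.*}
echo "$MAJOR_MINOR"
```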

For example, to package up all the requirements of a project into a zip:

PYTHON_VERSION=$(cat .python-version)
uv export --frozen --no-dev --no-editable -o requirements.txt
uv pip install \
   --no-installer-metadata \
   --no-compile-bytecode \
   --python-platform x86_64-manylinux2014 \
   --python "$PYTHON_VERSION" \
   --target packages \
   -r requirements.txt

cd packages && zip -r ../package.zip .

Or to create a Docker image using the correct Python version:

docker build . --build-arg pythonversion=$(cat .python-version) --progress=plain

with a Dockerfile that accepts the version as a build argument:

ARG pythonversion=2.7
FROM python:$pythonversion
ARG pythonversion
RUN echo python version = $pythonversion

Common Configuration

Although each project will have its own pyproject.toml file, there can also be some commonality.

Common GitHub Actions

Three Actions are to be defined in the .github repository:

Check

A "Python Check" Github Action in the .github repository

This will:

  • Be parameterised with

    • path to base of python project

    • whether to care about types

  • install requirements with UV

  • run ruff

  • run pytest

  • run mypy if relevant
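A composite action along these lines could implement the steps above; the input names and step details here are a sketch, not the final design:

```yaml
# .github/actions/python-check/action.yml (illustrative)
name: Python Check
inputs:
  project_path:
    description: Path to the base of the Python project
    required: true
  check_types:
    description: Whether to run mypy
    default: "false"
runs:
  using: composite
  steps:
    - uses: astral-sh/setup-uv@v5
    - run: uv sync
      shell: bash
      working-directory: ${{ inputs.project_path }}
    - run: uv run ruff check . && uv run ruff format --check .
      shell: bash
      working-directory: ${{ inputs.project_path }}
    - run: uv run pytest
      shell: bash
      working-directory: ${{ inputs.project_path }}
    - run: uv run mypy .
      if: inputs.check_types == 'true'
      shell: bash
      working-directory: ${{ inputs.project_path }}
```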

Build

Two Python Build actions

  • one to build and publish Docker Images to ECR

  • one to zip a project for use in AWS Lambda (replicating the behaviour here in a common fashion)

These will also be parameterised similarly to Python Check.
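A project's workflow might then call a shared action along these lines (the organisation placeholder, action path, and inputs are hypothetical until the actions are defined):

```yaml
# Illustrative caller workflow in a project repository
name: Check
on: [push]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # ORG is a placeholder for the GitHub organisation
      - uses: ORG/.github/.github/actions/python-check@main
        with:
          project_path: path/to/project
          check_types: "true"
```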

Getting there from here

Converting existing projects

We should gradually migrate existing projects to the new common approach, starting with the Catalogue Pipeline inferrers.

Migrating the pipeline inferrers should exercise most of the challenges outlined above, which will demonstrate the value of the approach and simplify any subsequent migrations.

Initially, the shared GitHub Actions can be defined in the catalogue pipeline project. There is sufficient variety within the catalogue pipeline (inferrers, catalogue graph) to demonstrate that this approach can be applied across all repositories.

The next priority project to migrate would be editorial photography ingest.

Following this, there is no urgent need to migrate all existing Python projects in this way, as many of them are rarely updated. However, once the framework has been implemented, it should be relatively simple to do so.

Project Template

Entirely new Python projects are not common, so the work involved in creating and maintaining a project template is likely to outweigh any advantage it might bring.
