RFC 071: Python Building and Deployment
Building and deploying Python projects
Last modified: 2025-03-13T18:14:09+00:00
Over time, we have created a number of different Python projects. Some are the main or only application within a repository; others form part of a larger whole alongside Scala applications.
Each time, we have written a new build and deployment process. For the most part, the process a project was born with has stuck, even though we found better tools and methods while working on later projects. As a result, we have ended up with multiple diverse mechanisms to check, build and deploy them, each with different standards and rules.
This RFC proposes to harmonise those projects. The projects are listed in the , alongside some description of their differences.
The differences between the projects give rise to various challenges:
keeping things up to date - Python itself, dependencies, common settings
different standards - type checking, introducing previously ignored warnings if we harmonise linting rules
sharing code in different projects within the same repository
ensuring all required code and dependencies get deployed
running an equivalent environment locally, on CI, and in production (PYTHONPATH, installed dependencies, Python versions)
different ways to specify production vs development dependencies
Exactly how the applications are run (Lambda, ECS, etc.) is a decision to be made at the per-application level. This RFC does make recommendations on how to ensure that a Python project and all its dependencies are included in what gets deployed, but whether the end result is a container or a zip is up to the target environment.
The most pertinent advantages of UV over other package and dependency managers are that it simplifies:
Treating shared code in the same repository as though it is an installable library
Keeping the Python version up to date, and sharing that with other developers
Each project is configured as much as possible in its pyproject.toml, including defining paths to shared modules elsewhere in the repository using pythonpath.
In a UV-based project, a .python-version file defines the Python version in use. This file should be read and reused wherever possible, rather than relying on developers to keep the version number consistent everywhere in the project.
This also allows Docker images and build scripts to read the version, meaning that we can use common scripts to achieve these tasks.
For example, a common script can read the version to package up all the requirements of a project into a zip, or to create a Docker image using the correct Python version.
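A minimal sketch of such a common script follows, written in Python for illustration. It reads the pinned version from .python-version and passes it to docker build. The function names are hypothetical. The PYTHON_VERSION build argument is also hypothetical: it assumes a Dockerfile that declares ARG PYTHON_VERSION and uses it in its FROM line. Skipping blank and "#" comment lines is another assumption, since a file written by uv python pin holds just the version.

```python
from pathlib import Path


def parse_python_version(text: str) -> str:
    """Return the first non-blank, non-comment line of a .python-version file."""
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            return line
    raise ValueError("no Python version found in .python-version")


def read_python_version(project_dir: Path) -> str:
    """Read the version pinned for the project in project_dir/.python-version."""
    return parse_python_version((project_dir / ".python-version").read_text())


def docker_build_command(context_dir: Path, image_tag: str, python_version: str) -> list[str]:
    """Build a docker build invocation that passes the pinned version as a build argument.

    Assumes (hypothetically) that the Dockerfile contains:
        ARG PYTHON_VERSION
        FROM python:${PYTHON_VERSION}-slim
    """
    return [
        "docker", "build",
        "--build-arg", f"PYTHON_VERSION={python_version}",
        "--tag", image_tag,
        str(context_dir),
    ]
```

In use, a build script would call read_python_version on the project directory and pass the result to docker_build_command. The same value can be given to uv's --python option when installing a project's requirements for a zip.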
Although each project will have its own pyproject.toml file, there is also some commonality between projects that can be shared.
Three Actions are to be defined in the .github repository:
A "Python Check" GitHub Action
This will:
be parameterised with:
the path to the base of the Python project
whether to check types
install requirements with UV
run ruff
run pytest
run mypy, if type checking is enabled
Two Python Build actions:
one to build and publish Docker images to ECR
These will also be parameterised similarly to Python Check.
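A minimal sketch of what the Python Check action and the Docker publishing action might run is given below. It is written as Python for illustration rather than as the actions' own YAML, and the function and parameter names are hypothetical. It makes the following assumptions:
- ruff, pytest and mypy are development dependencies of each project. uv sync installs these by default.
- Formatting should be checked with ruff format --check alongside ruff check.
- The runner is already logged in to ECR, for example via the aws-actions/amazon-ecr-login action.

```python
import subprocess
from pathlib import Path


def python_check_steps(check_types: bool) -> list[list[str]]:
    """Return the commands the Python Check action might run, in order."""
    steps = [
        ["uv", "sync", "--locked"],  # install requirements; fail if uv.lock is out of date
        ["uv", "run", "ruff", "check", "."],  # lint
        ["uv", "run", "ruff", "format", "--check", "."],  # report unformatted files without rewriting them
        ["uv", "run", "pytest"],  # run the tests
    ]
    if check_types:
        # Only projects that care about types run mypy.
        steps.append(["uv", "run", "mypy", "."])
    return steps


def ecr_publish_steps(
    local_image: str, account_id: str, region: str, repository: str, tag: str
) -> list[list[str]]:
    """Return commands that tag a locally built image and push it to ECR."""
    remote_image = f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"
    return [
        ["docker", "tag", local_image, remote_image],
        ["docker", "push", remote_image],
    ]


def run_steps(steps: list[list[str]], project_dir: Path) -> None:
    """Run each command from the base of the Python project, stopping at the first failure."""
    for command in steps:
        subprocess.run(command, cwd=project_dir, check=True)
```

Keeping the step lists separate from run_steps makes the parameterisation easy to see and to test. A real action would more likely express the same steps directly in its YAML.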
Migrating the pipeline inferrers should exercise most of the challenges outlined above, which will demonstrate the value of the approach and simplify any subsequent migrations.
Initially, the shared GitHub Actions can be defined in the catalogue pipeline project. There is sufficient variety within the catalogue pipeline (inferrers, catalogue graph) to demonstrate that this approach can be applied across all repositories.
Following this, there is no urgent need to migrate all existing Python projects in this way, as many of them are rarely updated. However, once the framework has been implemented, it should be relatively simple to do so.
Entirely new Python projects are not common, so the work involved in creating and maintaining a project template is likely to outweigh any advantage it might bring.
Because of the way configuration resolution works in the various build tools, we need to define a certain amount of configuration in each project rather than sharing it across projects, and that configuration could become out of date.
Ruff provides both linting and formatting together, and it has already been used successfully in the Concepts Pipeline.
is currently in use in all projects, and there is no need to consider changing that.
GitHub Actions (this has been discussed elsewhere, and is of new projects)
Although harmonising how applications are to be run is out of scope, there are two methods that we currently use: Docker images running on ECS, and zipped packages running on AWS Lambda. Packages for running in these environments can be built according to the UV documentation and its example code showing best practice for building each. This is to be harmonised via the common GitHub Actions proposed in this RFC.
The second Python Build action will zip a project for use in AWS Lambda (replicating the behaviour in a common fashion).
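A minimal sketch of the zip build follows, written in Python for illustration. It follows the general approach of uv's AWS Lambda guide in three steps:
- Export the locked runtime dependencies to a requirements file.
- Install them into a target directory for the Lambda platform and Python version.
- Zip that directory.

The function names and the "packages" directory are hypothetical. The default platform x86_64-manylinux2014 assumes an x86_64 Lambda, and the exact flags should be checked against the uv version in use.

```python
import shutil
import subprocess
from pathlib import Path


def lambda_install_commands(
    python_version: str,
    target_dir: str = "packages",
    platform: str = "x86_64-manylinux2014",
) -> list[list[str]]:
    """Return uv commands that install a project's runtime dependencies into target_dir."""
    return [
        # Write the locked dependencies, without development dependencies, to requirements.txt.
        ["uv", "export", "--frozen", "--no-dev", "--no-editable", "-o", "requirements.txt"],
        # Install for the Lambda platform and Python version rather than the build machine's.
        [
            "uv", "pip", "install",
            "--python-platform", platform,
            "--python", python_version,
            "--target", target_dir,
            "-r", "requirements.txt",
        ],
    ]


def build_lambda_zip(project_dir: Path, python_version: str) -> Path:
    """Install dependencies into project_dir/packages and zip them to project_dir/package.zip."""
    target_dir = project_dir / "packages"
    for command in lambda_install_commands(python_version, str(target_dir)):
        subprocess.run(command, cwd=project_dir, check=True)
    # make_archive appends ".zip" and archives the contents of root_dir, not the directory itself.
    archive = shutil.make_archive(str(project_dir / "package"), "zip", root_dir=target_dir)
    return Path(archive)
```

Taking python_version from the project's .python-version file keeps the zip in line with the version used locally and on CI, rather than hard-coding it.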
We should gradually migrate existing projects to the new common approach, starting with the pipeline inferrers.
The next priority project to migrate would be .