
RFC 071: Python Building and Deployment


Building and deploying Python projects

Last modified: 2025-03-13T18:14:09+00:00

Background

Over time, we have created a number of different Python projects: some as the main or only application within a repository, others as part of a larger whole alongside Scala applications.

Each time, we have written a new build and deployment process for them. For the most part, the process a project is born with has stuck, despite us finding better tools or methods when working on later projects. As such, we have ended up with multiple diverse mechanisms to check, build and deploy them, each with different standards and rules.

This RFC proposes to harmonise those projects. The projects are listed in the Appendix, alongside some description of their differences.

Problems

The differences between the projects give rise to various challenges:

  • keeping things up to date: Python itself, dependencies, common settings

  • different standards: type checking, and introducing previously ignored warnings if we harmonise linting rules

  • sharing code between different projects within the same repository

  • ensuring all required code and dependencies get deployed

  • running an equivalent environment locally, in CI, and in production (PYTHONPATH, installed dependencies, Python versions)

  • different ways to specify production vs development dependencies

Out of Scope

Exactly how the applications are run (Lambda, ECS etc.) is a decision to be made at the per-application level. This RFC does make recommendations on how to ensure that a Python project and all its dependencies are included in the deployed artefact, but whether that artefact is a container or a zip is up to the target environment.

What Can and Cannot Be DRY

Because of the way configuration resolution works in the various build tools (e.g. for Ruff), we do need to define a certain amount of configuration in each project, rather than sharing it across projects, and that configuration could become out of date.

Recommended Toolset

Linting/Formatting

Ruff provides both linting and formatting together, and has already been used successfully in the Concepts Pipeline.

Testing

pytest is currently in use in all projects, and there is no need to consider changing that.

Dependency Management

UV. The most pertinent advantages of UV over other package/dependency managers are the way it simplifies:

  1. Treating shared code in the same repository as though it is an installable library

  2. Keeping the Python version up to date, and sharing that with other developers
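
As a sketch of both points, assuming a uv-managed project with shared code in a sibling directory (the path and version here are illustrative):

# Pin the interpreter for the project; this writes .python-version,
# which other developers, CI and build scripts can then read
uv python pin 3.12

# Treat shared code elsewhere in the repository as an installable
# (editable) library, rather than manipulating PYTHONPATH by hand
uv add --editable ../shared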

CI

Github Actions. This has been discussed elsewhere, and is already an assumed requirement of new projects.

Build and deployment

Although harmonising how applications are to be run is out of scope, there are two methods that we currently use: Docker images running on ECS, and zipped packages running on AWS Lambda. Packages for running in these environments can be built according to the UV documentation and example code showing best practice when building Docker images and Lambda Zips. This is to be harmonised via some common Github Actions (see below).

Specific usage

Individual Configuration

Each project is configured as much as possible in its pyproject.toml, including defining paths to shared modules elsewhere in the repository using pythonpath.
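
For instance (a minimal sketch: the project name, Python bound and shared-module path are hypothetical, and pythonpath here is pytest's setting):

[project]
name = "feature-inferrer"
requires-python = ">=3.12"
dependencies = []

[tool.pytest.ini_options]
# Make shared modules elsewhere in the repository importable in tests
pythonpath = ["../common"]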

In a UV-based project, a .python-version file defines the Python version in use. This should be read and reused wherever possible, rather than relying on developers to keep the version number in harmony everywhere in the project.

This also allows Docker images and build scripts to read the version, meaning that we can use common scripts to achieve these tasks.

For example, to package up all the requirements of a project into a zip:

# Read the pinned Python version so the build matches development
PYTHON_VERSION=$(cat .python-version)

# Export the locked production dependencies as a requirements file
uv export --frozen --no-dev --no-editable -o requirements.txt

# Install the dependencies for the Lambda target platform into ./packages
uv pip install \
   --no-installer-metadata \
   --no-compile-bytecode \
   --python-platform x86_64-manylinux2014 \
   --python "$PYTHON_VERSION" \
   --target packages \
   -r requirements.txt

# Bundle the installed packages into a zip for deployment
cd packages && zip -r ../package.zip .

or, to create a Docker image using the correct Python version:

docker build . --build-arg pythonversion=$(cat .python-version) --progress=plain

with a Dockerfile along these lines:

# The build argument is expected to be supplied from .python-version, as above
ARG pythonversion=2.7
FROM python:$pythonversion
# Redeclare the ARG so that it is visible in build stages after FROM
ARG pythonversion
RUN echo python version = $pythonversion

Common Configuration

Although each project will have its own pyproject.toml file, there can also be some commonality, as sketched below.
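
For example (a sketch, assuming a shared configuration file at the repository root; the path and settings are illustrative), Ruff can inherit settings from another file via its extend option:

[tool.ruff]
# Inherit repository-wide linting and formatting rules
extend = "../../ruff-shared.toml"

# Project-specific settings can still be added or overridden here
line-length = 100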

Common Github Actions

Three Actions are to be defined in the .github repository:

Check

A "Python Check" Github Action in the .github repository

This will:

  • Be parameterised with

    • the path to the base of the Python project

    • whether to care about types

  • install requirements with UV

  • run ruff

  • run pytest

  • run mypy if relevant
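
By way of illustration, a caller might use such a check like this (a sketch only: it assumes the check is exposed as a reusable workflow, and the file name and input names are illustrative rather than settled interfaces):

name: check
on: [pull_request]

jobs:
  python-check:
    # Hypothetical reusable workflow in the wellcomecollection/.github repository
    uses: wellcomecollection/.github/.github/workflows/python-check.yml@main
    with:
      # Path to the base of the Python project within the caller's repository
      folder: pipeline/inferrer
      # Whether to care about types (i.e. run mypy)
      check-types: true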

Build

Two Python Build actions:

  • one to build and publish Docker images to ECR

  • one to zip a project for use in AWS Lambda (replicating the existing behaviour in a common fashion)

These will also be parameterised similarly to Python Check; a sketch follows.
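
A corresponding build call might look like this (again a sketch; the workflow and input names are assumptions, and the Lambda variant would wrap the zip-building steps shown earlier):

jobs:
  build-zip:
    # Hypothetical reusable workflow wrapping the uv export / uv pip install / zip steps
    uses: wellcomecollection/.github/.github/workflows/python-build-zip.yml@main
    with:
      folder: pipeline/inferrer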

Getting there from here

Converting existing projects

We should gradually migrate existing projects to the new common approach, starting with the Catalogue Pipeline inferrers. Migrating the pipeline inferrers should exercise most of the challenges outlined above, which will demonstrate the value of the approach and simplify any subsequent migrations. The next priority project to migrate would be the editorial photography ingest.

Initially, the shared Github actions can be defined in the catalogue pipeline project. There is sufficient variety within the catalogue pipeline (inferrers, catalogue graph) to demonstrate that this approach can be applied across all repositories.

Following this, there is no urgent need to migrate all existing Python projects in this way, as many of them are rarely updated. However, once the framework has been implemented, it should be relatively simple to do so.

Project Template

Entirely new Python projects are not common, so the work involved in creating and maintaining a project template is likely to outweigh any advantage it might bring.


Appendix