RFC 071: Python Building and Deployment

The current state



The projects

There are 23 non-archived repositories containing Python code. One is this documentation repository, where an example Python file is included in an RFC.

Of these, nine contain Python only in the form of a script executed as part of a GitHub Action when preparing a release:

  • https://github.com/wellcomecollection/terraform-aws-ecs-service

  • https://github.com/wellcomecollection/terraform-aws-lambda

  • https://github.com/wellcomecollection/terraform-aws-sns-topic

  • https://github.com/wellcomecollection/terraform-aws-vhs

  • https://github.com/wellcomecollection/terraform-aws-gha-role

  • https://github.com/wellcomecollection/terraform-aws-acm-certificate

  • https://github.com/wellcomecollection/terraform-aws-api-gateway-responses

  • https://github.com/wellcomecollection/terraform-aws-sqs

  • https://github.com/wellcomecollection/terraform-aws-secrets

These should be harmonised by removing the duplication.
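
Before removing the duplication, it may help to confirm how far the nine copies have actually drifted apart. The following is a minimal sketch and not part of the RFC: it assumes the repositories are checked out locally and that the caller supplies the path to the release script in each one. The function name `group_identical_copies` is hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_identical_copies(paths):
    """Group files whose contents are byte-for-byte identical.

    Returns a sorted list of groups; each group is a sorted list of paths
    (as strings) that share the same SHA-256 digest.
    """
    groups = defaultdict(list)
    for path in paths:
        # Hash the raw bytes so that any difference, even whitespace, counts.
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        groups[digest].append(str(path))
    return sorted(sorted(group) for group in groups.values())
```

Passing the nine script paths returns a single group if every copy is identical. More than one group indicates copies that have diverged and would need reconciling before a shared version can replace them.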

Four are currently updated only by Digirati:

  • https://github.com/wellcomecollection/iiif-builder

  • https://github.com/wellcomecollection/iiif-builder-infrastructure

  • https://github.com/wellcomecollection/londons-pulse

  • https://github.com/wellcomecollection/pronom-format-map

One is soon to be redundant, and we do not typically make changes to the Python in it:

  • https://github.com/wellcomecollection/archivematica-infrastructure

That leaves eight repositories to consider:

  • https://github.com/wellcomecollection/catalogue-pipeline

  • https://github.com/wellcomecollection/storage-service

  • https://github.com/wellcomecollection/platform-infrastructure

  • https://github.com/wellcomecollection/cost_reporter

  • https://github.com/wellcomecollection/rank

  • https://github.com/wellcomecollection/editorial-photography-ingest

  • https://github.com/wellcomecollection/catalogue-api

  • https://github.com/wellcomecollection/sierra_api

Among these, the catalogue pipeline repository contains a variety of Python projects, including catalogue-graph, one of the largest, most modern and most comprehensively built, and the much simpler and older inferrer projects.

Differences

No build process

Cost reporter has no build process or tests, and runs on Python 3.9 in Lambda.

Sierra_api is "meant for quick experiments and exploration", and does not have any build process or deployment.

The only Python files in catalogue-api are some maintenance scripts. There is no build process and there are no tests.

Most of the Python files in platform-infrastructure are tests themselves. However, there are also two Lambdas used for alerting. There is no build process, and there are no tests for the Python itself. This repository also defines a Docker image designed to harmonise our use of flake8.

Dependency Management

Rank uses Poetry. Most others use requirements files, frozen using pip-compile.

One problem with using requirements files is that there is no consistent way to define and name the separate requirements used in production vs. development. It comes down to ad-hoc filenames like dev_requirements.txt or requirements.test.txt.

Poetry solves this, as does the more modern UV.

Testing

All projects that have tests currently use pytest. There are no differences to consider.

tox

Python tests in the Storage Service are initiated with Tox.

This is unnecessary, as we only run the tests on a single environment configuration.

Linting

The Catalogue Graph within the catalogue pipeline uses Ruff. Other applications use Flake8 (e.g. editorial photography ingest).

Formatting

The Catalogue Graph within the catalogue pipeline uses Ruff.

Editorial photography ingest and storage service both use Black.

Linting and Formatting Rules

Different projects specify their own rules for formatting and ignoring lint warnings. These should be harmonised where possible.

Buildkite vs GitHub Actions

Python tests in the Storage Service and catalogue pipeline run on Buildkite, apart from the catalogue graph steps, which run using GitHub Actions.

Type Checking

Only the Catalogue Graph uses strict type checking.
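
The differences in dependency management, test running and linting described in this section could be surveyed mechanically across project directories. The following is a minimal sketch and not part of the RFC. It relies only on each tool's conventional marker file (poetry.lock, uv.lock, tox.ini, .flake8, ruff.toml), so a tool configured solely inside pyproject.toml or setup.cfg would be missed. The names `MARKER_FILES` and `survey_project` are hypothetical.

```python
from pathlib import Path

# Conventional files that indicate a tool is in use in a project directory.
# Assumption: tools configured only in pyproject.toml are not detected here.
MARKER_FILES = {
    "poetry": ["poetry.lock"],
    "uv": ["uv.lock"],
    "tox": ["tox.ini"],
    "flake8": [".flake8"],
    "ruff": ["ruff.toml", ".ruff.toml"],
}


def survey_project(root):
    """Report which tools and requirements files appear directly under root."""
    root = Path(root)
    tools = sorted(
        name
        for name, files in MARKER_FILES.items()
        if any((root / filename).exists() for filename in files)
    )
    # Catches ad-hoc names such as dev_requirements.txt or requirements.test.txt.
    requirements = sorted(p.name for p in root.glob("*requirements*.txt"))
    return {"tools": tools, "requirements_files": requirements}
```

Running `survey_project` over each project directory and comparing the results would show at a glance which projects still depend on ad-hoc requirements filenames and which already use a lock file.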