The current state
Last updated
There are 23 non-archived repositories containing Python code. One is this documentation repository, where an example Python file is included in an RFC.
Of these, nine contain Python only in the form of a script executed by a GitHub Action as part of preparing a release:
https://github.com/wellcomecollection/terraform-aws-ecs-service
https://github.com/wellcomecollection/terraform-aws-lambda
https://github.com/wellcomecollection/terraform-aws-sns-topic
https://github.com/wellcomecollection/terraform-aws-vhs
https://github.com/wellcomecollection/terraform-aws-gha-role
https://github.com/wellcomecollection/terraform-aws-acm-certificate
https://github.com/wellcomecollection/terraform-aws-api-gateway-responses
https://github.com/wellcomecollection/terraform-aws-sqs
https://github.com/wellcomecollection/terraform-aws-secrets
These should be harmonised by removing the duplication.
Four are currently updated only by Digirati:
https://github.com/wellcomecollection/iiif-builder
https://github.com/wellcomecollection/iiif-builder-infrastructure
https://github.com/wellcomecollection/londons-pulse
https://github.com/wellcomecollection/pronom-format-map
One is soon to be redundant, and we do not typically make changes to the Python in it:
https://github.com/wellcomecollection/archivematica-infrastructure
That leaves eight repositories to consider:
https://github.com/wellcomecollection/catalogue-pipeline
https://github.com/wellcomecollection/storage-service
https://github.com/wellcomecollection/platform-infrastructure
https://github.com/wellcomecollection/cost_reporter
https://github.com/wellcomecollection/rank
https://github.com/wellcomecollection/editorial-photography-ingest
https://github.com/wellcomecollection/catalogue-api
https://github.com/wellcomecollection/sierra_api
One problem with using requirements files is that there is no consistent way to define and name the separate requirements used in production vs. development. It comes down to ad-hoc filenames like dev_requirements.txt or requirements.test.txt.
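To illustrate why ad-hoc filenames are awkward to work with, the following is a minimal sketch of the kind of guesswork a tool would need in order to tell production requirements from development ones. The function name `classify_requirements` and the `DEV_MARKERS` heuristic are assumptions made for this example; nothing like this exists in the repositories.

```python
import re
from collections import defaultdict

# Hypothetical heuristic: these tokens in a filename mark it as non-production.
DEV_MARKERS = ("dev", "test")


def classify_requirements(filenames):
    """Group requirements filenames into 'production' and 'development'."""
    groups = defaultdict(list)
    for name in filenames:
        # Split on common separators so that both "dev_requirements.txt"
        # and "requirements.test.txt" yield a recognisable token.
        tokens = re.split(r"[._\-]", name.lower())
        is_dev = any(token in DEV_MARKERS for token in tokens)
        groups["development" if is_dev else "production"].append(name)
    return dict(groups)


if __name__ == "__main__":
    names = ["requirements.txt", "dev_requirements.txt", "requirements.test.txt"]
    print(classify_requirements(names))
```

Any such heuristic breaks as soon as a project picks a naming scheme it does not anticipate, which is the underlying problem with leaving the distinction to filenames.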
All projects that have tests currently use pytest. There are no differences to consider.
This is unnecessary, as we only run the tests on a single environment configuration.
Different projects specify their own rules for formatting and ignoring lint warnings. These should be harmonised where possible.
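As a rough sketch of how the differences could be surveyed before harmonising, the snippet below compares the `ignore` lists of two Flake8 configurations. The two config strings are invented for illustration and do not reflect the actual settings of any project; the only assumption about Flake8 itself is that its configuration lives in an INI-style `[flake8]` section with an `ignore` option.

```python
import configparser


def read_ignores(config_text):
    """Return the set of ignored lint codes from an INI-style [flake8] section."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    raw = parser.get("flake8", "ignore", fallback="")
    # Codes may be separated by commas or spread over continuation lines.
    return {code.strip() for code in raw.replace("\n", ",").split(",") if code.strip()}


# Hypothetical configurations, standing in for two projects' settings.
project_a = "[flake8]\nignore = E501, W503\n"
project_b = "[flake8]\nignore = E501, E203\n"

ignores_a = read_ignores(project_a)
ignores_b = read_ignores(project_b)

shared = ignores_a & ignores_b      # candidates for a common baseline
divergent = ignores_a ^ ignores_b   # rules that need a decision

if __name__ == "__main__":
    print("shared:", sorted(shared))
    print("divergent:", sorted(divergent))
```

The shared set is a natural starting point for a common configuration, while each divergent rule needs an explicit decision to adopt it everywhere or drop it.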
Only the Catalogue Graph uses strict type checking.
In these, the catalogue pipeline repository contains a variety of Python projects, including , one of the largest, most modern and comprehensively built; and the much simpler and older projects.
has no build process or tests, runs on Python 3.9 in Lambda
is "meant for quick experiments and exploration", and does not have any build process or deployment
The only Python files in are some maintenance scripts. No build process, no tests.
Most of the Python files in are tests themselves. However, there are also two Lambdas used for alerting. No build process, no tests for the Python itself. This repository also defines a Docker image designed to harmonise our use of flake8.
uses Most others use requirements files, frozen using pip-compile.
Poetry solves this, as does the more modern .
Python tests in the are initiated with Tox.
The Catalogue Graph within the uses Ruff. Other applications use Flake8 (e.g. )
The Catalogue Graph within the uses Ruff
Black and both use Black
Python tests in the Storage Service and run on Buildkite, apart from the catalogue graph steps, which run using GitHub Actions.