Creating canonical identifiers
Our source catalogues all use their own identifier schemes, which may have differently formatted or overlapping values.
For example, the identifier b14561980
could refer to:
a bib record for a book in Sierra, or
the METS file for a digitised copy of that book
To help us to refer to entities consistently and unambiguously, the catalogue pipeline creates its own "canonical" identifiers. These are distinct from identifiers in the source catalogue, and used whenever we need to refer to specific records, e.g. in URLs.
The format of canonical IDs
Canonical IDs are chosen for the following properties:
They should be short, ideally something that fits on a single post-it note
They should be unambiguous, so they don't use characters which can look ambiguous (e.g. letter
O
and numeral0
)They should be URL safe
How we create canonical IDs from source identifiers
There's a one-to-one relationship between canonical IDs and source identifiers.
A source identifier is made of three parts:
The source system of the identifier, e.g.
miro-image-number
orsierra-system-number
. This helps us distinguish between overlapping identifiers from different systems, like theb14561980
example above.The value of the identifier, e.g.
b14561980
orV0000287
.The ontology type of the identifier. This helps us create different canonical IDs for different contexts. e.g. Miro identifiers are used for both a Work and an Image, and we want the Work and the Image to have different canonical IDs. The Miro source identifiers will have ontology types
Work
andImage
, respectively, so they get distinct canonical IDs.
Last updated