# RFC 081: Identifiers in iiif-builder: beyond the B number

IIIF-Builder (aka DDS) understands various identifier forms (BNumbers, CALM Reference Numbers and Work IDs), and makes processing decisions based on the form of the identifier. For example, if asked to process a b number, it knows the item *must* have been processed by Goobi, and it *must* be in the `digitised` storage service space. These *musts* will soon no longer be true, and soon there will not even be b numbers.

**Last modified:** 2025-08-15T17:00+00:00

## Context

The original motivation (with added emphasis):

> \[Collections would like to] ingest born-digital items through Archivematica under bnumbers from Sierra. This would mostly happen for digital art and material that is commissioned or acquired outside of archives. Material outside of archives must be ingested into Sierra as Calm is just for archival material.

> Born-digital items under bnumbers should go into the born-digital workflow in Archivematica with some tweaking \[...] IIIF Builder would need to pick up the **Archivematica METS** with bnumbers and *handle them in the same way as they do with the Archivematica METS with the Calm ref nos*.

> Complicating this is the fact that Wellcome should be replacing Calm and Sierra

## Current functionality

IIIF Builder understands what a B Number is, and can validate one using its [check digit](https://github.com/wellcomecollection/iiif-builder/blob/df1e0697fde2ac70ac366cdb1d6728ed642b07ca/src/Wellcome.Dds/Wellcome.Dds.Common/WellcomeLibraryIdentifiers.cs#L28-L35). It also translates the identifiers that Goobi generates for multiple manifestations during digitisation into IIIF Collection and Manifest IDs. For example:

* b24886932 becomes a [IIIF Manifest](https://iiif.wellcomecollection.org/presentation/b24886932)
* b33282262 becomes a [IIIF Collection](https://iiif.wellcomecollection.org/presentation/b33282262), where
* b33282262\_0008 is a [volume](https://iiif.wellcomecollection.org/presentation/b33282262_0008) of that 13-part collection
* b19974760\_207 is a [volume](https://iiif.wellcomecollection.org/presentation/b19974760_207) of Chemist and Druggist, and
* b19974760\_207\_0048 is an [issue](https://iiif.wellcomecollection.org/presentation/b19974760_207_0048) of Chemist and Druggist (three level hierarchical identifier)
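The check-digit validation mentioned above can be sketched as follows. This is a minimal illustration that assumes the usual Sierra record-number scheme (weights 8 down to 2 across the seven digits, sum modulo 11, with `x` standing for 10). The class and method names are hypothetical, and the linked `WellcomeLibraryIdentifiers.cs` remains the authoritative version.

```c#
// Hypothetical helper, not the iiif-builder implementation.
public static class BNumberCheck
{
    // Returns true when the input is "b" + seven digits + a matching check digit.
    // Assumes the Sierra-style modulo-11 scheme described in the lead-in.
    public static bool IsValid(string bNumber)
    {
        if (bNumber is null || bNumber.Length != 9 || bNumber[0] != 'b')
        {
            return false;
        }

        int sum = 0;
        for (int i = 1; i <= 7; i++)
        {
            char c = bNumber[i];
            if (c < '0' || c > '9')
            {
                return false;
            }
            // Weights run 8, 7, ..., 2 from the leftmost to the rightmost digit.
            sum += (c - '0') * (9 - i);
        }

        int remainder = sum % 11;
        char expected = remainder == 10 ? 'x' : (char)('0' + remainder);
        return char.ToLowerInvariant(bNumber[8]) == expected;
    }
}
```

Under that assumption, `BNumberCheck.IsValid("b24886932")` returns `true`, while changing the final digit to `3` makes it return `false`.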

To translate identifiers like these, it *parses* incoming identifiers to understand what they are. All of the above start with a b number, but the following born-digital CALM identifiers do not:

* MS.9178
* SAPHY/Z/3/5/16/16
* SAPHY\_Z\_3\_5\_16\_16

The last of these is the same identifier as the second one, just in a path-safe form that can be used in Dashboard URLs.
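The parsing described above can be sketched roughly as follows. The type and member names are hypothetical, and the sketch assumes that a CALM reference number never contains a literal underscore, so that the path-safe form can be reversed by swapping `_` back to `/`. It is not the actual `DdsIdentifier` code.

```c#
#nullable enable
using System;

// Hypothetical result type for illustration only.
public sealed record ParsedIdentifier(
    string Value,       // regular form, e.g. "SAPHY/Z/3/5/16/16"
    string PathSafe,    // path-safe form, e.g. "SAPHY_Z_3_5_16_16"
    string? BNumber,    // null when the identifier does not start with a b number
    string? VolumePart, // e.g. "b19974760_207", or null
    string? IssuePart); // e.g. "b19974760_207_0048", or null

public static class IdentifierParser
{
    public static ParsedIdentifier Parse(string raw)
    {
        if (string.IsNullOrWhiteSpace(raw))
        {
            throw new ArgumentException("Identifier must not be empty.", nameof(raw));
        }

        string id = raw.Trim();
        if (StartsWithBNumber(id))
        {
            // Underscores separate the b number, volume and issue segments.
            string[] segments = id.Split('_');
            string? volume = segments.Length >= 2 ? segments[0] + "_" + segments[1] : null;
            string? issue = segments.Length >= 3 ? id : null;
            return new ParsedIdentifier(id, id, segments[0], volume, issue);
        }

        // Anything else is assumed to be CALM; accept either form (assumption:
        // CALM reference numbers contain no literal underscores).
        string regular = id.Replace('_', '/');
        return new ParsedIdentifier(regular, regular.Replace('/', '_'), null, null, null);
    }

    // "b", seven digits, a check character, then end of string or "_".
    private static bool StartsWithBNumber(string s)
    {
        if (s.Length < 9 || s[0] != 'b') return false;
        for (int i = 1; i <= 7; i++)
        {
            if (s[i] < '0' || s[i] > '9') return false;
        }
        char check = s[8];
        bool checkOk = (check >= '0' && check <= '9') || check == 'x' || check == 'X';
        return checkOk && (s.Length == 9 || s[9] == '_');
    }
}
```

With this sketch, `IdentifierParser.Parse("SAPHY_Z_3_5_16_16")` and `IdentifierParser.Parse("SAPHY/Z/3/5/16/16")` yield the same `Value` and `PathSafe` pair, and neither has a `BNumber`.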

Although identifiers always enter iiif-builder as strings (e.g., in API URIs, Dashboard URIs, SQS messages or text files to process), they are parsed into a [DdsIdentifier](https://github.com/wellcomecollection/iiif-builder/blob/2d73518b203151db6fe61f5ea23f461db2af1e84/src/Wellcome.Dds/Wellcome.Dds.Common/DdsIdentifier.cs) object. The current C# code makes use of [implicit operators](https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/operators/user-defined-conversion-operators), allowing for easy-to-read code where identifiers can behave as strings or as the more complex `DdsIdentifier` class as required, without explicit conversion between forms. This is viable because it is cheap to parse a string, and *at the moment* we learn everything we need to know from parsing the identifier with procedural code; we don't need to look up third-party sources of information. `DdsIdentifier` simply distinguishes between identifiers that have a b number and those that do not; those that do not are assumed to be CALM. `DdsIdentifier` also pulls out volume and issue parts, and translates CALM identifiers between the path-safe form used in the dashboard and the regular form used everywhere else.
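To illustrate the implicit-operator pattern, here is a much-reduced stand-in for `DdsIdentifier`. The class name `SketchIdentifier` and its crude `HasBNumber` rule are assumptions made for this example; the real class does considerably more.

```c#
using System;

// Hypothetical, simplified stand-in for DdsIdentifier.
public sealed class SketchIdentifier
{
    public string Value { get; }
    public bool HasBNumber { get; }

    public SketchIdentifier(string value)
    {
        Value = value ?? throw new ArgumentNullException(nameof(value));
        // Crude rule for the sketch: "b" followed by a digit, at least nine characters.
        HasBNumber = value.Length >= 9
            && value[0] == 'b'
            && value[1] >= '0' && value[1] <= '9';
    }

    // string -> SketchIdentifier: parsing happens silently on assignment.
    public static implicit operator SketchIdentifier(string value) => new SketchIdentifier(value);

    // SketchIdentifier -> string: the object can be passed wherever a string is expected.
    public static implicit operator string(SketchIdentifier id) => id.Value;

    public override string ToString() => Value;
}
```

Because both conversions are implicit, `SketchIdentifier id = "b33282262_0008";` and `string s = id;` both compile without a cast. That convenience depends on parsing being cheap and synchronous, which is exactly the property that stops holding once identity has to be looked up elsewhere.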

The current iiif-builder codebase often has conditional logic like this:

```c#
if(identifier.HasBNumber)
{
    // do something
} 
else 
{
    // do something else
}
```

That condition is a proxy for what we really want to know:

* What system processed it, Goobi or Archivematica? (and therefore what METS profile does it have?)
* Which system is the authority for that identifier (Sierra or CALM)?
* Where are its files in the storage service (`/digitised/` or `/born-digital/`)?

## Upcoming challenges

At the moment we can parse a string using just the logic in [DdsIdentifier](https://github.com/wellcomecollection/iiif-builder/blob/2d73518b203151db6fe61f5ea23f461db2af1e84/src/Wellcome.Dds/Wellcome.Dds.Common/DdsIdentifier.cs) and know that we can find a Goobi-generated METS file in the `digitised` storage location, or an Archivematica-generated METS file in the `born-digital` storage location. *We know this just by looking at the string.*

But in future:

* Some Archivematica-processed born-digital items may have a b number and NOT have a CALM Reference Number
* There won't even be B Numbers when Sierra is replaced by some other Library Management System

which will mean *we **cannot** know METS formats, storage locations or anything else just by looking at the string.*

## Proposal: Introduce an Identity Service

The iiif-builder codebase will be significantly refactored. `DdsIdentifier` will be replaced by a new class `DdsIdentity` which:

* is obtained from a **service dependency**, rather than parsed from a string:

```c#
// old (showing implicit conversion):
DdsIdentifier ddsId = "b33282262_0008"; 

// new
DdsIdentity ddsId = identityService.GetIdentity("b33282262_0008");
```

* has properties that directly reflect the things we need to know to process objects, rather than making us infer them from the form of the identifier:

```c#
if(ddsId.Generator == Generator.Goobi)
{
    // expect a Goobi METS
}

if(ddsId.StorageSpace == StorageSpace.BornDigital)
{
    // construct the right S3 key...
}

if(ddsId.Source == Source.Calm)
{
    // some archive-specific logic
}
```

* retains the part-level volume and issue information we need for multiple manifestations, which does not exist in the Catalogue API:

```c#
if(ddsId.VolumePart != null)
{
    // this is too useful to over-abstract away into `partOf` chains
}
```

This means that we add an `IIdentityService` interface, which becomes a dependency in the many parts of the codebase that could previously rely on implicit conversion between `string` and `DdsIdentifier`:

```c#
public interface IIdentityService
{
    DdsIdentity GetIdentity(string s);
    
    // Later:
    // Task<DdsIdentity> GetIdentityAsync(string s);
}
```

> \[!IMPORTANT] While we don't yet know how later implementations of this interface will obtain their information when they can no longer parse it out of the identifier string, **we have removed this concern from the rest of the iiif-builder codebase** and need only worry about a new implementation of `IIdentityService` for future functionality.
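As an illustration of how today's string-based rule can sit behind the interface, here is a minimal sketch. The enum members `StorageSpace.Digitised` and `Source.Sierra`, the shape of the `DdsIdentity` record and the class name `StringRuleIdentityService` are assumptions made for this example; they are not taken from the pull request.

```c#
using System;

public enum Generator { Goobi, Archivematica }
public enum StorageSpace { Digitised, BornDigital }  // "Digitised" is an assumed name
public enum Source { Sierra, Calm }                  // "Sierra" is an assumed name

// Reduced, hypothetical version of DdsIdentity.
public sealed record DdsIdentity(
    string Value, Generator Generator, StorageSpace StorageSpace, Source Source);

public interface IIdentityService
{
    DdsIdentity GetIdentity(string s);
}

// Hypothetical implementation that encodes the current rule of thumb:
// a b number means Goobi + digitised + Sierra, anything else means
// Archivematica + born-digital + CALM.
public sealed class StringRuleIdentityService : IIdentityService
{
    public DdsIdentity GetIdentity(string s)
    {
        if (string.IsNullOrWhiteSpace(s))
        {
            throw new ArgumentException("Identifier must not be empty.", nameof(s));
        }

        return StartsWithBNumber(s)
            ? new DdsIdentity(s, Generator.Goobi, StorageSpace.Digitised, Source.Sierra)
            : new DdsIdentity(s, Generator.Archivematica, StorageSpace.BornDigital, Source.Calm);
    }

    // "b", seven digits, a check character, then end of string or "_".
    private static bool StartsWithBNumber(string s)
    {
        if (s.Length < 9 || s[0] != 'b') return false;
        for (int i = 1; i <= 7; i++)
        {
            if (s[i] < '0' || s[i] > '9') return false;
        }
        char check = s[8];
        bool checkOk = (check >= '0' && check <= '9') || check == 'x' || check == 'X';
        return checkOk && (s.Length == 9 || s[9] == '_');
    }
}
```

Callers only see `Generator`, `StorageSpace` and `Source`, so a later implementation that consults another system for a b number ingested through Archivematica could return different values without any change at the call sites.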

## Initial Experimental Implementation

This major refactor **has already been done and tested** in this pull request: <https://github.com/wellcomecollection/iiif-builder/pull/282>

This wires up the new `IIdentityService` interface as a service dependency and provides an implementation that essentially has the same parsing functionality as the previous version:

[/src/Wellcome.Dds/Wellcome.Dds.Common/ParsingIdentityService.cs](https://github.com/wellcomecollection/iiif-builder/blob/identity-service-experiment/src/Wellcome.Dds/Wellcome.Dds.Common/ParsingIdentityService.cs)

This returns the new [DdsIdentity](https://github.com/wellcomecollection/iiif-builder/blob/a0cbf6379102ddd6d3e6041d0fad745e89bbcd2c/src/Wellcome.Dds/Wellcome.Dds.Common/DdsIdentity.cs) object.

It also caches parsed `DdsIdentity` objects in memory for efficiency. This won't make much difference now as the string parsing is very quick, but will be significant when the `IIdentityService` implementation needs to make calls to other sources of information.
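One way to picture this kind of caching is a small decorator around any `IIdentityService`. This is a sketch only, with hypothetical class names and a reduced `DdsIdentity`; it does not claim to reflect how the pull request implements its cache. The cache here is unbounded, and `ConcurrentDictionary.GetOrAdd` may call the inner service more than once for the same key under contention.

```c#
using System;
using System.Collections.Concurrent;

public enum Generator { Goobi, Archivematica }

// Reduced, hypothetical version of DdsIdentity.
public sealed record DdsIdentity(string Value, Generator Generator);

public interface IIdentityService
{
    DdsIdentity GetIdentity(string s);
}

// Hypothetical decorator: remembers identities it has already resolved.
public sealed class CachingIdentityService : IIdentityService
{
    private readonly IIdentityService inner;
    private readonly ConcurrentDictionary<string, DdsIdentity> cache =
        new ConcurrentDictionary<string, DdsIdentity>(StringComparer.Ordinal);

    public CachingIdentityService(IIdentityService inner)
    {
        this.inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    // Only calls the inner service when the identifier has not been seen before.
    public DdsIdentity GetIdentity(string s) => cache.GetOrAdd(s, inner.GetIdentity);
}

// Stand-in for a slow implementation; counts lookups so the caching is observable.
// The counter is not thread-safe, which is acceptable for a demonstration.
public sealed class CountingIdentityService : IIdentityService
{
    public int Calls { get; private set; }

    public DdsIdentity GetIdentity(string s)
    {
        Calls++;
        // Crude placeholder rule, for the sketch only.
        var generator = s.StartsWith("b", StringComparison.Ordinal)
            ? Generator.Goobi
            : Generator.Archivematica;
        return new DdsIdentity(s, generator);
    }
}
```

Wrapping a `CountingIdentityService` in a `CachingIdentityService` and requesting the same identifier twice leaves `Calls` at 1, which is the saving that matters once each lookup involves a network call.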

## Next steps

* Complete testing of this refactor and deploy it to production. The current PR has 96 changed files.
* For the "b numbers in Archivematica" scenario, work out **how** we will know that the `Generator` property should be `Generator.Archivematica` and the `StorageSpace` property should be `BornDigital`.
* Implement or update our `IIdentityService` implementation accordingly.
* Understand what the Sierra-replacement identifiers will look like and what they mean, so that, given any identifier string, we can develop an implementation of `IIdentityService` that populates the fields of `DdsIdentity`.


