Sierra IDs

Last updated 2 years ago

The Sierra IDs returned by the Sierra API are seven digits long, for example 1234567.

However, there are several optional features of a Sierra ID:

  • A leading period (.)

  • A one-character tag to identify the type -- for example, b for a bibliographic record, or i for an item record.

    Note that bibs and items have overlapping namespaces -- so we need this prefix to unambiguously identify a record if we don't have more context. For example, 1234567 could be a bib or an item.

  • An eighth character: a check digit

Unfortunately, the presentation (and searchability) of a Sierra ID is inconsistent between different systems.
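
To make those optional parts concrete, here is a minimal sketch of a parser that accepts any combination of them. ParsedSierraId and its parse method are hypothetical names used for illustration, not part of the pipeline, and accepting any lowercase letter as the record type tag is an assumption (the text above only mentions b and i).

```scala
// Hypothetical helper for illustration only: splits a Sierra ID into its optional parts.
final case class ParsedSierraId(
  hasLeadingPeriod: Boolean,
  recordType: Option[Char],
  digits: String,
  checkDigit: Option[Char]
)

object ParsedSierraId {
  // Optional ".", optional one-letter tag (assumed lowercase a-z),
  // seven digits, then an optional check digit (0-9 or x).
  private val Pattern = """^(\.)?([a-z])?(\d{7})([\dx])?$""".r

  def parse(s: String): Option[ParsedSierraId] =
    s match {
      // Optional groups that did not match are bound to null.
      case Pattern(period, tag, digits, check) =>
        Some(
          ParsedSierraId(
            hasLeadingPeriod = period != null,
            recordType = Option(tag).map(_.head),
            digits = digits,
            checkDigit = Option(check).map(_.head)
          )
        )
      case _ => None
    }
}
```

Note that this only checks the shape of the string; it does not verify that the check digit is the right one for the seven digits.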

Current uses

Usage                       | Leading period? | Record type? | Check digit?
----------------------------|-----------------|--------------|-------------
Sierra API                  | ·               | ·            | ·
Sierra client display       | ·               | ✔            | ✔
Sierra client search        | ·               | ✔            | ✔
Encore URL                  | ·               | ✔            | ·
Encore display              | ✔               | ✔            | ✔
Encore search               | ·               | ·            | ·
OPAC URL                    | ·               | ✔            | ·
OPAC display                | ✔               | ✔            | ✔
OPAC search                 | ·               | ·            | ·
Viewer page URL (canonical) | ·               | ✔            | ✔
Viewer page URL (redirect)  | ·               | ✔            | ·
Internet Archive URL        | ·               | ✔            | ✔
Internet Archive page       | ·               | ✔            | ✔
Goobi                       | ·               | ✔            | ✔
METS filename               | ·               | ✔            | ✔
METS identifier             | ·               | ✔            | ✔
Asset filename              | ·               | ✔            | ✔
Miro XML exports            | inconsistent    | ✔            | ·

So the IDs have two main representations:

  • The Sierra ID (no check digit, no record type), and

  • The Sierra system number (check digit and record type)

The . prefix is only displayed in two places, and never used for search, so we can ignore it.

In the Catalogue pipeline

Since we receive a Sierra ID from the Sierra API, we store all our records using this ID. Throughout the pipeline, it's unambiguous whether a given ID is for a bib or an item, so we don't need to worry about the record type.

In the Catalogue API

For completeness, we'll include both versions of the ID in the Catalogue API:

"identifiers": [
  {
    "identifierScheme": "sierra-system-number",
    "value": "b1234567x",
    "type": "Identifier"
  },
  {
    "identifierScheme": "sierra-id",
    "value": "1234567",
    "type": "Identifier"
  }
]

and this means we support searching on both of those forms.
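
As a rough sketch, both identifiers could be derived from the seven-digit Sierra ID and the record type as follows. The Identifier case class and SierraIdentifiers.forRecord are hypothetical names for illustration, not the API's real model; the check digit follows the calculation described under "Computing the check digit".

```scala
// Illustrative type: the field names mirror the JSON above,
// but this is not the Catalogue API's actual model.
final case class Identifier(identifierScheme: String, value: String)

object SierraIdentifiers {
  // Digits are weighted right to left by 2, 3, 4, ...; the weighted sum
  // modulo 11 is the check digit, with 10 written as "x".
  private def checkDigit(sierraId: String): String = {
    val remainder = sierraId.reverse.zipWithIndex.map {
      case (char, index) => char.asDigit * (index + 2)
    }.sum % 11
    if (remainder == 10) "x" else remainder.toString
  }

  // recordType is the one-character tag, e.g. 'b' for a bib or 'i' for an item.
  def forRecord(recordType: Char, sierraId: String): List[Identifier] = {
    require(sierraId.matches("""\d{7}"""), s"Not a seven-digit Sierra ID: $sierraId")
    List(
      Identifier("sierra-system-number", s"$recordType$sierraId${checkDigit(sierraId)}"),
      Identifier("sierra-id", sierraId)
    )
  }
}
```

For example, SierraIdentifiers.forRecord('b', "1024364") gives a sierra-system-number of b10243641 and a sierra-id of 1024364.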

We explicitly don't support searching on the following variants:

  • Prefix but no check digit (e.g. b1234567)

  • Using the letter a as a wildcard for the check digit (e.g. b1234567a)

  • With a period as a prefix (e.g. .b1234567x)
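
The supported and unsupported forms can be summarised as a shape check on the query string. This is only a sketch of the rule, assuming lowercase single-letter record type tags; SierraSearchForms.isSearchable is a hypothetical helper, not how the API actually matches identifiers, and it does not verify that the check digit is correct.

```scala
object SierraSearchForms {
  // The bare Sierra ID: exactly seven digits.
  private val SierraId = """\d{7}""".r

  // The Sierra system number: record type tag, seven digits, check digit (0-9 or x).
  private val SystemNumber = """[a-z]\d{7}[\dx]""".r

  // True only for the two forms listed as searchable above.
  // A Regex used as a pattern must match the whole string.
  def isSearchable(query: String): Boolean =
    query match {
      case SierraId() | SystemNumber() => true
      case _                           => false
    }
}
```

Under these assumptions, 1234567 and b1234567x are accepted, while b1234567, b1234567a and .b1234567x are rejected.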

Computing the check digit

Quoting from the Sierra manual:

Check digits may be any one of 11 possible digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or x).

The check digit is calculated as follows:

Multiply the rightmost digit of the record number by 2, the next digit to the left by 3, the next by 4, etc., and total the products.

Divide the total by 11 and retain the remainder (for example, 78 / 11 is 7, with a remainder of 1). The remainder after the division is the check digit. If the remainder is 10, the letter x is used as the check digit.

Here's an implementation of the check digit calculation in Scala:

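// Computes the check digit for a Sierra record number given as digits only (e.g. "1234567").
// Digits are weighted right to left by 2, 3, 4, ...; the weighted sum modulo 11 is the
// check digit, with a remainder of 10 written as "x".
def checkDigit(s: String): String = {
  val remainder = s
    .reverse
    .zipWithIndex
    .map { case (char: Char, index: Int) => char.toString.toInt * (index + 2) }
    .sum % 11
  if (remainder == 10) "x" else remainder.toString
}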
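
As a check on the arithmetic, the sketch below traces the calculation step by step for 1024364, a record number chosen here because its products total the 78 used in the manual's division example. The object name CheckDigitWorkedExample is just for illustration.

```scala
object CheckDigitWorkedExample {
  val recordNumber = "1024364"

  // Pair each digit, right to left, with its multiplier (2, 3, 4, ...):
  // 4*2, 6*3, 3*4, 4*5, 2*6, 0*7, 1*8
  val products: Seq[Int] = recordNumber.reverse.zipWithIndex.map {
    case (char, index) => char.asDigit * (index + 2)
  }

  // 8 + 18 + 12 + 20 + 12 + 0 + 8 = 78
  val total: Int = products.sum

  // 78 / 11 is 7 with a remainder of 1, so the check digit is "1".
  val remainder: Int = total % 11
  val checkDigit: String = if (remainder == 10) "x" else remainder.toString
}
```

With the b record type tag, the full system number for this record would therefore be b10243641.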