RFC 035: Modelling MARC 856 "web linking entry"
MARC field 856 is used to link to other resources, and it has a variety of uses at Wellcome. Among other things, it links to websites, electronic journals, and canned searches in our catalogue.
For example, on b30204021:
This is displayed as a link on wellcomelibrary.org:
We want to include these links on the new Wellcome Collection website, so we need to expose these links in the API. This RFC provides examples of how we use 856, and suggests an initial approach to modelling these in the API.
This analysis is based on a February 2021 snapshot of the Sierra catalogue. At the time of writing, we have:
262,754 instances of MARC 856 on bib records
9,611 instances of MARC 856 on holdings records
No instances of MARC 856 on item records
MARC 856 is repeatable, and in practice we do use multiple instances of the same field. In general, we can't assume any relationship between them:
Some bibs/holdings describe e-journals, and have an 856 for each volume of the journal. These are different resources.
Some bibs have different URLs for "library member access" and "free UK access". They point to the same resource, so these are two locations for one resource.
Some bibs point to different URLs, e.g. b20434406 links to the publisher description and table of contents for a printed book. These are different resources.
We will treat multiple instances of 856 as distinct resources.
The first indicator describes how you access the resource: for example, HTTP, FTP, or email.
On holdings records, the first indicator is always empty.
On bib records, the first indicator is mostly "4 - HTTP". We do have a handful of records with a different first indicator, but manual inspection shows that they all contain HTTP URLs.
We will ignore the first indicator in the Catalogue API.
The second indicator describes how this resource relates to the bib: is it the resource, or a version of it, or something related?
On holdings records, the second indicator is always empty.
On bib records, the second indicator is mostly "0 - Resource".
We don't distinguish between these relationships in the Catalogue API, and it's not worth adding that distinction for this field. At best, we'd have a few thousand records that use it.
We will ignore the second indicator in the Catalogue API.
The most interesting subfield for the catalogue is subfield $u, which contains the URL. A handful of instances of field 856 either lack subfield $u or repeat it; these look like cataloguing errors.
These are the subfields I think we'll want to use:
$z = public note. This is used to label links on wellcomelibrary.org. It contains a mixture of values that could be link text (e.g. "View resource" or "Library member access") and labels (e.g. "Chicago Journals").
$3 = materials specified. This often provides more description (e.g. "Related archival materials" or "Cover image"). This is also exposed in the link text on wellcomelibrary.org.
And these are the subfields I think we can ignore:
$x = non-public note.
$a = host name. This field is used inconsistently. There are a few places where it's been used to store a URL instead of subfield $u; we should get those fixed in the catalogue rather than adding logic to the Catalogue API.
$q = electronic format type. This is only used on nine records, for text/html and application/pdf.
$m = contact for access assistance. This is only used on eight records, to store what look like reference numbers.
$2 = access method. This is only used once, with the value "http", which should be expressed by the first indicator instead.
The companion field to field 856 "Electronic Location and Access" is field 956 "Local Electronic Location and Access".
In the Wellcome catalogue, this is only used for URLs of the form http://wellcomelibrary.org/item/{bnumber}, which we don't need to expose in the Catalogue API.
This table lists the records I found which need some cataloguing fixes:
| bib ID | proposed fix |
|---|---|
| | The URL in subfield $a should be moved to subfield $u |
| | The URL in subfield $a should be moved to subfield $u |
| | The URL in subfield $a should be moved to subfield $u |
| | The URL in subfield $a should be moved to subfield $u |
| | The URL in subfield $a should be moved to subfield $u |
| | Field 856 doesn't contain any useful information |
| | ind1 = "7" and $2 = "http" should be replaced by ind1 = "4" |
| | The multiple URLs in field 856 $u should be split into multiple instances of field 856 |
| | The URL in subfield $z should be moved to subfield $u |
For every instance of field 856, we will create an unidentified Item with a single DigitalLocation. The location will have the following fields:
url = the value of subfield $u.
If subfield $u is missing or repeated, we will log a warning and not create the item.
If the contents of subfield $u don't look like a URL, we will log a warning and not create the item.
locationType = OnlineResource
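The rules for building the location can be sketched in Python as follows. This is only an illustration, not the pipeline's actual code: the `DigitalLocation` dataclass, the `location_from_856` function, and the representation of a field as a list of `(subfield tag, content)` pairs are all hypothetical. The RFC doesn't define what "looks like a URL" means, so the check used here (an http or https scheme plus a host) is an assumption.

```python
import logging
from dataclasses import dataclass
from typing import List, Optional, Tuple
from urllib.parse import urlparse

logger = logging.getLogger(__name__)

# Hypothetical representation of one 856 field: (subfield tag, content) pairs.
Subfields = List[Tuple[str, str]]


@dataclass
class DigitalLocation:
    url: str
    location_type: str = "OnlineResource"


def looks_like_url(value: str) -> bool:
    # Assumption: a URL must have an http(s) scheme and a non-empty host.
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def location_from_856(subfields: Subfields) -> Optional[DigitalLocation]:
    """Return a location, or None if subfield $u is unusable."""
    urls = [content for tag, content in subfields if tag == "u"]

    # Subfield $u must appear exactly once.
    if len(urls) != 1:
        logger.warning("Expected exactly one 856 $u, found %d", len(urls))
        return None

    url = urls[0].strip()
    if not looks_like_url(url):
        logger.warning("856 $u does not look like a URL: %r", url)
        return None

    return DigitalLocation(url=url)
```

Returning `None` rather than raising keeps one bad 856 from blocking the rest of the record, which matches the "log a warning and not create the item" behaviour described above.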
The label on the item and linkText on the location will be populated as follows:
Concatenate the contents of subfields $z, $y and $3, joined with spaces. If this is an empty string, we will omit both fields.
We don't want the linkText to be too long (no more than seven words), and some of these subfields contain very long strings. So we apply the following rule: if the concatenated string is seven words or fewer and contains "access", "view" or "connect", we put it in the location "linkText" field. Otherwise, we put it in the item's "label" field.
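A minimal Python sketch of this rule is shown below. The function name and the `(subfield tag, content)` representation are hypothetical. Two details are assumptions, because the RFC doesn't specify them: the keyword match is a case-insensitive substring match, and the subfields are joined in the order $z, $y, $3 regardless of their order in the record.

```python
from typing import List, Optional, Tuple

LINK_TEXT_KEYWORDS = ("access", "view", "connect")
MAX_LINK_TEXT_WORDS = 7


def label_and_link_text(
    subfields: List[Tuple[str, str]],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (item label, location linkText); at most one of them is set."""
    # Assumption: join in the order $z, $y, $3, not in record order.
    parts = []
    for wanted in ("z", "y", "3"):
        parts.extend(
            content.strip() for tag, content in subfields if tag == wanted
        )
    text = " ".join(part for part in parts if part)

    # Nothing to show: omit both fields.
    if not text:
        return None, None

    # Assumption: case-insensitive substring match, so a word such as
    # "Review" would also count as containing "view".
    is_short = len(text.split()) <= MAX_LINK_TEXT_WORDS
    has_keyword = any(word in text.lower() for word in LINK_TEXT_KEYWORDS)

    if is_short and has_keyword:
        return None, text
    return text, None
```

For example, a field whose only descriptive subfield is $z "View resource" yields a linkText, while $z "Chicago Journals" yields an item label.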
At some point we might want to do quality control on this field and omit some of the values, e.g. by removing generic "view online" descriptions from the Sierra data, so the front-end can use more consistent vocabulary.
If the 856 is attached to a bib record, then the Item is attached directly to the bib Work.
If the 856 is attached to a holdings record, then the Item is attached to every Work that the holdings record is attached to (i.e. every bib that the holdings record links to).
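The attachment rule can be illustrated with a short Python sketch. The function name, the dictionary shapes, and the record IDs in the usage example are hypothetical, and items are represented as plain URL strings to keep the example small.

```python
from collections import defaultdict
from typing import Dict, List


def items_by_work(
    bib_items: Dict[str, List[str]],
    holdings_items: Dict[str, List[str]],
    holdings_to_bibs: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Group items by the bib (Work) they should be attached to."""
    works: Dict[str, List[str]] = defaultdict(list)

    # Items from an 856 on a bib record go directly on that bib's Work.
    for bib_id, items in bib_items.items():
        works[bib_id].extend(items)

    # Items from an 856 on a holdings record are copied onto every bib
    # that the holdings record links to.
    for holdings_id, items in holdings_items.items():
        for bib_id in holdings_to_bibs.get(holdings_id, []):
            works[bib_id].extend(items)

    return dict(works)


# Usage with made-up IDs: holdings h1 links to two bibs.
attached = items_by_work(
    bib_items={"b1": ["http://a.example"]},
    holdings_items={"h1": ["http://b.example"]},
    holdings_to_bibs={"h1": ["b1", "b2"]},
)
```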