Showcase recently digitised works on the Collections landing page

Last modified: 2025-10-15T10:50:00Z

Context

The online collection is continually evolving, primarily through ongoing digitisation efforts, as well as through cataloguing of born-digital works. Users could benefit from knowing what is newly available to view and download online.

See the Notion page for more details.

The New online component has been built and uses hard-coded data as a first step. See the Collections landing page.

The component expects the following WorkItem type. NOTE: there is some flexibility here, as the component was built as a temporary proof-of-concept (POC) solution.

type WorkItem = {
  url: string;
  title: string;
  image: {
    contentUrl: string;
    width: number;
    height: number;
    alt?: string;
  };
  labels: { text: string }[];
  partOf?: string;
  contributor?: string;
  date?: string;
};

This RFC aims to describe how we might integrate a createdDate into the Work model, and how the catalogue-api might expose this data.

Requirements

  • New Online should only display works digitised by the Wellcome Collection team and born-digital archives, i.e. works that have a METS file. It is not meant to include other digital works, such as EBSCO journals.

  • We want to display 4 works, which can all be of the same format, e.g. archives and manuscripts.

  • A "New Online" page lists all recently digitised works in descending createdDate order.

  • The 4 "New Online" works on the landing page will be selected from the above and editorialised through Prismic.
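The last requirement can be sketched as follows: editors choose works in Prismic, and only those that actually appear in the New Online listing are shown, in the editors' order, capped at 4. This is a minimal TypeScript illustration; NewOnlineWork, pickFeatured and the editorialIds parameter are hypothetical names, not part of the existing component or of the Prismic API.

```typescript
// Hypothetical minimal stand-in for a work in the New Online listing,
// which is assumed to be already sorted by descending createdDate.
interface NewOnlineWork {
  id: string;
  createdDate: string;
}

// Select the landing-page works: keep the editors' order from Prismic,
// drop any ID that is not in the New Online listing, and cap at `limit`.
function pickFeatured(
  recent: NewOnlineWork[],
  editorialIds: string[], // hypothetical: work IDs curated in Prismic
  limit = 4,
): NewOnlineWork[] {
  const byId = new Map(recent.map((work) => [work.id, work] as const));
  return editorialIds
    .map((id) => byId.get(id))
    .filter((work): work is NewOnlineWork => work !== undefined)
    .slice(0, limit);
}
```

For example, if the listing contains works a, b and c and the editors picked ["c", "x", "a"], the result is c then a: "x" is dropped because it is not among the recently digitised works.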

Integrate the digitised date into the Work model and the catalogue pipeline.

  • The createdDate will not always be strictly accurate: when a work is digitised again, CREATEDATE reflects the latest digitisation. We can mitigate this by only extracting and loading createdDate for METS works that are still on their first version (v1).

  • What do we do for works with a content advisory? We can filter on items.locations.accessConditions.status.id, keeping only "open" (this excludes "open-with-advisory").

  • Amending the Work model and API should not be done lightly:

    • we must ensure that the model remains consistent and describes works in a sustainable, future-proof way,

    • as the API is public, how might external users make use of the extended functionality?
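The two mitigations in the list above (v1 only, "open" only) can be expressed as two small predicates. This TypeScript sketch is for illustration only; MetsSummary, createdDateFor and isNewOnlineEligible are hypothetical, and it assumes the transformer knows the storage version of a METS file as a number.

```typescript
// Hypothetical summary of a METS file as seen by the transformer.
// Assumption: the storage version is available as a number (1 for v1).
interface MetsSummary {
  version: number;
  createDate?: string; // CREATEDATE from mets:metsHdr, if present
}

// Mitigation 1: only keep createdDate for v1, so that re-digitising an old
// work does not make it look newly available.
function createdDateFor(mets: MetsSummary): string | undefined {
  return mets.version === 1 ? mets.createDate : undefined;
}

// Mitigation 2: keep only works with an "open" location, which excludes
// "open-with-advisory" and every other access status.
function isNewOnlineEligible(accessStatusIds: string[]): boolean {
  return accessStatusIds.includes("open");
}
```

The trade-off of the first rule is that works only ever digitised in a second or later version never get a createdDate and so never appear in New Online.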

For now, createdDate would only be used for the DigitalLocation of some Items, but it could be extended to describe the accession or cataloguing date of physical items.

METS file - digitised or born-digital

The digitisation date is recorded in the CREATEDATE attribute of the METS header (mets:metsHdr), for both digitised and born-digital items.

NOTE: there can be multiple versions of a digital item, each with its own CREATEDATE. Since we extract and transform the most recent version, the CREATEDATE will not always be the date on which the item was first digitised.

  <mets:metsHdr CREATEDATE="2018-04-28T10:33:56Z">
    <mets:agent OTHERTYPE="SOFTWARE" ROLE="CREATOR" TYPE="OTHER">
      <mets:name>Goobi - ugh-3.0-6d40b80 - 06−September−2017</mets:name>
      <mets:note>Goobi</mets:note>
    </mets:agent>
  </mets:metsHdr>
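A dependency-free sketch of reading CREATEDATE from a METS document like the one above. extractCreateDate is a hypothetical helper; it assumes the mets: namespace prefix used in this example, and a production transformer should use a namespace-aware XML parser rather than a regular expression.

```typescript
// Return the CREATEDATE attribute of the first mets:metsHdr element,
// or undefined if the header or the attribute is missing.
// Assumption: the METS namespace is bound to the "mets:" prefix.
function extractCreateDate(metsXml: string): string | undefined {
  const match = /<mets:metsHdr\b[^>]*\bCREATEDATE="([^"]+)"/.exec(metsXml);
  return match ? match[1] : undefined;
}
```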

Work model

WorkData contains the data common to all types of work, at any stage of the pipeline, including a list of Items and their Locations (in this case, DigitalLocations).

DigitalLocation extends Location and is distinct from PhysicalLocation.

case class DigitalLocation(
  url: String,
  locationType: DigitalLocationType,
  license: Option[License] = None,
  credit: Option[String] = None,
  linkText: Option[String] = None,
  accessConditions: List[AccessCondition] = Nil,
  createdDate: Option[String] = None // new field
) extends Location

Other works with a DigitalLocation

EBSCO and Miro works have neither a digitisation date nor a version. As with linkText, createdDate is an Option and defaults to None when absent or not applicable.

works-source/works-denormalised data.items

"items": [
  {
    "id": {
      "type": "Unidentifiable"
    },
    "locations": [
      {
        "url": "https://iiif.wellcomecollection.org/presentation/v2/b30601241",
        "locationType": {
          "id": "iiif-presentation"
        },
        "license": {
          "id": "pdm"
        },
        "accessConditions": [
          {
            "method": {
              "type": "ViewOnline"
            },
            "status": {
              "type": "Open"
            }
          }
        ],
        "createdDate": "2019-09-13T14:33:15.254Z",
        "type": "DigitalLocation"
      }
    ]
  }
]
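A work can have several items, each with several locations, so one work may carry more than one createdDate. When Elasticsearch sorts on a multi-valued field in descending order it uses the highest value by default; the TypeScript sketch below mirrors that, which can help when checking fixtures against the expected order. LocationJson and ItemJson are a hypothetical subset of the JSON above, and latestCreatedDate is an illustrative name.

```typescript
// Minimal, hypothetical subset of the item JSON shown above.
interface LocationJson {
  type: string;
  createdDate?: string;
}
interface ItemJson {
  locations: LocationJson[];
}

// Most recent createdDate across every location of every item, or undefined
// when no location has one (e.g. EBSCO or Miro works).
function latestCreatedDate(items: ItemJson[]): string | undefined {
  let latest: string | undefined;
  for (const item of items) {
    for (const location of item.locations) {
      const candidate = location.createdDate;
      if (candidate === undefined) continue;
      // Compare as instants rather than strings, so differing precision
      // (with or without milliseconds) cannot skew the result.
      if (latest === undefined || Date.parse(candidate) > Date.parse(latest)) {
        latest = candidate;
      }
    }
  }
  return latest;
}
```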

works-indexed mapping

{
  "works-indexed-2025-08-14": {
    "mappings": {
      "dynamic": "strict",
      "properties": {
        "aggregatableValues": {},
        "debug": {},
        "display": {
          "type": "object",
          "enabled": false
        },
        "filterableValues": {
          "properties": {
            "availabilities": {},
            "contributors": {},
            "format": {
              "properties": {
                "id": {
                  "type": "keyword"
                }
              }
            },
            "genres": {},
            "identifiers": {},
            "items": {
              "properties": {
                "id": {
                  "type": "keyword"
                },
                "identifiers": {
                  "properties": {
                    "value": {
                      "type": "keyword"
                    }
                  }
                },
                "locations": {
                  "properties": {
                    "accessConditions": {},
                    "license": {},
                    "locationType": {},
                    "createdDate": { 
                      "type": "date"
                    }
                  }
                }
              }
            },
            "languages": {},
            "partOf": {},
            "production": {},
            "subjects": {},
            "workType": {}
          }
        },
        "query": {},
        "redirectTarget": {
          "type": "object",
          "dynamic": "false"
        },
        "type": {
          "type": "keyword"
        }
      }
    }
  }
}

NOTE: the display object is not indexed ("enabled": false), which leaves flexibility in what the API returns to the client. In this case it is not necessary to add createdDate to the display object, as there is no plan for it to appear in the "New online" Work card.
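Because createdDate is mapped as "date", a value Elasticsearch cannot parse causes that document to be rejected at indexing time, unless ignore_malformed is enabled on the field. The default date format accepts ISO 8601 strings such as the CREATEDATE values above. A hypothetical pre-indexing guard, deliberately stricter than Elasticsearch, might look like this; ISO_UTC_TIMESTAMP and normaliseCreatedDate are illustrative names.

```typescript
// Full ISO 8601 UTC timestamps, optionally with milliseconds,
// e.g. "2018-04-28T10:33:56Z" or "2019-09-13T14:33:15.254Z".
const ISO_UTC_TIMESTAMP = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d{1,3})?Z$/;

// Hypothetical guard: return the value only if it looks like a timestamp
// the "date" field can index; otherwise drop it (None in the Work model).
function normaliseCreatedDate(raw: string | undefined): string | undefined {
  if (raw === undefined || !ISO_UTC_TIMESTAMP.test(raw)) return undefined;
  return Number.isNaN(Date.parse(raw)) ? undefined : raw;
}
```

Dropping a malformed value keeps one bad METS header from failing the whole document; whether to drop silently or also log it is an open choice.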

Catalogue-api

Once createdDate is part of the Work model and indexed in works-indexed-pipeline-date, we can extend the SearchApi to support sorting on it.

Essentially, we want to return:

  • documents filtered on accessConditions.status = "open"

  • sorted by createdDate, most recent first
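An in-memory equivalent of these two rules can serve as a test oracle for the search, or to order the hard-coded data the component uses today. IndexedWork and newOnlineOrder are hypothetical; works without a createdDate are placed last, matching the intent of "missing": "_last" in the query.

```typescript
// Hypothetical, flattened view of an indexed work.
interface IndexedWork {
  id: string;
  accessStatusIds: string[];
  createdDate?: string;
}

// Keep works with an "open" location, newest createdDate first,
// and works without a createdDate at the end.
function newOnlineOrder(works: IndexedWork[]): IndexedWork[] {
  return works
    .filter((w) => w.accessStatusIds.includes("open")) // returns a copy, so sort does not mutate the input
    .sort((a, b) => {
      if (a.createdDate === undefined) return b.createdDate === undefined ? 0 : 1;
      if (b.createdDate === undefined) return -1;
      return Date.parse(b.createdDate) - Date.parse(a.createdDate);
    });
}
```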

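The Elasticsearch query would need to look something like this:

{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "filterableValues.items.locations.accessConditions.status.id": ["open"]
          }
        }
      ]
    }
  },
  "sort": [
    {
      "filterableValues.items.locations.createdDate": {
        "order": "desc",
        "missing": "_last"
      }
    }
  ]
}

The terms clause sits in a bool filter rather than must, because results are ordered by createdDate and relevance scores are not needed. "missing": "_last" places documents without a createdDate (e.g. EBSCO or Miro works) at the end.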
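For the landing page (4 works) and a paginated New Online page, the request body could be built from a size and an offset. In this sketch, SortClause, NewOnlineQuery and buildNewOnlineQuery are hypothetical names; the terms clause sits in a bool filter because results are sorted by date, so scoring is unnecessary.

```typescript
type SortClause = Record<string, { order: "asc" | "desc"; missing: "_first" | "_last" }>;

interface NewOnlineQuery {
  from: number;
  size: number;
  query: { bool: { filter: Array<{ terms: Record<string, string[]> }> } };
  sort: SortClause[];
}

// Build the Elasticsearch search body: open works only, newest createdDate
// first, works without a createdDate last.
function buildNewOnlineQuery(size: number, from = 0): NewOnlineQuery {
  return {
    from,
    size,
    query: {
      bool: {
        filter: [
          {
            terms: {
              "filterableValues.items.locations.accessConditions.status.id": ["open"],
            },
          },
        ],
      },
    },
    sort: [
      {
        "filterableValues.items.locations.createdDate": { order: "desc", missing: "_last" },
      },
    ],
  };
}
```

The landing page would call buildNewOnlineQuery(4); page n (counting from 0) of the New Online page would use from = n * size.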

Use the existing /works endpoint

We can reuse the existing AccessStatusFilter and add a new SortRequest alongside ProductionDateSortRequest, e.g. CreatedDateSortRequest.

Open items, sorted by most recent createdDate, could then be requested like so:

search/works?items.locations.accessConditions.status=open&sortOrder=desc&sort=items.locations.createdDate
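On the client, the same request could be assembled with the standard URL API rather than by string concatenation. newOnlineWorksUrl is a hypothetical helper; the sort value items.locations.createdDate is the one proposed in this RFC and does not exist in the API yet, and the base URL is supplied by the caller.

```typescript
// Append the proposed New Online filter and sort parameters to a /works URL.
// URLSearchParams takes care of encoding; parameter order is not significant.
function newOnlineWorksUrl(baseUrl: string): string {
  const url = new URL(baseUrl);
  url.searchParams.set("items.locations.accessConditions.status", "open");
  url.searchParams.set("sort", "items.locations.createdDate");
  url.searchParams.set("sortOrder", "desc");
  return url.toString();
}
```

For example, with the public catalogue base URL https://api.wellcomecollection.org/catalogue/v2/works (assumed here as the target), the helper yields ...works?items.locations.accessConditions.status=open&sort=items.locations.createdDate&sortOrder=desc.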
