RFC 055: Genres as Concepts

Genres (type/technique) should be treated in the same manner as Concepts. This includes introducing Concept pages for them that work in a similar manner to those for Agents (where works about and by the Agent are listed separately). A Concept page for a Genre should list works about and using that technique.

What is it like now?

Currently, Genres behave a little like (compound) Subjects, in that a genre may be cracked into its constituent concepts, making a concepts list.

e.g. in b10721599

655  7 Novenas|zMexico.|2rbgenr 

This becomes the following entry in genres, consisting of two concepts, Novenas and Mexico:

{
  "label": "Novenas - Mexico",
  "concepts": [
    {
      "id": "ggsfmp3a",
      "identifiers": [
        {
          "identifierType": {
            "id": "label-derived",
            "label": "Identifier derived from the label of the referent",
            "type": "IdentifierType"
          },
          "value": "novenas",
          "type": "Identifier"
        }
      ],
      "label": "Novenas",
      "type": "Concept"
    },
    {
      "id": "puj4yvts",
      "identifiers": [
        {
          "identifierType": {
            "id": "label-derived",
            "label": "Identifier derived from the label of the referent",
            "type": "IdentifierType"
          },
          "value": "mexico",
          "type": "Identifier"
        }
      ],
      "label": "Mexico.",
      "type": "Place"
    }
  ],
  "type": "Genre"
}

However, unlike Subjects, the Genre-as-a-whole does not have its own name and identifier.

Genre is not one of the types extracted from Works by the Concepts Aggregator. It is also not a type that is currently assigned to a Concept.

The constituent concepts that make up a Genre also behave in a similar manner to Subjects, where they are either a Concept ($a), or a more specific sort of Concept (e.g. Place, Period) depending on which subfield they come from.
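The cracking of a 655 field into a genre entry can be sketched roughly as follows. This is a minimal illustration and not the pipeline's actual transformer: the function names and the SUBFIELD_TYPES table are hypothetical, only the $a and $z rows are confirmed by the example above, and the $x and $y rows are assumptions. Identifier minting is left out.

```python
# Assumed mapping from MARC 655 subfield codes to concept types.
# Only "a" -> Concept and "z" -> Place appear in the example above;
# the "x" and "y" rows are assumptions made for illustration.
SUBFIELD_TYPES = {"a": "Concept", "x": "Concept", "y": "Period", "z": "Place"}


def parse_subfields(field: str) -> list[tuple[str, str]]:
    """Split 'Novenas|zMexico.|2rbgenr' into (code, value) pairs.

    Text before the first '|' is treated as subfield a.
    """
    pairs = []
    for i, chunk in enumerate(field.split("|")):
        chunk = chunk.strip()
        if not chunk:
            continue
        if i == 0:
            pairs.append(("a", chunk))
        else:
            pairs.append((chunk[0], chunk[1:].strip()))
    return pairs


def genre_from_655(field: str) -> dict:
    """Build a genre entry (without ids) from a pipe-delimited 655 field."""
    concepts = [
        {"label": value, "type": SUBFIELD_TYPES[code]}
        for code, value in parse_subfields(field)
        if code in SUBFIELD_TYPES  # skips control subfields such as 0 and 2
    ]
    # The whole-genre label joins the concept labels, dropping trailing full stops.
    label = " - ".join(c["label"].rstrip(".") for c in concepts)
    return {"label": label, "concepts": concepts, "type": "Genre"}
```

Calling genre_from_655("Novenas|zMexico.|2rbgenr") yields the label "Novenas - Mexico" with a Concept and a Place, matching the shape of the entry shown earlier.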

What is in MARC?

  • Genre is extracted from marcTag:655 fields

Proposal - Catalogue Pipeline

The primary Concept that is currently in a genre's concepts list (a Concept of type Concept, derived from the $a subfield) should now become a Concept of type Genre.

Extract LCGFT ids as authoritative identifiers for these Genre concepts.
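As a rough sketch of the retyping step, assuming the primary Concept is always the first entry in the concepts list, the change amounts to the following. The function name is hypothetical and the real change would live in the pipeline's transformer code.

```python
import copy


def retype_primary_concept(genre: dict) -> dict:
    """Return a copy of the genre whose primary Concept has type Genre.

    Assumes the primary concept (from $a) is the first entry in the list.
    """
    result = copy.deepcopy(genre)
    concepts = result.get("concepts", [])
    if concepts and concepts[0]["type"] == "Concept":
        concepts[0]["type"] = "Genre"
    return result
```

For the Novenas - Mexico example, Novenas would become a Genre while Mexico stays a Place.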

Compound Genres as a whole

There are three options for dealing with compound genres as a whole in the data:

  • make them work in the same manner as Subjects

  • treat the whole Genre more explicitly as an extra Concept

  • leave them as they are, on the basis that compound Genres do not exist as things in their own right (recommended)

Like a Subject

The most consistent approach is to treat Genres in the same manner as Subjects. The Genre-as-a-whole becomes Identifiable, and an id is minted for it.

This would require little if any change to the Catalogue API and Work pages.

Currently, the Concepts pipeline and API extract "things that look like Concepts" from anywhere in a Work. This is anything that has one of the known Concept types, and is identified. This currently includes Subjects, and would include Genres with this change.

However, the relationship between a compound subject and its identifier differs from the relationship between a compound genre and its identifier. An identified genre is an atomic unit that can be further embellished by extra subfields, whereas in a subject the various subfields form part of the thing being identified.

In this MARC field, (DE-588)4135467-9 refers to exhibition catalogues, regardless of their subject or museum of origin. The values of x and y are embellishments on the Ausstellungskatalog genre.

655  7  |0(DE-588)4135467-9 |aAusstellungskatalog |xMiltärhistorisches Museum der Bundeswehr |y27.04.2018-30.10.2018 |zDresden. |2gnd-content
650  0 Science|xStudy and teaching (Elementary)|0sh 85118594 

The consistency gained by treating a Genre like a Subject may therefore be confusing.

As an extra Concept

A less consistent, but possibly more future-looking approach would be to treat the whole Genre as an extra concept in its concepts list.

The Genre-as-a-whole stays as it is, unidentified, but bearing the type Genre.

In the case of a compound Genre, a new Concept of type Genre, representing the Genre as a whole, could be inserted as the first concept in the concepts list.
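To make this option concrete, the insertion could look something like the sketch below. It is illustrative only: the function name is hypothetical, and it assumes that a compound genre is one with more than one entry in its concepts list.

```python
def with_whole_genre_concept(genre: dict) -> dict:
    """Prepend a Genre-typed concept representing the whole compound genre."""
    concepts = genre["concepts"]
    if len(concepts) < 2:
        return genre  # simple genres are left untouched
    whole = {"label": genre["label"], "type": "Genre"}
    return {**genre, "concepts": [whole, *concepts]}
```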

This would be inconsistent with the way Subjects are represented, but is a better representation of what is happening in the Concepts pipeline and API.

Rather than Genres being another thing that looks like a Concept to be extracted by the Concepts Pipeline, the Works pipeline would be putting it in the concepts list.

As this would also require a more significant change to the Catalogue API and Works pages to link to the Genre's Concept page, and would result in API data containing inconsistent approaches for Genres vs Subjects or Contributors, it would be better to consider this approach as part of API v3, if desired.

Leave them

Compound Genres are unlike Subjects in that, in the rare situation (three instances) where they are identified in MARC, the identifier refers to the primary Concept within the Genre, and not to the overall Genre. Similarly, LCGFT does not contain compounds.

e.g. from b30834107

655  7  |0(DE-588)4135467-9 |aAusstellungskatalog |xMiltärhistorisches Museum der Bundeswehr |y27.04.2018-30.10.2018 |zDresden. |2gnd-content

As a result, the correct target for a genre link is the Genre of its primary Concept. This is the same UI behaviour as for contributors.

There is some conflict here between the apparent semantics of the three fields. A Genre feels more like a subject, in that the compounds are "things that exist in their own right", whereas the compounding of a Contributor is about the relationship between an Agent and a Work.

Some compound genres seem excessively specific for this purpose, leading to fragmentation. For example, Almanacs has many entries like Almanacs - Pennsylvania - 1773 and Almanacs - Massachusetts - 1702, each with fewer than half a dozen entries, so having Almanacs as the concept page is probably more useful.
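The degree of fragmentation can be gauged by grouping compound labels under their primary concept. The sketch below assumes that labels are joined with " - " and that the primary concept's own label never contains that separator. The function name is hypothetical.

```python
from collections import Counter


def primary_genre_counts(labels: list[str]) -> Counter:
    """Count how many distinct genre labels share each primary concept."""
    return Counter(label.split(" - ")[0] for label in set(labels))
```

Given the labels Almanacs - Pennsylvania - 1773, Almanacs - Massachusetts - 1702 and Poems - 1740, this reports two variants under Almanacs and one under Poems.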

Proposal - Concepts Pipeline

The concepts pipeline will start extracting Genre as one of the types of Concept.

Include LCGFT as an authoritative source.

Proposal - website

Genre links on Work pages will link to the Concept page for their primary Concept. The remaining parts of the compound will still be displayed in the link text, but are not used to further refine that link.
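A minimal sketch of that link behaviour follows, assuming the primary Concept is first in the concepts list. The /concepts/{id} path is a placeholder rather than a confirmed website route, and the function name is hypothetical.

```python
def genre_link(genre: dict) -> dict:
    """Build link data for a genre shown on a Work page.

    The full compound label is displayed, but the target is the
    Concept page of the primary concept only.
    """
    primary = genre["concepts"][0]
    return {"text": genre["label"], "href": f"/concepts/{primary['id']}"}
```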

Rationale

There are two goals for the data that need to be supported by this work.

Determining Genre-ness - When rendering a Concept Page or API result for a Genre, how do we know that it is a Genre, in order to do genre-specific things, e.g. populate a "works of this type" list?

Determining equivalence - The ability to determine that, when one work is about a genre and another work is in that genre, they both refer to the same thing.

Determining Genre-ness with the current data

With the current data, the only way to determine whether a Concept is a Genre would be to assume that all Concepts may be Genres, and perform a text search in the genre.label and genre.concepts.label fields in order to populate the "works in this genre" list.

This is problematic. Because Genres may be compound, searching on genre.concepts will pick up things that are not genres. Searching on genre alone would not allow us to link things that should be linked.

For example: Advertising fliers - England - London - 18th century is a compound genre, where "Advertising fliers" is currently a Concept.

So, either only Advertising fliers - England - London - 18th century is treated as a Genre, meaning that Advertising fliers is not, or all of Advertising fliers, England, London and 18th century behave as genres.

Neither of these solutions is satisfactory.

We could treat only Concepts of type Concept as potential Genres, which would achieve the desired result.

Determining Genre-ness with the proposed data

In the example above, Advertising fliers would be a Genre, so that can be used as the signal to look for other works using that technique.

Determining equivalence with the current data

Currently, because both the primary Concept in a Genre's concept list, and the primary Concept in a Subject's concept list are of type Concept, they are already the same.

However, a compound Genre and an identical compound Subject differ on type, so even if they were identified, they would not be the same.

Determining equivalence with the proposed data

The primary Concept within a genre will be of type Genre.

This breaks that automatic link between the primary Concept of a Subject and that of a Genre. However, this proposal includes a mechanism for determining sameAs relationships in this case.

When requesting works containing a given Concept, the sameAs list will be consulted and the resulting query to Elasticsearch will fetch works containing both the originally requested Concept, and its equivalents.

Both compound and simple concepts will work in a consistent fashion.
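One way the expanded query might be assembled is sketched below. The field names, the shape of the sameAs lookup (a mapping from a concept id to its equivalent ids) and the function name are assumptions for illustration. The bool, should and terms clauses are standard Elasticsearch query DSL.

```python
def works_query(
    concept_id: str,
    same_as: dict,
    fields: tuple = ("genres.concepts.id", "subjects.concepts.id"),
) -> dict:
    """Build a query body matching works that contain the concept or an equivalent."""
    ids = [concept_id]
    for other in same_as.get(concept_id, []):
        if other not in ids:
            ids.append(other)
    return {
        "query": {
            "bool": {
                # A work matches if any of the fields contains any of the ids.
                "should": [{"terms": {field: ids}} for field in fields],
                "minimum_should_match": 1,
            }
        }
    }
```

With a hypothetical lookup of {"genre1": ["concept1"]}, a request for genre1 searches both fields for genre1 and concept1, so a work that is about the genre and a work that is in the genre are returned together.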

Further Considerations

The data exists, and will continue to exist, to allow for filtering on non-genres in the genre.concepts field. For example, it will remain possible to query for genre.concepts=London.

Whether this becomes a feature that gets exposed via the API is out of scope of this RFC, but it is one that can be supported by the data format.

There are 220 different Almanacs (e.g. Almanacs - Pennsylvania - 1765), 89 different Poems (e.g. Poems - 1740), but most top level genres are not compounds.


Last updated 10 months ago

Genre links on Work pages lead to a search for the whole genre in the genres.label field, e.g. Novenas | Mexico in the example above.

  • there are 1,368,770 marcTag:655 fields

  • 1755 of them have an identifier

  • 1688 of those have an identifier in the LCGFT scheme

  • 14905 of them are compounds

  • 3 of those have an identifier (all in the DNB / GND scheme)

Whereas in this MARC field (from b17259654), sh 85118594 refers to the whole subject of "Science|xStudy and teaching (Elementary)". The value of x is part of the value of that subject.

The id in $0 refers to the Ausstellungskatalog genre and not to anything in the x, y, or z subfields.

RFC 054 covers the technique that will be used to match Genre-as-a-Subject (where it can only be a Concept) with Genre-as-a-Genre (where it will be a Genre).