RFC 017: URL Design

This RFC proposes a set of principles for designing URLs on wellcomecollection.org, ensuring they are persistent, user-friendly, and globally unique.

Last modified: 2022-12-09T15:25:52+00:00

Context

URLs are part of the user's online experience: they affect a user's ability to reliably share and reference online resources, and to discover and interact with the content and services we produce.

They also affect how much long-term technical debt we carry: getting the URL scheme wrong means maintaining redirects for the lifetime of the site.

We therefore adopt the following principles:

URLs are designed to be persistent. This is the most important thing to consider when minting new URLs or devising a URL scheme. In practice, this means that URLs MUST NOT include:
a. references to specific technology;
b. dates, unless the URL is about a date;
c. status (old/new/draft etc.);
d. subject, unless the URL is about that subject.
If a URL changes and a semantically equivalent resource exists, we SHOULD redirect to the new URL and maintain that redirect forever.
When content is removed and there is no equivalent resource, we SHOULD return HTTP 410 (Gone) with a link to an archived copy of the resource.
URLs SHOULD be as short as possible, but no shorter. Deeply nested URLs can affect SEO, and long URLs can cause problems when used in emails and similar contexts. Short URLs are therefore preferable, provided they keep enough entropy to support future growth.
There MUST be one URL per thing, and all things have a URL. URLs are there to identify things on the Web, and people use URLs to point to those things. This means:
a. all resources MUST have a unique URL;
b. a URL can't be used to identify two or more resources;
c. all fragments SHOULD dereference, i.e. .../foo#bar should also be addressable at .../foo/bar;
d. fragments (anything after a #) don't count as unique URLs;
e. hash-bang URLs (#!) and other techniques that rely on client-side JS MUST NOT be used.
URLs are globally unique. A user MUST be able to share a URL, and anyone, anywhere in the world, MUST be able to dereference it to the same resource.
URLs can identify things, lists of things, and forms. Query parameters SHOULD be avoided for anything that’s not a list.
URLs should use nouns (never verbs): URLs are for identifying things.
The base of a URL path should be a plural noun (e.g. stories), because it identifies a collection of things. The resource beneath it can be singular.
A resource can be a singleton or a collection, e.g. /stories/$storyID (the URL for a single story) or /stories/by/formats/$format (the URL for all stories in a specific format).
URLs SHOULD be hackable. A user should be able to hack back a URL and get a broader set of resources. For example, it should be possible to hack back the URL for a story, .../stories/$story, to .../stories/ and be returned a list of all stories; .../stories/by/formats/ should return all story formats; and .../stories/by/date/yyyy/mm/dd should be hackable to return all stories published in that year, month or day.
URLs MUST NOT include any personally identifiable information, tracking parameters or state.
All content MUST be served over HTTPS.
URLs MUST be designed alongside the user interface and given the same level of care as any other UI component (possibly more, because they are harder to change). We SHOULD aim for beautiful URLs.
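
The persistence rules (no technology, dates or status in a URL) lend themselves to an automated check before new URLs are published. The following is a minimal sketch, not part of this RFC: the function name, the suffix list, the status words and the year pattern are all hypothetical examples rather than an agreed list, and the rule about subject needs human judgement, so it is not checked.

```python
import re
from urllib.parse import urlsplit

# Hypothetical, illustrative lists; a real check would need agreed values.
# Rule a: file extensions that tie a URL to a particular technology.
TECHNOLOGY_SUFFIXES = (".php", ".asp", ".aspx", ".jsp", ".cgi")
# Rule c: words describing a resource's status rather than the resource.
STATUS_WORDS = {"old", "new", "draft", "beta", "legacy"}
# Rule b: a path segment that looks like a four-digit year.
YEAR_SEGMENT = re.compile(r"^(19|20)\d{2}$")


def persistence_warnings(url: str, about_a_date: bool = False) -> list[str]:
    """Return warnings for path segments likely to force a URL change later.

    An empty list means none of the illustrative rules matched. Rule d
    (subject) needs human judgement and is not checked here.
    """
    segments = [s for s in urlsplit(url).path.split("/") if s]
    warnings = []
    for segment in segments:
        lowered = segment.lower()
        if lowered.endswith(TECHNOLOGY_SUFFIXES):
            warnings.append(f"technology-specific segment: {segment!r}")
        if lowered in STATUS_WORDS:
            warnings.append(f"status word in path: {segment!r}")
        # Dates are only allowed when the URL is about a date.
        if YEAR_SEGMENT.match(segment) and not about_a_date:
            warnings.append(f"date in path: {segment!r}")
    return warnings
```

For example, a path such as /old/index.php would produce two warnings, while a date-browsing URL passes once the caller marks it as being about a date.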
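
The redirect and 410 rules could live in several places (the application router, a CDN, or the web server). As a minimal sketch, assuming hypothetical in-memory tables of redirected and removed paths, a resolver might look like the following. Using 301 (Moved Permanently) is an assumption; 308 is also a permanent redirect. Because redirects are kept forever they tend to accumulate, so the sketch also flattens chains so that an old URL always points straight at the current one.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tables: old path -> replacement path, and removed path ->
# archived copy. Real values would come from the site's routing config.
REDIRECTS = {
    "/articles/abc": "/stories/abc",
    "/blog/abc": "/articles/abc",  # an older redirect that now chains
}
GONE = {
    "/stories/withdrawn": "https://archive.example.org/stories/withdrawn",
}


@dataclass
class Resolution:
    status: int
    location: Optional[str] = None      # set for redirects
    archive_link: Optional[str] = None  # set for 410 Gone


def final_target(path: str) -> str:
    """Follow redirects to the end so old URLs never need multiple hops."""
    seen = {path}
    while path in REDIRECTS:
        path = REDIRECTS[path]
        if path in seen:
            raise ValueError(f"redirect loop involving {path!r}")
        seen.add(path)
    return path


def resolve(path: str, live_paths: set[str]) -> Resolution:
    if path in live_paths:
        return Resolution(200)
    if path in REDIRECTS:
        # Permanent redirect to the semantically equivalent resource.
        return Resolution(301, location=final_target(path))
    if path in GONE:
        # Removed with no equivalent: 410 plus a link to an archived copy.
        return Resolution(410, archive_link=GONE[path])
    return Resolution(404)
```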
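
The fragment rules can be expressed as two small helpers, sketched here with Python's standard urllib.parse module. One drops the fragment when deciding which resource a URL identifies (fragments don't count as unique URLs); the other maps a fragment URL such as .../foo#bar to the path .../foo/bar where it should also be addressable. The function names, and the choice to reject hash-bang fragments with an exception, are illustrative assumptions.

```python
from urllib.parse import urldefrag, urlsplit, urlunsplit


def resource_identity(url: str) -> str:
    """Fragments don't count as unique URLs, so drop them for identity."""
    return urldefrag(url).url


def fragment_to_path(url: str) -> str:
    """Map .../foo#bar to .../foo/bar, where the fragment should also be
    addressable. Hash-bang fragments (#!) are rejected outright."""
    parts = urlsplit(url)
    if not parts.fragment:
        return url
    if parts.fragment.startswith("!"):
        raise ValueError(f"hash-bang URLs are not allowed: {url}")
    path = parts.path.rstrip("/") + "/" + parts.fragment
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))
```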
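
A link-cleaning step can help enforce the tracking-parameter and HTTPS rules before URLs are shared or stored. This is a minimal sketch that assumes a hypothetical, non-exhaustive list of tracking parameters (utm_* plus a few common click identifiers). It cannot detect personally identifiable information in general, and rewriting the scheme of a link does not replace serving the site over HTTPS and redirecting HTTP requests.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical, non-exhaustive examples of tracking parameters.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"gclid", "fbclid", "mc_cid", "mc_eid"}


def is_tracking(key: str) -> bool:
    key = key.lower()
    return key.startswith(TRACKING_PREFIXES) or key in TRACKING_KEYS


def clean_url(url: str) -> str:
    """Drop tracking parameters and upgrade http to https.

    Remaining query parameters keep their original order.
    """
    parts = urlsplit(url)
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if not is_tracking(k)
    ]
    scheme = "https" if parts.scheme == "http" else parts.scheme
    return urlunsplit((scheme, parts.netloc, parts.path, urlencode(query), parts.fragment))
```

Because urlencode re-encodes the remaining values, their encoding may differ slightly from the input (for example, spaces become +).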

We are publishing a website, not a book: make links, link between things, and make those links hackable (whether or not they are linked to yet).
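
Hackability can also be checked mechanically: every URL reached by removing trailing path segments should itself resolve. A minimal sketch, assuming a hypothetical resolves callback that reports whether the site serves a given path; the sketch also omits trailing slashes, and which form is canonical is an assumption here.

```python
from typing import Callable


def hack_back(path: str) -> list[str]:
    """Return the broader URLs reached by removing trailing path segments,
    most specific first."""
    segments = [s for s in path.split("/") if s]
    return ["/" + "/".join(segments[:i]) for i in range(len(segments) - 1, 0, -1)]


def unhackable(path: str, resolves: Callable[[str], bool]) -> list[str]:
    """List the hacked-back URLs that the site does not serve."""
    return [p for p in hack_back(path) if not resolves(p)]
```

For the date example in the principles, a site serving /stories, /stories/by/date and the year and month pages beneath it would have only /stories/by reported; whether that intermediate path should resolve is a design choice this sketch doesn't make.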

