Data Model

Introduction

This is a description of the data model for catalogues chosen by The National Library of Sweden (KB). It has taken form during the construction of our new cataloguing system. The design is guided by various Linked Data principles and patterns, and is comprised of:

  • A Data Structure based on RDF.
  • A Domain Vocabulary defining the terms used for describing our business entities with this data model (their types, properties and relations). This is defined using RDFS with some use of OWL.
  • A Conversion Mechanism for turning data in MARC into the above structure and terms.

Requirements

  1. Open up for improvements including (but not limited to):

    • reuse data succinctly, avoiding costly repetition;
    • extend classes, properties and definitions/terms/enumerations for new content and carrier types and use cases (e.g. accessibility);
    • coordinate multiple identifiers to facilitate distributed collaboration.
  2. Continue to provide MARC21 to member libraries of the LIBRIS collective, to ensure their business can continue uninterrupted.

Domain Vocabulary

We have opted to use one, single, unified namespace for our bibliographic descriptions, agents, concepts, events, places and "intangible" terms (definitions). This core vocabulary has the IRI https://id.kb.se/vocab/ and will be referred to as KBV from hereon. In examples, we will use the prefix kbv: for this purpose:

prefix kbv: <https://id.kb.se/vocab/>

This is an Umbrella Vocabulary, where concepts are equivalent to known models, tailored for our specfic application needs. These include:

  • labelling,
  • restricted domains/ranges,
  • display lenses for ordered properties,
  • mappings for wider alignments,
  • legacy terms for interoperability with otherwise unmapped data.

Alignments

This vocabulary extends BibFrame, which is used as the conceptual foundation for the design. This gives us some leeway for changes (provided that we knowingly maintain this mapping). We find that BibFrame in version 2.0 is quite close to the shapes we've been forming in our development. It is a bit lacking in certain details though, and maintains some idiosyncrasies which we still consider as open issues.

(This is similar to the original design of Schema.org. While the business case is quite (albeit not fundamentally) different, the scope of Schema.org is vaster in scope (but not in depth). This approach shows that lots of terms can, if judiciously named and coordinated, fit in one design. We have the possibility at any time to divide our namespace, if we encounter a limit somewhere down the line.)

(On the BibFrame mailing list, it seems that LoC themselves plan a similar approach:

Instead, LC will define an LC-specific ontology for our bibframe implementation, which will import the bibframe ontology as well as the relators.

Source: https://listserv.loc.gov/cgi-bin/wa?A2=BIBFRAME;da10a154.1612.)

Design Process

We base our vocabulary explicitly on BibFrame 2.0, by The Library of Congress.

Where we need more generic or specialized terms, we carefully choose from sensible definitions that are in accordance with our general design, primarily in:

For compatibility with common forms of published bibliographic linked data, we also strive to coordinate with:

For cases in our MARC21 data where we cannot map to anything from above, we defined terms in our own "MARC vocabulary". This is done to preserve information we cannot readily discard. Our long-term ambition is to get away from those forms of expression.

It is possible that we also define new, unmapped terms we see need for in the main vocabulary. We may eventually work to standardize and promote these, preferably in BibFrame, if that remains viable long-term.

Domain Coverage

  1. Administrative skeleton

    (While the core Record structure is domain independent, certain fairly domain-depdentend details may appear here in the form of properties such as status, and provenance details such as modifying library, (data) sources and licensing.)

  2. Bibliographic records in the MARC sense are partitioned roughly like this in the new domain model:

    • Instance
    • Work (data at the Expression and Work levels in the FRBR sense, where we currently link to UniformWork:s using expressionOf (derived from Name+Title and Title Authority records).)
  3. Authority Data:

    • UniformWork
    • Person
    • Organization
    • Meeting
    • Event
    • GenreFormTerm, TopicalTerm, ...
  4. Holding Data:

    • Item
  5. The remaining data constitutes an extension of the old Authority notions. These items describe:

    • the main vocabulary
    • Support vocabularies, blending into...
    • various enumerations
    • remaining definitions of intangibles

In MARC, these items are represented by:

  • fixed definitions: "enumerations"
  • code lists: Language, Country, Nationality and Relator.

We currently mint URIs for these in the following spaces:

Data Structure

The foundation is RDF, upon which we apply usage patterns. Judicious use of RDF Schema and OWL is applied to formally document the properties, classes and their relationships.

Upon the Basic RDF structure, we have applied a couple of conventions and constructs (design patterns) (see also LD-Patterns).

Language Tags

The use of language tags has been kept to a minimum. We use them mainly for vocabulary terms and related "intangible" definitions.

See also Literal Structures Design Issue.

The Use of BNodes

BNodes are used for Qualified Roles and Structured Values.

We also (currently) use bnodes for locally described resources for which there is no authoritative URI. If need arise, we might assign fragment identifiers for these, in order to allow linking and further description of these "unknowns". We do not recommend this use at this time though, since it might lead to misunderstandings (such as seemingly knowing the identity of something – and refering to that elsewhere – which turns out to be false).

To be clear, the distinction between structured values and roles (e.g. hasTitle, contribution) is determined by nodes belonging to one of those classes (including subclasses thereof), not by the presence/absence of an identifier on the node.

RDFS and OWL Semantics

We use owl:sameAs pervasively to handle aliases and legacy IDs.

Distinct SubClasses

Similar to common vocabularies like Schema.org and BibFrame. We do some more lavish subclassing to capture diversities seen in our data.

Distinct SubProperties

  • Use distinct subproperties where it is apparently usable for interfaces (finding, understanding and editing descriptions). Example: publication subPropertyOf provisionActivity.

Property Chain Axioms As Shorthand Forms

owl:propertyChainAxiom is used for defining simple properties out of chained statements.

We also use them to be able to assign simple labels to such chains, for use in search, presentation and editing interfaces.

For example, connecting dcterms:publisher, dcterms:issued from a bf:provisionActivity sub-property with rdfs:range bf:Publication, chained to bf:agent and bf:date respectively.

(The astute observer should notice that this is only formally permitted in an OWL Full Regime, since the rdfs:domain of owl:propertyChainAxiom is owl:ObjectProperty, and we readily apply this on owl:DatatypeProperty properties. Our application is admittedly novel, and as with most of our OWL usage, we use it primarily for structured documentation and custom implementations, and only in theory do we consider the actual use of OWL reasoners for utilizing this information. We do on occasion test this model data with examples put through reasoners, but it is not part of our operations.)

Extended Design

The following designs are inspired by (and where applicably explicitly specializations of) designs found in the PROV and Schema.org vocabularies. We found them palatable and pragmatic.

Qualified Roles

This is an application of qualified forms (as also exemplified in PROV qualified terms and schema:Role.)

We can use owl:propertyChainAxiom here too, to correlate these with simple links and properties.

Types/Aspects

We avoid multiple types in the data we create (including data mapped from MARC21). We find the implementation of services and interfaces to be far more simple by doing so.

Related to this, we put aspects of "type" into the distinct "RDA" properties. This is mainly due to the partitioning of type information in BibFrame into both an intrinsic type, and additional aspects given using bf:content (for bf:Work entities) and bf:media and bf:carrier (for bf:Instance entities).

For discussions about the merit of this, see Content/Carrier Design Issue.

Typed Enumerations

(c.f. Schema.org Enumeration and skos:Concept / skos:Collection)

Used for our definitions (Vocabulary, Relators, Language, Country, Nationality).

For more thought on this, see Concept/Thing Design Issue.

Structured Values

Base on schema:StructuredValue.

  • Complex Structured Values (beyond using rdf:value, e.g. bf:mainTitle)

  • Using property-chain-axioms to "shrink" these.

  • for granularity

  • datatypes/structured value folding (@value <-> rdf:value)

For more thought on this, see Concept/Thing Design Issue.

See also Literal Structures Design Issue.

Example: Annotated Identifiers (BF)

Given:

:identifiedBy a owl:ObjectProperty .

:isbn a owl:DatatypeProperty;
    owl:propertyChainAxiom (
        [ rdfs:subProperty of :identifiedBy; rdfs:range :ISBN ]
        rdf:value
      ) .

These are considered equivalent for value access:

<> :identifiedBy [ a :ISBN ;
            ns1:value "123-456-789-0" ;
            :note "print" ] .

<> :isbn "123-456-789-0" .

Record and mainEntity

Structure:

GRAPH <record> {

<record>
    :mainEntity <thing> .

}

In the search index, we promote the things, and use our inverse relation, meta to point to the record.

<thing>
    :meta <record> .

Records are Administrative Information Resources

Records only carry library business information. They are not skos:Concepts (the latter are just very abstract and/or naive notions of things.)

For discussions about this, see Record/Thing Design Issue.

Further Development

We see the direction of BibFrame as a gradual step towards utilizing RDF more effectively in the library industry. RDF is merely a tool (which at worst just exposes disconnectedness and arbitrary divisions hidden in isolated structures and patterns). It must be emphasised that future coordination, simplification and establishing of simpler patterns of expression has to happen, in order for this path to yield value both for usage, user experience and service, and for efficiency in cataloguing (in the widest sense).

The only practical means for paving the path for such work that we can see, is to make certain mappings explicit, so that designs can converge. Depending on the discussions we have here, we can elaborate more on the list. Right now, interested parties should feel free to explore the ongoing construction of our vocabulary.

Further Reading

Design and Vocabulary Issues

See Issues.

Related Standards

See Standards.

Services

This model is used to build our services.