BnF

Semantic Web and data model

The semantic web in the data.bnf.fr project


The data.bnf.fr project is part of BnF's move towards open data. It follows the approach defined by the W3C for the “semantic web”, also called “linked data”.
The aim is to structure resources so that machines can reuse them more easily. The data.bnf.fr project uses data created in various formats, such as InterMarc for the main catalogue, XML-EAD for archival inventories and Dublin Core for the digital library.
This data is automatically gathered, modeled, enriched and published in RDF, the semantic web language. The result is available on the website in several RDF syntaxes: RDF/XML, N3, N-Triples (NT) and JSON.

The Bibliothèque nationale de France provides:

  • URIs for resources: all resources have permanent identifiers, assigned through the ARK scheme, which make it possible to find every resource of the library.
  • a display of data in RDF as “linked open data”, available for every page and for the whole database.

Data.bnf.fr and Gallica won the Stanford Prize for Innovation in Research Libraries (SPIRL). See the complete Stanford report.

The software is CubicWeb.

CubicWeb is an open source platform for semantic web applications, released under the LGPL licence.

CubicWeb won the Dataconnexions 2013 awards, organized by Etalab, the mission for the promotion of open data under the authority of the French Prime Minister.

How to retrieve data.bnf.fr data


You can retrieve or request our data:

  • by clicking on the RDF icon, at the bottom of the pages
  • by adding one of the following suffixes to the URL, according to the format needed: rdf.nt, rdf.n3 or rdf.xml. For instance:
    http://data.bnf.fr/11928016/jules_verne/rdf.xml
    http://data.bnf.fr/11928016/jules_verne/rdf.nt
    http://data.bnf.fr/11928016/jules_verne/rdf.n3
  • via content negotiation from the page URL, using an RDF-aware web browser.
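
The suffix rule can be sketched in a few lines of Python. This is only an illustration based on the example URLs above; the function name and the dictionary keys are hypothetical labels, not part of any data.bnf.fr API.

```python
# Suffixes copied from the example URLs above; the keys are illustrative labels.
RDF_SUFFIXES = {
    "rdfxml": "rdf.xml",
    "nt": "rdf.nt",
    "n3": "rdf.n3",
}


def rdf_url(page_url: str, fmt: str) -> str:
    """Return the URL of one RDF serialization of a data.bnf.fr page."""
    # Normalize the trailing slash so the suffix is appended exactly once.
    return page_url.rstrip("/") + "/" + RDF_SUFFIXES[fmt]


if __name__ == "__main__":
    for fmt in RDF_SUFFIXES:
        print(rdf_url("http://data.bnf.fr/11928016/jules_verne/", fmt))
```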

A dump of all data in data.bnf.fr is also available on our FTP server (host: echanges.bnf.fr, port: 21, login: databnf, password: databnf) and via HTTP: whole RDF dump (RDF/XML).

This dump is split into specific dumps:

See the Open License, to reuse the data.

Last update: 2014/07/26.

Links to external data sets and repositories

Our data is linked to the equivalent pages in other data repositories and matched to external data sets.

data.bnf.fr is aligned with data sets that are found in CKAN, in particular DBpedia and VIAF.

RAMEAU subject headings are matched to:

Data about authors is also linked to:

Data.bnf.fr also matches BnF language and country codes to id.loc.gov and subject headings to dewey.info, and uses the DCMI Type vocabulary to specify the types of documents.

ARK identifiers and URIs

BnF identifies bibliographic and authority descriptions and digital documents with ARK identifiers.

This identifier is built this way:

More information about ARK identifiers at BnF (French).

This record identifier is also used to link different records and different BnF databases together.
Example:
the record http://catalogue.bnf.fr/ark:/12148/cb30625225 is linked to the record "Victor Hugo":
100 $311907966 $w.0..b.....$aHugo$mVictor$d1802-1885$40070

In data.bnf.fr, URIs are built with the ARK of the authority record in the Main catalogue. They identify the concepts that are described with the skos:Concept class in our data model.

Example:
the authority record for Victor Hugo, http://catalogue.bnf.fr/ark:/12148/cb11907966z, and the "concept" of Victor Hugo in data.bnf.fr, http://data.bnf.fr/ark:/12148/cb11907966z, are built on the same ARK identifier.
These identifiers are permanent, HTTP-based and actionable, and they make our pages available on the semantic web.
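
As a small illustration of the shared identifier, the following Python sketch extracts the ARK from a URL and checks that two URLs rest on the same one. The regular expression is an assumption inferred from the examples on this page, not an official ARK grammar, and the function names are hypothetical.

```python
import re

# Assumed shape, inferred from the examples: "ark:/", a numeric naming
# authority, "/", then a lowercase alphanumeric name.
ARK_PATTERN = re.compile(r"ark:/\d+/[0-9a-z]+")


def extract_ark(url: str) -> str:
    """Return the ARK identifier embedded in a URL."""
    match = ARK_PATTERN.search(url)
    if match is None:
        raise ValueError(f"no ARK identifier found in {url!r}")
    return match.group(0)


def same_ark(url_a: str, url_b: str) -> bool:
    """Tell whether two URLs are built on the same ARK identifier."""
    return extract_ark(url_a) == extract_ark(url_b)


if __name__ == "__main__":
    print(same_ark(
        "http://catalogue.bnf.fr/ark:/12148/cb11907966z",
        "http://data.bnf.fr/ark:/12148/cb11907966z",
    ))
```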

Web redirection and content negotiation

To make pages easier for search engines to index, data.bnf.fr URLs have explicit labels.
The URLs of work, author and theme pages are built this way: http://data.bnf.fr/ID/label

Example: http://data.bnf.fr/11907966/victor_hugo/

There is an HTTP redirection mechanism from ARK identifiers and URIs to these URLs:


http://data.bnf.fr/ark:/12148/cb11907966z via HTTP 303 leads to http://data.bnf.fr/11907966/victor_hugo/
http://data.bnf.fr/11907966 via HTTP 303 leads to http://data.bnf.fr/11907966/victor_hugo/
http://data.bnf.fr/11907966/victor_hugo via HTTP 301 leads to http://data.bnf.fr/11907966/victor_hugo/
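
The three redirection rules can be modeled as a small Python function. This is a sketch of the behaviour described above, not the site's implementation: the `LABELS` lookup is hypothetical, and the assumption that an ARK name is "cb", eight digits and one check character is inferred from the examples on this page.

```python
import re

# Hypothetical lookup from numeric ID to page label.
LABELS = {"11907966": "victor_hugo"}


def redirect(path: str):
    """Return (HTTP status, target path), or None when no redirect applies."""
    # ARK form: assumed to be "cb" + 8 digits + 1 check character.
    ark = re.fullmatch(r"/ark:/12148/cb(\d{8})[0-9a-z]", path)
    if ark:
        ident = ark.group(1)
        return 303, f"/{ident}/{LABELS[ident]}/"
    # Bare numeric ID.
    bare = re.fullmatch(r"/(\d{8})", path)
    if bare:
        ident = bare.group(1)
        return 303, f"/{ident}/{LABELS[ident]}/"
    # Labelled URL without its trailing slash.
    if re.fullmatch(r"/\d{8}/[a-z0-9_]+", path):
        return 301, path + "/"
    return None


if __name__ == "__main__":
    for p in ("/ark:/12148/cb11907966z", "/11907966", "/11907966/victor_hugo"):
        print(p, "->", redirect(p))
```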

We also have a content negotiation mechanism:
http://data.bnf.fr/11907966/victor_hugo/ returns a representation of the page that depends on the HTTP request headers.
For instance:


http://data.bnf.fr/11907966/victor_hugo/fr.html
http://data.bnf.fr/11907966/victor_hugo/en.html
http://data.bnf.fr/11907966/victor_hugo/rdf.xml
http://data.bnf.fr/11907966/victor_hugo/rdf.n3
http://data.bnf.fr/11907966/victor_hugo/fr.pdf
http://data.bnf.fr/11907966/victor_hugo/en.pdf
The RDF files have no language variants.
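
A client can ask for a given representation by setting the Accept header. The sketch below only builds the request with Python's standard library and does not send it. `application/rdf+xml` is the registered media type for RDF/XML, but which media types the server actually honours is an assumption here, and the function name is hypothetical.

```python
from urllib.request import Request


def negotiated_request(page_url: str, media_type: str) -> Request:
    """Build (without sending) a GET request asking for one representation."""
    return Request(page_url, headers={"Accept": media_type})


if __name__ == "__main__":
    req = negotiated_request(
        "http://data.bnf.fr/11907966/victor_hugo/", "application/rdf+xml"
    )
    print(req.get_method(), req.full_url, req.get_header("Accept"))
    # To actually send it: urllib.request.urlopen(req)
```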

The FRBR model

Data.bnf.fr is developed in the context of recent evolutions in bibliographic description: it experiments with and adapts the FRBR (Functional Requirements for Bibliographic Records) model, developed by IFLA (International Federation of Library Associations and Institutions).
The model has three entity groups which are linked together by relationships: information about documents, persons and organizations, and subjects.

  • “Work” pages

The first group of the FRBR model describes the different aspects of an intellectual or artistic creation and distinguishes four levels: work, expression, manifestation and item.

The work level is about the intellectual and artistic creation. For instance: Le colonel Chabert by Honoré de Balzac. “Work” pages are created using the related authority records from BnF Main Catalogue.

The expression level is a version of a work. In data.bnf.fr, it carries the type of document, the language of the document and the relation to a contributor (illustrator, translator, author of a preface…). Example: http://data.bnf.fr/ark:/12148/cb32262848x#frbr:Expression.

The manifestation level is the physical embodiment of a work. For instance an edition of Les Misérables like “Nouvelle impression illustrée. 1879-1882. Paris. E. Hugues”. This level corresponds to the bibliographic record in BnF Main catalogue, or to a manuscript that is identified by a label in the Archives and Manuscript Catalogue (BnF archives et manuscrits).

There can be a part-whole relationship between a work and another work. For example, Le Père Goriot (Honoré de Balzac) is part of the work Scènes de la vie privée, by the same author; both are considered works and have a page in data.bnf.fr (http://data.bnf.fr/ark:/12148/cb427567440).

  • “Author” pages:

A person or an organization can be either the “author” of a work (then there is a link between the “author” page and the related “work” page) or “contributor” of an expression (translator, preface writer, librettist…).
Nevertheless, as the expression level is not distinguished from the manifestation level in the HTML pages of data.bnf.fr, contributors appear only at the manifestation level. The different creation and contribution roles are listed in a BnF repository, in the Intermarc format, and in the Library of Congress repository, in MARC. This data enriches the RDF of the pages.

Link to the Intermarc code list for relators and creators (BnF).

Link to the Marc code list for relators of the Library of Congress.

  • “Subject” pages

Among the retrievable data are subject records from the Bibliothèque nationale de France (RAMEAU, the French subject indexing language). They were converted into SKOS (Simple Knowledge Organization System), an RDF vocabulary, in the context of the European project TELplus. This repository is now updated on data.bnf.fr with the whole current database from the Bibliothèque nationale de France.
In order to get dereferenceable URIs on our website, URIs from the initial project, such as http://stitch.cs.vu.nl/vocabularies/rameau/ark:/12148/cb12650268p, have to be converted to simple and uniform URIs made of the root http://data.bnf.fr and the ARK identifier of the subject authority record.
For instance:
The URI http://stitch.cs.vu.nl/vocabularies/rameau/ark:/12148/cb12650268p, for the subject “ornithologie”, will be replaced by: http://data.bnf.fr/ark:/12148/cb12650268p.
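
The conversion amounts to swapping the URI root while keeping the ARK. A minimal Python sketch, with a hypothetical function name:

```python
OLD_ROOT = "http://stitch.cs.vu.nl/vocabularies/rameau/"
NEW_ROOT = "http://data.bnf.fr/"


def to_databnf_uri(old_uri: str) -> str:
    """Replace the TELplus-era root with the data.bnf.fr root, keeping the ARK."""
    if not old_uri.startswith(OLD_ROOT):
        raise ValueError(f"unexpected URI root: {old_uri!r}")
    return NEW_ROOT + old_uri[len(OLD_ROOT):]


if __name__ == "__main__":
    print(to_databnf_uri(
        "http://stitch.cs.vu.nl/vocabularies/rameau/ark:/12148/cb12650268p"
    ))
```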
Manifestations which have a RAMEAU term as a subject are brought together in the appropriate “subject” page.

Moreover the site holds pages that gather works and manifestations about a work or an author. These pages are not indexed by search engines and are available from the “work” or “author” pages.
For instance: on the page “Napoleon”, there is a link towards a page presenting documents about Napoleon such as Vie de Napoléon Buonaparte, 1827.

Alignments and clustering by work

In “work” and “author” pages, all manifestations by a single author are gathered around that author's works, thanks to the explicit link to the title authority record (Titre Conventionnel or TIC, in French) inside the original bibliographic record.

However, some manifestations are not linked to a title authority record and remain “orphans”. To improve the way our data is expressed in FRBR and to provide a better service to the public, it is important to align these orphan manifestations, that is, to bring them together around the corresponding work.

Example:
Bibliographic record (BnF) with a link to the title authority record “Fables” and to the author authority record “Jean de La Fontaine”.
Bibliographic record (BnF) without any link to the title authority record “La cigale et la fourmi” but with a link to the author authority record “Jean de La Fontaine”.

That is why we have already produced simple alignments in data.bnf.fr. When a manifestation is explicitly linked to an author authority record in the bibliographic record, and when the title string of this manifestation is exactly the same as the work’s title, the manifestation is aligned with the work.
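
The simple alignment rule can be sketched as follows in Python. The classes, field names and identifiers (`"author-1"`) are hypothetical stand-ins for the catalogue records; only the rule itself (same author link, identical title string) comes from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Work:
    author_id: str  # identifier of the author authority record
    title: str      # title carried by the title authority record


@dataclass
class Manifestation:
    author_id: Optional[str]  # None when no author link exists
    title: str


def simple_align(manifestation: Manifestation, works: List[Work]) -> Optional[Work]:
    """Return the work with the same author link and an identical title, else None."""
    if manifestation.author_id is None:
        return None
    for work in works:
        if work.author_id == manifestation.author_id and work.title == manifestation.title:
            return work
    return None


if __name__ == "__main__":
    works = [Work("author-1", "Fables"), Work("author-1", "La cigale et la fourmi")]
    orphan = Manifestation("author-1", "La cigale et la fourmi")
    print(simple_align(orphan, works))
```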

Yet, after this simple alignment, many manifestations remain orphans. In the long term, two solutions are possible:

  • alignment: linking manifestations to a work, which has its own title authority record and, thus, its own page.
    We combine simple and advanced alignment techniques (word prefix, exact match, words within a distance X, Levenshtein distance, matching algorithm) to determine whether two character strings correspond to the same work. The link to the author authority record remains essential to align works.
  • clustering: if there is no title authority record, some manifestations are gathered around a new documentary unit. This work is now in progress at BnF.
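
Of the techniques listed for alignment, the Levenshtein distance is the easiest to illustrate. The Python sketch below is a textbook implementation, not BnF's code; the `titles_match` helper and its threshold of 2 are hypothetical, since the page does not say how the techniques are combined or tuned.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def titles_match(title_a: str, title_b: str, max_distance: int = 2) -> bool:
    """Hypothetical rule: two titles match if they are within max_distance edits."""
    return levenshtein(title_a.casefold(), title_b.casefold()) <= max_distance


if __name__ == "__main__":
    print(levenshtein("Les Miserables", "Les Misérables"))
    print(titles_match("Fables", "Contes"))
```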

RDF datamodel

The data model is presented here:

[Figure: data.bnf.fr ontology schema]

See also: model for places (geographic records)

Ontologies and vocabularies

We preferred to reuse existing vocabularies in order to foster interoperability.

Prefix              URI
bibo                http://purl.org/ontology/bibo/
bio                 http://vocab.org/bio/0.1/
dc                  http://purl.org/dc/elements/1.1/
dcmi-box            http://dublincore.org/documents/dcmi-box/
dcterms             http://purl.org/dc/terms/
foaf                http://xmlns.com/foaf/0.1/
frbr-rda            http://rdvocab.info/uri/schema/FRBRentitiesRDA/
geo                 http://www.w3.org/2003/01/geo/wgs84_pos#
geonames            http://www.geonames.org/ontology#
ign                 http://data.ign.fr/ontology/topo.owl#
insee               http://rdf.insee.fr/geo/
isni                http://isni.org/ontology#
marcrel             http://id.loc.gov/vocabulary/relators/
mo                  http://musicontology.com/
ore                 http://www.openarchives.org/ore/terms/
owl                 http://www.w3.org/2002/07/owl#
rdagroup1elements   http://rdvocab.info/Elements/
rdagroup2elements   http://RDVocab.info/ElementsGr2/
rdarelationships    http://rdvocab.info/RDARelationshipsWEMI/
rdfs                http://www.w3.org/2000/01/rdf-schema#
skos                http://www.w3.org/2004/02/skos/core#
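
The prefixes listed above are shorthand for full URIs. As an illustration, this Python sketch expands a prefixed name such as `skos:Concept`; it copies only three of the prefixes, and the `expand` helper is hypothetical.

```python
# A few of the prefixes listed above; extend the mapping as needed.
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name like 'skos:Concept' into a full URI."""
    prefix, _, local_name = prefixed_name.partition(":")
    return PREFIXES[prefix] + local_name


if __name__ == "__main__":
    print(expand("skos:Concept"))
    print(expand("foaf:name"))
```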

BnF ontology: for specific needs, BnF created classes and properties, published at this address: http://data.bnf.fr/ontology/bnf-onto/.

BnF vocabularies: BnF-specific vocabularies are displayed at this address: http://data.bnf.fr/vocabulary-en

Embedded data: Schema.org and Opengraph Protocol

“Author”, “work” and “subject” pages are open on the Web and can be reached by search engines.
This is why, apart from the traditional methods used for indexing the homepage, we have chosen to embed two kinds of data to structure these pages:

  • Schema.org provides a vocabulary for adding information to the HTML content, in the microdata format, to improve indexing by search engines. data.bnf.fr uses http://schema.org/Person, http://schema.org/Organization, http://schema.org/Book, http://schema.org/Place and http://schema.org/TheaterEvent.

  • Opengraph Protocol (OG), so that the pages can be represented in social networks. It is a very simple vocabulary, encoded in RDFa, whose metadata is retrieved when a user adds the resource to their Facebook profile.

Mappings from the MARC and EAD formats to RDF

Here is a mapping from the MARC formats (Intermarc and Unimarc) and EAD, as used at BnF, to RDF, as implemented in data.bnf.fr.

Data.bnf.fr relies on bibliographic data that is structured and linked together, in order to build pages about authors, works and themes. In particular:

  • The structure: fields and sub-fields in the MARC format.
    For instance, the page gathering all documents about an author or a work is created automatically, using all bibliographic records that are linked in field 6XX to the authority record of a person or a work (6XX is the field for subject indexing in Intermarc, the MARC variant adapted for BnF).
  • The links: between bibliographic records (main catalogue) or finding aids (for archives and manuscripts) and the authority records.
    Thanks to reliable links to person and work authority data, we can gather the bibliographic descriptions of documents in pages about authors and works.
    Example: the description of the edition "l’Alchimiste" (http://catalogue.bnf.fr/ark:/12148/cb31009441) by Alexandre Dumas is linked to the authority record of Alexandre Dumas (http://catalogue.bnf.fr/ark:/12148/cb119010630), in the field 100 $3 (Intermarc).
    In Intermarc: 100 $311901063 $w.0.2b.....$aDumas$mAlexandre$d1802-1870$40070
  • Role codes specifying these links:
    The different activities that can be found in pages about an author correspond to the different roles that persons or organizations can have on a document (translator, author of a preface, illustrator…).
    They come from the role codes that specify the link between a bibliographic record and an authority record.
    These codes are displayed here: http://data.bnf.fr/vocabulary/roles.
    Example: Baudelaire translated "Dix contes d'Edgar Poe" (http://catalogue.bnf.fr/ark:/12148/cb311263053).
    The bibliographic record of the document is linked to the authority record of Charles Baudelaire, with the role code "0680", which means "translator".
    700 $311890582 $w 0 b.....$aBaudelaire$mCharles$d1821-1867$40680
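
To show how such a field carries both the link and the role, here is a minimal Python sketch that splits the Intermarc line above into subfields. It is not a full MARC parser: it assumes that values never contain "$", and reading $3 as the linked authority record and $4 as the role code is inferred from the two examples on this page. The role table holds only the one code documented here.

```python
# Only the role code documented on this page; see the roles vocabulary for others.
ROLE_LABELS = {"0680": "translator"}


def parse_field(line: str):
    """Split 'TAG $xvalue$yvalue...' into (tag, [(subfield code, value), ...])."""
    tag, _, rest = line.partition(" ")
    subfields = []
    for chunk in rest.split("$")[1:]:
        if chunk:  # skip empty chunks produced by a doubled "$"
            subfields.append((chunk[0], chunk[1:].strip()))
    return tag, subfields


if __name__ == "__main__":
    tag, subfields = parse_field(
        "700 $311890582 $w 0 b.....$aBaudelaire$mCharles$d1821-1867$40680"
    )
    values = dict(subfields)
    print(tag, values["3"], values["a"], ROLE_LABELS.get(values["4"], "unknown role"))
```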