The data.bnf.fr project is part of our move towards open data. It follows the approach defined by the W3C for the “semantic web”, or “linked data”.
The aim is to structure resources so that machines can reuse them more easily. The data.bnf.fr project uses data created in various formats, such as InterMarc for the main catalogue, XML-EAD for archival inventories and Dublin Core for the digital library.
This data is automatically gathered, modelled and enriched, then published in RDF, the semantic web language. The result is available on the website in different RDF syntaxes: RDF-XML, RDF-N3, and RDF-NT.
Part of the data is matched with external value vocabularies: id.loc.gov for languages and nationalities, dewey.info for subjects, and DCMI Type for document types.
The data is also matched with data sets that are identified by CKAN: DBpedia and VIAF.
and HTTP: full rdf dump (rdf/xml)
The licence to use our data is available here.
CubicWeb is an open source platform for semantic web applications, released under the LGPL licence.
Data.bnf.fr is being developed in the context of recent changes in bibliographic description, by experimenting with and adapting the FRBR (Functional Requirements for Bibliographic Records) model, developed by IFLA (International Federation of Library Associations and Institutions).
The model has three entity groups which are linked together by relationships: information about documents, persons and organisations, and subjects.
The first group of the FRBR model describes the different aspects of an intellectual or artistic creation and distinguishes four levels: work, expression, manifestation and item.
The work level is about the intellectual and artistic creation. For instance: Le colonel Chabert by Honoré de Balzac. “Work” pages are created using the related authority records from the BnF Main Catalogue.
The expression level (different versions of the work, such as a translation, an adaptation or an abridgment) does not appear in the HTML pages but can be seen in the corresponding RDF pages.
The manifestation level is the physical embodiment of a work. For instance an edition of Les Misérables like “Nouvelle impression illustrée. 1879-1882. Paris. E. Hugues”. The manifestations are listed in the documentary unit and gathered in the section entitled “Vie et éditions de l’œuvre” (Life and editions of the work). This level corresponds to the bibliographic record in the BnF Main catalogue, or to a manuscript that is identified by a label in the Archives and Manuscript Catalogue (BnF archives et manuscrits).
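The work, expression and manifestation levels described above can be pictured as nested records. The following is only a minimal sketch: the class and field names are hypothetical and do not reflect the actual data.bnf.fr implementation, and the item level is left out because the text does not describe how it is handled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Manifestation:
    """Physical embodiment, e.g. one edition described by a bibliographic record."""
    title: str
    edition_statement: str


@dataclass
class Expression:
    """A version of a work (translation, adaptation, abridgment...)."""
    kind: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The intellectual or artistic creation, built from an authority record."""
    title: str
    author: str
    expressions: List[Expression] = field(default_factory=list)

    def all_manifestations(self) -> List[Manifestation]:
        # Flatten the expression level, the way an HTML page would when it
        # lists editions directly under the work.
        return [m for e in self.expressions for m in e.manifestations]


edition = Manifestation(
    title="Les Misérables",
    edition_statement="Nouvelle impression illustrée. 1879-1882. Paris. E. Hugues",
)
work = Work(
    title="Les Misérables",
    author="Victor Hugo",
    expressions=[Expression(kind="original text", manifestations=[edition])],
)
print([m.edition_statement for m in work.all_manifestations()])
```

Keeping the expression as its own record, even when a page does not display it, is what allows the RDF view to expose it separately.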
There can be a part-whole relationship between:
A person or an organisation can be either the “author” of a work (in which case the “author” page links to the related “work” page) or a “contributor” to an expression (translator, preface writer, librettist…).
Nevertheless, as the expression level is not distinguished from the manifestation level in the HTML pages of data.bnf.fr, contributors only appear at the manifestation level. The different creation or contribution roles are listed in a BnF repository, in the Intermarc format, and in the Library of Congress repository, in Marc. This data enriches the RDF of the pages.
Link to the Intermarc code list for relators and creators (BnF).
Link to the Marc code list for relators of the Library of Congress.
Among the retrievable data are subject records from the Bibliothèque nationale de France (RAMEAU, the French subject indexing language). They were converted into SKOS (Simple Knowledge Organization System), an RDF vocabulary, in the context of the European project TELplus. This repository is now updated on data.bnf.fr with the whole current database from the Bibliothèque nationale de France.
To provide dereferenceable URIs on our website, URIs from the initial project, such as http://stitch.cs.vu.nl/vocabularies/rameau/ark:/12148/cb12650268p, have to be converted to simple and uniform URIs made of:
the root http://data.bnf.fr and the ARK identifier of the subject authority record.
For instance:
The URI http://stitch.cs.vu.nl/vocabularies/rameau/ark:/12148/cb12650268p, for the subject “ornithologie”, will be replaced by: http://data.bnf.fr/ark:/12148/cb12650268p.
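The URI rewriting described above amounts to swapping one root for another while keeping the ARK identifier. Here is a minimal sketch of that rule; the function name and the error handling are assumptions made for illustration, not the code actually used by data.bnf.fr.

```python
OLD_ROOT = "http://stitch.cs.vu.nl/vocabularies/rameau/"
NEW_ROOT = "http://data.bnf.fr/"


def to_data_bnf_uri(old_uri: str) -> str:
    """Rewrite a RAMEAU URI from the initial project onto the data.bnf.fr root."""
    if not old_uri.startswith(OLD_ROOT):
        raise ValueError(f"not a stitch.cs.vu.nl RAMEAU URI: {old_uri}")
    ark = old_uri[len(OLD_ROOT):]
    # The remainder is expected to be the ARK identifier of the record.
    if not ark.startswith("ark:/"):
        raise ValueError(f"no ARK identifier found in: {old_uri}")
    return NEW_ROOT + ark


print(to_data_bnf_uri(
    "http://stitch.cs.vu.nl/vocabularies/rameau/ark:/12148/cb12650268p"
))
```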
Manifestations which have a RAMEAU term as a subject are brought together in the appropriate “subject” page.
The site also has pages that gather works and manifestations about a work or an author.
These pages are not indexed by search engines and are available from the “work” or “author” pages.
For instance: on the page “Napoleon”, there is a link to a page presenting documents about Napoleon, such as Vie de Napoléon Buonaparte, 1827.
In “work” and “author” pages, all manifestations by a single author are gathered around that author’s works, thanks to the explicit link to the title authority record (Titre Conventionnel or TIC, in French) inside the original bibliographic record.
However, some manifestations are not linked to a title authority record and remain “orphans”. To improve the way our data is expressed in FRBR and to offer a better service to the public, it is important to align these orphan manifestations, which means bringing them together around the corresponding work.
Example:
That is why we have already produced simple alignments in data.bnf.fr. When a manifestation is explicitly linked to an author authority record in the bibliographic record, and when the title of this manifestation matches the work’s title character for character, the manifestation is aligned with the work.
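The simple alignment rule (same author authority record, exactly the same title string) can be sketched as a dictionary lookup. The record classes and identifiers below are hypothetical and only illustrate the rule as stated; they are not the actual data.bnf.fr code.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class WorkRecord:
    work_id: str
    author_id: str
    title: str


@dataclass(frozen=True)
class ManifestationRecord:
    manifestation_id: str
    author_id: Optional[str]        # link to an author authority record, if any
    title: str
    work_id: Optional[str] = None   # None means the manifestation is an orphan


def align_orphans(
    works: Iterable[WorkRecord],
    manifestations: Iterable[ManifestationRecord],
) -> Dict[str, str]:
    """Map orphan manifestation ids to work ids using exact title equality."""
    index: Dict[Tuple[str, str], str] = {
        (w.author_id, w.title): w.work_id for w in works
    }
    aligned: Dict[str, str] = {}
    for m in manifestations:
        if m.work_id is not None or m.author_id is None:
            continue  # already linked, or no explicit author link to rely on
        work_id = index.get((m.author_id, m.title))
        if work_id is not None:
            aligned[m.manifestation_id] = work_id
    return aligned


works = [WorkRecord("w1", "balzac", "Le colonel Chabert")]
manifestations = [
    ManifestationRecord("m1", "balzac", "Le colonel Chabert"),
    ManifestationRecord("m2", "balzac", "Le Colonel Chabert"),  # differs by case
    ManifestationRecord("m3", "balzac", "Le colonel Chabert", work_id="w1"),
]
print(align_orphans(works, manifestations))
```

Because the comparison is strict, a manifestation whose title differs only in capitalisation (`m2` here) stays an orphan, which is consistent with many manifestations remaining unaligned after this first pass.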
Yet, after this simple alignment, many manifestations remain orphans. In the long term two solutions are possible:
The data model is presented here:
Example 1: Victor Hugo, author of Les Contemplations.
The full data model is available here.
“Author”, “work” and “subject” pages are open on the Web and can be reached by search engines.
This is why, apart from the traditional methods used for indexing the homepage, we have chosen to embed two kinds of data to structure these pages:
The following elements are used:
itemtype=http://schema.org/Person
itemprop="description" itemprop="birthdate" itemprop="deathdate" itemprop="nationality" itemprop="memberOf"
itemtype=http://schema.org/Book
itemprop="description" itemprop="inLanguage" itemprop="datePublished" itemprop="genre"
itemtype=http://schema.org/Organization
itemprop="description" itemprop="image" itemprop="name" itemprop="url" itemprop="members" itemprop="founding date" itemprop="founders"
And for sub groups of the organizations:
itemscope itemtype=http://schema.org/PerformingGroup
itemscope itemtype=http://schema.org/DanceGroup
itemscope itemtype=http://schema.org/TheaterGroup
itemscope itemtype=http://schema.org/MusicGroup
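To show how the itemtype and itemprop values listed above end up in a page, here is a minimal sketch that renders one microdata block. The function name, the choice of `<div>` and `<span>` elements and the sample values are assumptions; the actual templates of data.bnf.fr are not described in the text.

```python
from html import escape
from typing import Dict


def microdata_block(item_type: str, props: Dict[str, str]) -> str:
    """Render one itemscope element with a <span> per itemprop."""
    lines = [f'<div itemscope itemtype="{escape(item_type)}">']
    for prop, value in props.items():
        lines.append(
            f'  <span itemprop="{escape(prop)}">{escape(value)}</span>'
        )
    lines.append("</div>")
    return "\n".join(lines)


# Illustrative values only; the property names are the ones listed for Person.
html_snippet = microdata_block(
    "http://schema.org/Person",
    {
        "description": "Écrivain",
        "birthdate": "1802-02-26",
        "deathdate": "1885-05-22",
        "nationality": "France",
    },
)
print(html_snippet)
```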
The Open Graph Protocol is a very simple vocabulary, encoded in RDFa, whose metadata is retrieved when a user adds the resource to their Facebook profile. The following metadata is embedded in the HTML header, using META tags:
og:title (title of the page)
og:description (description of the page content)
og:type (type of resource)
og:url (page URL)
og:image (URL of the image that illustrates the page)
og:author (name of the author in the “work” page)
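As an illustration of how the og properties listed above could be written into the HTML header, the sketch below builds one META tag per property. The function name, the example.org URLs and the sample values are placeholders introduced for this example, not values taken from data.bnf.fr.

```python
from html import escape
from typing import Dict


def open_graph_meta(properties: Dict[str, str]) -> str:
    """Return one <meta> tag per Open Graph property, ready for <head>."""
    return "\n".join(
        f'<meta property="og:{escape(name)}" content="{escape(value)}" />'
        for name, value in properties.items()
    )


# Placeholder values for a hypothetical "work" page.
page_meta = open_graph_meta({
    "title": "Les Contemplations",
    "description": "Work page for Les Contemplations",
    "type": "book",
    "url": "http://example.org/les-contemplations",
    "image": "http://example.org/les-contemplations.jpg",
    "author": "Victor Hugo",
})
print(page_meta)
```

Escaping the attribute values matters here because titles and descriptions often contain quotation marks or ampersands.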
Nevertheless some properties and classes have to be expressed by an ontology specific to the BnF: bnf-onto. To publish the ontology, the BnF has chosen the harmonized namespace http://data.bnf.fr/ontology/.
The “bnf-onto” ontology can be seen at this address: http://data.bnf.fr/ontology/bnf-onto-en/.
List of properties:
BnF-specific vocabularies are displayed at this address: http://data.bnf.fr/vocabulary-en.
List of vocabularies:
Person | RDF | Intermarc field for authors |
name | skos:prefLabel @in_lang | 100 400 |
other name | skos:altLabel foaf:familyName foaf:givenName dc:date | |
nationality | foaf:nationality | 008 position 12-13 |
language | RDAgroup2elements:languageOfThePerson | 008 position 14-16 |
gender | foaf:gender | 008 position 17 |
date of birth | RDAgroup2elements:dateOfBirth | 008 position 27-36 |
date of death | RDAgroup2elements:dateOfDeath | 008 position 37-46 |
place of birth | RDAgroup2elements:placeOfBirth | 603 $a |
place of death | RDAgroup2elements:placeOfDeath | 603 $b |
beginning of activity | RDAgroup2elements:periodOfActivityOfThePerson | 008 position 47-51 |
end of activity | RDAgroup2elements:periodOfActivityOfThePerson | 008 position 52-55 |
sources (note about the record's sources) | skos:editorialNote | 610 |
summary, note | RDAgroup2elements:biographicalInformation | 600 |
domains | RDAgroup2elements:fieldOfActivityOfThePerson | 624 |
link to the DBpedia resource | owl:sameAs | |
code for relators | marcrel:[from the Library of Congress, http://id.loc.gov] | |
image of the author from Gallica | foaf:depiction | |
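Several rows of the Person table map fixed positions of Intermarc field 008 to RDF properties. The sketch below shows how such a mapping could be applied by slicing the field. It rests on assumptions: positions are treated as zero-based and inclusive, the language position is read as the range 14 to 16, and the sample 008 content (including the coded values "fr", "fre" and "m") is made up for the example.

```python
from typing import Dict, Tuple

# (start, end) positions, inclusive, taken from the Person table.
# Assumption: positions are counted from 0.
FIELD_008_LAYOUT: Dict[str, Tuple[int, int]] = {
    "foaf:nationality": (12, 13),
    "RDAgroup2elements:languageOfThePerson": (14, 16),
    "foaf:gender": (17, 17),
    "RDAgroup2elements:dateOfBirth": (27, 36),
    "RDAgroup2elements:dateOfDeath": (37, 46),
}


def parse_008(field_008: str) -> Dict[str, str]:
    """Slice an 008 string into property values, skipping blank positions."""
    values: Dict[str, str] = {}
    for prop, (start, end) in FIELD_008_LAYOUT.items():
        raw = field_008[start:end + 1].strip()
        if raw:
            values[prop] = raw
    return values


def build_sample_008() -> str:
    """Build a made-up 008 string so offsets need not be counted by hand."""
    chars = [" "] * 56
    for start, text in [(12, "fr"), (14, "fre"), (17, "m"),
                        (27, "1802"), (37, "1885")]:
        chars[start:start + len(text)] = list(text)
    return "".join(chars)


person = parse_008(build_sample_008())
print(person)
```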
Organisation | RDF | Intermarc field for organisations |
name | skos:prefLabel @in_lang | 100 400 |
nationality | foaf:nationality | 008 position 12-13 |
language | RDAgroup2elements:languageOfThePerson | 008 position 14-16 |
beginning | RDAgroup2elements:dateAssociatedWithTheCorporateBody | 008 position 27-36 |
end | RDAgroup2elements:dateAssociatedWithTheCorporateBody | 008 position 37-46 |
beginning of activity | dc:date | 008 position 47-51 |
end of activity | RDAgroup2elements:periodOfActivityOfTheCorporateBody | 008 position 52-55 |
website | foaf:homepage | 606 |
sources | skos:editorialNote | 610 |
summary/note | RDAgroup2elements:corporateHistory | 600 |
domain | RDAgroup2elements:fieldOfActivityOfTheCorporateBody | 624 |
link to the DBpedia resource | owl:sameAs | |
RAMEAU subject headings | RDF | |
original title | skos:prefLabel | 16X 46X |
other title | skos:altLabel | 16X 46X |
source (thesaurus Rameau) | skos:inScheme | |
source (note about the record's note) | skos:editorialNote | 610-612 |
other note | skos:scopeNote | 600 |
broader concepts | skos:broader | 3XX, 5XX |
narrower concepts | skos:narrower | 3XX, 5XX |
related concepts | skos:related | 3XX, 5XX |
alignment with external datasets | skos:closeMatch | 620 |
alignment with external datasets | skos:exactMatch | |
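To make the SKOS mapping for RAMEAU headings more concrete, here is a minimal sketch that writes a concept as N-Triples using plain string formatting. The function names are hypothetical, the broader-concept URI is a placeholder, and only the concept URI and the label “ornithologie” come from the text; the real export is produced by the site itself.

```python
from typing import Iterable, List

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"


def escape_literal(text: str) -> str:
    """Escape the characters N-Triples does not allow raw inside a literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def concept_triples(
    uri: str,
    pref_label: str,
    lang: str,
    alt_labels: Iterable[str] = (),
    broader: Iterable[str] = (),
) -> List[str]:
    """Return N-Triples lines describing one skos:Concept."""
    triples = [
        f"<{uri}> <{RDF_TYPE}> <{SKOS}Concept> .",
        f'<{uri}> <{SKOS}prefLabel> "{escape_literal(pref_label)}"@{lang} .',
    ]
    for label in alt_labels:
        triples.append(
            f'<{uri}> <{SKOS}altLabel> "{escape_literal(label)}"@{lang} .'
        )
    for parent in broader:
        triples.append(f"<{uri}> <{SKOS}broader> <{parent}> .")
    return triples


lines = concept_triples(
    "http://data.bnf.fr/ark:/12148/cb12650268p",
    "ornithologie",
    "fr",
    broader=["http://example.org/concept/zoologie"],  # placeholder URI
)
print("\n".join(lines))
```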
Work | RDF | Intermarc field for titles |
title (main title) | dc:title skos:prefLabel, rdfs:label @in_lang | 145 415 |
other title | skos:altLabel @in_lang | |
language | dc:language | 008 position 14-16 |
dates | dc:date | 008 position 27-26 |
source | skos:editorialNote | 610 |
summary/note | dc:description | 600 |
domain | dc:subject | 624 |
link to the authority record in the BnF catalog | owl:sameAs | |
Part of | dc:isPartOf | |
Relations | | |
main author | dc:creator | 100 101 110 110 |
relators | dc:contributor bnf-onto:coderole | 711/702/700/701/710/712 |
relator's code | dc:contributor bnf-onto:coderole | code libre 321 322 |
image for the digitised work in Gallica | foaf:depiction | |
Manifestation | RDF | Intermarc field (bibliographic record) |
manifestation of a work | rdarelationships:workManifested | |
title | dc:title | 245 |
has part | dc:hasPart | |
publishing date | dc:date | 260 |
publishing place | rdvocab:placeOfPublication | 250 |
publisher's name | rdvocab:publishersName | 260 |
physical description | dc:description | |
ISBN | bnf-onto:ISBN | 20 |
Type of document | dc:type | |
Language | dc:language | 41 |
adaptation for the youth | bnf-onto:ouvrageJeunesse | |
Expression | RDF | |
Relators | marcrel:[relator's role from the Library of Congress, http://id.loc.gov] | |
Relator's code | bnf-onto:coderole | sub-field $4 |
Contribution | bnf-onto:role | |
type of document | dc:type | |