About data.bnf.fr

Latest release

This website, available online since July 2011, is still under active development and is regularly updated.

You are currently visiting version 1.31.1 of data.bnf.fr, released on September 20th, 2016.


To contact the team: data@bnf.fr


A presentation of the project

The main objectives of data.bnf.fr are to:

  • make the data produced by the BnF more visible on the Web,
  • federate the data produced by the BnF, both within and outside the catalogues,
  • contribute to collaboration and metadata exchange by creating links between structured and trustworthy resources,
  • facilitate the reuse of metadata (under an open licence) by third parties.

The data.bnf.fr project endeavours to make the data produced by the Bibliothèque nationale de France (French National Library) more useful on the Web. These data are varied: in particular, they describe and identify the documents preserved by the BnF and the persons or organisations that created them. The project gathers BnF resources and external resources on pages devoted to an author, a work, a subject, a year or a place. These pages organise the Web content, links and services provided by the BnF, which are scattered across several applications for technical reasons.
Available online since July 2011, data.bnf.fr is still evolving and expanding.

With data.bnf.fr, you can:

  • reach BnF resources directly from a Web page, without any prior knowledge of the services provided by the library;
  • find your way through BnF resources and discover related external resources.

The objective is to showcase the BnF's collections and to provide a hub between different resources. Data.bnf.fr is meant to complement the BnF's other applications. The project is part of the BnF's policy of joining the Web of data and adopting Semantic Web standards.

Data.bnf.fr and Gallica have won the Stanford Prize for Innovation in Research Libraries (SPIRL).

Data.bnf.fr in the Open Data movement

Raw data from data.bnf.fr is available under the French Open Licence used by data.gouv.fr. This licence is similar to CC-BY, adapted to French copyright law. RDF data can be freely copied and reused, for commercial or non-commercial purposes, provided that the source is acknowledged.
Find out more about the conditions for reusing BnF metadata

The data.bnf.fr project is firmly part of the Open Data movement. Supported by civic and governmental actors, Open Data is a global movement that aims to make available non-personal data, unrelated to privacy or security, that public organisations collect in our connected societies. Open Data is now a national policy: the 2003 European Directive on the re-use of public sector information (Directive 2003/98/EC) was transposed into French law by Ordinance No. 2005-650 of 6 June 2005 on freedom of access to public documents and the re-use of public information.

The main purposes are democratic and economic. On the one hand, the aim is to make public action more transparent and efficient, and to rationalise the creation of public data by disseminating and gathering it; on the other hand, it is to foster economic activity by providing useful, reusable information for commercial or non-commercial use.

This is in line with the missions of the National Library of France, as set out in the Decree of 3 January 1994 establishing the BnF: "to enable as many people as possible to have access to the collections, subject to secrets protected by law, under conditions consistent with intellectual property legislation and compatible with the conservation of these collections" ("assurer l'accès du plus grand nombre aux collections, sous réserve des secrets protégés par la loi, dans des conditions conformes à la législation sur la propriété intellectuelle et compatibles avec la conservation de ces collections"), and "to enable remote consultation using the most modern data transmission technologies" ("permettre la consultation à distance en utilisant les technologies les plus modernes de transmission des données").

The purpose is therefore to share with citizens the results of libraries' efforts to identify and describe the collections they hold, including digital items. This optimises the dissemination and reuse of the data produced by the BnF by moving it out of our internal silos and giving it a wider audience and greater visibility on the Web. Potential uses are varied and innovative. Other libraries can now not only retrieve data from the BnF but also link to it. Moreover, the data is meant to travel beyond the library world and reach a broad audience.

Roadmap

The developments planned for 2019 focus on the following areas:

  • improving data dissemination: making data fresher and more complete through more regular updates, publishing all references to Gallica's digital documents, and integrating new resources such as information from the Web legal deposit.
  • aligning BnF reference data with that of other trusted Web operators, and strengthening data.bnf.fr's role as a hub for French cultural data online.
  • supporting open data reuse and testing innovative visualizations that offer new ways of exploring, analyzing and displaying data and collections, as the "Atelier" pages (French for "Workshop") already do.
  • extending data processing, particularly to create links between works and documents, with a view to implementing the FRBR model in our catalogues. The aim is to compute additional links from bibliographic records to authority records for textual or musical works and, in the longer term, to create authority records for works that do not have one yet, by clustering the editions (manifestations) of each work.
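The clustering step mentioned in the last roadmap item can be illustrated with a deliberately naive sketch: editions that share the same author identifier and the same normalized title are grouped as candidates for a single work. The record fields (`id`, `author_id`, `title`) and the normalization rules are assumptions made for illustration; this is not the BnF's actual method, and real work clustering would also have to handle translations, variant titles and uniform titles.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_title(title: str) -> str:
    """Strip accents, casefold, replace punctuation with spaces, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    no_punct = re.sub(r"[^\w\s]", " ", no_accents.casefold())
    return " ".join(no_punct.split())


def cluster_manifestations(records):
    """Group edition records by (author identifier, normalized title).

    Each resulting cluster is only a *candidate* work. Records are
    hypothetical dicts with 'id', 'author_id' and 'title' keys.
    """
    clusters = defaultdict(list)
    for rec in records:
        key = (rec["author_id"], normalize_title(rec["title"]))
        clusters[key].append(rec["id"])
    return dict(clusters)
```

With this normalization, "Les Misérables", "Les misérables." and "LES MISERABLES" by the same author fall into one cluster, while "Notre-Dame de Paris" forms another. Grouping on the author identifier rather than on the author's name keeps homonymous titles by different authors apart.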

Find out more about the project: Presentation of data.bnf.fr at the IFLA 2016 meeting

Content of the data

Data.bnf.fr displays structured data.

HTML pages are generated automatically from the data and identifiers held in the library's different databases: the BnF main catalogue, BnF Archives et manuscrits, and Gallica. The HTML pages are built using Semantic Web technologies.

Because legal deposit is compulsory for French publications, the collections available for authors and works are comprehensive and varied. Several million copyright-free documents have been digitized and are available online in Gallica.

The pages are based on our authority records: authority records for persons and corporate bodies supply the content of the "author" pages, authority records for works supply the content of the "work" pages, and the RAMEAU subject authority records (the indexing language used at the BnF) supply the content of the "subject" pages.

As of June 2016, data.bnf.fr covers around 2,000,000 authors, linked to over 8 million documents from the BnF Catalogue général and BnF Archives et manuscrits, that is, all of the good-quality data from the BnF catalogues.

External links in data.bnf.fr

Data.bnf.fr is part of the Web and provides external links to websites, whether maintained by the BnF or completely independent.

There are several kinds of links:

  • links to other external repositories with which BnF data is aligned, such as the Library of Congress, the Deutsche Nationalbibliothek, VIAF (Virtual International Authority File), IdRef, Geonames, Agrovoc, and Thesaurus W (the French National Archives’ thesaurus).
  • links to search forms in which the query terms (author name, subject, work title) are automatically pre-filled: BnF catalogue général, CCFr, BnF archives et manuscrits, CNLJ-La Joie par les livres, Europeana, SUDOC (Système universitaire de documentation), Worldcat, Wikipedia.
  • Wikipedia provides author thumbnails, when none can be found in Gallica, as well as a short biography. This data is retrieved through DBpedia.
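Pre-filled search links of the second kind amount to inserting a URL-encoded query term into each target's search URL. The sketch below shows the idea with Python's standard `urllib.parse.urlencode`; the endpoints and parameter names are hypothetical placeholders, since each real target (BnF catalogue général, SUDOC, Worldcat, and so on) defines its own search URL.

```python
from urllib.parse import urlencode

# Hypothetical search-form endpoints and query parameter names.
SEARCH_FORMS = {
    "example_catalogue": ("https://catalogue.example.org/search", "q"),
    "example_union": ("https://union.example.org/find", "query"),
}


def prefilled_link(target: str, term: str) -> str:
    """Return a search-form URL with the query term already filled in."""
    base, param = SEARCH_FORMS[target]
    # urlencode percent-encodes non-ASCII characters as UTF-8 and spaces as '+'.
    return f"{base}?{urlencode({param: term})}"


print(prefilled_link("example_catalogue", "Victor Hugo"))
# https://catalogue.example.org/search?q=Victor+Hugo
```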

How does it work?

The data comes from separate databases, where it is produced and stored in different formats. Data.bnf.fr extracts, transforms and gathers these datasets into a single database and makes them interoperable.
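The extract, transform and gather steps can be pictured as follows: each source is parsed in its own format, mapped onto one common record shape, and records that share an identifier are merged. This is a minimal sketch; the two export formats, the field names and the `Resource` schema are hypothetical and do not reflect the BnF's internal formats.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Common shape every source is transformed into (hypothetical schema)."""
    identifier: str
    title: str
    sources: set = field(default_factory=set)
    digital_copies: list = field(default_factory=list)


def from_catalogue(rec: dict) -> Resource:
    # Hypothetical catalogue export: {"ark": ..., "title": ...}
    return Resource(rec["ark"], rec["title"], {"catalogue"})


def from_digital_library(line: str) -> Resource:
    # Hypothetical pipe-delimited export: "identifier|title|url"
    identifier, title, url = line.split("|")
    return Resource(identifier, title, {"digital"}, [url])


def gather(resources):
    """Merge transformed records that share an identifier into one store."""
    store = {}
    for res in resources:
        if res.identifier in store:
            existing = store[res.identifier]
            existing.sources |= res.sources
            existing.digital_copies.extend(res.digital_copies)
        else:
            store[res.identifier] = res
    return store
```

Keying the store on a shared identifier rather than on titles is what makes the merge reliable; it is the same role the permanent identifiers listed below play in practice.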

Data.bnf.fr pages are indexed by search engines, unlike the data and metadata held in the BnF's databases, which search engines cannot reach. These pages therefore describe BnF resources that are often concealed in the "deep Web", and give access to digital documents from Gallica.

We use the following tools:

  • unique and permanent identifiers assigned to every record: the BnF uses ARK (Archival Resource Key) identifiers for records from the Catalogue général and for digital documents from Gallica,
  • bibliographic description standards, such as the FRBR model and its RDF representation, used to expose data on the Web of data,
  • authority records for persons, corporate bodies, works and subjects,
  • data matching and federation techniques.
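ARK identifiers follow a fixed syntax: after the `ark:` label come a Name Assigning Authority Number (NAAN) identifying the institution and the name that institution assigns, optionally followed by qualifiers. The sketch below extracts those parts with a simplified regular expression; it is not a full implementation of the ARK specification, and the object names in the usage lines are invented (12148 is the NAAN used by the BnF).

```python
import re

# Simplified ARK pattern: "ark:" + optional "/" + NAAN + "/" + name + qualifiers.
# Treating the NAAN as a digit string is a simplification of the specification.
ARK_PATTERN = re.compile(
    r"ark:/?(?P<naan>\d+)/(?P<name>[^/?#\s]+)(?P<qualifier>[^?#\s]*)"
)


def parse_ark(text: str):
    """Return (naan, name, qualifier) for the first ARK found in text, or None."""
    match = ARK_PATTERN.search(text)
    if match is None:
        return None
    return match.group("naan"), match.group("name"), match.group("qualifier")


print(parse_ark("https://example.org/ark:/12148/cb00000000x"))
# ('12148', 'cb00000000x', '')
print(parse_ark("ark:/12148/btv1b00000000/f1.item"))
# ('12148', 'btv1b00000000', '/f1.item')
```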

To gather and organise the different data silos, we rely on authority records, which form the basis of all author, work and subject pages. Resources are collocated through the authority record’s identifier.

Author pages collocate all bibliographic records that are linked to the author’s identifier.
Work pages collocate all records that are linked to both the author's and the work's identifiers. When a record has no link to the work authority record, a simple string-matching mechanism is used instead ("words beginning with").
Subject pages collocate all records that have a link to the same subject.
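These collocation rules amount to grouping records by the identifiers they carry. The sketch below assumes a hypothetical record shape (`authors`, `works` and `subjects` lists of authority identifiers) and interprets "words beginning with" as a case-insensitive match on the leading words of the title, applied only to records that carry no work link at all; the actual matching rules used by data.bnf.fr may differ.

```python
import re
from collections import defaultdict


def collocate(records, key):
    """Map each authority identifier found under `key` to the records citing it.

    key="authors" builds author pages; key="subjects" builds subject pages.
    """
    pages = defaultdict(list)
    for rec in records:
        for authority_id in rec.get(key, []):
            pages[authority_id].append(rec["id"])
    return dict(pages)


def _words(text):
    """Casefolded word tokens, ignoring punctuation."""
    return re.findall(r"\w+", text.casefold())


def work_page(records, author_id, work_id, work_title):
    """Records linked to both identifiers, then unlinked records of the same
    author whose title begins with the words of the work title."""
    prefix = _words(work_title)
    linked, matched = [], []
    for rec in records:
        if author_id not in rec.get("authors", []):
            continue
        if work_id in rec.get("works", []):
            linked.append(rec["id"])
        elif not rec.get("works") and _words(rec["title"])[: len(prefix)] == prefix:
            matched.append(rec["id"])
    return linked + matched


records = [
    {"id": "b1", "title": "Les Misérables", "authors": ["A1"], "works": ["W1"], "subjects": ["S1"]},
    {"id": "b2", "title": "Les Misérables, tome 1", "authors": ["A1"]},
    {"id": "b3", "title": "Notre-Dame de Paris", "authors": ["A1"], "subjects": ["S1"]},
    {"id": "b4", "title": "Les Misérables", "authors": ["A2"]},
]
print(collocate(records, "authors"))  # {'A1': ['b1', 'b2', 'b3'], 'A2': ['b4']}
print(work_page(records, "A1", "W1", "Les Misérables"))  # ['b1', 'b2']
```

Record `b4` has a matching title but a different author identifier, so it never reaches the string-matching fallback; restricting the fallback to the author's own records limits false matches.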

Exposing our data in RDF (Resource Description Framework)

In the long term, useful, reliable and controlled data will be displayed and integrated into the growing Web of data, in accordance with Semantic Web standards. This must be done in line with international initiatives that facilitate the use of public information and administrative data.

Linked Open Data fosters data exchange between libraries and other communities, and offers solutions for format interoperability. The Deutsche Nationalbibliothek, the British Library, and the Library of Congress have also adopted these tools in order to open their bibliographic data.

The Semantic Web is based on RDF (Resource Description Framework), a W3C Recommendation. RDF defines a graph model for describing resources and their metadata, so that data can be processed automatically.
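To make the graph model concrete: RDF expresses each statement as a triple (subject, predicate, object), and a set of triples forms a graph that software can traverse. The sketch below uses plain Python tuples rather than an RDF library. The Dublin Core and FOAF property URIs belong to real vocabularies, but the example.org resource URIs are placeholders, the choice of properties is illustrative rather than data.bnf.fr's actual model, and the distinction RDF makes between URIs and literals is ignored.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Placeholder resource URIs standing in for real data.bnf.fr URIs.
WORK = "http://example.org/work/1"
AUTHOR = "http://example.org/author/1"

graph = {
    Triple(WORK, DCTERMS + "title", "Les Misérables"),
    Triple(WORK, DCTERMS + "creator", AUTHOR),
    Triple(AUTHOR, FOAF + "name", "Victor Hugo"),
}


def objects(graph, subject, predicate):
    """All objects of triples matching (subject, predicate, ?), sorted."""
    return sorted(t.obj for t in graph if t.subject == subject and t.predicate == predicate)


# Follow two edges of the graph: work -> creator -> name.
creator = objects(graph, WORK, DCTERMS + "creator")[0]
print(objects(graph, creator, FOAF + "name"))  # ['Victor Hugo']
```

Because the author is a node of its own rather than a text string inside the work's description, any other dataset can point at the same URI, which is what allows links between repositories.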

The reusable data we publish include subject authority records from the RAMEAU repository, which is used to index bibliographic records at the BnF. They were converted to SKOS (Simple Knowledge Organization System), an RDF vocabulary, as part of the European TELplus project. This repository is now regularly updated on data.bnf.fr from the complete database maintained by the BnF.

Bibliography

BERMES Emmanuelle, Vers de nouveaux catalogues, Paris: Cercle de la librairie, 2016.

BERMES Emmanuelle, « Les bibliothèques sur le Web », in Les catalogues au défi du Web (session 2), 26 November 2014. Available online: http://video.cnfpt.fr/conferences-1/les-catalogues-au-defi-du-web-les-bibliotheques-sur-le-web [consulted on February 28th, 2017].

BERMES Emmanuelle, with the collaboration of Antoine Isaac and Gautier Poupeau, Le Web sémantique en bibliothèque, Paris: Cercle de la librairie, 2013.

BERMES Emmanuelle, BOULET Vincent, LECLAIRE Céline, « Améliorer l’accès aux données des bibliothèques sur le web : l’exemple de data.bnf.fr », in IFLA World Library and Information Congress, 2016. Available online: http://library.ifla.org/1447/1/081-bermes-fr.pdf [consulted on February 28th, 2017].

SIMON Agnès, « Illustrations et démonstrations », in Les catalogues au défi du Web (session 2), 26 November 2014. Available online: http://video.cnfpt.fr/conferences-1/les-catalogues-au-defi-du-web-illustrations-et-demonstrations-agnes-simon [consulted on February 28th, 2017].