About data.bnf.fr

Objectives of the data.bnf.fr (Data) project

The main objectives of Data are as follows:

  • increase the visibility of BnF data on the Web,
  • promote the discoverability of the works and creators represented in BnF's collections,
  • federate BnF data, both within and beyond catalogs,
  • contribute to cooperation and metadata exchange by creating links between structured and trusted resources,
  • facilitate the reuse of metadata (under Open License) by interested parties.

The Data project aims to make the Bibliothèque nationale de France's data more useful on the Web. These data were initially produced to describe and identify documents held at the BnF, as well as the people or organizations who created them. In Data, they are grouped into pages for authors, works, themes, places, dates and periodicals, allowing users to easily identify relevant resources. The project is also part of the BnF's commitment to expanding the Web of data and adopting Semantic Web standards.

Online since 2011, Data continues to develop and grow.

Open data

Data are available under the Licence ouverte de l'État, used in particular by data.gouv.fr. The RDF data may be reused and reproduced without restriction and free of charge for any purpose, including commercial use. Attribution is required.

More information is available on the license page.

The Data project is thus firmly part of the open public data movement. Championed by civic actors and governments, this movement aims to make accessible data that are non-personal, unrelated to privacy or security, and collected or produced by public organizations. It entered French legislation through the transposition of the 2003 European directive (Directive 2003/98/EC of the European Parliament and of the Council of November 17, 2003, on the re-use of public sector information) in Ordinance No. 2005-650 of June 6, 2005, relating to freedom of access to administrative documents and the re-use of public information. The opening of public data is therefore part of a national policy.

The stakes are both democratic and economic: on the one hand, to make public action more transparent and efficient and to rationalize the creation of public data through dissemination and sharing; on the other hand, to develop economic activity by making information available for reuse, whether commercial or not.

Its objectives are aligned with the missions of the Bibliothèque nationale de France, namely “to ensure access for the greatest number to the collections, subject to the secrets protected by law, under conditions in accordance with intellectual property legislation and compatible with the preservation of these collections”, and to allow “remote consultation using the most modern data transmission technologies” (art. R341-2 of the Code du Patrimoine).

Opening the data thus aims to share with citizens the benefits of the work libraries do to identify and describe the collections they hold, including digital collections. It helps improve the circulation and reuse of BnF data by making them interoperable.

Overview of data published on the site

Data exposes high-quality structured data.

The HTML pages of Data are automatically generated from data and identifiers from several data repositories of the BnF: the general catalog, Gallica, BnF archives and manuscripts, virtual exhibitions, the database of 16th-century Parisian printers (bp16.bnf.fr), the database of digitized bindings of the Reserve (reliures.bnf.fr), and finally, the web addresses collected under the legal deposit of the web. Data also highlights bibliographies established by librarians.

The HTML pages are generated by automated processes that use semantic web technologies.

Because legal deposit is mandatory for documents published in France, the collections relating to authors and works are very comprehensive and reflect the diversity of French cultural production. Several million public domain documents are digitized and freely accessible in Gallica.

In 2024, the volume of resources by source repository is as follows:

Source                                        | URL                                  | Volume
BnF General Catalog                           | https://catalogue.bnf.fr/            | 18993322
BnF Archives and Manuscripts                  | https://archivesetmanuscrits.bnf.fr/ | 112068
Gallica                                       | https://gallica.bnf.fr/              | 1460329
Virtual Exhibitions and Educational Resources | https://essentiels.bnf.fr/           | 5167
Sites collected by Dlweb                      |                                      | 66387
16th-century Parisian printers                | https://bp16.bnf.fr/                 | 3024
Bindings                                      | https://reliures.bnf.fr/             | 153
Bibliographies                                |                                      | 369

How does Data work?

Data extracts data from distinct databases, produced in different formats, then transforms and aggregates them in a common database in order to link them together and make them interoperable.

Its pages are indexed by search engines, and link to digital documents.

For that purpose, Data relies on several elements:

  • unique and persistent identifiers assigned to each record: at the BnF these are ARK identifiers, assigned to records of the General Catalog, digitized documents of Gallica, educational resources and virtual exhibitions, archives and manuscripts,
  • bibliographic description standards, such as the IFLA LRM conceptual model and its RDF modeling for better exposure as linked data,
  • authority records describing people, organizations, works, and concepts,
  • data alignment and clustering techniques.

Authority records constitute the core of Data: information from different sources that is linked to the same author, work, or concept is aggregated on the corresponding page.

  • Author pages gather all bibliographic records containing a link to the author's identifier.
  • Work pages gather all records containing both a link to the author and a link to the work. In the absence of a link to the work, a simple alignment mechanism based on character comparison is used.
  • Concept pages aggregate information on a given concept (labels, at the BnF and in other institutions, according to several vocabularies) and the works about this concept.
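
The aggregation described above can be illustrated with a minimal sketch. It is not the BnF implementation: the record structure, the identifiers (author-1, work-1, rec-1, etc.) and the normalization rule are assumptions made for the example. Records are attached to an author page through the author identifier, and to a work page either through an explicit link or, failing that, through a comparison of normalized titles.

```python
import unicodedata
from collections import defaultdict


def normalize(text):
    """Lowercase, strip accents and drop non-alphanumeric characters."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in without_accents.lower() if c.isalnum())


def build_pages(records, works):
    """Return (author_pages, work_pages), each mapping an identifier to record ids."""
    author_pages = defaultdict(list)
    work_pages = defaultdict(list)
    # Index the known works by (author, normalized title) for the fallback.
    titles = {(w["author"], normalize(w["title"])): w["id"] for w in works}
    for rec in records:
        author_pages[rec["author"]].append(rec["id"])
        work_id = rec.get("work")  # explicit link, when the record has one
        if work_id is None:
            # Fallback: character comparison on the normalized title.
            work_id = titles.get((rec["author"], normalize(rec["title"])))
        if work_id is not None:
            work_pages[work_id].append(rec["id"])
    return dict(author_pages), dict(work_pages)


# Hypothetical data, invented for the example.
works = [{"id": "work-1", "author": "author-1", "title": "Madame Bovary"}]
records = [
    {"id": "rec-1", "author": "author-1", "title": "Madame Bovary", "work": "work-1"},
    {"id": "rec-2", "author": "author-1", "title": "MADAME BOVARY.", "work": None},
    {"id": "rec-3", "author": "author-1", "title": "Salammbô", "work": None},
]
author_pages, work_pages = build_pages(records, works)
```

Here all three records land on the author page, while only the first two are attached to the work page: rec-2 has no explicit link but its normalized title matches the known work.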


Data also features:

  • Location pages built from two kinds of records (Rameau on the one hand, Department of Maps and Plans on the other), progressively merged into single pages that provide, in particular, geographical coordinates.
  • Date pages showing the relations between a given date and works, organizations, authors, documents, etc.
  • Performance pages about source works, stage directors, performers, etc.
  • Periodical pages providing information on the title and on related authors.

Algorithmic creation of works

Data experiments with a new way of structuring information, centred no longer on the document (for example, the 2001 paperback edition of Madame Bovary, or a recorded reading of the same novel) but on the author's work, grouping together all its successive editions (Madame Bovary, by Flaubert, written in 1856).

However, the work to which each document relates is rarely described in the catalogue (less than 8% of documents). Doing this manually for the 15 million records in the catalogue would take 45 years, at a rate of 2 minutes per document. At the same time, a national programme, La Transition bibliographique, is underway to adopt this new approach (adoption of IFLA LRM, the Library Reference Model).

The BnF has therefore experimented with a semi-automatic process to generate a description of each work from information describing its successive editions. The first corpus processed concerns twentieth-century printed works.

For each author, the titles of their publications are extracted and grouped by similarity. For each group, a program then computes the information relating to the work from what it finds in the documents (alternative forms of title, translation titles, date of first publication, other authors).
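
As an illustration only, the grouping step can be sketched as follows. The source does not specify the similarity measure or the grouping criteria, so the use of Python's difflib, the 0.85 threshold, the field names and the sample editions (including their years) are all assumptions.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.85):
    """True when two titles are close enough (assumed measure and threshold)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def group_editions(editions, threshold=0.85):
    """Greedy grouping: an edition joins the first group whose first title is similar."""
    groups = []
    for edition in editions:
        for group in groups:
            if similar(edition["title"], group[0]["title"], threshold):
                group.append(edition)
                break
        else:
            groups.append([edition])
    return groups


def describe_work(group):
    """Compute a work description from the editions of one group."""
    main_title = group[0]["title"]
    other_titles = sorted({e["title"] for e in group} - {main_title})
    return {
        "title": main_title,
        "alternative_titles": other_titles,
        # Earliest year found among the grouped editions.
        "first_published": min(e["year"] for e in group),
    }


# Hypothetical editions for a single author; titles and years are invented.
editions = [
    {"title": "Madame Bovary", "year": 2001},
    {"title": "Madame Bovary.", "year": 1972},
    {"title": "Madame Bovary", "year": 1986},
    {"title": "Salammbô", "year": 1991},
]
groups = group_editions(editions)
works = [describe_work(g) for g in groups]
```

With these sample values the three Madame Bovary editions form one group and Salammbô another. A greedy pass like this one depends on input order, which is one reason such results need review.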

The results of these calculations are then posted online on Data to assess the relevance of the process.

The results are also open to scrutiny by Internet users, who are invited to report any problems.

As these problems may have several origins (original data, grouping criteria, etc.), the BnF cannot undertake to correct them quickly. It can, however, add them to the list of corrections to be made when these works are loaded into the general catalogue a few months later. You can also take part in this major project and help us improve the reliability of the data by reporting any errors you notice in these automatically calculated works.

Links to external sites

Data links to internal or external resources. Several types of links can be found:

  • links to external repositories with which BnF data are aligned, such as the Library of Congress, the Deutsche Nationalbibliothek, VIAF (Virtual International Authority File), IdRef, Geonames, Agrovoc, and Thesaurus W.
  • data from the Wikimedia ecosystem (Wikipedia, Wikidata), used for example to retrieve portraits of authors when none are available in Gallica. These data are retrieved via DBpedia and Wikidata.

From bibliographic formats to the semantic web

The Data project takes input in various formats, notably Intermarc for book catalogs, XML-EAD for inventories of archives and manuscripts, and Dublin Core for digital documents.

These data are restructured, clustered and enriched by automatic processes, and published in RDF, the W3C recommendation for the semantic web. The result is available on this site in several RDF syntaxes: RDF/XML, N3, and N-Triples.
More information is available in Semantic Web and Data Model.
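
To give an idea of what publishing a description in an RDF syntax involves, here is a minimal sketch that serializes a flat, Dublin Core-like record as N-Triples. It does not reproduce the Data model: the example.org URIs, the record fields and the helper names are hypothetical, and only the Dublin Core terms namespace is a real vocabulary.

```python
DCTERMS = "http://purl.org/dc/terms/"


def escape_literal(value):
    """Escape the characters that N-Triples requires to be escaped in literals."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(subject_uri, record):
    """Serialize a flat record (field name -> string) as N-Triples lines."""
    lines = []
    for field, value in record.items():
        predicate = f"<{DCTERMS}{field}>"
        if value.startswith(("http://", "https://")):
            obj = f"<{value}>"  # the value is a link to another resource
        else:
            obj = f'"{escape_literal(value)}"'  # the value is a plain literal
        lines.append(f"<{subject_uri}> {predicate} {obj} .")
    return "\n".join(lines)


# Hypothetical document description; the URIs are placeholders.
record = {
    "title": "Madame Bovary",
    "creator": "https://example.org/author/1",
}
output = to_ntriples("https://example.org/document/1", record)
```

Each line of the output is one triple (subject, predicate, object). The link to the author is what allows descriptions coming from different sources to be connected.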

ARK identifiers

General remarks on BnF ARKs

BnF assigns identifiers in the 12148 ark domain (Bibliothèque nationale de France) according to the following principles.

  • No ARK identifier will be reassigned; that is, once a link between an ARK identifier and an object has been published, this link must be considered unique for an indefinite period.
  • ARK identifiers assigned by the BnF contain, as far as possible, no easily recognizable semantic information; this contributes to facilitating their use independently of a context of time or place.
  • ARK identifiers assigned by the BnF contain a check character that protects them against single-character errors and transposition errors. A user who makes a typo when typing an ARK will receive an HTTP 400 response and a message indicating that the provided ARK is incorrect.
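
The source does not name the algorithm behind the check character. The sketch below assumes the check character scheme of the NOID specification, computed over the ARK name only (the part after "12148/"). The function names are hypothetical and the sample value cb11907966z is used purely for illustration.

```python
# Alphabet assumed from the NOID specification: digits, then consonants
# without "l" and "y" (29 symbols in total).
XDIGITS = "0123456789bcdfghjkmnpqrstvwxz"


def check_char(name):
    """Compute the check character for an ARK name without its final character."""
    total = 0
    for position, char in enumerate(name, start=1):
        # Characters outside the alphabet count as zero.
        value = XDIGITS.index(char) if char in XDIGITS else 0
        total += position * value
    return XDIGITS[total % len(XDIGITS)]


def is_valid(name_with_check):
    """True when the last character matches the check computed on the rest."""
    if len(name_with_check) < 2:
        return False
    return check_char(name_with_check[:-1]) == name_with_check[-1]


print(is_valid("cb11907966z"))  # well-formed sample
print(is_valid("cb11907696z"))  # two adjacent characters swapped
print(is_valid("cb11907967z"))  # one character mistyped
```

Because each character is weighted by its position, swapping two adjacent characters or altering a single one changes the sum modulo 29, so the check fails. A service could use such a test to reject a mistyped ARK before any lookup.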

Mutability of resources

The mutability of resources published in data.bnf.fr and identified by ARKs is defined as follows.

  • Resources in Data identified by ARKs come from different catalogs and applications of the BnF (General Catalog, Gallica, BAM). Nevertheless, the descriptive metadata disseminated by Data differ from those of the source applications. The modifications can be of several kinds:
    • metadata may have been enriched with external data (Wikimedia, VIAF, Library of Congress, etc.);
    • metadata may have been enriched through inferences that deduce information and links not present in the source data;
    • some metadata initially present in the source records of the BnF catalogs and applications may not be displayed in the data of data.bnf.fr.
  • The site displays data it did not produce. The source records may be split, replaced, deleted, or unpublished in the BnF applications and catalogs. In these cases, it is necessary to refer to the ARK maintenance policies of the sites producing the records. Ultimately, Data will implement redirection mechanisms so that resources remain accessible.

Addressing authority

The addressing authority Data manages the following generic service qualifiers:

  • .description: the data in data.bnf.fr are divided into two groups:
    • The information carried by an ARK suffixed by #about concerns the entities themselves, the things of the real world;
    • The information carried by an unsuffixed ARK concerns the records, that is, the descriptions of the entities.
  • .policy: permanent resource policy. The policy applicable to the resources available on the Data site is to be consulted on the sites producing the source data, notably the General Catalog, Gallica, BAM, etc.
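
The distinction between an entity and its description can be sketched with a few helper functions. The function names and the placeholder URI are hypothetical. Only the #about convention comes from the text above.

```python
def record_uri(uri):
    """URI of the description (record): the ARK without any fragment."""
    return uri.split("#", 1)[0]


def entity_uri(uri):
    """URI of the real-world entity: the ARK suffixed by #about."""
    return record_uri(uri) + "#about"


def describes_entity(uri):
    """True when the URI designates the entity rather than its record."""
    return uri.endswith("#about")


# Placeholder identifier, not an actual ARK.
base = "https://data.bnf.fr/ark:/12148/cbXXXXXXXXX"
print(entity_uri(base))
print(record_uri(base + "#about"))
```

Keeping the two URIs distinct makes it possible to state facts about a person or a work separately from facts about the record that describes it, such as when the record was modified.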

Availability

Data services are accessible 24/7. Temporary unavailability may nevertheless occur because of internal service issues and is not always predictable.

For more information, on the BnF site: The ARK identifier (Archival Resource Key).

Main milestones of the project

The main evolutions of the project are summarized on this page.

Data and Gallica have received the Stanford Prize for Innovation in Research Libraries (SPIRL).

In 2022, Data was one of the winners of the first call for projects on the discoverability of French-language cultural content online, launched by the Ministry of Culture as part of the France Relance plan.

References

GRIMALDI Elisa, 2024. « The evolution of Data.bnf.fr: past, present and future of the BnF linked open data project », JLIS.it 15, 2, p. 119-133. Doi: 10.36253/jlis.it-588.

FOUCHER Tiphaine. « Le web de données en pratique : data.bnf.fr », Vidéo coproduite par la BnF et le Cnfpt.

LEVOIN Xavier, 2021. « Data.bnf.fr : améliorer la découvrabilité des contenus culturels sur le web », Archimag, n°341, p. 28-29. Doi : 10.3917/arma.341.0028.

LAPÔTRE Raphaëlle, 2018. « Data.bnf.fr as a sandbox for FRBRization: automated work creation in data.bnf.fr », SWIB18 : https://youtu.be/-cabjegojNw.

LAPÔTRE Raphaëlle, 2017. « Library Metadata on the Web: the Example of data.bnf.fr », JLIS.it 8, 3, p. 58-70. Doi: 10.4403/jlis.it-12402.

BERMES Emmanuelle, 2016. Vers de nouveaux catalogues. Paris : Cercle de la librairie.

BERMES Emmanuelle, BOULET Vincent, LECLAIRE Céline, 2016. « Améliorer l’accès aux données des bibliothèques sur le web : l’exemple de data.bnf.fr », IFLA World Library and Information Congress : http://library.ifla.org/1447/1/081-bermes-fr.pdf.

BERMES Emmanuelle, 2014. « Les bibliothèques sur le Web », Les catalogues au défi du Web (session 2) : http://video.cnfpt.fr/conferences-1/les-catalogues-au-defi-du-web-les-bibliotheques-sur-le-web.

BERMES Emmanuelle, avec la collaboration d’Antoine Isaac et Gautier Poupeau, 2013. Le Web sémantique en bibliothèque. Paris : Cercle de la librairie.