About data.bnf.fr

Aims of data.bnf.fr

The main goals of data.bnf.fr are:

  • increasing the visibility of BnF data, through better exposure on the Web,
  • federating BnF's data, within and beyond the catalogs,
  • contributing to the cooperation and exchange of metadata by creating links between structured and trusted resources,
  • facilitating the reuse of metadata (under Open License) by others.

The data.bnf.fr project aims to make the data of the Bibliothèque nationale de France (BnF, the National Library of France) more useful on the Web. This data is of various kinds; in particular, it describes and identifies the documents held at the BnF, as well as the people or organizations that created them. The site gathers BnF resources, as well as external resources, around its pages for authors, works, themes, places, dates and periodicals. These pages connect the various contents, links and services that the institution provides on the Web, which for technical reasons are scattered across several BnF applications. The project is also part of the process of opening up the BnF to the Web of data and adopting Semantic Web standards.

Launched in July 2011, data.bnf.fr continues to evolve and grow.

Open data

Data on data.bnf.fr are available under the French Open License, notably used by data.gouv.fr. Reuse and reproduction of RDF data is free and open to any uses, including commercial ones. An attribution statement is required.
More on this: Terms of use for data.bnf.fr

Data.bnf.fr is thus firmly part of the open data movement. Driven by civic actors and governments, open public data aims to make available data collected or produced by public organizations that is non-nominative and raises neither privacy nor security concerns. The opening of public data is part of a national policy: it was incorporated into French legislation through the transposition of the 2003 European Directive (Directive 2003/98/EC of the European Parliament and of the Council of 17 November 2003 on the re-use of public sector information) in Ordinance n°2005-650 of June 6, 2005 on freedom of access to administrative documents and the re-use of public information.

The main concerns of this policy are democratic and economic: on the one hand, to make public action more transparent and efficient and to rationalize the creation of public data by sharing it; on the other hand, to develop economic activity by making information available for re-use, whether commercial or not.

These purposes are congruent with the missions of the Bibliothèque nationale de France, namely "to provide access to the collections for most people, while respecting the secrets protected by law, in accordance with the legislation on intellectual property and in a way that is compatible with the preservation of these collections," and to allow "remote consultation using the most modern technologies for data delivery" (art. R341-2 of the Heritage Code).

Thus, it is a matter of sharing with citizens the benefits of libraries' work on identifying and reporting the collections they hold, including digital collections. It is a path to improve the circulation and reuse of BnF data, making it interoperable to give it a larger audience on the Web.

Overview of data provided

Data.bnf.fr displays high-quality structured data.

HTML pages of data.bnf.fr are autogenerated from data and identifiers found in the several databases of the BnF: BnF general catalog, BnF archives and manuscripts, Gallica. HTML pages are generated according to computer workflows using semantic web technologies.

As legal deposit of documents published in France is mandatory, the collections available on authors and works are very comprehensive and reflect the diversity of French cultural production. Several million documents, free of rights, are digitized and freely accessible in Gallica.

The authority records are the root of the site's pages: the "person and organization authorities" for author pages, the "title authorities" for work pages, and the "RAMEAU authorities" (the subject indexing language used at BnF) for subject pages.

As of 2021, data.bnf.fr covers almost all good-quality catalog records, including over 2 million authors.

How data.bnf.fr works

Data.bnf.fr extracts, transforms, and aggregates data from separate databases produced in different formats into a common database in order to link them together and make them interoperable.

Its pages are indexed by search engines, which otherwise do not reference the data and metadata held in BnF's non-indexable databases, and they point to digitized documents.

To achieve this, data.bnf.fr relies on several components:

  • unique and persistent identifiers assigned to each entry: these are ARK identifiers at BnF, assigned to the entries in the General Catalog and to digitized documents in Gallica,
  • bibliographic description standards, such as the IFLA-LRM model and its modeling in RDF for their expression in the web of data,
  • authority records describing people, organizations, works, and topics,
  • alignment and data federation techniques.
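The aggregation step described above can be pictured with a minimal sketch. It assumes that every source record already carries the persistent identifier of the authority it is linked to; the function name, the record layout and the source names are hypothetical and are not taken from the actual data.bnf.fr workflow.

```python
from collections import defaultdict


def aggregate_by_ark(sources):
    """Group records from several source databases under a shared identifier.

    `sources` maps a source name to a list of records; each record is a dict
    with an "ark" key (the identifier it links to) and a "title" key.
    This layout is an assumption made for the example.
    """
    pages = defaultdict(lambda: defaultdict(list))
    for source_name, records in sources.items():
        for record in records:
            pages[record["ark"]][source_name].append(record["title"])
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {ark: dict(by_source) for ark, by_source in pages.items()}


# Made-up records standing in for three separate databases.
example_sources = {
    "catalog": [{"ark": "id-1", "title": "Printed edition"}],
    "archives": [{"ark": "id-1", "title": "Manuscript draft"}],
    "digital_library": [{"ark": "id-2", "title": "Digitized map"}],
}
example_pages = aggregate_by_ark(example_sources)
```

Each key of the result corresponds to one page, with the linked records grouped by the database they come from.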

Authority records are the core of the data structure: information from different sources relating to the same author, work, or theme is aggregated in the corresponding page.

The Author pages gather all bibliographic records containing a link to the author's identifier.
Work pages collect all records containing both a link to the author's identifier and a link to the work ID. Without a link, a simple string comparison alignment mechanism is activated.
The Theme pages aggregate information about a given theme (the different ways of naming it, preferred and rejected forms, at BnF and at other institutions, according to several vocabularies) and the works about this theme.
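The selection rule for Work pages can be sketched as follows. This is one possible reading of the rule, not the actual implementation: a record is kept when it links to both the author and the work, and a record with no work link falls back to a comparison of normalized title strings. The field names, identifiers and normalization steps are assumptions made for the example.

```python
import unicodedata


def normalize(title):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    kept = "".join(ch if ch.isalnum() else " " for ch in no_accents.casefold())
    return " ".join(kept.split())


def records_for_work(work, records):
    """Return the records belonging to `work` (hypothetical record layout)."""
    matched = []
    for record in records:
        if work["author_id"] not in record.get("author_ids", []):
            continue  # the author link is always required
        if record.get("work_id") == work["id"]:
            matched.append(record)  # explicit link to the work
        elif record.get("work_id") is None and normalize(record["title"]) == normalize(work["title"]):
            matched.append(record)  # no work link: fall back to string comparison
    return matched
```

A record that is explicitly linked to a different work is never captured by the fallback, which keeps the string comparison from overriding cataloguers' links.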


Also found in data.bnf.fr:
Place pages built from two distinct types of records (Rameau on the one hand, Department of Maps and Plans on the other), gradually merged as single pages providing, in particular, geographical coordinates.
Date pages that display relationships between works, organizations, authors, documents, etc. and that date.
Performance pages that collect related bibliographic records.
Serial pages, also constructed from periodical bibliographic records, provide brief information about the title, and where applicable, related authors.

Algorithmic creation of works

Data.bnf.fr makes it possible to experiment with a new way of structuring information, centered no longer on the document but on the author's work. However, the work each document relates to is rarely described in the catalog (less than 8% of the documents). Doing this manually for the 12 million records of the catalog would take 45 years, at a rate of 2 minutes per document. Yet a national process, the Bibliographic Transition, is currently underway to put this new model into practice (by adopting IFLA-LRM, the Library Reference Model).
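The 45-year estimate can be checked with a few lines of arithmetic. The figures for records and minutes come from the paragraph above; the assumption that the work is uninterrupted (24 hours a day, 365 days a year) is ours, and it is the reading under which the numbers agree.

```python
RECORDS = 12_000_000      # catalog records, from the text
MINUTES_PER_RECORD = 2    # manual processing rate, from the text

total_minutes = RECORDS * MINUTES_PER_RECORD
total_hours = total_minutes / 60
# Assumption: round-the-clock work, with no nights, weekends or holidays.
years_nonstop = total_hours / 24 / 365

print(f"{total_hours:,.0f} hours, about {years_nonstop:.1f} years of uninterrupted work")
```

Spread over ordinary working hours, the same 400,000 hours would represent several times as many person-years.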

The BnF is therefore experimenting with a semi-automatic process that generates the description of each work from the information describing its successive editions. The first corpus processed concerns 20th century printed works.

For each author, the titles of their publications are retrieved and clustered by similarity. For each cluster, a program computes the work-related information according to what is found in the documents (alternative title forms, translation titles, date of first publication, other authors). The results of these computations are then put online at data.bnf.fr to evaluate the relevance of the process.
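The clustering step can be illustrated with a minimal sketch. The similarity measure (Python's difflib ratio), the greedy grouping strategy, the 0.85 threshold and the record layout are all assumptions chosen for the example; the text does not say which criteria the BnF program actually uses.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.85):
    """True when two titles are close enough, ignoring case."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio() >= threshold


def cluster_publications(publications, threshold=0.85):
    """Greedily group one author's publications by title similarity."""
    clusters = []
    for pub in publications:
        for cluster in clusters:
            # Compare against the first title of each existing cluster.
            if similar(pub["title"], cluster[0]["title"], threshold):
                cluster.append(pub)
                break
        else:
            clusters.append([pub])
    return clusters


def describe_work(cluster):
    """Derive work-level information from the publications in a cluster."""
    main_title = cluster[0]["title"]
    return {
        "title": main_title,
        "alternative_titles": sorted({p["title"] for p in cluster} - {main_title}),
        "first_published": min(p["year"] for p in cluster),
    }
```

Given publications titled "The Blue Garden", "the blue garden" and "The Blue Garden." (made-up data), the three fall into one cluster, and the computed work takes the earliest year among them as its date of first publication.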

The results are also exposed to the scrutiny of users, who are invited to give feedback.

As the issues may have several origins (source data, clustering criteria, etc.), the BnF cannot guarantee that they will be corrected quickly. It can, however, add them to the list of corrections to be made when these works are transferred to the general catalog.

You can also participate in this major project and help us improve the reliability of the data by reporting the errors you identify in these autogenerated works: data[at]bnf.fr.

External links in data.bnf.fr

Data.bnf.fr fits into the Web by providing links that redirect the user to resources inside or outside the BnF.

There are several types of links:

  • links to other external repositories to which BnF data are aligned, such as the Library of Congress, Deutsche Nationalbibliothek, VIAF (Virtual International Authority File), IdRef, Geonames, Agrovoc, and Thesaurus W.
  • links to search forms in which the author, topic, or work search was automatically filled in: BnF general catalog, CCFr, BnF archives and manuscripts, CNLJ-La Joie par les livres, Europeana, SUDOC (Système universitaire de documentation), Worldcat, Wikipedia.
  • Wikipedia data: they enable thumbnails illustrating authors to be displayed, if they do not exist in Gallica. These data are retrieved via DBpedia and Wikidata.
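A pre-filled search link of the kind described above amounts to a URL whose query string contains the label of the author, topic or work. The sketch below shows the general technique only: the endpoint URL and the parameter name are placeholders, not the real addresses or parameters of the services listed.

```python
from urllib.parse import urlencode

# Hypothetical search endpoints: (base URL, name of the query parameter).
SEARCH_ENDPOINTS = {
    "example-catalog": ("https://catalog.example.org/search", "q"),
}


def prefilled_search_url(target, label):
    """Build a link to an external search form with the query already filled in."""
    base_url, param = SEARCH_ENDPOINTS[target]
    # urlencode percent-encodes reserved characters and turns spaces into "+".
    return f"{base_url}?{urlencode({param: label})}"


link = prefilled_search_url("example-catalog", "Hugo, Victor")
```

Encoding the label matters because names frequently contain commas, accents and spaces that are not safe in a raw URL.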

From bibliographic records to semantic web

Data.bnf.fr uses data produced in various formats, including Intermarc for book catalogs, XML-EAD for archive inventories and manuscripts, and Dublin Core for the digital library.

These data are restructured, aggregated, enriched by automatic processing, and published according to the W3C recommendation for the Semantic Web, RDF. The result is available on this site, in several RDF syntaxes: RDF-XML, RDF-N3, and RDF-NT.
More about this: Semantic Web and Data Model
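To give an idea of what the simplest of these syntaxes looks like, here is a minimal sketch that writes statements as N-Triples (the "RDF-NT" syntax), one subject-predicate-object statement per line. The subject URI and the literal are made up, and the choice of the FOAF "name" property is an assumption for illustration; see the data model documentation for the vocabularies actually used.

```python
def ntriples_term(term):
    """Serialize a ("uri", value) or ("literal", value) pair as an N-Triples term."""
    kind, value = term
    if kind == "uri":
        return f"<{value}>"
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) triples, one statement per line."""
    return "".join(
        f"{ntriples_term(s)} {ntriples_term(p)} {ntriples_term(o)} .\n"
        for s, p, o in triples
    )


example = to_ntriples([
    (
        ("uri", "https://example.org/author-1#about"),   # made-up subject
        ("uri", "http://xmlns.com/foaf/0.1/name"),       # FOAF name property
        ("literal", "Example Author"),
    )
])
```

The other syntaxes express the same triples in a different notation; only the serialization changes, not the underlying statements.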

ARK identifiers

General considerations on BnF ARKs

The BnF assigns identifiers in the ARK 12148 (Bibliothèque nationale de France) domain according to the following principles.

  • No ARK IDs will be reallocated; that is, once a link between an ARK identifier and an object has been published, that link should be considered unique, and for an unlimited period.
  • ARK IDs assigned by BnF do not contain any easily recognizable semantic information, whenever possible; this contributes to their ease of use regardless of time or place context.
  • ARK IDs assigned by BnF contain a check character that protects them against single-character errors and transposition errors. A user who makes a typo while entering an ARK will get an HTTP 400 response and a message informing them that the ARK provided is incorrect.
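The way such a character catches typos can be sketched as follows. The text does not say which algorithm the BnF uses; this example assumes the check-character algorithm of the NOID tool, which is commonly used with ARKs: each character is given a value from a 29-character alphabet, multiplied by its position, and the sum modulo 29 selects the final character. The sample name below is given for illustration only.

```python
# Alphabet assumed from the NOID check-character algorithm (digits plus
# consonants, 29 characters, a prime number).
NOID_ALPHABET = "0123456789bcdfghjkmnpqrstvwxz"


def check_char(name):
    """Compute the check character for an ARK name without its last character."""
    total = 0
    for position, ch in enumerate(name, start=1):
        # Characters outside the alphabet count as zero.
        value = NOID_ALPHABET.index(ch) if ch in NOID_ALPHABET else 0
        total += value * position
    return NOID_ALPHABET[total % len(NOID_ALPHABET)]


def is_valid(name_with_check):
    """True when the last character matches the check computed on the rest."""
    if len(name_with_check) < 2:
        return False
    return check_char(name_with_check[:-1]) == name_with_check[-1]


print(is_valid("cb11907966z"))   # sample name with its check character
print(is_valid("cb11909766z"))   # same name with two adjacent digits swapped
```

Because each character is weighted by its position, swapping two adjacent characters or changing a single one alters the sum, so the stored check character no longer matches.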

Resource mutability

The mutability of resources present in data.bnf.fr and identified by ARKs is defined as follows.

  • Data in data.bnf.fr and identified by ARKs come from different BnF catalogs and applications (General Catalog, Gallica, BAM). Nevertheless, the descriptive metadata disseminated by data.bnf.fr present a difference compared to data from the source applications. The changes can be of several orders:
    • Metadata may have been augmented with external data (Wikimedia, VIAF, Library of Congress, etc.);
    • Metadata may have been augmented through inferences that allow for the deduction of information and links not present in the source data;
    • Some metadata initially present in the source records of BnF catalogs and applications may not be displayed in data.bnf.fr.
  • Data.bnf.fr presents data that the application itself does not produce. Source records can be split, overwritten, deleted or unpublished in BnF applications and catalogs. In these cases, refer to the ARK policies issued by the sites that produce the records. In due course, data.bnf.fr will implement redirection mechanisms to ensure that resources remain accessible.

Addressing authority

Data.bnf.fr addressing authority manages the following generic service qualifiers:

  • "description": data in data.bnf.fr are divided into two groups:
    • The information carried by an ARK suffixed with #about is about the entities themselves, the things in the real world;
    • Information carried by an ARK not suffixed is about records, descriptions of entities;
  • "policy": resource permanence policy. The permanence policy for resources made available on data.bnf.fr is to be found on the relevant data-producing sites, including the General Catalogue, Gallica, BAM, etc.
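The distinction made by the "description" qualifier can be expressed in a few lines. This is a minimal sketch of how a client might tell the two kinds of identifier apart; the function name is hypothetical and the URI is given for illustration only.

```python
from urllib.parse import urldefrag


def describe_target(uri):
    """Split a URI into its base and what it denotes.

    A URI ending in "#about" denotes the real-world entity; without that
    suffix it denotes the record describing the entity.
    """
    base, fragment = urldefrag(uri)
    kind = "entity" if fragment == "about" else "record"
    return base, kind


print(describe_target("https://data.bnf.fr/ark:/12148/cb11907966z#about"))
print(describe_target("https://data.bnf.fr/ark:/12148/cb11907966z"))
```

Both URIs share the same base, so a statement about the person and a statement about the record describing that person can be kept apart without minting a second identifier.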

Availability

The services (except SPARQL) and data of data.bnf.fr are accessible 24 hours a day, 7 days a week. Temporary unavailability may nevertheless occur because of internal service issues and is not always foreseeable.

For more information, on the BnF website: The ARK (Archival Resource Key) identifier.

Main project milestones

The major project developments are summarized on this page.

Data.bnf.fr and Gallica were awarded the Stanford Prize for Innovation in Research Libraries (SPIRL).

References

FOUCHER Tiphaine, « Le web de données en pratique : data.bnf.fr », Vidéo coproduite par la BnF et le Cnfpt.

LEVOIN Xavier, 2021. Data.bnf.fr : améliorer la découvrabilité des contenus culturels sur le web, Archimag, n°341, p. 28-29.

LAPÔTRE Raphaëlle, 2018. Data.bnf.fr as a sandbox for FRBRization: automated work creation in data.bnf.fr, SWIB18 : https://youtu.be/-cabjegojNw.

LAPÔTRE Raphaëlle, 2017. Library Metadata on the Web: the Example of data.bnf.fr, JLIS.it 8, 3, p. 58-70. Doi: 10.4403/jlis.it-12402.

BERMES Emmanuelle, 2016. Vers de nouveaux catalogues. Paris : Cercle de la librairie.

BERMES Emmanuelle, BOULET Vincent, LECLAIRE Céline, 2016. Améliorer l’accès aux données des bibliothèques sur le web : l’exemple de data.bnf.fr. IFLA World Library and Information Congress : http://library.ifla.org/1447/1/081-bermes-fr.pdf.

BERMES Emmanuelle, 2014. Les bibliothèques sur le Web. Les catalogues au défi du Web (session 2) : http://video.cnfpt.fr/conferences-1/les-catalogues-au-defi-du-web-les-bibliotheques-sur-le-web.

BERMES Emmanuelle, avec la collaboration d’Antoine Isaac et Gautier Poupeau, 2013. Le Web sémantique en bibliothèque. Paris : Cercle de la librairie.