This site has been online since July 2011 and is under continuous development, with regular updates.
The version currently displayed is Version [1.3] of data.bnf.fr, posted online on 2013/06/26.
How to retrieve data from data.bnf.fr: download the data.bnf.fr dump file as of 2013/06/26.
To contact the team: firstname.lastname@example.org
The data.bnf.fr project endeavours to make the data produced by Bibliothèque nationale de France (French National Library) more useful on the Web. It gathers various BnF resources and external resources on pages devoted to an author, a work, or a subject. These pages organize the Web contents, links, and services provided by BnF. Available online since July 2011, data.bnf.fr is still evolving and expanding.
With data.bnf.fr, you can:
The objective is to showcase the BnF's collections and to provide a hub between different resources. Data.bnf.fr is meant to support the BnF's other applications. The project is part of the BnF's policy of joining the Web of data and adopting Semantic Web standards.
For more information:
The main objectives are to:
The developments expected during 2013 will follow these lines:
1. In existing catalogues:
Improving the pages devoted to specific types of resources:
Incorporating new types of records:
2. Beyond catalogues:
Incorporating metadata from various databases maintained by the BnF:
Creating links to digital libraries that have a partnership with the BnF:
New pages will be created. They will be labelled "Atelier" (French for "Workshop") and will allow users to experiment with the BnF's data, e.g. by displaying it as timelines, geographic maps, or picture galleries. It will be possible to test functionalities that are being developed, to mine the BnF's data, and to draw inspiration for reusing it or combining it with other data sets.
The technical infrastructure of the site will be made more robust and more efficient.
Improvement of the Web pages:
Optimization of the RDF views of the raw data:
Organization of the pages so as to enable:
The outcomes of the alignment and clustering algorithms used in data.bnf.fr will be exploited in existing applications:
Data.bnf.fr consists of a portion of the content of the BnF's catalogues. Information about further authors and works and associated documents is gradually incorporated.
Such information was created by the National Library of France or institutions that have a partnership with it, or is derived from external sites.
The three main resources gathered in data.bnf.fr are the main catalogue for publications (BnF catalogue général), the archive and manuscript catalogue (BnF archives et manuscrits), and Gallica (the digital library). The BnF holds over 30 million documents that have been acquired over the centuries. As legal deposit is compulsory for French publications, the collections available about authors and works are comprehensive and varied. Over one million copyright-free documents have been digitized and are available online from Gallica.
As of June 2013, data.bnf.fr consists of more than 600,000 pages pointing to 5,600,000 bibliographic records stored in BnF catalogue général and several thousand descriptions stored in BnF archives et manuscrits, which together represent 40% of the BnF catalogues.
The initial corpus of data.bnf.fr focuses on the most prominent authors and works described in the BnF's databases, but also includes authors and works about which the BnF provides rare and relevant information, e.g. little-known classical authors, lawyers, or composers; indeed, there are cases where the BnF is the only source of information available on the Web. We noticed that these resources are actually the most consulted pages of the site, which suggests that they meet the public's varied, specific needs. The corpus was progressively extended to authors and works pertaining to other domains, and to the entire set of subject headings used at the BnF.
In the future, the scope of data.bnf.fr will be extended further, e.g. to authors that are related in some way to those already represented on data.bnf.fr, to recent publications, to a broader variety of digitized items, and to historically significant holdings of any kind.
The following types of pages have been progressively included since 2011:
Further additions will include, in the medium term:
In the long term, the objective is to cover almost all of the BnF's information sources:
Data.bnf.fr displays structured data. It collocates:
HTML pages are automatically created from data and identifiers held in the library's different databases: BnF catalogue général, BnF archives et manuscrits, and Gallica. The HTML pages are created using Semantic Web technologies.
The pages are based on our authority records: authority records for persons and corporate bodies provide the content for the "author" pages, authority records for works provide the content for the "work" pages, and subject authority records from RAMEAU (the indexing language used at the BnF) provide the content for the "subject" pages.
Finally, these pages are indexed by search engines, whereas data and metadata hidden in the BnF's unindexable databases cannot be retrieved that way. The data.bnf.fr pages describe resources from the BnF that are often concealed in the "deep Web" and give access to digital documents from Gallica.
The data model used in data.bnf.fr makes it possible to federate data extracted from internal applications, but also to include links to external sources. Resources produced by the BnF (authority and catalogue records, finding aids for archives and manuscripts, digital documents) are assigned permanent identifiers (ARK identifiers) that enable the creation of persistent links.
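As a rough illustration of how an ARK identifier can back a persistent link, the Python sketch below splits an ARK into its naming authority number and its name, then prefixes it with a resolver base URL. The sample identifier `cb00000000x` is a made-up placeholder, and the `https://data.bnf.fr/` base as well as the helper names `parse_ark` and `persistent_link` are assumptions made for illustration; only the general `ark:/NAAN/Name` shape comes from the ARK scheme.

```python
from typing import NamedTuple


class Ark(NamedTuple):
    naan: str  # Name Assigning Authority Number
    name: str  # identifier assigned by that authority


def parse_ark(ark: str) -> Ark:
    """Split an 'ark:/NAAN/Name' string into its two parts."""
    prefix = "ark:/"
    if not ark.startswith(prefix):
        raise ValueError(f"not an ARK identifier: {ark!r}")
    naan, sep, name = ark[len(prefix):].partition("/")
    if not (naan and sep and name):
        raise ValueError(f"incomplete ARK identifier: {ark!r}")
    return Ark(naan, name)


def persistent_link(ark: str, base: str = "https://data.bnf.fr/") -> str:
    """Build a link by prefixing the ARK with a resolver base URL (assumed)."""
    parsed = parse_ark(ark)  # also validates the identifier
    return f"{base.rstrip('/')}/ark:/{parsed.naan}/{parsed.name}"


# Placeholder identifier, not a real record.
SAMPLE_ARK = "ark:/12148/cb00000000x"
```

Because the link is derived from the identifier alone, it stays stable even if the application that produced the record changes.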
The first step was to develop bibliographical frameworks that are being tested at an international level, especially the "FRBR" model.
This step was followed by modelling efforts aiming at displaying the data in RDF (Resource Description Framework) on the Web of data. In the BnF's view, the implementation of these technical standards must ensure interoperability between external and internal databases, through machine readable and structured data.
For further information:
In the long term, useful, reliable and controlled data will be published and integrated into the growing Web of data, in compliance with Semantic Web standards. This must be done in line with international initiatives to facilitate the use of informational or administrative public data.
Being on the Web of data implies the use of specific technical solutions in order to create links: dereferenceable and permanent URIs (Uniform Resource Identifiers), a content negotiation mechanism, and an access to raw data.
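To make the content negotiation mechanism more concrete, here is a minimal Python sketch that prepares two requests for the same URI, one asking for an HTML page and one asking for RDF. The resource URI is a placeholder, and the media types the server actually honours are an assumption; the sketch only builds the requests and does not send them.

```python
from urllib.request import Request

# Hypothetical resource URI; the identifier is a placeholder.
RESOURCE_URI = "https://data.bnf.fr/ark:/12148/cb00000000x"


def negotiated_request(uri: str, media_type: str) -> Request:
    """Prepare a GET request asking the server for a given representation."""
    return Request(uri, headers={"Accept": media_type})


# Same URI, two different representations requested through the Accept header.
html_request = negotiated_request(RESOURCE_URI, "text/html")
rdf_request = negotiated_request(RESOURCE_URI, "application/rdf+xml")

# Sending a request is left out so that the sketch runs offline:
#   from urllib.request import urlopen
#   with urlopen(rdf_request) as response:
#       body = response.read()
```

The point of the design is that a single permanent URI identifies the resource, while the `Accept` header decides whether a person gets a Web page or a program gets raw data.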
Linked Open Data fosters data exchange between libraries and other communities, and offers solutions for format interoperability. The Deutsche Nationalbibliothek, the British Library, and the Library of Congress have also adopted these tools in order to open their bibliographic data.
The reusable data that we display include subject authority records from the RAMEAU repository, which is used to index bibliographic records at the BnF. They have been converted to the RDF SKOS language (Simple Knowledge Organization System), within the framework of the European project TELplus. This repository is now regularly updated on data.bnf.fr with inputs from the whole database maintained by the BnF.
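To give an idea of what a subject heading looks like once expressed in SKOS, the sketch below serialises one concept as N-Triples lines. The concept URIs under `example.org`, the labels, and the helper names `literal` and `concept_triples` are invented for illustration and are not copied from a RAMEAU record; only the SKOS namespace and the `prefLabel`, `altLabel`, and `broader` properties come from the SKOS vocabulary.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(text: str, lang: str) -> str:
    """Format a language-tagged literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_triples(uri, pref_label, alt_labels=(), broader=(), lang="fr"):
    """Return N-Triples lines describing one SKOS concept."""
    lines = [f"<{uri}> <{RDF_TYPE}> <{SKOS}Concept> ."]
    lines.append(f"<{uri}> <{SKOS}prefLabel> {literal(pref_label, lang)} .")
    for label in alt_labels:
        lines.append(f"<{uri}> <{SKOS}altLabel> {literal(label, lang)} .")
    for parent in broader:
        lines.append(f"<{uri}> <{SKOS}broader> <{parent}> .")
    return lines


# Illustrative values under example.org, not copied from a RAMEAU record.
triples = concept_triples(
    "http://example.org/subject/1",
    "Bibliothèques",
    alt_labels=["Bibliothèque"],
    broader=["http://example.org/subject/0"],
)
print("\n".join(triples))
```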
For further information:
Data.bnf.fr is part of the Web and provides external links to Web sites, either maintained by the BnF or completely independent.
There are several kinds of links:
The data belongs to separate databases and is produced and stored in different formats. Data.bnf.fr extracts, transforms, and gathers these datasets into a single database and makes them interoperable.
We use the following tools:
We rely on authority records, which form the basis for all author, work, and subject pages, in order to gather and organise the different data silos. The different resources are collocated through the authority record’s identifier.
Author pages collocate all bibliographic records that are linked to the author’s identifier.
Work pages collocate all records that are linked to both the author's and the work's identifier. When there is no link to the work authority record, there is a simple matching mechanism based on string recognition techniques ("words beginning with"). "Subject" pages collocate all records that have a link to the same subject.
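The collocation rules above can be sketched in a few lines of Python. This is one possible reading rather than the project's actual implementation: the record layout, the identifiers, and the function names `author_page` and `work_page` are hypothetical, and the "words beginning with" fallback is approximated here by a case-insensitive prefix test on the title.

```python
# Invented sample records; identifiers are placeholders.
RECORDS = [
    {"id": "r1", "author_id": "a1", "work_id": "w1", "title": "Les Misérables"},
    {"id": "r2", "author_id": "a1", "work_id": None, "title": "Les misérables. Tome 1"},
    {"id": "r3", "author_id": "a1", "work_id": None, "title": "Notre-Dame de Paris"},
    {"id": "r4", "author_id": "a2", "work_id": None, "title": "Les Misérables"},
]


def author_page(records, author_id):
    """Collect the ids of all records linked to the author's identifier."""
    return [r["id"] for r in records if r["author_id"] == author_id]


def work_page(records, author_id, work_id, work_title):
    """Collect records for a work: explicit link first, title prefix otherwise."""
    prefix = work_title.casefold()
    matched = []
    for r in records:
        if r["author_id"] != author_id:
            continue  # a work page only considers records by the same author
        if r["work_id"] == work_id:
            matched.append(r["id"])  # explicit link to the work authority record
        elif r["work_id"] is None and r["title"].casefold().startswith(prefix):
            matched.append(r["id"])  # fallback: title begins with the work title
    return matched
```

In this sample, `work_page(RECORDS, "a1", "w1", "Les Misérables")` picks up `r1` through its explicit link and `r2` through the prefix fallback, while `r4` is left out because it is attached to a different author.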
We use the free software CubicWeb
CubicWeb is an open source platform for developing Semantic Web applications and is published under the LGPL licence.
Within the project, this software is used to:
In 2013, CubicWeb won the Dataconnexions award, organized by Etalab, a body attached to the French Prime Minister's office whose objective is to encourage public open data efforts.
For further information:
Raw data from data.bnf.fr is available under the French Open Licence (Licence Ouverte), also used by data.gouv.fr. This licence is comparable to CC-BY, adapted to French copyright legislation. Data in RDF can be freely reused and copied, for commercial or non-commercial purposes. Attribution of the source is mandatory.
The data.bnf.fr project is firmly part of the Open Data movement.
Supported by civic and governmental actors, Open Data is a global movement that aims to make available the data collected by public organisations in our connected societies, provided it is non-personal and unrelated to privacy or security. Open Data is now national policy: the 2003 European Directive on the re-use of public sector information (Directive 2003/98/EC) was transposed into French law by Ordinance No. 2005-650 of 6 June 2005 on freedom of access to administrative documents and the re-use of public information.
The main purposes are:
This is in line with the missions of the National Library of France: "to enable as many people as possible to have access to the collections" (assurer l'accès du plus grand nombre aux collections, sous réserve des secrets protégés par la loi, dans des conditions conformes à la législation sur la propriété intellectuelle et compatibles avec la conservation de ces collections), and to provide "remote access thanks to state-of-the-art technologies of data transmission" (permettre la consultation à distance en utilisant les technologies les plus modernes de transmission des données), as stated in the Decree of 3 January 1994 establishing the BnF.
The purpose is therefore to share with citizens the results of libraries' efforts to identify and describe the collections they hold, including digital items. In this way we can optimize the dissemination and reuse of data produced by the BnF, by pushing it out of our internal silos and giving it a wider audience and greater visibility on the Web. Potential uses are varied and innovative. Other libraries can now not only retrieve data from the BnF but also create links to it. Moreover, the data is meant to travel beyond the library world and be widely disseminated. Examples of such initiatives include the iF-Verso project by the Institut Français, and private projects to create iPhone applications or geographic visualisations of places related to works and authors, enabling one to discover digital documents relating to a city or monument.
For further information: