About data.bnf.fr


To contact the team: data@bnf.fr


A presentation of the project

The data.bnf.fr project endeavours to make the data produced by the Bibliothèque nationale de France (French National Library) more useful on the Web. It gathers various BnF resources and external resources on pages devoted to an author, a work, or a subject. These pages organize the Web content, links, and services provided by the BnF. Available online since July 2011, data.bnf.fr is still evolving and expanding.

With data.bnf.fr, you can:

  • reach BnF resources directly from a Web page, without any previous knowledge of the services provided by the library;
  • find your way among the BnF resources and possibly discover external resources.

The objective is to showcase the BnF's collections and to serve as a hub between different resources. Data.bnf.fr is meant to support the BnF's other applications. The project is part of the BnF's policy of joining the Web of data and adopting Semantic Web standards.


Roadmap

On the agenda

The main objectives are to:

  • make the data produced by the BnF more visible on the Web,
  • federate the data produced by the BnF, both within and outside the catalogues,
  • contribute to collaboration and metadata exchange by creating links between structured and trustworthy resources,
  • facilitate the reuse of metadata (under an open licence) by third parties.

The developments planned for 2014 follow these lines:

Development of data.bnf.fr's function as a hub between resources

1. In existing catalogues:

Improving the pages devoted to specific types of resources:

  • archival and manuscript resources,
  • musical works.

Incorporating new types of records:

  • records for performances,
  • authority records for geographic names,
  • authority records for printers and booksellers,
  • possibly, records for periodicals (work in progress).

2. Beyond catalogues:

Incorporating metadata from various databases maintained by the BnF:

Creating links to digital libraries that have a partnership with the BnF:

  • Incorporating OAI sets provided by libraries that have a partnership with Gallica, so that users can access books that were digitized by other libraries.
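OAI sets are harvested with the OAI-PMH protocol: a `ListRecords` request names a metadata format and a set, and long result lists are paged with a `resumptionToken`. The following is a minimal sketch of the building blocks of such a harvest, not the BnF's actual harvester; the endpoint URL, set name and record identifiers are hypothetical, and the parser runs on a canned response so the example works offline.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def list_records_url(base_url, set_spec=None, token=None, prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL.

    A resumption request carries only the verb and the resumptionToken,
    as the protocol requires, so set and metadataPrefix are dropped.
    """
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urllib.parse.urlencode(params)

def parse_page(xml_text):
    """Return (record identifiers, next resumptionToken or None) for one response."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:ListRecords/oai:resumptionToken", OAI_NS)
    # An empty or missing resumptionToken marks the last page of the list.
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token

BASE = "https://partner.example.org/oai"  # hypothetical partner endpoint
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:partner.example.org:1</identifier></header></record>
    <record><header><identifier>oai:partner.example.org:2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url(BASE, set_spec="gallica"))  # hypothetical set name
ids, token = parse_page(SAMPLE)
print(ids, token)
print(list_records_url(BASE, token=token))
```

A full harvester would loop: fetch a page, collect its records, and request the next page with the returned token until no token comes back.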

Paving the way for innovative uses of data as Open Data

New pages will be created. They will be labelled "Atelier" (the French for "Workshop"), and will allow one to experiment with the BnF's data, e.g. by displaying it as timelines, geographic maps, or picture galleries. It will be possible to test functionalities that are being developed, to mine the BnF's data, and to draw inspiration for reusing it or combining it with other datasets.

Changes required by the expansion of the site

The site's technical infrastructure will be made more robust and efficient.

Improvement of the Web pages:

  • querying the pages will be made easier,
  • the pages of prolific and frequently cited authors will be optimized.

Optimization of the RDF views of the raw data:

  • the RDF dump files will be redefined to enable targeted reuse, e.g. there will be a dump file containing complete descriptions of publications (manifestations),
  • manifestations will be made dereferenceable: URIs will be assigned to the RDF bibliographic descriptions of publications.
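A targeted dump lets a reuser stream only the triples they need. As a minimal sketch, assuming a dump in N-Triples format (one triple per line), the function below picks out the subjects typed as manifestations. The class URI and the data URIs are placeholders, not the vocabulary actually used in the data.bnf.fr dumps, and the regular expression is a simplification of the N-Triples grammar.

```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Placeholder class URI: substitute whatever class the dump actually uses.
MANIFESTATION = "http://example.org/vocab/Manifestation"

# Simplified N-Triples pattern: subject, predicate, object, final dot.
# Good enough for a sketch; a production pipeline should use a full RDF parser.
TRIPLE = re.compile(r'^(<[^>]*>|_:\S+)\s+<([^>]*)>\s+(.*?)\s*\.\s*$')

def manifestation_subjects(lines):
    """Yield the URIs of subjects typed as MANIFESTATION, reading line by line."""
    for line in lines:
        m = TRIPLE.match(line)
        if not m:
            continue  # blank line, comment, or something this sketch cannot parse
        subj, pred, obj = m.groups()
        if pred == RDF_TYPE and obj == f"<{MANIFESTATION}>":
            yield subj.strip("<>")

dump = [
    '<http://data.example.org/m1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/vocab/Manifestation> .',
    '<http://data.example.org/m1> <http://purl.org/dc/terms/title> "Les Misérables" .',
    '# a comment line',
    '<http://data.example.org/p1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/vocab/Person> .',
]
print(list(manifestation_subjects(dump)))  # ['http://data.example.org/m1']
```

Because the function consumes any iterable of lines, an open text file can be passed directly, so a large dump never has to be loaded into memory at once.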

Changes to the site's functionalities and usability

Organization of the pages so as to enable:

  • the retrieval of the various versions of a given work, either by language (in the case of translations) or contributor (author of foreword, illustrator, etc.),
  • the improvement of the "subject" pages by incorporating elements from the RAMEAU subject authority records or optimizing their links to authors and works,
  • the improvement of the "date" pages devoted to a year or a period (in both HTML and RDF).

Reuse of the outcomes of data processing in the BnF's existing applications

The outcomes of the alignment and clustering algorithms used in data.bnf.fr will be exploited in existing applications:

  • in BnF archives et manuscrits: links will be created from EAD finding aids to authority records for persons and corporate bodies,
  • in Gallica: links will be created among digitized items, e.g. among various versions of a given work,
  • in BnF catalogue général: additional links will be created from bibliographic records to authority records for works.
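The alignment and clustering algorithms themselves are not described here. As a deliberately naive illustration of grouping versions of a work, the sketch below uses key-based grouping: records that share an author identifier and a normalized title fall into the same cluster. The record fields and identifiers are hypothetical, and real alignment would also have to handle translations and variant titles, which such a key cannot match.

```python
import re
import unicodedata
from collections import defaultdict

def normalize_title(title):
    """Lower-case, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(re.findall(r"\w+", no_accents.lower()))

def cluster_versions(records):
    """Group record ids by (author identifier, normalized title)."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[(rec["author_id"], normalize_title(rec["title"]))].append(rec["id"])
    return dict(clusters)

records = [
    {"id": "r1", "author_id": "A1", "title": "Les Misérables"},
    {"id": "r2", "author_id": "A1", "title": "Les miserables."},
    {"id": "r3", "author_id": "A1", "title": "Notre-Dame de Paris"},
]
print(cluster_versions(records))
# {('A1', 'les miserables'): ['r1', 'r2'], ('A1', 'notre dame de paris'): ['r3']}
```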

Content and selection policy

Data.bnf.fr covers a portion of the content of the BnF's catalogues. Information about additional authors and works, and their associated documents, is being incorporated gradually.

This information was created by the National Library of France or its partner institutions, or is derived from external sites.

The three main resources gathered in data.bnf.fr are the main catalogue (BnF catalogue général) for publications, the archive and manuscript catalogue (BnF archives et manuscrits), and Gallica (the digital library). The BnF holds over 30 million documents acquired over several centuries. As legal deposit is compulsory for French publications, the collections available about authors and works are comprehensive and varied. Over one million copyright-free documents have been digitized and are available online from Gallica.

As of July 2014, data.bnf.fr contains 400,000 authors, linked to over 7 million documents from BnF catalogue général and BnF archives et manuscrits.

The initial corpus of data.bnf.fr focuses on the most prominent authors and works described in the BnF's databases, but it also includes authors and works for which the BnF provides rare and relevant information (in some cases, the BnF is the only source of information available on the Web), e.g. little-known classical authors, lawyers, or composers. We noticed that these pages are actually the most consulted on the site, which shows that they meet the public's varied and specific needs.

The following types of pages have been progressively included since 2011:

  • The most significant authors in French literature (July 2011),
  • Authors of most frequently requested items, i.e., lawyers and classical authors (September 2011),
  • All works about which the BnF holds at least one study, and their authors (November 2011),
  • The complete set of RAMEAU subject authority records (January 2012),
  • All the authors indexed in BnF archives et manuscrits (March 2012),
  • All the authors whose authority record contains a link to a RAMEAU subject authority record (July 2012),
  • All the authors whose authority record is associated with descriptions of book bindings (November 2012),
  • All authors whose name appears in subject headings (June 2013),
  • All periodicals and related authors; authors linked to a digital document from Gallica.

In the long term, the objective is to cover all good-quality data from the BnF's catalogues, which represent almost all of the BnF's information sources.

HTML pages

Data.bnf.fr displays structured data. They collocate:

HTML pages are automatically generated from the data and identifiers held in the library's different databases: BnF catalogue général, BnF archives et manuscrits, and Gallica. The HTML pages are created using Semantic Web technologies.

The pages are based on our authority records: authority records for persons and corporate bodies provide the content of the "author" pages, authority records for works provide the content of the "work" pages, and the RAMEAU subject authority records (RAMEAU being the indexing language used at the BnF) provide the content of the "subject" pages.

Finally, these pages are indexed by search engines, whereas data and metadata hidden in the BnF's databases cannot be indexed or retrieved by them. The data.bnf.fr pages describe BnF resources that are often concealed in the "deep Web" and give access to digital documents from Gallica.

A new data model

The data model used in data.bnf.fr makes it possible to federate data extracted from internal applications and also to include links to external sources. Resources produced by the BnF (authority and catalogue records, finding aids for archives and manuscripts, digital documents) are assigned permanent identifiers (ARK identifiers) that enable the creation of persistent links.
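An ARK has the general form `ark:/NAAN/Name`, optionally followed by a qualifier; the NAAN identifies the assigning institution (12148 for the BnF), and the Name is the opaque identifier that the institution undertakes to keep stable. A minimal parser, assuming only that general shape, shows how the persistent part can be extracted from any URL that embeds it. The sample identifiers are made up.

```python
import re
from typing import NamedTuple, Optional

class Ark(NamedTuple):
    naan: str       # Name Assigning Authority Number (12148 is the BnF's)
    name: str       # opaque identifier assigned by that authority
    qualifier: str  # optional suffix ("/..." or "...") pointing inside the object

# Accepts both "ark:/NAAN/Name" and the shorter "ark:NAAN/Name" form,
# optionally embedded in a resolver URL.
ARK_RE = re.compile(r"ark:/?([^/\s]+)/([^/.\s?#]+)([^\s?#]*)")

def parse_ark(text: str) -> Optional[Ark]:
    """Return the first ARK found in `text`, or None if there is none."""
    m = ARK_RE.search(text)
    return Ark(*m.groups()) if m else None

print(parse_ark("https://gallica.bnf.fr/ark:/12148/bpt6k000000x/f1.item"))
# Ark(naan='12148', name='bpt6k000000x', qualifier='/f1.item')
```

Keeping `naan` and `name` separate from the qualifier means two links pointing at different parts of the same object still resolve to one identifier.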

The first step was to implement bibliographic frameworks that are being tested internationally, especially the FRBR model (Functional Requirements for Bibliographic Records).

This step was followed by modelling work aimed at publishing the data in RDF (Resource Description Framework) on the Web of data. In the BnF's view, implementing these technical standards must ensure interoperability between external and internal databases through structured, machine-readable data.


Exposing our data in RDF (Resource Description Framework)

In the long term, useful, reliable and controlled data will be published and integrated into the growing Web of data, in compliance with Semantic Web standards. This must be done in line with international initiatives that facilitate the use of informational or administrative public data.

Being on the Web of data requires specific technical solutions for creating links: dereferenceable, permanent URIs (Uniform Resource Identifiers), a content negotiation mechanism, and access to the raw data.
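Content negotiation means that one URI can return HTML to a browser and RDF to a program, depending on the HTTP `Accept` header. A minimal sketch using Python's standard library; the URI is hypothetical, and which media types a given server actually serves is an assumption to check against its documentation.

```python
import urllib.request

# Common RDF media types; support for each one varies from server to server.
MEDIA_TYPES = {
    "html": "text/html",
    "rdfxml": "application/rdf+xml",
    "turtle": "text/turtle",
    "ntriples": "application/n-triples",
    "jsonld": "application/ld+json",
}

def negotiated_request(uri, fmt):
    """Build a GET request for `uri` that asks for one representation via Accept."""
    return urllib.request.Request(uri, headers={"Accept": MEDIA_TYPES[fmt]})

def fetch(uri, fmt, timeout=10):
    """Dereference `uri`; urlopen follows redirects (e.g. 303 See Other),
    keeping the Accept header, and reports the final URL and content type."""
    with urllib.request.urlopen(negotiated_request(uri, fmt), timeout=timeout) as resp:
        return resp.geturl(), resp.headers.get("Content-Type"), resp.read()

req = negotiated_request("https://data.example.org/ark:/12148/cb12345678x", "turtle")
print(req.get_header("Accept"))  # text/turtle
# fetch(...) performs the actual network call and is not run here.
```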

Linked Open Data fosters data exchange between libraries and other communities, and offers solutions for format interoperability. The Deutsche Nationalbibliothek, the British Library, and the Library of Congress have also adopted these tools to open up their bibliographic data.

The reusable data that we publish include subject authority records from the RAMEAU repository, which is used to index bibliographic records at the BnF. These records have been converted to SKOS (Simple Knowledge Organization System), an RDF vocabulary, within the framework of the European project TELplus. This repository is now regularly updated on data.bnf.fr with input from the whole database maintained by the BnF.
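In SKOS, each subject heading becomes a `skos:Concept` with a preferred label, optional alternative labels, and `skos:broader` links to more general headings. A minimal sketch that serializes one such concept as Turtle; the URIs and labels are illustrative, not actual RAMEAU records, and this is not the TELplus conversion itself.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

def turtle_literal(text, lang="fr"):
    """Quote a string as a language-tagged Turtle literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'

def concept_to_turtle(uri, pref_label, alt_labels=(), broader=()):
    """Serialize one SKOS concept as a small Turtle document."""
    statements = ["a skos:Concept", f"skos:prefLabel {turtle_literal(pref_label)}"]
    statements += [f"skos:altLabel {turtle_literal(a)}" for a in alt_labels]
    statements += [f"skos:broader <{b}>" for b in broader]
    body = f"<{uri}> " + " ;\n    ".join(statements) + " ."
    return f"@prefix skos: <{SKOS}> .\n\n{body}\n"

ttl = concept_to_turtle(
    "http://data.example.org/rameau/c1",
    "Bibliothèques numériques",
    alt_labels=["Bibliothèques virtuelles"],
    broader=["http://data.example.org/rameau/c0"],
)
print(ttl)
```

The output lists the preferred label, the alternative label and the `skos:broader` link as predicate-object pairs sharing one subject, separated by semicolons.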

Retrieving the data in RDF and accessing the dumps (the Semantic Web in data.bnf.fr)


External links in data.bnf.fr

Data.bnf.fr is part of the Web and provides external links to Web sites, either maintained by the BnF or completely independent.

There are several kinds of links:

  • Links to other external repositories, to which data produced by the BnF is aligned, such as the Library of Congress, the Deutsche Nationalbibliothek, VIAF (Virtual International Authority File), IdRef, Geonames, Agrovoc, and Thesaurus W (the French National Archives’ thesaurus).
  • Links to search forms in which query terms (author name, subject, work title) are automatically pre-filled: BnF catalogue général, CCFr, BnF archives et manuscrits, CNLJ-La Joie par les livres, Europeana, SUDOC (Système universitaire de documentation), Worldcat, Wikipedia.
  • Wikipedia provides a short biography and, when none can be found in Gallica, a thumbnail image of the author. This data is retrieved through DBpedia.
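Pre-filled search links amount to inserting the URL-encoded query term into a per-site URL template. A minimal sketch; the French Wikipedia search URL is assumed to follow the usual MediaWiki pattern, the catalogue template is purely hypothetical, and each target's actual query syntax would need to be checked.

```python
from urllib.parse import quote_plus

# Illustrative URL templates, one per target site.
SEARCH_TEMPLATES = {
    "wikipedia": "https://fr.wikipedia.org/w/index.php?search={q}",
    "catalogue": "https://catalogue.example.org/search?query={q}",  # hypothetical
}

def prefilled_links(query):
    """Return one search URL per target, with the query term URL-encoded."""
    encoded = quote_plus(query)  # spaces become '+', non-ASCII becomes UTF-8 %XX
    return {name: tpl.format(q=encoded) for name, tpl in SEARCH_TEMPLATES.items()}

print(prefilled_links("Émile Zola")["wikipedia"])
# https://fr.wikipedia.org/w/index.php?search=%C3%89mile+Zola
```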

How does it work?

The data is spread across separate databases and is produced and stored in different formats. Data.bnf.fr extracts, transforms and gathers these datasets into a single database and makes them interoperable.

We use the following tools:

  • Unique and permanent identifiers assigned to every record: the BnF uses ARK identifiers for records from the BnF catalogue général and for digital documents from Gallica,
  • Bibliographical description standards,
  • Authority records for persons, corporate bodies, works and subjects,
  • and data matching and federation techniques.

We rely on authority records, which form the basis for all author, work, and subject pages, in order to gather and organise the different data silos. The different resources are collocated through the authority record’s identifier.

Author pages collocate all bibliographic records that are linked to the author’s identifier.
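Collocation by identifier can be pictured as building an index from each authority identifier to the records linked to it. A minimal sketch, assuming each bibliographic record carries a list of author authority identifiers; the field names and ARKs are hypothetical.

```python
from collections import defaultdict

def collocate_by_author(bib_records):
    """Map each author authority identifier to the ids of the records linked to it."""
    pages = defaultdict(list)
    for rec in bib_records:
        for author_id in rec["authors"]:
            # A record with several linked authors appears on each of their pages.
            pages[author_id].append(rec["id"])
    return dict(pages)

bib_records = [
    {"id": "b1", "authors": ["ark:/12148/cbA"]},
    {"id": "b2", "authors": ["ark:/12148/cbA", "ark:/12148/cbB"]},
    {"id": "b3", "authors": ["ark:/12148/cbB"]},
]
print(collocate_by_author(bib_records))
# {'ark:/12148/cbA': ['b1', 'b2'], 'ark:/12148/cbB': ['b2', 'b3']}
```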

Work pages collocate all records that are linked to both the author's and the work's identifier. When a record has no link to the work authority record, a simple matching mechanism based on string comparison ("words beginning with") is used instead. "Subject" pages collocate all records that have a link to the same subject.
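The fallback for unlinked records can be sketched as follows. This is one possible reading of the "words beginning with" rule, assuming that a record by the same author with no work link is attached when its title begins with the same words as the work's title; the record structure and identifiers are hypothetical.

```python
def title_tokens(title):
    """Lower-case a title and split it into words."""
    return title.lower().split()

def starts_with_words(title, work_title):
    """True if `title` begins with the same words as `work_title`."""
    prefix = title_tokens(work_title)
    return title_tokens(title)[:len(prefix)] == prefix

def work_page(records, author_id, work_id, work_title):
    """Records explicitly linked to the work, then same-author records matched by title."""
    linked, matched = [], []
    for rec in records:
        if author_id not in rec["authors"]:
            continue  # a work page only considers records by the work's author
        if rec.get("work") == work_id:
            linked.append(rec["id"])
        elif rec.get("work") is None and starts_with_words(rec["title"], work_title):
            matched.append(rec["id"])
    return linked + matched

records = [
    {"id": "b1", "authors": ["A1"], "work": "W1", "title": "Les Misérables"},
    {"id": "b2", "authors": ["A1"], "work": None, "title": "Les Misérables : édition illustrée"},
    {"id": "b3", "authors": ["A1"], "work": None, "title": "Les Travailleurs de la mer"},
    {"id": "b4", "authors": ["A2"], "work": None, "title": "Les Misérables"},
]
print(work_page(records, "A1", "W1", "Les Misérables"))  # ['b1', 'b2']
```

Comparing whole words rather than raw characters avoids attaching a title that merely shares a partial word with the work's title.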

Software

We use the free software CubicWeb

CubicWeb is an open-source platform for developing Semantic Web applications, published under the LGPL licence.

Within the project, this software is used to:

  • Extract and integrate data from heterogeneous sources in various formats (CSV, MARC, Dublin Core, EAD-XML, RDF, …),
  • Merge, match and gather them in an SQL database,
  • Generate pages in any format, in this case HTML, JSON, RDF/XML or PDF.
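CubicWeb's own import machinery is not shown here. As a rough, standard-library illustration of the extract-and-merge step, the sketch below reads a CSV export and a Dublin Core record into one SQL table and groups rows by identifier. The column names, table name and identifiers are hypothetical, and both sources are assumed to carry the same identifier for the same resource.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

def from_csv(text):
    """Yield (identifier, title, source) rows from a CSV export with 'ark' and 'title' columns."""
    for row in csv.DictReader(io.StringIO(text)):
        yield row["ark"], row["title"], "csv"

def from_dublin_core(xml_text):
    """Yield one (identifier, title, source) row from a simple Dublin Core record."""
    root = ET.fromstring(xml_text)
    yield root.findtext(DC + "identifier"), root.findtext(DC + "title"), "dc"

def build_db(*row_sources):
    """Load rows from every source into a single in-memory SQL table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE source_record (ark TEXT, title TEXT, source TEXT)")
    for rows in row_sources:
        conn.executemany("INSERT INTO source_record VALUES (?, ?, ?)", rows)
    return conn

CSV_SAMPLE = "ark,title\nark:/12148/cb000001,Notre-Dame de Paris\nark:/12148/cb000002,Les Misérables\n"
DC_SAMPLE = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:identifier>ark:/12148/cb000001</dc:identifier>
  <dc:title>Notre-Dame de Paris</dc:title>
</record>"""

conn = build_db(from_csv(CSV_SAMPLE), from_dublin_core(DC_SAMPLE))
# Rows from different sources that share an identifier are gathered together.
merged = conn.execute(
    "SELECT ark, COUNT(*) FROM source_record GROUP BY ark ORDER BY ark"
).fetchall()
print(merged)  # [('ark:/12148/cb000001', 2), ('ark:/12148/cb000002', 1)]
```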

It is based on the RQL query language (Relation Query Language), which is similar to the W3C's SPARQL, and on the Python language.


In 2013, CubicWeb won the Dataconnexions award, organized by Etalab, a body attached to the French Prime Minister's office whose objective is to encourage public open data initiatives.


Data.bnf.fr in the Open Data movement

Raw data from data.bnf.fr is available under the French Open Licence, which is used by data.gouv.fr. This licence is similar to CC BY, adapted to French copyright law. Data in RDF can be freely reused and copied, for commercial or non-commercial purposes, provided that the source is credited.

The data.bnf.fr project is definitely part of the Open Data movement.

Supported by civic and governmental actors, Open Data is a global movement that aims to make available non-personal data that is unrelated to privacy or security and is collected by public organisations in our connected societies. Open Data is now a national policy: the 2003 European Directive on the re-use of public sector information (Directive 2003/98/EC) was transposed into French law by Ordinance No. 2005-650 of 6 June 2005 on freedom of access to public documents and the re-use of public information.

The main purposes are:

  • Democratic: making public action more transparent and efficient, and rationalising the creation of public data by disseminating and gathering it;
  • Economic: fostering economic activity by providing reusable and useful information for commercial or non-commercial use.

This is in line with the missions assigned to the National Library of France by the Decree of 3 January 1994 establishing the BnF: "to ensure access to the collections for as many people as possible, subject to secrets protected by law, under conditions consistent with intellectual property legislation and compatible with the conservation of these collections", and "to allow remote consultation using the most modern data transmission technologies".

The purpose is therefore to share with citizens the results of libraries' efforts to identify and describe the collections they hold, including digital items. This optimizes the dissemination and reuse of data produced by the BnF, by taking it out of our internal silos and giving it a wider audience and greater visibility on the Web. Potential uses are varied and innovative. Other libraries can now not only retrieve data from the BnF but also link to it. Moreover, the data is meant to reach beyond the library world and be widely disseminated. Examples of such initiatives include the iF-Verso project by the Institut Français, and private projects creating iPhone applications or geographic visualisations of places related to works and authors, which let users discover digital documents relating to a city or monument.
