Experimental process: autogenerated works

The automatically generated work pages are the result of an experimental process that the Bibliothèque nationale de France is carrying out thanks to the data.bnf.fr project.

Since 2011, data.bnf.fr has allowed the BnF to publish its data on the web in line with international standards (Linked Open Data). It also makes it possible to experiment with a new way of organising information, focused no longer on the document (a description of the 2001 paperback edition of Madame Bovary, or of a read-aloud recording of the same novel) but on the author's work, bringing together all of its successive editions (Madame Bovary, by Flaubert, written in 1856).

However, the work to which each document relates is rarely described in the catalog (less than 8% of the documents). Doing this manually for the 12 million records in the catalog would take 45 years, at a rate of 2 minutes per document. Meanwhile, a national programme, the Bibliographic Transition, is under way to adopt this new approach (the adoption of the LRM model, Library Reference Model).
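The 45-year estimate can be checked with a short calculation. The sketch below takes the two figures given above (12 million records, 2 minutes each); the working schedule is not stated in the text, so the round-the-clock default is an assumption, and the function name and parameters are illustrative only.

```python
def years_needed(records: int, minutes_per_record: float,
                 hours_per_day: float = 24, days_per_year: float = 365) -> float:
    """Years required to process every record by hand.

    hours_per_day and days_per_year are assumptions: the source gives
    only the record count and the time per record.
    """
    total_hours = records * minutes_per_record / 60
    return total_hours / (hours_per_day * days_per_year)


# 12 million records at 2 minutes each is 24 million minutes, i.e. 400,000 hours.
nonstop = years_needed(12_000_000, 2)

# Same workload for a single cataloguer on a hypothetical 7-hour day, 220 days a year.
one_person = years_needed(12_000_000, 2, hours_per_day=7, days_per_year=220)
```

With non-stop work the result falls between 45 and 46 years, which is consistent with the figure quoted. Under the hypothetical office schedule the same workload would take one person several times longer.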

The BnF is therefore experimenting with a semi-automatic process that generates the description of each work from the information describing its successive editions. The first corpus addressed is 20th-century printed works.

For each author, we select the titles of his or her publications and group them by similarity. For each group, a program then derives the information about the work from what it finds in the documents (alternative forms of the title, titles of translations, date of first publication, other authors).
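The grouping step can be pictured with a minimal sketch. It is not the BnF's actual program: the `Edition` record, the 0.85 similarity threshold and the choice of Python's standard `difflib.SequenceMatcher` are all assumptions made for illustration. It covers only two of the derived fields (alternative title forms and date of first publication).

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
import unicodedata


@dataclass
class Edition:
    """Hypothetical, simplified record for one published document."""
    title: str
    year: int


def normalize(title: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title.lower())
    chars = [c if c.isalnum() else " "
             for c in decomposed if not unicodedata.combining(c)]
    return " ".join("".join(chars).split())


def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """True when two normalized titles are close enough (assumed threshold)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def group_editions(editions, threshold: float = 0.85):
    """Greedy grouping: each edition joins the first group whose first title it resembles."""
    groups = []
    for edition in editions:
        for group in groups:
            if similar(edition.title, group[0].title, threshold):
                group.append(edition)
                break
        else:
            groups.append([edition])
    return groups


def describe_work(group):
    """Derive a work description from one group of editions."""
    earliest = min(group, key=lambda e: e.year)
    return {
        "title": earliest.title,
        "first_published": earliest.year,
        "alternative_titles": sorted({e.title for e in group} - {earliest.title}),
    }


# Invented sample data for a single author (not real catalog records).
editions = [
    Edition("Le Voyage imaginaire", 1950),
    Edition("Carnets d'hiver", 1931),
    Edition("Le voyage imaginaire.", 1923),
]
works = [describe_work(g) for g in group_editions(editions)]
```

A greedy pass like this is easy to follow but sensitive to the threshold and to the order of the input, which is one reason grouping criteria can be a source of errors in the generated pages.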

The result of these computations is then put online on data.bnf.fr to assess the relevance of the process.

It is also submitted to the critical eye of Internet users, who are invited to report any problems they find.

As these problems may have several origins (original data, grouping criteria, etc.), the BnF cannot promise that they will be corrected quickly. We can, however, add them to the list of corrections to be made for when these same works are loaded into the general catalog, in a few months' time.

You too can take part in this major undertaking and help us improve the reliability of the data by pointing out the errors you notice in these autogenerated works.