Experimental process: autogenerated works
The automatically generated work pages are the result of an experimental process that the Bibliothèque nationale de France (BnF) is carrying out as part of the Data project.
Since 2011, Data has enabled the BnF to publish its data on the Web according to international standards (linked data). It also makes it possible to experiment with a new way of structuring information, centred no longer on the document (for example, the 2001 paperback edition of Madame Bovary, or a recorded reading of the same novel) but on the author's work, which groups together all of its successive editions (Madame Bovary, by Flaubert, written in 1856).
But the work to which each document relates is rarely described in the catalogue (fewer than 8% of documents). If we wanted to carry out this work manually on the 15 million records in the catalogue, we would have to devote 45 years to it, at a rate of 2 minutes per document. Yet a national programme, La Transition bibliographique, is under way with the aim of adopting this new way of describing documents (adoption of IFLA-LRM, the Library Reference Model).
The BnF has therefore experimented with a semi-automatic process to generate a description of each work from information describing its successive editions. The first corpus processed concerns twentieth-century printed works.
For each author, the titles of their publications are extracted and grouped by similarity. For each group, a program then calculates the information relating to the work from what it finds in the documents (alternative forms of the title, titles of translations, date of first publication, other authors).
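The two steps described above can be illustrated with a minimal sketch. It is not the BnF's actual program: the Edition record, the function names, the 0.85 similarity threshold and the use of Python's difflib are all assumptions made for illustration, and the sketch does not attempt to attach titles of translations, which plain string similarity would not match.

```python
from collections import Counter
from dataclasses import dataclass, field
from difflib import SequenceMatcher
import unicodedata


@dataclass
class Edition:
    """Hypothetical edition record; the real catalogue fields differ."""
    title: str
    year: int
    contributors: list = field(default_factory=list)


def normalize(title):
    # Lowercase, then drop accents and punctuation so that
    # "Salammbô" and "Salammbo." compare as the same string.
    decomposed = unicodedata.normalize("NFKD", title.lower())
    kept = "".join(c for c in decomposed if c.isalnum() or c.isspace())
    return " ".join(kept.split())


def group_by_title(editions, threshold=0.85):
    # Greedy grouping: an edition joins the first group whose first
    # title is similar enough, otherwise it starts a new group.
    # The threshold is an assumed value, not one taken from the source.
    groups = []
    for edition in editions:
        key = normalize(edition.title)
        for group in groups:
            reference = normalize(group[0].title)
            if SequenceMatcher(None, key, reference).ratio() >= threshold:
                group.append(edition)
                break
        else:
            groups.append([edition])
    return groups


def describe_work(group):
    # Derive work-level information from the editions in one group.
    titles = Counter(e.title for e in group)
    preferred = titles.most_common(1)[0][0]
    return {
        "title": preferred,
        "alternative_titles": sorted(t for t in titles if t != preferred),
        "first_published": min(e.year for e in group),
        "other_authors": sorted({c for e in group for c in e.contributors}),
    }


if __name__ == "__main__":
    # Sample data invented for the example (one author's editions).
    sample = [
        Edition("Madame Bovary", 1857),
        Edition("Madame Bovary", 2001),
        Edition("Salammbô", 1862),
    ]
    for group in group_by_title(sample):
        print(describe_work(group))
```

A greedy, threshold-based grouping like this is easy to inspect but sensitive to the chosen threshold, which is one reason results of this kind benefit from human review.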
The results of these calculations are then posted online on Data to assess the relevance of the process. They are also subject to the critical eye of Internet users, who are invited to point out any problems.
As these problems may have several origins (original data, grouping criteria, etc.), the BnF cannot undertake to correct them quickly. It can, however, add them to the list of corrections to be made before these same works are loaded into the general catalogue a few months later. You can also take part in this major project and help us to improve the reliability of the data by reporting any errors you notice in these automatically calculated works.