Open Data

Summary

  • Developments for 2018
  • Complete schema
  • FRBR construction and author, work and subject concepts
  • Embedded data: schema.org and Opengraph Protocol
  • Vocabularies
  • Presentation of the ontology bnf-onto
  • Bibliothèque nationale de France vocabularies
  • Mappings between the Intermarc format and the RDF language we use

    Developments for 2018

    Modification of URI suffixes

    URIs used to take a suffix that depended on the type of resource they identified: #foaf:Person for a person, #foaf:Organization for an organisation, #spatialThing for a place, #frbr:Work for a work. All these URIs now take the same suffix, #about (instead of #foaf:Person, #foaf:Organization, #spatialThing, #frbr:Work).
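
For URIs stored before this change, the suffix replacement described above can be applied mechanically. The following is a minimal Python sketch, not part of data.bnf.fr: the names OLD_SUFFIXES and to_about_uri are illustrative, and the sketch assumes that the organisation suffix also started with "#".

```python
# Old type-specific suffixes listed in the text; all are replaced by "#about".
# Assumption: the organisation suffix was "#foaf:Organization" (with a leading "#").
OLD_SUFFIXES = ("#foaf:Person", "#foaf:Organization", "#spatialThing", "#frbr:Work")


def to_about_uri(uri: str) -> str:
    """Return the URI with an old type-specific suffix replaced by '#about'."""
    for suffix in OLD_SUFFIXES:
        if uri.endswith(suffix):
            return uri[: -len(suffix)] + "#about"
    # Already migrated, or not a suffixed entity URI: leave it untouched.
    return uri
```

URIs that already end in #about, or that carry no suffix at all, are returned unchanged, so the function can be run safely over a mixed list.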

    Information about a manifestation (title, ISBN, number of pages, etc.) used to be attached to an entity identified by a URI without any suffix. Metadata about the record of this manifestation (date of creation, date of modification) was attached to an entity identified by the same ark, but with a #record suffix. The data model is now standardised: as for the other entities, the URI with the #about suffix identifies the entity whose type is frbr-rda:Manifestation. This #about URI is linked to the URI of the manifestation by the equivalence relationship owl:sameAs. Information about the manifestation is now attached to the URI with the #about suffix, and information about the record will be attached to the URI without any suffix.

    Records of expressions do not exist yet in the BnF Main Catalogue, and skos:Concept entities for expressions do not exist in data.bnf.fr either. There is only an entity of type frbr-rda:Expression, which has the same ark as the manifestation and a #Expression suffix.

    Use of URIs in the data.bnf.fr data model

    Durability of queries

    Putting the directive DEFINE input:same-as "yes" at the beginning of a query keeps most queries working after these changes. Entities linked by the equivalence relationship owl:sameAs are then treated as the same entity by the query engine, which infers that any property of one of them is valid for both.

    • Retrieve the URIs of works and expressions by the same author (Georges Delerue), using the permalink at the bottom of his page:

    DEFINE input:same-as "yes"
    PREFIX dcterms: <http://purl.org/dc/terms/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?oeuvre ?expression
    WHERE {
      <http://data.bnf.fr/ark:/12148/cb138931135> foaf:focus ?uri_auteur .
      ?oeuvre dcterms:creator ?uri_auteur .
      ?expression dcterms:contributor ?uri_auteur .
    }
    Results of the query

    • Retrieve the latitude and longitude of a place by adding the #about suffix to the permalink at the bottom of its page:

    DEFINE input:same-as "yes"
    PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
    SELECT ?latitude_longitude
    WHERE {
      <http://data.bnf.fr/ark:/12148/cb15320577r#about> geo:lat_long ?latitude_longitude .
    }
    Results of the query
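
The two steps used in the queries above (adding the #about suffix to a permalink and starting the query with the same-as directive) can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names about_uri, durable_query and lat_long_query are hypothetical, and the sketch builds the query text without sending it to any endpoint.

```python
SAME_AS_DIRECTIVE = 'DEFINE input:same-as "yes"'


def about_uri(permalink: str) -> str:
    """Drop any existing fragment and append the '#about' suffix."""
    base = permalink.split("#", 1)[0]
    return base + "#about"


def durable_query(query: str) -> str:
    """Prepend the same-as directive unless the query already starts with it."""
    stripped = query.lstrip()
    if stripped.startswith(SAME_AS_DIRECTIVE):
        return stripped
    return SAME_AS_DIRECTIVE + "\n" + stripped


def lat_long_query(permalink: str) -> str:
    """Build the latitude/longitude query shown above for a place permalink."""
    return durable_query(
        "PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>\n"
        "SELECT ?latitude_longitude\n"
        f"WHERE {{ <{about_uri(permalink)}> geo:lat_long ?latitude_longitude . }}"
    )


if __name__ == "__main__":
    print(lat_long_query("http://data.bnf.fr/ark:/12148/cb15320577r"))
```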

    Complete schema

    See the data for the work The Raven by Edgar Allan Poe

    See the full data model

    FRBR construction and author, work and subject concepts

    The FRBR model

    Data.bnf.fr was developed in the context of recent changes in bibliographic description, by experimenting with and adapting the FRBR (Functional Requirements for Bibliographic Records) model developed by IFLA (International Federation of Library Associations and Institutions).
    The model has three groups of entities, linked together by relationships: information about documents, persons and organisations, and subjects.

    Entities of the three groups of the FRBR model

    Source: Bénézet Joly, http://slideplayer.fr/slide/3213771/.

    • “Work” pages

    The first group of the FRBR model describes the different aspects of an intellectual or artistic creation and distinguishes four levels: work, expression, manifestation and item.

    The work level concerns the intellectual or artistic creation itself, for instance Le colonel Chabert by Honoré de Balzac. “Work” pages are created from the related authority records in the BnF Main Catalogue.

    The expression level (different versions of this work such as a translation, an adaptation or an abridgment) does not appear in the HTML pages but can be seen in the corresponding RDF pages.

    The manifestation level is the physical embodiment of a work, for instance an edition of Les Misérables such as “Nouvelle impression illustrée. 1879-1882. Paris. E. Hugues”. Manifestations are listed in the documentary unit and gathered in the section entitled “Vie et éditions de l’œuvre” (Life and editions of the work). This level corresponds to a bibliographic record in the BnF Main Catalogue, or to a manuscript identified by a label in the Archives and Manuscripts Catalogue (BnF archives et manuscrits).

    There can be a part-whole relationship between:

    o a work and another work
    For example, Le Père Goriot (Honoré de Balzac) is part of the work Scènes de la vie privée, by the same author. Both are considered works and each has a page in data.bnf.fr.

    o a manifestation and another one
    For example, a specific edition of Le Père Goriot (Honoré de Balzac) is part of the manifestation Études de mœurs, an edition gathering several texts by Balzac.
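
The work and manifestation levels and their part-whole relationships can be pictured with a small data structure. The following Python sketch is illustrative only: the class and field names (Work, Manifestation, part_of, wholes) are assumptions made for the example and do not reflect how data.bnf.fr stores its data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Work:
    title: str
    author: str
    part_of: Optional["Work"] = None  # a work may be part of another work


@dataclass
class Manifestation:
    title: str
    work: Optional[Work] = None  # the work this edition embodies, if known
    part_of: Optional["Manifestation"] = None  # an edition may be part of another edition


def wholes(entity) -> List[str]:
    """Follow the part-whole chain upwards and return the titles of the wholes."""
    titles = []
    current = entity.part_of
    while current is not None:
        titles.append(current.title)
        current = current.part_of
    return titles


scenes = Work("Scènes de la vie privée", "Honoré de Balzac")
goriot = Work("Le Père Goriot", "Honoré de Balzac", part_of=scenes)
etudes = Manifestation("Études de mœurs")
goriot_edition = Manifestation("Le Père Goriot", work=goriot, part_of=etudes)
```

Here wholes(goriot) follows the work-to-work link, while wholes(goriot_edition) follows the manifestation-to-manifestation link; the two chains are kept separate, as in the two cases listed above.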

    • “Author” pages

    A person or an organisation can be either the “author” of a work (in which case the “author” page is linked to the related “work” page) or a “contributor” to an expression (translator, preface writer, librettist…).
    However, as the expression level is not distinguished from the manifestation level in the HTML pages of data.bnf.fr, contributors only appear at the manifestation level. The different creation and contribution roles are listed in a BnF repository, in the Intermarc format, and in the Library of Congress repository, in Marc. This kind of data enriches the RDF of the pages.

    • “Subject” pages

    Among the retrievable data are the subject records of the Bibliothèque nationale de France (RAMEAU, the French subject indexing language). They were converted into SKOS (Simple Knowledge Organization System), an RDF vocabulary, in the context of the European project TELplus. This repository is now updated on data.bnf.fr from the whole current database of the Bibliothèque nationale de France.
    Manifestations which have a RAMEAU term as a subject are brought together in the appropriate “subject” page.

    Moreover, the site holds pages that gather works and manifestations about a work or an author. These pages are not indexed by search engines and can be reached from the “work” or “author” pages.
    For instance, on the page “Napoleon”, there is a link to a page presenting documents about Napoleon, such as Vie de Napoléon Buonaparte, 1827.

    • “Date” and “Place” pages

    “Date” pages cover a period, for example the page of the year 1789.
    These pages gather:

    o subjects related to this period,

    o authors who were born or died in that year,

    o organisations that were created, or whose activity ended, in that year,

    o works created or completed, performances played and documents published during that year.

    These pages don't exist in the Main Catalogue.

    “Place” pages gather cartographic documents about a place.
    They make it possible to find:

    o authors who were born or died in this place,

    o organisations which were created there,

    o periodicals and documents which were published there, performances which were played there, recordings which were made there, battles which took place there and treaties which were signed there.

    These “place” pages are linked to the related page about the place as a subject, which gathers documents about the place.

    Alignments and clustering by work

    In “work” and “author” pages, all manifestations by a single author are gathered around that author's works, thanks to the explicit link to the title authority record (Titre Conventionnel or TIC, in French) inside the original bibliographic record.

    However, some manifestations are not linked to a title authority record and remain “orphans”. In order to improve the way our data is translated into FRBR and to offer a better service to the public, it is important to align these orphan manifestations, which means bringing them together around the corresponding work.

    Example:
    Bibliographic record (BnF) with a link to the title authority record “Fables” and the author authority record “Jean de La Fontaine”.
    Bibliographic record (BnF) without any link to the title authority record “La cigale et la fourmi” but with a link to the author authority record “Jean de La Fontaine”.

    That is why we have already produced simple alignments in data.bnf.fr. When a manifestation is explicitly linked to an author authority record in the bibliographic record, and when the title string of this manifestation is exactly the same as the work’s title, the manifestation is aligned with the work.
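
The simple alignment rule just described (same author authority record and exactly the same title) can be sketched as follows. This is a hedged illustration, not the BnF implementation: the record classes, the author_id field and the identifier value used in the example are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class WorkRecord:
    title: str
    author_id: str  # hypothetical identifier of the author authority record


@dataclass(frozen=True)
class ManifestationRecord:
    title: str
    author_id: Optional[str]  # None when the record has no explicit author link


def simple_align(manifestation: ManifestationRecord,
                 works: Iterable[WorkRecord]) -> Optional[WorkRecord]:
    """Return the work with the same author and exactly the same title, else None."""
    if manifestation.author_id is None:
        return None  # the author link is required for alignment
    for work in works:
        if (work.author_id == manifestation.author_id
                and work.title == manifestation.title):
            return work
    return None


works = [
    WorkRecord("Fables", "la_fontaine"),
    WorkRecord("La cigale et la fourmi", "la_fontaine"),
]
orphan = ManifestationRecord("La cigale et la fourmi", "la_fontaine")
aligned = simple_align(orphan, works)
```

Because the comparison is an exact string match, a manifestation whose title differs by a single character stays orphan, which is why further techniques are needed afterwards.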

    Yet, after this simple alignment, many manifestations remain orphans. In the long term, two solutions are possible:

    • alignment: attaching manifestations to a work which has its own title authority record and, thus, its own page. These manifestations do not have any link to the title authority record: they come from bibliographic records of the Main Catalogue or from descriptions of BnF Archives et Manuscrits.
      We use both simple and advanced alignment techniques (word prefix, exact match, words within a distance X, Levenshtein distance, matching algorithm) to determine whether two character strings correspond to the same work. The link to the author authority record remains essential to align works.
    • clustering: if there is no title authority record, some manifestations are gathered around a new documentary unit.
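
Among the techniques listed above, the Levenshtein distance counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The following Python sketch shows the standard dynamic-programming computation and a threshold test; the helper titles_match, the case-folding step and the threshold of 2 are assumptions chosen for illustration, not the values used by the BnF.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def titles_match(title_1: str, title_2: str, max_distance: int = 2) -> bool:
    """True when the two titles are equal or within max_distance edits.

    The threshold and the case-insensitive comparison are illustrative choices.
    """
    t1 = title_1.casefold().strip()
    t2 = title_2.casefold().strip()
    return t1 == t2 or levenshtein(t1, t2) <= max_distance
```

In practice such a test would only be applied between a manifestation and the works of the same author, since the text stresses that the author link remains essential.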

    Embedded data: schema.org and Opengraph Protocol

    “Author”, “work” and “subject” pages are open on the Web and can be reached by search engines.
    This is why, in addition to the traditional methods used for indexing the homepage, we have chosen to embed two kinds of data to structure these pages:

    • Schema.org provides a vocabulary for adding information to the HTML content, in the microdata format, to help search engines index the pages.

    The following elements are used:

    itemtype=http://schema.org/Person
    itemprop="description" itemprop="birthdate" itemprop="deathdate" itemprop="nationality" itemprop="memberOf"

    itemtype=http://schema.org/Book
    itemprop="description" itemprop="inLanguage" itemprop="datePublished" itemprop="genre"

    itemtype=http://schema.org/Organization
    itemprop="description" itemprop="image" itemprop="name" itemprop="url" itemprop="members" itemprop="founding date" itemprop="founders"

    And for subgroups of organisations:
    itemscope itemtype=http://schema.org/PerformingGroup
    itemscope itemtype=http://schema.org/DanceGroup
    itemscope itemtype=http://schema.org/TheaterGroup
    itemscope itemtype=http://schema.org/MusicGroup

    • Open Graph Protocol is a very simple vocabulary, encoded in RDFa, for metadata that is retrieved when a user adds the resource to their Facebook profile. The following metadata is embedded in the HTML header, using META tags:

    og:title (title of the page)
    og:description (description of the page content)
    og:type (type of resource)
    og:url (page URL)
    og:image (URL of the image that illustrates the page)
    og:author (name of the author in the “work” page)
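
As an illustration of how such metadata ends up in the HTML header, the following Python sketch renders a dictionary of Open Graph properties as META tags. It is a minimal example under stated assumptions: the function name open_graph_meta and the sample values are hypothetical, and the actual templates used by data.bnf.fr are not shown in this document.

```python
from html import escape


def open_graph_meta(properties: dict) -> str:
    """Render {'title': ..., 'url': ...} as Open Graph <meta> tags, one per line."""
    lines = []
    for name, value in properties.items():
        lines.append(
            '<meta property="og:%s" content="%s" />'
            % (escape(name, quote=True), escape(value, quote=True))
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Sample values for illustration only.
    print(open_graph_meta({
        "title": "Le colonel Chabert",
        "type": "book",
        "author": "Honoré de Balzac",
    }))
```

Escaping the values matters because titles can contain quotation marks or ampersands, which would otherwise break the attribute.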

    Vocabularies

    We preferred to reuse existing vocabularies in order to foster interoperability.

    rdf                http://www.w3.org/1999/02/22-rdf-syntax-ns
    rdfs               http://www.w3.org/2000/01/rdf-schema
    skos               http://www.w3.org/2004/02/skos/core
    dcterms            http://purl.org/dc/terms
    foaf               http://xmlns.com/foaf/0.1/
    RDAgroup2elements  http://rdvocab.info/uri/schema/FRBRentitiesRDA
    rdvocab            http://RDVocab.info/Elements

    Nevertheless some properties and classes have to be expressed by an ontology specific to the BnF: bnf-onto. To publish the ontology, the BnF has chosen the harmonized namespace http://data.bnf.fr/ontology/.

    Presentation of the BnF ontology bnf-onto

    The ontology "bnf-onto" can be seen at this address: http://data.bnf.fr/ontology/bnf-onto/.
    List of properties:

    Label: cote
    Definition: Shelfmark of an archival document: unique number identifying the item which is kept in the collections
    URI: http://data.bnf.fr/ontology/bnf-onto/cote

    Label: EAN
    Definition: European article numbering (bar code)
    URI: http://data.bnf.fr/ontology/bnf-onto/EAN

    Label: expositionVirtuelle
    Definition: URL for a virtual exhibition of the BnF
    URI: http://data.bnf.fr/ontology/bnf-onto/expositionVirtuelle

    Label: firstYear
    Definition: First date (year only) of an entity: year of birth, year of creation, year of first publication of a work
    URI: http://data.bnf.fr/ontology/bnf-onto/firstYear

    Label: FRBNF
    Definition: BnF record number
    URI: http://data.bnf.fr/ontology/bnf-onto/FRBNF

    Label: isbn
    Definition: International standard book number
    URI: http://data.bnf.fr/ontology/bnf-onto/isbn

    Label: ISMN
    Definition: International standard music number for printed music
    URI: http://data.bnf.fr/ontology/bnf-onto/ismn

    Label: ouvrageJeunesse
    Definition: An edition of a work adapted for younger readers. Meant to sort editions, which often offer different content even though the title is the same
    URI: http://data.bnf.fr/ontology/bnf-onto/ouvrageJeunesse

    Label: code_role
    Definition: Coded role describing the contribution of a person or organisation to a work. Numeric values are used to describe relators, according to the BnF Intermarc reference list of coded roles for authors and contributors. Completed by the mark-up related to the Library of Congress code list for contributors and creators.
    URI: http://data.bnf.fr/ontology/bnf-onto/code_role

    Label: role
    Definition: French labels of contribution roles
    URI: http://data.bnf.fr/ontology/bnf-onto/role

    Label: translation
    Definition: Link to a translated version of a periodical
    URI: http://data.bnf.fr/ontology/bnf-onto/translation

    Bibliothèque nationale de France vocabularies

    BnF-specific vocabularies are displayed at this address: http://data.bnf.fr/vocabulary-en.
    List of vocabularies:

    Mappings between the Intermarc format and the RDF language we use

    Persons                           RDF                                             Intermarc fields for persons

    name                              skos:prefLabel @in_lang                         100, 400
    other name                        skos:altLabel, foaf:familyName, foaf:givenName
    nationality                       foaf:nationality                                008 position 12-13
    language                          RDAgroup2elements:languageOfThePerson           008 position 14-16
    gender                            foaf:gender                                     008 position 17
    date of birth                     RDAgroup2elements:dateOfBirth                   008 position 27-36
    date of death                     RDAgroup2elements:dateOfDeath                   008 position 37-46
    place of birth                    RDAgroup2elements:placeOfBirth                  603 $a
    place of death                    RDAgroup2elements:placeOfDeath                  603 $b
    beginning of activity             RDAgroup2elements:periodOfActivityOfThePerson   008 position 47-51
    end of activity                   RDAgroup2elements:periodOfActivityOfThePerson   008 position 52-55
    note about record sources         skos:editorialNote                              610
    summary, note                     RDAgroup2elements:biographicalInformation       600
    domains                           RDAgroup2elements:fieldOfActivityOfThePerson    624
    link to the DBpedia resource      owl:sameAs
    code for relators                 marcrel:[code from Library of Congress]
    image of the author from Gallica  foaf:depiction
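
Several rows of the persons mapping read values from fixed character positions of the Intermarc 008 field. The following Python sketch shows how such positional extraction could work. It rests on explicit assumptions that the source does not state: positions are taken to be 0-based and inclusive, unused positions are taken to be padded with spaces, and the sample field content is made up for the example.

```python
# (start, end) positions of the 008 field for persons, as listed in the mapping.
# Assumption: positions are 0-based and inclusive.
PERSON_008_POSITIONS = {
    "nationality": (12, 13),
    "language": (14, 16),
    "gender": (17, 17),
    "date_of_birth": (27, 36),
    "date_of_death": (37, 46),
    "beginning_of_activity": (47, 51),
    "end_of_activity": (52, 55),
}


def parse_person_008(field_008: str) -> dict:
    """Slice the fixed-length 008 field and keep the non-empty values."""
    result = {}
    for name, (start, end) in PERSON_008_POSITIONS.items():
        value = field_008[start:end + 1].strip()
        if value:
            result[name] = value
    return result


# Made-up sample: 12 filler characters, then nationality, language, gender,
# 9 filler characters, two 10-character date slots and 9 trailing characters.
sample_008 = (
    " " * 12 + "fr" + "fre" + "m" + " " * 9
    + "1799".ljust(10) + "1850".ljust(10) + " " * 9
)
parsed = parse_person_008(sample_008)
```

Each extracted value would then be attached to the corresponding RDF property from the mapping, for example foaf:nationality for positions 12-13.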


    Organisations                 RDF                                                   Intermarc fields for organisations

    name                          skos:prefLabel @in_lang                               100, 400
    nationality                   foaf:nationality                                      008 position 12-13
    language                      RDAgroup2elements:languageOfThePerson                 008 position 14-16
    date of beginning             RDAgroup2Elements:dateAssociatedWithTheCorporateBody  008 position 27-36
    date of end                   RDAgroup2Elements:dateAssociatedWithTheCorporateBody  008 position 37-46
    beginning of activity         dc:date                                               008 position 47-51
    end of activity               RDAgroup2elements:periodOfActivityOfTheCorporateBody  008 position 52-55
    website                       foaf:homepage                                         606
    sources                       skos:editorialNote                                    610
    summary, note                 RDAgroup2elements:corporateHistory                    600
    domain                        RDAgroup2elements:fieldOfActivityOfTheCorporateBody   624
    link to the DBpedia resource  owl:sameAs


    RAMEAU subject headings           RDF                 Intermarc fields for RAMEAU subject headings

    original title                    skos:prefLabel      16X, 46X
    other title                       skos:altLabel       16X, 46X
    origin (thesaurus Rameau)         skos:inScheme
    note about record sources         skos:editorialNote  610, 612
    other note                        skos:scopeNote      600
    broader subjects                  skos:broader        3XX, 5XX
    narrower subjects                 skos:narrower       3XX, 5XX
    related subjects                  skos:related        3XX, 5XX
    alignment with external datasets  skos:closeMatch     620
    alignment with external datasets  skos:exactMatch


    Work                                                    RDF                                            Intermarc fields for works

    main title                                              dc:title, skos:prefLabel, rdfs:label @in_lang  145, 415
    other title                                             skos:altLabel @in_lang
    language                                                dc:language                                    008 position 14-16
    date of work                                            dc:date                                        008 position 27-36
    source                                                  skos:editorialNote                             610
    summary, note                                           dc:description                                 600
    domain                                                  dc:subject                                     624
    link to the authority record in the BnF Main Catalogue  owl:sameAs
    part of                                                 dc:isPartOf
    main author                                             dc:creator                                     100, 101, 110
    relators                                                dc:contributor, bnf-onto:[coderole]            711, 702, 700, 701, 710, 712
    relators code                                           dc:contributor, bnf-onto:[coderole]            free code, 321, 322
    image for the digitised work in Gallica                 foaf:depiction


    Manifestation                 RDF                              Intermarc fields for bibliographic records

    manifestation of a work       rdarelationships:workManifested
    title                         dc:title                         245
    has part                      dc:hasPart
    date of publication           dc:date                          260
    place of publication          rdvocab:placeOfPublication       250
    publisher                     rdvocab:publishersName           260
    physical description          dc:description
    ISBN                          bnf-onto:ISBN                    20
    type of document              dc:type
    language                      dc:language                      41
    adaptation for young readers  bnf-onto:ouvrageJeunesse


    Expression        RDF

    relators          marcrel:[relator code from the Library of Congress]
    relators code     bnf-onto:coderole                                    subfield $4
    contribution      bnf-onto:role
    type of document  dc:type