I have been working for some years on specification and implementation of several APIs and exchange formats for data used in, and provided by libraries. Unfortunately most existing library standards are either fuzzy, complex, and misused (such as MARC21), or limited to bibliographic data or authority data, or both. Libraries, however, are much more than bibliographic data – they involve library patrons, library buildings, library services, library holdings, library databases etc.
During the work on formats and APIs for these parts of library world, Patrons Account Information API (PAIA) being the newest piece, I found myself more and more on the way to a whole library ontology. The idea of a library ontology started in 2009 (now moved to this location) but designing such a broad data model from bottom would surely have lead to yet another complex, impractical and unused library standard. Meanwhile there are several smaller ontologies for parts of the library world, to be combined and used as Linked Open Data.
In my opinion, ontologies, RDF, Semantic Web, Linked Data and all the buzz is is overrated, but it includes some opportunities for clean data modeling and data integration, which one rarely finds in library data. For this reason I try to design all APIs and formats at least compatible with RDF. For instance the Document Availability Information API (DAIA), created in 2008 (and now being slightly redesigned for version 1.0) can be accessed in XML and in JSON format, and both can fully be mapped to RDF. Other micro-ontologies include:
- Document Service Ontology (DSO) defines typical document-related services such as loan, presentation, and digitization
- Simple Service Status Ontology (SSSO) defines a service instance as kind of event that connects a service provider (e.g. a library) with a service consumer (e.g. a library patron). SSSO further defines typical service status (e.g. reserved, prepared, executed…) and limitations of a service (e.g. a waiting queue or a delay
- Patrons Account Information API (PAIA) will include a mapping to RDF to express basic patron information, fees, and a list of current services in a patron account, based on SSSO and DSO.
- Document Availability Information API (DAIA) includes a mapping to RDF to express the current availability of library holdings for selected services. See here for the current draft.
- A holdings ontology should define properties to relate holdings (or parts of holdings) to abstract documents and editions and to holding institutions.
- GBV Ontology contains several concepts and relations used in GBV library network that do not fit into other ontologies (yet).
- One might further create a database ontology to describe library databases with their provider, extent APIs etc. – right now we use the GBV ontology for this purpose. Is there anything to reuse instead of creating just another ontology?!
The next step will probably creation of a small holdings ontology that nicely fits to the other micro-ontologies. This ontology should be aligned or compatible with the BIBFRAME initiative, other ontologies such as Schema.org, and existing holding formats, without becoming too complex. The German Initiative DINI-KIM has just launched a a working group to define such holding format or ontology.
P.S: Mit Beschränkung auf Bibliotheksdienstleistungen, die sich auf Dokumente beziehen, habe ich inzwischen mit der Spezifikation einer Document Service Ontology begonnen.
Im Rahmen der Entwicklung der Patrons Account Information API (PAIA) und der Document Availability Information API (DAIA) bin ich wiederholt auf die Frage gestoßen, was für Arten von Dienstleistungen Bibliotheken eigentlich anbieten. Auf meine Frage bei Libraries.StackExchange antwortete Adrian Pohl, der sich in seiner Masterarbeit für lobid.org mit dem Thema beschäftigt hat.
Meine Fragestellung ist natürlich einseitig und zwar ausgerichtet auf die mögliche Verwendung in APIs, mit denen Nutzer und Programme verschiedene Bibliotheksdienstleistungen abrufen oder anfordern können sollen. Das bekannteste Beispiel ist die Dienstleistung der Ausleihe. PAIA und DAIA wurden primär dafür entworfen, damit Dienste wie die Ausleihe von Büchern aus Bibliotheken offen und maschinenlesbar möglich ist. Offen heisst, dass praktisch jeder oder zumindest alle Bibliotheksnutzer direkt per gut dokumentierter API auf die Dienste zugreifen können statt nur über bestimmte Benutzeroberflächen.
Dieser Blogartikel ist erst einmal ein lautes Denken, bei dem ich Adrians Klassifikation auf Eignung für meine Fragestellung untersuche. Wie jede Klassifikation sind sowohl die von Adrian bereitgestellte Liste als auch meine Kommentare also nicht neutral sondern nur eine mögliche Sichtweise.
1. Webbased services without direct personnel involvement:
- OPAC and other research services
- List of recent acquisitions
- Online Tutorials
- Place an order
- Renewal of a loan
- Online acquisition request service
- Digitization Service
- Consulting Service
Ein Katalog ist keine Dienstleistung die angefordert oder reserviert werden müsste, ebenso sieht es für andere Informationen auf der Webseite der Bibliothek aus. Die Punkte „place an order“ und „renewal of a loan“ gehören zur Dienstleistung Ausleihe (bzw. Ansicht bei Präsenzexemplaren). Zu klären ist noch, wie sich Anschaffungsvorschläge einordnen lassen. Die Digitalisierung von Bibliotheksbeständen (z.B. DigiWunschbuch ist vermutlich eine eigene Dienstleistung, die zu den bestehenden DAIA-Services hinzukommen könnte. Wieso „Consulting Service“ hier eingeordnet ist, verstehe ich nicht.
2. Webbased services with direct personnel involvement:
- Chat reference/consulting
Ist ein Chat eine Dienstleistung? Wie sieht es mit anderen Kommunikationswegen (Telefon, Gespräch vor Ort…) aus)? Ich würde das alles unter Auskunft zusammenfassen.
3. In-house services with direct personnel involvement
- In-house guided tours and courses
- Digitization Service
- Microfiche readers
- Reference Desk
- Loan Desk
- Registration Desk
- Consulting Service
- 3D printer
Schulungen und Führungen sind sicher eine eigene Art von Dienstleistung, allerdings gibt es hier verschiedene Arten. Ausleihtheke und Anmeldung sind keine eigene Dienstleistung sondern Teil anderer Services. Bei den Arbeitsmitteln wie Mirofiche-Leser, 3D-Drucker etc. kommt es darauf an, ob diese direkt nutzbar sind oder reserviert werden müssen und ob sie nur vor Ort nutzbar oder auch ausleihbar sind.
4. In-house services without direct personnel involvement
- Reading Room
- Study room
- Computers with Internet Access
- Scanner for use by visitors
- 3D printer
- Computer for self-loan
- Drinks machine
Arbeitsräume und Arbeitsplätze, Kopierer, Scanner, Kaffeeautomat etc. gehören zu den oben genannten Arbeitsmitteln. Ich denke dass sich DAIA einfach erweitern lässt, so dass Nutzer per API abrufen können, ob ein Arbeitsmittel frei ist und PAIA erweitern lässt, so dass Nutzer Arbeitsmittel per API reservieren können.
Integrated Library Systems often lack open APIs or existing services are difficult to reuse because of access restrictions, complexity, and poor documentations. This also applies to patron information, such as loans, reservations, and fees. After reviewing standards such as NCIP, SLNP, and the DLF-ILS recommendations, the Patrons Account Information API (PAIA) was specifed at the Common Library Network (GBV).
PAIA consists of a small set of precisely defined access methods to look up patron information including fees, to renew and request documents, and to cancel requests. With PAIA it should be possible to make use of all patron methods that can be access in OPAC interfaces, also in third party applications, such as mobile Apps and discovery interfaces. The specification is divided into core methods (PAIA core) and methods for authentification (PAIA auth). This design will facilitate migration from insecure username/password authentification to more flexible systems based on OAuth 2.0. OAuth is also used by major service providers such as Google, Twitter, and Facebook.
The current draft of PAIA is available at http://gbv.github.com/paia/ and comments are very welcome. The specification is hosted in a git repository, accompanied by a wiki. Both can be accessed publicly to correct and improve the specification until its final release.
PAIA complements the Document Availability Information API (DAIA) which was created to access current availability information about documents in libraries and related institutions. Both PAIA and DAIA are being designed with a mapping to RDF, to also publish library information as linked data.
Um die aktuelle Verfügbarkeit von Büchern und anderen Medien in GBV-Bibliotheken über eine standardisierte API abrufen zu können, entwickle ich derzeit einen zentralen DAIA-Server under daia.gbv.de. Da die verschiedenen Bibliotheken ihre lokalen Bibliothekssysteme allerdings sehr unterschiedlich konfiguriert haben, dauert die Bereitstellung von DAIA für alle Bibliotheken noch eine Weile.
Eine alternative Lösung, die auch für Bibliotheken funktioniert, die nicht im GBV sind und/oder PICA-LBS einsetzen, ist die Erstellung eines eigene DAIA-Servers. Als Grundgerüst habe ich dafür das Perl-Modul Plack::App::DAIA entwickelt und stelle es als Open Source zur Verfügung. Das Modul enthält zudem Routinen, um eigene DAIA-Server ausbgiebig auf korrekte Umsetzung zu testen – schließlich sind technische Standards, die nicht (automatisch) getestet werden können, eher unverbindliche Absichtserklärungen als wirkliche Standards. Das Perl-Modul enthält ein Beispielskript, das mittels Sceeenscraping (dank des Moduls pQuery), dem Katalog der Universitätsbibliothek Bielefeld eine DAIA-Schnittstelle aufsetzt.
When I started to create an API for availability lookup of document in libraries in 2008, I was suprised that such a basic service was so poorly defined. The best I could find was the just-published recommendation of the Digital Library Federation (DLF-ILS). Even there availability status was basically a plain text message (section 6.3.1 and appendix 4 and 5). Other parts of the DLF-ILS GetAvailability response were more helpful, so they are all part of the Document Availability Information API (DAIA). Here is a simple mapping from DLF-ILS to DAIA:
- bibliographicIdentifer (string) → document (URI)
- itemIdentifier (string) → item (URI)
- dateAvailable (dateTime) → expected (xs:dateTime or xs:date or „unknown“) or delay (xs:duration or „unknown“)
- location (string) → storage (URI and/or string, plus optional URL)
- call number (string) → label (string)
- holdQueueLength (int) → queue (xs:nonNegativeInteger)
- status (string) and circulating (boolean) → available/unavailable (with service type and additional information)
So you could say that DAIA implements the abstract GetAvailability function from DLF-ILS. I like abstract, language independent specifications, but they must be precise and testable (see Meek’s forgotten paper The seven golden rules for producing language-independent standards). DAIA is more than an implementation: it provides both, an abstract standard and bindings to several data languages (XML, JSON, and RDF). The conceptual DAIA data model defines some basic concepts and relationships (document, items, organisations, locations, services, availabilities, limitations…) independent from whether they are expressed in XML elements, attributes, RDF properties, classes, or any other data structuring method. The only reference to specific formats is the requirement that all unique identifiers must be URIs. Right now there is an XML Schema if you want to express DAIA in XML and an OWL ontology for RDF.
In its fourth year of development (see my previous posts from 2009) DAIA seems to have enough momentum to finally get accepted in practice. We use it in GBV library union (public server at http://daia.gbv.de/), there are independent implementations such as in Doctor-Doc, there is client-support in VuFind and I heard rumors that DAIA capabilities will be build into EBSCO and Summon Discovery Services. Native support in Integrated Library Systems, however, is still lacking – I already have given up hope and prefer a clean DAIA wrapper over a broken DAIA-implementation anyway. If you are interested in creating your own DAIA server/wrapper or client, have a look at my reference implementation DAIA and Plack::App::DAIA at CPAN and Oliver Goldschmidt’s PHP implementation in our common github repository. A conceptual overview as tree (DAIA/JSON, DAIA/XML) and as graph (DAIA/RDF) can be found here.
Still there are some details to be defined and I’d like to solve these issues to come to a version DAIA 1.0. These are
- How to deal with partial publications (you requested an article but only get the full book or you requested a series but only get a single volume).
- How to deal with digital publications (especially its possible service types: is „download“ a service distinct to „loan“ or is „presentation“ similar to online access restricted to the library’s intranet?).
- Final agreement on service types (now there are presentation: item can be used in the institution, loan: item can be used outside of the institution for a limited time, interloan: item can be send to another institution, openaccess: item can be access unrestricted, just get a free copy). Some extensions have been proposed.
- A set of common limitation types (for instance IP-based access restriction, permission-based access etc.).
I’d be happy to get some more feedback on these issues, especially concrete use cases. We are already discussing on the daia-devel mailing list but you can also comment in your own blog, at public-lld, code4lib, ils-di etc.).
I just participated in the German conference Semantic Web in Bibliotheken which took place in Hamburg this week. This year there were two slots for lightning talks, but unfortunately participants did not catch on, so we only had four of them. Lightning talks are a good chance to present something unfinished that you need input for, so I presented the Simplified Ontology for Bibliographic Resources (SOBR) as „FRBR light“. You can find the current specification of SOBR at github, which means the specification is still evolving and I’d like to get more feedback.
SOBR was caused by a discussion on the Library Linked Data mailing list about the (disputed) disjointedness of FRBR classes. SOBR has a history in the Document Availability Information API (DAIA), which SOBR might be merged into. The use case of both is publishing information about holdings from library catalogs as Linked Open Data. The information most requested is probably connected to holdings: library users only ask „where is it?“ and „how can I get it?“. In this questions, the little word „it“ refers to a specific publication. In the answers, however, „it“ usually refers to some holding or copy of this publication. Sometimes the holding contains more than the publication (for instance if you ask for an article in a book) and sometimes you get multiple holdings (for instance if you ask for a a large work that is split in multiple volumes). Sometimes there are multiple holdings with different content to choose from, because there are different editions, forms, translations etc. of the requested publication.
A long time ago, some librarians thought about similar questions and answers and came up with the Functional Requirements for Bibliographic Records (FRBR). I tried hard to accept FRBR (I even draw this ugly diagrams that people find when they look up FRBR in Wikipedia). But FRBR does not help me to publish existing library catalogs as Linked Open Data. In our catalog databases we have records that refer to editions, connected with records that refer to holdings (I’ll ignore the little exceptions and nasty special cases such as multiple holdings described by one look-like-a-holding-record). In addition there are some records that refer to series, works, and other types of abstract documents without direct holdings, which are connected to records that refer to editions.
Maybe we can simplify this to two entities: general documents (bibo:Document) and items (with frbr:Item) as special kind of documents. The current design of SOBR also contains a class for editions, but I am not sure whether this class is also needed. At least we need three properties to relate documents to items (daia:exemplar), to relate documents to editions (daia:edition?) and to relate documents to its parts (dcterms:hasPart). To avoid the need of blank nodes, I’d also define properties that relate documents to partial items (daia:extract = dcterms:hasPart + daia:exemplar) and to relate documents to partial editions (daia:editionPart?)
Feedback on SOBR is welcome, especially if you provide examples with existing URIs (or at least local identifiers to already existing data) instead of theoretical FRBR-like-made-up examples. The best way to find a good ontology for publishing library holdings is to actually publish data that describes library holdings! The following image is based on an example that connects a work from LibraryThing and from DBPedia with a partial edition from Worldcat, a full edition from German National Library, and a holding from Hamburg University:
@prefix bibo: <http://purl.org/ontology/bibo/> . @prefix daia: <http://purl.org/ontology/daia/> . @prefix frbr: <http://purl.org/vocab/frbr/core#> . @prefix owl: <http://www.w3.org/2002/07/owl#> . @prefix dct: <http://purl.org/dc/terms/> . @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . <http://www.librarything.com/work/70394> a bibo:Document ; owl:sameAs <http://dbpedia.org/resource/Living_My_Life> ; daia:edition <http://d-nb.info/1001703464> , [ a bibo:Collection # , daia:Document ; dct:hasPart <http://www.worldcat.org/oclc/656754414> ] ; # daia:exemplar <http://d-nb.info/1001703464> ; ? daia:editionPart <http://www.worldcat.org/oclc/656754414> . <http://d-nb.info/1001703464> a frbr:Item , bibo:Document ; daia:exemplar <http://uri.gbv.de/document/opac-de-18:epn:1220640794> .
image created with rdfdot
A few days ago Ed Summers pointed me to the specification of the Open Publication Distribution System (OPDS) which was just released as version 0.9. OpenPub (an alias for OPDS) is part of the Internet Archive’s BookServer project to build an architecture for vending and lending digital books over the Internet. I wonder why I have not heard more of BookServer and OpenPub at recent library conferences, discussion lists, and journals but maybe current libraries prefer to stay in the physical world to become museums and archives. Anyway, I had a look at OpenPub, so here are my public notes of the first impressions – and my answer to the call for comments. Please comment if you have corrections or additions (or create an issue in the tracker)!
OPDS is a syndication format for electronic publications based on Atom (RFC 4287). Therefore it is fully based on HTTP and the Web (this place that current libraries are still about to discover). Conceptually OPDS is somehow related to OAI(-ORE) and DAIA but it is purely based on XML which makes it difficult to compare with RDF-based approaches. I tried to reengineer the conceptual data model to better seperate model and serialization like I did with DAIA. The goal of OPDS catalogs is „to make Publications both discoverable and straightforward to acquire on a range of devices and platforms“.
OPDS uses a mix of DCMI Metadata Terms (DC) elements and ATOM element enriched with some new OPDS elements. Furthermore it interprets some DC and ATOM elements in a special way (this is common in many data formats although frequently forgotten).
The core concepts of OPDS are Catalogs which are provided as ATOM Feeds (like Jangle which should fit nicely for library resources), Catalog Entries that each refer to one publication and Aquisition Links. There are two disjunct types of Catalogs: Navigation Feeds provide a browseable hierarchy and Acquisition Feeds contain a list of Publication Entries. I will skip the details on Navigation Feeds and search facities (possible via OpenSearch) but focus on Elements and Aquisition.
The specification distinguishes between Partial and Complete Catalog Entries but this is not relevant on the conceptual level. There we have two concepts that are not clearly seperated in the XML serialization: the Catalog Record and the Publication which a Catalog Record describes are mixed in one Catalog Element. The properties of a Catalog Record are:
- identifier of the catalog entry (MANDATORY)
- modification timestamp of the catalog entry (MANDATORY)
- timestamp of when the catalog entry was first accessible
The properties of a Publication are:
- identifier of the publication
- title of the publication (MANDATORY)
- creator of the publication (possibly with sub-properties)
- additional contributors to the publication (dito)
- publication’s category, keywords, classification codes etc. (with sub-properties scheme, term, and label)
- first publication date of the publication
- rights held in and over the publications
- atom:summary and atom:content
- description of the publication (as plain text or some other format for atom:content)
- language(s) of the publication (any format?)
- size or duration of the publication (?)
- Publisher of the publication
Moreover each publication may link to related resources. Unfortunately you cannot just use arbitrary RDF properties but the following relations (from this draft):
- alternative description of the publication
- copyright statement that applies to the catalog entry
- more recent version of the publication
- license associated with the catalog entry
- comment on or discussion of the catalog entry
I consider this relation types one of the weakest points of OPDS. The domain and range of the links are not clear and there are much better vocabularies for links between publications, for instance in FRBR, the Bibliographic Ontology, the citation type ontology, Memento, and SIOC (which also overlaps with ODPS at other places).
In addition each publication must contain at least one atom:link element which is used to encode an Aquisition Link.
OPDS defines two Aquisition types: „Direct Acquisition“ and „Indirect Acquisition“. Direct Aquisition links must directly lead to the publication (in some format) without any login, meta or catalog page in front of it (!) while Indirect Acquisition links lead to such a portal pages that then links to the publications. There are five Aquisition types (called „Acquisition Relations“) similar to DAIA Service types:
- a complete representation of the
publication that may be retrieved without payment
- a complete representation of the publication
that may be retrieved as part of a lending transaction
- a complete representation of the publication
that may be retrieved as part of a purchase
- a representation of a subset of the publication
- a complete representation of the publication that may be retrieved as part of a subscription
odps:acquisition can be mapped to daia:Service/Openaccess and odps:acquisition/borrow can be mapped to daia:Service/Loan (and vice versa). odps:acquisition/buy is not defined in DAIA but could easily be added while daia:Service/Presentation and daia:Service/Interloan are not defined in ODPS. At least the first should be added to ODPS to indicate publications that require you to become a member and log in or to physically walk into an institution to get a publication (strictly limiting OPDS to pure-digital publications accessible via HTTP is stupid if you allow indirect aquisition).
The remaining two acquisition types somehow do not fit between the others: odps:acquisition/sample and odps:acquisition/subscribe should be orthogonal to the other relations. For instance you could subscribe to a paid or to a free subscription and you could buy a subset of a publication.
In addition Aquisition links may or must contain some other properties such as odps:price (containing of a currency code from ISO4217 and a value).
Cover and artwork links
Beside Aquisition links the relations opds:cover and opds:thumbnail can be used to relate a Publication with it’s cover or some other visual representation. The thumbnail should not exceed 120 pixles in height or width and images must be either GIF, JPEG, or PNG. Thumbnails may also be directly embedded via the „data“ URL schema from RFC2397.
OPDS looks very promising and it is already used for benefit in practise. There are some minor issues that can easily be fixed. The random selection of relation types is surely I flaw that can be repaired by allowing arbitrary RDF properties (come on XML fanboys, you should notice that RDF is good at least at link types!) and the list of acquisition types should be cleaned and enhanced at least to support „presentation“ without lending like DAIA does. A typical use case for this are National Licenses that require you to register to access the publications. For more details I would like to compare OPDS in more depth with models like DAIA, FRBR, SIOC, OAI-ORE, Europeana etc. – but not now.
I just finished the first complete draft of an OWL ontology of the DAIA data model. Unless the final URI prefix is sure, the ontology is available in GBV Wiki in Notation3 syntax, but you can also get RDF/XML. There is also a browsable HTML view created with OWLDoc (I only wonder why it does not include URI prefixes like in the same view of the Bibliographic Ontology).
It turned out that mapping the XML format DAIA/XML to RDF is not trivial – although I kept in mind doing so when I designed DAIA. XML is mostly based on a closed world tree data model but RDF is based on an open world graph model. Last month Mike Bergman wrote a good article about the clash of Open World Assumption and Closed World Assumption. I think as long as you only view data in form of tables, lists, and trees, you will not grasp the concept of the Semantic Web. I don’t know whether I have fully grasped the concept of document availability with DAIA and the ontology surely needs some further review, but it’s something to start with – just have a look!
Since almost a year I work on a simple encoding format and API to just get the current (!) availability status of documents in libraries. Together with Reh Uwe (hebis network) and Anne Christensen (beluga project) we created the Document Availability Information API (DAIA) which is defined as data model with encoding in XML and JSON (whichever you prefer).
This week I finished and published a reference implementation of the DAIA protocol as open source Perl-module at CPAN. The implementation includes a simple DAIA validator and converter. A public installation of this validator is also available. The next tasks include implementing server and client components for several ILS software. Every library has its own special rules and schemas – Jonathan Rochkind already wrote about the problems to implement DAIA because of ILS complexity. We cannot erase this complexity by magic (unless we refactor and clean the ILS), but at least we can try to map it to a common data model – which DAIA provides.
With the DAIA Perl package you can concentrate on writing wrappers from your library systems to DAIA and easily consume and evaluate DAIA-encoded information. Why should everyone write its own routines to grab for instance the HTML OPAC output and parse availability status? One mapping to DAIA should fit most needs, so others can build upon. DAIA can not only be helpful to connect different library systems, but also to create mashups and services like „Show me on a map, where a given book is currently hold and available“ or „Send me a tweet if a given books in my library is available again“ – If you have more cool ideas for client applications, just let me know!
In the context of ILS Discovery Interface Task Force and their official recommendation DAIA implements the GetAvailability method (section 6.3.1). There are numerous APIs for several tasks in library systems (SRU/SRW, Z39.50, OpenSearch, OAI-PMH, Atom, unAPI etc.) but there was no open, usable standard way just to query whether a copy of given publication – for instance book – is available in a library, in which department, whether you can loan it or only use it in the library, whether you can directly get it online, or how long it will probably take until it is available again (yes, I looked at alternatives like Z39.50, ISO 20775, NCIP, SLNP etc. but they were hardly defined, documented, implemented and usable freely on the Web). I hope that DAIA is easy enough so non-librarians can make use of it if libraries provide an API to their system with DAIA. Extensions to DAIA can be discussed for instance in Code4Lib Wiki but I’d prefer to start with this basic, predefined services:
- presentation: an item can be used inside the institution (in their rooms, in their intranet etc.).
- loan: an item can be used outside of the institution (by lending or online access).
- interloan: an tem can be used mediated by another institution. That means you do not have to interact with the institution that was queried for this item. This include interlibrary loan as well as public online ressources that are not hosted or made available by the queried institution.
- openaccess: an item can be used imediately without any restrictions by the institution, you don’t even have to give it back. This applies for Open Access publications and free copies.