An impression of the OPDS/OpenPub catalog data model

27. Mai 2010 um 00:05 3 Kommentare

A few days ago Ed Summers pointed me to the specification of the Open Publication Distribution System (OPDS) which was just released as version 0.9. OpenPub (an alias for OPDS) is part of the Internet Archive’s BookServer project to build an architecture for vending and lending digital books over the Internet. I wonder why I have not heard more of BookServer and OpenPub at recent library conferences, discussion lists, and journals but maybe current libraries prefer to stay in the physical world to become museums and archives. Anyway, I had a look at OpenPub, so here are my public notes of the first impressions – and my answer to the call for comments. Please comment if you have corrections or additions (or create an issue in the tracker)!

OPDS is a syndication format for electronic publications based on Atom (RFC 4287). Therefore it is fully based on HTTP and the Web (this place that current libraries are still about to discover). Conceptually OPDS is somehow related to OAI(-ORE) and DAIA but it is purely based on XML which makes it difficult to compare with RDF-based approaches. I tried to reengineer the conceptual data model to better seperate model and serialization like I did with DAIA. The goal of OPDS catalogs is “to make Publications both discoverable and straightforward to acquire on a range of devices and platforms”.

OPDS uses a mix of DCMI Metadata Terms (DC) elements and ATOM element enriched with some new OPDS elements. Furthermore it interprets some DC and ATOM elements in a special way (this is common in many data formats although frequently forgotten).

Core concepts

The core concepts of OPDS are Catalogs which are provided as ATOM Feeds (like Jangle which should fit nicely for library resources), Catalog Entries that each refer to one publication and Aquisition Links. There are two disjunct types of Catalogs: Navigation Feeds provide a browseable hierarchy and Acquisition Feeds contain a list of Publication Entries. I will skip the details on Navigation Feeds and search facities (possible via OpenSearch) but focus on Elements and Aquisition.

Catalog Elements

The specification distinguishes between Partial and Complete Catalog Entries but this is not relevant on the conceptual level. There we have two concepts that are not clearly seperated in the XML serialization: the Catalog Record and the Publication which a Catalog Record describes are mixed in one Catalog Element. The properties of a Catalog Record are:

identifier of the catalog entry (MANDATORY)
modification timestamp of the catalog entry (MANDATORY)
timestamp of when the catalog entry was first accessible

The properties of a Publication are:

identifier of the publication
title of the publication (MANDATORY)
creator of the publication (possibly with sub-properties)
additional contributors to the publication (dito)
publication’s category, keywords, classification codes etc. (with sub-properties scheme, term, and label)
first publication date of the publication
rights held in and over the publications
atom:summary and atom:content
description of the publication (as plain text or some other format for atom:content)
language(s) of the publication (any format?)
size or duration of the publication (?)
Publisher of the publication

Moreover each publication may link to related resources. Unfortunately you cannot just use arbitrary RDF properties but the following relations (from this draft):

alternative description of the publication
copyright statement that applies to the catalog entry
more recent version of the publication
license associated with the catalog entry
comment on or discussion of the catalog entry

I consider this relation types one of the weakest points of OPDS. The domain and range of the links are not clear and there are much better vocabularies for links between publications, for instance in FRBR, the Bibliographic Ontology, the citation type ontology, Memento, and SIOC (which also overlaps with ODPS at other places).

In addition each publication must contain at least one atom:link element which is used to encode an Aquisition Link.

Aquisition Links

OPDS defines two Aquisition types: “Direct Acquisition” and “Indirect Acquisition”. Direct Aquisition links must directly lead to the publication (in some format) without any login, meta or catalog page in front of it (!) while Indirect Acquisition links lead to such a portal pages that then links to the publications. There are five Aquisition types (called “Acquisition Relations”) similar to DAIA Service types:

a complete representation of the
publication that may be retrieved without payment
a complete representation of the publication
that may be retrieved as part of a lending transaction
a complete representation of the publication
that may be retrieved as part of a purchase
a representation of a subset of the publication
a complete representation of the publication that may be retrieved as part of a subscription

odps:acquisition can be mapped to daia:Service/Openaccess and odps:acquisition/borrow can be mapped to daia:Service/Loan (and vice versa). odps:acquisition/buy is not defined in DAIA but could easily be added while daia:Service/Presentation and daia:Service/Interloan are not defined in ODPS. At least the first should be added to ODPS to indicate publications that require you to become a member and log in or to physically walk into an institution to get a publication (strictly limiting OPDS to pure-digital publications accessible via HTTP is stupid if you allow indirect aquisition).

The remaining two acquisition types somehow do not fit between the others: odps:acquisition/sample and odps:acquisition/subscribe should be orthogonal to the other relations. For instance you could subscribe to a paid or to a free subscription and you could buy a subset of a publication.

In addition Aquisition links may or must contain some other properties such as odps:price (containing of a currency code from ISO4217 and a value).

Cover and artwork links

Beside Aquisition links the relations opds:cover and opds:thumbnail can be used to relate a Publication with it’s cover or some other visual representation. The thumbnail should not exceed 120 pixles in height or width and images must be either GIF, JPEG, or PNG. Thumbnails may also be directly embedded via the “data” URL schema from RFC2397.

Final thoughts

OPDS looks very promising and it is already used for benefit in practise. There are some minor issues that can easily be fixed. The random selection of relation types is surely I flaw that can be repaired by allowing arbitrary RDF properties (come on XML fanboys, you should notice that RDF is good at least at link types!) and the list of acquisition types should be cleaned and enhanced at least to support “presentation” without lending like DAIA does. A typical use case for this are National Licenses that require you to register to access the publications. For more details I would like to compare OPDS in more depth with models like DAIA, FRBR, SIOC, OAI-ORE, Europeana etc. – but not now.

Powered by WordPress with Theme based on Pool theme and Silk Icons.
Entries and comments feeds. Valid XHTML and CSS. ^Top^