An impression of the OPDS/OpenPub catalog data model

27 May 2010 at 00:05, 3 comments

A few days ago Ed Summers pointed me to the specification of the Open Publication Distribution System (OPDS), which was just released as version 0.9. OpenPub (an alias for OPDS) is part of the Internet Archive’s BookServer project to build an architecture for vending and lending digital books over the Internet. I wonder why I have not heard more of BookServer and OpenPub at recent library conferences, on discussion lists, and in journals, but maybe current libraries prefer to stay in the physical world to become museums and archives. Anyway, I had a look at OpenPub, so here are my public notes on my first impressions, and my answer to the call for comments. Please comment if you have corrections or additions (or create an issue in the tracker)!

OPDS is a syndication format for electronic publications based on Atom (RFC 4287). It therefore builds entirely on HTTP and the Web (this place that current libraries are still about to discover). Conceptually OPDS is somewhat related to OAI(-ORE) and DAIA, but it is purely based on XML, which makes it difficult to compare with RDF-based approaches. I tried to reverse-engineer the conceptual data model to better separate model and serialization, as I did with DAIA. The goal of OPDS catalogs is “to make Publications both discoverable and straightforward to acquire on a range of devices and platforms”.

OPDS uses a mix of DCMI Metadata Terms (DC) elements and Atom elements, enriched with some new OPDS elements. Furthermore it interprets some DC and Atom elements in a special way (this is common in many data formats, although frequently forgotten).

Core concepts

The core concepts of OPDS are Catalogs, which are provided as Atom Feeds (like Jangle, which should fit nicely for library resources), Catalog Entries that each refer to one publication, and Acquisition Links. There are two disjoint types of Catalogs: Navigation Feeds provide a browsable hierarchy and Acquisition Feeds contain a list of Publication Entries. I will skip the details on Navigation Feeds and search facilities (possible via OpenSearch) and focus on Catalog Elements and Acquisition.

Catalog Elements

The specification distinguishes between Partial and Complete Catalog Entries, but this is not relevant on the conceptual level. There we have two concepts that are not clearly separated in the XML serialization: the Catalog Record and the Publication that it describes are mixed in one Catalog Element. The properties of a Catalog Record are:

atom:id
identifier of the catalog entry (MANDATORY)
atom:updated
modification timestamp of the catalog entry (MANDATORY)
atom:published
timestamp of when the catalog entry was first accessible

The properties of a Publication are:

dc:identifier
identifier of the publication
atom:title
title of the publication (MANDATORY)
atom:author
creator of the publication (possibly with sub-properties)
atom:contributor
additional contributors to the publication (ditto)
atom:category
publication’s category, keywords, classification codes etc. (with sub-properties scheme, term, and label)
dc:issued
first publication date of the publication
atom:rights
rights held in and over the publications
atom:summary and atom:content
description of the publication (as plain text or some other format for atom:content)
dc:language
language(s) of the publication (any format?)
dc:extent
size or duration of the publication (?)
dc:publisher
Publisher of the publication
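
To make the split between Catalog Record and Publication more concrete, here is a minimal Python sketch that sorts the children of one Atom entry into the two groups listed above. The sample entry, its identifiers, and the function name `split_entry` are made up for illustration, and binding the `dc` prefix to the DCMI Terms namespace is my assumption, not something taken from the specification text.

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    # Assumption: the dc prefix is bound to the DCMI Metadata Terms namespace.
    "dc": "http://purl.org/dc/terms/",
}

# Invented sample entry; all values are placeholders.
SAMPLE_ENTRY = """\
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/terms/">
  <id>urn:uuid:6409a00b-7bf2-405e-826c-3fdff0fd0734</id>
  <updated>2010-05-27T00:05:00Z</updated>
  <title>An Example Publication</title>
  <author><name>Jane Doe</name></author>
  <dc:identifier>urn:isbn:0000000000</dc:identifier>
  <dc:language>en</dc:language>
  <dc:issued>2010</dc:issued>
</entry>
"""

# Properties that describe the catalog entry itself.
RECORD_FIELDS = ["atom:id", "atom:updated", "atom:published"]

# Properties that describe the publication (simple text values only).
PUBLICATION_FIELDS = [
    "dc:identifier", "atom:title", "dc:issued", "atom:rights",
    "atom:summary", "dc:language", "dc:extent", "dc:publisher",
]


def split_entry(xml_text):
    """Return (catalog_record, publication) as dicts of value lists."""
    entry = ET.fromstring(xml_text)

    def collect(fields):
        found = {}
        for name in fields:
            values = [el.text for el in entry.findall(name, NS) if el.text]
            if values:
                found[name] = values
        return found

    record = collect(RECORD_FIELDS)
    publication = collect(PUBLICATION_FIELDS)

    # atom:author and atom:contributor carry their names in a sub-element.
    for person in ("atom:author", "atom:contributor"):
        names = [el.text for el in entry.findall(person + "/atom:name", NS) if el.text]
        if names:
            publication[person] = names
    return record, publication


record, publication = split_entry(SAMPLE_ENTRY)
print(record)
print(publication)
```

The sketch ignores atom:category, atom:content and all links; it only shows that the two concepts can be pulled apart mechanically once you decide which element belongs to which.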

Moreover each publication may link to related resources. Unfortunately you cannot just use arbitrary RDF properties but the following relations (from this draft):

alternate
alternative description of the publication
copyright
copyright statement that applies to the catalog entry
latest-version
more recent version of the publication
license
license associated with the catalog entry
replies
comment on or discussion of the catalog entry

I consider these relation types one of the weakest points of OPDS. The domain and range of the links are not clear, and there are much better vocabularies for links between publications, for instance in FRBR, the Bibliographic Ontology, the citation type ontology, Memento, and SIOC (which also overlaps with OPDS in other places).

In addition each publication must contain at least one atom:link element which is used to encode an Acquisition Link.

Acquisition Links

OPDS defines two Acquisition types: “Direct Acquisition” and “Indirect Acquisition”. Direct Acquisition links must lead directly to the publication (in some format) without any login, meta or catalog page in front of it (!), while Indirect Acquisition links lead to such a portal page, which then links to the publication. In addition there are five “Acquisition Relations”, similar to DAIA Service types:

opds:acquisition
a complete representation of the publication that may be retrieved without payment
opds:acquisition/borrow
a complete representation of the publication that may be retrieved as part of a lending transaction
opds:acquisition/buy
a complete representation of the publication that may be retrieved as part of a purchase
opds:acquisition/sample
a representation of a subset of the publication
opds:acquisition/subscribe
a complete representation of the publication that may be retrieved as part of a subscription

opds:acquisition can be mapped to daia:Service/Openaccess and opds:acquisition/borrow can be mapped to daia:Service/Loan (and vice versa). opds:acquisition/buy is not defined in DAIA but could easily be added, while daia:Service/Presentation and daia:Service/Interloan are not defined in OPDS. At least the first should be added to OPDS to indicate publications that require you to become a member and log in, or to physically walk into an institution, to get a publication (strictly limiting OPDS to purely digital publications accessible via HTTP is stupid if you allow indirect acquisition).
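
The mapping is small enough to write down as a lookup table. The following Python sketch uses the abbreviated names from this post (with the prefix spelled opds:) rather than full URIs; the helper names `to_daia` and `to_opds` are hypothetical.

```python
# The two mappings named in the text, in abbreviated notation.
OPDS_TO_DAIA = {
    "opds:acquisition": "daia:Service/Openaccess",
    "opds:acquisition/borrow": "daia:Service/Loan",
}

# The mapping also works in the other direction ("and vice versa").
DAIA_TO_OPDS = {service: rel for rel, service in OPDS_TO_DAIA.items()}


def to_daia(rel):
    """Return the DAIA service for an OPDS relation, or None if there is none."""
    return OPDS_TO_DAIA.get(rel)


def to_opds(service):
    """Return the OPDS relation for a DAIA service, or None if there is none."""
    return DAIA_TO_OPDS.get(service)


# The gaps described above show up as None on either side.
for rel in ("opds:acquisition", "opds:acquisition/borrow", "opds:acquisition/buy"):
    print(rel, "->", to_daia(rel))
for service in ("daia:Service/Presentation", "daia:Service/Interloan"):
    print(service, "->", to_opds(service))
```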

The remaining two acquisition types somehow do not fit in with the others: opds:acquisition/sample and opds:acquisition/subscribe should be orthogonal to the other relations. For instance you could subscribe to a paid or to a free subscription, and you could buy a subset of a publication.
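
What “orthogonal” could mean here is easiest to see in a small model. This Python sketch is my own hypothetical alternative, not anything OPDS defines: the terms of access, the extent of what you get, and whether it comes as a subscription are three independent properties, so the combinations mentioned above can be expressed.

```python
from dataclasses import dataclass
from enum import Enum


class Terms(Enum):
    """How the publication is obtained (hypothetical names)."""
    FREE = "free"
    BORROW = "borrow"
    BUY = "buy"


class Extent(Enum):
    """How much of the publication is obtained (hypothetical names)."""
    COMPLETE = "complete"
    SAMPLE = "sample"


@dataclass(frozen=True)
class Acquisition:
    terms: Terms
    extent: Extent = Extent.COMPLETE
    subscription: bool = False


# Combinations that a single flat list of five relations cannot express:
paid_subscription = Acquisition(Terms.BUY, subscription=True)
free_subscription = Acquisition(Terms.FREE, subscription=True)
bought_subset = Acquisition(Terms.BUY, Extent.SAMPLE)

print(paid_subscription)
print(free_subscription)
print(bought_subset)
```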

In addition Acquisition links may or must contain some other properties such as opds:price (consisting of a currency code from ISO 4217 and a value).

Cover and artwork links

Besides Acquisition links, the relations opds:cover and opds:thumbnail can be used to relate a Publication to its cover or some other visual representation. The thumbnail should not exceed 120 pixels in height or width, and images must be either GIF, JPEG, or PNG. Thumbnails may also be directly embedded via the “data” URL scheme from RFC 2397.
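
A client could check an embedded thumbnail against these rules roughly as follows. This is a minimal Python sketch under several assumptions: it only handles base64-encoded data URLs, it only reads pixel dimensions for PNG (from the IHDR header), and all function names are hypothetical.

```python
import base64
import struct

ALLOWED_TYPES = {"image/gif", "image/jpeg", "image/png"}
MAX_SIDE = 120  # recommended maximum height and width in pixels
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_data_url(url):
    """Split a base64 data: URL into (media_type, raw_bytes)."""
    if not url.startswith("data:"):
        raise ValueError("not a data: URL")
    header, sep, payload = url[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data: URL without a comma")
    parts = header.split(";")
    if "base64" not in parts[1:]:
        raise ValueError("only base64 payloads are handled in this sketch")
    return parts[0].lower(), base64.b64decode(payload)


def png_size(data):
    """Read (width, height) from the IHDR chunk at the start of a PNG file."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    return struct.unpack(">II", data[16:24])


def thumbnail_looks_ok(url):
    """Check the media type and, for PNG only, the recommended size limit."""
    media_type, data = decode_data_url(url)
    if media_type not in ALLOWED_TYPES:
        return False
    if media_type == "image/png":
        width, height = png_size(data)
        return width <= MAX_SIDE and height <= MAX_SIDE
    # GIF and JPEG dimensions are not inspected in this sketch.
    return True
```

Since the size limit is only a recommendation (“should”), a real client would probably scale down a larger image rather than reject it.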

Final thoughts

OPDS looks very promising and is already being put to good use in practice. There are some minor issues that can easily be fixed. The random selection of relation types is surely a flaw that can be repaired by allowing arbitrary RDF properties (come on XML fanboys, you should notice that RDF is good at least at link types!), and the list of acquisition types should be cleaned up and extended, at least to support “presentation” without lending like DAIA does. A typical use case for this is National Licenses that require you to register to access the publications. For more details I would like to compare OPDS in more depth with models like DAIA, FRBR, SIOC, OAI-ORE, Europeana etc., but not now.

3 Comments »


  1. Unfortunately you cannot just use arbitrary RDF properties

    In fact, I would expect some OPDS Catalog providers to include rdf:RDF blocks inside the Complete Catalog Entry as a way to include the sort of richer metadata that some lucky few do have (like this, for example).

    Comment by Keith Fahlgren, 27 May 2010

  2. I certainly agree that we would like to allow OPDS Catalogs to use better link relations. That said, I hope that the OPDS Catalog 0.9 spec does not limit the allowed @rel values to only those defined in the Link Header draft (or the “name”s defined in Atom, or the IANA registry). Instead, we have the opportunity to use IRIs rather than “name”s, which is the way to support arbitrary relations: Atom RFC 4287 §4.2.7 on atom:link and its @rel attribute.

    We use exactly this technique for the Acquisition Relations.

    What would be the best way to represent the more nuanced relationships in FRBR, the Bibliographic Ontology, the citation type ontology, Memento, and SIOC as IRIs in atom:link/@rel?

    Comment by Keith Fahlgren, 27 May 2010

  3. Thanks for the detailed review Jakob! I agree with Keith that Atom does support the rich typed linking between resources that is found in the RDF data model, and its various serializations. I’ve tried to make the case that if you squint right Atom looks like a nice RDF serialization for resource-oriented RDF graphs. This is primarily an idea I borrowed from Herbert van de Sompel, Peter Keane and the rest of the oai-ore folks who worked on the Atom serialization for oai-ore resource maps.

    All that being said, I agree with you that the Catalog Relations section in v0.9 of the OPDS Specification should be updated to mention that RFC 4287 allows any URI to be used as a rel value in an atom:link element, and that there are use cases where you might want to use pieces of vocabulary from sioc, memento, frbr, etc. Maybe it would be worthwhile adding some suggestions on how to improve the spec to issue 36 that Keith opened up.

    Again, thanks for your detailed analysis. I’ve been interested in the possible interplay between DAIA and OPDS.

    Comment by Ed Summers, 27 May 2010
