An impression of the OPDS/OpenPub catalog data model

27 May 2010, 00:05 · 3 comments

A few days ago Ed Summers pointed me to the specification of the Open Publication Distribution System (OPDS), which was just released as version 0.9. OpenPub (an alias for OPDS) is part of the Internet Archive’s BookServer project to build an architecture for vending and lending digital books over the Internet. I wonder why I have not heard more about BookServer and OpenPub at recent library conferences, on discussion lists, and in journals, but maybe current libraries prefer to stay in the physical world and become museums and archives. Anyway, I had a look at OpenPub, so here are my public notes on my first impressions, and my answer to the call for comments. Please comment if you have corrections or additions (or create an issue in the tracker)!

OPDS is a syndication format for electronic publications based on Atom (RFC 4287). It is therefore fully based on HTTP and the Web (the place that current libraries are still about to discover). Conceptually OPDS is related to OAI(-ORE) and DAIA, but it is purely based on XML, which makes it difficult to compare with RDF-based approaches. I tried to reverse-engineer the conceptual data model to better separate model and serialization, as I did with DAIA. The goal of OPDS catalogs is “to make Publications both discoverable and straightforward to acquire on a range of devices and platforms”.

OPDS uses a mix of DCMI Metadata Terms (DC) elements and Atom elements, enriched with some new OPDS elements. Furthermore, it interprets some DC and Atom elements in a special way (this is common in many data formats, although frequently forgotten).

Core concepts

The core concepts of OPDS are Catalogs, which are provided as Atom Feeds (like Jangle, which should fit nicely for library resources), Catalog Entries, each of which refers to one publication, and Acquisition Links. There are two disjoint types of Catalogs: Navigation Feeds provide a browsable hierarchy and Acquisition Feeds contain a list of Publication Entries. I will skip the details of Navigation Feeds and search facilities (possible via OpenSearch) and focus on Catalog Entries and Acquisition.

Catalog Elements

The specification distinguishes between Partial and Complete Catalog Entries, but this is not relevant on the conceptual level. There we have two concepts that are not clearly separated in the XML serialization: the Catalog Record and the Publication which a Catalog Record describes are mixed in one Catalog Entry. The properties of a Catalog Record are:

atom:id
identifier of the catalog entry (MANDATORY)
atom:updated
modification timestamp of the catalog entry (MANDATORY)
atom:published
timestamp of when the catalog entry was first accessible

The properties of a Publication are:

dc:identifier
identifier of the publication
atom:title
title of the publication (MANDATORY)
atom:author
creator of the publication (possibly with sub-properties)
atom:contributor
additional contributors to the publication (ditto)
atom:category
publication’s category, keywords, classification codes etc. (with sub-properties scheme, term, and label)
dc:issued
first publication date of the publication
atom:rights
rights held in and over the publication
atom:summary and atom:content
description of the publication (as plain text or some other format for atom:content)
dc:language
language(s) of the publication (any format?)
dc:extent
size or duration of the publication (?)
dc:publisher
Publisher of the publication
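To make the proposed split concrete, here is a minimal Python sketch that reads one OPDS entry and sorts its elements into a Catalog Record and a Publication, following the two property lists above. The namespace URIs (Atom and `http://purl.org/dc/terms/` for DC), the helper names, and the sample entry are my assumptions for illustration, not something the OPDS draft prescribes.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for Atom (RFC 4287) and DCMI Metadata Terms.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "dc": "http://purl.org/dc/terms/",
}

# Conceptual split proposed above; OPDS itself mixes both in one entry.
RECORD_PROPS = ["atom:id", "atom:updated", "atom:published"]
PUBLICATION_PROPS = [
    "dc:identifier", "atom:title", "dc:issued", "atom:rights",
    "atom:summary", "atom:content", "dc:language", "dc:extent", "dc:publisher",
]
PERSON_PROPS = ["atom:author", "atom:contributor"]  # these have sub-elements


def collect(entry: ET.Element, props: list[str]) -> dict[str, list[str]]:
    """Map each property that occurs in the entry to the list of its text values."""
    result = {}
    for prop in props:
        values = [el.text.strip() for el in entry.findall(prop, NS) if el.text]
        if values:
            result[prop] = values
    return result


def split_entry(entry: ET.Element) -> tuple[dict, dict]:
    """Split one OPDS catalog entry into (catalog record, publication)."""
    record = collect(entry, RECORD_PROPS)
    publication = collect(entry, PUBLICATION_PROPS)
    for prop in PERSON_PROPS:
        # Keep only the atom:name sub-property of authors and contributors.
        names = [n.text.strip() for n in entry.findall(f"{prop}/atom:name", NS) if n.text]
        if names:
            publication[prop] = names
    return record, publication


# Hypothetical sample entry.
SAMPLE = """<entry xmlns="http://www.w3.org/2005/Atom"
                   xmlns:dc="http://purl.org/dc/terms/">
  <id>urn:example:entry-1</id>
  <updated>2010-05-26T12:00:00Z</updated>
  <title>Moby-Dick</title>
  <author><name>Herman Melville</name></author>
  <dc:language>en</dc:language>
</entry>"""

record, publication = split_entry(ET.fromstring(SAMPLE))
```

The record ends up with only `atom:id` and `atom:updated`, while title, author, and language land on the publication side, which is exactly the separation the XML serialization does not make explicit.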

Moreover, each publication may link to related resources. Unfortunately you cannot just use arbitrary RDF properties, but only the following relations (from this draft):

alternate
alternative description of the publication
copyright
copyright statement that applies to the catalog entry
latest-version
more recent version of the publication
license
license associated with the catalog entry
replies
comment on or discussion of the catalog entry

I consider these relation types one of the weakest points of OPDS. The domain and range of the links are not clear, and there are much better vocabularies for links between publications, for instance in FRBR, the Bibliographic Ontology, the Citation Typing Ontology, Memento, and SIOC (which also overlaps with OPDS in other places).

In addition, each publication must contain at least one atom:link element, which is used to encode an Acquisition Link.
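A check for this constraint could look like the following Python sketch. It assumes that the acquisition relations are URIs under `http://opds-spec.org/acquisition` (so that `.../buy`, `.../borrow` etc. are sub-paths); that base URI and the function names are assumptions for illustration, so please check them against the draft.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
# Assumed base URI of the acquisition relations; the other relations
# (buy, borrow, ...) are taken to be sub-paths of it.
ACQUISITION = "http://opds-spec.org/acquisition"


def acquisition_links(entry: ET.Element) -> list[ET.Element]:
    """Return the atom:link children of an entry that are Acquisition Links."""
    return [
        link for link in entry.findall(f"{ATOM}link")
        if link.get("rel", "") == ACQUISITION
        or link.get("rel", "").startswith(ACQUISITION + "/")
    ]


def check_publication_entry(entry: ET.Element) -> None:
    """Raise ValueError unless the entry has at least one Acquisition Link."""
    if not acquisition_links(entry):
        raise ValueError("publication entry without acquisition link")


# Hypothetical entry with one ordinary link and one acquisition link.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <link rel="alternate" href="http://example.org/book.html"/>
  <link rel="http://opds-spec.org/acquisition/buy" href="http://example.org/buy"/>
</entry>"""

entry = ET.fromstring(ENTRY)
check_publication_entry(entry)  # passes: the entry has a buy link
```

The `alternate` link is ignored, so an entry whose only links are of the generic relation types listed above would be rejected as a publication entry.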

Acquisition Links

OPDS defines two kinds of acquisition: “Direct Acquisition” and “Indirect Acquisition”. Direct Acquisition links must lead directly to the publication (in some format) without any login, meta, or catalog page in front of it (!), while Indirect Acquisition links lead to such portal pages, which then link to the publication. There are five acquisition types (called “Acquisition Relations”), similar to DAIA Service types:

opds:acquisition
a complete representation of the publication that may be retrieved without payment
opds:acquisition/borrow
a complete representation of the publication that may be retrieved as part of a lending transaction
opds:acquisition/buy
a complete representation of the publication that may be retrieved as part of a purchase
opds:acquisition/sample
a representation of a subset of the publication
opds:acquisition/subscribe
a complete representation of the publication that may be retrieved as part of a subscription

opds:acquisition can be mapped to daia:Service/Openaccess and opds:acquisition/borrow can be mapped to daia:Service/Loan (and vice versa). opds:acquisition/buy is not defined in DAIA but could easily be added, while daia:Service/Presentation and daia:Service/Interloan are not defined in OPDS. At least the former should be added to OPDS to indicate publications that require you to become a member and log in, or to physically walk into an institution, to get a publication (strictly limiting OPDS to purely digital publications accessible via HTTP is stupid if you allow indirect acquisition).
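The mapping described in the previous paragraph is small enough to write down as a lookup table. This Python sketch uses the short prefixed names from the text (`opds:...`, `daia:Service/...`) rather than full URIs, and the function and variable names are hypothetical; it also computes the gaps on both sides.

```python
from typing import Optional

# Short relation and service names as used in the text, not full URIs.
OPDS_TO_DAIA = {
    "opds:acquisition": "daia:Service/Openaccess",
    "opds:acquisition/borrow": "daia:Service/Loan",
}
# The mapping is one-to-one, so it can simply be inverted.
DAIA_TO_OPDS = {daia: opds for opds, daia in OPDS_TO_DAIA.items()}

OPDS_RELATIONS = {
    "opds:acquisition", "opds:acquisition/borrow", "opds:acquisition/buy",
    "opds:acquisition/sample", "opds:acquisition/subscribe",
}
DAIA_SERVICES = {
    "daia:Service/Openaccess", "daia:Service/Loan",
    "daia:Service/Presentation", "daia:Service/Interloan",
}


def to_daia(relation: str) -> Optional[str]:
    """DAIA service for an OPDS acquisition relation, or None if DAIA lacks one."""
    return OPDS_TO_DAIA.get(relation)


def to_opds(service: str) -> Optional[str]:
    """OPDS acquisition relation for a DAIA service, or None if OPDS lacks one."""
    return DAIA_TO_OPDS.get(service)


# Gaps on either side of the mapping.
missing_in_daia = sorted(OPDS_RELATIONS - set(OPDS_TO_DAIA))
missing_in_opds = sorted(DAIA_SERVICES - set(DAIA_TO_OPDS))
```

`missing_in_opds` lists Interloan and Presentation, the two DAIA services that OPDS cannot express; `missing_in_daia` lists buy, sample, and subscribe.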

The remaining two acquisition types do not quite fit in with the others: opds:acquisition/sample and opds:acquisition/subscribe should be orthogonal to the other relations. For instance, a subscription could be paid or free, and you could buy a subset of a publication.
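One way to make “orthogonal” concrete is to model an acquisition as a combination of independent dimensions instead of one flat list. The following Python sketch is a hypothetical model of my own, not part of OPDS: an access mode (free, borrow, buy) plus two flags for “only a sample” and “part of a subscription”.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Acquisition:
    """Hypothetical orthogonal model: how you get access x what you get."""
    mode: Optional[str]          # "free", "borrow", "buy", or None if not stated
    sample: bool = False         # only a subset of the publication
    subscription: bool = False   # retrieved as part of a subscription


# The five OPDS relations expressed in this model. The flat OPDS list cannot say
# whether a sample or a subscription is free or paid, so mode stays None there.
FROM_OPDS = {
    "opds:acquisition": Acquisition("free"),
    "opds:acquisition/borrow": Acquisition("borrow"),
    "opds:acquisition/buy": Acquisition("buy"),
    "opds:acquisition/sample": Acquisition(None, sample=True),
    "opds:acquisition/subscribe": Acquisition(None, subscription=True),
}

# Combinations that the flat list of relations cannot express at all.
paid_subscription = Acquisition("buy", subscription=True)
free_subscription = Acquisition("free", subscription=True)
bought_excerpt = Acquisition("buy", sample=True)
```

With separate dimensions, the paid and free subscription from the example above become distinct values, and a bought excerpt needs no new relation type.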

In addition, Acquisition Links may or must contain some other properties such as opds:price (consisting of a currency code from ISO 4217 and a value).
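Reading such a price could look like this Python sketch. The namespace `http://opds-spec.org/2010/catalog` and the attribute name `currencycode` are assumptions on my part (check them against the 0.9 draft); the sample link is hypothetical. The only validation is that the code has the three-letter shape of an ISO 4217 alphabetic code.

```python
import re
import xml.etree.ElementTree as ET
from decimal import Decimal

# Assumed namespace and attribute name of opds:price; check them against the draft.
OPDS_NS = "http://opds-spec.org/2010/catalog"
PRICE = f"{{{OPDS_NS}}}price"


def prices(link: ET.Element) -> list[tuple[str, Decimal]]:
    """Return the (currency code, amount) pairs of a link's opds:price children."""
    result = []
    for price in link.findall(PRICE):
        code = price.get("currencycode", "")
        # ISO 4217 alphabetic codes consist of exactly three upper-case letters.
        if not re.fullmatch(r"[A-Z]{3}", code):
            raise ValueError(f"not an ISO 4217 currency code: {code!r}")
        # Decimal avoids the rounding surprises of binary floats for money.
        result.append((code, Decimal((price.text or "").strip())))
    return result


# Hypothetical buy link carrying one price.
LINK = """<link xmlns="http://www.w3.org/2005/Atom"
      xmlns:opds="http://opds-spec.org/2010/catalog"
      rel="http://opds-spec.org/acquisition/buy" href="http://example.org/buy">
  <opds:price currencycode="EUR">4.99</opds:price>
</link>"""

offers = prices(ET.fromstring(LINK))
```

Returning a list rather than a single value leaves room for a link that states its price in more than one currency.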

Cover and artwork links

Besides Acquisition Links, the relations opds:cover and opds:thumbnail can be used to relate a Publication with its cover or some other visual representation. The thumbnail should not exceed 120 pixels in height or width, and images must be either GIF, JPEG, or PNG. Thumbnails may also be embedded directly via the “data” URL scheme from RFC 2397.
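Embedding a thumbnail as a “data” URL is just base64 encoding plus a media type prefix. The Python sketch below also checks the image type and the 120-pixel limit. It assumes the caller already knows the image dimensions (the standard library cannot decode images), and it treats the “should not exceed” as a hard limit, which is stricter than the wording above. The function name is hypothetical.

```python
import base64

# Image types allowed for covers and thumbnails.
ALLOWED_TYPES = {"image/gif", "image/jpeg", "image/png"}
MAX_THUMBNAIL_SIZE = 120  # pixels, in height and in width


def thumbnail_data_url(data: bytes, media_type: str, width: int, height: int) -> str:
    """Encode a thumbnail as a base64 "data" URL (RFC 2397) for use in a link href.

    width and height are supplied by the caller; this sketch does not read them
    from the image bytes.
    """
    if media_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported image type: {media_type}")
    if max(width, height) > MAX_THUMBNAIL_SIZE:
        raise ValueError("thumbnail exceeds 120 pixels")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{media_type};base64,{encoded}"
```

For example, the six header bytes `b"GIF89a"` of a GIF image encode to `data:image/gif;base64,R0lGODlh`.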

Final thoughts

OPDS looks very promising and is already being used to good effect in practice. There are some minor issues that can easily be fixed. The random selection of relation types is surely a flaw that can be repaired by allowing arbitrary RDF properties (come on, XML fanboys, you should notice that RDF is good at least at link types!), and the list of acquisition types should be cleaned up and extended, at least to support “presentation” without lending as DAIA does. A typical use case for this is National Licenses, which require you to register to access the publications. For more details I would like to compare OPDS in more depth with models like DAIA, FRBR, SIOC, OAI-ORE, Europeana etc., but not now.

First complete draft of DAIA Ontology

7 January 2010, 19:06 · 2 comments

I just finished the first complete draft of an OWL ontology of the DAIA data model. Until the final URI prefix is settled, the ontology is available in the GBV Wiki in Notation3 syntax, but you can also get RDF/XML. There is also a browsable HTML view created with OWLDoc (I only wonder why it does not include URI prefixes like the same view of the Bibliographic Ontology does).

It turned out that mapping the XML format DAIA/XML to RDF is not trivial, although I had this mapping in mind when I designed DAIA. XML is mostly based on a closed-world tree data model, but RDF is based on an open-world graph model. Last month Mike Bergman wrote a good article about the clash between the Open World Assumption and the Closed World Assumption. I think as long as you only view data in the form of tables, lists, and trees, you will not grasp the concept of the Semantic Web. I don’t know whether I have fully grasped the concept of document availability with DAIA, and the ontology surely needs some further review, but it’s something to start with, so just have a look!
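The difference between the two assumptions shows up as soon as you ask about something that was never stated. This Python sketch is only an illustration with hypothetical items and services in the spirit of DAIA, not part of the DAIA ontology: under the closed-world reading a missing statement means “unavailable”, under the open-world reading it means “unknown”.

```python
from typing import Optional

# Hypothetical availability facts in the spirit of DAIA: (item, service) pairs
# that are explicitly stated as available or as unavailable.
available = {("item1", "loan")}
unavailable = {("item1", "interloan")}


def closed_world(item: str, service: str) -> bool:
    """Closed World Assumption: anything not stated as available is unavailable."""
    return (item, service) in available


def open_world(item: str, service: str) -> Optional[bool]:
    """Open World Assumption: only explicit statements count; otherwise unknown."""
    if (item, service) in available:
        return True
    if (item, service) in unavailable:
        return False
    return None  # nothing is known about this service
```

Asking about `("item1", "presentation")` gives `False` under the closed world but `None` under the open world, which is why “unavailable” has to be stated explicitly once the data leaves its closed XML tree.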

Class or Property? Objectification in RDF and data modeling

14 August 2009, 00:23 · 4 comments

A short Twitter message, in which Ross Singer asked about encoding MARC relator codes in RDF, reminded me of a basic data modeling question that I have been thinking about for a while: When should you model something as a class and when should you model it as a property? Is there a need to distinguish at all? The question is not limited to RDF but is fundamental in data/information modeling. In Entity-relationship modeling (Chen 1976) the question is whether to use an entity or a relationship. Let me give an example with two subject-predicate-object statements in RDF Notation3:

:Work dc:creator :Agent .
:Agent rdf:type :Creator .

The first statement says that a specific agent (:Agent) has created (dc:creator) a specific work (:Work). The second statement says that :Agent is a creator (:Creator). In the first dc:creator is a property while in the second :Creator is a class. You could define that the one implies the other, but you still need two different concepts because classes and properties are disjoint (at least in OWL – I am not sure about plain RDF). In Notation3 the implications may be written as:

@forAll :X1, :X2. { :X1 dc:creator :X2 } => { :X2 a :Creator }.
@forAll :Y1. { :Y1 a :Creator } => { @forSome :Y2. :Y2 dc:creator :Y1 }.

If you define two URIs for the class and the property of the same concept (the concept of a creator and of creating something), then the two things are tightly bound together: Everyone who ever created something is a creator, and to be a creator you must have created something. This logical rule sounds rather crude if you apply it to other concepts, like lying and being a liar, or singing and being a singer. Think about it!
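To see what the two implications do to actual data, here is a naive forward-chaining sketch in Python over plain string triples. It is not an N3 reasoner (it hard-codes exactly these two rules), and the resource names are hypothetical. The existential `:Y2` of the second rule is turned into a fresh blank node.

```python
import itertools

# Triples are (subject, predicate, object) strings using the prefixed names above.
CREATOR = "dc:creator"
TYPE = "rdf:type"
CREATOR_CLASS = ":Creator"

_blank_ids = itertools.count()


def apply_rules(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Forward-chain both implications until no new triple can be derived."""
    derived = set(triples)
    changed = True
    while changed:
        changed = False
        # Rule 1: { X1 dc:creator X2 } => { X2 a :Creator }
        for _, p, o in list(derived):
            if p == CREATOR and (o, TYPE, CREATOR_CLASS) not in derived:
                derived.add((o, TYPE, CREATOR_CLASS))
                changed = True
        # Rule 2: { Y1 a :Creator } => { some Y2 dc:creator Y1 }
        for s, p, o in list(derived):
            if p == TYPE and o == CREATOR_CLASS:
                if not any(p2 == CREATOR and o2 == s for _, p2, o2 in derived):
                    # The existential Y2 becomes a fresh blank node: an unknown work.
                    derived.add((f"_:b{next(_blank_ids)}", CREATOR, s))
                    changed = True
    return derived


facts = {
    (":Work", CREATOR, ":Agent"),   # someone who created something ...
    (":Bob", TYPE, CREATOR_CLASS),  # ... and someone who is only called a creator
}
closure = apply_rules(facts)
```

`:Agent` becomes a `:Creator`, and `:Bob` is given an anonymous work he must have created. Replace “creator” with “liar” and the second rule invents a lie for everyone labeled a liar.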

Besides the lack of fuzzy logic on the Semantic Web, I miss an easy way to do “reification” (there is another concept called “reification” in RDF, but I have never seen it in the wild) or “objectification”: You cannot easily convert between classes and properties. In a closed ontology this is less of a problem, because you can just decide whether to use a class or a property. But the Semantic Web is about sharing and combining data! What if Ontology A has defined a “Singer” class and Ontology B has defined a “sings” property, both referring to the same real-world concept?

Other data modeling languages (more or less) support objectification. Terry Halpin, the creator and evangelist of Object-Role Modeling (ORM), wrote a detailed paper about objectification in ORM without failing to mention the underlying philosophical questions. My (doubtful) philosophical intuition makes me think that properties are more problematic than classes, because the latter can easily be modeled as sets. I think the need for objectification, and for bringing together classes and properties with similar meaning, will increase the more “semantic” data we work with. In many natural languages you can use a verb or adjective as a noun by nominalization. The meaning may change slightly, but it is still very useful for communication. Maybe we should rely more on natural language instead of dreaming of definitions without ambiguity?
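A common workaround in RDF that comes close to objectification is the n-ary relation pattern (described in a W3C Semantic Web Best Practices note): a property statement is turned into a node of its own class, with one property per role. The Python sketch below converts in both directions over plain string triples. It is only an approximation of ORM-style objectification, and all names (`:Singing`, `:singer`, `:song`, the helper functions) are hypothetical.

```python
import itertools

Triple = tuple[str, str, str]
_node_ids = itertools.count()


def objectify(triple: Triple, cls: str, subject_role: str, object_role: str) -> set[Triple]:
    """Turn one property statement into a node that is an instance of cls."""
    s, _, o = triple
    node = f"_:n{next(_node_ids)}"
    return {(node, "rdf:type", cls), (node, subject_role, s), (node, object_role, o)}


def deobjectify(triples: set[Triple], prop: str, cls: str,
                subject_role: str, object_role: str) -> set[Triple]:
    """Collapse every instance of cls back into a plain prop statement."""
    nodes = {s for s, p, o in triples if p == "rdf:type" and o == cls}
    result = set()
    for node in nodes:
        subjects = [o for s, p, o in triples if s == node and p == subject_role]
        objects = [o for s, p, o in triples if s == node and p == object_role]
        result.update((subj, prop, obj) for subj in subjects for obj in objects)
    return result


# Ontology B states a property; objectifying it yields a ":Singing" node.
statement = (":Alice", ":sings", ":Yesterday")
graph = objectify(statement, ":Singing", ":singer", ":song")

# A class of singers in the sense of Ontology A can then be read off the role.
singers = {o for s, p, o in graph if p == ":singer"}
round_trip = deobjectify(graph, ":sings", ":Singing", ":singer", ":song")
```

The round trip gives back the original statement, but the conversion has to be spelled out by hand for every pair of class and property, which is exactly the missing “easy way” complained about above.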
