<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Jakoblog &#187; ATOM</title>
	<atom:link href="http://jakoblog.de/tag/atom/feed/" rel="self" type="application/rss+xml" />
	<link>http://jakoblog.de</link>
	<description>The weblog of Jakob Voß</description>
	<lastBuildDate>Wed, 01 Sep 2010 15:45:51 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>An impression of the OPDS/OpenPub catalog data model</title>
		<link>http://jakoblog.de/2010/05/27/an-impression-of-the-opdsopenpub-catalog-data-model/</link>
		<comments>http://jakoblog.de/2010/05/27/an-impression-of-the-opdsopenpub-catalog-data-model/#comments</comments>
		<pubDate>Wed, 26 May 2010 22:05:58 +0000</pubDate>
		<dc:creator>jakob</dc:creator>
				<category><![CDATA[en]]></category>
		<category><![CDATA[ATOM]]></category>
		<category><![CDATA[DAIA]]></category>
		<category><![CDATA[Data Modeling]]></category>
		<category><![CDATA[OPDS]]></category>
		<category><![CDATA[openpub]]></category>

		<guid isPermaLink="false">http://jakoblog.de/?p=830</guid>
		<description><![CDATA[A few days ago Ed Summers pointed me to the specification of the Open Publication Distribution System (OPDS) which was just released as version 0.9. OpenPub (an alias for OPDS) is part of the Internet Archive&#8217;s BookServer project to build an architecture for  vending and lending digital books over the Internet. I wonder why [...]]]></description>
			<content:encoded><![CDATA[<p>A few days ago Ed Summers pointed me to the specification of the <a href="http://opds-spec.org/">Open Publication Distribution System (OPDS)</a> which was just released as version 0.9. <a href="http://code.google.com/p/openpub/">OpenPub</a> (an alias for OPDS) is part of the Internet Archive&#8217;s <a href="http://www.archive.org/bookserver">BookServer</a> project to build an architecture for vending and lending digital books over the Internet. I wonder why I have not heard more of BookServer and OpenPub at recent library conferences, on discussion lists, and in journals, but maybe current libraries prefer to stay in the physical world and become museums and archives. Anyway, I had a look at OpenPub, so here are my public notes on my first impressions &#8211; and my answer to the <a href="http://groups.google.com/group/openpub/browse_thread/thread/e51af9510f150d55">call for comments</a>. <a href="#comments">Please comment</a> if you have corrections or additions (or <a href="http://code.google.com/p/openpub/issues/entry">create an issue</a> in the tracker)!</p>
<p>OPDS is a syndication format for electronic publications based on Atom (<a href="http://tools.ietf.org/html/rfc4287">RFC 4287</a>). Therefore it is fully based on HTTP and the Web (this place that current libraries are still about to discover). Conceptually OPDS is somewhat related to <a href="http://www.openarchives.org/ore/">OAI(-ORE)</a> and <a href="http://daia.sourceforge.net/">DAIA</a>, but it is purely based on XML, which makes it difficult to compare with RDF-based approaches. I tried to reverse-engineer the conceptual data model to better separate model and serialization, as I did with DAIA. The goal of OPDS catalogs is &#8220;to make Publications both discoverable and straightforward to acquire on a range of devices and platforms&#8221;.</p>
<p>OPDS uses a mix of <a href="http://dublincore.org/documents/dcmi-terms/">DCMI Metadata Terms</a> (DC) elements and ATOM elements, enriched with some new OPDS elements. Furthermore it interprets some DC and ATOM elements in a special way (this is common in many data formats although frequently forgotten).</p>
<p><b>Core concepts</b></p>
<p>The core concepts of OPDS are <b>Catalogs</b>, which are provided as ATOM Feeds (like <a href="http://www.jangle.org/">Jangle</a>, which should fit nicely for library resources), Catalog <b>Entries</b> that each refer to one publication, and <b>Acquisition Links</b>. There are two disjoint types of Catalogs: <i>Navigation Feeds</i> provide a browseable hierarchy and <i>Acquisition Feeds</i> contain a list of Publication Entries. I will skip the <a href="http://opds-spec.org/specs/opds-catalog-0-9-20100525/#Catalog_Relations">details on Navigation Feeds</a> and search facilities (possible via <a href="http://www.opensearch.org/">OpenSearch</a>) and focus on Elements and Acquisition.</p>
<p><b>Catalog Elements</b></p>
<p>The specification distinguishes between Partial and Complete Catalog Entries but this is not relevant on the conceptual level. There we have two concepts that are not clearly separated in the XML serialization: the <b>Catalog Record</b> and the <b>Publication</b> which a Catalog Record describes are mixed in one Catalog Element. The properties of a Catalog Record are:</p>
<dl>
<dt><tt>atom:id</tt></dt>
<dd>identifier of the catalog entry (MANDATORY)</dd>
<dt><tt>atom:updated</tt></dt>
<dd>modification timestamp of the catalog entry (MANDATORY)</dd>
<dt><tt>atom:published</tt></dt>
<dd>timestamp of when the catalog entry was first accessible</dd>
</dl>
<p>The properties of a Publication are:</p>
<dl>
<dt><tt>dc:identifier</tt></dt>
<dd>identifier of the publication</dd>
<dt><tt>atom:title</tt></dt>
<dd>title of the publication (MANDATORY)</dd>
<dt><tt>atom:author</tt></dt>
<dd>creator of the publication (possibly with sub-properties)</dd>
<dt><tt>atom:contributor</tt></dt>
<dd>additional contributors to the publication (ditto)</dd>
<dt><tt>atom:category</tt></dt>
<dd>publication&#8217;s category, keywords, classification codes etc. (with sub-properties scheme, term, and label)</dd>
<dt><tt>dc:issued</tt></dt>
<dd>first publication date of the publication</dd>
<dt><tt>atom:rights</tt></dt>
<dd>rights held in and over the publication</dd>
<dt><tt>atom:summary</tt> and <tt>atom:content</tt></dt>
<dd>description of the publication (as plain text or some other format for atom:content)</dd>
<dt><tt>dc:language</tt></dt>
<dd>language(s) of the publication (<a href="http://code.google.com/p/openpub/issues/detail?id=35">any format?</a>)</dd>
<dt><tt>dc:extent</tt></dt>
<dd>size or duration of the publication (<a href="http://code.google.com/p/openpub/issues/detail?id=34">?</a>)</dd>
<dt><tt>dc:publisher</tt></dt>
<dd>publisher of the publication</dd>
</dl>
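<p>To make the mix of both concepts more tangible, here is a rough sketch of what a single Catalog Entry might look like. It is only an illustration of the properties listed above: all identifiers, names and values are made up, the DC namespace URI is my assumption of what the spec uses, and links are left out.</p>
<pre><code>&lt;entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:dc="http://purl.org/dc/terms/"&gt;

  &lt;!-- properties of the Catalog Record (hypothetical values) --&gt;
  &lt;id&gt;http://example.org/catalog/entries/1&lt;/id&gt;
  &lt;updated&gt;2010-05-26T22:05:58Z&lt;/updated&gt;
  &lt;published&gt;2010-05-01T00:00:00Z&lt;/published&gt;

  &lt;!-- properties of the Publication (hypothetical values) --&gt;
  &lt;dc:identifier&gt;http://example.org/publications/1&lt;/dc:identifier&gt;
  &lt;title&gt;An Example Book&lt;/title&gt;
  &lt;author&gt;&lt;name&gt;Jane Doe&lt;/name&gt;&lt;/author&gt;
  &lt;contributor&gt;&lt;name&gt;John Roe&lt;/name&gt;&lt;/contributor&gt;
  &lt;category scheme="http://example.org/subjects"
            term="data-modeling" label="Data Modeling"/&gt;
  &lt;dc:issued&gt;2009&lt;/dc:issued&gt;
  &lt;rights&gt;Copyright 2009 Jane Doe&lt;/rights&gt;
  &lt;summary&gt;A short plain text description of the book.&lt;/summary&gt;
  &lt;dc:language&gt;en&lt;/dc:language&gt;
  &lt;dc:extent&gt;250 pages&lt;/dc:extent&gt;
  &lt;dc:publisher&gt;Example Press&lt;/dc:publisher&gt;
&lt;/entry&gt;</code></pre>
<p>Note that nothing in the XML itself tells you that <tt>id</tt> identifies the record while <tt>dc:identifier</tt> identifies the publication; that distinction only exists in the prose of the specification.</p>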
<p>Moreover each publication may link to related resources. Unfortunately you cannot just use arbitrary RDF properties but only the following relations (from <a href="http://tools.ietf.org/html/draft-nottingham-http-link-header-10">this draft</a>):</p>
<dl>
<dt><tt>alternate</tt></dt>
<dd>alternative description of the publication</dd>
<dt><tt>copyright</tt></dt>
<dd>copyright statement that applies to the catalog entry</dd>
<dt><tt>latest-version</tt></dt>
<dd>more recent version of the publication</dd>
<dt><tt>license</tt></dt>
<dd>license associated with the catalog entry</dd>
<dt><tt>replies</tt></dt>
<dd>comment on or discussion of the catalog entry</dd>
</dl>
<p>I consider these relation types one of the weakest points of OPDS. The domain and range of the links are not clear and there are much better vocabularies for links between publications, for instance in <a href="http://vocab.org/frbr/core.html">FRBR</a>, the <a href="http://bibliontology.com/">Bibliographic Ontology</a>, the <a href="http://www.crossref.org/CrossTech/2009/03/citation_typing_ontology.html">Citation Typing Ontology</a>, <a href="http://www.mementoweb.org/">Memento</a>, and <a href="http://sioc-project.org/">SIOC</a> (which also overlaps with OPDS in other places).</p>
<p>In addition each publication must contain at least one <tt>atom:link</tt> element which is used to encode an <i>Acquisition Link</i>.</p>
<p><b>Acquisition Links</b></p>
<p>OPDS distinguishes two kinds of Acquisition: &#8220;Direct Acquisition&#8221; and &#8220;Indirect Acquisition&#8221;. Direct Acquisition links must lead directly to the publication (in some format) without any login, meta or catalog page in front of it (!) while Indirect Acquisition links lead to such a portal page that then links to the publication. In addition there are five Acquisition types (called &#8220;Acquisition Relations&#8221;) similar to <a href="http://purl.org/NET/DAIA#2.4._Available_element">DAIA Service types</a>:</p>
<dl>
<dt><tt>opds:acquisition</tt></dt>
<dd>a complete representation of the publication that may be retrieved without payment</dd>
<dt><tt>opds:acquisition/borrow</tt></dt>
<dd>a complete representation of the publication that may be retrieved as part of a lending transaction</dd>
<dt><tt>opds:acquisition/buy</tt></dt>
<dd>a complete representation of the publication that may be retrieved as part of a purchase</dd>
<dt><tt>opds:acquisition/sample</tt></dt>
<dd>a representation of a subset of the publication</dd>
<dt><tt>opds:acquisition/subscribe</tt></dt>
<dd>a complete representation of the publication that may be retrieved as part of a subscription</dd>
</dl>
<p><tt>opds:acquisition</tt> can be mapped to <tt>daia:Service/Openaccess</tt> and <tt>opds:acquisition/borrow</tt> can be mapped to <tt>daia:Service/Loan</tt> (and vice versa). <tt>opds:acquisition/buy</tt> is not defined in DAIA but could easily be added, while <tt>daia:Service/Presentation</tt> and <tt>daia:Service/Interloan</tt> are not defined in OPDS. At least the first should be added to OPDS to indicate publications that require you to become a member and log in or to physically walk into an institution to get a publication (strictly limiting OPDS to pure-digital publications accessible via HTTP is stupid if you allow indirect acquisition).</p>
<p>The remaining two acquisition types somehow do not fit in with the others: <tt>opds:acquisition/sample</tt> and <tt>opds:acquisition/subscribe</tt> should be orthogonal to the other relations. For instance you could subscribe to a paid or to a free subscription and you could buy a subset of a publication.</p>
<p>In addition Acquisition links may or must contain some other properties such as <tt>opds:price</tt> (consisting of a currency code from ISO 4217 and a value).</p>
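<p>As a rough sketch, acquisition links inside a Catalog Entry might look like the following. The URLs and values are made up. The full relation URIs, the <tt>opds</tt> namespace URI and the <tt>currencycode</tt> attribute are written from memory and should be treated as assumptions to be checked against the specification; the short names used above are only my abbreviations.</p>
<pre><code>&lt;entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:opds="http://opds-spec.org/2010/catalog"&gt;
  &lt;id&gt;http://example.org/catalog/entries/1&lt;/id&gt;
  &lt;title&gt;An Example Book&lt;/title&gt;
  &lt;updated&gt;2010-05-26T22:05:58Z&lt;/updated&gt;

  &lt;!-- Direct Acquisition: the link leads straight to the file --&gt;
  &lt;link rel="http://opds-spec.org/acquisition/sample"
        type="application/epub+zip"
        href="http://example.org/books/1-sample.epub"/&gt;

  &lt;!-- Indirect Acquisition: the link leads to a shop page, with a price --&gt;
  &lt;link rel="http://opds-spec.org/acquisition/buy"
        type="text/html"
        href="http://example.org/shop/1"&gt;
    &lt;opds:price currencycode="EUR"&gt;9.90&lt;/opds:price&gt;
  &lt;/link&gt;
&lt;/entry&gt;</code></pre>
<p>The sketch also shows the orthogonality problem: a sample that has to be bought, or a subscription that is free, cannot be expressed with a single relation type.</p>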
<p><b>Cover and artwork links</b></p>
<p>Besides Acquisition links the relations <tt>opds:cover</tt> and <tt>opds:thumbnail</tt> can be used to relate a Publication to its cover or some other visual representation. The thumbnail should not exceed 120 pixels in height or width and images must be either GIF, JPEG, or PNG. Thumbnails may also be directly embedded via the &#8220;data&#8221; URL scheme from RFC 2397.</p>
<p><b>Final thoughts</b></p>
<p>OPDS looks very promising and it is already used to good effect in practice. There are some minor issues that can easily be fixed. The random selection of relation types is surely a flaw that can be repaired by allowing arbitrary RDF properties (come on XML fanboys, you should notice that RDF is good at least at link types!) and the list of acquisition types should be cleaned up and extended, at least to support &#8220;presentation&#8221; without lending like DAIA does. A typical use case for this is National Licenses that require you to register to access the publications. For more details I would like to compare OPDS in more depth with models like DAIA, FRBR, SIOC, OAI-ORE, Europeana etc. &#8211; but not now.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakoblog.de/2010/05/27/an-impression-of-the-opdsopenpub-catalog-data-model/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Working group on digital library APIs and possible outcomes</title>
		<link>http://jakoblog.de/2008/04/13/working-group-on-digital-library-apis-and-possible-outcomes/</link>
		<comments>http://jakoblog.de/2008/04/13/working-group-on-digital-library-apis-and-possible-outcomes/#comments</comments>
		<pubDate>Sun, 13 Apr 2008 12:48:50 +0000</pubDate>
		<dc:creator>jakob</dc:creator>
				<category><![CDATA[en]]></category>
		<category><![CDATA[API]]></category>
		<category><![CDATA[ATOM]]></category>
		<category><![CDATA[digital library]]></category>
		<category><![CDATA[OAI]]></category>
		<category><![CDATA[Seealso]]></category>
		<category><![CDATA[SOA]]></category>
		<category><![CDATA[Standards]]></category>

		<guid isPermaLink="false">http://jakoblog.de/2008/04/13/working-group-on-digital-library-apis-and-possible-outcomes/</guid>
		<description><![CDATA[Last year the Digital Library Federation (DLF) formed the &#8220;ILS Discovery Interface Task Force&#8221;, a working group on APIs for digital libraries. See their agenda and the current draft recommendation (February 15th) for details [via Panlibus]. I&#8217;d like to briefly comment on the essential functions they agreed on at a meeting with major library system [...]]]></description>
			<content:encoded><![CDATA[<p>Last year the <a href="http://www.diglib.org/">Digital Library Federation</a> (DLF) formed the &#8220;<a href="http://blogs.lib.berkeley.edu/shimenawa.php/2008/04/04/ils_basic_discovery">ILS Discovery Interface Task Force</a>&#8221;, a working group on <a href="http://jakoblog.de/2007/11/30/relevant-apis-for-digital-libraries/">APIs for digital libraries</a>. See <a href="https://project.library.upenn.edu/confluence/display/ilsapi/Charge+and+Agenda">their agenda</a> and the <a href="https://project.library.upenn.edu/confluence/display/ilsapi/Draft+Recommendation">current draft recommendation</a> (February 15th) for details [<a href="http://blogs.talis.com/panlibus/archives/2008/04/ils-vendors-support-berkeley-accord-on-apis.php">via Panlibus</a>]. I&#8217;d like to briefly comment on the essential functions they agreed on at a meeting with major library system (ILS) vendors. <a href="http://dltj.org/article/dlf-ils-statement/">Peter Murray summarized</a> the functions as &#8220;automated interfaces for offloading records from the ILS, a mechanism for determining the availability of an item, and a scheme for creating persistent links to records.&#8221;</p>
<p>On the one hand I welcome it when vendors try to agree on (open) standards and service-oriented architecture. On the other hand the working group is yet another top-down effort to discuss things that just have to be implemented based on existing Internet standards.</p>
<p><b>1. Harvesting</b>: In the library world this is mainly done via <a href="http://www.openarchives.org/OAI/openarchivesprotocol.html">OAI-PMH</a>. I&#8217;d also consider <a href="http://en.wikipedia.org/wiki/RSS">RSS</a> and  <a href="http://atompub.org/rfc4287.html">Atom</a>. To fetch single records, there is <a href="http://unapi.info/">unAPI</a> &#8211; which the DLF group does not mention. There is no need for <em>any</em> other harvesting API &#8211; missing features (if any) should be integrated into extensions and/or next versions of OAI-PMH and ATOM instead of inventing something new. P.S: <a href="http://www.jasonkolb.com/weblog/2009/09/why-google-wave-is-the-coolest-thing-since-sliced-bread.html">Google Wave</a> shows what to expect in the next years.</p>
<p><b>2. Search</b>: There is still good old overblown Z39.50. The near future is (slightly overblown) <a href="http://www.loc.gov/standards/sru/">SRU/SRW</a> and (simple) <a href="http://www.opensearch.org/">OpenSearch</a>. What is needed is not discussion but open implementations of SRU (I am still waiting for a full client implementation in Perl). I suppose that next generation search interfaces will be based on SPARQL or other RDF-stuff.</p>
<p><b>3. Availability</b>: The <a href="http://blogs.lib.berkeley.edu/shimenawa.php/2008/04/04/ils_basic_discovery">announcement</a> says: &#8220;This functionality will be implemented through a simple REST interface to be specified by the ILS-DI task group&#8221;. Yes, there is definitely a need (in December I <a href="http://jakoblog.de/2007/12/23/heidelberger-katalog-auf-dem-weg-zu-serviceorientierter-architektur/">wrote about such an API</a> in German). However the main point is not the API but to define what &#8220;availability&#8221; means. Please focus on this. P.S: <a href="http://purl.org/NET/DAIA">DAIA</a> is now available.</p>
<p><b>4. Linking:</b> For &#8220;Linking in a stable manner to any item in an OPAC in a way that allows services to be invoked on it&#8221; (announcement) there is no need to create new APIs. Add and propagate clean URIs for your items and point to your APIs via autodiscovery (HTML link element). That&#8217;s all. Really. To query and distribute general links for a given identifier, I created <a href="http://www.gbv.de/wikis/cls/SeeAlso_Simple_Specification">the SeeAlso API</a> which is used more and more in our libraries.</p>
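<p>To illustrate what autodiscovery via the HTML link element can look like, here is a minimal sketch of the head of a record page. The page, its URLs and titles are hypothetical; the <tt>rel</tt> and <tt>type</tt> values are the ones commonly used for feed, OpenSearch and unAPI autodiscovery.</p>
<pre><code>&lt;head&gt;
  &lt;title&gt;Example OPAC: record 12345&lt;/title&gt;

  &lt;!-- harvesting: feed of new or changed records --&gt;
  &lt;link rel="alternate" type="application/atom+xml"
        title="New records" href="http://example.org/opac/feed.atom" /&gt;

  &lt;!-- search: OpenSearch description document --&gt;
  &lt;link rel="search" type="application/opensearchdescription+xml"
        title="Example OPAC" href="http://example.org/opac/opensearch.xml" /&gt;

  &lt;!-- fetching single records: unAPI server --&gt;
  &lt;link rel="unapi-server" type="application/xml"
        title="unAPI" href="http://example.org/opac/unapi" /&gt;
&lt;/head&gt;</code></pre>
<p>A client that knows only the clean URI of the item can fetch the page and find the available services from there, without any additional API.</p>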
<p>Furthermore the draft contains a section on &#8220;<a href="https://project.library.upenn.edu/confluence/display/ilsapi/Patron+Functionality">Patron functionality</a>&#8221; which is going to be based on <a href="http://www.niso.org/committees/committee_at.html">NCIP</a> and <a href="http://multimedia.mmm.com/mws/mediawebserver.dyn?6666660Zjcf6lVs6EVs66S0LeCOrrrrQ-">SIP2</a>. Both are dead ends from my point of view. It would be better to look at projects <em>outside the library world</em> and try to define schemas/ontologies for patrons and patron data (hint: patrons are also called &#8220;customer&#8221; and &#8220;user&#8221;). Again: it is not the API that is underdefined &#8211; it&#8217;s the data which we need to agree on.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakoblog.de/2008/04/13/working-group-on-digital-library-apis-and-possible-outcomes/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>First draft of OAI-ORE</title>
		<link>http://jakoblog.de/2007/12/30/first-draft-of-oai-ore/</link>
		<comments>http://jakoblog.de/2007/12/30/first-draft-of-oai-ore/#comments</comments>
		<pubDate>Sun, 30 Dec 2007 17:06:09 +0000</pubDate>
		<dc:creator>jakob</dc:creator>
				<category><![CDATA[en]]></category>
		<category><![CDATA[ATOM]]></category>
		<category><![CDATA[OAI]]></category>
		<category><![CDATA[OAI-ORE]]></category>
		<category><![CDATA[Semantic Web]]></category>

		<guid isPermaLink="false">http://jakoblog.de/2007/12/30/first-draft-of-oai-ore/</guid>
		<description><![CDATA[&#8220;Web 3.0&#8221; (or &#8220;Semantic Web&#8221; &#8211; use the buzzword of your choice) is slowly on the rise. Two weeks ago the first public draft of OAI-ORE was published and Mike Giarlo published an OAI-ORE-Plugin for WordPress &#8211; I have not actually tried it, but as far as I understand one could add RFC 5005 to [...]]]></description>
			<content:encoded><![CDATA[<p>&#8220;Web 3.0&#8221; (or &#8220;Semantic Web&#8221; &#8211; use the buzzword of your choice) is slowly on the rise. Two weeks ago the <a href="http://www.openarchives.org/ore/0.1/toc">first public draft of OAI-ORE</a> was published and Mike Giarlo published an <a href="http://lackoftalent.org/michael/blog/ore-wordpress-plug-in/">OAI-ORE-Plugin</a> for WordPress &#8211; I have not actually tried it, but as far as I understand one could <a href="http://jakoblog.de/2007/10/19/archiving-weblogs-with-atom-and-rfc-5005-an-alternative-to-oai-pmh/">add RFC 5005</a> to OAI-ORE to support large resource sets. Or is OAI-PMH enough? Well, in the end it depends on the availability of software libraries and clients, and on the ease of connecting them with other services. To my taste there are still too many <a href="http://www.ice-nine.net/~mgsimpson/asqu/archives/64">generalized data models</a>, whereas we need <em>concrete</em> implementations &#8211; it was not RDF and OWL but Microformats that got the Web of data started (yes, we&#8217;re in it: the next hype after &#8220;Web 2.0&#8221;). For 2008 I wish for less abstract meta-meta-meta-stuff and more small, usable applications and services that can be combined.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakoblog.de/2007/12/30/first-draft-of-oai-ore/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Relevant APIs for (digital) libraries</title>
		<link>http://jakoblog.de/2007/11/30/relevant-apis-for-digital-libraries/</link>
		<comments>http://jakoblog.de/2007/11/30/relevant-apis-for-digital-libraries/#comments</comments>
		<pubDate>Fri, 30 Nov 2007 13:50:11 +0000</pubDate>
		<dc:creator>jakob</dc:creator>
				<category><![CDATA[en]]></category>
		<category><![CDATA[API]]></category>
		<category><![CDATA[ATOM]]></category>
		<category><![CDATA[COinS]]></category>
		<category><![CDATA[Identity Management]]></category>
		<category><![CDATA[Microformats]]></category>
		<category><![CDATA[OAI]]></category>
		<category><![CDATA[OpenId]]></category>
		<category><![CDATA[Shibboleth]]></category>
		<category><![CDATA[Standards]]></category>

		<guid isPermaLink="false">http://jakoblog.de/2007/11/30/relevant-apis-for-digital-libraries/</guid>
		<description><![CDATA[My current impression is that the OCLC/WorldCat Service Grid is still far too abstract &#8211; instead of creating a framework, we (libraries and library associations) should agree upon some open protocols and (metadata) formats. To start with, here is a list of relevant, existing open standard APIs from my point of view:
Search: SRU/SRW (including CQL), OpenSearch, Z39.50
Harvest/Syndicate: [...]]]></description>
			<content:encoded><![CDATA[<p>My current impression is that the OCLC/WorldCat Service Grid is still far too abstract &#8211; instead of creating a framework, we (libraries and library associations) should agree upon some open protocols and (metadata) formats. To start with, here is a list of relevant, existing open standard APIs from my point of view:</p>
<p><b>Search:</b> <a href="http://www.loc.gov/standards/sru/">SRU/SRW</a> (including <a href="http://www.loc.gov/standards/sru/specs/cql.html">CQL</a>), <a href="http://www.opensearch.org">OpenSearch</a>, <a href="http://www.loc.gov/z3950/agency/">Z39.50</a></p>
<p><b>Harvest/Syndicate:</b> <a href="http://www.openarchives.org/OAI/openarchivesprotocol.html">OAI-PMH</a>, <a href="http://en.wikipedia.org/wiki/RSS">RSS</a>, <a href="http://atompub.org/rfc4287.html">Atom Syndication</a> (also with <a href="http://www.intertwingly.net/wiki/pie/#head-656bcfe284e2da39c77d4fdab55b16ad3c654719">ATOM Extensions</a>)</p>
<p><b>Copy/Provide:</b> <a href="http://unapi.info">unAPI</a>, <a href="http://ocoins.info">COinS</a>, <a href="http://microformats.org">Microformats</a> (<small>not a real API but a way to provide data</small>)</p>
<p><b>Upload/Edit:</b> <a href="http://www.loc.gov/standards/sru/record-update/">SRU Update</a>, <a href="http://www.ibm.com/developerworks/library/x-atompp1/">Atom Publishing Protocol</a></p>
<p><b>Identity Management:</b> <a href="http://shibboleth.internet2.edu/">Shibboleth</a> (and other <a href="http://en.wikipedia.org/wiki/SAML">SAML</a>-based protocols), <a href="http://www.openid.net/">OpenID</a> (see also <a href="http://osis.netmesh.org/">OSIS</a>)</p>
<p>For more complex applications, additional (REST)-APIs and common metadata standards need to be found (or defined) &#8211; but only if the application is not just another kind of search, harvest/syndicate, copy/provide, upload/edit, or Identity Management.</p>
<p><strong>P.S:</strong> I forgot <a href="http://ncip.envisionware.com/">NCIP</a>, a &#8220;standard for the exchange of circulation data&#8221;. Frankly I don&#8217;t fully understand the meaning and importance of &#8220;circulation data&#8221; and the standard looks more complex than needed. More on APIs for libraries can be found <a href="http://worldcat.org/devnet/">in the WorldCat Developer Network</a>, in <a href="http://www.jangle.org/">the Jangle project</a> and in a <a href="http://jakoblog.de/2008/04/13/working-group-on-digital-library-apis-and-possible-outcomes/">DLF Working group on digital library APIs</a>. For staying in the limited world of libraries, this may suffice, but on the web simplicity and availability of implementations matter &#8211; that&#8217;s why I am working on the <a href="http://www.gbv.de/wikis/cls/SeeAlso_Simple_Specification">SeeAlso linkserver protocol</a> and now on a simple API to query availability information (more in August/September 2008).</p>
<p><strong>P.P.S:</strong> A more detailed <a href="http://techessence.info/apis">list of concrete library-related APIs</a> was published by Roy Tennant based on <a href="http://tinyurl.com/59hop2">a list by Owen Stephens</a>.</p>
<p><strong>P.P.P.S:</strong> And <a href="http://stephenslighthouse.sirsidynix.com/archives/2009/09/apis_and_librar.html">another list by Stephen Abram</a> (SirsiDynix) from September 1st, 2009.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakoblog.de/2007/11/30/relevant-apis-for-digital-libraries/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Archiving Weblogs with ATOM and RFC 5005: An alternative to OAI-PMH</title>
		<link>http://jakoblog.de/2007/10/19/archiving-weblogs-with-atom-and-rfc-5005-an-alternative-to-oai-pmh/</link>
		<comments>http://jakoblog.de/2007/10/19/archiving-weblogs-with-atom-and-rfc-5005-an-alternative-to-oai-pmh/#comments</comments>
		<pubDate>Fri, 19 Oct 2007 09:34:45 +0000</pubDate>
		<dc:creator>jakob</dc:creator>
				<category><![CDATA[en]]></category>
		<category><![CDATA[Archivierung]]></category>
		<category><![CDATA[ATOM]]></category>
		<category><![CDATA[BlogML]]></category>
		<category><![CDATA[Feed]]></category>
		<category><![CDATA[OAI]]></category>

		<guid isPermaLink="false">http://jakoblog.de/2007/10/19/archiving-weblogs-with-atom-and-rfc-5005-an-alternative-to-oai-pmh/</guid>
		<description><![CDATA[Following up on my recent post (in German) I had a conversation with my colleague about harvesting and archiving blogs and ATOM vs OAI-PMH. In my opinion, with the recent RFC 5005 about Feed Paging and Archiving and its proposed extension of Archived Feeds, ATOM can be an alternative to OAI-PMH. Instead of arguing which [...]]]></description>
			<content:encoded><![CDATA[<p>Following up on my <a href="http://jakoblog.de/2007/10/19/weblogs-sammeln-erschliessen-verfuegbar-machen-und-archivieren/">recent post</a> (in German) I had a conversation with my colleague about harvesting and archiving blogs and <a href="http://en.wikipedia.org/wiki/Atom_(standard)">ATOM</a> vs <a href="http://www.openarchives.org/">OAI-PMH</a>. In my opinion, with the recent <a href="http://rfc.net/rfc5005.html">RFC 5005</a> about <i>Feed Paging and Archiving</i> and its proposed extension of <a href="http://rfc.net/rfc5005.html#s4">Archived Feeds</a>, ATOM can be an alternative to OAI-PMH. Instead of arguing which is better, digital libraries should support both for harvesting and providing archived publications such as preprints and weblog entries (scientific communication and publication already takes place in both).</p>
<p>Instead of having every project implement both protocols, you could create a wrapper from ATOM with archived feeds to OAI-PMH and vice versa. The mapping from OAI-PMH to ATOM is probably the easier part: you partition the repository into chunks <a href="http://rfc.net/rfc5005.html#s4">as defined in RFC 5005</a> with the <tt>from</tt> and <tt>until</tt> arguments of OAI-PMH. The mapping from ATOM to OAI-PMH is more complicated because you cannot select by timestamp in ATOM. If you only specify a <tt>from</tt> argument, the corresponding ATOM feed could be harvested going backwards in time, but if there is an <tt>until</tt> argument you must harvest the whole archive just to get the first entries and throw away the rest. Luckily the most frequent use case is to get the newest entries only. Anyway: both protocols have their pros and cons and a two-way wrapper could help both. Of course it should be implemented as open source so anyone can use it (by the way: there seems to be no OAI crawler in Perl yet. Sure, there is <a href="http://search.cpan.org/dist/OAI-Harvester/">OAI-Harvester</a>, but for real-world applications you have to deal with unavailable servers, corrupt feeds, duplicated or deleted entries, and a way to save the harvested records, so a whole layer above the harvester is missing).</p>
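<p>As a rough sketch of the easier direction, one chunk of an OAI-PMH repository could be exposed as an RFC 5005 archive document like this. The repository, the monthly chunking and all URLs are made up for illustration; the <tt>fh</tt> namespace and the <tt>prev-archive</tt>, <tt>next-archive</tt> and <tt>current</tt> link relations are the ones defined in RFC 5005.</p>
<pre><code>&lt;feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:fh="http://purl.org/syndication/history/1.0"&gt;
  &lt;!-- hypothetical chunk, filled by the wrapper from an OAI-PMH request like
       verb=ListRecords&amp;metadataPrefix=oai_dc&amp;from=2007-09-01&amp;until=2007-09-30 --&gt;
  &lt;title&gt;Example repository, September 2007&lt;/title&gt;
  &lt;id&gt;http://example.org/archive/2007-09.atom&lt;/id&gt;
  &lt;updated&gt;2007-09-30T23:59:59Z&lt;/updated&gt;
  &lt;author&gt;&lt;name&gt;Example repository&lt;/name&gt;&lt;/author&gt;

  &lt;!-- marks this document as an archive that is not expected to change --&gt;
  &lt;fh:archive/&gt;

  &lt;link rel="self" href="http://example.org/archive/2007-09.atom"/&gt;
  &lt;link rel="current" href="http://example.org/feed.atom"/&gt;
  &lt;link rel="prev-archive" href="http://example.org/archive/2007-08.atom"/&gt;
  &lt;link rel="next-archive" href="http://example.org/archive/2007-10.atom"/&gt;

  &lt;entry&gt;
    &lt;id&gt;oai:example.org:record-42&lt;/id&gt;
    &lt;title&gt;Title of a harvested record&lt;/title&gt;
    &lt;updated&gt;2007-09-17T12:00:00Z&lt;/updated&gt;
  &lt;/entry&gt;
&lt;/feed&gt;</code></pre>
<p>A harvester starts at the current feed and follows <tt>prev-archive</tt> links backwards in time, which is exactly why a <tt>from</tt> argument is cheap to emulate and an <tt>until</tt> argument is not.</p>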
<p><b>P.S.:</b> At <a href="http://www.code4lib.org">code4lib</a> Ed Summers pointed me to Stuart Weibel who asked <a href="http://weibel-lines.typepad.com/weibelines/2007/08/blog-curation-e.html">the same question about blog archiving</a>, and to a discussion in <a href="http://blog.jonudell.net/2007/02/16/a-conversation-with-dan-chudnov-about-openurl-context-sensitive-linking-and-digital-archiving/">Jon Udell&#8217;s blog</a> that includes blog archiving (he also mentions <a href="http://blogml.org">BlogML</a> as a possible part of a solution &#8211; unfortunately BlogML looks very dirty to me, <a href="http://nayyeri.net/archive/2006/09/06/BlogML-2.0-Released.aspx">the spec is here</a>). And Daniel Chudnov <a href="http://onebiglibrary.net/story/simple-old-design-for-widespread-blog-mirroring">drafted a blog mirroring architecture</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakoblog.de/2007/10/19/archiving-weblogs-with-atom-and-rfc-5005-an-alternative-to-oai-pmh/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Collecting, indexing, providing, and archiving weblogs</title>
		<link>http://jakoblog.de/2007/10/19/weblogs-sammeln-erschliessen-verfuegbar-machen-und-archivieren/</link>
		<comments>http://jakoblog.de/2007/10/19/weblogs-sammeln-erschliessen-verfuegbar-machen-und-archivieren/#comments</comments>
		<pubDate>Fri, 19 Oct 2007 01:03:51 +0000</pubDate>
		<dc:creator>jakob</dc:creator>
				<category><![CDATA[de]]></category>
		<category><![CDATA[Archivierung]]></category>
		<category><![CDATA[ATOM]]></category>
		<category><![CDATA[Bibliothek]]></category>
		<category><![CDATA[Feed]]></category>
		<category><![CDATA[OAI]]></category>
		<category><![CDATA[Web 2.0]]></category>

		<guid isPermaLink="false">http://jakoblog.de/2007/10/19/weblogs-sammeln-erschliessen-verfuegbar-machen-und-archivieren/</guid>
		<description><![CDATA[I have been annoyed for quite some time that practically no libraries collect and archive weblogs, although this type of medium is already partly taking over the function of scholarly journals. By now I can see a growing interest in blogs among colleagues (the next workshop was fully booked after a short time), but the majority has not really [...]]]></description>
			<content:encoded><![CDATA[<p>I have been annoyed for quite some time that practically no libraries collect and archive weblogs, although this type of medium is already partly taking over the function of scholarly journals. By now I can see a growing interest in blogs among colleagues (the <a href="http://www.gbv.de/vgm/info/termine/2007/2007_3102">next workshop</a> was fully booked after a short time), but it has not really sunk in with the majority that an evolution comparable to the introduction of printing or the invention of journals is under way here. Otherwise far more libraries would surely have started to collect, index, provide, and archive weblogs.</p>
<p>Instead of first discussing which special MAB fields the data should go into and what type of medium weblogs count as, someone would only need to extend one of the existing open source <a href="http://de.wikipedia.org/wiki/Feedreader">feed readers</a> so that it runs at large scale on one or more servers and at least collects those feeds that some librarian has at some point judged worth collecting. Anything that is well-formed XML and has a minimum set of mandatory elements (author [string], title [string], date [ISO 8601], content [string]) should at least be archivable in such a way that the essential part can be reconstructed &#8211; special features such as HTML content, categories, and comments can be added later, once the infrastructure (a harvester for collecting, storage for archiving, an index for cataloguing, and a reading facility for providing access) is in place.</p>
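<p>As a minimal sketch, such a record could be as small as the following ATOM entry. The values and the URL are made up; note that ATOM itself additionally requires the <tt>id</tt> element, which is not in the list of four elements above.</p>
<pre><code>&lt;entry xmlns="http://www.w3.org/2005/Atom"&gt;
  &lt;!-- required by ATOM, hypothetical URL --&gt;
  &lt;id&gt;http://example.org/blog/2007/10/19/some-post&lt;/id&gt;
  &lt;!-- title [string] --&gt;
  &lt;title&gt;Title of the post&lt;/title&gt;
  &lt;!-- author [string] --&gt;
  &lt;author&gt;&lt;name&gt;Name of the author&lt;/name&gt;&lt;/author&gt;
  &lt;!-- date [ISO 8601] --&gt;
  &lt;updated&gt;2007-10-19T01:03:51Z&lt;/updated&gt;
  &lt;!-- content [string] --&gt;
  &lt;content type="text"&gt;Plain text content of the post.&lt;/content&gt;
&lt;/entry&gt;</code></pre>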
<p>For the millions of blog articles that have been lost so far (apart from those held by blog search engines such as <a href="http://www.bloglines.com">Bloglines</a>, <a href="http://www.technorati.com/">Technorati</a>, <a href="http://blogsearch.google.de/">Google Blogsearch</a>, <a href="">Blogdigger</a> etc., which are not available for archiving) there is at least some hope:</p>
<p>In September, <a href="http://rfc.net/rfc5005.html">RFC 5005</a>: <i>Feed Paging and Archiving</i> was published. It defines an extension of the <a href="http://de.wikipedia.org/wiki/ATOM">ATOM format</a> (also usable in RSS) in which the feed of the latest entries links to the preceding entries and/or an archive. In principle this has been <a href="http://www.xml.com/pub/a/2004/06/16/dive.html">possible for some time</a> and is <a href="http://www.ibm.com/developerworks/library/x-tipatom2/">described here with an example</a>, but now it has been specified a bit more precisely. This makes ATOM a real alternative to <a href="http://www.openarchives.org/OAI/openarchivesprotocol.html">OAI-PMH</a>, which is somewhat closer to the library world but unfortunately is also still rather neglected.</p>
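<p>A rough sketch of how a harvester could follow such archive links, assuming the feed uses the <code>prev-archive</code> link relation from RFC 5005. The function names <code>prev_archive_url</code>, <code>fetch_url</code>, and <code>walk_archives</code> and the <code>max_documents</code> safety limit are hypothetical, chosen only for this example.</p>

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin
from urllib.request import urlopen

ATOM = "{http://www.w3.org/2005/Atom}"


def prev_archive_url(feed_xml, base_url):
    """Return the absolute URL of the rel="prev-archive" link, or None."""
    root = ET.fromstring(feed_xml)
    # Feed-level links: direct children in Atom, inside channel in RSS
    links = root.findall(f"{ATOM}link") + root.findall(f"channel/{ATOM}link")
    for link in links:
        if link.get("rel") == "prev-archive" and link.get("href"):
            return urljoin(base_url, link.get("href"))
    return None


def fetch_url(url):
    """Default fetcher: download a document over HTTP and return its bytes."""
    with urlopen(url) as response:
        return response.read()


def walk_archives(start_url, fetch=fetch_url, max_documents=1000):
    """Yield (url, document) for the current feed and each older archive.

    Stops at the first document without a prev-archive link, when a URL
    repeats, or after max_documents documents (an assumed safety limit).
    """
    seen = set()
    url = start_url
    for _ in range(max_documents):
        if url is None or url in seen:
            break
        seen.add(url)
        document = fetch(url)
        yield url, document
        url = prev_archive_url(document, url)
```

<p>Passing a different <code>fetch</code> function makes it possible to add caching or rate limiting, or to try the walk on local files without touching the network.</p>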
<p>Be that as it may: so far blogs are not being collected systematically and permanently for posterity, and if libraries have a future at all, they are the only institutions that really qualify for the task. For that to happen, however, the &#8220;acquisition&#8221; of a blog for the library collection should within the next few years become as familiar as purchasing a book or a journal. As far as I am concerned, DFG grant proposals for the &#8220;collection and archiving of the cultural heritage available in the form of weblogs&#8221; may be submitted for this purpose, although I am rather skeptical of this project-based approach: the continuous further development of applications as open source achieves more, and the wheel gets reinvented less often.</p>
<p>P.S.: On the <a href="http://info-deposit.d-nb.de/">DNB information page on collecting online publications</a> there is <a href="http://services.d-nb.de/search/search-service?service=sitesearch&#038;query=weblogs">nothing yet</a> about weblogs &#8211; so it is up to each individual library to give some thought to collecting the weblogs relevant to it.</p>
]]></content:encoded>
			<wfw:commentRss>http://jakoblog.de/2007/10/19/weblogs-sammeln-erschliessen-verfuegbar-machen-und-archivieren/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Syndication and Harvesting with RSS, ATOM, OAI-PMH and Sitemaps</title>
		<link>http://jakoblog.de/2007/09/28/syndication-and-harvesting-with-rss-atom-oai-pmh-and-sitemaps/</link>
		<comments>http://jakoblog.de/2007/09/28/syndication-and-harvesting-with-rss-atom-oai-pmh-and-sitemaps/#comments</comments>
		<pubDate>Fri, 28 Sep 2007 10:32:16 +0000</pubDate>
		<dc:creator>jakob</dc:creator>
				<category><![CDATA[de]]></category>
		<category><![CDATA[ATOM]]></category>
		<category><![CDATA[OAI]]></category>
		<category><![CDATA[Sitemaps]]></category>

		<guid isPermaLink="false">http://jakoblog.de/2007/09/28/syndication-and-harvesting-with-rss-atom-oai-pmh-and-sitemaps/</guid>
		<description><![CDATA[On my quest for metadata formats and APIs I found that ATOM is not just another RSS but more like a simple database language. Google&#8217;s Data API GData strongly pushes ATOM forward (but may also introduce some problems). Jim Downing wrote about ATOM, OAI-PMH, and Sitemaps &#8211; three different ways to provide a list of [...]]]></description>
			<content:encoded><![CDATA[<p>On my quest for metadata formats and APIs I found that <a href="http://bitworking.org/projects/atom/draft-ietf-atompub-protocol-17.html">ATOM</a> is not just another RSS but more like a simple database language. Google&#8217;s Data API <a href="http://code.google.com/apis/gdata/">GData</a> strongly pushes ATOM forward (but may also introduce some problems). Jim Downing <a href="http://wwmm.ch.cam.ac.uk/blogs/downing/?p=101">wrote about ATOM, OAI-PMH, and Sitemaps</a> &#8211; three different ways to provide a list of all the resources in a collection, and to incrementally discover changes. OAI-PMH is much less prominent, but why?</p>
<p>Andy Powell started a <a href="http://efoundations.typepad.com/efoundations/2007/06/repositories_ro.html">very enlightening discussion</a> with <a href="http://efoundations.typepad.com/efoundations/2007/06/the_repository_.html">his talk</a> at the <a href="http://efoundations.typepad.com/efoundations/2007/06/the_repository_.html">JISC Digital repositories conference 2007</a>. He complains that repositories are partly missing the web &#8211; popular we-could-also-call-them-repositories like Flickr, Slideshare, YouTube, Scribd etc. do not use OAI-PMH, nor does Google support it. Following the discussion, I ask myself what the differences are between scholarly communication and people uploading and mixing any popular content. And do those differences justify different methods of syndication and harvesting? Have a look at the comments by Herbert van de Sompel and Erik Hetzner!</p>
]]></content:encoded>
			<wfw:commentRss>http://jakoblog.de/2007/09/28/syndication-and-harvesting-with-rss-atom-oai-pmh-and-sitemaps/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
