OAI « Jakoblog — Das Weblog von Jakob Voß

Working group on digital library APIs and possible outcomes

13 April 2008 at 14:48 | 3 comments

Last year the Digital Library Federation (DLF) formed the “ILS Discovery Interface Task Force”, a working group on APIs for digital libraries. See their agenda and the current draft recommendation (February 15th) for details [via Panlibus]. I’d like to comment briefly on the essential functions they agreed on at a meeting with major library system (ILS) vendors. Peter Murray summarized the functions as “automated interfaces for offloading records from the ILS, a mechanism for determining the availability of an item, and a scheme for creating persistent links to records.”

On the one hand I welcome it when vendors try to agree on (open) standards and service-oriented architecture. On the other hand the working group is yet another top-down effort to discuss things that just have to be implemented based on existing Internet standards.

1. Harvesting: In the library world this is mainly done via OAI-PMH. I’d also consider RSS and Atom. To fetch single records, there is unAPI – which the DLF group does not mention. There is no need for any other harvesting API – missing features (if any) should be integrated into extensions and/or next versions of OAI-PMH and ATOM instead of inventing something new. P.S: Google Wave shows what to expect in the next years.
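To make the harvesting point concrete, here is a minimal sketch of the two halves of an OAI-PMH `ListRecords` exchange: building the request URL and reading record identifiers plus the resumption token from a response. The base URL `http://example.org/oai` and the function names are assumptions for illustration; the verb and argument names (`metadataPrefix`, `from`, `until`, `resumptionToken`) come from OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None,
                     until_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # A resumption token is an exclusive argument: it replaces all others.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date
        if until_date:
            params["until"] = until_date
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An empty resumptionToken element marks the end of the list.
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token
```

A harvester would call `list_records_url`, fetch the result, and repeat with the returned token until `parse_list_records` yields `None` for it. Error handling, retries and storage are deliberately left out.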

2. Search: There is still good old overblown Z39.50. The near future is (slightly overblown) SRU/SRW and (simple) OpenSearch. There is no need for discussion but for open implementations of SRU (I am still waiting for a full client implementation in Perl). I suppose that next generation search interfaces will be based on SPARQL or other RDF-stuff.
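As a rough illustration of how little a basic SRU client needs on the request side, the sketch below builds a `searchRetrieve` URL with a CQL query. The endpoint `http://example.org/sru` and the helper name are assumptions; the parameter names are those of the SRU specification. Parsing the XML response, diagnostics and the `explain` operation are not covered.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, cql_query, start_record=1, maximum_records=10,
                   record_schema=None, version="1.1"):
    """Build an SRU searchRetrieve request URL (hypothetical helper)."""
    params = {
        "version": version,
        "operation": "searchRetrieve",
        "query": cql_query,            # the query is expressed in CQL
        "startRecord": start_record,   # SRU counts records from 1
        "maximumRecords": maximum_records,
    }
    if record_schema:
        params["recordSchema"] = record_schema
    return base_url + "?" + urlencode(params)


print(sru_search_url("http://example.org/sru", 'dc.title = "open archives"'))
```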

3. Availability: The announcement says: “This functionality will be implemented through a simple REST interface to be specified by the ILS-DI task group”. Yes, there is definitely a need (in December I wrote about such an API in German). However, the main point is not the API but defining what “availability” means. Please focus on this. P.S: DAIA is now available.

4. Linking: For “Linking in a stable manner to any item in an OPAC in a way that allows services to be invoked on it” (announcement) there is no need to create new APIs. Add and propagate clean URIs for your items and point to your APIs via autodiscovery (HTML link element). That’s all. Really. To query and distribute general links for a given identifier, I created the SeeAlso API, which is used more and more in our libraries.
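Autodiscovery via the HTML link element is easy to consume on the client side. The following sketch collects the `rel`, `type` and `href` of every `<link>` in a page using only the Python standard library. The class and function names and the example URLs are made up for illustration; `unapi-server` and `search` are the `rel` values used by unAPI and OpenSearch autodiscovery.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects <link> elements that carry both rel and href."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        if attributes.get("rel") and attributes.get("href"):
            self.links.append(
                (attributes["rel"].lower(), attributes.get("type"), attributes["href"])
            )


def discover_links(html_text):
    """Return a list of (rel, type, href) tuples found in the page."""
    collector = LinkCollector()
    collector.feed(html_text)
    return collector.links


page = """<html><head>
<link rel="unapi-server" type="application/xml" title="unAPI" href="http://example.org/unapi">
<link rel="search" type="application/opensearchdescription+xml" href="http://example.org/opensearch.xml">
</head><body></body></html>"""

for rel, mime_type, href in discover_links(page):
    print(rel, mime_type, href)
```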

Furthermore, the draft contains a section on “Patron functionality”, which is going to be based on NCIP and SIP2. Both are dead ends in my view. You had better look at projects outside the library world and try to define schemas/ontologies for patrons and patron data (hint: patrons are also called “customers” and “users”). Again: it is not the API that is underdefined, it is the data that we need to agree on.

Tags: API, ATOM, digital library, OAI, Seealso, SOA, Standards

Wikisource in the DFG-Viewer thanks to open interfaces

31 March 2008 at 14:52 | 3 comments

The DFG-Viewer is a relatively new web application for displaying digitized works. The project, funded by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG), is meant to help establish standards for digitization projects, and thanks to web services and open standards it already does so quite well.

Prompted by a pointer to the Ponickau collection at the ULB Sachsen-Anhalt and a subsequent discussion about the ongoing confusion regarding URI, URN, URL, identifiers and locators, I took a closer look at the DFG-Viewer. The display does not look quite as cool as that of The Open Library, but in return there are open interfaces. Digitized works can be passed to the viewer via OAI or as a direct URL in METS/MODS format. The individual pages of a digitized book and its internal structure (outline) can then be browsed. Full-text search is apparently not implemented yet, and a dedicated zoom function is missing; so far it is only possible to switch between resolutions of different sizes, provided the repository delivers them.

A copy of the book mentioned as an example on INETBIB, with the VD17 number 32:623995L, is available in digitized form in Halle. The metadata of the digitized work can be retrieved via OAI in METS/MODS. If you pass this URL to the DFG-Viewer, the digitized work can be viewed there. At the moment one manual step is still necessary, because a wrong (?) OAI server for Halle is registered in the DFG-Viewer, but in principle the mashup works. 🙂

Instead of compiling, just for fun, a METS file of porn pictures to have it displayed in the DFG-Viewer, I picked a random digitized work from Wikisource. Wikisource has an index page for every digitized work, which lists some metadata and the pages of the digitized original. From this page a METS/MODS file can be generated and sent to the DFG-Viewer. Two to three hours later I had a simple Perl script that generates a METS file from the index page in Wikisource, and the DFG-Viewer displays the result.

The whole thing is just a quickly hacked proof of concept. A stable use of the metadata from Wikisource should be built on an OAI interface that delivers METS/MODS (and MABXML for ZVDD). If anyone is interested (bachelor's or diploma thesis, own project, etc.), I am happy to offer my support, but someone else will have to implement it first, since I cannot keep starting new projects all the time. 🙁
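The Perl script itself is not reproduced here. Purely as an illustration of the idea, the Python sketch below turns a list of page image URLs into a skeletal METS document with a file section and a physical structure map. It is not the author's script and is not a complete input for the DFG-Viewer: the MODS descriptive metadata mentioned above is omitted, and the `image/jpeg` MIME type, the `FILE_0001` identifiers and the example URLs are assumptions.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def build_mets(page_image_urls):
    """Return a skeletal METS document listing the given page images in order."""
    mets = ET.Element(f"{{{METS}}}mets")
    file_sec = ET.SubElement(mets, f"{{{METS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS}}}fileGrp", {"USE": "DEFAULT"})
    struct_map = ET.SubElement(mets, f"{{{METS}}}structMap", {"TYPE": "PHYSICAL"})
    sequence = ET.SubElement(struct_map, f"{{{METS}}}div", {"TYPE": "physSequence"})
    for number, url in enumerate(page_image_urls, start=1):
        file_id = f"FILE_{number:04d}"  # assumed ID scheme
        file_el = ET.SubElement(
            file_grp, f"{{{METS}}}file",
            {"ID": file_id, "MIMETYPE": "image/jpeg"},  # assumed MIME type
        )
        ET.SubElement(
            file_el, f"{{{METS}}}FLocat",
            {"LOCTYPE": "URL", f"{{{XLINK}}}href": url},
        )
        # One <div> per page, pointing back at the file entry above.
        page = ET.SubElement(
            sequence, f"{{{METS}}}div", {"TYPE": "page", "ORDER": str(number)}
        )
        ET.SubElement(page, f"{{{METS}}}fptr", {"FILEID": file_id})
    return ET.tostring(mets, encoding="unicode")


print(build_mets(["http://example.org/page1.jpg", "http://example.org/page2.jpg"]))
```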

Tags: Digitalisierung, Identifier, Katalog, Mashup, Metadata, METS, MODS, OAI, Wikipedia, wikisource

First draft of OAI-ORE

30 December 2007 at 18:06 | No comments

“Web 3.0” (or “Semantic Web”, use the buzzword of your choice) is slowly on the rise. Two weeks ago the first public draft of OAI-ORE was published, and Mike Giarlo published an OAI-ORE plugin for WordPress. I have not actually tried it, but as far as I understand one could add RFC 5005 to OAI-ORE to support large resource sets. Or is OAI-PMH enough? Well, in the end it depends on the availability of software libraries and clients and on the ease of connecting them with other services. To my taste there are still too many generalized data models, while what we need are concrete implementations: it was not RDF and OWL but Microformats that got the Web of data started (yes, we’re in it: the next hype after “Web 2.0”). For 2008 I wish for less abstract meta-meta-meta-stuff and more small, usable applications and services that can be combined.

Tags: ATOM, OAI, OAI-ORE, Semantic Web

Relevant APIs for (digital) libraries

30 November 2007 at 14:50 | 5 comments

My current impression is that the OCLC/WorldCat Service Grid is still far too abstract. Instead of creating a framework, we (libraries and library associations) should agree upon some open protocols and (metadata) formats. To start with, here is a list of relevant, existing open standard APIs from my point of view:

Search: SRU/SRW (including CQL), OpenSearch, Z39.50

Harvest/Syndicate: OAI-PMH, RSS, Atom Syndication (also with ATOM Extensions)

Copy/Provide: unAPI, COinS, Microformats (not a real API but a way to provide data)

Upload/Edit: SRU Update, Atom Publishing Protocol

Identity Management: Shibboleth (and other SAML-based protocols), OpenID (see also OSIS)

For more complex applications, additional (REST) APIs and common metadata standards need to be found (or defined), but only if the application is not just another kind of search, harvest/syndicate, copy/provide, upload/edit, or identity management.
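Of the copy/provide mechanisms listed above, COinS is probably the quickest to illustrate: it embeds an OpenURL ContextObject in the `title` attribute of an empty `<span class="Z3988">`. The sketch below generates such a span for a book. The function name and sample data are assumptions; the key names (`ctx_ver`, `rft_val_fmt`, `rft.btitle`, `rft.au`, `rft.date`, `rft.isbn`) follow the Z39.88-2004 key/encoded-value format for books.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(title, author, year, isbn=None):
    """Return a COinS <span> describing a book (hypothetical helper)."""
    fields = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:book"),
        ("rft.btitle", title),
        ("rft.au", author),
        ("rft.date", str(year)),
    ]
    if isbn:
        fields.append(("rft.isbn", isbn))
    # URL-encode the key/value pairs, then escape "&" for use in an attribute.
    context_object = urlencode(fields)
    return '<span class="Z3988" title="{}"></span>'.format(
        escape(context_object, quote=True)
    )


print(coins_span("Example Book", "Doe, Jane", 2007))
```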

P.S: I forgot NCIP, a “standard for the exchange of circulation data”. Frankly, I don’t fully understand the meaning and importance of “circulation data”, and the standard looks more complex than needed. More on APIs for libraries can be found in the WorldCat Developer Network, in the Jangle project, and in a DLF working group on digital library APIs. For staying in the limited world of libraries this may suffice, but on the web simplicity and availability of implementations matter. That’s why I am working on the SeeAlso linkserver protocol and now on a simple API to query availability information (more in August/September 2008).

P.P.S: A more detailed list of concrete library-related APIs was published by Roy Tennant based on a list by Owen Stephens.

P.P.P.S: And another list by Stephen Abram (SirsiDynix) from September 1st, 2009.

Tags: API, ATOM, COinS, Identity Management, Microformats, OAI, OpenId, Shibboleth, Standards

Archiving Weblogs with ATOM and RFC 5005: An alternative to OAI-PMH

19 October 2007 at 11:34 | 1 comment

Following up on my recent post (in German), I had a conversation with my colleague about harvesting and archiving blogs and about ATOM vs. OAI-PMH. In my opinion, with the recent RFC 5005 on Feed Paging and Archiving and its proposed extension of Archived Feeds, ATOM can be an alternative to OAI-PMH. Instead of arguing about which is better, digital libraries should support both for harvesting and providing archived publications such as preprints and weblog entries (scientific communication and publication already take place in both).

Instead of having every project implement both protocols, you could create a wrapper from ATOM with archived feeds to OAI-PMH and vice versa. The mapping from OAI-PMH to ATOM is probably the easier part: you partition the repository into chunks as defined in RFC 5005 with the from and until arguments of OAI-PMH. The mapping from ATOM to OAI-PMH is more complicated because you cannot select with timestamps. If you only specify a from argument, the corresponding ATOM feed can be harvested going backwards in time, but if there is an until argument you must harvest the whole archive just to get the first entries and throw away the rest. Luckily the most frequent use case is to get the newest entries only. Anyway: both protocols have their pros and cons, and a two-way wrapper could help both. Of course it should be implemented as open source so anyone can use it. (By the way, there seems to be no OAI crawler in Perl yet. Sure, there is OAI-Harvester, but for real-world applications you have to deal with unavailable servers, corrupt feeds, duplicated or deleted entries, and a way to save the harvested records, so a whole layer above the harvester is missing.)
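The "from only" case can be sketched in a few lines: start at the current feed, collect entries that are new enough, and follow the `prev-archive` link from RFC 5005 until older entries show up. This is only an illustration under stated assumptions: `fetch` is a caller-supplied function mapping a URL to feed XML, the function names are invented, and archive documents are assumed to get strictly older as you walk backwards. Deleted entries, duplicates and the `until` case are not handled.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

ATOM = "{http://www.w3.org/2005/Atom}"


def parse_time(text):
    # RFC 3339 timestamps may end in "Z"; normalize it to an explicit offset.
    return datetime.fromisoformat(text.strip().replace("Z", "+00:00"))


def harvest_from(fetch, feed_url, from_time):
    """Collect ids of entries updated at or after from_time, newest first."""
    harvested = []
    url = feed_url
    while url:
        root = ET.fromstring(fetch(url))
        reached_older_entries = False
        for entry in root.findall(ATOM + "entry"):
            if parse_time(entry.findtext(ATOM + "updated")) >= from_time:
                harvested.append(entry.findtext(ATOM + "id"))
            else:
                reached_older_entries = True
        # Look for the RFC 5005 link to the next older archive document.
        previous = None
        for link in root.findall(ATOM + "link"):
            if link.get("rel") == "prev-archive":
                previous = link.get("href")
        # Assumption: once an older entry is seen, earlier archives are older too.
        url = None if reached_older_entries else previous
    return harvested
```

With an `until` argument the same loop would have to keep walking and discard everything newer than the limit, which is exactly the inefficiency described above.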

P.S.: At code4lib Ed Summers pointed me to Stuart Weibel, who asked the same question about blog archiving, and to a discussion in Jon Udell’s blog that includes blog archiving (he also mentions BlogML as a possible part of a solution; unfortunately BlogML looks very dirty to me, the spec is here). And Daniel Chudnov drafted a blog mirroring architecture.

Tags: Archivierung, ATOM, BlogML, Feed, OAI

Collecting, indexing, providing access to, and archiving weblogs

19 October 2007 at 03:03 | 2 comments

For quite some time I have been annoyed that practically no libraries collect and archive weblogs, even though this type of media is already partly taking over the function of scholarly journals. Meanwhile I do notice a growing interest in blogs among colleagues (the next workshop was fully booked after a short time), but the majority has not really grasped yet that an evolution comparable to the introduction of printing or the invention of journals is under way. Otherwise many more libraries would surely start collecting, indexing, providing access to, and archiving weblogs.

Instead of first discussing which special MAB fields the data should go into and what type of media weblogs count as, someone would only have to soup up one of the existing open source feed readers so that it runs at large scale on one or more servers and at least collects those feeds that some librarian has at some point judged worth collecting. Anything that is well-formed XML and comes with a minimal set of mandatory elements (author [string], title [string], date [ISO 8601], content [string]) should be archivable at least well enough that the essential part can be reconstructed. Special features such as HTML content, categories and comments can be added later, once the infrastructure (a harvester for collecting, storage for archiving, an index for cataloguing, and a reading facility for providing access) is in place.
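The minimal requirement described above (well-formed XML plus author, title, ISO 8601 date and content) is simple to check mechanically. The sketch below does so for a single Atom entry. It is an illustration only: the function name is invented, the date is taken from `atom:updated`, and content is read as plain text, so entries whose content consists solely of XHTML child elements would be rejected.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

ATOM = "{http://www.w3.org/2005/Atom}"


def extract_minimal_record(entry_xml):
    """Return author, title, date and content of an Atom entry, or None."""
    try:
        entry = ET.fromstring(entry_xml)
    except ET.ParseError:
        return None  # not well-formed XML
    record = {
        "author": entry.findtext(ATOM + "author/" + ATOM + "name"),
        "title": entry.findtext(ATOM + "title"),
        "date": entry.findtext(ATOM + "updated"),
        "content": entry.findtext(ATOM + "content"),
    }
    if not all(record.values()):
        return None  # a mandatory element is missing or empty
    try:
        # Accept a trailing "Z" by normalizing it to an explicit UTC offset.
        datetime.fromisoformat(record["date"].strip().replace("Z", "+00:00"))
    except ValueError:
        return None  # date is not a valid ISO 8601 timestamp
    return record
```

Entries that pass could be handed to storage and indexing; everything else would be logged for manual inspection rather than silently dropped.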

For the millions of blog articles that have been lost so far (apart from blog search engines such as Bloglines, Technorati, Google Blogsearch, Blogdigger etc., which are not available for archiving), there is at least partial hope:

In September, RFC 5005: Feed Paging and Archiving was published. It defines an extension of the ATOM format (also possible in RSS) in which the feed of the latest entries points to the preceding entries and/or an archive. In principle this has been possible for some time and is described here with an example, but now it has been specified somewhat more precisely. This makes ATOM a real alternative to OAI-PMH, which is somewhat closer to the library world but unfortunately is also still rather neglected.

Be that as it may: so far blogs are not being collected systematically and permanently for posterity, and if libraries have a future at all, they are the only institutions that really come into question for this. For that, however, the “acquisition” of a blog for the library's holdings should over the next few years become as familiar as purchasing a book or a journal. As far as I am concerned, DFG proposals for the “collection and archiving of the cultural heritage available in the form of weblogs” may also be submitted, although I am rather sceptical about this project culture: the continuous development of applications as open source achieves more, and the wheel is reinvented less often.

P.S.: The DNB's information page on collecting online publications does not yet say anything about weblogs, so it is up to each individual library to think about collecting the weblogs relevant to it.

Tags: Archivierung, ATOM, Bibliothek, Feed, OAI, Web 2.0

Second day at MTSR

18 October 2007 at 18:46 | No comments

It is already a week ago (conference blogging should be published immediately), so I had better summarize my final notes on the MTSR 2007 conference: continue reading “Second day at MTSR”…

Tags: DCAP, digital library, Dublin Core, MTSR07, MTSR2007, OAI, ontology, Overlay Journal, RIOJA, Science 2.0, SOA, Tagging

Syndication and Harvesting with RSS, ATOM, OAI-PMH and Sitemaps

28 September 2007 at 12:32 | No comments

On my quest for metadata formats and APIs I found that ATOM is not just another RSS but more like a simple database language. Google’s Data API GData strongly pushes ATOM forward (but may also introduce some problems). Jim Downing wrote about ATOM, OAI-PMH, and Sitemaps – three different ways to provide a list of all the resources in a collection, and to incrementally discover changes. OAI-PMH is much less prominent, but why?

Andy Powell started a very enlightening discussion with his talk at the JISC Digital Repositories conference 2007. He complains that repositories are partly missing the web: popular we-could-also-call-them-repositories like Flickr, Slideshare, YouTube, Scribd etc. don’t use OAI-PMH, nor does Google support it. Following the discussion, I ask myself what the differences are between scholarly communication and people uploading and mixing any popular content. And do the differences justify different methods of syndication and harvesting? Have a look at the comments by Herbert van de Sompel and Erik Hetzner!
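For comparison with OAI-PMH's `from` argument, incremental discovery with Sitemaps amounts to filtering `<url>` entries by their optional `<lastmod>` value on the client side. The sketch below shows this with the standard library. The function name is an assumption, and `lastmod` values are assumed to carry at least a full `YYYY-MM-DD` date.

```python
import xml.etree.ElementTree as ET

SITEMAP = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def changed_since(sitemap_xml, since):
    """Return URLs whose <lastmod> is on or after `since` (YYYY-MM-DD)."""
    root = ET.fromstring(sitemap_xml)
    changed = []
    for url in root.findall(SITEMAP + "url"):
        loc = url.findtext(SITEMAP + "loc").strip()
        lastmod = url.findtext(SITEMAP + "lastmod")
        # <lastmod> is optional; entries without it are treated as changed.
        if lastmod is None or lastmod.strip()[:10] >= since:
            changed.append(loc)
    return changed
```

Unlike OAI-PMH, the selection happens entirely in the client: the whole sitemap is downloaded on every run, and deletions cannot be detected this way.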

Tags: ATOM, OAI, Sitemaps

OAI Object Re-Use and Exchange (OAI-ORE)

27 April 2007 at 15:52 | No comments

In the netbib weblog, Lambert points to the project OAI Object Re-Use and Exchange (OAI-ORE) of the Open Archives Initiative (OAI), which already six years ago (sic!) gave us the wonderfully simple OAI Protocol for Metadata Harvesting (OAI-PMH). Unfortunately OAI is still not integrated into library practice as smoothly as it could be. Data providers fail every now and then or deliver broken data (which amounts to the same thing), harvesting processes are not integrated into automatic workflows, and the quality of the metadata is … well, what can you expect when no automatic validation routines are installed. Lambert points to the 5th Workshop on Innovations in Scholarly Communication (see the agenda), held last week at CERN (where, incidentally, the WWW was invented), at which Herbert Van de Sompel, among others, presented OAI-ORE (video unfortunately only in a daft format). I took the opportunity to add OAI-ORE to the still very brief Wikipedia entry on OAI; additions and corrections are of course very welcome. The reference to social software services in connection with ORE is, however, a bit too general for me. The technology is likely to be more relevant in projects such as TextGrid and undertakings in the EU's FP7, but OAI-ORE is probably still too new and innovative for that, and as long as Germany does not manage to participate internationally (see the number of German members in the OAI-ORE community) I do not expect anything to happen here any time soon. Some thoughts by Pete Johnston on ORE and Web Architecture can be found in this post from January.

Tags: Bibliothek, OAI, OAI-ORE
