May 2009 « Jakoblog: Das Weblog von Jakob Voß

Recommender services for libraries and information institutions

24 May 2009 at 11:32 No comments

Some time ago my chapter on recommender services (recommender/recommendation systems) appeared in the handbook "Erfolgreiches Management von Bibliotheken und Informationseinrichtungen" (edited by Hans-Christoph Hobohm and Konrad Umlauf, published by Verlag Dashöfer since 2002). I have now put an adapted version online at E-LIS. It explains the basic forms of recommender services (explicit, content-based, rule- and knowledge-based, behavior-based), their development in terms of components (data basis, method, service, application), and a number of examples. With "Empfehlungssysteme: Grundlagen, Konzepte und Systeme" by André Klahold (InterRed GmbH) there is now also a German-language handbook on the topic. The "elektrische Reporter" gives an accessible explanation of collaborative filtering (one form of recommender service).

I greatly appreciated the editorial support at Dashöfer-Verlag while writing the article. The publication format, however, is a loose-leaf collection (a crude precursor of the wiki) and is rather impractical. Had Hobohm and Umlauf waited just a few years, they would probably have realized themselves that such reference works, especially if they are to be updated regularly, only make sense in electronic form. That need not mean that there is no printable version or that anyone can freely change whatever they like. But first, it is quite tedious to send Word documents back and forth unnecessarily, redo their layout, print them, distribute them, and have them filed, only so that they can later be retrieved and copied. And second, I believe the article on recommender services would be useful to more readers overall in the form of a Wikipedia article.

Well, the transformation of publishing is apparently taking somewhat longer than expected. It is just a pity that libraries and information institutions are not exactly leading the way with foresight here, but at best going along. In the future, at any rate, I want to use only wikis or similar publishing systems for the supervised creation of regularly updated works (Google Docs is a start). But perhaps I am simply a bit too impatient. 🙂

Tags: Empfehlungsdienste, Services No comments

News on the Zeitschriftendatenbank

21 May 2009 at 12:33 No comments

As Jürgen Plieninger reports, the website of the Zeitschriftendatenbank (ZDB) was revised at the beginning of this week and migrated to the content management system (CMS) Typo3. The discovery interface (aka OPAC) of the ZDB remains untouched by the changes. To improve it sustainably, PICA users should, in my opinion, get together and build a new interface for PICA catalogs on an open-source basis (!). The DNB, for example, has set up its own portal, and at the VZG various alternatives are being tried out (e.g. the Suchkiste), but taken together this is still too little and too uncoordinated. But that is another topic.

From the RSS feed of the ZDB website one learns, for example, that the CD-ROM edition was recently discontinued, so the ZDB has finally arrived on the Web. To also arrive in the "Semantic Web" or "Web 3.0", so to speak, i.e. to keep pace with current developments of the Web, the ZDB should next become capable of Linked Open Data. For this, first and foremost, stable URIs must be assigned and the ZDB data made available. Second, even in technical matters there is apparently never enough documentation and public relations work. On the mailing list for the Bibliographic Ontology, there has been discussion for several weeks about how journal data can best be represented in RDF and which data can be drawn on. The ZDB has been mentioned there but has not yet taken an active part in the discussion.

Tags: ISSN, Semantic Web, ZDB, Zeitschriften No comments

Unique Identifiers for Authors, VIAF and Linked Open Data

20 May 2009 at 15:53 1 comment

The topic of unique identifiers for authors is getting more and more attention on the Web. Martin Fenner listed some research papers about it and did a quick poll; you can see the results in a short presentation [via infobib]. What struck me about the results is how unknown existing traditional identifier systems for authors are: libraries have managed so-called "authority files" for years. The German Wikipedia has cooperated with the German National Library since 2005 to link biographical Wikipedia articles [de] with the German name authority file, and there is a similar project in the Czech Wikipedia.

Maybe name authority files of libraries are so unknown because they have not been visible on the Web, but this is changing. An important project to combine authority files is the Virtual International Authority File (VIAF). At the moment it already contains mappings between name authority files of six national libraries (USA, Germany, France, Sweden, Czech Republic, and Israel), and more are going to be added. At an ELAG 2008 Workshop in Bratislava I talked with VIAF project manager Thomas Hickey (OCLC) about also getting VIAF and its participating authority files into the Semantic Web. He just wrote about recent changes in VIAF: by now it contains almost 8 million records!

So why are people thinking about creating other systems of unique identifiers for authors if there already is an infrastructure? The survey that Martin did showed that a centralized registry is desired. VIAF is an aggregator of distributed authority files which are managed by national libraries. This architecture has several advantages: for instance, it is non-commercial, and data is managed where it can be managed best (Czech librarians can better identify Czech authors, Israeli librarians can better identify authors from Israel, and so on). One drawback is that libraries are technically slow; many of them have not really switched to the Web and the digital age. For instance, up to now there are no official URIs for Czech and Israeli authority records, and VIAF is not yet connected to Linked Open Data. But the more people reuse library data instead of reinventing wheels, the faster and easier it gets.

For demonstration purposes I created a SeeAlso wrapper for VIAF that extracts RDF triples of the mappings. At http://ws.gbv.de/seealso/viafmappings you can try it out by submitting authority record URIs or the authority record codes used at VIAF. For instance, a query for LC|n 79003362 with Notation3 as the output format returns a mapping for Goethe. Some returned URIs are also cool URLs, for instance at the DNB or the VIAF URI itself. At the moment owl:sameAs is used to specify the mappings; maybe the SKOS vocabulary provides better properties. You can argue a lot about how to encode information about authors, but the unique identifiers that you can link to already exist!
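
To illustrate what such a mapping amounts to, here is a minimal sketch in Python that writes a set of equivalent authority record URIs as owl:sameAs statements in N-Triples. It does not call the SeeAlso wrapper and does not reproduce its actual output; the function name and the example.org URIs are hypothetical placeholders, not real VIAF or library identifiers.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triples(subject_uri, equivalent_uris):
    """Return one N-Triples line per mapping: <subject> owl:sameAs <target> ."""
    lines = []
    for target in equivalent_uris:
        if target == subject_uri:
            continue  # a record is trivially the same as itself, so skip it
        lines.append(f"<{subject_uri}> <{OWL_SAME_AS}> <{target}> .")
    return lines


# Hypothetical placeholder URIs, standing in for real authority record URIs.
triples = same_as_triples(
    "http://example.org/viaf/123",
    ["http://example.org/dnb/456", "http://example.org/lc/789"],
)
print("\n".join(triples))
```

Swapping owl:sameAs for a SKOS mapping property would only mean changing the predicate constant, which is one reason to keep it in a single place.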

Tags: Identifier, Normdaten, Semantic Web, VIAF 1 comment

Sign the petition against Internet censorship!

18 May 2009 at 23:57 No comments

Although many blogs and other Internet media have already reported on the plans for Internet censorship under the pretext of combating child pornography, I would like to briefly summarize the matter myself for everyone who has not yet signed the petition to the Bundestag:

The petition is directed against a bill planned by the parliamentary groups of the CDU/CSU and SPD, under which the Bundeskriminalamt (BKA) is to maintain a secret list of websites that Internet providers must block. Instead of the pages, a stop sign is to be displayed, and access to the stop sign is to be logged.

How pointless and dangerous this is gets demonstrated, among other places, in a YouTube video with LEGO: the Internet is like a pedestrian zone in which the websites correspond to shop windows. The BKA is now to be given the right to put a stop sign in front of any shop window of its choice and to film all passers-by who stop in front of the stop sign. Yet the stop sign can quite easily be bypassed with little technical effort (or via the side alleys). Which shop windows are covered is secret. In this way, unwelcome websites are simply censored instead of legal action being taken against the site operators: looking away instead of helping, and all in the name of child protection!

So instead of actually taking action against child abuse, the law and similar measures abolish fundamental rights. In February, for example, there was a house search at the home of a blog operator who had linked to another blog, which in turn had linked to one of the secret blocking lists on WikiLeaks. Since on the Internet everything is connected to everything else via a few links, practically anyone can expect an arbitrary house search.

Further good summaries can be found here, here at the CCC, and in a clear overview at www.zeichnemit.de. Finally, once more the text of the petition and its justification. Please sign and spread the word, outside the Internet too:

We demand that the German Bundestag reject the amendment of the Telemediengesetz according to the Federal Cabinet's draft bill of 22.4.09. We consider the planned procedure of having Internet sites indexed by the BKA and blocked by the providers to be opaque and uncontrollable, since the "blocking lists" are not open to inspection, nor is it precisely defined by which criteria websites are put on the list. We see in this a threat to the fundamental right to freedom of information.

Justification: We absolutely do not question the primary goal of protecting children and preventing both their abuse and the distribution of child pornography; on the contrary, it is in the interest of us all. That the measures envisaged in the proposal are utterly unsuitable for this has been shown in many places and repeatedly confirmed by experts from the most diverse fields. Blocking Internet sites has virtually no demonstrable influence on the physical and mental integrity of abused children.

Tags: deutschland, polizeistaat, zensur No comments

Who identifies the identifiers?

10 May 2009 at 16:39 7 comments

A few days ago, after a short discussion on Twitter, Ross Singer posted a couple of open questions about identifiers for data formats on code4lib and other related mailing lists. He outlined the problem that several APIs like Jangle, unAPI, SRU, OpenURL, and OAI-PMH use different identifiers to specify the format of data that is transported (MARC-XML, Dublin Core, MODS, BibTeX etc.). It is remarkable that all these APIs are more or less relevant only in the library sector, while the issue of data formats and their identifiers is also relevant in other areas. It looks like the ivory tower of library standards is still being built.

The problem Ross raised is that there is little coordination and each standard governs its own registry of data format identifiers. An unofficial registry for unAPI [archive] disappeared (that’s why I started the discussion), there is a registry for SRU, a registry for OpenURL, and a list for Jangle. In OAI-PMH and unAPI each service hosts its own list of formats; OAI-PMH includes a method to map local identifiers to global identifiers.
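
The OAI-PMH mapping can be sketched as follows. In a ListMetadataFormats response, each local metadataPrefix is paired with a schema URL and an XML namespace URI, which serve as global identifiers. The sample response below is hand-written and trimmed for illustration (a real response carries further elements such as responseDate and request), and the function name is my own.

```python
import xml.etree.ElementTree as ET

# Namespace of the OAI-PMH 2.0 protocol elements, in ElementTree's {uri}tag notation.
OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Hand-written, trimmed sample; only the standard oai_dc format is listed.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""


def local_to_global(response_xml):
    """Map each local metadataPrefix to its global XML namespace URI."""
    root = ET.fromstring(response_xml)
    mapping = {}
    for fmt in root.iter(OAI + "metadataFormat"):
        prefix = fmt.findtext(OAI + "metadataPrefix")
        namespace = fmt.findtext(OAI + "metadataNamespace")
        mapping[prefix] = namespace
    return mapping


formats = local_to_global(SAMPLE_RESPONSE)
print(formats)
```

Two repositories may use different local prefixes for the same format, so the namespace URI, not the prefix, is the value to compare across services.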

On code4lib several arguments and suggestions were raised, which almost provoked me to a rant on library standards in general (everyone wants to define but no one likes to implement and reuse. Why do librarians ignore W3C and IETF?). Identifiers for data formats should not be defined by creators of transport protocols, nor do we need yet another über-registry. In my view the problem is less technical than social. As Douglas Campbell writes in Identifying the identifiers, one of the rare papers on identifier theory: it’s not a technology issue but a commitment issue.

First, there is a misconception about registries of data format identifiers. You should distinguish descriptive registries, which only list identifiers and formats that are defined elsewhere, from authoritative registries, which define identifiers and formats. Yes: and formats. It makes no sense to define an identifier and say that it stands for data format X if you don’t provide a specification of format X (either via a schema or via a pointer to a schema). This already implies that the best actor to define a format identifier is the creator of the format itself.

Second, local identifiers that depend on context are always problematic. There is a well-established global identifier system called Uniform Resource Identifier (URI), and there is no excuse not to use URIs as identifiers but incapability, dullness, laziness, or ignorance. The same reasons apply if you create a new identifier for a data format that already has one. One good thing about URIs is that you can always find out who was responsible for creating a given identifier: you start with the URI scheme and drill down through the namespaces and standards. I must admit that this process can be laborious, but at least it makes registries of identifiers descriptive for all identifiers but the ones in their own namespace.

Third, you must be clear on the definition of a format. For instance, the local identifier "MARC" does not refer to one format but to many variants (USMARC, UNIMARC, MARC21…) and encodings (MARCXML/MARC21). This is not unusual if you consider that many formats are specializations of other formats. For instance, Atom (defined by RFC4287 and RFC5023, identified either by its MIME type "application/atom+xml", which could be expressed as the URI http://www.iana.org/assignments/media-types/application/atom%2Bxml, or by its XML namespace "http://www.w3.org/2005/Atom")* is extended from XML (specified in http://www.w3.org/TR/xml [XML 1.0] and http://www.w3.org/TR/xml11 [XML 1.1], identified by these URLs or by the MIME type "application/xml", which is the URI http://www.iana.org/assignments/media-types/application/xml)*.
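
The step from a MIME type to the URI form used above can be sketched in a few lines of Python. This is only an illustration of the pattern in this post: the function name is my own, and, as the footnote below says, this URI form is neither widely used nor well documented.

```python
from urllib.parse import quote

IANA_BASE = "http://www.iana.org/assignments/media-types/"


def media_type_uri(media_type):
    """Build the IANA-based URI for an Internet media type.

    The "/" between type and subtype is kept; other reserved
    characters such as "+" are percent-encoded.
    """
    return IANA_BASE + quote(media_type, safe="/")


atom_uri = media_type_uri("application/atom+xml")
xml_uri = media_type_uri("application/xml")
print(atom_uri)
print(xml_uri)
```

The "+" in "atom+xml" has to be encoded as %2B, which is easy to miss when building such URIs by hand.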

The problem of identifying the right identifiers for data formats can be reduced to two fundamental rules of thumb:

1. reuse: don’t create new identifiers for things that already have one.

2. document: if you have to create an identifier, describe its referent as openly, clearly, and in as much detail as possible to make it reusable.

If there happen to exist multiple identifiers for one thing, choose the one that is documented and adopted best. There will always be multiple identifiers for the same thing – don’t make it worse.
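
As a toy illustration of the two rules, here is a small Python sketch of a registry that hands back an existing identifier instead of accepting a second one, and refuses a new identifier that comes without a pointer to a specification. The class, its method, and the example.org URIs are hypothetical; this is not how any of the registries mentioned above actually work.

```python
class FormatRegistry:
    """Toy registry: one identifier per format, none without documentation."""

    def __init__(self):
        # format name -> (identifier, specification URL)
        self._by_format = {}

    def identifier_for(self, format_name, new_identifier=None, specification=None):
        # Rule 1 (reuse): if the format already has an identifier, return that one.
        if format_name in self._by_format:
            return self._by_format[format_name][0]
        # Rule 2 (document): a new identifier needs a pointer to a specification.
        if not new_identifier or not specification:
            raise ValueError(
                f"no identifier known for {format_name!r}; "
                "supply one together with its specification"
            )
        self._by_format[format_name] = (new_identifier, specification)
        return new_identifier


registry = FormatRegistry()
first = registry.identifier_for(
    "Atom", "http://www.w3.org/2005/Atom", "https://www.rfc-editor.org/rfc/rfc4287"
)
# A later attempt to mint a second identifier for Atom gets the existing one back.
second = registry.identifier_for(
    "Atom", "http://example.org/my-own-atom-id", "http://example.org/spec"
)
print(first == second)
```

The code is trivial on purpose: the hard part is the commitment to look up an existing identifier before minting a new one.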

*Footnote: The identification of Internet Media Types with URIs that start with http://www.iana.org/assignments/media-types/ is neither widely used nor documented well but it’s the most official URI form that I could find. If for a particular format there is a better identifier – like an XML or RDF namespace – then you should use that, but if there is nothing but a Mime Type then there is no reason to create a new URI on your own.

Tags: Formats, Identifier, Standards 7 comments
