Wikimedia project: bibliographic-archival database

10 September 2011 at 10:51, 1 comment

This weekend Nuremberg is hosting WikiConvention 2011, a Wikimedia/Wikipedia conference: in more than 80 workshops, the more than 160 participants are discussing a wide range of opportunities and problems in the Wikimedia universe. This morning I attended Olaf Simons's session on a planned bibliographic-archival database. The project, which is funded by Wikimedia Deutschland, fits into the larger framework of the "WikiData" idea.

The core idea of the "bibliographic-archival database" seems somewhat naive to anyone familiar with the library world, since the aim is nothing less than an "international catalog and research tool for all books from a limited publication period". On the other hand, a little unconventional naivety does no harm when it comes to overcoming the traditional, closed catalog structures. The database is meant to make bibliographic data usable as research data, for example to search it by place of publication, year of publication, and persons involved, to annotate and correct content, and to create new links and visualizations. Existing catalogs such as VD16, VD17, VD18 or, in the English-speaking world, the ESTC can serve more as a quarry and source of raw data for this. Without collaborative features and simple data export, however, such projects inevitably fall short of their potential.

As Olaf Simons reported, the first talks between the Bodleian Library and Wikimedia representatives produced a few aha moments. There, as at some other libraries, efforts are already under way to involve non-librarians in bibliographic databases, above all researchers working on historical holdings. Compared with solutions from the Wikimedia universe, however, these approaches often seem to reinvent the wheel. At the same time, the procedures in Wikipedia have also grown historically and cannot always be transferred to other contexts. I was somewhat surprised by Simons's account of how precarious previous research projects on historical publication data are: once the project phase ends, preserving and reusing the research data at libraries is generally not possible, which in my view reflects very poorly on the libraries. Not only for this reason, it probably makes more sense to host the bibliographic databases not primarily at a library institution but at Wikimedia, comparable to Wikisource. Unlike Wikisource, however, there should not be separate language versions but a single international database from the start.

During the workshop, Mathias Schindler presented some mapping tools that are already in use and with which Wikipedians and other volunteers have found several hundred thousand links between persons, images, and publications, for example as part of the cooperation with the Bundesarchiv. Naturally, we could not present a finished concept for a collaborative bibliographic-archival database in the short WikiCon workshop. What is certain, however, is that sooner or later we will have bibliographic systems that have little to do with the closed catalog systems of libraries. To advance this vision, the project led by Olaf Simons aims to bring together as many interested groups as possible (subject specialists, the Wikimedia community, the open data community, technicians, and librarians). Feedback is expressly welcome!

Proposed changes in VIAF RDF

13 April 2011 at 13:42, 2 comments

The Virtual International Authority File (VIAF) is one of the distinguished showcases of international library community projects. For more than five years, name authority files from different countries have been mapped in VIAF. With VIAF you can look up records about authors and other people, and see which identifiers are used for the same person in different national library catalogs. For some people there are also links to biographical articles in Wikipedia (I think only English Wikipedia, but you can get some mappings to other Wikipedias via the MediaWiki API), and I hope that there will be links to LibraryThing author pages, too.
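The MediaWiki API lookup mentioned in passing could look roughly like the following sketch. It assumes the standard `action=query` / `prop=langlinks` module on English Wikipedia; the helper names and the sample response are made up for illustration, and no request is actually sent.

```python
from urllib.parse import urlencode

# English Wikipedia API endpoint (other language editions use the same path)
API = "https://en.wikipedia.org/w/api.php"

def langlinks_url(title, limit=500):
    """Build a MediaWiki API query for the interlanguage links of one article."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": title,
        "lllimit": limit,
        "format": "json",
    }
    return API + "?" + urlencode(params)

def extract_langlinks(response):
    """Map language code -> article title from a decoded JSON response."""
    links = {}
    for page in response.get("query", {}).get("pages", {}).values():
        for link in page.get("langlinks", []):
            links[link["lang"]] = link["*"]
    return links

# Hand-written sample in the shape of the API's default JSON output
# (illustrative only, not a recorded response)
sample = {
    "query": {
        "pages": {
            "1": {
                "title": "Johann Wolfgang von Goethe",
                "langlinks": [
                    {"lang": "de", "*": "Johann Wolfgang von Goethe"},
                    {"lang": "fr", "*": "Johann Wolfgang von Goethe"},
                ],
            }
        }
    }
}

print(langlinks_url("Johann Wolfgang von Goethe"))
print(extract_langlinks(sample))
```

Starting from the English article that VIAF links to, the resulting dictionary gives the matching article titles in other language editions.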

However, VIAF is not used as much as it could be, for two reasons: first, there is not enough easy-to-understand documentation, examples, and simple APIs; and second, potential users have difficulties adopting the technology. Unfortunately the second reason is the larger barrier: many libraries cannot even provide a simple way to link directly to publications by and/or about a specific person once you have the right person identifier from VIAF. If you cannot even provide such a fundamental method of linking to your database, how should you be able to integrate VIAF for better retrieval? VIAF can do little about this lack of technical skills in libraries; it can only help, to some degree, with integrating VIAF services into library software. This brings me to the other reason: you can always further improve documentation, examples, the design of your APIs, and so on, to simplify the use of your services. As a developer I found VIAF well documented and not very difficult to use, but there are many small things that could be made better. This is natural and a good thing if you communicate with your users and adopt suggested changes, as VIAF does.

For instance, yesterday Jeffrey A. Young, one of the developers behind VIAF at OCLC, published a blog article about proposed changes to the RDF encoding of VIAF. I hope that other people will join the discussion so we can make VIAF more usable. There is also a discussion about the changes on the library linked data mailing list. And earlier this month, on the Code4Lib mailing list, there was a controversial thread about the problems of mapping authority records that are not about people (see my statement here).

I appreciate the simplification of VIAF RDF and disagree only on some details. The current proposal is illustrated in this picture (copied from Jeffrey’s original article):

This looks straightforward, doesn’t it? But it is only suited to simple one-to-one mappings. Any attempt to put more complex mappings into this scheme (or into the existing VIAF RDF scheme) will result in a disaster. There is nothing wrong with simple one-to-one mappings, and with SKOS you can even express different kinds of mappings (broader, narrower, exact, close), but you should not expect too much precision and detail. I wonder why links are expressed via foaf:focus on one side of the diagram and via owl:sameAs on the other. In my opinion, as VIAF is about mapping authority files, all mapping links should use SKOS mapping properties. There is nothing wrong with declaring a URI like http://viaf.org/viaf/39377930/ to stand for a foaf:Person, a rdaEnt:Person, and a skos:Concept at the same time. And the web page that gives you information about the person can also get the same URI (see this article for a good defense of the HTTP-303 mess). Sure, Semantic Web purists, who still dream of hard artificial intelligence, will disagree. But in the end RDF data is always about something instead of being the thing itself. For practical use it would help much more to think about how to map complex concepts at the level of concept schemes (authority records, classifications, thesauri, etc.) instead of trying to find a "right" model of reality. As soon as we use language (and data is a specific kind of language), all we have is concepts. In terms of RDF: using owl:Thing instead of skos:Concept is in most cases an illusion of control.
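To make the suggestion concrete, here is a minimal sketch of how mapping links could be written with SKOS mapping properties instead of foaf:focus or owl:sameAs. It only builds N-Triples lines as strings. The target URI and the helper name are hypothetical, and this is not the encoding VIAF actually uses or proposes.

```python
# SKOS core namespace
SKOS = "http://www.w3.org/2004/02/skos/core#"

# The four kinds of mapping named in the text, with their SKOS properties
MAPPING_KINDS = {
    "exact": "exactMatch",
    "close": "closeMatch",
    "broader": "broadMatch",
    "narrower": "narrowMatch",
}

def mapping_triple(source_uri, target_uri, kind="exact"):
    """Return one N-Triples line that maps source_uri to target_uri."""
    prop = SKOS + MAPPING_KINDS[kind]
    return f"<{source_uri}> <{prop}> <{target_uri}> ."

viaf_uri = "http://viaf.org/viaf/39377930/"
# Hypothetical authority record URI, used only for illustration
other_uri = "http://example.org/authority/123"

print(mapping_triple(viaf_uri, other_uri, "exact"))
print(mapping_triple(viaf_uri, other_uri, "close"))
```

The point of the sketch is that the same two URIs can be linked with different degrees of commitment, which a blanket owl:sameAs cannot express.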

Unique Identifiers for Authors, VIAF and Linked Open Data

20 May 2009 at 15:53, 1 comment

The topic of unique identifiers for authors is getting more and more attention on the Web. Martin Fenner listed some research papers about it and did a quick poll; you can see the results in a short presentation [via infobib]. What struck me about the results is how little known the existing traditional identifier systems for authors are: libraries have been managing so-called "authority files" for years. Since 2005 the German Wikipedia has cooperated with the German National Library to link biographical Wikipedia articles [de] with the German name authority file, and there is a similar project in the Czech Wikipedia.

Maybe the name authority files of libraries are so little known because they have not been visible on the Web, but this is changing. An important project to combine authority files is the Virtual International Authority File (VIAF). At the moment it already contains mappings between the name authority files of six national libraries (USA, Germany, France, Sweden, Czech Republic, and Israel), and more are going to be added. At an ELAG 2008 workshop in Bratislava I talked with VIAF project manager Thomas Hickey (OCLC) about also getting VIAF and its participating authority files into the Semantic Web. He just wrote about recent changes in VIAF: by now it contains almost 8 million records!

So why are people thinking about creating other systems of unique identifiers for authors if an infrastructure already exists? The survey that Martin did showed that people want a centralized registry. VIAF is an aggregator of distributed authority files that are managed by national libraries. This architecture has several advantages: for instance, it is non-commercial, and data is managed where it can be managed best (Czech librarians can better identify Czech authors, Israeli librarians can better identify authors from Israel, and so on). One drawback is that libraries are technically slow; many of them have not really switched to the Web and the digital age. For instance, up to now there are no official URIs for Czech and Israeli authority records, and VIAF is not yet connected to Linked Open Data. But the more people reuse library data instead of reinventing wheels, the faster and easier it gets.

For demonstration purposes I created a SeeAlso wrapper for VIAF that extracts RDF triples of the mappings. At http://ws.gbv.de/seealso/viafmappings you can try it out by submitting authority record URIs or the authority record codes used at VIAF. For instance, a query for LC|n 79003362 returns a mapping for Goethe in Notation3. Some of the returned URIs are also cool URLs, for instance those at the DNB or the VIAF URI itself. At the moment owl:sameAs is used to specify the mappings, though maybe the SKOS vocabulary provides better properties. You can argue a lot about how to encode information about authors, but the unique identifiers that you can link to already exist!
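As a rough sketch, a query URL for the wrapper could be assembled as follows. The `id` and `format` parameter names follow the usual SeeAlso convention. The format value "n3" for Notation3 is an assumption, the helper name is made up, and the service may no longer be available at this address, so the code only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

# Base URL of the wrapper as given in the text
BASE = "http://ws.gbv.de/seealso/viafmappings"

def seealso_query_url(identifier, fmt="n3"):
    """Build a query URL for one authority record code or URI.

    The format name "n3" is an assumption, not a documented value.
    """
    return BASE + "?" + urlencode({"id": identifier, "format": fmt})

# Authority record code for Goethe, as quoted in the text
url = seealso_query_url("LC|n 79003362")
print(url)
```

Note that the pipe and the space in the record code have to be percent-encoded, which `urlencode` takes care of.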