Semantic Web « Jakoblog: Das Weblog von Jakob Voß


On the way to a library ontology

11 April 2013 at 15:02, 2 comments

I have been working for some years on specification and implementation of several APIs and exchange formats for data used in, and provided by libraries. Unfortunately most existing library standards are either fuzzy, complex, and misused (such as MARC21), or limited to bibliographic data or authority data, or both. Libraries, however, are much more than bibliographic data – they involve library patrons, library buildings, library services, library holdings, library databases etc.

During the work on formats and APIs for these parts of the library world, with the Patrons Account Information API (PAIA) being the newest piece, I found myself more and more on the way to a whole library ontology. The idea of a library ontology started in 2009 (now moved to this location), but designing such a broad data model from the ground up would surely have led to yet another complex, impractical, and unused library standard. Meanwhile there are several smaller ontologies for parts of the library world, to be combined and used as Linked Open Data.

In my opinion, ontologies, RDF, Semantic Web, Linked Data, and all the buzz are overrated, but they offer some opportunities for clean data modeling and data integration, which one rarely finds in library data. For this reason I try to design all APIs and formats to be at least compatible with RDF. For instance, the Document Availability Information API (DAIA), created in 2008 (and now being slightly redesigned for version 1.0), can be accessed in XML and in JSON format, and both can be fully mapped to RDF. Other micro-ontologies include:

Document Service Ontology (DSO) defines typical document-related services such as loan, presentation, and digitization
Simple Service Status Ontology (SSSO) defines a service instance as a kind of event that connects a service provider (e.g. a library) with a service consumer (e.g. a library patron). SSSO further defines typical service statuses (e.g. reserved, prepared, executed…) and limitations of a service (e.g. a waiting queue or a delay).
Patrons Account Information API (PAIA) will include a mapping to RDF to express basic patron information, fees, and a list of current services in a patron account, based on SSSO and DSO.
Document Availability Information API (DAIA) includes a mapping to RDF to express the current availability of library holdings for selected services. See here for the current draft.
A holdings ontology should define properties to relate holdings (or parts of holdings) to abstract documents and editions and to holding institutions.
GBV Ontology contains several concepts and relations used in GBV library network that do not fit into other ontologies (yet).
One might further create a database ontology to describe library databases with their provider, extent, APIs, etc. – right now we use the GBV ontology for this purpose. Is there anything to reuse instead of creating yet another ontology?!

The next step will probably be the creation of a small holdings ontology that fits nicely with the other micro-ontologies. This ontology should be aligned or compatible with the BIBFRAME initiative, other ontologies such as Schema.org, and existing holdings formats, without becoming too complex. The German initiative DINI-KIM has just launched a working group to define such a holdings format or ontology.

Tags: DAIA, DSO, Library, PAIA, Semantic Web, ssso

Collecting and Distributing Links with BEACON

12 June 2012 at 12:45, 9 comments

Quite a lot has happened since I gave a talk on link aggregation with BEACON at Semantic Web in Bibliotheken (SWIB11) late last year (here is the recording).

The BEACON format was originally proposed in early 2010 by Mathias Schindler as an ad hoc solution for linking, via identifiers of the Gemeinsame Normdatei (GND), between Wikipedia articles and matching web pages in biographical dictionaries and library catalogs. For example, literature about Tina Modotti can be found in the catalog of the Bayerische Staatsbibliothek (BSB) at the following URL:

http://opacplus.bsb-muenchen.de/search?pnd=11858295X

The URI of Modotti's GND record is:

http://d-nb.info/gnd/11858295X

As long as the links are built uniformly, the GND number 11858295X is all a BEACON file needs in order to express the link. In addition, the file can state, for example, the number of hits in the BSB catalog (currently eight). Here is an example of a BEACON file:

#FORMAT: BEACON
#PREFIX: http://d-nb.info/gnd/
#TARGET: http://opacplus.bsb-muenchen.de/search?pnd={ID}
#DESCRIPTION: Links auf Literatur zu Personen im Katalog der BSB
#MESSAGE: {annotation} Einträge im BSB-Katalog

11858295X|8

This simple way of passing links along has since caught on, and numerous BEACON files are available. As is common with ad hoc standards, however, different interpretations and extensions of BEACON have developed. We are therefore in the process of specifying BEACON precisely once and for all, in order to eventually publish it as an Internet standard (RFC). Development can be followed on GitHub; the current state is available here (HTML) and here (TXT).

To understand BEACON, one essentially has to distinguish two levels. A BEACON link dump is a set of uniformly constructed links, possibly enriched with some metadata. The format in which the links are stored is independent of this. Each link consists of exactly four parts:

A source (link source), for example the URL
http://d-nb.info/gnd/11858295X
A target (link target), for example the URL
http://opacplus.bsb-muenchen.de/search?pnd=11858295X
A relation type (link relation type), for example the URI
http://www.w3.org/2000/01/rdf-schema#seeAlso
An annotation (link annotation), for example the string
8 Einträge im BSB-Katalog.
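
As a rough illustration of these four parts, the following Python sketch expands the text example shown earlier into complete links. It rests on several assumptions: the function name `parse_beacon_text` is made up for this post, only the two line shapes `ID` and `ID|annotation` from the example are handled, and the relation type defaults to rdfs:seeAlso because the example file does not name one. The draft specification covers more cases than this.

```python
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

# The example file from above, shortened by the DESCRIPTION line.
SAMPLE = """#FORMAT: BEACON
#PREFIX: http://d-nb.info/gnd/
#TARGET: http://opacplus.bsb-muenchen.de/search?pnd={ID}
#MESSAGE: {annotation} Einträge im BSB-Katalog

11858295X|8
"""


def parse_beacon_text(text, relation=RDFS_SEE_ALSO):
    """Expand a simplified BEACON text dump into full four-part links."""
    meta = {}
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Meta line of the form "#NAME: value".
            name, _, value = line[1:].partition(":")
            meta[name.strip().upper()] = value.strip()
        else:
            # Assumption: a link line is either "ID" or "ID|annotation".
            ident, _, annotation = line.partition("|")
            rows.append((ident.strip(), annotation.strip()))

    prefix = meta.get("PREFIX", "")
    target = meta.get("TARGET", "{ID}")
    message = meta.get("MESSAGE", "{annotation}")
    return [
        {
            "source": prefix + ident,
            "target": target.replace("{ID}", ident),
            "relation": relation,
            "annotation": message.replace("{annotation}", annotation),
        }
        for ident, annotation in rows
    ]


for link in parse_beacon_text(SAMPLE):
    print(link["source"], "->", link["target"], "|", link["annotation"])
```

The meta fields are collected first and applied afterwards, so the abbreviated ID only has to be stored once per link.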

The relation type is the same for all links in a BEACON link dump. Source, target, and annotation can be abbreviated for storage. The form used for storage and exchange (serialization) is the second level of BEACON. Besides the original BEACON text format there is a simple BEACON XML format. The example given above could be expressed in BEACON XML as follows:

<?xml version="1.0" encoding="UTF-8"?>
<beacon xmlns="http://purl.org/net/beacon"
       prefix="http://d-nb.info/gnd/"
       target="http://opacplus.bsb-muenchen.de/search?pnd="
  description="Links auf Literatur zu Personen im Katalog der BSB"
      message="{annotation} Einträge im BSB-Katalog">
   <link source="11858295X" annotation="8" />
</beacon>
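
The XML form can be read with Python's standard `xml.etree.ElementTree` module. This is only a sketch under assumptions: the function name `links_from_beacon_xml` is made up, the attribute names are taken from the example above, and because `target` there carries no `{ID}` placeholder the sketch simply appends the ID in that case, which is a guess on my part and not something the specification is known to guarantee.

```python
import xml.etree.ElementTree as ET

BEACON_NS = "{http://purl.org/net/beacon}"

SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<beacon xmlns="http://purl.org/net/beacon"
        prefix="http://d-nb.info/gnd/"
        target="http://opacplus.bsb-muenchen.de/search?pnd="
        message="{annotation} Einträge im BSB-Katalog">
  <link source="11858295X" annotation="8" />
</beacon>
"""


def links_from_beacon_xml(xml_text):
    """Return (source, target, annotation) tuples from a BEACON XML string."""
    # Parse bytes so that the encoding declaration is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    prefix = root.get("prefix", "")
    target = root.get("target", "")
    message = root.get("message", "{annotation}")

    links = []
    for link in root.iter(BEACON_NS + "link"):
        ident = link.get("source", "")
        annotation = link.get("annotation", "")
        if "{ID}" in target:
            full_target = target.replace("{ID}", ident)
        else:
            # Assumption: without a placeholder the ID is appended.
            full_target = target + ident
        links.append(
            (prefix + ident, full_target, message.replace("{annotation}", annotation))
        )
    return links


print(links_from_beacon_xml(SAMPLE_XML))
```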

Links from BEACON can also be translated to RDF, which matters for their use as Linked Open Data. In RDF/Turtle syntax the link (here without the annotation) would be, for example:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<http://d-nb.info/gnd/11858295X>
rdfs:seeAlso 
<http://opacplus.bsb-muenchen.de/search?pnd=11858295X> .
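
Producing such triples from a list of links takes only a few lines. The sketch below writes N-Triples, a line-based subset of Turtle, so no prefix declaration is needed. The function name `links_to_ntriples` is made up, and the check for characters that would need escaping is deliberately crude.

```python
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def links_to_ntriples(links, relation=RDFS_SEE_ALSO):
    """Return one N-Triples line per (source, target) pair."""
    lines = []
    for source, target in links:
        for uri in (source, target):
            # Assumption: refuse URIs with characters that cannot appear
            # unescaped between angle brackets, instead of escaping them.
            if any(ch in uri for ch in '<>" {}|\\^`'):
                raise ValueError(f"URI needs escaping first: {uri!r}")
        lines.append(f"<{source}> <{relation}> <{target}> .")
    return "\n".join(lines)


print(links_to_ntriples([
    ("http://d-nb.info/gnd/11858295X",
     "http://opacplus.bsb-muenchen.de/search?pnd=11858295X"),
]))
```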

To express the annotation of a link, the meta field „qualifier“ has been proposed, so that BEACON dumps can also be translated to RDF completely. In any case, BEACON is not restricted to GND numbers, and source and target do not necessarily have to share a common ID. For example, lobid.org provides a mapping between Lobid URIs and Wikipedia articles. The form of BEACON used there still deviates somewhat from the final BEACON standard. For this reason too, we still need feedback and proofreaders for the current draft of the BEACON specification.

Tags: BEACON, Linked Data, Semantic Web

Goethe Explains the Semantic Web

20 May 2012 at 15:49, 4 comments

Ever since Google introduced the „Knowledge Graph“ a few days ago, there has been grumbling in the Semantic Web community. Google, so the complaint goes, simply steals ideas and techniques that have been developed for years under the names „Linked Data“ and „Semantic Web“, and sells the whole thing anew under a different name! I find both the fuss and the thoughtless use of words like „Knowledge“ and „Semantic“ on both sides silly.

Fantasies of thinking machines that present „facts“ as if they were objective judgments without social origin and context have now simply become mainstream. Yet even with artificial intelligence it is, and always will be, people who decide what computers link and present. As Frank Rieger just wrote in the FAZ:

They are „our machines“, not „the machines“. They have […] no consciousness, no will, no intentions. They are designed, built, and deployed by people who pursue intentions and goals with them – following the zeitgeist, mostly the maximization of profit and positions of power.

In a weaker form, the mistaken belief in knowing computers shows up in the focus on „information“, while in most cases it is data that is being processed. Instead of a „Knowledge Graph“ I would therefore rather have a „Document Graph“ in which the origin of statements and the changes made to them can be traced. Ted Nelson, the inventor of hypertext, coined the term „Docuverse“ for this. As he writes in his correction of Tim Berners-Lee: „not ‘all the world’s information’, but all the world’s documents.“ This transparency, however, is not in Google's interest, and for the Semantic Web community, handling statements about statements is simply too laborious.

I therefore had to laugh out loud when Google opened another blog post, on the publication of weighted word lists, with a quotation from Goethe's Faust:

Yet in each word some concept there must be…

In the „Docuverse“ this quotation would be embedded by transclusion in such a way that the path back to the original could be traced. Here is the context of the quotation, from Wikisource:

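Mephistopheles: […] Im Ganzen – haltet euch an Worte! Dann geht ihr durch die sichre Pforte Zum Tempel der Gewißheit ein. (On the whole, stick to words! Then you enter through the safe gate into the temple of certainty.)

Schüler: Doch ein Begriff muß bey dem Worte seyn. (Student: But there must be a concept with the word.)

Mephistopheles: Schon gut! Nur muß man sich nicht allzu ängstlich quälen; Denn eben wo Begriffe fehlen, Da stellt ein Wort zur rechten Zeit sich ein. Mit Worten läßt sich trefflich streiten, Mit Worten ein System bereiten, An Worte läßt sich trefflich glauben, Von einem Wort läßt sich kein Jota rauben. (Very well! Only one must not torment oneself too anxiously; for precisely where concepts are lacking, a word turns up at the right moment. With words one can argue splendidly, with words build a system, in words one can believe splendidly, and from a word not one iota can be taken.)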

On closer inspection, the answer of Google (and not only Google) to the student's objection quoted above resembles the devil's answer, where the „system“ being „built“ for us here is an algorithmic one, based not on concepts but on word lists and other statistical methods.

In the Zeitschrift für kritische Theorie, Marcus Hawel explains with regard to this very quotation from Goethe (or Google) that concepts remain uncritical as long as they merely „duplicate“ what exists in a positivist manner, without regard to the „Seinsollen des Dings“, i.e. what the thing ought to be (cf. Adorno). But when Google, the Semantic Web, or any other computer system is granted normative power, the fun stops (and not only because of the paradoxes of deontic logic). It seems to me that the semantic knowledge world lacks language criticism, semiotics, and critical theory.

Tags: hypertext, Semantic Web, xanadu

The Limits of the Semantic Web

2 November 2011 at 18:42, 4 comments

There are several reasons why the Semantic Web, as it was proposed about ten years ago, does not work. The main criticisms were put forward several years ago and have lost none of their validity since. That is why people now tend to speak of „Linked Data“ rather than „semantic“, without, however, giving up the advertising appeal of „semantic technologies“.

Because of the high expectations kept alive in this way, there is astonishment again and again whenever the promises are to be redeemed. Last week, for example, a practical study was reported in which a few simple questions were to be answered using linked RDF data (Reck, Ronald P., Kenneth B. Sall and Wendy A. Swanbeck: Determining the Impact of Eric Clapton on Music Using RDF Graphs: Selected Challenges of Semantics Across and Within Datasets. Balisage 2011). The study reminded me of the futile attempt last year to answer a simple question with Linked Data. Apparently the non-uniform and inconsistent data are to blame. Strictly speaking, though, it is people and reality that simply refuse to stick to rigid schemas and rules and instead break down into countless individual cases. That is why the attempt to automate human judgment is an illusion.

The limits of the Semantic Web lie where people assess different sources and draw conclusions from differing information. These conclusions, however, have little to do with automatic reasoning and inference rules, and much to do with common sense and personal decisions. No system, however sophisticated, can relieve us of the task of using our own minds.

As the studies show, the attempt to automate thinking leads to meaningless and wrong results in the Semantic Web. This happens all the faster the more data from different sources are merged and, without thinking (that is, automatically), processed into further data by inference rules („Six degrees of fallacy“). It therefore makes more sense to select sources individually and deliberately. This applies above all to the choice of ontologies and automatic derivation rules. It is unavoidable that ontologies are reinterpreted and changed depending on the use case. Otherwise a completely separate ontology would have to be created for every application.

Despite all the criticism, however, I do not consider the Semantic Web and Linked Data to be myths of paradise on earth: as long as one is aware that people cannot be fundamentally changed, it is not only legitimate but indispensable to work on getting closer to paradise. That does not mean we will someday arrive in semantic data heaven; but at least some problems of aggregating metadata can be eased a little with RDF – no more and no less.

Tags: AI, Semantic Web

Proposed changes in VIAF RDF

13 April 2011 at 13:42, 2 comments

The Virtual International Authority File (VIAF) is one of the distinguished showcases of international library community projects. For more than five years, name authority files from different countries have been mapped in VIAF. With VIAF you can look up records about authors and other people, and see which identifiers are used for the same person in different national library catalogs. For some people there are also links to biographical articles in Wikipedia (I think only English Wikipedia, but you can get some mappings to other Wikipedias via the MediaWiki API), and I hope that there will be links to LibraryThing author pages, too.

However, for two reasons VIAF is not used as much as it could be: first, there is not enough easy-to-understand documentation, examples, and simple APIs; and second, potential users have difficulties adopting technologies. Unfortunately the second reason is the larger barrier: many libraries cannot even provide a simple way to link directly to publications by and/or about a specific person once you have the right person identifier from VIAF. If you cannot even provide such a fundamental method to link to your database, how should you be able to integrate VIAF for better retrieval? VIAF can do little about this lack of technical skills in libraries; it can only help with integrating VIAF services into library software to some degree. This brings me to the other reason: you can always further improve documentation, examples, the design of your APIs, etc. to simplify the use of your services. As a developer I found VIAF well documented and not very difficult to use, but there are many small things that could be made better. This is natural and a good thing if you communicate with your users and adopt suggested changes, as VIAF does.

For instance, yesterday Jeffrey A. Young, one of the developers behind VIAF at OCLC, published a blog article about proposed changes to the RDF encoding of VIAF. I hope that other people will join the discussion so we can make VIAF more usable. There is also a discussion about the changes on the library linked data mailing list. And earlier this month, on the Code4Lib mailing list, there was a controversial thread about the problems of mapping authority records that are not about people (see my statement here).

I appreciate the simplification of VIAF RDF and only disagree in some details. The current proposal is illustrated in this picture (copied from Jeffrey’s original article):

This looks straightforward, doesn’t it? But it only suits simple one-to-one mappings. Any attempt to put more complex mappings into this scheme (as well as the existing VIAF RDF scheme) will result in a disaster. There is nothing wrong with simple one-to-one mappings; with SKOS you can even express different kinds of mappings (broader, narrower, exact, close), but you should not expect too much precision and detail. I wonder why on one side of the diagram links are expressed via foaf:focus and on the other side via owl:sameAs. In my opinion, as VIAF is about mapping authority files, all mapping links should use SKOS mapping properties. There is nothing wrong with declaring a URI like http://viaf.org/viaf/39377930/ to stand for a foaf:Person, a rdaEnt:Person, and a skos:Concept all at once. And the web page that gives you information about the person can also get the same URI (see this article for a good defense of the HTTP-303 mess). Sure, Semantic Web purists, who still dream of hard artificial intelligence, will disagree. But in the end RDF data is always about something instead of being the thing itself. For practical use it would help much more to think about how to map complex concepts at the level of concept schemes (authority records, classifications, thesauri etc.) instead of trying to find a „right“ model of reality. As soon as we use language (and data is a specific kind of language), all we have is concepts. In terms of RDF: using owl:Thing instead of skos:Concept is, in most cases, an illusion of control.
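
To make the suggestion a bit more concrete, here is a minimal Python sketch of what mapping links with SKOS mapping properties could look like, written out as N-Triples. It is not the VIAF proposal and not how VIAF actually encodes its data: the function name `skos_mapping` and the authority record URI under example.org are made up, and only the VIAF URI comes from the text above.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# The four kinds of mapping mentioned above, as SKOS mapping properties.
MAPPING_PROPERTIES = {
    "exact": SKOS + "exactMatch",
    "close": SKOS + "closeMatch",
    "broader": SKOS + "broadMatch",
    "narrower": SKOS + "narrowMatch",
}


def skos_mapping(viaf_uri, authority_uri, kind="exact"):
    """Return one N-Triples line that maps a VIAF URI to an authority record."""
    if kind not in MAPPING_PROPERTIES:
        raise ValueError(f"unknown mapping kind: {kind!r}")
    return f"<{viaf_uri}> <{MAPPING_PROPERTIES[kind]}> <{authority_uri}> ."


# Hypothetical authority record URI, for illustration only.
print(skos_mapping(
    "http://viaf.org/viaf/39377930/",
    "http://example.org/authority/123",
    kind="close",
))
```

Choosing `close` rather than `exact` is the kind of judgment a cataloger has to make; the code only records it.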

Tags: Identifier, rdf, Semantic Web, VIAF

Named Entity Recognition with DBPedia

15 February 2011 at 14:55, 5 comments

Yesterday the DBPedia team released DBPedia Spotlight, a named entity recognition service based on structured data extracted from Wikipedia. You can access the service via Web APIs or download the software as Open Source. I could not resist feeding Spotlight its own description:

DBpedia Spotlight is a tool for annotating mentions of DBpedia resources in text, providing a solution for linking unstructured information sources to the Linked Open Data cloud through DBpedia. Text annotation has the potential of enhancing a wide range of applications including search, faceted browsing and navigation. By connecting text documents with DBpedia, our system enables a range of interesting use cases. For instance, the ontology can be used as background knowledge to display complementary information on web pages or to enhance information retrieval tasks. Moreover, faceted browsing over documents and customization of web feeds based on semantics become feasible. Finally, by following links from DBpedia into other data sources, the Linked Open Data cloud is pulled closer to the Web of Documents.

Pretty cool, isn’t it? Natural Language Processing (NLP) for information extraction seems to be the next hype after Web 2.0 and Semantic Web. I don’t deny the innovative capabilities of DBPedia Spotlight and similar tools, but you should never forget that these are just tools, which won’t automatically solve information problems or replace all other tools. Given the example above, there is little chance that an automatic system will extract an exact topic of the text for you (for instance „named entity recognition based on data extracted from Wikipedia“), because this requires a lot of background knowledge that combines domain-specific expertise with common sense. By the way: as long as both Wikipedia and NLP software are mainly written by white males, the results will always mirror a limited world-view.

You can compare the results of Spotlight with similar open services:

I found little overlap between the different services. Spotlight seems to provide more results (depending on the text), with an error rate between 10% and 30%. You could use such tools for automatic subject indexing based on abstracts and use the result at least for ranking. Unfortunately, in library metadata we often have no full text or abstract to annotate. Furthermore, many library entities have no DBPedia entry, but catalogers create new authority records if needed. What do you think named entity recognition and other NLP techniques can be used for in metadata land? Can we give up controlled subject indexing in libraries in favour of automatic NLP-based indexing on the one side and social tagging on the other? Or is there room for all of these approaches, and how can you successfully combine them?

Tags: NLP, Semantic Web, Wikipedia

What is Semantic Information Retrieval?

19 August 2010 at 00:45, no comments

The most fun part of my dissertation is when I can ~~procrastinate~~ dig deeply into the foundations of computer and information science. Lately I tried to find out when the terms „file“ and „directory“ were coined in their current sense. The first commercial disk drive was the IBM 350, introduced in 1956. It had the size of a wardrobe, stored 4.4 megabytes of 6-bit characters, and could be leased for $3,200 per month. Instances of it were also called „files“. But user files first appeared in the early 1960s with the Compatible Time-Sharing System (CTSS), the earliest ancestor of Unix. You should watch this great video from 1964 in which Robert Fano talks about making computers accessible to people. A wonderful demonstration of one of the very first command lines of a multi-user system! The explicit aims and concepts of computer systems were very similar to today's. The more I read about the history of computing, the more it seems that all important concepts were developed in the 1960s and 1970s. The rest is just reinvention and application on a broader scale.

Robert Fano was director of Project MAC, a laboratory that brought together pioneers in operating systems, artificial intelligence, and other areas of the emerging discipline of computer science. I browsed the historical publications of the laboratory at MIT, where you can find a report on CTSS. Among the publications from MAC in 1964, I stumbled upon Bertram Raphael’s PhD thesis. It is titled SIR: A COMPUTER PROGRAM FOR SEMANTIC INFORMATION RETRIEVAL and its abstract sounds like today's Semantic Web propaganda:

This system demonstrates what can reasonably be called an ability to „understand“ semantic information. SIR’s semantic and deductive ability is based on the construction of an internal model, which uses word associations and property lists, for the relational information normally conveyed in conversational statements. […] The system has some capacity to recognize exceptions to general rules, resolve certain semantic ambiguities, and modify its model structure in order to save computer memory space.

The SIR expert system even seems to go beyond current RDF techniques in supporting exceptions. By the way, Bertram Raphael was at MAC at the same time as Joseph Weizenbaum. Weizenbaum fooled expectations of artificial intelligence with his program ELIZA, which he created between 1964 and 1966. He later became an important critic of artificial intelligence and of the application of computer technology in general. We need more people like him instead of well-meaning, megalomaniac technology evangelists. See the documentary Rebel at work about Weizenbaum or, even better, the promising film Plug & Pray!

So what is Semantic Information Retrieval? In short: bullshit. The term is also used independently for search indices on graph-structured data (2009), digital libraries (1998), and more. But why bother with words, meaning, and history if computers will surely „understand“ soon?

Tags: Dissertation, hype, Meaning, Semantic Web

News on the Zeitschriftendatenbank

21 May 2009 at 12:33, no comments

As Jürgen Plieninger reports, the website of the Zeitschriftendatenbank (ZDB) was redesigned at the beginning of this week and moved to the content management system (CMS) Typo3. The ZDB's discovery interface (aka OPAC) is unaffected by the changes. To achieve a lasting improvement there, PICA users should, in my opinion, get together and build a new interface for PICA catalogs on an open source basis (!). The DNB, for example, has set up its own portal, and at the VZG various alternatives are being tried out – e.g. the Suchkiste – but taken together this is still too little and too uncoordinated. But that is another topic.

From the RSS feed of the ZDB website one learns, for example, that the CD-ROM edition was recently discontinued – so the ZDB has finally arrived on the Web. To also arrive, so to speak, in the „Semantic Web“ or „Web 3.0“, that is, to keep pace with current developments on the Web, the ZDB should next become fit for Linked Open Data. For this, first and foremost, stable URIs have to be assigned and the ZDB data made available. Second, it seems there is never enough documentation and public relations work, even in technical matters. On the mailing list of the Bibliographic Ontology there has been discussion for several weeks about how serials data can best be represented in RDF and which data can be drawn on for this. The ZDB has been mentioned there, but has not yet taken an active part in the discussion.

Tags: ISSN, Semantic Web, ZDB, Zeitschriften

Unique Identifiers for Authors, VIAF and Linked Open Data

20 May 2009 at 15:53, 1 comment

The topic of unique identifiers for authors is getting more and more attention on the Web. Martin Fenner listed some research papers about it and did a quick poll – you can see the results in a short presentation [via infobib]. What struck me about the results is how unknown existing traditional identifier systems for authors are: libraries have managed so-called „authority files“ for years. The German Wikipedia has cooperated with the German National Library since 2005 to link biographical Wikipedia articles [de] with the German name authority file, and there is a similar project in the Czech Wikipedia.

Maybe name authority files of libraries are so unknown because they have not been visible on the Web – but this is changing. An important project to combine authority files is the Virtual International Authority File (VIAF). At the moment it already contains mappings between name authority files of six national libraries (USA, Germany, France, Sweden, Czech Republic, and Israel), and more are going to be added. At an ELAG 2008 Workshop in Bratislava I talked with VIAF project manager Thomas Hickey (OCLC) about also getting VIAF and its participating authority files into the Semantic Web. He just wrote about recent changes in VIAF: by now it contains almost 8 million records!

So why are people thinking about creating other systems of unique identifiers for authors if there already is an infrastructure? The survey that Martin did showed that a centralized registry is wanted. VIAF is an aggregator of distributed authority files which are managed by national libraries. This architecture has several advantages: for instance, it is non-commercial, and data is managed where it can be managed best (Czech librarians can better identify Czech authors, Israeli librarians can better identify authors from Israel, and so on). One drawback is that libraries are technically slow – many of them have not really switched to the Web and the digital age. For instance, up to now there are no official URIs for Czech and Israeli authority records, and VIAF is not yet connected to Linked Open Data. But the more people reuse library data instead of reinventing wheels, the faster and easier it gets.

For demonstration purposes I created a SeeAlso wrapper for VIAF that extracts RDF triples of the mappings. At http://ws.gbv.de/seealso/viafmappings you can try it out by submitting authority record URIs or the authority record codes used at VIAF. For instance, query for LC|n 79003362 in Notation3 to get a mapping for Goethe. Some returned URIs are also cool URLs, for instance at the DNB or the VIAF URI itself. At the moment owl:sameAs is used to specify the mappings; maybe the SKOS vocabulary provides better properties. You can argue a lot about how to encode information about authors, but the unique identifiers – that you can link to – already exist!

Tags: Identifier, Normdaten, Semantic Web, VIAF

Where Libraries Can Be Looked Up

3 March 2009 at 20:02, 3 comments

Cataloging, that is, recording data records in a uniform way, is (at least for now) one of the typical activities performed by libraries. And since libraries like to occupy themselves with themselves, it is not surprising that they have created catalogs in which libraries are listed. Unfortunately, everyone does their own thing, so that numerous overlapping directories and databases of libraries exist, which are poorly maintained and therefore partly contradict one another. As soon as something changes or is added, the details theoretically have to be updated in umpteen databases – which of course does not happen in practice. It does not have to be this way.

The Semantic Web was developed to connect distributed data holdings with one another over the Web. Once information is no longer managed only in data silos sealed off from one another but published openly on the net as Linked Data, in many cases it suffices to refer to other databases and to enrich the data with one's own details. A suitable common identifier for linking data about libraries is the former Bibliothekssigel (library code), which is currently being converted to ISIL. One advantage of the ISIL system is that ISILs are valid internationally. The ISIL Agency maintains a list of national ISIL agencies, which also includes the ISIL/Sigelverzeichnis at the Staatsbibliothek zu Berlin.

Other directories of libraries include:

Deutsche Bibliotheken Online is a directory run by the Hochschulbibliothekszentrum hbz.
The Jahrbuch der Deutschen Bibliotheken and the Jahrbuch der Öffentlichen Bibliotheken each contain library data on dead trees and look nice on the shelf.
WEBIS lists libraries with special subject collections (Sondersammelgebiete).
lib-web-cats (library web sites and catalogs) is a directory maintained by Marshall Breeding that focuses on US libraries and mainly records their technical equipment.
LibWeb is another international directory of libraries, although it records only name, location, and URL.
OCLC believes it can manage everything centrally in WorldCat and provides the WorldCat Registry for library data.
…

There are certainly numerous other databases. So there is still quite a bit to merge and link before libraries, or rather their data, arrive in the Semantic Web.
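
How little is needed once a common identifier exists can be shown with a small Python sketch that merges records from two directories by their ISIL. Everything in it is made up for illustration: the function name `merge_by_isil`, the field names, and the records. „XX-Example1“ is not a real ISIL, and a plain dictionary merge is of course not Linked Data yet, only the joining step that a shared identifier makes possible.

```python
def merge_by_isil(*directories):
    """Combine records from several directories, keyed by their ISIL."""
    merged = {}
    for directory in directories:
        for record in directory:
            entry = merged.setdefault(record["isil"], {})
            for field, value in record.items():
                # Keep the first value seen; later sources only fill gaps.
                entry.setdefault(field, value)
    return merged


# Hypothetical records from two hypothetical directories.
directory_a = [
    {"isil": "XX-Example1", "name": "Example Library", "url": "http://example.org/"},
]
directory_b = [
    {"isil": "XX-Example1", "name": "Example Libr.", "opening_hours": "Mon-Fri 9-17"},
]

merged = merge_by_isil(directory_a, directory_b)
print(merged["XX-Example1"])
```

Which source wins on conflicting values (here simply the first one) is exactly the kind of decision that still has to be made by people.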

Tags: Adressverwaltung, Bibliothek, ISIL, Semantic Web
