Embedded diagrams and pandoc

24 January 2012 at 13:02, no comments

If you don’t know John MacFarlane’s Pandoc, the „Swiss army knife of document formats“, you should definitely give it a try! Pandoc’s abstract document model and its serialization in an extended variant of Markdown let you focus on the structure and content of a text instead of dealing with formats and user interfaces. In my opinion pandoc is the best tool for document creation invented since (La)TeX (moreover, pandoc is a good argument to finally learn programming in Haskell).

Images in pandoc markdown documents, however, are only referenced by their file name. This requires some preprocessing if you want to create different image files for different document formats, for example bitmap images for HTML and vector images for PDF. So I hacked together a little preprocessing script that lets you embed images in pandoc’s markup language. For instance you write

~~~~ {.dot .Grankdir:LR}
digraph {
A -> B -> C;
A -> C;
}
~~~~

and you get

or based on rdfdot you write

~~~~ {.rdfdot}
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@base <http://example.com/> .
<alice> foaf:name "Alice" ;
    foaf:knows [ foaf:name "Bob" ] .
~~~~

and you get

A detailed description is included in the manual which has been transformed automatically to HTML and to PDF. Compare both documents to see that HTML includes PNG images and PDF contains vector images!
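The preprocessing step can be sketched roughly as follows. This is not the actual script but a minimal Python illustration under some assumptions: the function name `preprocess`, the generated file names, and the rule that a class such as `.Grankdir:LR` is passed to Graphviz as `-Grankdir=LR` are all made up for this example. The sketch only rewrites the markup and collects the `dot` commands that would have to be run; it does not call Graphviz itself.

```python
import hashlib
import re

# Opening fence with attributes, e.g. "~~~~ {.dot .Grankdir:LR}"
FENCE = re.compile(r"^~{3,}\s*\{([^}]*)\}\s*$")
# Closing fence (simplified: any run of three or more tildes)
FENCE_END = re.compile(r"^~{3,}\s*$")


def preprocess(markdown, image_ext):
    """Replace {.dot} fenced blocks by image references.

    Returns the rewritten markdown and a list of render jobs, each a
    tuple (file name, Graphviz command line, dot source).
    """
    out, jobs = [], []
    lines = markdown.split("\n")
    i = 0
    while i < len(lines):
        match = FENCE.match(lines[i])
        classes = match.group(1).split() if match else []
        if ".dot" not in classes:
            out.append(lines[i])  # ordinary line: copy unchanged
            i += 1
            continue
        # Collect the diagram source up to the closing fence.
        body = []
        i += 1
        while i < len(lines) and not FENCE_END.match(lines[i]):
            body.append(lines[i])
            i += 1
        i += 1  # skip the closing fence
        source = "\n".join(body)
        # Assumed convention: ".Gname:value" becomes "-Gname=value".
        args = ["-" + c[1:].replace(":", "=", 1)
                for c in classes if c.startswith(".G")]
        digest = hashlib.sha1(source.encode("utf-8")).hexdigest()[:8]
        name = "diagram-" + digest + "." + image_ext
        jobs.append((name, ["dot", "-T" + image_ext] + args + ["-o", name], source))
        out.append("![](%s)" % name)
    return "\n".join(out), jobs
```

Calling `preprocess(text, "png")` before creating HTML and `preprocess(text, "pdf")` before creating PDF would give each output format its own image type, which is the point of the whole exercise.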

Feel free to reuse and modify the script, for instance by adding more diagram types! How about ASCII tabs and ABC notation if you write about music?

Linked local library data simplified

10 January 2012 at 14:53, 1 comment

A few days ago Lukas Koster wrote an article about local library linked data. He argues that bibliographic data published by libraries as linked data is not „the most interesting type of data that libraries can provide“. Instead „library data that is really unique and interesting is administrative information about holdings and circulation“. So libraries „should focus on holdings and circulation data, and for the rest link to available bibliographic metadata as much as possible.“ I fully agree with these statements, but not with the exact method of how to accomplish the publication of local library data.

Among other projects, Koster points to LibraryCloud to aggregate and deliver library metadata, but it looks like they reinvent yet more wheels in the form of their own APIs and formats for search and for bibliographic description. Maybe I am wrong about this project, as they have just started to collect holding and circulation data.

At the recent Semantic Web in Bibliotheken conference, Magnus Pfeffer gave a presentation about „Publishing and consuming library loan information as linked open data“ (see slides) and I talked about a Simplified Ontology for Bibliographic Resources (SOBR), which is mainly based on the DAIA data model. We are going to align both data models and I hope that the next libraries will first look at these existing solutions instead of inventing yet another data format or ontology. Koster’s proposal, however, looks like yet another such solution: he argues that „we need an extra explicit level to link physical Items owned by the library or online subscriptions of the library to the appropriate shared network level“ and suggests introducing a „holding“ level. So there would be five levels of description:

  • Work
  • Expression
  • Manifestation
  • Holding
  • Item

Apart from the fact that at least one of Work, Expression, Manifestation is dispensable, I disagree with a Holding level above the Item. My current model consists of at most three levels of documents:

  • document as abstract work (frbr:Work, schema:CreativeWork…)
  • bibliographic document (frbr:Manifestation, sobr:Edition…)
  • item as concrete single copy (frbr:Item…)

The term „level“ is misleading because these classes are not disjoint. I depicted their relationship in a simple Venn diagram:

For local library data, we are interested in single items, which are copies of general documents or editions. Where do Koster’s „holding“ entities fit into this model? He writes „a specific Holding in this way would indicate that a specific library has one or more copies (Items) of a specific edition of a work (Manifestation), or offers access to an online digital article by way of a subscription.“ The core concepts as I read them are:

  • „one or more copies (Items)“ = frbr:Item
  • „specific edition of a work (Manifestation)“ = sobr:Edition or frbr:Manifestation
  • „has one […] or offer access to“ = ???

Instead of creating another entity for holdings, you can express the ability „to have one or offer access to“ by DAIA Services. The class daia:Service can be used for an unspecified service and more specific subclasses such as loan, presentation, and openaccess can be used if more is known. Here is a real example with all „levels“:


@prefix bibo: <http://purl.org/ontology/bibo/> .
@prefix daia: <http://purl.org/ontology/daia/> .

<http://dbpedia.org/resource/Tractatus_Logico-Philosophicus>
    a bibo:Book ;
    daia:edition <urn:nbn:de:eki:GBV128382600> ;
    daia:exemplar
        <http://uri.gbv.de/document/opac-de-23:epn:266449999> .

<urn:nbn:de:eki:GBV128382600> a bibo:Book ;
    daia:exemplar
        <http://uri.gbv.de/document/opac-de-23:epn:266449999> .

<http://uri.gbv.de/document/opac-de-23:epn:266449999>
    a bibo:Book, daia:Item ;
    daia:heldBy <http://uri.gbv.de/organization/isil/DE-23> ;
    daia:availableFor [
        a daia:Service ;
        daia:providedBy <http://uri.gbv.de/organization/isil/DE-23>
    ] .

The only RDF property I made up is daia:edition, taken from the SOBR proposal, because the FRBR relations are too strict. If you know a better property to directly relate an abstract work to a concrete edition, please let me know.
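The argument that a „holding“ does not need its own entity can also be sketched in code. The following Python fragment is only an illustration: the classes `Service` and `Item` and the function `holdings` are hypothetical names, not part of DAIA or SOBR. It shows that what a library „has or offers access to“ can be derived from items and their services alone.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Service:
    kind: str         # e.g. "loan", "presentation", "openaccess", or "service" if unspecified
    provided_by: str  # URI of the organisation that provides the service


@dataclass
class Item:
    uri: str
    edition: str      # URI of the edition this item is a copy of
    held_by: str      # URI of the organisation that holds the item
    available_for: list = field(default_factory=list)


def holdings(items, library):
    """Group, by edition, the items a library holds or offers a service for."""
    result = {}
    for item in items:
        offers = item.held_by == library or any(
            service.provided_by == library for service in item.available_for
        )
        if offers:
            result.setdefault(item.edition, []).append(item.uri)
    return result
```

With the item from the example above, `holdings(items, "http://uri.gbv.de/organization/isil/DE-23")` would list that single copy under its edition, without any holding record being stored anywhere.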



image created with rdfdot

Request for comments: final specification of DAIA

6 January 2012 at 12:13, 4 comments

When I started to create an API for availability lookup of documents in libraries in 2008, I was surprised that such a basic service was so poorly defined. The best I could find was the just-published recommendation of the Digital Library Federation (DLF-ILS). Even there, availability status was basically a plain text message (section 6.3.1 and appendices 4 and 5). Other parts of the DLF-ILS GetAvailability response were more helpful, so they are all part of the Document Availability Information API (DAIA). Here is a simple mapping from DLF-ILS to DAIA:

  • bibliographicIdentifier (string) → document (URI)
  • itemIdentifier (string) → item (URI)
  • dateAvailable (dateTime) → expected (xs:dateTime or xs:date or „unknown“) or delay (xs:duration or „unknown“)
  • location (string) → storage (URI and/or string, plus optional URL)
  • call number (string) → label (string)
  • holdQueueLength (int) → queue (xs:nonNegativeInteger)
  • status (string) and circulating (boolean) → available/unavailable (with service type and additional information)
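The mapping above can be sketched as a small conversion function. This is a simplified illustration, not the DAIA/JSON format: the flat dictionary layout, the key `callNumber`, and the rule that an item counts as available for loan only if `circulating` is true and the free-text status reads „available“ are assumptions made for this example.

```python
# DLF-ILS field name -> DAIA concept (names follow the list above)
FIELD_MAP = {
    "bibliographicIdentifier": "document",
    "itemIdentifier": "item",
    "dateAvailable": "expected",
    "location": "storage",
    "callNumber": "label",
    "holdQueueLength": "queue",
}


def dlf_to_daia(record):
    """Rename DLF-ILS fields and derive an available/unavailable entry."""
    daia = {FIELD_MAP[key]: value for key, value in record.items() if key in FIELD_MAP}
    # DLF-ILS status is plain text, so this comparison is only a guess.
    status = str(record.get("status", "")).strip().lower()
    is_available = bool(record.get("circulating")) and status == "available"
    state = "available" if is_available else "unavailable"
    daia[state] = [{"service": "loan"}]
    return daia
```

The last step is where the two models really differ: DLF-ILS leaves the status as text, while DAIA asks for an explicit statement per service type.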

So you could say that DAIA implements the abstract GetAvailability function from DLF-ILS. I like abstract, language-independent specifications, but they must be precise and testable (see Meek’s forgotten paper The seven golden rules for producing language-independent standards). DAIA is more than an implementation: it provides both an abstract standard and bindings to several data languages (XML, JSON, and RDF). The conceptual DAIA data model defines some basic concepts and relationships (documents, items, organisations, locations, services, availabilities, limitations…) independent of whether they are expressed in XML elements, attributes, RDF properties, classes, or any other data structuring method. The only reference to specific formats is the requirement that all unique identifiers must be URIs. Right now there is an XML Schema if you want to express DAIA in XML and an OWL ontology for RDF.

In its fourth year of development (see my previous posts from 2009) DAIA seems to have enough momentum to finally get accepted in practice. We use it in the GBV library union (public server at http://daia.gbv.de/), there are independent implementations such as in Doctor-Doc, there is client support in VuFind, and I have heard rumors that DAIA capabilities will be built into EBSCO and Summon Discovery Services. Native support in Integrated Library Systems, however, is still lacking; I have already given up hope and prefer a clean DAIA wrapper over a broken DAIA implementation anyway. If you are interested in creating your own DAIA server/wrapper or client, have a look at my reference implementations DAIA and Plack::App::DAIA on CPAN and Oliver Goldschmidt’s PHP implementation in our common GitHub repository. A conceptual overview as tree (DAIA/JSON, DAIA/XML) and as graph (DAIA/RDF) can be found here.

There are still some details to be defined, and I’d like to solve these issues to arrive at DAIA 1.0. These are:

  • How to deal with partial publications (you requested an article but only get the full book or you requested a series but only get a single volume).
  • How to deal with digital publications (especially their possible service types: is „download“ a service distinct from „loan“, or is „presentation“ similar to online access restricted to the library’s intranet?).
  • Final agreement on service types (now there are presentation: item can be used in the institution, loan: item can be used outside of the institution for a limited time, interloan: item can be sent to another institution, openaccess: item can be accessed without restriction, just get a free copy). Some extensions have been proposed.
  • A set of common limitation types (for instance IP-based access restriction, permission-based access etc.).
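As long as the set of service types is not final, a client has to cope with names it does not know. A minimal sketch in Python, assuming the four current types listed above; the enum `ServiceType` and the helper `split_services` are hypothetical names for this example only:

```python
from enum import Enum


class ServiceType(Enum):
    PRESENTATION = "presentation"  # item can be used in the institution
    LOAN = "loan"                  # item can be used outside for a limited time
    INTERLOAN = "interloan"        # item can be sent to another institution
    OPENACCESS = "openaccess"      # item can be accessed without restriction


def split_services(names):
    """Separate the current service types from proposed or unknown ones."""
    known, other = [], []
    for name in names:
        try:
            known.append(ServiceType(name))
        except ValueError:
            other.append(name)  # e.g. a proposed extension such as "download"
    return known, other
```

A client could then handle the known types properly and show the remaining names as plain text instead of failing.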

I’d be happy to get some more feedback on these issues, especially concrete use cases. We are already discussing them on the daia-devel mailing list, but you can also comment in your own blog, at public-lld, code4lib, ils-di etc.

P.S.: Following an article by Adrian I started to collect open questions and comments as issues at GitHub.