Wikidata documentation on the 2017 Hackathon in Vienna

21. Mai 2017 um 15:21 4 Kommentare

At Wikimedia Hackathon 2017, a couple of volunteers sat together to work on the help pages of Wikidata. As part of that Wikidata documentation sprint. Ziko and me took a look at the Wikidata glossary. We identified several shortcomings and made a list of rules how the glossary should look like. The result are the glossary guidelines. Where the old glossary partly replicated Wikidata:Introduction, the new version aims to allow quick lookup of concepts. We already rewrote some entries of the glossary according to these guidelines but several entries are outdated and need to be improved still. We changed the structure of the glossary into a sortable table so it can be displayed as alphabetical list in all languages. The entries can still be translated with the translation system (it took some time to get familiar with this feature).

We also created some missing help pages such as Help:Wikimedia and Help:Wikibase to explain general concepts with regard to Wikidata. Some of these concepts are already explained elsewhere but Wikidata needs at least short introductions especially written for Wikidata users.

Image taken by Andrew Lih (CC-BY-SA)

Introduction to Phabricator at Wikimedia Hackathon

20. Mai 2017 um 09:44 1 Kommentar

This weekend I participate at Wikimedia Hackathon in Vienna. I mostly contribute to Wikidata related events and practice the phrase "long time no see", but I also look into some introductionary talks.

In the late afternoon of day one I attended an introduction to Phabricator project management tool given by André Klapper. Phabricator was introduced in Wikimedia Foundation about three years ago to replace and unify Bugzilla and several other management tools.

Phabricator is much more than an issue tracker for software projects (although it is mainly used for this purpose by Wikimedia developers). In summary there are tasks, projects, and teams. Tasks can be tagged, assigned, followed,discussed, and organized with milestones and workboards. The latter are Kanban-boards like those I know from Trello, waffle, and GitHub project boards.

Phabricator is Open Source so you can self-host it and add your own user management without having to pay for each new user and feature (I am looking at you, JIRA). Internally I would like to use Phabricator but for fully open projects I don’t see enough benefit compared to using GitHub.

P.S.: Wikimedia Hackathon is also organized with Phabricator. There is also a task for blogging about the event.

Some thoughts on IIIF and Metadata

5. Mai 2017 um 22:40 1 Kommentar

Yesterday at DINI AG Kim Workshop 2017 I Martin Baumgartner and Stefanie Rühle gave an introduction to the International Image Interoperability Framework (IIIF) with focus on metadata. I already knew that IIIF is a great technology for providing access to (especially large) images but I had not have a detailed look yet. The main part of IIIF is its Image API and I hope that all major media repositories (I am looking at you, Wikimedia Commons) will implement it. In addition the IIIF community has defined a „Presentation API“, a „Search API“, and an „Authentication API“. I understand the need of such additional APIs within the IIIF community, but I doubt that solving the underlying problems with their own standards (instead of reusing existing standards) is the right way to go. Standards should better „Do One Thing and Do It Well“ (Unix philosophy). If Images are the „One Thing“ of IIIF, then Search and Authentication are different matter.

In the workshop we only looked at parts of the Presentation API to see where metadata (creator, dates, places, provenance etc. and structural metadata such as lists and hierarchies) could be integrated into IIIF. Such metadata is already expressed in many other formats such as METS/MODS and TEI so the question is not whether to use IIIF or other metadata standards but how to connect IIIF with existing metadata standards. A quick look at the Presentation API surprised me to find out that the metadata element is explicitly not intended for additional metadata but only „to be displayed to the user“. The element contains an ordered list of key-value pairs that „might be used to convey the author of the work, information about its creation, a brief physical description, or ownership information, amongst other use cases“. At the same time the standard emphasizes that „there are no semantics conveyed by this information“. Hello, McFly? Without semantics conveyed it isn’t information! In particular there is no such thing as structured data (e.g. a list of key-value pairs) without semantics.

I think the design of field metadata in IIIF is based on a common misconception about the nature of (meta)data, which I already wrote about elsewhere (Sorry, German article – some background in my PhD and found by Ballsun-Stanton).

In a short discussion at Twitter Rob Sanderson (Getty) pointed out that the data format of IIIF Presentation API to describe intellectual works (called a manifest) is expressed in JSON-LD, so it can be extended by other RDF statements. For instance the field „license“ is already defined with dcterms:rights. Addition of a field „author“ for dcterms:creator only requires to define this field in the JSON-LD @context of a manifest. After some experimenting I found a possible way to connect the „meaningless“ metadata field with JSON-LD fields:

{
  "@context": [
    "http://iiif.io/api/presentation/2/context.json",
    { 
      "author": "http://purl.org/dc/terms/creator",
      "bibo": "http://purl.org/ontology/bibo/"
    }
  ],
  "@id": "http://example.org/iiif/book1/manifest",
  "@type": ["sc:Manifest", "bibo:book"],
  "metadata": [
    {
      "label": "Author",
      "property": "http://purl.org/dc/terms/creator",
      "value": "Allen Smithee"
    },
    { 
      "label": "License",
      "property": "http://purl.org/dc/terms/license",      
      "value": "CC-BY 4.0" 
    }
   ],
   "license": "http://creativecommons.org/licenses/by/4.0/",
   "author": {
     "@id": "http://www.wikidata.org/entity/Q734916",
     "label": "Allen Smithee"
   }
}

This solution requires an additional element property in the IIIF specification to connect a metadata field with its meaning. IIIF applications could then enrich the display of metadata fields for instance with links or additional translations. In JSON-LD some names such as „CC-BY 4.0“ and „Allen Smithee“ need to be given twice, but this is ok because normal names (in contrast to field names such as „Author“ and „License“) don’t have semantics.