30 July 2012

Databib, a proposed bibliography of research data repositories, is calling for editors. These editors will review submissions and edits to the bibliography. There is already an advisory board, which gives Databib an academic touch.

The number of data repositories is growing fast, so it’s good to have an overview of existing repositories, such as Databib. The number of similar collections of data repositories, however, is also growing. For instance, as Daniel Kinzler noted in response to me, there is one hosted by the Open Knowledge Foundation and edited by volunteers. It has no advisory board, which gives it an open community touch. And there are lists such as the list of repositories known to DataCite, the wiki-based list at Open Access Directory, the DFG-funded project (which will likely be closed after funding stops, as is known from most DFG-funded projects), and many, many more.

One may ask why people cannot agree on either one list of repositories or at least one interchange format to create a virtual bibliography. Welcome to the multifaceted world of cataloging! I think there are reasons to have multiple collections: for instance, there are different groups of users and different definitions of a [research] data repository (if there is any definition at all). One should at least be clear about the following:

Any list or collection of data repositories is an instance of a bibliography, similar to a library catalog. Managing bibliographies and catalogs is more difficult than some imagine, but it’s nothing new and it’s not rocket science. So people should not try to reinvent the wheel but build on established cataloging practices. Above all, one should (re)use identifiers to refer to repositories, and one should not just ask for free-text input but use existing controlled vocabularies and authority files. This should also be familiar to people used to Linked Open Data.
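To make the point about identifiers and controlled vocabularies a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `repo:` and `subj:` codes, the `SUBJECTS` table, and the `RepositoryEntry` and `merge` names are made up and do not come from Databib or any other list mentioned here. It only shows the idea that entries carry a reusable identifier and vocabulary codes instead of free text, so that several lists can be combined into one virtual bibliography.

```python
from dataclasses import dataclass

# Toy controlled vocabulary (hypothetical codes, not a real authority file).
SUBJECTS = {
    "subj:001": "Earth sciences",
    "subj:002": "Life sciences",
    "subj:003": "Social sciences",
}


@dataclass(frozen=True)
class RepositoryEntry:
    identifier: str   # stable identifier, reused across lists (hypothetical scheme)
    name: str
    subjects: tuple   # codes from SUBJECTS, not free text

    def __post_init__(self):
        # Reject anything that is not a known vocabulary code.
        unknown = [s for s in self.subjects if s not in SUBJECTS]
        if unknown:
            raise ValueError(f"not in controlled vocabulary: {unknown}")


def merge(*catalogs):
    """Combine several lists into one, keeping the first entry per identifier."""
    merged = {}
    for catalog in catalogs:
        for entry in catalog:
            merged.setdefault(entry.identifier, entry)
    return list(merged.values())
```

With shared identifiers, two lists that describe the same repository under slightly different names collapse into a single entry when merged. With free-text names and subjects, that deduplication would need guesswork.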

By the way, any collection of data repositories is itself a data repository. Adding another level above may not really help. Maybe one should just treat published research data as one instance of a digital publication and catalog it together with other publications? What defines a “dataset” in contrast to other digital publications? In the end it’s all a stream of bits, isn’t it? ;-)

  1. That the description of a data set (as well as of a repository of data sets) should be considered a form of bibliography is discussed in “Data management as bibliography”, Bulletin of ASIST.

    Comment by Michael Buckland, 31 July 2012

  2. Thanks for the reference! To identify the issues of data cataloging, I’d better speak of non-media resources instead of non-textual resources. Text, images, audio, video and similar media are relatively easy to read, while other forms of data depend much more on context to be of any use.

    Comment by jakob, 31 July 2012

