Databib, a proposed bibliography of research data repositories, is calling for editors. These editors will review submissions and edits to the bibliography. Databib already has an advisory board, which gives it an academic touch.
The number of data repositories is growing fast, so an overview of existing repositories such as Databib is useful. The number of similar collections of data repositories, however, is also growing. For instance, as Daniel Kinzler pointed out in response to me, there is datahub.io, hosted by the Open Knowledge Foundation and edited by volunteers. It has no advisory board, which gives datahub.io an open community touch. And there are lists such as the list of repositories known to DataCite, the wiki-based list at Open Access Directory, the DFG-funded re3data.org project (which will likely be closed once funding stops, as happens with most DFG-funded projects), and many, many more.
One may ask why people cannot agree on a single list of repositories, or at least on a common interchange format to create a virtual bibliography. Welcome to the multifaceted world of cataloging! I think there are good reasons to have multiple collections: for instance, there are different groups of users and different definitions of a [research] data repository (if there is any definition at all). At the very least, one should be clear about the following:
Any list or collection of data repositories is a kind of bibliography, similar to a library catalog. Managing bibliographies and catalogs is more difficult than some imagine, but it's nothing new and no rocket science. So people should not try to reinvent the wheel but should build on established cataloging practices. Above all, one should (re)use identifiers to refer to repositories, and instead of just asking for free-text input, one should use existing controlled vocabularies and authority files. This should also be familiar to people used to Linked Open Data.
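To make the point about identifiers and controlled vocabularies a bit more concrete, here is a minimal sketch in Python of how entries from two independent lists could be merged into one virtual bibliography. All identifiers, names, field names and subject terms are made-up assumptions, and the tiny vocabulary merely stands in for a real controlled vocabulary or authority file.

```python
# Hypothetical records from two independent repository lists. Identifiers,
# names, field names and subject terms are illustrative assumptions only.
list_a = [
    {"id": "repo:0001", "name": "Example Data Archive", "subjects": ["Biology"]},
    {"id": "repo:0002", "name": "Sample Repository", "subjects": ["Physics"]},
]
list_b = [
    {"id": "repo:0001", "name": "Example Data Archive (EDA)", "subjects": ["Ecology"]},
    {"id": "repo:0003", "name": "Another Store", "subjects": ["Chemistry", "misc stuff"]},
]

# A tiny stand-in for a controlled vocabulary (hypothetical).
VOCABULARY = {"Biology", "Ecology", "Physics", "Chemistry"}


def merge(*lists):
    """Merge records that share an identifier into one virtual bibliography.

    Names from all sources are kept as variants of the same entry; subject
    terms missing from the controlled vocabulary are reported instead of
    being stored as free text.
    """
    merged = {}
    rejected = []
    for records in lists:
        for rec in records:
            # The shared identifier, not the free-text name, decides identity.
            entry = merged.setdefault(rec["id"], {"names": [], "subjects": set()})
            if rec["name"] not in entry["names"]:
                entry["names"].append(rec["name"])
            for term in rec["subjects"]:
                if term in VOCABULARY:
                    entry["subjects"].add(term)
                else:
                    rejected.append((rec["id"], term))
    return merged, rejected


bibliography, unknown_terms = merge(list_a, list_b)
for repo_id, entry in sorted(bibliography.items()):
    print(repo_id, entry["names"], sorted(entry["subjects"]))
print("not in vocabulary:", unknown_terms)
```

Because both lists use the same identifier for the first repository, its two name variants end up in a single record. Without a shared identifier, one would have to guess from the free-text names whether they refer to the same repository.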
By the way, any collection of data repositories is itself a data repository, so adding another level on top may not really help. Maybe one should just treat published research data as one more kind of digital publication and catalog it together with other publications? What defines a "dataset" in contrast to other digital publications? In the end, it's all a stream of bits, isn't it?