Databib, a proposed bibliography of research data repositories, is calling for editors. These editors will review submissions and edits to the bibliography. There is already an advisory board, giving Databib an academic touch.
The number of data repositories is growing fast, so it’s good to have an overview of existing repositories, such as the one Databib proposes. The number of similar collections of data repositories, however, is also growing. For instance, as noted by Daniel Kinzler in response to me, there is datahub.io, hosted by the Open Knowledge Foundation and edited by volunteers. There is no advisory board, giving datahub.io an open community touch. And there are lists such as the list of repositories known to DataCite, the wiki-based list at Open Access Directory, the DFG-funded re3data.org project (which will likely be shut down once funding stops, as is familiar from most DFG-funded projects), and many, many more.
One may ask why people cannot agree on either one list of repositories or at least one interchange format to create a virtual bibliography. Welcome to the multifaceted world of cataloging! I think there are reasons to have multiple collections: for instance, there are different groups of users and different definitions of a [research] data repository (if there is any definition at all). But one should at least be clear about the following:
Any list or collection of data repositories is an instance of a bibliography, similar to a library catalog. Managing bibliographies and catalogs is more difficult than some imagine, but it’s nothing new and it’s not rocket science. So people should not try to reinvent the wheel but build on established cataloging practices. Above all, one should (re)use identifiers to refer to repositories, and one should not just ask for free-text input but use existing controlled vocabularies and authority files. This should also be familiar to people used to Linked Open Data.
By the way, any collection of data repositories is itself a data repository. Adding another level above may not really help. Maybe one should just treat published research data as one instance of a digital publication and catalog it together with other publications? What defines a „dataset“ in contrast to other digital publications? In the end it’s all a stream of bits, isn’t it? 😉
At the beginning of October I was still annoyed that many repositories lag behind current development; at least E-LIS is now up to date again – which of course is no reason to rest, because development goes on: „The library is a growing organism“ (Ranganathan 1931). I hope that repository developers and operators will work together more closely and agree on further common standards besides OAI-PMH, so that repositories themselves do not degenerate into monolithic systems but can respond flexibly to new requirements.
In any case, good publications from the library and information field that have not already been published in an open access journal or in another open repository should be uploaded to E-LIS.
P.S: Unfortunately, not all publications in E-LIS are open access – in some cases registration is required, which is not only inconvenient but also pointless, because anyone can register.
From time to time I still publish on paper, so I have to deposit the publication in a repository to make it (and its metadata) available; mostly I use the „open archive for Library and Information Science“ named E-LIS. But each time I get angry because uploading and describing a submission is so complicated – especially compared to popular commercial repositories like flickr, slideshare, youtube and such. These web applications pay a lot of attention to usability – which sadly is of low priority in many digital libraries.
I soon realized that E-LIS uses a very old version (2.13.1) of GNU EPrints – EPrints 3 has been available since December 2006 and there have been many updates since then. To find out whether it is usual to run a repository with such outdated software, I did a quick study. The Registry of Open Access Repositories (ROAR) should list all relevant public repositories that run EPrints. With 30 lines of Perl I fetched the list (271 repositories) and queried each repository via OAI to find out the version number. Here is the result in summary:
76 x unknown (script failed to get or parse OAI response), 8 x 2.1, 18 x 2.2, 98 x 2.3, 58 x 3.0, 13 x 3.1
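The per-repository query can be sketched roughly as follows. This is not the original Perl script but a minimal Python illustration. `Identify` is a standard OAI-PMH verb, but where an EPrints server reports its version in the response is an assumption here: the pattern `VERSION_PATTERN` and the helper names are hypothetical and would need to be adapted to what the servers actually return.

```python
import re
from http.client import HTTPException
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the version shows up somewhere in the Identify response
# as text like "EPrints 3.0.5". This pattern is purely illustrative.
VERSION_PATTERN = re.compile(r"EPrints\s+(\d+)\.(\d+)", re.IGNORECASE)


def identify_url(base_url):
    """Build an OAI-PMH Identify request URL for a repository base URL."""
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode({"verb": "Identify"})


def extract_version(response_text):
    """Return 'major.minor' if a version string is found, else 'unknown'."""
    match = VERSION_PATTERN.search(response_text)
    if match is None:
        return "unknown"
    return f"{match.group(1)}.{match.group(2)}"


def query_version(base_url, timeout=10):
    """Fetch the Identify response; any network or parse failure counts as 'unknown'."""
    try:
        with urlopen(identify_url(base_url), timeout=timeout) as response:
            text = response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError, HTTPException):
        return "unknown"
    return extract_version(text)
```

Treating every failure as „unknown“ mirrors the first bucket in the summary above: a repository that does not answer or answers unexpectedly is counted rather than dropped.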
Of the 195 repositories whose version number I could determine, only 13 use the newest version 3.1 (released September 8th). Moreover, 124 still use version 2.3 or older. EPrints 2.3 was released before the web 2.0 hype in 2005! One true point of this web 2.0 bla is the concept of „perpetual beta“: release early but often and follow user feedback, so your application will quickly improve. But most repository operators do not seem to have a real interest in improvement and in their users!
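The arithmetic behind these figures can be checked with a few lines of Python. The counts are copied from the summary above; the variable names are only for illustration.

```python
# Counts per EPrints version as listed in the summary;
# "unknown" means the script failed to get or parse the OAI response.
counts = {"unknown": 76, "2.1": 8, "2.2": 18, "2.3": 98, "3.0": 58, "3.1": 13}

total = sum(counts.values())
determined = total - counts["unknown"]
old = sum(n for version, n in counts.items() if version in {"2.1", "2.2", "2.3"})
newest = counts["3.1"]

print(f"repositories listed:     {total}")
print(f"version determined:      {determined}")
print(f"version 2.3 or older:    {old} ({old / determined:.1%})")
print(f"newest version 3.1:      {newest} ({newest / determined:.1%})")
```

In relative terms, roughly two thirds of the repositories with a known version run 2.3 or older, and only about one in fifteen runs 3.1.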
Ok, I know that managing and updating a repository server is work – I would not be the right guy for such a job – but then don’t wail over low acceptance or wonder why libraries have an antiquated image. For real progress you should perpetually do user studies and engage in the development of your software. Digital libraries with fewer resources should at least join the Community and follow updates to keep up to date.
P.S: E-LIS has updated its software now (November 2008). A lot of missing features remain but those need to be implemented in EPrints first.