Archiving Weblogs with ATOM and RFC 5005: An alternative to OAI-PMH
19 October 2007 at 11:34 · 1 Comment

Following up on my recent post (in German), I had a conversation with my colleague about harvesting and archiving blogs, and about ATOM vs OAI-PMH. In my opinion, with the recent RFC 5005 about Feed Paging and Archiving, which defines Archived Feeds, ATOM can be an alternative to OAI-PMH. Instead of arguing over which is better, digital libraries should support both for harvesting and providing archived publications such as preprints and weblog entries (scientific communication and publication already take place in both).
Instead of having every project implement both protocols, one could create a wrapper from ATOM with archived feeds to OAI-PMH and vice versa. The mapping from OAI-PMH to ATOM is probably the easier part: you partition the repository into chunks as defined in RFC 5005, using the from and until arguments of OAI-PMH. The mapping from ATOM to OAI-PMH is more complicated because you cannot select by timestamps: if only a from argument is given, the corresponding ATOM feed can be harvested going backwards in time, but if there is an until argument you must harvest the whole archive just to get the first entries and throw away the rest. Luckily the most frequent use case is to get the newest entries only. Anyway: both protocols have their pros and cons, and a two-way wrapper could help both. Of course it should be implemented as open source so anyone can use it. (By the way: there seems to be no OAI crawler in Perl yet. Sure, there is OAI-Harvester, but for real-world applications you have to deal with unavailable servers, corrupt feeds, duplicated or deleted entries, and a way to save the harvested records, so a whole layer above the harvester is missing.)
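To make the asymmetry concrete, here is a minimal sketch in Python of the from-only case: follow the rel="prev-archive" links of RFC 5005 backwards in time until the entries leave the requested window. It assumes the third-party feedparser library, uses a made-up feed URL, and ignores all the real-world trouble (unavailable servers, corrupt feeds, duplicates) just mentioned.

```python
import time
import feedparser  # third-party: pip install feedparser

def harvest_since(subscription_url, from_time):
    """Collect all entries updated at or after from_time by walking an
    RFC 5005 archived feed backwards via rel="prev-archive" links."""
    entries, url = [], subscription_url
    while url:
        feed = feedparser.parse(url)
        fresh = [e for e in feed.entries if e.updated_parsed >= from_time]
        entries.extend(fresh)
        if len(fresh) < len(feed.entries):
            break  # this page already reaches beyond the time window
        # otherwise follow the archive chain further back in time
        url = next((l.href for l in feed.feed.get("links", [])
                    if l.get("rel") == "prev-archive"), None)
    return entries

# e.g. everything updated since October 1st, 2007:
# posts = harvest_since("http://example.org/feed.atom",
#                       time.strptime("2007-10-01", "%Y-%m-%d"))
```

An until argument, in contrast, would force this loop to walk the whole archive chain before anything could be returned.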
P.S.: At code4lib, Ed Summers pointed me to Stuart Weibel, who asked the same question about blog archiving, and to a discussion in Jon Udell’s blog that includes blog archiving (he also mentions BlogML as a possible part of a solution; unfortunately BlogML looks very dirty to me, the spec is here). And Daniel Chudnov drafted a blog mirroring architecture.
Second day at MTSR
18 October 2007 at 18:46 · No Comments

It is already a week ago (conference blogging should be published immediately), so I had better summarize my final notes on the MTSR 2007 conference: Continue reading Second day at MTSR…
Introducing the Open Research Society
12 October 2007 at 09:52 · No Comments

After a short break at MTSR 2007, where I got to know Panayiota Polydoratou yesterday (greetings to Traugott Koch!), Miltiadis Lytras introduced the Open Research Society (ORS) and raised some important general questions: Why do we do research? Who can benefit from our research? Which alternatives to the current system of publication and review exist? How can we overcome the digital divide? The Open Research Society will also participate in the Open Knowledge Summit in Athens (24-26 September 2008), and it is going to publish a couple of new Open Access journals. Have a look at their website and welcome this new organization in the area of Open Access and Open Content!
Miguel-Angel Sicilia explained the ORS plans in more detail in his presentation From open access to open research and information sustainability. The proposed ORS journals (to which ORS should not be limited) are going to be fully open access without author fees, and all research data must be provided. Peer review is planned to be double-blind, but there will be additional experiments with other review methods to find out how peer review could be changed. Sicilia also talked about Open Access and information sustainability, which is a hard challenge given the explosion of publications.
My first impression of the Open Research Society is very promising – we should collaborate with Science Commons, Wikimedia and similar projects!
More MTSR 2007 presentations
11 October 2007 at 15:54 · No Comments

The after-lunch session of MTSR 2007 contained five presentations:
Spyros Voulgaris presented A Web Classifier for Semantic Classification Between News and Sports Broadcasts, an automated method for classifying news vs sports broadcasts based on properties of the audio signal. The method does not require speech recognition and is language-independent. The audio signal is processed into a feature vector which is then fed to a neural network for classification. For feature extraction, the AMDF (Average Magnitude Difference Function) is used on segments of one to six seconds. I did not know AMDF before, so I cannot tell you how exactly it is used here, nor what simple automatic binary classification has to do with semantics.
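The general definition of AMDF is simple though: for each lag, take the mean absolute difference between the signal and a shifted copy of itself, so periodic audio shows deep valleys at multiples of its period. A minimal sketch in Python with NumPy (just the textbook function; the paper’s actual segmentation and parameters are not reproduced here):

```python
import numpy as np

def amdf(frame, max_lag):
    """Average Magnitude Difference Function: for each lag tau, the
    mean absolute difference between the frame and its tau-shifted
    copy. The resulting curve can serve as a feature vector."""
    n = len(frame)
    return np.array([np.abs(frame[:n - tau] - frame[tau:]).mean()
                     for tau in range(1, max_lag + 1)])
```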
Based on the classifier by Voulgaris et al., Stefanos Asonitis presented a Semiautomated tool for characterizing news video files, using metadata schemas. Their system consists of a web crawler for video content, the classifier, and an export to NewsML and SportsML which is partly derived from the sources and the classifier and partly edited by users of the system.
For a scientometrics lover like me, Metadata Encoding for the Levels of Scientific Research, presented by Nikoletta Peponi, was highly interesting. Frankly speaking, most of the schema is outdated (for instance the division into article, monograph, essay, and thesis), naive, and incomplete. Without a set of examples and mappings to existing ontologies it is pointless. But it’s an interesting beginning.
Sylvia Poulimenou (Metadata Encoding for the Documents based on the Rules of Diplomatics Science) presented an extension of TEI for diplomatics (the analysis and critical edition of documents to test their authenticity).
In the fifth presentation, Mrs. Belesiotis talked about Ontology Oriented Support for the Teaching Process in the Greek Secondary Education. My knowledge of didactics is too low to write more about this, but the didactics of the presentation could have been better than just speed-reading the text on too many overfilled slides. Maybe I just missed the point.
MTSR 2007 impressions
11 October 2007 at 13:53 · 5 Comments

Although there is wireless all over the MTSR2007 conference venue, I have found no postings about MTSR2007 or MTSR07 so far, so I’ll just summarize the talks of the first session, which I just watched:
Evangelos Sakkopoulos presented a maintenance scheme for detecting the category a website may belong to, for effective caching and client-side re-ranking of websites in mobile applications. His approach is based on the observed browsing behaviour of „bursty cases“: the result categories of a few web pages are accessed frequently for short periods of time. Sakkopoulos uses the categories of the Open Directory Project (ODP), but the method could also be applied to other sets like the Wikipedia category system or library classifications.
In the second presentation, Dimitrios Koutsomitropoulos talked about Ontology-based Knowledge Acquisition through Semantic Profiling: An Application to the Cultural Heritage Domain, that is, profiling CIDOC-CRM (ISO 21127:2006) by refining existing and adding new classes and properties with OWL. With additional restrictions and refinements you can increase expressiveness and better match a particular case. Well, this is nice, but in my experience we do not need more complexity and detail but less, because the existing data is much less homogeneous and detailed than ontology theorists dream of. CIDOC-CRM is important, but real-world applications will rather use simplifications of it.
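To show what such OWL profiling looks like, here is a hedged sketch with Python’s rdflib: a new class refines a CIDOC-CRM class with a cardinality restriction. The namespace URI and the concrete restriction are invented for illustration and not taken from the paper.

```python
from rdflib import Graph, Literal, Namespace, BNode, RDF, RDFS
from rdflib.namespace import OWL, XSD

CRM = Namespace("http://example.org/cidoc-crm#")  # illustrative URI only
EX = Namespace("http://example.org/profile#")

g = Graph()
# A profiled subclass of E22 Man-Made Object that must have exactly
# one production event (a cardinality restriction on P108i).
r = BNode()
g.add((r, RDF.type, OWL.Restriction))
g.add((r, OWL.onProperty, CRM.P108i_was_produced_by))
g.add((r, OWL.cardinality, Literal(1, datatype=XSD.nonNegativeInteger)))
g.add((EX.UniqueArtwork, RDF.type, OWL.Class))
g.add((EX.UniqueArtwork, RDFS.subClassOf, CRM.E22_Man_Made_Object))
g.add((EX.UniqueArtwork, RDFS.subClassOf, r))
print(g.serialize(format="turtle"))
```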
Gianluca Colombo presented a Reference Ontology Design for a Neurovascular Knowledge Network. He researched how phenotypes in distributed clinical databases can be described with methods of bioinformatics so that they can be aggregated jointly. He admitted that the most difficult part is mapping the existing data to one common ontology.
Finally, Irina Astrova presented Rule-Based Transformation of SQL Relational Databases to OWL Ontologies. I cannot judge her work because I don’t know the current research on mapping SQL data to the Semantic Web (it is surely a topic that many researchers deal with), but it looks more practical and relevant than the other presentations because most data exists in SQL databases. The implementation, QUALEG DB, can even go both ways (SQL to OWL and OWL to SQL), as shown in another paper of hers; you can get the software if you want (they are going to rename it).
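I don’t know QUALEG DB’s actual transformation rules, but the general flavor of such a rule-based mapping is easy to sketch (here in Python with rdflib; the toy schema and namespace are invented): tables become OWL classes, plain columns become datatype properties, and foreign keys become object properties.

```python
from rdflib import Graph, Namespace, RDF, RDFS
from rdflib.namespace import OWL, XSD

EX = Namespace("http://example.org/ontology#")  # hypothetical namespace

# Toy relational schema: table -> [(column, SQL type, foreign-key target)]
schema = {
    "patient": [("name", "varchar", None), ("hospital_id", "int", "hospital")],
    "hospital": [("city", "varchar", None)],
}
SQL2XSD = {"varchar": XSD.string, "int": XSD.integer}

g = Graph()
for table, columns in schema.items():
    cls = EX[table.capitalize()]
    g.add((cls, RDF.type, OWL.Class))  # rule 1: table -> owl:Class
    for name, sqltype, fk in columns:
        prop = EX[name]
        g.add((prop, RDFS.domain, cls))
        if fk:  # rule 2: foreign key -> owl:ObjectProperty
            g.add((prop, RDF.type, OWL.ObjectProperty))
            g.add((prop, RDFS.range, EX[fk.capitalize()]))
        else:   # rule 3: plain column -> owl:DatatypeProperty
            g.add((prop, RDF.type, OWL.DatatypeProperty))
            g.add((prop, RDFS.range, SQL2XSD[sqltype]))
print(g.serialize(format="turtle"))
```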
The lunch was simple and the view over the bay is wonderful.
Presentation about versioned ISO-3166 in SKOS
11 October 2007 at 00:03 · 1 Comment

Contrary to my usual habits, and thanks to WLAN in the hotel, the slides of my presentation Encoding changing country codes in RDF with ISO 3166 and SKOS at the second International Conference on Metadata and Semantics Research (MTSR2007) are ready and online even before the conference started! The full, detailed paper is not online yet because I am revising and correcting it (I found a very relevant paper after submission). And the serendipity effect of SlideShare works: looking for other presentations about SKOS, I stumbled upon the very interesting slides of Sebastian Kruk, who works in the Corrib project on the semantic web and digital libraries.
P.S.: A preprint of the revised paper is available at arXiv.org.
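The core modelling idea of the presentation can be sketched in a few lines of Python with rdflib (the URIs are hypothetical, not the encoding from the paper): because ISO 3166 reassigns codes, for example „CS“ denoted Czechoslovakia and was later reassigned to Serbia and Montenegro, each assignment becomes its own skos:Concept and the code string is only attached as a notation.

```python
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import SKOS

CC = Namespace("http://example.org/iso3166/")  # hypothetical URIs

g = Graph()
# One concept per code *assignment*, not per code string, so both
# historical meanings of "CS" can coexist and be referenced stably.
for uri, label in [(CC["CS-CSK"], "Czechoslovakia"),
                   (CC["CS-SCG"], "Serbia and Montenegro")]:
    g.add((uri, RDF.type, SKOS.Concept))
    g.add((uri, SKOS.notation, Literal("CS")))
    g.add((uri, SKOS.prefLabel, Literal(label, lang="en")))
print(g.serialize(format="turtle"))
```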
Yet another semantic tagging application
14 September 2007 at 02:11 · 3 Comments

I just found another semantic tagging application: SemKey is also a Firefox plugin, like Entity Describer, which I just wrote about. SemKey uses WordNet and Wikipedia as controlled vocabularies and helps you find the appropriate entry in them. Maurizio Tesconi and his colleagues describe SemKey in their paper SemKey: A Semantic Collaborative Tagging System at the WWW2007 Workshop on Tagging and Metadata for Social Information Organization (other papers linked here).
But the authors of SemKey don’t cite Gabrilovich and Markovitch (2006): Overcoming the brittleness bottleneck using Wikipedia: Enhancing text categorization with encyclopedic knowledge, which is highly related (see also the follow-up paper by Gabrilovich and Markovitch). Looks like both Marchetti et al. and their reviewers at the WWW 2007 workshop don’t know their subject area. There is also little feedback on SemKey: this is science 1.0 about Web 2.0. Researchers 2.0 publish their work on weblogs and preprint archives, or even dare to fight in the jungle of Wikipedia to push forward knowledge instead of citation rank.
The Steve.museum tagging project
13 September 2007 at 22:56 · 1 Comment

Steve.museum is a tagging project that has been active for more than a year by now. Unlike other artificial prototype tagging research projects, it is based on real-world data: works of art in museum collections. Moreover, it is not just shown in pictures in research papers: the software is available at SourceForge (written in PHP). More news about the project can be found on the mailing list and in the blogosphere. I stumbled upon jtran’s blog and his report from the ASIST SIG-CR workshop on social classification that took place in Texas last year. Hopefully someone from the steve.museum team will participate in the Dublin Core conference 2008 or some other event that I participate in!
Tagging enriched with controlled vocabularies
10 September 2007 at 03:36 · 7 Comments

For Connotea, the „Entity Describer“ (ED) has been published, an add-on tool that allows taggers to select terms from a controlled vocabulary such as MeSH. Background information can be found in the blog of its developer Benjamin Good. Up to now, Entity Describer can only be used via a Greasemonkey script. [via Catalogoblog and netbib]
I bet that soon there will be more tagging applications that support controlled vocabularies. For instance, Sarah Hayman and Nick Lothian plan to extend the Education Network Australia (edna) with what they call taxonomy-directed folksonomy. See their IFLA paper (which Patrick pointed me to) for more information.
Benjamin Good also wrote a paper about his work on ED and published it on his blog before even receiving the reviewers’ comments. I like the following discussion on whether and how to publish it; a nice example of the changes in academic publishing. Now the paper is best available as a preprint, identified with hdl:10101/npre.2007.945.1 and licensed under the Creative Commons Attribution 2.5 License (!). Thanks Benjamin, and thanks to Nature for making this possible!
I already cited the work in an ongoing discussion about the Wikipedia article „Folksonomy“. The discussion is mostly about words, and I hate it. Good et al. also add to the confusion: why do they have to introduce a new term („Semantic annotation means the association of a data entity with an element from a classification scheme“) instead of using existing vocabulary? A look at my typology of tagging systems could help clarify things.
Well… or maybe tagging researchers just like to add synonyms and polysemes because they are so used to them; a folksonomy will emerge anyhow, so just call it what you like… 🙁
Persistent Identifiers: Irony of Fate or just absurd?
24 August 2007 at 01:20 · 4 Comments

The report „Implementing Persistent Identifiers: overview of concepts, guidelines and recommendations“ shows you the impracticality of URN and URN:NBN; you do not even have to read any of the report’s 70 pages to find out. If you try the „persistent identifier“ http://nbn-resolving.de/urn:nbn:de:gbv:7-isbn-90-6984-508-3-8 to get the report’s PDF, you get the following message from a resolver at http://resolver.sub.uni-goettingen.de/purl/?isbn-90-6984-508-3:
Unfortunately the URL could not be resolved. None of the underlying local document resolver were able to find a document with the given identifier. Maybe one of the services is down or a document with the number doesn’t exist. As your URL should contain a persistent identifier, please check again later.
I’d call this 404 2.0! Furthermore, at http://www.cerl.org/news.htm one of the report’s publishers (CERL) points to a review of the report at http://www.clir.org/pubs/issues/index.html#found, which gives you the current issue of CLIR Issues (the printed version’s ISSN 1098-6383 is not mentioned anywhere) instead of http://www.clir.org/pubs/issues/issues55.html#found. If you ask Google for the title, you easily find the PDF. If you ask WorldCat for the ISBN 90-6984-508-3, you get a record where you have to click and search a lot to guess which link will bring you to the PDF, but it’s only the unresolvable URN again.
If people are already too dumb to use existing identifier systems (URL, ISBN, ISSN) the right way, I strongly doubt that persistent identifier systems will solve any problem.