Tagging enriched with controlled vocabularies

10 September 2007 at 03:36, 7 comments

„Entity Describer“ (ED) has been published for Connotea: an add-on tool that lets taggers select terms from a controlled vocabulary such as MeSH. Background information can be found in the blog of its developer, Benjamin Good. So far, Entity Describer can only be used via a Greasemonkey script. [via Catalogoblog and netbib]
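To make the idea of tagging from a controlled vocabulary a bit more concrete, here is a minimal Python sketch. It is not how Entity Describer is implemented; the vocabulary entries, the identifiers (`T1`, `T2`, `T3`) and the function names `suggest` and `tag` are all made up for illustration. The tagger types a few letters, gets matching terms (including synonyms), and the bookmark stores the term identifier instead of a free keyword.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    term_id: str          # hypothetical identifier, NOT a real MeSH ID
    label: str            # preferred label shown to the tagger
    synonyms: tuple = ()  # entry terms that lead to the same concept


# Toy vocabulary, invented for this example.
VOCABULARY = [
    Term("T1", "Neoplasms", ("Cancer", "Tumors")),
    Term("T2", "Diabetes Mellitus", ("Diabetes",)),
    Term("T3", "Information Storage and Retrieval", ("Information Retrieval",)),
]


def suggest(prefix, vocabulary=VOCABULARY):
    """Return terms whose label or one of whose synonyms starts with the prefix."""
    needle = prefix.strip().lower()
    if not needle:
        return []
    return [
        term
        for term in vocabulary
        if any(name.lower().startswith(needle) for name in (term.label, *term.synonyms))
    ]


def tag(bookmark_tags, term):
    """Attach a controlled term to a bookmark by its ID, skipping duplicates."""
    if term.term_id not in bookmark_tags:
        bookmark_tags.append(term.term_id)
    return bookmark_tags


tags = []
for hit in suggest("canc"):  # the tagger types "canc" and picks the match
    tag(tags, hit)
print(tags)
```

Because the stored tag is an identifier, "Cancer" and "Tumors" end up as the same tag, which is exactly what free keywords cannot guarantee.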

I bet there will soon be more tagging applications that support controlled vocabularies. For instance, Sarah Hayman and Nick Lothian plan to extend the Education Network Australia (edna) with what they call taxonomy-directed folksonomy. See their IFLA paper (which Patrick pointed me to) for more information.

Benjamin Good also wrote a paper about his work on ED and published it on his blog before even receiving the reviewers' comments. I like the discussion that followed on whether and how to publish it, a nice example of the changes in academic publishing. The paper is now available as a preprint, identified with hdl:10101/npre.2007.945.1 and licensed under the Creative Commons Attribution 2.5 License (!). Thanks Benjamin and thanks to Nature for making this possible!

I already cited the work in an ongoing discussion about the Wikipedia article „Folksonomy“. The discussion is mostly about words and I hate it. Good et al. also contribute to the confusion: why do they have to introduce a new term („Semantic annotation means the association of a data entity with an element from a classification scheme“) instead of using existing vocabulary? A look at my typology of tagging systems could help clarify things.

Well… or maybe tagging researchers just like to add synonyms and polysemes because they are so used to them. A folksonomy will emerge anyhow, so just call it what you like… 🙁

7 Comments »


  1. Hi Jacob,

    I actually don’t work for Connotea at all – at least they don’t pay me. As a grad student it sometimes seems that no one pays me ;). I lamented this a bit in a previous post (see the p.s.).

    Yes, I admit it, we made up a new term or two, but there are both good and bad reasons for doing this. The bad reason, I guess, is that bioinformatics (our subject area) really overloads the word „annotation“, both in the amount and the diversity of uses for it. Thus I had „annotation“ on the brain when I wrote this but needed to distinguish open tagging from the use of pre-structured schemes. I think for the bioinformatics audience this may actually help to communicate the idea but, for the wider audience, „semantic annotation“ could certainly be replaced with something like „indexing using a pre-coordinated vocabulary“ (though I like the former’s brevity). However, the other term that I may have invented (I haven’t seen it elsewhere), „semantic tagging“, might actually be a useful thing to keep around. How would you succinctly describe tagging (metadata assignment by non-professional indexers for personal, wide-ranging purposes) with terms from controlled vocabularies/terminologies/ontologies? Obviously, I think the intersection of social tagging with semantics is a new and exciting thing, so why not give it a name??

    (great to hear your interest and thanks for the feedback!)

    Comment by Benjamin Good — 10. September 2007 #

  2. Up to now, almost only professional indexers used vocabularies such as MeSH to annotate documents. With Entity Describer there are two differences: first, everyone can do it, which is why we call it „social tagging“ instead of „indexing“. Second, there is RDF. But does RDF make it „semantic“? Was all existing usage of MeSH not semantic because it was not encoded in RDF? I don’t think so. The point is not the encoding (RDF) but the use of controlled vocabularies instead of free keywords.

    The information science term is probably „controlled user-generated indexing“, but even Hayman and Lothian at IFLA use the neologism „taxonomy-directed folksonomy“. „Controlled tagging“ may be a good term, but the let’s-call-everything-semantic faction is probably too strong, so I bet on „semantic tagging“, which is already a defined term in linguistics, where „tagging“ is used for the annotation of words and phrases instead of documents.

    Anyhow: Very good work 😉 – you should only note that you somehow reinvent subject indexing which gives the old controlled indexing vs. automatic indexing debate a new thrill.

    Comment by jakob — 10. September 2007 #

  3. I never meant to imply that RDF was required for tagging to be „semantic“. I did mean to imply that it was needed if the result was to be part of the „semantic web“, which can’t really exist without the application of standards like RDF.

    I have a little bit of trouble when you lump social tagging in so closely with subject indexing. Sure, a large majority of tags do refer to subject, but certainly not all. E.D. doesn’t as yet support anything besides subjects, but one of the extensions that I’d like to make is to add controlled support for other common uses of tags – e.g. task prioritization („to_read“) or explicit qualitative description („good“, „important“,…). Providing some structure for organizing these other aspects of tagging has already been demonstrated to be useful and I’m sure there is more to do.

    I’ll try to be more careful next time I feel the need to start inventing words, though, as any British person will be happy to tell you, that is one of the favorite activities of American English speakers ;).

    Comment by Benjamin Good — 10. September 2007 #

  4. Good point (tags that do not refer to subject). Before tagging, such other purposes were mostly served by classification with other methods (for instance the pile of to-read papers on my desk). It just sometimes annoys me that people in tagging have little or no background knowledge in information retrieval, which makes them reinvent terms and concepts that were known long before. By the way, have you had a look at SKOS to encode the controlled vocabularies in RDF?

    Comment by jakob — 10. September 2007 #

  5. I have seen SKOS and did mention it in the paper. We’re really pushing to avoid re-representing the ontologies as much as possible so don’t really want to convert everything internally into SKOS just yet. The plan is to add support for externally generated SKOS terminologies first. We’ve discussed the possibility of providing SKOS versions of terminologies represented differently as a service, but, unless our team starts to grow unexpectedly, that may be a while off.

    Comment by Benjamin Good — 12. September 2007 #

  6. Yes, SKOS is still being worked on, and it is meant more for exchange. Import and export of vocabularies in SKOS will definitely make EntityDescriber a real semantic tagging interface for any tagging application, not only Connotea, but you just have to start with something. I think a SKOS API could help, so that EntityDescriber could also be used with vocabularies hosted elsewhere.

    Comment by Jakob — 12. September 2007 #

  7. […] another semantic tagging application: SemKey is also a Firefox-Plugin like EntityDescriber that I just wrote about. SemKey uses WordNet and Wikipedia as controlled vocabularies and help you to find the appropriate […]

    Pingback by Yet another semantic tagging application « Jakoblog — Das Weblog von Jakob Voß — 14. September 2007 #
