Is inconsistency in biological annotations wrong?

March 8, 2010 at 8:39 am | Posted in article

A wealth of information science research explores consistency between different indexers. Give two professional, well-trained indexers the same item and they will probably apply different terms. From a subject access perspective this might create problems: there may be a mismatch between the terms a user expects and the terms different indexers have applied.

Inconsistency in indexing therefore has implications for information retrieval. Inconsistency is a natural consequence of the way indexers appraise resources and seek to apply what they consider to be the best terms. Can index terms be wrong? Perhaps, but for every term, there are likely to be arguments for or against its application.

Social tagging on sites like Delicious can also be explored for consistency. Wolfram et al. demonstrate that vector space modeling (which is traditionally used in information retrieval) can be applied to tag populations taken from a CiteULike dataset to measure consistency between taggers. They found that tagging consistency did not vary between subject areas.
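To give a rough idea of what a vector space measure of consistency can look like, here is a minimal sketch. It is not the method from the Wolfram et al. paper: it simply treats each tagger's tags for one item as a term-count vector and averages the cosine similarity over every pair of taggers. The function names and the example tags are made up for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty tag list shares nothing with anyone
    return dot / (norm_a * norm_b)


def mean_pairwise_consistency(tag_lists):
    """Average cosine similarity over every pair of taggers for one item."""
    vectors = [Counter(tags) for tags in tag_lists]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0  # fewer than two taggers: nothing to compare
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Hypothetical tags from three taggers describing the same article.
taggers = [
    ["genomics", "annotation", "ontology"],
    ["genomics", "ontology"],
    ["biology", "annotation"],
]
score = mean_pairwise_consistency(taggers)
print(round(score, 3))
```

A score of 1 means every tagger chose the same tags and 0 means no two taggers shared a tag. Note that the number only says how much the taggers agree with each other, not whether any of their tags are good.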

Sadly, the authors do not go so far as to state whether they felt user tags were consistently good or consistently bad.

I think biologists would consider Gene Ontology annotations to be either right or wrong. Annotations can be critically appraised as valid or invalid, based on the biology.

The human element in GO annotations, the fuzzy, inconsistent element in tagging genes with functional terms, whether manually or automatically, is ignored.

What kind of biology do we have if we accept that functional labels for genes – ‘lipid transporter activity’, ‘cardiac atrium development’ – are not true or false, and that different biologists might apply them within their personal theoretical framework entirely as they feel?

Wolfram, D., Olson, H. A., and Bloom, R. (2009). Measuring consistency for multiple taggers using vector space modeling. Journal of the American Society for Information Science and Technology, 60(10):1995-2003.

doi:10.1002/asi.21123
