Thoughts on bias are biased, so become a hardcore epistemological realist?

November 30, 2010 at 12:17 pm | Posted in article | Leave a comment

I’m pretty sure there is a reality.

If there isn’t, then what is not reality is so damn convincing, I’m not bothered either way.

Yet one can choose to commit to realism, or the belief that there is one really real reality in which all our observations, perceptions, language, theories and beliefs, have their referents.

And somehow, we can know this reality free from all prejudice – we can know the universe as an unbiased, neutral observer.

Critics of scientific knowledge have claimed that subscribing to realism is bonkers, because bias exists in everything we see, think and believe. Since bias is ingrained in human nature – even that part of human nature which sees itself as scientific and objective – epistemological realism is a pipe dream. We must accept that every theory, even the fundamental ones that seem to be entirely free of prejudice, is somewhere along the line polluted by bias.

The author of the paper below, Ingvar Johansson, describes a form of biasism he terms Myrdal’s Biasism, which claims the following:

“…we cannot know truths and that we should therefore speak of research results as being true-for-certain-valuations instead of being just true”

Johansson criticises all forms of biasism with several logical arguments, including the paradox that biasism itself would surely be biased, if we were to accept the version above.

Can we be biased and still align ourselves with epistemological realism? I don’t see why biased research programmes cannot lead to truth. The problem arises when our biases blinker us to better truths than the ones we have now. We can be biased AND appreciate the kind of movement towards more ‘truthlikeness’ described by Karl Popper and explored by Johansson in this same paper.

I see this sort of productive research bias in Thomas Kuhn’s view of science. The interesting part is when the scientist realises these biases are untenable in the face of new evidence; the process by which one truth is superseded by a better truth is a fascinating one to try to understand.

Revolution? Inspiration? Logical necessity?

DOI: 10.1016/j.jbi.2005.08.005

Dictionary : do I even know the meaning of the word?

November 12, 2010 at 5:11 pm | Posted in thesis | Leave a comment

In a recent discussion about what my research is about, I found myself stating clearly and for the first time that I believed the Gene Ontology is essentially a glorified dictionary.

If one can put aside the pretensions garnered by that big word ‘ontology’, the GO project is a list of defined terms describing one part of the biological domain. The three separate ontologies, the relationships and the structure formed by the terms have obscured for me the simple fact that, in a basic sense, the Gene Ontology is a dictionary of biological words.

Complexities of ontologies aside, if the GO project had originally been conceived as a universal dictionary for the biological domain, this in itself would have been a tremendously ambitious project which might, perhaps, have revealed just how difficult it is to reach any sort of grand consensus between the competing perspectives and understandings of the numerous micro-domains in biology.

As it stands though, the Gene Ontology is a vastly more complicated endeavour which leap-frogged the difficult question of whether there did indeed exist any sort of universal dictionary of terms for the biological domain. Instead, GO muscled straight into the thorny world of classes, instances, parts and all manner of other complicated relations.

When I was explaining myself earlier, I was asked if biologists did not already have a universal dictionary, a simple, shared understanding of the words they use and what these words mean.

A quick search on Amazon revealed over a thousand hits for the search term ‘biology dictionary’.

I think the domain of biology would best be described as a confederation of states, each speaking in slightly different accents and idioms.

Here I am, relaxing at home

July 29, 2010 at 8:52 am | Posted in personal | Leave a comment

Description logic never got me a date

July 29, 2010 at 8:48 am | Posted in article | Leave a comment

I am no expert in description logic.

I just put that on the advert to get a girlfriend.

However it is interesting to hear about the limitations of representing biological knowledge in OWL-DL which, though powerful for implementing machine reasoning, can struggle with particular ways of thinking in biology such as:

  • Similarity and ‘fuzziness’ : biology is grounded in the idea of similarity – similarity between molecules, between organisms, between functions. HOW similar two biological features may be is sometimes hard to describe (how similar are dogs and cats? how similar is DNA sequence A and DNA sequence B?) Indeterminacy like this is difficult to capture in description logic.
  • Prototypes and exceptions : biologists often regard entities as prototypical, such that a human eye might be considered a prototype of all eyes even though something like an insect’s compound eye, although totally different, is still in many senses related. Furthermore, exceptions to rules are quite normal in biology, so an enzyme class which catalyses transcription will always catalyse transcription, unless it is doing something else.
  • Complex property restrictions : for example a transcription factor binds to a promoter and activates gene transcription – the biological process ‘gene transcription’ is a property of the factor + promoter complex
  • Expressive datatypes : capturing in description logic the idea of a number range meaning ‘this is lots’ or dimensions translating to mean ‘this is a big cell’ – again, this type of thinking is rife in biology

The authors below give other examples and discuss just how some of these conceptual tools in biology might be captured in OWL-DL.

Stevens, R., Aranguren, E. M., Wolstencroft, K., Sattler, U., Drummond, N., Horridge, M., and Rector, A. (2007). Using OWL to model biological knowledge. International Journal of Human-Computer Studies, 65(7):583-594.

DOI: 10.1016/j.ijhcs.2007.03.006

Own harshest critic judges himself to be doing a good job

July 28, 2010 at 2:57 pm | Posted in article | Leave a comment

“I am my own harshest critic.”

I doubt this statement.

Bada et al., authors and instigators of the Gene Ontology, write a (not) entirely disinterested analysis of the lessons the implementation of the Gene Ontology holds for other ontologies.

The Gene Ontology is a marvelous creation. However I would not go to the Tory party headquarters for impartial political advice on whether I should join the Tory party.

The authors highlight community involvement and simplicity as two very important factors in the success of the Gene Ontology, factors other ontology designers might bear in mind.

How else might we explain the success of the Gene Ontology? The first product in a marketplace has a natural advantage over its competitors, and complex products may have a high startup cost that deters alternatives. The economics make it unlikely the ontology will fail.

Biologists may use the Gene Ontology because it is there and there are no alternatives. They may contribute to the ontology but feel beholden to the curators as to whether the changes they want will be made. A community is formed and involved as a consequence of the technology rather than any sense of ownership.

Established norms may also play a strong role in take-up of the Gene Ontology. Everyone else is using it as a standard, therefore I must use it too.

Bada, M., Stevens, R., Goble, C. A. et al. (2004). A short study on the success of the Gene Ontology. Web Semantics: Science, Services and Agents on the World Wide Web, 1(2):235-240.

DOI: 10.1016/j.websem.2003.12.003

Irrelevant truth in functional genomics

July 28, 2010 at 2:33 pm | Posted in article, web | Leave a comment

Always interested to see the issue of ‘relevance’ rearing its noble / ugly / annoying / informative head in the bioinformatics literature, much as relevance has long skulked about the information science domain.

The ontology and description logics literature spends a lot of time avoiding the question of uncertainty which, in my opinion, is fundamental to the practice of science. Scientists often gauge the importance of variation within their empirical framework using all manner of expertise, guesswork and voodoo.

In developing machine learning algorithms designed to take advantage of Gene Ontology annotations, Akand et al. note:

“… any gene is annotated with all of the categories with which it has been associated in the published scientific literature. In any particular experimental setting, however, only a subset of the known annotations of a gene will be relevant.”

All annotations are not created equal, and although the human p53 gene may be annotated with 90 different Gene Ontology terms, in the context of an experiment designed to investigate the process of double-stranded DNA repair, the existence of many of these other annotations may be ignored by the biologist for the sake of simplicity.

What then if all analysis in functional genomics is a matter of attention, in which the biologist is free to ignore accessory information which, although objectively true, is deemed superfluous to the task?

Akand, E., Bain, M., and Temple, M. (2007). Learning from ontological annotation : an application of formal concept analysis to feature construction in the gene ontology. volume 85, pages 15-23.

For abstract and full paper see here

Does 100% reliability between indexers or annotators exist?

July 19, 2010 at 3:52 pm | Posted in article | Leave a comment

How do we measure the reliability of coding, annotating, indexing or classification by different coders, annotators, indexers or classifiers?

Lombard et al. reported just how poor authors in the mass communication research literature were at reporting in detail the consistency between the different coders who analysed content in their research.

Coders in this situation are individuals reading say, a newspaper, and deciding whether that newspaper contains information about a particular topic or subject, say Wayne Rooney’s wedding. Reliability is the measure of matching judgments between different coders. In the classification world, it might be two librarians choosing subject headings for the same book, or in the Gene Ontology world it might be two different annotators choosing index terms for the same biomedical article.
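To make ‘matching judgments’ concrete, here is a minimal sketch of two ways agreement between a pair of coders might be scored: simple percent agreement, and Cohen’s kappa, which discounts the agreement expected by chance. The function names and the yes/no judgments are invented for illustration; they are not data from Lombard et al., and kappa is only one of several indices in use.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Fraction of items on which the two coders made the same judgment."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must judge the same non-empty set of items")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Agreement between two coders, corrected for chance agreement."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    # Chance agreement: probability both coders pick the same category
    # if each chose independently at their own observed rates.
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1:
        # Both coders used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical judgments: does each of ten articles mention the topic?
coder_a = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "no", "no"]
coder_b = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "no", "no"]

agreement = percent_agreement(coder_a, coder_b)
kappa = cohens_kappa(coder_a, coder_b)
print(f"percent agreement: {agreement:.2f}, Cohen's kappa: {kappa:.2f}")
```

In this invented example the coders agree on eight items out of ten, yet kappa comes out noticeably lower, because two coders who each say ‘no’ most of the time would agree fairly often by chance alone.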

From failing to report how many coders had coded the sample, to omitting how or whether the coders had been trained, or even failing to state exactly how reliability had been calculated, this paper is a striking investigation into the importance of transparency in reporting research methods.

It is of particular interest to me since I am interested in the value biologists place on manually created annotations between Gene Ontology terms and biological entities, like genes and proteins. Since this type of coding / classification reliability work has long shown the impossibility of 100% agreement between different coders, what are the implications of this for bioinformatic tools using ontology annotations?

How much trust can biologists possibly place in even good quality manual annotations if there is always a difference between annotators?

Lombard, M., Snyder-Duch, J., and Bracken, C. C. (2002). Content analysis in mass communication: Assessment and reporting of intercoder reliability. Human Communication Research, 28(4):587-604.

DOI: 10.1111/j.1468-2958.2002.tb00826.x

CoLIS 7: The Cool and Belkin faceted classifications of information interactions revisited

June 25, 2010 at 10:42 am | Posted in web | Leave a comment

Isto Huvila (Uppsala, Sweden) presented at CoLIS 7 and detailed his continuing work with the Cool and Belkin faceted classification of information interactions.

(For Cool and Belkin’s original paper, see here)

What is an information interaction? Isto showed us a picture of a child sitting at a (very dated) PC to illustrate the different ways we could consider an information interaction. The potential complexity of any classification system that attempted to capture the details and nuances of such an interaction was evidently problematic, and my sense of Isto’s presentation was that it was this complexity he had been grappling with in his work.

My thoughts are that any successful empirical science is necessarily a simplification of the natural world around us. Physics and chemistry concentrate on a narrow corridor of the physical world. As sciences, they also simplify the contents and classification of this narrow world, creating a model of what reality is, and finding themselves to possess great explanatory power within that narrow world.

The information scientist who attempts to simplify the information world is often subject to criticism of the sort, “Well, human beings and information are much more complicated than your model suggests, and therefore your model is useless.”

However, I would resist this idea (often flung into the ring from the sociologist’s corner) because if we want to at least try and offer explanations of information interactions, or perhaps adopt an empirical methodology to suggest ways we might augment these interactions, a simplification of the information world as instantiated in the Cool and Belkin classification (and Isto’s extension) is absolutely necessary.

And we can only say such a classification is not helpful when it has failed to demonstrate its utility.

I do not scorn the organic chemist because he cannot offer a complete explanation from within the chemistry paradigm of why I just ate a biscuit.

CoLIS 7: Webometrics, emergent or doomed

June 25, 2010 at 10:42 am | Posted in web | Leave a comment

Mike Thelwall (Wolverhampton, UK) reflected on the current academic status of webometrics. Though we did not conclusively determine whether the discipline is in fact doooooomed, it was interesting to see how other disciplines, such as computer science, are playing with webometric-type techniques without paying much attention to the existing literature.

Is this sloppy scholarship, an active rejection of webometrics or something else? I also wondered if webometrics were not actively applied in the private sector without much of this work feeding through into the academic domain, either because there was no point publishing or because of commercial sensitivity.

Mike suggested we take a look at Blogpulse.com to have a go at our own webometric surveys. I searched for ‘gaming AND motion’, to show how bloggers and gamers are responding to the introduction of new motion gaming systems from the major manufacturers like Microsoft and Sony.

The graph below shows two clear peaks, one for Sony’s announcement of its motion controller, and a second, more recent peak relating to interest generated by the E3 show.

CoLIS 7: Doctoral forum

June 25, 2010 at 10:40 am | Posted in lecture, thesis, work | Leave a comment

Presented at the CoLIS 7 doctoral forum on Monday and got some very encouraging feedback from the session leaders and the other students.

It was really fascinating to get an insight into how everyone else’s research is coming along, the different approaches they are taking, and the unique problems each of us faces in trying to get anywhere with our research.

Our doctoral group included students working on everything from tagging in archives and online communities based around the Twilight saga to the philosophical idea of ‘information refusal’ to retrieval challenges for Quranic resources. Oh, and I should mention a project looking at social media use in public libraries and bibliometrics in the literature studies domain. I think that was everyone – you know who you are!

My (unused) project presentation

Some notes I made to prepare

Many thanks to Jutta Haider for her hard work in organising the forum – it was great!
