Does not mean not for NOT qualifiers in Gene Ontology files?

25 01 2010

Gene associations between entities and GO terms are saved by the Gene Ontology consortium to large annotation files in a simple file format, available at the link below:

http://www.geneontology.org/GO.current.annotations.shtml

Annotation files are divided up by species and are used to create automatic Gene Ontology links to lots of different external database resources such as Entrez-Gene.

Moreira et al. (2007) criticised this annotation file format, citing semantic ambiguities inherent to the file structure which could be solved using OWL. A big problem, for example, is the NOT qualifier, which marks an annotation (often a manually curated one) as stating that an ontology term is NOT associated with a gene.

So the gene FMN1 in S. cerevisiae (according to QuickGO) is NOT associated with FMN adenylyltransferase activity.

  • See gene FMN1 (riboflavin kinase) Saccharomyces cerevisiae in QuickGO

The EntrezGene service automatically extracts GO annotations, presumably from recent annotation files. The paper above noted that the FMN1 entry in EntrezGene did not highlight the negative relationship between FMN1 and FMN adenylyltransferase activity because it failed to infer the appropriate meaning from the NOT qualifier.

The entry is still labelled with the GO annotation ‘FMN adenylyltransferase activity’, even though PMID 10887197 suggests FMN1 in S. cerevisiae does not display said activity.

This is a problem.
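
A minimal sketch of how a tool reading the annotation file could avoid this, assuming the tab-separated annotation layout in which the third column holds the gene symbol, the fourth the qualifiers (separated by '|'), the fifth the GO identifier and the sixth the reference. The function names and the sample rows are my own inventions for illustration; the rows are made up, truncated to seven columns, and their identifiers and evidence codes should not be read as real data.

```python
from typing import List, NamedTuple, Optional


class Annotation(NamedTuple):
    symbol: str      # column 3: gene symbol
    go_id: str       # column 5: GO term identifier
    negated: bool    # True when column 4 carries the NOT qualifier
    reference: str   # column 6: supporting reference


def parse_gaf_line(line: str) -> Optional[Annotation]:
    """Parse one tab-separated annotation row; skip comments and blanks."""
    if not line.strip() or line.startswith("!"):
        return None
    cols = line.rstrip("\n").split("\t")
    # Column 4 may hold several qualifiers separated by '|'.
    qualifiers = {q.strip().upper() for q in cols[3].split("|") if q.strip()}
    return Annotation(cols[2], cols[4], "NOT" in qualifiers, cols[5])


def asserted_terms(lines: List[str], symbol: str) -> List[str]:
    """GO terms positively asserted for a gene, leaving out NOT annotations."""
    parsed = (parse_gaf_line(line) for line in lines)
    return [a.go_id for a in parsed if a and a.symbol == symbol and not a.negated]


# Made-up rows for illustration only (hypothetical object ID, GO IDs
# and evidence code; check any real values against the live file).
SAMPLE = [
    "! sample comment line",
    "SGD\tS000000000\tFMN1\tNOT\tGO:0003919\tPMID:10887197\tIDA",
    "SGD\tS000000000\tFMN1\t\tGO:0008531\tPMID:10887197\tIDA",
]

print(asserted_terms(SAMPLE, "FMN1"))
```

The point of the design is simply that the qualifier column is read before the GO term is attached to the gene, so a negated annotation can be shown as negative (or dropped) rather than silently listed as a positive one.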

Moreira, D. A., Shah, N. H., and Musen, M. A. (2007). Interpretation errors related to the GO annotation file format. AMIA … Annual Symposium proceedings / AMIA Symposium. AMIA Symposium, pages 538-542.

PMID 18693894




CoLIS 7 is in London this year

22 01 2010

The Seventh International Conference on Conceptions of Library and Information Science is coming to London between 21-24 June 2010. The overall theme is:

Integration in the information sciences: unity in diversity

This conference will explore the integration and underlying unity of the information sciences, as both academic disciplines and as work practices.

More information can be found on the CoLIS 7 website:

http://colis.soi.city.ac.uk/





Not too much information when using the Gene Ontology

19 01 2010

I wonder how often results in the biological literature are reproducible. With standards for data description, common analysis tools and the facility to add supplementary information to published articles, it *should* be really easy to provide enough detail to make bioinformatic-type analyses totally transparent.

Rhee et al. (2008) provide suggestions to authors using the Gene Ontology and annotations on how to avoid certain pitfalls when analysing biological data using GO. They conclude:

“…it is crucial for any analysis to cite data sources (including the version of ontology, date of annotation files, numbers and types of annotations used, versions and parameters of software, and so on) to ensure that results are fully reproducible.”

I don’t think the authors would have put this into their review if biologists were already citing full data sources. What is the point of not fully citing data sources? Is it a recurring oversight or something else?
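
The items in that quote could be captured in a small machine-readable record saved alongside the results. This is only a sketch under my own assumptions: the class name, the field names and every value below are hypothetical placeholders, not a standard proposed by Rhee et al. or by the GO consortium.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class AnalysisProvenance:
    """Data sources an analysis should cite (field names are hypothetical)."""
    ontology_version: str                  # version of the ontology used
    annotation_file_date: str              # date of the annotation file
    annotation_counts: Dict[str, int]      # evidence code -> annotations used
    software: Dict[str, Dict[str, str]]    # tool -> version and parameters

    def to_json(self) -> str:
        # Sorted keys keep the record stable between runs, so it can be diffed.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Placeholder values only; a real record would be filled in by the analysis.
record = AnalysisProvenance(
    ontology_version="EXAMPLE-ONTOLOGY-VERSION",
    annotation_file_date="YYYY-MM-DD",
    annotation_counts={"IDA": 0, "IEA": 0},
    software={"example-enrichment-tool": {"version": "0.0", "parameters": "placeholder"}},
)

print(record.to_json())
```

A file like this could go into the supplementary information of an article with very little effort.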

Rhee, S. Y. Y., Wood, V., Dolinski, K., and Draghici, S. (2008). Use and misuse of the Gene Ontology annotations. Nature reviews. Genetics, 9(7):509-515.

doi:10.1038/nrg2363





Can a computer be creative?

13 01 2010

Whilst reading a paper on creativity in the sciences, it occurred to me that e-science methodologies – which depend heavily on computers and informatic solutions to provide leverage on complex scientific problems – do little to factor in the very important idea of research creativity.

Perhaps e-science enables scientists to be more creative, because they are able to handle much larger datasets than they would otherwise be able to. Data produced by CERN could not be analysed on the back of a napkin.

And yet e-science infrastructure is dependent on a certain amount of control and standardisation. The Gene Ontology, for example, both standardises terminology and meaning for biological language and determines what is permitted. Biologists are not given free rein to edit the ontology to their own purposes, which on the one hand is good (everyone is analysing bio-data using the same tool) and on the other hand is bad (everyone is analysing bio-data using the same tool).

If the Gene Ontology contains flaws, or weaknesses, or limitations, then these will shape the kind of explanations biologists can infer from results. It is an example of an e-science approach gifting authority and control at the expense of freedom and creativity.

Perhaps one day computers will be creative. Perhaps they will be able to think, and guess, and imagine novel solutions to difficult and mysterious problems.

Until that day though, I think it is important to remember that a tension exists: thinking scientists, engaged in a human endeavour little different to the composition of music or the act of painting, may crave standardisations and informatic tools to tackle complex problems, yet risk adopting these tools at the expense of the freedom to imagine what is not a standard, what is not paradigmatic.

Heinze, T., Shapira, P., Senker, J., and Kuhlmann, S. (2007). Identifying creative research accomplishments: Methodology and results for nanotechnology and human genetics. Scientometrics, 70(1):125-152.

doi:10.1007/s11192-007-0108-6





Life scientists just Google it

17 12 2009

Interesting set of case studies recently published by the Research Information Network under the title, ‘Patterns of information use and exchange: case studies of researchers in the life sciences’.

Informal exchange of information was found to be very important in the day-to-day work of life scientists, as were simple search solutions to information needs – i.e., biologists just Google it first.

The serendipitous nature of the results provided by search engines like Google, with the extra context information they can provide, is relevant. Life scientists aren’t so different to the rest of us in our searching: they look for easy options, aim at getting lucky and don’t have the time to learn new tools.

Aggregators and meta-searches might be a good solution to some information needs in the sciences. Rather than specialised tools, perhaps simple interfaces, straightforward customisations and ‘pushing’ likely relevant information to desktops, leaving researchers to graze at their leisure, might be better than complex tools, sites you have to visit and search for information or really specific searches on certain topics.

So if you’re interested in the developmental biology of Drosophila, perhaps a simple setup where articles, datasets, blog posts, sequence information and commercial kits are combined in a regularly updated page might be useful.

And of course, a great big conspicuous Google search box at the top of the page. Who am I to deny what biologists want?





The Wayne’s World 2 approach to research data

17 12 2009

Research data infrastructure planning and associated initiatives in the UK (for example) follow the Wayne’s World 2 model.

Wayne asks Jim Morrison what he should do with his life. Jim tells him to put on a concert:

“How will I get the bands to come?”
“If you book them, they will come.”

The approach is top-down. The logic is thus: UK research data is a national resource, so we need national planning consortia to construct a research data management agenda. Create policies, tools, an infrastructure, and researchers will adopt.

Book the bands, says Jim, and they will come.

I am not optimistic this is going to work, and a comment in response to a RIN blog post on this issue reflects this. Can a ‘coherent national framework’ for research data management be imposed on the academic landscape in the UK?

If researchers exist as small communities, with their own idiosyncrasies and habits, a UK-wide strategy for data management may simply not work. Why not go small? Small, local projects, funded to encourage small networks of groups to come up with their own solutions?

The One-Big-Happy-Family image for researchers, creating data, curating it and sharing it for the common good, could be a mirage. A centralised approach won’t work and, if progress on a national e-infrastructure is anything to go by in the last few years, is not working.

If you create an infrastructure, UK scientists will sit on their hands and refuse to come. Perhaps we should turn the problem round and start small instead.





I’m not lazy

16 12 2009

Why haven’t I posted anything in the last few weeks?

As I told my friend, blog posting frequency is inversely related to thesis progression.

I’ve been writing several sections to the traditional literature review that will form part of the preamble to my thesis and to my transfer report next year (transfer to full PhD).

Thus far I have made notes on the history of the Gene Ontology, what an ontology is, classification in biology, formal ontology in biology and science and technology studies (STS) theories relevant to ontology usage.

I’ll post the documents as they stand before Christmas.





You’ve got a bit of meaning on your face

16 12 2009

Is knowledge objective or interpretive? Epistemologists can be terribly wound up about this.

As I understand it, objectivists believe that knowledge is not bound to the signs and symbols through which it can be represented: it can be torn from practices and processes, from what we do with knowledge. Interpretivists scoff at this, and argue that knowledge inheres in signs and symbols, in how we use these signs and symbols. Knowledge is an action and a meaning in context.

Can we have it both ways, as (Williams, 2008) suggests:

“…knowledge both as ‘object’ and as ‘meaning’, not just the one or the other”

I don’t see why not, so long as we avoid mixing up psychology and logic. Logic can generalise a term ‘apple’ to refer to apples and apple-like objects in the real world. We can then ascribe meanings to this term ‘apple’: apples are round, some apples are red, we can eat apples.

We can then argue about what an apple *is* or what the word ‘apple’ *means* by criticism and argument. In this sense, knowledge is objective. If I say: “My concept of what apples are includes the sense that apples can transform into sheep and recite TS Eliot’s ‘The Waste Land’” then my understanding of apples is screwy – I’ve got the meaning wrong.

The logical content of statements composed of symbols is a different knowledge to that inhering in the mind, or in our psychological handling of signs and symbols. Meaning is derived from practices, from communities, from day-to-day usage. These are psychological activities – the handling of mental objects and cognitive processing of these objects. Knowledge is thus personal. We can lie, fail to understand, or impose meaning by force. This knowledge is not objective. It is interpretive.

I agree with (Williams, 2008): both objectivists and interpretivists can be accommodated within epistemology.

Williams, R. (2008). The epistemology of knowledge and the knowledge process cycle: beyond the “objectivist” vs “interpretivist”. Journal of Knowledge Management, 12(4):72-85.

doi:10.1108/13673270810884264





Thesis questions for October 2009

22 10 2009

I will most likely have to prepare my transfer report (to full PhD) in the new year, so I am working on a final plan for my schedule of practical work for 2010.

The work is shaping up to a mixture of epistemology, philosophy of science, information history and classification theory. The methods will be philosophical arguments, a case study of the Gene Ontology and (hopefully) some user behaviour studies to find out how biologists use and understand GO terms.

  1. How can an objective knowledge be constructed by social co-operation?
    Philosophical argument using Popper’s 3 Worlds
    Science is the revealing of ‘objective knowledge’ in the sense used by Karl Popper
    The 3 Worlds epistemology – the material, the mental, the objective – is applicable to biology
    Modern biology is the construction of objective knowledge by social means facilitated by computers
  2. Is the knowledge represented by the Gene Ontology ‘objective knowledge’?
    Case study of the Gene Ontology
    The Gene Ontology (GO) is a controlled vocabulary and a knowledge representation for molecular biology
    Formal relations between concepts in GO permit automated annotation and inferencing using computers
    The Ontology with its terms, relations and notes represents a World 3
  3. How does the Gene Ontology change over time?
    Information history approach to the development of the Gene Ontology
    The Gene Ontology changes over time
    The community participates in suggesting new terms to include
    There are other ways GO could be developed
    GO is being used in new ways, e.g., for information retrieval
  4. Should knowledge representations in biology be constructed by logic, consensus or with impunity?
    Interviews with experts, classification test and literature analysis
    Ontologies are logical
    Biologists want to contribute what they think are appropriate new terms
    Very new knowledge does not have a place in an ontology, so how can it be incorporated?
  5. How can the Gene Ontology be used to retrieve information?
    User relevance study with GO term-driven search results
    GO terms can be used to annotate documents in biology
    They may represent a better way to find new knowledge and relations than simple NLP
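
Question 2 above rests on the claim that formal relations between GO concepts permit automated inferencing. A toy sketch of what I mean, assuming a tiny hand-made is_a hierarchy rather than the real ontology: if a gene is annotated to a term, it can be inferred to be annotated to every ancestor of that term. The three-term hierarchy and the function names are my own, for illustration only.

```python
from typing import Dict, List, Set

# Toy is_a hierarchy (child -> parents); not taken from the real ontology file.
IS_A: Dict[str, List[str]] = {
    "cell-cell adhesion": ["cell adhesion"],
    "cell adhesion": ["biological process"],
    "biological process": [],
}


def ancestors(term: str, is_a: Dict[str, List[str]]) -> Set[str]:
    """All terms reachable from `term` by following is_a links upwards."""
    seen: Set[str] = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


def inferred_terms(direct: Set[str], is_a: Dict[str, List[str]]) -> Set[str]:
    """Direct annotations plus everything implied by the is_a relation."""
    inferred = set(direct)
    for term in direct:
        inferred |= ancestors(term, is_a)
    return inferred


print(sorted(inferred_terms({"cell-cell adhesion"}, IS_A)))
```

The inference is purely mechanical: it follows the relations as written, so it inherits whatever the ontology gets right or wrong.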




Gonna get functional, functional, we’re gonna get

12 10 2009

Been reading about functional analysis over the last week, specifically the papers highlighted here.

Interesting arguments regarding what is a function and how functions operate in scientific explanations have not, so far as I have read, been considered in the bio-ontologies literature. Why should they be relevant?

The Gene Ontology relies extensively on the idea of a biological function for the structure of all three component parts. For example, the GO term ‘cell adhesion’ is defined as:

The attachment of a cell, either to another cell or to an underlying substrate such as the extracellular matrix, via cell adhesion molecules.

See http://www.flickr.com/photos/juliendn/3356149232/

Cell adhesion is a function which operates within biological explanations. The function can be defined as stipulative: the term ‘cell adhesion’ means exactly what we want it to mean, as it is defined in the Gene Ontology.

Or ‘cell adhesion’ is a functional term which takes meaning from the way in which biologists use the term in discussions and theoretical arguments. It means different things to different people, but we can analyse these concepts and offer a descriptive definition.

Or ‘cell adhesion’ is a proper function. It is a logical argument, a theoretical entity in an objective sense, which takes its meaning from a real-world referent. For example, ‘cell adhesion’ could be described as a selected effect, selected over evolutionary time to adhere cellular components together.

What is the philosophical basis for functional terms in the Gene Ontology? GO is a controlled vocabulary, but may simply represent the consensus semantics for words which only mean something in biological theories. Alternatively, GO could be viewed as much more authoritarian, in that it demands that the biological community accede to the definitions stated.

Or is GO true – an arrangement of proper functions?