Experiences in Building an Ontology-Driven Image Database for Biologists

Chris Catton. Standards and Ontologies for Functional Genomics Conference, October 23-26, 2004, University of Pennsylvania School of Medicine. Image BioInformatics Laboratory, Department of Zoology, University of Oxford, UK. e-mail: [email protected]. EXPERIENCES IN BUILDING AN ONTOLOGY-DRIVEN IMAGE DATABASE FOR BIOLOGISTS

TRANSCRIPT

Page 1: Experiences in building an ontology driven image database for

Chris Catton

Standards and Ontologies for Functional Genomics Conference

October 23-26, 2004

University of Pennsylvania School of Medicine

Image BioInformatics Laboratory

Department of Zoology

University of Oxford, UK

e-mail: [email protected]

EXPERIENCES IN BUILDING AN ONTOLOGY-DRIVEN IMAGE DATABASE FOR BIOLOGISTS

Page 2: Experiences in building an ontology driven image database for

Outline

• Why are images important?

• What is the BioImage database?

• Why use a semantic web architecture?

• Lessons and research questions

Page 3: Experiences in building an ontology driven image database for

Why are biological images important in the post-genomic age?

• Images are semantic instruments for capturing aspects of the real world, and form a vital part of the scientific record, for which words are no substitute

• In the post-genomic world, attention is now focused on the organization and integration of information within cells, for functional analyses of gene products

• In a month a single active cell biology lab may generate between 10 and 100 Gbytes of multidimensional image data

Page 4: Experiences in building an ontology driven image database for

Images are complex …

• An image database must be able to store original images in any digital format currently available or yet to be invented, including multi-channel 3D images, multi-channel videos, etc.

Page 5: Experiences in building an ontology driven image database for

The need for image databases

• The value of digital image information depends upon how easily it can be located, searched for relevance, and retrieved

• Detailed descriptive metadata about the images are essential

• Without them, digital image repositories become little more than meaningless and costly data graveyards

• Despite the growth of on-line journals that permit the inclusion of media objects, few of these resources are freely available, and those that are freely available are difficult to locate and are not cross-searchable

• There is thus a need for a free, publicly available image database with rich, well-structured, searchable metadata

• The BioImage Database seeks to fulfil that need

Page 6: Experiences in building an ontology driven image database for

This view has a growing acceptance …

Page 7: Experiences in building an ontology driven image database for

What metadata?

• Image acquisition (who took the original micrograph, where, when, under what conditions, for what purpose, etc.)

• The media object itself (source and derivation, image type, dynamic range, resolution, format, codec, etc.)

• The denotation of the referent (e.g. the name, age and condition of the subject)

• Connotation of the referent (the image’s interpretation, meaning, purpose or significance, its relevance to its creator and others, and its semantic relationship to other images)

• Field aspects of the real world that cannot conveniently be attached to any particular object (e.g. variations of illumination intensity or chemo-attractant concentration across the field of view of a light microscope image)

• Sequences of change where there is a need to preserve the concept of object identity in the face of radical spatio-temporal changes in appearance
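
One way to picture the six kinds of metadata listed above is as the fields of a single record. The following is a minimal Python sketch and not the BioImage schema: the class name, field names and example values are all hypothetical, chosen only to mirror the bullets.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ImageMetadata:
    """Hypothetical record grouping the metadata categories listed above."""
    acquisition: Dict[str, str] = field(default_factory=dict)    # who, where, when, conditions, purpose
    media_object: Dict[str, str] = field(default_factory=dict)   # image type, resolution, format, codec
    denotation: Dict[str, str] = field(default_factory=dict)     # name, age, condition of the subject
    connotation: List[str] = field(default_factory=list)         # interpretation, significance
    field_aspects: Dict[str, str] = field(default_factory=dict)  # e.g. illumination variation
    change_sequences: List[str] = field(default_factory=list)    # identity preserved across change

    def missing_categories(self) -> List[str]:
        """Return the names of the categories that are still empty."""
        return [name for name, value in vars(self).items() if not value]


# Example values are invented for illustration only.
record = ImageMetadata(
    acquisition={"creator": "example microscopist"},
    media_object={"format": "TIFF"},
)
print(record.missing_categories())
```

A check such as `missing_categories()` reflects the point that metadata is what makes an image findable: an image with empty categories is a candidate for the "data graveyard".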

Page 8: Experiences in building an ontology driven image database for

Why use a semantic web architecture?

• Traditional relational databases don’t meet our needs

• Image data is complex, layered, and difficult to model

• Images are searched primarily through their metadata

• Metadata is time-consuming and difficult to obtain

• Ontologies offer the promise of better retrieval accuracy through linking to instances in an ontology, rather than attempting to process free text

• Ontologies offer the promise of easy inter-operability with other systems
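
The retrieval-accuracy point can be illustrated with a toy comparison. This is a sketch under stated assumptions, not BioImage code: the mini-ontology, the image records and the function names are all hypothetical. It contrasts a search that follows links to ontology terms with a substring search over free-text captions.

```python
# Hypothetical mini-ontology: each term maps to its parent term (None for the root).
PARENT = {
    "cell": None,
    "neuron": "cell",
    "Purkinje cell": "neuron",
    "haemopoietic stem cell": "cell",
}

# Hypothetical image records: each links to one ontology term and
# also carries a free-text caption.
IMAGES = [
    {"id": "img1", "term": "Purkinje cell", "caption": "Cerebellar cortex, Golgi stain"},
    {"id": "img2", "term": "haemopoietic stem cell", "caption": "Not a neuron: marrow-derived cell"},
    {"id": "img3", "term": "neuron", "caption": "Cultured neuron"},
]


def is_a(term, ancestor):
    """True if `term` equals `ancestor` or descends from it in PARENT."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT[term]
    return False


def search_by_term(query):
    """Return ids of images whose linked term is the query term or a subclass of it."""
    return [img["id"] for img in IMAGES if is_a(img["term"], query)]


def search_free_text(query):
    """Return ids of images whose caption contains the query string."""
    return [img["id"] for img in IMAGES if query.lower() in img["caption"].lower()]


print(search_by_term("neuron"))    # follows the ontology links
print(search_free_text("neuron"))  # matches caption text only
```

In this toy data the term-based search finds the Purkinje cell image, whose caption never mentions "neuron", while the free-text search misses it and instead returns the caption that says "Not a neuron".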

Page 9: Experiences in building an ontology driven image database for

The BioImage Ontology

Page 10: Experiences in building an ontology driven image database for

Lessons learned: Performance, scalability …

• Retrieval is slower than it would be from a traditional database

• Scalability remains to be tested (true for all semantic web software)

• Query languages (RDQL) are immature when compared to SQL

• Parsing RDF is hard and slow (RDF-ABBREV output of the Jena parser is unreliable and the unstriped format requires multiple passes to create XML that can easily be transformed to HTML)

Page 11: Experiences in building an ontology driven image database for

A problem with ontologies?

• The volume of data generated in the Life Sciences is now estimated to be doubling every month

• Already people look less and less at the raw scientific data (unless they are their own results)

• As this volume of data accumulates, few if any of us will have the time or the mental capacity to assimilate new data, structure them in a meaningful way and extract information, without first processing the data through an ontology or some other similar machine-based organisational aid

• THE ONTOLOGY WILL BE WRONG! (or we should all pack up and go home)

Page 12: Experiences in building an ontology driven image database for

Paradigm shifts

• Our human understanding of an area of science is never static, but is constantly being revised by new research

• Such revisions in understanding are either evolutionary (incremental), following the progressive discovery of more and more detail, interpreted according to the prevailing paradigm, or revolutionary, when the prevailing paradigm is overthrown by another

• How do paradigm revolutions succeed?

"A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it"

(Max Planck, 1949)

Page 13: Experiences in building an ontology driven image database for

Factors preventing evolution

• Ontology builders are ‘monks’ (and nuns) - led by an ‘abbot’, a relatively senior domain expert likely to be committed to encapsulating the dominant paradigm

• Substantial problems confront any newcomers wishing to contribute, since ontology building is time-consuming and expensive

• Since an ontology expresses the community consensus, there will be massive social pressures against change

• If large volumes of data have already been encoded using an existing ontology, this will make it difficult to introduce change

• The first ontology in a domain may assume a monopolistic position that becomes unassailable, even if it has universally acknowledged weaknesses

• Ontologies are unlikely to evolve in response to the same market forces that drive the development of applications software

Page 14: Experiences in building an ontology driven image database for

Encapsulating the dominant paradigm

• Imagine a section of an ontology describing the development of adult mammalian bone marrow and brain, constructed according to the pre-1980 dominant paradigm that bone marrow develops from mesoderm, while brain develops from ectoderm

Page 15: Experiences in building an ontology driven image database for

An example of paradigm evolution

• Subsequently, adult mouse brain was found to contain haemopoietic stem cells

• Bartlett (1982) hypothesised that these cells developed from foetal haemopoietic cells that entered the brain tissue before the barrier was established

• This challenge to the dominant paradigm that brain tissues are derived exclusively from ectoderm can be accommodated by extending the graph

Page 16: Experiences in building an ontology driven image database for

An example of paradigm revolution

• More recently, Brazelton et al. (2000) claimed that haemopoietic stem cells from adult bone marrow can develop into neural cells in adult mouse brain

• If true, this result overthrows the paradigm that neuronal cells can only develop from embryonic ectoderm, requiring a new ontology incompatible with the old

• This new ontology is no longer an extension of the previous one, since neural cells no longer develop only from foetal neuroepithelium
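
The difference between extending a graph and replacing it can be made concrete with sets of triples. The sketch below is an illustration only: the predicate names (`develops_from`, `develops_only_from`, `contains`) and the way the three ontologies are encoded are assumptions, not taken from any real ontology. A revision counts as evolutionary if every old statement survives, and revolutionary if some old statement has to be retracted.

```python
# Hypothetical encoding of the pre-1980 paradigm as (subject, predicate, object) triples.
OLD = {
    ("bone marrow", "develops_from", "mesoderm"),
    ("brain", "develops_from", "ectoderm"),
    ("neural cell", "develops_only_from", "foetal neuroepithelium"),
}

# Evolution: the Bartlett (1982) hypothesis only adds statements.
EVOLVED = OLD | {
    ("brain", "contains", "haemopoietic stem cell"),
    ("haemopoietic stem cell", "develops_from", "foetal haemopoietic cell"),
}

# Revolution: the Brazelton et al. (2000) claim contradicts the exclusivity
# statement, which must be withdrawn before the new statements are added.
REVISED = (EVOLVED - {("neural cell", "develops_only_from", "foetal neuroepithelium")}) | {
    ("neural cell", "develops_from", "foetal neuroepithelium"),
    ("neural cell", "develops_from", "haemopoietic stem cell"),
}


def classify_revision(old, new):
    """'evolution' if `new` keeps every statement of `old`, else 'revolution'."""
    return "evolution" if old <= new else "revolution"


def retracted(old, new):
    """Statements of `old` that no longer hold in `new`."""
    return old - new


print(classify_revision(OLD, EVOLVED))      # every old statement is kept
print(classify_revision(EVOLVED, REVISED))  # one old statement is withdrawn
print(retracted(EVOLVED, REVISED))
```

Data annotated against the old graph remains valid after an evolutionary revision, but after a revolutionary one any annotation that relied on a retracted statement has to be revisited.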

Page 17: Experiences in building an ontology driven image database for

A way forward: using Named Graphs in RDF (and OWL?)

• In response to considerable frustration and confusion within the RDF community about the best method of reifying RDF statements, Jeremy Carroll et al. proposed an extension to RDF
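
The core idea of Named Graphs is that a set of triples is given a name, so that further statements (for example, who asserts it) can be made about the set as a whole. The following is a minimal Python sketch of that idea using plain dictionaries rather than any RDF library; the graph names, predicates and the `add`, `graphs_asserting` and `match` helpers are all hypothetical, and a real store would use URIs.

```python
from collections import defaultdict

# Toy quad store: graph name -> set of (subject, predicate, object) triples.
store = defaultdict(set)


def add(graph, s, p, o):
    """Add one triple to the named graph `graph`."""
    store[graph].add((s, p, o))


# Two incompatible views are kept apart in separate named graphs.
add("ex:paradigm-pre-1980", "neural cell", "develops_only_from", "foetal neuroepithelium")
add("ex:brazelton-2000", "neural cell", "develops_from", "haemopoietic stem cell")

# Because a graph has a name, provenance can be stated about the whole graph.
add("ex:provenance", "ex:brazelton-2000", "asserted_by", "Brazelton et al. (2000)")


def graphs_asserting(s, p, o):
    """Names of the graphs that contain the given triple."""
    return sorted(g for g, triples in store.items() if (s, p, o) in triples)


def match(subject, graphs):
    """Triples about `subject`, restricted to the graphs the user chooses to trust."""
    return sorted(t for g in graphs for t in store.get(g, set()) if t[0] == subject)


print(graphs_asserting("neural cell", "develops_from", "haemopoietic stem cell"))
print(match("neural cell", ["ex:paradigm-pre-1980"]))
```

Keeping competing paradigms in separate named graphs would let a query choose which view to accept, rather than forcing a single consensus graph to be right.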

Page 18: Experiences in building an ontology driven image database for

Thanks and acknowledgements

• David Shotton and Simon Sparks for BioImage developments (http://www.bioimage.org)

• John Pybus, our computer systems manager, for keeping us running in spite of the problems

• Liz Mellings for unbounded patience inputting data and testing

• The European Commission for funding the BioImage Project (EC IST 5th Framework Contract 2001-32688: ORIEL – Online Research Information Environment for the Life Sciences; http://www.oriel.org)

Page 19: Experiences in building an ontology driven image database for

End