collaborative ontology building: so much more than authoring an ontology

Post on 12-Nov-2014


DESCRIPTION

Keynote talk at Workshop on Collaborative Construction, Management and Linking of Structured Knowledge (CK 2009), Washington, 2009

TRANSCRIPT

Collaborative Ontology building: So much more than authoring an Ontology

Robert Stevens
BioHealth Informatics Group
The University of Manchester
Manchester
United Kingdom
Robert.Stevens@manchester.ac.uk

Overview

• An experiment in collaborative authoring
• Issues raised
• Observations made
• The process and the artefact
• Bits of technology

Ontologists: What’s their Problem?

David Randall
Manchester Metropolitan University

What do I Know about Collaborative Ontology Authoring?

• “you’ve never built a real ontology”
• Advisor in projects
• Experiments in collaborative authoring
• Doing it for real in a Kidney and Urinary Pathway Ontology
• Informal observational studies with collaborative Protégé

The Software Engineering Life-Cycle

Ontology

Issues in Ontology Authoring

SCOPE
COMPLEXITY
COST
AUTHORING
EVALUATION

http://ontogenesis.ontonet.org/ppt/Issues_mindmapSB.pdf

The NCL Study

• A small group met to normalise the OBO Cell Ontology (CL)
• Transform an axiomatically lean hand-crafted “tangled” ontology to:
• An axiomatically rich ontology where the structure is computationally maintained
• Study the process and deliver the artefact
• http://www.gong.manchester.ac.uk/CTON.html
• Two two-day meetings; videoed and observed by an ethnographer
• Part of the OntoGenesis network

Contractile cell CL

What is Ontology Normalisation?

• Hand-crafted ontologies with multiple inheritance are “tangled”
• Usually axiomatically lean
• We classify along one axis and use “restrictions” to other modules to capture other axes
• Then re-build the multiple inheritance using the axiomatically rich ontology

Tangled Ontology of Cars

Tangled Untangled Inferred

Contractile cell nCL

The People

• Ten people “friends and family”
• All some sort of biologists
• All familiar with OWL and normalisation
• All “singing from the same hymn sheet”

The Overall Process

• Analyse issues in current OBO CL
• Determine primary axis of classification
• Identify supporting ontologies
• Identify properties and design patterns; determine representation
• Gather knowledge
• Generate OWL encoding
• Evaluate, iterate
• Two face-to-face meetings; separate work; email and Skype

Questions Raised

• When do we work as a larger group; smaller groups and singly?
• What resources do we use?
• Who knows what?
• What strategies do we use?
• What expertise do we need?
• What are the vested interests?

Producing the “schema”

• What is it we want to say about cells?
• How do we want to say it?
• Most time was spent on these questions (one day)
• Best face to face as the whole group
• Perhaps a fait accompli in the large
• Lots of modifications through debate
• Strong chair and process (“benevolent dictatorship”)

“what about sea urchins?”

Ethnographer’s Observations!

“I don’t know about plants”

NCL Schema Captured in a Spreadsheet

Term Name | CTO id | ploidy | morphology | Cellular component | size | germ line | nucleation | process
slow muscle cell | CL:0000189 | | PATO:0001873 | GO:0030017 ; GO:0005739 | Large | n/a | PATO:0001908 | GO:0031444
blue sensitive photoreceptor cell | CL:0000495 | PATO:0001394 | PATO:0001154 ; PATO:0001873 | | Large | Somatic | PATO:0001407 | GO:0050908 ; GO:0007603
green sensitive photoreceptor cell | CL:0000496 | PATO:0001394 | PATO:0001154 ; PATO:0001873 | | Large | Somatic | PATO:0001407 | GO:0050908 ; GO:0007603
R1 photoreceptor cell | CL:0000687 | PATO:0001394 | ?? | | Variable | Somatic | PATO:0001407 | GO:0050908 ; GO:0007603

CL normalisation Workflow

Ontology API

CL Spreadsheet

The Ontology Preprocessor Language

• Adding “select”, “add” and “remove” keywords to MOS
• A “scripting” language for OWL
• We generate a list of instructions to build an ontology
• We can embed patterns into this generation
• Saves “mouse clicks”
• Rapid production of large amounts of ontology
• Easy to apply changes; acts as a macro language

OPPL sample

ADD Class: CL_0000811;
REMOVE subClassOf owl:Thing;
ADD label "CD8-positive, alpha-beta immature T cell";
ADD subClassOf cto:Cell;
ADD subClassOf cto:has_ploidy some pato:PATO_0001394;
ADD comment "MORPHOLOGY: pleiomorphic";
ADD comment "CELLULAR COMPONENT: ";
ADD subClassOf cto:has_size some cto:Small;
ADD comment "GERM LINE: n/a";
ADD subClassOf cto:has_nucleation some pato:PATO_0001407;
ADD subClassOf cto:participates_in some go:GO_2456;
ADD subClassOf cto:participates_in some go:GO_0021700;
ADD subClassOf cto:participates_in some go:GO_0032940;
ADD comment "PROCESS: ";
ADD comment "LINEAGE: mesoderm";
ADD subClassOf cto:appears_in some cto:Animalia;
ADD comment "ORGANISM COMMENT: ";
ADD subClassOf cto:potentiality some cto:TerminallyDifferentiated;
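A minimal Python sketch of how a spreadsheet row might be turned into instructions in the style of the sample above. This is not the generator used in the NCL work: the function names, the `PROPERTY_FOR_COLUMN` mapping and the dictionary keys are assumptions made for illustration, only three of the spreadsheet columns are handled, and the output has not been checked against an OPPL parser. The example row reuses the values shown for the blue sensitive photoreceptor cell.

```python
# Hypothetical sketch: one spreadsheet row -> OPPL-style instruction lines.
# PROPERTY_FOR_COLUMN is an assumed mapping, modelled on the sample above.
PROPERTY_FOR_COLUMN = {
    "ploidy": "cto:has_ploidy",
    "nucleation": "cto:has_nucleation",
    "process": "cto:participates_in",
}


def curie_to_name(term_id):
    """Rewrite an id such as 'PATO:0001394' as 'pato:PATO_0001394'."""
    prefix, local = term_id.strip().split(":")
    return f"{prefix.lower()}:{prefix}_{local}"


def row_to_instructions(row):
    """Return the instruction lines for one spreadsheet row (a dict)."""
    class_name = row["id"].replace(":", "_")
    lines = [
        f"ADD Class: {class_name};",
        "REMOVE subClassOf owl:Thing;",
        'ADD label "{}";'.format(row["name"]),
        "ADD subClassOf cto:Cell;",
    ]
    for column, prop in PROPERTY_FOR_COLUMN.items():
        cell = row.get(column, "")
        # A cell may be empty or hold several ids separated by ';'.
        for term_id in (part.strip() for part in cell.split(";")):
            if term_id:
                lines.append(f"ADD subClassOf {prop} some {curie_to_name(term_id)};")
    return lines


row = {
    "name": "blue sensitive photoreceptor cell",
    "id": "CL:0000495",
    "ploidy": "PATO:0001394",
    "nucleation": "PATO:0001407",
    "process": "GO:0050908 ; GO:0007603",
}

for line in row_to_instructions(row):
    print(line)
```

Because the pattern lives in one place, changing how a column is represented means editing the mapping and regenerating, rather than revisiting every class by hand.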

What we Generate

Class: 'CD8-positive alpha-beta immature T cell'
    SubClassOf:
        Cell,
        has_morphology some pleomorphic,
        has_nucleation some mononucleate,
        has_ploidy some diploid,
        has_potentiality some TerminallyDifferentiated,
        derives_from some 'double-positive alpha-beta immature T cell',
        located_in some 'Animalia',
        not (participates_in some gametogenesis),
        participates_in some 'T cell mediated immunity',
        participates_in some 'developmental maturation',
        participates_in some 'secretion by cell'

A Defined Class

Class: “diploid cell”
EquivalentTo: cell that has_ploidy some diploid

• Picks up all cells that has_ploidy some diploid
• Trivial, but difficult to do by hand and be complete

Class: “germline cell”
EquivalentTo: cell that (participates_in some gametogenesis) or (directly_derived_from some gamete)
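The effect of the two defined classes can be imitated with a toy Python matcher. This is only an illustration under loud assumptions: it does closed-world set matching over asserted restrictions, which is much weaker than what an OWL reasoner does; the three cell descriptions are illustrative and are not taken from the NCL spreadsheet; and `classify` is a hypothetical helper name.

```python
# Toy illustration only: a real project would give the OWL file to a reasoner.

# Asserted side: each cell sits on one axis ("is a cell") and carries
# (property, filler) pairs standing in for "property some filler".
# These descriptions are illustrative assumptions.
asserted = {
    "blue sensitive photoreceptor cell": {("has_ploidy", "diploid")},
    "spermatogonium": {
        ("has_ploidy", "diploid"),
        ("participates_in", "gametogenesis"),
    },
    "spermatid": {
        ("has_ploidy", "haploid"),
        ("participates_in", "gametogenesis"),
    },
}

# Defined side: each class lists alternative condition sets; a cell
# belongs if it satisfies every pair in at least one alternative ("or").
defined = {
    "diploid cell": [{("has_ploidy", "diploid")}],
    "germline cell": [
        {("participates_in", "gametogenesis")},
        {("directly_derived_from", "gamete")},
    ],
}


def classify(asserted, defined):
    """Map each cell to the sorted list of defined classes it falls under."""
    inferred = {}
    for cell, restrictions in asserted.items():
        inferred[cell] = sorted(
            name
            for name, alternatives in defined.items()
            if any(alternative <= restrictions for alternative in alternatives)
        )
    return inferred


inferred = classify(asserted, defined)
for cell, parents in inferred.items():
    print(cell, "->", parents)
```

In this sketch the spermatogonium ends up under both “diploid cell” and “germline cell” without either parent having been asserted by hand, which is the multiple inheritance being rebuilt from the descriptions.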

The Representation

• Aligning with RO and most OBO conventions
• Red_blood_cell participates_in some Oxygen_transport
• Red_blood_cell has_disposition some (realisable_entity that is_realised_in some oxygen_transport)
• First is simple and useful, but not actually true
• Second is more ontologically formal and “right”
• Can easily expand the “schema” to either representation
• Do experiments with patterns
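Expanding the same schema entry to either representation can be sketched as a template lookup. The `PATTERNS` table and the `expand` function are hypothetical names introduced here; the two templates simply restate the two axioms on the slide as text and are not validated as OWL.

```python
# Hypothetical sketch: one schema entry, two alternative axiom patterns.
PATTERNS = {
    # Simple and useful form from the slide.
    "simple": "{cell} SubClassOf: participates_in some {process}",
    # More formal form from the slide, via a realisable entity.
    "formal": (
        "{cell} SubClassOf: has_disposition some "
        "(realisable_entity that is_realised_in some {process})"
    ),
}


def expand(schema_rows, pattern):
    """Render every (cell, process) pair with the chosen pattern."""
    template = PATTERNS[pattern]
    return [
        template.format(cell=cell, process=process)
        for cell, process in schema_rows
    ]


schema_rows = [("Red_blood_cell", "Oxygen_transport")]

for pattern in PATTERNS:
    for axiom in expand(schema_rows, pattern):
        print(pattern, "|", axiom)
```

Switching the whole ontology from one representation to the other is then a matter of choosing a different key and regenerating, which is what makes experiments with patterns cheap.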

Entity Quality or Entity Property Quality Pattern?

• At least two ways of representing qualities
• Need only one instance of a quality type inhering in each entity
• has_quality exactly 1 diploid
• coupled with has_quality max 1 ploidy
• Otherwise:
• has_ploidy some diploid
• has_ploidy is functional and in property hierarchy under has_quality
• Again, applying patterns is easy; do experiments; gain consistency

Time Spent

• First two-day meeting
• One day “planning the schema”
• Half a day describing 30 cells and producing an ontology
• An hour or so evaluating and re-generating
• Quick iterations and always having an ontology to look at

The Second Meeting

• Six months gathering material
• An hour or so of review all together
• Pairs adding more material
• A review
• More pair work
• More review
• Then dispersed activity (all “spare time”)
• Short iteration periods (in terms of work spent)

Resources used

• Brain power
• The Web – Wikipedia is our friend
• Other ontologies
• Text books (minor use)
• Research papers
• The developing ontology and the reasoner
• Phone a friend (who is an authority in the field?)

Identifying Issues in OBO CL

• CL generated in a few days and not really touched (not true now)

• Lots of well recognised issues: Wrong biology; missing biology; ontological defects; …

• Still observed to be very useful
• Issues gave us some “tests”

Identifying Supporting Ontologies

[Figure: the CL Ontology and its supporting ontologies, with example terms]

• PATO Qualities: Nucleation, Morphology, Size, Ploidy
• GO Biological Process: Muscle Contraction, Secretion
• GO Cellular Component: Chloroplast, Cell Membrane
• NCBI Taxonomy: Bacillus anthracis str. Ames
• FMA Anatomy: Epithelium, Kidney

“It lets me do the biology”

• Is what one of our biologists said
• I can see what we’ve said about a cell
• I can see where it is in the structure
• I relate the two
• The work is “turned around”: thinking about the biology and its consequences
• P1: “flight muscle cell, that’s interesting ... no, a cardiac muscle cell is not a skeletal muscle cell!!”
• P2: “a flight muscle cell is never a cardiac muscle cell.”
• “Why has it put it there?”
• Here “it” is the reasoner

Strategies

• Pinning down the scope: Only cells in vivo
• Dealing with a representative set of cells: developing a test plan
• Collective wisdom: testing against current knowledge – “pericytes”
• Concentrating on biology and less on ontology engineering
• Using the owners and authorities

Being “Agile”

• Software engineering has moved on from simplistic life cycles
• Agile methods are the fashion
• Embedding users
• Always have something working
• Test driven development
• Short iterations
• Deliver early

Observations on Collaboration

• The work is not mechanical
• It involves extensive synchronous face-to-face work on deciding on scope and purpose
• It relies on a socially distributed expertise, and ‘knowing who knows’
• It involves the synchronous or rapid use of a number of different artefacts, and an understanding of how best to use them.
• It involves constant ‘testing’ and the delaying of final decisions through ambiguity resolution and error checking, and the constant recording of rationales for decision-making

The New KUPO Process

[Figure: process diagram with the following stages]

• Collaborative Spreadsheet
• Individual Spreadsheet
• Semantic Wiki
• Issue Tracker
• OPPL Script Formulation
• Generate OWL
• Reasoned Ontology
• View Ontology

Summary

• Mass direct authoring of an ontology seems bad
• In NCL we only used Protégé to “look at it”: no hand-building
• Mass knowledge gathering and commenting seems good
• Keeping “Agile” seems good
• Doing too much by hand seems bad
• Developing the schema in a team seems good
• The team should have coherent, non-clashing interests

Acknowledgements

• Mikel Aranguren and Simon Jupp for slides
• Mikel Aranguren, Simon Jupp, Helen Parkinson, Phil Lord, David Shotton, James Malone, Jonathan Bard, Midori Harris did the work
• Dave Randall did the ethnography
• The EPSRC for funding OntoGenesis
