Plant Ontologies – Industrial Science meets Renaissance Concepts


Post on 30-Dec-2015


PIONEER HI-BRED INTERNATIONAL, INC.

Plant Ontologies – Industrial Science meets Renaissance Concepts

Dave Selinger

Computational Biologist

Pioneer Hi-Bred,

DuPont Agriculture and Nutrition

RESEARCH

Outline

What is the nature of the problem that a Plant Anatomy Ontology can solve?

What is an Ontology?

How do you make a Plant Anatomy Ontology?

Does it really solve the problem?

Industrial Science

Not science in industry, but the industrialization of data creation, i.e. the 'omics revolutions.

High-throughput data
  – Sequencing
  – Expression

Medium-throughput data
  – Proteomics
  – Metabolomics

Low-throughput data
  – Gene/protein function
  – Phenotype

The double-edged sword of Industrial Science

Industrial science means lots of cheap data
  – Sequencing << $0.01/base
    • $10,000 prokaryotic genomes are a reality
    • $10,000 eukaryotic genomes will be a reality in the next five years
  – Expression < $0.50/gene
  – And much of this data is available for free after it is produced!

Lots of data means that you can't sit down with your lab notebook and analyze the data by hand.
  – Databases, software for searching and comparing
  – Whole new areas of research devoted to finding meaningful patterns in lots of data.

Organizing information

Information is not knowledge.
  – But knowledge can be acquired from information.
  – But only with a lot of effort; see the second law of thermodynamics.

The central challenge of industrial science is organizing the information.
  – The organization of the information determines what you can discover.
  – Experimental design
    • Good design will produce a contrast that will support or refute a hypothesis.
  – Statistical rigor
    • Is the signal higher than the noise?
    • How conclusive will the discoveries be?

Context

How do we compare across experiments?
  – Not too hard if one person did all the experiments and kept careful notes.
  – If multiple people, then we need to define what was done, what the analysis was, and what the sample was.
    • What was done – e.g. MIAME standard for describing the technical details of an expression experiment.
    • Analysis – e.g. ANOVA, SAM, etc.
    • Sample – ?

Renaissance concepts (historically Enlightenment)

Things can be systematically described and classified
  – Organisms – Linnaeus, Species Plantarum, 1753

Linnaeus' problem is much the same as the sample description problem
  – Variable specificity
    • California Laurel or Oregon Myrtlewood?
    • Kernel or seed?
  – In addition, a term like kernel assumes all parts are present, but this assumption could be wrong

Ontologies to the rescue?

Ontology = the study of being (Philosophy)

The specification of a conceptualization of a domain of interest (Computer Science)

Original and continuing computer science interest was Artificial Intelligence.
  – How can a computer make inferences?
  – Need to define meanings; the word "can", for example, has more than one.
  – Structure and relationships in an ontology allow a computer to make inferences.
    • Mary is the mother of Bill. Is Mary a parent of Bill?
    • IsA(Mother, Parent)

Parts of an ontology
  – Concepts -> objects, real and abstract, processes, functions
  – Partitions -> rules that can classify concepts
  – Attributes -> properties of a concept, can have individual and class attributes
  – Relationships -> is a, part of
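The Mary/Bill inference can be sketched in a few lines of Python. This is only an illustration: the relation table, the fact triple, and the function name `holds` are assumptions made up for the example, not part of any ontology tool mentioned in the talk.

```python
# Hypothetical mini knowledge base; all names are illustrative only.
# IS_A maps a specific relation to the broader relation it is a kind of.
IS_A = {"mother": "parent", "father": "parent"}

# Stated facts as (subject, relation, object) triples.
FACTS = {("Mary", "mother", "Bill")}


def holds(subject, relation, obj):
    """Return True if the fact is stated directly or follows via is_a links."""
    for s, r, o in FACTS:
        if (s, o) != (subject, obj):
            continue
        # Walk up the is_a chain starting from the stated relation.
        while r is not None:
            if r == relation:
                return True
            r = IS_A.get(r)
    return False


print(holds("Mary", "parent", "Bill"))
```

The only stated fact is that Mary is the mother of Bill; the answer to "is Mary a parent of Bill?" comes from the IsA(Mother, Parent) link rather than from a second stored fact.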

Does an ontology make sense?

The value of ontologies is a current debate among information scientists.
  – One group advocates that ontologies are necessary for computers to understand content.
    • Semantic web -> an extension of the current HTML/XML based web to something with ontological inference
  – Others argue that ontologies are not needed and are not practical
    • Complexity is ok; just use a Google-like search to connect concepts.

However, some problems, like organismal classification and the periodic table, are very amenable to an ontological approach.
  – Formal categories and stable entities
  – Expert users and catalogers

Forms of ontologies

Ontologies can take several forms (data structures)
  – Controlled vocabulary (List)
    • Terms but no relationships
    • Enforces systematic naming
  – Hierarchy (tree structure) => Taxonomy
    • Terms and "is a" relationship
    • Children are unique and have a single parent
  – Directed acyclic graph => Gene Ontology
    • Multiple relationship types
    • Children with multiple parents

Features of Trees

Because each child node has only one parent
  – There is an unambiguous path to the root from each leaf
  – Child nodes can be easily grouped at any level of the structure

Trees can express only one organizing principle
  – Work well for taxonomy (at least eukaryotic taxonomy)
    • Organizing principle is classification by similarity
    • All terms have an "is a" relationship to the next level term
    • Organisms were classified before evolution was hypothesized, but the classification matches the evolutionary relationships
  – Similar example would be the periodic table of the elements
  – Classification can facilitate discovery of underlying principles
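The two tree properties above (one path to the root, easy grouping at any level) can be illustrated with a parent lookup table. The sketch below uses a deliberately shortened toy taxonomy that skips most ranks; the table and the function names are assumptions for illustration.

```python
# Toy taxonomy: each term maps to its single parent (None marks the root).
# Intermediate ranks are omitted on purpose to keep the example small.
PARENT = {
    "Plantae": None,
    "Poaceae": "Plantae",
    "Fabaceae": "Plantae",
    "Zea mays": "Poaceae",
    "Oryza sativa": "Poaceae",
    "Glycine max": "Fabaceae",
}


def path_to_root(term):
    """Follow the single parent link upward; the path is unambiguous."""
    path = [term]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def group_by_level(terms, depth):
    """Group terms by their ancestor at `depth` below the root (0 = root)."""
    groups = {}
    for term in terms:
        lineage = path_to_root(term)[::-1]  # root first
        key = lineage[min(depth, len(lineage) - 1)]
        groups.setdefault(key, []).append(term)
    return groups


print(path_to_root("Zea mays"))
print(group_by_level(["Zea mays", "Oryza sativa", "Glycine max"], 1))
```

Because every term has exactly one parent, a plain dictionary is enough to store the whole structure; that stops being true once a term can have several parents.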

A tree-based Anatomy Ontology

Developed by Winston Hide's group at SANBI and Electric Genetics

Single concept, orthogonal trees
  – Cells
  – Tissues
  – Organs
  – Disease state

Each tree is independent, but each represents a related dimension describing a sample

Set operations, intersection or union, between trees allow specific queries.
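As a rough sketch of how set operations across independent trees answer a query, assume each term in each tree maps to the set of samples labeled with it. The sample ids, tree names, and the helper `samples` are invented for this example and do not come from the SANBI ontology itself.

```python
# Hypothetical annotations: for each independent tree, term -> labeled samples.
SAMPLES_BY_TERM = {
    "tissue": {"leaf": {"s1", "s2", "s3"}, "root": {"s4"}},
    "disease_state": {"healthy": {"s1", "s4"}, "infected": {"s2", "s3"}},
}


def samples(tree, term):
    """Samples labeled with `term` in the given tree (empty set if none)."""
    return SAMPLES_BY_TERM[tree].get(term, set())


# Intersection across two trees: samples that are both leaf and infected.
infected_leaf = samples("tissue", "leaf") & samples("disease_state", "infected")

# Union within one tree: samples that are leaf or root.
leaf_or_root = samples("tissue", "leaf") | samples("tissue", "root")

print(sorted(infected_leaf))
print(sorted(leaf_or_root))
```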

Features of DAGs

A tree is a special case of the DAG class
  – Children can have multiple parents.

Allows multiple classifications of the same child
  – E.g. a guard cell is both part of a leaf and is an epidermal cell.
  – Allows for more than a binary classification of a concept
    • If this results from poor definition of the concept, then it is not good.

Multiple parentage fits a "normalized" data model
  – Like a normalized relational database, a DAG can minimize duplication of objects (concepts).
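The guard cell example can be written as a small DAG in which each term lists all of its parents. The terms above "leaf" and "epidermal cell" are assumptions added so the graph has a root; relationship types are ignored here and only parentage is tracked.

```python
# Hypothetical DAG fragment: term -> list of parent terms.
PARENTS = {
    "guard cell": ["leaf", "epidermal cell"],  # two parents, as on the slide
    "epidermal cell": ["cell"],
    "leaf": ["plant structure"],
    "cell": ["plant structure"],
    "plant structure": [],
}


def ancestors(term):
    """Collect every term reachable by following parent links upward."""
    seen = set()
    stack = list(PARENTS[term])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS[parent])
    return seen


print(sorted(ancestors("guard cell")))
```

"guard cell" is stored once even though it is reached through two classifications, which is the normalization point made above; a tree would need two copies of the term.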

Sample DAG

Root
  Cooking
    Spices
      – Bay leaf
        • Laurus nobilis
        • Umbellularia californica (California laurel)
  Trees
    Lauraceae
      – Laurus
        • Laurus nobilis
      – Umbellularia
        • Umbellularia californica

Constructing the Pioneer Plant Ontology

Decided to produce a DAG
  – Used DAGeditor (editor developed for GO)
  – Developed our own web-based viewing tool
    • AmiGO was too complicated to re-use.
    • Other public browsers did not have the functionality we wanted.

Decided to focus on Corn and Soybeans
  – Used Kiesselbach's 1949 monograph on corn structure and reproduction as the primary source.
  – Used Iowa State University Ag Extension publications for the development stages of corn and soybeans
  – Added information from a botany textbook to cover missing terms from soybean.

To collaborate or not to collaborate?

Advantage of just using the Pioneer Ontology was that it served our needs and was focused on corn and soybeans, our major crops.

Disadvantage was that it was not synchronized to the public ontology
  – We would not be able to easily integrate public tissue classifications with ours
  – We would not be able to easily take advantage of improvements to the public ontology
  – Presumably the public ontology would be more "botanically correct" than ours.

Plant Ontology Consortium

Focused on model organisms
  – Arabidopsis
  – Rice, with other grasses (corn) covered by the rice terms.

Used a DAG approach
  – Multiple concepts
    • Structure (cells, tissues, sporophyte and gametophyte)
    • Development

Used DAGeditor and other GO approaches
  – Most terms have multiple parents
  – Same software and data structures as GO

Plant Ontology

Domain = Plant anatomy and development

Concepts
  – Plant parts (leaf, root, flower, meristem, etc.)
  – Life cycle stages (sporophyte, gametophyte)
  – Developmental stages (V1, flowering, R1, etc.)

Relationships between concepts
  – "A kind of" (is a)
    • A prop root is a root
  – "A part of" (part of)
    • A root cap is part of a root
  – In addition, for plant anatomy a "develops from" relation is needed
    • For example the relationship between stomatal guard cells and the guard mother cell
    • Guard cells develop from guard mother cells
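The three relationship types can be held as typed edges between terms. The sketch below encodes only the three examples given above; the relation labels `is_a`, `part_of`, and `develops_from` and the helper `related` are naming assumptions for illustration, not the POC file format.

```python
# Typed edges as (child term, relation, parent term), taken from the examples.
EDGES = [
    ("prop root", "is_a", "root"),
    ("root cap", "part_of", "root"),
    ("guard cell", "develops_from", "guard mother cell"),
]


def related(term, relation):
    """Terms that `term` points to through the given relation type."""
    return {parent for child, rel, parent in EDGES
            if child == term and rel == relation}


print(related("prop root", "is_a"))
print(related("root cap", "part_of"))
print(related("guard cell", "develops_from"))
```

Keeping the relation type on each edge is what lets a query distinguish "a prop root is a root" from "a root cap is part of a root", even though both point at the same parent term.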

Adapting the POC ontology for Pioneer's needs

Problem is that it has many more terms than required for our experiments
  – Some terms describe tissues or cells that are not practical to collect (e.g. antipodal cells)
  – Some terms describe parts not found in corn (e.g. nectary)

Another problem is that we collect samples that are convenient subdivisions of structures
  – Tip and base of an immature ear. Each differs from a whole immature ear in terms of what it contains.
  – Basal endosperm – morphologically distinct from starchy endosperm, but not found in the ontology

Our current solution

Add additional terms to the POC ontology
  – Use a different id system
    • easily distinguished from POC terms
    • will not be overwritten by on-going public curation efforts.

Label experiments with the terms from the ontology.

Create a custom ontology
  – Query the whole ontology with the terms used in the labeling and keep only
    • terms that are used to label an experimental sample
    • parent terms of used terms.
  – Can be readily rebuilt if new experiments or terms are added.
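The custom-ontology step can be sketched as a pruning pass over a term-to-parents table. This is a minimal sketch, not the Pioneer implementation: the ontology fragment is invented, and "parent terms of used terms" is read here as all ancestors up to the root, which is an assumption.

```python
# Hypothetical fragment of the full ontology: term -> list of parent terms.
FULL = {
    "plant structure": [],
    "ear": ["plant structure"],
    "immature ear tip": ["ear"],          # locally added term
    "endosperm": ["plant structure"],
    "basal endosperm": ["endosperm"],     # locally added term
    "nectary": ["plant structure"],       # not found in corn
    "antipodal cell": ["plant structure"],
}


def custom_ontology(full, used_terms):
    """Keep only the used terms and every ancestor of a used term."""
    keep = set()
    stack = list(used_terms)
    while stack:
        term = stack.pop()
        if term not in keep:
            keep.add(term)
            stack.extend(full[term])
    return {term: parents for term, parents in full.items() if term in keep}


used = {"immature ear tip", "basal endosperm"}
custom = custom_ontology(FULL, used)
print(sorted(custom))
```

Because the pruned view is derived from the full ontology plus the current labels, rerunning the function after adding experiments or terms rebuilds it.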

What can you do with the ontology?

Provides a grouping mechanism
  – Summarize expression for a tissue
  – Compare expression between tissues
  – Make complex queries that involve multiple tissues

Provides a systematic label for annotating genes
  – Where is the gene expressed?
  – Query annotation of genes based on terms

Provides a description of the complexity of tissue samples
  – Leaf sample is composed of multiple cell types with different roles
  – Cell types can be shared between tissues or structures

Comparing by tissue

The ontology provides the groupings, but how to summarize?
  – Mean?
  – Median?
  – Maximum value?

Significance of differences?
  – Each group will be much more variable than a set of samples from a controlled experiment.
  – But you may be able to eliminate the inevitable false discoveries that appear when looking at large numbers of genes.
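To make the choice of summary concrete, the sketch below groups one gene's expression values by tissue label and reports mean, median, and maximum side by side. The sample ids and expression values are made up for illustration.

```python
from statistics import mean, median

# Hypothetical data for one gene: sample -> tissue label, sample -> expression.
SAMPLE_TISSUE = {"s1": "leaf", "s2": "leaf", "s3": "leaf", "s4": "root", "s5": "root"}
EXPRESSION = {"s1": 10.0, "s2": 14.0, "s3": 30.0, "s4": 2.0, "s5": 4.0}


def summarize(expression, sample_tissue):
    """Group expression values by tissue and compute three summaries."""
    groups = {}
    for sample, value in expression.items():
        groups.setdefault(sample_tissue[sample], []).append(value)
    return {
        tissue: {"mean": mean(values), "median": median(values), "max": max(values)}
        for tissue, values in groups.items()
    }


summary = summarize(EXPRESSION, SAMPLE_TISSUE)
print(summary["leaf"])
print(summary["root"])
```

In this toy data one high leaf sample pulls the leaf mean well above the leaf median, which is the kind of within-group variability the slide warns about.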

Annotating genes

This is the primary use for TAIR and Gramene
  – Potentially label most genes with tissues of expression

However, need to differentiate presence from preferential expression.
  – A gene may be present in many tissues, but highly expressed in a few
  – Another gene may be present in the same tissues, but similarly expressed in all of them.
    • Might need to precompute and indicate which tissues the gene is significantly preferentially expressed in.
    • Might be able to use the RMS differences between expression in each tissue as a measure of consistency.
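One possible reading of the RMS idea is the root mean square of all pairwise differences between per-tissue expression values: near zero for a gene expressed evenly, large for a gene concentrated in a few tissues. The talk does not define the measure, so this formula, the function name, and the data are assumptions.

```python
from itertools import combinations
from math import sqrt


def rms_pairwise_difference(tissue_means):
    """RMS of differences between every pair of tissues (needs >= 2 tissues)."""
    squared = [(a - b) ** 2 for a, b in combinations(tissue_means.values(), 2)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical per-tissue mean expression for two genes.
uniform = {"leaf": 8.0, "stem": 8.0, "root": 8.0}
preferential = {"leaf": 2.0, "stem": 2.0, "root": 14.0}

print(rms_pairwise_difference(uniform))
print(rms_pairwise_difference(preferential))
```

Both genes are present in all three tissues, so a presence-only annotation would treat them the same; the RMS value is what separates them.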

Complexity

Genes may appear to differ between tissues for trivial reasons
  – Example: Gene appears to be preferentially expressed in stem versus leaf tissue.
    • If gene is really specific to vascular tissue and stem has more…
    • Gene is expressed late in development, adjacent leaves and stems may differ in development.

Ontology can guide further experiments
  – Compare vascular and non-vascular tissue from both leaf and stem.
  – Compare multiple leaf and stem samples from different positions (developmental stages).

Conclusions

The Plant Ontology classifies experiments and genes based on anatomical and developmental concepts.

Now that we have significant data, can we, like Darwin, discern the underlying mechanisms for how anatomical and developmental differences occur?

The Plant Ontology will be successful and used long term if it facilitates these kinds of investigations.

Acknowledgements

Pioneer
  – Henry Mirsky
  – Lane Arthur
  – Bob Merrill

POC
  – Doreen Ware (Gramene)
  – Katica Ilic (TAIR)