core 2: bioinformatics

Core 2: Bioinformatics

CBio-Berkeley

Outline

• Berkeley group background
• Core 2 first round
– what: aims, milestones
– how: software lifecycle, interaction w/ other cores
• Current progress
• Discussion

Berkeley group: genomics

• Formerly BDGP (Berkeley Drosophila Genome Project) Informatics
– Genome sequencing, analysis and annotation
– Genomic application development
– Database development
• FlyBase
• Generic Model Organism Database

Apollo

GBrowse

In-situ expression database

Genomics applications

• GadFly
– analysis and annotation database
– pipeline software
• BOP
– computational analysis integration
• CGL
– Comparative Genomics Software Library

SO and SOFA

• Sequence Ontology for Feature Annotation

• Ontology for genomics
– Sequence feature classes:
• mRNA, intron, UTR, sequence_variant, …
– Sequence feature relations
• exon part_of transcript
• polypeptide derives_from mRNA
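As a minimal sketch of the idea on this slide, typed features and the relations between them can be written as small records. The class and relation names come from the slide; the `Feature` and `Relation` types and the feature identifiers are hypothetical and not part of SO or SOFA.

```python
from dataclasses import dataclass

# Relation names taken from the slide; everything else here is illustrative.
ALLOWED_RELATIONS = {"part_of", "derives_from"}


@dataclass(frozen=True)
class Feature:
    feature_id: str  # hypothetical local identifier
    so_type: str     # Sequence Ontology class name, e.g. "exon"


@dataclass(frozen=True)
class Relation:
    subject: Feature
    predicate: str
    obj: Feature

    def __post_init__(self):
        # Reject relations that are not in the agreed vocabulary.
        if self.predicate not in ALLOWED_RELATIONS:
            raise ValueError(f"unknown relation: {self.predicate}")


def describe(rel):
    """Render a relation using the class names of its two features."""
    return f"{rel.subject.so_type} {rel.predicate} {rel.obj.so_type}"


exon = Feature("f1", "exon")
transcript = Feature("f2", "transcript")
mrna = Feature("f3", "mRNA")
polypeptide = Feature("f4", "polypeptide")

relations = [
    Relation(exon, "part_of", transcript),
    Relation(polypeptide, "derives_from", mrna),
]

for rel in relations:
    print(describe(rel))
```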

Chado

• Model organism relational database schema
– FlyBase, GMOD
• Modules
– sequence annotations
– expression
– map
– genotype
– phenotype
– ontology/cv
– …
• Generic schema
– Uses ontologies for strong typing
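The "ontologies for strong typing" point can be illustrated with a heavily simplified sketch: features and relationships carry a type that is a foreign key into a table of ontology terms. The table and column names below are loosely modeled on Chado's `feature`, `cvterm` and `feature_relationship` tables, but most columns are omitted and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
CREATE TABLE feature_relationship (
    subject_id INTEGER NOT NULL REFERENCES feature(feature_id),
    object_id  INTEGER NOT NULL REFERENCES feature(feature_id),
    type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
""")

# Ontology terms type both the features and the relationship between them.
conn.executemany(
    "INSERT INTO cvterm (cvterm_id, name) VALUES (?, ?)",
    [(1, "exon"), (2, "transcript"), (3, "part_of")],
)
conn.executemany(
    "INSERT INTO feature (feature_id, uniquename, type_id) VALUES (?, ?, ?)",
    [(10, "exon-1", 1), (11, "transcript-1", 2)],  # hypothetical features
)
conn.execute("INSERT INTO feature_relationship VALUES (10, 11, 3)")

rows = conn.execute("""
    SELECT s.uniquename, st.name, rt.name, o.uniquename, ot.name
    FROM feature_relationship fr
    JOIN feature s  ON s.feature_id = fr.subject_id
    JOIN cvterm  st ON st.cvterm_id = s.type_id
    JOIN cvterm  rt ON rt.cvterm_id = fr.type_id
    JOIN feature o  ON o.feature_id = fr.object_id
    JOIN cvterm  ot ON ot.cvterm_id = o.type_id
""").fetchall()
print(rows)
```

Because the schema stores only type identifiers, new feature classes need new ontology terms rather than new tables.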

Berkeley group: GO

• Gene Ontology - Informatics
– Database, web portal
– Ontology editing tools
– Ontology QC and integration
– OBO

OBO-Edit (formerly DAG-Edit)

AmiGO and GO Database

Obol

• Problem: large ontologies of composite terms are difficult to manage

• Solution: partial automation (reasoners)
• Requires logical definitions
– how do we obtain them?
• Solution: Obol
– Parses logical definitions from class names
– Logical definitions can be reasoned over
• error detection and automation
– Integrates OBO ontologies
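To make "parses logical definitions from class names" concrete, here is a toy sketch that splits a composite class name into a genus and a differentiating relation. It is not Obol's implementation: the two patterns, the relation names and the `LogicalDefinition` type are all hypothetical and chosen only for illustration.

```python
import re
from typing import NamedTuple, Optional


class LogicalDefinition(NamedTuple):
    genus: str        # the general class the term is a kind of
    relation: str     # hypothetical relation name linking genus to target
    differentia: str  # the class that distinguishes this term


# Hypothetical name patterns; a real grammar would be far richer.
PATTERNS = [
    (re.compile(r"^(?P<genus>regulation) of (?P<target>.+)$"), "regulates"),
    (re.compile(r"^(?P<target>.+) (?P<genus>development)$"), "results_in_development_of"),
]


def parse_class_name(name: str) -> Optional[LogicalDefinition]:
    """Return a logical definition if the name matches a known pattern."""
    for pattern, relation in PATTERNS:
        match = pattern.match(name)
        if match:
            return LogicalDefinition(match.group("genus"), relation, match.group("target"))
    return None


for name in ["regulation of cell growth", "wing development", "mitochondrion"]:
    print(name, "->", parse_class_name(name))
```

A name that yields no definition, such as the last one, is simply left for manual curation in this sketch.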

OBO Relations Ontology

• Common relations used across ontologies must mean the same thing

– is_a
– part_of
– derives_from
– has_participant
– …
• OBO relations ontology provides precise definitions
– defines class-level relations in terms of their instances
• http://obo.sourceforge.net/relationship
– collaboration with core5, Manchester & others
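One way to read "class-level relations in terms of their instances" is the all-some pattern: class A stands in a relation to class B when every instance of A is related to at least one instance of B. The sketch below checks that pattern over hypothetical instance data and ignores time, which the full definitions take into account.

```python
def class_level_holds(instances_of_a, instances_of_b, instance_relation):
    """True if every instance of A is related to at least one instance of B."""
    return all(
        any((a, b) in instance_relation for b in instances_of_b)
        for a in instances_of_a
    )


# Hypothetical instances and instance-level part_of pairs.
nuclei = {"nucleus_1", "nucleus_2"}
cells = {"cell_1", "cell_2", "cell_3"}
part_of = {("nucleus_1", "cell_1"), ("nucleus_2", "cell_2")}

# "nucleus part_of cell" holds; the reverse direction does not.
print(class_level_holds(nuclei, cells, part_of))
print(class_level_holds(cells, nuclei, part_of))
```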



Core 2 specific aims

• Aims
1. Capture and describe data
2. Reconcile annotation and ontology changes
3. Store, view and compare annotations
4. Link disease genes
• First round
– phenotypes: Fly and Zebrafish
– HIV clinical trial data

Aim 1: Capture and describe data

• Phenotype data capture
– OBO-Edit plug-ins
– Combine classes from multiple ontologies
• PATO, anatomical ontologies
– NLP tools?
• Clinical trial data capture
– what are the appropriate tools?

Aim 1: Capture and describe data

• Zebrafish, fly
– PATO: Phenotype and Trait Ontology
• phenotype ‘primitives’
– ‘Entity-Attribute-Value’ model
– Phenotype ontologies
– Genetic data
– Orthologs
• Clinical trial data
– generic instance model
– what are the appropriate ontologies here?

PATO

• An ontology of attributes and attribute values
– e.g. morphology, structure, placement
• Current status of PATO?
– needs work to conform to sound ontology principles
• definitions
• formalisation of attributes
– working with core3-cambridge (Gkoutos) and core5 (Neuhaus)

Phenotype annotation

• Entity-attribute structured annotations
– Entity term; PATO term

• brain FBbt:00005095; fused PATO:0000642

• gut MA:0000917; dysplastic PATO:0000640

• tail fin ZDB:020702-16; ventralized PATO:0000636

• kidney ZDB:020702-16; hypertrophied PATO:0000636

• midface ZDB:020702-16; hypoplastic PATO:0000636

• Pre-composed phenotype terms
– Mammalian Phenotype Ontology

• “increased activated B-cell number” MPO:0000319

• “pink fur hue” MPO:0000374
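The difference between the two styles can be sketched in code. A structured annotation keeps the entity term and the PATO term as separate fields, so either one can be queried on its own, which a single pre-composed term does not allow directly. The labels and identifiers below are copied from the first two examples on the slide; the `Term` and `EntityAttributeAnnotation` types and the `by_quality` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    label: str
    term_id: str


@dataclass(frozen=True)
class EntityAttributeAnnotation:
    entity: Term   # term from an anatomical ontology
    quality: Term  # term from PATO

    def as_text(self) -> str:
        return (f"{self.entity.label} {self.entity.term_id}; "
                f"{self.quality.label} {self.quality.term_id}")


annotations = [
    EntityAttributeAnnotation(Term("brain", "FBbt:00005095"), Term("fused", "PATO:0000642")),
    EntityAttributeAnnotation(Term("gut", "MA:0000917"), Term("dysplastic", "PATO:0000640")),
]


def by_quality(items, pato_id):
    """Select annotations that use a given PATO term, whatever the entity."""
    return [a for a in items if a.quality.term_id == pato_id]


for annotation in annotations:
    print(annotation.as_text())
```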

Example (Fly)

Entity          Attribute   Value      Background/Environment
embryo          viability   lethal     Scer\GAL4[hs.PB]
dorsal cuticle  shape       abnormal
…               …           …          …
wing vein L2    shape       branched   temperature sensitive

Gene: Jra
Allele: Jra[bZIP.Scer\UAS]
Allele Description: defects in head and dorsal cuticle. Scer\GAL4[hs.PB] induces…..

[Figure labels: A481G, bZIP]

Genotype-Phenotype datamodel

• Need to model complex genotypes
• Environment
• Phenotype
– E-A-V is not enough
• Relational attributes
• Complex phenotypes
• Measurements and assays

– CSHL 2005 Phenotype meeting

Aim 2: Reconcile annotation and ontology changes

• Ontology evolution can trigger annotation changes
• Identifiers
– all classes and annotations will have stable identifiers
– Cores 1 and 2 to decide on identifier model
• LSID URNs
• OntoTrack
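If LSID URNs were chosen, an identifier would have the general form `urn:lsid:authority:namespace:object[:revision]`, where the optional revision can distinguish versions of the same object. The following is a minimal parsing sketch under that assumption; the `LSID` type, the helper names and the `example.org` authority are hypothetical.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(urn: str) -> LSID:
    """Split an LSID URN into its components, raising ValueError if malformed."""
    parts = urn.split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID URN: {urn!r}")
    if any(part == "" for part in parts[2:5]):
        raise ValueError(f"empty LSID component in: {urn!r}")
    revision = (parts[5] or None) if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


def same_object(a: LSID, b: LSID) -> bool:
    """True if two LSIDs name the same object, ignoring the revision."""
    return a[:3] == b[:3]


old = parse_lsid("urn:lsid:example.org:pato:0000642:2")  # hypothetical identifier
new = parse_lsid("urn:lsid:example.org:pato:0000642:3")
print(old, same_object(old, new))
```

Comparing identifiers with and without the revision is one way an annotation could detect that the class it points to has changed.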

Aim 3: Store, view and compare annotations

• OBO: ontologies
• OBD: data annotated using ontologies
– genotype-phenotype
– clinical trials
– others

OBD: A Database for OBO

• Data warehouse
– collected from MODs and other sources
• Annotation versioning
• Generic data model
– Any data typed by OBO classes can be stored
• Specific annotation data views
– Clinical trial data view
– Phenotype data view
• Chado-compliant
• Entity-attribute-(value) model

Key technologies

• ‘Semantic Web’ database technology
– ontology-aware
• ontologies are part of meta-model
• higher level query languages
– SPARQL, SeRQL, …
• tool interoperability
– Protégé-OWL, Jena, …
– SQL compatibility
• optionally layered on relational model
– Standards? Maturity?
• Many implementations
– Sesame, Kowari, …
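What "ontology-aware" buys at query time can be shown without any of the listed tools. In the plain-Python sketch below, a query for a class also returns data annotated to its subclasses, by following is_a links. This is not SPARQL or SeRQL, and every identifier in it is hypothetical.

```python
# Hypothetical is_a edges (child -> parents) and annotations (subject, class).
IS_A = {
    "ex:wing_vein": {"ex:wing_part"},
    "ex:wing_part": {"ex:appendage_part"},
}
ANNOTATIONS = [
    ("ex:genotype_1", "ex:wing_vein"),
    ("ex:genotype_2", "ex:appendage_part"),
    ("ex:genotype_3", "ex:eye"),
]


def ancestors(term, is_a):
    """Return the term plus every class reachable from it over is_a."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def annotated_to(query_class, annotations, is_a):
    """Subjects annotated to the query class or to any of its subclasses."""
    return [subj for subj, cls in annotations if query_class in ancestors(cls, is_a)]


print(annotated_to("ex:appendage_part", ANNOTATIONS, IS_A))
print(annotated_to("ex:wing_part", ANNOTATIONS, IS_A))
```

A store that treats the ontology as part of its meta-model can answer such queries directly, instead of leaving the closure to application code as this sketch does.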

Aim 3: Store, view and compare annotations

• Browsing
– AmiGO-2
• Advanced visualization
– work with core 1 (University of Victoria)

Comparing annotations

• process vs state
– regulatory processes:
• acidification of midgut has_quality reduced rate
• midgut has_quality low acidity
• development vs behavior
– wing development has_quality abnormal
– flight has_quality intermittent
• granularity (scale)
– chemical vs molecular vs cell vs tissue vs anatomical part

Integrating anatomical ontologies

• Annotations should be comparable between species
– phenotype annotations are composed of anatomical terms
• Multiple species-centric anatomical ontologies
– Problem: how do we compare across species?
– XSPAN (Bard et al): creating mappings
– Core 1: ontology mappings
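A rough sketch of how mappings make annotations comparable: species-specific anatomical terms are translated to a shared term, and annotations are then grouped by shared term and quality. The mapping table, term names and genotypes below are hypothetical and are not taken from XSPAN or any real ontology.

```python
from collections import defaultdict

# Hypothetical mappings from species-specific anatomy terms to a shared term.
MAPPINGS = {
    "fly:midgut": "shared:gut",
    "zebrafish:intestine": "shared:gut",
    "fly:wing": "shared:wing",
}

# Hypothetical annotations: (subject, species-specific entity, quality).
ANNOTATIONS = [
    ("fly_genotype_1", "fly:midgut", "dysplastic"),
    ("zebrafish_genotype_1", "zebrafish:intestine", "dysplastic"),
    ("fly_genotype_2", "fly:wing", "fused"),
]


def group_comparable(annotations, mappings):
    """Group subjects by (shared entity, quality); unmapped entities are skipped."""
    groups = defaultdict(list)
    for subject, entity, quality in annotations:
        shared = mappings.get(entity)
        if shared is not None:
            groups[(shared, quality)].append(subject)
    return dict(groups)


for key, subjects in group_comparable(ANNOTATIONS, MAPPINGS).items():
    print(key, subjects)
```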

Aim 4: Linking disease genes

• Homology data
– Orthologous genes
• Genomic data
– SNPs, sequence variants
• Ontologies
– Disease ontologies
– Semantic similarity
– Ontology integration
• Obol, XSPAN

Linking disease to phenotype

• Relationship of phenotype to diseases and disorders
– essentialist
– statistical
• Disease ontologies
– OBO disease ontology (Northwestern)
– EVOC disease ontology (EVOC)
– Others
• Disease ontology workshop (core 5)
– November 2006


Software lifecycle

• Software is developed in phases
• Different phases require interaction with different cores
• Iterative “Agile” methodology
– fast cycles
– involve ‘customer’ (core3) at all phases


Current progress

• Meetings
– CSHL November 2005
• Phenotype ontology meeting
• Phenotype tools workshop
– Berkeley, UVic, Core 3
• OBO-Edit complex class plug-in
• Phenotype browser prototype
• Genotype-Phenotype datamodel

OBO-Edit complex class plug-in

• Combinatorial composition of classes

• Current use-cases:
– plant anatomical structures
– integrating GO and OBO-Cell
• Ideal for phenotype classes
– extend to make ‘phenotype’ plug-in

OBD Progress

• Genotype-Phenotype data model defined

• Prototype implemented
• Evaluating technologies

Phenotype browser

• Experimental branch of AmiGO code
• Allows browsing and querying of combinatorial phenotype annotations
• Experimental dataset
• Demo
– http://yuri.lbl.gov/amigo/obd
