Core 2: Bioinformatics CBio-Berkeley

Upload: lea

Post on 18-Jan-2016

Page 1: Core 2: Bioinformatics

Core 2: Bioinformatics

CBio-Berkeley

Page 2: Core 2: Bioinformatics

Outline

• Berkeley group background
• Core 2 first round
  – what: aims, milestones
  – how: software lifecycle, interaction w/ other cores
• Current progress
• Discussion

Page 3: Core 2: Bioinformatics

Berkeley group: genomics

• Formerly BDGP (Berkeley Drosophila Genome Project) Informatics
  – Genome sequencing, analysis and annotation
  – Genomic application development
  – Database development
    • FlyBase
    • Generic Model Organism Database

Page 4: Core 2: Bioinformatics

Apollo

Page 5: Core 2: Bioinformatics

GBrowse

Page 6: Core 2: Bioinformatics

In-situ expression database

Page 7: Core 2: Bioinformatics

Genomics applications

• GadFly
  – analysis and annotation database
  – pipeline software
• BOP
  – computational analysis integration
• CGL
  – Comparative Genomics Software Library

Page 8: Core 2: Bioinformatics

SO and SOFA

• Sequence Ontology for Feature Annotation
• Ontology for genomics
  – Sequence feature classes:
    • mRNA, intron, UTR, sequence_variant, …
  – Sequence feature relations
    • exon part_of transcript
    • polypeptide derives_from mRNA

Page 9: Core 2: Bioinformatics

Chado

• Model organism relational database schema
  – FlyBase, GMOD
• Modules
  – sequence annotations
  – expression
  – map
  – genotype
  – phenotype
  – ontology/cv
  – …
• Generic schema
  – Uses ontologies for strong typing
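The "ontologies for strong typing" idea can be sketched with a tiny in-memory database. This is a minimal illustration, not the real schema: the table names (cvterm, feature, feature_relationship) and the subject/object/type columns loosely follow Chado's sequence module, but most columns are omitted, and the feature names (transcript-1, exon-1, exon-2) and the helper parts_of are made up for the example.

```python
import sqlite3

# Heavily simplified, Chado-like tables (assumption: the real schema has
# many more columns and constraints than shown here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
CREATE TABLE feature_relationship (
    subject_id INTEGER NOT NULL REFERENCES feature(feature_id),
    object_id  INTEGER NOT NULL REFERENCES feature(feature_id),
    type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
""")

# Feature types and relation types are both ontology terms, not hard-coded columns.
conn.executemany("INSERT INTO cvterm VALUES (?, ?)",
                 [(1, "mRNA"), (2, "exon"), (3, "part_of")])
# Hypothetical example features.
conn.executemany("INSERT INTO feature VALUES (?, ?, ?)",
                 [(10, "transcript-1", 1), (11, "exon-1", 2), (12, "exon-2", 2)])
# exon part_of transcript, stored as typed subject/object rows.
conn.executemany("INSERT INTO feature_relationship VALUES (?, ?, ?)",
                 [(11, 10, 3), (12, 10, 3)])


def parts_of(conn, uniquename, part_type):
    """Return names of features of type `part_type` that are part_of `uniquename`."""
    rows = conn.execute("""
        SELECT part.uniquename
        FROM feature AS whole
        JOIN feature_relationship AS fr ON fr.object_id = whole.feature_id
        JOIN cvterm AS rel   ON rel.cvterm_id = fr.type_id
        JOIN feature AS part ON part.feature_id = fr.subject_id
        JOIN cvterm AS ptype ON ptype.cvterm_id = part.type_id
        WHERE whole.uniquename = ? AND rel.name = 'part_of' AND ptype.name = ?
        ORDER BY part.uniquename
    """, (uniquename, part_type)).fetchall()
    return [row[0] for row in rows]


print(parts_of(conn, "transcript-1", "exon"))
```

Because types live in cvterm rows, a new feature class or relation needs a new ontology term rather than a schema change.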

Page 10: Core 2: Bioinformatics

Berkeley group: GO

• Gene Ontology - Informatics
  – Database, web portal
  – Ontology editing tools
  – Ontology QC and integration
  – OBO

Page 11: Core 2: Bioinformatics

OBO-Edit (formerly DAG-Edit)

Page 12: Core 2: Bioinformatics

AmiGO and GO Database

Page 13: Core 2: Bioinformatics

Obol

• Problem: large ontologies of composite terms are difficult to manage
• Solution: partial automation (reasoners)
• Requires logical definitions
  – how do we obtain them?
• Solution: Obol
  – Parses logical definitions from class names
  – Logical definitions can be reasoned over
    • error detection and automation
  – Integrates OBO ontologies
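The idea of parsing logical definitions from class names can be sketched as follows. This is not Obol's implementation: it is a toy pattern matcher, and the two name patterns, the relation names (results_in_development_of, regulates) and the function parse_class_name are assumptions chosen only for illustration.

```python
import re
from typing import NamedTuple, Optional


class LogicalDefinition(NamedTuple):
    genus: str        # the more general class
    relation: str     # relation linking genus to the differentiating class
    differentia: str  # the class that narrows the genus


# Hypothetical name patterns; a real grammar would be far richer.
PATTERNS = [
    (re.compile(r"^(?P<target>.+) development$"),
     "development", "results_in_development_of"),
    (re.compile(r"^regulation of (?P<target>.+)$"),
     "regulation", "regulates"),
]


def parse_class_name(name: str) -> Optional[LogicalDefinition]:
    """Return a genus/differentia definition if the name matches a pattern."""
    for pattern, genus, relation in PATTERNS:
        match = pattern.match(name.strip())
        if match:
            return LogicalDefinition(genus, relation, match.group("target"))
    return None  # name is not composite under the known patterns


print(parse_class_name("wing development"))
print(parse_class_name("mitochondrion"))
```

Once names are decomposed this way, a checker could, for example, flag a composite term whose differentia does not name any existing class.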

Page 14: Core 2: Bioinformatics

OBO Relations Ontology

• Common relations used across ontologies must mean the same thing
  – is_a
  – part_of
  – derives_from
  – has_participant
  – …
• OBO relations ontology provides precise definitions
  – defines class-level relations in terms of their instances
• http://obo.sourceforge.net/relationship
  – collaboration with core5, Manchester & others
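Defining a class-level relation in terms of instances can be illustrated with the usual "all-some" reading: C part_of D holds when every instance of C is part of some instance of D. The sketch below assumes that reading and ignores time; the instance names (n1, c1, ...) and the function class_part_of are made up for the example.

```python
from typing import Hashable, Iterable, Set, Tuple


def class_part_of(c_instances: Iterable[Hashable],
                  d_instances: Iterable[Hashable],
                  instance_part_of: Set[Tuple[Hashable, Hashable]]) -> bool:
    """True if every instance of C is part_of at least one instance of D.

    Note: with no instances of C at all, the check is vacuously true.
    """
    d_list = list(d_instances)
    return all(
        any((c, d) in instance_part_of for d in d_list)
        for c in c_instances
    )


# Hypothetical instance data: two nuclei, three cells, one cell without a nucleus.
nuclei = ["n1", "n2"]
cells = ["c1", "c2", "c3"]
part_of_pairs = {("n1", "c1"), ("n2", "c2")}

print(class_part_of(nuclei, cells, part_of_pairs))  # nucleus part_of cell
print(class_part_of(cells, nuclei, part_of_pairs))  # cell part_of nucleus
```

The relation is not symmetric at the class level: "nucleus part_of cell" holds here even though cell c3 has no nucleus, because the definition only quantifies over all instances of the first class.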

Page 15: Core 2: Bioinformatics

Outline

• Berkeley group background
• Core 2 first round
  – what: aims, milestones
  – how: software lifecycle, interaction w/ other cores
• Current progress
• Open questions

Page 16: Core 2: Bioinformatics
Page 17: Core 2: Bioinformatics

Core 2 specific aims

• Aims
  1. Capture and describe data
  2. Reconcile annotation and ontology changes
  3. Store, view and compare annotations
  4. Link disease genes
• First round
  – phenotypes: Fly and Zebrafish
  – HIV clinical trial data

Page 18: Core 2: Bioinformatics

Aim 1: Capture and describe data

• Phenotype data capture
  – OBO-Edit plug-ins
  – Combine classes from multiple ontologies
    • PATO, anatomical ontologies
  – NLP tools?
• Clinical trial data capture
  – what are the appropriate tools?

Page 19: Core 2: Bioinformatics

Aim 1: Capture and describe data

• Zebrafish, fly
  – PATO: Phenotype and trait ontology
    • phenotype ‘primitives’
  – ‘Entity-Attribute-Value’ model
  – Phenotype ontologies
  – Genetic data
  – Orthologs
• Clinical trial data
  – generic instance model
  – what are the appropriate ontologies here?

Page 20: Core 2: Bioinformatics

PATO

• An ontology of attributes and attribute values
  – e.g. morphology, structure, placement
• Current status of PATO?
  – needs work to conform to sound ontology principles
    • definitions
    • formalisation of attributes
  – working with core3-cambridge (Gkoutos) and core5 (Neuhaus)

Page 21: Core 2: Bioinformatics

Phenotype annotation

• Entity-attribute structured annotations
  – Entity term; PATO term
    • brain FBbt:00005095; fused PATO:0000642
    • gut MA:0000917; dysplastic PATO:0000640
    • tail fin ZDB:020702-16; ventralized PATO:0000636
    • kidney ZDB:020702-16; hypertrophied PATO:0000636
    • midface ZDB:020702-16; hypoplastic PATO:0000636
• Pre-composed phenotype terms
  – Mammalian Phenotype Ontology
    • “increased activated B-cell number” MPO:0000319
    • “pink fur hue” MPO:0000374
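The entity-attribute annotation lines on this slide ("entity label, entity ID; quality label, PATO ID") map naturally onto a small record type. The sketch below is one possible representation, not the project's data model; the class names Term and PhenotypeAnnotation and the function parse_annotation are assumptions, and the parser only handles the "label ID; label ID" layout shown on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    label: str    # human-readable name, e.g. "tail fin"
    term_id: str  # prefixed identifier, e.g. "PATO:0000642"


@dataclass(frozen=True)
class PhenotypeAnnotation:
    entity: Term   # anatomical (or other) entity term
    quality: Term  # PATO term describing the entity


def _parse_term(text: str) -> Term:
    # The identifier is the last whitespace-separated token; the rest is the label.
    label, term_id = text.strip().rsplit(None, 1)
    if ":" not in term_id:
        raise ValueError(f"expected a prefixed identifier, got {term_id!r}")
    return Term(label, term_id)


def parse_annotation(line: str) -> PhenotypeAnnotation:
    """Parse 'entity-label ENTITY:ID; quality-label PATO:ID'."""
    entity_text, quality_text = line.split(";")
    return PhenotypeAnnotation(_parse_term(entity_text), _parse_term(quality_text))


annotation = parse_annotation("brain FBbt:00005095; fused PATO:0000642")
print(annotation.entity, annotation.quality)
```

Keeping the entity and the quality as separate terms is what allows annotations to be composed on the fly, in contrast to the pre-composed terms of the Mammalian Phenotype Ontology.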

Page 22: Core 2: Bioinformatics

Example (Fly)

Entity            Attribute   Value      Background/Environment
embryo            viability   lethal     Scer\GAL4[hs.PB]
dorsal cuticle    shape       abnormal
…                 …           …          …
wing vein L2      shape       branched   temperature sensitive

Gene: Jra
Allele: Jra[bZIP.Scer\UAS]
Allele Description: defects in head and dorsal cuticle. Scer\GAL4[hs.PB] induces…..

[Figure labels: A481G, bZIP]

Page 23: Core 2: Bioinformatics

Genotype-Phenotype datamodel

• Need to model complex genotypes
• Environment
• Phenotype
  – E-A-V is not enough
    • Relational attributes
    • Complex phenotypes
    • Measurements and assays
  – CSHL 2005 Phenotype meeting

Page 24: Core 2: Bioinformatics

Aim 2: Reconcile annotation and ontology changes

• Ontology evolution can trigger annotation changes
• Identifiers
  – all classes and annotations will have stable identifiers
  – Cores 1 and 2 to decide on identifier model
    • LSID URNs
• OntoTrack
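For reference, an LSID URN has the general form urn:lsid:authority:namespace:object, with an optional trailing revision. The sketch below splits such a URN into its parts; it is a minimal illustration that skips character-level validation, and the authority example.org and the mapping of a PATO term onto an LSID are hypothetical, since the slides leave the identifier model undecided.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(urn: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its fields."""
    parts = urn.split(":")
    if (len(parts) not in (5, 6)
            or parts[0].lower() != "urn"
            or parts[1].lower() != "lsid"):
        raise ValueError(f"not an LSID URN: {urn!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty LSID component in {urn!r}")
    # An empty or missing sixth component means "no revision".
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(authority, namespace, object_id, revision)


# Hypothetical identifier: authority and namespace are invented for the example.
print(parse_lsid("urn:lsid:example.org:PATO:0000642:2"))
```

The optional revision field is relevant to this aim: a stable object identifier plus a revision lets an annotation record which version of a class it was made against.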

Page 25: Core 2: Bioinformatics

Aim 3: Store, view and compare annotations

• OBO: ontologies
• OBD: data annotated using ontologies
  – genotype-phenotype
  – clinical trials
  – others

Page 26: Core 2: Bioinformatics

OBD: A Database for OBO

• Data warehouse
  – collected from MODs and other sources
• Annotation versioning
• Generic data model
  – Any data typed by OBO classes can be stored
• Specific annotation data views
  – Clinical trial data view
  – Phenotype data view
• Chado-compliant
• Entity-attribute-(value) model

Page 27: Core 2: Bioinformatics
Page 28: Core 2: Bioinformatics

Key technologies

• ‘Semantic Web’ database technology
  – ontology-aware
    • ontologies are part of meta-model
    • higher level query languages
      – SPARQL, SeRQL, …
    • tool interoperability
      – Protégé-OWL, Jena, ..
  – SQL compatibility
    • optionally layered on relational model
  – Standards? Maturity?
    • Many implementations
      – Sesame, Kowari,

Page 29: Core 2: Bioinformatics

Aim 3: Store, view and compare annotations

• Browsing
  – AmiGO-2
• Advanced visualization
  – work with core 1 (University of Victoria)

Page 30: Core 2: Bioinformatics

Comparing annotations

• process vs state
  – regulatory processes:
    • acidification of midgut has_quality reduced rate
    • midgut has_quality low acidity
• development vs behavior
  – wing development has_quality abnormal
  – flight has_quality intermittent
• granularity (scale)
  – chemical vs molecular vs cell vs tissue vs anatomical part

Page 31: Core 2: Bioinformatics

Integrating anatomical ontologies

• Annotations should be comparable between species
  – phenotype annotations are composed of anatomical terms
• Multiple species-centric anatomical ontologies
  – Problem: how do we compare across species?
  – XSPAN (Bard et al): creating mappings
  – Core 1: ontology mappings

Page 32: Core 2: Bioinformatics

Aim 4: Linking disease genes

• Homology data
  – Orthologous genes
• Genomic data
  – SNPs, sequence variants
• Ontologies
  – Disease ontologies
  – Semantic similarity
  – Ontology integration
    • Obol, XSPAN
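"Semantic similarity" can be made concrete with one simple measure: the Jaccard overlap of two terms' ancestor sets in the ontology graph. The slides do not say which measure is intended, so this is only one option among many; the toy is_a hierarchy and the function names below are invented for illustration.

```python
from typing import Dict, List, Set


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return the term itself plus everything reachable via parent links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def jaccard_similarity(a: str, b: str, parents: Dict[str, List[str]]) -> float:
    """Shared ancestors divided by all ancestors of either term (0.0 to 1.0)."""
    anc_a, anc_b = ancestors(a, parents), ancestors(b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


# Hypothetical is_a hierarchy (child -> parents); not taken from any real ontology.
toy_parents = {
    "fused": ["morphology"],
    "branched": ["morphology"],
    "morphology": ["quality"],
    "lethal": ["viability"],
    "viability": ["quality"],
}

print(jaccard_similarity("fused", "branched", toy_parents))
print(jaccard_similarity("fused", "lethal", toy_parents))
```

Terms that share a close parent score higher than terms that only meet at the root, which is the property needed when ranking candidate disease genes by phenotype overlap.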

Page 33: Core 2: Bioinformatics

Linking disease to phenotype

• Relationship of phenotype to diseases and disorders
  – essentialist
  – statistical
• Disease ontologies
  – OBO disease ontology (Northwestern)
  – EVOC disease ontology (EVOC)
  – Others
• Disease ontology workshop (core 5)
  – November 2006

Page 34: Core 2: Bioinformatics

Outline

• Berkeley group background
• Core 2 first round
  – what: aims, milestones
  – how: software lifecycle, interaction w/ other cores
• Current progress
• Open questions

Page 35: Core 2: Bioinformatics

Software lifecycle

• Software is developed in phases
• Different phases require interaction with different cores
• Iterative “Agile” methodology
  – fast cycles
  – involve ‘customer’ (core3) at all phases

Page 36: Core 2: Bioinformatics
Page 37: Core 2: Bioinformatics

Outline

• Berkeley group background
• Core 2 first round
  – what: aims, milestones
  – how: software lifecycle, interaction w/ other cores
• Current progress

Page 38: Core 2: Bioinformatics

Current progress

• Meetings
  – CSHL November 2005
    • Phenotype ontology meeting
    • Phenotype tools workshop
      – Berkeley, UVic, Core 3
• OBO-Edit complex class plug-in
• Phenotype browser prototype
• Genotype-Phenotype datamodel

Page 39: Core 2: Bioinformatics

OBO-Edit complex class plug-in

• Combinatorial composition of classes
• Current use-cases:
  – plant anatomical structures
  – integrating GO and OBO-Cell
• Ideal for phenotype classes
  – extend to make ‘phenotype’ plug-in

Page 40: Core 2: Bioinformatics

OBD Progress

• Genotype-Phenotype data model defined
• Prototype implemented
• Evaluating technologies

Page 41: Core 2: Bioinformatics

Phenotype browser

• Experimental branch of AmiGO code
• Allows browsing and querying of combinatorial phenotype annotations
• Experimental dataset
• Demo
  – http://yuri.lbl.gov/amigo/obd