Accelerating Candidate Gene Discovery through Ontological Indexing of Large Scale Data Repositories Simon Twigger, Ph.D.

Post on 02-Feb-2016

Page 1: MCW Department of Physiology

Accelerating Candidate Gene Discovery through Ontological Indexing of Large Scale Data Repositories

Simon Twigger, Ph.D.

Page 2: MCW Department of Physiology

MCW Department of Physiology

Human & Molecular Genetics Center

http://rgd.mcw.edu

Page 3: MCW Department of Physiology

Meet the client

Page 4: MCW Department of Physiology

Rat researchers ask...

What tissue is this gene expressed in?

What expression data is known for SD (aka SD/NHsd, Harlan Sprague Dawley, Sprague Dawley) rats?

Are any of these genes associated with my phenotype?

Has this gene been seen in the brain?

What rat expression studies have been done on Mammary Cancer (aka breast neoplasms/breast cancer/cancer of the breast, breast carcinoma...)?

Has anyone done any expression studies using congenic rats?

Page 5: MCW Department of Physiology

Biological Data Warehouse

Really important piece of data...

Page 6: MCW Department of Physiology

Problem...

Where, what, when?

+

Page 7: MCW Department of Physiology

(one) Solution?

Where, what, when?

+

Page 8: MCW Department of Physiology

How to create the index?

Page 9: MCW Department of Physiology

Examine One by One?

Analysis of anterior pituitary glands of ACI, Copenhagen, and Brown Norway males following treatment with the synthetic estrogen diethylstilbestrol (DES).

Copenhagen = COP

Brown Norway = BN

Page 10: MCW Department of Physiology

NCBO ontology services

http://bioportal.bioontology.org/annotator

Page 11: MCW Department of Physiology

Open Biomedical Annotator

http://www.bioontology.org/wiki/index.php/Annotator_Web_service

Page 12: MCW Department of Physiology

Initial Ontologies & Workflow

• Datasets

• Series

• Samples

Page 13: MCW Department of Physiology

Phase 1: Small Scale Testing

Page 14: MCW Department of Physiology

http://gminer.mcw.edu/

Initial Test Load:

30 Rat Dataset records (GDS) out of 236

32 Series records (GSE) out of 750

587 Sample records (GSM) out of 7288

Ruby on Rails web application to view data

Page 15: MCW Department of Physiology

Parallel Annotation Workflow

Page 16: MCW Department of Physiology

Concurrent Annotation Results

August October

Page 17: MCW Department of Physiology

Cloud-enabled Workflow?

Page 18: MCW Department of Physiology

Results/Demo

Page 19: MCW Department of Physiology

Initial Observations - Synonyms

Searching with synonyms can be great:

Ept6 = ACI.COP-(D3Mgh16-D3Rat119)/Shul

DES = Diethylstilbestrol

Page 20: MCW Department of Physiology

Initial Observations - Synonyms

Searching with synonyms can cause problems:

Estrogen-induced pituitary tumorigenesis = EPT

Ethanolaminephosphotransferase activity = EPT
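Both synonym slides can be illustrated with a small dictionary lookup. This is a minimal sketch, not the NCBO Annotator's implementation: the `SYNONYMS` table, `build_index`, and `annotate` are hypothetical names, and the table holds only the four entries shown on the slides. The point is that one abbreviation can resolve cleanly (DES, Ept6) or to several unrelated terms (EPT).

```python
import re
from collections import defaultdict

# Hypothetical mini synonym table: preferred term -> synonyms (from the slides).
SYNONYMS = {
    "ACI.COP-(D3Mgh16-D3Rat119)/Shul": ["Ept6"],
    "Diethylstilbestrol": ["DES"],
    "Estrogen-induced pituitary tumorigenesis": ["EPT"],
    "Ethanolaminephosphotransferase activity": ["EPT"],
}

def build_index(synonyms):
    """Map each lower-cased label or synonym to the set of terms it may denote."""
    index = defaultdict(set)
    for term, syns in synonyms.items():
        for label in [term, *syns]:
            index[label.lower()].add(term)
    return index

def annotate(text, index):
    """Return {matched label: sorted candidate terms} for whole-token matches."""
    hits = {}
    for label, terms in index.items():
        # Lookarounds stop "ept" from matching inside "Ept6" or "des" inside "treated".
        pattern = r"(?<![A-Za-z0-9])" + re.escape(label) + r"(?![A-Za-z0-9])"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits[label] = sorted(terms)
    return hits

INDEX = build_index(SYNONYMS)
```

A label that maps to more than one term, as `ept` does here, is a candidate for manual review or context-based disambiguation rather than automatic annotation.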

Page 21: MCW Department of Physiology

Initial Observations 2 - Rat Strain symbols

AT, AN, AS, A, B, CD

G (1000 x g)

C (˚C)

TX (Abbreviation for Texas)

...pituitary gland of the ACI, Copenhagen and Brown Norway Rat.

...16 month-old Sprague-Dawley females that...

...expression data from female SD rats with access to lifelong...

...Strain or Line: F344/NCrl ...

...dahl Salt-sensitive (S) rat and S.R(9)x3A congenic rat....

...kidneys from Dahl salt-sensitive males...

Train classifier on real strain phrases? Look for relevant neighboring terms?
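The "relevant neighboring terms" idea could be prototyped as a simple context window check. The sketch below is an assumption about one way to do it, not the method used in GMiner: `CONTEXT_CUES`, `AMBIGUOUS_SYMBOLS` (the slide's symbols plus SD), the window size, and `strain_mentions` are all hypothetical.

```python
import re

# Hypothetical cue words suggesting that a nearby token is a rat strain symbol.
CONTEXT_CUES = {"rat", "rats", "strain", "congenic", "males", "females"}
# Short strain symbols from the slide (plus SD) that collide with everyday tokens.
AMBIGUOUS_SYMBOLS = {"AT", "AN", "AS", "A", "B", "CD", "G", "C", "TX", "SD"}

def strain_mentions(text, window=2):
    """Return ambiguous symbols that have a cue word within `window` tokens."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    found = []
    for i, tok in enumerate(tokens):
        if tok not in AMBIGUOUS_SYMBOLS:  # case-sensitive, so "as" is not "AS"
            continue
        lo, hi = max(0, i - window), i + window + 1
        neighbours = {t.lower() for t in tokens[lo:i] + tokens[i + 1:hi]}
        if neighbours & CONTEXT_CUES:
            found.append(tok)
    return found
```

A rule this simple still fails on sentences such as "A rat was...", where the article sits next to a cue word, which is one reason a classifier trained on real strain phrases may be the better option.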

Page 22: MCW Department of Physiology

Initial Observations - Anatomy

In GEO records          Corresponding MA term
White Adipose Tissue    White Fat
Brown Adipose Tissue    Brown Fat
Ulnar bone              Ulna bone
Skeletal Muscle         Set of Skeletal Muscle
Anterior Pituitary      Anterior Pituitary Gland
Calvarial Bone          Chondrocranium
Left Ventricle          Heart Left Ventricle

Potential synonyms that could be added to MA
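Until such synonyms exist in MA itself, a local lookup table can bridge the gap. This is a minimal sketch under that assumption: `GEO_TO_MA`, `normalize`, and `to_ma_term` are hypothetical names, and the values are the MA labels from the slide, not MA identifiers.

```python
# GEO phrase -> MA label pairs copied from the slide (labels only, no MA IDs).
GEO_TO_MA = {
    "white adipose tissue": "White Fat",
    "brown adipose tissue": "Brown Fat",
    "ulnar bone": "Ulna bone",
    "skeletal muscle": "Set of Skeletal Muscle",
    "anterior pituitary": "Anterior Pituitary Gland",
    "calvarial bone": "Chondrocranium",
    "left ventricle": "Heart Left Ventricle",
}

def normalize(phrase):
    """Lower-case the phrase and collapse runs of whitespace."""
    return " ".join(phrase.lower().split())

def to_ma_term(geo_phrase, table=GEO_TO_MA):
    """Return the MA label for a GEO tissue phrase, or None if unmapped."""
    return table.get(normalize(geo_phrase))
```

Phrases that return `None` are the ones worth sending to a curator, since they are either new synonyms or tissues the ontology does not cover.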

Page 23: MCW Department of Physiology

Phase 2: All Rat Affy Samples

1 ontology (Anatomy)

Page 24: MCW Department of Physiology

Larger scale data load

0 Rat Dataset records (GDS)

479 Series records (GSE)

12,012 Sample records (GSM)

Page 25: MCW Department of Physiology

Targeted Indexing

Mouse Adult Gross Anatomy Ontology

Page 26: MCW Department of Physiology

Results/Demo

Page 27: MCW Department of Physiology

Linking annotations to data

Tm2d1

RGD1306410

Svs4

Hbb

Scgb2a1

Alb

Page 28: MCW Department of Physiology

Linking annotations to data

Tm2d1

RGD1306410

Svs4

Hbb

Scgb2a1

Alb

+

Hbb is_expressed_in rat kidney

Tm2d1 is_expressed_in rat kidney

Human (U133, U133v2), Mouse (430, U74, U95) and Rat (U34a/b/c, 230, 230v2)

62,000 samples x ca. 25,000 genes/sample = 1.5B data points

Page 29: MCW Department of Physiology

Probeset results on GMiner

Gabdr

Page 30: MCW Department of Physiology

Probeset results on GMiner

Page 31: MCW Department of Physiology

RDF Data integration

Triple Store

OpenRDF Sesame

Virtuoso Open Source

Rat Genes & xrefs

Probeset to RGD ID

Probeset to MA

Mouse Anatomy Ontology
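The integration pictured here amounts to a join across the probeset mappings. The sketch below shows that join in plain Python rather than in a triple store, and it is an illustration only: the probeset IDs, both mapping tables, and `expression_triples` are hypothetical, while the gene symbols and the `is_expressed_in` predicate come from the slides.

```python
# Hypothetical probeset IDs; gene symbols are taken from the slides.
PROBESET_TO_GENE = {"ps_0001": "Hbb", "ps_0002": "Tm2d1", "ps_0003": "Alb"}
# Assumed result of the probeset-to-anatomy step: tissues per probeset.
PROBESET_TO_TISSUE = {"ps_0001": ["kidney"], "ps_0002": ["kidney"], "ps_0099": ["liver"]}

def expression_triples(probeset_to_gene, probeset_to_tissue, species="rat"):
    """Join the two mappings on probeset ID and emit (subject, predicate, object)."""
    triples = set()
    for probeset, tissues in probeset_to_tissue.items():
        gene = probeset_to_gene.get(probeset)
        if gene is None:
            continue  # probeset has no gene mapping, so no statement can be made
        for tissue in tissues:
            triples.add((gene, "is_expressed_in", f"{species} {tissue}"))
    return sorted(triples)
```

In a triple store the same result would come from a query over the loaded graphs; the set-based join here only shows which statements that query should be expected to return.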

Page 32: MCW Department of Physiology

Ongoing

• Work on term recognition, strains, etc.

• Evaluation of Probeset-to-Anatomy results

• Curation interface to add additional terms

• RDF formats, Triple Store implementation

• Integrate Strain and tissue results into RGD

Page 33: MCW Department of Physiology

Education & Outreach

Page 34: MCW Department of Physiology

Meet the student

Page 35: MCW Department of Physiology

You!

Heavy Scientific Problem

Ontologies

More knowledge through education = bigger lever!

Researchers

Page 36: MCW Department of Physiology
Page 37: MCW Department of Physiology
Page 38: MCW Department of Physiology
Page 39: MCW Department of Physiology

Video #3 is being shot this week

Page 40: MCW Department of Physiology

Future Videos

Target is the scientist!

• Solve common tasks

• Use annotation tools

• Evaluate annotations

• Intro to specific ontologies

• Interview ontology teams

• Ideas?

• What does your community need?

Page 41: MCW Department of Physiology

Acknowledgements

• Joey Geiger - Development of GMiner

• Jennifer Smith - Video creation, data curation

• Rajni Nigam - Rat Strain Ontology

• Clement Jonquet - NCBO OBA tools

• Trish Whetzel - Video script feedback

• Mark Musen & NIH Roadmap Initiative - Our Funding!

Page 42: MCW Department of Physiology

Links

• http://twigger.hmgc.mcw.edu/ncbo/ Project webpage

• http://gminer.mcw.edu Web application

• http://github.com/mcwbbc/gminer Gminer Code

• http://github.com/simont/MCW-RDF RDFizer code
