adventures in translational bioinformatics

Texas A&M Health Sciences Center February 6, 2013 Harry Hochheiser, [email protected] Adventures in Translational Bioinformatics Harry Hochheiser University of Pittsburgh School of Medicine Department of Biomedical Informatics [email protected] © 2010 Harry Hochheiser Licensed under Creative Commons Attribution-NonCommercial-NoDerivs license


Page 1: Adventures in Translational Bioinformatics

Adventures in Translational Bioinformatics

Harry Hochheiser

University of Pittsburgh School of Medicine

Department of Biomedical Informatics

[email protected]

© 2010 Harry Hochheiser

Licensed under Creative Commons Attribution-NonCommercial-NoDerivs license

Page 2: Adventures in Translational Bioinformatics

Biomedical Informatics

The use of informatics for the improvement of biomedical research and clinical care

Shortliffe & Blois, “The Computer Meets Medicine and Biology: Emergence of a Discipline,” in Shortliffe & Cimino, eds., “Biomedical Informatics: Computer Applications in Health Care and Biomedicine”

Friedman, “A ‘fundamental theorem’ of biomedical informatics,” JAMIA 2009

Page 3: Adventures in Translational Bioinformatics

http://www.ncats.nih.gov/research/cts/cts.html

What is “Translational” research?

● My research is translational...

● ...because I want to get funding?

Page 4: Adventures in Translational Bioinformatics

Co-Clinical Trials

Nardella, Lunardi, Patnaik, and Cantley, “The APL Paradigm and the ‘Co-Clinical Trial’ Project,” Cancer Discovery 2011

● Acute Promyelocytic Leukemia

● 13 years of work – cloning, description, transgenics, trials

● Successful therapy now approved for clinical use.

● Co-clinical trials – synchronize human and animal trials.

Page 5: Adventures in Translational Bioinformatics

Measuring Success

● How is success of translational science measured?

● Cures? Therapies?

● Citations mapping model organism studies to T2, T3, T4 success?

● How do we assess impact that doesn't play out in the usual 3-5 year grant cycle?

Page 6: Adventures in Translational Bioinformatics

FaceBase

http://www.facebase.org

National Institute of Dental and Craniofacial Research

Five-year initial phase 2009-2014

“..systematically compile the biological instructions to construct the middle region of the human face and precisely define the genetics underlying its common developmental disorders, such as cleft lip and palate”

10 Projects: U-flavor contracts

Data Management and Coordination Hub

– “One-stop access to craniofacial research data”

– “Allow scientists to more rapidly and effectively generate hypotheses and accelerate the pace of their research”.

Page 7: Adventures in Translational Bioinformatics

FaceBase Projects Hochheiser, et al. 2011

Page 8: Adventures in Translational Bioinformatics

FaceBase Data

● 10 projects

● 4 organisms: mouse, zebrafish, chick, human

● Varying developmental time points

● Data types

– Expression: microarray, ChIP-Seq, miRNA

– Images: 3D MRI, OPT, microMRI, microCT, 3D human facial mesh

– Sequences: Genotypes, GWAS, CNV

– Software: 3D facial image analysis

– Strains: Cre recombinase drivers

● 1 Data management and coordination hub

Page 9: Adventures in Translational Bioinformatics

What does it mean to effectively share data?

Multiple data types

– Modalities

– Organisms

– Protocols

Multiple Groups

– Some collaborative, others...

Diversity of data

Diversity of projects, goals, etc.

Challenges are both technical and social/organizational

Page 10: Adventures in Translational Bioinformatics

FaceBase Hub Technical Challenges

Data Diversity

– Sequences

– Microarrays

– miRNA

– Images..

Data Volume

– Genotypes

– RNA-Seq

Anatomy Phenotype

Integration

– UCSC Genome Browser

– Between datasets

“Controlled Access” Human Data

– Genotypes

– Facial Images

Tools

– Searching

– Browsing

Metadata

Page 11: Adventures in Translational Bioinformatics

Metadata: Theory

● Well-defined metadata fields/attributes

● Controlled vocabularies provide consistent terminology for each field

● Link to appropriate resources as needed: NCBI, MGI, etc.

● Additional attributes specific to each data type

● Consistent metadata supports search, navigation...
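One way to picture the "theory" above is a small validator that checks each record against controlled vocabularies and type-specific required attributes. This is only a sketch under stated assumptions: the field names, the vocabularies, and the per-type attributes are hypothetical illustrations, not the FaceBase Hub schema. The organism list is taken from the FaceBase Data slide.

```python
# Hypothetical controlled vocabularies; a real hub would load these from
# curated term lists or ontologies rather than hard-coding them.
CONTROLLED_VOCABULARIES = {
    "organism": {"mouse", "zebrafish", "chick", "human"},
    "data_type": {"expression", "image", "sequence", "software", "strain"},
}

# Hypothetical additional attributes required for specific data types.
TYPE_SPECIFIC_FIELDS = {
    "image": {"modality"},
    "expression": {"platform"},
}


def validate_metadata(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    # Every controlled field must be present and use an allowed term.
    for field_name, allowed in CONTROLLED_VOCABULARIES.items():
        value = record.get(field_name)
        if value is None:
            problems.append(f"missing required field: {field_name}")
        elif value not in allowed:
            problems.append(f"{field_name}={value!r} is not a controlled term")
    # Attributes that only apply to the record's data type.
    data_type = record.get("data_type")
    for extra in sorted(TYPE_SPECIFIC_FIELDS.get(data_type, set())):
        if extra not in record:
            problems.append(f"missing {extra} for data_type={data_type!r}")
    return problems
```

A record such as {"organism": "Mouse", "data_type": "image"} would be flagged twice: once for the uncontrolled spelling of the organism and once for the missing modality. That is the kind of consistency that search and navigation depend on.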

Page 12: Adventures in Translational Bioinformatics

Metadata in practice

“What's the difference between mutation and Genotype?”

“Uhh. I'll have to get back to you on that.”

“Strain name? Is ‘C57B6J’ the same as ‘C57Bl/6J’?”

“We're lucky to have any information at all when it comes to these mice...”

“The official name of the strain is ____, but everybody calls it ____”
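The strain-name question can be partly mechanized with a synonym table keyed on a normalized spelling. The sketch below is a minimal illustration, not a description of how FaceBase handles strains: the table contents and function names are hypothetical, and an authoritative preferred name would have to come from a strain registry such as MGI.

```python
import re
from typing import Optional

# Hypothetical synonym table: normalized key -> one preferred label.
# The entries are illustrative; a curated registry should supply real ones.
PREFERRED_STRAIN_NAMES = {
    "c57bl6j": "C57BL/6J",
    "c57b6j": "C57BL/6J",
}


def normalize_key(name: str) -> str:
    """Lower-case the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def resolve_strain(name: str) -> Optional[str]:
    """Return the preferred label for a known variant, or None if unknown."""
    return PREFERRED_STRAIN_NAMES.get(normalize_key(name))
```

With this table, both "C57B6J" and "C57Bl/6J" resolve to the same label. A name that is not in the table returns None, so it can be routed to a curator instead of being guessed at.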

Page 13: Adventures in Translational Bioinformatics

Biomedical Ontologies might solve some terminology and structure problems

Human Anatomy: EHDAA, FMA

Mouse Anatomy: EMAP, MA

Mouse/Human Phenotypes: MP/HPO

Genes: GO

Data Models: MIAME, MiXX, ISA-TAB, OBI

Page 14: Adventures in Translational Bioinformatics

Metadata Possibilities: Ontologies & Data Models

Ontologies, Data models, etc. User Practice

Grand Canyon NPS http://www.fotopedia.com/items/flickr-7553734530

Page 15: Adventures in Translational Bioinformatics

Implications

Good metadata is vital for search, retrieval, data sharing, reuse

Curation effort can compensate for some inadequacies, but

– $E_C$: Curation Effort, $M_q$: Metadata Quality

– $E_C = f(1/M_q)$

– Possibly non-linear

This doesn't scale

Page 16: Adventures in Translational Bioinformatics

Search tools

● Search + Link: current practice in bioinformatics repositories

– Text search over metadata

– Linear results list

– Results pages with many links

● Advantages

– Easy, fast

● Disadvantages

– Relationships between items in result list are not clear

– No overviews to help with higher-level meta-understanding

Page 17: Adventures in Translational Bioinformatics

The Ontology of Craniofacial Development and Malformation

Extend existing ontologies to provide richer models of craniofacial anatomy, development, and malformation

Foundational Model of Anatomy (FMA), Human Phenotype Ontology (HPO), Mouse Anatomy (MA)...

Add human developmental description

Human-Mouse links

https://www.facebase.org/content/ocdm

But... better models don't guarantee usage

Page 18: Adventures in Translational Bioinformatics

The tool gap

“Tools have advanced to the point of being able to support users fairly successfully in finding and reading off data (e.g. to classify and find multidimensional relationships of interest) but not in being able to interactively explore these complex relationships in context to infer causal explanations and build convincing biological stories amid uncertainty.”

B. Mirel Supporting cognition in systems biology analysis: findings on users' processes and design implications (2009) J. Biomedical Discovery & Collaboration

• Need: Tools that effectively leverage combination of human intellect and computing power

Page 19: Adventures in Translational Bioinformatics

Information Visualization

● Interactive displays of high-dimensional data sets

● Coordinated views facilitate comparison across dimensions and

● Multi-scale

– Overview

– Zoom/filter

– Full details on requested items

● Rapid, incremental, reversible queries

● Avoid 0-hit or million-hit queries.
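One common way to avoid 0-hit or million-hit queries is to preview, before the user commits to a filter, how many results each possible value would return. The sketch below illustrates that idea on a toy in-memory list. The records, field names, and function names are hypothetical and are not taken from the FaceBase Hub.

```python
from collections import Counter

# Hypothetical dataset records; field names are illustrative only.
DATASETS = [
    {"organism": "mouse", "data_type": "expression"},
    {"organism": "mouse", "data_type": "image"},
    {"organism": "zebrafish", "data_type": "image"},
    {"organism": "human", "data_type": "sequence"},
]


def apply_filters(datasets, filters):
    """Keep the records that match every (field, value) pair in filters."""
    return [d for d in datasets if all(d.get(k) == v for k, v in filters.items())]


def preview_counts(datasets, filters, facet):
    """Count the hits each value of `facet` would give on top of `filters`."""
    matching = apply_filters(datasets, filters)
    return Counter(d[facet] for d in matching if facet in d)
```

Showing these counts next to each filter option lets a user see that, for example, adding a data type with a count of zero would lead nowhere. Because every step is a cheap recount, queries stay incremental and reversible.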

Page 20: Adventures in Translational Bioinformatics

Toward integrative tools

Human Mouse Zebrafish

Sequence

Expression

Images

Morphometry

Genotypes

How do we build connections?

How do we get users to think differently about the data?

Page 21: Adventures in Translational Bioinformatics

Information Visualization

● Interactive displays of high-dimensional data sets

● Coordinated views facilitate comparison across dimensions and

● Multi-scale

– Overview

– Zoom/filter

– Full details on requested items

● Rapid, incremental, reversible queries

● Avoid 0-hit or million-hit queries.

Page 22: Adventures in Translational Bioinformatics

Technology Probe: Connecting related data items

• Single search, multiple data types

• Links between items indicate connections

• Images or other details on demand for direct access to data

• Rapid response supports interactivity, exploration, serendipitous discovery

Page 23: Adventures in Translational Bioinformatics

Technology Probe: Gene Atlas

• Genes displayed on timeline, indicating when active

• Images display expression localization

• Large image for detailed views.

• Mouse-over coordinated links

Page 24: Adventures in Translational Bioinformatics

Vision vs. reality

● Data is still coming in

● Gene atlas requires coordinated analysis...

● What can we do with data that is available?

Page 25: Adventures in Translational Bioinformatics

Technology Probe: Gene Atlas

• Genes displayed on timeline, indicating when active

• Images display expression localization

• Large image for detailed views.

• Mouse-over coordinated links

Page 26: Adventures in Translational Bioinformatics

Current Search Interface

Page 27: Adventures in Translational Bioinformatics

Timeline view

Approximate alignment of datasets on common timeline

Page 28: Adventures in Translational Bioinformatics

3D Facial Meshes

Two datasets

– Weinberg & Marazita: Caucasian facial norms

– Spritz: Tanzanian

Identifiable data: access only upon DAC approval

Query tool: identify subsets for further analysis.

Page 29: Adventures in Translational Bioinformatics

UCSC Genome Browser Mirror

● Genomebrowser.facebase.org

● > 30 tracks, mouse mm9

– RNA-seq

– ChIP-seq

– Various anatomic locations, time points, etc.

● Human CNV hg19

Page 30: Adventures in Translational Bioinformatics

Imaging

● microMRI

● microCT

● OPT

● Interactive Browsers?

Page 31: Adventures in Translational Bioinformatics

Fishface

Atlas of zebrafish craniofacial development

C. Kimmel, et al.

Page 32: Adventures in Translational Bioinformatics

Current Status

As of 4 February 2013

> 200 datasets

> 100 in queue awaiting curation or investigator approval

Continuing to refine metadata and data organization

Expecting large influx of data over the next year.

Page 33: Adventures in Translational Bioinformatics

Toward Greater Integration

Currently

[Diagram: two Data items, each with its own Metadata]

Links via metadata: comparable organism, stage, gene, etc.

Ideally

[Diagram: two Data items, each with its own Metadata]

Links via data: commonly expressed genes, sequences, pathways

“Easy” for similar data: expression, sequences

Harder for diverse data: images?
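The difference between the two kinds of link can be sketched in a few lines. Linking via metadata compares descriptive fields, while linking via data compares the content itself, here a set of expressed genes. Everything in this example is hypothetical: the dataset identifiers, the stage labels, and the placeholder gene names are made up for illustration and do not describe real FaceBase datasets.

```python
from itertools import combinations

# Hypothetical datasets: metadata fields plus a set of expressed genes.
# Gene names are placeholders, not real expression results.
DATASETS = {
    "ds1": {"organism": "mouse", "stage": "E10.5", "genes": {"GeneA", "GeneB", "GeneC"}},
    "ds2": {"organism": "mouse", "stage": "E10.5", "genes": {"GeneA", "GeneD"}},
    "ds3": {"organism": "zebrafish", "stage": "48hpf", "genes": {"GeneE", "GeneF"}},
}


def links_via_metadata(datasets, fields=("organism", "stage")):
    """Pair datasets whose listed metadata fields all match."""
    return [
        (a, b)
        for a, b in combinations(sorted(datasets), 2)
        if all(datasets[a][f] == datasets[b][f] for f in fields)
    ]


def links_via_data(datasets, min_shared=1):
    """Pair datasets that share at least `min_shared` expressed genes."""
    links = {}
    for a, b in combinations(sorted(datasets), 2):
        shared = datasets[a]["genes"] & datasets[b]["genes"]
        if len(shared) >= min_shared:
            links[(a, b)] = shared
    return links
```

Set intersection works here because both datasets are gene lists, which is the "easy" case of similar data. No comparably simple operation exists for linking an image to a sequence, which is why diverse data is harder.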

Page 34: Adventures in Translational Bioinformatics

FaceBase as a model

● Diverse researchers and projects

● Focus: specific clinical domain

● Goal: collect and coordinate data – community resource

● Outstanding challenges?

● Is this even the right idea?

Page 35: Adventures in Translational Bioinformatics

Build it and they (may) come?

● Speculative claim: assemble good data and it will be of use

● How to evaluate this?

● How to compare to n R01s that might otherwise have been funded? (question asked by program officer)

● ENCODE controversy

“..the lesson I learned from ENCODE is that projects like ENCODE are not a good idea.”

M. Eisen http://www.michaeleisen.org/blog/?p=1179

Page 36: Adventures in Translational Bioinformatics

Three key factors

Tools

Incentives

Outreach

Page 37: Adventures in Translational Bioinformatics

Tools

Desperately needed!

Search tools – finding data

Annotation tools – describing data

– Using ontologies

– Common data models

Minimize the effort required to provide the high-quality metadata needed for translational bioinformatics.

Page 38: Adventures in Translational Bioinformatics

Perhaps the worst bioinformatics data management tool

Page 39: Adventures in Translational Bioinformatics

Perhaps the best bioinformatics data management tool

Page 40: Adventures in Translational Bioinformatics

Pros and Cons of Spreadsheets

Cons:

– ad-hoc data models

– ad-hoc data fields

– No consistency, provenance...

Pros:

– Ubiquitous

– Simple

– Flexible

– Familiar

Alternatives:

– Analytic tools: more rigor, less generality

– Experiment Metadata

Page 41: Adventures in Translational Bioinformatics

Usability and Ontologies

Can we at least get the terms right?

Maguire et al. 2012

Page 42: Adventures in Translational Bioinformatics

Beyond Autocomplete

● Substring autocomplete – state of art for finding terms in biomedical ontologies

● Cons: no context, room for specialization or generalization?

● Can we build a tool that will easily support both search and navigation?

Foundational Model Explorer

http://fme.biostr.washington.edu/FME/index.html
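A rough sketch of what "beyond autocomplete" could mean: instead of returning bare substring matches, return each match together with its broader term and its narrower terms, so the user can generalize or specialize from the hit. The miniature hierarchy and the function names below are hypothetical. Real ontologies such as the FMA are far larger and distinguish several relationship types that are collapsed here into a single "broader term" link.

```python
# Hypothetical miniature hierarchy (term -> broader term). It is a toy
# stand-in for an anatomy ontology, not an excerpt from one.
PARENT = {
    "upper lip": "lip",
    "lower lip": "lip",
    "lip": "mouth",
    "hard palate": "palate",
    "soft palate": "palate",
    "palate": "mouth",
    "mouth": None,
}


def children_of(term):
    """Return the direct narrower terms of `term`, sorted for stable output."""
    return sorted(c for c, p in PARENT.items() if p == term)


def search_with_context(query):
    """Substring match, with each hit's broader and narrower terms attached."""
    q = query.lower()
    return [
        {"term": t, "parent": PARENT[t], "children": children_of(t)}
        for t in sorted(PARENT)
        if q in t
    ]
```

Searching for "palate" returns "hard palate", "palate", and "soft palate", each with enough context to move up to "mouth" or down to a more specific term. Plain substring autocomplete does not offer that navigation.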

Page 43: Adventures in Translational Bioinformatics

Why not let the curators do it?

● Expense

● Accuracy

● Speed

● Scientists know their data best

Claim – To be sustainable, data sharing must be self-curated.

But.. why should anyone bother?

Page 44: Adventures in Translational Bioinformatics

Scientific incentives...

Traditionally...

[Diagram: Research, Results, Papers, Funding; Data Sharing shown with a question mark]

Where does data sharing fit in?

Page 45: Adventures in Translational Bioinformatics

How to promote sharing..

1. Better tools → reduced effort

$E_{\text{annotation}} < E_{\text{collection}} + \epsilon$

2. Recognize effort: alternative models of academic credit

ImpactStory...

Altmetrics: Value all research products (H. Piwowar, Nature 1/9/13, doi:10.1038/493159a)

“For all new grant applications from 14 January, the US National Science Foundation (NSF) asks a principal investigator to list his or her research “products” rather than “publications” in the biographical sketch section.”

Page 46: Adventures in Translational Bioinformatics

Related Efforts

● Genomics Research in Alpha-1 Antitrypsin Deficiency and Sarcoidosis (GRADS) – microbiome, expression, phenotype sharing

● LAMHDI/MONARCH – User tools for ontologically-driven discovery of model genes for human diseases

● Research Social Network Collaboration Finding tool – using VIVO and related networking tools to find collaborators..

Page 47: Adventures in Translational Bioinformatics

Acknowledgments

FaceBase: Mary Marazita, Jeff Murray, Yang Chai, Bruce Aronow, Kristin Artinger, Teri Beaty, Jim Brinkley, David Clouthier, Michael Cunningham, Michael Dixon, Leah Rae Donahue, Scott E. Fraser, Benedikt Hallgrimsson, Junichi Iwata, Ophir Klein, Stephen Murray, Fernando Pardo-Manuel de Villena, John Postlethwait, Steven Potter, Linda Shapiro, Richard Spritz, Axel Visel, Seth Weinberg, Paul Trainor

U. Pittsburgh: Mike Becich, Becky Boes, Chuck Borromeo, Lance Kennelty, Annette Krag-Jensen, Tom Maher, Johnson Paul, Linda Schmandt, Shiyi Shen, Bill Shirey, Cristy Spino, Mike Stefanko, Justin Stickel

OCDM: James Brinkley, Jose Leonardo Mejino, Landon Detwiler, Ravensara Travillian, Melissa Clarkson, Timothy Cox, Carrie Heike, Michael Cunningham, Linda Shapiro

NIDCR: Steven Scholnick, Emily Harris, Jeannine Helm, Nadya Lumelsky, Lillian Shum

Support: NIH Grants U01 DE020057, 3U01DE020050-03S1