March 4, 2010 Visualization Approaches for Gene Expression Data Matt Hibbs Assistant Professor The Jackson Laboratory

Transcriptomics & Gene Expression

• Simultaneous measurement of transcription for the entire genome

• Useful for a broad range of biological questions

[Figure: DNA is transcribed into mRNA, which the ribosome translates into proteins]

Outline

• Technologies & Specific Concerns
– cDNA microarrays (2-color & 1-color arrays)
– RNA-seq
• Normalization visualizations
• Full data displays
• Dimensionality reduction
• Sequence-order displays
• Comparative visualization
• Future Directions

Technology: 2-color cDNA Microarrays

Spot slide with known sequences

Label reference mRNA with green dye and test mRNA with red dye, then add mRNA to slide for hybridization

Scan hybridized array

[Figure: spots A, B, C, D shown on the slide at each step; resulting log-ratio values per spot: A 1.5, B 0.8, C -1.2, D 0.1]
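The per-spot values above compare test signal to reference signal. The sketch below shows that step under a few assumptions. It assumes background-corrected intensities are already available. It also assumes ratios are reported as log2(red / green), a common convention that the slide does not state. The intensity numbers are hypothetical and do not reproduce the slide's example values.

```python
import math

# Hypothetical background-corrected intensities (not from the slide).
red_test = {"A": 2800.0, "B": 1700.0, "C": 430.0, "D": 1050.0}
green_reference = {"A": 1000.0, "B": 1000.0, "C": 1000.0, "D": 1000.0}


def log_ratios(red, green):
    """Return log2(red / green) for every spot measured in both channels.

    Positive values: more transcript in the test sample than the reference.
    Negative values: less. Zero: no change.
    """
    return {spot: math.log2(red[spot] / green[spot]) for spot in red if spot in green}


for spot, value in sorted(log_ratios(red_test, green_reference).items()):
    print(f"{spot} {value:+.2f}")
```

On a log scale, up- and down-regulation by the same fold change are symmetric around 0, for example +1 and -1 for a twofold change.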

Technology: 2-color cDNA Microarrays

Technology: RNA-seq

Image from Wikimedia

Normalization: MA-plot

• Need to account for intensity-dependent bias between channels (red/green, or between multiple 1-color arrays)

• MA-plot (also called RI-plot) shows the relationship between log ratio (M) and average log intensity (A)
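To make the MA-plot concrete: for red intensity R and green intensity G, the standard coordinates are M = log2(R / G) and A = 0.5 * log2(R * G). The sketch below computes both and then removes an intensity-dependent trend from M. The trend removal is a deliberately simple binned-median stand-in for loess/lowess normalization, which is the more usual choice. The function names, bin count, and simulated dye bias are illustrative assumptions.

```python
import numpy as np


def ma_values(red, green):
    """Return (M, A) for paired red/green intensities.

    M = log2(R / G) is the log ratio; A = 0.5 * log2(R * G) is the average
    log intensity. An MA-plot draws M (y-axis) against A (x-axis).
    """
    red = np.asarray(red, dtype=float)
    green = np.asarray(green, dtype=float)
    m = np.log2(red / green)
    a = 0.5 * np.log2(red * green)
    return m, a


def binned_median_normalize(m, a, n_bins=10):
    """Remove an intensity-dependent trend from M.

    Spots are sorted by A and split into n_bins groups of near-equal size.
    Each group's median M is subtracted, so M is centered on 0 across the
    whole intensity range (a crude stand-in for loess normalization).
    """
    m = np.asarray(m, dtype=float)
    order = np.argsort(a)
    m_norm = np.empty_like(m)
    for idx in np.array_split(order, n_bins):
        if idx.size:  # skip empty bins when there are fewer spots than bins
            m_norm[idx] = m[idx] - np.median(m[idx])
    return m_norm


# Hypothetical data with a dye bias that grows with intensity.
rng = np.random.default_rng(0)
green = rng.uniform(100, 10000, size=500)
red = green * 2 ** (0.1 * np.log2(green) - 0.5)
m, a = ma_values(red, green)
m_norm = binned_median_normalize(m, a)
```

Plotting `m` and then `m_norm` against `a` would show a tilted cloud straightened out around M = 0.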

Normalization: Box-Whisker Quantile

• Quantile normalization is often used to adjust for between-chip variance

• Box-whisker plots are typically used to visualize the process
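Quantile normalization itself is short to write down. The sketch below uses NumPy and assumes a genes × arrays matrix. It breaks ties by position via a stable sort; real implementations may average tied ranks instead.

```python
import numpy as np


def quantile_normalize(x):
    """Quantile-normalize the columns (arrays/chips) of a genes x arrays matrix.

    Each column is sorted, the mean across columns is taken at every rank,
    and each value is replaced by that mean for its rank in its own column.
    """
    x = np.asarray(x, dtype=float)
    # 0-based rank of each value within its column (ties broken by position).
    ranks = np.argsort(np.argsort(x, axis=0, kind="stable"), axis=0)
    reference = np.sort(x, axis=0).mean(axis=1)  # mean value at each rank
    return reference[ranks]


x = np.array([[5.0, 4.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 4.0, 6.0],
              [4.0, 2.0, 8.0]])
q = quantile_normalize(x)
```

After normalization every column holds the same set of values. As a result, `np.percentile(q, [25, 50, 75], axis=0)` returns identical quartiles for every array, which is the pattern the box-whisker plots should show.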

Full Data Displays

• Techniques to show all of the data at once

• Heat Maps
– Display numerical values as colors
– Good for seeing all data intuitively
– Require clustering to see patterns

• Parallel Coordinates
– Line plots of high-dimensional data
– Easy to see/select trends or patterns
– Especially good for course data (time courses, drug treatments, etc.)

Heat Maps

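[Figure: expression values are clustered, then rasterized into colored cells; color scale runs from -3 (under-expressed) through 0 to +3 (over-expressed)]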
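The rasterize step amounts to mapping each clustered value to a color. The sketch below assumes the classic green/black/red scheme and the ±3 saturation limits shown on the scale; the slide's actual colors are not recoverable from the text. `value_to_rgb` and `rasterize` are hypothetical helpers.

```python
def value_to_rgb(value, limit=3.0):
    """Map an expression value to an (r, g, b) tuple with components in 0..255.

    Values are clipped to [-limit, +limit]. Negative values (under-expressed)
    shade toward green, positive values (over-expressed) toward red, and 0 is
    black.
    """
    clipped = max(-limit, min(limit, value))
    level = round(255 * abs(clipped) / limit)
    return (level, 0, 0) if clipped > 0 else (0, level, 0)


def rasterize(matrix, limit=3.0):
    """Turn a (clustered) genes x conditions matrix into rows of RGB cells."""
    return [[value_to_rgb(v, limit) for v in row] for row in matrix]


heat = rasterize([[1.5, -0.5, 0.0], [-3.2, 2.0, 0.7]])
```

Clipping at ±3 keeps a few extreme values from washing out the contrast of everything else.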

Heat Maps: Stats

• Clustering is important to see patterns
– Hierarchical, K-means, SOM, etc.
– Choice of distance metric in addition to method

• Match the visualization mapping to the statistics used for analysis
– Coloring based on the actual values is appropriate for Euclidean distance measures
– Centered or normalized measures should use corresponding colorings
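The z-score identity shows why centered or normalized measures call for matching colorings. Suppose each row is z-scored to mean 0 and population standard deviation 1. Then the squared Euclidean distance between two rows equals 2n(1 - r), where n is the number of conditions and r is their Pearson correlation. Coloring from z-scores therefore shows the same structure that correlation-based clustering uses. The NumPy sketch below does not handle zero-variance rows.

```python
import numpy as np


def zscore_rows(x):
    """Center each gene (row) at 0 and scale it to unit standard deviation.

    Uses the population standard deviation (NumPy's default ddof=0).
    Rows with zero variance would divide by zero and are not handled.
    """
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 45.0]
za, zb = zscore_rows([a, b])
n = len(a)
r = np.corrcoef(a, b)[0, 1]
# Squared Euclidean distance of z-scored rows equals 2n(1 - r).
print(np.isclose(np.sum((za - zb) ** 2), 2 * n * (1 - r)))  # True
```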

Heat Maps: Distance Metrics

Euclidean Distance

Pearson Correlation

Spearman Correlation
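The three metrics named on this slide can be written as follows. This is a NumPy sketch, not code from any particular tool. For clustering, a common choice is to turn a correlation into a distance as 1 - r.

```python
import numpy as np


def euclidean(a, b):
    """Straight-line distance; sensitive to absolute expression levels."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))


def pearson(a, b):
    """Linear correlation of profile shapes, ignoring scale and offset."""
    return float(np.corrcoef(a, b)[0, 1])


def ranks(v):
    """1-based ranks, with tied values sharing their average rank."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(v, kind="mergesort")
    r = np.empty(len(v))
    r[order] = np.arange(1, len(v) + 1)
    for value in np.unique(v):
        tied = v == value
        r[tied] = r[tied].mean()
    return r


def spearman(a, b):
    """Pearson correlation of ranks; robust to outliers and monotone transforms."""
    return pearson(ranks(a), ranks(b))


a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 100]
print(euclidean(a, b), pearson(a, b), spearman(a, b))
```

For this pair, Spearman correlation is 1 because the ordering is identical. The outlier pulls Pearson correlation well below 1 and dominates the Euclidean distance.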

Heat Maps: Stats

Data clustered using a rank-based statistic

[Color scale: lowest value to highest value]
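A rank-based coloring replaces each value with its rank within its row. The color ramp then runs from the row's lowest value to its highest, matching what a rank-based clustering statistic sees. The NumPy sketch below assumes ties are broken by position.

```python
import numpy as np


def rank_scale_rows(x):
    """Replace each value by its rank within its row, scaled to [0, 1].

    0 marks the row's lowest value and 1 its highest. Ties are broken by
    position (stable sort), a simplification. Rows need at least 2 values.
    """
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x, axis=1, kind="stable"), axis=1)
    return ranks / (x.shape[1] - 1)


scaled = rank_scale_rows([[5.0, 1.0, 3.0], [0.0, 10.0, 20.0]])
```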

Heat Maps: Overview + Detail

Java TreeView, Saldanha et al.; data from Spellman et al., 1998

Parallel Coordinates

• View expression vectors as lines
– X-axis = conditions
– Y-axis = value

Time Searcher, Hochheiser et al.

Parallel Coordinates

Time Searcher, Hochheiser et al.

• Selection and interaction methods can answer specific questions

• Brushing techniques to select patterns

• Drawbacks: displays become cluttered for large datasets, and only a limited number of conditions can be shown effectively
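Brushing in a parallel-coordinates display can be thought of as a box-shaped query. It keeps the genes whose lines pass through a chosen value range over a chosen span of conditions. The sketch below is in the spirit of Time Searcher's box queries, not its actual implementation. The gene names and values are hypothetical.

```python
def timebox_query(profiles, start, end, low, high):
    """Return genes whose values stay within [low, high] for every
    condition index from start to end (inclusive).

    profiles maps gene name -> list of values ordered by condition (x-axis).
    """
    return [
        gene
        for gene, values in profiles.items()
        if all(low <= v <= high for v in values[start:end + 1])
    ]


def brush(profiles, boxes):
    """Combine several (start, end, low, high) boxes with AND."""
    selected = set(profiles)
    for start, end, low, high in boxes:
        selected &= set(timebox_query(profiles, start, end, low, high))
    return sorted(selected)


profiles = {
    "gene1": [0.1, 1.2, 2.0, 1.5],
    "gene2": [0.0, -0.5, -1.0, -1.2],
    "gene3": [0.2, 1.0, -0.3, 0.1],
}
print(timebox_query(profiles, 1, 2, 0.5, 3.0))
```

Each added box narrows the selection, which is how interactive brushing answers questions like "which genes rise early and stay high?"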

Dimensionality Reduction

• Project data from a large, high-dimensional space to a smaller space (usually 2D or 3D)

• Several techniques:
– SVD & PCA
– Multidimensional scaling

• Once data are projected into a lower dimension, use standard 2D (or 3D) techniques
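Of the techniques listed, classical (Torgerson) multidimensional scaling is compact enough to sketch. It needs only a matrix of pairwise distances, for example from one of the heat-map distance metrics. It returns low-dimensional coordinates whose distances approximate the originals. The NumPy sketch below uses a hypothetical function name, `classical_mds`.

```python
import numpy as np


def classical_mds(d, k=2):
    """Classical multidimensional scaling.

    d is an n x n matrix of pairwise distances; returns n x k coordinates
    whose Euclidean distances approximate d.
    """
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    b = -0.5 * j @ (d ** 2) @ j  # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(b)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]  # indices of the k largest
    scale = np.sqrt(np.clip(eigvals[top], 0, None))
    return eigvecs[:, top] * scale


# Four points on a 3 x 4 rectangle; MDS should recover their distances.
pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
coords = classical_mds(d, k=2)
```

The recovered coordinates may be rotated or reflected relative to the input. Only the distances are preserved.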

Dimensionality Reduction

Dimensionality Reduction: SVD

…Transform original data vectors into an orthogonal basis that captures decreasing amounts of variation
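The SVD description above can be checked numerically. The squared singular values, normalized to sum to 1, give the fraction of variation captured by each successive basis vector, and they come out in decreasing order. The NumPy sketch below assumes genes are rows. It also mean-centers each condition (column) first, which makes this equivalent to PCA; the slide does not specify the centering.

```python
import numpy as np


def svd_project(x, k=2):
    """Project the rows of a genes x conditions matrix onto its top-k
    singular vectors.

    Returns (coords, explained): coords is genes x k; explained[i] is the
    fraction of total variation captured by component i (non-increasing).
    """
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)  # center each condition; makes this PCA
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    coords = u[:, :k] * s[:k]  # same as centered @ vt[:k].T
    return coords, explained


rng = np.random.default_rng(1)
data = rng.normal(size=(100, 8))
coords, explained = svd_project(data)
print(coords.shape)  # (100, 2)
```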

Dimensionality Reduction: SVD

SVD

SVD Example

[Figure legend: cell-cycle phases G1, S, G2, M, M/G1]

GeneVAnD, Hibbs et al.; data from Spellman et al., 1998

Sequence-based Visualization

• View data in chromosomal order
– Copy number variation & aneuploidies, common in cancers & other disorders
– Comparative Genomic Hybridization (CGH)
– mRNA sequencing (RNA-seq)
– Borrows concepts from genome browsers
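Viewing data in chromosomal order mostly means sorting probes by chromosome and position. For copy-number data such as CGH log ratios, it also means smoothing along each chromosome so broad gains and losses stand out from probe-level noise. The sketch below assumes records of the form (chromosome, position, log_ratio). Its running median is a simple stand-in for the segmentation methods used in practice, and the probe values are hypothetical.

```python
from itertools import groupby
from statistics import median


def chrom_key(record):
    """Sort key: numbered chromosomes in numeric order (chr2 before chr10),
    then named ones such as chrX, then by position."""
    chrom, position = record[0], record[1]
    name = chrom[3:] if chrom.startswith("chr") else chrom
    if name.isdigit():
        return (0, int(name), "", position)
    return (1, 0, name, position)


def running_median(values, window=3):
    """Median of each value and its neighbors (window truncated at the ends)."""
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1]) for i in range(len(values))]


def smooth_by_chromosome(records, window=3):
    """Sort records into genome order and smooth log ratios within each chromosome."""
    ordered = sorted(records, key=chrom_key)
    smoothed = []
    for chrom, group in groupby(ordered, key=lambda rec: rec[0]):
        group = list(group)
        values = running_median([rec[2] for rec in group], window)
        smoothed.extend((chrom, rec[1], v) for rec, v in zip(group, values))
    return smoothed


# Hypothetical probes: a gained region on chr2 containing one noisy outlier (5.0).
probes = [
    ("chr2", 500, 1.3), ("chr10", 100, 0.0), ("chr2", 100, 1.2),
    ("chr2", 300, 5.0), ("chrX", 50, -1.0), ("chr2", 200, 1.1),
    ("chr2", 400, 1.0),
]
for rec in smooth_by_chromosome(probes):
    print(rec)
```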

Sequence-based: CGH

• Karyoscope plots

Java TreeView, Saldanha et al.

Sequence-based: RNA-seq

IGV, http://www.broadinstitute.org/igv
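A genome browser such as IGV summarizes aligned RNA-seq reads as a coverage track, giving the read depth at each base. The sketch below shows that computation, not IGV's implementation. It assumes reads are given as 0-based, half-open (start, end) intervals.

```python
def coverage(reads, region_start, region_end):
    """Per-base read depth over [region_start, region_end).

    Uses a difference array: +1 where a read starts, -1 where it ends,
    then a running sum. Reads are clipped to the region.
    """
    length = region_end - region_start
    diff = [0] * (length + 1)
    for start, end in reads:
        s = max(start, region_start) - region_start
        e = min(end, region_end) - region_start
        if s < e:  # skip reads that do not overlap the region
            diff[s] += 1
            diff[e] -= 1
    depth, running = [], 0
    for i in range(length):
        running += diff[i]
        depth.append(running)
    return depth


print(coverage([(0, 3), (2, 5), (10, 12)], 0, 6))
```

The difference array makes the cost proportional to the number of reads plus the region length, rather than reads times read length.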

Comparative Visualization

• Use multiple simultaneous, complementary views of the data

• Each scheme emphasizes different aspects; use several to show the overall picture

• Show multiple, related datasets to identify common and unique patterns

Comparative Visualization: Single Dataset

MeV, Saeed et al.

Comparative Visualization: Single Dataset

Spotfire

GeneSpring

Comparative Visualization: Multi-dataset

[Figure: HIDRA, Hibbs et al., with dendrogram and heat map overview panels; data from Spellman et al., 1998]

Comparative Visualization: Multi-dataset

[Figure: HIDRA, Hibbs et al., showing a selection and synchronized details; data from Spellman et al., 1998]

Comparative Visualization: Multi-dataset

[Figure: HIDRA, Hibbs et al., showing a selection; data from Spellman et al., 1998]

Summary & Tools

• R & Bioconductor
• Java TreeView (Saldanha, 2004)
• Time Searcher (Hochheiser et al., 2003)
• Integrative Genomics Viewer (IGV; www.broadinstitute.org/igv)
• TIGR’s MultiExperiment Viewer (MeV; Saeed et al., 2003)
• HIDRA (Hibbs et al., 2007)

Trends & Future Directions

• Emphasis on usability and audience
– If a “wet bench” biologist can’t use it…

• Incorporate common statistical analysis techniques with visualizations
– e.g. differential expression tests, GO enrichments, etc.

• Isoforms and splice variants

• New user interaction schemes
– e.g. multi-touch interfaces, large-format displays

• Low-level “systems analysis”
– linking together multiple types of data into unified displays

Acknowledgements

• Hibbs Lab
– Karen Dowell
– Tongjun Gu
– Al Simons

• Olga Troyanskaya Lab
– Patrick Bradley
– Maria Chikina
– Yuanfang Guan

• Chad Myers
• David Hess
• Florian Markowetz
• Edo Airoldi
• Curtis Huttenhower

• Kai Li Lab
– Grant Wallace

• Amy Caudy

• Maitreya Dunham

• Botstein, Kruglyak, Broach, Rose labs

• Kyuson Yun

• Carol Bult

The Center for Genome Dynamics at The Jackson Laboratory

www.genomedynamics.org

Investigators use computation, mathematical modeling and statistics, with a shared focus on the genetics of complex traits

Requires a PhD (or equivalent) in a quantitative field such as computer science, statistics, or applied mathematics, or in the biological sciences with a strong quantitative background. Programming experience recommended.

The Jackson Laboratory was voted #2 in a poll of postdocs conducted by The Scientist in 2009 and is an EOE/AA employer

Postdoctoral Opportunities in Computational & Systems Biology