Multivariate and network tools for biological data analysis

Dmitry Grapov and Oliver Fiehn, University of California, Davis. Multivariate Analysis and Visualization Tools for Metabolomic Data

Upload: dmitry-grapov

Post on 16-Apr-2017


Page 1: Multivariate and network tools for biological data analysis

Dmitry Grapov and Oliver Fiehn
University of California, Davis

Multivariate Analysis and Visualization Tools for Metabolomic Data

Page 2: Multivariate and network tools for biological data analysis

State-of-the-art facility producing massive amounts of biological data…

>20-30K samples/yr
>200 studies

Page 3: Multivariate and network tools for biological data analysis

Data Analysis and Visualization

[Figure: data matrix with Sample and Variable axes]

Quality Assessment
• use replicated measurements and/or internal standards to estimate analytical variance

Statistical and Multivariate
• use the experimental design to test hypotheses and/or identify trends in analytes

Functional
• use statistical and multivariate results to identify impacted biochemical domains

Network
• integrate statistical and multivariate results with the experimental design and analyte metadata

Sample: experimental design (organism, sex, age, etc.)
Variable: analyte description and metadata (biochemical class, mass spectra, etc.)

Page 4: Multivariate and network tools for biological data analysis

Data Analysis and Visualization

Network Mapping

Page 5: Multivariate and network tools for biological data analysis

Data Quality Assessment

[Figure: Principal Component Analysis (PCA) of all analytes, showing QC sample scores]

[Figure: drift in >400 replicated measurements across >100 analytical batches for a single analyte; x-axis: acquisition batch, y-axis: abundance]

QCs embedded among >5,5000 samples (1:10) collected over 1.5 yrs

If the biological effect size is smaller than the analytical variance, the experiment will incorrectly yield non-significant results

Page 6: Multivariate and network tools for biological data analysis

Data Quality Assessment

Analyte-specific data quality overview

Sample-specific normalization can be used to estimate and remove analytical variance

[Figure: %RSD versus log mean for samples and QCs, annotated from high precision (low %RSD) to low precision (high %RSD); panels: Raw Data, Normalized Data]

Normalizations need to be numerically and visually validated
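
The numerical side of this validation can be sketched by computing %RSD for the QC replicates of one analyte before and after normalization. This is a minimal illustration and not the authors' implementation: the QC values are invented, `percent_rsd` and `batch_median_normalize` are hypothetical helper names, and batch-median scaling is an assumed stand-in for whatever normalization was actually applied.

```python
from statistics import mean, median, stdev


def percent_rsd(values):
    """Relative standard deviation (%RSD) = 100 * sample SD / mean."""
    return 100.0 * stdev(values) / mean(values)


def batch_median_normalize(values, batches):
    """Scale each value so that every batch shares the overall median.

    Hypothetical normalization, chosen for illustration only.
    """
    overall = median(values)
    batch_medians = {
        b: median(v for v, vb in zip(values, batches) if vb == b)
        for b in set(batches)
    }
    return [v * overall / batch_medians[b] for v, b in zip(values, batches)]


# Invented QC abundances for one analyte, drifting upward across three batches.
qc_values = [100.0, 104.0, 98.0, 120.0, 125.0, 118.0, 140.0, 146.0, 138.0]
qc_batches = [1, 1, 1, 2, 2, 2, 3, 3, 3]

raw_rsd = percent_rsd(qc_values)
norm_rsd = percent_rsd(batch_median_normalize(qc_values, qc_batches))
print(f"raw %RSD: {raw_rsd:.1f}, normalized %RSD: {norm_rsd:.1f}")
```

A lower QC %RSD after normalization suggests that analytical variance was reduced; the same comparison would be repeated for every analyte and checked visually as well.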

Page 7: Multivariate and network tools for biological data analysis

Network Mapping: ranked statistically significant differences within a biochemical context

Statistics + Multivariate + Context = Network Mapping

Statistical and Multivariate Analyses

What analytes are different between the two groups of samples (Group 1 vs. Group 2)?

Statistical (t-Test): significant differences lacking rank and context

Multivariate (O-PLS-DA): ranked differences lacking significance and context
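
As a rough illustration of the statistical side of this comparison, the sketch below computes a Welch t statistic and its approximate degrees of freedom for each analyte in two groups. The analyte names and abundances are invented, and the unequal-variance (Welch) form is an assumption, since the slide says only "t-Test". Converting t to a p-value additionally needs a t-distribution CDF, which is left out here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group1, group2):
    """Return (t, df) for Welch's unequal-variance t-test."""
    n1, n2 = len(group1), len(group2)
    se1, se2 = variance(group1) / n1, variance(group2) / n2
    t = (mean(group1) - mean(group2)) / sqrt(se1 + se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# Invented abundances: analyte -> (Group 1 values, Group 2 values).
data = {
    "analyte_A": ([10.1, 9.8, 10.4, 10.0], [12.9, 13.4, 12.7, 13.1]),
    "analyte_B": ([5.0, 5.6, 4.7, 5.2], [5.1, 5.3, 4.9, 5.4]),
}

results = {name: welch_t(g1, g2) for name, (g1, g2) in data.items()}
for name, (t, df) in results.items():
    print(f"{name}: t = {t:.2f}, df = {df:.1f}")
```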

Page 8: Multivariate and network tools for biological data analysis

Statistical and Multivariate Analyses

To see the big picture, it is necessary to view the data from multiple different angles

Page 9: Multivariate and network tools for biological data analysis

DeviumWeb
https://github.com/dgrapov/DeviumWeb

• visualization
• statistics
• clustering
• PCA
• O-PLS

Page 11: Multivariate and network tools for biological data analysis

Functional Analysis

Nucl. Acids Res. (2008) 36 (suppl 2): W423-W426. doi: 10.1093/nar/gkn282

Identify changes or enrichment in biochemical domains

• decrease
• increase
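
One common way to test for enrichment in a biochemical domain is a hypergeometric (over-representation) test. The slide does not say which method was used, so the choice of test is an assumption, the function name is hypothetical, and all counts below are invented.

```python
from math import comb


def hypergeom_enrichment_p(total, in_domain, changed, changed_in_domain):
    """P(X >= changed_in_domain) when `changed` analytes are drawn without
    replacement from `total`, of which `in_domain` belong to the domain."""
    upper = min(in_domain, changed)
    return sum(
        comb(in_domain, k) * comb(total - in_domain, changed - k)
        for k in range(changed_in_domain, upper + 1)
    ) / comb(total, changed)


# Invented counts: 200 measured analytes, 20 in the domain of interest,
# 30 significantly changed overall, 8 of those inside the domain.
p = hypergeom_enrichment_p(total=200, in_domain=20, changed=30, changed_in_domain=8)
print(f"enrichment p-value: {p:.4g}")
```

A small p-value here would indicate that more changed analytes fall in the domain than expected by chance; with many domains tested, a multiple-testing correction would also be needed.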

Page 12: Multivariate and network tools for biological data analysis

Functional Analysis: opportunity for ‘Omic integration

Use domain knowledge databases to integrate genomic, proteomic and metabolomic data

Current approaches can be limited to pathway level analyses

Page 13: Multivariate and network tools for biological data analysis

Networks

Biochemical
• reaction
• domain

Structural
• molecular fingerprints
• mass spectra

Empirical
• correlation
• partial correlation

BMC Bioinformatics 2012, 13:99 doi:10.1186/1471-2105-13-99

Page 14: Multivariate and network tools for biological data analysis

Mapped Network
- displaying metabolic differences in control vs. malignant lung tissue

Biochemical Relationships

http://www.genome.jp/dbget-bin/www_bget?rn:R00975

Page 15: Multivariate and network tools for biological data analysis

Structural Similarity

http://pubchem.ncbi.nlm.nih.gov//score_matrix/score_matrix.cgi
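
A widely used measure of fingerprint-based structural similarity is the Tanimoto coefficient. The sketch below shows how such scores could be turned into network edges. The fingerprints are invented sets of bit indices rather than real PubChem fingerprints, and the 0.6 cutoff is a hypothetical value chosen for the example.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    shared = len(fp_a & fp_b)
    total = len(fp_a | fp_b)
    return shared / total if total else 0.0


# Invented fingerprints: compound name -> set of substructure bit indices.
fingerprints = {
    "compound_1": {1, 4, 7, 9, 12},
    "compound_2": {1, 4, 7, 9, 15},
    "compound_3": {2, 3, 20, 21},
}

THRESHOLD = 0.6  # hypothetical cutoff for drawing an edge

# Keep only compound pairs whose similarity reaches the cutoff.
edges = [
    (a, b, tanimoto(fingerprints[a], fingerprints[b]))
    for a, b in combinations(fingerprints, 2)
    if tanimoto(fingerprints[a], fingerprints[b]) >= THRESHOLD
]
print(edges)
```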

Page 16: Multivariate and network tools for biological data analysis

Empirical Networks

Use experiment-specific or data-driven relationships to gain novel insight into biochemical relationships

[Figure labels: urea cycle, nucleotide synthesis, protein glycosylation]
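
Correlation and partial correlation are the two empirical relationships named earlier in the deck. The sketch below contrasts them for three analytes, using the first-order partial correlation formula. The abundances are invented and the function names are hypothetical; a real analysis would condition on many analytes at once rather than a single one.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Invented abundances: a and b both track c, plus independent noise.
c = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
a = [1.2, 1.9, 3.3, 3.8, 5.1, 6.2]
b = [0.9, 2.2, 2.8, 4.3, 4.8, 6.1]

print(f"correlation(a, b)      = {pearson(a, b):.2f}")
print(f"partial corr(a, b | c) = {partial_corr(a, b, c):.2f}")
```

When two analytes correlate mainly because both follow a third, the partial correlation drops relative to the plain correlation, which is why partial correlation networks tend to be sparser.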

Page 17: Multivariate and network tools for biological data analysis

Mass Spectral Networks

Use mass spectra as a proxy for structure to help make sense of unknown compounds’ biochemical identities

Watrous J et al. PNAS 2012;109:E1743-E1752

unknown compounds are likely phytosterol esters
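
To make "mass spectra as a proxy for structure" concrete, the sketch below scores pairs of spectra with plain cosine similarity, one simple choice of spectral similarity. It does not reproduce the scoring used in the cited work. The spectra are invented, peaks are matched by nominal (integer) m/z only, and the function name is hypothetical.

```python
from math import sqrt


def cosine_similarity(spec_a, spec_b):
    """Cosine similarity of two spectra given as {nominal m/z: intensity}."""
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = sqrt(sum(i * i for i in spec_a.values()))
    norm_b = sqrt(sum(i * i for i in spec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Invented spectra with nominal (integer) m/z values.
known = {73: 100.0, 129: 45.0, 217: 30.0, 357: 12.0}
unknown = {73: 95.0, 129: 50.0, 217: 25.0, 382: 8.0}
unrelated = {57: 100.0, 71: 60.0, 85: 40.0}

print(f"known vs unknown:   {cosine_similarity(known, unknown):.2f}")
print(f"known vs unrelated: {cosine_similarity(known, unrelated):.2f}")
```

An unknown whose spectrum scores highly against an identified compound can be connected to it in the network, which narrows down its likely chemical class.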

Page 18: Multivariate and network tools for biological data analysis

Mass Spectral Networks

Use mass spectra and empirical relationships to narrow down the biochemical roles for unknown compounds

Rigorous chemical experiments identified the unknown compounds as partial derivatization products of glucose

Page 19: Multivariate and network tools for biological data analysis

MetaMapR
https://github.com/dgrapov/MetaMapR

Page 20: Multivariate and network tools for biological data analysis

Page 21: Multivariate and network tools for biological data analysis

Analysis at the Metabolomic Scale and Beyond

[Figure labels: pyruvate, lactate, enzyme, gene A, gene B]

Pathway-independent metabolomic (known and unknown), proteomic and genomic data integration

Page 22: Multivariate and network tools for biological data analysis

Software and Resources

• DeviumWeb: Dynamic multivariate data analysis and visualization platform
  url: https://github.com/dgrapov/DeviumWeb

• imDEV: Microsoft Excel add-in for multivariate analysis
  url: http://sourceforge.net/projects/imdev/

• MetaMapR: Network analysis tools for metabolomics
  url: https://github.com/dgrapov/MetaMapR

• TeachingDemos: Tutorials and demonstrations
  url: http://sourceforge.net/projects/teachingdemos/?source=directory
  url: https://github.com/dgrapov/TeachingDemos

• Data analysis case studies and examples
  url: http://imdevsoftware.wordpress.com/

Page 23: Multivariate and network tools for biological data analysis

[email protected]
metabolomics.ucdavis.edu

This research was supported in part by NIH 1 U24 DK097154