Multivariate Data Analysis in Omics Research
Diverging Alternative Splicing Fingerprints Identified in Thoracic Aortic Aneurysm
Sanela Kjellqvist, PhD
WABI RNAseq course
2017-11-08

Outline
• Why multivariate data analysis?
• Multivariate statistics
  – Different analyses
  – Data preprocessing
• Alternative splicing in thoracic aortic aneurysm
  – Thoracic aortic aneurysm
  – Study setup
  – Aim of the study
  – Results
  – Summary
• Today's exercise

WHY MULTIVARIATE DATA ANALYSIS?

Development of Classical Statistics – 1930s
• Multiple regression
• Canonical correlation
• Linear discriminant analysis
• Analysis of variance
Assumptions:
• Independent X variables
• Many more observations than variables
• Regression analysis one Y at a time
• No missing data
Tables are long and lean (N rows of observations, K columns of variables, N ≫ K)

Today's data
• RNASeq, Array, LC-MS/MS, GC/MS or NMR data
• Problems
  – Many variables
  – Few observations
  – Noisy data
  – Missing data
  – Multiple responses
• Implications
  – High degree of correlation
  – Difficult to analyse with conventional methods
• Data ≠ Information
  – Need ways to extract information from the data
  – Need reliable, predictive information
  – Ignore random variation (noise)
[Figure: data table with N rows (observations) and K columns (variables), K ≫ N]

Poor Methods of Data Analysis
[Figure: variables X1, X2, X3 and responses Y1, Y2, Y3]
• Plot pairs of variables
  – Tedious, impractical
  – Risk of spurious correlations
  – Risk of missing information
• Select a few variables and use MLR
  – Throwing away information
  – Assumes no 'noise' in X
  – One Y at a time

A Better Way...
• Multivariate analysis by projection
  – Looks at ALL the variables together
  – Avoids loss of information
  – Finds underlying trends = "latent variables"
  – More stable models

Fundamental Data Analysis Objectives
Overview
• Trends
• Outliers
• Quality control
• Biological diversity
• Patient monitoring
Discrimination
• Discriminating between groups
• Biomarker candidates
• Comparing studies or instrumentation
Regression
• Comparing blocks of omics data
• Metab vs Proteomic vs Genomic
• Omic vs medical
• Prediction
MULTIVARIATE STATISTICS

Different methods
• Principal component analysis (PCA)
• Partial least squares to latent structures analysis (PLS)
• Orthogonal partial least squares to latent structures analysis (OPLS)
• PLS-DA
• OPLS-DA
• K-means clustering
• Hierarchical clustering
• Biplot analysis
• Canonical correlation analysis

What is a projection?
Principal component analysis (PCA)
• Algebraically
  – Summarizes the information in the observations as a few new (latent) variables
• Geometrically
  – The swarm of points in a K-dimensional space (K = number of variables) is approximated by a (hyper)plane and the points are projected onto that plane.

PCA - Geometric Interpretation
[Figure: swarm of points in variable space (axes x1, x2, x3) with component directions t1 and t2]
Fit first principal component (line describing maximum variation)
Add second component (accounts for next largest amount of variation), at right angles to the first (orthogonal)
Each component goes through the origin

PCA - Geometric Interpretation
[Figure: data table X (N rows, K columns) and the point swarm in variable space (axes x1, x2, x3) with the plane spanned by Comp 1 (t1) and Comp 2 (t2)]
Points are projected down onto a plane with co-ordinates t1, t2
“Distance to Model”: the distance from a point to the plane

Loadings
[Figure: Comp 1 in variable space (axes x1, x2, x3), making angles α1, α2, α3 with the variable axes]
How do the principal components relate to the original variables?
Look at the angles between PCs and variable axes

Loadings
[Figure: loading vector p'1 for Comp 1 with elements cos(α1), cos(α2), cos(α3)]
Take cos(α) for each axis
Loadings vector p' - one for each principal component
One value per variable
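A small numerical check of the "loadings are cosines" statement, as a sketch only: the demo data are made up, and the first component is taken from an SVD of the centered data. The cosine of the angle between the component direction and each variable axis is computed from the dot product and compared with the unit-length loading vector.

```python
import numpy as np

# Hypothetical demo data: 50 observations of 3 variables with different spreads.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3)) * np.array([3.0, 1.0, 0.3])
Xc = X - X.mean(axis=0)

# First principal component direction (unit-length loading vector p1).
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
p1 = Vt[0]

# cos(alpha_k) between the component direction and each variable axis x_k.
cosines = np.array([
    axis @ p1 / (np.linalg.norm(axis) * np.linalg.norm(p1))
    for axis in np.eye(3)
])
angles_deg = np.degrees(np.arccos(cosines))

print(p1)          # loadings of component 1
print(cosines)     # identical to p1: one value per variable
print(angles_deg)  # the angles alpha_1, alpha_2, alpha_3
```

The squared cosines sum to one, which is another way of saying that the loading vector has unit length.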

Principal component analysis (PCA)
• PCA compresses the X data block into A orthogonal components
• Variation seen in the score vector t can be interpreted from the corresponding loading vector p
[Figure: PCA decomposes X into a score matrix T (columns 1…A) and a loading matrix P^T (rows 1…A)]
PCA model: $X = t_1 p_1^T + t_2 p_2^T + \dots + t_A p_A^T + E = T P^T + E$
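The PCA model can be verified term by term with a short sketch. The helper name `pca_model` and the random demo matrix are assumptions for illustration; the decomposition is obtained from an SVD of the mean-centered data, which is one common way (not necessarily the one used in the course software) to compute scores and loadings.

```python
import numpy as np

def pca_model(X, A):
    """Return centered X, scores T (N x A), loadings P (K x A) and residuals E."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :A] * s[:A]        # score vectors t_1 ... t_A as columns
    P = Vt[:A].T                # loading vectors p_1 ... p_A as columns
    E = Xc - T @ P.T            # what the A components do not explain
    return Xc, T, P, E

# Hypothetical demo data: N = 10 observations, K = 6 variables, A = 2 components.
rng = np.random.default_rng(2)
X = rng.normal(size=(10, 6))
Xc, T, P, E = pca_model(X, A=2)

# Sum of rank-one terms t_a p_a^T, written out as in the model equation.
rank_one_sum = sum(np.outer(T[:, a], P[:, a]) for a in range(2))
print(np.allclose(rank_one_sum + E, Xc))   # X = t1 p1^T + t2 p2^T + E
```

The score vectors come out mutually orthogonal and the loading vectors orthonormal, which is what "orthogonal components" refers to.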

Recognition of molecular quasi-species (evolving units) in enzyme evolution by PCA
Emrén, L., Kurtovic, S., Runarsdottir, A., Larsson, A-K., & Mannervik, B. (2006) Proc Natl Acad Sci USA, 103, 10866-10870
Kurtovic, S., & Mannervik, B. (2009) Biochemistry, 48, 9330-9339

Orthogonal partial least squares to latent structures – Discriminant analysis (OPLS-DA)
[Figure: OPLS relates the data block X to Y, where Y encodes Class 1 and Class 2]

OPLS with single Y: modelling and prediction
[Figure: X split into a 'Y-predictive' part (t1, p1^T) and a 'Y-orthogonal' part (T_O, P_O^T); y linked to X through t1, u1 and q1^T]
OPLS model:
$X = t_1 p_1^T + T_O P_O^T + E$
$Y = t_1 q_1^T + F$
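As a rough sketch of how such a split can be computed, the code below follows one commonly described NIPALS-style formulation of OPLS for a single y with one Y-orthogonal component. It is an illustration under assumptions, not the implementation used in SIMCA: the function name `opls_single_y`, the demo data and the choice of exactly one orthogonal component are all hypothetical, and X and y are assumed to be mean-centered beforehand.

```python
import numpy as np

def opls_single_y(X, y):
    """One Y-predictive and one Y-orthogonal component for centered X (N x K), y (N,)."""
    w = X.T @ y
    w /= np.linalg.norm(w)                 # predictive weight vector
    t = X @ w
    p = X.T @ t / (t @ t)
    w_o = p - (w @ p) * w                  # part of p orthogonal to w
    w_o /= np.linalg.norm(w_o)             # fails only if X has no Y-orthogonal variation
    t_o = X @ w_o                          # Y-orthogonal score
    p_o = X.T @ t_o / (t_o @ t_o)          # Y-orthogonal loading
    X_filtered = X - np.outer(t_o, p_o)    # remove Y-orthogonal variation
    t1 = X_filtered @ w                    # Y-predictive score
    p1 = X_filtered.T @ t1 / (t1 @ t1)     # Y-predictive loading
    q1 = (y @ t1) / (t1 @ t1)              # scalar for a single y
    E = X_filtered - np.outer(t1, p1)
    F = y - t1 * q1
    return t1, p1, q1, t_o, p_o, E, F

# Hypothetical two-class demo (OPLS-DA style): y codes Class 1 vs Class 2.
rng = np.random.default_rng(5)
y = np.array([0.0] * 6 + [1.0] * 6)
y -= y.mean()
X = rng.normal(size=(12, 5))
X[:, 0] += 2.0 * y                          # class-related variation
X[:, 1] += 3.0 * rng.normal(size=12)        # strong variation unrelated to class
X -= X.mean(axis=0)

t1, p1, q1, t_o, p_o, E, F = opls_single_y(X, y)
print(abs(t_o @ y))   # numerically zero: t_o is orthogonal to y
```

By construction X equals t1 p1^T + t_o p_o^T + E and y equals t1 q1 + F, matching the two model equations with a single orthogonal component.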

Data Preprocessing – Scaling
• PCA and other methods are scale dependent
  – Is the size of a variable important?
• Scaling weight is 1/SD for each variable, i.e. divide each variable by its standard deviation
  – Unit variance (UV) scaling
• Variance of scaled variables = 1
• Many other kinds of scaling exist
[Figure: UV scaling of X with weights ws = 1/SD]
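A minimal sketch of UV scaling, assuming variables are stored as columns. The function name `uv_scale` is hypothetical, mean centering is included because the study data later in these slides are both mean centered and UV scaled, and the sample standard deviation (ddof=1) is an assumption about the convention.

```python
import numpy as np

def uv_scale(X):
    """Mean-center each column and divide it by its standard deviation."""
    mean = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)          # assumes no constant (zero-variance) variable
    return (X - mean) / sd, mean, sd

# Hypothetical demo: three variables measured on very different scales.
rng = np.random.default_rng(3)
X = rng.normal(size=(30, 3)) * np.array([1000.0, 1.0, 0.01])
X_scaled, mean, sd = uv_scale(X)
print(X.var(axis=0, ddof=1))          # very different variances before scaling
print(X_scaled.var(axis=0, ddof=1))   # all equal to 1 after scaling
```

After scaling, every variable has the same chance to influence the model regardless of its original numerical size.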

Cross-Validation
• Data are divided into G groups (default in SIMCA-P is 7) and a model is generated for the data devoid of one group
• The deleted group is predicted by the model ⇒ partial PRESS (Predictive Residual Sum of Squares)
• This is repeated G times and then all partial PRESS values are summed to form overall PRESS
• If a new component enhances the predictive power compared with the previous PRESS value then the new component is retained
• PCA cross-validation is done in two phases and several deletion rounds:
  – first removal of observations (rows)
  – then removal of variables (columns)
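The group-wise deletion scheme can be sketched as follows. To keep the example short it uses an ordinary least-squares regression as a stand-in model, which is an assumption made purely for illustration; the two-phase row and column deletion used for PCA is not reproduced here. The function name `cross_validated_press` and the interleaved group assignment are likewise hypothetical.

```python
import numpy as np

def cross_validated_press(X, y, n_groups=7):
    """Sum the partial PRESS values over n_groups deletion rounds."""
    groups = np.arange(len(y)) % n_groups          # interleaved group assignment
    press = 0.0
    for g in range(n_groups):
        held_out = groups == g
        # Fit the model on the data devoid of group g.
        coef, *_ = np.linalg.lstsq(X[~held_out], y[~held_out], rcond=None)
        # Predict the deleted group and accumulate its partial PRESS.
        residual = y[held_out] - X[held_out] @ coef
        press += float(residual @ residual)
    return press

# Hypothetical demo data: 35 observations, 3 predictors, little noise.
rng = np.random.default_rng(3)
X = rng.normal(size=(35, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=35)
press = cross_validated_press(X, y)
print(press)
```

Each observation is predicted exactly once by a model that never saw it, so the overall PRESS measures predictive rather than fitted error.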

Model Diagnostics
• Fit or $R^2$
  – Residuals of matrix E pooled column-wise
  – Explained variation
  – For whole model or individual variables
  – $\mathrm{RSS} = \sum (\text{observed} - \text{fitted})^2$
  – $R^2 = 1 - \mathrm{RSS}/\mathrm{SSX}$
• Predictive ability or $Q^2$
  – Leave out 1/7th of the data in turn
  – 'Cross validation'
  – Predict each missing block of data in turn
  – Sum the results
  – $\mathrm{PRESS} = \sum (\text{observed} - \text{predicted})^2$
  – $Q^2 = 1 - \mathrm{PRESS}/\mathrm{SSX}$
[Figure: Fit and Prediction curves]
Stop when $Q^2$ starts to drop
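The fit statistic can be illustrated for PCA with a short sketch, assuming SSX is the total sum of squares of the mean-centered X and that the components come from an SVD. The function name `cumulative_r2x` and the demo data are hypothetical. The predictive statistic has the same form with PRESS from cross-validation in place of RSS; it is not computed here.

```python
import numpy as np

def cumulative_r2x(X, max_components):
    """R2 = 1 - RSS/SSX for PCA models with 1 ... max_components components."""
    Xc = X - X.mean(axis=0)
    ssx = float((Xc ** 2).sum())
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    r2 = []
    for a in range(1, max_components + 1):
        fitted = (U[:, :a] * s[:a]) @ Vt[:a]         # T P^T with a components
        rss = float(((Xc - fitted) ** 2).sum())      # residuals E pooled over all columns
        r2.append(1.0 - rss / ssx)
    return np.array(r2)

# Hypothetical demo data: 15 observations, 4 variables.
rng = np.random.default_rng(4)
X = rng.normal(size=(15, 4))
r2 = cumulative_r2x(X, max_components=4)
print(r2)
```

The fit can only go up as components are added and reaches 1 when all components are used, which is why the number of components is chosen from the predictive statistic rather than from the fit.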

ALTERNATIVE SPLICING IN THORACIC AORTIC ANEURYSM
Kurtovic, Paloschi, Folkersen, Gottfries, Franco-Cereceda, Eriksson (2011) Molecular Medicine, 17, 665-675

Thoracic aortic aneurysm (TAA)
• Monogenic
  – Marfan syndrome
  – Loeys-Dietz syndrome
• Aneurysm associated with bicuspid aortic valve (BAV)
• Idiopathic thoracic aortic aneurysm

Outline of the study
• Biopsies are collected from both non-dilated and dilated aorta, during valve replacement surgery and reconstruction of the dilated aorta, respectively
• Media from ascending aorta
• RNA
  – Affymetrix Human Exon 1.0 ST microarrays (in this study 81 patients)
  – RNAseq (30 patients)
• Protein
  – HiRiEF iTRAQ LC-MS/MS
  – 2D gel electrophoresis followed by iTRAQ LC-MS/MS
[Figure: non-dilated and dilated aorta]

Aim of the study
• Alternative splicing in the transforming growth factor-β (TGFβ) signaling pathway
• The TGFβ pathway is known to be important in aortic aneurysm
• Are there any alternatively spliced genes in the TGFβ pathway?
• Is alternative splicing an important mechanism in thoracic aortic aneurysm (TAA)?
• How do we analyze alternative splicing?

Affymetrix exon array design
PSR – probe selection region
[Figure: exons, introns and probe selection regions]

Preprocessing of data
• Probe set core level
• Unique hybridization target
• Robust multichip average (RMA) normalized
• Splice index calculated (in case of exon level analysis):
  $n_{i,j,k} = \dfrac{e_{i,j,k}}{g_{j,k}}$
  where i = exon, j = sample, k = gene, e = exon signal, g = gene signal
• Unit variance scaled and mean centered data prior to MVA
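A worked example of the splice index for a single gene k, as a sketch under assumptions: the signals are treated as linear-scale intensities so that the ratio in the formula applies directly (on log-scale values the same normalization would be a difference), and the function name `splice_index` and all numbers are made up for illustration.

```python
import numpy as np

def splice_index(exon_signal, gene_signal):
    """Splice index n[i, j] = e[i, j] / g[j] for one gene.

    exon_signal: array (n_exons, n_samples), exon-level signal e
    gene_signal: array (n_samples,), gene-level signal g
    """
    return exon_signal / gene_signal[np.newaxis, :]

# Hypothetical gene with 3 exons measured in 2 samples.
exon_signal = np.array([[100.0, 200.0],
                        [ 50.0, 100.0],
                        [150.0, 100.0]])
gene_signal = np.array([100.0, 200.0])

n = splice_index(exon_signal, gene_signal)
print(n)
```

Exons 1 and 2 follow the overall gene expression in both samples, whereas the index of exon 3 changes between the samples (1.5 versus 0.5), the kind of pattern that points to differential exon usage rather than a change in gene expression.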

Alternative splicing pattern in the TGFβ pathway is different between dilated and non-dilated aorta
• TAV (tricuspid aortic valve) and BAV together
• 81 patients included
• 614 exons included
• Good model
• Good separation between the two groups
[Figure panels: non-supervised PCA; supervised OPLS-DA]

Alternative splicing pattern in the TGFβ pathway is different between dilated and non-dilated aorta
• Only TAV patients
• 29 patients included
• 614 exons included
• Good model
• Good separation between the two groups
[Figure panels: non-supervised PCA; supervised OPLS-DA]

Alternative splicing pattern in the TGFβ pathway is different between dilated and non-dilated aorta
• Only BAV patients
• 52 patients included
• 614 exons included
• Good model
• Good separation between the two groups
[Figure panels: non-supervised PCA; supervised OPLS-DA]

Alternatively spliced exons are present in both TAV and BAV groups of patients

Alternative splicing analysis of all exons in the human genome reveals the importance of TGFβ pathway exons

Gene expression patterns of differentially spliced genes

Summary
• TGFβ pathway exons are clearly important according to an overall exon level analysis
• Dilated and non-dilated aortic tissues show different alternative splicing patterns in the TGFβ pathway with respect to TAV and BAV
• Exons responsible for the diverging alternative splicing fingerprints in the TGFβ pathway identified
• Implies that dilatation in TAV patients has different underlying molecular mechanisms compared to BAV patients
• New methods for analyzing array data

Today during the exercise
• PCA and OPLS-DA
• Thoracic aortic aneurysm data set
• Exon level expression Affymetrix arrays
• Compare two different phenotypes and subphenotypes