
Development of OTU Analysis in NutriGen
Integrating OTU data with other NutriGen Data

Mateen Shaikh and Joseph Beyene

McMaster University

December 19 2014

Mateen and Joseph (McMaster) Development of OTU Analysis in NutriGen December 19 2014 1 / 18


TOC

Background
    The Data’s Context
    Investigations

Differential Abundance Tests
    Permutation

Various Linear Models
    Candidates
    Exemplifying Results
    Statistical issues

Next Steps...
    In this framework


Background The Data’s Context

- ≈ 250 infants contributed microbiome samples from CHILD (processed)
- ≈ 180 infants contributed microbiome samples from START (processing)
- Methods developed from the START samples
- Continuing from the work already completed by Mike Surette’s lab (JS and MS)
- Picking up at the OTU table


Background Investigations

Goals

Determine relationships between the microbiome and

- Changes in breastfeeding
- Mother’s GDM
- Diet
- Other health outcomes (adiposity, asthma, etc.)
- Introduction of (types of) foods
- Integration with other large data types (genotype, methylation, expression)


Background Investigations

Sample from CHILD

     SA10  SA11  SA12  SA13  SA14  SA15  SA16   SA17  SA18  SA19
 1   8933  1967  8145  1423  4035  4468  5174  12909  1763  1046
 2   2321  2148  3708  1655   226  6007  2190   5276  1529  1284
 3    352    88   135    28   867  2452  4069   9971    87  2381
 4      1     2     2  2274     1     1  9198      3  2473     0
 5     72   114   159   165     0  1262   360     63     0    95
 6      0    81     0     0     0     0  1353      2     0     0
 7      0    13     0     0     0     0     0      1     0     0
 8      0     2     0     1     0     1    79      4     0     0
 9      0     0     0     0     0     0     0      0     0     0
10      0     0     1     0     0     0     0      0     0     0
11      0     0     0     0     0     0     0      0     0     0
12      0     0     0     0     0     0     0      0     0     0
13      0     0     0     0     0     0     0      0     0     0

> mean(otutable==0)

[1] 0.9776366
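
The zero fraction above is computed on the full OTU table. As a rough illustration of the same calculation, here is a minimal Python sketch (the slide itself uses R) applied only to the 13 × 10 excerpt printed on this slide. The name `excerpt` is hypothetical, and the excerpt's zero fraction is much lower than the full table's, presumably because the excerpt happens to include several abundant OTUs.

```python
# Rows are OTUs 1-13, columns are samples SA10-SA19 (values copied from the slide).
excerpt = [
    [8933, 1967, 8145, 1423, 4035, 4468, 5174, 12909, 1763, 1046],
    [2321, 2148, 3708, 1655, 226, 6007, 2190, 5276, 1529, 1284],
    [352, 88, 135, 28, 867, 2452, 4069, 9971, 87, 2381],
    [1, 2, 2, 2274, 1, 1, 9198, 3, 2473, 0],
    [72, 114, 159, 165, 0, 1262, 360, 63, 0, 95],
    [0, 81, 0, 0, 0, 0, 1353, 2, 0, 0],
    [0, 13, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 2, 0, 1, 0, 1, 79, 4, 0, 0],
    [0] * 10,
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0] * 10,
    [0] * 10,
    [0] * 10,
]

# Flatten the table and take the share of cells equal to zero,
# the analogue of mean(otutable == 0) in R.
cells = [value for row in excerpt for value in row]
zero_count = sum(value == 0 for value in cells)
zero_fraction = zero_count / len(cells)
print(zero_count, len(cells), round(zero_fraction, 3))
```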


Differential Abundance Tests Permutation

Permutation Tests

- Simple for a few categorical variables
- Prefer a quantile-based measure, because of heavy positive skew, but choosing a quantile (like the median) can be problematic
- Fairly conservative but provides p-values nonetheless
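
A minimal Python sketch of a two-group permutation test on a quantile difference may make the second bullet concrete. The slides do not say which statistic or quantile was used, so the absolute difference in the `q`-th quantile, the function names, and the toy counts are all assumptions for illustration.

```python
import random

def quantile(values, q):
    """Empirical quantile by linear interpolation between order statistics."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

def permutation_pvalue(group_a, group_b, q=0.5, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in the q-th quantile."""
    rng = random.Random(seed)
    observed = abs(quantile(group_a, q) - quantile(group_b, q))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        stat = abs(quantile(pooled[:n_a], q) - quantile(pooled[n_a:], q))
        if stat >= observed:
            hits += 1
    # The +1 terms count the observed labelling itself, so p is never 0.
    return (hits + 1) / (n_perm + 1)

# Hypothetical sparse counts: both medians are 0, so the statistic cannot separate the groups.
sparse_a = [0] * 8 + [5, 40]
sparse_b = [0] * 9 + [3]
p_sparse = permutation_pvalue(sparse_a, sparse_b, q=0.5)

# Hypothetical well-separated counts for contrast.
p_separated = permutation_pvalue(list(range(100, 110)), list(range(10)), q=0.5)
print(p_sparse, p_separated)
```

With mostly-zero counts the median is 0 in both groups and in every relabelling, so the test returns a p-value of 1 whatever happens in the upper tail. A higher quantile avoids this, but then the choice of quantile itself becomes a tuning decision.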


Differential Abundance Tests Permutation

GDM

otu#  otu                               pval
  17  Bacteroidaceae; g Bacteroides     0.005
  91  ostridiales; f Lachnospiraceae    0.020
 564  ostridiales; f Lachnospiraceae    0.024
  46  ostridiales; f Lachnospiraceae    0.025
  18  Lachnospiraceae; g Lachnospira    0.030
 313  acteriaceae; g Bifidobacterium    0.032
 248  ostridiales; f Lachnospiraceae    0.040
  76  ococcaceae; g Faecalibacterium    0.046
 154  Lachnospiraceae; g Lachnospira    0.049
 408  ostridiales; f Lachnospiraceae    0.051
  87  ostridiales; f Lachnospiraceae    0.060
 153  ostridiales; f Lachnospiraceae    0.068
 207  acteriaceae; g Bifidobacterium    0.069
 464  ctinomycetaceae; g Actinomyces    0.071
  10  acteriaceae; g Bifidobacterium    0.072
 263  ostridiales; f Lachnospiraceae    0.075
  45  acteriaceae; g Bifidobacterium    0.081
 410  ostridiales; f Lachnospiraceae    0.082
  82  teriales; f Bifidobacteriaceae    0.083
   6  acteriaceae; g Bifidobacterium    0.085
 109  c Clostridia; o Clostridiales     0.098
  24  omonadaceae; g Parabacteroides    0.103
   7  ococcaceae; g Faecalibacterium    0.104
 397  inobacteria; o Actinomycetales    0.113
 264  ostridiales; f Lachnospiraceae    0.114
 220  ococcaceae; g Faecalibacterium    0.121
  70  eriales; f Alcaligenaceae; g      0.122
 100  ostridiales; f Lachnospiraceae    0.126
 368  Root; p Firmicutes                0.136
  12  acteriaceae; g Bifidobacterium    0.146


Differential Abundance Tests Permutation

Still Breast Feeding

otu#  otu                               pval
  21  Veillonellaceae; g Veillonella    <2e-16
  64  ostridiales; f Veillonellaceae    <2e-16
  45  acteriaceae; g Bifidobacterium    0.001
 112  c Clostridia; o Clostridiales     0.003
  13  acteriaceae; g Bifidobacterium    0.004
  11  erobacteriaceae; g Escherichia    0.006
  35  nterococcaceae; g Enterococcus    0.006
  38  Veillonellaceae; g Veillonella    0.008
  40  teriales; f Enterobacteriaceae    0.018
  12  acteriaceae; g Bifidobacterium    0.024
  41  Veillonellaceae; g Dialister      0.036
 239  ococcaceae; g Faecalibacterium    0.071
  76  ococcaceae; g Faecalibacterium    0.133
   6  acteriaceae; g Bifidobacterium    0.134
  86  acteriaceae; g Bifidobacterium    0.167
   7  ococcaceae; g Faecalibacterium    0.174
   4  acteriaceae; g Bifidobacterium    0.205
 176  nterococcaceae; g Enterococcus    0.338
 220  ococcaceae; g Faecalibacterium    0.368
 109  c Clostridia; o Clostridiales     0.376
  92  s; f Micrococcaceae; g Rothia     0.417
  43  ostridiales; f Ruminococcaceae    0.434
  36  Bacteroidaceae; g Bacteroides     0.439
  22  ucomicrobiaceae; g Akkermansia    0.533
  47  eptococcaceae; g Streptococcus    0.605
  18  Lachnospiraceae; g Lachnospira    0.780
  17  Bacteroidaceae; g Bacteroides     0.864
   1  ostridiales; f Lachnospiraceae    1.000
   2  f Lachnospiraceae; g Blautia      1.000
   3  achnospiraceae; g Ruminococcus    1.000


Differential Abundance Tests Permutation

Delivery (V/CS)

otu#  otu                               pval
 186  Root; p Firmicutes                0.025
  23  Bacteroidaceae; g Bacteroides     0.057
  36  Bacteroidaceae; g Bacteroides     0.066
  14  ostridiales; f Lachnospiraceae    0.080
 105  Ruminococcaceae; g Clostridium    0.088
  20  Bacteroidaceae; g Bacteroides     0.119
 145  c Clostridia; o Clostridiales     0.123
  72  Bacteroidaceae; g Bacteroides     0.124
  17  Bacteroidaceae; g Bacteroides     0.203
 320  c Clostridia; o Clostridiales     0.212
 121  Clostridiaceae; g Clostridium     0.256
  32  es; f Erysipelotrichaceae; g      0.257
  42  ; f Clostridiaceae; g Sarcina     0.337
 217  tinobacteria; c Actinobacteria    0.405
   4  acteriaceae; g Bifidobacterium    0.466
  16  riobacteriaceae; g Collinsella    0.490
  53  f Lachnospiraceae; g Blautia      0.532
  21  Veillonellaceae; g Veillonella    0.552
   1  ostridiales; f Lachnospiraceae    1.000
   2  f Lachnospiraceae; g Blautia      1.000
   3  achnospiraceae; g Ruminococcus    1.000
   5  eptococcaceae; g Streptococcus    1.000
   6  acteriaceae; g Bifidobacterium    1.000
   7  ococcaceae; g Faecalibacterium    1.000
   8  lostridiales; f Clostridiaceae    1.000
   9  lostridiales; f Clostridiaceae    1.000
  10  acteriaceae; g Bifidobacterium    1.000
  11  erobacteriaceae; g Escherichia    1.000
  12  acteriaceae; g Bifidobacterium    1.000
  13  acteriaceae; g Bifidobacterium    1.000


Various Linear Models Candidates

- Poisson regression for count variables
- Issues with model assumptions and fit
- Some strategies to mitigate these
- Handles more complex relationships (non-binary independent variables)
- p-values can be misleadingly low!
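
To see how a Poisson p-value can come out misleadingly low, consider a log-link Poisson regression with a single binary covariate. In that special case the fitted means are simply the group means, so the Wald test has a closed form. The sketch below is in Python rather than the R used in the slides, and the two groups are made-up counts in which a single sample carries the entire signal.

```python
import math

def poisson_binary_wald(group0, group1):
    """Wald test for the group coefficient in a log-link Poisson regression
    with one binary covariate. The coefficient is log(mean1 / mean0) and its
    variance is 1/sum(group0) + 1/sum(group1). Both sums must be positive."""
    s0, s1 = sum(group0), sum(group1)
    beta = math.log((s1 / len(group1)) / (s0 / len(group0)))
    se = math.sqrt(1.0 / s0 + 1.0 / s1)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return beta, se, p

def pearson_dispersion(group0, group1):
    """Pearson chi-square divided by residual df; about 1 if the Poisson model holds."""
    chi2 = 0.0
    for group in (group0, group1):
        mu = sum(group) / len(group)
        chi2 += sum((y - mu) ** 2 / mu for y in group)
    return chi2 / (len(group0) + len(group1) - 2)

# Hypothetical data: nine zeros and one non-zero sample in each group.
group0 = [0] * 9 + [50]
group1 = [0] * 9 + [500]

beta, se, p = poisson_binary_wald(group0, group1)
dispersion = pearson_dispersion(group0, group1)
print(beta, se, p, dispersion)
```

The Poisson standard error depends only on the total counts, so one very large sample shrinks it dramatically. Here the dispersion statistic is far above 1, which signals that the tiny p-value rests on a variance assumption the data do not support.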


Various Linear Models Candidates

Overdispersion \ Zeroes    GLM    Hurdle    Zero-Inflated
Poisson                     •       •            •
Negative Binomial           •       •            •

- All models use canonical link
- When variable is binary, results are comparable to permutation tests
- Run into problems fitting the more flexible models
- Issue with quality of fit on all models (assumption violations, some gross)
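
The hurdle and zero-inflated columns differ in where the zeros come from. A minimal Python sketch of the two Poisson variants, without covariates, shows the distinction; the function names and the parameter values `lam = 3` and `pi_zero = 0.9` are illustrative assumptions, not fitted values from the slides.

```python
import math

def poisson_pmf(k, lam):
    """Ordinary Poisson probability of observing count k."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def zip_pmf(k, lam, pi_zero):
    """Zero-inflated Poisson: a point mass at zero mixed with an ordinary
    Poisson, so zeros can come from either part."""
    base = (1.0 - pi_zero) * poisson_pmf(k, lam)
    return pi_zero + base if k == 0 else base

def hurdle_pmf(k, lam, pi_zero):
    """Hurdle Poisson: all zeros come from the hurdle part; positive counts
    follow a zero-truncated Poisson."""
    if k == 0:
        return pi_zero
    return (1.0 - pi_zero) * poisson_pmf(k, lam) / (1.0 - math.exp(-lam))

lam, pi_zero = 3.0, 0.9
zip_total = sum(zip_pmf(k, lam, pi_zero) for k in range(200))
hurdle_total = sum(hurdle_pmf(k, lam, pi_zero) for k in range(200))
print(zip_pmf(0, lam, pi_zero), hurdle_pmf(0, lam, pi_zero), zip_total, hurdle_total)
```

Replacing the Poisson part with a negative binomial gives the second row of the table, at the cost of one more parameter to estimate from very few non-zero counts.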


Various Linear Models Candidates

Model Selection

Quantitatively

- Two criteria (and problems):
    - Choose between models (different methods)
    - Quality of model (what if all available models fit poorly)
- For the first, various principle-of-parsimony heuristics are applicable
- For the second, deviance might work
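
The slides do not spell out how deviance would be used. One common heuristic, assumed here, is to compare the residual deviance with its residual degrees of freedom: a ratio near 1 (log ratio near 0) suggests an adequate fit, and a large ratio suggests a poor one. The Python sketch below computes this for a Poisson model whose fitted values are group means; the counts are made up for illustration.

```python
import math

def poisson_deviance(counts, fitted):
    """Residual deviance of a Poisson model: 2 * sum(y*log(y/mu) - (y - mu)),
    with y*log(y/mu) taken as 0 when y is 0."""
    deviance = 0.0
    for y, mu in zip(counts, fitted):
        term = y * math.log(y / mu) if y > 0 else 0.0
        deviance += 2.0 * (term - (y - mu))
    return deviance

# Hypothetical counts for two groups of ten samples, fitted by their group means.
counts = [0] * 9 + [50] + [0] * 9 + [500]
fitted = [5.0] * 10 + [50.0] * 10

deviance = poisson_deviance(counts, fitted)
residual_df = len(counts) - 2  # two fitted parameters: intercept and group effect
log_deviance_ratio = math.log(deviance / residual_df)
print(deviance, residual_df, log_deviance_ratio)
```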


Various Linear Models Exemplifying Results

[Figure: three histograms, each with y-axis "Frequency". Panels: "p-values from Poisson regression" (x-axis: p-values, 0.0 to 1.0); "Log deviance ratios from Poisson regression" (x-axis: log-deviance ratios, −2 to 10); "Deviance-based areas from Poisson regression" (x-axis: tail areas, 0.0 to 1.0).]

otu#  otu                               pval
   1  ostridiales; f Lachnospiraceae    < 2.22e-16
   2  f Lachnospiraceae; g Blautia      < 2.22e-16
   3  achnospiraceae; g Ruminococcus    < 2.22e-16
   4  acteriaceae; g Bifidobacterium    < 2.22e-16
   5  eptococcaceae; g Streptococcus    < 2.22e-16
   6  acteriaceae; g Bifidobacterium    < 2.22e-16
   8  lostridiales; f Clostridiaceae    < 2.22e-16
   9  lostridiales; f Clostridiaceae    < 2.22e-16
  10  acteriaceae; g Bifidobacterium    < 2.22e-16
  11  erobacteriaceae; g Escherichia    < 2.22e-16
  13  acteriaceae; g Bifidobacterium    < 2.22e-16
  14  ostridiales; f Lachnospiraceae    < 2.22e-16
  15  ostridiales; f Lachnospiraceae    < 2.22e-16
  16  riobacteriaceae; g Collinsella    < 2.22e-16
  17  Bacteroidaceae; g Bacteroides     < 2.22e-16
  18  Lachnospiraceae; g Lachnospira    < 2.22e-16
  19  ococcaceae; g Faecalibacterium    < 2.22e-16
  21  Veillonellaceae; g Veillonella    < 2.22e-16
  22  ucomicrobiaceae; g Akkermansia    < 2.22e-16
  23  Bacteroidaceae; g Bacteroides     < 2.22e-16
  24  omonadaceae; g Parabacteroides    < 2.22e-16
  25  ostridiales; f Lachnospiraceae    < 2.22e-16
  26  ostridiales; f Lachnospiraceae    < 2.22e-16
  27  acteriaceae; g Bifidobacterium    < 2.22e-16
  28  Bacteroidaceae; g Bacteroides     < 2.22e-16
  30  treptococcaceae; g Lactococcus    < 2.22e-16
  31  ipelotrichaceae; g Clostridium    < 2.22e-16
  32  es; f Erysipelotrichaceae; g      < 2.22e-16
  33  uminococcaceae; g Ruminococcus    < 2.22e-16
  34  bacillales; f Streptococcaceae    < 2.22e-16


Various Linear Models Exemplifying Results

[Figure: three histograms, each with y-axis "Frequency". Panels: "p-values from NB regression" (x-axis: p-values, 0.0 to 1.0); "Log deviance ratios from NB regression" (x-axis: log-deviance ratios, −5 to 5); "Deviance-based areas from NB regression" (x-axis: tail areas, 0.0 to 1.0).]

otu#  otu                               pval
 116  cteriales; f Coriobacteriaceae    < 2.22e-16
 183  obacteriaceae; g Adlercreutzia    < 2.22e-16
 470  ostridiales; f Lachnospiraceae    < 2.22e-16
 500  lonellaceae; g Acidaminococcus    < 2.22e-16
  57  tobacillaceae; g Lactobacillus    < 2.22e-16
 428  Veillonellaceae; g Dialister      < 2.22e-16
 449  Coriobacteriaceae; g Slackia      < 2.22e-16
 324  tinobacteria; c Actinobacteria    < 2.22e-16
 394  cteriales; f Coriobacteriaceae    < 2.22e-16
 373  Moraxellaceae; g Acinetobacter    1.1682e-15
 151  ostridiales; f Lachnospiraceae    8.0287e-14
 743  teriales; f Enterobacteriaceae    2.5791e-11
 335  ipelotrichaceae; g Clostridium    3.1231e-11
 115  tobacillaceae; g Lactobacillus    1.9556e-10
 293  uminococcaceae; g Ruminococcus    3.7384e-10
 299  Veillonellaceae; g Megasphaera    5.6156e-10
  68  tinobacteria; c Actinobacteria    8.3059e-10
 533  Root; p Firmicutes                4.3500e-09
 121  Clostridiaceae; g Clostridium     5.2253e-09
   8  lostridiales; f Clostridiaceae    1.1172e-08
 132  c Clostridia; o Clostridiales     2.9136e-08
  64  ostridiales; f Veillonellaceae    9.2874e-08
  94  c Clostridia; o Clostridiales     1.1626e-07
 301  ostridiales; f Lachnospiraceae    1.3031e-07
  53  f Lachnospiraceae; g Blautia      1.5183e-07
 238  ostridiales; f Lachnospiraceae    2.3633e-07
 791  Prevotellaceae; g Prevotella      2.4478e-07
  51  omonadaceae; g Parabacteroides    2.4608e-07
 693  es; f Erysipelotrichaceae; g      2.5293e-07
 106  tobacillaceae; g Lactobacillus    2.7752e-07


Various Linear Models Statistical issues

- Traditional diagnostics would make the NB and its variants appealing
- Concerning distributional issues with OTUs
- Example of a significant OTU: NB(µ = 21.39, θ = 0.0017)

bf
      0   5315
     91      1

¬bf
      0      1      9
    117      1      1
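
To get a feel for what NB(µ = 21.39, θ = 0.0017) implies, the Python sketch below evaluates the negative binomial under the mean-dispersion parameterization, where the variance is µ + µ²/θ. That this is the parameterization used on the slide is an assumption (it is the one used by `glm.nb` in R's MASS package), and the function name is hypothetical.

```python
import math

def nb_log_pmf(k, mu, theta):
    """Log-probability of count k for a negative binomial with mean mu and
    dispersion (size) theta, where Var = mu + mu**2 / theta."""
    return (math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
            + theta * math.log(theta / (theta + mu))
            + k * math.log(mu / (theta + mu)))

mu, theta = 21.39, 0.0017  # values quoted on the slide
p_zero = math.exp(nb_log_pmf(0, mu, theta))
variance = mu + mu ** 2 / theta
print(round(p_zero, 3), round(variance))
```

Under this reading the fitted distribution puts roughly 98% of its mass at zero and has a variance in the hundreds of thousands, so the fitted mean is driven almost entirely by one or two extreme samples.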


Various Linear Models Statistical issues

Which OTUs are picked as significant depends heavily on the individual method used (inflated false positives)


Next Steps . . . In this framework

- Poorly fitting models may benefit from regression with finite mixtures of Poisson/NB to split the extremes (group starvation is an issue)
- Adjustments by cohort when START arrives
- For variables with ordinality, apply model selection among the well-fitting model components.
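
As a rough illustration of the finite-mixture idea in the first bullet, here is a minimal EM fit of a two-component Poisson mixture in Python. It has no covariates, so it is a simplification of the mixture regression the slide proposes; the function names, starting values, and toy counts are all assumptions. The clamps in the M-step guard against a component ending up with almost no samples, which is one way to read "group starvation".

```python
import math

def poisson_logpmf(k, lam):
    """Log-probability of count k under a Poisson with rate lam."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)

def fit_two_poisson_mixture(counts, n_iter=100):
    """EM for a mixture of a low-rate and a high-rate Poisson component."""
    n = len(counts)
    mean = sum(counts) / n
    lam_lo, lam_hi, w_hi = 0.5 * mean + 1e-3, 2.0 * mean + 1e-3, 0.5
    for _ in range(n_iter):
        # E-step: responsibility of the high-rate component, computed in log space.
        resp = []
        for y in counts:
            a = math.log(1.0 - w_hi) + poisson_logpmf(y, lam_lo)
            b = math.log(w_hi) + poisson_logpmf(y, lam_hi)
            m = max(a, b)
            resp.append(math.exp(b - m) / (math.exp(a - m) + math.exp(b - m)))
        # M-step: weighted means, clamped so a starved component cannot
        # produce a zero weight, a zero rate, or a division by zero.
        n_hi = sum(resp)
        n_lo = n - n_hi
        w_hi = min(max(n_hi / n, 1e-9), 1.0 - 1e-9)
        lam_hi = max(sum(r * y for r, y in zip(resp, counts)) / max(n_hi, 1e-9), 1e-9)
        lam_lo = max(sum((1.0 - r) * y for r, y in zip(resp, counts)) / max(n_lo, 1e-9), 1e-9)
    return lam_lo, lam_hi, w_hi

# Hypothetical counts: eight near-zero samples and four large ones.
counts = [0, 1, 0, 2, 1, 0, 1, 0, 40, 45, 38, 50]
lam_lo, lam_hi, w_hi = fit_two_poisson_mixture(counts)
print(lam_lo, lam_hi, w_hi)
```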


Fin
