Quantitative Metagenomics. Lea Benedicte Skov Hansen, PhD. NGS Course, 13th of June 2016


Page 1: Quantitative Metagenomics

Quantitative Metagenomics

Lea Benedicte Skov Hansen, PhD
NGS Course, 13th of June 2016

Page 2: Quantitative Metagenomics

13/06/2016 Quantitative Metagenomics 2 DTU Systems Biology, Technical University of Denmark

Exercise


Page 6: Quantitative Metagenomics


Exercise 2

•  Metagenome assembly
   –  Preassembled with two methods:
      •  Soap
      •  MetaVelvet
   –  Contig coverage
   –  Assembly statistics

•  Gene prediction
   –  Prodigal
   –  Gene clustering based on similarity
   –  Gene catalogue

•  Gene abundance matrix
   –  Align reads to gene catalogue with bwa
   –  Count number of reads mapping to a gene (samtools)

•  Taxonomic annotation of gene catalogue
   –  Blast gene catalogue
      •  NCBI Bacterial Genomes
      •  373 additional genomes
   –  Rearranging gene abundance to taxonomic abundance
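The last step, rearranging gene abundance to taxonomic abundance, amounts to summing the read counts of all genes assigned to the same taxon. The sketch below illustrates the idea only. The function name `taxonomic_abundance`, the gene and taxon labels, and the `Unassigned` bucket for genes without a Blast hit are all assumptions for illustration, not part of the exercise.

```python
from collections import defaultdict


def taxonomic_abundance(gene_counts, gene_to_taxon, unassigned="Unassigned"):
    """Sum per-gene read counts into per-taxon read counts.

    gene_counts:   dict mapping gene id -> number of mapped reads
    gene_to_taxon: dict mapping gene id -> taxon label (e.g. from a Blast hit)
    Genes without an annotation are collected under `unassigned`.
    """
    taxon_counts = defaultdict(int)
    for gene, count in gene_counts.items():
        taxon = gene_to_taxon.get(gene, unassigned)
        taxon_counts[taxon] += count
    return dict(taxon_counts)


# Hypothetical toy data, not taken from the exercise.
gene_counts = {"gene_1": 10, "gene_2": 5, "gene_3": 7, "gene_4": 2}
gene_to_taxon = {"gene_1": "Taxon A", "gene_2": "Taxon A", "gene_3": "Taxon B"}

abundance = taxonomic_abundance(gene_counts, gene_to_taxon)
print(abundance)
```

Keeping unannotated genes in their own bucket preserves the library total, which matters once samples are normalized.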

Page 7: Quantitative Metagenomics


Ecology

Ecology is the scientific analysis and study of interactions among organisms and their environment, including the interactions organisms have with each other and with their abiotic environment.

Page 8: Quantitative Metagenomics


Nothing new – except the technology

Classical measures
•  Abundance
•  Diversity
•  Richness

Page 9: Quantitative Metagenomics


Abundance (Counts)

Page 10: Quantitative Metagenomics


Abundance (Count)

Species        Count
Lion              64
Zebra            128
Giraffe           64
Leopard           64
Rhinoceros        64
Hippopotamus     128
Gazelle          128
Elephant          64
Monkey             9

Page 11: Quantitative Metagenomics


Richness

Species        Count
Lion              64
Zebra            128
Giraffe           64
Leopard           64
Rhinoceros        64
Hippopotamus     128
Gazelle          128
Elephant          64
Monkey             9

Richness: 9 observed species

Page 12: Quantitative Metagenomics


Richness

Rarefaction curves
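A rarefaction curve plots the expected number of observed species against the number of individuals sampled. The slide does not say how its curves were produced, so the sketch below assumes the standard hypergeometric expectation for sampling without replacement. The function name `expected_richness` is illustrative, and the counts are the safari counts from the Richness slide.

```python
from math import comb


def expected_richness(counts, n):
    """Expected number of species seen in a random subsample of n individuals.

    For each species with N_i individuals out of N in total, the probability
    of missing it entirely is C(N - N_i, n) / C(N, n).
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the total count")
    return sum(1 - comb(total - c, n) / comb(total, n) for c in counts if c > 0)


# Lion, Zebra, Giraffe, Leopard, Rhinoceros, Hippopotamus, Gazelle, Elephant, Monkey
safari = [64, 128, 64, 64, 64, 128, 128, 64, 9]

# Points along the curve: (subsample size, expected richness)
depths = [1, 10, 50, 100, 300, 713]
curve = [(n, expected_richness(safari, n)) for n in depths]
for n, s in curve:
    print(f"{n:4d} individuals -> {s:.2f} expected species")
```

A curve that is still rising at the full sample size suggests that more sampling would reveal additional species. One that has flattened suggests the community is well covered.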


Page 14: Quantitative Metagenomics


Richness

Species        Count
Lion               1
Zebra              2
Giraffe            1
Leopard            1
Rhinoceros         1
Hippopotamus       2
Gazelle            2
Elephant           1
Monkey             0

8 observed species

Species richness estimators:

$\mathrm{Chao1} = S_{obs} + \frac{f_1^2}{2 f_2}$

$S_{obs}$ = number of observed species
$f_1$ = number of species observed once
$f_2$ = number of species observed twice

Here $f_1 = 5$ and $f_2 = 3$, so $\mathrm{Chao1} = 8 + \frac{5^2}{2 \cdot 3} = 12.17$
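The Chao1 calculation can be checked with a few lines of Python. This is a minimal sketch of the formula exactly as given on the slide. The function name `chao1` is illustrative, and the choice to raise an error when no species is observed twice is an assumption, since the slide's formula is undefined in that case.

```python
def chao1(counts):
    """Chao1 richness estimate: S_obs + f1^2 / (2 * f2)."""
    s_obs = sum(1 for c in counts if c > 0)  # observed species
    f1 = sum(1 for c in counts if c == 1)    # species observed once
    f2 = sum(1 for c in counts if c == 2)    # species observed twice
    if f2 == 0:
        # The formula divides by f2; handling this case is outside the slide.
        raise ValueError("Chao1 as written is undefined when f2 is zero")
    return s_obs + f1 ** 2 / (2 * f2)


# Lion, Zebra, Giraffe, Leopard, Rhinoceros, Hippopotamus, Gazelle, Elephant, Monkey
sample = [1, 2, 1, 1, 1, 2, 2, 1, 0]
estimate = chao1(sample)
print(round(estimate, 2))  # 12.17
```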

Page 15: Quantitative Metagenomics


Evenness

Species        s1   s2
Lion            1    1
Zebra           2    1
Giraffe         1    8
Leopard         1    1
Rhinoceros      1    1
Hippopotamus    2    1
Gazelle         2    1
Elephant        1    1
Monkey          0    0


Page 17: Quantitative Metagenomics


Alpha Diversity

Richness: s1 = s2 (8 observed species in each sample)
Evenness: s1 ≠ s2 (sample 2 is dominated by Giraffe, with 8 of its 15 counts)


Page 18: Quantitative Metagenomics


Alpha Diversity

Shannon index

$H = -\sum_{i=1}^{R} p_i \ln p_i$

$H$ = Shannon index
$p_i$ = count of species $i$ / total counts
$R$ = number of observed species


Page 19: Quantitative Metagenomics


Alpha Diversity


Shannon index

$H = -\sum_{i=1}^{R} p_i \ln p_i$

$H_{s1} = 2.02$
$H_{s2} = 1.60$

Maximum, reached for a perfectly even community with $p_1 = p_2 = \dots = p_R$:

$H = \ln(R) = \ln(8) = 2.08$
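The two Shannon values and the maximum can be reproduced from the evenness table. This is a minimal sketch. The function name `shannon_index` is illustrative, and it assumes the usual convention that species with zero counts contribute nothing to the sum.

```python
import math


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over species with non-zero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no counts")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Lion, Zebra, Giraffe, Leopard, Rhinoceros, Hippopotamus, Gazelle, Elephant, Monkey
s1 = [1, 2, 1, 1, 1, 2, 2, 1, 0]
s2 = [1, 1, 8, 1, 1, 1, 1, 1, 0]

h_s1 = shannon_index(s1)
h_s2 = shannon_index(s2)
h_max = shannon_index([1] * 8)  # 8 equally abundant species, equals ln(8)

print(round(h_s1, 2), round(h_s2, 2), round(h_max, 2))  # 2.02 1.6 2.08
```

Both samples have the same richness, so the lower value for s2 reflects only its lower evenness.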

Page 20: Quantitative Metagenomics


Sample Sizes

Page 21: Quantitative Metagenomics


Sample Sizes

Accounting for different sample sizes:

•  Normalize to sample size

•  Rarefy samples

•  Statistical model of sample variance

Page 22: Quantitative Metagenomics


Sample Sizes

Species         s1   s2
Lion            64    1
Zebra          128    2
Giraffe         64    1
Leopard         64    1
Rhinoceros      64    1
Hippopotamus   128    2
Gazelle        128    2
Elephant        64    1
Monkey           9    0
Total          713   11

Normalize to library size (shown here as a percentage): $\mathrm{Norm}_i = 100 \cdot n_i / n_{tot}$

Species        s1 (%)   s2 (%)
Lion             8.98     9.09
Zebra           17.95    18.18
Giraffe          8.98     9.09
Leopard          8.98     9.09
Rhinoceros       8.98     9.09
Hippopotamus    17.95    18.18
Gazelle         17.95    18.18
Elephant         8.98     9.09
Monkey           1.26     0.00
Total          100      100
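The normalized table follows directly from the raw counts. This is a minimal sketch that scales to percentages, as the values on the slide are. The function name `normalize_to_percent` is illustrative.

```python
def normalize_to_percent(counts):
    """Express each count as a percentage of the sample's library size."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no counts")
    return {species: 100 * n / total for species, n in counts.items()}


s1 = {"Lion": 64, "Zebra": 128, "Giraffe": 64, "Leopard": 64, "Rhinoceros": 64,
      "Hippopotamus": 128, "Gazelle": 128, "Elephant": 64, "Monkey": 9}
s2 = {"Lion": 1, "Zebra": 2, "Giraffe": 1, "Leopard": 1, "Rhinoceros": 1,
      "Hippopotamus": 2, "Gazelle": 2, "Elephant": 1, "Monkey": 0}

norm_s1 = normalize_to_percent(s1)
norm_s2 = normalize_to_percent(s2)
for species in s1:
    print(f"{species:13s} {norm_s1[species]:6.2f} {norm_s2[species]:6.2f}")
```

After normalization the two samples look nearly identical, although s2 rests on only 11 observations and misses the Monkey entirely.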

Page 23: Quantitative Metagenomics


Sample Sizes

Rarefying to smaller library size:


Species        s1   s2
Lion            2    1
Zebra           3    2
Giraffe         0    1
Leopard         1    1
Rhinoceros      0    1
Hippopotamus    3    2
Gazelle         1    2
Elephant        0    0
Monkey          0    0
Total          10   10
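Rarefying draws a fixed number of individuals at random, without replacement, from each sample, so the result differs from draw to draw. This is a minimal sketch using only the standard library. The function name `rarefy` and the `seed` parameter are illustrative, and the output will generally not match the particular draw shown on the slide.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=None):
    """Randomly subsample `depth` individuals without replacement."""
    # Expand the count table into one entry per observed individual.
    pool = [species for species, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the library size of the sample")
    rng = random.Random(seed)
    drawn = Counter(rng.sample(pool, depth))
    # Keep every species in the output, including those that were lost.
    return {species: drawn.get(species, 0) for species in counts}


s1 = {"Lion": 64, "Zebra": 128, "Giraffe": 64, "Leopard": 64, "Rhinoceros": 64,
      "Hippopotamus": 128, "Gazelle": 128, "Elephant": 64, "Monkey": 9}

rarefied = rarefy(s1, depth=10, seed=1)
print(rarefied)
```

At a depth of 10, several species drop to zero by chance, which is the information loss that rarefying to a small library size entails.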

Page 24: Quantitative Metagenomics


Sample sizes

Normalization and downsizing do not account for heteroscedasticity!

Statistically modeled variance:
•  DESeq2
•  edgeR

Page 25: Quantitative Metagenomics


Beta-Diversity

Diversity between communities!


Page 27: Quantitative Metagenomics


Beta-Diversity

Species        s1   s2
Lion            0    2
Zebra           3    2
Giraffe         0    4
Leopard         0    2
Rhinoceros      1    2
Hippopotamus    4    0
Gazelle         0    1
Elephant        1    0
Total           9   13

Bray-Curtis dissimilarity metric

$B_{ij} = 1 - \frac{2 C_{ij}}{S_i + S_j}$, with $0 \le B \le 1$

$C_{ij}$ = sum of the lowest counts of the species common to both samples
$S_i$ = total count of sample $i$

Example: $C = 2 + 1 = 3$ (Zebra and Rhinoceros) and $S_{s1} + S_{s2} = 9 + 13 = 22$, so

$B_{s1 s2} = 1 - \frac{2 \cdot 3}{22} = 0.73$ (dissimilar)
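The worked example can be verified in code. This is a minimal sketch of the formula as stated on the slide. The function name `bray_curtis` is illustrative, and both samples are assumed to list the species in the same order.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: 1 - 2*C / (S_a + S_b)."""
    if len(a) != len(b):
        raise ValueError("samples must cover the same species in the same order")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both samples are empty")
    # C: for every species, the lower of the two counts (zero if not shared).
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1 - 2 * shared / total


# Lion, Zebra, Giraffe, Leopard, Rhinoceros, Hippopotamus, Gazelle, Elephant
s1 = [0, 3, 0, 0, 1, 4, 0, 1]
s2 = [2, 2, 4, 2, 2, 0, 1, 0]

dissimilarity = bray_curtis(s1, s2)
print(round(dissimilarity, 2))  # 0.73
```

A value of 0 means the two samples have identical counts, and a value of 1 means they share no species.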

Page 28: Quantitative Metagenomics


Beta-Diversity

Other distance metrics
•  Euclidean distance
•  Jensen-Shannon distance, computed against the midpoint distribution M = (x + y)/2
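Only the midpoint M = (x + y)/2 survives from the formulas on this slide, so the sketch below fills in the commonly used textbook definitions as an assumption. It takes the Euclidean distance as the square root of summed squared differences. It takes the Jensen-Shannon distance as the square root of the average Kullback-Leibler divergence of x and y from M, using natural logarithms. All function names are illustrative, and the Jensen-Shannon inputs are assumed to be proportions that sum to 1.

```python
import math


def euclidean_distance(x, y):
    """Square root of the summed squared differences between two profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def kl_divergence(p, q):
    """Kullback-Leibler divergence; terms with p_i = 0 contribute nothing."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def jensen_shannon_distance(x, y):
    """Square root of the Jensen-Shannon divergence of two distributions."""
    m = [(a + b) / 2 for a, b in zip(x, y)]  # midpoint M = (x + y) / 2
    divergence = 0.5 * kl_divergence(x, m) + 0.5 * kl_divergence(y, m)
    return math.sqrt(max(divergence, 0.0))  # guard against tiny negative rounding


# Hypothetical relative-abundance profiles over three species.
x = [0.5, 0.5, 0.0]
y = [0.1, 0.4, 0.5]
print(euclidean_distance(x, y), jensen_shannon_distance(x, y))
```

Because M is non-zero wherever x or y is, the Jensen-Shannon distance stays finite even when one sample lacks a species that the other contains.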

Page 29: Quantitative Metagenomics


Beta-Diversity

Distance matrix
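A distance matrix holds the pairwise dissimilarity between every pair of samples. This is a minimal sketch using Bray-Curtis as the metric. Samples s1 and s2 are the two samples from the Bray-Curtis slide, while s3 and the function names are hypothetical additions for illustration.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: 1 - 2*C / (S_a + S_b)."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1 - 2 * shared / (sum(a) + sum(b))


def distance_matrix(samples, metric):
    """Nested dict with metric(sample_i, sample_j) for every pair of samples."""
    names = list(samples)
    return {row: {col: metric(samples[row], samples[col]) for col in names}
            for row in names}


samples = {
    "s1": [0, 3, 0, 0, 1, 4, 0, 1],
    "s2": [2, 2, 4, 2, 2, 0, 1, 0],
    "s3": [1, 3, 0, 0, 1, 3, 0, 1],  # hypothetical third sample
}

matrix = distance_matrix(samples, bray_curtis)
for row, distances in matrix.items():
    print(row, [round(d, 2) for d in distances.values()])
```

The matrix is symmetric with zeros on the diagonal, so only one triangle carries information.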

Page 30: Quantitative Metagenomics


Diversity - example


Page 34: Quantitative Metagenomics


Hands on!