using network models for analyzing microbiome data · genomics: study of structure, function,...

22
Outline: Brief introduction to metagenomics Case study: Phyllosphere fungal networks, and Bare Patch Disease of Wheat Hands-on example Using network models for analyzing microbiome data By Ravin Poudel Ricardo I. Alcala-Briseño Garrett Lab ICPP, Boston, MA

Upload: others

Post on 27-Jun-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Using network models for analyzing microbiome data · Genomics: Study of structure, function, evolution, and mapping of genomes. Metagenomics( Environmental Genomicsor Community Genomics)

Outline:• Brief introduction to metagenomics

• Case study: Phyllosphere fungal networks, andBare Patch Disease of Wheat

• Hands-on example

Using network models for analyzing microbiome data

ByRavin Poudel

Ricardo I. Alcala-BriseñoGarrett Lab

ICPP, Boston, MA

Page 2: Using network models for analyzing microbiome data · Genomics: Study of structure, function, evolution, and mapping of genomes. Metagenomics( Environmental Genomicsor Community Genomics)

Genomics: Study of structure, function, evolution, and mapping of genomes.

Metagenomics ( Environmental Genomics or Community Genomics) is the study of genomes recovered from environmental samples without the need for culturing them.

Culture-independent analysisv 16S ribosomal RNA (rRNA) sequencingv Whole genome sequencingv Metagenomics: No need for culturing/ use samples directly from environment

Brief introduction to metagenomics

Page 3: Using network models for analyzing microbiome data · Genomics: Study of structure, function, evolution, and mapping of genomes. Metagenomics( Environmental Genomicsor Community Genomics)

Example: 16S ribosomal RNA (rRNA) sequencing

Sample

PCR amplification

Sequencing

Identification

OTUs / Taxa

Sam

ple

Count Data

Page 4: Using network models for analyzing microbiome data · Genomics: Study of structure, function, evolution, and mapping of genomes. Metagenomics( Environmental Genomicsor Community Genomics)

Case study- metagenomics and network models • OTU/ taxon is represented as a node

• Links defines the relationship between two OTUs

• Various methods can be used to define the links

• Presence or absence of association (co-occurrence)

• Statistical approach to define the association • Such as correlation, proportionality

• Methods robust for metagenomics data ~ deal with compositional bias and spurious correlation.

• SparCC, SpiecEasi….. and so on…

• Mostly, to reduce compositional bias associated with data type.

OTU/Taxon

Link

Page 5: Using network models for analyzing microbiome data · Genomics: Study of structure, function, evolution, and mapping of genomes. Metagenomics( Environmental Genomicsor Community Genomics)

Phytopathology, 2016

Page 6: Using network models for analyzing microbiome data · Genomics: Study of structure, function, evolution, and mapping of genomes. Metagenomics( Environmental Genomicsor Community Genomics)

Pat

hog

en r

esp

onse

var

iabl

e

Host response variable

+

+

_

_

General Network Analysis

• OTU with sequence count

Host-Focused Network Analysis

• OTU with sequence count• Host associated variable

Pathogen-Focused Network Analysis

• OTU with sequence count• Pathogen fitness variable

Diseased-Focused Network Analysis

• OTU with sequence count• Pathogen fitness variable• Host associated variable

General frameworkGoal: To select potential candidate taxa based on – interactions and network attributes

6

Page 7: Using network models for analyzing microbiome data · Genomics: Study of structure, function, evolution, and mapping of genomes. Metagenomics( Environmental Genomicsor Community Genomics)

Microbiome networks: A systems framework for identifying candidate microbial assemblages for disease management

• Node = Fungal taxon

• Links = association calculated using SparCC

• Fungal phyllosphere data (Jumpponen et. al., 2010)

Data Source:

Page 8: Using network models for analyzing microbiome data · Genomics: Study of structure, function, evolution, and mapping of genomes. Metagenomics( Environmental Genomicsor Community Genomics)

Microbiome networks: A systems framework for identifying candidate microbial assemblages for disease management

General analysis

Aureobasidium

A

0

14

Aureobasidium

Module hubs Network hubs

Peripherals Connectors-2

-1

0

1

2

3

0.00 0.25 0.50 0.75 1.00Among-module connectivity

With

in-m

odul

e de

gree

B

Phosynthetic rate

C

Pathogen

D

Aureobasidium

A

0

14

Aureobasidium

Module hubs Network hubs

Peripherals Connectors-2

-1

0

1

2

3

0.00 0.25 0.50 0.75 1.00Among-module connectivity

With

in-m

odul

e de

gree

B

Phosynthetic rate

C

Pathogen

D

"rnetcarto"

Guimera et al. (2005).

Connector

Peripheral

Module hub

Network hub

Module

Page 9: Using network models for analyzing microbiome data · Genomics: Study of structure, function, evolution, and mapping of genomes. Metagenomics( Environmental Genomicsor Community Genomics)

Microbiome networks: A systems framework for identifying candidate microbial assemblages for disease management

First degree neighborsSecond degree neighborsThird degree neighborsMixed Association

First degree neighborsSecond degree neighborsThird degree neighborsMixed Association

Host-focused analysis

Aureobasidium

A 014

Aureobasidium

Module hubs Network hubs

Peripherals Connectors-2

-1

0

1

2

3

0.00 0.25 0.50 0.75 1.00Among-module connectivity

With

in-m

odul

e de

gree

B

Photosynthetic rate

C

Pathogen

D

Goal à Biofertilizers

Blockingà Negatively Linked Nodes

Page 10: Using network models for analyzing microbiome data · Genomics: Study of structure, function, evolution, and mapping of genomes. Metagenomics( Environmental Genomicsor Community Genomics)

Aureobasidium

A

0

14

Aureobasidium

Module hubs Network hubs

Peripherals Connectors-2

-1

0

1

2

3

0.00 0.25 0.50 0.75 1.00Among-module connectivity

With

in-m

odul

e de

gree

B

Phosynthetic rate

C

Pathogen

DMicrobiome networks: A systems framework for identifying candidate microbial assemblages for disease management

Pathogen-focused analysis

First degree neighborsSecond degree neighborsThird degree neighborsMixed Association

First degree neighborsSecond degree neighborsThird degree neighborsMixed Association

Blockingà Positively Linked Nodes

Goal à Biocontrols

Page 11: Using network models for analyzing microbiome data · Genomics: Study of structure, function, evolution, and mapping of genomes. Metagenomics( Environmental Genomicsor Community Genomics)

Applied and Environmental Microbiology, 2013

Data source:

Disease-focused analysis

Berendsen, 2012

11

Page 12: Using network models for analyzing microbiome data · Genomics: Study of structure, function, evolution, and mapping of genomes. Metagenomics( Environmental Genomicsor Community Genomics)

Disease

Enterobacteriaceae

SphingomonadaceaeSphingomonadaceae

Microbacteriaceae

Pseudomonadaceae

Micrococcaceae

Comamonadaceae

Chitinophagaceae

Chitinophagaceae

Streptomycetaceae

PseudomonadaceaeEnterobacteriaceae

Burkholderiaceae

Pseudomonadaceae

Acidobacteriaceae

Streptomycetaceae

Burkholderiaceae

Moraxellaceae

Microbacteriaceae

Nocardiaceae

Enterobacteriaceae

Micromonosporaceae

Chitinophagaceae

Nocardioidaceae

Caulobacteraceae

Flavobacteriaceae

Sphingomonadaceae

Bradyrhizobiaceae

Gemmatimonadetes

Caulobacteraceae

Sphingobacteriaceae

Bradyrhizobiaceae

Phyllobacteriaceae

Gemmatimonadetes

Mycobacteriaceae

Bradyrhizobiaceae

Micrococcaceae

Ellin329

Actinomycetales

Actinomycetales

Dermabacteraceae

Cytophagaceae

Oxalobacteraceae

Gaiellaceae

Comamonadaceae

Flavobacteriaceae

N1423WL

Sphingobacteriaceae

Pseudonocardiaceae

Sphingobacteriaceae

Mycobacteriaceae

Xanthomonadaceae

Frankiaceae

Propionibacteriaceae

Xanthomonadaceae

Actinomycetales

Comamonadaceae

Enterobacteriaceae

Ellin5301

Propionibacteriaceae

Gaiellaceae

Sphingomonadaceae

Glycomycetaceae

Pseudomonadaceae

Mycobacteriaceaea

Enterobaå◊cteriaceae

Nocardioidaceae

Rhizobiales

Pseudomonadaceae

Ellin6075

Frankiaceae

Comamonadaceae

SC-I-84

Burkholderiaceae

N1423WL

odGeodermatophilaceae

Propionibacteriaceae

Rhizobiales

Gemmatimonadetes

Acidobacteriaceae

Methylobacteriaceae

Acetobacteraceae

Rhodospirillaceae

Solirubrobacterales

Chitinophagaceae

Nocardiaceae

Hyphomicrobiaceae

Propionibacteriaceae

Geodermatophilaceae

Oxalobacteraceae

Comamonadaceae

N1423WL

Methylobacteriaceae

Caulobacteraceae

Burkholderiales

Sphingobacteriaceae

TM7-3

TM7-1

Pseudomonadaceae

GaiellaceaeEllin329

Sphingobacteriaceae

Rhodospirillaceae

Gemmatimonadetes

Micrococcaceae

Sphingobacteriaceae

Ellin329

Chitinophagaceae

Koribacteraceae

Acetobacteraceae

Moraxellaceae

Rhodospirillaceae

Acidobacteriaceae

Solirubrobacterales

Gemmatimonadetes

Burkholderiaceae

Caulobacteraceae

Micrococcaceae

Xanthomonadaceae

Microbacteriaceae

Xanthomonadaceae

SC-I-84

Nocardioidaceae

TM7-1

Solirubrobacterales

Solibacteraceae

Koribacteraceae

Hyphomicrobiaceae

Hyphomicrobiaceae

Acidobacteriaceae

Acidobacteriaceae

SC-I-84

Xanthomonadaceae

Flavobacteriaceae

Sphingomonadaceae

SinobacteraceaeAcidobacteriaceae

RhodospirillaceaeAcidobacteriaceae

Solibacteraceae

Solirubrobacterales

GemmatimonadetesEllin6537

WPS-2

Koribacteracea

Disease-focused analysis

• Blue link = positive association with Rhizoctonia solani

• Red link = negative association with Rhizoctonia solani

• Node size= Abundance

• Node associated with disease

• Node associated with healthy state

12

Page 13: Using network models for analyzing microbiome data · Genomics: Study of structure, function, evolution, and mapping of genomes. Metagenomics( Environmental Genomicsor Community Genomics)

Poster Number: 431-P

Do grafting and rootstock genotype affect the rhizobiome? A study of tomato systems

Page 14: Using network models for analyzing microbiome data · Genomics: Study of structure, function, evolution, and mapping of genomes. Metagenomics( Environmental Genomicsor Community Genomics)

Hands-on experience on building network models withmicrobiome date

Let’s switch to RStudio

# Loading required packages library(igraph)library(Hmisc) library(Matrix)

Page 15: Using network models for analyzing microbiome data · Genomics: Study of structure, function, evolution, and mapping of genomes. Metagenomics( Environmental Genomicsor Community Genomics)

# Load the data with the OTU table: otudata.csvotu.table<-read.csv(file.choose(), header=T, row.names = 1)

# Read taxonomy file associated with OTU table into new object: otu_taxonomy.csv

tax<-read.csv(file.choose(),header=T, row.names = 1)

# Check how many OTUs we havedim(otu.table)

# Check for the decrease in the number of OTUsdim(otu.table.filter)

# Keep the OTUs with more than 10 countsotu.table.filter<-otu.table[,colSums(otu.table)>10]

Page 16: Using network models for analyzing microbiome data · Genomics: Study of structure, function, evolution, and mapping of genomes. Metagenomics( Environmental Genomicsor Community Genomics)

# Calculate pairwise correlation between OTUsotu.cor<-rcorr(as.matrix(otu.table.filter), type="spearman")

# Get p-value matrixotu.pval <- forceSymmetric(otu.cor$P) # Self-correlation as NA

# Select only the taxa for the filtered OTUs by using rownames of otu.pvalsel.tax <-tax[rownames(otu.pval),,drop=FALSE]

# Sanity check all.equal(rownames(sel.tax), rownames(otu.pval))

Page 17: Using network models for analyzing microbiome data · Genomics: Study of structure, function, evolution, and mapping of genomes. Metagenomics( Environmental Genomicsor Community Genomics)

# Filter the association based on p-values and level of correlationsp.yes<-otu.pval<0.001

# Select the r values for p.yesr.val=otu.cor$r # select all the correlation values p.yes.r<-r.val*p.yes # only select correlation values based on p-value criterion

# Select OTUs by level of correlationp.yes.r<-abs(p.yes.r)>0.75 # output is logical vectorp.yes.rr<-p.yes.r*r.val # use logical vector for subscripting.

Page 18: Using network models for analyzing microbiome data · Genomics: Study of structure, function, evolution, and mapping of genomes. Metagenomics( Environmental Genomicsor Community Genomics)

# Create an adjacency matrixadjm<-as.matrix(p.yes.rr)

# Add taxonomic information associated with adjacency matrixcolnames(adjm)<-as.vector(sel.tax$Family)rownames(adjm)<-as.vector(sel.tax$Family)

# Create an adjacency matrix in igraph format net.grph=graph.adjacency(adjm,mode="undirected",weighted=TRUE,diag=FALSE)

Page 19: Using network models for analyzing microbiome data · Genomics: Study of structure, function, evolution, and mapping of genomes. Metagenomics( Environmental Genomicsor Community Genomics)

# Calculate edge weight == level of correlationedgew<-E(net.grph)$weight

# Identify isolated nodesbad.vs<-V(net.grph)[degree(net.grph) == 0]

# Remove isolated nodesnet.grph <-delete.vertices(net.grph, bad.vs)

Page 20: Using network models for analyzing microbiome data · Genomics: Study of structure, function, evolution, and mapping of genomes. Metagenomics( Environmental Genomicsor Community Genomics)

# Plot the graph objectplot(net.grph,

vertex.size=4,vertex.frame.color="black",edge.curved=F,edge.width=1.5,layout=layout.fruchterman.reingold,edge.color=ifelse(edgew<0,"red","blue"),vertex.label=NA,vertex.label.color="black",vertex.label.family="Times New Roman",vertex.label.font=2)

Page 21: Using network models for analyzing microbiome data · Genomics: Study of structure, function, evolution, and mapping of genomes. Metagenomics( Environmental Genomicsor Community Genomics)

unclassified

PlanctomycetaceaePasteuriaceae

unclassified

unclassified

Bacteroidetes_incertae_sedis_family_incertae_sedis

unclassified

Coxiellaceae

Acidobacteria_Gp6_family_incertae_sedis

PlanctomycetaceaeSpartobacteria_family_incertae_sedis

Planctomycetaceae

Polyangiaceae

Acidobacteria_Gp4_family_incertae_sedis

Parachlamydiaceae

unclassified

Planctomycetaceae

unclassifiedOpitutaceae

unclassified

Alicyclobacillaceae

unclassified

Halomonadaceae

Acidobacteria_Gp16_family_incertae_sedis

unclassified

Clostridiaceae_1

unclassified

unclassified

Planctomycetaceae

Planctomycetaceae

Planctomycetaceae

Planctomycetaceae

unclassified

Planctomycetaceae

Planctomycetaceae

unclassified

PlanctomycetaceaeSubdivision3_family_incertae_sedis

Subdivision3_family_incertae_sedis

RhodospirillaceaeNannocystaceae Peptostreptococcaceae

unclassifiedunclassified

unclassified

Rubrobacteraceae

BRC1_family_incertae_sedis

Acidobacteria_Gp4_family_incertae_sedis

Nitrospiraceae

unclassified

unclassified

Acidobacteria_Gp4_family_incertae_sedisunclassified

unclassified

unclassified

unclassified

Bdellovibrionaceae

unclassified

Coxiellaceae

TM7_family_incertae_sedis

unclassified

unclassified

unclassified

unclassified

unclassified

Parachlamydiaceae

unclassified

Planctomycetaceae

unclassified

unclassified

Spartobacteria_family_incertae_sedis

Cryomorphaceae

OP11_family_incertae_sedis

OD1_family_incertae_sedis

unclassified

Acidobacteria_Gp10_family_incertae_sedis

unclassified

unclassified

unclassified

Parachlamydiaceae

Parachlamydiaceae

Parachlamydiaceae

OD1_family_incertae_sedis

unclassified

unclassified

OD1_family_incertae_sedisOD1_family_incertae_sedis

unclassified

unclassified

unclassified

Legionellaceae

OD1_family_incertae_sedis

unclassified

unclassified

unclassified

Page 22: Using network models for analyzing microbiome data · Genomics: Study of structure, function, evolution, and mapping of genomes. Metagenomics( Environmental Genomicsor Community Genomics)

# Summaries of network traitsV(net.grph)

E(net.grph)

vcount(net.grph)

ecount(net.grph)

plot(degree(net.grph))

max(degree(net.grph))

min(degree(net.grph))

http://igraph.org/r/