Charting the function of microbes and microbial communities
Curtis Huttenhower
11-17-11, Harvard School of Public Health, Department of Biostatistics
Valm et al., PNAS 2011
What to do with your metagenome?
Diagnostic or prognostic biomarker for host disease
Public health tool monitoring population health and interactions
Comprehensive snapshot of microbial ecology and evolution
Reservoir of gene and protein functional information
Who’s there? What are they doing?
Who’s there varies: your microbiota is plastic and personalized.
This personalization is true at the level of phyla, genera, species, strains, and sequence variants.
What they’re doing is adapting to their environment: you, your body, and your environment.
The NIH Human Microbiome Project (HMP): A comprehensive microbial survey
• What is a “normal” human microbiome?
• 300 healthy human subjects
• Multiple body sites: 15 in males, 18 in females
• Multiple visits
• Clinical metadata
www.hmpdacc.org
Slides by Dirk Gevers
A three-tier study design for mining metagenomic data
[Figure: tiers are 16S, WGS, and reference genomes (ref). 16S: filtering/trimming and chimera removal (>3k reads per sample), then taxonomic classification (RDP) and clustering into OTUs, giving an organismal census at different taxonomic levels. WGS (~100M reads per sample): mapping onto reference genomes (~50%), assembly into contigs, and annotation (~90M proteins); BLAST against functional DBs yields genes and pathways. Other fractions shown: ~36%, ~57%.]
“Pathogen” carriage varies a lot
[Figure: average relative abundance (0 to 0.12) by body site (buccal mucosa, tongue dorsum, supragingival plaque, stool, posterior fornix, anterior nares, retroauricular crease) for Gardnerella vaginalis, Alistipes putredinis, Gemella haemolysans, Actinomyces odontolyticus, Capnocytophaga sputigena, Capnocytophaga gingivalis, Capnocytophaga ochracea, Eikenella corrodens, Burkholderiales bacterium, Propionibacterium acnes, Parvimonas micra, Porphyromonas gingivalis, Proteus mirabilis, Streptobacillus moniliformis, Atopobium rimae, Ureaplasma urealyticum, Eggerthella lenta, Proteus penneri, Arcobacter butzleri, Salmonella enterica, Nocardia farcinica, Cryptobacterium curtum. Highlighted: Gardnerella, Alistipes, Capnocytophaga, Actinomyces, Gemella; +Propionibacterium > 0.66.]
22 ***uniquely identifiable*** nonzero-abundance “pathogens” from NIAID’s list of 135
[Figure: per-sample relative abundance in three sites: supragingival plaque, Capnocytophaga gingivalis, C. sputigena, and C. ochracea (124 samples, scale 0 to 0.14); stool, Alistipes putredinis (146 samples, scale 0 to 0.4); posterior fornix, Gardnerella vaginalis (60 samples, scale 0 to 1).]
Phenotypes that explain variation (or not) can be surprising
[Figure: normalized relative abundance.]
A functional perspective on the human microbiome
[Figure: metagenomic reads are searched (BLAST) against functional sequence DBs (KEGG + MetaCyc; CAZy, TCDB, VFDB, MEROPS…) to give enzymes and pathways. Quantities: taxon abundances, enzyme family abundances, pathway abundances; host phenotypes: healthy/IBD, BMI, diet; panel labels: gene expression, SNP genotypes. Data: 100 subjects, 1-3 visits/subject, ~7 body sites/visit; 10-200M reads/sample, 100 bp reads.]
HUMAnN: HMP Unified Metabolic Analysis Network, http://huttenhower.sph.harvard.edu/humann
HUMAnN: Metabolic reconstruction
[Figure: heatmaps of pathway coverage and pathway abundance (pathways × samples), grouped by body habitat: vaginal, skin, nares, gut, oral (SupP: supragingival plaque), oral (BM: buccal mucosa), oral (TD: tongue dorsum).]
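The slide contrasts pathway coverage (is the pathway there?) with pathway abundance (how much of it is there?). A minimal sketch of the two quantities for one sample, assuming a flat table of enzyme abundances; the "upper half" averaging rule and the EC numbers are illustrative assumptions, and HUMAnN's actual procedure is more involved:

```python
from statistics import mean


def pathway_coverage(enzyme_abund, pathway, min_abund=0.0):
    """Fraction of a pathway's enzymes detected above min_abund in one sample."""
    present = [e for e in pathway if enzyme_abund.get(e, 0.0) > min_abund]
    return len(present) / len(pathway)


def pathway_abundance(enzyme_abund, pathway):
    """Mean abundance of the more abundant half of a pathway's enzymes.

    Averaging only the upper half (an assumption of this sketch) keeps one
    or two undetected enzymes from pulling the estimate to zero.
    """
    values = sorted((enzyme_abund.get(e, 0.0) for e in pathway), reverse=True)
    upper = values[: max(1, (len(values) + 1) // 2)]
    return mean(upper)


if __name__ == "__main__":
    # Hypothetical abundances; EC numbers from glycolysis used as a toy pathway.
    sample = {"EC:2.7.1.1": 4.0, "EC:5.3.1.9": 2.0, "EC:4.1.2.13": 0.0}
    toy_pathway = ["EC:2.7.1.1", "EC:5.3.1.9", "EC:4.1.2.13", "EC:5.3.1.1"]
    print(pathway_coverage(sample, toy_pathway))   # 0.5
    print(pathway_abundance(sample, toy_pathway))  # 3.0
```

The two numbers answer different questions: a pathway can be fully covered yet rare, or abundant in its common steps while missing one enzyme.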
A portrait of the healthy human microbiome: Who’s there vs. what they’re doing
[Figure: phylotype abundance and pathway abundance across subjects (~700 HMP communities), grouped by body habitat: vaginal, skin, nares, gut, oral (SupP), oral (BM), oral (TD).]
Niche specialization in human microbiome function
Metabolic modules in the KEGG functional catalog enriched at one or more body habitats
• 16 (of 251) modules strongly “core” at 90%+ coverage in 90%+ individuals at 7 body sites
• 24 modules at 33%+ coverage
• 71 modules (28%) weakly “core” at 33%+ coverage in 66%+ individuals at 6+ body sites
• Contrast: zero phylotypes or OTUs meet this threshold!
• Only 24 modules (<10%) differentially covered by body site
• Compare with 168 modules (>66%) differentially abundant by body site
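Each “core” definition above is three numbers: a coverage cutoff, a fraction of individuals, and a number of body sites. A minimal sketch of that test, assuming a hypothetical nested dict of per-individual module coverages (not HUMAnN's output format) and toy values rather than HMP data:

```python
from collections import defaultdict


def core_modules(coverage, cov_thresh, ind_frac, min_sites):
    """Return the set of modules that meet a "core" definition.

    coverage: {site: {individual: {module: coverage in [0, 1]}}}
    A module is core if, at >= min_sites body sites, at least ind_frac of
    that site's individuals have coverage >= cov_thresh for it.
    """
    sites_passing = defaultdict(int)
    for individuals in coverage.values():
        n = len(individuals)
        if n == 0:
            continue
        modules = {m for profile in individuals.values() for m in profile}
        for module in modules:
            hits = sum(
                1 for profile in individuals.values()
                if profile.get(module, 0.0) >= cov_thresh
            )
            if hits / n >= ind_frac:
                sites_passing[module] += 1
    return {m for m, k in sites_passing.items() if k >= min_sites}


if __name__ == "__main__":
    # Toy coverages for two sites and two individuals (not HMP data).
    toy = {
        "stool": {"s1": {"M1": 0.95, "M2": 0.50}, "s2": {"M1": 0.92, "M2": 0.40}},
        "tongue": {"s1": {"M1": 0.91, "M2": 0.10}, "s2": {"M1": 0.97}},
    }
    print(core_modules(toy, 0.90, 0.90, min_sites=2))          # strict: {'M1'}
    print(sorted(core_modules(toy, 0.33, 0.66, min_sites=1)))  # weak: ['M1', 'M2']
```

On real data, the strict definition corresponds to arguments (0.90, 0.90, 7) and the weak one to (0.33, 0.66, 6); 71 of 251 modules is about 28%.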
Proteoglycan degradation by the gut microbiota
[Figure: proteoglycan structure: AA core with glycosaminoglycans (polysaccharide chains).]
Proteoglycan degradation: From pathways to enzymes
[Figure: enzyme relative abundance, log scale from 10^-8 to 10^-3.]
• Heparan sulfate degradation is missing due to the absence of heparanase, a eukaryotic enzyme
• Other pathways are not bottlenecked by individual genes
• HUMAnN links microbiome-wide pathway reconstructions → site-specific pathways → individual gene families
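The bottleneck argument can be made concrete: a pathway is limited by one gene when its least abundant enzyme is absent or far rarer than the rest. A minimal sketch of that check; the 1% cutoff, the enzyme lists, and the abundances are hypothetical, and this is not necessarily how HUMAnN identifies gaps:

```python
from statistics import median


def find_bottlenecks(enzyme_abund, pathways, ratio=0.01):
    """Flag pathways limited by a single missing or very rare enzyme.

    enzyme_abund: {enzyme: relative abundance}
    pathways: {pathway name: list of enzymes}
    Hypothetical rule: the pathway's least abundant enzyme is a bottleneck
    if it is absent or below ratio * the pathway's median enzyme abundance.
    Returns {pathway name: limiting enzyme}.
    """
    flagged = {}
    for name, enzymes in pathways.items():
        values = {e: enzyme_abund.get(e, 0.0) for e in enzymes}
        limiting = min(values, key=values.get)
        typical = median(values.values())
        if values[limiting] == 0.0 or values[limiting] < ratio * typical:
            flagged[name] = limiting
    return flagged


if __name__ == "__main__":
    # Toy abundances in the 1e-8 to 1e-3 range of the slide (not HMP data).
    abund = {
        "iduronate-2-sulfatase": 1e-5,
        "alpha-L-iduronidase": 2e-5,
        "chondroitin lyase": 3e-5,
        "unsaturated glucuronyl hydrolase": 1e-5,
    }
    pathways = {
        "heparan sulfate degradation": [
            "heparanase", "iduronate-2-sulfatase", "alpha-L-iduronidase"],
        "chondroitin sulfate degradation": [
            "chondroitin lyase", "unsaturated glucuronyl hydrolase"],
    }
    print(find_bottlenecks(abund, pathways))
    # {'heparan sulfate degradation': 'heparanase'}
```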
Patterns of variation in human microbiome function by niche
• Three main axes of variation
• Eukaryotic exterior
• Low-diversity vaginal
• Gut metabolism
• Oral vs. tooth hard surface
• Only broad patterns: every human-associated habitat is functionally distinct!
Normal varies a lot at the genus level (16S)
[Figure: relative frequency of genera within stool across 200 subjects (343 genera); labeled: Bacteroides, Alistipes, Faecalibacterium, Parabacteroides.]
Dirk Gevers
Normal varies a lot at the species level (WGS)
[Figure: relative frequency of Bacteroides species within stool across 123 samples; labeled: Bacteroides vulgatus, Bacteroides sp., Bacteroides uniformis, Bacteroides sp., Bacteroides stercoris, Bacteroides caccae.]
Dirk Gevers
What’s wrong with this picture?
[Figure: 52 posterior fornix microbiomes profiled against reference genomes: Lactobacillus crispatus MV-1A-US; Lactobacillus crispatus JV-V01; Lactobacillus crispatus 125-2-CHN; Lactobacillus crispatus 214-1; Lactobacillus crispatus MV-3A-US; Lactobacillus crispatus ST1; Lactobacillus gasseri JV-V03; Lactobacillus gasseri 202-4; Lactobacillus gasseri 224-1; Lactobacillus gasseri MV-22; Bifidobacterium breve DSM 20213; Bifidobacterium dentium ATCC 27679; Mycoplasma hominis; Clostridiales genomosp BVAB3 str UPII9-5; Clostridiales genomosp BVAB3 UPII9-5; Gardnerella vaginalis AMD; Prevotella timonensis CRIS 5C-B1; Megasphaera genomosp type 1 str 28L; Porphyromonas uenonis 60-3; Gardnerella vaginalis 409-05; Gardnerella vaginalis 5-1; Atopobium vaginae DSM 15829; Gardnerella vaginalis ATCC 14019; Lactobacillus jensenii 1153; Lactobacillus jensenii 269-3; Lactobacillus jensenii SJ-7A-US; Lactobacillus jensenii 208-1; Lactobacillus jensenii JV-V16; Lactobacillus jensenii 27-2-CHN; Lactobacillus jensenii 115-3-CHN; Lactobacillus iners AB-1; Lactobacillus iners DSM 13335.]
Species and strains matter, but so does your method for identifying them in a community!
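One way such a picture can mislead is that several reference genomes of one species (the many L. crispatus, L. gasseri, L. jensenii, and Gardnerella vaginalis strains listed above) each capture part of that species' signal. A minimal sketch of collapsing reference-level abundances to species, assuming a hypothetical naming rule in which the first two words of a reference name give the species; names such as "Clostridiales genomosp BVAB3 ..." break that rule and would need a curated taxonomy:

```python
from collections import defaultdict


def collapse_to_species(reference_abund):
    """Sum reference-genome (strain-level) abundances into species totals.

    Hypothetical naming rule: the species is the first two words of the
    reference name, e.g. "Lactobacillus crispatus ST1" -> "Lactobacillus crispatus".
    """
    species = defaultdict(float)
    for name, abund in reference_abund.items():
        species[" ".join(name.split()[:2])] += abund
    return dict(species)


if __name__ == "__main__":
    # Toy abundances for four of the reference genomes listed above.
    profile = {
        "Lactobacillus crispatus ST1": 0.5,
        "Lactobacillus crispatus 214-1": 0.25,
        "Lactobacillus iners AB-1": 0.125,
        "Gardnerella vaginalis 5-1": 0.125,
    }
    print(collapse_to_species(profile))
    # {'Lactobacillus crispatus': 0.75, 'Lactobacillus iners': 0.125, 'Gardnerella vaginalis': 0.125}
```

Whether a method reports the strain-level rows or the species-level totals changes which organism appears to dominate a sample.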
Core gene families
A core gene is a gene strongly conserved within a clade.
• Gene X is a core gene for Clade Y: all subclades of Clade Y must have Gene X as a core gene (strict definition)
• Gene X may be a core gene of several (unrelated) clades
• We have to relax the definition to account for:
– Low-level gene losses
– Sequencing errors
– Gene-calling errors
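The relaxed definition can be written as a prevalence threshold. A minimal sketch, treating each genome as a leaf subclade (a simplification of the hierarchical definition) and using a hypothetical 90% default:

```python
def is_core_gene(gene, clade_genomes, min_prevalence=0.9):
    """Relaxed core-gene test for one clade.

    clade_genomes: {genome id: set of gene families in that genome}
    The strict definition (min_prevalence=1.0) requires the gene in every
    genome of the clade; a lower threshold (0.9 is a hypothetical default)
    tolerates low-level gene losses, sequencing errors, and gene-calling errors.
    """
    if not clade_genomes:
        return False
    hits = sum(1 for genes in clade_genomes.values() if gene in genes)
    return hits / len(clade_genomes) >= min_prevalence


if __name__ == "__main__":
    # Toy clade of 10 genomes; "geneX" was lost (or missed) in one of them.
    clade_y = {f"genome{i}": {"geneX", "geneY"} for i in range(9)}
    clade_y["genome9"] = {"geneY"}
    print(is_core_gene("geneX", clade_y, min_prevalence=1.0))  # False (strict)
    print(is_core_gene("geneX", clade_y))                      # True (relaxed)
```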
Examples of core genes
Clade-specific marker genes
Gene X is a marker gene (for Clade Y) if X is a core gene for Y and X never appears outside Clade Y.
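The marker definition adds one condition to the core test: absence everywhere outside the clade. A minimal sketch, assuming (as a simplification) that each genome carries a single clade label rather than a position in a nested hierarchy, with a hypothetical 90% prevalence threshold for "core":

```python
def is_marker_gene(gene, clade, genomes, min_prevalence=0.9):
    """Clade-specific marker test.

    genomes: {genome id: (clade label, set of gene families)}
    The gene is a marker for `clade` if it is core there (present in at
    least min_prevalence of the clade's genomes) and absent from every
    genome outside the clade.
    """
    inside = [genes for label, genes in genomes.values() if label == clade]
    outside = [genes for label, genes in genomes.values() if label != clade]
    if not inside:
        return False
    prevalence = sum(gene in genes for genes in inside) / len(inside)
    return prevalence >= min_prevalence and not any(gene in genes for genes in outside)


if __name__ == "__main__":
    toy = {
        "y1": ("Clade Y", {"geneX", "geneZ"}),
        "y2": ("Clade Y", {"geneX", "geneZ"}),
        "w1": ("Clade W", {"geneZ"}),
    }
    print(is_marker_gene("geneX", "Clade Y", toy))  # True
    print(is_marker_gene("geneZ", "Clade Y", toy))  # False: also found in Clade W
```

Here geneZ is core for Clade Y but still fails as a marker, which is exactly the distinction between the two definitions.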
Examples of marker genes
The BactoChip: high-throughput microbial species identification
With Olivier Jousson, Annalisa Ballarini
BactoChip: detecting single species
MetaPhlAn: inferring microbial abundances from metagenomic data using marker genes
• Map metagenomic reads to marker genes to infer microbial abundances
– Normalizing for copy number, gene length, etc.
• Much faster than existing approaches, as the marker gene database is ~50 times smaller than the whole microbial sequence DB
• A few hours instead of weeks for Illumina samples with 100 Gb of sequence data
MetaPhlAn: Metagenomic Phylogenetic Analysis, http://huttenhower.sph.harvard.edu/metaphlan
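The normalization step matters because longer markers attract more reads. A simplified sketch of marker-based abundance estimation that normalizes for marker length only; MetaPhlAn's real estimator (including its copy-number handling and robust averaging) differs, and the marker IDs, lengths, and read counts below are hypothetical:

```python
from collections import defaultdict


def marker_abundances(read_hits, markers):
    """Estimate relative clade abundances from reads mapped to marker genes.

    read_hits: {marker id: number of reads mapped to it}
    markers: {marker id: (clade, marker length in bp)}
    Each clade's coverage is its total hits divided by its total marker
    length; coverages are then normalized to sum to 1.
    """
    hits = defaultdict(float)
    length = defaultdict(float)
    for marker_id, (clade, marker_len) in markers.items():
        hits[clade] += read_hits.get(marker_id, 0)
        length[clade] += marker_len
    coverage = {c: hits[c] / length[c] for c in length if length[c] > 0}
    total = sum(coverage.values())
    return {c: v / total for c, v in coverage.items()} if total else {}


if __name__ == "__main__":
    markers = {
        "m1": ("Bacteroides vulgatus", 1000),
        "m2": ("Bacteroides vulgatus", 1000),
        "m3": ("Alistipes putredinis", 500),
    }
    read_hits = {"m1": 30, "m2": 10, "m3": 10}
    print(marker_abundances(read_hits, markers))
    # {'Bacteroides vulgatus': 0.5, 'Alistipes putredinis': 0.5}
```

B. vulgatus receives four times as many reads here, but across four times as much marker sequence, so the two estimates come out equal.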
MetaPhlAn: synthetic validation on log-normal abundances
Summary of 8 synthetic communities, each composed of 2M reads from 200 organisms with log-normally distributed abundances
[Figure panels: species level; class level]
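A synthetic community of this kind can be generated by drawing log-normal weights, normalizing them, and assigning each read to an organism in proportion to its abundance. A minimal standard-library sketch; sigma=1.0 and independent per-read sampling are assumptions, not the settings of the benchmark above, and the error metric is an illustrative choice:

```python
import random
from collections import Counter


def simulate_community(n_organisms=200, n_reads=2_000_000, sigma=1.0, seed=0):
    """Simulate one synthetic community with log-normal abundances.

    Returns (true relative abundances, reads drawn per organism).
    """
    rng = random.Random(seed)
    raw = [rng.lognormvariate(0.0, sigma) for _ in range(n_organisms)]
    total = sum(raw)
    true_abund = [x / total for x in raw]
    # Each read comes from organism i with probability true_abund[i].
    drawn = Counter(rng.choices(range(n_organisms), weights=true_abund, k=n_reads))
    counts = [drawn[i] for i in range(n_organisms)]
    return true_abund, counts


def mean_abs_error(true_abund, estimate):
    """Average absolute difference between true and estimated abundances."""
    return sum(abs(t - e) for t, e in zip(true_abund, estimate)) / len(true_abund)


if __name__ == "__main__":
    true_abund, counts = simulate_community(n_reads=100_000, seed=1)
    n = sum(counts)
    read_fractions = [c / n for c in counts]
    print(f"mean absolute error of raw read fractions: "
          f"{mean_abs_error(true_abund, read_fractions):.2e}")
```

Eight communities would correspond to `[simulate_community(seed=s) for s in range(8)]`; in a real benchmark, the read fractions would be replaced by a profiler's estimates from the reads themselves.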
Matching 16S and more
The human microbiome at species-level resolution
Whence enterotypes?
[Figure panels: genera; species]
Microbial community function and structure in the human microbiome: the story so far?
• Who’s there varies even in health
– What they’re doing doesn’t (as much)
– Both correlate with niche
– By the way: both change during disease and treatment
• There are patterns in this variation
– Function correlates with membership and phenotype
– “Pathogenicity” correlates with lower prevalence
– Membership means species, strains, or variants
– Patterns aren’t always as simple as enterotypes
• ~1/3 to 2/3 of the human metagenome characterized
– Job security!
Ask both what you can do for your microbiome and what your microbiome can do for you
Thanks!
Wendy Garrett, Michelle Rooks
Ramnik Xavier, Harry Sokol
Nicola Segata, Levi Waldron
Fah Sathira
Human Microbiome Project
HMP Metabolic Reconstruction
Owen White, George Weinstock, Karen Nelson, Joe Petrosino, Mihai Pop, Pat Schloss, Makedonka Mitreva, Erica Sodergren, Vivien Bonazzi, Jane Peterson, Lita Proctor
Sahar Abubucker, Yuzhen Ye
Beltran Rodriguez-Mueller, Jeremy Zucker, Qiandong Zeng
Mathangi Thiagarajan, Brandi Cantarel
Maria Rivera, Barbara Methe
Bill Klimke, Daniel Haft
Dirk Gevers
Bruce Birren, Mark Daly, Doyle Ward, Eric Alm, Ashlee Earl, Lisa Cosimi
http://huttenhower.sph.harvard.edu
Joseph Moon
Vagheesh Narasimhan
Tim Tickle
Xochi Morgan
Josh Reyes
Jeroen Raes, Karoline Faust
Jacques Izard
Olivier Jousson, Annalisa Ballarini
Linking function to community composition
[Figure: taxa and correlated metabolic pathways across 52 posterior fornix microbiomes.]
F-type ATPase, THF
Sugar transport
Phosphate and peptide transport
AA and small molecule biosynthesis
Embden-Meyerhof glycolysis, phosphotransferases
Eukaryotic pathways
Plus ubiquitous pathways: transcription, translation, cell wall, portions of central carbon metabolism…
Lactobacillus crispatus
Lactobacillus jensenii
Lactobacillus gasseri
Lactobacillus iners
Gardnerella/Atopobium
Candida/Bifidobacterium
Linking communities to host phenotype
[Figure panels: vaginal pH (posterior fornix); body mass index; top correlates with BMI in stool. Axis: normalized relative abundance.]
Vaginal pH, community metabolism, and community composition represent a strong, direct link between phenotype and function in these data.