Annotating Metagenomes Using the NMPDR
Rob Edwards
Department of Computer Sciences, San Diego State University
Mathematics and Computer Sciences Division, Argonne National Laboratory
ASM General Meeting, Boston.
www.nmpdr.org www.theseed.org
See also poster: B-179 (126B), Aziz et al.
How much has been sequenced?
[Figure: number of known sequences by year, with markers for the first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, and environmental sequencing]
How much will be sequenced?
[Figure labels: 100 people; everybody in Boston; everybody in USA; all cultured Bacteria; one genome from every species; most major microbial environments]
The Problem
How do you generate consistent and accurate annotations for metagenomes?
Annotations using subsystems
FIG has developed the notion of a subsystem: a generalization of “pathway” as a collection of functional roles jointly involved in a biological process or complex.
Subsystems have been extended into FIGfams: protein families whose members perform the same function.
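One way to picture the subsystem idea is as a named set of functional roles that can be checked against a genome's annotations. The sketch below is illustrative only: the class name, function names, and role strings are hypothetical and are not taken from the SEED or NMPDR code.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Subsystem:
    """A named collection of functional roles (hypothetical model)."""
    name: str
    roles: FrozenSet[str]


def roles_present(subsystem: Subsystem, annotations: Dict[str, str]) -> FrozenSet[str]:
    """Return the subsystem roles assigned to at least one gene.

    `annotations` maps a gene identifier to its functional role.
    """
    return subsystem.roles & frozenset(annotations.values())


def is_complete(subsystem: Subsystem, annotations: Dict[str, str]) -> bool:
    """True when every role in the subsystem is covered by some gene."""
    return roles_present(subsystem, annotations) == subsystem.roles


# Hypothetical example data, for illustration only.
example = Subsystem("Example subsystem", frozenset({"role A", "role B"}))
genes = {"gene1": "role A", "gene2": "unrelated role"}
print(sorted(roles_present(example, genes)))
print(is_complete(example, genes))
```

Modelling roles as a set keeps the check independent of gene order or copy number, which matches the description of a subsystem as a collection of roles rather than a fixed list of genes.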
Subsystems make up metabolism
[Figure: Wikipedia Metabolism portal, http://en.wikipedia.org/wiki/Portal:Metabolism]
Subsystems Are Not Just Pathways
• predicted or measured co-regulation
• genome context (virulence islands, prophages, conserved gene clusters)
• virulence mechanism
• cellular localization
• enzymatic activity
• common phenotype
• combinations of criteria
Automated Annotations of Complete genomes
• Automated user originated processing
• Takes 1-7 hours depending on size and complexity of the genome
• ~1,500 external submissions, including 150 genomes not yet publicly released.
• Reannotation of >500 genomes complete
• 789 users, 160 organizations, 25 countries.
http://rast.nmpdr.org/
Automated Annotations of Complete Metagenomes
MG-RAST Server
Accurate and consistent annotations in a few days
Automatic metabolic reconstruction
Freely available after registration
http://metagenomics.theseed.org/
Metagenome Annotation
Automated pipeline
– upload sequences in FASTA, with or without Q-scores
– removes exact duplicates (454 artefact)
– renumbers sequences (mapping provided)
– BLAST against SEED nr, 16S rDNA
– Annotations and metabolic reconstruction
– Taxonomic summary
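The duplicate-removal and renumbering steps listed above could look roughly like the following. This is a minimal sketch, not the MG-RAST implementation: it assumes "exact duplicate" means an identical sequence string (ignoring case), and the function names and the `seq` identifier prefix are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            parts = line[1:].split()
            header = parts[0] if parts else ""
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dedupe_and_renumber(
    records: Iterable[Tuple[str, str]], prefix: str = "seq"
) -> Tuple[List[Tuple[str, str]], Dict[str, str]]:
    """Drop exact duplicate sequences and assign sequential identifiers.

    Returns the kept records and a mapping from new to original identifier.
    Assumption: duplicates are detected by identical sequence, ignoring case.
    """
    seen = set()
    kept: List[Tuple[str, str]] = []
    mapping: Dict[str, str] = {}
    for original_id, sequence in records:
        key = sequence.upper()
        if key in seen:
            continue  # exact duplicate of an earlier read
        seen.add(key)
        new_id = f"{prefix}{len(kept) + 1}"
        kept.append((new_id, sequence.upper()))
        mapping[new_id] = original_id
    return kept, mapping


reads = [">r1 first read", "ACGT", "AC", ">r2", "acgtac", ">r3", "GGGG"]
kept, mapping = dedupe_and_renumber(parse_fasta(reads))
for new_id, sequence in kept:
    print(f">{new_id} (was {mapping[new_id]})\n{sequence}")
```

Keeping the first occurrence and recording the new-to-original mapping means results can be traced back to the uploaded identifiers.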
Comparing Metagenomes to Genomes (or other metagenomes!)
MG-RAST computation
[Figure: hours of compute time versus input size (MB)]
~19 hours of compute per input megabyte
How much so far
676 metagenomes
10,012,793,995 bp (10 Gbp)
Average: ~15 Mbp per metagenome
Compute time (on a single CPU):
190,243 hours = 7,926 days = 21 years
~200 GS20, ~200 FLX, ~200 Sanger
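The compute-time total follows from the figure of ~19 hours per input megabyte. The short check below reproduces the arithmetic, assuming one megabyte corresponds to one million bases (one byte per base); the function name and that conversion are assumptions made for illustration.

```python
HOURS_PER_MB = 19.0          # rate quoted for MG-RAST computation
TOTAL_BP = 10_012_793_995    # total bases across the 676 metagenomes
METAGENOMES = 676


def estimate_compute_hours(total_bp: int, hours_per_mb: float = HOURS_PER_MB) -> float:
    """Estimate single-CPU hours, assuming 1 MB of input = 1,000,000 bases."""
    megabytes = total_bp / 1_000_000
    return megabytes * hours_per_mb


hours = estimate_compute_hours(TOTAL_BP)
days = hours / 24
years = days / 365
average_bp = TOTAL_BP / METAGENOMES

print(f"{hours:,.0f} hours, about {days:,.1f} days, about {years:.1f} years")
print(f"average of about {average_bp / 1e6:.1f} Mbp per metagenome")
```

Under these assumptions the estimate comes to about 190,243 hours, which matches the total on the slide; the slide's day and year figures are rounded down.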
Lots of sequences, all pyrosequencing
From Sequences To Environments
[Figure: CDA plot with axes CDA 60.2% and CDA 21.7%. Functional categories: Sulfur, Respiration, Capsule, Motility, Membrane transport, Stress, Signaling, Phosphorus, RNA. Environments: Mine, Saltern, Marine, Microbialites, Coral, Fish, Animals, Freshwater]
Dinsdale et al, Nature 2008
Upcoming Features
• More user options (removing sequences, E-values, percent identities, etc)
• More databases (ACLAME, human, etc)
• More user generated content (mash-ups) via webservices and published API
Workshops
Free workshops on NMPDR, RAST, MG-RAST, SEED
Upcoming workshops: Greece, Argonne, Urbana-Champaign, San Diego
Contact Leslie McNeil or visit http://www.nmpdr.org/
Acknowledgements
Environmental Genomics: Forest Rohwer and the labs that provided sequence
Metagenomics Annotation Server: Rick Stevens, Daniel Paarman, Folker Meyer, Bob Olsen, Mark D'Souza
Statistics & Web services: Liz Dinsdale, Dana Hall, Beltran Rodriguez-Brito, Bahador Nosrat
FIG: Ross Overbeek, Veronika Vonstein, Annotators