Genovo: De Novo Assembly for Metagenomes

Genovo: De Novo Assembly for Metagenomes Gao Song 2010/07/14



TRANSCRIPT

Page 1: Genovo : De Novo Assembly for  Metagenomes

Genovo: De Novo Assembly for Metagenomes

Gao Song
2010/07/14

Page 2: Genovo : De Novo Assembly for  Metagenomes

Outline
- Overview of Metagenomics
- Current Assemblers
- Genovo Assembly

Page 3: Genovo : De Novo Assembly for  Metagenomes

Overview of Metagenomics

Page 4: Genovo : De Novo Assembly for  Metagenomes

Motivation

Metagenomics is:

Why Do We Need Metagenomics?
- Snapshot of bacterial community
- Cannot be cultivated

<1%

Page 5: Genovo : De Novo Assembly for  Metagenomes

Applications
- Monitoring the impact of pollutants on ecosystems
- Discovery of new genes, enzymes…
  - Global Ocean Sampling Expedition
- Human Microbiome Project
- JGI sequenced Acid Mine Drainage sample

Page 6: Genovo : De Novo Assembly for  Metagenomes

Two Paradigms

Two ways

Marker Gene Sequencing
- 16S rRNA:
- Other marker genes: RuBisCo, NifH
- Only composition

Whole Genome Sequencing (WGS)
- Detailed picture of community

Page 7: Genovo : De Novo Assembly for  Metagenomes

Complex Communities

(Figure; surviving labels: >1000, X, 5000, 200, L, 1 million)

Page 8: Genovo : De Novo Assembly for  Metagenomes

Current Assemblers

Page 9: Genovo : De Novo Assembly for  Metagenomes

Current Status

Why not assemble reads?

ORFome assembler*
Three steps:
- The putative ORFs are annotated for each read
- ORFs are assembled using EULER
- ORF homologs are searched for in the Integrated Microbial Genomes (IMG) database

Existing WGS assemblers
- Sanger reads: Phrap, Celera, Arachne, JAZZ…
- Short reads: Velvet, Newbler…

* Y. Ye and H. Tang, "An ORFome assembly approach to metagenomics sequences analysis," Journal of Bioinformatics and Computational Biology, vol. 7, no. 3, pp. 455-471, June 2009

Page 10: Genovo : De Novo Assembly for  Metagenomes

Genovo: De Novo Assembly for Metagenomes

Jonathan Laserson, Vladimir Jojic and Daphne Koller. RECOMB 2010, LNBI 6044, pp. 341-356, 2010

Page 11: Genovo : De Novo Assembly for  Metagenomes

Main Idea
- Propose a generative model for metagenome data
- Using iterated conditional modes (ICM)
  - Using hill-climbing steps iteratively
- Design a score for evaluation

Page 12: Genovo : De Novo Assembly for  Metagenomes

Model
- Initialize contigs:
  - Infinite contigs with infinite length
- Partition the reads
  - Using Chinese Restaurant Process
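A minimal sketch of partitioning reads with a Chinese Restaurant Process, to make the slide concrete. The function name `crp_partition`, the concentration parameter `alpha`, and the seed are assumptions for illustration, not taken from the slides or the paper.

```python
import random

def crp_partition(num_reads, alpha, seed=0):
    """Assign each read to a contig index with a Chinese Restaurant Process.

    alpha (assumed name) controls how readily a new contig is opened.
    """
    rng = random.Random(seed)
    counts = []      # counts[s] = number of reads already assigned to contig s
    assignment = []  # assignment[i] = contig index of read i
    for _ in range(num_reads):
        # An existing contig s is chosen with weight counts[s];
        # a brand-new contig is chosen with weight alpha.
        weights = counts + [alpha]
        s = rng.choices(range(len(weights)), weights=weights)[0]
        if s == len(counts):
            counts.append(0)  # open a new contig
        counts[s] += 1
        assignment.append(s)
    return assignment

if __name__ == "__main__":
    print(crp_partition(10, alpha=1.0))
```

Contigs that already hold many reads attract more reads, so the number of contigs grows slowly with the number of reads and does not need to be fixed in advance.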

Page 13: Genovo : De Novo Assembly for  Metagenomes

Model
- Generate the starting point o_i
- Generate the length of read
- Quality of assembly of each read
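The three steps above can be sketched as a toy read generator. Everything specific here is an assumption for illustration: the offset is drawn around the contig centre with a geometric magnitude and a fair-coin sign (parameter `rho`), the length is drawn from a rounded Gaussian, and the noise model allows substitutions only (`p_mis`), with no insertions or deletions.

```python
import random

def sample_signed_geometric(rho, rng):
    """Geometric magnitude (success probability rho) with a fair-coin sign."""
    magnitude = 0
    while rng.random() > rho:
        magnitude += 1
    if magnitude == 0:
        return 0
    return magnitude if rng.random() < 0.5 else -magnitude

def generate_read(contig, rho=0.05, mean_length=8, p_mis=0.01, seed=0):
    """Return (start, read): a noisy copy of a stretch of the contig."""
    rng = random.Random(seed)
    # Starting point o_i: centre of the contig plus a signed offset,
    # clamped to the finite toy contig (an assumption of this sketch).
    start = len(contig) // 2 + sample_signed_geometric(rho, rng)
    start = max(0, min(start, len(contig) - 1))
    # Read length: an arbitrary illustrative distribution.
    length = max(1, int(rng.gauss(mean_length, 1)))
    template = contig[start:start + length]
    # Noisy copy: each base is substituted with probability p_mis.
    read = "".join(
        rng.choice([b for b in "ACGT" if b != base])
        if rng.random() < p_mis else base
        for base in template
    )
    return start, read

if __name__ == "__main__":
    print(generate_read("ACGTTGCAAGGCTTACGGATCC"))
```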

Page 14: Genovo : De Novo Assembly for  Metagenomes

Algorithm
- Using ICM
- Starting from initial condition, hill-climbing moves are performed iteratively
- Move 1: Consensus Sequence:
  - Select the most frequent base
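A minimal sketch of Move 1, assuming ungapped alignments where each read is described by a hypothetical `(offset, sequence)` pair. Uncovered positions are filled with `N`, which is a choice of this sketch.

```python
from collections import Counter

def consensus(contig_length, placed_reads):
    """Pick the most frequent base in every column of the contig.

    placed_reads: list of (offset, sequence) pairs, ungapped (assumed format).
    """
    columns = [Counter() for _ in range(contig_length)]
    for offset, seq in placed_reads:
        for j, base in enumerate(seq):
            pos = offset + j
            if 0 <= pos < contig_length:
                columns[pos][base] += 1
    # Columns with no read coverage get the placeholder 'N'.
    return "".join(
        col.most_common(1)[0][0] if col else "N" for col in columns
    )

if __name__ == "__main__":
    reads = [(0, "ACGT"), (1, "CGTA"), (0, "ACCT")]
    print(consensus(6, reads))  # prints ACGTAN
```

In the example, column 2 sees G twice and C once, so the consensus keeps G and the third read is treated as carrying a mismatch.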

Page 15: Genovo : De Novo Assembly for  Metagenomes

Algorithm
- Move 2: Read Mapping
  - For read i, first remove it, then recalculate its contig and alignment
  - First, for each potential location, compute alignment
  - Then, select the location according to probability
  - Filtering: using common 10-mer
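Move 2 can be sketched as follows for a single contig. This is an illustration under stated assumptions, not the paper's implementation: alignments are ungapped, the per-base mismatch rate `p_mis` is fixed, bases hanging over the contig end are scored as uniform, and all function names are hypothetical. Only the 10-mer filter and the "select according to probability" step come from the slide.

```python
import math
import random
from collections import defaultdict

def kmer_index(contig, k):
    """Map every k-mer of the contig to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(contig) - k + 1):
        index[contig[pos:pos + k]].append(pos)
    return index

def candidate_offsets(read, index, k):
    """Filtering: keep only offsets where read and contig share a k-mer."""
    offsets = set()
    for j in range(len(read) - k + 1):
        for pos in index.get(read[j:j + k], ()):
            offsets.add(pos - j)
    return sorted(offsets)

def log_likelihood(read, contig, offset, p_mis=0.01):
    """Ungapped alignment score (assumed error model)."""
    total = 0.0
    for j, base in enumerate(read):
        pos = offset + j
        if 0 <= pos < len(contig):
            match = contig[pos] == base
            total += math.log(1 - p_mis) if match else math.log(p_mis / 3)
        else:
            total += math.log(0.25)  # overhang: uniform over A, C, G, T
    return total

def map_read(read, contig, k=10, rng=None):
    """Sample a location with probability proportional to its likelihood."""
    rng = rng or random.Random(0)
    offsets = candidate_offsets(read, kmer_index(contig, k), k)
    if not offsets:
        return None  # no shared k-mer with this contig
    scores = [log_likelihood(read, contig, o) for o in offsets]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # stabilised exponentials
    return rng.choices(offsets, weights=weights)[0]
```

A strict hill-climbing variant would return the offset with the highest score instead of sampling from the weights.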

Page 16: Genovo : De Novo Assembly for  Metagenomes

Algorithm
- Move 3: update geometric variable
- Global moves:
  - Propose indels
  - Center
  - Merge contigs
  - Chimeric reads
  - Disassemble the dangling contigs

Page 17: Genovo : De Novo Assembly for  Metagenomes

Evaluation
- BLAST
- PFAM
- Designed score
  - 1st term: quality of assembly
  - 2nd term: penalty for total length
  - 3rd term: prefer to merge when V>V0
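One form of the designed score that is consistent with the three terms listed above, written as a sketch. The exact expression is an assumption: read log-likelihoods for the first term, a log(4) penalty per contig base for the second, and a log(4)·V0 credit per contig for the third, chosen so that merging two contigs pays off exactly when their overlap V exceeds V0. The function names are hypothetical; V0 = 20 is the value mentioned later in the slides.

```python
import math

def denovo_score(read_log_likelihoods, contig_lengths, v0=20, alphabet_size=4):
    """Assumed score: quality - log|A| * total_length + log|A| * v0 * n_contigs."""
    log_a = math.log(alphabet_size)
    quality = sum(read_log_likelihoods)                # 1st term
    length_penalty = log_a * sum(contig_lengths)       # 2nd term
    contig_term = log_a * v0 * len(contig_lengths)     # 3rd term
    return quality - length_penalty + contig_term

def merge_gain(len_a, len_b, overlap, read_ll=0.0, v0=20):
    """Score change when two contigs overlapping by `overlap` bases are merged.

    Assumes the reads fit equally well before and after the merge.
    """
    split = denovo_score([read_ll], [len_a, len_b], v0)
    merged = denovo_score([read_ll], [len_a + len_b - overlap], v0)
    return merged - split

if __name__ == "__main__":
    print(merge_gain(100, 100, overlap=30))  # positive: merging is preferred
    print(merge_gain(100, 100, overlap=10))  # negative: keep contigs apart
```

Under these assumptions the gain reduces to log(4)·(V - V0): merging saves V bases of length but gives up one contig's credit of V0 bases.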

Page 18: Genovo : De Novo Assembly for  Metagenomes

Results
- Using 454 reads
- Compare with Newbler, Velvet and EULER-SR
- Single Genome

Page 19: Genovo : De Novo Assembly for  Metagenomes

Result
- Metagenome data

Score

PFAM

Page 20: Genovo : De Novo Assembly for  Metagenomes

Discussion
- New idea
- Apply a mature algorithm to assembly domain
- Systematically describe and analyze the problem and algorithm
- Results are better

Page 21: Genovo : De Novo Assembly for  Metagenomes

Discussion
- Slow: minutes vs. hours for 300k 454 reads
- Main idea: try to extend contigs as long as possible, so they will have more hits for BLAST
- Why choose 20 for V0?
- How to deal with branching? Repeats?
- Model:
  - Why can it capture the properties of metagenomic data?
  - How to argue the correctness of that model?
  - The distribution of starting points

Page 22: Genovo : De Novo Assembly for  Metagenomes

Thank you