CS/BioE 598AGB: Genome Assembly, part II Tandy Warnow


Post on 13-Jan-2016


Page 1: CS/BioE 598AGB: Genome Assembly, part II Tandy Warnow

CS/BioE 598AGB:Genome Assembly, part II

Tandy Warnow

Page 2: CS/BioE 598AGB: Genome Assembly, part II Tandy Warnow

Nature Biotechnology, volume 29, number 11, November 2011

Page 3: CS/BioE 598AGB: Genome Assembly, part II Tandy Warnow

Supplementary Figure 1. De Bruijn graph from reads with sequencing errors. (a) A de Bruijn graph E on our set of reads with k = 4. Finding an Eulerian cycle is already a straightforward task, but for this value of k, it is trivial. (b) If TGGAGTG is incorrectly sequenced as a sixth read (in addition to the correct TGGCGTG read), then the result is a bulge in the de Bruijn graph, which complicates assembly.

(Supplementary materials from the Compeau, Pevzner, and Tesler paper, Nature Biotech, 2011)
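
The bulge described in panel (b) can be reproduced with a small sketch. This is only an illustration under stated assumptions: it uses just the two reads named in the caption (the other reads of the figure are not given here), it ignores edge multiplicities, and the function names de_bruijn_graph and branching_nodes are made up for this example.

```python
from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer node to the sorted list of its successor nodes.

    Every distinct k-mer in the reads contributes one edge from its
    prefix (first k-1 letters) to its suffix (last k-1 letters).
    Simplification: repeated k-mers are collapsed into a single edge.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return {node: sorted(successors) for node, successors in graph.items()}

def branching_nodes(graph):
    """Nodes with more than one outgoing edge (where a bulge can start)."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)

correct_read = ["TGGCGTG"]
with_error = correct_read + ["TGGAGTG"]  # the mis-sequenced read from the caption

clean_graph = de_bruijn_graph(correct_read, k=4)
bulged_graph = de_bruijn_graph(with_error, k=4)

print(branching_nodes(clean_graph))   # no branching without the error
print(branching_nodes(bulged_graph))  # the node where the two paths diverge
print(bulged_graph["TGG"])            # its two successors
```

With k = 4 the correct read gives the path TGG, GGC, GCG, CGT, GTG, while the erroneous read gives TGG, GGA, GAG, AGT, GTG. The two paths share their start and end nodes, which is the bulge.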

Page 4: CS/BioE 598AGB: Genome Assembly, part II Tandy Warnow

(c) An illustration of a de Bruijn graph E with many bulges. The process of bulge removal should leave only the red edges remaining, yielding an Eulerian path in the resulting graph.

(Supplementary materials from the Compeau, Pevzner, and Tesler paper, Nature Biotech, 2011)

Page 5: CS/BioE 598AGB: Genome Assembly, part II Tandy Warnow

(Supplementary materials from the Compeau, Pevzner, and Tesler paper, Nature Biotech, 2011)

Page 10: CS/BioE 598AGB: Genome Assembly, part II Tandy Warnow

N50

• The N50 value is the size of the smallest contig (or scaffold) such that 50% of the genome is contained in contigs of size N50 or larger. This is the standard metric used to evaluate the quality of an assembly.

• Salzberg et al. computed “corrected N50” values by splitting contigs (or scaffolds) where errors are identified.
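
The N50 definition above can be sketched in a few lines. This is a minimal illustration, not the procedure used by Salzberg et al.: the function names n50 and split_at_errors, the contig lengths, and the error position are all made up for the example, and when no genome size is supplied the total assembly length is assumed to stand in for it.

```python
def n50(lengths, genome_size=None):
    """Length L of the smallest contig such that contigs of length >= L
    together cover at least 50% of the genome.

    Assumption: if genome_size is not given, the sum of the contig
    lengths is used in its place.
    """
    total = genome_size if genome_size is not None else sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length
    return 0  # the contigs cover less than half of genome_size

def split_at_errors(length, error_positions):
    """Split one contig at the given error positions and return the piece
    lengths (a hypothetical helper illustrating the "corrected N50" idea)."""
    inner = sorted(p for p in set(error_positions) if 0 < p < length)
    cuts = [0] + inner + [length]
    return [end - start for start, end in zip(cuts, cuts[1:])]

contigs = [80, 70, 50, 40, 30, 20]        # made-up contig lengths
print(n50(contigs))                       # 70

# Suppose an error is identified at position 30 of the 80-long contig.
corrected = split_at_errors(80, [30]) + contigs[1:]
print(n50(corrected))                     # 50
```

Splitting a long contig at an error leaves the total length unchanged but lowers the N50, which is why the corrected value is the more conservative one.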

Page 22: CS/BioE 598AGB: Genome Assembly, part II Tandy Warnow

From Mihai Pop’s paper

Page 23: CS/BioE 598AGB: Genome Assembly, part II Tandy Warnow

Differing Conclusions

• Compeau et al.: “De Bruijn graphs are not a cure-all … Short read sequencing technologies … favor the use of de Bruijn graphs … and are also well suited to representing genomes with repeats. However, if a future sequencing technology produces high quality reads with tens of thousands of bases, …, the pendulum could swing back toward favoring overlap-based approaches for assembly.”

Page 24: CS/BioE 598AGB: Genome Assembly, part II Tandy Warnow

Mihai Pop’s conclusion

Page 25: CS/BioE 598AGB: Genome Assembly, part II Tandy Warnow

Salzberg’s conclusions

Page 26: CS/BioE 598AGB: Genome Assembly, part II Tandy Warnow

Salzberg’s conclusions
