Daniela Puiu at #ICG12: The first near-complete assembly of the hexaploid bread wheat genome,...
TRANSCRIPT
![Page 1: Daniela Puiu at #ICG12: The first near-complete assembly of the hexaploid bread wheat genome, Triticum aestivum](https://reader033.vdocuments.net/reader033/viewer/2022051710/5a6cfc337f8b9af2418b4885/html5/thumbnails/1.jpg)
The first near-complete assembly of the hexaploid bread wheat genome,
Triticum aestivum
Daniela Puiu
Aleksey Zimin, Richard Hall, Sarah Kingan, Bernardo Clavijo, Steven Salzberg
ICG-12, Oct 27 2017
![Page 2: Daniela Puiu at #ICG12: The first near-complete assembly of the hexaploid bread wheat genome, Triticum aestivum](https://reader033.vdocuments.net/reader033/viewer/2022051710/5a6cfc337f8b9af2418b4885/html5/thumbnails/2.jpg)
ICG-12: The Wheat Genome
Sequencing and Assembly of the Ancestral and Common Wheat
Aegilops tauschii ssp. strangulata accession AL8/78
Chinese Spring variety (CS42, accession Dv418)
2013-2017
![Page 3: Daniela Puiu at #ICG12: The first near-complete assembly of the hexaploid bread wheat genome, Triticum aestivum](https://reader033.vdocuments.net/reader033/viewer/2022051710/5a6cfc337f8b9af2418b4885/html5/thumbnails/3.jpg)
History of Wheat
~8,000 years ago: spontaneous hybridization
Emmer Wheat + Goat grass = Bread Wheat (World's 3rd cereal crop)
Triticum turgidum + Aegilops tauschii = Triticum aestivum
AABB + DD = AABBDD
Whole Genome => Assisted Breeding => Improved Yield
![Page 4: Daniela Puiu at #ICG12: The first near-complete assembly of the hexaploid bread wheat genome, Triticum aestivum](https://reader033.vdocuments.net/reader033/viewer/2022051710/5a6cfc337f8b9af2418b4885/html5/thumbnails/4.jpg)
The Wheat Genome
One of the most complex genomes!
1) Genome size: over 15 billion bases
2) Allohexaploid: six copies of each chromosome
3) >90% repeats
Multiple past attempts to assemble => assemblies shorter than the estimated genome size.
![Page 5: Daniela Puiu at #ICG12: The first near-complete assembly of the hexaploid bread wheat genome, Triticum aestivum](https://reader033.vdocuments.net/reader033/viewer/2022051710/5a6cfc337f8b9af2418b4885/html5/thumbnails/5.jpg)
New vs Previous Assemblies
[Chart: N50 of Triticum 3.1 (232K) compared with previous assemblies]
![Page 6: Daniela Puiu at #ICG12: The first near-complete assembly of the hexaploid bread wheat genome, Triticum aestivum](https://reader033.vdocuments.net/reader033/viewer/2022051710/5a6cfc337f8b9af2418b4885/html5/thumbnails/6.jpg)
Data Reduction
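| Original Reads | Number | Sum | Coverage | Accuracy |
|---|---|---|---|---|
| Illumina | 7.06G | 1Tb | 65x | 99.5% |
| PacBio | 55.5M | 545Gb | 36x | 87.5% |

| Processed Seq | Number | Sum | Coverage | Accuracy |
|---|---|---|---|---|
| super-reads | 95.7M | 31Gb | 2x | 99.95% |
| mega-reads | 57M | 278Gb | 18x | 99.65% |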
MaSuRCA mega-reads hybrid correction
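The coverage figures in the table follow from dividing each data set's total bases by the genome size. A minimal sketch of that arithmetic, assuming a genome size of about 15.34 Gb (the Triticum 3.1 total reported later in the talk, rounded); the names `GENOME_SIZE_BP` and `coverage` are illustrative only:

```python
# Coverage = total sequenced bases / genome size.
# Assumption: genome size taken as ~15.34 Gb (Triticum 3.1 total, rounded).
GENOME_SIZE_BP = 15.34e9

# Totals as listed on the slide ("Sum" column).
datasets_bp = {
    "Illumina": 1e12,      # 1 Tb
    "PacBio": 545e9,       # 545 Gb
    "super-reads": 31e9,   # 31 Gb
    "mega-reads": 278e9,   # 278 Gb
}


def coverage(total_bp, genome_size_bp=GENOME_SIZE_BP):
    """Return fold coverage for a data set with total_bp sequenced bases."""
    return total_bp / genome_size_bp


for name, total_bp in datasets_bp.items():
    print(f"{name}: {coverage(total_bp):.1f}x")
```

Rounded to whole numbers, the results agree with the 65x, 36x, 2x and 18x shown on the slide.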
![Page 7: Daniela Puiu at #ICG12: The first near-complete assembly of the hexaploid bread wheat genome, Triticum aestivum](https://reader033.vdocuments.net/reader033/viewer/2022051710/5a6cfc337f8b9af2418b4885/html5/thumbnails/7.jpg)
MaSuRCA mega-reads Correction
![Page 8: Daniela Puiu at #ICG12: The first near-complete assembly of the hexaploid bread wheat genome, Triticum aestivum](https://reader033.vdocuments.net/reader033/viewer/2022051710/5a6cfc337f8b9af2418b4885/html5/thumbnails/8.jpg)
Assembly Pipeline
Hybrid: Illumina + PacBio => MaSuRCA Correction => Mega-reads => Celera WGS Assembler => Triticum 1.0 => Remove Duplicates => Triticum 2.0
PacBio only: PacBio => FALCON Correction => pReads => FALCON Assembler => FALCON Trit 0.5 => Arrow Polishing => FALCON Trit 1.0
Triticum 2.0 + FALCON Trit 1.0 => k-mer Analysis => Merge => Triticum 3.1
![Page 9: Daniela Puiu at #ICG12: The first near-complete assembly of the hexaploid bread wheat genome, Triticum aestivum](https://reader033.vdocuments.net/reader033/viewer/2022051710/5a6cfc337f8b9af2418b4885/html5/thumbnails/9.jpg)
k-mer Analysis
[Plot: 31-mer frequencies (axis ticks 10M to 50M), highlighting k-mers missing from the PacBio assembly only]
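The comparison named on this slide, k-mers present in one assembly but missing from the other, can be illustrated with a small in-memory sketch. The slide does not say which tool was used, so the functions below (`canonical`, `kmer_set`, `missing_from`) are hypothetical helpers, and a 15 Gb genome would need a dedicated k-mer counter rather than Python sets:

```python
# Toy sketch: find k-mers present in one set of contigs but absent from another.
# All function names here are illustrative, not from the talk.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_set(contigs, k=31):
    """Collect the canonical k-mers of all contigs, skipping k-mers that contain N."""
    found = set()
    for seq in contigs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:
                found.add(canonical(kmer))
    return found


def missing_from(reference_contigs, query_contigs, k=31):
    """Return k-mers found in reference_contigs but not in query_contigs."""
    return kmer_set(reference_contigs, k) - kmer_set(query_contigs, k)
```

For example, `missing_from(hybrid_contigs, pacbio_contigs)` would list the 31-mers that only the hybrid assembly contains, where both arguments are lists of contig sequences.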
![Page 10: Daniela Puiu at #ICG12: The first near-complete assembly of the hexaploid bread wheat genome, Triticum aestivum](https://reader033.vdocuments.net/reader033/viewer/2022051710/5a6cfc337f8b9af2418b4885/html5/thumbnails/10.jpg)
Assembly Merge
Merging of the Hybrid and PacBio assemblies
[Diagram: a Triticum 2.0 contig bridges FALCON contigA and FALCON contigB into one Triticum 3.1 contig; overlaps are labeled >5Kb]
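One way to read the diagram is that a hybrid contig links two FALCON contigs when it overlaps each of them by more than 5 Kb. The sketch below illustrates only that selection rule under this assumed reading; the `Overlap` record and `bridged_pairs` function are hypothetical, and positions and orientations of the alignments are ignored:

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations

MIN_OVERLAP_BP = 5_000  # the ">5Kb" threshold shown on the slide


@dataclass(frozen=True)
class Overlap:
    hybrid_contig: str  # name of a Triticum 2.0 (hybrid) contig
    falcon_contig: str  # name of a FALCON contig it aligns to
    length_bp: int      # length of the aligned overlap


def bridged_pairs(overlaps, min_overlap=MIN_OVERLAP_BP):
    """Return (hybrid, falcon_a, falcon_b) triples where one hybrid contig
    overlaps two different FALCON contigs by more than min_overlap each."""
    by_hybrid = defaultdict(set)
    for ov in overlaps:
        if ov.length_bp > min_overlap:
            by_hybrid[ov.hybrid_contig].add(ov.falcon_contig)
    links = []
    for hybrid, falcons in sorted(by_hybrid.items()):
        for falcon_a, falcon_b in combinations(sorted(falcons), 2):
            links.append((hybrid, falcon_a, falcon_b))
    return links
```

A real merge would also have to check that the two overlaps sit at opposite ends of the hybrid contig and in consistent orientation before joining anything.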
![Page 11: Daniela Puiu at #ICG12: The first near-complete assembly of the hexaploid bread wheat genome, Triticum aestivum](https://reader033.vdocuments.net/reader033/viewer/2022051710/5a6cfc337f8b9af2418b4885/html5/thumbnails/11.jpg)
Assembly Statistics
| Assembly | Number of contigs | Total size (bp) | N50 size (bp) |
|---|---|---|---|
| Triticum 2.0 | 375,328 | 14,395,027,822 | 75,599 |
| FALCON Trit 1.0 | 97,809 | 12,939,100,857 | 215,314 |
| Triticum 3.1 | 279,439 | 15,344,693,583 | 232,659 |
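N50, reported in the table above, is the contig length at which contigs of that length or longer contain at least half of the assembly's total bases. A minimal sketch of the standard calculation, with `n50` as an illustrative function name:

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total reaches half of all bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contig lengths given")
```

For contig lengths 2, 3, 4, 5 and 6 (20 bases in total), the two longest contigs already hold 11 bases, so the N50 is 5.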
![Page 12: Daniela Puiu at #ICG12: The first near-complete assembly of the hexaploid bread wheat genome, Triticum aestivum](https://reader033.vdocuments.net/reader033/viewer/2022051710/5a6cfc337f8b9af2418b4885/html5/thumbnails/12.jpg)
Run Time: 100 CPU years
| Main Steps | Run Time (CPU hrs) | Wall Time (Months) |
|---|---|---|
| MaSuRCA | 100K | 1.5 |
| Celera WGS | 470K | 5 |
| FALCON | 150K | 0.75 |
| ARROW | 160K | 0.75 |
| total | 880K | 9 |
100K CPU hrs = 11.5 years
800K CPU hrs = 100 years
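The conversion from CPU hours to CPU years can be checked with a short calculation. This sketch assumes a 365-day year (8,760 hours) and uses the per-step CPU hours listed on the slide; `HOURS_PER_YEAR` and `cpu_years` are illustrative names:

```python
HOURS_PER_YEAR = 24 * 365  # assumption: 365-day year, leap days ignored

# CPU hours per step, as listed on the slide.
cpu_hours = {
    "MaSuRCA": 100_000,
    "Celera WGS": 470_000,
    "FALCON": 150_000,
    "ARROW": 160_000,
}


def cpu_years(hours):
    """Convert CPU hours to CPU years."""
    return hours / HOURS_PER_YEAR


total_hours = sum(cpu_hours.values())
print(f"total: {total_hours:,} CPU hrs = {cpu_years(total_hours):.1f} CPU years")
for step, hours in cpu_hours.items():
    print(f"{step}: {cpu_years(hours):.1f} CPU years")
```

Under this assumption the four steps sum to 880K CPU hours, which comes out at roughly 100 CPU years, in line with the slide title.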
![Page 13: Daniela Puiu at #ICG12: The first near-complete assembly of the hexaploid bread wheat genome, Triticum aestivum](https://reader033.vdocuments.net/reader033/viewer/2022051710/5a6cfc337f8b9af2418b4885/html5/thumbnails/13.jpg)
Genome Repetitiveness
[Plot: k-mer uniqueness ratios for wheat, fly, cow, rice, pine and Ae. tauschii]
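The slide does not give a formula for the k-mer uniqueness ratio. One common reading, assumed here, is the fraction of k-mer positions in a genome whose k-mer occurs exactly once; a highly repetitive genome scores low until k becomes large. The function name `uniqueness_ratio` is illustrative, and the sketch looks at one strand only:

```python
from collections import Counter


def uniqueness_ratio(seq, k):
    """Return the fraction of k-mer positions in seq whose k-mer occurs once.

    Assumed definition; reverse complements are not considered.
    """
    seq = seq.upper()
    positions = len(seq) - k + 1
    if positions <= 0:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(positions))
    unique = sum(1 for count in counts.values() if count == 1)
    return unique / positions
```

On the toy sequence `AAAAACGT`, the ratio is 3/7 for k=2 (the repeated `AA` is not unique) and rises to 1.0 for k=5, which is the kind of curve a uniqueness plot traces as k grows.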
![Page 14: Daniela Puiu at #ICG12: The first near-complete assembly of the hexaploid bread wheat genome, Triticum aestivum](https://reader033.vdocuments.net/reader033/viewer/2022051710/5a6cfc337f8b9af2418b4885/html5/thumbnails/14.jpg)
Publication
![Page 15: Daniela Puiu at #ICG12: The first near-complete assembly of the hexaploid bread wheat genome, Triticum aestivum](https://reader033.vdocuments.net/reader033/viewer/2022051710/5a6cfc337f8b9af2418b4885/html5/thumbnails/15.jpg)
Conclusions
The most challenging genome (we) assembled!
Learning experience!
Assembly quality vs computational resources?
Share your data!
![Page 16: Daniela Puiu at #ICG12: The first near-complete assembly of the hexaploid bread wheat genome, Triticum aestivum](https://reader033.vdocuments.net/reader033/viewer/2022051710/5a6cfc337f8b9af2418b4885/html5/thumbnails/16.jpg)
Acknowledgements
Johns Hopkins University
Steven Salzberg
Aleksey Zimin
UC Davis Plant Sciences
Jan Dvorak
Mingcheng Luo
Earlham Institute
Bernardo Clavijo