genome assembly: then and now — v1.1

Genome assembly: then and now Keith Bradnam Image from Wellcome Trust

Upload: keith-bradnam

Post on 10-May-2015

Category:

Education


DESCRIPTION


This was a talk given on 2014-06-19 for the Genome Center’s Bioinformatics Core as part of a one-week workshop on using Galaxy. It covers the Assemblathon projects and other aspects of genome assembly. A version of this talk with embedded notes is also available on SlideShare. Note that this is an evolving talk: older and newer versions are also available on SlideShare.

TRANSCRIPT

Page 1: Genome assembly: then and now — v1.1

Genome assembly: then and now

Keith Bradnam

Image from Wellcome Trust

Page 2: Genome assembly: then and now — v1.1

Image from flickr.com/photos/dougitdesign/5613967601/

Contents

Sequencing 101
Genome assembly: then
Genome assembly: now
Assemblathon 1 & 2
Advice & Angst
The future

Page 3: Genome assembly: then and now — v1.1

More info

✤ http://assemblathon.org

✤ http://gigasciencejournal.com

✤ http://twitter.com/assemblathon

Page 4: Genome assembly: then and now — v1.1

Sequencing 101

A, C, G, T...

Image from nlm.nih.gov

Page 5: Genome assembly: then and now — v1.1

Read

Page 7: Genome assembly: then and now — v1.1

Read pair

Mate pair

Page 9: Genome assembly: then and now — v1.1

Contigs

Page 12: Genome assembly: then and now — v1.1

Scaffold

NNNNNNNNNNNNNNNNNNN

Page 13: Genome assembly: then and now — v1.1

Assembly size

[Figure: an example assembly drawn as scaffolds, three of which contain runs of Ns. Scaffold lengths in Mbp: 70, 25, 20, 10, 10, 5, 5, 5, 15, 15, 15, 5. Assembly size: 200 Mbp]

Page 20: Genome assembly: then and now — v1.1

N50 length

[Figure: the same scaffolds with a running total of their lengths, longest first: 70, 95, 115 Mbp of the 200 Mbp assembly]

Page 28: Genome assembly: then and now — v1.1

N50 length

[Figure: the same assembly without two of the 5 Mbp scaffolds. Scaffold lengths in Mbp: 70, 25, 20, 10, 10, 5, 5, 15, 15, 15. Assembly size: 190 Mbp]

Page 31: Genome assembly: then and now — v1.1

N50 for two assemblies

Assembly size   N50
208 Mbp         15 Mbp
190 Mbp         25 Mbp

Page 36: Genome assembly: then and now — v1.1

NG50 for two assemblies

Expected genome size = 250 Mbp

Assembly size   NG50
208 Mbp         15 Mbp
190 Mbp         15 Mbp
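The N50 and NG50 figures above can be reproduced with a few lines of Python. This is a minimal sketch using the usual definitions: sort scaffolds from longest to shortest, add up their lengths, and report the length of the scaffold at which the running total first reaches half of the assembly size (N50) or half of the expected genome size (NG50). The function name `nx` and the example scaffold lengths are illustrative assumptions, not taken from any particular tool.

```python
def nx(lengths, fraction=0.5, genome_size=None):
    """Return N(x), or NG(x) when an expected genome size is supplied.

    lengths     -- scaffold (or contig) lengths, in any consistent unit
    fraction    -- 0.5 gives N50 / NG50
    genome_size -- expected genome size; if None, the assembly size is used
    """
    reference = genome_size if genome_size is not None else sum(lengths)
    target = fraction * reference
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None  # the assembly is too small to reach the target


# Hypothetical example: ten scaffolds totalling 190 Mbp (lengths in Mbp)
scaffolds = [70, 25, 20, 15, 15, 15, 10, 10, 5, 5]

n50 = nx(scaffolds)                      # half of 190 Mbp
ng50 = nx(scaffolds, genome_size=250)    # half of 250 Mbp
print(f"N50 = {n50} Mbp, NG50 = {ng50} Mbp")
```

Because NG50 is tied to the expected genome size rather than to each assembly's own size, it stays comparable between assemblies of different total length.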

Page 37: Genome assembly: then and now — v1.1

You should check that high N50 values are not simply due to lots of Ns in the scaffolds!

Page 41: Genome assembly: then and now — v1.1

Assembly 'x'

Size: 859 Mbp

Number of scaffolds: 28

N50 = 70.3 Mbp

Ns = 90.6%!
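A quick way to catch an assembly like this is to report the fraction of Ns next to the N50. The sketch below is illustrative only: the function name `n_fraction` is made up here, and it assumes the scaffold sequences have already been read into Python strings (parsing a FASTA file is left out).

```python
def n_fraction(sequences):
    """Fraction of all bases that are N (case-insensitive) across sequences."""
    total = sum(len(seq) for seq in sequences)
    if total == 0:
        return 0.0
    n_count = sum(seq.upper().count("N") for seq in sequences)
    return n_count / total


# Hypothetical toy scaffolds, each mostly gap
scaffolds = ["ACGTNNNNNN", "nnnnACGTAC"]
print(f"Ns = {n_fraction(scaffolds):.1%} of assembly")
```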

Page 44: Genome assembly: then and now — v1.1

Basic assembly metrics

Metric            Description
Assembly size     With or without very short contigs?
N50 / NG50        For contigs and/or scaffolds
Coverage          When compared to a reference sequence
Errors            Base errors from alignment to reference sequence and/or input read data
Number of genes   From comparison to reference transcriptome and/or set of known genes

And many, many more...

Page 46: Genome assembly: then and now — v1.1

Genome assembly

Back in the day...

1998

Page 53: Genome assembly: then and now — v1.1

Genome assembly: then

Genetic maps ✓

Physical maps ✓

Understanding of target genome ✓

Haploid / low heterozygosity genome ✓

Accurate & long reads ✓

Resources (time, money, people) ✓

Page 54: Genome assembly: then and now — v1.1

So what was the result of spending millions of dollars to assemble genomes of well-characterized species, with accurate long reads, and detailed maps?

Page 57: Genome assembly: then and now — v1.1

Arabidopsis thaliana

✤ 2000: published genome size = 125 Mbp

✤ 2007: genome size = 157 Mbp

✤ 2012: genome size = 135 Mbp

✤ Amount sequenced = 119 Mbp

✤ Ns = 0.2% of genome

Page 59: Genome assembly: then and now — v1.1

Two views of the same gene

Top: from genome sequence view on TAIR web site
Bottom: from gene sequence file on TAIR FTP site

Page 61: Genome assembly: then and now — v1.1

Drosophila melanogaster

✤ Genome published 2000

✤ Heterochromatin finished 2007

✤ Ns = 4% of genome

Page 65: Genome assembly: then and now — v1.1

Caenorhabditis elegans

✤ Genome published 1998

✤ 2004: last N removed

✤ 1998–2014: genome sequence changes

  ✤ 558 insertions

  ✤ 230 deletions

  ✤ 614 substitutions

[Bracket grouping the three counts above: Nov 2012]

Page 67: Genome assembly: then and now — v1.1

Saccharomyces cerevisiae

✤ Genome published 1997

✤ 12 Mbp genome

✤ 1,653 changes to genome since 1997

✤ Last changes made in 2011

Page 68: Genome assembly: then and now — v1.1

Genome assembly: then

Genetic maps ✓

Physical maps ✓

Understanding of target genome ✓

Haploid / low heterozygosity genome ✓

Accurate & long reads ✓

Resources (time, money, people) ✓

Page 69: Genome assembly: then and now — v1.1

Genome assembly: now

Genetic maps ✗

Physical maps ✗

Understanding of target genome ✗

Haploid / low heterozygosity genome ✗

Accurate & long reads ✗

Resources (time, money, people) ✗

Page 70: Genome assembly: then and now — v1.1

Assembling & finishing a genome is not easy!

Page 71: Genome assembly: then and now — v1.1

Assemblathons

A new idea is born

Image from flickr.com/photos/dullhunk/4422952630

Page 73: Genome assembly: then and now — v1.1

If you sequence 10,000 genomes...
...you need to assemble 10,000 genomes

Page 78: Genome assembly: then and now — v1.1

How many assembly tools are out there?

bambus2, Ray, Celera, MIRA, ALLPATHS-LG, SGA, Curtain, Metassembler, Phusion, ABySS, Amos, Arapan, CLC, Cortex, DNAnexus, DNA Dragon, Edena, Forge, Geneious, IDBA, Newbler, PRICE, PADENA, PASHA, Phrap, TIGR, Sequencher, SeqMan NGen, SHARCGS, SOPRA, SSAKE, SPAdes, Taipan, VCAKE, Velvet, Arachne, PCAP, GAM, Monument, Atlas, ABBA, Anchor, ATAC, Contrail, DecGPU, GenoMiner, Lasergene, PE-Assembler, Pipeline Pilot, QSRA, SeqPrep, SHORTY, fermi, Telescoper, Quast, SCARPA, Hapsembler, HapCompass, HaploMerger, SWiPS, GigAssembler, MSR-CA, MaSuRCA, GARM, Cerulean, TIGRA, ngsShoRT, PERGA, SOAPdenovo, REAPR, FRCBam, EULER-SR, SSPACE, Opera, mip, gapfiller, image, PBJelly, HGAP, FALCON, Dazzler, GGAKE, A5, CABOG, SHRAP, SR-ASM, SuccinctAssembly, SUTTA, Ragout, Tedna, Trinity, SWAP-Assembler, SILP3, AutoAssemblyD, KGBAssembler, MetAMOS, iMetAMOS, MetaVelvet-SL, KmerGenie, Nesoni, Pilon, Platanus, CGAL, GAGM, Enly, BESST, Khmer, GRIT, IDBA-MTP, dipSPAdes, WhatsHap, SHEAR, ELOPER, OMACC

Which is the best?

Page 83: Genome assembly: then and now — v1.1

Comparing assemblers

✤ Can't fairly compare two assemblers if they:

  ✤ produced assemblies from different species

  ✤ assembled same species, but used sequence data from different sequencing technologies

  ✤ used same sequencing technologies but have different sequence libraries

✤ Even using different options for the same assembler may produce very different assemblies!

Page 85: Genome assembly: then and now — v1.1

The PRICE genome assembler has 52 command-line options!

How many of them are you going to learn?

Page 86: Genome assembly: then and now — v1.1

A genome assembly competition

Page 87: Genome assembly: then and now — v1.1

Genome assembly contests

An attempt to standardize some aspects of the genome assembly process

Page 88: Genome assembly: then and now — v1.1

Assemblathon 1

✤ 2010–2011

✤ Used synthetic data

✤ Small genome (~100 Mbp)

✤ We knew the answer

Page 89: Genome assembly: then and now — v1.1

Here we go again

Page 91: Genome assembly: then and now — v1.1

                 Type of data   Number of genomes   Size of genomes   Do we know the answer?
Assemblathon 1   Synthetic      1                   Small             ✓
Assemblathon 2   Real           3                   Large             ✗

Page 92: Genome assembly: then and now — v1.1

Melopsittacus undulatus

Boa constrictor constrictor

Maylandia zebra

Page 93: Genome assembly: then and now — v1.1

Bird

Snake

Fish

Page 95: Genome assembly: then and now — v1.1

Why these three species?

Because they were there

Page 99: Genome assembly: then and now — v1.1

Assemble this!

Species   Estimated genome size   Illumina              Roche 454           PacBio
Bird      1.2 Gbp                 285x (14 libraries)   16x (3 libraries)   10x (2 libraries)
Fish      1.0 Gbp                 192x (8 libraries)    -                   -
Snake     1.6 Gbp                 125x (4 libraries)    -                   -

Page 102: Genome assembly: then and now — v1.1

Who took part?

21 teams
43 assemblies
52,013,623,777 bp of sequence

Page 104: Genome assembly: then and now — v1.1

Entries

Species   Competitive entries   Evaluation entries
Bird      12                    3
Fish      10                    6
Snake     12                    0

Page 109: Genome assembly: then and now — v1.1

Goals

✤ Assess 'quality' of assemblies

✤ Define quality!

✤ Produce ranking of assemblies for each species

✤ Produce ranking of assemblers across species?

Page 110: Genome assembly: then and now — v1.1

Who did what?

Person/group                    Jobs
Me, Ian Korf, and Joseph Fass   Perform various analyses of all assemblies
David Schwarz et al.            Produce & evaluate optical maps
Jay Shendure et al.             Produce Fosmid sequences (bird & snake only)
Martin Hunt & Thomas Otto       Perform REAPR analysis
Dent Earl & Benedict Paten      Help with meta-analysis of final rankings

Page 111: Genome assembly: then and now — v1.1

91 co-authors!

flickr.com/photos/jamescridland/613445810

Page 112: Genome assembly: then and now — v1.1

Results!

Page 113: Genome assembly: then and now — v1.1

Lots of results!

Page 115: Genome assembly: then and now — v1.1

102 different metrics!

Page 116: Genome assembly: then and now — v1.1

10 key metrics

Page 117: Genome assembly: then and now — v1.1

Key Metric Description

1 NG50 scaffold length

2 NG50 contig length

3 Amount of assembly in 'gene-sized' scaffolds

4 Number of 'core genes' present

5 Fosmid coverage

6 Fosmid validity

7 Short-range scaffold accuracy

8 Optical map: level 1

9 Optical map: levels 1–3

10 REAPR summary score

Page 119: Genome assembly: then and now — v1.1

1) Scaffold NG50 lengths

✤ Can calculate NG50 length for each assembly

✤ But also calculate NG60, NG70 etc.

✤ Plot all results as a graph
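The idea of extending NG50 to a whole series can be sketched as follows. This is an illustration under stated assumptions, not the code used for the Assemblathon analysis: the function name `ng_series`, the 10% step, and the example scaffold lengths are all hypothetical. Each value could then be plotted against its percentage to give one curve per assembly.

```python
def ng_series(lengths, genome_size, percents=range(10, 100, 10)):
    """Map each percentage to its NG(x) length.

    A value of 0 means the assembly never covers that share of the
    expected genome size.
    """
    ordered = sorted(lengths, reverse=True)
    series = {}
    for pct in percents:
        target = genome_size * pct / 100
        running = 0
        series[pct] = 0
        for length in ordered:
            running += length
            if running >= target:
                series[pct] = length
                break
    return series


# Hypothetical assembly (lengths in Mbp) against a 250 Mbp expected genome
scaffolds = [70, 25, 20, 15, 15, 15, 10, 10, 5, 5]
for pct, length in ng_series(scaffolds, genome_size=250).items():
    print(f"NG{pct}: {length} Mbp")
```

A curve like this shows more than a single number does: two assemblies with the same NG50 can diverge sharply at NG20 or NG80.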

Page 120: Genome assembly: then and now — v1.1

1) Scaffold NG50 lengths

Page 121: Genome assembly: then and now — v1.1

2) Contig vs scaffold NG50

Page 128: Genome assembly: then and now — v1.1

3) Gene-sized scaffolds

✤ Some assembly folks get a little obsessed by length!

✤ How long is 'long enough' for a scaffold?

✤ What if you just wanted to find genes?

✤ Average vertebrate gene = ~25 Kbp
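One way to turn the 'gene-sized' idea into a number is to ask how much of an assembly sits in scaffolds at least as long as an average gene. The sketch below assumes the 25 Kbp threshold from the slide and divides by the assembly's own size; the slides do not say which denominator the real metric used, and the function name and example lengths are hypothetical.

```python
def fraction_in_gene_sized_scaffolds(lengths, min_length=25_000):
    """Share of total assembly length held in scaffolds >= min_length bp."""
    total = sum(lengths)
    if total == 0:
        return 0.0
    long_enough = sum(length for length in lengths if length >= min_length)
    return long_enough / total


# Hypothetical scaffold lengths in bp
scaffolds = [100_000, 30_000, 20_000, 10_000]
share = fraction_in_gene_sized_scaffolds(scaffolds)
print(f"{share:.1%} of the assembly is in gene-sized scaffolds")
```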

Page 129: Genome assembly: then and now — v1.1

3) Gene-sized scaffolds

Page 134: Genome assembly: then and now — v1.1

4) Core genes

✤ Used CEGMA (Core Eukaryotic Gene Mapping Approach)

✤ CEGMA uses a set of 458 'Core Eukaryotic Genes' (CEGs)

✤ CEGs are conserved in: S. cerevisiae, S. pombe, A. thaliana, C. elegans, D. melanogaster, and H. sapiens

✤ How many full-length CEGs are in each assembly?

Page 136: Genome assembly: then and now — v1.1

4) Core genes

Core genes (out of 458)    Bird   Fish   Snake
Best individual assembly   420    436    438
Across all assemblies      442    455    454

Page 137: Genome assembly: then and now — v1.1

4) Core genes

Page 138: Genome assembly: then and now — v1.1

4) Core genes

ABYSS MNTVLTRANSLFAFSLSVMAALTFGCFITTAFKERTVPVSIAVSRVML-------KNVED
BCM   MNTVLTRANSLFAFSLSVMAALTFGCFITTAFKERTVPVSIAVSRVML-------KNVED
CRACS MNTVLTRANSLFAFSLSVMAALTFGCFITTAFKERTVPVSIAVSRVML-------KNVED
CURT  MNTVLTRANSLFAFSLSVMAALTFGCFITTAFKERTVPVSIAVSRVML-------KNVED
GAM   MNTVLTRANSLFAFSLSVMAALTFGCFITTAFKERTVPVSIAVSRVMLFYEVRKIKNVED
MERAC MNTVLTRANSLFAFSLSVMAALTFGCFITTAFKERTVPVSIAVSRVML-------KNVED
PHUS  MNTVLTRANSLFAFSLSVMAALTFGCFITTAFKERTVPVSIAVSRVML-------KNVED
RAY   MNTVLTRANSLFAFSLSVMAALTFGCFITTAFKERTVPVSIAVSRVML-------KNVED
SGA   MNTVLTRANSLFAFSLSVMAALTFGCFITTAFKERTVPVSIAVSRVML-------KNVED
SYMB  MNTVLTRANSLFAFSLSVMAALTFGCFITTAFKERTVPVSIAVSRVMLFYEVRKIKNVED
SOAP  MNTVLTRANSLFAFSLSVMAALTFGCFITTAFKERTVPVSIAVSRVML-------KNVED
      ************************************************       *****

ABYSS FTGPGERSDLGIITFNISANILYYKHSSLFPNIFDWNVKQLFLYLSAEYSTKNN------
BCM   FTGPGERSDLGIITFNISANILYYKHSSLFPNIFDWNVKQLFLYLSAEYSTKNN------
CRACS FTGPGERSDLGIITFNISANILYYKHSSLFPNIFDWNVKQLFLYLSAEYSTKNN------
CURT  FTGPGERSDLGIITFNISANILYYKHSSLFPNIFDWNVKQLFLYLSAEYSTKNN------
GAM   FTGPGERSDLGIITFNISANILYYKHSSLFPNIFDWNVKQLFLYLSAEYSTKNNLPHTHI
MERAC FTGPGERSDLGIITFNISANILYYKHSSLFPNIFDWNVKQLFLYLSAEYSTKNN------
PHUS  FTGPGERSDLGIITFNISANILYYKHSSLFPNIFDWNVKQLFLYLSAEYSTKNN------
RAY   FTGPGERSDLGIITFNISANILYYKHSSLFPNIFDWNVKQLFLYLSAEYSTKNN------
SGA   FTGPGERSDLGIITFNISANILYYKHSSLFPNIFDWNVKQLFLYLSAEYSTKNN------
SYMB  FTGPGERSDLGIITFNISANILYYKHSSLFPNIFDWNVKQLFLYLSAEYSTKNN------
SOAP  FTGPGERSDLGIITFNISANILYYKHSSLFPNIFDWNVKQLFLYLSAEYSTKNN------
      ******************************************************

ABYSS ---ALNQVVLWDKIILRGDDPNLLLKDMKSKYFFFDDGNGLKGNRNVTLTLSWNVVPNAG
BCM   ---ALNQVVLWDKIILRGDDPNLLLKDMKSKYFFFDDGNGLKGNRNVTLTLSWNVVPNAG
CRACS ---ALNQVVLWDKIILRGDDPNLLLKDMKSKYFFFDDGNGLKGNRNVTLTLSWNVVPNAG
CURT  ---ALNQVVLWDKIILRGDDPNLLLKDMKSKYFFFDDGNGLKGNRNVTLTLSWNVVPNAG
GAM   YGHALNQVVLWDKIILRGDDPNLLLKDMKSKYFFFDDGNGLK------------------
MERAC ---ALNQVVLWDKIILRGDDPNLLLKDMKSKYFFFDDGNGLKGNRNVTLTLSWNVVPNAG
PHUS  ---ALNQVVLWDKIILRGDDPNLLLKDMKSKYFFFDDGNGLKGNRNVTLTLSWNVVPNAG
RAY   ---ALNQVVLWDKIILRGDDPNLLLKDMKSKYFFFDDGNGLKGNRNVTLTLSWNVVPNAG
SGA   ---ALNQVVLWDKIILRGDDPNLLLKDMKSKYFFFDDGNGLKGNRNVTLTLSWNVVPNAG
SYMB  ---ALNQVVLWDKIILRGDDPNLLLKDMKSKYFFFDDGNGLKGNRNVTLTLSWNVVPNAG
SOAP  ---ALNQVVLWDKIILRGDDPNLLLKDMKSKYFFFDDGNGLKGNRNVTLTLSWNVVPNAG
         ***************************************

Page 144: Genome assembly: then and now — v1.1

8 & 9) Optical maps

✤ Stretch out DNA

✤ Cut with restriction enzymes

✤ Note lengths of fragments

✤ Compare to in silico digest of scaffolds

✤ Not all scaffolds suitable for analysis
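The in silico digest step can be sketched in a few lines: find every occurrence of a restriction site in a scaffold and record the lengths of the pieces between cuts, which can then be compared with the fragment lengths measured on the optical map. The slides do not name the enzyme, so the EcoRI site (GAATTC, cut after the first G) is used here purely as an assumed example, and the function name and test sequence are hypothetical. Only one strand is scanned, which is sufficient for a palindromic site like this one.

```python
def digest_fragment_lengths(sequence, site="GAATTC", cut_offset=1):
    """Lengths of the fragments left after cutting at every site.

    cut_offset is the position of the cut within the recognition site,
    counted from the start of the site.
    """
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Hypothetical 20 bp scaffold containing two sites
scaffold = "AAGAATTCCCCCGAATTCTT"
print(digest_fragment_lengths(scaffold))
```

Runs of Ns inside a scaffold hide any sites in the gap and make the fragment spanning it only as accurate as the gap-size estimate, which may be one reason not every scaffold is suitable for this comparison.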

Page 145: Genome assembly: then and now — v1.1

8 & 9) Optical maps

Image from University of Wisconsin-Madison

Page 146: Genome assembly: then and now — v1.1

8 & 9) Optical maps

Page 149: Genome assembly: then and now — v1.1

What does this all mean?

Page 150: Genome assembly: then and now — v1.1

102 metrics per assembly

10 key metrics

1 final ranking

Page 153: Genome assembly: then and now — v1.1

Assembly   Number of core genes   Rank   Z-score
CRACS      438                    1      +0.68
SYMB       436                    2      +0.59
PHUS       435                    3      +0.54
BCM        434                    4      +0.49
SGA        433                    5      +0.44
MERAC      430                    6      +0.30
ABYSS      429                    7      +0.25
SOAP       428                    8      +0.21
RAY        422                    9      –0.08
GAM        415                    10     –0.41
CURT       360                    11     –3.02
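The Z-scores in this table can be recomputed from the core gene counts. The sketch below assumes each Z-score is (value − mean) / standard deviation, with the population standard deviation taken over the eleven assemblies listed; that choice is an assumption, but it gives +0.68 for CRACS and −3.02 for CURT, in line with the slide. The function name `z_scores` is illustrative.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Z-score of each value relative to the whole list (population SD)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(value - mu) / sigma for value in values]


# Core gene counts copied from the table above
core_genes = {
    "CRACS": 438, "SYMB": 436, "PHUS": 435, "BCM": 434, "SGA": 433,
    "MERAC": 430, "ABYSS": 429, "SOAP": 428, "RAY": 422, "GAM": 415,
    "CURT": 360,
}

for name, z in zip(core_genes, z_scores(list(core_genes.values()))):
    print(f"{name:6s} {z:+.2f}")
```

Unlike a plain rank, the Z-score shows how far apart the assemblies are: ranks 1 to 8 sit within half a standard deviation of each other, while the last-placed assembly is a clear outlier.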

Page 163: Genome assembly: then and now — v1.1

What does this all mean?

Page 164: Genome assembly: then and now — v1.1

No really, what does this all mean?

Page 165: Genome assembly: then and now — v1.1

Some conclusions

✤ Very hard to find assemblers that performed well across all 10 key metrics

✤ Assemblers that perform well in one species do not always perform as well in another

✤ Bird & snake assemblies appear better than fish

✤ No real 'winner' for bird and fish

Page 166: Genome assembly: then and now — v1.1

SGA — best assembler for snake?

Page 169: Genome assembly: then and now — v1.1

Description                                    Rank of snake SGA assembly
NG50 scaffold length                           2
NG50 contig length                             5
Amount of assembly in 'gene-sized' scaffolds   7
Number of 'core genes' present                 5
Fosmid coverage                                2
Fosmid validity                                2
Short-range scaffold accuracy                  3
Optical map: level 1                           2
Optical map: levels 1–3                        1
REAPR summary score                            2

Page 172: Genome assembly: then and now — v1.1

Best assembler across species?

Assembler    Number of 1st places (out of 27)
BCM          5
Meraculous   4
Symbiose     4
Ray          3

Excluding evaluation entries

Page 173: Genome assembly: then and now — v1.1

Ray performance

Species Final ranking

Bird 7th

Fish 7th

Snake 9th

Page 180: Genome assembly: then and now — v1.1

BCM bird assemblies

Assembler           Final rank   NGS data used in assembly   Coverage Z-score   Validity Z-score   NG50 contig Z-score
BCM - evaluation    1            Illumina + 454              +2.0               +1.4               +1.5
BCM - competitive   2            Illumina + 454 + PacBio     –0.3               –0.8               +2.7

Page 183: Genome assembly: then and now — v1.1

BCM evaluation scaffold

NNNNNNNNNNNNNNNNNNN

BCM competition scaffold

NNNNNNNNNNNNNNNNNNN

PacBio sequence

Page 185: Genome assembly: then and now — v1.1

BCM evaluation scaffold

NNNNNNNNNNNNNNNNNNN

BCM competition scaffold

CGTCGNNATCNNGGTTACG

Mismatches from PacBio sequence penalized the alignment score more than matching unknown bases
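A toy scoring function shows how this can happen. Everything in this sketch is hypothetical: the reference sequence, the scoring values (match +1, mismatch −3, N scored as 0) and the function name are invented for illustration, and the slides do not say which tool or option was actually involved. The point is only that if unknown bases cost nothing while wrong bases are penalized, a gap filled with error-prone sequence can score worse than the gap left as Ns.

```python
def toy_alignment_score(assembly, reference, match=1, mismatch=-3, unknown=0):
    """Score two equal-length sequences column by column (no indels)."""
    score = 0
    for a, r in zip(assembly.upper(), reference.upper()):
        if a == "N":
            score += unknown      # unknown base: neither rewarded nor penalized
        elif a == r:
            score += match
        else:
            score += mismatch     # a wrong base costs more than an unknown one
    return score


reference = "CGACGTTATGAAGCTTACC"       # hypothetical 'true' sequence
gap_scaffold = "N" * 19                 # gap left unfilled
filled_scaffold = "CGTCGNNATCNNGGTTACG"  # gap filled, with some errors

print(toy_alignment_score(gap_scaffold, reference))
print(toy_alignment_score(filled_scaffold, reference))
```

With a milder mismatch penalty the filled scaffold would come out ahead, which is the sense in which a single scoring option can change a ranking.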

Page 186: Genome assembly: then and now — v1.1

The choice of one command-line option, used by one tool in the calculation of one key metric...

...probably made enough difference to drop the PacBio-containing assembly to 2nd place.
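
The effect described on the last two slides can be illustrated with a toy scoring function. This is a sketch only. The reference sequence, the score values (+1 match, 0 for an unknown base, -3 mismatch) and the name `toy_alignment_score` are all hypothetical, chosen to show how a scheme that treats Ns as neutral can rank an all-N gap above a gap filled with mostly correct sequence that contains a few errors. It is not the tool or the option used in the actual evaluation.

```python
def toy_alignment_score(reference, scaffold, match=1, mismatch=-3, unknown=0):
    """Score an ungapped, equal-length alignment base by base."""
    if len(reference) != len(scaffold):
        raise ValueError("sequences must be the same length")
    score = 0
    for ref_base, base in zip(reference.upper(), scaffold.upper()):
        if base == "N" or ref_base == "N":
            score += unknown   # unknown bases are neither rewarded nor penalized
        elif base == ref_base:
            score += match
        else:
            score += mismatch
    return score


# Hypothetical 19 bp reference region (invented for this example).
reference = "CGACGTTAACAAGGATACC"
all_n = "N" * 19                      # gap left as unknown bases
gap_filled = "CGTCGNNATCNNGGTTACG"    # gap-filled scaffold from the slide

print(toy_alignment_score(reference, all_n))                    # 0
print(toy_alignment_score(reference, gap_filled))               # -1
print(toy_alignment_score(reference, gap_filled, mismatch=-1))  # 7
```

With the harsher mismatch penalty the all-N scaffold outscores the gap-filled one; with a milder penalty the order flips. That is the sense in which a single option can change a ranking.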

Page 187: Genome assembly: then and now — v1.1

Other conclusions

✤ Different metrics tell different stories

✤ Heterozygosity was a big issue for bird & fish assemblies

✤ Final rankings are very sensitive to changes in metrics

✤ N50 is a semi-useful predictor of assembly quality
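
Since N50 and NG50 come up repeatedly, here is a minimal sketch of how they are commonly computed: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size (N50) or half of an estimated genome size (NG50). The contig lengths and the genome size below are hypothetical.

```python
def nx0(lengths, total):
    """Length of the sequence at which the cumulative sum of lengths
    (sorted longest first) first reaches half of `total`."""
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # the sequences never cover half of `total`


def n50(lengths):
    """N50: the halfway point is measured against the assembly's own size."""
    return nx0(lengths, sum(lengths))


def ng50(lengths, genome_size):
    """NG50: the halfway point is measured against an estimated genome size."""
    return nx0(lengths, genome_size)


contigs = [80, 70, 50, 40, 30, 20]   # hypothetical contig lengths (kbp)

print(n50(contigs))         # 70
print(ng50(contigs, 400))   # 50
```

Because NG50 uses the same denominator for every assembly of a given genome, it is comparable across assemblies in a way that N50 is not.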

Page 192: Genome assembly: then and now — v1.1

Inter-specific differences matter

✤ The three species have genomes with different properties

  ✤ repeats

  ✤ heterozygosity

✤ The three genomes had very different NGS data sets

  ✤ Only bird had PacBio & 454 data

  ✤ Different insert sizes in short-insert libraries

Page 194: Genome assembly: then and now — v1.1

The Big Conclusion

"You can't always get what you want"
Sir Michael Jagger, 1969

Page 197: Genome assembly: then and now — v1.1

What comes next?

3?

Page 205: Genome assembly: then and now — v1.1

A wish list for Assemblathon 3

✤ Only have 1 species

✤ Teams have to 'buy' resources using virtual budgets

✤ Factor in CPU time/cost?

✤ Agree on metrics before evaluating assemblies!

✤ Encourage experimental assemblies

✤ Use new FASTG genome assembly file format

✤ Get someone else to write the paper!

Page 206: Genome assembly: then and now — v1.1

Intermission

Page 209: Genome assembly: then and now — v1.1

NGS must die!

‘NGS’ is used to refer to everything post-Sanger

Pyrosequencing was developed ~1996

Page 217: Genome assembly: then and now — v1.1

NGS madness

Next generation sequencing

aka second generation sequencing

but there’s also:

third generation sequencing

fourth generation sequencing

next-next generation sequencing

next-next-next generation sequencing

Page 219: Genome assembly: then and now — v1.1

NGS madness

Technology        | According to some papers… | According to other papers…
Complete Genomics | 2nd generation            | 3rd generation
Ion Torrent       | 2nd generation            | 3rd generation
PacBio            | 2nd generation            | 3rd generation
Oxford Nanopore   | 3rd generation            | 4th generation

Page 220: Genome assembly: then and now — v1.1

NGS madness

“PacBio is a 2.5th generation”

“Helicos lies between the transition of next-generation to third generation”

Page 223: Genome assembly: then and now — v1.1

NGS madness

There are different sequencing methodologies, and there are different sequencing platforms.

Use one or the other.

Or just say ‘current sequencing technologies’.

Page 224: Genome assembly: then and now — v1.1

Intermission

Page 225: Genome assembly: then and now — v1.1

My #1 piece of advice

flickr.com/julia_manzerova

Page 227: Genome assembly: then and now — v1.1

flickr.com/thomashawk

Look at your data!

Page 231: Genome assembly: then and now — v1.1

I looked at the shortest 10 sequences in 34 different genome assemblies…

Page 240: Genome assembly: then and now — v1.1

From a vertebrate genome assembly with 72,214 sequences…

Lengths of the 10 shortest sequences: 100, 100, 99, 88, 87, 76, 73, 63, 12, and 3 bp!
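
One way to run this kind of check on your own assembly is sketched below: read a FASTA file and report the shortest sequences. The function names and the example file name are hypothetical, and the parser assumes plain, uncompressed FASTA.

```python
import heapq


def fasta_lengths(lines):
    """Yield (sequence_id, length) for each record in FASTA-formatted lines."""
    seq_id, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, length
            header = line[1:].split()
            seq_id = header[0] if header else ""
            length = 0
        elif seq_id is not None:
            length += len(line)
    if seq_id is not None:
        yield seq_id, length


def shortest_sequences(lines, n=10):
    """Return the n shortest (sequence_id, length) pairs, shortest first."""
    return heapq.nsmallest(n, fasta_lengths(lines), key=lambda rec: rec[1])


# Tiny in-memory example. For a real file, pass an open file handle instead,
# e.g. `with open("assembly.fa") as fh: shortest_sequences(fh)`
# ("assembly.fa" is a placeholder name).
example = [">scaffold1 demo", "ACGTACGT", "ACG", ">scaffold2", "ACG", ">scaffold3", "A"]
result = shortest_sequences(example, n=2)
print(result)
```

Sequences only a few bases long are a cue to check whether a minimum-length filter should have been applied before the assembly was released.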

Page 243: Genome assembly: then and now — v1.1

Reasons to be cheerful

flickr.com/danielygo

Page 244: Genome assembly: then and now — v1.1

Data from Lex Nederbragt’s blog, June 2014

Page 246: Genome assembly: then and now — v1.1

Long-read technology

Moleculo read data from Illumina BaseSpace, July 2013

Page 247: Genome assembly: then and now — v1.1

Long-read technology

From https://flxlexblog.wordpress.com (Lex Nederbragt's blog)

PacBio data

Page 248: Genome assembly: then and now — v1.1

Long-read technology

MinION from Oxford Nanopore

Page 252: Genome assembly: then and now — v1.1

Where is the data?

Nick Loman published the first real-world MinION data on June 10th

Page 256: Genome assembly: then and now — v1.1

Single chromosome assembly?

Page 257: Genome assembly: then and now — v1.1

Tackling heterozygosity

The 1000 Genomes Project plans to sequence 15 'trios' at high depth

Page 258: Genome assembly: then and now — v1.1

Hi-C

✤ Nature Biotechnology, 31, 2013

  ✤ Burton et al.

  ✤ Selvaraj et al.

  ✤ Kaplan & Dekker

Page 260: Genome assembly: then and now — v1.1

Kwik-E-Assembler

acgtaacacaancac gggaacnnnacatta acnactagcataata nnnnnnnnnnaacac actttaaattatatc

The future of genome assembly

Page 266: Genome assembly: then and now — v1.1

The future of genome assembly

✤ At some point we will look back with embarrassment at this era.

✤ Assembly must, and will, get better, but...

✤ ...'perfect' genomes may remain elusive.

✤ Data management will remain an issue:

  ✤ the human genome -> human genomes -> tissue-specific genomes

Page 272: Genome assembly: then and now — v1.1

Summary

✤ There is no real consensus on how to make a good genome assembly

✤ Try different assemblers, try different command-line options

✤ Decide what it is you want to get out of a genome assembly

✤ Look at your input and output data

✤ Wait 5 years and come back, we’ll (probably) have solved everything!

Page 273: Genome assembly: then and now — v1.1

Resources

✤ Lex Nederbragt’s blog - https://flxlexblog.wordpress.com

✤ Nick Loman’s blog - http://pathogenomics.bham.ac.uk/blog/

✤ Assemblathon twitter feed - https://twitter.com/assemblathon