Everyday de novo assembly

FOR RESEARCH USE ONLY. Not for use in diagnostic procedures. Everyday de novo assembly GRC Assembly Workshop at Genome Informatics Deanna M. Church Senior Director of Applications Sep 19, 2016 @deannachurch

Upload: genome-reference-consortium

Post on 17-Jan-2017

TRANSCRIPT

Page 1: Everyday de novo assembly

FOR RESEARCH USE ONLY. Not for use in diagnostic procedures.

Everyday de novo assembly

GRC Assembly Workshop at Genome Informatics

Deanna M. Church

Senior Director of Applications

Sep 19, 2016

@deannachurch

Page 2: Everyday de novo assembly

Acknowledgements

The entire team at 10x

David Jaffe

Neil Weisenfeld

Vijay Kumar

Preyas Shah

NCBI:

Francoise Thibaud-Nissen

Valerie Schneider

Page 3: Everyday de novo assembly

Disclosures

Employee and shareholder: 10x Genomics

Shareholder: Personalis

10x Genomics products described are for Research Use Only. Not for use in diagnostic procedures.

Page 4: Everyday de novo assembly

Questions from the organizers

Are new assemblies using the reference?

Can they help make the reference better?

Do they make the reference obsolete?

Page 5: Everyday de novo assembly

Agenda

Why haven’t we always done de novo genome analysis?

Page 6: Everyday de novo assembly

Agenda

Why haven’t we always done de novo genome analysis?

What are Linked-Reads?

Page 7: Everyday de novo assembly

Agenda

Why haven’t we always done de novo genome analysis?

What are Linked-Reads?

How do Linked-Reads enable everyday de novo assembly?

Page 8: Everyday de novo assembly

Why haven’t we always done de novo genome analysis?

Page 9: Everyday de novo assembly

Kelly Howe, Lawrence Berkeley Laboratory

Page 10: Everyday de novo assembly

Reference quality is HARD

DOI: 10.1038/nature03001

Page 11: Everyday de novo assembly

Our actual genome: diploid

Page 12: Everyday de novo assembly

How we represent our genome: haploid

Page 13: Everyday de novo assembly

Current approach: averaging over haplotypes

Page 14: Everyday de novo assembly

Current approach: averaging over haplotypes

Page 15: Everyday de novo assembly

Current approach: averaging over haplotypes

Page 16: Everyday de novo assembly

Current approach: averaging over haplotypes

Page 17: Everyday de novo assembly

Problem: both alleles differ from each other

Page 18: Everyday de novo assembly

What are Linked-Reads?

Page 19: Everyday de novo assembly

Unlinked-Reads: short-range information

Page 20: Everyday de novo assembly

Linked-Reads: long-range information

Page 21: Everyday de novo assembly

Start with long molecules

NA19240

Page 22: Everyday de novo assembly

Making Linked-Reads

P5, 16 bp BC, R1, N-mer, gDNA insert

Page 23: Everyday de novo assembly

Making Linked-Reads

Long input molecule

Excess of sequenceable inserts randomly primed off each long molecule

P5, 16 bp BC, R1, N-mer, gDNA insert

Page 24: Everyday de novo assembly

Making Linked-Reads

Long input molecule (50 Kb)

Excess of sequenceable inserts randomly primed off each long molecule

P5, 16 bp BC, R1, N-mer, gDNA insert

30x sequence: ~35 fragments, ~0.2x coverage

Standard reference-based analysis recommendations

Page 25: Everyday de novo assembly

Making Linked-Reads

Long input molecule (50 Kb)

Excess of sequenceable inserts randomly primed off each long molecule

P5, 16 bp BC, R1, N-mer, gDNA insert

56x sequence: ~65 fragments, ~0.4x coverage

Supernova analysis recommendations
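
The fragment counts and per-molecule coverage quoted on the two "Making Linked-Reads" slides can be related with simple arithmetic. The sketch below is illustrative only: the slides do not state a read length, so the 2 x 150 bp read pair per fragment is an assumption, chosen because it roughly reproduces the quoted ~0.2x and ~0.4x. The function and constant names are hypothetical.

```python
def per_molecule_coverage(fragments, bases_per_fragment, molecule_length):
    """Read coverage of one long molecule by the reads that share its barcode."""
    return fragments * bases_per_fragment / molecule_length


# Assumption (not from the slides): each fragment yields a 2 x 150 bp read pair.
BASES_PER_FRAGMENT = 2 * 150
# From the slides: a 50 Kb long input molecule.
MOLECULE_LENGTH = 50_000

# ~35 fragments per molecule (30x sequence, reference-based analysis)
standard = per_molecule_coverage(35, BASES_PER_FRAGMENT, MOLECULE_LENGTH)
# ~65 fragments per molecule (56x sequence, Supernova)
supernova = per_molecule_coverage(65, BASES_PER_FRAGMENT, MOLECULE_LENGTH)

print(f"reference-based: {standard:.2f}x per molecule")
print(f"Supernova:       {supernova:.2f}x per molecule")
```

Each molecule is therefore only sparsely sequenced; the long-range information comes from the shared barcode, not from covering the molecule end to end.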

Page 26: Everyday de novo assembly

Synthetic Long Reads: less physical coverage

[Figure: sequencing cost and physical coverage; labels A, B, C]

Page 27: Everyday de novo assembly

Linked-Reads: greater physical coverage

[Figure: sequencing cost and physical coverage; labels A, B, C]

Page 28: Everyday de novo assembly

Example: molecule vs read coverage

150X avg molecule coverage

A given genomic locus will have 150X avg molecule depth, and 30X avg read depth

(150X molecule depth) x (0.2X read coverage per molecule) = 30X read depth

Chr13: BRCA2

>30X avg read coverage
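
The relationship on this slide between molecule depth (physical coverage) and read depth can be written as a pair of helper functions. This is a minimal sketch using only the numbers on the slide; the function names are hypothetical.

```python
def read_depth(molecule_depth, coverage_per_molecule):
    """Average read depth at a locus, given how many molecules span it
    and how densely each molecule is sequenced."""
    return molecule_depth * coverage_per_molecule


def molecule_depth(read_depth_x, coverage_per_molecule):
    """Inverse relation: molecules spanning a locus for a given read depth."""
    return read_depth_x / coverage_per_molecule


# Numbers from the slide: 150X molecule depth, 0.2X read coverage per molecule.
print(f"read depth: {read_depth(150, 0.2):.0f}X")
print(f"molecule depth: {molecule_depth(30, 0.2):.0f}X")
```

The point of the example is that a modest read depth still places many independent long molecules over every locus.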

Page 29: Everyday de novo assembly

Generating Linked-Reads

Start with:

HMW gDNA, 100 Kb+ molecules

1.0 ng input DNA = 300 copies of the genome

0.5 ng DNA = 150 copies of the genome, partitioned into >1M GEMs

DNA, Oil, Barcoded Primer Library, Enzyme, Collect
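
The "ng of DNA to genome copies" conversion on this slide can be checked from first principles. The sketch below assumes an average mass of about 650 g/mol per base pair of double-stranded DNA and a 3.2 Gb haploid human genome; neither value appears on the slides, and the names are hypothetical.

```python
AVOGADRO = 6.022e23           # molecules per mole
BP_MASS_G_PER_MOL = 650.0     # assumption: average mass of one dsDNA base pair
HAPLOID_GENOME_BP = 3.2e9     # assumption: human haploid genome size


def genome_copies(mass_ng, genome_bp=HAPLOID_GENOME_BP):
    """Approximate number of haploid genome copies in a given mass of DNA."""
    grams_per_genome = genome_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return mass_ng * 1e-9 / grams_per_genome


for mass in (1.0, 0.5):
    print(f"{mass} ng is roughly {genome_copies(mass):.0f} haploid genome copies")
```

Under these assumptions 1.0 ng comes out a little under 300 copies, which agrees with the rounded figures on the slide.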

Page 30: Everyday de novo assembly

How do Linked-Reads enable everyday de novo assembly?

Page 31: Everyday de novo assembly

Assembly made easy

BCL -> Supernova de novo assembly -> FASTA

NA19240, 1200M reads

1 library, 1.25 ng input

1 server, 348 GB memory, 2 days compute

http://www.biorxiv.org/content/early/2016/08/19/070425

Page 32: Everyday de novo assembly

Assembly made easy

BCL -> Supernova de novo assembly -> FASTA

NA19240, 1200M reads

1 library, 1.25 ng input

1 server (28 cores), 348 GB memory, 2 days compute

http://www.biorxiv.org/content/early/2016/08/19/070425

Page 33: Everyday de novo assembly

Assembly made easy

BCL -> Supernova de novo assembly -> FASTA

NA19240, 1200M reads

1 library, 1.25 ng input

1 server (28 cores), 348 GB memory, 2 days compute

Measure                        Value
Number of scaffolds >= 10 Kb   1.17 K
Edge N50                       17.45 Kb
Contig N50                     118.8 Kb
Phase block N50                9.3 Mb
Scaffold N50                   16.4 Mb

http://www.biorxiv.org/content/early/2016/08/19/070425
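
The N50 figures in the table above (edge, contig, phase block, scaffold) all use the same statistic applied to different sets of sequence lengths. A minimal sketch of the usual N50 definition follows; it is not taken from Supernova, and the toy lengths in the usage line are made up for illustration.

```python
def n50(lengths):
    """Return the length L such that sequences of length >= L together
    contain at least half of the total bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("N50 is undefined for an empty collection")


# Hypothetical contig lengths, not data from the slides.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))
```

Because the statistic is weighted by bases rather than by sequence count, a few very long scaffolds dominate it, which is why scaffold N50 can reach megabases while many short scaffolds remain.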

Page 34: Everyday de novo assembly

Performance over multiple samples

http://www.biorxiv.org/content/early/2016/08/19/070425

sample   ethnicity  sex  cov (x)  frag  contig N50 (kb)  scaffold N50 (Mb)  phase block N50 (Mb)  gap
NA19238  YRI        F    56       115   114.6            18.7               8                     2.1
NA19240  YRI        F    56       125   118.8            16.4               9.3                   2.3
HG00733  PR         F    56       106   123.6            17.8               3.4                   2.0
HG00512  HAN        M    56       102   113.2            15.4               2.7                   2.2
NA24385  AJ         M    56       120   106.4            15.1               4.2                   2.6
HGP      EUR        M    56       139   120.2            18.6               4.5                   2.5
NA12878  EUR        F    56       92    118.5            16.4               2.8                   2.9

Page 35: Everyday de novo assembly

High quality assembly at lower coverage

[Figure: three panels plotting Contig N50 (kb), Scaffold N50 (Mb), and Phase Block N50 (Mb) against number of reads (millions), from 500 to 1,300]
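
The read counts on the x-axis of these plots can be translated into approximate sequence coverage. The sketch below assumes 150 bp reads and a 3.2 Gb genome, neither of which is stated on the slides; under those assumptions 1200M reads works out to about 56x, matching the coverage quoted elsewhere in the deck. The names are hypothetical.

```python
READ_LENGTH_BP = 150       # assumption: read length is not given on the slides
GENOME_SIZE_BP = 3.2e9     # assumption: approximate human genome size


def sequence_coverage(read_count, read_length_bp=READ_LENGTH_BP,
                      genome_size_bp=GENOME_SIZE_BP):
    """Average sequence coverage implied by a number of reads."""
    return read_count * read_length_bp / genome_size_bp


# Read counts (in millions) spanning the plotted range, plus the 1200M run.
for millions in (500, 700, 900, 1100, 1200, 1300):
    coverage = sequence_coverage(millions * 1e6)
    print(f"{millions:>5}M reads is about {coverage:.1f}x")
```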

Page 36: Everyday de novo assembly

De novo performance drastically improves with increased DNA length

[Figure: three panels plotting Contig N50, Scaffold N50 (Mb), and Phase Block N50 against DNA length, from 0 to 60,000]

Page 37: Everyday de novo assembly

Supernova Assembler

stuff

separate assemblies of homologous loci

http://www.biorxiv.org/content/early/2016/08/19/070425

Page 38: Everyday de novo assembly

Assembly architecture = phase blocks

megabubble, megabubble, megabubble

multi-Mb phase blocks

many-Mb scaffold

microstructure

• bubbles, often at indeterminate poly-A

• short gaps, often at poly-A

Page 39: Everyday de novo assembly

Assembly assessment

[Figure: percent of GRCh37 100-mers missing per assembly (axis 0 to 25), shown for haploid and diploid versions. Supernova (10x) assemblies: NA19238, NA19240, HG00733, HG00512, NA24385, HGP, NA12878. Other methods: YH, NA12878, NA12878, NA12878, NA24385, NA24143]
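
The assessment metric on this slide, the percentage of reference 100-mers absent from an assembly, can be illustrated on toy strings. This is a sketch of the general idea only, not the procedure used for the slide: it treats a k-mer and its reverse complement as the same k-mer and skips k-mers containing non-ACGT symbols, both of which are assumptions. All names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(sequence, k):
    """Set of k-mers in a sequence, each stored as the lexicographically
    smaller of the k-mer and its reverse complement."""
    seq = sequence.upper()
    result = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers with N or other symbols
            reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
            result.add(min(kmer, reverse_complement))
    return result


def percent_missing(reference, assembly, k=100):
    """Percentage of distinct reference k-mers not found in the assembly."""
    ref = canonical_kmers(reference, k)
    if not ref:
        raise ValueError("reference contains no valid k-mers")
    asm = canonical_kmers(assembly, k)
    return 100.0 * len(ref - asm) / len(ref)


# Toy example with k=5; real use would read FASTA files and use k=100.
print(f"{percent_missing('AAAAACCCCC', 'AAAAA', k=5):.1f}% missing")
```

For a diploid assembly, one plausible reading of the slide is that k-mers from both haplotypes are pooled before the comparison, which would explain why the diploid bars show fewer missing 100-mers than the haploid ones.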

Page 40: Everyday de novo assembly

Comparison to truth data

Page 41: Everyday de novo assembly

Improving the reference assembly?

GRCh38: chr6 (NC_000006.12)

NA12878, hap0, scaf.21653 (prev1.1)

260 Kb of new sequence

Page 42: Everyday de novo assembly

Better genotype reconstruction

chrX:6,219,000-6,220,500 (GRCh38)

NLGN4X (neuroligin 4, X-linked)

Page 43: Everyday de novo assembly

Better genotype reconstruction

chrX:6,218,359-6,221,000 (GRCh38)

Page 44: Everyday de novo assembly

Questions from the organizers

Are new assemblies using the reference?

Supernova: de novo assembly, diploid reconstruction

Assembly construction: NO

Assembly analysis: Yes

Page 45: Everyday de novo assembly

Questions from the organizers

Can they help make the reference better?

Yes

Supernova: individual genome reconstruction

Contributing new sequences to population graph

Page 46: Everyday de novo assembly

Questions from the organizers

Do they make the reference obsolete?

NO

Supernova: not reference assemblies

Better individual genome reconstruction

Page 47: Everyday de novo assembly

Thanks!