Everyday de novo diploid assembly

FOR RESEARCH USE ONLY. Not for use in diagnostic procedures. © 10x Genomics, Inc. 2016 Everyday de novo diploid assembly Deanna M. Church Oct, 2016 @deannachurch

Upload: genome-reference-consortium

Post on 17-Jan-2017

TRANSCRIPT

Page 1: Everyday de novo diploid assembly

FOR RESEARCH USE ONLY. Not for use in diagnostic procedures. © 10x Genomics, Inc. 2016

Everyday de novo diploid assembly

Deanna M. Church, Oct 2016

@deannachurch

Page 2: Everyday de novo diploid assembly

Disclosures

Employee and Shareholder: 10x Genomics

Shareholder: Personalis

10x Genomics products described are for Research Use Only. Not for use in diagnostic procedures.

Page 3: Everyday de novo diploid assembly

Acknowledgements

The entire team at 10x

David Jaffe, Neil Weisenfeld, Vijay Kumar, Preyas Shah, Patrick Marks

Page 4: Everyday de novo diploid assembly

Agenda

• Why haven’t we always done de novo assembly on every sample?

Page 5: Everyday de novo diploid assembly

Agenda

• Why haven’t we always done de novo assembly on every sample?
• What are Linked-Reads?

Page 6: Everyday de novo diploid assembly

Agenda

• Why haven’t we always done de novo assembly on every sample?
• What are Linked-Reads?
• What does everyday de novo assembly enable today?

Page 7: Everyday de novo diploid assembly

Why haven’t we always done de novo genome analysis?

Page 8: Everyday de novo diploid assembly

Page 9: Everyday de novo diploid assembly

Page 10: Everyday de novo diploid assembly

Page 11: Everyday de novo diploid assembly

Current approach: averaging over haplotypes

Page 12: Everyday de novo diploid assembly

Averaging over haplotypes fails with increased diversity

doi:10.1038/nature20098

AK1

Page 13: Everyday de novo diploid assembly

New technology reveals more information

Page 14: Everyday de novo diploid assembly

New technology reveals more information

doi:10.1038/nrg3933

Page 15: Everyday de novo diploid assembly

Public human assemblies to date

https://www.ncbi.nlm.nih.gov/assembly/organism/9606/latest/

Composite genomes
• GRCh38
• Celera (2)

Hydatidiform moles (single haplotypes)
• CHM1 (9)
• CHM13 (5)

Individual genomes
• NA12878 (9)
• HX1
• A/J Son
• A/J Mother
• A/J Father
• NA18507
• YH1
• HS1011
• AK1
• HuRef

• Lots of labor
• Lots of time
• Lots of coverage
• Lots of money

Page 16: Everyday de novo diploid assembly

What are Linked-Reads?

Page 17: Everyday de novo diploid assembly

Unlinked-Reads: short range information

Page 18: Everyday de novo diploid assembly

Linked-Reads: long range information

Page 19: Everyday de novo diploid assembly

Start with long molecules

NA19240

Page 20: Everyday de novo diploid assembly

Making Linked-Reads

P5 | 16 bp BC | R1 | N-mer | gDNA insert

Page 21: Everyday de novo diploid assembly

Making Linked-Reads

Long input molecule

Excess of sequenceable inserts randomly primed off each long molecule

P5 | 16 bp BC | R1 | N-mer | gDNA insert

Page 22: Everyday de novo diploid assembly

Making Linked-Reads

Long input molecule (50 Kb)

Excess of sequenceable inserts randomly primed off each long molecule

P5 | 16 bp BC | R1 | N-mer | gDNA insert

Standard reference based analysis recommendations: 30x sequence, ~35 fragments, ~0.2x coverage

Page 23: Everyday de novo diploid assembly

Making Linked-Reads

Long input molecule (50 Kb)

Excess of sequenceable inserts randomly primed off each long molecule

P5 | 16 bp BC | R1 | N-mer | gDNA insert

Supernova analysis recommendations: 56x sequence, ~65 fragments, ~0.4x coverage
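
The fragment counts and coverage figures on the two preceding slides, read here as coverage of a single 50 Kb input molecule, can be related by simple arithmetic: bases read from the molecule divided by its length. The sketch below is illustrative only. It assumes each sequenced fragment contributes one 2 × 150 bp read pair, which the slides do not state, and the function and parameter names are made up for this example.

```python
def per_molecule_coverage(fragments, molecule_bp=50_000, bases_per_fragment=300):
    """Approximate read coverage of one long input molecule.

    bases_per_fragment=300 assumes one 2 x 150 bp read pair per fragment
    (an assumption; the slides do not give the read length).
    """
    return fragments * bases_per_fragment / molecule_bp


# Reference-based recommendation: ~35 fragments per 50 Kb molecule
standard = per_molecule_coverage(35)
# Supernova recommendation: ~65 fragments per 50 Kb molecule
supernova = per_molecule_coverage(65)

print(f"standard:  ~{standard:.2f}x per molecule")
print(f"supernova: ~{supernova:.2f}x per molecule")
```

Under that read-length assumption the two cases come out close to the ~0.2x and ~0.4x quoted on the slides. Each molecule is only sparsely sampled, and it is the shared barcode that ties its reads together.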

Page 24: Everyday de novo diploid assembly

Synthetic Long Reads: less physical coverage

[Figure: panels A, B, C; axes: sequencing cost, physical coverage]

Page 25: Everyday de novo diploid assembly

Linked-Reads: greater physical coverage

[Figure: panels A, B, C; axes: sequencing cost, physical coverage]

Page 26: Everyday de novo diploid assembly

Linked-Reads allow for increased physical coverage

[Loupe screenshot, Chr13: BRCA2]

150X avg physical coverage

>56X avg read coverage (assembly)

Page 27: Everyday de novo diploid assembly

Generating Linked-Reads

Start with:

HMW gDNA, 100 Kb+ molecules
1.0 ng input DNA = 300 copies of the genome

0.5 ng DNA = 150 copies of the genome, partitioned into >1M GEMs

[Figure labels: DNA, Oil, Barcoded Primer Library, Enzyme, Collect]
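
The "copies of the genome" figures on this slide can be checked from the mass of one haploid genome. The sketch below is a back-of-the-envelope calculation, not anything taken from 10x documentation. It assumes an average of 650 g/mol per base pair and a 3.2 Gb haploid genome, and it takes exactly 1M partitions of 100 Kb molecules as round numbers suggested by the slide. All names are illustrative.

```python
AVOGADRO = 6.022e23          # molecules per mole
BP_MASS_G_PER_MOL = 650.0    # assumed average mass of one double-stranded base pair
HAPLOID_GENOME_BP = 3.2e9    # assumed haploid human genome size


def genome_copies(input_ng, genome_bp=HAPLOID_GENOME_BP):
    """Number of haploid genome copies in a given mass of DNA."""
    genome_mass_g = genome_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return input_ng * 1e-9 / genome_mass_g


def molecules_per_partition(input_ng, molecule_bp, partitions):
    """Average number of input molecules landing in each partition (GEM)."""
    total_molecules = genome_copies(input_ng) * HAPLOID_GENOME_BP / molecule_bp
    return total_molecules / partitions


print(f"1.0 ng: ~{genome_copies(1.0):.0f} genome copies")
print(f"0.5 ng: ~{genome_copies(0.5):.0f} genome copies")
# Assumed round numbers: 100 Kb molecules, 1,000,000 GEMs
print(f"~{molecules_per_partition(0.5, 100_000, 1_000_000):.1f} molecules per GEM")
```

With these assumptions the estimate lands near the slide's 300 and 150 copies. Only a handful of molecules share each barcode, so two molecules from the same locus rarely end up in the same partition.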

Page 28: Everyday de novo diploid assembly

Assembly made easy

BCL → Supernova de novo assembly → FASTA

NA19240: 1200M reads, 1 library, 1 ng input

1 server, 348 Gb memory, 2 days compute

http://www.biorxiv.org/content/early/2016/08/19/070425
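
The 1200M figure on this slide, interpreted here as 1,200 million reads, lines up with the 56x sequence depth quoted elsewhere in the talk. The sketch below shows that arithmetic under two assumptions not stated on the slide: 150 bp reads and a 3.2 Gb genome. The function name is illustrative.

```python
def sequence_depth(reads, read_bp=150, genome_bp=3.2e9):
    """Average sequence depth: total bases read divided by genome size.

    read_bp and genome_bp are assumptions, not values from the slides.
    """
    return reads * read_bp / genome_bp


depth = sequence_depth(1_200_000_000)
print(f"~{depth:.0f}x sequence depth")
```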

Page 29: Everyday de novo diploid assembly

Assembly made easy

BCL → Supernova de novo assembly → FASTA

NA19240: 1200M reads, 1 library, 1 ng input

1 server (28 cores), 348 Gb memory, 2 days compute

http://www.biorxiv.org/content/early/2016/08/19/070425

Page 30: Everyday de novo diploid assembly

Assembly made easy

BCL → Supernova de novo assembly → FASTA

NA19240: 1200M reads, 1 library, 1 ng input

1 server (28 cores), 348 Gb memory, 2 days compute

http://www.biorxiv.org/content/early/2016/08/19/070425

[Figure: a chain of three megabubbles]

Page 31: Everyday de novo diploid assembly

Performance over multiple human samples

http://www.biorxiv.org/content/early/2016/08/19/070425

sample   ethnicity  sex  cov  frag  N50 contig (Kb)  N50 scaffold (Mb)  N50 phase block (Mb)  Gap (%)
NA19238  YRI        F    56   115   114.6            18.7               8                     2.1
NA19240  YRI        F    56   125   118.8            16.4               9.3                   2.3
HG00733  PR         F    56   106   123.6            17.8               3.4                   2.0
HG00512  HAN        M    56   102   113.2            15.4               2.7                   2.2
NA24385  AJ         M    56   120   106.4            15.1               4.2                   2.6
HGP      EUR        M    56   139   120.2            18.6               4.5                   2.5
NA12878  EUR        F    56   92    118.5            16.4               2.8                   2.9
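
The table reports N50 for contigs, scaffolds, and phase blocks. For readers unfamiliar with the statistic, here is a minimal sketch of the standard definition: the length L such that sequences of length at least L hold half or more of the total bases. This is a generic illustration, not Supernova's own code, and the example lengths are invented.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Invented contig lengths, for illustration only
print(n50([2, 3, 4, 5, 6]))
```

The same function applies unchanged to scaffold lengths or phase block lengths. Only the input list differs.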

Page 32: Everyday de novo diploid assembly

High quality assembly at lower coverage

[Figure: three panels plotting Contig N50 (kb), Scaffold N50 (Mb), and Phase Block N50 (Mb) against number of reads (millions), x-axis 500 to 1,300]

Page 33: Everyday de novo diploid assembly

De Novo Performance Drastically Improves with Increased DNA Length

[Figure: three panels plotting Contig N50, Scaffold N50 (Mb), and Phase Block N50 against DNA length, x-axis 0 to 60,000]

Page 34: Everyday de novo diploid assembly

Comparison to truth data

Page 35: Everyday de novo diploid assembly

Assembly assessment

[Figure: percent of GRCh37 100-mers missing per assembly (y-axis 0 to 25), with haploid and diploid series. Groups: Supernova 10x, other methods. Samples on the x-axis, in order: NA19238, NA19240, HG00733, HG00512, NA24385, HGP, NA12878, YH, NA12878, NA12878, NA12878, NA24385, NA24143]
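
The assessment metric on this slide is the percentage of reference 100-mers that are absent from an assembly. The sketch below shows one plausible way to compute such a number on small inputs. It is an assumption about the general approach, not the method used to produce the figure. It keeps all k-mers in memory, matches either strand, skips reference k-mers containing N, and all names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of all k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def percent_missing(reference, assembly_seqs, k=100):
    """Percent of reference k-mers found on neither strand of the assembly."""
    ref_kmers = {km for km in kmers(reference, k) if "N" not in km}
    if not ref_kmers:
        raise ValueError("reference has no usable k-mers")
    asm_kmers = set()
    for seq in assembly_seqs:
        asm_kmers |= kmers(seq, k)
        asm_kmers |= kmers(revcomp(seq), k)
    return 100.0 * len(ref_kmers - asm_kmers) / len(ref_kmers)


# Toy example with k=4 so the sequences stay readable
toy_reference = "ACGTACGGTC"
print(percent_missing(toy_reference, ["ACGTACG"], k=4))
```

In this framing, a haploid measurement would pass one pseudohaplotype as `assembly_seqs` and a diploid measurement would pass both. A whole-genome run would need a disk-backed or probabilistic k-mer index rather than Python sets.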

Page 36: Everyday de novo diploid assembly

What does everyday de novo assembly enable?

Page 37: Everyday de novo diploid assembly

Ideal: Complete genome information

doi:10.1038/nature09534

• SNVs
• Deletions
• Insertions
• Inversions
• Translocations

Page 38: Everyday de novo diploid assembly

Areas in which assembly excels: diverse regions

AluY

Supernova (de novo)

PacBio Reads

Illumina Reads

Page 39: Everyday de novo diploid assembly

Areas in which assembly excels: insertions

Supernova (de novo)

PacBio Reads

Illumina Reads

Page 40: Everyday de novo diploid assembly

Areas in which assembly excels: insertions

Page 41: Everyday de novo diploid assembly

Areas in which assembly excels: insertions

SHANK2

GRCh37: chr11

GRCh37.p13: chr11_fix_patch

Page 42: Everyday de novo diploid assembly

Areas in which assembly excels: insertions

SHANK2

GRCh37: chr11

GRCh37.p13: chr11_fix_patch

35 kb

Page 43: Everyday de novo diploid assembly

Areas in which assembly excels: insertions

Hap1_scaffold7938

Hap2_scaffold7939

chr11

SHANK2

Page 44: Everyday de novo diploid assembly

Areas in which assembly excels: insertions

Hap1_scaffold7938

Hap2_scaffold7939

chr11

SHANK2

Page 45: Everyday de novo diploid assembly

Areas in which assembly excels: insertions

Hap1_scaffold7938

Hap2_scaffold7939

chr11

SHANK2

chr11

Hap2_scaffold7939

SHANK2

Hap1_scaffold7938

Page 46: Everyday de novo diploid assembly

Assembly analysis: alignment work needed

SHANK2

Supernova (de novo)

PacBio Reads

Illumina Reads

Page 47: Everyday de novo diploid assembly

Areas in which assembly excels: inversions

GRCh37 chrX:6137041-6138541 (NLGN4X)

Supernova (de novo)

PacBio Reads

Illumina Reads

Page 48: Everyday de novo diploid assembly

Assembly analysis: alignment work needed

GRCh37 chrX:6137041-6138541 (NLGN4X)

Hap1_scaffold5127

Hap2_scaffold5128

Page 49: Everyday de novo diploid assembly

FASTA is a lossy format

[Figure: a chain of three megabubbles, annotated "multi-Mb phase blocks" and "many Mb scaffold"]

Microstructure
• bubbles, often at indeterminate poly-A
• short gaps, often at poly-A
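
One way to see why flattening an assembly graph to FASTA loses information is the toy model below. It is not Supernova's data structure or output code. The `Shared` and `Bubble` classes and the flattening rule are assumptions made for illustration: a scaffold is a chain of homozygous segments and two-armed bubbles, and each bubble is treated as its own phase block.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Shared:
    seq: str    # sequence common to both haplotypes


@dataclass(frozen=True)
class Bubble:
    arm1: str   # one haplotype through the bubble
    arm2: str   # the other haplotype through the bubble


Scaffold = List[Union[Shared, Bubble]]


def to_pseudohaplotypes(scaffold: Scaffold) -> Tuple[str, str]:
    """Flatten a scaffold into two plain sequences, as two FASTA records would."""
    hap1, hap2 = [], []
    for element in scaffold:
        if isinstance(element, Shared):
            hap1.append(element.seq)
            hap2.append(element.seq)
        else:
            hap1.append(element.arm1)
            hap2.append(element.arm2)
    return "".join(hap1), "".join(hap2)


# Two separately phased bubbles joined by a homozygous stretch ...
two_blocks = [Bubble("ACA", "ACG"), Shared("TT"), Bubble("GGA", "GGC")]
# ... versus a single bubble phased end to end.
one_block = [Bubble("ACATTGGA", "ACGTTGGC")]

print(to_pseudohaplotypes(two_blocks) == to_pseudohaplotypes(one_block))
print(two_blocks == one_block)
```

The two graphs flatten to identical sequences even though only one of them asserts phase across the full length. The flat sequences alone cannot tell a reader where the phase blocks begin and end.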

Page 50: Everyday de novo diploid assembly

Native formats have more information

Supernova (de novo)

Long Ranger (reference based)

Page 51: Everyday de novo diploid assembly

Native formats have more information

Supernova (de novo)

Long Ranger (reference based)

Page 52: Everyday de novo diploid assembly

Native formats have more information

Supernova (de novo)

Long Ranger (reference based)

Page 53: Everyday de novo diploid assembly

Conclusions

• Routine, de novo, diploid assembly of 1000s of samples is possible today!
• Early uses will be for better resolution of divergent regions and novel sequence
• A new generation of tools needs to be developed to fully utilize assembly data