reference genome enabled variant discovery in octoploid...

1
The absence of a reference genome sequence for (Fragaria × ananassa Duch.), an allo- octoploid (2n=8x=56), previously hindered the application of genotyping-by-sequencing (GBS) and other next generation sequencing (NGS)-based DNA variant discovery. An octoploid reference genome assembly drastically simplifies data analysis and provides a technical framework for the rigorous discovery of markers using short-read sequencing technologies. The ability to determine the sub-genome specificity of short-reads needs to be investigated to enable the exploitation of NGS-based high-throughput genotyping approaches in octoploid strawberry. Here, we employ the double enzyme digest protocol put forward by Poland et al (2012). Introduction Objectives 1. Develop a custom bioinformatics pipeline for GBS-facilitated DNA variant discovery in octoploid strawberry. 2. Utilizing a custom bioinformatics pipeline (8X-GBS), quantify the genomic distribution and sub-genome specificity of DNA variants mapped to an octoploid reference genome from sequenced genomic libraries prepared with either HindIII-MspI and PstI-MspI. 1 Department of Plant Sciences, University of California, Davis, Davis, CA, 2 Department of Horticulture, Michigan State University, East Lansing, MI Mitchell J. Feldmann 1 , Michael A. Hardigan 1 , Thomas J. Poorten 1 , Charlotte B. Acharya 1 , Marivi Colle 2 , Patrick P. Edger 2 , Robert VanBuren 2 , and Steven J. Knapp 1 Reference Genome Enabled Variant Discovery in Octoploid Strawberry HindIII - MspI PstI - MspI 1. Plate DNA with adapter pairs 2. Digest DNA with RE pair 3. Ligate adapters 4. Pool DNAs & Clean Up 5. Perform PCR 6. Clean up PCR 1, 2, 3 4, 5, 6 PCR Primers 1 & 2 Category HindIII - MspI PstI - MspI Total Reads 3,906,096 per sample 3,798,060 per sample Mapped Reads 3,753,634 (96.1% of total) 3,730,571 (98.2% of total) Uniquely Mapped Reads (MAPQ30) 2,023,367 (51.8% of total) 1,859,605 (48.9% of total) Unique Sites 2,362,556 1,591,764 Raw SNPs 988,879 436,903 Filtered SNPs 415,048 (41.9% of raw) 176,626 (40.4% of raw) Average Depth per Site 294x 507x HindIII-MspI PstI-MspI Methods: Library Preparation & Bioinformatics Results Conclusions 1. Both libraries had equivalent fragment length distributions, total reads, and uniquely mapped reads per individual. However, SNP coverage per site from HindIII-MspI appears to be much more even across all chromosomes. 2.8X-GBS pipeline resulted in~175k & 415k genome-wide SNPs with sub-genome specificity for PstI-MspI and HindII-MspI, respectively. 3.For variant discovery against in octoploid strawberry in this study, HindIII-MspI is the preferred enzymatic pair. References: (1) Poland et al. PLoS One (2012). (2) Elshire et al. PLoS One (2010). DNA from 30 samples each Pooled DNA Libraries PCR-amplified DNA Libraries Fig 1. (A) Library prep adapted from Elshire et al (2010). (B) Bioinformatics toolset used in this study. Fig 2. Fragment length distribution for individual libraries; HindIII-MspI & PstI-MspI Table 1. Read count, variant count, and sequencing depth. Fig 3. Variant distribution, coverage, and quality: HindIII-MspI Fig 4. Variant distribution, coverage, and quality: PstI-MspI SNP count SNP Quality Site Coverage A. B. Position Position SNP count SNP Quality Site Coverage Density Density Acknowledgments: This work is supported by the Specialty Crops Research Initiative (#2017-51181-26833) from the USDA National Institute of Food and Agriculture. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the view of the U.S. Department of Agriculture. University of California, Davis

Upload: others

Post on 12-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Reference Genome Enabled Variant Discovery in Octoploid ...knapp.plantsciences.ucdavis.edu/wp-content/uploads/...Mitchell J. Feldmann1, Michael A. Hardigan1, ThomasJ. Poorten1, Charlotte

The absence of a reference genome sequence for (Fragaria × ananassa Duch.), an allo-octoploid (2n=8x=56), previously hindered the application of genotyping-by-sequencing(GBS) and other next generation sequencing (NGS)-based DNA variant discovery. Anoctoploid reference genome assembly drastically simplifies data analysis and provides atechnical framework for the rigorous discovery of markers using short-read sequencingtechnologies. The ability to determine the sub-genome specificity of short-reads needs tobe investigated to enable the exploitation of NGS-based high-throughput genotypingapproaches in octoploid strawberry. Here, we employ the double enzyme digest protocolput forward by Poland et al (2012).

Introduction Objectives

1. Develop a custom bioinformatics pipeline for GBS-facilitated DNAvariant discovery in octoploid strawberry.

2. Utilizing a custom bioinformatics pipeline (8X-GBS), quantify thegenomic distribution and sub-genome specificity of DNA variantsmapped to an octoploid reference genome from sequenced genomiclibraries prepared with either HindIII-MspI and PstI-MspI.

1Department of Plant Sciences, University of California, Davis, Davis, CA, 2Department of Horticulture, Michigan State University, East Lansing, MI

Mitchell J. Feldmann1, Michael A. Hardigan1, Thomas J. Poorten1, Charlotte B. Acharya1, Marivi Colle2, Patrick P. Edger2, Robert VanBuren2, and Steven J. Knapp1

Reference Genome Enabled Variant Discovery in Octoploid Strawberry

Figure 1. GBS adapters, PCR and sequencing primers. (a) Sequences of double-stranded barcode and common adapters. Adapters are shownligated to ApeKI-cut genomic DNA. Positions of the barcode sequence and ApeKI overhangs are shown relative to the insert DNA; (b) Sequences ofPCR primer 1 and paired end sequencing primer 1 (PE-1). Binding sites for flowcell oligonucleotide 1 and barcode adapter are indicated; (c)Sequences of PCR primer 2 and paired end sequencing primer 2 (PE-2). Binding sites for flowcell oligonucleotide 2 and common adapter areindicated.doi:10.1371/journal.pone.0019379.g001

Figure 2. Steps in GBS library construction. Note: Up to 96 DNA samples can be processed simultaneously. (1) DNA samples, barcode, andcommon adapter pairs are plated and dried; (2–3) samples are then digested with ApeKI and adapters are ligated to the ends of genomic DNAfragments; (4) T4 ligase is inactivated by heating and an aliquot of each sample is pooled and applied to a size exclusion column to remove unreactedadapters; (5) appropriate primers with binding sites on the ligated adapters are added and PCR is performed to increase the fragment pool; (6–7) PCRproducts are cleaned up and fragment sizes of the resulting library are checked on a DNA analyzer(BioRad ExperionH or similar instrument). Librarieswithout adapter dimers are retained for DNA sequencing.doi:10.1371/journal.pone.0019379.g002

Genotyping Approach for High Diversity Species

PLoS ONE | www.plosone.org 3 May 2011 | Volume 6 | Issue 5 | e19379

Figure 1. GBS adapters, PCR and sequencing primers. (a) Sequences of double-stranded barcode and common adapters. Adapters are shownligated to ApeKI-cut genomic DNA. Positions of the barcode sequence and ApeKI overhangs are shown relative to the insert DNA; (b) Sequences ofPCR primer 1 and paired end sequencing primer 1 (PE-1). Binding sites for flowcell oligonucleotide 1 and barcode adapter are indicated; (c)Sequences of PCR primer 2 and paired end sequencing primer 2 (PE-2). Binding sites for flowcell oligonucleotide 2 and common adapter areindicated.doi:10.1371/journal.pone.0019379.g001

Figure 2. Steps in GBS library construction. Note: Up to 96 DNA samples can be processed simultaneously. (1) DNA samples, barcode, andcommon adapter pairs are plated and dried; (2–3) samples are then digested with ApeKI and adapters are ligated to the ends of genomic DNAfragments; (4) T4 ligase is inactivated by heating and an aliquot of each sample is pooled and applied to a size exclusion column to remove unreactedadapters; (5) appropriate primers with binding sites on the ligated adapters are added and PCR is performed to increase the fragment pool; (6–7) PCRproducts are cleaned up and fragment sizes of the resulting library are checked on a DNA analyzer(BioRad ExperionH or similar instrument). Librarieswithout adapter dimers are retained for DNA sequencing.doi:10.1371/journal.pone.0019379.g002

Genotyping Approach for High Diversity Species

PLoS ONE | www.plosone.org 3 May 2011 | Volume 6 | Issue 5 | e19379

HindIII-MspI

Figure 1. GBS adapters, PCR and sequencing primers. (a) Sequences of double-stranded barcode and common adapters. Adapters are shownligated to ApeKI-cut genomic DNA. Positions of the barcode sequence and ApeKI overhangs are shown relative to the insert DNA; (b) Sequences ofPCR primer 1 and paired end sequencing primer 1 (PE-1). Binding sites for flowcell oligonucleotide 1 and barcode adapter are indicated; (c)Sequences of PCR primer 2 and paired end sequencing primer 2 (PE-2). Binding sites for flowcell oligonucleotide 2 and common adapter areindicated.doi:10.1371/journal.pone.0019379.g001

Figure 2. Steps in GBS library construction. Note: Up to 96 DNA samples can be processed simultaneously. (1) DNA samples, barcode, andcommon adapter pairs are plated and dried; (2–3) samples are then digested with ApeKI and adapters are ligated to the ends of genomic DNAfragments; (4) T4 ligase is inactivated by heating and an aliquot of each sample is pooled and applied to a size exclusion column to remove unreactedadapters; (5) appropriate primers with binding sites on the ligated adapters are added and PCR is performed to increase the fragment pool; (6–7) PCRproducts are cleaned up and fragment sizes of the resulting library are checked on a DNA analyzer(BioRad ExperionH or similar instrument). Librarieswithout adapter dimers are retained for DNA sequencing.doi:10.1371/journal.pone.0019379.g002

Genotyping Approach for High Diversity Species

PLoS ONE | www.plosone.org 3 May 2011 | Volume 6 | Issue 5 | e19379

Figure 1. GBS adapters, PCR and sequencing primers. (a) Sequences of double-stranded barcode and common adapters. Adapters are shownligated to ApeKI-cut genomic DNA. Positions of the barcode sequence and ApeKI overhangs are shown relative to the insert DNA; (b) Sequences ofPCR primer 1 and paired end sequencing primer 1 (PE-1). Binding sites for flowcell oligonucleotide 1 and barcode adapter are indicated; (c)Sequences of PCR primer 2 and paired end sequencing primer 2 (PE-2). Binding sites for flowcell oligonucleotide 2 and common adapter areindicated.doi:10.1371/journal.pone.0019379.g001

Figure 2. Steps in GBS library construction. Note: Up to 96 DNA samples can be processed simultaneously. (1) DNA samples, barcode, andcommon adapter pairs are plated and dried; (2–3) samples are then digested with ApeKI and adapters are ligated to the ends of genomic DNAfragments; (4) T4 ligase is inactivated by heating and an aliquot of each sample is pooled and applied to a size exclusion column to remove unreactedadapters; (5) appropriate primers with binding sites on the ligated adapters are added and PCR is performed to increase the fragment pool; (6–7) PCRproducts are cleaned up and fragment sizes of the resulting library are checked on a DNA analyzer(BioRad ExperionH or similar instrument). Librarieswithout adapter dimers are retained for DNA sequencing.doi:10.1371/journal.pone.0019379.g002

Genotyping Approach for High Diversity Species

PLoS ONE | www.plosone.org 3 May 2011 | Volume 6 | Issue 5 | e19379

Figure 1. GBS adapters, PCR and sequencing primers. (a) Sequences of double-stranded barcode and common adapters. Adapters are shownligated to ApeKI-cut genomic DNA. Positions of the barcode sequence and ApeKI overhangs are shown relative to the insert DNA; (b) Sequences ofPCR primer 1 and paired end sequencing primer 1 (PE-1). Binding sites for flowcell oligonucleotide 1 and barcode adapter are indicated; (c)Sequences of PCR primer 2 and paired end sequencing primer 2 (PE-2). Binding sites for flowcell oligonucleotide 2 and common adapter areindicated.doi:10.1371/journal.pone.0019379.g001

Figure 2. Steps in GBS library construction. Note: Up to 96 DNA samples can be processed simultaneously. (1) DNA samples, barcode, andcommon adapter pairs are plated and dried; (2–3) samples are then digested with ApeKI and adapters are ligated to the ends of genomic DNAfragments; (4) T4 ligase is inactivated by heating and an aliquot of each sample is pooled and applied to a size exclusion column to remove unreactedadapters; (5) appropriate primers with binding sites on the ligated adapters are added and PCR is performed to increase the fragment pool; (6–7) PCRproducts are cleaned up and fragment sizes of the resulting library are checked on a DNA analyzer(BioRad ExperionH or similar instrument). Librarieswithout adapter dimers are retained for DNA sequencing.doi:10.1371/journal.pone.0019379.g002

Genotyping Approach for High Diversity Species

PLoS ONE | www.plosone.org 3 May 2011 | Volume 6 | Issue 5 | e19379

Figure1.GBSadapters,PCRandsequencingprimers.(a)Sequencesofdouble-strandedbarcodeandcommonadapters.AdaptersareshownligatedtoApeKI-cutgenomicDNA.PositionsofthebarcodesequenceandApeKIoverhangsareshownrelativetotheinsertDNA;(b)SequencesofPCRprimer1andpairedendsequencingprimer1(PE-1).Bindingsitesforflowcelloligonucleotide1andbarcodeadapterareindicated;(c)SequencesofPCRprimer2andpairedendsequencingprimer2(PE-2).Bindingsitesforflowcelloligonucleotide2andcommonadapterareindicated.doi:10.1371/journal.pone.0019379.g001

Figure2.StepsinGBSlibraryconstruction.Note:Upto96DNAsamplescanbeprocessedsimultaneously.(1)DNAsamples,barcode,andcommonadapterpairsareplatedanddried;(2–3)samplesarethendigestedwithApeKIandadaptersareligatedtotheendsofgenomicDNAfragments;(4)T4ligaseisinactivatedbyheatingandanaliquotofeachsampleispooledandappliedtoasizeexclusioncolumntoremoveunreactedadapters;(5)appropriateprimerswithbindingsitesontheligatedadaptersareaddedandPCRisperformedtoincreasethefragmentpool;(6–7)PCRproductsarecleanedupandfragmentsizesoftheresultinglibraryarecheckedonaDNAanalyzer(BioRadExperionHorsimilarinstrument).LibrarieswithoutadapterdimersareretainedforDNAsequencing.doi:10.1371/journal.pone.0019379.g002

GenotypingApproachforHighDiversitySpecies

PLoSONE|www.plosone.org3May2011|Volume6|Issue5|e19379

PstI-MspI

1. Plate DNA with adapter pairs2. Digest DNA with RE pair3. Ligate adapters4. Pool DNAs & Clean Up5. Perform PCR6. Clean up PCR

1, 2, 3 4, 5, 6

PCR Primers 1 & 2

Category HindIII - MspI PstI - MspI

Total Reads 3,906,096 per sample 3,798,060 per sample

Mapped Reads 3,753,634 (96.1% of total) 3,730,571 (98.2% of total)

Uniquely Mapped Reads (MAPQ≥30)

2,023,367 (51.8% of total) 1,859,605 (48.9% of total)

Unique Sites 2,362,556 1,591,764

Raw SNPs 988,879 436,903

Filtered SNPs 415,048 (41.9% of raw) 176,626 (40.4% of raw)

Average Depth per Site 294x 507x

HindIII-MspI

PstI-MspI

Methods: Library Preparation & Bioinformatics

Results

Conclusions1. Both libraries had equivalent fragment length distributions, total reads, and uniquely mapped reads per individual. However, SNP

coverage per site from HindIII-MspI appears to be much more even across all chromosomes.2. 8X-GBS pipeline resulted in~175k & 415k genome-wide SNPs with sub-genome specificity for PstI-MspI and HindII-MspI, respectively.3. For variant discovery against in octoploid strawberry in this study, HindIII-MspI is the preferred enzymatic pair.

References: (1) Poland et al. PLoS One (2012). (2) Elshire et al. PLoS One (2010).

DNA from 30 samples each

Pooled DNA Libraries

PCR-amplified DNA Libraries

Fig 1. (A) Library prep adapted from Elshire et al (2010). (B) Bioinformatics toolset used in this study.

Fig 2. Fragment length distribution for individual libraries; HindIII-MspI & PstI-MspI Table 1. Read count, variant count, and sequencing depth.

Fig 3. Variant distribution, coverage, and quality: HindIII-MspI Fig 4. Variant distribution, coverage, and quality: PstI-MspISNP count SNP Quality Site Coverage

A. B.

Position PositionSNP count SNP Quality Site Coverage

Den

sity

Den

sity

Acknowledgments: This work is supported by the Specialty Crops Research Initiative (#2017-51181-26833) from the USDA National Institute of Food and Agriculture. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the view of the U.S. Department of Agriculture.

University of California, Davis