SNP discovery from amphidiploid species and transferability across the Brassicaceae. Jacqueline Batley, University of Queensland, Australia. [email protected]

Post on 07-Feb-2021

TRANSCRIPT

  • SNP discovery from amphidiploid species and transferability across the Brassicaceae

    Jacqueline Batley, University of Queensland, Australia

    [email protected]

  • Outline

    • Objectives

    • Brassicas

    • Genome Sequencing

    • SNP discovery

    • SNP validation

    • Cross species transferability

    • Application

    • Future work

  • Objectives

    • Development of a bioinformatics tool for SNP discovery and annotation

    • Establish cost-effective discovery and validation of SNPs within the amphidiploid B. napus

    • Assess association of SNPs with genes for agronomic traits

    • Assess the extent of LD within B. napus

    • Assess genetic diversity of important agronomic genes within cultivated Brassica spp. and wild relatives

    • Establish a strategy for SNP discovery from other large and complex genomes

  • Methodology

    • Paired end sequence from parents of mapping populations

    • SNP discovery

    • Genotyping using GoldenGate and Infinium assays

    • SNPs genetically and physically mapped

    • Cross species amplification to other Brassicaceae members

  • Brassicas

  • Diversity genomics

    • Characterising genomic and phenotypic diversity in cultivated and wild plant species and their pathogens

    – Brassicaceae, Leptosphaeria maculans

    • Investigating genetic variation in crops and wild relatives

    • Investigating the evolution of plant-pathogen interactions

    • Identifying novel genes and genetic markers for traits of interest, such as disease resistance

  • Genetic diversity

    • Germplasm collections are valuable gene pools

    • Assessing genetic and genomic diversity within these collections:

    – assign lines and populations to diverse groups

    – study the evolutionary history of wild relatives

    – verify pedigrees and fill in the gaps in incomplete pedigree or selection history

    – monitor changes in allele frequencies in cultivars or populations

    – help narrow the search for new alleles at loci of interest

  • Domestication bottlenecks

    • B. napus (canola), B. juncea (mustard) and B. carinata are allopolyploids.

    • Rare natural polyploids incorporate only limited genetic diversity from their progenitor diploids.

    • Wide genetic diversity in the B. rapa, B. nigra and B. oleracea progenitors and wild relatives offers options to enhance canola and mustard.

    • A range of strategies is available to realise the genetic potential of the Brassicaceae.

  • Sequence data

    • Illumina GAIIx and HiSeq data for:

    – 8 B. napus cultivars

    – 2 B. rapa cultivars

    – B. oleracea

    – 3 Brassicaceae

    • Funding for 100+ Brassicas

  • Brassica genome sequencing

    • B. rapa ssp. pekinensis var. Chiifu

    • 10 chromosomes, ~550 Mbp

    • Multinational Brassica genome sequencing committee originally agreed on a BAC-by-BAC sequencing approach

    • >100,000 BAC end sequences

    • >600 BACs sequenced

    • Genome sequenced using Illumina GAIIx

  • B. rapa SNP discovery and genotyping

    • Illumina paired end sequence from parents of mapping populations

    • SNP discovery

    • Genotyping using GoldenGate

    • Physical mapping

    • Cross species amplification to other Brassicaceae members

  • SNP validation

  • Genotyping

    • Illumina GoldenGate system

    • 384 SNPs

    • 2 B. rapa mapping populations

    • Parents of B. napus mapping populations

    • Selection of wild Brassicaceae

  • SNP Validation

    SNP Pool 1: strictest criteria

    SNP Pool 2: less strict criteria

    SNP Pool 3: lenient criteria

    Pool sizes shown: ~320 SNPs, ~50 SNPs, ~15 SNPs

    GoldenGate Oligo Pool

  • SNP Validation

    SNP Pool 1: strictest filtering criteria

    SNP Pool 2: less strict criteria

    SNP Pool 3: lenient criteria

    Conversion rates shown: 94%, 80%, 30%

  • SNP Genotyping


  • Genetic diversity

    • Assess relationships within the Brassicaceae

    • Correlate this with morphological and interspecific hybridisation data

  • Brassicaceae diversity

  • B. napus SNP discovery

    • Custom algorithm developed for SNP discovery from Illumina data for amphidiploid species

    • Distinguish between inter- and intra-genomic SNPs

  • The SGSautoSNP algorithm

    • We do not consider the reference in SNP discovery

    • The reference is only used to bring the reads together

    • SNPs are called from these reads

    => different to most other SNP callers

    1. Coverage must be at least 4

    2. SNP score must be at least 2

    – Example:

    • SP1 = 6*A

    • AP1 = 1*G

    • M2P = 1*G

    • SNP score = 2

    3. No conflict within a variety

    – i.e. all bases in each cultivar must be the same

    – if e.g. Junior 3 * A and 1 * T => conflict
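The three rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SGSautoSNP implementation: the function name `call_snp`, the input layout (cultivar name mapped to the bases its reads show at one position), the restriction to biallelic positions, and the reading of "SNP score" as the read count of the rarer allele are all assumptions. The last of these is at least consistent with the slide example (6 A reads versus 2 G reads gives a score of 2).

```python
from collections import Counter

MIN_COVERAGE = 4   # rule 1 on the slide
MIN_SNP_SCORE = 2  # rule 2 on the slide


def call_snp(reads_by_cultivar):
    """Apply the three slide rules to one reference position.

    reads_by_cultivar maps a cultivar name to the string of bases its
    reads show at this position, e.g. {"SP1": "AAAAAA", "AP1": "G"}.
    Returns (minor_allele, score) for an accepted SNP, otherwise None.
    """
    # Rule 3: every read within one cultivar must show the same base.
    for bases in reads_by_cultivar.values():
        if len(set(bases)) > 1:
            return None

    # Pool reads across cultivars; the reference base is never consulted.
    counts = Counter("".join(reads_by_cultivar.values()))

    # Rule 1: total coverage at the position.
    if sum(counts.values()) < MIN_COVERAGE:
        return None

    # Assumption: only positions with exactly two alleles are reported.
    if len(counts) != 2:
        return None

    # Assumption: SNP score = number of reads supporting the rarer allele.
    minor_allele, score = min(counts.items(), key=lambda kv: kv[1])
    if score < MIN_SNP_SCORE:
        return None
    return minor_allele, score


# Slide example: SP1 = 6*A, AP1 = 1*G, M2P = 1*G
print(call_snp({"SP1": "AAAAAA", "AP1": "G", "M2P": "G"}))  # ('G', 2)
# Slide conflict example: Junior has 3*A and 1*T
print(call_snp({"Junior": "AAAT", "SP1": "GG"}))  # None
```

Rule 3 is what lets this kind of caller handle an amphidiploid: a position where reads from a single cultivar disagree is rejected instead of being reported as a SNP between cultivars.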

  • Output visualisation

  • B. napus SNP discovery

    XA_0011r 1252 1252 3 S=G=2;M1=G=3;Sr=X=0;A=G=3;J=T=3;M2=G=1;Bn=X=0;E=X=0; T;G;

    XA_0011r 1379 1379 5 S=T=2;M1=T=3;Sr=X=0;A=T=1;J=C=3;M2=X=0;Bn=X=0;E=C=2; C;T;

    XA_0011r 2036 2036 4 S=G=1;M1=G=2;Sr=X=0;A=G=1;J=T=8;M2=T=3;Bn=X=0;E=T=6; T;G;

    XA_0011r 4921 4921 2 S=X=0;M1=X=0;Sr=X=0;A=T=8;J=X=0;M2=X=0;Bn=X=0;E=C=2; C;T;

    XA_0011r 5070 5070 4 S=X=0;M1=G=2;Sr=X=0;A=G=2;J=A=6;M2=X=0;Bn=X=0;E=X=0; A;G;

    XA_0011r 5273 5273 3 S=C=4;M1=C=5;Sr=X=0;A=C=6;J=G=2;M2=X=0;Bn=X=0;E=G=1; C;G;

    XA_0011r 5442 5442 8 S=T=1;M1=X=0;Sr=X=0;A=T=7;J=C=5;M2=X=0;Bn=C=1;E=C=3; C;T;

    XA_0011r 5512 5512 7 S=G=3;M1=G=3;Sr=X=0;A=G=5;J=A=4;M2=X=0;Bn=A=2;E=A=1; A;G;

    XA_0011r 5976 5976 11 S=T=8;M1=T=1;Sr=X=0;A=T=2;J=C=6;M2=X=0;Bn=C=2;E=C=3; C;T;

    XA_0011r 5992 5992 10 S=A=9;M1=A=1;Sr=X=0;A=A=3;J=G=5;M2=X=0;Bn=G=2;E=G=3; A;G;
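The output records on this slide can be read mechanically. The sketch below parses one record and sums the read counts per base. The field meanings are assumptions inferred from the records themselves (contig, start, end, score, one `name=base=count` entry per cultivar, then the two alleles), as is the reading of `X=0` as "no reads for this cultivar"; the function names are hypothetical. In all ten records shown, the score column equals the smaller of the two per-allele read totals.

```python
def parse_record(line):
    """Split one output record into its fields.

    Field meanings are inferred from the records on the slide:
    contig, start, end, score, per-cultivar calls, and the two alleles.
    """
    contig, start, end, score, calls, alleles = line.split()
    per_cultivar = {}
    for item in calls.strip(";").split(";"):
        name, base, count = item.split("=")
        per_cultivar[name] = (base, int(count))
    return {
        "contig": contig,
        "start": int(start),
        "end": int(end),
        "score": int(score),
        "calls": per_cultivar,
        "alleles": alleles.strip(";").split(";"),
    }


def allele_totals(record):
    """Sum read counts per base, skipping cultivars with no call (X=0)."""
    totals = {}
    for base, count in record["calls"].values():
        if base != "X":
            totals[base] = totals.get(base, 0) + count
    return totals


# First record from the slide.
record = parse_record(
    "XA_0011r 1252 1252 3 "
    "S=G=2;M1=G=3;Sr=X=0;A=G=3;J=T=3;M2=G=1;Bn=X=0;E=X=0; T;G;"
)
totals = allele_totals(record)
print(totals, record["score"])  # {'G': 9, 'T': 3} 3
```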

  • B. napus SNP discovery

    Base Change   Type           Number
    A>G           transition     105045
    C>T           transition     105513
    A>C           transversion    42480
    A>T           transversion    49287
    C>G           transversion    29828
    G>T           transversion    42217

  • B. napus SNP discovery

    Base Change   Type           Number
    A>G           transition      24207
    C>T           transition      24375
    A>C           transversion    10158
    A>T           transversion    12254
    C>G           transversion     6621
    G>T           transversion     9918
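A common summary of tables like these is the transition/transversion ratio. The sketch below computes it for both sets of counts on the slides; the variable names `first_table` and `second_table` are placeholders, since the slides do not label what distinguishes the two tables.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(change):
    """Return True for purine<->purine or pyrimidine<->pyrimidine changes."""
    ref, alt = change.split(">")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(counts):
    """Ratio of the transition count to the transversion count."""
    ts = sum(n for change, n in counts.items() if is_transition(change))
    tv = sum(n for change, n in counts.items() if not is_transition(change))
    return ts / tv


# Counts copied from the two tables on the slides.
first_table = {"A>G": 105045, "C>T": 105513, "A>C": 42480,
               "A>T": 49287, "C>G": 29828, "G>T": 42217}
second_table = {"A>G": 24207, "C>T": 24375, "A>C": 10158,
                "A>T": 12254, "C>G": 6621, "G>T": 9918}

print(round(ts_tv_ratio(first_table), 2))   # 1.29
print(round(ts_tv_ratio(second_table), 2))  # 1.25
```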

  • B. napus SNP density

    [Chart: SNP density; vertical axis 0 to 30, horizontal axis 0 to 700000, one data series]

  • B. napus SNP validation

    • 24/25 SNPs correctly predicted through validation by PCR and sequencing

    • 20/22 SNPs correctly predicted through GoldenGate

    • Range of sequence coverage and confidence scores
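For reference, the two validation fractions above work out as follows. This is plain arithmetic on the numbers from the slide; the helper name `validation_rate` is hypothetical.

```python
def validation_rate(confirmed, tested):
    """Percentage of tested SNPs that were confirmed."""
    return 100.0 * confirmed / tested


pcr = validation_rate(24, 25)         # PCR and sequencing
goldengate = validation_rate(20, 22)  # GoldenGate

print(f"{pcr:.1f}% {goldengate:.1f}%")  # 96.0% 90.9%
```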

  • Gene discovery

    • Finding the genes for the traits

    • Integration of genetic data with genomic data

    • Mapping of QTL regions to genomic data

    ...Annotation

  • Gene discovery - application

    [Figure: genetic map, physical map and physical scaffolds. Scale labels: 10 cM, 1 Mbp. Marker labels: Na12 E11, BRAS023, BRMS040, BRMS005, CB10278, BRMS075, KBRH143H15, BRMS036, Na12 A02, RA2 A05, CB10439, OI09 A06]

  • Scaffold and Marker Assembly

    [Figure labels: Chromosome, Marker, Scaffold, A7]

  • CMap3D

    Duran et al. (2010) Bioinformatics 26: 273-274

  • Identification of Candidate Blackleg Resistance Genes

    TNL (Gene number)   Scaffold
    1                   3
    2                   3
    3                   3
    4                   3
    5                   3
    6                   3
    7                   12
    8                   12
    9                   12
    10                  12
    11                  3
    12                  3
    13                  3
    14                  3
    15                  12
    16                  12
    17                  19
    18                  19
    19                  19
    20                  19
    21                  19

  • TNL6 Sequence and Protein Alignment

    [Figure: three alignment panels, each with rows B. rapa, B. napus 1, B. napus 2]

  • Gene   Mutation    Species              Predicted   Number of Reads   Sequence Verified

    TNL1   18,240      Reference: B. rapa   G           N/A
                       B. napus 1           G           3                 G
                       B. napus 2           C           1                 C

    TNL5   5,208,963   Reference: B. rapa   C           N/A
                       B. napus 1           C           1                 C
                       B. napus 2           T           4                 T

    TNL5   5,209,056   Reference: B. rapa   A           N/A
                       B. napus 1           G           1                 G
                       B. napus 2           A           5                 A

    TNL5   5,209,772   Reference: B. rapa   A           N/A
                       B. napus 1           A           1                 A
                       B. napus 2           T           6                 T

    TNL5   5,207,023   Reference: B. rapa   G           N/A
                       B. napus 1           T           4                 T
                       B. napus 2           G           1                 G

    TNL6   5,891,882   Reference: B. rapa   T           N/A
                       B. napus 1           T           4                 T
                       B. napus 2           C           3                 C

  • Among the protein differences, a change in charge was the most common change

  • Gene discovery

    Gene/EST

    Primer

    genomic sequence

    PCR

    Known (Arabidopsis), Unknown (Brassica)

  • http://flora.acpfg.com.au/tagdb/

    Marshall, D.J., et al. (2010) Plant Methods. 6:19

  • TAGdb output

  • Sym genes

    Brassicas cannot form symbiotic associations with rhizobia or mycorrhizae - BUT - contain homologues for many genes involved in these processes.

    • What is the diversity of and selection pressure on these genes across the Brassicaceae?

    • What are these proteins doing? – general pathogen/microbial perception and response?

    Ferguson et al., 2010

    TAGdb results

    e.g. LjPOLLUX

    e.g. NFR1, NFR5

    9 Arabidopsis homologues

    e.g. LjNUP85, LjNUP133

  • Sequencing SYM genes in Brassicas: NSP1 (Nodulation Signalling Pathway1)

    Ferguson et al., 2010

    BrNSP1 and BoNSP1 vs MtNSP1 = 57% CDS similarity

    AtNSP1 vs MtNSP1 = 58% CDS similarity

    BrNSP1 vs AtNSP1 = 83.8% CDS similarity

    BoNSP1 vs AtNSP1 = 83.7% CDS similarity

    BrNSP1 vs BoNSP1 = 98% CDS similarity
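The slides do not say how the CDS similarity percentages were computed. One simple measure, shown here purely as an illustration, is percent identity over the ungapped columns of a pairwise alignment. The function name and the toy sequences are hypothetical, and the numbers on the slide may come from a different measure or tool.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of aligned columns with identical bases.

    Both inputs must come from the same pairwise alignment, so they have
    equal length. Columns where either sequence has a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (not real NSP1 sequence): 4 of 5 ungapped columns match.
print(percent_identity("ATGC-A", "ATGGTA"))  # 80.0
```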

  • Sequencing SYM genes in Brassicas: NSP1 (Nodulation Signalling Pathway1)

    Ferguson et al., 2010

    High conservation in the GRAS domain.

    Residues important for NSP1 function in Lotus japonicus are conserved in the Brassicaceae.

  • Sequencing SYM genes in Brassicas: NSP2 (Nodulation Signalling Pathway2)

    Ferguson et al., 2010

    BrNSP2 vs BoNSP2 = 98% CDS similarity

    BrNSP2 vs AtNSP2 = 78.2% CDS similarity

    BoNSP2 vs AtNSP2 = 78.5% CDS similarity

    BrNSP2 and BoNSP2 vs MtNSP2 = 55% CDS similarity

  • Sequencing SYM genes in Brassicas: NSP1 (Nodulation Signalling Pathway1)

    Ferguson et al., 2010

    The alanine residue important for NSP1-NSP2 interaction in Lotus japonicus is not conserved in the Brassicaceae, but is conserved in rice. Rice NSP1 and NSP2 are functional in nodulation in transgenic Lotus japonicus.

  • Sequencing SYM genes in Brassicas: POLLUX

    Ferguson et al., 2010

    • One copy on each of the A and C genomes: BrPOLLUX (A), BoPOLLUX (C) – 98% similar.

    • BrPOLLUX CDS is 69.4% similar to LjPOLLUX CDS, BoPOLLUX = 69%, AtPOLLUX = 61%.

    • 85.6% similarity between BrPOLLUX and AtPOLLUX, 85.4% between Bo and At.

    • Currently sequencing POLLUX in other Brassicaceae members.

    • Least similarity in N-terminal transit peptide.

  • Consistent with cation channel function:

    POLLUX

    Geneious Pro Transmembrane Prediction (Biomatters).

  • Future work

    • SNP identification and genotyping of cultivated and wild Brassicaceae

    • Large scale SNP discovery and genotyping for fine mapping and LD studies

    • Identify which Brassicaceae to sequence

    • Use next generation sequencing data, molecular markers and morphological variation to study diversity across Brassica species and wild relatives

  • Summary

    • Next generation sequencing data is suitable for gene, promoter and SNP discovery in non-sequenced and orphan species

    • SNPs can be applied for gene discovery and evolution in crop species and wild relatives

    • High throughput genotyping can be used for fine mapping and LD studies

  • Acknowledgements

    Emma Campbell, Christina Delay, Megan McKenzie, Reece Tolleneare, Joanne McLanders, Manuel Zander, Alice Hayward

    Paul Berkman, Chris Duran, Kaitao Lai, Michal Lorenc, Sahana Manoli, Adam Skarshewski, Lars Smits, Jiri Stiller, David Edwards

    Bob Redden, Harsh Raman, Xiaowu Wang