TRANSCRIPT
-
SNP discovery from amphidiploid species and transferability across
the Brassicaceae
Jacqueline Batley
University of Queensland, Australia
-
Outline
• Objectives
• Brassicas
• Genome Sequencing
• SNP discovery
• SNP validation
• Cross species transferability
• Application
• Future work
-
Objectives
• Develop a bioinformatics tool for SNP discovery and annotation
• Establish cost effective discovery and validation of SNPs within the amphidiploid B. napus
• Assess association of SNPs with genes for agronomic traits
• Assess the extent of LD within B. napus
• Assess genetic diversity of important agronomic genes within cultivated Brassica spp. and wild relatives
• Establish a strategy for SNP discovery from other large and complex genomes.
-
Methodology
• Paired end sequence from parents of mapping populations
• SNP discovery
• Genotyping using GoldenGate and Infinium assays
• SNPs genetically and physically mapped
• Cross species amplification to other Brassicaceae members
-
Brassicas
-
Diversity genomics
• Characterising genomic and phenotypic diversity in cultivated and wild plant species and their pathogens
– Brassicaceae, Leptosphaeria maculans
• Investigating genetic variation in crops and wild relatives
• Investigating the evolution of plant pathogen interactions
• Identifying novel genes and genetic markers for traits of interest, such as disease resistance
-
Genetic diversity
• Germplasm collections are valuable gene pools
• Assessing genetic and genomic diversity within these collections:
– assign lines and populations to diverse groups
– study the evolutionary history of wild relatives
– verify pedigrees and fill in the gaps in incomplete pedigree or selection history
– monitor changes in allele frequencies in cultivars or populations
– help narrow the search for new alleles at loci of interest.
-
Domestication bottlenecks
• B. napus (canola), B. juncea (mustard) and B. carinata are allopolyploids.
• Rare natural polyploids only incorporate a limited genetic diversity from progenitor diploids.
• Wide genetic diversity in the progenitors B. rapa, B. nigra and B. oleracea and in wild relatives offers options to enhance canola and mustard.
• A range of strategies is available to realise the genetic potential of the Brassicaceae.
-
Sequence data
• Illumina GAIIx and HiSeq data for:
– 8 B. napus cultivars
– 2 B. rapa cultivars
– B. oleracea
– 3 Brassicaceae
• Funding for 100+ Brassicas
-
Brassica genome sequencing
• B. rapa ssp. pekinensis var. Chiifu
• 10 chromosomes, ~550 Mbp
• Multinational Brassica genome sequencing committee originally agreed on a BAC-by-BAC sequencing approach
• >100,000 BAC end sequences
• >600 BACs sequenced
• Genome sequenced using Illumina GAIIx
-
B. rapa SNP discovery and genotyping
• Illumina paired end sequence from parents of mapping populations
• SNP discovery
• Genotyping using GoldenGate
• Physical mapping
• Cross species amplification to other Brassicaceae members
-
SNP validation
-
Genotyping
• Illumina GoldenGate system
• 384 SNPs
• 2 B. rapa mapping populations
• Parents of B. napus mapping populations
• Selection of wild Brassicaceae
-
SNP Validation
SNP Pool 3: lenient criteria
SNP Pool 2: less strict criteria
SNP Pool 1: strictest criteria
Pool sizes shown: ~320 SNPs, ~50 SNPs, ~15 SNPs
GoldenGate Oligo Pool
-
SNP Validation
SNP Pool 3: lenient criteria
SNP Pool 2: less strict criteria
SNP Pool 1: strictest filtering criteria
Conversion rates shown: 94%, 80%, 30%
-
SNP Genotyping
-
SNP Genotyping
-
Genetic diversity
• Assess relationships within the Brassicaceae
• Correlate this with morphological and interspecific hybridisation data
-
Brassicaceae diversity
-
Brassicaceae diversity
-
B. napus SNP discovery
• Custom algorithm developed for SNP discovery from Illumina data for amphidiploid species
• Distinguish between inter- and intra-genomic SNPs
-
The SGSautoSNP algorithm
• We do not consider the reference in SNP discovery
• The reference is only used to bring the reads together
• SNPs are called from these reads
=> different to most other SNP callers
1. Coverage must be at least 4
2. SNP score must be at least 2
– Example: SP1 = 6*A, AP1 = 1*G, M2P = 1*G; SNP score = 2
3. No conflict within a variety
– i.e. all bases in each cultivar must be the same
– e.g. Junior with 3 * A and 1 * T => conflict
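The three rules above can be sketched in a few lines of Python. This is an illustrative reading of the slide, not the SGSautoSNP source code. The function name `call_snp`, the input layout, the restriction to two alleles, and the reading of "SNP score" as the number of reads supporting the less frequent allele are all assumptions.

```python
from collections import Counter

MIN_COVERAGE = 4   # rule 1 on the slide
MIN_SNP_SCORE = 2  # rule 2 on the slide


def call_snp(bases_by_cultivar):
    """Apply the three slide rules to one reference position.

    bases_by_cultivar maps a cultivar name to the string of bases its
    reads show at this position, e.g. {"SP1": "AAAAAA", "AP1": "G"}.
    Returns (sorted alleles, snp_score), or None if the position is rejected.
    """
    # Rule 1 (assumed to mean total read coverage across all cultivars).
    coverage = sum(len(bases) for bases in bases_by_cultivar.values())
    if coverage < MIN_COVERAGE:
        return None

    allele_reads = Counter()
    for bases in bases_by_cultivar.values():
        if not bases:
            continue  # cultivar has no reads at this position
        # Rule 3: every read within one cultivar must show the same base.
        if len(set(bases)) > 1:
            return None
        allele_reads[bases[0]] += len(bases)

    # Assumption: only positions with exactly two alleles are reported.
    if len(allele_reads) != 2:
        return None

    # Assumption: SNP score = reads supporting the less frequent allele.
    snp_score = min(allele_reads.values())
    if snp_score < MIN_SNP_SCORE:
        return None
    return sorted(allele_reads), snp_score
```

With the slide's example, `call_snp({"SP1": "AAAAAA", "AP1": "G", "M2P": "G"})` returns `(["A", "G"], 2)`, while a cultivar with mixed bases such as `{"Junior": "AAAT", "SP1": "GG"}` is rejected as a conflict.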
-
Output visualisation
-
B. napus SNP discovery
• Custom algorithm developed for SNP discovery from Illumina data for amphidiploid species
• Distinguish between inter- and intra-genomic SNPs
XA_0011r 1252 1252 3 S=G=2;M1=G=3;Sr=X=0;A=G=3;J=T=3;M2=G=1;Bn=X=0;E=X=0; T;G;
XA_0011r 1379 1379 5 S=T=2;M1=T=3;Sr=X=0;A=T=1;J=C=3;M2=X=0;Bn=X=0;E=C=2; C;T;
XA_0011r 2036 2036 4 S=G=1;M1=G=2;Sr=X=0;A=G=1;J=T=8;M2=T=3;Bn=X=0;E=T=6; T;G;
XA_0011r 4921 4921 2 S=X=0;M1=X=0;Sr=X=0;A=T=8;J=X=0;M2=X=0;Bn=X=0;E=C=2; C;T;
XA_0011r 5070 5070 4 S=X=0;M1=G=2;Sr=X=0;A=G=2;J=A=6;M2=X=0;Bn=X=0;E=X=0; A;G;
XA_0011r 5273 5273 3 S=C=4;M1=C=5;Sr=X=0;A=C=6;J=G=2;M2=X=0;Bn=X=0;E=G=1; C;G;
XA_0011r 5442 5442 8 S=T=1;M1=X=0;Sr=X=0;A=T=7;J=C=5;M2=X=0;Bn=C=1;E=C=3; C;T;
XA_0011r 5512 5512 7 S=G=3;M1=G=3;Sr=X=0;A=G=5;J=A=4;M2=X=0;Bn=A=2;E=A=1; A;G;
XA_0011r 5976 5976 11 S=T=8;M1=T=1;Sr=X=0;A=T=2;J=C=6;M2=X=0;Bn=C=2;E=C=3; C;T;
XA_0011r 5992 5992 10 S=A=9;M1=A=1;Sr=X=0;A=A=3;J=G=5;M2=X=0;Bn=G=2;E=G=3; A;G;
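The records above can be read programmatically. The sketch below assumes each record has six whitespace-separated fields: reference name, start, end, score, per-cultivar `name=base=count;` calls (with `X=0` meaning no reads), and the two alleles. These field meanings are inferred from the listing, not taken from documentation, and `parse_record` and `reads_per_allele` are hypothetical helper names.

```python
def parse_record(line):
    """Parse one whitespace-separated SNP record from the listing."""
    ref, start, end, score, calls, alleles = line.split()
    per_cultivar = {}
    for item in calls.strip(";").split(";"):
        cultivar, base, count = item.split("=")
        # A base of "X" with a count of 0 is assumed to mean "no reads".
        per_cultivar[cultivar] = (base, int(count))
    return {
        "ref": ref,
        "start": int(start),
        "end": int(end),
        "score": int(score),
        "calls": per_cultivar,
        "alleles": alleles.strip(";").split(";"),
    }


def reads_per_allele(record):
    """Sum read counts per base over all cultivars that have reads."""
    totals = {}
    for base, count in record["calls"].values():
        if count > 0:
            totals[base] = totals.get(base, 0) + count
    return totals


line = ("XA_0011r 1252 1252 3 "
        "S=G=2;M1=G=3;Sr=X=0;A=G=3;J=T=3;M2=G=1;Bn=X=0;E=X=0; T;G;")
record = parse_record(line)
print(record["score"], reads_per_allele(record))
```

For the ten records listed, the fourth field equals the smaller of the two per-allele read totals, which is consistent with the score being the read support for the less frequent allele.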
-
B. napus SNP discovery
Base Change   Type           Number
A>G           transition     105045
C>T           transition     105513
A>C           transversion    42480
A>T           transversion    49287
C>G           transversion    29828
G>T           transversion    42217
-
B. napus SNP discovery
Base Change   Type           Number
A>G           transition     105045
C>T           transition     105513
A>C           transversion    42480
A>T           transversion    49287
C>G           transversion    29828
G>T           transversion    42217

Base Change   Type           Number
A>G           transition      24207
C>T           transition      24375
A>C           transversion    10158
A>T           transversion    12254
C>G           transversion     6621
G>T           transversion     9918
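A transition/transversion (Ts/Tv) ratio is a common summary of tables like these. The sketch below computes it from the counts on the slide. The names `table_1`, `table_2` and `ts_tv` are only labels chosen here, since the slide does not say what distinguishes the two tables.

```python
TRANSITIONS = {"A>G", "C>T"}  # purine<->purine and pyrimidine<->pyrimidine

# Counts copied from the two tables on the slide.
table_1 = {"A>G": 105045, "C>T": 105513, "A>C": 42480,
           "A>T": 49287, "C>G": 29828, "G>T": 42217}
table_2 = {"A>G": 24207, "C>T": 24375, "A>C": 10158,
           "A>T": 12254, "C>G": 6621, "G>T": 9918}


def ts_tv(counts):
    """Return (transitions, transversions, Ts/Tv ratio) for one table."""
    ts = sum(n for change, n in counts.items() if change in TRANSITIONS)
    tv = sum(n for change, n in counts.items() if change not in TRANSITIONS)
    return ts, tv, ts / tv


for name, table in (("table 1", table_1), ("table 2", table_2)):
    ts, tv, ratio = ts_tv(table)
    print(f"{name}: Ts={ts} Tv={tv} Ts/Tv={ratio:.2f}")
```

The first table gives 210,558 transitions against 163,812 transversions (Ts/Tv about 1.29), and the second gives 48,582 against 38,951 (about 1.25).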
-
B. napus SNP density
[Plot: SNP density; axis ranges 0 to 30 and 0 to 700,000; axis labels not recoverable]
-
B. napus SNP validation
• 24/25 SNPs correctly predicted through validation by PCR and sequencing
• 20/22 SNPs correctly predicted through GoldenGate
• Range of sequence coverage and confidence scores
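The two validation fractions above convert to percentages as follows. The combined figure assumes the two assays tested different SNPs, which the slide does not state.

```python
# (correctly predicted, tested) per validation method, from the slide.
validation = {
    "PCR and sequencing": (24, 25),
    "GoldenGate": (20, 22),
}


def rate(correct, tested):
    """Percentage of tested SNPs that were correctly predicted."""
    return 100.0 * correct / tested


for method, (correct, tested) in validation.items():
    print(f"{method}: {rate(correct, tested):.1f}%")

# Assumption: the two SNP sets do not overlap, so counts can be pooled.
total_correct = sum(c for c, _ in validation.values())
total_tested = sum(t for _, t in validation.values())
print(f"combined: {rate(total_correct, total_tested):.1f}%")
```

This gives 96.0% for PCR and sequencing, 90.9% for GoldenGate, and 93.6% (44/47) if the sets are pooled.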
-
Gene discovery
• Finding the genes for the traits
• Integration of genetic data with genomic data
• Mapping of QTL regions to genomic data
...Annotation
-
Gene discovery - application
[Figure: genetic map, physical map and physical scaffolds. Markers shown: Na12 E11, BRAS023, BRMS040, BRMS005, CB10278, BRMS075, KBRH143H15, BRMS036, Na12 A02, RA2 A05, CB10439, OI09 A06. Scale bars: 1 Mbp, 10 cM.]
-
Scaffold and Marker Assembly
[Figure legend: Chromosome, Marker, Scaffold; chromosome A7]
-
CMap3D
Duran et al. (2010) Bioinformatics 26: 273-274
-
Identification of Candidate Blackleg Resistance Genes
TNL (Gene number)   Scaffold
1                   3
2                   3
3                   3
4                   3
5                   3
6                   3
7                   12
8                   12
9                   12
10                  12
11                  3
12                  3
13                  3
14                  3
15                  12
16                  12
17                  19
18                  19
19                  19
20                  19
21                  19
-
TNL6 Sequence and Protein Alignment
[Figure: alignment panels, each comparing B. rapa, B. napus 1 and B. napus 2]
-
Gene   Mutation    Species              Predicted   Number of Reads   Sequence Verified
TNL1   18,240      Reference: B. rapa   G           N/A
                   B. napus 1           G           3                 G
                   B. napus 2           C           1                 C
TNL5   5,208,963   Reference: B. rapa   C           N/A
                   B. napus 1           C           1                 C
                   B. napus 2           T           4                 T
TNL5   5,209,056   Reference: B. rapa   A           N/A
                   B. napus 1           G           1                 G
                   B. napus 2           A           5                 A
TNL5   5,209,772   Reference: B. rapa   A           N/A
                   B. napus 1           A           1                 A
                   B. napus 2           T           6                 T
TNL5   5,207,023   Reference: B. rapa   G           N/A
                   B. napus 1           T           4                 T
                   B. napus 2           G           1                 G
TNL6   5,891,882   Reference: B. rapa   T           N/A
                   B. napus 1           T           4                 T
                   B. napus 2           C           3                 C
-
Among the protein differences, a change in charge was the most common.
-
Gene discovery
[Diagram labels: Gene/EST; Primer; genomic sequence; PCR; Known (Arabidopsis); Unknown (Brassica)]
-
http://flora.acpfg.com.au/tagdb/
Marshall, D.J., et al. (2010) Plant Methods. 6:19
-
TAGdb output
-
Sym genes
Brassicas cannot form symbiotic associations with rhizobia or mycorrhizae, BUT they contain homologues for many genes involved in these processes.
• What is the diversity of and selection pressure on these genes across the Brassicaceae?
• What are these proteins doing? General pathogen/microbial perception and response?
Ferguson et al., 2010
TAGdb results
e.g. LjPOLLUX
e.g. NFR1, NFR5
9 Arabidopsis homologues
e.g. LjNUP85, LjNUP133
-
Sequencing SYM genes in Brassicas: NSP1 (Nodulation Signalling Pathway1)
Ferguson et al., 2010
BrNSP1 and BoNSP1 vs MtNSP1 = 57% CDS similarity
AtNSP1 vs MtNSP1 = 58% CDS similarity
BrNSP1 vs AtNSP1 = 83.8% CDS similarity
BoNSP1 vs AtNSP1 = 83.7% CDS similarity
BrNSP1 vs BoNSP1 = 98% CDS similarity
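The slide does not say how CDS similarity was calculated. One common approach is percent identity over a pairwise alignment, sketched below. The function name `percent_identity` and the toy sequences are hypothetical and are not NSP1 data.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns where both sequences carry the same base.

    Both inputs must already be aligned (equal length, gaps as '-').
    Columns that are gaps in both sequences are ignored; a gap against
    a base counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical): 7 of 9 columns match.
print(f"{percent_identity('ATGGCC-TA', 'ATGACCGTA'):.1f}%")
```

Whether gaps count as mismatches, and whether identity is measured over the alignment length or the shorter sequence, changes the figure by a few percent, so values from different tools are not directly comparable.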
-
Sequencing SYM genes in Brassicas: NSP1 (Nodulation Signalling Pathway1)
Ferguson et al., 2010
High conservation in the GRAS domain.
Residues important for NSP1 function in Lotus japonicus are conserved in the Brassicaceae.
-
Sequencing SYM genes in Brassicas: NSP2 (Nodulation Signalling Pathway2)
Ferguson et al., 2010
BrNSP2 vs BoNSP2 = 98% CDS similarity
BrNSP2 vs AtNSP2 = 78.2% CDS similarity
BoNSP2 vs AtNSP2 = 78.5% CDS similarity
BrNSP2 and BoNSP2 vs MtNSP2 = 55% CDS similarity
-
Sequencing SYM genes in Brassicas: NSP1 (Nodulation Signalling Pathway1)
Ferguson et al., 2010
An alanine residue important for the NSP1-NSP2 interaction in Lotus japonicus is not conserved in the Brassicaceae, but is conserved in rice. Rice NSP1 and NSP2 are functional in nodulation in transgenic Lotus japonicus.
-
Sequencing SYM genes in Brassicas: POLLUX
Ferguson et al., 2010
• One copy on both the A and the C genomes: BrPOLLUX (A), BoPOLLUX (C) – 98% similar.
-
• BrPOLLUX CDS is 69.4% similar to LjPOLLUX CDS, BoPOLLUX = 69%, AtPOLLUX = 61%.
• 85.6% similarity between BrPOLLUX and AtPOLLUX, 85.4% between BoPOLLUX and AtPOLLUX.
• Currently sequencing POLLUX in other Brassicaceae members.
• Least similarity in N-terminal transit peptide.
-
Consistent with cation channel function:
POLLUX
Geneious Pro Transmembrane Prediction (Biomatters).
-
Future work
• SNP identification and genotyping of cultivated and wild Brassicaceae
• Large scale SNP discovery and genotyping for fine mapping and LD studies
• Identify which Brassicaceae to sequence
• Use next generation sequencing data, molecular markers and morphological variation to study diversity across Brassica species and wild relatives
-
Summary
• Next generation sequencing data is suitable for gene, promoter and SNP discovery in non-sequenced and orphan species
• SNPs can be applied for gene discovery and evolution in crop species and wild relatives
• High throughput genotyping can be used for fine mapping and LD studies
-
Acknowledgements
Emma Campbell, Christina Delay, Megan McKenzie, Reece Tolleneare, Joanne McLanders, Manuel Zander, Alice Hayward
Paul Berkman, Chris Duran, Kaitao Lai, Michal Lorenc, Sahana Manoli, Adam Skarshewski, Lars Smits, Jiri Stiller, David Edwards
Bob Redden, Harsh Raman, Xiaowu Wang