GRM 2013: Genome-Wide Selection Update -- R.K. Varshney and A. Rathore
Rajeev K. Varshney and Abhishek Rathore
Genome-Wide Selection Update GCP General Research Meeting Session IX 30th September 2013
ISMU 1.0 Challenges in SNP Detection
• Mostly command-line based Linux tools
• Multiple steps involved
• Difficult pre-processing & cleaning of raw data
• Specialized skills required to process the job
• Developing genotyping assays (GoldenGate and KASPar)
• Very few user-friendly software tools
Solution: ISMU 1.0 Pipeline
Features:
– Multicore Architecture
– One stop shop for SNP detection
– Graphical User Interface
– Automated Cleaning of Data
– Integration of various popular alignment tools
– Customized operation of tools for advanced users
– Available in Online and Standalone versions
– Easy Installation
– Works on CentOS, RHEL & Fedora
– Visualization of SNP and Alignment (TABLET/FLAPJACK)
ISMU V1.0
Inputs: Raw Reads, Reference
• Assemble & Align Raw Reads
• Mine SNPs
• Generate Marker Matrix
• Automated Visualization in TABLET and FLAPJACK
• Developing genotyping assays
• Export in FLAT Files
ISMU 1.0 Standalone Edition Selection of Alignment Tool & SNP Approach
ISMU 1.0 Standalone Edition Results
Locus Forward Polymorphism Reverse
TC00001_1272 CGCTCAAGAGAACCAGTGTTGGAATGGTGGCGGCGATGGCTGTATTTCCA A/T GAAAAGTAAGGGACTAGAAG
TC00075_852 TGAGATGTTCCTATCACCAATGCAAATATCAGGGCAAATGCACTAACATA C/T TTGAGTAAATTTCCCATCTT
TC00118_13765 AATTAAGTTAGTAATGACTGGACGAAACCAAGAAATAACTACTTACGTGC T/G AAATTATAGAAGGTCTCCTG
TC00130_2668 GTTGTTGATCGAAAGAAAATTTAATTTCTTGTTCGACTGATCACCTTGCT G/A GGTTCCAACTATTCTAAAGT
TC00191_3430 TTAATGAATTTGCTTCATCGTCCAAGGTTTACCATTTAGGTGGGTAGAGC T/C ACAGAAATTAAGTATCTGGT
TC00212_866 CCCATGTCAATCATCCCAATTTTCTTGCATAAATTATCCTTAAATGGATA G/T CTTTACGTATGATGCTGATC
TC00295_2234 AGCCAGTGGAAGCTCCACCAGCAGCAGTAGCAGAAGTTCCAATTGAGACT C/T CTGAAGCTTAGACCAATGGA
TC00329_2112 GAGGCGTGAAAAGAAAAAGGCAAAGGAGGAGAGGGAGAAGCAAATAAGGG A/C TGCTGAGGAAAGACTACTGG
TC00336_3122 CTGAAATGGAGTGTTTTTATACAAGTTGTAAATAGTGATGTTTTGTACAT C/T TTTCTGGAAGATGATTCATG
[HEADING]
Customer_Name
Company_Name
Email_Address
Platform_Type GGGT
Format_Type Gene; Region; Sequence; Identity; ExistingDesign; or Score [select one]
Design_iteration prelim
Species
Number_of_Assays
[DATA]
Locus_name,Target_Type,Sequence,Chromosome,Coordinate,Genome_Build_Version,Source,Source_Version,Sequence_Orienta
TC00001_1272,SNP,TACTTCATCCCGCTCAAGAGAACCAGTGTTGGAATGGTGGCGGCGATGGCTGTATTTCCA[A/T]GAAAAGTAAGGGACTAGAAGGGCAGAGTGGA72,0,0,0,Forward,Plus
TC00075_852,SNP,TTGTCGACATTGAGATGTTCCTATCACCAATGCAAATATCAGGGCAAATGCACTAACATA[C/T]TTGAGTAAATTTCCCATCTTCATTTGCACAAA,0,0,0,Forward,Plus
TC00118_13765,SNP,ATCTAAAAATAATTAAGTTAGTAATGACTGGACGAAACCAAGAAATAACTACTTACGTGC[T/G]AAATTATAGAAGGTCTCCTGTAAGATCCAA3765,0,0,0,Forward,Plus
TC00130_2668,SNP,TGCGGTCATTGTTGTTGATCGAAAGAAAATTTAATTTCTTGTTCGACTGATCACCTTGCT[G/A]GGTTCCAACTATTCTAAAGTAATACAGGCAT68,0,0,0,Forward,Plus
KASPar
ILLUMINA
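The locus table above lists a forward flank, a polymorphism and a reverse flank per SNP, while the design-file excerpt shows the same SNP written in brackets between its flanking sequences. A minimal Python sketch of that conversion follows. The function names `bracket_snp` and `design_row` are hypothetical, the validation rules are assumptions, and only the first three columns (Locus_name, Target_Type, Sequence) are emitted; this is not ISMU's own export code.

```python
VALID_BASES = set("ACGT")


def bracket_snp(forward, polymorphism, reverse):
    """Join two flanks around a bracketed biallelic SNP, e.g. ...CCA[A/T]GAA..."""
    alleles = polymorphism.strip().upper().split("/")
    # Assumption: exactly two distinct single-base alleles are expected.
    if (len(alleles) != 2
            or alleles[0] == alleles[1]
            or not all(a in VALID_BASES for a in alleles)):
        raise ValueError(f"not a biallelic SNP: {polymorphism!r}")
    forward, reverse = forward.strip().upper(), reverse.strip().upper()
    for flank in (forward, reverse):
        if not set(flank) <= VALID_BASES:
            raise ValueError(f"unexpected character in flank: {flank!r}")
    return f"{forward}[{alleles[0]}/{alleles[1]}]{reverse}"


def design_row(locus, forward, polymorphism, reverse):
    """Build the first three comma-separated fields of a design-file data row."""
    return ",".join([locus, "SNP", bracket_snp(forward, polymorphism, reverse)])


# First row of the locus table above.
row = design_row(
    "TC00001_1272",
    "CGCTCAAGAGAACCAGTGTTGGAATGGTGGCGGCGATGGCTGTATTTCCA",
    "A/T",
    "GAAAAGTAAGGGACTAGAAG",
)
print(row)
```

The remaining columns of the design file (chromosome, coordinate, build version and so on) are left out because the excerpt does not show how they are filled.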
MABC, MARS and GS approaches seem to be the most promising for crop improvement
Need to have genomic resources and cost-effective genotyping platforms
Breeder-friendly pipelines and decision support tools are required for prediction of phenotype
Novel breeding approaches for developing countries
MBDT
OptiMAS
GS
?
Breeding Cycle
Crossing
Field evaluation Line Selection
$R_t = \dfrac{i \, r \, \sigma_A}{y}$
$R_t$ = genetic gain over time
$y$ = years per cycle
$i$ = selection intensity
$r$ = selection accuracy
$\sigma_A$ = genetic variance
NEW
cheaper to genotype = larger populations for same $$
make selections in ‘off target’ years
maintain favorable rare alleles
Select years earlier on single plant basis
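The terms named on this slide (selection intensity, selection accuracy, genetic variance, years per cycle) combine in the standard breeder's equation, where gain per year is intensity times accuracy times the genetic standard deviation, divided by cycle length. The short Python sketch below works one comparison. All numbers are invented for illustration and do not come from the presentation; the function name is hypothetical.

```python
def genetic_gain_per_year(i, r, sigma_a, y):
    """Breeder's equation: gain per year = i * r * sigma_a / y.

    i       : selection intensity
    r       : selection accuracy
    sigma_a : genetic (additive) standard deviation
    y       : years per breeding cycle
    """
    if y <= 0:
        raise ValueError("years per cycle must be positive")
    return i * r * sigma_a / y


# Hypothetical scenario: phenotypic selection is more accurate per cycle,
# but each cycle takes longer than a marker-based (GS) cycle.
phenotypic = genetic_gain_per_year(i=1.75, r=0.8, sigma_a=1.0, y=5.0)
genomic = genetic_gain_per_year(i=1.75, r=0.5, sigma_a=1.0, y=1.25)

print(f"phenotypic selection: {phenotypic:.2f} per year")
print(f"genomic selection:    {genomic:.2f} per year")
```

Under these assumed values the shorter cycle more than offsets the lower accuracy, which is the trade-off the annotations on the slide point at.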
Inbreeding
Multi-location, Multi-year testing
Seed Increase
Based on discussions with several colleagues, e.g. Jesse Poland, J-L Jannink, Gary Atlin
GS-Models
• Usually involve a relatively high number of markers
• To meet the challenges, statistical methods that can handle high-dimensional data have been developed
• However, their respective properties are still not fully understood
• Causing considerable uncertainty about the choice of models for genomic prediction
• Factors affecting GS are also not very clear
GS
ISMU V2.0 workflow (diagram labels): Raw Reads; Reference; Assemble & Align Raw Reads; Mine SNPs; Generate Marker Matrix; Visualize in TABLET and FLAPJACK; Export in FLAT Files; GDMS; Genotypic Matrix & QTLs; External Genotyping Platforms; Called SNPs; GS; Lines selected for further crossing
Factors Affecting GS-Models
• Marker density, genome size and structure
• Size of the training population
• Historical effective population size
• Trait heritability
• Relationship between training population & selection candidates
• Number of genes and distribution of their effects
• Method used for the estimation of marker effects
• GxE
Validation Studies
• Fit available models
• Cross Validation
• Prepare a matrix of validation scores
• Compare over the multiple environments
• Select final model
Cross Validation: K-fold cross-validation (K = 5), with the data split into a training set and a testing set
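The K = 5 fold cross-validation step can be sketched as follows in Python with NumPy. This is an illustration under stated assumptions, not the ISMU implementation: the data are simulated, ridge regression stands in for an RR-BLUP style model, the penalty `lam` is an arbitrary choice, and the function names are hypothetical. Each fold is used once as the testing set while the remaining folds form the training set, and the validation score is the correlation between predicted and observed phenotypes.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge regression in dual form, which stays cheap when markers >> lines."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    n = Xc.shape[0]
    # beta = Xc' (Xc Xc' + lam I)^-1 (y - mean(y))
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(n), y - y_mean)
    return x_mean, y_mean, Xc.T @ alpha


def kfold_accuracy(X, y, k=5, lam=1.0, seed=0):
    """Return one predicted-vs-observed correlation per fold."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    scores = []
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)  # everything outside the testing fold
        x_mean, y_mean, beta = ridge_fit(X[train], y[train], lam)
        pred = y_mean + (X[test] - x_mean) @ beta
        scores.append(float(np.corrcoef(pred, y[test])[0, 1]))
    return scores


# Simulated example: 100 lines, 300 markers coded 0/1/2, 20 causal markers.
rng = np.random.default_rng(42)
X = rng.integers(0, 3, size=(100, 300)).astype(float)
effects = np.zeros(300)
effects[:20] = rng.normal(size=20)
y = X @ effects + rng.normal(scale=1.0, size=100)

scores = kfold_accuracy(X, y, k=5, lam=50.0)
print("per-fold accuracy:", np.round(scores, 2))
print("mean accuracy:", round(float(np.mean(scores)), 2))
```

Repeating this for each candidate model and environment would give the matrix of validation scores mentioned in the list above.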
ISMU 2.0 Pipeline Analysis Capabilities to ISMU 1.0
• GUI for Genomic Selection
• Multicore Support
• R and Fortran Libraries for GS
• Project Mode Development
• IDE Supports
• Multiple Methods & Traits at once
• Platform Support
– Windows x64 and x32
– CentOS x64 and Ubuntu x64
– MAC (Under Testing…)
In collaboration with J L Jannink, John Hickey and Aaron Lorenz
• Data Diagnostics
– Graphical Summary
– Tabular Summary
• Subset Data
– Missing %
– MAF
– PIC
• Genomic Selection
– RR-BLUP
– Kinship Gauss
– Bayesian LASSO
– BayesB and BayesCπ
– Random Forest Regression (RFR)
• HTML & PDF Output
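The three subsetting criteria listed above (missing %, MAF, PIC) can be computed per marker as in the Python sketch below. It assumes biallelic markers coded 0/1/2 with NaN for missing calls, uses the usual biallelic PIC formula 1 - (p² + q²) - 2p²q², and applies made-up thresholds; the function names and defaults are hypothetical and are not taken from ISMU.

```python
import numpy as np


def marker_summary(G):
    """Per-marker missing rate, minor allele frequency and PIC.

    G: lines x markers array, genotypes coded 0/1/2, NaN = missing call.
    """
    called = ~np.isnan(G)
    n_called = called.sum(axis=0)
    missing = 1.0 - n_called / G.shape[0]
    # Frequency of the counted allele; NaN where a marker has no calls at all.
    p = np.full(G.shape[1], np.nan)
    np.divide(np.nansum(G, axis=0), 2.0 * n_called, out=p, where=n_called > 0)
    q = 1.0 - p
    maf = np.minimum(p, q)
    pic = 1.0 - (p**2 + q**2) - 2.0 * p**2 * q**2
    return missing, maf, pic


def subset_markers(G, max_missing=0.2, min_maf=0.05):
    """Boolean mask of markers passing the (assumed) thresholds."""
    missing, maf, _ = marker_summary(G)
    return (missing <= max_missing) & (maf >= min_maf)


nan = np.nan
G = np.array([
    [0, 1, 2,   0],
    [0, 1, nan, 0],
    [2, 1, 2,   0],
    [2, 1, nan, 0],
], dtype=float)

missing, maf, pic = marker_summary(G)
keep = subset_markers(G)
print("missing:", missing)
print("MAF:    ", maf)
print("PIC:    ", pic)
print("kept marker columns:", np.flatnonzero(keep))
```

Markers with no calls at all get NaN for MAF and PIC and are therefore dropped by the mask, since comparisons with NaN are false.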
ISMU 2.0 Pipeline Analysis Capabilities to ISMU 2.0
ISMU 2.0
Browse Data
Data in ISMU2.0
Calculation of Marker Summary
Summary Plots
Various Statistics
Export to MS-Excel (Windows)
GS Methods
GS Results
Export to PDF
Export to High Quality Graphics 300DPI
Future Plans
• Customized Parameters for GS Scripts
• Integrating more Algorithms
• Implementation of Cross Validation
• Linking with IBWS
• Data Import/Export Module
• Online Version of ISMU 2.0
• Linking with Agricultural Genomics Network
• Making available on more OS
• Average GEBVs
• Multi-trait GS
• Capacity building in NARS Partners
– 4th International Workshop on Next Generation Genomics and Integrated Breeding for Crop Improvement, Feb 19th-21st 2014
Acknowledgements
Many Friends & Collaborators
Thanks…