Wiggans, 2013SRUC Imputation (1)
Dr. George R. Wiggans
Animal Improvement Programs Laboratory
Agricultural Research Service, USDA
Beltsville, MD 20705-2350
Imputation
[Title graphic: a string of genotype codes (0, 1, 2) with gaps]
Wiggans, 2013SRUC Imputation (2)
Imputation
Based on splitting the genotype into individual chromosomes (maternal and paternal contributions)
Missing SNPs assigned by tracking inheritance from ancestors and descendants
Imputed dams increase predictor population
Genotypes from all chips merged by imputing SNPs not present
Wiggans, 2013SRUC Imputation (3)
Terms
Genotype – Alleles on both chromosomes for all markers
Allele representation – A,B; A,C,T,G
Genotype representation – number of A’s; 0,1,2,5 (missing)
Imputation – Determination of an allele from alleles of other markers and animals
Phasing – Separating a genotype into individual chromosomes and possibly assigning maternal or paternal origin
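As a rough illustration of the genotype representation listed above, the sketch below counts the A alleles at each marker and writes 5 for a missing call. The function name `encode_genotype` and the use of `None` for an uncalled marker are assumptions made for this example and do not come from the slides.

```python
MISSING = 5  # code the slide uses for a missing genotype


def encode_genotype(calls):
    """Convert A/B allele pairs to the number of A alleles (0, 1, 2), or 5 if missing.

    `calls` is a list of two-character strings such as "AB"; None marks a
    marker that could not be called (an assumption of this sketch).
    """
    codes = []
    for call in calls:
        if call is None:
            codes.append(MISSING)
        else:
            codes.append(call.count("A"))
    return "".join(str(code) for code in codes)


print(encode_genotype(["AA", "AB", "BB", None, "BA"]))
```

Both "AB" and "BA" map to 1, because the unphased genotype does not record which chromosome carries which allele.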
Wiggans, 2013SRUC Imputation (4)
10001112200200121110111121111011110011211000201220022201111202101200211122110021112001111001011011010220011002201101120020110102022212112210201001110001122022122211202112012020100202202000021100011202011221112111022011110000212202000221012020002211220111012100111211102112110020102100022000220100020110000220221102211210112111012222001211212220020002002020201222110022222220022121111210021111200110111011200202220001112011010211121211102022100211201211001111102111211021112200010110111020220022111010201112111101120210210212110110221220012110112110120220110022200210021100011100211021101110002220020221212110002220102002222121221121112002011020200122222211221202121121011001211011020022000200100200011110110012110212121112010101212022101010111110211021122111111212111210110120011111021111011111220121012121101022202021211222120222002121210121210201100111222121101
Genotype for Elevation
Chromosome 1
Wiggans, 2013SRUC Imputation (5)
X chromosome
Bull
202220200002022220002020222020202
Cow
1201201212222010111022210210212022
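The bull's X-chromosome genotype shown above contains only the codes 0 and 2, while the cow's also contains heterozygous calls (1). A minimal sketch of that comparison follows; the function name `heterozygous_fraction` is hypothetical, and the two strings are copied from this slide.

```python
def heterozygous_fraction(genotype):
    """Fraction of called SNPs (codes 0, 1, 2) that are heterozygous (code 1)."""
    called = [g for g in genotype if g in "012"]
    if not called:
        return 0.0
    return called.count("1") / len(called)


bull_x = "202220200002022220002020222020202"
cow_x = "1201201212222010111022210210212022"

print(heterozygous_fraction(bull_x))  # the bull string has no heterozygous calls
print(heterozygous_fraction(cow_x))
```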
Wiggans, 2013SRUC Imputation (6)
Pedigree – parents, grandparents, etc.
[Pedigree diagram for O-Style. Animals shown: O-Man, Manfred, Jezebel, Deva, Teamster, Dima]
Wiggans, 2013SRUC Imputation (7)
O-Style haplotypes – chromosome 15
Wiggans, 2013SRUC Imputation (8)
findhap
Developed by Paul VanRaden
Divides chromosomes into segments
Allows for successively shorter segments, typically 3 runs
  Long segments lock in identical by descent
  Shorter segments fill in missing SNPs
Separates genotype into maternal and paternal contribution, haplotypes (phasing)
Builds haplotype library sorted by frequency
Wiggans, 2013SRUC Imputation (9)
findhap characteristics
Population haplotyping
  Divides chromosomes into segments
  Lists haplotypes by genotype match
  Similar to FastPhase, Impute, or long range phasing
Pedigree haplotyping
  Detects crossover; fixes noninheritance
  Imputes nongenotyped ancestors
Wiggans, 2013SRUC Imputation (10)
Recent program revisions
Improved imputation and reliability
Changes since January 2010
  Use known haplotype if 2nd is unknown
  Use current instead of base frequency
  Combine parent haplotypes if crossover is detected
  Begin search with parent or grandparent haplotypes
  Store 2 most popular progeny haplotypes
Decreased computing time by using previous haplotype library
Wiggans, 2013SRUC Imputation (11)
Population haplotyping
Put 1st genotype into haplotype list
Check next genotype against list
  Do any homozygous loci conflict?
    − If haplotype conflicts, continue search
    − If match, fill any unknown SNP with homozygote
    − 2nd haplotype = genotype minus 1st haplotype
    − Search for 2nd haplotype in rest of list
If no match in list, add to end of list
Sort list to put frequent haplotypes 1st
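The steps above can be sketched on one short segment as follows. This is a simplified illustration under stated assumptions, not the findhap implementation. Genotypes are strings of 0/1/2/5 and haplotypes are strings of 0 (B), 1 (not known), 2 (A). An unmatched genotype is stored directly as a haplotype with unknown alleles at heterozygous loci. The 2nd haplotype is searched in the whole list rather than only the rest of it, so that homozygous animals count twice against the same entry. All function names are hypothetical.

```python
def conflicts(hap, target):
    """True if any locus pairs opposite known alleles (0 against 2)."""
    return any({h, t} == {"0", "2"} for h, t in zip(hap, target))


def fill_unknown(hap, target):
    """Fill unknown alleles (1) in a haplotype where the target is 0 or 2."""
    return "".join(t if h == "1" and t in "02" else h for h, t in zip(hap, target))


def subtract(geno, hap):
    """2nd haplotype = genotype minus 1st haplotype."""
    out = []
    for g, h in zip(geno, hap):
        if g in "02":
            out.append(g)                       # homozygote: same allele on both
        elif g == "1" and h in "02":
            out.append("0" if h == "2" else "2")  # heterozygote: the other allele
        else:
            out.append("1")                     # missing genotype or unknown allele
    return "".join(out)


def add_haplotype(library, target):
    """Count target against the first non-conflicting entry, else append it."""
    for i, entry in enumerate(library):
        if not conflicts(entry[0], target):
            entry[0] = fill_unknown(entry[0], target)
            entry[1] += 1
            return i
    library.append([target.replace("5", "1"), 1])
    return len(library) - 1


def build_library(genotypes):
    """Return [haplotype, count] pairs with the most frequent haplotypes first."""
    library = []
    for geno in genotypes:
        first = add_haplotype(library, geno)
        second = subtract(geno, library[first][0])
        add_haplotype(library, second)
    library.sort(key=lambda entry: -entry[1])
    return library


# Toy data: four animals carrying the haplotypes 0220, 0200, and 2200.
print(build_library(["0220", "0210", "0200", "1210"]))
```

In this toy run the counts recover the true number of copies of each haplotype. Real data would also need the segment handling and pedigree steps described on the other slides.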
Wiggans, 2013SRUC Imputation (12)
Coding of alleles and segments
Genotypes
  0 = BB, 1 = AB or BA, 2 = AA, 3 = B_, 4 = A_, 5 = __ (missing)
  Allele frequency used for missing
Haplotypes
  0 = B, 1 = not known, 2 = A
Segment inheritance (example)
  Son has haplotype numbers 5 and 8
  Sire has haplotype numbers 8 and 21
  Son got haplotype number 5 from dam
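The segment inheritance example can be written as a small function. This is a minimal sketch, assuming each animal's segment is described by a pair of haplotype numbers; the function name `maternal_haplotype` and the choice to return `None` for ambiguous or non-inherited cases are assumptions of this example.

```python
def maternal_haplotype(son, sire):
    """Return the haplotype number the son received from his dam, if it can be told.

    `son` and `sire` are pairs of haplotype numbers for one segment.
    Returns None when the origin is ambiguous or the son shares no
    haplotype with the sire.
    """
    a, b = son
    if a == b:
        return a if a in sire else None
    a_from_sire = a in sire
    b_from_sire = b in sire
    if a_from_sire and not b_from_sire:
        return b
    if b_from_sire and not a_from_sire:
        return a
    return None


# Slide example: son has 5 and 8, sire has 8 and 21, so 5 came from the dam.
print(maternal_haplotype((5, 8), (8, 21)))
```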
Wiggans, 2013SRUC Imputation (13)
1st segment of chromosome 15
For efficiency, store haplotypes just once
Most frequent Holstein haplotype had 4,316 copies (0.0516 × 41,822 animals × 2 chromosomes each)
Most frequent haplotypes
Rank  Frequency  Haplotype
 1    5.16%      022222222020020022002020200020000200202000022022222202220
 2    4.37%      022020220202200020022022200002200200200000200222200002202
 3    4.36%      022020022202200200022020220000220202200002200222200202220
 4    3.67%      022020222020222002022022202020000202220000200002020002002
 5    3.66%      022222222020222022020200220000020222202000002020220002022
 6    3.65%      022020022202200200022020220000220202200002200222200202222
 7    3.51%      022002222020222022022020220200222002200000002022220002220
 8    3.42%      022002222002220022022020220020200202202000202020020002020
 9    3.24%      022222222020200000022020220020200202202000202020020002020
10    3.22%      022002222002220022002020002220000202200000202022020202220
Wiggans, 2013SRUC Imputation (14)
Check new genotype against list
1st segment of chromosome 15
Search for 1st haplotype that matches genotype
  Genotype       022112222011221022021110220010110212202000102020120002021
Get 2nd haplotype by removing 1st from genotype
  2nd haplotype  022002222002220022022020220020200202202000202020020002020
Haplotype list
  5.16%  022222222020020022002020200020000200202000022022222202220
  4.37%  022020220202200020022022200002200200200000200222200002202
  4.36%  022020022202200200022020220000220202200002200222200202220
  3.67%  022020222020222002022022202020000202220000200002020002002
  3.66%  022222222020222022020200220000020222202000002020220002022
  3.65%  022020022202200200022020220000220202200002200222200202222
  3.51%  022002222020222022022020220200222002200000002022220002220
  3.42%  022002222002220022022020220020200202202000202020020002020
  3.24%  022222222020200000022020220020200202202000202020020002020
  3.22%  022002222002220022002020002220000202200000202022020202220
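The search on this slide can be replayed with the genotype and the ten haplotypes shown. The sketch below assumes that a haplotype matches when no homozygous locus of the genotype carries the opposite allele, and that the 2nd haplotype takes the other allele at heterozygous loci. The function names are hypothetical; the strings are copied from the slide.

```python
GENOTYPE = "022112222011221022021110220010110212202000102020120002021"

HAPLOTYPES = [
    "022222222020020022002020200020000200202000022022222202220",  # 5.16%
    "022020220202200020022022200002200200200000200222200002202",  # 4.37%
    "022020022202200200022020220000220202200002200222200202220",  # 4.36%
    "022020222020222002022022202020000202220000200002020002002",  # 3.67%
    "022222222020222022020200220000020222202000002020220002022",  # 3.66%
    "022020022202200200022020220000220202200002200222200202222",  # 3.65%
    "022002222020222022022020220200222002200000002022220002220",  # 3.51%
    "022002222002220022022020220020200202202000202020020002020",  # 3.42%
    "022222222020200000022020220020200202202000202020020002020",  # 3.24%
    "022002222002220022002020002220000202200000202022020202220",  # 3.22%
]


def conflicts(hap, geno):
    """True if a homozygous genotype locus carries the allele opposite to hap."""
    return any({h, g} == {"0", "2"} for h, g in zip(hap, geno))


def subtract(geno, hap):
    """2nd haplotype = genotype minus 1st haplotype (1 = not known)."""
    out = []
    for g, h in zip(geno, hap):
        if g in "02":
            out.append(g)
        elif g == "1" and h in "02":
            out.append("0" if h == "2" else "2")
        else:
            out.append("1")
    return "".join(out)


first_index = next(i for i, h in enumerate(HAPLOTYPES) if not conflicts(h, GENOTYPE))
second = subtract(GENOTYPE, HAPLOTYPES[first_index])

print("1st haplotype: list position", first_index + 1)
print("2nd haplotype:", second)
print("2nd haplotype: list position", HAPLOTYPES.index(second) + 1)
```

With these strings the first non-conflicting entry is the 3.66% haplotype, and the remainder equals the 2nd haplotype printed on the slide, which is the 3.42% entry in the list.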
Wiggans, 2013SRUC Imputation (15)
Recessive defect discovery
Check for homozygous haplotypes
  Most haplotype blocks ~5 Mbp long
  7–90 expected, but 0 observed
5 of top 11 haplotypes confirmed as lethal
Investigation of 936–52,449 carrier sire × carrier MGS fertility records found 3.0–3.7% lower conception rates
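To give a feel for why 0 observed homozygotes is striking when 7 to 90 are expected, the sketch below computes the chance of seeing none if the haplotype were harmless. It assumes the homozygote count is roughly Poisson distributed, which is an assumption of this example and not a method stated on the slides; the function name is hypothetical.

```python
import math


def prob_zero_observed(expected):
    """Poisson probability of observing 0 homozygotes when `expected` are expected."""
    return math.exp(-expected)


for expected in (7, 90):
    print(expected, prob_zero_observed(expected))
```

Even at the low end of the expected range, the probability of observing no homozygotes by chance is below 0.1% under this assumption.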
Wiggans, 2013SRUC Imputation (16)
Traditional evaluations 3X/year
Yield
  Milk, fat, protein, component percentages
Type
  Stature, udder characteristics, feet and legs
Calving
  Calving ease, stillbirth rate
Functional
  Somatic cell score, productive life, fertility
Wiggans, 2013SRUC Imputation (17)
Reduce generation interval from 5 to 2 yr
[Timeline figure, years 0 to 5. Labels: Select parents, transfer embryos to recipients; Calves born and DNA tested; Genomic prediction of progeny test; Calves born from DNA-selected parents; Bull receives progeny test]
Wiggans, 2013SRUC Imputation (18)
Benefit of genomics
Determine value of bull at birth
Increase selection accuracy
Reduce generation interval
Increase selection intensity
Increase rate of genetic gain
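The benefits listed above correspond to the terms of the standard single-pathway expression for annual genetic gain: selection intensity times accuracy times genetic standard deviation, divided by generation interval. The sketch below is a textbook form added for illustration, not a formula taken from the slides, and the input values are purely hypothetical.

```python
def genetic_gain_per_year(intensity, accuracy, genetic_sd, generation_interval):
    """Expected genetic gain per year for a single selection pathway."""
    return intensity * accuracy * genetic_sd / generation_interval


# Hypothetical inputs: intensity 2.0, accuracy 0.5, genetic SD 10 units, 5-yr interval.
print(genetic_gain_per_year(2.0, 0.5, 10.0, 5.0))
```

Higher intensity or accuracy raises the result and a shorter generation interval raises it further, but in practice these terms change together, so the function should not be read as a prediction of realized gain.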
Wiggans, 2013SRUC Imputation (19)
Genomic evaluation program
Identify animals to genotype
Send sample to genotyping laboratory
Genotype sample
Send genotype to evaluation center
Calculate genomic evaluation
Release monthly evaluation
Wiggans, 2013SRUC Imputation (20)
Genomic data flow
[Flow diagram. Participants: DHI herd; DNA laboratory; AI organization, breed association; CDCB. Flows: DNA samples; genotypes; nominations, pedigree data; genotype quality reports; genomic evaluations]
Wiggans, 2013SRUC Imputation (21)
Genotyped animals – April 2013
Chip     Traditional evaluation?  Animal sex  Holstein  Jersey  Brown Swiss  Ayrshire
50K      Yes                      Bulls         21,904   2,855        5,381       639
50K      Yes                      Cows          16,062   1,054          110         3
50K      No                       Bulls         45,537   3,884        1,031       325
50K      No                       Cows          32,892     660          102       110
<50K     Yes                      Bulls             19      11           28         9
<50K     Yes                      Cows          21,980   9,132          465         0
<50K     No                       Bulls         14,026   1,355           90         2
<50K     No                       Cows         158,622  18,722          658       105
Imputed  Yes                      Cows           2,713     237          103        12
Imputed  No                       Cows           1,183      32          112         8
All                                            314,938  37,942        8,080     1,213
Wiggans, 2013SRUC Imputation (22)
Steps to prepare genotypes
Nominate animal for genotyping
Collect blood, hair, semen, nasal swab, or ear punch
  Blood may not be suitable for twins
Extract DNA at laboratory
Prepare DNA and apply to beadchip
Do amplification and hybridization, 3-day process
Read red/green intensities from chip and call genotypes from clusters
Wiggans, 2013SRUC Imputation (23)
What can go wrong
Inadequate DNA quality or quantity from sample
Genotype with many SNPs that cannot be determined (90% call rate required)
Parent-progeny conflicts
  Pedigree error
  Sample ID error (switched samples)
  Laboratory error
  Parent-progeny relationship detected that is not in pedigree
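The 90% call rate requirement can be checked directly on a coded genotype. This is a minimal sketch, assuming the genotype is a string in which 5 marks a SNP that could not be determined; the function names `call_rate` and `is_usable` are hypothetical.

```python
MIN_CALL_RATE = 0.90  # requirement quoted on the slide


def call_rate(genotype):
    """Fraction of SNPs with a called genotype (any code other than 5)."""
    return sum(g != "5" for g in genotype) / len(genotype)


def is_usable(genotype):
    """True if the genotype meets the minimum call rate."""
    return call_rate(genotype) >= MIN_CALL_RATE


print(call_rate("0125201210"), is_usable("0125201210"))
print(call_rate("0155201210"), is_usable("0155201210"))
```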
Wiggans, 2013SRUC Imputation (24)
Parentage validation and discovery
Parent-progeny conflicts detected
  Animal checked against all other genotypes
  Conflict reported to breeds and requesters
  Correct sire usually detected
MGS checked 1 SNP at a time
  Haplotype checking more accurate
Breeds moving to accept SNPs in place of microsatellites
Wiggans, 2013SRUC Imputation (25)
Parent-progeny conflicts
    Sire  Animal
    A/B   A/B
  * B/B   B/B
  * A/A   A/A
    B/B   A/B
    A/B   B/B
    A/B   A/B
  * A/A   A/A
    A/B   A/A
    B/B   A/B
  * B/B   B/B
  * B/B   B/B
    A/B   A/B
    B/B   A/B
  * A/A   A/A
  * B/B   B/B
    A/B   A/B
    A/B   A/A
  * B/B   B/B
    A/B   A/A
    A/B   A/A
MGS
  A/B A/B A/A A/B * A/A * B/B * A/A * B/B * B/B * B/B * A/B A/B A/B B/B * A/B A/A B/B * A/B A/A * B/B
Relationship  Conflicts  *Tests  Conflict %
Sire          0          10      0%
MGS           3          10      30.0%
Wiggans, 2013SRUC Imputation (26)
Parent-progeny conflicts
For animal
  Pedigree wrong
  Genotype unreliable (3K)
For SNP
  SNP unreliable
  Clustering needs adjustment
Parent  10212002101201211001020100100
Progeny 10202010100200221001120120220
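Conflicts between the parent and progeny strings on this slide can be counted as follows. The sketch assumes that a SNP counts as a test only when both animals are homozygous, and as a conflict when they are opposite homozygotes (0 against 2). That definition of a test and the function name are assumptions of this example.

```python
def parent_progeny_conflicts(parent, progeny):
    """Return (conflicts, tests) for two genotype strings coded 0/1/2."""
    tests = 0
    conflicts = 0
    for p, o in zip(parent, progeny):
        if p in "02" and o in "02":   # both homozygous: the SNP is informative
            tests += 1
            if p != o:                # opposite homozygotes cannot be parent and progeny
                conflicts += 1
    return conflicts, tests


parent = "10212002101201211001020100100"
progeny = "10202010100200221001120120220"

n_conflicts, n_tests = parent_progeny_conflicts(parent, progeny)
print(n_conflicts, n_tests, round(100 * n_conflicts / n_tests, 1))
```

For these two strings the count is 3 conflicts in 17 tests.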
Wiggans, 2013SRUC Imputation (27)
Detecting unreliable genotypes
[Histogram of conflicts (%), x-axis from 0 to 3.6. Regions labeled: Accept; Unreliable genotype (reject); Reject]
Wiggans, 2013SRUC Imputation (28)
MGS detection
SNP conflict method (SNP)
  Check if animal and MGS have opposite homozygotes (duo test)
  If sire is genotyped, some heterozygous SNP can be checked (trio test)
Common haplotype method (HAP)
  After imputation of all loci, determine maternal contribution by removing paternal haplotype
  Count maternal haplotypes in common with MGS
  Remove haplotypes from MGS and check remaining against maternal great-grandsire (MGGS)
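The common haplotype method can be sketched with haplotype numbers per segment. The example assumes the animal's maternal haplotype is already known for each segment, and that the MGS and MGGS are each described by a pair of haplotype numbers per segment. The data layout and the function name `count_common` are hypothetical.

```python
def count_common(maternal_haps, mgs_haps, mggs_haps):
    """Count maternal segments shared with the MGS, then with the MGGS.

    Segments matching the MGS are set aside first; only the remaining
    segments are checked against the MGGS.
    """
    mgs_common = 0
    mggs_common = 0
    for maternal, mgs_pair, mggs_pair in zip(maternal_haps, mgs_haps, mggs_haps):
        if maternal in mgs_pair:
            mgs_common += 1
        elif maternal in mggs_pair:
            mggs_common += 1
    return mgs_common, mggs_common


# Hypothetical haplotype numbers for four segments.
maternal = [5, 12, 7, 3]
mgs = [(5, 9), (1, 2), (7, 7), (4, 3)]
mggs = [(5, 6), (12, 30), (8, 9), (3, 3)]

print(count_common(maternal, mgs, mggs))
```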
Wiggans, 2013SRUC Imputation (29)
Results by breed
              SNP method        Hap method
Breed         MGS % confirmed   MGS % confirmed   MGGS % confirmed
Holstein      95 (98)*          97                92
Jersey        91 (92)           95                95
Brown Swiss   94 (95)           97                85
*50K genotyped animals only
Wiggans, 2013SRUC Imputation (30)
Lab QC
Each SNP evaluated for
  Call rate
  Portion heterozygous
  Parent-progeny conflicts
Clustering investigated if SNP exceeds limits
Number of failing SNPs indicates genotype quality
Target <10 SNPs in each category
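Two of the per-SNP checks listed above, call rate and portion heterozygous, can be computed column by column from a batch of genotypes. This is a minimal sketch, assuming each animal is an equal-length string coded 0/1/2 with 5 for missing; the function name `snp_stats` is hypothetical and no pass/fail limits are implied.

```python
def snp_stats(genotypes):
    """Return a (call rate, portion heterozygous) pair for each SNP."""
    stats = []
    for column in zip(*genotypes):            # one column per SNP across animals
        called = [g for g in column if g != "5"]
        rate = len(called) / len(column)
        het = called.count("1") / len(called) if called else 0.0
        stats.append((rate, het))
    return stats


# Toy batch: four animals, three SNPs.
print(snp_stats(["012", "115", "215", "012"]))
```

In the toy batch the second SNP is heterozygous in every animal and the third is missing in half of them, the kind of pattern that would prompt a look at clustering.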
Wiggans, 2013SRUC Imputation (31)
Before clustering adjustment
86% call rate
Wiggans, 2013SRUC Imputation (32)
After clustering adjustment
100% call rate
Wiggans, 2013SRUC Imputation (33)
Automated QC reporting
6160 Genotypes Processed from LAB2013021811
PASS/FAIL,Count,Description
PASS,1,Parent Progeny Conflict SNP >2%
PASS,5,Low Call Rate SNP >10%
PASS,0,HWE SNP
PASS,0,Chips w/ >20 Conflicts
PASS,0.3,No Nomination %
PASS,0,Genotype Submitted with No Sample Sheet Row
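Because the report body is comma-separated, it can be read with a standard CSV parser. The sketch below lists any checks that did not pass; the report text is copied from the slide without its first summary line, and the function name `failed_checks` is hypothetical.

```python
import csv
import io

REPORT = """PASS/FAIL,Count,Description
PASS,1,Parent Progeny Conflict SNP >2%
PASS,5,Low Call Rate SNP >10%
PASS,0,HWE SNP
PASS,0,Chips w/ >20 Conflicts
PASS,0.3,No Nomination %
PASS,0,Genotype Submitted with No Sample Sheet Row
"""


def failed_checks(report_text):
    """Return the descriptions of all report rows whose status is not PASS."""
    reader = csv.DictReader(io.StringIO(report_text))
    return [row["Description"] for row in reader if row["PASS/FAIL"] != "PASS"]


print(failed_checks(REPORT))
```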
Wiggans, 2013SRUC Imputation (34)
Reliability of Holstein predictions
Trait                          Bias*   b     REL (%)  REL gain (%)
Milk (kg)                      −64.3   0.92  67.1     28.6
Fat (kg)                       −2.7    0.91  69.8     31.3
Protein (kg)                   0.7     0.85  61.5     23.0
Fat (%)                        0.0     1.00  86.5     48.0
Protein (%)                    0.0     0.90  79.0     40.4
Productive life (mo)           −1.8    0.98  53.0     21.8
Somatic cell score             0.0     0.88  61.2     27.0
Daughter pregnancy rate (%)    0.0     0.92  51.2     21.7
Sire calving ease (%DBH)       0.8     0.73  31.0     10.4
Daughter calving ease (%DBH)   −1.1    0.81  38.4     19.9
Sire stillbirth (%)            1.5     0.92  21.8     3.7
Daughter stillbirth (%)        −0.2    0.83  30.3     13.2
*2011 deregressed value – 2007 genomic evaluation
Wiggans, 2013SRUC Imputation (35)
Marketed Holstein bulls
[Chart: % of total breedings (0% to 100%) by breeding year, 2007 to 2011. Series: Old nongenomic; Old genomic; 1st-crop nongenomic; 1st-crop genomic; Young nongenomic; Young genomic]
Wiggans, 2013SRUC Imputation (36)
Ways to increase accuracy
Automatic addition of traditional evaluations of genotyped bulls when they are 5 yr old
Possible genotyping of 10,000 bulls with semen in repository
Collaboration with other countries
Use of more SNPs from HD chips
Full sequencing – identify causative mutations
Wiggans, 2013SRUC Imputation (37)
Application to more traits
Animal’s genotype is good for all traits
Traditional evaluations required for accurate estimates of SNP effects
Traditional evaluations not currently available for heat tolerance or feed efficiency
Research populations could provide data for traits that are expensive to measure
Will resulting evaluations work in target population?
Wiggans, 2013SRUC Imputation (38)
Impact on producers
Young-bull evaluations with accuracy of early 1st crop evaluations
AI organizations marketing genomically evaluated young bulls
Genotype usually required to be a bull dam
Rate of genetic improvement likely to increase by up to 50%
AI organizations reducing progeny-test programs
Wiggans, 2013SRUC Imputation (39)
Why genomics works for dairy cattle
Extensive historical data available
Well-developed genetic evaluation program
Widespread use of AI sires
Progeny-test programs
High-value animals worth the cost of genotyping
Long generation interval that can be reduced substantially by genomics
Wiggans, 2013SRUC Imputation (40)
Council on Dairy Cattle Breeding – CDCB
CDCB assuming responsibility for receiving data and computing and delivering U.S. evaluations
USDA will continue research and development to improve evaluation system
CDCB and USDA employees located at USDA’s Beltsville Agricultural Research Center in Beltsville, Maryland
Wiggans, 2013SRUC Imputation (41)
Questions?