
The Effect of Population Structure on Linkage

Allen Van Deynze
Tomato Breeders’ Roundtable

June, 2009


Some Definitions

Linkage
• Association of two or more loci on a chromosome with limited recombination

Linkage Disequilibrium or Gametic Phase Disequilibrium
• Non-random association of alleles at two or more loci, not necessarily on the same chromosome
• Measures the co-segregation of alleles in a population
• Mendel’s pea traits showed complete linkage equilibrium, and hence independent assortment
• Can arise from the intermixture of populations with different gene frequencies
• Can also be produced or maintained by selection favoring one combination of alleles over the others, e.g. selection for yield in a breeding population

(Falconer and Mackay, 1996)
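The definitions above are verbal; a common way to quantify disequilibrium between two biallelic loci is the coefficient D = p_AB − p_A·p_B and its standardized form r² = D²/(p_A p_a p_B p_b). The Python sketch below is a minimal illustration, not the author's method; the locus labels A/a and B/b and the function name are hypothetical. This r² is a disequilibrium statistic and should not be confused with the recombination rate r used later in the deck.

```python
def ld_from_haplotype_counts(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r2) for two biallelic loci A/a and B/b from gamete (haplotype) counts."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total  # frequency of allele A
    p_B = (n_AB + n_aB) / total  # frequency of allele B
    D = p_AB - p_A * p_B  # zero under linkage equilibrium
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return D, D * D / denom


# Only AB and ab gametes observed: complete disequilibrium
print(ld_from_haplotype_counts(50, 0, 0, 50))  # (0.25, 1.0)
# All four gametes equally frequent: linkage equilibrium
print(ld_from_haplotype_counts(25, 25, 25, 25))  # (0.0, 0.0)
```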

Linkage disequilibrium


Molecular Markers

Molecular/DNA markers are great tools for breeding; however:
• Confounding factors (e.g. linkage among markers, genotype × environment interaction, the p-value threshold) can drive selection in a different direction than the breeder intended
• Understanding the population structure becomes critical
• Phenotypic data still drive the genomic regions identified as quantitative trait loci (QTLs)
• We need to be able to understand and interpret the results
– The implications and validation of QTLs are based on F-statistics

Trait Distribution in the Progeny


If we could observe the QTL directly, we would see the three underlying trait distributions.

Figure: trait distribution in the F2 progeny, and the distributions within the genotypic classes aa, Aa and AA.
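Assuming the genotype values used later in the deck (AA → a, Aa → d, aa → −a) and a common within-class variance σ², the F2 trait distribution is a 1:2:1 mixture of three normal distributions. The Python sketch below (function names are hypothetical) computes the mixture mean and variance and the mixture density; the variance works out to σ² + a²/2 + d²/4, the F2 genotypic variance plus the within-class variance.

```python
from statistics import NormalDist


def f2_trait_mixture(a, d, sigma2):
    """Mean and variance of an F2 trait formed by a 1:2:1 mixture of AA, Aa and aa classes."""
    classes = [(0.25, a), (0.5, d), (0.25, -a)]  # (frequency, class mean)
    mean = sum(f * m for f, m in classes)
    genetic_var = sum(f * (m - mean) ** 2 for f, m in classes)  # equals a**2/2 + d**2/4
    return mean, sigma2 + genetic_var


def f2_density(x, a, d, sigma2):
    """Density at x of the observed F2 distribution (weighted sum of the class densities)."""
    sd = sigma2 ** 0.5
    return (0.25 * NormalDist(a, sd).pdf(x)
            + 0.5 * NormalDist(d, sd).pdf(x)
            + 0.25 * NormalDist(-a, sd).pdf(x))


print(f2_trait_mixture(a=1.0, d=0.5, sigma2=1.0))  # (0.25, 1.5625)
```

Evaluating `f2_density` over a grid of x values reproduces the single observed curve, while the three weighted terms are the class distributions that cannot be seen directly.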

Marker-assisted selection

Figure: fruit ripening phenotypes and a DNA marker used for marker-assisted selection.

What about population structure?
• F2
• F3
• F4
• Recombinant inbred (RI)
• F1-derived, intermated recombinant inbred (IRI)
• Doubled haploids (DH)
• Backcross (BC1)
• Association study in the base germplasm
• Near-isogenic lines (NIL)

Populations


Expansion of genetic map of 100 cM in an F2

(Winkler et al., 2003)

The level of resolution needed depends on intended application

• Combining QTLs between lines within a segregating population
• Moving chromosome segments within heterotic pools
• Making germplasm-wide inferences/claims about a particular chromosome segment (assuming IBD)
• Moving chromosome segments across heterotic pools of elite germplasm
• Introgression of a special trait (e.g. disease resistance) from exotic germplasm
• QTL cloning

Increased need for resolution (increasing down the list)

Type of population depends on resolution needed

Figure: comparison of resolution and research time for various approaches to dissecting quantitative variation. The research times assume that the target species has only two generations per year. NIL, near-isogenic line; RIL, recombinant inbred line (Buckler and Thornsberry, 2002).

Linkage Disequilibrium Decay


r = recombination rate between two loci
• r = 0.5: the two loci are unlinked
• r = 0: the two loci are completely linked and do not assort independently

(Falconer and Mackay, 1996)
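A standard result (see e.g. Falconer and Mackay, 1996) links the decay of disequilibrium to the recombination rate r defined above: under random mating, D shrinks by a factor (1 − r) each generation, so D_t = D_0(1 − r)^t. The Python sketch below applies that formula; the function names are hypothetical, and it assumes random mating with no selection, drift or migration.

```python
import math


def ld_after_random_mating(D0, r, t):
    """Disequilibrium left after t generations of random mating: D_t = D0 * (1 - r)**t."""
    return D0 * (1 - r) ** t


def generations_to_halve(r):
    """Generations of random mating needed for D to fall to half its value (0 < r <= 0.5)."""
    return math.log(0.5) / math.log(1 - r)


print(ld_after_random_mating(0.25, 0.5, 1))   # unlinked loci: 0.125
print(ld_after_random_mating(0.25, 0.0, 10))  # completely linked: 0.25
print(round(generations_to_halve(0.01), 1))   # r = 0.01: 69.0
```

The last line illustrates why disequilibrium between tightly linked loci persists for many generations in a breeding population.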

Modeling a Marker Locus and a Linked QTL

Single-marker analysis. Assume:
• One QTL locus, A
• One marker locus, M
• A measure of association between A and M
• Additive effects only, as dominance effects need not be considered in most commercial breeding applications:
– Self-pollinated crops: compare AA to aa (no dominance effect)
– Cross-pollinated crops: compare AAT to aAT (dominance and additive effects are confounded)

What happens when both A and M are segregating?

Assume a genetic distance of r cM between A and M.

Figure: a cross (×) between parents carrying alleles A/a and M/m, used to generate a segregating progeny.

Good phenotypes are important

Figure: tomato fruit susceptibility to Botrytis cinerea at the mature green (MG) and red ripe stages, 5 days post-inoculation (dpi). Susceptibility to pathogens increases with ripening (Ann Powell, UC Davis Postharvest, 18 June 2009).

Assigned values of the genotypes at A and M

Let δ = d, d/2, and 0 for cloned, selfed, and testcrossed progenies, respectively.

Expected progeny mean of the three marker genotypic classes:

$$\mu_{MM} = a\,P(AA \mid MM) + \delta\,P(Aa \mid MM) - a\,P(aa \mid MM)$$

$$\mu_{Mm} = a\,P(AA \mid Mm) + \delta\,P(Aa \mid Mm) - a\,P(aa \mid Mm)$$

$$\mu_{mm} = a\,P(AA \mid mm) + \delta\,P(Aa \mid mm) - a\,P(aa \mid mm)$$

Genotyping: the individual. Phenotyping: its cloned, selfed or testcrossed progeny.

Individual genotyped | Cloned progeny | Selfed progeny | Testcrossed progeny
AA | AA => a | AA => a | AT => a
Aa | Aa => d | 1AA:2Aa:1aa => d/2 | ½ AT + ½ aT => 0
aa | aa => -a | aa => -a | aT => -a
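The conditional probabilities P(QTL genotype | marker genotype) in these formulas depend on the population and on the linkage phase, which the slides leave open. The Python sketch below assumes F2 individuals derived from an F1 in coupling phase (AM/am), a recombination fraction r between A and M, and the progeny values from the table (a, δ, −a); the function names are hypothetical.

```python
def f2_conditional_qtl_probs(r):
    """P(QTL genotype | marker genotype) in an F2 from an AM/am (coupling-phase) F1."""
    q = 1 - r
    return {
        "MM": {"AA": q * q, "Aa": 2 * r * q, "aa": r * r},
        "Mm": {"AA": r * q, "Aa": q * q + r * r, "aa": r * q},
        "mm": {"AA": r * r, "Aa": 2 * r * q, "aa": q * q},
    }


def expected_class_means(a, delta, r):
    """Expected progeny mean of each marker class: a*P(AA|M) + delta*P(Aa|M) - a*P(aa|M)."""
    return {
        marker: a * p["AA"] + delta * p["Aa"] - a * p["aa"]
        for marker, p in f2_conditional_qtl_probs(r).items()
    }


a, d = 1.0, 0.6
for label, delta in [("cloned", d), ("selfed", d / 2), ("testcrossed", 0.0)]:
    means = expected_class_means(a, delta, r=0.1)
    # MM - mm equals 2a(1 - 2r) = 1.6 here, whatever the value of delta
    print(label, {m: round(v, 3) for m, v in means.items()})
```

Under these assumptions δ only shifts all three class means by the same amount 2δr(1 − r) for the homozygous marker classes, which is why the MM versus mm contrast isolates the additive effect.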

Variance within the genotypic classes

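Assume:

$$AA \sim N\left(a,\ \sigma^2_{g\,\mathrm{residual}} + \sigma^2_e\right) = N(a,\ \sigma^2)$$

$$Aa \sim N\left(d,\ \sigma^2_{g\,\mathrm{residual}} + \sigma^2_e\right) = N(d,\ \sigma^2)$$

$$aa \sim N\left(-a,\ \sigma^2_{g\,\mathrm{residual}} + \sigma^2_e\right) = N(-a,\ \sigma^2)$$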

Then, let δ = d, d/2, and 0 for cloned, selfed, and testcrossed progenies, respectively.

Expected progeny variance of the three marker genotypic classes:

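$$\mathrm{Var}(MM) = \sigma^2 + (a-\mu_{MM})^2 P(AA \mid MM) + (\delta-\mu_{MM})^2 P(Aa \mid MM) + (-a-\mu_{MM})^2 P(aa \mid MM)$$

$$\mathrm{Var}(Mm) = \sigma^2 + (a-\mu_{Mm})^2 P(AA \mid Mm) + (\delta-\mu_{Mm})^2 P(Aa \mid Mm) + (-a-\mu_{Mm})^2 P(aa \mid Mm)$$

$$\mathrm{Var}(mm) = \sigma^2 + (a-\mu_{mm})^2 P(AA \mid mm) + (\delta-\mu_{mm})^2 P(Aa \mid mm) + (-a-\mu_{mm})^2 P(aa \mid mm)$$

where μ_MM, μ_Mm and μ_mm are the expected progeny means of the three marker classes.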
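Each variance formula is the law of total variance: the within-QTL-class variance σ² plus the spread of the QTL class values around the marker-class mean. The Python sketch below (hypothetical function name) evaluates one marker class; the example probabilities assume an F2 MM class from a coupling-phase F1 with r = 0.1.

```python
def class_mean_and_variance(values, probs, sigma2):
    """Mean and variance of one marker class, viewed as a mixture of QTL genotype classes.

    values: progeny values of AA, Aa, aa, i.e. (a, delta, -a)
    probs:  P(AA|M), P(Aa|M), P(aa|M) for that marker class
    sigma2: residual genetic + error variance shared by every QTL class
    """
    mean = sum(v * p for v, p in zip(values, probs))
    spread = sum((v - mean) ** 2 * p for v, p in zip(values, probs))
    return mean, sigma2 + spread


# Hypothetical example: F2 marker class MM with r = 0.1 (coupling-phase F1)
r = 0.1
probs_MM = ((1 - r) ** 2, 2 * r * (1 - r), r ** 2)
a, delta, sigma2 = 1.0, 0.5, 1.0
print(class_mean_and_variance((a, delta, -a), probs_MM, sigma2))
```

With complete linkage (probabilities 1, 0, 0) the spread term vanishes and the class variance reduces to σ².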


Test statistics: MM versus mm

• Assume the test statistic is a Satterthwaite (Welch) t-test, so that we do not have to assume the same variance in the two genotypic classes:

$$t' = \frac{\bar{X}_{MM} - \bar{X}_{mm}}{\sqrt{s^2_{MM}/n_{MM} + s^2_{mm}/n_{mm}}}$$

Mm versus MM, or Mm versus mm

• This type of comparison is used in backcross populations:

$$t' = \frac{\bar{X}_{MM\ \mathrm{or}\ mm} - \bar{X}_{Mm}}{\sqrt{s^2_{MM\ \mathrm{or}\ mm}/n_{MM\ \mathrm{or}\ mm} + s^2_{Mm}/n_{Mm}}}$$
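The statistic t′ can be computed with the standard library alone. The Python sketch below (hypothetical function name and phenotype values) returns t′ together with the Welch–Satterthwaite approximate degrees of freedom; turning t′ into a p-value needs a t distribution with those degrees of freedom (for example from SciPy), which is left out here.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Satterthwaite (Welch) t' for mean(x) - mean(y) and its approximate degrees of freedom."""
    vx = variance(x) / len(x)  # s^2 / n for the first marker class
    vy = variance(y) / len(y)  # s^2 / n for the second marker class
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Hypothetical phenotypes of individuals in the MM and mm marker classes
pheno_MM = [5.1, 4.8, 5.6, 5.3, 4.9]
pheno_mm = [4.2, 4.5, 3.9, 4.4, 4.6, 4.1]
t_prime, df = welch_t(pheno_MM, pheno_mm)
print(f"t' = {t_prime:.2f} with about {df:.1f} degrees of freedom")
```

For a backcross, pass the MM (or mm) class and the Mm class instead.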

What information do we need? Information pertinent to the association between marker genotypes and phenotypic values:
• Total number of individuals genotyped, N (and the number in each class, $n_{MM}$, $n_{Mm}$, and $n_{mm}$)
• Values of a and δ
• Values of the residual genetic variance and the error variance

Populations derived from the cross between two inbred parents (DH, BC1, F2, F3, F4, RI, NIL)

Linkage between A and M will be detected only if both loci are segregating in the population and are physically linked. Assuming both parents are sampled at random from the germplasm, the probability that:
• M is segregating: $2 f_M (1 - f_M)$
• A is segregating: $2 f_A (1 - f_A)$
• Both A and M are segregating: $2 (f_{AM} f_{am} + f_{Am} f_{aM})$

Gametic phase disequilibrium between linked markers is valuable

• We cannot extrapolate results from one mapping population to the rest of our germplasm in the absence of disequilibrium.
• Disequilibrium tends to increase the chance of M1 and M2 segregating simultaneously.
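The segregation probabilities for a cross between two inbred parents can be checked numerically. The Python sketch below assumes each parent is an inbred line carrying one haplotype drawn at random from the germplasm (the function name is hypothetical). With both allele frequencies at 0.5 the probability that both loci segregate works out to 1/4 + 4D², so any disequilibrium raises it above the equilibrium value of 1/4.

```python
def prob_segregating(f_AM, f_Am, f_aM, f_am):
    """P(M segregates), P(A segregates), P(both segregate) for a cross of two random inbreds.

    Arguments are haplotype (inbred line) frequencies in the germplasm and must sum to 1.
    """
    f_A = f_AM + f_Am
    f_M = f_AM + f_aM
    p_M = 2 * f_M * (1 - f_M)
    p_A = 2 * f_A * (1 - f_A)
    p_both = 2 * (f_AM * f_am + f_Am * f_aM)
    return p_M, p_A, p_both


print(prob_segregating(0.25, 0.25, 0.25, 0.25))  # equilibrium, D = 0: (0.5, 0.5, 0.25)
print(prob_segregating(0.40, 0.10, 0.10, 0.40))  # D = 0.15: P(both) rises to about 0.34
```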


How can we tell whether there is disequilibrium between a linked marker and a QTL in the germplasm?

We cannot assess the extent of gametic phase disequilibrium between M1 and M2 in the population very well without very extensive mapping studies.

But we can look at marker-marker associations at a much smaller cost: we just need to genotype our germplasm.
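Marker-marker associations in a panel of inbred lines can be summarized as pairwise r² values. The Python sketch below assumes complete 0/1 allele calls on homozygous lines, so that each line contributes one haplotype; the function name and data layout are hypothetical, and monomorphic markers are reported as NaN.

```python
from itertools import combinations


def marker_r2_matrix(lines):
    """Pairwise r^2 between markers scored on inbred lines.

    lines: one list per inbred line with a 0/1 allele code for each marker (no missing data).
    Returns {(i, j): r2} for every marker pair i < j.
    """
    n = len(lines)
    result = {}
    for i, j in combinations(range(len(lines[0])), 2):
        xi = [line[i] for line in lines]
        xj = [line[j] for line in lines]
        p_i, p_j = sum(xi) / n, sum(xj) / n
        p_ij = sum(u * v for u, v in zip(xi, xj)) / n  # frequency of the "1-1" haplotype
        D = p_ij - p_i * p_j
        denom = p_i * (1 - p_i) * p_j * (1 - p_j)
        result[(i, j)] = D * D / denom if denom > 0 else float("nan")
    return result


panel = [[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]]  # hypothetical 4 lines x 3 markers
print(marker_r2_matrix(panel))
```

Markers 0 and 1 in the toy panel always carry the same allele (r² = 1), while marker 2 is independent of both (r² = 0).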


Tomato SNP and Indel Map (Matthew Robbins)

Figure: genetic map of tomato chromosomes 1 to 12 showing the positions of TG, CT, CD, SSR and LEOH marker loci, among others.

The marker density needed depends on the population

Figure: power of detecting a QTL, expressed as (MM − mm)/a (y-axis, 0 to 2.5), plotted against the distance of the marker from the QTL (x-axis, 0 to 100 cM). Curves: DH, F2 or 2×BC1; F3; F4; and IRI with 0, 2, 5, 10 and 25 generations of random mating (RM). The plot is annotated with distances of 34.7 cM and 2.4 cM.
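Two of the curves can be sketched from standard two-locus results. For DH or F2 progeny (MM − mm)/a = 2(1 − 2r), and for recombinant inbreds obtained by selfing, the recombinant fraction becomes R = 2r/(1 + 2r), giving 2(1 − 2R); an IRI with no generations of random mating is treated here as such a RIL. The Python sketch below assumes Haldane's mapping function to convert cM to r, which may not be what the original plot used, and it does not attempt the F3, F4 or intermated IRI curves. Function names are hypothetical.

```python
import math


def haldane_r(d_cM):
    """Recombination fraction from map distance, assuming Haldane's mapping function."""
    return 0.5 * (1 - math.exp(-2 * d_cM / 100))


def scaled_difference_dh_f2(d_cM):
    """(MM - mm)/a for DH or F2 progeny: 2(1 - 2r)."""
    return 2 * (1 - 2 * haldane_r(d_cM))


def scaled_difference_ril(d_cM):
    """(MM - mm)/a for RILs by selfing, using R = 2r/(1 + 2r)."""
    r = haldane_r(d_cM)
    R = 2 * r / (1 + 2 * r)
    return 2 * (1 - 2 * R)


for d in (0, 10, 20, 40, 80):
    print(d, round(scaled_difference_dh_f2(d), 3), round(scaled_difference_ril(d), 3))
```

The RIL curve falls faster with distance because the extra meioses during selfing inflate the recombinant fraction, which is the same expansion that makes these populations useful for finer mapping.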

Thus, QTL mapping in an F2 is an association study in a population where the confounding effect of pedigree has been removed.

Figure: chromosomes of F2 individuals 1 to 10, showing segments inherited from Parent 1 and from Parent 2.

Sample size

$$\sigma^2 = \sigma^2_{\mathrm{error}} + \sigma^2_{\mathrm{residual\ genetic}}$$

Both the error σ² and the residual genetic σ² are computed on the overall mean of each entry. Increasing the number of field replications reduces the error σ² but not the between-line residual genetic σ². To reduce the denominator of the t-test further, we need to add more genetic entries.

Typical split plot:
– Sub-plot error: plot-level error σ²
– Plot error: genetic σ² + (plot-level error σ²)/(number of replications)
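The trade-off between replications and entries can be made concrete. The Python sketch below assumes each entry is scored on its mean, so its variance is σ²_g + σ²_e/reps, and that the squared t-test denominator is that variance times (1/n_MM + 1/n_mm); the variance values are hypothetical.

```python
def entry_mean_variance(var_g, var_e, reps):
    """Variance of an entry mean: residual genetic variance + plot-level error / replications."""
    return var_g + var_e / reps


def var_of_class_difference(var_g, var_e, reps, n_MM, n_mm):
    """Squared denominator of the MM vs mm t-test when entries are scored on their means."""
    return entry_mean_variance(var_g, var_e, reps) * (1 / n_MM + 1 / n_mm)


# Hypothetical variances: residual genetic 1.0, plot-level error 4.0
for reps, n in [(2, 50), (8, 50), (1000, 50), (2, 200)]:
    print(reps, n, round(var_of_class_difference(1.0, 4.0, reps, n, n), 4))
# variances: 0.12, 0.06, 0.0402, 0.03
```

Even with a thousand replications the denominator cannot drop below σ²_g(1/n_MM + 1/n_mm) = 0.04, whereas quadrupling the number of entries at two replications already brings it to 0.03.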


Sample size: using power calculations (Lynch and Walsh, 1998)

• Assume additive effects only (so we can also ignore differences in variance between marker classes).
• Let the total phenotypic variance be $\sigma^2_P$.
• Assume we want the power to detect a QTL for which variation at the linked marker locus accounts for a fraction $r^2$ of the total phenotypic variance.
• Ignore the slight increase in within-class variance caused by imperfect linkage between A and M. Thus:

$$MM \sim N(\mu_{MM} = \gamma,\ \sigma^2), \qquad Mm \sim N(0,\ \sigma^2), \qquad mm \sim N(\mu_{mm} = -\gamma,\ \sigma^2)$$

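The proportion of total phenotypic variance explained by segregation at the marker locus is:

$$r^2 = \frac{\left[(n_{MM}+n_{mm})N - (n_{MM}-n_{mm})^2\right]\gamma^2}{N^2\sigma_P^2} \qquad \text{for Fi, IRI and DH}$$

$$r^2 = \frac{\left[N^2 - (n_{MM}-n_{Mm})^2\right]\gamma^2}{4N^2\sigma_P^2} \qquad \text{for BC1}$$

The variance of the difference between the two marker-class means is:

$$\sigma^2_{\overline{MM}-\overline{mm}} = \sigma^2\left(\frac{1}{n_{MM}} + \frac{1}{n_{mm}}\right)$$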

Sample size: using power calculations (Lynch and Walsh, 1998)

The number of progeny needed to detect an effect at a marker locus that accounts for r² of the total phenotypic variance, with probability α of a false positive and probability β of missing a true association, can be extracted from the relationship:

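$$\frac{4\gamma^2}{\sigma^2}\left(\frac{1}{n_{MM}} + \frac{1}{n_{mm}}\right)^{-1} = \left(z_{1-\alpha/2} + z_{1-\beta}\right)^2$$

where $z_x$ is the $x$ quantile of the standard normal distribution.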

(Replace $n_{mm}$ by $n_{Mm}$ and $4\gamma^2$ by $\gamma^2$ in the case of BC1.)

Sample size (3): power calculations, examples

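Whenever $n_{MM} = n_{mm} = n$, we have:

$$n = \frac{\sigma^2\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{2\gamma^2} \qquad \text{and} \qquad r^2 = \frac{(n_{MM}+n_{mm})\gamma^2}{N\sigma_P^2} = \frac{2n\gamma^2}{N\sigma_P^2}$$

Thus:

$$N = \frac{\sigma^2}{\sigma_P^2}\cdot\frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{r^2}$$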

This applies to Fi, BC1, DH and IRI populations.

r² (proportion of σ²_P) | N for α = 0.05, β = 0.10 | N for α = 0.01, β = 0.05
0.10 | 101 | 171
0.05 | 206 | 349
0.01 | 1047 | 1774

(α: probability of a false positive; β: probability of a false negative)
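For equal class sizes, the large-sample (normal) approximation to the two-class comparison gives N = (σ²/σ²_P)(z_{1−α/2} + z_{1−β})²/r², where σ² is the within-class variance. The Python sketch below evaluates that expression; the ratio σ²/σ²_P is an input assumption (hypothetical parameter `var_ratio`), with 1.0 as a conservative bound and 1 − r² treating r² as fully removed from σ²_P. It does not try to reproduce the table exactly: the tabulated N values fall between the results for these two ratios.

```python
from statistics import NormalDist


def required_total_n(r2, alpha, beta, var_ratio=1.0):
    """Total progeny N = var_ratio * (z_{1-alpha/2} + z_{1-beta})**2 / r2.

    var_ratio is sigma2 / sigma2_P (within-class over total phenotypic variance), an assumption.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(1 - beta)
    return var_ratio * z ** 2 / r2


for r2 in (0.10, 0.05, 0.01):
    low = required_total_n(r2, 0.05, 0.10, var_ratio=1 - r2)
    high = required_total_n(r2, 0.05, 0.10, var_ratio=1.0)
    print(r2, round(low, 1), round(high, 1))
```

Because N scales with 1/r², dropping the detectable effect from 10% to 1% of σ²_P multiplies the required population size roughly tenfold, as the table shows.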

Statistical power

Figure: statistical power for h² = 0.10 and h² = 0.05 (Hu and Shu, 2008).

Environments vs. number of genotypes

Schön et al. (2004)
• Very large population (975 F5 testcrosses in 19 environments) and simulated populations
• QTL analyses (PLABQTL)
• Obtained the proportion of phenotypic variance explained by QTLs ($R^2_{adj}$)
• Derived the proportion of genotypic variance explained by all detected QTLs: $\hat{p} = \hat{R}^2_{adj} / \hat{h}^2$
• Data subdivided to verify the impact of the number of progenies and the number of environments
• Used resampling techniques to estimate the bias in detecting QTL, comparing $R^2$ and $p$ in the estimation and test data sets (ES vs. TS)
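The resampling idea can be sketched in a few lines: split the progenies into estimation and test sets, compute p̂ = R̂²_adj/ĥ² in each, and take the difference as the bias. The Python sketch below shows only that bookkeeping with plain fivefold cross-validation; the function names are hypothetical, and fitting the QTL model itself (done with PLABQTL in the study) is outside its scope.

```python
import random


def p_hat(r2_adj, h2):
    """Proportion of genotypic variance explained by detected QTL: R2_adj / h2."""
    return r2_adj / h2


def qtl_bias(p_es, p_ts):
    """Optimism of the estimate: value in the estimation set minus value in the test set."""
    return p_es - p_ts


def fivefold_splits(n, k=5, seed=1):
    """Yield (estimation_set, test_set) index lists for k-fold cross-validation of n progenies."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        estimation = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield estimation, folds[i]


# 975 testcross progenies: each test set holds one fifth of them
for es, ts in fivefold_splits(975):
    print(len(es), len(ts))  # 780 195
```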


Figure 1. (Schön et al., 2004) Proportion of the genotypic variance explained by detected QTL in estimation sets, averaged over all data sets (ES), for 12 combinations of experimental data PED (N, E), using fivefold standard cross-validation and two significance levels, for grain yield, grain moisture, and plant height. Individual columns are partitioned into the genotypic variance explained in test sets (TS, solid bottom) and the bias, calculated as the difference ES − TS (shaded top).


Figure 2. (Schön et al., 2004) Mean (–), median (o), and 12.5% and 87.5% quantiles of the proportion of the genotypic variance explained in test sets, calculated for individual data sets for 12 combinations of experimental data PED (N, E) using fivefold standard cross-validation and LOD 2.5, for grain yield, grain moisture, and plant height.


Figure: number of individuals (σ²_G) versus number of environments (σ²).

Main conclusions of Schön et al.

• Adding more genotypes is more efficient than replicating the same genotypes (provided that a minimum number of environments is sampled)
• h² and effect size are important
• Results are trait-specific

References
• Dudley, J.W. and R.J. Lambert. 1992. Ninety generations of selection for oil and protein in maize. Maydica 37:81-87.
• Falconer, D.S. and T.F.C. Mackay. 1996. Introduction to Quantitative Genetics. 4th ed. Longman Group Ltd., Essex, UK. 464 p.
• Laurie, C.C. et al. 2004. The genetic architecture of response to long-term artificial selection for oil concentration in the maize kernel. Genetics 168:2141-2155.
• Liu, B.H. 1998. Statistical Genomics: Linkage, Mapping and QTL Analysis. CRC Press LLC, FL, USA. 611 p.
• Lynch, M. and B. Walsh. 1998. Genetics and Analysis of Quantitative Traits. Sinauer Associates Inc., MA, USA. 980 p.
• Schön, C.C. et al. 2004. Quantitative trait locus mapping based on resampling in a vast maize testcross experiment and its relevance to quantitative genetics for complex traits. Genetics 167:485-498.
• Tanksley, S.D. et al. 1996. Advanced backcross QTL analysis in a cross between an elite processing line of tomato and its wild relative L. pimpinellifolium. Theor. Appl. Genet. 92:213-224.
• Winkler, C.R. et al. 2003. On the determination of recombination rates in intermated recombinant inbred populations. Genetics 164:741-745.
• Xiao, J. et al. 1998. Identification of trait-improving quantitative trait loci alleles from a wild rice relative, Oryza rufipogon. Genetics 150:899-909.

Carotenoids in NILs: S. pennellii × S. lycopersicum

Confidential

(Liu et al., 2003)
