lecture 12 genetic linkage analysis and map construction 6764 145.09 1476 145.09 3778 144.61 2252...

55
Lecture 12 Genetic Linkage Analysis and Map Construction

Upload: vodat

Post on 08-Jun-2018

215 views

Category:

Documents


0 download

TRANSCRIPT

Lecture 12

Genetic Linkage Analysis and Map Construction

Basic Principles in Genetics

2

Rediscovery of Mendel’s hybridization experiments in garden pea

• Experiments with Plant Hybrids (1866) – Seed shape: 5474 round vs 1850 wrinkled – Cotyledon color: 6022 yellow vs 2001 green – Seed coat color: 705 grey-brown vs 224

white – Pod shape: 882 inflated vs 299 constricted – Unripe pod color: 428 green vs 152 yellow – Flower position: 651 axial vs 207 terminal – Stem length: 787 long (20-50cm) vs 277

short (185-230cm) • Rediscovered in 1900

Qualitative traits in genetics

4

Seed shape: round vs wrinkled

Cotyledon color: yellow vs green

Seed coat color: grey-brown vs white

Pod shape: inflated vs constricted

Unripe pod color: green vs yellow

Flower position: axial vs terminal

Stem length: long vs short

The First Mendelian Law

• The Principle of Segregation (The “First Law”).

• For genotype Aa: – From zygote to gamete: ½A

+ ½ a (1:1) – From gamete to zygote: (½A

+ ½a)2 = ¼AA + ½Aa + ¼aa (1:2:1)

The Second Mendelian Law • For two independent loci, i.e. no

linkage • The Principle of Independent

Assortment (The “Second Law”) • For genotype AaBb:

– From zygote to gamete: ¼AB + ¼Ab + ¼aB + ¼ab (1:1:1:1)

– From gamete to zygote: (¼AB + ¼Ab + ¼aB + ¼ab)2 =_AABB +_AABb + _AAbb + _AaBB + _AaBb + _Aabb + _aaBB + _aaBb + _aabb (1:2:1:2:4:2:1:2:1)

The Third Genetics Law: Linkage and Recombination

A

a

B

b P1: AABB P2: aabb A B

a b

F1: AaBb A B

a b

×

A B a b

(1-r)/2 (1-r)/2 A b B a

r/2 r/2 Parental type Parental type Recombinant type Recombinant type

Meiosis

Quantitative traits in genetics

8

0200400600800

1000120014001600

54 56 58 60 62 64 66 68 70 72 74

Num

ber o

f wom

en

Midpoint group value

Distribution of height (inches) among 4995 British women

0

5

10

15

20

25

30

5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Num

ber o

f wom

en

Midpoint group value

Ear length (cm) of one maize inbred line (P1)

0

5

10

15

20

25

30

5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Num

ber o

f wom

en

Midpoint group value

Ear length (cm) of aonther maize inbred line (P2)

0

5

10

15

20

25

30

5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Num

ber o

f wom

en

Midpoint group value

Ear length (cm) of their F1 hybrids

020406080

100120140160

5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Num

ber o

f wom

en

Midpoint group value

Ear length (cm) of their F2 hybrids

Genetic Linkage

Chromosome2

2278327.666511326.726512326.262077325.32198324.852279324.852928324.854936324.394935323.934692323.93barc080323gwm1403231504321.546831314.551355314.55724308.51791306.64695269.6696268.686663267.257141267.254031267.257422265.853097262.585446262.585447262.585445262.586234262.583284258.723095254.855448254.855900254.855352250.188507249.726294228.983076213.22717211.815227211.343684210.42Glu79207.62Glu1718206.72cfa2129b204.23barc061203.233713201.385635199.054139197.666134197.66854196.72540196.25515194.841953191.17560190.167982190.164204190.164203190.166107188.77491187.853945187.855322185.535418185.531121185.532668185.532667185.537466185.53767185.533348185.532504185.53gwm131c185.034141184.052222183.581399179.867934179.397504179.394504178.471948178.47361178.476062177.08407173.794099169.411602165.994598164.11274162.232998160.361955158.4wmc406155.981578152.787480152.78466151.86gwm374151.37158150.882881150.881573150.886964150.88BARC187150.888001150.881191150.881566150.887385150.881957150.885976150.886110150.886444150.884715150.888275150.882150150.884764150.883875150.884974150.886260150.884555150.885121150.885133150.88944150.887578150.884093150.886611150.886621150.885593149.944350148.527117147.57817146.64barc065145.78409145.23gwm582143.332655140.522867140.055702139.15435138.621480136.747398135.354242135.356489135.354073135.357399135.35L42820130.14L42678108.96WMC08580.1wmc257c43.95L323480

Chromosome19

L31965389.12L32831341.372879316.56626316.51880312.816989312.81wmc158309.736519305.645337305.647321305.644558305.642552304.722756304.265336303.343979303.345136302.877197302.872570302.873978301.53751301.057306300.127013295.951735295.951022292.663267290.31barc070288.98032287.517093278.23930277.3834277.3WMC168272.654704253.65cfa2049249.941751227.522710215.67472214.754129214.75473214.758390214.757201210.597200210.59cfd035a208.267301202.682513202.687849202.681845202.686472190.461353190.466473190.463832189.993831189.99679178.216331177.74275174.466310174.46274174.467419172.62barc174170.262735166.482507166.482506166.481156159.452576158.53363158.532945156.677769154.811277154.815727154.811477154.352497154.35486151.11334149.727731147.863754147.868122146.485887146.017002145.554043145.555644145.555841145.556159145.556764145.091476145.093778145.092252144.612250144.61gwm631143.22127142.33232142.35860141.834277141.371832140.91834140.97650137.7476137.77651137.71502137.73639137.7635136.792752135.878073135.874837135.871761135.877660135.411946135.41689135.411945135.413668135.414002135.414735135.417749135.417741135.413082135.41448134.931421134.451110132.617046111.325790108.981724108.521424107.65791106.676535104.36668095.47312887.14516787.14227086.2829782.97272350.2272450.2443738.57443838.09839333.89542029.29337122.8587321.8817921.8873721.4173621.4179420.49292918.62402818.15105316.7186616.71436415.79417515.79cfa224012.159044.6475922.7983122.7964242.3267350

Genetic populations and pair-wise linkage analysis

10

Populations handled in QTL IciMapping Parent P1 Parent P2 Legends

HybridizationF1

Selfing1. P1BC1F1 7. F2 2. P2BC1F1

Repeated selfing9. P1BC2F1 13. P1BC1F2 8. F3 14. P2BC1F2 10. P2BC2F1

Doubled haploids15. P1BC2F2 16. P2BC2F2

11. P1BC2RIL 5. P1BC1RIL 4. F1RIL 6. P2BC1RIL 12. P2BC2RIL BC3F1, BC4F1 etc.

P1BC2F1 P1BC1F1 F1 P2BC1F1 P2BC2F1 Marker-assistedselection

19. P1BC2DH 17. P1BC1DH 3. F1DH 18. P2BC1DH 20. P2BC2DH CSS lines orIntrogression lines

P1 × CP P2 × CP P3 × CP … Pn × CP CP=common parent

RIL family 1 RIL family 2 RIL family 3 RIL family i RIL family n

One NAM population

12

Marker C263 R830 R3166 XNpb387 R569 R1553 C128 C1402 XNpb81 C246 R2953 C1447 Grain width (mm)

Position (cM) 0.0 3.5 8.5 19.5 32.0 66.6 74.1 78.6 81.8 91.9 92.7 96.8

RIL1 0 0 0 0 0 0 0 0 0 0 0 0 2.33

RIL2 2 2 2 2 2 0 0 0 0 2 2 2 1.99

RIL3 0 2 2 2 2 2 2 2 2 2 2 2 2.24

RIL4 0 0 0 0 0 0 2 2 2 2 2 2 1.94

RIL5 0 0 0 0 0 2 2 0 0 0 0 0 2.76

RIL6 0 0 0 2 2 2 2 2 2 2 2 2 2.32

RIL7 0 0 0 0 0 0 0 0 0 0 0 0 2.32

RIL8 2 2 0 2 2 0 0 0 0 2 2 2 2.08

RIL9 0 0 0 0 2 2 0 0 0 0 0 0 2.24

RIL10 0 0 0 0 2 2 0 0 0 0 0 0 2.45

Example: 10 RILs in a rice population (Linkage map of Chr. 5)

Genetic markers in linkage analysis

• Morphological traits – Qualitative traits used in Mendel’s

hybridization experiments • Cytogenetic and bio-chemistry markers

(e.g. isozyme) • DNA molecular markers

– RFLP, SSR, SNP etc.

The four gametes (haplotypes) of an F1

14

A

a

B

b

P1: AABB P2: aabb

A B

a b

F1: AaBb

A B

a b

×

A B a b (1-r)/2 (1-r)/2

A b B a r/2 r/2

Parental type Parental type Recombinant type

Recombinant type

Meiosis

Expected genotypic frequency in backcross and DH populations

P1: AABB; P2: aabb

15

P1BC1 P2BC1 DH Samples Theoretical frequency

AABB AaBb AABB n1 f1=(1-r)/2

AABb Aabb AAbb n2 f2=r/2

AaBB aaBb aaBB n3 f3=r/2

AaBb aabb aabb n4 f4=(1-r)/2

MLE of recombination frequency Likelihood function

Logarithm of likelihood

MLE of r

Fisher information

Variance of estimated r

3241

4321

)()1()1(21

21

21)1(

21

!!!!!

4321

nnnnnnnn

rrCrrrrnnnn

nL ++−=

−=

rnnrnnCL ln)()1ln()(lnln 3241 ++−++=

nnn

nnnnnnr 32

4321

32ˆ +=

++++

=

)1()1()ln( 2

32241

2

2

rrn

rnn

rnnE

rdLdEI

−=

+−

−+

−−=−=

nrr

IVr

)ˆ1(ˆ1ˆ

−==

Significance test of linkage • Null hypothesis H0: r = 0.5 (no genetic linkage, or

locus A-a and B-b are independent) • Alternative hypothesis HA: r ≠ 0.5

• Likelihood ratio test (LRT) or LOD score

)1(~])ˆ(

)5.0(ln[2 2 ==

−= dfrL

rLLRT χ

)5.0()ˆ(log

==

rLrLLOD

An example P1BC1 population • Genotypes of two inbred parents P1 and P2

are AABB and aabb • Observed samples of the four genotypes in

P1BC1 – AABB:162;AABb:40;AaBB:41;AaBb:158

18

%20.2040181

15841401624140ˆ ==+++

+=r

4ˆ 1002.4)ˆ1(ˆ −×≈

−=

nrrVr

Test of linkage • Null hypothesis H0: r = 0.5 • Alternative hypothesis HA: r ≠ 0.5

• Likelihood ratio test (LRT) (P<0.0001) and LOD score

19

153

41

103.6)()1(

)5.0()ˆ(

4321

3241

×=−

== +++

++

nnnn

nnnn rrrL

rL

27.708])5.0(

)ˆ(ln[*2 ==

=rL

rLLRT

80.153])5.0(

)ˆ(log[ ==

=rL

rLLOD

Genotypic frequencies in RIL populations, compared with DH

20

DH population

Theoretical frequency

RIL population

Theoretical frequency

AABB f1=(1-r)/2 AABB f1=(1-R)/2

AAbb f2=r/2 AAbb f2=R/2

aaBB f3=r/2 aaBB f3=R/2

aabb f4=(1-r)/2 aabb f4=(1-R)/2

R=2r/(1+2r)

10 RILs in a rice population P1: 0 or A; P2: 2 or B; F1: 1 or H

RIL Marker 1 Marker 2 Parent type or recombinant

C263 XNpb387 RIL1 0 or A 0 or A P1 type RIL2 2 or B 2 or B P2 type RIL3 0 or A 2 or B Recombinant RIL4 0 or A 0 or A P1 type RIL5 0 or A 0 or A P1 type RIL6 0 or A 2 or B Recombinant RIL7 0 or A 0 or A P1 type RIL8 2 or B 2 or B P2 type RIL9 0 or A 0 or A P1 type RIL10 0 or A 0 or A P1 type

n1=6 n2=2 n3=0 n4=2

R=2/10=0.2

r=0.125

LRT=17.72 (P=2.56×10-5)

LOD=3.85

Expected genotypic frequencies in F2 populations

Co-dominant markers Dominant markers Marker type Frequency Marker type Frequency AABB (1-r)2/4 A_B_ [2+(1-r)2]/4 AABb r(1-r)/2 AAbb r2/4 A_bb [1-(1-r)2]/4 AaBB r(1-r)/2 AaBb (1-2r+2r2)/2 Aabb r(1-r)/2 aaBB r2/4 aaB_ [1-(1-r)2]/4 aaBb r(1-r)/2 aabb (1-r)2/4 aabb (1-r)2/4

MLE of r in F2: dominant markers

• Logarithm of the likelihood ratio

• MLE of r

• Variance of the estimated r

2)1( rk −=)21ln()2ln()()23ln(ln 2

92

732

1 rrnrrnnrrnCL +−+−+++−+=

knknnknC ln)1ln()()2ln( 9731 +−++++=

nnnnnnnnn

rk2

)32()32()1( 9

291912 ×+−−±−−−

=−=

)243(2)23)(2(

)21(2)2)(1(

2

22

ˆ rrnrrrr

knkkVr +−

+−−=

+−−

=

MLE of r in F2: co-dominant markers (Newton-Raphson algorithm)

• Log-likelihood function

• The first-order derivative of LogL • f'(r) • The second-order derivative of LogL • f''(r) • The iteration algorithm:

ri+1 = ri - f'(ri)/f''(ri)

)221ln(ln)22(

)1ln()22(lnln2

5738642

864291

rrnrnnnnnnrnnnnnnCL+−+++++++

−++++++=

25738642864291

221)24(22

122ln

rrrn

rnnnnnn

rnnnnnn

drLd

+−−

++++++

+−

+++++==

22

25

2738642

2864291

2

2

)221()44(22

)1(22ln)

rrrrn

rnnnnnn

rnnnnnn

rdLd

+−−

++++++

−−

+++++−==

MLE of r in F2: co-dominant markers (EM algorithm)

• EM for expectation and maximization • E-step: for an initial r0, calculate the probability of

crossover in each marker type • M-step: Update r, and repeat from the E-step

∑=k

kkn GRPnr )|(' 1

Expected probability of crossover Marker type Frequency Expected sample size P(R|G)

AABB f1=(1-r)2/4 n1 = nf1 0

AABb f2=r(1-r)/2 n2 = nf2 0.5

AAbb f3=r2/4 n3 = nf3 1

AaBB f4=r(1-r)/2 n4 = nf4 0.5

AaBb f5=(1-2r+2r2)/2 n5 = nf5 r2/(1-2r+2r2)

Aabb f6=r(1-r)/2 n6 = nf6 0.5

aaBB f7=r2/4 n7 = nf7 1

aaBb f8=r(1-r)/2 n8 = nf8 0.5

aabb f9=(1-r)2/4 n9 = nf9 0

r= [n1×0+ n2×0.5+ n3×1+…+ n8×0.5+ n9×0]/n

Estimated r after 3 EM iterations (r0=0.5)

Geno. Size r0

Exp. Freq. P(R|G) r1

Exp. Freq. P(R|G) r2

Exp. Freq. P(R|G) r3

AABB 30 0.5 0.063 0 0.313 0.118 0 0.198 0.161 0 0.159 AABb 7 0.5 0.125 0.5 0.313 0.107 0.5 0.198 0.080 0.5 0.159 AAbb 1 0.5 0.063 1 0.313 0.024 1 0.198 0.010 1 0.159 AaBB 9 0.5 0.125 0.5 0.313 0.107 0.5 0.198 0.080 0.5 0.159 AaBb 50 0.5 0.250 0.5 0.313 0.285 0.1712 0.198 0.341 0.0577 0.159 Aabb 12 0.5 0.125 0.5 0.313 0.107 0.5 0.198 0.080 0.5 0.159 aaBB 0 0.5 0.063 1 0.313 0.024 1 0.198 0.010 1 0.159 aaBb 10 0.5 0.125 0.5 0.313 0.107 0.5 0.198 0.080 0.5 0.159 aabb 25 0.5 0.063 0 0.313 0.118 0 0.198 0.161 0 0.159

144 1 1 1

Estimated r after 3 EM iterations (r0=0.25)

Geno. Size r0

Exp. Freq. P(R|G) r1

Exp. Freq. P(R|G) r2

Exp. Freq. P(R|G) r3

AABB 30 0.25 0.141 0 0.174 0.171 0 0.154 0.179 0 0.150 AABb 7 0.25 0.094 0.5 0.174 0.072 0.5 0.154 0.065 0.5 0.150 AAbb 1 0.25 0.016 1 0.174 0.008 1 0.154 0.006 1 0.150 AaBB 9 0.25 0.094 0.5 0.174 0.072 0.5 0.154 0.065 0.5 0.150 AaBb 50 0.25 0.313 0.1 0.174 0.357 0.0423 0.154 0.370 0.0319 0.150 Aabb 12 0.25 0.094 0.5 0.174 0.072 0.5 0.154 0.065 0.5 0.150 aaBB 0 0.25 0.016 1 0.174 0.008 1 0.154 0.006 1 0.150 aaBb 10 0.25 0.094 0.5 0.174 0.072 0.5 0.154 0.065 0.5 0.150 aabb 25 0.25 0.141 0 0.174 0.171 0 0.154 0.179 0 0.150

144 1 1 1

Estimated r after 3 EM iterations (r0=0.0)

Geno. Size r0

Exp. Freq. P(R|G) r1

Exp. Freq. P(R|G) r2

Exp. Freq. P(R|G) r3

AABB 30 0 0.250 0 0.139 0.185 0 0.148 0.182 0 0.149 AABb 7 0 0.000 0.5 0.139 0.060 0.5 0.148 0.063 0.5 0.149 AAbb 1 0 0.000 1 0.139 0.005 1 0.148 0.005 1 0.149 AaBB 9 0 0.000 0.5 0.139 0.060 0.5 0.148 0.063 0.5 0.149 AaBb 50 0 0.500 0 0.139 0.380 0.0253 0.148 0.374 0.0292 0.149 Aabb 12 0 0.000 0.5 0.139 0.060 0.5 0.148 0.063 0.5 0.149 aaBB 0 0 0.000 1 0.139 0.005 1 0.148 0.005 1 0.149 aaBb 10 0 0.000 0.5 0.139 0.060 0.5 0.148 0.063 0.5 0.149 aabb 25 0 0.250 0 0.139 0.185 0 0.148 0.182 0 0.149

144 1 1 1

Co-dominant markers in other populations

Marker type

Population F2 P1B1F1 P2B1F1 F1DH P1BC1DH P2BC1DH F1-RIL

AABB (1-r)2/4 (1-r)/2 (1-r) /2 ½+(1-r) 2/4 (1-r) 2/4 (1-R)/2 AABb r(1-r)/2 r/2 AAbb r2/4 r/2 r/2-r2/4 r/2- r2/4 R/2 AaBB r(1-r)/2 r/2 AaBb (1-2r+2r2)/2 (1-r)/2 (1-r)/2 Aabb r(1-r)/2 r/2 aaBB r2/4 r/2 r/2-r2/4 r/2-r2/4 R/2 aaBb r(1-r)/2 r/2 aabb (1-r)2/4 (1-r)/2 (1-r) /2 (1-r) 2/4 ½+(1-r) 2/4 (1-R)/2

R=2r/(1+2r)

More populations (e.g. BC1F2, F3 etc): Generation transition matrix of

Parent Genotype and frequency in self-pollinated progeny AABB AABb AAbb AaBB AB/ab Ab/aB Aabb aaBB aaBb aabb

AABB 1 AABb 0.25 0.5 0.25 AAbb 1

AaBB 0.25 0.5 0.25 AB/ab (1-r)2/4 r(1-r)/2 r2/4 r(1-r)/2 (1-r)2/2 r2/2 r(1-r)/2 r2/4 r(1-r)/2 (1-r)2/4 Ab/aB r2/4 r(1-r)/2 (1-r)2/4 r(1-r)/2 r2/2 (1-r)2/2 r(1-r)/2 (1-r)2/4 r(1-r)/2 r2/4 Aabb 0.25 0.5 0.25 aaBB 1 aaBb 0.25 0.5 0.25 aabb 1

Distortion has little effect on linkage analysis!

DH pop Theo. Freq. Distortion Freq. in distortion AABB f1=(1-r)/2 (1-r)/2 (1-r)/(1+s)

AAbb f2=r/2 r/2 r/(1+s)

aaBB f3=r/2 s×r/2 r×s/(1+s)

aabb f4=(1-r)/2 s×(1-r)/2 (1-r)×s/(1+s)

Sum 1 (1+s)/2 1

rssrssrsrr =++=+×++= )1/()1()1/()1/(ˆ

Three-point analysis and linkage map construction

33

Linkage analysis of three markers

• When (no interference),

• When (complete interference),

• The order of the three loci can be determined after linkage analysis (3!/2=3 potential orders)

– 1—2—3, or 1—3—2, or 2—1—3 34

2312231213 12 rrrrr )( δ−−+=0=δ

2312231213 )1)(1()1( rrrrr +−−=−

1=δ

231223122312231213 2)1()1( rrrrrrrrr −+=−+−=

231213 rrr +=

Mapping distance and recombination frequency

• Mapping distance • Unit of mapping distance

– M (Morgan) or cM (centi-Morgan), 1M=100cM

• The function of mapping distance on recombination frequency (Mapping function):

35

231213 mmm +=

)(rfm =

Common mapping functions

36

Morgan function (complete interference)

• In M: m =r (M) • In cM: m =r ×100 (cM)

• Haldane function (no interference) • In M: • In cM:

• Kosambi function (interference depends on length of interval) • In M:

• In cM:

)1( 221 mer −−=

)21ln(50)( rrfm −−== )1( 50/21 mer −−=

rrm

2121ln

41

−+

=11

21

4

4

+

−=

ee

m

m

r

rrm

2121ln25

−+

=1

121

25/

25/

+

−=

ee

m

m

r

)21ln()( 21 rrfm −−==

Comparison of the three functions

0.0

0.5

1.0

1.5

2.0

0 0.1 0.2 0.3 0.4 0.5

重组率

图距

(M)

Haldane作图函数

Kosambi作图函数

Morgan作图函数

37

Recombination frequency

Map

ping

dis

tanc

e (c

M)

Genetic Linkage Map

38

C112XNpb346

3.1

XNpb543.9

R32035.6

G22002.2

C86

10.4

C3029C

16.3

C23403.1

C8131.5

XNpb3437.5

XNpb3930.8

C466B1.5

C13704.6

R1012

11.8

R886

12.8

R14854.6

XNpb3021.5

R26354.7

XNpb297-21.6

R1468C0.8

Y2820R1.5

R19280.7

R2159.7

XNpb3644.1

C9042.4

Y5714L0.7

XNpb2522.4

XNpb87-22.4

R2105.9

C12114

C95517

R3192

14.3

R19443.8

R16132.2

XNpb2167.8

C9700.8

XNpb89-3C1470

4.0

C9781.5

Ky65.9

XNpb2500.8

C5604.2

C3703.1

C6017.7

XNpb2043.3

R4185.2

R33930.8

G11850.8

C12368.9

R4270.8

C920

8.9

XNpb670.8

C6217.2

C796A6.2

XNpb132

9.0

C7774.6

R19894.8

G13404.8

C535B0.7

R7123.9

XNpb2274.3

C132

10.7

XNpb2233.1

R4594.1

C259D5.7

XNpb3491.6

V83B

12.3

XNpb212-2XNpb48

3.2

C393B4.2

XNpb1644.4

C393A6.3

XNpb2325.0

G10155.9

XNpb2494.2

C14685.0

C1351

9.6

R19

16.6

G1316

9.6

C803.1

C16775.8

C3613.1

C14523.2

C1454B6.4

R31565.2

XNpb621.5

XNpb2343.9

XNpb2380.8

C161B2.3

XNpb3925.5

R27784.5

C5633.1

XNpb184

8.3

R518

9.2

XNpb279

11.0

C7211.6

R1468B4.9

R1468A0.9

C445R416

2.3

C10164.7

XNpb1772.4

C6002.3

Ky4.9

R27378.4

Ky53.4

XNpb3313.3

XNpb1140.8

R27832.4

C335

11.8

C62126.4

XNpb493.2

XNpb3117.2

C8912.3

C9753.9

XNpb237

14.2

R2882.6

C946

10.1

XNpb2474.0 R2373

2.3

C263R830

3.5

R31665.0

XNpb71

9.5

XNpb3871.6

Y1060L1.7

R569

10.7

XNpb251

10.3

G14585.9

XNpb105

10.3

R1553

8.0

C1287.6

C14024.5

XNpb813.2

C246

10.1

R29530.8

C14474.1

XNpb209

C688

9.0

XNpb277.6

XNpb165-15.7

R2171

17.3

R688

9.0

G20285.7

C1677B6.0

XNpb386

16.0

XNpb135

10.7

C9620.8

XNpb342

8.5

C6071.6

Three steps in map construction

• Step 1: Grouping. Grouping can be based on – (i) a threshold of LOD score – (ii) a threshold of marker distance (cM) – (iii) anchor information

Three steps in map construction

• Step 2: Ordering. Three ordering algorithms are – (i) SER: SERiation (Buetow and Chakravarti, 1987. Am J

Hum Genet 41:180–188) – (ii) RECORD: REcombination Counting and ORDering

(Van Os et al., 2005. Theor Appl Genet 112: 30–40) – (iii) nnTwoOpt: nearest neighbor was used for tour

construction, and two-opt was used for tour improvement, similar to Travelling Salesman Problem (TSP) (Lin and Kernighan, 1973. Oper. Res. 21: 498–516.

– Or by the input order, say we know the order from physical map for GBS markers

Travelling Salesman Problem (TSP)

• A salesman is required to visit each of n given cities once and only once, starting from any city and returning to the original place of departure. What tour he choose in order to minimize his total travel distance?

• The distance between any pair of cities are assumed to be known by the salesman. Distance can be replaced by another notion, such as time or money.

• TSP is one of the most widely studied problems in combinatorial optimization. It is easy to state, but hard to solve! TSP is an NP-hard problem, i.e. non-deterministic polynomial-time hard.

Finding the solutions

• TSP is represented by some letters plus the number of cities. For example, there are 24978 Sweden cities in TSP “sw24978”.

The solution for TSP “sw24978”

Approximate algorithm of TSP

• Tour construction algorithms

• Tour improvement algorithms

• Composite algorithms

Nearest-neighbor (nn) algorithm for tour construction

• A simple algorithm for tour construction • Start in an arbitrary city. As long as there are cities,

that have not yet been visited, visit the nearest city that still not appeared in the tour. Finally, return to the first city.

• This approach is simple, but often too greedy! • The first distances in the construction process are

reasonable short, whereas the distance at the end of the process usually will be rather long.

X X+1

Y Y+1

Y X+1

X Y+1

Two-Opt algorithm for tour improvement (Lin and Kernighan, 1973)

• Due to the large number of markers (n), it is impossible to compare all possible orders (say n=50, possible orders are n!/2=1.52x1064). Orders from the above algorithms are regional optimizations.

• Step 3: Rippling. Five rippling criteria are – (i) SARF (Sum of Adjacent Recombination

Frequencies) – (ii) SAD (Sum of Adjacent Distances) – (iii) SALOD (Sum of Adjacent LOD scores) – (iv) COUNT (number of recombination events)

• Universe criteria: shortest map!

Three steps in map construction

Linkage map and physical map

48

Species Size of haploid genome (kb)

Size of linkage map (cM)

kb/cM

Yeast 2.2×104 3700 6 Neurospora 4.2×104 500 80 Arabidopsis 7.0×104 500 140 Drosophila 2.0×105 290 700 Tomato 7.2×105 1400 510 Human 3.0×106 2710 1110 Wheat 1.6×107 2575 6214 Rice 4.4×105 1575 279 Corn 3.0×106 1400 2140

The MAP functionality in QTL IciMapping

49

Coding of co-dominant marker (P1 and P2 bands are both present in F1)

When heterozygote F1 is present, P1 is coded as 2, F1 is coded as 1, and P2 is coded as 0. When heterozygote F1 is absent, P1 is coded as 2, and P2 is coded as 0.

P1: MM MM

Code 2 1 0 2 1 0

Populations with heterozygousity, i.e., 1, 2, 7,

8, 9, 10, 13, 14, 15, 16

Populations with no heterozygousity, i.e., 3, 4, 5,

6, 11, 12, 17, 18, 19, 20

2 0

P2: mm F1: Mm mm Mm

Two parental lines P1 ( MM), P2 (mm) and their F1 hybrid

(Mm) MM mm

P1: MM MM

Code 2 1 0 2 1 0

Populations with heterozygousity, i.e., 1, 2, 7,

8, 9, 10, 13, 14, 15, 16

Populations with no heterozygousity, i.e., 3, 4, 5,

6, 11, 12, 17, 18, 19, 20

2 0

P2: mm F1: Mm mm Mm

Two parental lines P1 ( MM), P2 (mm) and their F1 hybrid

(Mm) MM mm

Coding of dominant marker (F1 band is the same as the P1 band)

P1: MM MM

Code 12 12 0 2 1 0

Populations with heterozygousity, i.e., 1, 2, 7,

8, 9, 10, 13, 14, 15, 16

Populations with no heterozygousity, i.e., 3, 4, 5,

6, 11, 12, 17, 18, 19, 20

2 0

P2: mm F1: Mm mm Mm

Two parental lines P1 ( MM), P2 (mm) and their F1 hybrid

(Mm) MM mm

P1: MM MM

Code 12 12 0 2 1 0

Populations with heterozygousity, i.e., 1, 2, 7,

8, 9, 10, 13, 14, 15, 16

Populations with no heterozygousity, i.e., 3, 4, 5,

6, 11, 12, 17, 18, 19, 20

2 0

P2: mm F1: Mm mm Mm

Two parental lines P1 ( MM), P2 (mm) and their F1 hybrid

(Mm) MM mm

When heterozygote F1 is present, P1 and F1 are coded as 12, and P2 is coded as 0. When heterozygote F1 is absent, P1 is coded as 2, and P2 is coded as 0.

Coding of recessive marker (F1 band is the same as P2 band)

P1: MM MM

Code 2 10 10 2 1 0

Populations with heterozygousity, i.e., 1, 2, 7,

8, 9, 10, 13, 14, 15, 16

Populations with no heterozygousity, i.e., 3, 4, 5,

6, 11, 12, 17, 18, 19, 20

2 0

P2: mm F1: Mm mm Mm

Two parental lines P1 ( MM), P2 (mm) and their F1 hybrid

(Mm) MM mm

P1: MM MM

Code 2 10 10 2 1 0

Populations with heterozygousity, i.e., 1, 2, 7,

8, 9, 10, 13, 14, 15, 16

Populations with no heterozygousity, i.e., 3, 4, 5,

6, 11, 12, 17, 18, 19, 20

2 0

P2: mm F1: Mm mm Mm

Two parental lines P1 ( MM), P2 (mm) and their F1 hybrid

(Mm) MM mm

When heterozygote F1 is present, P1 is coded as 2, and F1 and P2 are coded as 10. When heterozygote F1 is absent, P1 is coded as 2, and P2 is coded as 0.

Interface of the MAP functionality

Marker Summary

Display Window

Linkage Map Display Window

Parameter setting

Map outputs:

Linkage map for each chromosome (A) or all

chromosomes (B)

A. Map of one chromosome B. Map of all chromosomes

An example in genetics: A resistance gene is linked with a molecular marker

F2 population Resistant Susceptible Marker type A H B A H B Sample size 572 1161 14 3 22 569 Marker types A and B are parental types; H is the type of F1 hybrid

Resistant and susceptible can be fitted by the 3:1 ratio (one dominance gene locus): χ2=0.17 (P=0.68). Marker types A, H, and B can be fitted by the 1:2:1 ratio (one co-dominance gene locus) : χ2=0.32 (P=0.57)

But Resistance and Marker are not independent, i.e. can not be fitted by the 3:6:3:1:2:1 ratio: