INTRODUCTION TO ASSOCIATION MAPPING

Upload: harriet-joseph

Post on 05-Jan-2016


Page 1: INTRODUCTION TO ASSOCIATION MAPPING. We have a set of inbred lines or varieties We have genotyped them with a large set of markers We also have phenotypic

INTRODUCTION TO ASSOCIATION MAPPING

Page 2

• We have a set of inbred lines or varieties

• We have genotyped them with a large set of markers

• We also have phenotypic data of the lines for several traits

• And now what?

Page 3

• We take advantage of linkage disequilibrium (LD) to identify genomic regions associated with the trait of interest

• Association mapping is also called linkage disequilibrium mapping

Page 4

Identify associations between markers and phenotypes without the need to develop dedicated mapping populations

Genotypes (A/B) of lines 1–16 at markers ordered by map distance:

Marker  Distance  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
3_0363       0.0  A  B  B  A  A  A  B  A  B  B  A  B  B  B  B  B
1_1061       0.8  A  B  B  A  A  A  B  A  B  B  A  A  A  B  B  A
3_0703       1.5  B  A  A  B  B  B  A  B  A  A  B  B  B  B  B  B
1_1505       1.5  B  A  A  B  B  B  A  B  A  B  B  B  B  B  B  B
1_0498       1.5  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  A
2_1005       3.8  A  B  B  A  A  A  B  A  B  A  A  B  B  B  B  B
1_1054       3.8  A  A  A  A  A  A  A  A  A  B  A  A  A  A  A  A
2_0674       6.0  A  B  B  A  A  A  B  A  B  A  A  A  A  A  A  B
1_0297       8.8  A  A  B  B  B  B  B  A  A  A  A  A  A  A  A  B
1_0638      10.7  A  A  B  B  B  B  B  A  A  B  A  A  A  A  A  A
1_1302      11.4  B  A  A  A  B  B  A  A  A  B  A  B  B  B  B  A
1_0422      11.4  B  A  A  A  B  B  A  A  A  B  A  B  B  B  B  A
2_0929      15.3  A  B  B  B  A  A  B  B  B  A  B  A  A  A  A  B
3_1474      15.4  A  B  B  B  A  A  B  B  B  A  B  A  A  A  A  A
1_1522      17.3  A  B  B  B  A  A  B  B  B  A  B  A  A  A  A  A
2_1388      17.3  A  A  A  A  A  A  A  A  A  A  A  A  A  A  A  A
3_0259      18.1  B  B  B  B  B  B  B  B  B  B  B  A  A  A  A  A
1_0325      18.1  B  B  B  B  B  B  B  B  B  B  B  A  A  A  A  A
2_0602      20.8  A  A  B  A  A  A  A  B  A  B  A  A  A  A  A  A
1_0733      23.9  B  B  B  B  B  B  B  B  B  B  B  A  A  A  A  A
2_0729      23.9  B  B  B  B  B  B  B  B  B  B  B  A  A  A  A  A
1_1272      23.9  A  B  B  B  A  A  B  B  B  B  B  B  B  B  B  B
2_0891      26.1  A  A  A  A  A  A  A  A  A  B  A  A  A  A  A  A
2_0748      26.6  B  B  B  B  B  B  B  B  B  A  B  B  B  B  B  B
3_0251      27.4  A  B  A  A  A  B  A  A  A  B  A  A  A  B  A  A
1_0997      35.5  B  B  A  A  A  B  B  B  B  B  B  B  B  B  B  B
1_1133      41.8  B  B  A  A  A  B  B  B  B  A  B  A  A  A  A  A
2_0500      42.5  A  A  A  A  A  A  A  A  A  B  A  B  B  B  B  B
3_0634      43.3  B  B  B  B  B  B  B  B  B  A  B  A  A  A  A  A

[Figure: disease severity of each line, scale 0 to 10]
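The simplest way to look for marker–phenotype associations in a table like this is to test one marker at a time: group the lines by allele and compare their phenotype means. The sketch below makes some assumptions about input format. Genotypes are strings of 'A'/'B' calls, one character per line, with '-' marking a missing call. Phenotype values are given as a list in the same line order. The function name `single_marker_scan` is hypothetical.

For each marker it reports the mean phenotype of each allele class and the share of phenotypic variance the marker explains (R²). It deliberately ignores population structure and kinship, which later slides show can produce spurious associations.

```python
from statistics import mean


def single_marker_scan(genotypes, phenotype, missing="-"):
    """Naive single-marker association scan (no correction for structure or kinship).

    genotypes: dict mapping marker name -> string of allele calls ('A'/'B'),
               one character per line; `missing` marks an unscored line.
    phenotype: list of trait values, one per line, in the same order.
    Returns a dict: marker -> (mean phenotype of A lines, mean of B lines, R^2).
    """
    results = {}
    for marker, calls in genotypes.items():
        # keep only lines with a called genotype
        pairs = [(g, y) for g, y in zip(calls, phenotype) if g != missing]
        group_a = [y for g, y in pairs if g == "A"]
        group_b = [y for g, y in pairs if g == "B"]
        if not group_a or not group_b:
            continue  # monomorphic among scored lines: nothing to test
        ys = group_a + group_b
        grand = mean(ys)
        ss_total = sum((y - grand) ** 2 for y in ys)
        # between-allele-class sum of squares = variation explained by the marker
        ss_model = (len(group_a) * (mean(group_a) - grand) ** 2
                    + len(group_b) * (mean(group_b) - grand) ** 2)
        r2 = ss_model / ss_total if ss_total > 0 else 0.0
        results[marker] = (mean(group_a), mean(group_b), r2)
    return results
```

Here is an example with a hypothetical marker `m1` scored on six lines. `single_marker_scan({"m1": "AAABBB"}, [1, 1, 1, 9, 9, 9])` gives allele means of 1 and 9 and an R² of 1.0.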

Page 5

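• The definition of linkage disequilibrium is very simple:

it is the 'non-random association of alleles at different loci'

[Figure: haplotypes at Locus 1 and Locus 2. Equilibrium: A B, A b, a B, a b (all allele combinations present). Disequilibrium: A B, A B, a b, a b (only two of the four combinations present).]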

Page 6

[Figure: Loci 1–5 in two populations. Equilibrium: random mating population with loci segregating independently. Disequilibrium: non-random mating population, with LD due to selection, mutation, drift/sampling and population structure.]

Page 7

How do we measure LD?

• LD is measured with a parameter called D.

• If alleles at different loci are not inherited independently, then:

$P_{AB} \neq P_A P_B$ and $D_{AB} = P_{AB} - P_A P_B$

($P_A$ and $P_B$ are allele frequencies and $P_{AB}$ is the haplotype frequency)

Standardized measures of LD: D' and r²

$$r^2 = \frac{D_{AB}^2}{P_A P_a P_B P_b}$$

$$D' = \frac{D_{AB}}{\min(P_A P_b,\ P_a P_B)} \quad \text{for } D > 0$$

$$D' = \frac{D_{AB}}{\max(-P_A P_B,\ -P_a P_b)} \quad \text{for } D < 0$$

Page 8

Lines 1–30, one allele per line:

Locus 1: aAaaaaAaaaaaAaaAaAaAAaaaaaAaAA
Locus 2: bBBbbbBbBbbBBbBBbBbBbbbbbBBBBB

Allele frequencies:

PA= 10/30

Pa= 20/30

PB= 15/30

Pb= 15/30

Haplotype frequencies:

PAB= 9/30

PaB= 6/30

PAb= 1/30

Pab= 14/30

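$D_{AB} = P_{AB} - P_A P_B = 9/30 - (10/30 \times 15/30) = 2/15 \approx 0.13$

$$r^2 = \frac{D_{AB}^2}{P_A P_a P_B P_b} = \frac{(2/15)^2}{(10/30)(20/30)(15/30)(15/30)} = 0.32$$

$$D' = \frac{D_{AB}}{\min(P_A P_b,\ P_a P_B)} = \frac{2/15}{(10/30)(15/30)} = 0.8$$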
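The worked example can be checked programmatically. The sketch below uses a hypothetical function name, `ld_stats`, and assumes both loci are biallelic and fully scored. It counts allele and haplotype frequencies from two strings of allele calls, one character per line, with 'A'/'a' at Locus 1 and 'B'/'b' at Locus 2. It then applies the definitions of D, D' and r² from the previous slide.

```python
from collections import Counter


def ld_stats(locus1, locus2):
    """Return (D, D', r^2) for two biallelic loci scored on the same lines.

    locus1: string of 'A'/'a' calls; locus2: string of 'B'/'b' calls; one character per line.
    """
    n = len(locus1)
    p_A = locus1.count("A") / n
    p_B = locus2.count("B") / n
    p_a, p_b = 1 - p_A, 1 - p_B
    haplotypes = Counter(zip(locus1, locus2))  # counts of (Locus 1, Locus 2) allele pairs
    p_AB = haplotypes[("A", "B")] / n
    D = p_AB - p_A * p_B
    # D' divides D by the largest value it could take given the allele frequencies
    if D > 0:
        d_max = min(p_A * p_b, p_a * p_B)
    else:
        d_max = max(-p_A * p_B, -p_a * p_b)
    d_prime = D / d_max if d_max != 0 else 0.0
    denom = p_A * p_a * p_B * p_b
    r2 = D ** 2 / denom if denom > 0 else 0.0
    return D, d_prime, r2
```

With the 30 lines of the example, `ld_stats("aAaaaaAaaaaaAaaAaAaAAaaaaaAaAA", "bBBbbbBbBbbBBbBBbBbBbbbbbBBBBB")` returns D ≈ 0.133, D' ≈ 0.8 and r² ≈ 0.32, matching the hand calculation.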

Page 9

[Figure: Spring barley, two-row, chromosome 5H: pairwise r² (y-axis) against physical distance (x-axis, 0 to 900,000 bp)]

Page 10

Page 11

Extent of LD

Species       Extent of LD                    Mating system
Humans        80 kb (Europeans)               Outcrossing
              5 kb (Nigerians)
Cattle        > 10 cM                         Outcrossing
Arabidopsis   250 kb                          Selfing
Maize         1 kb (diverse maize)            Outcrossing
              1.5 kb (diverse inbred lines)
              > 100 kb (elite lines)
Barley        Up to 100 kb                    Selfing

Flint-Garcia et al., Annu. Rev. Plant Biol. 2003. 54:357–74

Page 12

Factors that increase LD:

• mutation

• mating system (self-pollination)

• population structure

• admixture

• relatedness (kinship)

• small founder population size or genetic drift

• selection (natural, artificial, and balancing)

Factors that decrease LD:

• high recombination and mutation rates

• recurrent mutations

• outcrossing

Page 13

Mutation:

provides the original material for producing polymorphisms that will be in LD

[Figure: haplotypes at Locus 1 and Locus 2 before the mutation: A B, A B, a B, a B. After the mutation: A B, A B, a B, a B, A b.]

Allele b appears on a gamete carrying A, so A and b will appear together.

Page 14

[Figure: D' (y-axis, 0 to 1) over 40 generations (x-axis) for recombination rates of 0.05 (little recombination), 0.25 and 0.50 (high recombination), each combined with a selfing rate of 0.00 (outcrossing) or 0.99 (selfing). Curves are labelled "Selfing, little or no recombination" and "Outcrossing, high recombination".]

Mating system:

•LD generally decays more rapidly in outcrossing species than in selfing species, where individuals are likely to be homozygous

•In selfing species, most recombination occurs between identical haplotypes because of high individual homozygosity, so these events do not reduce LD

•Selfing reduces the rate at which LD breaks down

•When loci are closely linked in a selfing population they remain in high LD for many generations
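The contrast between selfing and outcrossing can be explored with a small calculation. Under random mating, D decays by a factor (1 − c) each generation, where c is the recombination fraction between the two loci.

The sketch below extends this with a simplifying assumption that is not taken from the slides. Partial selfing at rate s is summarised by the equilibrium inbreeding coefficient F = s / (2 − s). Recombination only creates new haplotypes in heterozygotes, and inbreeding reduces their frequency by roughly the factor (1 − F), so the effective recombination fraction is taken as c(1 − F). Because allele frequencies do not change in this model, D' decays by the same factor as D.

The model ignores drift, selection and mutation, and `ld_decay` is a hypothetical name.

```python
def ld_decay(d0, c, selfing_rate, generations):
    """Approximate LD (D or D') over generations.

    Assumption (a simplification, not the slides' simulation): partial selfing is
    summarised by the equilibrium inbreeding coefficient F = s / (2 - s), and the
    effective recombination fraction is c * (1 - F). With s = 0 this is the exact
    random-mating result D_t = D_0 * (1 - c) ** t. Drift, selection and mutation
    are ignored.
    """
    f = selfing_rate / (2 - selfing_rate)
    c_eff = c * (1 - f)
    return [d0 * (1 - c_eff) ** t for t in range(generations + 1)]


# Scenarios of the figure: recombination 0.05, 0.25, 0.5 crossed with
# selfing rates 0.00 (outcrossing) and 0.99 (selfing), starting from D' = 1.
for c in (0.05, 0.25, 0.5):
    for s in (0.0, 0.99):
        final = ld_decay(1.0, c, s, 40)[-1]
        print(f"c={c:.2f} s={s:.2f}  D' after 40 generations: {final:.3f}")
```

Under this approximation, the selfing curves stay high even at c = 0.5. This mirrors the slide's point that selfing slows the breakdown of LD.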


Page 15

Drift / Sampling

•In small populations, genetic drift results in the loss of rare allelic combinations, which increases LD.

•Sampling increases or reduces certain allelic combinations by chance

Selection

•Strong selection at a locus is expected to reduce diversity and increase LD in the surrounding region

•Selection operating on a gene will increase LD and reduce diversity in the vicinity of that gene. Alleles at loci flanking the selected gene tend to be fixed along with it.

•Selection can also cause LD between unlinked loci, typically as a result of co-selection of loci during breeding for multiple traits.

Page 16

[Figure: two LOD score plots]

Page 17

[Figure: two LOD score plots]

Page 18

Page 19

What information do we need for an association mapping analysis?

• Genotypic:

•Linkage disequilibrium decay

•Number of markers and marker density

•Quality of the data: missing values, minor allele frequency

• Phenotypic:

• Quantitative or qualitative traits

• Heritability of the trait, repeatability

• Population:

• Structure

• Kinship

Page 20

Genotypic Information:

•Linkage disequilibrium decay.

•The power of detection is highly influenced by the LD between the QTL and the marker

[Figure: two plots of r² against physical distance, marked at 10 kb and 100 kb respectively]

Page 21

Marker density

•The extent of LD shows the expected r² at a given distance

•Accordingly, it is important to choose an adequate marker density to increase the power of detection

[Figure: two plots of r² against physical distance, marked at 10 kb and 100 kb respectively]
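A rough way to turn an LD-decay plot into a marker density is to find the distance at which r² falls below some threshold, and to space markers no further apart than that.

The sketch below takes pairwise (distance in bp, r²) values for markers on the same chromosome and averages r² in fixed-width distance bins. It returns the start of the first bin whose mean falls below the threshold. The threshold of 0.2 and the 10 kb bin width are illustrative assumptions. A critical r² is sometimes derived instead from the r² of unlinked marker pairs. `ld_extent` and `markers_needed` are hypothetical names.

```python
from math import ceil
from statistics import mean


def ld_extent(pairs, threshold=0.2, bin_width=10_000):
    """Start (bp) of the first distance bin whose mean r^2 falls below `threshold`.

    pairs: iterable of (distance_bp, r2) for marker pairs on the same chromosome.
    Returns None if the binned mean r^2 never drops below the threshold.
    """
    bins = {}
    for distance, r2 in pairs:
        bins.setdefault(int(distance // bin_width), []).append(r2)
    for b in sorted(bins):
        if mean(bins[b]) < threshold:
            return b * bin_width
    return None


def markers_needed(genome_size_bp, extent_bp):
    """Rough count of evenly spaced markers so that neighbours lie within the LD extent."""
    if extent_bp <= 0:
        raise ValueError("LD extent must be positive; use a finer bin width")
    return ceil(genome_size_bp / extent_bp)
```

Returning the lower edge of the bin errs towards denser marker spacing. Species with slowly decaying LD, such as the selfing barley above, need far fewer markers than rapidly decaying ones.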

Page 22

Quality of the data:

•Number of individuals: with small sample sizes, the probability of a significant association between a marker and the QTL arising by chance is high.

Genotypes (A/B) of lines 1–16 at markers ordered by map distance:

Marker  Distance  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
3_0363       0.0  A  B  B  A  A  A  B  A  B  B  A  B  B  B  B  B
1_1061       0.8  A  B  B  A  A  A  B  A  B  B  A  A  A  B  B  A
3_0703       1.5  B  A  A  B  B  B  A  B  A  A  B  B  B  B  B  B
1_1505       1.5  B  A  A  B  B  B  A  B  A  B  B  B  B  B  B  B
1_0498       1.5  B  B  B  B  B  B  B  B  B  B  B  B  B  B  B  A
2_1005       3.8  A  B  B  A  A  A  B  A  B  A  A  B  B  B  B  B
1_1054       3.8  A  A  A  A  A  A  A  A  A  B  A  A  A  A  A  A
2_0674       6.0  A  B  B  A  A  A  B  A  B  A  A  A  A  A  A  B
1_0297       8.8  A  A  B  B  B  B  B  A  A  A  A  A  A  A  A  B
1_0638      10.7  A  A  B  B  B  B  B  A  A  B  A  A  A  A  A  A
1_1302      11.4  B  A  A  A  B  B  A  A  A  B  A  B  B  B  B  A
1_0422      11.4  B  A  A  A  B  B  A  A  A  B  A  B  B  B  B  A
2_0929      15.3  A  B  B  B  A  A  B  B  B  A  B  A  A  A  A  B
3_1474      15.4  A  B  B  B  A  A  B  B  B  A  B  A  A  A  A  A
1_1522      17.3  A  B  B  B  A  A  B  B  B  A  B  A  A  A  A  A
2_1388      17.3  A  A  A  A  A  A  A  A  A  A  A  A  A  A  A  A
3_0259      18.1  B  B  B  B  B  B  B  B  B  B  B  A  A  A  A  A
1_0325      18.1  B  B  B  B  B  B  B  B  B  B  B  A  A  A  A  A
2_0602      20.8  A  A  B  A  A  A  A  B  A  B  A  A  A  A  A  A
1_0733      23.9  B  B  B  B  B  B  B  B  B  B  B  A  A  A  A  A
2_0729      23.9  B  B  B  B  B  B  B  B  B  B  B  A  A  A  A  A
1_1272      23.9  A  B  B  B  A  A  B  B  B  B  B  B  B  B  B  B
2_0891      26.1  A  A  A  A  A  A  A  A  A  B  A  A  A  A  A  A
2_0748      26.6  B  B  B  B  B  B  B  B  B  A  B  B  B  B  B  B

[Figure: disease severity of each line, scale 0 to 10]

Page 23

Quality of the data:

•Number of individuals: with small sample sizes, the probability of a significant association between a marker and the QTL arising by chance is high.

Genotypes (A/B) of lines 1–8 at markers ordered by map distance:

Marker  Distance  1  2  3  4  5  6  7  8
3_0363       0.0  A  B  B  A  A  A  B  A
1_1061       0.8  A  B  B  A  A  A  B  A
3_0703       1.5  B  A  A  B  B  B  A  B
1_1505       1.5  B  A  A  B  B  B  A  B
1_0498       1.5  B  B  B  B  B  B  B  B
2_1005       3.8  A  B  B  A  A  A  B  A
1_1054       3.8  A  A  A  A  A  A  A  A
2_0674       6.0  A  B  B  A  A  A  B  A
1_0297       8.8  A  A  B  B  B  B  B  A
1_0638      10.7  A  A  B  B  B  B  B  A
1_1302      11.4  B  A  A  A  B  B  A  A
1_0422      11.4  B  A  A  A  B  B  A  A
2_0929      15.3  A  B  B  B  A  A  B  B
3_1474      15.4  A  B  B  B  A  A  B  B
1_1522      17.3  A  B  B  B  A  A  B  B
2_1388      17.3  A  A  A  A  A  A  A  A
3_0259      18.1  B  B  B  B  B  B  B  B
1_0325      18.1  B  B  B  B  B  B  B  B
2_0602      20.8  A  A  B  A  A  A  A  B
1_0733      23.9  B  B  B  B  B  B  B  B
2_0729      23.9  B  B  B  B  B  B  B  B
1_1272      23.9  A  B  B  B  A  A  B  B
2_0891      26.1  A  A  A  A  A  A  A  A
2_0748      26.6  B  B  B  B  B  B  B  B

[Figure: disease severity of each line, scale 0 to 10]

Page 24

Quality of the data: Minor allele frequency

Lines 1–21, one allele per line:

Locus 1: aaaaaaaaaaaaaaaAaaaaa
Locus 2: bbbbbbbbbbbBbbbbbbbbb

Two loci can be completely unlinked and still show high LD
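The reason rare alleles are a problem can be shown with the D' and r² formulas. The sketch below uses a hypothetical set of 21 lines in which each locus has a single line carrying the minor allele (A at Locus 1, B at Locus 2).

D' reaches its maximum of 1 whether or not the two minor alleles occur in the same line. With at most three haplotypes present, D is as large as the allele frequencies allow. r² reaches 1 if one line happens to carry both minor alleles, and stays close to zero if they are in different lines. The function name `d_prime_and_r2` is hypothetical.

```python
def d_prime_and_r2(p_AB, p_A, p_B):
    """D' and r^2 from the AB haplotype frequency and the A and B allele frequencies."""
    p_a, p_b = 1 - p_A, 1 - p_B
    D = p_AB - p_A * p_B
    # maximum |D| attainable with these allele frequencies, with the sign of D
    d_max = min(p_A * p_b, p_a * p_B) if D > 0 else max(-p_A * p_B, -p_a * p_b)
    d_prime = D / d_max if d_max != 0 else 0.0
    r2 = D ** 2 / (p_A * p_a * p_B * p_b)
    return d_prime, r2


n = 21  # hypothetical number of lines
# Case 1: the same line carries both minor alleles (A and B)
same_line = d_prime_and_r2(p_AB=1 / n, p_A=1 / n, p_B=1 / n)
# Case 2: the two minor alleles are carried by different lines
different_lines = d_prime_and_r2(p_AB=0.0, p_A=1 / n, p_B=1 / n)
print("same line:       D' = %.3f, r2 = %.4f" % same_line)
print("different lines: D' = %.3f, r2 = %.4f" % different_lines)
```

In both cases the estimate rests on a single line, which is why markers with a low minor allele frequency are usually filtered out.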

[Figure: heading date (phenotype) of each line, with its genotypes at Locus 1 and Locus 2]

Locus 1:

Average allele b: 78.8

Average allele B: 152

Locus 2:

Average allele b: 87.7

Average allele B: 89.3

Page 25

Quality of the data: Missing data

[Figure: heading date (phenotype) of each line, with its genotypes at Locus 1 and Locus 2; some Locus 1 genotypes are missing (-)]

Locus 1:

Average allele b: 76.2

Average allele B: 102.8

Locus 2:

Average allele b: 87.7

Average allele B: 89.3

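Both checks are usually applied as a filter before the association analysis. The sketch below computes two values for each marker: the minor allele frequency among the lines that were scored, and the proportion of missing calls. It keeps only markers that pass both thresholds.

The genotype coding (strings of 'A'/'B' with '-' for missing), the function name `marker_qc` and the default thresholds (MAF ≥ 0.05, at most 10% missing) are illustrative assumptions, not values from the slides.

```python
def marker_qc(genotypes, min_maf=0.05, max_missing=0.10, missing="-"):
    """Split markers into (kept, dropped) by minor allele frequency and missing rate.

    genotypes: dict marker -> string of calls ('A'/'B' or `missing`), one per line.
    Each output dict maps marker -> (MAF, missing rate).
    """
    kept, dropped = {}, {}
    for marker, calls in genotypes.items():
        called = [g for g in calls if g != missing]
        missing_rate = 1 - len(called) / len(calls)
        if called:
            freq_a = called.count("A") / len(called)
            maf = min(freq_a, 1 - freq_a)  # frequency of the rarer allele
        else:
            maf = 0.0
        target = kept if maf >= min_maf and missing_rate <= max_missing else dropped
        target[marker] = (maf, missing_rate)
    return kept, dropped
```

Imputing missing genotypes is an alternative to dropping markers. When genotypes are simply excluded line by line, allele means such as those on this slide can shift noticeably.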

Page 26

What information do we need for an association mapping analysis?

• Genotypic:

•Linkage disequilibrium decay

•Number of markers and marker density

•Quality of the data: missing values, minor allele frequency

• Phenotypic:

• Quantitative or qualitative traits

• Heritability of the trait, repeatability

• Population:

• Structure

• Kinship

Page 27

Phenotypic:

• Quantitative or qualitative traits

•One or more QTL may be involved

•The higher the effect of the QTL, the higher the power of detection

•Quantitative traits: usually many genes of small effect are involved

•The problem of epistatic traits

Heritability of the trait, repeatability

$h^2 = V_{genotypic} / V_{phenotypic}$ (broad-sense heritability)
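When lines are evaluated in replicated trials, the variance components in this ratio are commonly estimated from an analysis of variance. The sketch below assumes a balanced design: every line is measured in the same number r of replicates in a single environment. It uses the one-way ANOVA expected mean squares MS_lines = σ²_e + rσ²_g and MS_error = σ²_e.

It returns two values:

- heritability on a single-plot basis, σ²_g / (σ²_g + σ²_e)
- heritability on a line-mean basis, σ²_g / (σ²_g + σ²_e / r)

The function name is hypothetical. Unbalanced or multi-environment data would call for a mixed model instead.

```python
from statistics import mean


def broad_sense_heritability(data):
    """Broad-sense heritability from a balanced lines x replicates trial.

    data: dict line -> list of r replicate measurements (same r for every line).
    Returns (plot-level H2, line-mean H2).
    """
    lines = list(data.values())
    g, r = len(lines), len(lines[0])
    grand = mean(y for reps in lines for y in reps)
    ss_lines = r * sum((mean(reps) - grand) ** 2 for reps in lines)
    ss_error = sum((y - mean(reps)) ** 2 for reps in lines for y in reps)
    ms_lines = ss_lines / (g - 1)
    ms_error = ss_error / (g * (r - 1))
    sigma2_g = max((ms_lines - ms_error) / r, 0.0)  # negative estimates truncated at zero
    sigma2_e = ms_error
    h2_plot = sigma2_g / (sigma2_g + sigma2_e)
    h2_mean = sigma2_g / (sigma2_g + sigma2_e / r)
    return h2_plot, h2_mean
```

A low heritability means that line phenotypes are a noisy proxy for their genotypic values. This reduces the power to detect marker–trait associations.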

Page 28

The problem of epistatic traits

[Figure: heading date (phenotype) of each line, with its genotypes at VRN1 and VRN2]

VRN1 and VRN2 are located on different chromosomes

There is no association between the individual genes (VRN1 or VRN2) and heading date

However, heading is late only when haplotype Ac is present

Page 29

What information do we need for an association mapping analysis?

• Genotypic:

•Linkage disequilibrium decay

•Number of markers and marker density

•Quality of the data: missing values, minor allele frequency

• Phenotypic:

• Quantitative or qualitative traits

• Heritability of the trait, repeatability

• Population:

• Structure

• Kinship

Page 30

Population Structure:

• Study of type 2 diabetes in 2 tribes of Native Americans from Arizona

• A correlation was found between a haplotype at the immunoglobulin G locus and reduced diabetes

• However, further analysis showed that those with diabetes had a lower proportion of European ancestry

• In addition, the haplotype associated with reduced diabetes was more prevalent in Europeans

• When the analysis was restricted to individuals with similar European ancestry, the association was no longer detected.

Knowler WC, et al. 1988. Am. J. Hum. Genet. 43:520–26

The classical example of interference by population structure

Page 31

Population Structure

•Similar structure exists in plants

•Breeding history of many important crop species and limited gene flow have created complex stratification within the germplasm.

•Differences in the geographic origin of the germplasm cause population structure (natural selection tends to fix alleles at many loci related to adaptation).

•The end use of the crop, growth habit, and certain morphological traits can also cause structure.

•This is a common cause of spurious associations

Page 32

How can we allocate individuals to sub-populations?

•First, we need to know in advance how many sub-populations there are.

•If unknown, this can be estimated:

•The allocation process is repeated for different possible numbers of sub-populations, and the best-fitting number is selected.
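One widely used way to choose the number of sub-populations K from repeated STRUCTURE runs is the ΔK statistic of Evanno et al. (2005). It looks for the K at which the log-likelihood ln P(D) stops increasing sharply, scaled by the variability among replicate runs.

The sketch below computes ΔK from the mean ln P(D) at each K. It assumes the runs cover consecutive K values with at least two replicates each. The function name and the example likelihoods are hypothetical.

```python
from statistics import mean, stdev


def evanno_delta_k(log_likelihoods):
    """Delta K for each K from replicate ln P(D) values.

    log_likelihoods: dict mapping K -> list of ln P(D) from replicate runs
    (consecutive K values, at least two runs each).
    Delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)), with L the mean ln P(D).
    It is undefined for the smallest and largest K tested.
    """
    ks = sorted(log_likelihoods)
    mean_l = {k: mean(log_likelihoods[k]) for k in ks}
    delta = {}
    for k in ks[1:-1]:
        second_diff = abs(mean_l[k + 1] - 2 * mean_l[k] + mean_l[k - 1])
        sd = stdev(log_likelihoods[k])
        delta[k] = second_diff / sd if sd > 0 else float("inf")
    return delta


# Hypothetical ln P(D) values from two runs at each K
runs = {1: [-1000.0, -1001.0], 2: [-900.0, -902.0], 3: [-880.0, -881.0], 4: [-875.0, -879.0]}
delta = evanno_delta_k(runs)
best_k = max(delta, key=delta.get)
print(delta, "best K:", best_k)
```

ΔK is a heuristic. It cannot return the smallest K tested, so the likelihood curve itself is usually inspected as well.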

Page 33

The computer program STRUCTURE

• Uses computationally intensive methods to partition individuals into populations.

• Many individuals or lines will not belong uniquely to one population, but will be the descendants of crosses between two or more ancestral populations.

• STRUCTURE also estimates the proportion of ancestry attributable to each population.

Page 34

Ancestry proportions (Q1, Q2, Q3) of each line:

Line         Q1     Q2     Q3
2B96-5038    0.983  0.007  0.010
2B98-5312    0.997  0.002  0.001
6B00-1526    0.001  0.001  0.998
6B02-3394    0.001  0.001  0.998
6B94-7378    0.004  0.014  0.982
6B94-8253    0.035  0.026  0.939
6B97-2245    0.003  0.035  0.961
88Ab536      0.003  0.275  0.721
88Ab536-B    0.004  0.274  0.722
AC_Metcalfe  0.992  0.005  0.003
Arapiles     0.773  0.220  0.007
B1202        0.928  0.061  0.012
B1215        0.997  0.002  0.001
B1602        0.034  0.101  0.865
B1614        0.018  0.144  0.838
Baronesse    0.997  0.002  0.002
BCD47        0.768  0.194  0.038
Belford      0.080  0.888  0.032
Bison-1H     0.993  0.005  0.002
Bison-1H+4H  0.873  0.053  0.074
Bison-1H+5H  0.996  0.003  0.001
Bison-4H     0.985  0.012  0.003
Bison-4H+5H  0.991  0.005  0.004
Bison-5H     0.995  0.003  0.002
Bison-7H     0.995  0.003  0.002
Bowman       0.806  0.017  0.178
C-14         0.713  0.284  0.003
Canela       0.390  0.571  0.038
CDC_Copelan  0.971  0.007  0.023
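A common next step is to assign each line to the sub-population from which most of its ancestry comes, and to flag lines with mixed ancestry. The sketch below uses a membership threshold of 0.8. This cut-off is an illustrative assumption, since studies use different values, and the function name is hypothetical. The usage applies it to four lines from the ancestry table.

```python
def assign_subpopulations(q_matrix, threshold=0.8):
    """Label each line with its main sub-population, or 'admixed'.

    q_matrix: dict line -> sequence of ancestry proportions (Q1, Q2, ...).
    A line is assigned to Qk when its largest proportion is at least `threshold`
    (0.8 here is an illustrative cut-off, not a value from the slides).
    """
    groups = {}
    for line, q in q_matrix.items():
        best = max(range(len(q)), key=lambda i: q[i])
        groups[line] = f"Q{best + 1}" if q[best] >= threshold else "admixed"
    return groups


# Four lines taken from the ancestry table
q = {
    "Baronesse": (0.997, 0.002, 0.002),
    "Belford": (0.080, 0.888, 0.032),
    "Canela": (0.390, 0.571, 0.038),
    "6B00-1526": (0.001, 0.001, 0.998),
}
print(assign_subpopulations(q))
```

For the association model itself, the full Q proportions are normally used as covariates rather than these hard labels.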

Page 35

The effect of kinship:

$y = X\beta + Qv + Zu + e$

$X\beta$ includes all fixed effects: population means, environments, and marker allele effects

$Q$ is a subpopulation incidence matrix; $v$ are estimates of subpopulation mean effects

There is a degree of relatedness not captured by population structure: $u$ is the polygenic effect generated by other loci unlinked to the one being tested

$Z$ is the incidence matrix relating observations to $u$, and $e$ is the vector of residuals
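To show how structure and kinship enter the test, the sketch below estimates a marker effect by generalised least squares under this model. It makes four assumptions:

- There is one observation per line, so Z is the identity.
- Var(u) = 2Kσ²_g, with K a kinship matrix.
- Var(e) = Iσ²_e.
- The variance components are already known.

In practice the variance components are estimated, for example by REML, and dedicated mixed-model software is used. The marker is coded 0/1. One column of Q is dropped because the ancestry proportions sum to 1 and would otherwise be collinear with the intercept. The function name `qk_marker_test` is hypothetical.

```python
import numpy as np


def qk_marker_test(y, marker, Q, K, sigma2_g, sigma2_e):
    """GLS estimate and standard error of a marker effect in y = X b + Q v + Z u + e.

    Assumptions: Z = I, Var(u) = 2 K sigma2_g, Var(e) = I sigma2_e, and the
    variance components are known in advance.
    """
    n = len(y)
    # Fixed effects: intercept, marker genotype (0/1) and Q without its last column
    X = np.column_stack([np.ones(n), marker, Q[:, :-1]])
    V = 2 * K * sigma2_g + np.eye(n) * sigma2_e  # phenotypic covariance matrix
    V_inv = np.linalg.inv(V)
    xtvx = X.T @ V_inv @ X
    beta = np.linalg.solve(xtvx, X.T @ V_inv @ y)
    cov_beta = np.linalg.inv(xtvx)
    return beta[1], float(np.sqrt(cov_beta[1, 1]))
```

If K is a multiple of the identity, V is proportional to I and the estimate reduces to ordinary least squares, which is a convenient sanity check.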