introduction to linkage analysis pak sham twin workshop 2003

Introduction to Linkage Analysis

Pak Sham

Twin Workshop 2003

Human Genome

• 22 pairs of autosomes, plus the X and Y sex chromosomes

• 3 × 10⁹ base pairs (2 metres long)

• 2% coding sequences; the rest is regulatory or “junk”

• 30,000 - 40,000 genes

• Much commonality with other species

Genetic Variation

• Chromosomal abnormalities

• Duplication (e.g. Down’s)

• Deletion (e.g. Velo-cardio-facial syndrome)

• Major deleterious mutations

• Usually rare (e.g. Huntington’s)

• Polymorphisms

• Single nucleotide polymorphisms (SNPs)

• Variable length repeats (e.g. microsatellites)

• Some are functional (“normal variation”)

• Most are non-functional (neutral markers)

Genetic Mapping of Disease

• Levels of Genetic Analysis

• Estimate heritability (family, twins, adoption)

• Find chromosomal locations (linkage)

• Identify risk variants (association)

• Understand mechanisms (cell biology, etc.)

• Applications

• Prediction of genetic risk

• More accurate prediction of genetic risk

• Even more accurate prediction of genetic risk; prediction of prognosis and treatment response

• Development of new drug targets

Strategies of Gene Mapping

• Functional

• Uses knowledge of disease to identify candidate genes

• Finds variants in candidate genes

• Looks for association between variants and disease

• Positional

• Systematic screen of whole genome

• Uses a set of 400 evenly-spaced markers

• Looks for markers which co-segregate with disease

Co-segregation

[Figure: pedigree with marker genotypes A2A4, A3A4, A1A3, A1A2, A2A3, A1A2, A1A4, A3A4, A3A2]

Marker allele A1 co-segregates with dominant disease

Linkage → Co-segregation

[Figure: parental chromosomes and the gametes derived from them]

Alleles on the same chromosome tend to stay together in meiosis; therefore they tend to be co-transmitted.

Crossing over between homologous chromosomes

Map Distance

Map distance between two loci (Morgans)

= Expected number of crossovers per meiosis

(1 Morgan = 100 centiMorgans)

Note: Map distances are additive

Heterogeneity in recombination frequencies

Total map length: about 33 Morgans

(1 cM ≈ 10⁶ base pairs)

Recombination

[Figure: parental genotypes at loci A (alleles A1, A2) and Q (alleles Q1, Q2) and the gametes they produce. Non-recombinants have probability 1 - θ; recombinants have probability θ.]

Recombination Fraction

Recombination fraction (θ) between two loci

= Proportion of gametes that are recombinant with respect to the two loci

Recombination & map distance

Haldane map function

$\theta = \frac{1 - e^{-2m}}{2}$

[Figure: recombination fraction (0 to 0.5) plotted against map distance (0 to 1 M)]
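The Haldane map function, θ = (1 - e^(-2m))/2, can be sketched in a few lines of Python. The function names below are illustrative only; the inverse function is the standard algebraic rearrangement and is not shown on the slide.

```python
import math


def haldane_theta(m: float) -> float:
    """Recombination fraction for a map distance m given in Morgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * m))


def haldane_distance(theta: float) -> float:
    """Inverse of haldane_theta: map distance (Morgans) for theta in [0, 0.5)."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * theta)


# Tabulate a few distances; theta approaches 0.5 as the distance grows.
for cm in (1, 10, 50, 100):
    print(f"{cm:>3} cM -> theta = {haldane_theta(cm / 100):.4f}")
```

For short distances θ is close to the map distance itself, which is why 1 cM is often read as roughly a 1% recombination fraction.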

Double Backcross: Fully Informative Gametes

Grandparents: AABB × aabb

Parents: AaBb × aabb

Offspring: AaBb, aabb (non-recombinant); Aabb, aaBb (recombinant)

Linkage Analysis: Fully Informative Gametes

Count Data

Recombinant Gametes: R

Non-recombinant Gametes: N

Parameter

Recombination Fraction: θ

Likelihood

$L(\theta) = \theta^{R} (1 - \theta)^{N}$

Estimation

$\hat{\theta} = \frac{R}{N + R}$

Chi-square

$\chi^2 = 2 \left[ R \log_e \hat{\theta} + N \log_e (1 - \hat{\theta}) - (R + N) \log_e (0.5) \right]$
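As a minimal sketch of the estimate and test for fully informative gametes, the function below computes the estimate R/(N + R), the likelihood-ratio chi-square against θ = 0.5, and the same statistic on the lod (log10) scale. The function name and the example counts are hypothetical, and the estimate is left untruncated even if it exceeds 0.5.

```python
import math


def linkage_test(R: int, N: int):
    """Return (theta_hat, chi_square, lod) for R recombinant and
    N non-recombinant fully informative gametes."""
    total = R + N
    theta_hat = R / total

    def xlogy(x: float, y: float) -> float:
        # Convention 0 * log(0) = 0, needed when R = 0 or N = 0.
        return 0.0 if x == 0 else x * math.log(y)

    loglik_hat = xlogy(R, theta_hat) + xlogy(N, 1.0 - theta_hat)
    loglik_null = total * math.log(0.5)
    chi_square = 2.0 * (loglik_hat - loglik_null)
    lod = (loglik_hat - loglik_null) / math.log(10.0)
    return theta_hat, chi_square, lod


# Hypothetical counts: 2 recombinants among 20 gametes.
theta_hat, chi_square, lod = linkage_test(R=2, N=18)
print(f"theta_hat={theta_hat:.3f} chi_square={chi_square:.2f} lod={lod:.2f}")
```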

Phase Unknown Meioses

Parents: AaBb × aabb

Offspring: AaBb, aabb, Aabb, aaBb

Either: AaBb, aabb non-recombinant; Aabb, aaBb recombinant

Or: AaBb, aabb recombinant; Aabb, aaBb non-recombinant

Mixture distribution likelihood

The probability of the observed data X depends on the status of a discrete variable G

P(X|G)

The status of G is not observed, but the probability distribution of G is available

P(G)

Then the likelihood of the observed data X is

$L = \sum_{G} P(X \mid G) \, P(G)$

Linkage Analysis: Phase-unknown Meioses

Count Data

Recombinant Gametes: X, Non-recombinant Gametes: Y

or Recombinant Gametes: Y, Non-recombinant Gametes: X

Likelihood

$L(\theta) = \theta^{X} (1 - \theta)^{Y} + \theta^{Y} (1 - \theta)^{X}$

An example of incomplete data:

Mixture distribution likelihood function
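The phase-unknown likelihood can be explored numerically. This sketch assumes the two phases are equally likely, so each term carries a weight of 1/2; that constant cancels in the lod score, so it agrees with the likelihood written above up to a factor. The counts and the grid search are illustrative assumptions.

```python
import math


def phase_unknown_lod(X: int, Y: int, theta: float) -> float:
    """Lod score at theta when the gamete counts X and Y cannot be
    assigned to 'recombinant' and 'non-recombinant' with certainty."""

    def lik(t: float) -> float:
        # 50:50 mixture over the two possible phases.
        return 0.5 * (t**X * (1 - t) ** Y + t**Y * (1 - t) ** X)

    return math.log10(lik(theta) / lik(0.5))


# Hypothetical data: counts of 1 and 9; search theta on a grid in (0, 0.5].
grid = [i / 100 for i in range(1, 51)]
best_theta = max(grid, key=lambda t: phase_unknown_lod(1, 9, t))
print(best_theta, round(phase_unknown_lod(1, 9, best_theta), 3))
```

With these counts the maximum lod is about log10(2) ≈ 0.3 lower than it would be with the phase known, which is the cost of the missing phase information in this example.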

Parental genotypes unknown

Likelihood will be a function of

• allele frequencies (population parameters)

• θ (transmission parameter)

[Figure: offspring genotypes AaBb, aabb, Aabb, aaBb, with parental genotypes not observed]

Complex Phenotypes

Penetrance parameters

Genotype    P(Disease)    P(Normal)
AA          f2            1 - f2
Aa          f1            1 - f1
aa          f0            1 - f0

Each phenotype is compatible with multiple genotypes.
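To illustrate why each phenotype is compatible with multiple genotypes, the sketch below applies Bayes' rule to the penetrances f0, f1 and f2. It assumes Hardy-Weinberg genotype frequencies with a frequency p for allele A, which is an added assumption not stated on the slide; the numerical values are hypothetical.

```python
def genotype_posterior(p: float, f0: float, f1: float, f2: float,
                       affected: bool = True) -> dict:
    """P(genotype | phenotype), assuming Hardy-Weinberg proportions
    with allele A at frequency p (an assumption of this sketch)."""
    prior = {"AA": p * p, "Aa": 2 * p * (1 - p), "aa": (1 - p) ** 2}
    penetrance = {"AA": f2, "Aa": f1, "aa": f0}
    joint = {
        g: prior[g] * (penetrance[g] if affected else 1 - penetrance[g])
        for g in prior
    }
    total = sum(joint.values())
    return {g: value / total for g, value in joint.items()}


# Hypothetical dominant model with a 1% phenocopy rate.
posterior = genotype_posterior(p=0.01, f0=0.01, f1=0.8, f2=0.8)
for genotype, prob in posterior.items():
    print(f"P({genotype} | affected) = {prob:.4f}")
```

Even under this fairly strong dominant model, a sizeable share of affected individuals carry genotype aa, so the phenotype alone does not fix the genotype.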

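General Pedigree Likelihood

Likelihood is a sum of products (mixture distribution likelihood)

$L = \sum_{G} \prod_{i=1}^{n} \mathrm{pen}(x_i \mid g_i) \prod_{f} \mathrm{pop}(g_f) \prod_{i} \mathrm{trans}(g_i \mid g_{f_i}, g_{m_i})$

(sum over joint genotype assignments G; pen = penetrance of phenotype x_i given genotype g_i, pop = population genotype frequency for each founder f, trans = transmission probability for each non-founder i given the genotypes of its father f_i and mother m_i)

number of terms = $(m_1 m_2 \cdots m_k)^{2n}$

where $m_j$ is the number of alleles at locus j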
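A quick calculation shows how fast the number of terms (m1 m2 … mk)^(2n) grows. The sketch below takes n to be the number of pedigree members, which is how the formula is read here; the locus and pedigree sizes are hypothetical.

```python
from math import prod


def n_terms(alleles_per_locus, n_members: int) -> int:
    """Number of terms in the pedigree likelihood sum:
    (m1 * m2 * ... * mk) ** (2 * n)."""
    return prod(alleles_per_locus) ** (2 * n_members)


# Hypothetical case: a 2-allele disease locus and a 4-allele marker.
for n in (3, 10, 20):
    print(f"n = {n:>2}: {n_terms([2, 4], n):.3e} terms")
```

The exponential growth in n is what motivates algorithms that avoid enumerating every term.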

Elston-Stewart algorithm

Reduces computations by peeling:

Step 1: Condition likelihoods of family 1 on genotype of X.

Step 2: Joint likelihood of families 2 and 1.

[Figure: pedigree showing families 1 and 2 and individual X]

Lod Score: Morton (1955)

$\mathrm{Lod}(\theta) = \log_{10} \frac{L(\theta)}{L(\theta = 0.5)}$

Lod > 3: conclude linkage

Prior odds    Linkage ratio    Posterior odds
1:50          1000             20:1

Lod < -2: exclude linkage
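The arithmetic behind the lod > 3 threshold is short enough to check directly: a lod of 3 is a ratio of 10³ = 1000 in favour of linkage, and multiplying by prior odds of 1:50 gives posterior odds of 20:1. The function names below are illustrative.

```python
def posterior_odds(lod: float, prior_odds: float = 1 / 50) -> float:
    """Posterior odds of linkage = prior odds x 10**lod."""
    return prior_odds * 10**lod


def odds_to_probability(odds: float) -> float:
    """Convert odds in favour into a probability."""
    return odds / (1.0 + odds)


odds = posterior_odds(3)
print(f"posterior odds = {odds:.1f}:1, probability = {odds_to_probability(odds):.3f}")
```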

Lod Score Curves

[Figure: lod score plotted against recombination fraction θ, from 0 to 0.5]

Lod score curves are additive over pedigrees
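Additivity of lod score curves over pedigrees can be illustrated with a small sketch. It assumes independent pedigrees with fully informative gametes, so each pedigree contributes a lod curve based on its own recombinant and non-recombinant counts; the counts themselves are made up.

```python
import math


def lod_curve(R: int, N: int, thetas):
    """Lod score at each theta for R recombinant, N non-recombinant gametes."""
    return [
        math.log10((t**R * (1 - t) ** N) / 0.5 ** (R + N))
        for t in thetas
    ]


thetas = [i / 100 for i in range(1, 51)]
pedigrees = [(1, 7), (0, 5), (2, 6)]  # hypothetical (R, N) per pedigree

# Sum the curves pointwise across pedigrees.
curves = [lod_curve(R, N, thetas) for R, N in pedigrees]
total = [sum(values) for values in zip(*curves)]

max_lod = max(total)
theta_at_max = thetas[total.index(max_lod)]
print(f"max lod = {max_lod:.2f} at theta = {theta_at_max:.2f}")
```

In this simple setting the summed curve equals the curve for the pooled counts, so no single pedigree needs to reach the threshold on its own.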

Lods, chi-squares & p-values

In large samples

$2 \log_e(10) \times \text{Max lod} \sim \chi^2_1$

In small samples

$P \le 10^{-\text{Max lod}}$
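A short sketch of the conversion between a maximum lod score, the large-sample chi-square, and the small-sample bound on P. The 1-df chi-square tail is computed from the complementary error function in the standard library; the function names are illustrative, and the tail shown is the plain chi-square tail with no one-sided adjustment.

```python
import math


def lod_to_chi_square(max_lod: float) -> float:
    """Large-sample chi-square (1 df) equivalent of a maximum lod score."""
    return 2.0 * math.log(10.0) * max_lod


def chi_square_1df_tail(x: float) -> float:
    """P(chi-square with 1 df > x), using P(|Z| > sqrt(x)) for standard normal Z."""
    return math.erfc(math.sqrt(x / 2.0))


max_lod = 3.0
chi_square = lod_to_chi_square(max_lod)
print(f"chi-square = {chi_square:.2f}")
print(f"large-sample tail probability = {chi_square_1df_tail(chi_square):.2e}")
print(f"small-sample bound = {10 ** -max_lod:.0e}")
```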

Problems with parametric linkage

• Requires parameters of the disease model to be specified

• Allele frequency

• Penetrances

These are generally unknown for a complex trait

• Disease model assumes that a single locus is the only source of familial resemblance

This is generally unrealistic

Linkage Analysis: Admixture Test (CAB Smith)

Model

Probability of linkage in family = α

Likelihood

$L(\alpha, \theta) = \alpha L(\theta) + (1 - \alpha) L(\theta = 1/2)$

Note: Another example of mixture likelihood
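A minimal sketch of the admixture likelihood, assuming independent families with fully informative gametes. Each family's likelihood is divided by its value at θ = 1/2, so the result is on the lod scale, and the mixture is applied family by family. The family counts and the grid search are hypothetical.

```python
import math


def family_lr(R: int, N: int, theta: float) -> float:
    """Likelihood ratio L(theta) / L(1/2) for one family."""
    return (theta**R * (1 - theta) ** N) / 0.5 ** (R + N)


def admixture_lod(families, alpha: float, theta: float) -> float:
    """Sum over families of log10[alpha * LR(theta) + (1 - alpha)]."""
    return sum(
        math.log10(alpha * family_lr(R, N, theta) + (1 - alpha))
        for R, N in families
    )


# Hypothetical (R, N) counts: two families look linked, two do not.
families = [(0, 8), (1, 7), (4, 4), (5, 3)]
thetas = [i / 100 for i in range(1, 51)]
alphas = [i / 20 for i in range(0, 21)]

best_lod, best_alpha, best_theta = max(
    (admixture_lod(families, a, t), a, t) for a in alphas for t in thetas
)
print(f"lod = {best_lod:.2f} at alpha = {best_alpha:.2f}, theta = {best_theta:.2f}")
```

Setting α = 1 recovers the ordinary summed lod, so the maximised admixture lod can never be smaller than the maximum lod under homogeneity.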

Linkage Analysis: MOD

• Maximise lod score over several sets of disease models, e.g. dominant, recessive, additive

• Make correction for multiple (k) models

• Adjusted lod = lod – log10(k)
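The MOD correction is simple arithmetic, sketched here with made-up lod scores for three disease models. The function name and the dictionary layout are assumptions for illustration.

```python
import math


def adjusted_mod(lods_by_model: dict) -> float:
    """Maximum lod over k disease models, minus log10(k)."""
    k = len(lods_by_model)
    return max(lods_by_model.values()) - math.log10(k)


# Hypothetical maximum lod scores under three models.
lods = {"dominant": 3.4, "recessive": 1.2, "additive": 2.9}
print(f"adjusted lod = {adjusted_mod(lods):.2f}")
```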

Allele sharing (non-parametric) methods

Penrose (1935): Sib Pair linkage

For rare disease, IBD by sib-pair type:

• Concordant affected

• Concordant normal

• Discordant

Therefore affected sib pair (ASP) design efficient

Test

H0: Proportion of alleles IBD = 1/2

HA: Proportion of alleles IBD > 1/2
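One common way to test H0 against HA in affected sib pairs is a one-sided "mean test" on the proportion of alleles shared IBD. The slide does not name a specific statistic, so the sketch below is an assumption: it takes IBD status as known for every pair and uses the null variance of 1/8 per pair that follows from IBD probabilities of 1/4, 1/2, 1/4. The counts are hypothetical.

```python
import math


def asp_mean_test(n0: int, n1: int, n2: int):
    """One-sided z-test that the mean proportion IBD exceeds 1/2.

    n0, n1, n2 = numbers of affected sib pairs sharing 0, 1, 2 alleles IBD.
    """
    n = n0 + n1 + n2
    mean_pi = (0.5 * n1 + n2) / n
    # Under H0 each pair's proportion has mean 1/2 and variance 1/8.
    z = (mean_pi - 0.5) / math.sqrt(1.0 / (8.0 * n))
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))
    return mean_pi, z, p_one_sided


mean_pi, z, p = asp_mean_test(n0=15, n1=50, n2=35)
print(f"mean proportion IBD = {mean_pi:.2f}, z = {z:.2f}, p = {p:.4f}")
```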

Correlation between IBD of two loci

• For sib pairs

$\mathrm{Corr}(\pi_A, \pi_B) = (1 - 2\theta_{AB})^2$

Attenuation of linkage signal with increasing genetic distance from disease locus
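The attenuation can be tabulated by combining the sib-pair correlation (1 - 2θ)² with a map function. The sketch below assumes the Haldane map function, under which the correlation reduces to e^(-4m) for a map distance of m Morgans; the choice of distances is arbitrary.

```python
import math


def ibd_correlation(theta: float) -> float:
    """Correlation between sib-pair IBD proportions at two loci."""
    return (1.0 - 2.0 * theta) ** 2


def haldane_theta(m: float) -> float:
    """Recombination fraction for map distance m in Morgans (Haldane)."""
    return 0.5 * (1.0 - math.exp(-2.0 * m))


for cm in (0, 5, 10, 20, 50):
    theta = haldane_theta(cm / 100)
    print(f"{cm:>2} cM: theta = {theta:.3f}, correlation = {ibd_correlation(theta):.3f}")
```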

Joint distribution of Pedigree IBD

• IBD of relative pairs are not independent

• e.g. if IBD(1,2) = 2 and IBD(1,3) = 2 then IBD(2,3) = 2

• Inheritance vector gives joint IBD distribution

• Each element indicates whether the paternally inherited allele (1) or the maternally inherited allele (0) is transmitted

• Vector of 2N elements (N = number of non-founders)

Inheritance Vector: An Example

Ordered genotype notation: 1st allele = paternally inherited, 2nd allele = maternally inherited

[Figure: pedigree with ordered genotypes 1/2, 3/4, 1/3, 1/4, 2/3, 2/4]

Inheritance vector = (1, 1, 1, 0, 1, 0)

Pedigree allele-sharing methods

• APM: Affected Pedigree Members. Uses IBS; very sensitive to allele frequency mis-specification; less powerful than IBD-based methods

• NPL: Non-Parametric Linkage (Genehunter). Conservative at positions between markers

• LRT: “Delta parameter” (Genehunter+, Allegro)

• All these methods consider affected members only

Variance Components Linkage

• Models trait values of pedigree members jointly

• Assumes multivariate normality conditional on IBD

• Covariance between relative pairs

$= V r + V_Q [\pi - E(\pi)]$

• Where V = trait variance

r = correlation (depends on relationship)

$V_Q$ = QTL additive variance

$E(\pi)$ = expected proportion IBD
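The covariance formula V r + V_Q [π - E(π)] can be evaluated for a sib pair at each IBD state. This sketch uses E(π) = 0.5 for siblings; the values of V, r and V_Q are hypothetical and chosen only to make the pattern visible.

```python
def pair_covariance(V: float, r: float, VQ: float,
                    pi: float, expected_pi: float) -> float:
    """Trait covariance for a relative pair: V*r + VQ*(pi - E(pi))."""
    return V * r + VQ * (pi - expected_pi)


# Hypothetical parameters for a sib pair (E(pi) = 0.5 for siblings).
V, r, VQ = 1.0, 0.4, 0.2
for ibd in (0, 1, 2):
    cov = pair_covariance(V, r, VQ, pi=ibd / 2, expected_pi=0.5)
    print(f"IBD = {ibd}: covariance = {cov:.2f}")
```

Pairs sharing more alleles IBD at the QTL have a higher trait covariance, and the pair sharing the expected proportion has covariance V r.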

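Path Diagram for Sib-Pair QTL model

[Path diagram: sib phenotypes PT1 and PT2, each loading on latent variables Q, S and N through paths q, s and n; correlations of 1 and [0 / 0.5 / 1] link the latent variables of the two sibs]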

Incomplete Marker Information

• IBD sharing cannot always be deduced from marker genotypes with certainty

• Obtain probabilities of IBD values (Z0, Z1, Z2)

Finite mixture likelihood

$L = \sum_{i=0}^{2} Z_i \, L(X \mid \mathrm{IBD} = i)$

Pi-hat likelihood

$L = L(X \mid \mathrm{IBD} = 2\hat{\pi})$

$\hat{\pi} = Z_1 / 2 + Z_2$
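To compare the finite mixture and pi-hat likelihoods, the sketch below uses a zero-mean bivariate normal density for a sib pair's trait values, with the covariance depending on the proportion IBD as V r + V_Q (π - 0.5). The normal model, the parameter values, the trait values and all function names are assumptions made for illustration.

```python
import math


def sib_pair_density(x1: float, x2: float, V: float, cov: float) -> float:
    """Zero-mean bivariate normal density with variance V and covariance cov."""
    det = V * V - cov * cov
    quad = (V * x1 * x1 - 2.0 * cov * x1 * x2 + V * x2 * x2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def qtl_covariance(V: float, r: float, VQ: float, pi: float) -> float:
    """Sib-pair covariance given proportion IBD pi (E(pi) = 0.5 for sibs)."""
    return V * r + VQ * (pi - 0.5)


def pi_hat(z0: float, z1: float, z2: float) -> float:
    """Estimated proportion IBD from the IBD probabilities."""
    return 0.5 * z1 + z2


def mixture_likelihood(x1, x2, z, V, r, VQ) -> float:
    """Weight the likelihood at each IBD value i by its probability z[i]."""
    return sum(
        z_i * sib_pair_density(x1, x2, V, qtl_covariance(V, r, VQ, i / 2))
        for i, z_i in enumerate(z)
    )


def pi_hat_likelihood(x1, x2, z, V, r, VQ) -> float:
    """Evaluate a single likelihood at the estimated proportion IBD."""
    return sib_pair_density(x1, x2, V, qtl_covariance(V, r, VQ, pi_hat(*z)))


# Hypothetical sib pair with uncertain IBD status.
z = (0.1, 0.3, 0.6)
args = dict(V=1.0, r=0.4, VQ=0.3)
print(mixture_likelihood(1.2, 0.9, z, **args))
print(pi_hat_likelihood(1.2, 0.9, z, **args))
```

The two likelihoods coincide when the IBD status is known with certainty and differ slightly otherwise.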

Pi-hat Model

[Path diagram: sib phenotypes PT1 and PT2, each loading on latent variables Q, S and N through paths q, s and n]

Parametric / Allele Sharing

[Diagram relating Trait Data, Marker Data and IBD sharing, with routes labelled Parametric and Allele sharing]
