Introduction to Linkage Analysis
Pak Sham
Twin Workshop 2003
Human Genome
• 22 autosomes, XY
• $3 \times 10^9$ base pairs (2 metres long)
• 2% coding sequences, rest regulatory & “junk”
• 30,000 - 40,000 genes
• Much commonality with other species
Genetic Variation
• Chromosomal abnormalities
  • Duplication (e.g. Down’s)
  • Deletion (e.g. Velo-cardio-facial syndrome)
• Major deleterious mutations
  • Usually rare (e.g. Huntington’s)
• Polymorphisms
  • Single nucleotide polymorphisms (SNPs)
  • Variable length repeats (e.g. microsatellites)
  • Some are functional (“normal variation”)
  • Most are non-functional (neutral markers)
Genetic Mapping of Disease
• Levels of Genetic Analysis
  • Estimate heritability (family, twins, adoption)
  • Find chromosomal locations (linkage)
  • Identify risk variants (association)
  • Understand mechanisms (cell biology, etc)
• Applications
  • Prediction of genetic risk
  • More accurate prediction of genetic risk
  • Even more accurate prediction of genetic risk; prediction of prognosis and treatment response
  • Development of new drug targets
Strategies of Gene Mapping
• Functional
  • Uses knowledge of disease to identify candidate genes
  • Finds variants in candidate genes
  • Looks for association between variants and disease
• Positional
  • Systematic screen of whole genome
  • Uses a set of 400 evenly-spaced markers
  • Looks for markers which co-segregate with disease
Co-segregation
[Figure: pedigree annotated with marker genotypes A2A4, A3A4, A1A3, A1A2, A2A3, A1A2, A1A4, A3A4, A3A2]
Marker allele A1 co-segregates with dominant disease
Linkage and Co-segregation
[Figure: parent and gametes]
Alleles on the same chromosome tend to stay together in meiosis; therefore they tend to be co-transmitted.
Crossing over between homologous chromosomes
Map Distance
Map distance between two loci (Morgans)
= Expected number of crossovers per meiosis
(1 Morgan = 100 centiMorgans)
Note: Map distances are additive
Heterogeneity in recombination frequencies
Total map length: approximately 33 Morgans
(1 cM ≈ $10^6$ base pairs)
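Recombination
[Figure: a parent carrying alleles A1, A2 at one locus and Q1, Q2 at another (parental genotypes); gametes are non-recombinant with probability $1 - \theta$ and recombinant with probability $\theta$]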
Recombination Fraction
Recombination fraction ($\theta$) between two loci
= Proportion of gametes that are recombinant with respect to the two loci
Recombination & map distance
Haldane map function:
$\theta = \frac{1 - e^{-2m}}{2}$
[Figure: recombination fraction (0 to 0.5) plotted against map distance (0 to 1 M)]
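The Haldane map function above can be sketched in a few lines of Python. The function names are illustrative only, and the inverse is included as a convenience. Map distances add along a chromosome, but recombination fractions do not.

```python
import math

def haldane_theta(m):
    """Recombination fraction for a map distance m in Morgans (Haldane)."""
    return 0.5 * (1.0 - math.exp(-2.0 * m))

def haldane_map_distance(theta):
    """Inverse map function: distance in Morgans for 0 <= theta < 0.5."""
    return -0.5 * math.log(1.0 - 2.0 * theta)

# Recombination fraction at a few map distances (in centiMorgans)
for cm in (1, 10, 50, 100):
    print(cm, "cM -> theta =", round(haldane_theta(cm / 100), 4))

# Adjacent intervals A-B and B-C: distances add, recombination fractions do not
t_ab, t_bc = haldane_theta(0.1), haldane_theta(0.2)
t_ac = haldane_theta(0.1 + 0.2)
print(t_ab + t_bc, "vs", t_ac)
```

For short intervals theta is close to m, which is why 1 cM is often read as roughly 1% recombination.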
Double Backcross: Fully Informative Gametes
AABB × aabb
AaBb × aabb
Offspring: AaBb, aabb (non-recombinant); Aabb, aaBb (recombinant)
Linkage Analysis: Fully Informative Gametes
Count data: Recombinant gametes: R; Non-recombinant gametes: N
Parameter: Recombination fraction: $\theta$
Likelihood: $L(\theta) = \theta^R (1 - \theta)^N$
Estimation: $\hat{\theta} = \frac{R}{N + R}$
Chi-square: $\chi^2 = 2\left[ R \log \hat{\theta} + N \log(1 - \hat{\theta}) - (N + R) \log(0.5) \right]$
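As a minimal sketch of the estimation step, the following Python computes the estimate and the chi-square statistic from counted gametes. The counts R = 2, N = 8 are hypothetical, and capping the estimate at 0.5 is an assumption the slide does not state.

```python
import math

def theta_hat(R, N):
    """Maximum-likelihood estimate of the recombination fraction."""
    return R / (R + N)

def log_lik(theta, R, N):
    """Natural-log likelihood R*log(theta) + N*log(1-theta), with 0*log(0) = 0."""
    ll = 0.0
    if R:
        ll += R * math.log(theta)
    if N:
        ll += N * math.log(1.0 - theta)
    return ll

def chi_square(R, N):
    """Likelihood-ratio statistic against theta = 0.5 (no linkage)."""
    t = min(theta_hat(R, N), 0.5)  # assumption: theta restricted to [0, 0.5]
    return 2.0 * (log_lik(t, R, N) - log_lik(0.5, R, N))

R, N = 2, 8  # hypothetical counts
print("theta_hat =", theta_hat(R, N))
print("chi-square =", round(chi_square(R, N), 3))
print("lod =", round(chi_square(R, N) / (2 * math.log(10)), 3))
```

The last line converts the statistic to a lod score by dividing by 2 log_e(10).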
Phase Unknown Meioses
AaBb × aabb
Offspring: AaBb, aabb, Aabb, aaBb
Either: AaBb, aabb non-recombinant; Aabb, aaBb recombinant
Or: AaBb, aabb recombinant; Aabb, aaBb non-recombinant
Mixture distribution likelihood
The probability of the observed data X depends on the status of a discrete variable G:
$P(X \mid G)$
The status of G is not observed, but the probability distribution of G is available:
$P(G)$
Then the likelihood of the observed data X is
$L = \sum_{G} P(X \mid G)\, P(G)$
Linkage Analysis: Phase-unknown Meioses
Count data: Recombinant gametes: X; Non-recombinant gametes: Y
or Recombinant gametes: Y; Non-recombinant gametes: X
Likelihood: $L(\theta) = \theta^X (1 - \theta)^Y + \theta^Y (1 - \theta)^X$
An example of incomplete data:
Mixture distribution likelihood function
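A small sketch of the phase-unknown likelihood in Python. It assumes the two phases are equally likely a priori, so each term is weighted by 1/2. That constant cancels in the lod score, which is why it can be dropped from the formula. The counts are hypothetical.

```python
import math

def phase_unknown_lik(theta, X, Y):
    """Mixture over the two possible phases, each with prior 1/2 (assumption)."""
    return 0.5 * (theta**X * (1 - theta)**Y + theta**Y * (1 - theta)**X)

def lod(theta, X, Y):
    """log10 likelihood ratio against theta = 0.5."""
    return math.log10(phase_unknown_lik(theta, X, Y)
                      / phase_unknown_lik(0.5, X, Y))

X, Y = 0, 5  # hypothetical: five gametes that all fall in the same class
print("phase-unknown lod at theta=0:", round(lod(0.0, X, Y), 3))
print("phase-known lod at theta=0:  ", round(5 * math.log10(2), 3))
```

With these counts the two lod scores differ by log10(2), the information lost by not knowing the phase.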
Parental genotypes unknown
[Figure: offspring genotypes AaBb, aabb, Aabb, aaBb with parental genotypes unknown]
Likelihood will be a function of
• allele frequencies (population parameters)
• $\theta$ (transmission parameter)
Complex Phenotypes
Penetrance parameters

Genotype   P(Disease)   P(Normal)
AA         f2           1 - f2
Aa         f1           1 - f1
aa         f0           1 - f0

Each phenotype is compatible with multiple genotypes.
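To illustrate how one phenotype maps onto several genotypes, the sketch below applies Bayes' rule to the penetrance table. It assumes Hardy-Weinberg genotype frequencies. The allele frequency and penetrance values are hypothetical, chosen only to resemble a dominant model with some phenocopies.

```python
def genotype_posterior(p, f, affected=True):
    """P(genotype | phenotype) for a diallelic locus.

    p: frequency of allele A (hypothetical value supplied by the caller)
    f: penetrances keyed by genotype, e.g. {"AA": f2, "Aa": f1, "aa": f0}
    """
    prior = {"AA": p * p, "Aa": 2 * p * (1 - p), "aa": (1 - p) ** 2}  # HWE assumed
    like = {g: (f[g] if affected else 1 - f[g]) for g in prior}
    total = sum(prior[g] * like[g] for g in prior)
    return {g: prior[g] * like[g] / total for g in prior}

penetrance = {"AA": 0.8, "Aa": 0.8, "aa": 0.01}  # hypothetical f2, f1, f0
post = genotype_posterior(p=0.01, f=penetrance, affected=True)
for g, prob in post.items():
    print(g, round(prob, 4))
```

Even under this fairly strong dominant model, a sizeable share of affected individuals carry genotype aa, so the genotype cannot be read off the phenotype.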
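General Pedigree Likelihood
Likelihood is a sum of products (mixture distribution likelihood)
$L = \sum_{G} \prod_{i=1}^{n} \mathrm{pen}(x_i \mid g_i) \prod_{f} \mathrm{pop}(g_f) \prod_{i} \mathrm{trans}(g_i \mid g_{f_i}, g_{m_i})$
where the sum is over all genotype configurations $G = (g_1, \ldots, g_n)$, the second product is over founders f, and the third is over non-founders i with father $f_i$ and mother $m_i$
number of terms = $(m_1 m_2 \cdots m_k)^{2n}$
where $m_j$ is number of alleles at locus j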
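The sum of products can be written out by brute force for a very small pedigree. The sketch below handles a father-mother-child trio at a single diallelic disease locus. As simplifying assumptions, genotypes are unordered and coded as the number of A alleles (0, 1, 2), founders follow Hardy-Weinberg proportions, and the allele frequency and penetrances are hypothetical.

```python
from itertools import product

def pop(g, p):
    """Founder genotype probability under Hardy-Weinberg (g = count of A alleles)."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p * p][g]

def trans(gc, gf, gm):
    """Mendelian probability of child genotype gc given parental genotypes."""
    a, b = gf / 2, gm / 2  # probability that each parent transmits allele A
    return [(1 - a) * (1 - b), a * (1 - b) + (1 - a) * b, a * b][gc]

def pen(x, g, f):
    """Penetrance: probability of phenotype x (True = affected) given genotype g."""
    return f[g] if x else 1 - f[g]

def trio_likelihood(x_father, x_mother, x_child, p, f):
    """Sum over all genotype configurations of pen * pop * trans."""
    total = 0.0
    for gf, gm, gc in product(range(3), repeat=3):
        total += (pen(x_father, gf, f) * pen(x_mother, gm, f) * pen(x_child, gc, f)
                  * pop(gf, p) * pop(gm, p) * trans(gc, gf, gm))
    return total

f = [0.01, 0.8, 0.8]  # hypothetical penetrances f0, f1, f2
print(trio_likelihood(True, False, True, p=0.01, f=f))
```

This trio needs only 27 terms, but the count grows exponentially with pedigree size, which is the motivation for peeling.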
Elston-Stewart algorithm
Reduces computations by peeling:
[Figure: pedigree in which individual X links nuclear families 1 and 2]
Step 1: Condition likelihoods of family 1 on genotype of X.
Step 2: Joint likelihood of families 2 and 1
Lod Score: Morton (1955)
$\mathrm{Lod}(\theta) = \log_{10} \frac{L(\theta)}{L(\theta = 0.5)}$
Lod > 3: conclude linkage

Prior odds   Likelihood ratio   Posterior odds
1:50         1000               20:1

Lod < -2: exclude linkage
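The arithmetic behind the lod > 3 threshold can be sketched as follows. The prior odds of 1:50 are the value given on the slide. The function names are illustrative.

```python
def posterior_odds(prior_odds, lod):
    """Posterior odds of linkage = prior odds x likelihood ratio (10 ** lod)."""
    return prior_odds * 10 ** lod

def odds_to_probability(odds):
    """Convert odds in favour to a probability."""
    return odds / (1 + odds)

odds = posterior_odds(1 / 50, 3)  # lod of 3 is a likelihood ratio of 1000
print("posterior odds:", odds)
print("posterior probability of linkage:", round(odds_to_probability(odds), 3))
```

A lod of 3 therefore leaves roughly a 1 in 21 chance that the linkage is a false positive under these prior odds.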
Lod Score Curves
[Figure: lod score plotted against $\theta$ from 0 to 0.5]
Lod score curves are additive over pedigrees
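Additivity over pedigrees can be shown with a short sketch that sums per-pedigree lod curves on a grid of recombination fractions. For simplicity it assumes phase-known meioses, and the (R, N) counts for the three pedigrees are hypothetical.

```python
import math

def lod_known_phase(theta, R, N):
    """log10 of theta^R (1-theta)^N relative to its value at theta = 0.5."""
    return math.log10(theta**R * (1 - theta)**N / 0.5 ** (R + N))

pedigrees = [(1, 9), (0, 6), (2, 8)]      # hypothetical (recombinants, non-recombinants)
grid = [i / 100 for i in range(1, 51)]    # theta = 0.01 ... 0.50

# Total lod curve is the pointwise sum of the per-pedigree curves
total = [sum(lod_known_phase(t, R, N) for R, N in pedigrees) for t in grid]

best = max(range(len(grid)), key=total.__getitem__)
print("max lod", round(total[best], 2), "at theta =", grid[best])
```

The grid starts at 0.01 rather than 0 because the lod is minus infinity at theta = 0 whenever a pedigree contains a recombinant.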
Lods, chi-squares & p-values
In large samples
$2 \log_e(10) \times \text{Max lod} \sim \chi^2_1$
In small samples
$P \le 10^{-\text{Max lod}}$
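The two conversions above can be checked numerically. The sketch below uses only the standard library: the upper tail of a chi-square with 1 degree of freedom is computed as erfc(sqrt(x/2)). Whether to halve this p-value for a one-sided test is a convention the slide does not specify, so it is left as is.

```python
import math

def chi2_from_lod(max_lod):
    """Large-sample chi-square (1 df) equivalent of a maximum lod score."""
    return 2 * math.log(10) * max_lod

def chi2_1df_pvalue(x):
    """Upper-tail probability of a chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

def small_sample_bound(max_lod):
    """Upper bound on the p-value that holds in small samples."""
    return 10 ** (-max_lod)

for lod in (1, 2, 3):
    x = chi2_from_lod(lod)
    print(lod, round(x, 3), chi2_1df_pvalue(x), small_sample_bound(lod))
```

In this range the asymptotic p-value is smaller than the bound, so the bound is the more conservative of the two.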
Problems with parametric linkage
• Requires parameters of the disease model to be specified
  • Allele frequency
  • Penetrances
  These are generally unknown for a complex trait
• Disease model assumes that a single locus is the only source of familial resemblance
  This is generally unrealistic
Linkage Analysis: Admixture Test (CAB Smith)
Model
Probability of linkage in family = $\alpha$
Likelihood
$L(\alpha, \theta) = \alpha L(\theta) + (1 - \alpha) L(\theta = 1/2)$
Note: Another example of mixture likelihood
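A minimal sketch of the admixture likelihood, assuming independent families so that the per-family mixture likelihoods multiply. Each family is given a phase-known likelihood for simplicity, the (R, N) counts are hypothetical, and a coarse grid search stands in for a proper optimiser.

```python
def family_lik(R, N):
    """Phase-known likelihood theta^R (1-theta)^N for one family."""
    return lambda theta: theta**R * (1 - theta) ** N

def admixture_lik(alpha, theta, liks):
    """Product over families of alpha*L(theta) + (1-alpha)*L(1/2)."""
    total = 1.0
    for L in liks:
        total *= alpha * L(theta) + (1 - alpha) * L(0.5)
    return total

# Hypothetical data: two families that look linked, two that look unlinked
families = [family_lik(0, 8), family_lik(1, 7), family_lik(4, 4), family_lik(3, 4)]

grid = [i / 20 for i in range(21)]  # alpha in 0..1; theta = grid/2 in 0..0.5
best = max(((a, t / 2) for a in grid for t in grid),
           key=lambda at: admixture_lik(at[0], at[1], families))
print("alpha, theta at grid maximum:", best)
```

Setting alpha = 1 recovers the homogeneous linkage likelihood, and alpha = 0 gives the no-linkage likelihood.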
Linkage Analysis: MOD
• Maximise lod score over several sets of disease models, e.g. dominant, recessive, additive
• Make correction for multiple (k) models
  • Adjusted lod = lod – log10(k)
Allele sharing (non-parametric) methods
Penrose (1935): Sib Pair linkage
For rare disease, IBD sharing by sib-pair type:
• Concordant affected
• Concordant normal
• Discordant
Therefore affected sib pair (ASP) design efficient
Test
H0: Proportion of alleles IBD = 1/2
HA: Proportion of alleles IBD > 1/2
Correlation between IBD of two loci
• For sib pairs
$\mathrm{Corr}(\pi_A, \pi_B) = (1 - 2\theta_{AB})^2$
Attenuation of linkage signal with increasing genetic distance from disease locus
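The attenuation can be made concrete by combining this correlation with the Haldane map function from earlier in the talk. Under that map function the correlation reduces to exp(-4m) for a map distance of m Morgans. The function names are illustrative.

```python
import math

def haldane_theta(m):
    """Recombination fraction for map distance m (Morgans), Haldane map function."""
    return 0.5 * (1 - math.exp(-2 * m))

def ibd_correlation(theta):
    """Correlation between sib-pair IBD sharing at two loci."""
    return (1 - 2 * theta) ** 2

# Correlation between marker and trait-locus IBD at increasing distances
for cm in (0, 5, 10, 20, 50):
    theta = haldane_theta(cm / 100)
    print(cm, "cM:", round(ibd_correlation(theta), 3))
```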
Joint distribution of Pedigree IBD
• IBD of relative pairs are not independent
  • e.g. if IBD(1,2) = 2 and IBD(1,3) = 2 then IBD(2,3) = 2
• Inheritance vector gives joint IBD distribution
  • Each element indicates whether
    • paternally inherited allele is transmitted (1)
    • or maternally inherited allele is transmitted (0)
  • Vector of 2N elements (N = # of non-founders)
Inheritance Vector: An Example
[Figure: pedigree with ordered marker genotypes 1/2, 3/4, 1/3, 1/4, 2/3 and 2/4]
Ordered genotype notation: 1st allele = paternally inherited, 2nd allele = maternally inherited
Inheritance vector = (1, 1, 1, 0, 1, 0)
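As a sketch of how IBD sharing follows from an inheritance vector, consider the simplest case of two sibs. This is a hypothetical pedigree, not the one in the figure. The element order (father's meiosis, mother's meiosis) for each sib is an assumption made for the illustration.

```python
from collections import Counter
from itertools import product

def sib_pair_ibd(vector):
    """Number of alleles shared IBD by two sibs.

    vector = (p1, m1, p2, m2): for each sib, which grandparental allele was
    transmitted in the father's meiosis (p) and the mother's meiosis (m).
    """
    p1, m1, p2, m2 = vector
    return int(p1 == p2) + int(m1 == m2)

# All 16 inheritance vectors are equally likely a priori
counts = Counter(sib_pair_ibd(v) for v in product((0, 1), repeat=4))
for ibd in (0, 1, 2):
    print("IBD =", ibd, "probability", counts[ibd] / 16)
```

Enumerating the vectors recovers the familiar prior IBD distribution for sib pairs of 1/4, 1/2, 1/4.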
Pedigree allele-sharing methods
• APM: Affected Pedigree Members
  • Uses IBS
  • Very sensitive to allele frequency mis-specification
  • Less powerful than IBD-based methods
• NPL: Non-Parametric Linkage (Genehunter)
  • Conservative at positions between markers
• LRT: “Delta parameter” (Genehunter+, Allegro)
• All these methods consider affected members only
Variance Components Linkage
• Models trait values of pedigree members jointly
  • Assumes multivariate normality conditional on IBD
• Covariance between relative pairs
$= V r + V_Q [\hat{\pi} - E(\pi)]$
• Where V = trait variance
  r = correlation (depends on relationship)
  $V_Q$ = QTL additive variance
  $E(\pi)$ = expected proportion IBD
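A rough sketch of the covariance model for a single sib pair, assuming mean-centred trait values and E(pi) = 0.5 for sibs. The variance, correlation and QTL variance values are hypothetical. The log-likelihood is the standard zero-mean bivariate normal density with equal variances.

```python
import math

def pair_covariance(V, r, VQ, pi_hat, expected_pi=0.5):
    """Covariance between relatives: V*r + VQ*(pi_hat - E(pi))."""
    return V * r + VQ * (pi_hat - expected_pi)

def bivariate_normal_loglik(x1, x2, V, cov):
    """Log density of a zero-mean bivariate normal with variances V and covariance cov."""
    det = V * V - cov * cov
    quad = (V * x1 * x1 - 2 * cov * x1 * x2 + V * x2 * x2) / det
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad

V, r, VQ = 1.0, 0.4, 0.2   # hypothetical trait variance, sib correlation, QTL variance
x1, x2 = 1.2, 0.9          # hypothetical mean-centred trait values for one sib pair
for pi_hat in (0.0, 0.5, 1.0):
    cov = pair_covariance(V, r, VQ, pi_hat)
    print(pi_hat, round(cov, 2), round(bivariate_normal_loglik(x1, x2, V, cov), 4))
```

Pairs sharing more alleles IBD at the QTL are modelled as more similar, which is the source of the linkage signal.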
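Path Diagram for Sib-Pair QTL model
[Figure: path diagram with phenotypes PT1 and PT2, each influenced by latent factors Q, S and N through paths q, s and n; correlation labels on the diagram are 1 and [0 / 0.5 / 1]]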
Incomplete Marker Information
• IBD sharing cannot always be deduced from marker genotypes with certainty
• Obtain probabilities of IBD values ($Z_0$, $Z_1$, $Z_2$)
Finite mixture likelihood
$L = \sum_{i=0}^{2} Z_i \, L(X \mid \mathrm{IBD} = i)$
Pi-hat likelihood
$L = L(X \mid \mathrm{IBD} = 2\hat{\pi})$, where $\hat{\pi} = Z_2 + Z_1/2$
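The two likelihoods can be compared in a short sketch. Here `lik_given_ibd` is a hypothetical stand-in for whatever trait model supplies L(X | IBD), and it must accept fractional IBD values for the pi-hat version. The IBD probabilities are made up for the illustration.

```python
def pi_hat(z0, z1, z2):
    """Estimated proportion of alleles shared IBD."""
    return z2 + z1 / 2

def mixture_likelihood(z, lik_given_ibd):
    """Finite mixture: weight the likelihood at IBD = 0, 1, 2 by Z0, Z1, Z2."""
    return sum(z[i] * lik_given_ibd(i) for i in range(3))

def pi_hat_likelihood(z, lik_given_ibd):
    """Pi-hat approach: evaluate the likelihood once at IBD = 2 * pi_hat."""
    return lik_given_ibd(2 * pi_hat(*z))

z = (0.1, 0.3, 0.6)                         # hypothetical IBD probabilities

def linear(k):
    """Toy likelihood that is linear in IBD."""
    return 0.1 + 0.05 * k

def curved(k):
    """Toy likelihood that is non-linear in IBD."""
    return 0.1 + 0.05 * k * k

print(mixture_likelihood(z, linear), pi_hat_likelihood(z, linear))
print(mixture_likelihood(z, curved), pi_hat_likelihood(z, curved))
```

The two approaches agree when IBD is known with certainty or when the likelihood is linear in IBD. Otherwise the pi-hat likelihood is an approximation to the mixture.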
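Pi-hat Model
[Figure: path diagram with phenotypes PT1 and PT2, each influenced by latent factors Q, S and N through paths q, s and n; one correlation on the diagram is labelled 1]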
Parametric / Allele Sharing
[Figure: diagram relating Trait Data, Marker Data and IBD sharing, with the parametric and allele sharing approaches marked]