applications of linkage analysis in the modern era intro to... · 2017-12-07 · linkage...
TRANSCRIPT
Applications of linkage analysis in the modern era
Outline
• What is linkage analysis?
  • Parametric
  • Non-parametric
• Why is linkage analysis complicated for complex traits such as cognition or psychiatric illness?
• How can it be used in the modern era?
  • Used to filter the large amount of data generated through next generation sequencing
  • Used to understand the effects of combinations of variants on phenotype
• One of the two main approaches in gene mapping.
• Uses pedigree data.
Linkage Analysis
Differences between linkage and association
Linkage:
• Linkage is a property of loci.
• Role: to identify a biological mechanism for transmission of a trait; to locate the gene involved.
• Coarse mapping (>1 cM).
• No information about which allelic variant is associated with higher risk of disease.
• Requires family pedigrees.
• Uses very polymorphic markers or bi-allelic markers.
Association:
• Association is a property of alleles.
• Role: to identify association between an allelic variant and a disease; to identify linkage disequilibrium between a disease allele and a marker.
• Fine mapping (<1 cM).
• Case-control or family-based approach.
• Usually bi-allelic markers.
Calculation of LOD Scores
LOD (logarithm of odds) scores are the log10 of the ratio between two likelihoods.
You calculate the likelihood of the observed inheritance pattern if the loci are close together (i.e. linked) and its likelihood if the pattern arose by chance (i.e. the loci are unlinked), then take the ratio.
Likelihood given linkage (i.e. the recombination fraction is < 0.5, here $\theta = 0.2$)
$= (1-\theta)^5 \times \theta^1$
$= (0.8)^5 \times (0.2)^1$
$= 0.32768 \times 0.2 = 0.065536$
Likelihood given no linkage (i.e. the recombination fraction is 0.5)
$= \theta^6$
$= (0.5)^6$
$= 0.015625$
Ratio between the two likelihoods $= 0.065536 / 0.015625 = 4.194304$
The $\log_{10}$ of this ratio is the Z score or LOD score $= 0.62266$
5 non-recombinant individuals + 1 recombinant individual
Recombination fraction = (no. of recombinants) / (no. of meioses) = 1/6 = 0.167
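The worked example above can be checked with a few lines of Python. This is a minimal sketch that assumes phase-known, fully informative meioses; the function name `lod_score` is introduced here for illustration only.

```python
import math


def lod_score(n_nonrecomb: int, n_recomb: int, theta: float) -> float:
    """LOD score for counted meioses at a given recombination fraction."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    n_meioses = n_nonrecomb + n_recomb
    # Likelihood of the data if the loci are linked at recombination fraction theta
    l_linked = (1.0 - theta) ** n_nonrecomb * theta ** n_recomb
    # Likelihood of the same data if the loci are unlinked (theta = 0.5)
    l_unlinked = 0.5 ** n_meioses
    return math.log10(l_linked / l_unlinked)


# Slide example: 5 non-recombinants, 1 recombinant, theta = 0.2
lod = lod_score(5, 1, 0.2)
# Same data evaluated at the estimated recombination fraction 1/6
lod_at_mle = lod_score(5, 1, 1 / 6)
print(round(lod, 5), round(lod_at_mle, 5))
```

Evaluating the LOD at the estimated recombination fraction (1/6) gives a slightly higher value than at 0.2, which is expected because 1/6 is the value of θ that maximises the likelihood for these counts.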
Building blocks of linkage analysis
• Penetrances (example values):
  f(aa) = 0.99, probability of a homozygote being affected
  f(Aa) = 0.8, probability of a heterozygote being affected
  f(AA) = 0.001, probability of a non-carrier being affected (phenocopy rate)
• Information about disease model (in parametric analysis)
• Information about allele frequencies
• Information about environmental variables
DISEASE          ALLELE_FREQ   PENETRANCES         LABEL
PROSTATE_CANCER  0.001         *                   HYPOTHETICAL_ADDITIVE_MODEL
   SEX = FEMALE                0.000,0.000,0.000
   AGE < 50                    0.001,0.050,0.100
   AGE < 70                    0.002,0.200,0.400
   OTHERWISE                   0.004,0.500,0.800
The model describes a hypothetical susceptibility allele for prostate cancer.
- The first liability class is all females, and specifies that they never develop prostate cancer.
- The next row specifies that males under the age of 50 have about a 5% chance of developing cancer if they are heterozygotes for this allele and a 10% chance if they are homozygotes. These probabilities increase for males aged between 50 and 70.
- The final row specifies the penetrances for all other individuals (i.e. males aged 70 or over).
An appropriate and careful choice of disease model is essential for parametric linkage analyses.
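The liability classes above can be read as a first-match rule table. The sketch below mirrors that reading in Python; it assumes the rows are evaluated top to bottom and that the three penetrances are ordered as (non-carrier, heterozygote, homozygote), which is how the description above interprets them. The function names are illustrative and are not part of any linkage package.

```python
from typing import Tuple

# (non-carrier, heterozygote, homozygote) penetrances, as assumed above
Penetrances = Tuple[float, float, float]


def penetrances_for(sex: str, age: float) -> Penetrances:
    """Return the penetrance triple of the first matching liability class."""
    if sex == "FEMALE":
        return (0.000, 0.000, 0.000)
    if age < 50:
        return (0.001, 0.050, 0.100)
    if age < 70:
        return (0.002, 0.200, 0.400)
    return (0.004, 0.500, 0.800)  # OTHERWISE: males aged 70 or over


def risk(sex: str, age: float, n_disease_alleles: int) -> float:
    """Probability of being affected given sex, age and genotype."""
    return penetrances_for(sex, age)[n_disease_alleles]


print(risk("MALE", 45, 1))    # heterozygous male under 50
print(risk("MALE", 75, 2))    # homozygous male aged 70 or over
print(risk("FEMALE", 80, 2))  # females are never affected in this model
```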
Calculation of LOD Scores – liability classes
Seven liability classes were defined on the basis of age (in years). For each age group $j$, the age-dependent population prevalence $P_j$ was obtained from the Rotterdam Study.
The disease-gene penetrance, $f_j$, of the $j$th age group can be estimated from $P_j$, PAF and $q$,
where PAF is the population-attributable fraction, that is, the proportion of the population prevalence that can be explained by the studied gene (10% assumed), and $q$ is the disease-allele frequency (1% assumed).
Liability Class   Age (years)   Population Prevalence   Penetrance   No. of Patients   No. of Unaffected Relatives
1                 <65           <.02                    .00          0                 129
2                 65–69         .02                     .09          4                 6
3                 70–74         .05                     .23          22                11
4                 75–79         .09                     .46          32                14
5                 80–84         .23                     .99          30                8
6                 85–89         .35                     .99          24                1
7                 ≥90           >.35                    .99          0                 1
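The penetrance formula itself is not shown in the text above. As an illustration only, the sketch below uses one plausible form under a dominant model: the fraction of prevalence attributed to the gene (PAF × P_j) divided by the carrier frequency 1 − (1 − q)². Both this formula and the 0.99 cap are assumptions made here, not values confirmed by the source, and the output is not expected to reproduce the table exactly.

```python
def penetrance(prevalence: float, paf: float = 0.10, q: float = 0.01,
               cap: float = 0.99) -> float:
    """Assumed dominant-model penetrance for one age class (illustrative)."""
    # Frequency of individuals carrying at least one disease allele
    carrier_freq = 1.0 - (1.0 - q) ** 2
    # Share of the prevalence attributed to the gene, spread over carriers
    value = paf * prevalence / carrier_freq
    # Penetrance is a probability, so it is capped (cap value is an assumption)
    return min(cap, value)


for p in (0.02, 0.05, 0.09, 0.23, 0.35):
    print(p, round(penetrance(p), 3))
```

The main point of the sketch is the shape of the relationship: penetrance rises in proportion to age-specific prevalence until it saturates in the oldest classes.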
Calculation of LOD Scores – age-dependent penetrance
A genome-wide screen for late-onset Alzheimer disease in a genetically isolated Dutch population. Am J Hum Genet. 2007 July; 81(1): 17–31.
Multipoint and Heterogeneity LOD Scores: better resolution, more robust, exclusion mapping
Multipoint LOD (blue) and HLOD (pink) scores for chromosomes 1 and 3 in the genome screen of late-onset AD after fine typing.
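A heterogeneity LOD (HLOD) allows for only a proportion α of families being linked. The sketch below uses the standard admixture form, HLOD(α) = Σ log10(α·10^LOD_i + 1 − α), maximised over α on a grid. It assumes per-family LOD scores at a single fixed position are already available; real software also maximises over position or θ. The function name and the example LOD values are invented for illustration.

```python
import math
from typing import Sequence, Tuple


def hlod(family_lods: Sequence[float], steps: int = 100) -> Tuple[float, float]:
    """Return (HLOD, alpha) maximised over a grid of alpha in [0, 1]."""
    best_value, best_alpha = 0.0, 0.0  # alpha = 0 always gives HLOD = 0
    for i in range(steps + 1):
        alpha = i / steps
        # Each family is linked with probability alpha, unlinked otherwise
        total = sum(math.log10(alpha * 10.0 ** z + (1.0 - alpha))
                    for z in family_lods)
        if total > best_value:
            best_value, best_alpha = total, alpha
    return best_value, best_alpha


# Hypothetical per-family LOD scores: two linked families, two unlinked
mixed = [1.5, 1.2, -2.0, -1.5]
mixed_hlod, mixed_alpha = hlod(mixed)
print(round(sum(mixed), 2), round(mixed_hlod, 2), mixed_alpha)
```

With these made-up values the plain summed LOD is negative, while the HLOD stays positive because the unlinked families are down-weighted rather than counted as evidence against linkage.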
The Significance of LOD Scores
• Significant linkage corresponds to a LOD score ≥ +3, i.e. log10(1000): linkage is 1000 times more likely than non-linkage.
• LOD +3 corresponds to approximately p = 0.05.
• In genome scans this threshold is raised to +3.3 because multiple markers are tested.
• LOD < -2 is significant evidence for non-linkage.
• -2 < LOD < +3 is inconclusive and more data are needed, perhaps by adding additional families.
Strachan & Read, 1999
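These thresholds can be written as a small decision rule. The sketch below is illustrative; the function name is made up here, and treating a LOD of exactly -2 as inconclusive is an assumption, since the thresholds above do not say which side that boundary falls on.

```python
def interpret_lod(lod: float, genome_scan: bool = False) -> str:
    """Classify a LOD score using the conventional thresholds."""
    # Genome scans use a stricter threshold because many markers are tested
    threshold = 3.3 if genome_scan else 3.0
    if lod >= threshold:
        return "significant linkage"
    if lod < -2.0:
        return "linkage excluded"
    return "inconclusive"


for score in (3.1, 0.62, -2.5):
    print(score, interpret_lod(score), interpret_lod(score, genome_scan=True))
```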
Linkage Analysis
• Unfortunately, the standard (parametric) LOD score method doesn’t work well for complex traits, because it requires a definite model of how the trait is inherited: the first step in LOD score mapping is to determine the expected frequency of offspring phenotypes as a function of the recombination fraction.
• Non-parametric methods: look for chromosome segments shared by affected individuals. They do not rely on a genetic model.
  • Affected sib pair analysis
  • Linkage disequilibrium
Affected Sib Pair Analysis
• If two siblings both are affected by a genetic disease, they will (in most cases) share a region of chromosome surrounding the disease gene. This segment is “identical by descent” (IBD): it was derived from a common ancestor, their parent.
• Use many markers to find IBD regions among many affected sib pairs.
• Usually results in a large region: too big for positional cloning.
• Also: if more than one gene causes the trait, the necessary large amount of data will never converge to a single chromosome region.
[Figure: example pedigree marker genotypes: AB, CD; AC, AD; AB, AC; AC, AB]
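A toy version of affected sib pair analysis counts how many marker alleles each pair shares IBD and compares the counts with the 1/4 : 1/2 : 1/4 distribution expected without linkage. The sketch assumes fully informative parents (AB × CD, four distinct alleles), so that sharing an allele label means sharing it IBD; the sib pair data are invented.

```python
from collections import Counter
from typing import Iterable, Tuple


def ibd_count(sib1: str, sib2: str) -> int:
    """Alleles shared IBD by two sibs of AB x CD parents (0, 1 or 2)."""
    return len(set(sib1) & set(sib2))


def ibd_chi_square(pairs: Iterable[Tuple[str, str]]) -> Tuple[float, Counter]:
    """Chi-square statistic (2 df) against the null 1/4 : 1/2 : 1/4 sharing."""
    counts = Counter(ibd_count(a, b) for a, b in pairs)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no sib pairs supplied")
    expected = {0: 0.25 * n, 1: 0.50 * n, 2: 0.25 * n}
    chi2 = sum((counts[k] - expected[k]) ** 2 / expected[k] for k in (0, 1, 2))
    return chi2, counts


# Hypothetical data: 20 affected sib pairs typed at one marker
pairs = [("AC", "AC")] * 10 + [("AC", "AD")] * 8 + [("AC", "BD")] * 2
chi2, counts = ibd_chi_square(pairs)
print(dict(counts), round(chi2, 2))  # compare with 5.99 (2 df, p = 0.05)
```

Excess sharing of two alleles and a deficit of zero-allele sharing, as in this made-up data set, is the pattern expected near a disease locus.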
Linkage Disequilibrium (LD)
• Gene mapping using recombination methods (such as affected sib pair analysis) suffers from not having enough crossovers in one generation to localize a gene very well.
• Linkage disequilibrium uses crossovers that have occurred over several generations.
• Regions of chromosome distant from the disease mutation will become randomized.
• However, right near the mutation, random crossovers will not have separated the disease locus from its surrounding haplotype: a particular DNA haplotype will be in disequilibrium with the disease trait.
• The trick is to find that haplotype.
• The further back in time the mutation occurred, the smaller the region of disequilibrium.
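The shrinking region of disequilibrium follows from the standard decay relation D_t = D_0 (1 − θ)^t, where θ is the recombination fraction between marker and mutation and t is the number of generations. The sketch below simply evaluates that relation under random mating; the starting value and the θ values are arbitrary illustrations.

```python
def ld_after(d0: float, theta: float, generations: int) -> float:
    """Expected LD coefficient D after a number of generations."""
    # Each generation, recombination removes a fraction theta of the LD
    return d0 * (1.0 - theta) ** generations


d0 = 0.25  # arbitrary starting disequilibrium
for theta in (0.001, 0.01, 0.1):
    print(theta, [round(ld_after(d0, theta, t), 4) for t in (0, 10, 100)])
```

Markers very close to the mutation (small θ) keep most of their LD after 100 generations, while more distant markers lose it almost completely, which is why older mutations sit in smaller LD regions.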
More Linkage Disequilibrium
• A major complication: whole blocks of chromosome are inherited together over many generations, because crossing over is not completely random. As a result, genes occur in LD blocks separated by recombination hotspots.
• Another problem: LD methods depend on there being only a single original disease mutation that occurred in a particular haplotype. Multiple mutations will each have their own LD haplotype.
High throughput sequencing platforms
Mol Cell. 2015 May 21; 58(4): 586–597. doi: 10.1016/j.molcel.2015.05.004
Variation in the Genome
Matt Hurles – UK10K
Causes and Consequences of new mutations per individual:
• 3-4,000,000 variants (90% SNV, 9% indel, 1% SV)
• 10-11,000 amino acid changing
• 200-250 truncating indels
• 70-100 truncating base
• 30-50 splice site
• 50-200 new mutations (only 0-2 in genes)
Metachondromatosis
Linkage analysis
Followed by whole genome sequencing of a single affected individual from the family.
Evaluation of sequence data
Sobreira et al. suggest 3 methods to prioritize variants for further analysis:
- Linkage information: using the results of linkage analysis to prioritise regions
- Likelihood of being functional: looking at exonic variants that affect the protein sequence
  - Stops gained
  - Frameshifting indels
- Frequency in the healthy population
  - Comparison to known variants in dbSNP
  - Comparison to sequence from 8 unrelated controls
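The three prioritisation steps can be sketched as successive filters on a variant list. Everything in the example is hypothetical: the `Variant` fields, the consequence labels, the coordinates and the identifiers are invented for illustration and are not taken from the study.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    consequence: str  # e.g. "stop_gained", "frameshift_indel", "synonymous"
    var_id: str       # hypothetical identifier used to match known variants


Region = Tuple[str, int, int]  # (chromosome, start, end) of a linked region
FUNCTIONAL = {"stop_gained", "frameshift_indel"}


def prioritise(variants: Iterable[Variant], linked_regions: List[Region],
               known_ids: Set[str], control_ids: Set[str]) -> List[Variant]:
    """Keep functional variants in linked regions that are unseen elsewhere."""
    kept = []
    for v in variants:
        # 1. Linkage information: variant must lie in a non-excluded region
        in_region = any(c == v.chrom and start <= v.pos <= end
                        for c, start, end in linked_regions)
        # 2. Likelihood of being functional
        functional = v.consequence in FUNCTIONAL
        # 3. Frequency: absent from known variants and from controls
        novel = v.var_id not in known_ids and v.var_id not in control_ids
        if in_region and functional and novel:
            kept.append(v)
    return kept


variants = [
    Variant("12", 150, "frameshift_indel", "v1"),  # passes all three filters
    Variant("12", 160, "synonymous", "v2"),        # not functional
    Variant("12", 170, "stop_gained", "v3"),       # seen in controls
    Variant("3", 150, "stop_gained", "v4"),        # outside linked regions
]
candidates = prioritise(variants, [("12", 100, 200)], {"v9"}, {"v3"})
print([v.var_id for v in candidates])
```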
Linkage Analysis
They were able to exclude linkage to 96% of the genome (LOD < -2), and 98.4% of the genome showed negative LOD scores.
This reduced the search to 42 Mb of sequence within 6 regions.
This included 767 kb of exonic sequence.
Linkage Region   LOD Score   No. RefSeq genes   No. variants unique to patient   Unique Variants
2p25             1.0-1.5     20                 0
5q12.1           1.0-1.5     7                  0
7p14.1           2.5*        14                 0
8q24.1           1.8         27                 0
9q31.1-q33.1     1.0-1.5     71                 0
12q33            1.8#        105                1                                11 bp del PTPN11
* Maximum achievable LOD score in this family
# Subsequently revised.
Linkage Analysis
Genotyping in Family
11 bp deletion
Linkage Analysis
Target Enrichment
Agilent Technologies 1M SureSelect
DNA capture array containing 973,952 probes targeting 844,339 bp within the 8.6 Mb candidate interval
Including 88.4% and 98.6% of UCSC exons and CCDS coding sequence, respectively.
61% of sequence reads mapped to the targeted region.
23% of targeted bases were not captured.
16 individuals from 11 families
PTPN11 mutations identified in MC participants.
Locus Identification-problems
• Uncertainty in diagnostic boundaries
• Non-Mendelian inheritance
• Variable age of onset
• Genetic heterogeneity
– Many different genes can cause the illness
> 1% risk world wide
> phenotypic variation
• Oligogenic/polygenic causation
– More than one mutant gene required to produce phenotype
Locus identification- reducing the problems
• Single large families
• Avoid bilineal descent
• rigorous interviews
• family history
Reduce genetic heterogeneity
Significant LOD score = gene of major effect
• Reduce uncertainty of diagnosis
– classify minor diagnoses as unaffected
– >1 category of affected phenotype
Molecular Psychiatry advance online publication 21 March 2017. doi:10.1038/mp.2017.49
A rare missense variant in RCL1 segregates with depression in extended families.
Immunohistochemical labeling of RCL1 in human cerebral cortex. Co-localization with GFAP-positive primate-specific interlaminar astrocytes.
A rare genetic variant, rs115482041, on chr 9p24 in the RCL1 gene segregated with depression across multiple generations in an extended family.
The variant was estimated to explain more than half of the variation in depressive symptoms in the extended family, and 2.9% of the heritability in the overall genetically isolated population.
Interlaminar astrocytes may form a network for long-range coordination of intra-cortical communication.
Pedigree structure and genotypic information
John R. Giudicessi and Michael J. Ackerman. Circ Cardiovasc Genet. 2013;6:193-200
The phenotype of Bardet–Biedl syndrome (BBS) is defined by the association of retinitis pigmentosa, obesity, polydactyly, hypogenitalism, renal disease and cognitive impairment. The significant genetic heterogeneity of this condition is supported by the identification, to date, of eight genes (BBS1–8) implicated in cilia assembly or function.
Phenotypic heterogeneity and reduced penetrance?
recurrent major depression
minor diagnosis
unaffected
schizophrenia
bipolar affective disorder
(1;11)(q42;q14) translocation
Blackwood et al, 2001
Risk of major psychiatric illness increases 50 fold
Carriers have a reduced P300 event-related potential (ERP), a brain measure of attention
Chromosomes with multipoint LOD >2: Chr1q, Chr11q, Chr5q, Chr2p, Chr1p, Chr4q, Chr16, Chr3q
Model Code   Phenotypes
MODEL B      SCZ, BP1, BP2, SCZAFF, rMDD, cyclothymia
MODEL F      BP, rMDD, MDD
Psychosis    hal/del
Fine Mapping
Columns: Id | Any Diagnosis | SCZ BP | SCZ BP rMDD Cyclothymia | BP rMDD | t(1;11) | chr1 | chr11_1 | chr11_2 | F | chr5 | chr1p | chr2p | chr3q | chr4q | chr16
13 1 1 1 1 2 1* 1 1 1 1
24 1 1 1 1 2 1 1 1 1 1
27 1 1 2 1 1 1 1 1
49 1 1 1 1 2 1 1 1 1 1 1 Schizophrenia
41 1 1 1 1 1 2 1 1 1 1 1 1 1 1 Bipolar
61 1 1 1 1 2 1 1 1 1 1 1 1
50 1 1 2 1 1 1 1 1 1 1 1
26 1 1 2 1 1 1 1
104 1 1 2 1 1 1
67 1 1 1 1 1 2 1 1 1 1 1 Bipolar
55 1 1 1 2 1 1 1 1 1 1 Cyclothymic
19 1 1 1 1 2 1 1 1 1
53 1 1 1 2 1 1 1 1 Cyclothymic
15 1 1 1 1 2 1 1 1 1 Schizophrenia
32 1 1 1 2 1 1 1 Cyclothymic
9 1 1 2 1 1 1
18 1 1 1 1 2 1 1 1 Schizophrenia
70 1 1 1 1 2 1 1 2 Schizophrenia
44 1 1 1 1 1 1 1 1 1 1 1 1 MDD recurrent
87 1 1 1 1 1 1 1 1 MDD single episode
47 1 1 1 1 1 1 1 MDD + Generalised anxiety
54 1 1 1 1 1 1 1 1 1 MDD recurrent
62 1 1 1 1 Generalised anxiety
85 1 1 1 MDD single episode
Do the haplotypes in each individual predict diagnoses?
Phenotype prediction in the family
Linkage information is useful for:
• Exclusion mapping
• Variant filtering
• Assessing combined effects of variants:
  • Compound heterozygosity
  • Identification of causal pathways
• Investigation of reduced penetrance & phenotypic heterogeneity
Abstract: Creative activities in music represent a complex cognitive function of the human brain, whose biological basis is largely unknown. In order to elucidate the biological background of creative activities in music we performed genome-wide linkage and linkage disequilibrium (LD) scans in musically experienced individuals characterised for self-reported composing, arranging and non-music related creativity. The participants consisted of 474 individuals from 79 families, and 103 sporadic individuals. We found promising evidence for linkage at 16p12.1-q12.1 for arranging (LOD 2.75, 120 cases), 4q22.1 for composing (LOD 2.15, 103 cases) and Xp11.23 for non-music related creativity (LOD 2.50, 259 cases). ...The locus at 4q22.1 overlaps the previously identified region of musical aptitude, music perception and performance giving further support for this region as a candidate region for broad range of music-related traits. ...Pathway analysis of the genes suggestively associated with composing suggested an overrepresentation of the cerebellar long-term depression pathway (LTD), which is a cellular model for synaptic plasticity. ...These results suggest that molecular pathways linked to memory and learning via LTD affect music-related creative behaviour.