applications of linkage analysis in the modern era intro to... · 2017-12-07 · linkage...

Applications of linkage analysis in the modern era


Outline

• What is linkage analysis?

  • Parametric

  • Non-parametric

• Why is linkage analysis complicated for complex traits such as cognition or psychiatric illness?

• How can it be used in the modern era?

  • Used to filter the large amount of data generated through next-generation sequencing

  • Used to understand the effects of combinations of variants on phenotype

Linkage Analysis

• One of the two main approaches in gene mapping.

• Uses pedigree data.

Differences between linkage and association

Linkage | Association

Linkage is a property of loci | Association is a property of alleles

Role: to identify a biological mechanism for transmission of a trait; to locate the gene involved | Role: to identify association between an allelic variant and a disease; to identify linkage disequilibrium between a disease allele and a marker

Coarse mapping (>1 cM) | Fine mapping (<1 cM)

No information about which allelic variant is associated with higher risk of disease |

Requires family pedigrees | Case-control or family-based approach

Uses very polymorphic markers or bi-allelic markers | Usually bi-allelic markers

Calculation of LOD Scores

The LOD score is the log10 of the ratio between two likelihoods: the likelihood of the observed inheritance pattern if the loci are linked (recombination fraction θ < 0.5), and the likelihood of the same pattern if it arose by chance, i.e. the loci are unlinked (θ = 0.5).

Example family: 5 non-recombinant individuals + 1 recombinant individual (6 informative meioses).

Likelihood given linkage (i.e. the recombination fraction is < 0.5, here 0.2):

$L(\theta = 0.2) = (1-\theta)^{5} \times \theta^{1} = (0.8)^{5} \times (0.2)^{1} = 0.32768 \times 0.2 = 0.065536$

Likelihood given no linkage (i.e. the recombination fraction is 0.5):

$L(\theta = 0.5) = \theta^{6} = (0.5)^{6} = 0.015625$

Ratio between the two likelihoods:

$0.065536 / 0.015625 = 4.194304$

The log10 of this ratio is the Z score or LOD score:

$Z = \log_{10}(4.194304) = 0.62266$

Estimated recombination fraction = (no. of recombinants) / (no. of meioses) = 1/6 = 0.167
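The arithmetic above can be checked with a few lines of Python. This is a minimal sketch that assumes phase-known, fully informative meioses; the function name `two_point_lod` is illustrative and not taken from any linkage package.

```python
import math


def two_point_lod(recombinants: int, non_recombinants: int, theta: float) -> float:
    """LOD score for phase-known meioses at a given recombination fraction.

    Compares the likelihood at `theta` with the likelihood under
    no linkage (theta = 0.5), on the log10 scale.
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    meioses = recombinants + non_recombinants
    log_linked = (non_recombinants * math.log10(1.0 - theta)
                  + recombinants * math.log10(theta))
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


# The slide's example: 5 non-recombinants and 1 recombinant, evaluated at theta = 0.2
lod_at_02 = two_point_lod(recombinants=1, non_recombinants=5, theta=0.2)

# The same data evaluated at the estimated recombination fraction 1/6
theta_hat = 1 / 6
lod_at_mle = two_point_lod(recombinants=1, non_recombinants=5, theta=theta_hat)

print(f"LOD at theta=0.2: {lod_at_02:.5f}")
print(f"LOD at theta=1/6: {lod_at_mle:.5f}")
```

Evaluating at θ = 1/6 gives a slightly higher LOD than at θ = 0.2, because 1/6 is the value of θ that maximises the likelihood for these data.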

Calculation of LOD Scores

Building blocks of linkage analysis

• Information about disease model (in parametric analysis), for example:

  Penetrance (aa) = 0.99, probability of a homozygote being affected

  Penetrance (Aa) = 0.8, probability of a heterozygote being affected

  Penetrance (AA) = 0.001, probability of a non-carrier being affected (phenocopy rate)

• Information about allele frequencies

• Information about environmental variables

DISEASE          ALLELE_FREQ  PENETRANCES        LABEL
PROSTATE_CANCER  0.001        *                  HYPOTHETICAL_ADDITIVE_MODEL
  SEX = FEMALE                0.000,0.000,0.000
  AGE < 50                    0.001,0.050,0.100
  AGE < 70                    0.002,0.200,0.400
  OTHERWISE                   0.004,0.500,0.800

The model describes a hypothetical susceptibility allele for prostate cancer.

- The first liability class is all females, and specifies that they never develop prostate cancer.

- The next row specifies that males under the age of 50 have about a 5% chance of developing cancer if they are heterozygotes for this allele and a 10% chance if they are homozygotes. These probabilities increase for males aged between 50 and 70.

- The final row specifies the penetrances for all other individuals (i.e. males aged 70 or over).

An appropriate and careful choice of disease model is essential for parametric linkage analyses.
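The liability-class rules described above can be sketched as a first-match lookup. This is an illustration only: the `Person` class and `penetrance` function are hypothetical names, and the sketch assumes the three penetrance values in each row are ordered by 0, 1 and 2 copies of the susceptibility allele, as the description of the rows implies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    sex: str   # "M" or "F"
    age: int   # age in years


# (label, rule, penetrances for 0, 1 and 2 copies of the susceptibility allele).
# Rules are checked top to bottom and the first match wins.
LIABILITY_CLASSES = [
    ("SEX = FEMALE", lambda p: p.sex == "F", (0.000, 0.000, 0.000)),
    ("AGE < 50",     lambda p: p.age < 50,   (0.001, 0.050, 0.100)),
    ("AGE < 70",     lambda p: p.age < 70,   (0.002, 0.200, 0.400)),
    ("OTHERWISE",    lambda p: True,         (0.004, 0.500, 0.800)),
]


def penetrance(person: Person, risk_allele_copies: int) -> float:
    """Probability of being affected, given liability class and genotype."""
    if risk_allele_copies not in (0, 1, 2):
        raise ValueError("risk_allele_copies must be 0, 1 or 2")
    for _label, matches, values in LIABILITY_CLASSES:
        if matches(person):
            return values[risk_allele_copies]
    raise AssertionError("unreachable: OTHERWISE matches everyone")


print(penetrance(Person("M", 45), 1))  # heterozygous male under 50
print(penetrance(Person("M", 75), 2))  # homozygous male aged 70 or over
```

Checking the rules in order is what makes the final OTHERWISE row apply only to males aged 70 or over.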

Calculation of LOD Scores – liability classes

Seven liability classes were defined on the basis of age (in years). For each age group j, the age-dependent population prevalence P_j was obtained from the Rotterdam Study.1

The disease-gene penetrance, f_j, of the j-th age group can be estimated as

[equation not recovered from the slide]

where PAF is the population-attributable fraction, that is, the proportion of the population prevalence that can be explained by the studied gene (10% assumed), and q is the disease-allele frequency (1% assumed).

Liability class | Age (years) | Population prevalence^a | Penetrance | No. of patients | No. of unaffected relatives

1 | <65 | <.02 | .00 | 0 | 129

2 | 65–69 | .02 | .09 | 4 | 6

3 | 70–74 | .05 | .23 | 22 | 11

4 | 75–79 | .09 | .46 | 32 | 14

5 | 80–84 | .23 | .99 | 30 | 8

6 | 85–89 | .35 | .99 | 24 | 1

7 | ≥90 | >.35 | .99 | 0 | 1

Calculation of LOD Scores – age-dependent penetrance

A genome-wide screen for late-onset Alzheimer disease in a genetically isolated Dutch population. Am J Hum Genet. 2007 July; 81(1): 17–31.

Multipoint and Heterogeneity LOD Scores: better resolution, more robust, exclusion mapping

[Figure: Multipoint LOD (blue) and HLOD (pink) scores for chromosomes 1 and 3 in the genome screen of late-onset AD after fine typing. From: A genome-wide screen for late-onset Alzheimer disease in a genetically isolated Dutch population. Am J Hum Genet. 2007 July; 81(1): 17–31.]
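To illustrate why a heterogeneity LOD (HLOD) is more robust than a plain summed LOD, here is a minimal sketch of the usual admixture formulation, in which a proportion α of families is assumed to be linked. It assumes per-family LOD scores at a single fixed position and a simple grid search over α; the function name `hlod` and the example values are hypothetical, and real packages also maximise over position or θ.

```python
import math
from typing import Sequence, Tuple


def hlod(family_lods: Sequence[float], steps: int = 100) -> Tuple[float, float]:
    """Return (HLOD, alpha) maximised over a grid of alpha in [0, 1].

    For each family with LOD z, the admixture likelihood ratio is
    alpha * 10**z + (1 - alpha); the HLOD is the sum of the log10 terms.
    """
    best_score, best_alpha = 0.0, 0.0  # alpha = 0 always gives a score of 0
    for i in range(steps + 1):
        alpha = i / steps
        score = sum(math.log10(alpha * 10 ** z + (1.0 - alpha)) for z in family_lods)
        if score > best_score:
            best_score, best_alpha = score, alpha
    return best_score, best_alpha


# Hypothetical data: two families support linkage, two argue against it.
lods = [1.5, 1.2, -2.0, -1.8]
score, alpha = hlod(lods)
print(f"summed LOD = {sum(lods):.2f}, HLOD = {score:.2f} at alpha = {alpha:.2f}")
```

Because α = 1 reproduces the summed LOD and α = 0 gives zero, the HLOD can never be lower than either, which is why unlinked families do not cancel out the signal from linked ones.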

The Significance of LOD Scores

• Significant linkage: a LOD score ≥ +3

i.e. log10(1000), so linkage is 1000 times more likely than non-linkage

LOD +3 corresponds to approximately p = 0.05 once the low prior probability of linkage between two loci is taken into account

In genome scans this threshold is raised to +3.3 because multiple markers are tested.

• LOD < -2 is significant evidence for non-linkage

• A LOD score between -2 and +3 is inconclusive and more data are needed, perhaps by adding additional families.

Strachan & Read, 1999
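The thresholds above translate directly into a small decision rule. A minimal sketch, using only the cut-offs stated on the slide (+3, +3.3 for genome scans, and -2); the function name `interpret_lod` is illustrative.

```python
def interpret_lod(lod: float, genome_scan: bool = False) -> str:
    """Classify a LOD score using the conventional thresholds.

    genome_scan=True applies the stricter +3.3 threshold used when
    many markers are tested across the genome.
    """
    significance_threshold = 3.3 if genome_scan else 3.0
    if lod >= significance_threshold:
        return "significant linkage"
    if lod < -2.0:
        return "linkage excluded"
    return "inconclusive"


for value in (3.6, 3.1, 0.62, -2.5):
    print(value, interpret_lod(value), "|", interpret_lod(value, genome_scan=True))
```

The example family with LOD 0.62 falls in the inconclusive band, which is why data from several families are usually combined.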

Linkage Analysis

• Unfortunately, the standard (parametric) LOD score method doesn’t work well for complex traits, because it requires a definite model of how the trait is inherited: the first step in LOD score mapping is to determine the expected frequency of offspring phenotypes as a function of the recombination fraction.

• Non-parametric methods: look for chromosome segments shared by affected individuals. They do not rely on a genetic model.

  • affected sib pair analysis

  • linkage disequilibrium

Affected Sib Pair Analysis

• If two siblings both are affected by a genetic disease, they will (in most cases) share a region of chromosome surrounding the disease gene. This segment is “identical by descent” (IBD): it was derived from a common ancestor, their parent.

• Use many markers to find IBD regions among many affected sib pairs.

• Usually results in a large region: too big for positional cloning.

• Also: if more than one gene causes the trait, the necessary large amount of data will never converge to a single chromosome region.
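One common way to test for excess sharing in affected sib pairs is the "mean test", sketched below. It assumes IBD status (0, 1 or 2 alleles shared) is known without error for every pair at the marker, and that under no linkage pairs share 0, 1 or 2 alleles with probabilities 1/4, 1/2 and 1/4. The function name and the counts are hypothetical.

```python
import math
from typing import Tuple


def asp_mean_test(n0: int, n1: int, n2: int) -> Tuple[float, float, float]:
    """Mean IBD-sharing test for affected sib pairs.

    n0, n1, n2 are the numbers of pairs sharing 0, 1 and 2 alleles IBD.
    Returns (mean proportion of alleles shared, z statistic, one-sided p-value).
    """
    pairs = n0 + n1 + n2
    if pairs == 0:
        raise ValueError("at least one sib pair is required")
    mean_sharing = (n1 + 2 * n2) / (2 * pairs)
    # Under the null the per-pair sharing proportion has mean 1/2 and variance 1/8.
    z = (mean_sharing - 0.5) * math.sqrt(8 * pairs)
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))
    return mean_sharing, z, p_one_sided


# Hypothetical counts for 40 affected sib pairs at one marker
sharing, z_stat, p_val = asp_mean_test(n0=5, n1=20, n2=15)
print(f"mean sharing = {sharing:.3f}, z = {z_stat:.3f}, one-sided p = {p_val:.4f}")
```

A mean sharing proportion clearly above 0.5 is the signal that affected siblings inherit the same parental segment more often than chance predicts.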

[Figure: example sib-pair pedigrees with marker genotypes AB, CD; AC, AD; AB, AC; AC, AB]

Linkage Disequilibrium (LD)

• Gene mapping using recombination methods (such as affected sib pair analysis) suffers from not having enough crossovers in one generation to localize a gene very well.

• Linkage disequilibrium uses crossovers that have occurred over several generations.

• Regions of chromosome distant from the disease mutation will become randomized.

• However, right near the mutation, random crossovers will not have separated the disease locus from its surrounding haplotype: a particular DNA haplotype will be in disequilibrium with the disease trait.

• The trick is to find that haplotype.

• The further back in time the mutation occurred, the smaller the region of disequilibrium.
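The shrinking of the disequilibrium region over time follows from the standard textbook decay of LD, D_t = D_0 (1 - θ)^t, where θ is the recombination fraction between marker and mutation and t is the number of generations. The sketch below assumes random mating and ignores drift, selection and new mutation; the function names and example values are illustrative.

```python
import math


def ld_after_generations(d0: float, theta: float, generations: int) -> float:
    """Expected disequilibrium coefficient D after a number of generations."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    return d0 * (1.0 - theta) ** generations


def generations_to_halve(theta: float) -> float:
    """Number of generations for D to fall to half its starting value."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    return math.log(0.5) / math.log(1.0 - theta)


# Markers at increasing distance from the mutation (theta 0.001, 0.01, 0.1)
for theta in (0.001, 0.01, 0.1):
    remaining = ld_after_generations(d0=0.25, theta=theta, generations=100)
    print(f"theta={theta}: D after 100 generations = {remaining:.4f}, "
          f"half-life = {generations_to_halve(theta):.1f} generations")
```

Closely linked markers keep most of their disequilibrium after 100 generations while distant ones lose almost all of it, which is the basis for fine mapping with LD.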

More Linkage Disequilibrium

• A major complication: turns out that whole blocks of chromosomes get inherited together over many generations. Crossing over isn’t completely random. Means that genes occur in LD blocks separated by recombination hotspots.

• Another problem: LD methods depend on there being only a single original disease mutation that occurred in a particular haplotype. Multiple mutations will each have their own LD haplotype.


High throughput sequencing platforms

Mol Cell. 2015 May 21; 58(4): 586–597. doi: 10.1016/j.molcel.2015.05.004

Variation in the Genome

Matt Hurles – UK10K

Causes and consequences of new mutations, per individual:

• 3-4,000,000 variants (90% SNV, 9% indel, 1% SV)

• 10-11,000 amino acid changing

• 200-250 truncating indels

• 70-100 truncating base

• 30-50 splice site

• 50-200 new mutations (only 0-2 in genes)

Metachondromatosis

Linkage analysis

Followed by whole genome sequencing of a single affected individual from the family.

Evaluation of sequence data

Sobreira et al. suggest 3 methods to prioritize variants for further analysis:

- Linkage information: using the results of linkage analysis to prioritise regions

- Likelihood of being functional: looking at exonic variants that affect the protein sequence

  - Stops gained

  - Frameshifting indels

- Frequency in the healthy population: comparison to known variants in dbSNP, and comparison to sequence from 8 unrelated controls
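The three prioritisation steps can be read as successive filters over a list of variant calls. The sketch below is purely illustrative: the `Variant` fields, the consequence labels, the region coordinates and the example calls are all hypothetical and are not taken from Sobreira et al.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), hypothetical coordinates


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    consequence: str   # e.g. "stop_gained", "frameshift_indel", "synonymous"
    in_dbsnp: bool     # already catalogued as a known variant
    in_controls: bool  # seen in the unrelated control sequences


# Consequence labels treated as likely functional (assumed labels)
LIKELY_FUNCTIONAL = {"stop_gained", "frameshift_indel"}


def in_linked_region(variant: Variant, regions: Iterable[Region]) -> bool:
    return any(chrom == variant.chrom and start <= variant.pos <= end
               for chrom, start, end in regions)


def prioritise(variants: Iterable[Variant], regions: List[Region]) -> List[Variant]:
    """Keep variants that pass the linkage, function and frequency filters."""
    return [
        v for v in variants
        if in_linked_region(v, regions)          # 1. linkage information
        and v.consequence in LIKELY_FUNCTIONAL   # 2. likely to be functional
        and not v.in_dbsnp                       # 3. not a known common variant
        and not v.in_controls                    #    and absent from controls
    ]


regions = [("chr12", 1_000, 2_000)]
calls = [
    Variant("chr12", 1_500, "frameshift_indel", False, False),  # kept
    Variant("chr12", 1_600, "stop_gained", True, False),        # known in dbSNP
    Variant("chr12", 1_700, "synonymous", False, False),        # not functional
    Variant("chr3", 1_500, "stop_gained", False, False),        # outside linkage region
]
candidates = prioritise(calls, regions)
print(candidates)
```

Applying the linkage filter first is what makes the approach economical: most of the genome is discarded before any variant needs functional annotation.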

Linkage Analysis

They were able to exclude linkage to 96% of the genome (LOD < -2), and 98.4% of the genome showed negative LOD scores.

This reduced the search to 42 Mb of sequence within 6 regions.

This included 767 kb of exonic sequence.

PTPN11

Linkage region | LOD score | No. RefSeq genes | No. variants unique to patient | Unique variants

2p25 | 1.0-1.5 | 20 | 0 |

5q12.1 | 1.0-1.5 | 7 | 0 |

7p14.1 | 2.5* | 14 | 0 |

8q24.1 | 1.8 | 27 | 0 |

9q31.1-q33.1 | 1.0-1.5 | 71 | 0 |

12q33 | 1.8# | 105 | 1 | 11 bp del PTPN11

* Maximum achievable LOD score in this family

# Subsequently revised.

Linkage Analysis

Genotyping in Family

11 bp deletion

Linkage Analysis

Target Enrichment

Agilent Technologies 1M SureSelect

DNA capture array containing 973,952 probes targeting 844,339 bp within the 8.6 Mb candidate interval

Including 88.4% and 98.6% of UCSC exons and CCDS coding sequence, respectively.

61% of sequence reads mapped to the targeted region.

23% of targeted bases were not captured.

16 individuals from 11 families

PTPN11 mutations identified in MC participants.

Locus Identification-problems

• Uncertainty in diagnostic boundaries

• Non-Mendelian inheritance

• Variable age of onset

• Genetic heterogeneity

– Many different genes can cause the illness

> 1% risk world wide

> phenotypic variation

• Oligogenic/polygenic causation

– More than one mutant gene required to produce phenotype

Locus identification- reducing the problems

• Reduce genetic heterogeneity

– Single large families

– Avoid bilineal descent

– Significant LOD score = gene of major effect

• Reduce uncertainty of diagnosis

– Rigorous interviews

– Family history

– Classify minor diagnoses as unaffected

– >1 category of affected phenotype

Molecular Psychiatry advance online publication 21 March 2017. doi:10.1038/mp.2017.49

A rare missense variant in RCL1 segregates with depression in extended families.

Immunohistochemical labeling of RCL1 in human cerebral cortex. Co-localization with GFAP-positive primate-specific interlaminar astrocytes.

A rare genetic variant, rs115482041, on chr 9p24 in the RCL1 gene segregated with depression across multiple generations in an extended family.

The variant was estimated to explain more than half of the variation in depressive symptoms in the extended family, and 2.9% of the heritability in the overall genetically isolated population.

Interlaminar astrocytes may form a network for long-range coordination of intra-cortical communication.

Pedigree structure and genotypic information

John R. Giudicessi and Michael J. Ackerman. Circ Cardiovasc Genet. 2013;6:193-200

The phenotype of Bardet–Biedl syndrome (BBS) is defined by the association of retinitis pigmentosa, obesity, polydactyly, hypogenitalism, renal disease and cognitive impairment. The significant genetic heterogeneity of this condition is supported by the identification, to date, of eight genes (BBS1–8) implicated in cilia assembly or function.

Phenotypic heterogeneity and reduced penetrance?

recurrent major depression

minor diagnosis

unaffected

schizophrenia

bipolar affective disorder

(1;11)(q42;q14) translocation

Blackwood et al, 2001

Risk of major psychiatric illness increases 50 fold

Carriers have reduced brain attention measure ERP P300

Chromosomes with multipoint LOD >2: Chr1q, Chr11q, Chr5q, Chr1p, Chr2p, Chr3q, Chr4q, Chr16

Model code | Phenotypes

MODEL B | SCZ, BP1, BP2, SCZAFF, rMDD, cyclothymia

MODEL F | BP, rMDD, MDD

Psychosis | hal/del

Fine Mapping

Columns: Id; Any Diagnosis; SCZ BP; SCZ BP rMDD Cyclothymia; BP rMDD; t(1;11); chr1; chr11_1; chr11_2; F; chr5; chr1p; chr2p; chr3q; chr4q; chr16

13 1 1 1 1 2 1* 1 1 1 1

24 1 1 1 1 2 1 1 1 1 1

27 1 1 2 1 1 1 1 1

49 1 1 1 1 2 1 1 1 1 1 1 Schizophrenia

41 1 1 1 1 1 2 1 1 1 1 1 1 1 1 Bipolar

61 1 1 1 1 2 1 1 1 1 1 1 1

50 1 1 2 1 1 1 1 1 1 1 1

26 1 1 2 1 1 1 1

104 1 1 2 1 1 1

67 1 1 1 1 1 2 1 1 1 1 1 Bipolar

55 1 1 1 2 1 1 1 1 1 1 Cyclothymic

19 1 1 1 1 2 1 1 1 1

53 1 1 1 2 1 1 1 1 Cyclothymic

15 1 1 1 1 2 1 1 1 1 Schizophrenia

32 1 1 1 2 1 1 1 Cyclothymic

9 1 1 2 1 1 1

18 1 1 1 1 2 1 1 1 Schizophrenia

70 1 1 1 1 2 1 1 2 Schizophrenia

44 1 1 1 1 1 1 1 1 1 1 1 1 MDD recurrent

87 1 1 1 1 1 1 1 1 MDD single episode

47 1 1 1 1 1 1 1 MDD + Generalised anxiety

54 1 1 1 1 1 1 1 1 1 MDD recurrent

62 1 1 1 1 Generalised anxiety

85 1 1 1 MDD single episode

Do the haplotypes in each individual predict diagnoses?

Phenotype prediction in the family

Linkage information is useful for:

• Exclusion mapping

• Variable filtering

• Assessing combined effects of variants:

  • Compound heterozygosity

  • Identification of causal pathways

• Investigation of reduced penetrance & phenotypic heterogeneity

Abstract: Creative activities in music represent a complex cognitive function of the human brain, whose biological basis is largely unknown. In order to elucidate the biological background of creative activities in music we performed genome-wide linkage and linkage disequilibrium (LD) scans in musically experienced individuals characterised for self-reported composing, arranging and non-music related creativity. The participants consisted of 474 individuals from 79 families, and 103 sporadic individuals. We found promising evidence for linkage at 16p12.1-q12.1 for arranging (LOD 2.75, 120 cases), 4q22.1 for composing (LOD 2.15, 103 cases) and Xp11.23 for non-music related creativity (LOD 2.50, 259 cases). ... The locus at 4q22.1 overlaps the previously identified region of musical aptitude, music perception and performance, giving further support for this region as a candidate region for a broad range of music-related traits. ... Pathway analysis of the genes suggestively associated with composing suggested an overrepresentation of the cerebellar long-term depression pathway (LTD), which is a cellular model for synaptic plasticity. ... These results suggest that molecular pathways linked to memory and learning via LTD affect music-related creative behaviour.
