
J Med Genet 1992; 29: 867-874

Computer simulation of linkage and heterogeneity in tuberous sclerosis: a critical evaluation of the collaborative family data

L A J Janssen, L A Sandkuijl, J R Sampson, D J J Halley

Abstract
The existence of locus heterogeneity for a genetic disease may complicate linkage studies considerably, especially when very few large families with the disease are available. In this situation a modest collection of families is unlikely to be sufficient for successful localisation of one or more disease genes. Recently, eight research groups working on tuberous sclerosis (TSC) brought together linkage data pertaining to the candidate chromosomes 9, 11, and 12 for a large group of families. In a series of simulation studies we determined the probability of detecting linkage and linkage heterogeneity in this set of families. On average TSC families are very small; in most cases there are fewer than two informative meioses. The size distribution of chromosome 9 linked families was similar to that of non-linked families. This indicates that a dramatic difference in the clinical severity of major genetic forms of TSC is unlikely. The results of our simulation studies show that this set of families can generate highly significant evidence for linkage and heterogeneity. When two TSC genes are equally common, the strongest evidence for linkage and heterogeneity could be obtained using a method based on the incorporation of multiple candidate regions in a single analysis, with an average lod score of 24.27. (J Med Genet 1992;29:867-74)

Department of Clinical Genetics, Academic Hospital Rotterdam Dijkzigt and Erasmus University Rotterdam, Dr Molewaterplein 50, 3015 GE Rotterdam, The Netherlands. L A J Janssen, L A Sandkuijl, D J J Halley

Institute of Medical Genetics, University Hospital of Wales, Cardiff CF4 4XN. J R Sampson, L A Sandkuijl

Correspondence to Dr Janssen. Received 4 August 1992. Revised version accepted 30 September 1992.

During the past decade, linkage analysis has been successfully applied to localise the genes responsible for many different inherited diseases.¹ The availability of a large series of highly polymorphic markers throughout the genome has considerably facilitated linkage mapping. For a number of diseases, however, linkage results have been reported that have not been confirmed by further studies. This was initially the case for Charcot-Marie-Tooth disease, and it is still the case for a number of psychiatric disorders including schizophrenia and manic depression.²,³ Locus heterogeneity has been suggested as a possible explanation for such differing results. While it is commonly recognised that locus heterogeneity may complicate linkage studies, it has not precluded the accurate localisation of major genes for polycystic kidney disease and, eventually, Charcot-Marie-Tooth disease. For mapping under locus heterogeneity, it is necessary to identify the subset of families that show linkage to a given chromosomal region. For large families that can generate significant evidence for linkage when studied individually, this can be done directly. When the average family size is small, however, this distinction cannot be made in a straightforward manner, and it is intuitively clear that a large number of families will be required for detection of one or several of the responsible genes. This has been confirmed in theoretical studies using computer simulation and analytical methods.⁴,⁵

Various statistical methods have become available to make optimal use of the linkage information in a series of (relatively) small families. The most commonly applied test is the admixture test,⁶,⁷ which involves simultaneous estimation of the location(s) of the responsible genes and of the proportion of families segregating for each of these genes. This method has been instrumental in detecting several loci causing retinitis pigmentosa on the X chromosome.⁸

For tuberous sclerosis, a neurocutaneous disorder characterised by widespread hamartosis, linkage studies have yielded conflicting results. While some studies provided evidence for a locus close to the ABO blood group gene on chromosome 9,⁹,¹⁰ other studies could not confirm this linkage, and indicated chromosomes 11 and 12 as possible alternative locations.¹¹,¹² Considerable efforts have been made to combine linkage data from the groups participating in the TSC consortium and to analyse the data with a variety of statistical methods. Unfortunately, these studies have not entirely resolved the controversy, although consistent evidence has emerged for a TSC1 gene on the long arm of chromosome 9 and for the existence of locus heterogeneity.¹³⁻¹⁵

For the simultaneous evaluation of the different chromosomal regions that may be involved in TSC, we have developed an extension of the admixture approach,⁶,⁷,¹⁶ which we have previously called the imaginary chromosome (IC) approach. In a single analysis, this method evaluates linkage results for all relevant chromosomal regions. For a given family, this allows simultaneous evaluation of positive evidence for one of the regions and negative linkage information for the other regions. While this approach has the advantage of making maximal use of all available information, it shares with some other statistical methods the disadvantage of not being transparent. Thus, it is not immediately apparent how much evidence a given data set can potentially yield in an analysis of linkage and heterogeneity which includes several chromosomal regions simultaneously. Also, it is not entirely clear what level of evidence is to be regarded as 'significant', which is the criterion that researchers are most interested in. Ott¹ suggested that the likelihood ratio favouring linkage or heterogeneity should always be reported, leaving the decision about whether a certain likelihood ratio is to be regarded as significant to the individual researchers. Indeed, uniform guidelines are hard to define, as the statistical behaviour of combined tests of linkage and heterogeneity in multipoint linkage analysis has not been adequately investigated.

J Med Genet: first published as 10.1136/jmg.29.12.867 on 1 December 1992. Downloaded from http://jmg.bmj.com/ on August 6, 2021 by guest. Protected by copyright.

Recently, we have analysed the collaborative TSC data set, after rigorous checking for data errors and after reassessment of diagnoses following uniform criteria. The results, presented in an accompanying paper,¹⁷ indicate the existence of a major locus (TSC1) on chromosome 9 and at least one other locus elsewhere in the genome. Chromosome 12 might harbour another TSC gene of minor importance. To help interpret these results, we carried out extensive computer simulations, using both the conventional admixture test and its multilocus extension. In the current study we address the following questions. How much linkage information is potentially present in the combined TSC families, and how is that information distributed over the families? Is there any difference in effective size between the families that show linkage to the chromosome 9 markers and those that yield negative lod scores? How much evidence for linkage and heterogeneity may one expect to find under various degrees of locus heterogeneity in a data set of this size? The answers to these questions can provide information about the power and size of the collaborative TSC data set. They can also give insight into the performance of the admixture test when multiple candidate regions are used.

Materials and methods
FAMILY MATERIAL
In our studies we used 128 families from eight different centres: Irvine, Boston, Houston, Durham (USA), Cardiff, London (UK), Erlangen (Germany), and Rotterdam (The Netherlands). This set of families is identical to that used in the collaborative linkage study.¹⁷ Almost all families used in this study have been described before," 1216 M22 although phenotypic data have been fully reviewed for these studies, with amendment of every subject's affection status.¹⁷

For our simulation studies we classified all subjects for whom actual marker tests had been carried out as 'available'. All others were classified as 'unavailable' for DNA analysis. Apart from family structure, diagnostic assessment, and availability, no other family data were used.

COMPUTER SIMULATION
In our simulation studies of linkage and heterogeneity, three separate steps can be distinguished.

(1) Preparation of hypothetical marker data for all persons for whom DNA samples were available. These marker data were generated using the computer program SLINK,²³ which takes the disease status of all family members into account. Penetrance and gene frequency were as used in the collaborative linkage study.¹⁷ We assumed that a hypothetical marker was located at 5% recombination from each of the TSC genes. For each family, 100 distinct replicates were made, each with new hypothetical marker data. Three levels of informativeness were simulated: eight alleles (PIC value 0.86), four alleles (PIC value 0.7), and two alleles (PIC value 0.375). In a separate series of simulations, we generated data for markers with similar informativeness that were unlinked to any of the TSC genes.

From the simulated data for each individual family, simulated versions of the entire data set were created. For each family, a linked or an unlinked replicate was selected with probabilities α and 1 − α respectively, where α denotes the assumed proportion of families linked to a particular region.

(2) The lod score calculations, by regular analysis of linkage using the generated marker data. Lod scores were calculated for each replicate, varying the recombination frequency from 0.0 to 0.5 in steps of 0.01. Lod score calculations on these replicates were performed batchwise, using the MLINK option of the LINKAGE package (version 5.03). In these calculations, we applied the same gene frequency as in the collaborative linkage analysis (1 10-),¹⁷ and no allowance was made for new mutations.

(3) Heterogeneity analysis using single locus and multilocus versions of the admixture test. All the admixture tests we applied assume that disease genes may exist at two or more locations in the genome, with equal penetrance and mutation frequency for all genes involved. Lod scores calculated in the previous step were transferred to a slightly modified version of the HOMOG program.¹ For simultaneous evaluation of several chromosomal regions, the HOMOG2 program was used. The necessary input was created by appending a list of lod scores for an unlinked replicate of a given family to a list of lod scores for a linked replicate of the same family, or vice versa (probabilities α and 1 − α respectively). In this way we created an input file containing two chromosomal regions, each represented by one marker. Each family was linked to either the first or the second region. Together, the regions formed a so called 'imaginary chromosome'.
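Steps (1) to (3) can be illustrated end to end on toy data. The sketch below is not HOMOG or HOMOG2, and it makes several simplifying assumptions. Phase-known meioses stand in for MLINK pedigree likelihoods, via the hypothetical helper `phase_known_lod_curve`. Imaginary-chromosome input is built as described in step (3), using the hypothetical `compose_imaginary_chromosome`. The standard two-region admixture likelihood is then maximised by grid search. In that likelihood, each family contributes log10[α·10^Z1(θ1) + (1 − α)·10^Z2(θ2)]. Fixing θ2 = 0.5 gives the one-region admixture test, and α = 1 (or 0) gives homogeneity. The assumption is that this is the quantity those programs maximise; their own optimisers may differ.

```python
import math
import random

# Recombination-fraction grid of step (2): 0.00, 0.01, ..., 0.50
THETA_GRID = [i / 100 for i in range(51)]
FREE = len(THETA_GRID) - 1  # index of theta = 0.5, i.e. no linkage


def phase_known_lod_curve(n, k):
    """Hypothetical stand-in for an MLINK run: lod curve over THETA_GRID for
    n phase-known informative meioses, k of them recombinant."""
    curve = []
    for t in THETA_GRID:
        if t == 0.0:
            curve.append(n * math.log10(2.0) if k == 0 else -math.inf)
        else:
            curve.append(k * math.log10(2 * t) + (n - k) * math.log10(2 * (1 - t)))
    return curve


def compose_imaginary_chromosome(linked, unlinked, alpha, rng):
    """Per family: linked replicate in region 1 and unlinked replicate in
    region 2 with probability alpha, otherwise the reverse."""
    region_1, region_2 = [], []
    for z_linked, z_unlinked in zip(linked, unlinked):
        if rng.random() < alpha:
            region_1.append(z_linked)
            region_2.append(z_unlinked)
        else:
            region_1.append(z_unlinked)
            region_2.append(z_linked)
    return region_1, region_2


def admixture_lod(region_1, region_2, alpha, i, j):
    """Sum over families of log10(alpha * 10**Z1 + (1 - alpha) * 10**Z2),
    with Z1 taken at THETA_GRID[i] and Z2 at THETA_GRID[j]."""
    total = 0.0
    for z1, z2 in zip(region_1, region_2):
        p = alpha * 10 ** z1[i] + (1 - alpha) * 10 ** z2[j]
        if p <= 0.0:
            return -math.inf
        total += math.log10(p)
    return total


def maximise(region_1, region_2, alphas, i_range, j_range):
    """Exhaustive grid search; returns (lod, alpha, theta_1, theta_2)."""
    best = (-math.inf, None, None, None)
    for a in alphas:
        for i in i_range:
            for j in j_range:
                z = admixture_lod(region_1, region_2, a, i, j)
                if z > best[0]:
                    best = (z, a, THETA_GRID[i], THETA_GRID[j])
    return best


def fit_hypotheses(region_1, region_2):
    alphas = [k / 20 for k in range(21)]  # 0.00, 0.05, ..., 1.00
    every = range(len(THETA_GRID))
    r1, r2 = region_1, region_2

    def best(*fits):
        return max(fits, key=lambda fit: fit[0])

    return {
        # H1: homogeneity (alpha = 1 or 0), single locus in region 1 or 2
        "H1": best(maximise(r1, r2, [1.0], every, [FREE]),
                   maximise(r1, r2, [0.0], [FREE], every)),
        # H2(1 mapped): admixture, second locus outside the tested area;
        # in the second fit alpha counts the families not linked to region 2
        "H2(1 mapped)": best(maximise(r1, r2, alphas, every, [FREE]),
                             maximise(r1, r2, alphas, [FREE], every)),
        # H2(2 mapped): one locus in each region of the imaginary chromosome
        "H2(2 mapped)": maximise(r1, r2, alphas, every, every),
    }


if __name__ == "__main__":
    linked = [phase_known_lod_curve(6, 0) for _ in range(10)]
    unlinked = [phase_known_lod_curve(6, 3) for _ in range(10)]
    r1, r2 = compose_imaginary_chromosome(linked, unlinked, 0.5, random.Random(1))
    for name, (z, a, t1, t2) in fit_hypotheses(r1, r2).items():
        print(f"{name}: lod {z:.2f}, alpha {a}, theta1 {t1}, theta2 {t2}")
```

A coarse α grid (steps of 0.05) keeps the exhaustive search fast. A real analysis would use a finer grid or a numerical optimiser.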

INFORMATION CONTENT OF FAMILIES
To get a more precise measure of the usefulness of each family for linkage studies, we calculated the mean lod score (expected lod score) at 5% recombination over the 100 replicates obtained in simulations of an eight allele marker. The resulting mean Z(θ = 0.05) was divided by 0.215, the expected lod score at θ = 0.05 for one completely informative meiosis. The quotient was termed the 'effective number of informative meioses' (EFNIM), since it enables us to compare families with completely different structures. This measure is closely related to Edwards's 'equivalent observations'²⁴ (appendix).
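The constant 0.215 is the expected lod score contributed by one phase-known, fully informative meiosis when the true recombination fraction is 0.05. This minimal sketch assumes the standard expected-lod formula for such a meiosis. It reproduces the constant and turns a family's replicate lod scores into an EFNIM value. The function names and the example input are hypothetical.

```python
import math


def expected_lod_per_meiosis(theta: float) -> float:
    """Expected lod at recombination fraction theta for one phase-known,
    fully informative meiosis, when theta is also the true value
    (0 < theta < 0.5)."""
    # recombinant with probability theta, non-recombinant with 1 - theta
    return (1 - theta) * math.log10(2 * (1 - theta)) + theta * math.log10(2 * theta)


def efnim(replicate_lods, theta: float = 0.05) -> float:
    """Effective number of informative meioses: mean lod at theta over the
    simulated replicates, divided by the lod of one informative meiosis."""
    mean_lod = sum(replicate_lods) / len(replicate_lods)
    return mean_lod / expected_lod_per_meiosis(theta)


print(round(expected_lod_per_meiosis(0.05), 3))  # 0.215
# Hypothetical family whose 100 replicates average Z(theta = 0.05) = 0.43
print(round(efnim([0.43] * 100), 2))  # 2.0
```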

Results
POWER OF TSC FAMILIES TO DETECT LINKAGE UNDER HOMOGENEITY
We estimated the power to detect linkage under homogeneity by simulating markers with low and high informativeness. We specified a recombination frequency of 5% for two reasons. (1) The chromosome 9 markers most closely linked to the TSC1 locus show approximately 5% recombination.¹⁷ (2) In a random genome search, markers are frequently chosen to divide the genome into 10 cM intervals. This implies a recombination frequency of at most 5% between the disease gene and the closest marker.

By simulating a two allele marker we learned what values of the lod score (Zmax) might be expected if only regular RFLPs (PIC 0.37) were used for linkage mapping. Over 100 distinct simulations of the entire data set, the two allele marker gave a mean overall lod score of 16.8. The highest lod score value was 24.8, while Zmax exceeded 10.5 in 95% of all replicates.
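The three summaries quoted for the 100 simulated data sets are the mean Zmax, the highest Zmax, and the value exceeded in 95% of replicates. They can be computed from a list of replicate Zmax values as sketched below. The function name is hypothetical. The "power" entry is an added assumption: the fraction of replicates reaching the conventional lod 3.0 threshold discussed later in the text.

```python
def summarise_replicates(zmax_values, threshold=3.0):
    """Summaries of replicate Zmax values from simulated data sets."""
    n = len(zmax_values)
    ordered = sorted(zmax_values)
    k = -(-95 * n // 100)  # ceil(0.95 * n) in integer arithmetic
    return {
        "mean": sum(ordered) / n,
        "highest": ordered[-1],
        # at least 95% of the replicates reach or exceed this value
        "exceeded_by_95pct": ordered[n - k],
        # fraction of replicates at or above the significance threshold
        "power": sum(z >= threshold for z in ordered) / n,
    }


# Hypothetical input: Zmax values 0.0, 1.0, ..., 99.0 from 100 replicates
print(summarise_replicates([float(i) for i in range(100)]))
```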

Simulations on an eight allele marker gave us insight into the lod scores that could be obtained with dinucleotide repeats and other highly informative markers. A similar amount of informativeness can be expected from a map of two, four, and five allele markers in a multipoint analysis, as performed on the actual data of the TSC collaborative group.¹⁷ The highest lod score obtained with the eight allele marker was 56.7, the mean Zmax was 41.8, and at least 95% of all replicates showed a Zmax of 33.6 or higher.

Rather than comparing individual families by their mean lod scores, we decided to describe the families in terms of their 'effective number of informative meioses' (EFNIM). The EFNIM of a family represents the number of informative meioses one may expect to score on average in that family for a marker system with given informativeness. The Z(θ = 0.05) values and EFNIM values for all families are presented in the table (A and B).

EFNIM values for the chromosome 9 linked group of TSC1 families were compared with those obtained for non-chromosome 9 linked families. Families were assigned to one of these two groups according to their posterior probabilities for linkage to chromosome 9. These probabilities were determined in the accompanying study,¹⁷ using real linkage data for chromosome 9 markers. The distribution of the families over 11 EFNIM categories of increasing informativeness is shown in fig 1. There are no marked differences in size (as summarised by EFNIM) between the two groups of TSC families. Another obvious finding is the paucity of large families in both groups. Most families are in the 0-1 EFNIM category. This is illustrated by the mean EFNIM values for both family groups: 1.41 for the chromosome 9 linked families and 1.46 for the non-chromosome 9 linked group.
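EFNIM is defined for a marker system of given informativeness, and the simulated markers were specified by their PIC values. The allele frequencies are not stated in the text, so this sketch assumes equifrequent alleles. On that assumption, the standard PIC formula of Botstein et al. (1980) gives values matching those quoted for the eight, four, and two allele markers (0.86, 0.7, 0.375).

```python
from itertools import combinations


def pic(freqs):
    """Polymorphism information content:
    1 - sum(p_i^2) - sum over i < j of 2 * p_i^2 * p_j^2."""
    # parent homozygous: the meiosis cannot be scored
    homozygosity = sum(p * p for p in freqs)
    # parent and mate share the same heterozygous genotype and the child is
    # also heterozygous: the transmitted allele cannot be identified
    same_het_matings = sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - same_het_matings


for n_alleles in (8, 4, 2):
    # assumption: all alleles equally frequent
    print(n_alleles, round(pic([1 / n_alleles] * n_alleles), 3))
# 8 0.861
# 4 0.703
# 2 0.375
```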

EXCLUSION POWER OF TSC FAMILIES
In general, exclusion studies are only valid when the mode of inheritance is specified correctly in the analysis. In a real exclusion study for TSC one would therefore have to take the locus heterogeneity into account. In our simulation study we evaluated the power of the families for an uncomplicated exclusion study (that is, under linkage homogeneity), with the sole purpose of comparing the families. As expected, families that contained much information for linkage detection also contributed most information for exclusion (table A and B).

Together, the TSC1 families had the power to exclude linkage over large genetic distances. When an eight allele marker was examined, 95% of the replicates excluded 31 cM or more on either side of the marker. With a two allele marker the exclusion distance decreased to 12 cM.

The non-chromosome 9 linked families were able to exclude similar areas in 95% of the replicates: at least 29 cM for an eight allele marker and at least 11 cM for a two allele marker.
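The text does not state which exclusion criterion or map function was used to turn lod curves into excluded distances. The sketch below assumes the conventional criterion of a lod score of −2 or less, together with Haldane's map function. It scans a lod curve upward from θ = 0 and reports the distance on one side of the marker over which linkage is excluded. Function names and the example family are hypothetical.

```python
import math


def haldane_cm(theta):
    """Haldane map distance (cM) for recombination fraction 0 <= theta < 0.5."""
    return -50.0 * math.log(1.0 - 2.0 * theta)


def exclusion_distance_cm(theta_grid, lods, criterion=-2.0):
    """Distance on one side of the marker over which the lod score stays at
    or below the exclusion criterion, scanning upward from theta = 0.
    Returns 0.0 when linkage is not excluded even at theta = 0."""
    excluded_theta = None
    for theta, z in zip(theta_grid, lods):
        if z > criterion:
            break
        excluded_theta = theta
    if excluded_theta is None:
        return 0.0
    if excluded_theta >= 0.5:
        return math.inf
    return haldane_cm(excluded_theta)


# Hypothetical unlinked marker: 20 phase-known meioses, 10 of them recombinant
grid = [i / 100 for i in range(51)]
curve = [-math.inf if t == 0 else 10 * math.log10(4 * t * (1 - t)) for t in grid]
print(round(exclusion_distance_cm(grid, curve), 1))  # 23.9
```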

POWER OF TSC FAMILIES TO DETECT HETEROGENEITY
We tested the performance of the families in heterogeneity analyses. We analysed HOMOG input files, containing lod scores from linked and unlinked families. We also analysed HOMOG2 input files, containing lod scores from two regions with each family linked to only one of these. Thus we tried to answer two questions. (1) What is the power of this data set for detection of heterogeneity? (2) What is gained by including information from the alternative region in a two locus heterogeneity analysis?

The interpretation of results of linkage analysis in terms of statistical significance is not always simple. Under homogeneity a lod score of 3.0 or more is accepted as 'significant' evidence for linkage. A lod score of 3.0 corresponds to an odds ratio of 1000:1. This high threshold has a statistical basis (p < 0.05 is usually regarded as significant evidence): when two loci are selected at random, the chances that they will show linkage are small. It has been estimated that the prior probability of two loci showing true linkage is only 1 in 50. If the theoretical odds ratio of 1000:1 is corrected for this low prior expectation of finding linkage, the resulting frequency of false positive linkages is approximately 0.05.
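The correction described in the last two sentences is Bayes' theorem in odds form. This minimal sketch treats a lod score of exactly 3.0 as the observed evidence and takes the quoted prior of 1 in 50 at face value. The function name is hypothetical.

```python
def posterior_probability_of_linkage(lod, prior_probability):
    """Bayes' theorem in odds form: posterior odds = 10**lod * prior odds."""
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = 10.0 ** lod * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


false_positive = 1.0 - posterior_probability_of_linkage(3.0, 1 / 50)
print(round(false_positive, 3))  # 0.047, i.e. roughly the 0.05 quoted above
```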


Table Results of simulation studies presented as lod scores, EFNIM values, and percentages of all replicates. B = Boston, C = Cardiff, L = London, I = Irvine, R = Rotterdam, E = Erlangen, D = Durham, H = Houston. Z(max) = lod score maximised over θ, Z(θ = 5%) = lod score at a recombination fraction of 0.05, EFNIM = effective number of informative meioses (see text).

(A) The power of the chromosome 9 linked (TSC1) families

8 allele marker at θ = 5% (columns 3-5) | 2 allele marker at θ = 5% (column 6) | Unlinked 8 allele marker, θ = 50% (columns 7-8)

Family No | Original Family No | Max Z(max) | Mean Z(θ = 5%) | EFNIM | Mean Z(θ = 5%) | Mean Z(θ = 5%) | Power to exclude > 5 cM (%)

0-410*000-290-420-190-200*000700-100040090090-120-141-550-100-100950-220-080-220200*000-210740*000-240-820-621-340-080030540-110*000090340*000-060*000*500-780-150-080-270*000*000071-070*000*000-310-820-310040-810-641-910230o000250070o00

1-92 0-180o00 0.00

136 012196 0.15089 010

095 0070o00 0.00

326 031045 004018 002042 003040 0020-57 0-03067 004720 059049 003048 0044-40 0 321-01 0.110-38 0 02104 0o11092 0040*00 0.00

097 009

342 0280o00 0.00

1 10 0 08380 031288 0216-25 0-470-37 0 030-15 0.01

252 010

053 0050o00 0o00040 001

160 0120o00 0o00029 0020o00 0o002 31 0 173-61 024071 0-040 36 0-02127 0130o00 0.00

0o00 0o000 35 0-034*97 0-370o00 0.00

0o00 0.00

144 011

381 0401-45 0-08020 002378 0372-99 0-24889 0841 07 0-05002 000

1-17 0o09032 0020-01 0.00

-0460o00

-028-049-025-0290o00

-073-008-0*05-0-05-008-007-008- 153-0-17-009

- 107-026-0-09-028-021-0o00-035-0830o00

-029-089-063-1 27-0-05-007-0-88-0080o00

-0-10-0360o00

-0 120o00

-0*55-073-0-05-0-01-025-0-01-o000-003-0950o000o00

-032-077-037-007-085-083-202-024-0o00-032-0-10-o000

0

0

0

0

0

0

0

0

0

0

0

0

0

0

460

0

150

0

0

0

0

0

40

0

156

250

0

150

0

0

0

0

0

0

0

50

0

0

0

0

0

200

0

0

140

0

911590

0

0

0

0

When one is examining the possibility of locus heterogeneity, however, no prior expectations can be formulated about whether a disease will show locus heterogeneity or not. Therefore lod scores of 0.834, approximately corresponding to a χ² of 3.841 (p = 0.05 at 1 df), have been conventionally accepted as 'significant'. The situation changes again when one examines the possibility that the presumed second locus is also located within the region for which markers were tested. The prior expectation for the second locus being in the tested region is low. Therefore, a threshold lod score difference of 3.0 seems a prudent choice. Accordingly, we formulated the following tests of various hypotheses.

H0. No locus exists in the area. (Null hypothesis. The lod score is 0.0 by definition.)

H1. One locus maps in the area tested and there is no heterogeneity. A lod score of at least 3.0 is required.

H2(1 mapped). One locus maps in the area and heterogeneity exists, but the second locus is assumed not to be located in the tested region. Both the null hypothesis and the H1 hypothesis should be rejected in favour of this alternative when two conditions hold: the lod score exceeds that for H1 by at least 0.834, and it exceeds that for H0 by 3.834 (= 3.0 + 0.834).

H2(2 mapped). Two loci exist in the tested area. There are three requirements for significance. There must be a lod score difference with

25

891214202122

10061009101010121013101520052008200920102011201420152016201820232026300130113015301930213026302830293033310140464067406840774219422142224264446750805159524054525733743074747479778178737981900490059006900790089010

B IB 2B 5B 8B 9B 12B 14B 20B 21B 22C6C9C 10C 12C 13C 15L 5L 8L 9L 10L 11L 14L 15L 16L 18L 23L 26I 1Il111 15I 19I 21I 26I 28I 29I 331101R 2046R 2067R 2068R 2077R 1219R 1221R 1222R 1264R 1467E 2080E 3159E 2024E 1452E 3733D 430D 474D 479D 781D 873D 981H 4H 5H 6H 7H 8H 10

1-350*000701-380700730*002220-360-230-250-250-250772-890-480-252-240570-250570440-060951-430*000701-581-163-230240-241-600-250*000-250-860*000-250*001-272-120-831-000990040-020-462-030*000*000-881-450990-251-571-993-830570-060-830-25007


(B) The power of the non-chromosome 9 linked families

8 allele marker at θ = 5% | 2 allele marker at θ = 5% | Unlinked 8 allele marker (θ = 50%)

Family No

3671015161819

100310041005100710112001200220032004200620072012201320172019202020212022202420253003300430083016302030243034407941974223502452465383544954595462546554685865586670857111718874277431743774737482750778219001900390119012901390149015

Original Family No

B 3B 6B 7B 10B 15B 16B 18B 19C3C4CSC7C IIL IL 2L 3L4L 6L 7L 12L 13L 17L 19L 20L 21L 22L 24L 25I 31 4I 8I 16I 20I 24I 34R 2079R 1197R 1223E 2024E 2246E 3383E 1449E 1459E 3462E 1465E 3468E 865E 4865D 1085D 1111D 1188D 427D 431D 347D 473D 482D 507D 821H IH 3H 11H 12H 13H 14H 15

Max Z(max)

0-360050*000-870-250040-480040731-580-360-560-486-740040570-360-362630-680-480-250572-030250460701-081-560700-671-280420460-060-870040250700-250490*000*000*002-190*000-640570-022-290821-110-833-720-480091-520*000-680-982-750071-70200057

Mean Z(θ = 5%)

0070000000-240-140000-110000350-870-050090-153 170000-260040-101-340-230-150-060-281-030-050-060-320340740-260-150-630-080040000440000-110-240060-160000*000*000-840*000070-230001-120330400-121-970-080*000-600*000-110461-340000-801-06026

EFNIM

0340*000*001-100-630*000-510-011-654070250-4107014-760*001-220-200466-231-060700-291-304820-230261-491-593-461-190-692-950-360-180-012040*000491-100290730*000*000*003920*000-321-050*005-191-511-860579 160350-012800*000502 146-240-023-734.951 21

Mean Z(θ = 5%)

0020*000*000-110030*000040000-110270-010030-081-300000090000040660090-060-010 120420-010-020-140-160-310-080*050-24003

-0-010.000-150*000040070020070-000*000o000300*000050030o000440-100220040680-010o000270-000050-190530000320380-12

Mean Z(θ = 5%)

-0-09-000000

-025-0-11-000-0 13000

-036-084-0-10-0 13-008-3.75000

-028-0-10-0-15-1-93-0 13-0-12-0-11-026-1 08-0 12-009-027-029-089-0 17-009-067-005-0 15-000-039-000-006-042-0-15-0030000o00000

-1-01000

-006-031-000-1*15-035-037-006- 2 37-0 16000

-077000

-0-11-042-1-44-000-1-00-1-00-0-18

Power to exclude > 5 cM (%)

0000000004000

900000

550000180000

11007000000000000

220000

23000

6400

11000

360

22210

H2(1 mapped) of 3.0, with H1 of 3.834, and with H0 of 6.834.

We tested these hypotheses in heterogeneity analyses using simulated data sets containing a certain proportion of linked families (α of 10%, 30%, 50%, 70%, and 90%). These analyses made use of data for a four allele marker system. When expected and observed lod scores were compared, this system closely resembled the combined informativeness of the true marker maps in the collaborative data. Each series consisted of 100 replicates and therefore involved 100 runs of the HOMOG program. By combining the simulated data for a linked and an unlinked marker into an imaginary chromosome, we also created a typical HOMOG2 problem, with two loci to be mapped within the tested area.
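The significance rules stated for H1, H2(1 mapped) and H2(2 mapped) can be written as a small check, together with the χ²-to-lod conversion behind the 0.834 threshold. In this sketch the function name and the example lod scores are hypothetical inputs, not results from the simulations. Each input is a maximised lod score measured against H0.

```python
import math

# Lod score equivalent of a chi-square statistic: chi2 = 2 * ln(10) * lod
LOD_TO_CHI2 = 2.0 * math.log(10.0)
HET = 0.834  # lod difference for heterogeneity, ~ chi2 of 3.841 (p = 0.05, 1 df)


def significance(z_h1, z_h2_one, z_h2_two):
    """Apply the stated rules to maximised lod scores, each measured
    against H0 (whose lod is 0 by definition)."""
    return {
        "H1": z_h1 >= 3.0,
        "H2(1 mapped)": z_h2_one - z_h1 >= HET and z_h2_one >= 3.0 + HET,
        "H2(2 mapped)": (z_h2_two - z_h2_one >= 3.0
                         and z_h2_two - z_h1 >= 3.0 + HET
                         and z_h2_two >= 6.0 + HET),
    }


print(round(3.841 / LOD_TO_CHI2, 3))  # 0.834
# Hypothetical maximised lod scores for one simulated data set
print(significance(z_h1=5.0, z_h2_one=9.0, z_h2_two=20.0))
```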

For a= 0 5, a mean lod score favouringH2(2 ppd) over HO of 24-27 was obtained.These results indicate that the family materialis highly suitable for any type of linkage orheterogeneity analysis (fig 2). When at least50% of the families were assumed to be linked,it was even possible to detect linkage under the(false) assumption of locus homogeneity in 94/100 attempts. For most a values the imaginarychromosome approach was more powerfulthan conventional HOMOG analysis.We also studied the precision of the estimates

obtained for the recombination fraction and theproportion of linked families. It was countedhow often the correct values for a (true a ± 0.1)and 0 (between 2 and 8%) were found inreplicates that yielded significant evidence forheterogeneity or linkage or both. The a and

871

on August 6, 2021 by guest. P

rotected by copyright.http://jm

g.bmj.com

/J M

ed Genet: first published as 10.1136/jm

g.29.12.867 on 1 Decem

ber 1992. Dow

nloaded from

Page 6: J Computer linkage heterogeneity evaluation familyous sclerosis (TSC) brought together linkage data pertaining to the candidate chromosomes 9, 11, and 12 for a large group offamilies

82anssen, Sandkuijl, Sampson, Halley

[Figure 1: families by EFNIM category]

Figure 1 Family distribution by 'effective number of informative meioses' (EFNIM). The distribution of all 63 TSC1 families is shown in front (hatched); the 65 non-chromosome 9 linked families are shown behind (solid).

[Figure 2: x axis, proportion of linked families (0 to 1.0); y axis, 0 to 100]

Figure 2 Power to detect linkage and heterogeneity. Broken line (H1) = power to detect linkage under the assumption of homogeneity (Z > 3.0); +-+ (H2(1 mapped)) = power to detect linkage and heterogeneity if only one locus is known (see text); *-* (H2(2 mapped)) = power to detect linkage and heterogeneity if two loci are known, applying the imaginary chromosome approach (see text).

θ values obtained from the one locus and two locus heterogeneity tests were quite accurate (figs 3 and 4). However, for very low true α values the precision of the tested methods was found to be insufficient.
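The power and precision counts described above amount to simple tallies over replicates. The Python below is a minimal sketch of that bookkeeping, assuming each replicate has already been summarised by its maximum lods (relative to H0) and its α and θ estimates. It applies only the H2(2 mapped) criterion, taking the thresholds quoted at the start of this section (3.0 against H2(1 mapped), 3.834 against H1, 6.834 against H0) and assuming, as a simplification, that all three must be met together; the `Replicate` type and the replicate values are hypothetical.

```python
from dataclasses import dataclass

# Lod-score differences the text requires before H2(2 mapped) is accepted.
MIN_VS_H2_1MAPPED = 3.0
MIN_VS_H1 = 3.834
MIN_VS_H0 = 6.834


@dataclass
class Replicate:
    """Maximum lods (each relative to H0) and estimates from one simulated data set."""
    lod_h1: float
    lod_h2_1mapped: float
    lod_h2_2mapped: float
    alpha_hat: float
    theta_hat: float


def significant_2mapped(r: Replicate) -> bool:
    # Assumption: all three differences must be met simultaneously.
    return (
        r.lod_h2_2mapped - r.lod_h2_1mapped >= MIN_VS_H2_1MAPPED
        and r.lod_h2_2mapped - r.lod_h1 >= MIN_VS_H1
        and r.lod_h2_2mapped >= MIN_VS_H0
    )


def power_and_precision(replicates, true_alpha):
    """Power = share of significant replicates; precision is judged only among those."""
    sig = [r for r in replicates if significant_2mapped(r)]
    power = len(sig) / len(replicates)
    if not sig:
        return power, None, None
    alpha_ok = sum(abs(r.alpha_hat - true_alpha) <= 0.1 for r in sig) / len(sig)
    theta_ok = sum(0.02 <= r.theta_hat <= 0.08 for r in sig) / len(sig)
    return power, alpha_ok, theta_ok


if __name__ == "__main__":
    # Hypothetical replicate summaries, not results from the study.
    reps = [
        Replicate(2.0, 4.0, 8.0, 0.52, 0.05),
        Replicate(3.5, 3.5, 7.0, 0.45, 0.04),
        Replicate(0.5, 1.0, 5.0, 0.50, 0.05),
        Replicate(4.0, 5.5, 9.0, 0.75, 0.10),
    ]
    print(power_and_precision(reps, true_alpha=0.5))
```

Filtering on significance before judging precision mirrors figs 3 and 4, which evaluate only results supported by significant lod scores.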

Discussion

SUITABILITY OF FAMILY MATERIAL
Our family material was found to be very suitable for heterogeneity analysis. The family structures provide sufficient information both for detection of linkage and for exclusion of linkage. Furthermore, family structure is very similar in the chromosome 9 linked and unlinked groups.

The suitability of our family set has been confirmed by heterogeneity analyses of the collaborative data set.¹⁷ For major genes, detection of linkage and heterogeneity will not be problematical. The precision of the resulting values for α and θ is acceptable. However, mapping genes responsible for TSC in a small minority of families will be difficult, particularly when α approaches 10% or less. Under these circumstances the additional power offered by the imaginary chromosome approach is very useful, although still only 31/100 analyses reached significance.

This may be important with respect to the non-significant chromosome 12 findings of the collaborative linkage study.¹⁷ Our findings indicate that the existence of a minor locus may only rarely yield significant evidence against homogeneity. As long as reasonable θ values emerge, it may be wise to continue the study of a putative minor locus. Linkage methods, however, can contribute little to such a study. Other avenues, such as the t(3;12) translocation in the case of TSC3,¹² will have to be explored.

POWER OF THE METHODOLOGICAL APPROACH
Overall, the imaginary chromosome approach has been shown to be a powerful method for mapping loci when locus heterogeneity occurs. If multiple candidate regions have been identified, it seems sensible to analyse them simultaneously. Only when α exceeds 0.8 does the conventional HOMOG approach (H2(1 mapped)) perform better than the imaginary chromosome approach. Since a high α implies that only a small number of families are assigned to the alternative locus, we may presume that these families carry too little power to meet the required lod score difference of 3.0 between H2(2 mapped) and H2(1 mapped) as defined above.

Although we found the HOMOG analyses on our data to be quite satisfactory, we disagree with the thresholds normally accepted for significance. Older versions of the HOMOG programs indicated results as significant (p < 0.05) when the difference in lod score for hypothesis H2(2 mapped) versus H1 was 1.000 or more; in more recent publications,⁸ odds of 50:1 or 100:1 in favour of H2(2 mapped) have been regarded as significant. We feel that a lod difference of 3.834 is more appropriate (evidence for a second localisation should be at least as convincing as the evidence required for a first localisation).

Our revision of the lod score difference required for localisation of new TSC genes is relevant to the previous apparent support for the putative chromosome 11 TSC locus.¹⁶ The results obtained using conventional criteria for significance are apparently misleading. In contrast, the new results of the collaborative study do not provide any support for a TSC2 locus on chromosome 11.¹⁷
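To put the significance criteria discussed above on one scale, the Python below is a minimal sketch that uses only the standard definitions: a lod score is the base-10 logarithm of the odds (likelihood ratio), and the corresponding likelihood-ratio statistic 2 ln(LR) equals 2 ln(10) times the lod. It simply converts the thresholds quoted in the text (odds of 50:1 and 100:1, lod differences of 1.000 and 3.834); it does not reproduce HOMOG's own significance calculation, and the function names are illustrative.

```python
import math


def odds_to_lod(odds: float) -> float:
    """A lod score is the base-10 logarithm of the likelihood ratio (odds)."""
    return math.log10(odds)


def lod_to_odds(lod: float) -> float:
    """Inverse of odds_to_lod."""
    return 10 ** lod


def lod_to_lr_statistic(lod_diff: float) -> float:
    """Likelihood-ratio statistic 2 ln(LR) for a given lod-score difference."""
    return 2 * math.log(10) * lod_diff


if __name__ == "__main__":
    for odds in (50, 100):
        print(f"odds {odds}:1 -> lod {odds_to_lod(odds):.3f}")
    for lod in (1.000, 3.834):
        print(f"lod {lod} -> odds {lod_to_odds(lod):,.0f}:1, "
              f"2 ln(LR) = {lod_to_lr_statistic(lod):.2f}")
```

On this scale, odds of 50:1 and 100:1 correspond to lods of about 1.7 and 2.0, whereas the proposed difference of 3.834 corresponds to odds of roughly 6,800:1.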


[Figure 3: x axis, proportion of linked families (0 to 0.9); y axis, 0 to 100]

Figure 3 Precision of θ estimates (θ obtained = true θ ± 3%) under heterogeneity. Only results supported by significant lod scores were evaluated. H1: broken line; H2(1 mapped): +; H2(2 mapped).

We would like to thank the TSC collaborative group, particularly Moyra Smith, Jonathan Haines, Jean Amos, David Kwiatkowski, Priscilla Short, Hope Northrup, Susan Blanton, Sue Povey, Mari Wyn Burley, J Michael Connor, Mark Nellist, Phillip Brook-Carter, Paul Fleury, Arjenne Hesseling-Janssen, Senno Verhoef, Ray Kandt, Raimund Fahsold, Hans-Dieter Rott, Margaret Pericak-Vance, Corinne Merkens, Anneke Maat-Kievit, and Dick Lindhout, for their part in the enormous effort to establish this very useful set of families. This study was funded by the Netherlands Praevention Fund (grant 28-1723) and the National Tuberous Sclerosis Association of the USA (grant 91-03).

Appendix: Informativeness of families for linkage
In linkage studies, the analysis of simple phase known families is straightforward: one can count recombinants and non-recombinants, and no complicated statistical analysis is required. When some family members are not available for analysis, or when phase is not known, the calculations can become extremely involved. The evidence for linkage is then summarised via the best estimate of the recombination frequency and the corresponding maximum lod score. To ease the interpretation of lod scores, Edwards²⁴ calculated the so-called 'equivalent observations' (n), that is, the number of recombinants and non-recombinants that would give the same lod score.

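$$n = \frac{Z_{\max}}{\log_{10} 2 + \theta \log_{10} \theta + (1-\theta) \log_{10}(1-\theta)} \quad \text{if } \theta > 0 \tag{1}$$

$$n = \frac{Z_{\max}}{\log_{10} 2} \quad \text{if } \theta = 0 \tag{2}$$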

[Figure 4: x axis, proportion of linked families (0 to 1.0)]

Figure 4 Precision of α estimates (α obtained = true α ± 10%) under heterogeneity. Only results supported by significant lod scores were evaluated. H1: broken line; H2(1 mapped): +; H2(2 mapped).

DESCRIPTION OF FAMILIES USING THE 'EFFECTIVE NUMBER OF INFORMATIVE MEIOSES'
We have used the 'effective number of informative meioses' (EFNIM) to describe family size. The EFNIM value gives a much better description of family size than does the number of generations, the number of affected individuals, or the number of relatives. In contrast to the simulated lod score itself, it provides an instant image of the family size. We propose the use of EFNIM values to describe families in cases where it is not feasible to show the pedigrees themselves.

The EFNIM distribution of the TSC1 (chromosome 9 linked) and non-TSC1 families showed no obvious difference in family size. This implies that there is no marked difference in the biological fitness associated with the different genetic types of tuberous sclerosis.

According to equation (1), a data set yielding a lod score of 0.419 at θ = 0.2, for instance, is equivalent to five meioses, four of which are non-recombinants. It should be noted that this measure is calculated for an entire data set; it is only relevant when the calculations used the best estimate of the recombination frequency obtained in that same data set. To illustrate this point, a data set containing five completely informative meioses (four non-recombinants) will yield exactly five equivalent observations when a recombination frequency of 0.2 is assumed, but it will yield 25 equivalent observations when analysed under θ = 0.4, or only two when θ = 0.1. This also implies that equivalent observations are only additive over data sets when obtained for the same value of θ.

While the number of equivalent observations is a very convenient measure to summarise a given data set, it cannot be used directly to compare families with respect to their informativeness for disease mapping, for several reasons. Firstly, the calculation of the number of equivalent observations is based on the best estimate of θ in the entire data set. When the ratio non-recombinants:recombinants in an individual family deviates from the overall θ,


the number of equivalent observations from that family will not accurately reflect its information content, as became apparent in the previous example. Secondly, the calculated number of equivalent observations depends on the actual marker segregation in the family: when several parents are by chance homozygous for the marker, the information content of that family may seem small. For a fair comparison between families, the effects of chance marker segregation have to be eliminated.
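Equations (1) and (2), together with the worked example of five meioses given earlier, can be checked with a few lines of Python. This minimal sketch assumes fully informative, phase-known meioses, so that the lod score can be written down directly; the function names are illustrative and not part of any linkage package.

```python
import math

LOG10_2 = math.log10(2)


def info_per_meiosis(theta: float) -> float:
    """Denominator of equation (1); reduces to log10 2 when theta = 0 (equation 2)."""
    if theta == 0:
        return LOG10_2
    return LOG10_2 + theta * math.log10(theta) + (1 - theta) * math.log10(1 - theta)


def equivalent_observations(z_max: float, theta_hat: float) -> float:
    """Edwards's equivalent observations: n = Zmax / info_per_meiosis(theta_hat)."""
    return z_max / info_per_meiosis(theta_hat)


def phase_known_lod(rec: int, nonrec: int, theta: float) -> float:
    """Lod score of fully informative phase-known meioses at recombination fraction theta."""
    return (rec + nonrec) * LOG10_2 + rec * math.log10(theta) + nonrec * math.log10(1 - theta)


if __name__ == "__main__":
    # Five meioses, four of them non-recombinant, analysed at three values of theta.
    for theta in (0.2, 0.4, 0.1):
        z = phase_known_lod(1, 4, theta)
        n = equivalent_observations(z, theta)
        print(f"theta={theta}: lod={z:.3f}, equivalent observations={n:.1f}")
```

Run as a script, this reproduces the figures quoted in the text: a lod of about 0.419 and five equivalent observations at θ = 0.2, about 25 at θ = 0.4, and about two at θ = 0.1.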

In our TSC families, we simulated marker segregation for markers with various degrees of informativeness, linked to a putative TSC locus with 5% recombination. For the calculation of EFNIM values, lod scores were calculated in these replicates for that same value of θ. Frequently, the maximum lod score in a particular replicate occurred at some other value of θ; the mean lod score over all replicates, however, peaked at a value of 5%. That mean lod score was used to compare families via the calculated EFNIM (effective number of informative meioses). The EFNIM was calculated by

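$$\mathrm{EFNIM} = \frac{\bar{Z}(\theta)}{\log_{10} 2 + \theta \log_{10} \theta + (1-\theta) \log_{10}(1-\theta)}, \quad \theta = 0.05 \tag{3}$$

where $\bar{Z}(\theta)$ is the mean lod score over all replicates.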

This equation is closely related to Edwards's formula, but here the assumed recombination frequency is used in the calculations rather than the best estimate of the recombination frequency. Also, the calculations are not carried out for each simulated replicate separately, but rather using the mean lod score over all replicates. Because EFNIM values are based on a large number of simulations (in this case 100), chance fluctuations of marker informativeness reduce to the average informativeness of the marker as characterised by the PIC value. The EFNIM values calculated here were obtained for a marker with eight alleles (PIC 0.86). EFNIM values for a less informative marker (with a PIC value of, say, 0.375) can be approximated via

$$\mathrm{EFNIM}(\mathrm{PIC}_1) \approx \mathrm{EFNIM}(\mathrm{PIC}_2) \times \frac{\mathrm{PIC}_1}{\mathrm{PIC}_2} \tag{4}$$

How accurate this approximation is depends on the actual family structure: when many persons have not been tested, reconstruction of their marker genotypes is only possible for highly polymorphic markers. For such families, less polymorphic markers will yield smaller EFNIM values than calculated by equation (4).

As two examples, consider families 1 and 1013 in the table (A). Family 1 yielded an EFNIM of 1.92 for a marker with PIC = 0.86. Using equation (4), a marker with PIC = 0.375 (two alleles with equal frequencies) should yield an EFNIM of 0.84. When calculated from the mean lod score for the marker with two alleles, the actual EFNIM is indeed 0.84.

For family 1013, however, the predicted EFNIM for the two allele marker is 3.14, while the observed EFNIM is 2.74. In this large family, analysis with an eight allele marker allows reconstruction of missing genotypes, which is frequently not possible using a two allele system.
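Equations (3) and (4) can be illustrated with a minimal Python sketch under deliberately simplified assumptions. A toy "family" of fully informative, phase-known meioses stands in for a real pedigree (for real TSC pedigrees the lod scores would come from a linkage program run on simulated marker data). PIC is computed with the standard formula of Botstein and colleagues, which reproduces the PIC values quoted above (0.86 for eight equally frequent alleles, 0.375 for two). For such a toy family the expected lod at θ = 0.05 is n times the denominator of equation (3), so the simulated EFNIM should come out close to the number of meioses. All function names are illustrative.

```python
import math
import random

THETA = 0.05  # recombination fraction assumed in equation (3)


def info_per_meiosis(theta: float = THETA) -> float:
    """Denominator of equation (3), in lod (base-10) units."""
    return math.log10(2) + theta * math.log10(theta) + (1 - theta) * math.log10(1 - theta)


def efnim(mean_lod: float, theta: float = THETA) -> float:
    """Equation (3): mean lod over replicates divided by the per-meiosis term."""
    return mean_lod / info_per_meiosis(theta)


def simulate_mean_lod(n_meioses: int, replicates: int, theta: float = THETA, seed: int = 1) -> float:
    """Mean lod at theta for a toy family of n fully informative, phase-known
    meioses, simulated with true recombination fraction theta."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replicates):
        rec = sum(rng.random() < theta for _ in range(n_meioses))
        nonrec = n_meioses - rec
        total += (n_meioses * math.log10(2) + rec * math.log10(theta)
                  + nonrec * math.log10(1 - theta))
    return total / replicates


def pic(freqs):
    """Polymorphism information content of a marker with the given allele frequencies."""
    het = 1 - sum(p * p for p in freqs)
    correction = sum(2 * freqs[i] ** 2 * freqs[j] ** 2
                     for i in range(len(freqs)) for j in range(i + 1, len(freqs)))
    return het - correction


def scale_efnim(efnim_known: float, pic_known: float, pic_new: float) -> float:
    """Equation (4): EFNIM taken as roughly proportional to PIC."""
    return efnim_known * pic_new / pic_known


if __name__ == "__main__":
    print(f"EFNIM, 10 informative meioses: {efnim(simulate_mean_lod(10, 2000)):.2f}")
    print(f"PIC, 8 equal alleles: {pic([1 / 8] * 8):.3f}")
    print(f"PIC, 2 equal alleles: {pic([0.5, 0.5]):.3f}")
    print(f"Family 1 predicted two-allele EFNIM: {scale_efnim(1.92, 0.86, 0.375):.2f}")
```

The last line reproduces the prediction of 0.84 for family 1. As the text notes, equation (4) is only an approximation: in large pedigrees with untyped members, as in family 1013, a two allele marker loses more information than the PIC ratio suggests.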

1 McKusick VA. Current trends in mapping human genes. FASEB J 1991;5:12-20.

2 St Clair D, Blackwood D, Muir W, et al. No linkage of chromosome 5q11-q13 markers to schizophrenia in Scottish families. Nature 1989;339:305-9.

3 Kelsoe JR, Ginns EI, Egeland JA, et al. Re-evaluation of the linkage relationship between chromosome 11p loci and the gene for bipolar affective disorder in the Old Order Amish. Nature 1989;342:238-43.

4 Narod SA. Power of the admixture test to detect genetic heterogeneity. Genet Epidemiol 1991;8:209-16.

5 Martinez M, Goldin LR. Power of the linkage test for a heterogeneous disorder due to two independent inherited causes: a simulation study. Genet Epidemiol 1990;7:219-30.

6 Smith CAB. Testing for heterogeneity of recombination fraction values in human genetics. Ann Hum Genet 1963;27:175-82.

7 Ott J. Analysis of human genetic linkage. Revised edition. Baltimore: Johns Hopkins University Press, 1991.

8 Ott J, Bhattacharya S, Chen JD, et al. Localizing multiple X chromosome-linked retinitis pigmentosa loci using multilocus homogeneity tests. Proc Natl Acad Sci USA 1990;87:701-4.

9 Connor JM, Yates JRW, Mann L, Aitken DA, Stephenson JBP. Tuberous sclerosis: analysis of linkage to red cell and plasma protein markers. Cytogenet Cell Genet 1987;44:63-4.

10 Fryer AE, Chalmers A, Connor JM, et al. Evidence that the gene for tuberous sclerosis is on chromosome 9. Lancet 1987;i:659-61.

11 Smith M, Smalley S, Cantor R, et al. Mapping of a gene determining tuberous sclerosis to human chromosome 11q14-q23. Genomics 1990;6:105-14.

12 Fahsold R, Rott HD, Lorenz P. A third gene locus for tuberous sclerosis is closely linked to the phenylalanine hydroxylase gene locus. Hum Genet 1991;88:85-90.

13 Haines J, Amos J, Attwood J, et al. Genetic heterogeneity in tuberous sclerosis: study of a large collaborative dataset. Ann NY Acad Sci 1991;615:256-64.

14 Povey S, Attwood J, Janssen LAJ, et al. An attempt to map two genes for tuberous sclerosis using novel two-point methods. Ann NY Acad Sci 1991;615:298-305.

15 Janssen LAJ, Povey S, Attwood J, et al. A comparative study on genetic heterogeneity in tuberous sclerosis: evidence for one gene on 9q34 and a second gene on 11q22-23. Ann NY Acad Sci 1991;615:306-15.

16 Janssen LAJ, Sandkuijl LA, Merkens EC, et al. Genetic heterogeneity in tuberous sclerosis. Genomics 1990;8:237-42.

17 Sampson JR, Janssen LAJ, Sandkuijl LA, et al. Linkage investigation of three putative tuberous sclerosis determining loci on chromosomes 9q, 11q, and 12q. J Med Genet 1992;29:861-6.

18 Kandt RS, Pericak-Vance MA, Hung W-Y, et al. Linkage studies in tuberous sclerosis: chromosome 9?, 11? or maybe 14!. Ann NY Acad Sci 1991;615:284-97.

19 Sampson JR, Yates JRW, Pirrit LA, et al. Evidence for genetic heterogeneity in tuberous sclerosis. J Med Genet 1989;26:511-6.

20 Northrup H, Kwiatkowski DJ, Roach ES, et al. Evidence for genetic heterogeneity in tuberous sclerosis: one locus on chromosome 9 and at least one locus elsewhere. Am J Hum Genet 1992;51:709-20.

21 Haines JL, Short MP, Kwiatkowski DJ, et al. Localization of one gene for tuberous sclerosis within 9q32-9q34, and further evidence for heterogeneity. Am J Hum Genet 1991;49:764-72.

22 Burley MW, Atwood J, Kwiatkowski D, Povey S. The search for TSC1 on chromosome 9q34. Ann Hum Genet (in press).

23 Ott J. Computer-simulation methods in human linkage analysis. Proc Natl Acad Sci USA 1989;86:4175-8.

24 Edwards JH. The interpretation of lods in linkage analysis. Cytogenet Cell Genet 1976;16:289-93.
