Molecular pathology of single gene disorders

J Clin Pathol 1987;40:959-970

D J WEATHERALL

From the Medical Research Council Molecular Haematology Unit, Nuffield Department of Clinical Medicine, John Radcliffe Hospital, Oxford

SUMMARY Recent studies using recombinant DNA technology have led to an understanding of the basic molecular pathology of single gene disorders. Furthermore, methods are being developed for finding genes for conditions whose underlying biochemistry is still not understood, or which may contribute to polygenic systems that underlie common diseases. As well as providing new approaches to carrier detection, prenatal diagnosis, and treatment of single gene disorders, these advances promise to provide important information about the pathophysiology of many common polygenic diseases.

Considering that it is only about seven years since the first human genes were cloned and sequenced, a remarkable amount of progress has been made in unravelling the molecular pathology of single gene disorders. We already probably have a good idea of the repertoire of molecular defects that underlie most of them, and a start has been made in trying to relate these lesions to associated clinical phenotypes. These advances have important practical implications for carrier detection and prenatal diagnosis of genetic diseases, and in the long term may enable us to start to understand the molecular basis for common polygenic conditions such as heart disease, diabetes, and the major psychoses. Thus the new techniques which have led to these advances are likely to have broad application in diagnostic pathology in the future.

In this short review I shall summarise what is known about the structure of normal human genes, describe some of the different types of defects which give rise to abnormal gene function and how they may have arisen, and describe a few examples of how it is possible to relate abnormal gene structure and function to the associated clinical picture. There are several accounts of recombinant DNA technology for non-specialised readers.¹ ²

Normal gene structure and function

Fig 1 summarises a typical human gene and the mechanisms by which its messenger RNA product is processed and translated. With a few exceptions, all mammalian genes examined so far are broken up into coding regions (exons) and non-coding regions called introns. The number of introns varies widely, ranging from only two in the case of the globin genes to about 50 in the gene for the α chain of collagen. Although the function of the introns is still far from clear, it has been observed that, by and large, they separate regions of genes that code for different functional domains of proteins. It has been suggested, therefore, that their presence offers an evolutionary advantage. The cutting out of intervening sequences supposedly facilitates the juxtaposition and hence joint expression of DNA sequences, which may previously have been widely separated throughout the genome and subsequently brought together by various recombination events. In this way the presence of introns would increase the speed at which selection for functionally useful fusion products might be produced. Certainly the sequence of some genes (that for the low density lipoprotein receptor, for example) looks like an evolutionary patchwork of diverse sequences with homology to a variety of other genes.

There are highly conserved sequences at the junctions between coding regions and intervening sequences. In all mammalian genes the dinucleotides GT and AG are found at the 5' and 3' ends of the introns. It is also apparent from comparative studies and analysis of the various human mutations, which will be described later, that there are other sequences at both the 5' and 3' ends of the introns near their junctions with exons that are also critical for the normal splicing of messenger RNA.



Fig 1 Typical mammalian gene and steps entailed in its transcription and translation. Exons are shown in black and introns (intervening sequence, IVS) unshaded. Regions of gene which code for untranslated portions of messenger RNA are indicated as NC (non-coding regions). Position of 5' regulatory boxes is indicated. Stages shown: gene with flanking regions, mRNA precursor, excision of introns and splicing of exons, processed mRNA, translation.

There are other highly conserved sequences in most mammalian genes. At the 5' flanking region there are three "boxes" of homology: the CACCC box, located between -87 and -95 (87-95 nucleotides upstream); the CCAAT box, located between -72 and -77; and the TATA box, located between -26 and -30, all relative to the start site of transcription. There is good experimental evidence that these boxes represent cis acting sequences, which are required for the accurate and efficient initiation of transcription; that is, they are the major promotor sequences for structural genes. Such a sequence can be defined as a region on a DNA molecule to which an RNA polymerase binds and initiates gene transcription. At the 3' non-coding region of all mammalian genes there is a sequence AATAAA that is thought to be a signal sequence for the processing and polyadenylation of the 3' end of messenger RNA transcripts. The initiation codon of messenger RNA is invariably AUG while any one of three codons, UAA, UAG, or UGA, can act as termination signals.

When a gene is transcribed, the primary transcript contains both introns and exons; while it is in the nucleus a considerable amount of modification and processing occurs. Introns seem to be removed as a single piece after which the exons splice together. The splicing of pre-messenger RNA entails endonucleolytic cleavage and ligation of intron-exon junctions with the formation of a complex lariat-like structure.³ The splicing machinery must be capable of aligning and holding together the intermediates of pre-messenger RNA during the splicing process; it is now clear that several nuclear proteins have a role in this highly complex reaction. After the introns have been removed and the transcript has been polyadenylated, messenger RNA moves into the cytoplasm, where it acts as a template for protein synthesis, the major steps of which are summarised in fig 1.

The way in which mammalian genes are regulated is not yet fully understood. DNA does not exist in cells in the form of a naked strand but is associated with various proteins to form chromatin. It is now apparent that the physical state of chromatin can vary in regions where DNA is being actively transcribed. Thus active genes are packaged in a changed form of chromatin that shows increased sensitivity to digestion by nucleases such as DNase I. In general, genes that are being actively transcribed are hypomethylated, compared with inactive genes. There is increasing evidence that other sequences play a part in the regulation of mammalian genes with respect to their expression in different tissues; for example, a number of so called enhancer elements with this property have been defined for different human gene families.

Many human genes are found in families at particular chromosomal locations. Among the best studied are the globin genes (fig 2) and the immunoglobulin "super gene" family. The α-like genes of globin are found on chromosome 16 and the β-like genes on chromosome 11; each cluster is arranged such that the genes are in the order in which they are expressed during development. Within these gene clusters there are a number of inactive pseudogenes which may be evolutionary remnants of once active loci. Scattered among and within the structural genes there are inherited base variations which may either produce new sites or remove pre-existing sites of cleavage by restriction enzymes, so giving rise to so called restriction fragment length polymorphisms (RFLPs). In addition, in some gene clusters so called hypervariable regions (HVRs) have been found; that is, lengths of DNA which vary considerably in length and which are highly polymorphic in this respect. These regions, which usually consist of simple repeat units or satellite DNA, also constitute a valuable series of polymorphic markers.

Fig 2 Human globin gene clusters. Haemoglobins shown: Hb Gower 1 (ζ2ε2), Hb Portland (ζ2γ2), and Hb Gower 2 (α2ε2) in the embryo; HbF (α2γ2) in the fetus; HbA (α2β2) and HbA2 (α2δ2) in the adult.

Most human gene clusters examined to date contain other families of repetitive DNA sequences. For example, the Alu I family, so called because of the presence of a recognition site for this restriction endonuclease in the centre of the repeat sequence, constitutes a family of repeats of about 300 nucleotides which occur some 300 000 times within the human genome. These units have a high level of homology, and although they have no known function, at least in some cases there is evidence that they are transcribed. The general reader is referred to Lewin for an excellent account of gene structure.⁴

Human molecular pathology

Most of what is known about the molecular pathology of single gene disorders has been derived from a study of the genetic abnormalities of haemoglobin production, particularly the abnormal haemoglobins and the thalassaemias. Indeed, work in this field carried out over the past five years has probably given us a very good idea of the complete repertoire of mutations that can affect human genes. As knowledge about the molecular defects in other single gene disorders accumulates, it is apparent that most of the abnormalities are similar to those which have been observed in different forms of thalassaemia. The table lists some of the different types of human gene mutations, together with the disorders in which they have been found. The molecular aspects of the globin gene disorders are reviewed by Weatherall and Wainscoat⁵ and Bunn and Forget,⁶ and other single gene disorders by Cooper and Schmidtke,⁷ Gusella,⁸ Orkin,⁹ and Davies and Robson.¹⁰

SINGLE BASE SUBSTITUTIONS LEADING TO STRUCTURAL CHANGE IN A PROTEIN

There are numerous examples of inherited structural changes in proteins. By far the best studied are the haemoglobin variants. Adult haemoglobin consists of a pair of α chains comprising 141 amino acid residues and a pair of β chains with 146 residues. As each residue is coded by three bases, theoretically there are a total of 2583 single base substitutions that are possible for these two genes. Of these, 1690 would result in an amino acid replacement, and about one third of these would cause a change in charge, allowing separation of the haemoglobin variant by electrophoresis. Remarkably, over 400 variants have been identified which, in most cases, can be explained by a single base substitution in the corresponding triplet codon of the globin gene. A few variants have amino acid replacements at two different sites on the same chain, three of which entail β chains with the β6 Glu→Val substitution that produces sickle haemoglobin. These may represent second mutations in genes which already contained the sickle mutation or may have resulted from crossing over between two variant β chain genes.

Table Examples of molecular pathology of single gene diseases

Point mutations leading to structural variants:
    Abnormal haemoglobins
    G-6-PD deficiency
    α1-antitrypsin deficiency
    Hereditary amyloidosis
    Familial hypercholesterolaemia

Nonsense mutations:
    β thalassaemia
    Familial hypercholesterolaemia (low density lipoprotein receptor)
    Factor VIII or IX deficiency

Frame-shift mutations:
    β thalassaemia
    Familial hypercholesterolaemia (low density lipoprotein receptor)
    Factor VIII or IX deficiency

Promotor mutations:
    β thalassaemia

Deletions:
    α thalassaemia
    β thalassaemia, HPFH*, δβ thalassaemia
    Growth hormone deficiency
    Anti-thrombin III deficiency
    Factor VIII or IX deficiency
    Elliptocytosis (band 4.1)
    Lesch-Nyhan syndrome (HGPRT)
    Duchenne muscular dystrophy
    Osteogenesis imperfecta
    Familial hypercholesterolaemia (low density lipoprotein receptor)
    Chorionic somatomammotropin deficiency
    Retinoblastoma
    Wilms' tumour

Inversions:
    δβ thalassaemia

Fusion genes:
    Haemoglobin variants, thalassaemia
    Red/green colour blindness

Initiation codon mutations:
    α thalassaemia

Termination codon mutations:
    α thalassaemia

RNA processing mutations:
    Obligatory sequence
        β thalassaemia
        Phenylketonuria
    Consensus sequence
        β thalassaemia
    Pseudosplice substitution
        β thalassaemia
    Cryptic splice site activation
        β thalassaemia, haemoglobin E
    Poly A addition site
        α thalassaemia
        β thalassaemia
    Signal peptide mutation
        Christmas disease
        Haemoglobin variants

*Hereditary persistence of fetal haemoglobin.
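The figure of 2583 possible single base substitutions can be checked with a few lines of arithmetic. The sketch below assumes only what the text states (141 residues in the α chain, 146 in the β chain, three bases per codon) plus the fact that each base can be replaced by any of the three other bases; the function and constant names are illustrative.

```python
# Residue counts quoted in the text for the two adult globin chains.
ALPHA_RESIDUES = 141
BETA_RESIDUES = 146

BASES_PER_CODON = 3      # each residue is coded by three bases
ALTERNATIVE_BASES = 3    # each base can change to any of the other three


def possible_single_base_substitutions(residues: int) -> int:
    """Number of distinct single base substitutions in the coding sequence."""
    return residues * BASES_PER_CODON * ALTERNATIVE_BASES


total = (possible_single_base_substitutions(ALPHA_RESIDUES)
         + possible_single_base_substitutions(BETA_RESIDUES))
print(total)  # 2583

# Fraction of substitutions that the text says change an amino acid.
print(round(1690 / total, 2))  # 0.65
```

The count covers the coding triplets only; it does not include the termination codons or any non-coding sequence.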

Usually, single amino acid substitutions have no effect on the overall length of a peptide chain. There are exceptions, however; again we have to look to haemoglobin for examples. Elongated globin chains can be produced in various ways. There is a family of chain termination mutations which give rise to elongated α globin chains. These result from single base substitutions in the chain terminating codon UAA; for example, in haemoglobin Constant Spring there is a change from UAA to CAA, the codon for glutamine. Thus instead of the chain terminating in the usual way glutamine is inserted and then 3' messenger RNA sequences which are not normally utilised are translated for another 90 bases until another in-phase stop codon is reached. Thus haemoglobin Constant Spring has α chains containing 30 additional residues at their C terminal end. Other single base substitutions in the termination codon can produce similarly elongated α chains but with different substitutions at position 142.
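The read-through caused by a termination codon mutation can be illustrated with a toy translation routine. The sketch below uses the standard genetic code; the short messages are hypothetical and are not the real α globin sequence, so only the mechanism (UAA changed to CAA, translation continuing to the next in-phase stop codon) mirrors the text.

```python
# Standard genetic code, built from the conventional U, C, A, G ordering.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate from the first base and stop at the first in-phase stop codon."""
    residues = []
    for pos in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":          # UAA, UAG, or UGA
            break
        residues.append(amino_acid)
    return "".join(residues)


# Hypothetical message: three coding codons, a UAA stop, then 3' sequence
# that is not normally translated and happens to contain an in-phase UAG.
normal = "AUGGUGCUG" + "UAA" + "GCUGGA" + "UAG"
# Termination codon mutation of the Constant Spring type: UAA -> CAA.
mutant = "AUGGUGCUG" + "CAA" + "GCUGGA" + "UAG"

print(translate(normal))  # MVL
print(translate(mutant))  # MVLQAG
```

In the mutant, glutamine (Q) replaces the stop signal and the chain is extended until the next in-phase stop codon, which is the behaviour described for the elongated α chains.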

Elongated products can also be produced by frame-shift mutations; the additional residues are found at the C terminal ends of the globin chains. There are also several examples of elongation of globin chains at the N terminal end. These usually result from replacement of the N terminal valine by methionine, and this residue is preceded by an additional methionine. This interesting change is the result of a change in the normal processing of newly synthesised globin. Methionine is the first residue to be incorporated but during translation of the nascent peptide the N terminal methionine is normally cleaved, leaving valine as the N terminal residue of the α and β chains. The replacement of valine by methionine probably inhibits the peptidase that normally cleaves the N terminal methionine, leading to the synthesis of chains with the N terminal residues Met-Met. Interestingly, the substitution of a proline for a histidine residue at position 2 in the β chain also interferes with the removal of the methionine residue on nascent globin chains.

Shortened gene products can also be produced, usually by non-homologous crossing over between chromosomes, with the deletion of a varying number of bases. There is one haemoglobin variant with a shortened globin chain that results from a nonsense mutation.

GENE DELETIONS AND VARIATION IN GENE NUMBER

There are many examples of partial or complete deletions of genes as the basis for inherited diseases (table). Furthermore, we are starting to gain insights into how such deletional events may have occurred. One field that has been particularly productive in this respect is the analysis of the α thalassaemias.

Fig 2 shows that there are two closely linked α globin genes on chromosome 16. In the α⁺ thalassaemias there is a deletion entailing this chromosome which leaves a single functional α gene. In many forms of α⁰ thalassaemia both α globin genes are lost. The most likely mechanism for the production of a chromosome with a single α globin gene is non-homologous crossing over between the two α globin gene loci after mispairing of homologous chromosomes during meiosis. Duplicated loci like the α genes have arisen by a reduplication event which is mirrored by regions of homology in the flanking regions of the particular genes involved. In the case of the α globin genes these regions are designated X, Y, and Z. In fact, several different crossovers have occurred within these homology boxes, resulting in different types of α⁺ thalassaemia. If the crossover theory is correct the reciprocal product of the crossover event, a chromosome carrying three α globin gene loci, should be observed. In fact, such cases have been found in every human population that has been studied to date. Similar mechanisms almost certainly play a part in the generation of variable numbers of γ globin genes on chromosome 11; individuals with one, three, or even four γ genes on chromosome 11 have been found.
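The reciprocal products of crossing over after mispairing can be sketched with a deliberately simple model. It is an illustration only: each chromosome is treated as a list of whole genes, and the break points are assumed to fall between genes rather than inside the X, Y, or Z homology boxes. The function name and gene labels are hypothetical.

```python
def unequal_crossover(chrom_a, chrom_b, break_a, break_b):
    """Exchange the segments that lie beyond each break point.

    break_a and break_b are the number of genes lying before the break
    on each chromosome; unequal values model mispaired chromosomes.
    """
    recombinant_1 = chrom_a[:break_a] + chrom_b[break_b:]
    recombinant_2 = chrom_b[:break_b] + chrom_a[break_a:]
    return recombinant_1, recombinant_2


# Two normal chromosomes 16, each carrying two linked alpha globin genes.
chrom_a = ["alpha2", "alpha1"]
chrom_b = ["alpha2", "alpha1"]

# Mispairing: the break falls after one gene on one chromosome
# and after two genes on the other.
single, triple = unequal_crossover(chrom_a, chrom_b, break_a=1, break_b=2)

print(single)  # ['alpha2']
print(triple)  # ['alpha2', 'alpha1', 'alpha1']
```

The four genes present before the exchange are all still present afterwards, one on the first product and three on the second, which is why a chromosome with three α genes is the expected counterpart of the single gene chromosome.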

There are other mechanisms for the production of gene deletions. Recent work on both the α⁰ thalassaemia gene deletions and on long deletions which involve the globin gene cluster and give rise to the phenotype of hereditary persistence of fetal haemoglobin (HPFH) has shown that some of the breakpoints of the deletions entail sequences with many characteristics of the Alu I repeat sequences described earlier. Similarly, there is an example of a deletion involving the gene for the low density lipoprotein receptor, in which Alu-Alu recombination has occurred. Because of their high degree of homology the Alu repetitive sequences may serve as "hotspots" for recombination. On the other hand, at least some of the long deletions which produce HPFH and α⁰ thalassaemia seem to run through unrelated sequences; that is, they are examples of so called illegitimate recombination.

Another interesting feature of deletions of the α and β globin gene clusters is that in many cases they are of similar length, although at different points along the genome.¹¹ A rather novel mechanism has been proposed to explain these observations: the deletions are generated by the loss of chromatin loops at different stages of DNA replication as chromatin moves through specific attachment sites on the nuclear matrix. After breakage the two ends of the DNA become reunited with the loss of a loop; recent evidence in favour of this mechanism has been obtained from the study of an α thalassaemia deletion, in which the gap across the deletion seems to have been filled in by a DNA sequence that is normally found at least 34 kilobases upstream from the site of the deletion. This could only have happened if the deletion entailed a large loop of DNA which brought the "filler" sequence into the appropriate place and orientation.¹²

FUSION GENES

Another interesting and important result of abnormal chromosomal crossing over is the production of fusion genes which code for hybrid proteins. The first and best studied example is haemoglobin Lepore, which has normal α chains combined with non-α chains that have the N terminal amino acid sequence of δ chains and the C terminal sequence of β chains. This variant seems to have arisen through non-homologous crossing over between part of the δ locus on one chromosome and part of the β locus on the complementary chromosome. Of course, such an event should give rise to two abnormal chromosomes, one with a Lepore gene and the other with its opposite counterpart, an anti-Lepore gene. In fact, both these arrangements have been found in a number of patients. A similar mechanism seems to have played a part in the production of one of the sialoglycoproteins of the red cell surface.

Another elegant example of the generation of variation in gene number and the production of fusion genes is provided by recent studies on the molecular genetics of colour vision and colour blindness.¹³ Human colour vision is based on three light sensitive pigments. The genes for the red and green pigments show 96% identity and lie in a tandem array on the X chromosome on which there is a single red pigment gene and variable numbers of green pigment genes; the blue pigment gene shows less homology and is on an autosome. Many of the different forms of red-green colour blindness seem to have resulted from unequal crossing over between the red and green pigment genes with the production of a variety of different fusion genes.

INVERSIONS

There is one example of a gene inversion in man. This has been found in several patients with the phenotype of δβ thalassaemia; the inversion occurs in a region of DNA between the δ and γ globin genes, and there is also a small deletion at each end of the inversion. A model has been proposed whereby an inversion of this type is generated by interactions between two chromosomal loops.

NONSENSE MUTATIONS

Several point mutations have been described which scramble the genetic code and hence make translation of a normal gene product impossible. Again most of these examples occur in thalassaemia, although similar lesions have been observed as the basis for disorders such as haemophilia, Christmas disease, and several other single gene conditions. The first of these to be identified was a substitution in codon 17 of the β globin messenger RNA, AAG→UAG, which changes a lysine codon to a premature termination codon. Another premature termination codon of this type is commonly found in patients with β thalassaemia; in this case there is a change in codon 39, CAG→UAG, where CAG codes for glutamine in the normal β globin chain. Clearly, if there is a premature chain termination codon in the middle of a structural gene, translation will cease prematurely with the production of a shortened and physiologically useless peptide chain.

Another way in which the genetic code can be scrambled is by the generation of a so called frame-shift mutation, the basis of at least seven different forms of β thalassaemia and some cases of haemophilia and Christmas disease. As proteins are encoded by a triplet code the loss or insertion of one, two, or four nucleotides in the coding region of a gene will throw the reading frame out of sequence. As a result, a completely anomalous amino acid sequence will be added to a normally initiated globin chain. Sometimes the changed base sequence generates a new termination codon leading to premature termination of translation of the abnormal messenger RNA. Occasionally the messenger RNA may be translatable, and in this case there is a complete change of sequence from normal after the site of the frame-shift mutation. As mentioned earlier, at least one form of human haemoglobin variant with an elongated β chain results from a frame-shift mutation; in this case the normal stop codon is rendered out of sequence and therefore the scrambled messenger RNA is translated until another stop codon is reached, so leading to an elongated translation product.
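Both effects described in this section can be reproduced with the standard genetic code. The sketch below classifies the two codon changes quoted in the text and then shows a single base deletion shifting the reading frame. The short message used for the frame-shift is illustrative and is not offered as the real β globin sequence, and the function names are assumptions made for the example.

```python
# Standard genetic code, built from the conventional U, C, A, G ordering.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(normal_codon: str, mutant_codon: str) -> str:
    """Describe the effect of replacing one codon by another."""
    before, after = CODON_TABLE[normal_codon], CODON_TABLE[mutant_codon]
    if before == after:
        return "silent"
    if after == "*":
        return "nonsense"
    if before == "*":
        return "termination codon mutation"
    return "missense"


def translate(mrna: str) -> str:
    """Translate from the first base and stop at the first in-phase stop codon."""
    residues = []
    for pos in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        residues.append(amino_acid)
    return "".join(residues)


# The two premature termination codons quoted in the text.
print(classify_substitution("AAG", "UAG"))  # nonsense (codon 17, lysine)
print(classify_substitution("CAG", "UAG"))  # nonsense (codon 39, glutamine)

# Frame-shift: delete one base from an illustrative message.
normal = "AUGGUGCACCUGACUCCUUAA"
frameshift = normal[:4] + normal[5:]     # loss of the base at index 4

print(translate(normal))      # MVHLTP
print(translate(frameshift))  # MGT
```

In this example the shifted frame runs into a new UGA codon after three residues, the case of premature termination; with a different sequence the shifted frame could instead run past the normal stop codon and give an elongated product.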

DEFECTIVE PROCESSING OF MESSENGER RNA

As mentioned earlier the primary transcript has to be processed by the removal of introns, joining together of exons, and by polyadenylation. Work on thalassaemia has provided a wealth of examples of molecular pathology involving these complex processes.

We have already discussed how normal splicing of messenger RNA depends on having GT and AG dinucleotides at the 5' and 3' intron-exon junctions. There are several examples of forms of β thalassaemia in which a single base substitution in one of these critical sites completely abolishes β globin chain production; no normal messenger RNA is produced. These findings underline the critical importance of these sequences for normal splicing.
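The GT and AG rule lends itself to a very small check. The sketch below tests only the two invariant dinucleotides; the intron sequence is hypothetical, and a real splice site also depends on the neighbouring consensus sequences mentioned earlier, which this check ignores.

```python
def has_invariant_dinucleotides(intron: str) -> bool:
    """True if the intron starts with GT (5' end) and ends with AG (3' end)."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


# Hypothetical intron written as DNA, 5' to 3'.
normal_intron = "GTTGGTATCAAGGTTACAAGACAGGTTTAAGGAGACCAATAG"

# Single base substitutions at the invariant positions.
donor_mutant = "A" + normal_intron[1:]        # GT -> AT at the 5' end
acceptor_mutant = normal_intron[:-1] + "C"    # AG -> AC at the 3' end

print(has_invariant_dinucleotides(normal_intron))    # True
print(has_invariant_dinucleotides(donor_mutant))     # False
print(has_invariant_dinucleotides(acceptor_mutant))  # False
```

Either substitution removes a signal that normal splicing requires, which corresponds to the forms of β thalassaemia in which no normal messenger RNA is produced.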

There are, however, much more subtle abnormalities of messenger RNA processing due to point mutations (table). Single base substitutions within introns may result in preferential alternative splicing of the precursor β messenger RNA molecules at the site of the mutation; for example, a common form of β thalassaemia which occurs in the Mediterranean population results from a single nucleotide substitution, G→A, at position 110 of the first intervening sequence of the β globin gene. This change produces an AG sequence which happens to be preceded by a stretch of pyrimidines and so forms a good 3' acceptor consensus sequence. Thus about 80% of the processed messenger RNA is the result of splicing into this site rather than the normal 3' IVS 1 AG. The messenger RNA produced from the abnormal splicing contains intron sequences and is therefore useless as a template for globin chain synthesis. Because this site is used preferentially, more abnormal messenger RNA than normal messenger RNA is produced and therefore there is a severe deficiency of normal β chain production (fig 3).

Fig 3 Generation of new acceptor splice site in first intervening sequence. G→A substitution produces new splice site in IVS 1. This is used 90% of the time, with production of messenger RNA which still contains intron sequences. Normal splice site is only used 10% of the time.

Fig 4 Some point mutations at intron-exon junctions of first intron of human β globin gene. Different thalassaemia phenotypes are shown.
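The way a single G→A change can create a new acceptor site can be shown with a crude scan for AG dinucleotides that follow a pyrimidine-rich tract. This is a sketch under stated assumptions: the intron is hypothetical and much shorter than the real first intervening sequence, index 18 merely stands in for position 110, and the tract length and threshold are arbitrary illustrative values rather than a validated splice site model.

```python
PYRIMIDINES = set("CT")


def candidate_acceptor_sites(intron, tract_length=8, min_pyrimidines=7):
    """Return 0-based indices of the A in each AG preceded by a pyrimidine-rich tract."""
    sites = []
    for i in range(tract_length, len(intron) - 1):
        if intron[i:i + 2] == "AG":
            tract = intron[i - tract_length:i]
            if sum(base in PYRIMIDINES for base in tract) >= min_pyrimidines:
                sites.append(i)
    return sites


# Hypothetical intron (DNA, 5' to 3'): GT donor, AG acceptor at the 3' end.
normal = "GTGAGTGAAG" + "CTCTCTTTGG" + "GAAGGAAGGA" + "TTCTCCTCAG"

# G -> A substitution at index 18, just after a run of pyrimidines.
assert normal[18] == "G"
mutant = normal[:18] + "A" + normal[19:]

print(candidate_acceptor_sites(normal))  # [38]
print(candidate_acceptor_sites(mutant))  # [18, 38]
```

The normal sequence offers only the genuine acceptor at the end of the intron, whereas the mutant offers a second site upstream. Splicing into the upstream site would leave the sequence between the two sites in the messenger RNA, as described for the abnormal product above.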

Several other forms of thalassaemia have been described which result from the production of alternative splicing sites within introns: there are two varieties which result from single base changes at positions 5 or 6 at the 5' end of the first intervening sequence (fig 4). Splicing occurs both at the normal site and at the new sites generated by these base changes. The effect of these point mutations is remarkably subtle. Some mutations at position 5 cause a severe defect in β chain production, while those at position 6 are associated with an extremely mild phenotype. Several other types of thalassaemia have been described in which point mutations in the second intervening sequence cause similar alternate splicing and hence abnormal β globin messenger RNA molecules.

Perhaps even more remarkable is the fact that mutations have been found in exons of the globin genes that seem to activate cryptic splice sites. One of these is particularly interesting because it is also associated with the production of a structural haemoglobin variant, haemoglobin E. This variant has a lysine for glutamic acid substitution at position 26. This results from a codon change GAG→AAG (fig 5). The latter seems to activate a "cryptic" splice site which competes with the normal 5' splice site and hence leads to a reduced output of β globin chains. This may be why haemoglobin E is associated with a β thalassaemia phenotype.

Several other single gene disorders are now known to result from splicing defects. These include one form of phenylketonuria and several varieties of haemophilia and Christmas disease. Recent studies suggest that one form of factor IX deficiency results from a deletion of an entire exon of the factor IX gene. Despite this an abnormal gene product is produced, presumably by linking the remaining exons together during messenger RNA processing. The functional importance of this unexpected finding is discussed below.

Fig 5 Activation of cryptic splice site in exon 1 as basis for some genetic disorders of β chain production. G→A change, which is responsible for amino acid substitution in haemoglobin E, activates cryptic splice site that is responsible for reduced rate of production of haemoglobin E and associated thalassaemia phenotype. Haemoglobin Knossos has similar phenotype with same molecular basis. T→A substitution produces no amino acid difference but results in reduced rate of β globin chain production and, again, phenotype of β thalassaemia.

Fig 6 Point mutations at or near promoter sites of β globin gene. These all lead to reduced rate of β chain production with varying degrees of phenotypic severity.

Finally, polyadenylation site mutations may also interfere with the normal processing of messenger RNA. For example, the single base change AATAAA→AATAAC, which is found in the α globin genes of patients with α thalassaemia in the Middle East and Mediterranean region, seems to prevent almost entirely the production of α globin chains from the α2 globin gene. Instead of the normal cutting and polyadenylation of the messenger RNA precursor a long molecule is produced which does not appear in the cell cytoplasm. There may be a small amount of polyadenylated messenger RNA produced but the overall effect is to inactivate almost entirely the α2 globin gene. It is also possible that this mutation may in some way interfere with the termination of α2 globin gene transcription. A similar polyadenylation site mutation has been observed in the β globin gene as the basis for one type of β thalassaemia.
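As a small illustration of why a single base change can abolish this signal, the sketch below searches a sequence for the AATAAA hexamer. The function name and the flanking bases are hypothetical and were chosen only for the example; only the hexamer and its AATAAC variant come from the text.

```python
# Minimal sketch: presence or absence of the AATAAA polyadenylation signal.

POLY_A_SIGNAL = "AATAAA"


def find_poly_a_signal(three_prime_sequence: str) -> int:
    """Return the 0-based index of the first AATAAA hexamer, or -1 if absent."""
    return three_prime_sequence.upper().find(POLY_A_SIGNAL)


# Hypothetical flanking bases; only the hexamers themselves come from the text.
NORMAL_3_PRIME = "GCCTT" + "AATAAA" + "GTCTG"
MUTANT_3_PRIME = "GCCTT" + "AATAAC" + "GTCTG"

if __name__ == "__main__":
    print("normal:", find_poly_a_signal(NORMAL_3_PRIME))
    print("mutant:", find_poly_a_signal(MUTANT_3_PRIME))
```

An exact string match is of course a simplification: it shows only that the mutant sequence no longer contains the canonical hexamer, not how efficiently any residual processing might occur.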

INITIATION CODON MUTATIONS

Several mutations have been observed in patients with α thalassaemia, entailing either the initiation codon itself or the sequences which immediately precede it. As would be expected these cause a complete absence of normal α gene product.

PROMOTER BOX MUTATIONS

Several forms of β thalassaemia have been described in which point mutations have been found upstream from the β globin gene (fig 6), either within or adjacent to the promoter boxes described earlier. These mutations are associated with a variable reduction in output from the adjacent locus. Their existence underlines the importance of these highly conserved regions of DNA and confirms their likely promoter function.

Structure function relations

It is now possible at least to make a start in trying to understand how the molecular pathology of human genes is reflected in differences in clinical phenotypes. In this section I shall summarise a few examples of recent successes in this important aspect of human molecular genetics.

HAEMOGLOBIN⁶

Structure-function relations have been studied in detail for the many varieties of structural haemoglobin mutations. Amino acid substitutions at critical sites may change the oxygen binding properties of the molecule in a variety of ways: by modifying the interaction between subunits that occurs as part of the allosteric changes involved in producing a sigmoid oxygen dissociation curve; by modifying interactions with regulatory molecules such as 2,3-diphosphoglycerate; or by interfering with critical residues within the haem pocket, which leads to permanent methaemoglobinaemia. A variety of clinical phenotypes are associated with these mutations, including genetic polycythaemia and forms of congenital cyanosis. Chronic haemolytic anaemia may result from point mutations which lead to changed molecular configurations, such as occur in the sickling and haemoglobin C disorders, or from a variety of different mutations which produce molecular instability.

POST-TRANSLATIONAL MODIFICATION; INSULIN AND SIGNAL SEQUENCES FOR FACTOR IX AND GLOBIN PROCESSING¹⁴⁻¹⁶

As mentioned earlier some proteins undergo a considerable amount of post-translational modification, and it is becoming apparent that point mutations may interfere with this process. Insulin consists of two dissimilar peptide chains, A and B, linked by two disulphide bonds. Unlike many other proteins which consist of structurally distinct subunits, however, insulin is under the control of a single gene locus and chains A and B are derived from a one chain precursor, proinsulin. Proinsulin is converted to insulin by the enzymatic removal of a segment, called the C peptide, that connects the amino end of the A chain to the carboxyl end of the B chain. Familial hyperproinsulinaemia results from mutations at the cleavage sites connecting the A chain to the C peptide. There have been a few reports of structurally variant insulins that are functionally defective: the replacement of phenylalanine by leucine at position 24 in the insulin gene is associated with a diabetic phenotype.

Another recently described defect in post-translational modification of a protein is a mutation responsible for one form of Christmas disease. Factor IX is synthesised as a precursor and might be expected to be proteolytically cleaved in at least two positions during maturation to remove a prepeptide and a propeptide region. One form of Christmas disease results from a single amino acid substitution at position -4 in the propeptide region: an arginine is replaced by a glutamine. This change results in the expression of a stable, longer protein with 18 additional amino acids of the N terminal propeptide region still attached. This is an important finding because it suggests that during the normal maturation of factor IX a signal peptidase cleaves the peptide bond between amino acid residues -18 and -19, generating an unstable pro-factor IX intermediate. Further proteolytic processing to the mature factor IX molecule must depend on the arginine residue at position -4.

Interestingly, the arginine at -4 is not unique to the factor IX precursor but is also found in factor X and prothrombin and in many other sequences processed by site specific trypsin-like enzymes, including C3, C4, and C5 of the complement system and tissue type plasminogen activator.

As mentioned earlier, haemoglobin variants have been found in which there are elongated globin chains entailing residues at the N terminal end. These offer some particularly interesting insights into post-translational modification of proteins. The amino-terminal methionine residue, the translation product of the AUG initiation codon, is present only transiently in the nascent peptide chains of most proteins. One variant that has been analysed in detail, haemoglobin Long Island, shows a methionine residue at the amino-terminal end of the β globin chain; the second residue is proline instead of histidine. It seems likely that the latter substitution results either in a structural or charge difference in the nascent β globin chain, which interferes with a methionine aminopeptidase mechanism, or causes a change in the secondary structure of the messenger RNA of sufficient magnitude to impair the removal of the amino-terminal methionine residue.

In proteins that are secreted or membrane-bound, methionine constitutes the amino-terminal residue of a peptide of about 20 residues in length that is apparently essential for both protein secretion and incorporation into membranes. This "signal sequence", which is hydrophobic, varies in length and sequence (except that methionine is always present at the amino-terminal end) and it is cleaved by a membrane bound enzyme in secreted proteins. In fact, it has been found that the signal sequence is only one of the constituents of an 11S protein termed the "signal recognition protein". As haemoglobin is a cytoplasmic non-secreted protein it remains to be established whether the preserved amino-terminal methionine present in these mutant haemoglobins has any effect on their processing or cellular compartmentalisation.

RECEPTOR FUNCTION: MUTATIONS OF THE LOW DENSITY LIPOPROTEIN RECEPTOR¹⁷

One of the most elegant emerging stories of structure-function relations at the molecular level is the elucidation of the mutations that involve different regions of the low density lipoprotein receptor gene, and which lead to the disruption of the normal control of cholesterol metabolism. The low density lipoprotein receptor is a cell surface glycoprotein. It is synthesised in the rough endoplasmic reticulum as a precursor, after which it travels to the Golgi complex and thence to the cell surface, where it is capable of binding two proteins: apo B, which is the sole protein of low density lipoprotein, and apo E. These receptors undergo a quite remarkable recycling process. They appear on the cell surface in coated pits and within a few minutes of their formation the pits invaginate to form endocytic vesicles. Multiple vesicles fuse to create larger sacs called endosomes. When the pH of the endosome falls below 6.5 the low density lipoprotein dissociates from the receptor. The receptor then returns to the surface. Each low density lipoprotein receptor makes one round trip about every 10 minutes in a continuous fashion, whether or not it is occupied by low density lipoprotein.

The gene for the low density lipoprotein receptor has been isolated and sequenced and most of its protein structure has been worked out. Patients homozygous for familial hypercholesterolaemia show a variety of different mutations which, based on the way in which they change receptor function, can be separated into four classes. In the first, no receptors are synthesised. In one case this is because there is a large deletion of the low density lipoprotein receptor gene. A second class of mutations results in a reduced rate of transport from the endoplasmic reticulum to the Golgi apparatus; the receptors do not appear on the surface of the cell but remain in the endoplasmic reticulum until they are degraded. The molecular defect has not yet been determined. A third class of mutations is characterised by normal receptor synthesis but failure to bind low density lipoprotein. It is believed that these mutations entail amino acid substitutions, deletions, or duplications in a cysteine rich low density lipoprotein binding domain. Finally, there is a class of mutations in which the receptors reach the cell surface and bind low density lipoprotein but fail to cluster in coated pits. Three different molecular changes have been defined as the basis for this abnormality, all of which involve the cytoplasmic tail of the receptor that protrudes into the cell cytoplasm. In one case a tryptophan codon has been converted to a nonsense codon. In another there is a single amino acid substitution, tyrosine for cysteine, again in the middle of the cytoplasmic tail domain. These remarkable studies not only underline the extraordinary molecular heterogeneity of what seem to be similar genetic disorders but also provide considerable insights into the physiology of receptor function.
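The four functional classes described above follow the order of the receptor's journey through the cell, so they can be expressed as a simple decision sequence. The sketch below is an illustrative summary only: the type and field names are hypothetical, and the yes/no flags simplify what are in practice graded defects (the second class, for example, is described as a reduced rate of transport).

```python
# Minimal sketch: the four classes of LDL receptor mutation as a decision sequence.
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReceptorDefect(Enum):
    NO_SYNTHESIS = 1            # no receptors are synthesised
    TRANSPORT_DEFECT = 2        # retained in the endoplasmic reticulum
    BINDING_DEFECT = 3          # reaches the surface but fails to bind LDL
    INTERNALISATION_DEFECT = 4  # binds LDL but fails to cluster in coated pits


@dataclass(frozen=True)
class ReceptorPhenotype:
    synthesised: bool
    reaches_surface: bool
    binds_ldl: bool
    clusters_in_coated_pits: bool


def classify(phenotype: ReceptorPhenotype) -> Optional[ReceptorDefect]:
    """Return the first step at which the receptor fails, or None if all succeed."""
    if not phenotype.synthesised:
        return ReceptorDefect.NO_SYNTHESIS
    if not phenotype.reaches_surface:
        return ReceptorDefect.TRANSPORT_DEFECT
    if not phenotype.binds_ldl:
        return ReceptorDefect.BINDING_DEFECT
    if not phenotype.clusters_in_coated_pits:
        return ReceptorDefect.INTERNALISATION_DEFECT
    return None


if __name__ == "__main__":
    tail_mutant = ReceptorPhenotype(True, True, True, False)
    print(classify(tail_mutant))
```

Checking the steps in this order mirrors the text: a receptor that is never synthesised cannot be assessed for binding, and one that never reaches the surface cannot be assessed for clustering.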

PHENOTYPIC VARIABILITY: β THALASSAEMIA INTERMEDIA⁵

Analysis of the molecular basis of the thalassaemias also provides some important information about the way in which phenotypic variation can occur. Most β thalassaemia homozygotes have a severe transfusion dependent disease. Some, however, have a milder condition called β thalassaemia intermedia. In many of the severe forms of β thalassaemia no β globin chains are synthesised. The molecular basis of these disorders is heterogeneous and may result from deletions, nonsense mutations, frame-shift mutations, or point mutations at the intron/exon junctions. There are, however, milder forms of β thalassaemia in which β chains are produced but at a reduced rate. Some of these conditions result from point mutations in the promoter regions of the β globin genes. Many of them, however, are caused by single base changes in consensus regions at the intron/exon junctions or by the activation of cryptic splice sites, all of which provide opportunities for alternative sites of splicing of messenger RNA. Depending on the degree to which the normal, compared with the abnormal, splice site is used, a whole series of phenotypes of varying severity is produced.

Molecular studies of the β thalassaemias have also shown how the interaction of one or more genetic variants at other loci can modify clinical phenotypes. Patients with β thalassaemia have imbalanced globin chain synthesis, and the severity of the disorder depends on the degree of excess α chain production; excess α chains precipitate in red cell precursors and cause ineffective erythropoiesis. Thus patients who are β thalassaemia homozygotes, and who also inherit α thalassaemia that reduces the amount of excess α chains, are phenotypically milder than those with β thalassaemia alone. Similarly, patients with β thalassaemia who inherit genetic determinants that allow persistent fetal haemoglobin production are also advantaged. Thus it is becoming apparent that when we know what kinds of genetic interactions to look for at the molecular level a rational explanation for the remarkable variation in expression of mutant genes within members of the same family may be found.
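The argument about chain imbalance can be made concrete with a deliberately crude toy calculation. Everything quantitative here is an assumption made for illustration and is not taken from the studies reviewed: α chain output is treated as proportional to the number of functional α globin genes (normally four), β chain output is given as a fraction of normal, and the function name is hypothetical.

```python
# Toy model (illustrative assumptions only): relative excess of alpha chains.


def relative_alpha_excess(functional_alpha_genes: int, beta_output: float) -> float:
    """Excess alpha chain output relative to a normal alpha output of 1.0.

    functional_alpha_genes: 0 to 4 (four in a normal diploid genome).
    beta_output: beta chain output as a fraction of normal, 0.0 to 1.0.
    """
    if not 0 <= functional_alpha_genes <= 4:
        raise ValueError("functional_alpha_genes must be between 0 and 4")
    if not 0.0 <= beta_output <= 1.0:
        raise ValueError("beta_output must be between 0.0 and 1.0")
    alpha_output = functional_alpha_genes / 4  # assumed proportional to gene number
    return max(0.0, alpha_output - beta_output)


if __name__ == "__main__":
    # No beta chains at all, with four versus two functional alpha genes.
    print(relative_alpha_excess(4, 0.0))
    print(relative_alpha_excess(2, 0.0))
```

In this toy model a patient making no β chains has half the α chain excess when two of the four α genes are inactive, which is the direction of effect described in the text; the real relation between gene number, chain output, and severity is not this simple.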

ANTIBODY PRODUCTION IN HAEMOPHILIA AND CHRISTMAS DISEASE¹⁸ ¹⁹

Recent studies of patients with haemophilia and Christmas disease are also starting to turn up some interesting genotype-phenotype associations. Individuals with extensive deletions of the factor IX gene show no factor IX antigen and, in many cases, produce factor IX inhibitors. Recently a patient has been described with Christmas disease in whom there was detectable antigen but whose DNA showed a 2.8 kb deletion that had removed an entire exon and surrounding sequences. Surprisingly, however, this defective gene is transcribed and translated, presumably due to some novel type of splicing event. This patient did not produce any factor IX inhibitors. Thus it seems that even with large deletions some kind of immunologically recognisable gene product can be produced. These observations underline the subtlety of the relations between the molecular defects which underlie single gene disorders and their clinical phenotypes.

THE NEOPLASTIC PHENOTYPE²⁰

Although the story is as yet incomplete, another very important example of structure-function relations seems to be emerging from studies of the rare childhood cancers, which seem to be inherited in a recessive fashion. The best examples are retinoblastoma and Wilms' tumour. The genetic determinants with a role in the production of these tumours are found on chromosomes 13 and 11, respectively. In each case individuals may be carriers for a recessive gene on these chromosomes which, in the presence of a normal allele, causes no clinical disability. If the normal allele is lost by some form of genetic rearrangement, however, the unopposed action of the mutant locus produces a malignant phenotype. Restriction enzyme analyses of malignant cells compared with those of normal somatic cells have shown that rearrangements, which give rise to inactivation of the normal allele, may occur in a variety of ways. Other examples of this phenomenon include widespread embryonal tumours (Beckwith-Wiedemann syndrome), some bladder cancers, and acoustic neuroma and meningioma. So far, the action of these recessive "tumour genes" has not been worked out. Undoubtedly, the elucidation of their function and of the effects of these mutations will provide an important chapter in the evolving story of the molecular basis of neoplastic transformation.

It is already apparent that point mutations in cellular oncogenes can sometimes change their function, such that they are capable of producing malignant transformation in appropriate cell lines. Furthermore, studies of disorders like chronic myeloid leukaemia and Burkitt's lymphoma, in which specific chromosome translocations move oncogenes into other regions of the genome and hence cause their activation, are also providing important information about the steps in neoplastic transformation. The Philadelphia chromosome, which is found in chronic myeloid leukaemia, results from a reciprocal exchange between chromosomes 9 and 22. There is relocation of a portion of the cellular oncogene c-abl and fusion of it with a newly identified locus called bcr (breakpoint cluster region). The genetic fusion creates a chimeric protein that includes the functional domain of the c-abl gene product whose enzymatic activity is more stable than that of the normal gene product.

Although a variety of changes in oncogene structure, location, number, and activities have been found in association with cancers in man, it is still not clear how these findings relate to the generation of a neoplastic phenotype, nor, indeed, whether changed oncogene function is a primary or secondary event. Of the 40 or so cellular oncogenes that have been described, only four major classes of associated biochemical properties have been defined: protein phosphorylation; metabolic regulation of proteins that bind GTP; control of gene expression by influencing biogenesis of mRNA; and participation in the replication of DNA. Genetic changes in oncogene function might cause these cellular regulatory genes to malfunction by causing constitutive activity and a surfeit of an otherwise normal gene product, or by changing the manner in which a protein acts, such as varying the substrate specificity of a protein kinase or the specificity of a transcription factor.

The proliferation of cells is controlled by an elaborate circuitry that stretches from the surface of the cell to the nucleus. Bishop suggested that the products of cellular oncogenes may represent some of the junction boxes in the circuitry²⁰; polypeptide hormones that act on the surface of the cell, receptors for these hormones, proteins that carry signals from receptors, and nuclear functions may all interact to orchestrate the genetic response to afferent commands. It is suggested that oncogenes may act as short circuits at these junction boxes, but all this is highly speculative at the moment, and it is still not possible to relate disordered oncogene activity to the primary event which renders a particular cell line neoplastic.

INHERITED DISORDERS OF COLLAGEN²¹

Considerable progress has been made in defining the genes that regulate the structure of human collagen. Clearly there must be considerable constraints on the structure of the fibrillar collagen genes, and it follows that serious mutations must have been eliminated in the population. There are a variety of single gene disorders associated with defects in connective tissue, however, and it is quite possible that inherited connective tissue fragility might contribute to the inheritance of or susceptibility to polygenic conditions such as osteoporosis, osteoarthritis, idiopathic scoliosis and the floppy mitral valve syndrome.

Progress has been made in some of these areas, and this work has considerable importance for structure-function relations. Several mutations of the collagen genes have been found in patients with osteogenesis imperfecta. In general, there are two classes of mutation of the structural loci for collagen that have different phenotypic effects. The first comprises mutations which lead to the expression of a mutant α chain that is incorporated into the collagen molecule. The second comprises mutations that cause exclusion of the abnormal gene product from the molecule. It turns out that the former class of mutants is more severe because of the effect they have on the configuration of collagen. At least two well documented cases of a lethal form of osteogenesis imperfecta are produced by mutations of this type. These early studies on the differential effects of collagen mutations indicate that further studies of genetic variability of collagen will be of particular value in understanding structure-function relations.

DEVELOPMENTAL MUTATIONS²²

One of the central questions in human biology is how genes are switched on and off at specific times during human development. The globin genes offer a particularly good example of this phenomenon as human haemoglobin changes its structure between embryonic, fetal, and adult life. There is a group of conditions with the general title hereditary persistence of fetal haemoglobin, in which there is a genetically determined defect in the normal switch from fetal to adult haemoglobin production. A family of point mutations has been found in the region -140 to -202 in the γ globin genes that are associated with the persistent production of the γ chains of fetal haemoglobin into adult life. These observations raise the intriguing possibility that there are critical regulatory regions of DNA, which may have a role in interactions with specific proteins that play a part in the suppression of fetal globin genes during adult life. This seems to be a particularly promising area for further study of the developmental regulation of the globin gene families.


Conclusions

Studies of the molecular basis for single gene disorders are providing a remarkable insight into the repertoire of lesions at the DNA level which are responsible for them. As information of this type accumulates we can start to build up a picture of how these molecular lesions are reflected in the clinical phenotypes of patients with these conditions. In particular, we can start to identify some of the factors in the remarkable variation in the clinical picture that are associated with what is apparently the same type of genetic defect. As well as being of considerable clinical value in terms of counselling and prenatal identification of serious genetic diseases, this new information is providing considerable insights into the way in which messenger RNA is processed and into the structure-function relations of proteins such as enzymes and cell surface receptors. Now that it is possible to construct mutant proteins by the new techniques of protein engineering,²³ it should be possible to analyse these findings in more detail. This field is also starting to yield tantalising clues about the general nature of neoplastic transformation.

Our ultimate goal will be to understand in detail how individual genes are regulated and expressed in specific tissues at particular developmental stages. It is also hoped that it will be possible to start using the same types of techniques as have been used to study single gene disorders to analyse the much more complicated question of polygenic disease and hence to understand the genetic factors which contribute to common conditions such as vascular disease, diabetes, and some of the major psychoses.

As far as single gene disorders go the ultimate objective is to reach a stage at which we understand enough about their regulatory sequences that we are in a position to replace defective genes.²⁴ Until that time is reached, however, it is quite apparent that recombinant DNA technology will assume an increasingly important role in diagnostic pathology, certainly in the analysis of single gene disorders, in the further understanding of neoplastic transformation, and, of course, in the wider diagnostic aspects of infectious, degenerative, and malignant diseases, which it has not been possible to cover in this short review.

References

1 Emery AEH. An introduction to recombinant DNA. Chichester: John Wiley and Sons, 1984.

2 Weatherall DJ. The new genetics and clinical practice. 2nd ed. Oxford: Oxford University Press, 1985.

3 Choi YD, Grabowski PJ, Sharp PA, Dreyfus G. Heterogeneous nuclear ribonucleoproteins: role in RNA splicing. Science 1986;231:1534-9.

4 Lewin B. Genes. New York: John Wiley and Sons, 1985.

5 Weatherall DJ, Wainscoat JS. The molecular pathology of thalassaemia. In: Hoffbrand AV, ed. Recent advances in haematology. 4th ed. Edinburgh: Churchill Livingstone, 1985:63-8.

6 Bunn HF, Forget BG. Hemoglobin: molecular, genetic and clinical aspects. Philadelphia: WB Saunders, 1986.

7 Cooper DN, Schmidtke J. Diagnosis of genetic disease using recombinant DNA. Human Genetics 1986;73:1-11.

8 Gusella JF. DNA polymorphism and human disease. Annual Review of Biochemistry 1986;55:831-54.

9 Orkin SH. Reverse genetics and human disease. Cell 1986;47:845-50.

10 Davies KE, Robson KJH. Molecular analysis of human monogenic disease. Bioessays 1987;6:247-53.

11 Vanin EF, Henthorn PS, Kioussis D, Grosveld F, Smithies O. Unexpected relationships between four large deletions in the human β-globin gene cluster. Cell 1983;35:701-9.

12 Nicholls RD, Fischel-Ghodsian N, Higgs DR. Recombination at the human α-globin gene cluster: sequence features and topological constraints. Cell 1987;49:369-78.

13 Nathans J, Piantanida TP, Eddy RL, Shows TB, Hogness DS. Molecular genetics of inherited variation in human color vision. Science 1986;232:203-10.

14 Wetzel R. What is protein engineering? Protein Engineering 1986;1:3-6.

15 Bentley AK, Rees DJG, Rizza C, Brownlee GG. Defective propeptide processing of blood clotting factor IX caused by a mutation of arginine to glutamine at position -4. Cell 1986;45:343-8.

16 Prchal JT, Cashman DP, Kan YW. Hemoglobin Long Island is caused by a single mutation (adenine to cytosine) resulting in a failure to cleave amino-terminal methionine. Proc Natl Acad Sci USA 1986;83:24-7.

17 Brown MS, Goldstein JL. A receptor-mediated pathway for cholesterol homeostasis. Science 1986;232:34-47.

18 Lawn RM. The molecular genetics of hemophilia: blood clotting factors VIII and IX. Cell 1985;42:405-6.

19 Vidaud M, Chabret C, Gazengel C, Grunebaum L, Cazenave JP, Goosens M. A de novo intragenic deletion of the potential EGF domain of the factor IX gene in a family with severe hemophilia B. Blood 1986;68:961-3.

20 Bishop JM. The molecular genetics of cancer. Science 1987;235:305-11.

21 Sykes B. The molecular genetics of collagen. Bioessays 1986;3:112-7.

22 Weatherall DJ. The regulation of the differential expression of the human globin genes during development. Journal of Cell Science 1986;4:319-36.

23 Leatherbarrow RJ, Fersht AR. Protein engineering. Protein Engineering 1986;1:7-16.

24 Williams DA, Orkin SH. Somatic gene therapy. Current status and future prospects. J Clin Invest 1986;77:1053-6.

Requests for reprints to: Sir David Weatherall, Nuffield Department of Clinical Medicine, John Radcliffe Hospital, Oxford OX3 9DU, England.