

Eur. J. Biochem. 268, 3709–3717 (2001) © FEBS 2001

Properties of the C-terminal domain of 4.1 proteins

Catherine Scott, Gareth W. Phillips and Anthony J. Baines

Department of Biosciences, University of Kent, Canterbury, Kent, UK

At the C-terminus of all known 4.1 proteins is a sequence domain unique to these proteins, known as the C-terminal domain (CTD). Mammalian CTDs are associated with a growing number of protein–protein interactions, although such activities have yet to be associated with invertebrate CTDs. Mammalian CTDs are generally defined by sequence alignment as encoded by exons 18–21. Comparison of known vertebrate 4.1 proteins with invertebrate (Caenorhabditis elegans and Drosophila melanogaster) 4.1 proteins indicates that mammalian 4.1 exon 19 represents a vertebrate adaptation that extends the sequence of the CTD with a Ser/Thr-rich sequence. The CTD was first described as a 22/24-kDa domain by chymotryptic digestion of erythrocyte 4.1 (4.1R) [Leto, T.L. & Marchesi, V.T. (1984) J. Biol. Chem. 259, 4603–4608]. Here we show that in 4.1R the 22/24-kDa fragment is not stable but is rapidly processed to a 15-kDa fragment by chymotrypsin. The 15-kDa fragment is extremely stable, being resistant to overnight digestion with chymotrypsin on ice. Analysis of this fragment indicates that it is derived from residues 709–858 (SwissProt accession no. P48193) and represents the CTD of 4.1R. The fragment behaves as a globular monomer in solution. Secondary-structure predictions indicate that this domain is composed of five or six β strands with an α helix before the most C-terminal of these. Together these data indicate that the CTD probably represents an independent folding structure which has gained function since the divergence of vertebrates from invertebrates.

Keywords: cytoskeleton; domain; mass spectrometry; protein 4.1.

The 4.1 proteins are a family of multifunctional cytoskeletal proteins with particular roles in the generation and maintenance of plasma membrane structure (reviewed in [1,2]). The prototypical 4.1 protein, now known as 4.1R, was defined in the human erythrocyte as an 80-kDa protein essential for normal cellular shape and integrity [3]. In this cell type, its activities are to promote binding between the two major cytoskeletal proteins, spectrin and actin [4,5], and to link the spectrin–actin meshwork to the plasma membrane by binding to the transmembrane protein glycophorin C and the membrane-associated guanylate kinase, p55 [6–8].

Mammals have four 4.1 genes, encoding 4.1R, 4.1G, 4.1N and 4.1B proteins [9], and all these seem to have analogous biochemical activities [10]. Genome sequencing reveals the 4.1 proteins to be animal proteins: they have not so far appeared in any nonanimal genome. The invertebrates Drosophila melanogaster and Caenorhabditis elegans each have one 4.1 gene. Of these, only the D. melanogaster 4.1 protein, coracle, has been analysed in detail.

Comparison of the sequences of the human, Xenopus laevis and D. melanogaster 4.1 proteins revealed two major regions of sequence conservation [11]. One region is now known as the membrane-binding or FERM domain [12,13]. The other is the C-terminal domain (CTD). Surprisingly, the spectrin-actin binding (SAB) domain, which is highly conserved between human and frog, is absent from the fly 4.1, possibly indicating that 4.1 proteins have gained function during evolution.

Coracle mutations that truncate the protein before the CTD are recessive embryonic lethal, indicating essential functions for this domain, although the CTD is not required for membrane targeting of coracle [11,14]. Coracle is required for the formation of epithelial septate junctions, which are often compared with vertebrate tight junctions [15]. The CTD of mammalian 4.1R binds ZO-2, a ubiquitous membrane-associated guanylate kinase component of tight junctions [16].

Mammalian CTD activities are not restricted to binding membrane proteins: 4.1R and 4.1N associate with mitotic spindles via interaction of the CTD with NuMA, the nuclear mitotic apparatus protein [17–21]. This contrasts strongly with the fly 4.1 protein, which seems to be restricted to membranes. The functions of 4.1 in cell division seem to have arisen after the divergence of invertebrates from vertebrates, as no coracle mutation has any general effect on mitosis [15].

All these data indicate that 4.1 CTD has a critical role in the function of 4.1 proteins; moreover, the CTD has gained function during evolution. It is surprising that there has been no clear definition of the CTD in biochemical terms. The homologous sequence between human and fly CTD is < 100 amino-acid residues, sufficient for it to form an independent folding domain. By sequence alignment, mammalian CTD is usually described as encoded by exons 18–21 of the mammalian 4.1R gene, and equivalent exons in other mammalian 4.1 genes [10]. Leto and Marchesi [22] defined a series of chymotryptic fragments spanning the length of 4.1R that provided the first views of the FERM and SAB domains (30-kDa and 8-kDa fragments, respectively). They also found a 22/24-kDa fragment that arose from the C-terminus. This is almost certainly larger than the region encoded by exons 18–21, as these encode only 16 kDa of polypeptide. The definition of the CTD so far is therefore clearly unsatisfactory.

Correspondence to A. J. Baines, Department of Biosciences, University of Kent, Canterbury, Kent CT2 7NJ, UK. Fax: +44 1227 763912, Tel.: +44 1227 823462, E-mail: [email protected]

Abbreviations: CTD, C-terminal domain; SAB, spectrin-actin binding; EST, expressed sequence tag.

(Received 5 March 2001, accepted 8 May 2001)

To address the biochemical nature of the CTD, we have compared the amino-acid sequences of all known 4.1 CTDs (identified by PSI-BLAST analysis [23] of the nonredundant protein database). Alignment of these sequences, and consideration of their exon structures, indicates that mammalian CTDs have gained an exon, equivalent to exon 19 in mammalian 4.1R. Chymotryptic digestion of both natural human erythrocyte 4.1R and recombinant mouse 4.1R yields a fragment that defines 4.1R CTD as residues 709 to the C-terminus. This includes the whole of exons 18–21, plus a small extension at the N-terminus. This seems to represent a true structural domain, given both its chymotrypsin resistance and its behaviour on gel filtration, which indicates it to be a globular monomer. Our data suggest that the CTD of 4.1 proteins represents a folding domain, the core of which is conserved in evolution, but which has gained both additional sequence and functions since the divergence of vertebrates and invertebrates.

MATERIALS AND METHODS

Sequences and sequence analysis

BLAST searches of nonredundant protein and nucleotide databases were made using the National Center for Biotechnology Information (NCBI) server (http://www.ncbi.nlm.nih.gov/BLAST/). Searches of human and C. elegans genomic sequences were made using BLAST servers at the Sanger Centre (http://www.sanger.ac.uk/HGP/blast_server.shtml and http://www.sanger.ac.uk/Projects/C_elegans/blast_server.shtml, respectively). Drosophila genomic BLAST searches were made using the Berkeley Drosophila Genome Project server (http://www.flybase.org/blast/).

Protein sequences were retrieved from the NCBI protein database (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein) and aligned using the GCG Wisconsin Package [24].

Secondary-structure predictions were made using the Jpred version 2 [25] server (http://jura.ebi.ac.uk:8888/). This method combines a number of prediction methods to form a consensus; the consensus prediction is shown in Fig. 3.
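Jpred's own consensus procedure is not reproduced here. As an illustration only, the minimal Python sketch below combines several per-residue predictions by simple majority vote, using E for β strand, H for α helix and - for coil, and calling a residue coil when no state wins outright. The voting rule and the input strings are assumptions for illustration, not Jpred output.

```python
from collections import Counter


def consensus(predictions: list[str], tie: str = "-") -> str:
    """Per-residue majority vote over equal-length secondary-structure strings.

    Illustrative rule (an assumption, not Jpred's algorithm): each column takes
    the most frequent state; a column with no unique winner gets `tie`.
    States: 'E' = beta strand, 'H' = alpha helix, '-' = coil.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must cover the same residues")

    result = []
    for column in zip(*predictions):
        ranked = Counter(column).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            result.append(tie)  # no majority: fall back to coil
        else:
            result.append(ranked[0][0])
    return "".join(result)


# Three hypothetical predictions for the same four residues.
print(consensus(["EEH-", "EE--", "E-H-"]))  # prints EEH-
```

Plain voting treats every method as equally reliable; it is only the simplest way to show what "forming a consensus" means.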

Proteins

Human erythrocyte 4.1 was prepared by the Tyler method [26] from outdated blood-bank blood. A construct of mouse 4.1R protein comprising the last seven residues encoded by exon 13 and the sequence encoded by exons 16, 17, 18, 19, 20 and 21, with a His tag to aid purification, was produced in bacteria as described previously [27]. For reference, the sequence of the construct is shown in Fig. 1. Residues Glu601–Trp606 (numbering relative to SwissProt P48193) are encoded by exon 13. The underlined sequence starting at residue 642 (beginning of exon 16) is the SAB domain defined by Correas et al. [28,29]. The Arg residue immediately C-terminal to the SAB is Arg709. Glu858 is the natural C-terminus of mouse 4.1R. The construct has the sequence LEHHHHHH fused to this residue.

Antibodies

Antibodies to a peptide encoded by exon 21 (anti-4.1RGB), and antibodies to 4.1R that react primarily with the SAB domain, were as described previously [27].

Chymotryptic digestion

Tos-Lys-CH2Cl-treated chymotrypsin and Tos-Phe-CH2Cl were obtained from Roche (Lewes, Sussex, UK). For chymotryptic digestion, native human erythrocyte 4.1R was treated as described by Leto and Marchesi [22], with chymotrypsin concentrations as noted in the figure legends. Recombinant mouse 4.1R fragment was treated with chymotrypsin as noted in the figure legends and Results section, in a solution consisting of 20 mM Na Hepes, pH 7.4, 10 mM MgCl2 and 0.5 mM CaCl2. Digestion was terminated by the addition of Tos-Phe-CH2Cl to 100 µM.

SDS/PAGE and blotting

The Laemmli method [30] was used for 1D SDS/PAGE. Natural human erythrocyte 4.1 or its fragments were analysed on 5–15% linear gradient polyacrylamide gels. Recombinant 4.1R fragments were analysed on 10–20% polyacrylamide gels. Proteins were transferred from 1D gels to nitrocellulose (Protran; Schleicher and Schuell) and probed with antibodies and ECL reagents (Amersham Pharmacia) as described by Scott et al. [27].

Gel filtration

A preparation of recombinant mouse 4.1R fragment (300 µg·mL⁻¹) was digested for 18 h on ice with chymotrypsin (0.8 µg·mL⁻¹) as described above. The reaction was terminated by the addition of Tos-Phe-CH2Cl, and the preparation was centrifuged at 48 000 g for 10 min at 4 °C. The mixture (0.2 mL) was applied to a Superose 12 HR 10/30 (Amersham Pharmacia) column equilibrated in 10 mM sodium phosphate (pH 7.4)/150 mM NaCl. The column was run at 0.8 mL·min⁻¹, and the elution profile was monitored at 280 nm. The presence of the 15-kDa 4.1R fragment in peak fractions was confirmed by SDS/PAGE. The column was calibrated with standard globular proteins obtained from Sigma: apoferritin (443 kDa); β-amylase (200 kDa); alcohol dehydrogenase (150 kDa); BSA (66 kDa); carbonic anhydrase (29 kDa); myoglobin (16.9 kDa); cytochrome c (12.4 kDa).

Fig. 1. Mouse 4.1R protein construct.

HPLC and MS

A chymotryptic digest was prepared as described for gel filtration, except that the reaction was terminated by the addition of trifluoroacetic acid to 0.2%. This mixture was centrifuged at 48 000 g for 10 min at 4 °C. The supernatant was separated by RP-HPLC on a 250 × 2.1 mm Vydac C18 column (The Separations Group, Hesperia, CA, USA) running on an HP 1100 HPLC system (Agilent Technologies, Palo Alto, CA, USA). The eluate was injected directly into an LCQ ion trap mass spectrometer (ThermoQuest, San Jose, CA, USA). Mass spectra were acquired with the ion trap operating in the normal mass mode and analysed using BioExplore to give peptide masses.

Amino-acid sequencing

An 18-h chymotryptic digest was separated by SDS/PAGE and transferred to poly(vinylidene difluoride) (PVDF) membrane (Millipore) [31]. The band was identified by Coomassie blue staining, excised, and subjected to N-terminal sequencing in an Applied Biosystems 492 sequencer.

RESULTS

Sequence alignment of 4.1 CTD reveals gain of sequence during evolution

To identify proteins containing the 4.1 CTD, PSI-BLAST [23] was used. The nonredundant NCBI protein database was subjected to this analysis using mouse 4.1R sequence as a prototypical CTD; the sequence used was encoded by exons 18–21 (exon definition as in [32], i.e. residues 723–858 in SwissProt P48193). The only proteins found were previously documented animal 4.1 proteins, plus four possible gene products in C. elegans (accession numbers T20771, T20772, T20773, T20774). All the proteins identified contained both FERM and CTD regions, as indicated in Fig. 2. As the relevant parts of the C. elegans hypothetical proteins are encoded by the same gene, we have chosen only one of these entries to represent the proteins we have found, T20772: this one has both FERM and SAB domains, plus some sequence with very limited sequence identity to the 4.1R U2 region. The existence of this CTD sequence is supported by expressed sequence tag (EST) analysis: C. elegans ESTs yk223a4.5 and yk279c6.5 cover residues 1–70 of the sequence in the alignment shown (no ESTs are available for the remaining sequence).

CTD sequences were aligned as shown in Fig. 3 using PSI-BLAST [23]; the alignment was refined using the GCG pileup and lineup programs to maximize sequence identities. This alignment indicates that the C-terminal half of the domain is well conserved between mammals and invertebrates, but the N-terminal half is much more weakly conserved. One possibility is that the N-terminal half confers functional specificity on individual 4.1 isoforms or gene products.
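The contrast between a well-conserved C-terminal half and a weakly conserved N-terminal half can be put in numbers by counting identities separately in each part of an alignment. A minimal sketch, assuming '-' as the gap character and scoring only columns where neither row has a gap (one of several common conventions); the two rows and the split column are hypothetical, not taken from Fig. 3.

```python
def identity(row_a: str, row_b: str, gap: str = "-") -> tuple[int, int]:
    """Return (identical, compared) over columns where neither row is a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    identical = compared = 0
    for x, y in zip(row_a, row_b):
        if x == gap or y == gap:
            continue  # gapped columns are not scored under this convention
        compared += 1
        if x == y:
            identical += 1
    return identical, compared


def split_identity(row_a: str, row_b: str, split: int, gap: str = "-"):
    """Identity before and after alignment column `split` (0-based)."""
    return (identity(row_a[:split], row_b[:split], gap),
            identity(row_a[split:], row_b[split:], gap))


# Hypothetical 12-column alignment: weak N-terminal half, strong C-terminal half.
vert = "SGTS--PKLRTE"
inv = "AGQ---PKLRTE"
print(split_identity(vert, inv, 6))  # prints ((1, 3), (6, 6))
```

Other conventions (for example dividing by the full window length, gaps included) give lower percentages, so any such figure should state which denominator was used.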

Using a yeast two-hybrid screen, Walensky and coworkers [33] described an interaction with the peptidyl-prolyl isomerase FKBP13 that required a particular Pro in mouse 4.1G (Pro967 in AAC40083; indicated with an asterisk in Fig. 3). Interestingly, this Pro is conserved in all the 4.1 proteins.

The alignment in Fig. 3 indicates that the vertebrate proteins have some sequence rich in Ser and Thr residues that is not obviously present in the invertebrate 4.1 proteins. The observation of this gap prompts the question: does it reflect a variation in genomic structure? To address this, the exons encoding fly and worm 4.1 proteins, human 4.1G and 4.1N were identified by comparison of their cDNA sequences with genomic sequence. The exon structure of mouse 4.1R has been published previously [32]. At the time of writing, the human 4.1B genomic sequence is not yet available. Superimposing the exon structure on the sequence alignment (Fig. 4) reveals that the gap in the sequence alignment between TITY…KTER in invertebrate 4.1 proteins is filled by sequence encoded almost entirely by exon 19 of mouse 4.1R, and its equivalent in human 4.1G. The gap is also covered by sequence encoded by a similar exon in 4.1N, but in this case the 5′ boundary of that exon aligns further towards the N-terminus of the CTD.
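Superimposing exon structure on an alignment, as in Fig. 4, amounts to walking along each aligned row and tagging every residue with the exon that encodes it. The sketch below assumes the exon start positions are already known in the protein's own residue numbering (for example from the cDNA versus genomic comparison described above); the row and the exon boundaries in the example are hypothetical.

```python
from bisect import bisect_right


def exon_track(aligned_row, exon_starts, first_residue=1, gap="-"):
    """Label each alignment column with the exon encoding its residue.

    exon_starts: (first_residue_number, exon_label) pairs sorted by residue
    number, in the numbering of the ungapped protein sequence.
    Gap columns, and residues before the first listed exon, get None.
    """
    starts = [start for start, _ in exon_starts]
    labels = []
    residue = first_residue
    for ch in aligned_row:
        if ch == gap:
            labels.append(None)
            continue
        # Index of the last exon that starts at or before this residue.
        idx = bisect_right(starts, residue) - 1
        labels.append(exon_starts[idx][1] if idx >= 0 else None)
        residue += 1
    return labels


# Hypothetical row: residues 10-11 from "exon 18", residues 12-13 from "exon 19".
print(exon_track("AB--CD", [(10, "18"), (12, "19")], first_residue=10))
# prints ['18', '18', None, None, '19', '19']
```

With one track per row, columns where the invertebrate rows are gaps while the vertebrate rows are tagged with exon 19 would mark the inserted Ser/Thr-rich segment.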

The junction between the two 3′ exons (encoding the most similar regions of all 4.1 proteins) lies in a similar position in all these genes, indicating strong conservation of both gene and protein in evolution. However, whereas the vertebrate CTD appears to be composed of no fewer than four exons (five in the case of 4.1N), the invertebrates have only three; note how three invertebrate exons more than span the entire CTD.

Fig. 2. Domain structures of representative 4.1 proteins. Known vertebrate 4.1 proteins have a domain structure arranged around three conserved modules: a FERM domain, a SAB domain and the CTD. The conserved domains are joined by regions of unconserved sequence (regions U1–U3). The invertebrate 4.1 proteins from D. melanogaster and C. elegans also have FERM and CTD regions. Differential mRNA splicing generates many isoforms of the proteins, e.g. the 80-kDa erythrocyte 4.1R, and the forms encoded by D. melanogaster coracle cDNAs 1 and 3. The SAB is not obviously present in either invertebrate protein. As FERM domains are found in several other animal proteins, the CTD is the only domain unique to all 4.1 proteins.

All these data are consistent with the view that the critical functions common to all CTDs are encoded in the two most 3′ exons in the gene. Exon 19 of vertebrate 4.1 proteins is possibly associated with functions or elements of CTD structure not required in the invertebrates.

A stable fragment represents the C-terminal domain of 4.1R

Leto and Marchesi [22] described a 22/24-kDa fragment of 4.1R that was close to the C-terminus. At the time, the sequence of 4.1R had not been determined, so they could not relate this fragment to sequence. As exons 18–21 encode only 16 kDa of polypeptide, we sought to establish whether the CTD was accurately represented by the 22/24-kDa fragment. In initial experiments in which native human 4.1R was digested with chymotrypsin, we found that the 22/24-kDa fragment was transient, but that a 15-kDa fragment, which appeared after formation of the 22/24-kDa fragment, was more stable. Lanes 1–3 of Fig. 5 show erythrocyte ghosts, an undigested preparation of 4.1, and a representative 4.1R digest. Note that the FERM domain (30 kDa), the 16-kDa U2 region and the 8-kDa SAB are all represented. A 15-kDa band is present, but there is no obvious 22/24-kDa band.

Fig. 3. Multiple sequence alignment of 4.1 CTDs. Peptide sequences containing the 4.1 CTD were identified by PSI-BLAST analysis of the NCBI protein database. Sequences identified in this way were aligned using GCG pileup, and the alignment was refined using lineup. The species of origin are indicated as h (human), m (mouse), r (rat), b (cow), x (Xenopus), dm (D. melanogaster), ce (C. elegans). Sequences shown start and end at the following residues in the original database entries: h41R.pep (NCBI protein database entry pir:MMHUE4), residues 700–850; ce41.pep, residues 4516–4667 (accession number T20772); m41B.pep, residues 773–929 (accession number NP_038841.1); m41N.pep, residues 720–879 (accession number NP_038538.1); h41G.pep, residues 853–1005 (accession number NP_001422.1); m41R.pep, residues 708–858 (accession number P48193); dm41.pep, residues 551–703 (accession number AAF57593.1); h41N.pep, residues 720–879 (accession number CAC09920.1; KIAA0338); r41B.pep, residues 807–963 (accession number BAA90775.1); b41R.pep, residues 467–617 (accession number AAF61703.1); r41N.pep, residues 720–879 (accession number NP_067713.1); x41.pep, residues 652–801 (accession number P11434); h41B.pep, residues 959–1115 (accession number BAA76831.1); m41G.pep, residues 836–988 (accession number AAC40083.1). This alignment was submitted to Jpred for prediction of secondary structure [25]: predicted consensus β strands are shown as arrows; the predicted α helix is shown as a cylinder. A Pro residue that is a substrate for peptidyl-prolyl isomerase in 4.1G [33] is indicated by an asterisk: note that this is conserved in all 4.1 proteins.

To probe this further, we took advantage of a recombinant construct of mouse 4.1R that contains the SAB domain (exon 16 and the first half of exon 17), U3 (conventionally the second half of exon 17) plus the CTD (conventionally exons 18–21). This is much easier to prepare than native 4.1, and has the added advantage that no 16-kDa band (the complete U2 region) should be formed on chymotryptic digestion. This construct retains the spectrin–actin binding activity of natural 4.1R [27], and reacts with antibodies both to the SAB and to a peptide encoded by exon 21 (Fig. 5, lanes 4–6). Chymotryptic digestion of this construct resulted in the rapid formation of 24-kDa and 8-kDa fragments (Fig. 5, lanes 7–9). The 8-kDa fragment represented the SAB, as it was immunoreactive with antibodies that recognize this domain (lanes 13–15). Neither the 24-kDa fragment nor the 8-kDa fragment was stable. Immunoreactivity for the SAB was rapidly lost (lanes 14, 15). A 15-kDa fragment that was formed after the first appearance of the 24-kDa fragment retained immunoreactivity with antibodies to a peptide encoded by exon 21 (lanes 10–12). Thus the 24-kDa fragment is rapidly processed to a 15-kDa fragment which is extremely stable by comparison: 18-h digests of the recombinant construct retained the 15-kDa fragment (Fig. 5, lane 16). Such insensitivity to a comparatively nonspecific protease is commonly associated with a folded structure [34]. The total yield of 15-kDa fragment in an 18-h digest was between 45% and 60% of the theoretical maximum in different experiments.
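The enzyme-to-substrate ratios and the yield figure above are simple mass arithmetic. In the sketch below, the substrate and enzyme concentrations 300 and 0.8 (in the same units) are those given for the 18-h digest under Gel filtration, and they reproduce the 1 : 375 ratio quoted in the Fig. 5 legend for the 18-h digest. A theoretical maximum yield assumes each construct molecule gives at most one CTD fragment, so it scales with the ratio of fragment to construct molecular mass; the construct mass is not stated in this section, so the yield example uses placeholder numbers only.

```python
def substrate_per_enzyme(substrate_conc: float, enzyme_conc: float) -> float:
    """w/w ratio of substrate to protease, i.e. the N in a '1 : N' digest."""
    return substrate_conc / enzyme_conc


def percent_of_theoretical(fragment_mass_obtained: float,
                           substrate_mass: float,
                           fragment_mr: float,
                           substrate_mr: float) -> float:
    """Yield as % of the maximum if every substrate molecule gave one fragment."""
    theoretical = substrate_mass * fragment_mr / substrate_mr
    return 100.0 * fragment_mass_obtained / theoretical


print(f"1 : {substrate_per_enzyme(300, 0.8):.0f}")  # 18-h digest conditions
# PLACEHOLDER values, not from the paper: 20 ug of a 30 kDa construct,
# 5 ug of a 15 kDa fragment recovered.
print(percent_of_theoretical(5.0, 20.0, 15_000, 30_000))
```

Expressing yield against the molar maximum rather than the starting mass avoids under-reporting, since the fragment is always lighter than the construct it comes from.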

An 18-h chymotryptic digest of the 4.1R construct was subjected to analytical gel filtration (data not shown). The 15-kDa fragment was eluted at Kav 0.333. By comparison with standard proteins, this corresponds to an apparent molecular mass of 18 kDa, indicating that the fragment is monomeric.
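Kav is conventionally calculated as (Ve - V0)/(Vt - V0), where Ve is the elution volume of the protein, V0 the void volume and Vt the total bed volume, and an apparent molecular mass is read off an approximately linear plot of Kav against log10 Mr for globular standards. The column volumes and the Kav values of the standards are not given here, so the sketch below fits that line to synthetic standards generated from an assumed line; it shows the procedure only, not the calibration that gave 18 kDa.

```python
import math


def kav(ve: float, v0: float, vt: float) -> float:
    """Partition coefficient Kav = (Ve - V0) / (Vt - V0)."""
    return (ve - v0) / (vt - v0)


def fit_calibration(standards):
    """Least-squares fit of Kav = slope * log10(Mr) + intercept.

    standards: iterable of (mr, kav) pairs for globular marker proteins.
    """
    xs = [math.log10(mr) for mr, _ in standards]
    ys = [k for _, k in standards]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def apparent_mr(k: float, slope: float, intercept: float) -> float:
    """Invert the calibration line to get an apparent Mr from a Kav value."""
    return 10 ** ((k - intercept) / slope)


# Marker masses from the Methods; the Kav values come from an ASSUMED line
# (Kav = 2.1 - 0.35 * log10 Mr), not from the paper's column.
masses = [443_000, 200_000, 150_000, 66_000, 29_000, 16_900, 12_400]
standards = [(m, 2.1 - 0.35 * math.log10(m)) for m in masses]
slope, intercept = fit_calibration(standards)
print(round(apparent_mr(standards[3][1], slope, intercept)))  # about 66000 (BSA round trip)
```

With real data, the Kav of each marker would come from `kav()` applied to measured elution volumes, and only markers inside the column's fractionation range would be used for the fit.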

Identification of residues encoding the chymotrypsin-insensitive 4.1R CTD

To determine the origin of the 15-kDa band, it was subjected to both N-terminal amino-acid sequencing and MS.

The N-terminal sequence was RTLNI, i.e. residues 709–713 in SwissProt P48193 (cleavage after Phe708). The digest was also separated by RP-HPLC (Fig. 6). When the HPLC eluate was analysed by electrospray MS, the major peak by both UV absorbance and ion flux was dominated by a species of mass 17 199.3 Da. This is a good fit with the theoretical mass (17 199.1 Da) of residues 709 to the C-terminus of our construct (see Materials and methods). A lower-abundance species in this HPLC peak, at 11 555.1 Da, indicated a degree of cleavage after Tyr761. Interestingly, only trace amounts of a fragment of mass 8045 Da, the intact SAB, were found in the HPLC profile (one of three peptides in the 40.55-min UV peak). This was in agreement with immunoblot analysis, in which the SAB appeared to be degraded by chymotrypsin whereas the CTD was retained. Several fragments detected in the HPLC profile indicated that the digestion conditions used resulted in cleavage on the C-terminal side of Leu, in addition to Phe, Trp and Tyr (e.g., among others, a peptide of mass 2580.9 Da eluted at 34.87 min, uniquely corresponding to residues 661–682 in the SAB and resulting from cleavage after Leu660 and Leu682). This is an important observation because there are seven Leu residues between residue 709 and the C-terminus, and none of these was detectably cut. Again, this is consistent with residue 709 to the C-terminus forming a folded structure.
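Matching observed masses to a sequence is usually done against an in silico digest. The sketch below assumes a simplified chymotrypsin rule (cleave after Phe, Trp or Tyr, optionally also after Leu as observed under these conditions, but never before Pro) and sums commonly tabulated average residue masses plus one water per peptide. Real chymotrypsin preferences are more nuanced, the construct sequence of Fig. 1 is not reproduced here, and the example peptides are hypothetical, so this does not recompute the 17 199.1 Da value. In an unfolded chain, each Leu between residue 709 and the C-terminus would be a candidate site under the Leu-inclusive rule, which is what makes their protection informative.

```python
# Average residue masses in Da (amino acid minus H2O), commonly tabulated values.
AVG_RESIDUE = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # one H2O for the free termini of a linear peptide


def cleavage_sites(seq: str, include_leu: bool = False) -> list[int]:
    """Indices where a new fragment starts, under the SIMPLIFIED rule above."""
    targets = set("FWY") | ({"L"} if include_leu else set())
    return [i + 1 for i in range(len(seq) - 1)
            if seq[i] in targets and seq[i + 1] != "P"]


def digest(seq: str, include_leu: bool = False) -> list[str]:
    """Split seq at every predicted site (a complete, limit digest)."""
    cuts = [0] + cleavage_sites(seq, include_leu) + [len(seq)]
    return [seq[a:b] for a, b in zip(cuts, cuts[1:])]


def average_mass(peptide: str) -> float:
    """Average (not monoisotopic) mass of an unmodified linear peptide."""
    return sum(AVG_RESIDUE[aa] for aa in peptide) + WATER


# Hypothetical peptide: the F-P bond is protected by the Pro rule.
for frag in digest("AFPGWKLLY", include_leu=True):
    print(frag, round(average_mass(frag), 2))
```

A Phe followed by Arg, as at Phe708-Arg709, is a predicted site under this rule, which fits the RTLNI N-terminus.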

The measured mass of the CTD fragment (17 199 Da) is extremely close to the 18 kDa estimated by gel filtration. As the molecular mass standards for gel filtration were all globular, this is a good indication that the fragment is a globular monomer.

If the CTD fragment is folded as a globular structure, it should be possible to obtain CD spectra indicative of this. Attempts to obtain the fragment at high enough concentration for CD were not successful. At concentrations above 20 µg·mL⁻¹, the fragment aggregated and precipitated, and we were unable to obtain useful spectra from the more dilute material that could be obtained.

Fig. 4. The vertebrate-specific region of 4.1 CTD sequences arises from use of an additional exon. Comparison of genomic structures encoding CTD regions between m4.1R, h4.1N, h4.1G, dm41 and ce41 indicates that the additional amino-acid sequence present in vertebrate 4.1 CTD is encoded by an exon not used in the fly. Exons were assigned by comparison of both amino-acid and cDNA sequences with genomic DNA sequences using tblastn and blastn, respectively. Alignments of amino-acid sequences are shown, together with different colours for sequence encoded by each exon as follows: green, exon 18; blue, exon 19; pink, exon 20; black, exon 21. The additional sequence in mammalian 4.1 proteins is encoded by one exon, equivalent to exon 19 in mouse 4.1R (highlighted in blue). The D. melanogaster sequence resumes at the following exon, which starts immediately after the inserted sequence.

DISCUSSION

It is more than a decade and a half since a protease-resistant fragment derived from the C-terminal region of 4.1R was first described [22]. Since then, the term C-terminal domain has come to describe a conserved region of sequence present at the C-terminus of all 4.1 proteins (see Fig. 2). This region appears to provide a ready means of defining 4.1 proteins in all metazoans using PSI-BLAST analysis (Figs 2 and 3). The only other domain present in all 4.1 proteins is the FERM domain; however, differential splicing can delete the FERM domain from 4.1 mRNA [35]. In analysing novel invertebrate cDNAs, it will be important to have clear definitions of the CTD to identify true representatives of the 4.1 family.

The alignments we present in this paper (Figs 3 and 4) give the first comprehensive analysis of CTD sequences that includes two invertebrates and takes advantage of knowing the exon structures. Our alignment of the mammalian CTDs gives an interpretation that differs in some details from that presented by others. In particular, Parra et al. [10] interpreted the N-terminus of the 4.1N CTD to be the pair of prolines at residues 723/724 in mouse 4.1N. Our alignment does not agree with this, and achieves a higher degree of sequence identity in the N-terminus of the domain between 4.1N and 4.1R. Parra et al. [10] aligned their sequences on the basis of a pair of Pro residues encoded by the 5′ end of exon 18 in 4.1R. As we indicate in Fig. 4, this is the case for both 4.1R and 4.1G, but the beginning of the equivalent exon in 4.1N does not encode such a pair of Pro residues. There is therefore no compulsion to assign the start of the domain in this way. On the other hand, this does highlight the high degree of sequence variation in the N-terminus of the CTD. Whereas the region encoded by the two 3′ exons is very conserved between invertebrates and vertebrates, the remainder of the CTD varies substantially. The sequence alignment we show in Fig. 3 is optimized for sequence identities, but high-resolution structures of CTDs will be required to confirm an alignment in which only 12 of the first 50 residues are identical between human and invertebrate 4.1 proteins. It may prove to be the case that the conserved C-terminal portion of the CTD forms an independent folding structure in its own right, and that the N-terminal portions show isotype-specific folding variation. The striking conservation of the most C-terminal portion is illustrated by the observation that a Pro that interacts with FKBP13 is conserved from worm to man (Fig. 3).

The 4.1 proteins appear to have gained both structure and function during evolution. The SAB is a vertebrate adaptation [11] and, interestingly, is itself subject to differential splicing, such that 4.1R only ‘gains’ high-affinity spectrin–actin binding during late stages of erythrocyte differentiation (reviewed in [36]). The evolutionary gain of the SAB appears to be paralleled by the gain of exon 19 in the CTD. In the protein sequence alignment of the CTDs (Fig. 3), a large gap is inserted in the invertebrate sequences. The sequence that fills this gap in the vertebrates is provided by exon 19 in 4.1R, and by equivalent exons in 4.1N and 4.1G. An obvious conjecture arising from this is that exon 19 encodes an activity not required in invertebrates. At present,

Fig. 5. The CTD is a chymotrypsin-resistant 15-kDa fragment. Analysis by SDS/PAGE of chymotryptic digests of either natural human erythrocyte 4.1R (lanes 1–3; 5–15% gel; Coomassie stain) or a recombinant mouse 4.1R fragment (lanes 4–16; 10–20% gel). Lane 1, human erythrocyte ghosts; lane 2, human erythrocyte 4.1R; lane 3, chymotryptic digest of 4.1R (30 min on ice; ratio of chymotrypsin to 4.1, 1 : 200, w/w). Note that the expected 30-kDa (FERM domain), 16-kDa (U2 region) and 8-kDa (SAB domain) bands are detectable, but no 24-kDa band is found. A 15-kDa band is visible below the 16-kDa band. The absence of the 24-kDa band possibly indicates that it is further proteolysed to the 15-kDa band. This suggestion is supported by more detailed analysis of the recombinant mouse 4.1R construct. Undigested mouse 4.1R construct is shown in lanes 4–6: lane 4, stained with Coomassie blue; lane 5, immunoblotted with anti-(4.1 exon 21 epitope); lane 6, immunoblotted for the SAB domain. The construct was digested with chymotrypsin (ratio 1 : 100, w/w) for 30, 60 and 90 min at 0 °C: lanes 7–9, Coomassie blue; lanes 10–12, anti-(exon 21); lanes 13–15, anti-SAB. Note that 24-kDa and 15-kDa bands are formed that retain the exon 21 epitope. The 15-kDa band does not contain the SAB epitope, which is found, as expected, in an 8-kDa fragment. The 15-kDa fragment is relatively stable to chymotrypsin (lanes 11, 12) compared with the 8-kDa fragment (lanes 14, 15). An 18-h digestion (ratio 1 : 375, w/w; lane 16) leaves the 15-kDa band with no 24-kDa band detectable. Some additional lower-molecular-mass bands are also visible.
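Chymotrypsin cleaves preferentially on the C-terminal side of large aromatic residues (Phe, Tyr, Trp), and usually not when the next residue is Pro. The sketch below lists candidate bonds in a sequence under that simplified rule; it ignores the slower cleavage after Leu and Met and knows nothing about folding. It is useful here mainly as a contrast: a primary sequence can contain candidate bonds that a folded domain protects, which is what the resistance of the 15-kDa fragment suggests. The function names and the toy sequence are hypothetical.

```python
from typing import FrozenSet, List

# Simplified P1 specificity (assumption): Phe, Tyr, Trp; no cleavage before Pro.
CHYMOTRYPSIN_P1: FrozenSet[str] = frozenset("FYW")


def chymotrypsin_sites(seq: str,
                       p1: FrozenSet[str] = CHYMOTRYPSIN_P1) -> List[int]:
    """Return 1-based positions i such that the bond between residue i and
    residue i+1 is a candidate site under the simplified rule."""
    seq = seq.upper()
    return [i + 1 for i in range(len(seq) - 1)
            if seq[i] in p1 and seq[i + 1] != "P"]


def digest(seq: str, sites: List[int]) -> List[str]:
    """Cut seq after each 1-based position in sites (a complete digest)."""
    pieces, start = [], 0
    for site in sorted(sites):
        pieces.append(seq[start:site])
        start = site
    pieces.append(seq[start:])
    return pieces


toy = "AYEKWPLFG"  # hypothetical sequence
sites = chymotrypsin_sites(toy)
print(sites)               # [2, 8]
print(digest(toy, sites))  # ['AY', 'EKWPLF', 'G']
```

Positions are relative to the string passed in; to express them in full-length coordinates one would add the residue number of the first character minus one.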


Fig. 6. MS characterization of the C-terminal fragment from 4.1R. An overnight digest of the recombinant m4.1R SAB–CTD construct was analysed by HPLC/electrospray MS. The digest was separated on an RP-HPLC C18 column, and eluted peptides were injected directly into an electrospray mass spectrometer. (A) Peptide elution profile as total ion abundance in the spectrometer (upper) and analog UV absorbance at 214 nm (lower). Ions appearing at 42.4–43.0 min were analysed and are shown in (B). The major peak at 17 199.3 Da corresponds to sequence commencing at residue 709 in SwissProt entry P48193 and extending to the C-terminus of the His-tagged construct.
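As a rough consistency check on this assignment, a measured mass can be converted into an approximate chain length using an average residue mass. The sketch assumes the common rule-of-thumb value of about 110 Da per residue and adds one water for the free termini; both constants are approximations, not values from this study, and the true average depends on composition.

```python
AVG_RESIDUE_DA = 110.0  # rule-of-thumb average residue mass (assumption)
WATER_DA = 18.02        # one H2O for the free N- and C-termini (average mass)


def estimate_residue_count(mass_da: float,
                           avg_residue_da: float = AVG_RESIDUE_DA) -> float:
    """Approximate number of residues in a polypeptide of the given mass."""
    return (mass_da - WATER_DA) / avg_residue_da


def span_length(first: int, last: int) -> int:
    """Number of residues in an inclusive span, e.g. 709..858."""
    return last - first + 1


estimate = estimate_residue_count(17199.3)
ctd = span_length(709, 858)
print(f"~{estimate:.0f} residues estimated; residues 709-858 = {ctd}")
# ~156 residues estimated; residues 709-858 = 150
```

The excess of roughly half a dozen residues over the 150 native residues is of the size one might expect if the fragment also carries construct-derived residues such as the His tag, although a check this coarse cannot distinguish that from ordinary error in the 110-Da average.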


however, it is not possible to speculate adequately on what this might be. The interactions of 4.1 with most of the protein ligands that have been studied in greatest detail so far [16,20,21,37] do not indicate selective involvement of sequence encoded by this exon.
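The vertebrate-specific insertion shows up in a multiple alignment as a long run of gap characters in the invertebrate rows. A minimal sketch for locating such runs and reading off the vertebrate residues that fill them is given below; the toy rows, the '-' gap convention and the minimum run length are assumptions for illustration, not the actual Fig. 3 alignment.

```python
import re
from typing import List, Tuple


def gap_runs(row: str, min_len: int = 5) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges (0-based, end exclusive) of runs
    of '-' in an alignment row that are at least min_len long."""
    return [(m.start(), m.end()) for m in re.finditer(r"-+", row)
            if m.end() - m.start() >= min_len]


def insertion_segments(vertebrate: str, invertebrate: str,
                       min_len: int = 5) -> List[str]:
    """Vertebrate residues aligned opposite long invertebrate gaps."""
    if len(vertebrate) != len(invertebrate):
        raise ValueError("aligned rows must have the same length")
    return [vertebrate[s:e].replace("-", "")
            for s, e in gap_runs(invertebrate, min_len)]


vert = "MKTSSTSTSPLVEQRS"  # hypothetical vertebrate row
invt = "MKT-------LVE-RS"  # hypothetical invertebrate row
print(gap_runs(invt))                  # [(3, 10)]
print(insertion_segments(vert, invt))  # ['SSTSTSP']
```

The min_len threshold separates a genuine insertion from the scattered single-column gaps that any alignment contains; its value is a judgment call.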

In their assignment of domains to the first 4.1R sequence, Conboy et al. [38] suggested that the CTD comprised the most C-terminal 117 residues, and commented that this would be surprisingly small to be represented by a 22/24-kDa fragment. Since then, the definition of the CTD has been revised to include all homologous sequence at the C-termini of vertebrate 4.1 proteins. Our data support this definition and provide the first experimental evidence that the homologous region forms a folded structure resistant to chymotrypsin. A fragment that runs on SDS/polyacrylamide gels at 15 kDa, with a true mass of 17 199 Da, is resistant to chymotrypsin treatment on ice overnight (Figs 5 and 6); it derives from residue 709 to the C-terminus, i.e. it encompasses the sequence encoded by exons 18–21, plus a few residues N-terminal to this. We did note some additional cleavage between Tyr761 and Glu762, yielding a C-terminal fragment of 11 555.1 Da (Fig. 6). Interestingly, this fragment represents the most conserved region of all. No cleavage products of it were noted, suggesting that it may represent a subsidiary folded region.
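The two masses quoted above can be cross-checked with simple arithmetic. Because both fragments run to the same C-terminus and each carries one water for its free termini, their difference should equal the summed residue masses of the segment removed by the Tyr761–Glu762 cleavage, residues 709–761. A minimal sketch, treating the reported values as average masses (an assumption); the function name is hypothetical.

```python
from typing import Tuple


def lost_segment(full_da: float, c_terminal_da: float,
                 first_res: int, last_res: int) -> Tuple[float, int, float]:
    """Residue-mass sum, residue count and mean residue mass of the
    N-terminal segment first_res..last_res removed by one internal cleavage.

    Both fragments share the C-terminus and each includes one water, so the
    water terms cancel in the difference.
    """
    residue_sum = full_da - c_terminal_da
    n_residues = last_res - first_res + 1
    return residue_sum, n_residues, residue_sum / n_residues


total, n, mean = lost_segment(17199.3, 11555.1, 709, 761)
print(f"{total:.1f} Da over {n} residues = {mean:.1f} Da per residue")
# 5644.2 Da over 53 residues = 106.5 Da per residue
```

A mean of about 106 Da per residue is close to the typical protein average of roughly 110 Da, so the two masses are consistent with the proposed boundaries. The check is coarse: shifting the boundary by one residue changes the mean by only about 2 Da, which is well within normal compositional variation.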

The 17 199-Da fragment has a relative molecular mass on gel filtration of 18 kDa. As the column was calibrated with a series of globular proteins, the close agreement between the true and observed masses probably indicates that this fragment represents a globular structure. We were unable to obtain high enough concentrations of the fragment for CD analysis, but secondary-structure predictions indicate a series of β strands (five or six), with an α helix before the most C-terminal of these. The prediction method used here was Jpred 2, which applies several high-quality modern prediction methods to the overall sequence alignment [25]. The consensus prediction from these methods is shown in Fig. 3. The additional, vertebrate-specific sequence is predicted to contribute β strand(s) to the final structure, which evidently cannot be present in the invertebrates. It seems unlikely that this sequence forms a loop out from a core folded structure, because we obtained no evidence that this region is especially protease sensitive (although there was some cleavage at Tyr761, which marks the start of the vertebrate-specific sequence). Other globular structures with this pattern of secondary structure include pertussis toxin subunit D (PDB entry 1PRT), the C-terminal dimerization domain of FAD/NAD-linked reductases (e.g. 1FCD) and members of the pleckstrin homology (PH) domain superfamily (e.g. 1PLS), all of which are ligand-binding structures.
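The gel-filtration argument rests on a standard calibration in which log10 of molecular mass is, to a good approximation, linear in the partition coefficient Kav = (Ve − V0)/(Vt − V0), where Ve is the elution volume, V0 the void volume and Vt the total bed volume. The sketch fits that line by least squares and reads off an apparent mass. The calibration points and column volumes below are synthetic numbers, not the standards used in this study. Because the standards are globular, agreement between apparent and true mass is what supports the conclusion that the fragment is compact.

```python
import math
from typing import List, Tuple


def kav(ve: float, v0: float, vt: float) -> float:
    """Partition coefficient from elution (ve), void (v0) and total (vt) volumes."""
    return (ve - v0) / (vt - v0)


def fit_calibration(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of log10(mass) = intercept + slope * Kav.

    points: (Kav, mass in Da) for globular standards. Returns (intercept, slope).
    """
    xs = [k for k, _ in points]
    ys = [math.log10(m) for _, m in points]
    n = len(points)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def apparent_mass(k: float, intercept: float, slope: float) -> float:
    """Mass (Da) predicted by the calibration line for a given Kav."""
    return 10 ** (intercept + slope * k)


# Synthetic standards lying exactly on log10(M) = 6 - 3 * Kav (illustrative only).
standards = [(0.1, 10 ** 5.7), (0.3, 10 ** 5.1), (0.5, 10 ** 4.5), (0.7, 10 ** 3.9)]
b0, b1 = fit_calibration(standards)
k_unknown = kav(ve=15.0, v0=8.0, vt=22.0)  # hypothetical volumes (ml) -> Kav = 0.5
print(round(apparent_mass(k_unknown, b0, b1)))  # 31623

# Ratio of the reported gel-filtration mass to the MS mass of the CTD fragment.
print(f"{18000 / 17199:.2f}")  # 1.05
```

An elongated or dimeric species would typically give an apparent mass well above the true mass, whereas a ratio near 1 is what a compact monomer is expected to give.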

A further intriguing aspect of CTD structure is that each of the exons encoding the 4.1R CTD is known to be differentially spliced [36]. Examination of the EST database indicates that, in all mammalian CTDs, at least one exon can be differentially spliced (not shown). How this relates to the folding of the structure and its function will be an important topic for future investigation.

ACKNOWLEDGEMENTS

This work was supported by a grant to A. J. B. from the Biotechnology and Biological Sciences Research Council. C. S. held a BBSRC Special Studentship. We thank Kevin Howland and Judy Hardy (Wellcome Trust funded Protein Science Facility) for their help with MS and protein sequencing. We thank Dr Jennifer Pinder for constructive comments on the manuscript.

REFERENCES

1. Gascard, P. & Mohandas, N. (2000) New insights into functions of erythroid proteins in nonerythroid cells. Curr. Opin. Hematol. 7, 123–129.

2. Hoover, K.B. & Bryant, P.J. (2000) The genetics of the protein 4.1 family: organizers of the membrane and cytoskeleton. Curr. Opin. Cell Biol. 12, 229–234.

3. Shi, Z.T., Afzal, V., Coller, B., Patel, D., Chasis, J.A., Parra, M., Lee, G., Paszty, C., Stevens, M., Walensky, L., Peters, L.L., Mohandas, N., Rubin, E. & Conboy, J.G. (1999) Protein 4.1R-deficient mice are viable but have erythroid membrane skeleton abnormalities. J. Clin. Invest. 103, 331–340.

4. Ohanian, V., Wolfe, L.C., John, K.M., Pinder, J.C., Lux, S.E. & Gratzer, W.B. (1984) Analysis of the ternary interaction of the red cell membrane skeletal proteins spectrin, actin, and 4.1. Biochemistry 23, 4416–4420.

5. Ungewickell, E., Bennett, P.M., Calvert, R., Ohanian, V. & Gratzer, W.B. (1979) In vitro formation of a complex between cytoskeleton proteins of the human erythrocyte. Nature (London) 280, 811–814.

6. Marfatia, S.M., Lue, R.A., Branton, D. & Chishti, A.H. (1994) In vitro binding-studies suggest a membrane-associated complex between erythroid P55, protein-4.1, and glycophorin-C. J. Biol. Chem. 269, 8631–8634.

7. Pinder, J.C., Chung, A., Reid, M.E. & Gratzer, W.B. (1993) Membrane attachment sites for the membrane cytoskeletal protein 4.1 of the red blood cell. Blood 82, 3482–3488.

8. Hemming, N.J., Anstee, D.J., Staricoff, M.A., Tanner, M.J.A. & Mohandas, N. (1995) Identification of the membrane attachment sites for protein-4.1 in the human erythrocyte. J. Biol. Chem. 270, 5360–5366.

9. Peters, L.L., Weier, H.U., Walensky, L.D., Snyder, S.H., Parra, M., Mohandas, N. & Conboy, J.G. (1998) Four paralogous protein 4.1 genes map to distinct chromosomes in mouse and human. Genomics 54, 348–350.

10. Parra, M., Gascard, P., Walensky, L.D., Gimm, J.A., Blackshaw, S., Chan, N., Takakuwa, Y., Berger, T., Lee, G., Chasis, J.A., Snyder, S.H., Mohandas, N. & Conboy, J.G. (2000) Molecular and functional characterization of protein 4.1B, a novel member of the protein 4.1 family with high level, focal expression in brain. J. Biol. Chem. 275, 3247–3255.

11. Fehon, R.G., Dawson, I.A. & Artavanis-Tsakonas, S. (1994) A Drosophila homolog of membrane-skeleton protein-4.1 is associated with septate junctions and is encoded by the coracle gene. Development 120, 545–557.

12. Chishti, A.H., Kim, A.C., Marfatia, S.M., Lutchman, M., Hanspal, M., Jindal, H., Liu, S.C., Low, P.S., Rouleau, G.A., Mohandas, N., Chasis, J.A., Conboy, J.G., Gascard, P., Takakuwa, Y., Huang, S.C., Benz, E.J., Bretscher, A., Fehon, R.G., Gusella, A.F., Ramesh, V., Solomon, F., Marchesi, V.T., Tsukita, S., Arpin, M., Louvard, D., Tonks, N.K., Anderson, J.M., Fanning, A.S., Bryant, P.J., Woods, D.F. & Hoover, K.B. (1998) The FERM domain: a unique module involved in the linkage of cytoplasmic proteins to the membrane. Trends Biochem. Sci. 23, 281–282.

13. Han, B.G., Nunomura, W., Takakuwa, Y., Mohandas, N. & Jap, B.K. (2000) Protein 4.1R core domain structure and insights into regulation of cytoskeletal organization. Nat. Struct. Biol. 7, 871–875.

14. Ward, R.E., Lamb, R.S. & Fehon, R.G. (1998) A conserved functional domain of Drosophila coracle is required for localization at the septate junction and has membrane-organizing activity. J. Cell Biol. 140, 1463–1473.

15. Lamb, R.S., Ward, R.E., Schweizer, L. & Fehon, R.G. (1998) Drosophila coracle, a member of the protein 4.1 superfamily, has essential structural functions in the septate junctions and developmental functions in embryonic and adult epithelial cells. Mol. Biol. Cell 9, 3505–3519.

16. Mattagajasingh, S.N., Huang, S.C., Hartenstein, J.S. & Benz, E.J. Jr (2000) Characterization of the interaction between protein 4.1R and ZO-2. A possible link between the tight junction and the actin cytoskeleton. J. Biol. Chem. 275, 30573–30585.

17. Decarcer, G., Lallena, W.J. & Correas, I. (1995) Protein-4.1 is a component of the nuclear matrix of mammalian cells. Biochem. J. 312, 871–877.

18. Mattagajasingh, S.N., Huang, S.C. & Benz, E.J. (1996) Direct evidence for a nuclear-localization and function of protein-4.1 in the nucleus: in vivo association with mitotic apparatus proteins. Blood 88, 1094.

19. Lallena, M.J., Martinez, C., Valcarcel, J. & Correas, I. (1998) Functional association of nuclear protein 4.1 with pre-mRNA splicing factors. J. Cell Sci. 111, 1963–1971.

20. Ye, K., Compton, D.A., Lai, M.M., Walensky, L.D. & Snyder, S.H. (1999) Protein 4.1N binding to nuclear mitotic apparatus protein in PC12 cells mediates the antiproliferative actions of nerve growth factor. J. Neurosci. 19, 10747–10756.

21. Mattagajasingh, S.N., Huang, S.C., Hartenstein, J.S., Snyder, M., Marchesi, V.T. & Benz, E.J. (1999) A nonerythroid isoform of protein 4.1R interacts with the nuclear mitotic apparatus (NuMA) protein. J. Cell Biol. 145, 29–43.

22. Leto, T.L. & Marchesi, V.T. (1984) A structural model of human erythrocyte protein-4.1. J. Biol. Chem. 259, 4603–4608.

23. Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.

24. Genetics Computer Group (1994) Program Manual for the Wisconsin Package, Version 8. Genetics Computer Group, Madison, WI, USA.

25. Cuff, J.A. & Barton, G.J. (2000) Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Proteins 40, 502–511.

26. Tyler, J.M., Hargreaves, D.R. & Branton, D. (1979) Purification of two spectrin-binding proteins. Proc. Natl Acad. Sci. USA 76, 5192–5196.

27. Scott, C., Keating, L., Bellamy, M. & Baines, A.J. (2001) Protein 4.1 in forebrain postsynaptic density preparations. Enrichment of 4.1 gene products and detection of 4.1R binding proteins. Eur. J. Biochem. 268, 1084–1094.

28. Correas, I., Leto, T.L., Speicher, D.W. & Marchesi, V.T. (1986) Identification of the functional site of erythrocyte protein 4.1 involved in spectrin–actin associations. J. Biol. Chem. 261, 3310–3315.

29. Correas, I., Speicher, D.W. & Marchesi, V.T. (1986) Structure of the spectrin-actin binding-site of erythrocyte protein 4.1. J. Biol. Chem. 261, 3362–3366.

30. Laemmli, U.K. (1970) Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature (London) 227, 680–685.

31. Matsudaira, P. (1989) A Practical Guide for Protein and Peptide Purification for Microsequencing, 1st edn. Academic Press, San Diego, CA.

32. Huang, J.P., Tang, C.J., Kou, G.H., Marchesi, V.T., Benz, E. Jr & Tang, T.K. (1993) Genomic structure of the locus encoding protein 4.1. Structural basis for complex combinational patterns of tissue-specific alternative RNA splicing. J. Biol. Chem. 268, 3758–3766.

33. Walensky, L.D., Gascard, P., Fields, M.E., Blackshaw, S., Conboy, J.G., Mohandas, N. & Snyder, S.H. (1998) The 13-kD FK506 binding protein, FKBP13, interacts with a novel homologue of the erythrocyte membrane cytoskeletal protein 4.1. J. Cell Biol. 141, 143–153.

34. Creighton, T.E. (1993) Proteins: Structures and Molecular Properties, pp. 261–328. W.H. Freeman, New York, USA.

35. Gascard, P., Lee, G., Coulombel, L., Auffray, I., Lum, M., Parra, M., Conboy, J.G., Mohandas, N. & Chasis, J.A. (1998) Characterization of multiple isoforms of protein 4.1R expressed during erythroid terminal differentiation. Blood 92, 4404–4414.

36. Conboy, J. (1999) The role of alternative pre-mRNA splicing in regulating the structure and function of skeletal protein 4.1. Proc. Soc. Exp. Biol. Med. 220, 73–78.

37. Hou, C.L., Tang, C., Roffler, S.R. & Tang, T.K. (2000) Protein 4.1R binding to eIF3-p44 suggests an interaction between the cytoskeletal network and the translation apparatus. Blood 96, 747–753.

38. Conboy, J., Kan, Y.W., Shohet, S.B. & Mohandas, N. (1986) Molecular-cloning of protein 4.1, a major structural element of the human-erythrocyte membrane skeleton. Proc. Natl Acad. Sci. USA 83, 9512–9516.
