resurrection of an urbilaterian u1a/u2b″/snf protein · those that form a common structural...

17
Resurrection of an Urbilaterian U1A/U2B/SNF Protein Sandra G. Williams 1 , Michael J. Harms 2 and Kathleen B. Hall 1 1 - Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO 63108, USA 2 - Center for Ecology and Evolution, University of Oregon, Eugene, OR 97403-5289, USA Correspondence to Kathleen B. Hall: [email protected] http://dx.doi.org/10.1016/j.jmb.2013.05.031 Edited by M. F. Summers Abstract The U1A/U2B/SNF family of proteins found in the U1 and U2 spliceosomal small nuclear ribonucleo- proteins is highly conserved. In spite of the high degree of sequence and structural conservation, modern members of this protein family have unique RNA binding properties. These differences have necessarily resulted from evolutionary processes, and therefore, we reconstructed the protein phy- logeny in order to understand how and when divergence occurred and how protein function has been modulated. Contrary to the conventional un- derstanding of an ancient human U1A/U2Bgene duplication, we show that the last common ancestor of bilaterians contained a single ancestral protein (URB). The gene for URB was synthesized, the protein was overexpressed and purified, and we assessed RNA binding to modern snRNA sequences. We find that URB binds human and Drosophila U1 snRNA SLII and U2 snRNA SLIV with higher affinity than do modern homologs, suggesting that both Drosophila SNF and human U1A/U2Bhave evolved into weaker binders of one RNA or both RNAs. © 2013 Elsevier Ltd. All rights reserved. Introduction The eukaryotic spliceosome is a large, complex, and highly dynamic macromolecular machine that splices pre-mRNA. Among its many associated proteins is the U1A/U2B/SNF family, which are components of the U1 and U2 small nuclear ribonucleoproteins (snRNPs). These three proteins use RNA recognition motifs (RRMs) to recognize their RNA targets, but their N-terminal RRMs Legend: Protein phylogenic trees can now be constructed if the sequence space is sufficient. Mapping a protein's evolution can distinguish functional amino acids from those that form a common structural scaffold. The authors used an Archaeopteryx program, Han M.V. and Zmasek C.M. (2009) phyloXML: XML for evolutionary biology and comparative genomics, BMC Bioinformatics, 10:356. 0022-2836/$ - see front matter © 2013 Elsevier Ltd. All rights reserved. J. Mol. Biol. (2013) 425, 38463862 Featured Arcle

Upload: dokhanh

Post on 28-Apr-2018

221 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Resurrection of an Urbilaterian U1A/U2B″/SNF Protein · those that form a common structural scaffold. The authors ... Phylogeny of the U1A/U2B″/SNF protein family indicates that

FeaturedAr�cle

Sandra G. Will

0022-2836/$ - see front m

Resurrection of an UrbilaterianU1A/U2B″/SNF Protein

iams1, Michael J. Harms2

and Kathleen B. Hall 1

1 - Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO 63108, USA2 - Center for Ecology and Evolution, University of Oregon, Eugene, OR 97403-5289, USA

Correspondence to Kathleen B. Hall: [email protected]://dx.doi.org/10.1016/j.jmb.2013.05.031Edited by M. F. Summers

Abstract

Legend: Protein phylogenic trees can now be constructedif the sequence space is sufficient. Mapping a protein'sevolution can distinguish functional amino acids fromthose that form a common structural scaffold. The authorsused an Archaeopteryx program, Han M.V. and ZmasekC.M. (2009) phyloXML: XML for evolutionary biology andcomparative genomics, BMC Bioinformatics, 10:356.

The U1A/U2B″/SNF family of proteins found in theU1 and U2 spliceosomal small nuclear ribonucleo-proteins is highly conserved. In spite of the highdegree of sequence and structural conservation,modern members of this protein family have uniqueRNA binding properties. These differences havenecessarily resulted from evolutionary processes,and therefore, we reconstructed the protein phy-logeny in order to understand how and whendivergence occurred and how protein function hasbeen modulated. Contrary to the conventional un-derstanding of an ancient human U1A/U2B″ geneduplication, we show that the last commonancestor of bilaterians contained a single ancestralprotein (URB). The gene for URB was synthesized,the protein was overexpressed and purified, andwe assessed RNA binding to modern snRNAsequences. We find that URB binds human andDrosophila U1 snRNA SLII and U2 snRNA SLIVwith higher affinity than do modern homologs,suggesting that both Drosophila SNF and humanU1A/U2B″ have evolved into weaker binders of oneRNA or both RNAs.

© 2013Elsevier Ltd.All rights reserved.

Introduction

The eukaryotic spliceosome is a large, complex,and highly dynamic macromolecular machine thatsplices pre-mRNA. Among its many associated

atter © 2013 Elsevier Ltd. All rights reserve

proteins is the U1A/U2B″/SNF family, which arecomponents of the U1 and U2 small nuclearribonucleoproteins (snRNPs). These three proteinsuse RNA recognition motifs (RRMs) to recognizetheir RNA targets, but their N-terminal RRMs

d. J. Mol. Biol. (2013) 425, 3846–3862

Page 2: Resurrection of an Urbilaterian U1A/U2B″/SNF Protein · those that form a common structural scaffold. The authors ... Phylogeny of the U1A/U2B″/SNF protein family indicates that

3847Reconstruction of an Ancestral RRM

(RRM1) are distinguished from most RRMs by theirextremely high affinity and the exquisite specificity oftheir RNA binding. The RRMs of these three proteinshave high sequence identity, as do the two RNAstem–loops that they recognize, consistent with ashared phylogenetic lineage of the proteins.One copy of U1A, U2B″, or SNF is present in the

U1 and U2 snRNPs, where the protein binds to aspecific stem–loop. In spite of their high sequenceidentity and structural similarity, Drosophila SNF,human U1A, and human U2B″ have distinct RNAbinding properties. Drosophila SNF binds both U1snRNA SLII and U2 snRNA SLIV [1], whereas inhumans, U1A binds exclusively to U1 SLII, and innuclear extracts, U2B″ localizes exclusively to theU2 snRNP, where it binds SLIV [2]. In vitro, thesedifferences are manifested in very high affinity andspecificity of U1A for SLII, modest affinity and nospecificity between SLII and SLIV for U2B″, and anintermediate specificity for SNF [3–5]. Therefore,comparing the modern RRMs, it is clear that eachhas unique RNA binding properties, but themolecular basis for these differences has beendifficult to explain from structural considerations ofthe proteins. Placing the observed functionaldiversity within its evolutionary context promisesto provide new insight into modern protein function:residues responsible for altered function can bedetermined, allowing insights into how these muta-tions altered the protein to result in changes to RNAbinding.Pauling and Zuckerkandl first explored the possi-

bility of studying extinct proteins 50 years ago [6], butit has only been recently that advances in phyloge-netic analysis and the explosion of availablesequence data have made these “molecular resto-ration studies” feasible [7–10]. The power of thisapproach is that it provides the evolutionary contextfor studying proteins, thus taking advantage ofNature's long-running experiments. This obviatesmany of the difficulties of traditional comparativemutagenesis studies, which include sifting throughthe large number of functionally irrelevant back-ground mutations, identifying interacting mutations,and contending with lineage dependence of func-tionally relevant mutations [11].We used U1A/U2B″/SNF sequences from broadly

diverse organisms to reconstruct the metazoanprotein phylogeny and resurrect the ancestralprotein of the last common ancestor of humanU1A, human U2B″, and Drosophila SNF. Our goalswere to determine whether the gene duplicationresponsible for subfunctionalization of human pro-teins occurred early or late in metazoan evolutionand how the RNA binding properties of theancestral protein compare with its modern de-scendants. Our new phylogeny revises the currentunderstanding of U1A/U2B″/SNF functional diver-gence: a single protein family member was present

in the last common ancestor of bilaterians, andgene duplications resulting in separate U1A andU2B″ proteins are relatively recent. Like its moderncounterparts, the resurrected URB protein (whichcorresponds to the last common ancestor ofbilaterians) has two RRMs separated by a flexiblelinker. URB binds modern U1 snRNA SLII andU2 snRNA SLIV with high affinity, and its specific-ities for RNAs most closely resemble those ofDrosophila SNF. URB's dynamic properties insolution also resemble those of SNF and differfrom human U1A, suggesting that dynamics havebeen evolutionarily conserved and may be impli-cated in protein function.

Results

A new family phylogeny

Because humans, potatoes, and yeast each codefor separate U1A and U2B″ proteins, prior charac-terization of U1A/U2B″ molecular evolution positedon the basis of parsimony that U1A and U2B″paralogs emerged after a single, ancient geneduplication prior to the divergence of plants, fungi,and metazoans [1]. We tested whether metazoanproteins were consistent with this model usingmodern phylogenetic methods and sequences frombroadly diverse metazoans. A schematic of theancestral reconstruction approach we employed isshown in Supplementary Fig. 1. Putative U1A, U2B″,and SNF proteins from 160 diverse metazoanorganisms were obtained from BLAST searchesand subsequently aligned. PhyML [12] was used toreconstruct the protein phylogeny, and a resultingcladogram of the maximum likelihood (ML) tree ispresented in Fig. 1a. The striking result of thisanalysis is that the last common ancestor of allbilaterians had a single U1A/U2B″ protein, as domost modern metazoans.The reconstructed phylogeny reveals that gene

duplications within the bilaterian lineage haveoccurred at least three times: once in the evolutionof jawed vertebrates, once in the lophotrochozoanlineage, and once in the nematode lineage. Theimplication of our reconstruction is that, if geneduplication results in subfunctionalization of RNAbinding and localization to distinct snRNPs, thisoccurred late in the proteins' molecular evolution.Reconstruction of the full tree (Fig. 1a) results inpoor resolution of the deuterostome phylogeny,prompting a separate analysis of the deuterostomesequences. This reconstruction used an alignmentcontaining more residues from the interdomainlinker (see Materials and Methods). ML andmaximum parsimony (MP) reconstructions of thisdeuterostome phylogeny are consistent with asingle gene duplication in an ancestor of jawed

Page 3: Resurrection of an Urbilaterian U1A/U2B″/SNF Protein · those that form a common structural scaffold. The authors ... Phylogeny of the U1A/U2B″/SNF protein family indicates that

Fig. 1. Phylogeny of the U1A/U2B″/SNF protein family indicates that gene duplications for separate U1A and U2B″proteins occurred late in metazoan evolution. (a) The phylogeny was inferred by MLmethods, and a reduced version of theresulting cladogram is shown. Proteins are labeled SNF if the organisms contain a single U1A/U2B″/SNF protein, and theresurrected Urbilaterian node is indicated. The number of sequences in each group is given in brackets. Branches in thedeuterostome lineage with aLRT scores of b4.6 were collapsed to a polytomy. (b) A separate ML tree of deuterostomeswas inferred to resolve the phylogeny of the deuterostome lineage. Branch supports correspond to the aLRT statistic. Fulltrees with branch lengths and supports are available in Supplementary Fig. 2.

3848 Reconstruction of an Ancestral RRM

vertebrates (Fig. 1b and Supplementary Fig. 2) thatresulted in separate U1A and U2B″ proteins inthese animals.U1A/U2B″/SNF ancestral protein sequences were

subsequently inferred from modern sequence align-ments and the reconstructed phylogeny usingCodeML [13]. Sequence alignments of the recon-structed Urbilaterian SNF (indicated in Fig. 1a andwhich we call URB) and modern homologs areshown in Fig. 2a. Also shown in Fig. 2b are theresidues that have diverged between URB and thehuman proteins (left panel) and URB and DrosophilaSNF (right panel), plotted on the RRM structure. Withfew exceptions, amino acids within the RRMs wereunambiguously predicted (Supplementary Fig. 3 andSupplementary Table 1), and the alignment illus-trates the high sequence conservation of the RRMs.Particularly striking is the conservation of RRM2, for

Fig. 2. The RRMs of human U1A, human U2B″, and Drosotheir Urbilaterian ancestor. (a) Alignments of the ML ancestral pshown. Residues are highlighted in red if shared with human U1italicized, and secondary structure elements are indicated aboplotted on the RRM structure. On the left, residues that have dblue, residues that have diverged between URB and U2B″ but nboth U1A and U2B″ are colored black. On the right, residuesResidues conserved between all proteins are shown in purple

which there is no known biological or biochemicalfunction.

A resurrected Urbilaterian ancestral proteinresembles Drosophila SNF

RNA stem–loops and URB binding

In order to study the RNA binding properties of theancestral RRMs, it was important to determineappropriate RNA sequences for binding studies.Modern U1 stem–loop II and U2 stem–loop IVsequences from diverse metazoans were obtainedand aligned. Consensus sequences for the loop andloop-closing base pair, which are known to beimportant for protein recognition, are shown inFig. 3 (Supplementary Fig. 4). Features of the RNAstem–loops that have previously been identified as

phila SNF have remained highly conserved with those ofrotein (URB) and SNF with both human U1A and U2B″ areA only and in blue if shared with U2B″ only. RNPmotifs areve the alignment. (b) Regions of sequence divergence areiverged between URB and U1A but not U2B″ are coloredot U1A are colored red, and residues that have diverged inthat have diverged in Drosophila SNF are colored black..

Page 4: Resurrection of an Urbilaterian U1A/U2B″/SNF Protein · those that form a common structural scaffold. The authors ... Phylogeny of the U1A/U2B″/SNF protein family indicates that

Fig. 2 (legend on previous page)

3849Reconstruction of an Ancestral RRM

Page 5: Resurrection of an Urbilaterian U1A/U2B″/SNF Protein · those that form a common structural scaffold. The authors ... Phylogeny of the U1A/U2B″/SNF protein family indicates that

Fig. 3. RNA sequences for U1 SLII and U2 SLIV arehighly conserved in metazoans, as shown in the consen-sus sequences for these RNAs. Input sequences andaccessions (Supplementary Table 6), as well as consen-sus sequences broken up by clade (Supplementary Fig.4), are available in the supplementary material.

3850 Reconstruction of an Ancestral RRM

important for recognition by human U1A and U2B″[2,14,15] are highly conserved across phyla. Inparticular, the AUUGCA sequence at the 5′ side ofthe loops is almost invariant, and loop length istypically 10–11 nucleotides. Due to the RNAconservation across phyla, it is likely that thesefeatures were also shared by the Urbilateriancounterparts of these snRNA stem–loops.The gene forURBwas synthesized (GenScript) and

the full-length (FL) protein and each RRM wereoverexpressed and purified. Representative modernU1 SLII and U2 SLIV sequences were used tocharacterize URB binding (Fig. 4 and Table 1). TheRNA binding affinities and specificities of URB areunique, although its specificity is most similar to that ofDrosophila SNF (Table 2). URB and SNF share amarked preference for SLII over SLIV, but URB bindswith higher affinity to both RNAs. Human U2B″ alsobinds to both SLII and SLIV but does not discriminatebetween the two RNAs, and its affinity is substantiallyweaker than that of URB or SNF [4]. URB and U1Abind with equally high affinity to SLII (humanU1A doesnot detectably bind SLIV). We conclude that humanU1A and U2B″ have evolved away from URB tosubfunctionalize RNA binding through radical changesin their RNA specificity while many of URB's RNAbinding preferences are retained by Drosophila SNF.It is important to evaluate whether the experimental

results obtained from the resurrected URB protein arerobust to uncertainties in the reconstruction. Althoughthe ML sequence is the sequence with the highestposterior probability for a given node, position 84could be plausibly reconstructed as either Ala or Ser(Supplementary Fig. 3). This was the only site inRRM1 that had a significant alternative reconstruc-tion. We introduced the A84S substitution into URBand found that binding to the RNA targets tested wasidentical with that of the ML RRM1 (Table 1).The comparison of RNA binding affinity of U1A, U2B

″, SNF, and URB illustrates an important evolutionaryadaptation. U1A RRM1 binds only SLII with subnano-molar affinity in physiological solution, while the other

proteins bind both SLII and SLIV. Our originalhypothesis, based on our measured affinity of SNFfor RNAs, was that RNA binding affinity would becompromised when an RRM bound two (slightly)different RNA targets. However, a comparison of URBbinding with U1A shows that the RRM's ability to bindboth SLII and SLIV does not need to compromise theprotein's intrinsic high affinity for SLII. Rather, it is clearthat, in both Drosophila and humans, the RRMs haveevolved into weaker binders of oneRNA or both RNAs.Only the N-terminal RRM1 of URB binds RNA; URB

C-terminal RRM2 alone does not detectably bind toSLII, SLIV, or a 25-nucleotide random pool RNA atconcentrations as high as 10 μM (data not shown).The interdomain linker sequence and length is poorlyconserved in the protein family, but it does containregions with high positive charge density, typicallyfrom multiple lysine residues that could interact withthe RNA backbone contributing to binding electrostat-ics. For URB, the difference in RNA binding affinitybetween the FL protein and RRM1 is modest(Table 3), indicating that RRM1 is the predominantsource of RNA binding affinity. In contrast, FL SNFprotein has a higher affinity for SLII RNA than doesSNF RRM1 alone. In 250 mM KCl and 1 mM MgCl2(22 °C), the RNA binding affinity of SNF RRM1 alonefor SLII is weaker than the affinity of the FL protein byΔΔG°(binding) = −3.3 ± 0.4 kcal/mol (Table 3). At lowersalt concentrations, FL SNF also binds with higheraffinity to SLIV than does RRM1 alone. Since SNFRRM2 does not detectably bind RNA, its linker mustcontribute to RNA binding affinity [5].Because the salt dependence of RNA binding

indicates the contribution of electrostatics to theassociation, we measured the binding of URB RRM1and FL protein to SLII and SLIV as a function of KClconcentration and compared its properties to thoseof SNF (Fig. 5). A comparison of the net ionsreleased upon RNA binding (Table 4) shows thatURB/SNF RRM1 binding to SLII releases 3.2/3.1ions while binding to SLIV releases 3.9/2.7 net ions.In addition, SNF's linker does contribute to bindingaffinity, most likely through nonspecific interactionsbetween lysines and the RNA stem. While URB'slinker has seven lysines near RRM1 (Fig. 2), they areapparently not involved in binding the stem–loops.

Protein structure and dynamics

Given the different RNA binding properties of URB,Drosophila SNF, human U1A, and human U2B″, wewere interested in further investigating differencesbetween the RRMs that could explain differences inbinding. Homology models for URB RRM1 (modeledon existing structures of Drosophila SNF, humanU1A, and human U2B″) are shown in Fig. 6a. Thesemodels predict a structure that is very similar to that ofthe three modern proteins. A similar alignment ofhomology models for URB RRM2 is also shown

Page 6: Resurrection of an Urbilaterian U1A/U2B″/SNF Protein · those that form a common structural scaffold. The authors ... Phylogeny of the U1A/U2B″/SNF protein family indicates that

Fig. 4. FL URB binds U1 SLIIwith an affinity equal to that ofhuman U1A and binds U2 SLIVbetter than either human U1A orhuman U2B″. Titrations from nitro-cellulose filter binding experimentsof FL URB (a) and URB RRM1 (b)with SLII (squares), SLIV (circles),or a 25-nucleotide random-mer (tri-angles) are shown. Fits to a simpleLangmuir binding isotherm areshown (purple lines), and simulatedcurves from U1A binding to SLII(red) and U2B″ binding SLII/SLIV(blue) are shown, based on previ-ously reported affinities under sim-ilar binding conditions [4,14].Binding was assessed in 250 mMKCl, 10 mM cacodylate, and 1 mMMgCl2 (pH 7 at 22 °C).

3851Reconstruction of an Ancestral RRM

(Fig. 6b). These models also predict a typical RRMwhose structures are similar, with the exception ofthe loops, which are likely to sample multipleconformations and are difficult to model correctly.Chemical denaturation monitored by circular dichro-

ism shows that URBRRM1has a folding free energy of−5.1 kcal/mol (Fig. 6c),which is intermediate in stabilitybetween that of U1A RRM1 [ΔG°(folding) = −9.4 kcal/mol] [16] and that of SNF RRM1 [ΔG°(folding) =−3.5 kcal/mol] [5]. The notion that stability is anevolutionarily neutral trait as long as proper folding

Table 1. RNA binding of FL URB and URB RRM1

Kobs,FL (M) ΔGFL° (kcal

URB-SLII 4.2 ± 0.4 × 10−10 −12.7 ± 0URB-SLIV 6.9 ± 2.9 × 10−9 −11.0 ± 0URB-N25a ~1 × 10−6 ~−8−8URB A84Sb SLIIURB A84S SLIV

Binding free energies of binding of RRM1 and the FL proteins are showbinding experiments were performed in 250 mM KCl, 10 mM cacodyla

a N25 is a control for nonspecific RNA binding.b URB A84S is the second most probable reconstruction of the RR

can be maintained [17] seems plausible for thisprotein family.

1H/15N heteronuclear single quantum coherence(HSQC) NMR spectra of FL URB, RRM1, and RRM2are overlaid in Fig. 7a. The 1H/15N HSQC amidespectra of the two independent RRMs can clearly beidentified in the context of the FL protein. Assign-ments of the RRM1 cross-peaks are indicated on thespectrum of Supplementary Fig. 5. Amide reso-nances for the individual RRMs are well dispersed,indicating structured, folded domains. Amide

/mol) Kobs,RRM1 (M) ΔGRRM1° (kcal/mol)

.1 1.2 ± 0.2 × 10−9 −12.1 ± 0.1

.3 1.5 ± 0.2 × 10−8 −10.6 ± 0.1N1 × 10−6 N−8

7.9 ± 2.0 × 10−10 −12.3 ± 0.11.8 ± 0.5 × 10−8 −10.5 ± 0.1

n. Data for SNF and U1A binding were previously reported [2]. Allte, and 1 mM MgCl2 (pH 7) at room temperature.

M1 sequence.

Page 7: Resurrection of an Urbilaterian U1A/U2B″/SNF Protein · those that form a common structural scaffold. The authors ... Phylogeny of the U1A/U2B″/SNF protein family indicates that

Table 2. Protein family specificity for SLII/SLIV

ΔΔG° (kcal/mol)

URB 1.6 ± 0.3SNF 2.4 ± 0.5U1A N6U2B″ 0.1 ± 0.3

The difference in the binding free energy [ΔΔG° = ΔG°(SLIV) −ΔG°(SLII)] of the FL proteins binding SLII and human SLIV wasassessed with nitrocellulose filter binding experiments in 250 mMKCl, 10 mM sodium cacodylate (pH 7.0), and 1 mM MgCl2, atroom temperature.

3852 Reconstruction of an Ancestral RRM

resonances from the linker region between theRRMs are clustered in the proton dimension around8–8.5 ppm, consistent with a disordered structure.These data indicate that the two RRMs areindependent of each other in solution, as was alsoseen with SNF [18] and U1A (unpublished results).Resonances from RRM1 exhibit a large dynamicrange in both SNF and URB. This is distinct fromRRM2, and it is illustrated in the comparison of peakintensities (Fig. 7b). RRM1 of both URB and SNFshow broad amide resonances, with peak intensitiesthat are, on average, 36% and 42%, respectively, oftheir RRM2 counterparts, suggesting that the reso-nances are in exchange on the chemical shifttimescale. The broad resonances of URB and SNFRRM1 backbones are observed in both the isolateddomain and the FL protein, under various experi-mental conditions [temperatures between 10 and30 °C, and for SNF, high and low KCl, and 1.2 Mglycine betaine (data not shown)]. One-dimensional[15N–1H]TRACT experiments suppress the influ-ence of chemical exchange in determining trans-verse relaxation rates and can be used to estimaterotational correlation times [19]. Such analysis onURB RRM1 yields a τc of 4.9 ns, consistent with amonomeric, globular protein of 12 kDa and compa-rable to the rotational correlation time of U1A RRM1determined with similar experiments. A TRACTexperiment with RRM2 gives a τc of 4.1 ns (RRM2is 9.5 kDa). The line broadening observed in URBRRM1 is not simply due to aggregation but is ratheran intrinsic property of the domain.

Table 3. Protein family RNA affinity of RRM1 versus FLprotein

ΔΔFL-RRM10 (kcal/mol)

URB-SLII −0.6 ± 0.1URB-SLIV −0.4 ± 0.3U1A-SLII −0 ± 0.6SNF-SLII −3.3 ± 0.4SNF-SLIV −0.7 ± 0.3

Difference in free energies of binding of RRM1 and FL proteins isshown. Data for SNF and U1A are from [1]. All experiments wereperformed in 250 mM KCl, 10 mM sodium cacodylate (pH 7), and1 mM MgCl2, room temperature.

CPMG (Carr–Purcell–Meiboom–Gill) experimentscan be used to quantify exchange on the millisecond-to-microsecond timescale. Figure 8 shows the differ-ence in the effective transverse relaxation rates(ΔR2,eff) of the amide nitrogen when pulsed at highand low CPMG field strengths (1000 and 50 Hz) foreach residue. Residues showing CPMG dispersion(nonzero ΔR2,eff) are experiencing exchange on thistimescale. While residues in α1 and Loop 5 of U1Aexhibit exchange (regions distant from the RNAbinding surface), resonances from the rest of thedomain do not show dispersion. In contrast, both SNFand URB show evidence of fast exchange throughoutmuch of the protein. While ΔR2,eff is largest in α1 andLoop 5, the RNA binding surface of both proteinsexhibits significant dispersion, indicating that much ofthe domain, including the RNA binding surface, isexchanging between at least two conformations onthe millisecond-to-microsecond timescale.Rapid (picosecond-to-nanosecond) RRM1 back-

bone dynamics were assessed with 1H/15N hetero-nuclear nuclear Overhauser enhancement (NOE)measurements (Fig. 9). Those amides constrainedin a stable structure are readily distinguished fromdisordered or floppy regions by the value of the NOE.For URB and SNF RRM1, those backbone amides inthe body have an average value of 0.8, typical of afolded protein. The C-terminal tail of URB RRM1(including α3) is flexible, with heteronuclear NOEsthat decrease to b0. Here we again compare URBRRM1 with SNF RRM1 and U1A RRM1 [20], and wefind that the domains have very similar fastdynamics. Thus, while structure and fast timescaledynamics are very highly conserved among theproteins, slower timescale dynamics distinguish theRRMs and are most similar in the two proteins withcomparable RNA binding properties.

Jawed vertebrates experienced a uniqueevolution of U1A and U2B″

Experimental evidence has established that theRNA binding properties of URB family proteinschanged significantly subsequent to the gene dupli-cation in an ancestor of jawed vertebrates. However,the RNA binding properties of most metazoan U1A/U2B″/SNF proteins have not been determined. Whilethe sequencesimilarity of proteins in this family is high,we as yet have a poor grasp of the extent of functionalconservation or divergence of proteins within thisfamily. Given the results of the phylogenetic recon-struction, it is tempting to hypothesize that proteinfunction is highly conserved in organismswith a singleU1A/U2B″/SNF family protein.Functional divergence can be assessed through

statistical comparisons of evolutionary rates be-tween clusters of a phylogeny. The premise of thisanalysis is that functional divergence is highlycorrelated with changes in evolutionary rates, and

Page 8: Resurrection of an Urbilaterian U1A/U2B″/SNF Protein · those that form a common structural scaffold. The authors ... Phylogeny of the U1A/U2B″/SNF protein family indicates that

Fig. 5. Salt dependence of URBand SNF binding to SLII and SLIV.Protein–RNA pairs are indicated inthe panels. Black data points andlines are for the FL protein, andblue data points and lines are forexper iments performed withRRM1. All experiments were per-formed in 10 mM cacodylate and2 mM MgCl2 (pH 7, 22 °C). Thesalt dependence for FL SNF waspreviously reported [5] and isshown for comparison. Slopes ofthe lines are interpreted in terms ofnet ions released and are tabulat-ed in Table 4.

3853Reconstruction of an Ancestral RRM

the analysis tests for differences in evolutionaryrates between clusters of proteins [21]. Usingfunctional divergence analysis on the U1A/U2B″/SNF protein family shows that the functionaldistance between most bilaterian clusters (particu-larly those with a single SNF protein) is small. Withthe exception of nematodes and jawed vertebrates,coefficients of functional divergence are low (b0.15)between other bilaterian clusters (SupplementaryTable 2). In contrast, coefficients of functionaldivergence are much higher between gnathostomeclusters and other bilaterian clusters (N0.65; Sup-plementary Table 2 and Supplementary Fig. 6a),represented schematically in the functional diver-gence map in Supplementary Fig. 6b. Given theprotein phylogeny, the results of the functionaldivergence analysis, and the experimentally deter-

Table 4. Comparison of salt dependencies for different RNAs

FL Protein

U1A-SLIISNF-SLII −5.7 ± 0.2SNF-SLIV −4.0 ± 0.2URB-SLII −4.2 ± 0.3URB-SLIV −3.8 ± 0.2U2B″-SLII −5.2 ± 0.3U2B″-SLIV −4.6 ± 0.3

The slope of the ln(KA,app) versus ln([KCl]) indicates the net ions absorbSNF, and U2B″ were previously reported [2–4]. ΔΔ is the difference bRRM1, indicating a difference in net ions released between RRM1 an

mined functional similarities between DrosophilaSNF and URB, it is likely that most SNF paralogsfrom organisms containing a single protein sharevery similar RNA binding properties. Our results withthe resurrected URB protein indicate that theseproperties have been conserved since prior to theCambrian radiation.Functional divergence analysis can be extended

to look at individual sites within the sequence andtest what parts of the protein are likely to becontributing to the functional differences. Artificiallyengineered U1A/U2B″ chimeras established β2 andLoop 3 as a region of the proteins that determinedtheir specificity (their “specificity motif”) for either U1SLII or U2 SLIV [2]. The RNA specificity motif ofDrosophila SNF includes sequences that appear ineither U1A or U2B″. Under the previous framework

and proteins

RRM1 ΔΔ (Net ions released)

−6.7 ± 1.1−3.1 ± 0.5 2.5 ± 0.6−2.7 ± 0.4 1.4 ± 0.4−3.2 ± 0.3 1.0 ± 0.4−3.9 ± 0.1 −0.04 ± 0.23

ed (positive) or released (negative) upon binding. Data for U1A, FLetween the slope of the salt dependence for the FL protein andd the FL protein.

Page 9: Resurrection of an Urbilaterian U1A/U2B″/SNF Protein · those that form a common structural scaffold. The authors ... Phylogeny of the U1A/U2B″/SNF protein family indicates that

RNA binding surface

Loop 3

(a) (b)

(c)

MR

E

5.1 ± 0.4 kcal/mol

-1,500

-3,000

-4,500

-6,000

0 1 2 3 4 5 6 7 8 9

[Urea] (M)

Fig. 6. URB is predicted to be very similar in structure to its modern descendents Drosophila SNF and human U1A andU2B″. Structural models for URB RRM1 (a) were templated from 1FHT (a U1A RRM1 solution structure) [44], 1A9N (acocrystal structure that includes U2B″ RRM1) [52], and 2K3K (a solution structure of SNF RRM1) [18]. These models arealigned and colored by secondary structure. The RNA binding surface is indicated, as well as Loop 3. Similar models forURB RRM2 are shown (b) and were templated from 2U1A [53] and 2AYM [18]. (c) Chemical denaturation of URB RRM1.Mean residue ellipticity (deg cm2 dmol−1 residue−1) at 221 nm is plotted as a function of urea concentration for URBRRM1in 50 mM KCl and 10 mM sodium cacodylate (pH 7). Linear extrapolation of the data [43] gives a folding free energy of−5.1 kcal/mol.

3854 Reconstruction of an Ancestral RRM

of an ancient origin of protein subfunctionalization,this led to the identification of SNF as a chimericprotein. However, our reconstruction of U1A/U2B″phylogeny indicates the opposite: Drosophila SNF'sRNA specificity motif is unchanged from that of its

Urbilaterian ancestor. In organisms with a singleprotein, the RNA specificity motif has remainedhighly conserved (Supplementary Fig. 7). However,the RNA specificity motif evolved away from theoriginal sequence following the gene duplication in

Page 10: Resurrection of an Urbilaterian U1A/U2B″/SNF Protein · those that form a common structural scaffold. The authors ... Phylogeny of the U1A/U2B″/SNF protein family indicates that

Fig. 7. (a) 1H/15N HSQC spectra of FL URB (black), RRM1 (pink), and RRM2 (green) show that backbone cross-peaksfor the individual RRMs overlay with cross-peaks from the FL protein. HSQC spectra were collected in 50 mM KCl, 20 mMsodium cacodylate, and 2 mM EDTA at 22.5 °C on a Varian Inova Unity 700-MHz spectrometer. (b) Relative cross-peakintensities for FL SNF (top) and FL URB resonances (bottom) are compared. Peak intensities are normalized to theaverage of RRM2 peak intensities, and the corresponding secondary structure is indicated above each plot.

3855Reconstruction of an Ancestral RRM

an ancestor of jawed vertebrates, and the sitesimplicated in functional divergence are segregatedto different parts of the RNA specificity motif for U1Aand U2B″ proteins (Supplementary Table 3). Thisresult is consistent with a model of subfunctionaliza-tion of SNF RNA binding properties in U1A and U2B″proteins of jawed vertebrates.

Lophotrochozoans and nematodes

The lophotrochozoan lineage also contains a geneduplication event, and while there is evidence offunctional divergence following this gene duplication(Supplementary Table 4), the functional distancefrom other lineages is much smaller than that of thegnathostome proteins. In particular, the RNA spec-ificity motifs of these lophotrochozoan clusters showsome indication of functional divergence, but this ismuch less pervasive than in jawed vertebrates. Thefunctional distance between the paralogous lopho-trochozoan clusters is smaller, indicating a substan-tial degree of functional similarity between theseproteins. This raises the possibility that these pro-teins do not function similar to gnathostome U1A andU2B″. It thus appears that the extent to which thegnathostome proteins have diverged from theircounterparts in other bilaterians to subfunctionalizeRNA binding through adaptations of the RNAspecificity motif is unique. If gene duplication inlineages other than gnathostomes result in subfunc-tionalization of RNA binding, it is likely that this is

accomplished through different evolutionary andbiochemical mechanisms.The nematode lineage is notable for its unusual

snRNA stem–loops and its U1A/U2B″/SNF proteins.Nematode sequences are distinguished in the proteinphylogeny by their long branch lengths (Supplemen-tary Fig. 2a); thus, it is not entirely surprising that theRNA loop sequences have also diverged from thoseof other bilaterians. snRNA stem–loops from nema-todes have shorter loop sequences that sometimeslack the almost universally conserved adenine at the5′ end of the loop. While an adenine is still the mostcommon 5′ loop position of nematode SLII and SLIV,the decrease in conservation of this RNA featuresuggests that different RNA bindingmechanismsmayhave evolved in this lineage, preceding the geneduplication in ancestors ofCaenorhabditis. NematodeRRM1 sequences have correspondingly unusualfeatures in the RNA specificity motif. Not surprisingly,Drosophila SNF and human U1A bind to Caenorhab-ditis elegans U1 SLII and U2 SLIV with much weakeraffinities than they bind to their natural counterparts(Table 5), which is likely to have resulted in selectionagainst such sequences in non-nematode lineages.

Discussion

Phylogenetic analysis of the U1A/U2B″/SNF pro-tein family has allowed us to determine that meta-zoans have a shared history of a single protein

Page 11: Resurrection of an Urbilaterian U1A/U2B″/SNF Protein · those that form a common structural scaffold. The authors ... Phylogeny of the U1A/U2B″/SNF protein family indicates that

Fig. 8. Backbone resonances showing 15N CPMG dispersion. (a) The difference in the effective 15N transverserelaxation rate between νCPMG = 50 Hz and νCPMG = 1000 Hz is shown for U1A (top), SNF (middle), and URB (bottom) RRM1,indicating regions experiencing millisecond-to-microsecond exchange. In (b), regions of significant CPMG dispersion areplotted onto the protein structure of each RRM.

3856 Reconstruction of an Ancestral RRM

whose sequence and function are highly conserved.URB is our resurrected ancestral protein, which weshow to be a thermodynamically stable and solubleprotein with unique RNA binding properties.

Implications of conservation of SLII and SLIVsequences for RNA–protein coevolution

The RNA loop sequences important for proteinbinding are highly conserved, regardless of whetheran organism has one or two U1A/U2B″/SNF proteins.One of the distinguishing features between SLII andSLIV is the loop-closing base pair, which is almost

universally conserved as aWatson–Crick C-G pair inSLII. The loop-closing base pair of U2 SLIV is muchmore variable. Most commonly, U-U or U-G is found,but G-U andC-G base pairs also exist in this position.If a gene duplication relaxes the evolutionaryconstraints of the protein, it is plausible that thiscould be accompanied by a decrease in conservationof the RNA binding sequences from the ancestralstate as the protein–RNA interactions coevolve awayfrom a single protein state. However, there are nomajor differences in RNA sequence conservationbetween organisms with separate U1A/U2B″ pro-teins and those with a single SNF protein. It is

Page 12: Resurrection of an Urbilaterian U1A/U2B″/SNF Protein · those that form a common structural scaffold. The authors ... Phylogeny of the U1A/U2B″/SNF protein family indicates that

Fig. 9. Heteronuclear NOEs of SNF and URB RRM1. (a) 15N–{1H} NOEs for SNF RRM1 (top) and URB RRM1 (bottom)amide residues are shown. (b) Heteronuclear NOEs are mapped as indicated by the legend for U1A (data from Ref. [20]),SNF, and URB RRM1.

3857Reconstruction of an Ancestral RRM

possible that the RNA loop sequence conservationreflects the relatively recent origin of gene duplica-tions and that, with time, it will change, too. However,it may also reflect additional evolutionary constraintson the RNA beyond U1A/U2B″ binding that arecurrently unknown, including pleiotropic effects.

Mechanistic implications for RNA binding

URB's properties persist in select modern para-logs. In particular, the slow backbone dynamics ofURB RRM1 are also found in SNF. Curiously orsuggestively, these two proteins also bind two RNAtargets with similar relative specificities. The RNAspecificity motif of URB RRM1 is conserved in SNF;however, while this motif is important in determiningRNA binding specificity, it is not the sole determinantof RNA binding properties: SNF and URB haveidentical specificity motifs but nevertheless havedistinct RNA binding affinities. Finally, in none of theproteins does RRM2 contribute to RNA binding nordoes it interact with RRM1. NMR data show that itsbackbone amides are in fast exchange both aloneand in the context of the FL protein.

The distinct functional properties of human U1A,human U2B″, and Drosophila SNF have evolved inthe presence of extensive structural identity; thedifferences at the level of primary, secondary, andtertiary structure among RRM1 of URB, U1A, U2B″,and SNF are minor. Within the common tertiarystructure, the determinants of their unique RNAbinding properties remain unclear. A vital contributionto RNA binding comes from the surface hydrogenbonding networks that couple the specificity motifto the RRM's characteristic RNP motifs. Thesesequences are coupled to other sites on the RRM,as well as to the RNA [22,23]. We have shown thatLoop 3, as part of the RNA specificity motif, is a regionof extensive functional divergence in jawed verte-brates. Modulation of this region is therefore likely toalter the surface hydrogen bonding networks of theRRM. We can now use information about proteinevolution to understand which mutations were re-sponsible for altering RNA binding and the mecha-nisms by which these mutations modulate proteinfunction. We predict that changes in protein dynamicsand exchange properties that result from alterations tothe hydrogen bonding networks are likely to beimportant determinants of RRM function. While the

Page 13: Resurrection of an Urbilaterian U1A/U2B″/SNF Protein · those that form a common structural scaffold. The authors ... Phylogeny of the U1A/U2B″/SNF protein family indicates that

Table 5. U1A and SNF binding to C. elegans (CE) RNAs

Kobs,CE (M) ΔG° (kcal/mol) Kobs,CE/Kobs,hs ΔΔG° (kcal/mol)

U1A-CE II 4.7 ± 0.4 × 10−8 −9.9 ± 0.05 116.5 ± 30.6 3.1 ± 0.4SNF-CE II 4.8 ± 0.3 × 10−8 −8.5 ± 0.04 22.0 ± 2.1 1.8 ± 0.1SNF-CE IV N2.5 × 10−6 N−7.6 N10 N1.4

CE SLII: CUACCC AUUGCACUUUU GGUGCG.CE SLIV: CCUGGC GUUGCACUGCU GCCGGG.Putative loop sequences are in boldface. ΔΔG° = ΔG°(CE) − ΔG°(hs); hs, Homo sapiens.

3858 Reconstruction of an Ancestral RRM

importance of protein structure in determining functionhas long been appreciated, measuring protein mo-tions and, more importantly, establishing their func-tional significance is an ongoing endeavor [24–32].Establishing how thesemotions havebeen conservedand modulated is therefore critical to our understand-ing of these molecules.The RNA binding properties of human U1A and

U2B″ have long been understood to be character-istic of the U1A/U2B″/SNF protein family. However,it is now clear that the gnathostome proteins are notat all characteristic of this family but functionallyquite distant. While gene duplication in this lineagehas led to distinct changes in specificity for targetRNAs, it is possible that alternative functions havealso emerged for these gnathostome proteins.Given our functional divergence analysis and theseparate origin of gene duplications in otherlineages, it is also likely that gene duplications indifferent lineages do not result in protein evolutiontoward identical functional endpoints. Indeed, ex-periments have shown that C. elegans U1A andU2B″ are functionally redundant [33]. Transgenicexpression of both human U1A and U2B″ is unableto rescue the embryonic lethal phenotype of SNFknockout in flies [34]. Given these results and thesmall functional distance between lophotrochozoanSNF family paralogs, an intriguing question iswhether functional redundancy has been retainedfor a specific purpose or whether these paralogs innematodes and lophotrochozoans are still in theprocess of diverging.

Conclusions

Subfunctionalization following gene duplication isan important source of molecular diversity. While ourwork shows that the characteristics distinguishingthe U1A/U2B″/SNF family's RRM1 from most RRMs—their extremely high affinity and specificity for SLII/SLIV type RNA sequences—were well establishedin URB, subsequent subfunctionalization of RNAbinding as seen in humans is much more recent andis restricted to jawed vertebrates. We can now useour phylogenic tree to construct intermediates in theevolution from URB to U1A/U2B″ to trace theprogress of their distinctive properties.

The eukaryotic spliceosome seems to haveevolved its modern, complex architecture veryearly [35], but the presence of a single SNF proteinsuggests a simpler early architecture preceding thedivision of the eukaryotic kingdoms. That a singleprotein was historically found to bind both the U1 andU2 snRNAs raises the possibility that, in a primitivespliceosome, a single snRNP may have recognizedboth the 5′ splice site and the branch point, tasks thatsubsequently were subfunctionalized by the U1 andU2 snRNPs, respectively. Understanding the rela-tionship between modern snRNP proteins andmodern snRNAs may further define how thespliceosome evolved to its present state andelucidate fundamental aspects of the splicing reac-tion. While functional roles for U1A/U2B″/SNF pro-teins in pre-mRNA splicing have not yet beendetermined, it will be intriguing to understand theconsequences for splicing of whether an organismhas one or two SNF family proteins and, in thetwo-protein case, whether the consequences arelineage dependent.

Materials and Methods

Phylogenetic analysis

The protein phylogeny was inferred from extant proteinsequences using ML methods. Protein sequences frombroadly diverse metazoans and closely related choanozoawere obtained after BLASTing (TBLASTN) annotated/confirmed U1A, U2B″, and SNF sequences againstmultiple databases in National Center for BiotechnologyInformation. Sequences that had greater similarity to otherknown, RRM-containing proteins (PTB, ELAV-2, etc.) werediscarded. Accession numbers of sequences used infurther analysis are given in Supplementary Table 10.RRM regions were aligned using MUSCLE and had veryhigh alignment scores when processed in T-Coffee (Core[36]) and GBlocks [37,38]. The aligned linker sequenceswere manually refined, after which they produced highalignment scores in T-Coffee and GBlocks. The alignedsequences were used as inputs for ProtTest [39] to identifythe best model for tree reconstruction. The sequencealignments were then used for ML tree reconstructionusing PhyML [12] with SPR tree searches. Because ofresults from the ProtTest analysis, the LG model of aminoacid substitution was used, along with equilibrium amino

Page 14: Resurrection of an Urbilaterian U1A/U2B″/SNF Protein · those that form a common structural scaffold. The authors ... Phylogeny of the U1A/U2B″/SNF protein family indicates that

3859Reconstruction of an Ancestral RRM

acid frequencies from the LG model. Substitution ratevariation was described with a discrete gamma model, andthe α parameter was optimized for the data. Branchsupports are given as the approximate likelihood ratio test(aLRT) statistic [40]. aLRT scores of 4.6 and 9.2correspond to being a 10-fold or a 100-fold (respectively)more likely to observe the node than not. In Fig. 1a, thebranches in the deuterostomes with aLRT scores less than4.6 were collapsed into a polytomy, as the phylogeny waspoorly resolved.The sequence identity of the RRMs (particularly RRM2)

is much higher than ideal for phylogenetic reconstruction,as the phylogenetic signal is weak with such highsequence identity, resulting in diminished topologicalaccuracy [12]. Inclusion of a linker alignment was thereforecritical for reasonable tree topologies. Tree reconstructionis performed on gapless alignments, and because theinterdomain linker is a site of many insertions anddeletions, the alignable linker sequence for all proteinswas relatively short. For the subgroup of deuterostomesequences, it was possible to include a longer interdomainlinker alignment for tree reconstruction. This was used toresolve the topology of the deuterostome tree. SeparateML and MP trees were reconstructed for deuterostomes.The MP tree was reconstructed with ProtPars in thePHYLIP package [41].The ML trees were rooted using Choanozoan proteins

as an outgroup for the rest of the metazoan tree or C.elegans protein sequences for the deuterostome tree.Proteins were labeled SNF if the organism contained asingle U1A/U2B″/SNF protein. In cases outside the jawedvertebrates where organisms contained two proteins,these were arbitrarily numbered 1 and 2. Trees werevisualized in Archaeopteryx.

Ancestral U1A/U2B″ synthesis

CodeML in PAML [13,42] was used for inference of theancestral amino acid sequences of the U1A/U2B″/SNFprotein family. The ancestral sequences are inferred fromthe phylogeny and the modern sequences using MLmethods. A marginal reconstruction was performed usingsimilar evolutionary model parameters as those used fortree reconstruction. The ML tree obtained with PhyML wasused as the input tree. The highest posterior probabilitysequence at the node corresponding to the last commonancestor of modern bilaterians was obtained, and thissequence was sent to GenScript for synthesis of the URBgene. The protein sequence was back-translated, and theDNA sequence was optimized for Escherichia coli codonuse. NCO (5′) and HindIII (3′) restriction sites were addedto the sequence for subsequent subcloning into the Ptacexpression vector.

Protein characterization

Ptac plasmids with the URB gene under an IPTG-indu-cible promoter were transformed into BL-21 cells forsubsequent overexpression of URB proteins. Purificationwas performed in a manner similar to purification ofDrosophila SNF proteins [5]. E. coli BL-21 cells weregrown at 37 °C to an optical density of 0.8. Cells were theninduced with 1 mM IPTG for 4 h at 25 °C, spun down, and

kept at −80 °C. Cells were then thawed and resuspendedat 4 °C in 30 mM sodium acetate (pH 5.3), 200 mMNaCl, 2 mM ethylenediaminetetraacetic acid (EDTA), and8.5% sucrose. Sigma protease inhibitor cocktail, PMSF,and DNase II were added. The cells were French-pressed and spun down. For FL URB, the supernatantwas passed over an SP Sepharose FPLC column (GE),washed with 50 mM Tris (pH 7.5) and 175 mM NaCl, andeluted over 275 mM NaCl gradient (4 h, 1.5 mL/min).Fractions with URB were collected, concentrated, andbuffer-exchanged into 10 mM sodium cacodylate and10 mM KCl (pH 7.0). Purification of URB RRM1 wassimilar, except that the column was pre-equilibrated in0 M NaCl and 50 mM Tris (pH 7.5); the supernatant waswashed with 0 M and 100 mM NaCl, and the salt gradientwas adjusted to 100–350 mM NaCl (3 h).

Purification of RRM2

Cells with the URB RRM2 gene were grown, induced,resuspended, and lysed in a similar manner to the othercells. Following centrifugation, the supernatant wasdialyzed against 25 mM sodium acetate (pH 5.3) for 2 h.The dialysate was filtered and loaded onto an SPSepharose FPLC column (GE) pre-equilibrated with25 mM NaOAc (pH 5.3). The column was washed in thebuffer and then eluted in a 0–250 mM NaCl gradient (3 h,1.5 mL/min). Fractions with URB RRM2 were collected,concentrated, and buffer-exchanged into 10 mM sodiumcacodylate and 10 mM KCl (pH 7.0).

Protein circular dichroism/denaturation

A Jasco J715 spectropolarimeter was used to record CDspectra. All spectra were recorded at room temperature.The sample buffer contained 50 mM KCl and 10 mMcacodylate (pH 7), and spectra were recorded for sampleswith a protein concentration of 20 μM. CD spectra of eachpurified RRM (in 0 M urea) are consistent with a canonicalα/β composition.For chemical denaturation, mean residue ellipticity at

221 nm was monitored as a function of urea concentration.The melts were fit in Scientist (Micromath) to a two-statefolding model using the linear extrapolation method ofSantoro and Bolen [43]. The unfolding free energy wasdetermined from the average of two separate experiments.Uncertainty was propagated from the two experiments. Thisresulted in an unfolding free energy of 5.1 ± 0.4 kcal/mol.NMR samples were prepared following E. coli growth

in minimal media supplemented with either 15NH4Cl or15NH4Cl and 13C glucose. Proteins were purified asdescribed above and buffer-exchanged into 50 mM KCl,20 mM sodium cacodylate, and 2 mM EDTA at pH 6.5with 10% D2O. For all samples, the protein concentra-tion was 350 μM. Previously published assignments forSNF and U1A RRM1 were used (Biological MagneticResonance Bank ID 6930 and Protein Data Bank ID1FHT [44]).All spectra were acquired on Varian Unity Inova

500-MHz or 700-MHz spectrometers. Data were pro-cessed in NMRPipe and analyzed using nmrViewJ. URBRRM1 1H/15N and 13Cα assignments were made basedon HNCA spectra acquired at 30 °C and similarities to SNFassignments. With the exception of I30, L46, and K85, all

Page 15: Resurrection of an Urbilaterian U1A/U2B″/SNF Protein · those that form a common structural scaffold. The authors ... Phylogeny of the U1A/U2B″/SNF protein family indicates that

3860 Reconstruction of an Ancestral RRM

non-proline amide resonances were assigned. URB1H/15N HSQC spectra were acquired at 22.5 °C,700 MHz; spectral widths were 7300 Hz in the 1Hdimension and 2000 Hz in 15N.For 15N–{1H} NOE measurements, duplicate pairs of

NOE spectra were collected for SNF and U1A with andwithout a 3-s 1H pre-saturation. Spectra were collected at28 °C on a 500-MHz spectrometer with 80 scans and a 3-srecycle delay. The intensity ratio was used to determinethe steady-state NOE. Heteronuclear NOEs determinedfrom duplicate data sets were averaged.Backbone 15N millisecond-to-microsecond exchange in

SNF, U1A, and URB RRM1 was probed using CPMGexperiments [45]. Spectra were recorded at 22.5 °C on a700-MHz spectrometer, with 32 scans and a 2.5-s recycledelay. For all samples, a reference spectrumwas acquired,as well as spectra with CPMG field strengths of 50 Hz and1000 Hz. The total CPMG block was 40 ms. The ΔR2,eff =ΔR2,eff,50 Hz − ΔR2,eff,1000 Hz was calculated as:

ΔR2;eff ¼ − ln Iν¼50Hz=Irefð Þ− ln Iν¼1000Hz=Irefð Þ½ �=0:04

where I indicates the peak intensity from the referencespectrum (Iref) or the 50-Hz or 1000-HzCPMG field strengthspectra (Iν = 50 Hz, Iν = 1000 Hz). ΔR2,eff is a function of thechemical shift difference among the states, the exchangerate, and the population of the states. Uncertainties wereestimated from the baseplane noise.

Uncertainty in URB reconstruction

The marginal reconstruction calculates posterior proba-bilities for each amino acid at each site. This allowsassessment of the protein sequence for sites that areambiguously determined. There was a single site in RRM1for which a second amino acid had a posterior probability(PP) 0.2 b PP b 0.8 (Supplementary Table 3). Site-dir-ected mutagenesis (QuikChange, Agilent) was used tointroduce the URB RRM1 A84S mutation in order toassess the functional implication of the uncertainty in thereconstruction. The “mutant” protein was expressed andpurified identically with its wild-type counterpart.

Functional divergence analysis

The premise of Gu's functional divergence analysis isthat functional change is highly correlated with changes inevolutionary rate [46,47]. After a major evolutionary event,such as a gene duplication, the evolutionary rate of one ofthe genes may increase at sites that are responsible forfunctional divergence. Tests for Type I functional diver-gence detect whether or not there are significant differ-ences in the evolutionary rates between two clusters orbranches of the tree. Type II functional divergencecharacterizes the situation in which a change in character(amino acid) occurred early after the gene duplication, butsubsequent evolutionary rates between the paralogousclusters are similar. This is also called “constant butdifferent” or “cluster specific” divergence [48].DIVERGE compares monophyletic clusters within the

tree to determine if there is a significant difference inevolutionary rates between sites and between clusters. Itestimates a coefficient of functional divergence (Θ)

between two clusters. Θ varies between 0 and 1, where0 indicates no functional divergence, and larger values ofΘ indicate a larger degree of divergence. The statisticaltest for functional divergence is to determine whether Θ issignificantly different from 0. For the functional distanceanalysis, Θ is transformed into a distance.Different gene clusters in the U1A/SNF/U2B″ family

were analyzed for Type I and Type II functionaldivergence. The ML tree shown in Fig. 1a was modifiedsuch that the deuterostome branch was replaced withthe deuterostome tree shown in Fig. 1b. Non-bilateriansequences were removed, and branch lengths werere-optimized in CodeML. The protein sequence align-ment and resulting tree were then used as inputs forDIVERGE 2.0. Gnathostome U1A and U2B″ clusterswere then compared with other clusters of proteins withinthe tree. Based on the Type I functional divergenceanalysis, a functional distance map [21] of the clusterswas generated.

RNA consensus sequences

Curated U1 and U2 snRNA sequences from multipleorganisms are available through the former uRNADB† andRfam‡§. Metazoan sequences from these databases wereobtained and used as BLAST inputs to obtain additionalsnRNA sequences. From the snRNA sequences andfragments obtained following BLAST searches, stem–loopII of U1 and stem–loop IV of U2 were aligned. Thesesequences are shown in Supplementary Table 10b.Consensus sequence figures were made with WebLogo[49,50].

RNA binding experiments

Nitrocellulose filter binding experiments were performedas previously described [5] to determine binding constantsfor protein–RNA interactions. Unless otherwise noted, allexperiments were performed in 250 mM KCl, 10 mMcacodylate, and 1 mM MgCl2 (pH 7.0) at room tempera-ture. Titrations were fit to a Langmuir isotherm in Scientist(Micromath). Experiments were performed in duplicateand repeated at least 2 times. Reported errors are thelarger of either the standard deviation from repeatexperiments or the propagated error.RNA stem–loops were transcribed from DNA oligonu-

cleotides (IDT) with T7 RNA polymerase, using [α-32P]UTP and [α-32P]CTP. The transcription products weregel-purified. The different RNAs were as follows:

U1 SLII: 5′-GGAGACCAUUGCACUCCGGUUUCCU2 SLIV: 5′-GGCCGGUAUUGCAGUACCGCCGGGUCCCE SLII: 5′-GGCCGCAUUGCACUUUUGCGGCCCE SLIV: 5′-GGCCGCGUUGCACUGCUGCGGCC

The underlined loop sequences for U1 SLII and U2 SLIVcorrespond to sequences from humans (SLII) and Dro-sophila (SLIV). URB binding to RNAs with the Drosophilaloop of SLII (AUUGCACCUC) and the human loop-closingbase pair of SLIV (U-U; human Loop IV is identical with thatof Drosophila) was identical under the conditions tested[250 mM KCl, 10 mM cacodylate, and 1 mM MgCl2

Page 16: Resurrection of an Urbilaterian U1A/U2B″/SNF Protein · those that form a common structural scaffold. The authors ... Phylogeny of the U1A/U2B″/SNF protein family indicates that

3861Reconstruction of an Ancestral RRM

(pH 7.0, room temperature)]. Experiments for nonspecificbinding were conducted using a 25-nucleotide randomsequence pool.

Homology modeling

Homology modeling of the URB RRM1 structure wasperformed with SWISS-MODEL [51]. The RRM1 sequencewas aligned with RRM1 sequences from Drosophila SNFand human U1A and U2B″. Structures for URB weretemplated from 1FHT (a U1A RRM1 solution structure)[44], 1A9N (a cocrystal structure that includes U2B″RRM1) [52], and 2K3K (a solution structure of SNFRRM1) [18]. Backbone RMSD values from the threeresulting structures (excluding Helix 3) were minimized inVMD to align the structures. Models for URB RRM2 weretemplated from 2U1A [53] and 2AYM [18].

Acknowledgements

Research reported in this publication was sup-ported by the National Institute of General MedicalSciences of the National Institutes of Health underaward numbers R01 GM096444 to K.B.H., F31GM089576 to S.G.W., and F32 GM090650 to M.J.H.We thank Dr. Kim Delaney for assistance withcloning URB constructs and Dr. Gregory DeKosterfor assistance with NMR spectroscopy.

Conflict of Interest. The authors declare noconflicts of interest.

Supplementary Data

Supplementary data to this article can be foundonline at http://dx.doi.org/10.1016/j.jmb.2013.05.031

Received 16 March 2013;Received in revised form 6 May 2013;

Accepted 8 May 2013Available online 22 June 2013

Keywords:RRM;

ancestral reconstruction;RNA binding;

protein phylogeny;U1A protein

This is an open-access article distributed under the termsof the Creative Commons Attribution-NonCommercial-NoDerivative Works License, which permits non-commercialuse, distribution, and reproduction in any medium,provided the original author and source are credited.

† http://web.archive.org/web/20050913043846/http://psyche.uthct.edu/dbs/uRNADB/u2/u2.html

‡ http://rfam.sanger.ac.uk/family/RF00003§ http://rfam.sanger.ac.uk/family/RF00004

Abbreviations used:snRNP, small nuclear ribonucleoprotein; RRM, RNA

recognition motif; ML, maximum likelihood; MP, maximumparsimony; FL, full length; HSQC, heteronuclear single

quantum coherence; NOE, nuclear Overhauser enhance-ment; aLRT, approximate likelihood ratio test; EDTA,

ethylenediaminetetraacetic acid.

References

[1] Polycarpou-Schwarz M, Gunderson SI, Kandels-Lewis S,Seraphin B, Mattaj IW. Drosophila SNF/D25 combines thefunctions of the two snRNP proteins U1A and U2B' that areencoded separately in human, potato, and yeast. RNA1996;2:11–23.

[2] Scherly D, Boelens W, Dathan NA, van Venrooij WJ, MattajIW. Major determinants of the specificity of interactionbetween small nuclear ribonucleoproteins U1A and U2B″and their cognate RNAs. Nature 1990;345:502–6.

[3] Hall KB, Stump WT. Interaction of N-terminal domain of U1Aprotein with an RNA stem/loop. Nucleic Acids Res 1992;20:4283–90.

[4] Williams SG, Hall KB. Human U2B″ protein binding to snRNAstemloops. Biophys Chem 2011;159:82–9.

[5] Williams SG, Hall KB. Coevolution of Drosophila snf proteinand its snRNA targets. Biochemistry 2010;49:4571–82.

[6] Pauling L, Zuckerkandl E. Chemical paleogenetics: molec-ular “Restoration Studies” of exctinct forms of life. Acta ChemScand 1963;17:S9–S16.

[7] Shi Y, Yokoyama S. Molecular analysis of the evolutionarysignificance of ultraviolet vision in vertebrates. Proc NatlAcad Sci USA 2003;100:8308–13.

[8] Gaucher EA, Thomson JM, Burgan MF, Benner SA. Inferringthe palaeoenvironment of ancient bacteria on the basis ofresurrected proteins. Nature 2003;425:285–8.

[9] Thornton JW, Need E, Crews D. Resurrecting the ancestralsteroid receptor: ancient origin of estrogen signaling. Science2003;301:1714–7.

[10] Ugalde JA, Chang BSW, Matz MV. Evolution of coralpigments recreated. Science 2004;305:1433.

[11] Harms MJ, Thornton JW. Analyzing protein structure andfunction using ancestral gene reconstruction. Curr OpinStruct Biol 2010;20:360–6.

[12] Guindon S, Gascuel O. A simple, fast, and accurate algorithmto estimate large phylogenies by maximum likelihood. SystBiol 2003;52:696–704.

[13] Yang Z. PAML 4: phylogenetic analysis by maximumlikelihood. Mol Biol Evol 2007;24:1586–91.

[14] Hall KB. Interaction of RNA hairpins with the human U1AN-terminal RNA binding domain. Biochemistry 1994;33:10076–88.

[15] Williams DJ, Hall KB. RNA hairpins with non-nucleotidespacers bind efficiently to the human U1A protein. J Mol Biol1996;257:265–75.

[16] Zeng Q, Hall KB. Contribution of the C-terminal tail of U1ARBD1 to RNA recognition and protein stability. RNA 1997;3:303–14.

[17] Goldstein RA. The evolution and evolutionary consequencesof marginal thermostability in proteins. Proteins 2011;79:1396–407.

Page 17: Resurrection of an Urbilaterian U1A/U2B″/SNF Protein · those that form a common structural scaffold. The authors ... Phylogeny of the U1A/U2B″/SNF protein family indicates that

3862 Reconstruction of an Ancestral RRM

[18] Hu J, Cui G, Li C, Liu C, Shang E, Lai L, et al. Structure andnovel functional mechanism of Drosophila SNF in sex-lethalsplicing. PLoS One 2009;4:e6890.

[19] Lee D, Hilty C, Wider G, Wuthrich K. Effective rotationalcorrelation times of proteins from NMR relaxation interfer-ence. J Magn Reson 2006;178:72–6.

[20] Showalter SA, Hall KB. Altering the RNA-binding mode of theU1A RBD1 protein. J Mol Biol 2004;335:465–80.

[21] Gu X. Functional divergence in protein (family) sequenceevolution. Genetica 2003;118:133–41.

[22] Kranz JK, Hall KB. RNA recognition by the human U1Aprotein is mediated by a network of local cooperativeinteractions that create the optimal binding surface. J MolBiol 1999;285:215–31.

[23] Showalter SA, Hall KB. Correlated motions in the U1 snRNAstem/loop 2: U1A RBD1 complex. Biophys J 2005;89:2046–58.

[24] Showalter SA, Hall KB. A functional role for correlated motionin the N-terminal RNA-binding domain of human U1A protein.J Mol Biol 2002;322:533–42.

[25] Katoh E, Louis JM, Yamazaki T, Gronenborn AM, TorchiaDA, Ishima R. A solution NMR study of the binding kineticsand the internal dynamics of an HIV-1 protease-substratecomplex. Protein Sci 2003;12:1376–85.

[26] Papaleo E, Riccardi L, Villa C, Fantucci P, De Gioia L.Flexibility and enzymatic cold-adaptation: a comparativemolecular dynamics investigation of the elastase family.Biochim Biophys Acta 2006;1764:1397–406.

[27] Butterwick JA, Palmer AGR. An inserted Gly residue finetunes dynamics between mesophilic and thermophilic ribo-nucleases H. Protein Sci 2006;15:2697–707.

[28] Henzler-Wildman KA, Lei M, Thai V, Kerns SJ, Karplus M,Kern D. A hierarchy of timescales in protein dynamics islinked to enzyme catalysis. Nature 2007;450:913–6.

[29] Marlow MS, Dogan J, Frederick KK, Valentine KG, Wand AJ.The role of conformational entropy in molecular recognitionby calmodulin. Nat Chem Biol 2010;6:352–8.

[30] Sapienza PJ, Mauldin RV, Lee AL. Multi-timescale dynamicsstudy of FKBP12 along the rapamycin-mTOR bindingcoordinate. J Mol Biol 2011;405:378–94.

[31] Bhabha G, Lee J, Ekiert DC, Gam J, Wilson IA, Dyson HJ,et al. A dynamic knockout reveals that conformationalfluctuations influence the chemical step of enzyme catalysis.Science 2011;332:234–8.

[32] Rivalta I, Sultan MM, Lee NS, Manley GA, Loria JP, BatistaVS. Allosteric pathways in imidazole glycerol phosphatesynthase. Proc Natl Acad Sci USA 2012;109:E1428–36.

[33] Saldi T, Wilusz C, MacMorris M, Blumenthal T. Functionalredundancy of worm spliceosomal proteins U1A and U2B″.Proc Natl Acad Sci USA 2007;104:9753–7.

[34] Stitzinger SM, Conrad TR, Zachlin AM, Salz HK. Functionalanalysis of SNF, the Drosophila U1A/U2B″ homolog: identi-fication of dispensable and indispensable motifs for bothsnRNP assembly and function in vivo. RNA 1999;5:1440–50.

[35] Collins L, Penny D. Complex spliceosomal organizationancestral to extant eukaryotes. Mol Biol Evol 2005;22:1053–66.

[36] Tommaso P, Moretti S, Xenarios I, Orobitg M, Montanyola A,Chang JM, et al. T-Coffee: a web server for the multiplesequence alignment of protein and RNA sequences usingstructural information and homology extension. Nucleic AcidsRes 2011;39:W13–7.

[37] Castresana J. Selection of conserved blocks from multiplealignments for their use in phylogenetic analysis. Mol BiolEvol 2000;17:540–52.

[38] Talavera G, Castresana J. Improvement of phylogeniesafter removing divergent and ambiguously aligned blocksfrom protein sequence alignments. Syst Biol 2007;56:564–77.

[39] Darriba D, Taboada GL, Doallo R, Posada D. ProtTest 3: fastselection of best-fit models of protein evolution. Bioinformat-ics 2011;27:1164–5.

[40] Anisimova M, Gascuel O. Approximate likelihood-ratio testfor branches: a fast, accurate, and powerful alternative. SystBiol 2006;55:539–52.

[41] Felsenstein J (2005). PHYLIP (Phylogeny Inference Pack-age) version 3.6. [Distributed by the author].

[42] Yang Z. PAML: a program package for phylogeneticanalysis by maximum likelihood. Comput Appl Biosci1997;13:555–6.

[43] Santoro MM, Bolen DW. Unfolding free energy changesdetermined by the linear extrapolation method. 1. Unfoldingof phenylmethanesulfonyl α-chymotrypsin using differentdenaturants. Biochemistry 1988;27:8063–8.

[44] Avis JM, Allain FH, Howe PW, Varani G, Nagai K, NeuhausD. Solution structure of the N-terminal RNP domain of U1Aprotein: the role of C-terminal residues in structure stabilityand RNA binding. J Mol Biol 1996;257:398–411.

[45] Loria JP, RanceM, Palmer AGR. A TROSYCPMG sequencefor characterizing chemical exchange in large proteins. JBiomol NMR 1999;15:151–5.

[46] Gu X. Statistical methods for testing functional divergenceafter gene duplication. Mol Biol Evol 1999;16:1664–74.

[47] Gu X.Maximum-likelihood approach for gene family evolutionunder functional divergence. Mol Biol Evol 2001;18:453–64.

[48] Gu X. A simple statistical method for estimating type-II(cluster-specific) functional divergence of protein sequences.Mol Biol Evol 2006;23:1937–45.

[49] Schneider TD, Stephens RM. Sequence logos: a new way todisplay consensus sequences. Nucleic Acids Res 1990;18:6097–100.

[50] Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: asequence logo generator. Genome Res 2004;14:1188–90.

[51] Arnold K, Bordoli L, Kopp J, Schwede T. The SWISS-MODELworkspace: a web-based environment for protein structurehomology modelling. Bioinformatics 2006;22:195–201.

[52] Price SR, Evans PR, Nagai K. Crystal structure of thespliceosomal U2B″–U2A′ protein complex bound to afragment of U2 small nuclear RNA. Nature 1998;394:645–50.

[53] Lu J, Hall KB. Tertiary structure of RBD2 and backbonedynamics of RBD1 and RBD2 of the human U1A proteindetermined by NMR spectroscopy. Biochemistry 1997;36:10393–405.