
Evolutionary and Transcriptional Analysis of Karyopherin β Superfamily Proteins*

Yu Quan‡, Zhi-Liang Ji‡§, Xiao Wang‡, Alan M. Tartakoff¶, and Tao Tao‡§

In eukaryotes, karyopherin β superfamily proteins mediate nucleocytoplasmic transport of macromolecules. We investigated the evolutionary and transcriptional patterns of these proteins using bioinformatics approaches. No obvious homologs were found in prokaryotes, but an extensive set of β-karyopherin proteins was found in yeast. Among the 14 β-karyopherins of Saccharomyces cerevisiae, eight corresponded directly to their human orthologs without diversification, two were lost, and the remaining four underwent gene duplication by different mechanisms. We also identified β-karyopherin orthologs in Caenorhabditis elegans, Drosophila melanogaster, Danio rerio, Xenopus tropicalis, Gallus gallus, and Mus musculus. β-Karyopherins were ubiquitously but nonuniformly expressed in distinct cells and tissues. In yeast and mice, the levels of some β-karyopherin transcripts appeared to be regulated both during the cell cycle and during development. In silico analysis of promoter binding elements further suggested that the transcription factors SP1, NRF-2, HEN-1, RREB-1, and nuclear factor Y regulate expression of most β-karyopherin genes. These findings point to new mechanisms in the functional diversification of β-karyopherins and in the regulation of nucleocytoplasmic transport. Molecular & Cellular Proteomics 7:1254–1269, 2008.

Nucleocytoplasmic transport of proteins, RNAs, and ribosomes is essential in eukaryotic cells. In this process, the β-karyopherins (or importin-β superfamily members) play a central role as they transport cargoes across the nuclear pore complex (1–9). Members of this family of 95–145-kDa proteins have a so-called importin-β N-terminal (IBN_N)¹ domain (or RanGTP binding domain) at their N terminus and several "HEAT repeat" motifs that lie mostly in the C-terminal portion of the structure. The HEAT repeat motifs can assume different conformations in different functional states, and this flexibility allows them to accommodate binding partners through an induced fit mechanism (10). These features distinguish β-karyopherins from transporters that deliver cargoes to other subcellular organelles. β-Karyopherins have been found in organisms from yeast to humans. In Saccharomyces cerevisiae, 14 β-karyopherins have been shown to transport their cargoes into and out of the nucleus (11, 12), whereas in mammalian cells, about 20 β-karyopherins participate in these events (7). The family includes both import and export receptors.

Nuclear proteins larger than the "diffusion limit" of the nuclear pore complex (~50 kDa) generally have a nuclear localization signal for nuclear import, whereas smaller proteins that are actively excluded from the nucleus have a nuclear export signal (3). The transport of these cargoes is mediated by karyopherins in conjunction with the small GTPase Ran, which cycles between its GDP-bound (in the cytoplasm) and GTP-bound (in the nucleus) forms. RanGTP confers directionality on transport by controlling the conformation of β-karyopherins and their ability to bind cargo (13, 14). Although most karyopherins are constitutively expressed in cells (6), emerging evidence shows that expression of at least some karyopherin genes is regulated (15, 16), presumably because transport requirements vary, e.g. during development (17). For example, expression of Kpnb1 and Ranbp5 is regulated during rodent spermatogenesis (18). Moreover, expression of IPO13 in rats and humans is regulated both hormonally and developmentally (16, 19). Although β-karyopherins share common features, the biological roles of their transport functions are remarkably varied. One little-explored aspect of β-karyopherin biology is their evolution. In particular, it is unclear how the 14 yeast β-karyopherins gave rise to the 20 human members. Moreover, it is unknown whether the cell type-specific expression of β-karyopherin genes reflects corresponding functional requirements.

To address these questions, we used bioinformatics approaches. We also investigated the expression of each β-karyopherin member through systematic analysis of transcriptional profiles and identification of putative transcription factor binding sites.

From the ‡School of Life Sciences and Key Laboratory for Cell Biology and Tumor Cell Engineering, the Ministry of Education of China, Xiamen University, Xiamen, Fujian 361005, China and ¶Department of Pathology and Cell Biology Program, Case Western Reserve University School of Medicine, Cleveland, Ohio 44106

Received, October 22, 2007, and in revised form, February 25, 2008
Published, MCP Papers in Press, March 18, 2008, DOI 10.1074/mcp.M700511-MCP200

¹ The abbreviations used are: IBN_N, importin-β N-terminal; BLAST, Basic Local Alignment Search Tool; GNF, Genomics Institute of the Novartis Research Foundation; SPM, specificity measure; TFBS, transcription factor binding site; IDs, identities; NF-Y, nuclear factor Y; NRF-2, nuclear respiratory factor 2; RREB-1, Ras-responsive element-binding protein 1; HEN-1, helix-loop-helix protein 1; dpc, days postcoitus; HEAT, huntingtin, elongation factor 3, protein phosphatase 2A, Tor1.

Research

© 2008 by The American Society for Biochemistry and Molecular Biology, Inc.
1254 Molecular & Cellular Proteomics 7.7
This paper is available on line at http://www.mcponline.org

EXPERIMENTAL PROCEDURES

Identification of Orthologous Proteins—Correct assignment of orthologous proteins is critical for evolutionary analysis of β-karyopherins. For this purpose, orthologous proteins were searched for and compared in eight species: Homo sapiens, Mus musculus, Gallus gallus, Danio rerio, Xenopus tropicalis, Drosophila melanogaster, Caenorhabditis elegans, and S. cerevisiae. Each ortholog assignment was based on multiple lines of evidence. 1) As the primary method, putative orthologs were identified or verified by BLAST searches against assembled genomes, using the default parameters of the National Center for Biotechnology Information (NCBI) blastp for sequence alignment. 2) Most β-karyopherin orthologs can also be identified by searching the NCBI Clusters of Orthologous Groups (COG) database (20). 3) The "Orthologue Prediction" function of the Ensembl database further helps to identify putative β-karyopherin orthologs in whole genomes (21).
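BLAST-based ortholog calls are commonly made with a reciprocal-best-hit rule. The text does not state that this exact rule was applied, so the following is only an illustrative sketch under that assumption: the function names and score tables are hypothetical, and the scores stand in for blastp bit scores parsed from searches run in both directions between two proteomes.

```python
from typing import Dict, Tuple

# Hypothetical input format: {(query, subject): score}, e.g. blastp bit scores.
Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (first one wins on ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> Dict[str, str]:
    """Keep pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {a: b for a, b in forward.items() if reverse.get(b) == a}


if __name__ == "__main__":
    # Made-up scores for illustration only; they are not real BLAST results.
    yeast_vs_human = {
        ("KAP95", "KPNB1"): 900.0, ("KAP95", "TNPO1"): 120.0,
        ("PSE1", "RANBP5"): 700.0, ("PSE1", "RANBP6"): 690.0,
    }
    human_vs_yeast = {
        ("KPNB1", "KAP95"): 905.0, ("TNPO1", "KAP104"): 800.0,
        ("RANBP5", "PSE1"): 702.0, ("RANBP6", "PSE1"): 688.0,
    }
    print(reciprocal_best_hits(yeast_vs_human, human_vs_yeast))
```

With these made-up scores the script prints `{'KAP95': 'KPNB1', 'PSE1': 'RANBP5'}`. RANBP6 drops out because a strict one-to-one rule cannot report two human co-orthologs of a single yeast gene, which is one reason to cross-check BLAST results against COG and Ensembl predictions.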

Construction of Phylogenetic Trees—The protein sequences and exon information of karyopherin β superfamily members of the eight organisms were obtained from either the Ensembl genome database release 46 (21) or GenBank™ (supplemental Table 1). To illustrate the evolution of β-karyopherins from yeast to human, phylogenetic trees were generated. Protein sequences were first aligned with MUSCLE 3.6 (22); the alignments were then refined manually with reference to secondary structure and finally visualized with ClustalX 1.83 (23). Fourteen individual phylogenetic trees (Fig. 1) were then constructed from the multiple alignments using the neighbor-joining method of MEGA4 (24) under the Poisson correction amino acid substitution model with uniform rates among sites (25, 26). Bootstrap analysis with 1,000 replicates was used to test the robustness of these trees.
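The Poisson correction converts the observed proportion p of differing amino acid sites between two aligned sequences into a distance d = −ln(1 − p) (uniform rates, so no gamma correction), and neighbor joining then builds a tree from the resulting distance matrix. The sketch below implements both steps in plain Python to illustrate the technique; it is not MEGA4's code, its gap handling (pairwise deletion) is an assumption, and bootstrap support would come from repeating the procedure on alignments whose columns are resampled with replacement.

```python
import math
from typing import Dict, List, Tuple


def poisson_distance(a: str, b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p) between two aligned sequences.

    p is the proportion of differing residues among positions where neither
    sequence has a gap (pairwise deletion, an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    p = sum(x != y for x, y in pairs) / len(pairs)
    return math.inf if p >= 1.0 else -math.log(1.0 - p)


def distance_matrix(aligned: Dict[str, str]) -> Tuple[List[str], List[List[float]]]:
    """Pairwise Poisson distances for a {name: aligned_sequence} mapping."""
    names = list(aligned)
    matrix = [[poisson_distance(aligned[i], aligned[j]) for j in names] for i in names]
    return names, matrix


def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Build an unrooted neighbor-joining tree and return it in Newick format."""
    nodes = list(names)
    d = [row[:] for row in dist]
    while len(nodes) > 2:
        n = len(nodes)
        totals = [sum(row) for row in d]
        # Choose the pair (i, j) that minimizes Q(i, j).
        best, bi, bj = math.inf, 0, 1
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - totals[i] - totals[j]
                if q < best:
                    best, bi, bj = q, i, j
        # Branch lengths from the new internal node to the joined pair.
        li = 0.5 * d[bi][bj] + (totals[bi] - totals[bj]) / (2 * (n - 2))
        lj = d[bi][bj] - li
        joined = f"({nodes[bi]}:{li:.4f},{nodes[bj]}:{lj:.4f})"
        # Distances from the new node to every remaining node.
        keep = [k for k in range(n) if k not in (bi, bj)]
        new_row = [0.5 * (d[bi][k] + d[bj][k] - d[bi][bj]) for k in keep]
        d = [[d[r][c] for c in keep] + [new_row[idx]] for idx, r in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Connect the last two nodes; the remaining distance goes on one edge.
    return f"({nodes[0]}:{d[0][1]:.4f},{nodes[1]}:0.0000);"
```

For an aligned `{name: sequence}` mapping, `neighbor_joining(*distance_matrix(alignment))` returns a Newick string. On an additive test matrix (for example, distances generated from the tree ((A:1,B:2):1,(C:3,D:4))) the function recovers the generating topology and branch lengths.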

Investigation of Transcription Patterns—The gene expression microarray data sets GNF1H_GCRMA and GNF1M_GCRMA used in this study were retrieved from GNF SymAtlas (27), which covers 24,277 human genes in 79 tissues and 32,905 mouse genes in 61 tissues. The expression patterns of β-karyopherin genes were analyzed quantitatively as described previously (28). A geometric comparison (similarity measure) was used to quantify how similar two gene expression profiles are. A similarity value close to 1 indicates highly similar expression patterns regardless of expression level, and genes with such similar profiles may have related biological roles. The specificity measure (SPM) was calculated to compare expression across tissues (28), which helps in assessing the physiological significance of each gene. Three S. cerevisiae microarray data sets (based on α-factor treatment and on analysis of cdc15 and cdc28 mutants) (29, 30) were used to investigate possible cell cycle-dependent expression of β-karyopherins.
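The similarity measure is described only as a geometric comparison whose value approaches 1 for profiles with the same pattern regardless of expression level. Cosine similarity has exactly that property, so the sketch below uses it; whether reference (28) defines the measure this way is an assumption, and the SPM calculation is not reproduced because its formula is not given here. The expression values in the example are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine of the angle between two expression profiles over the same tissues.

    Scaling a profile by a positive constant does not change the result, so a
    value near 1 means the same tissue pattern regardless of expression level.
    """
    if len(x) != len(y):
        raise ValueError("profiles must cover the same tissues")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("a profile with all-zero expression has no direction")
    return dot / (norm_x * norm_y)


if __name__ == "__main__":
    # Hypothetical expression values in four tissues.
    gene_a = [120.0, 30.0, 15.0, 300.0]
    gene_b = [12.0, 3.0, 1.5, 30.0]      # same pattern, tenfold lower
    gene_c = [300.0, 15.0, 30.0, 120.0]  # different pattern
    print(round(cosine_similarity(gene_a, gene_b), 6))
    print(round(cosine_similarity(gene_a, gene_c), 6))
```

The first call prints 1.0 and the second about 0.69: the tenfold-lower profile counts as identical in pattern, whereas the reshuffled one does not.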

Investigation of Regulatory Elements—For transcription factor binding site (TFBS) analysis, 400 bp of nucleotide sequence spanning −300 to +100 bp relative to the transcription start site of each β-karyopherin gene in the eight species was obtained from the Ensembl database (21) or GenBank. The MAPPER search engine is a tool designed to help molecular biologists determine which eukaryotic transcription factor binding elements may be present in a given DNA sequence (31). In this study, putative TFBSs were identified by screening the upstream regulatory sequences of β-karyopherin genes with MAPPER, using its default score and E-value thresholds (0 and 10, respectively). To reduce false positives, only species-specific TFBS models were used for prediction; e.g. TFBS models built from human, mouse, or fly data were applied only to human, mouse, or fly promoter sequences, respectively. These predictions were then combined with information on gene regulation to identify possible regulatory patterns.
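Extracting the −300 to +100 window is strand-dependent: on the minus strand, upstream lies at higher chromosome coordinates and the sequence must be reverse-complemented. The helper below is a hypothetical sketch (not part of MAPPER or Ensembl) that assumes a 0-based index for the +1 base and no position 0, which is how −300 to +100 yields 400 bp.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def promoter_window(chrom_seq: str, tss: int, strand: str,
                    upstream: int = 300, downstream: int = 100) -> str:
    """Return positions -upstream..-1 and +1..+downstream around a TSS.

    tss is the 0-based index of the +1 base on chrom_seq. The result always
    reads 5'->3' on the transcribed strand, so its first base is -upstream.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the minus strand, upstream lies at higher coordinates.
        start, end = tss - downstream + 1, tss + upstream + 1
    else:
        raise ValueError("strand must be '+' or '-'")
    if start < 0 or end > len(chrom_seq):
        raise ValueError("window extends past the end of the sequence")
    window = chrom_seq[start:end]
    return window if strand == "+" else reverse_complement(window)


if __name__ == "__main__":
    toy = "AAAACCCCGGGGTTTT"  # toy sequence, not real genomic data
    print(promoter_window(toy, 4, "+", upstream=2, downstream=3))   # AACCC
    print(promoter_window(toy, 11, "-", upstream=2, downstream=3))  # AACCC
```

The extracted windows would then be passed to the TFBS search; restricting the models to the matching species, as described above, happens at that later step.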

RESULTS AND DISCUSSION

Evolution of the Karyopherin β Superfamily

The Sequence Alignments—Previous studies have identified 14 β-karyopherins in baker's yeast (S. cerevisiae) and about 20 in human (32). A few studies that have investigated β-karyopherin divergence and multiplication during evolution were based on limited data (33, 34). Through ortholog analyses, 138 β-karyopherin genes (132 annotated and six novel) were identified in human (20 genes), S. cerevisiae (14 genes), and six other organisms: M. musculus (20 genes), G. gallus (17 genes), X. tropicalis (18 genes), D. rerio (23 genes), D. melanogaster (16 genes), and C. elegans (10 genes) (Table I). The Ensembl or GenBank IDs of these sequences are listed in supplemental Table 1.
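The per-species counts can be checked against the stated total of 138 genes (132 annotated plus six novel); the dictionary below simply restates the numbers given in the text.

```python
# Beta-karyopherin gene counts per species, as given in the text (Table I).
gene_counts = {
    "H. sapiens": 20,
    "M. musculus": 20,
    "G. gallus": 17,
    "X. tropicalis": 18,
    "D. rerio": 23,
    "D. melanogaster": 16,
    "C. elegans": 10,
    "S. cerevisiae": 14,
}

total = sum(gene_counts.values())
annotated, novel = 132, 6
assert total == annotated + novel == 138, total
print(f"{total} genes in {len(gene_counts)} species")  # 138 genes in 8 species
```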

Multiple sequence alignment of full-length protein sequences revealed extensive diversity among β-karyopherins within a species (data not shown). Because full-length alignment within a species cannot provide a comprehensive picture of β-karyopherin evolution, we searched for local structural clues to provide complementary information. The IBN_N domain (Pfam PF03810) plays a pivotal role in cargo release and thus carries distinctive "structural signatures" of the karyopherin β family (33, 35). We therefore compared the IBN_N domains of β-karyopherins within a species. Conservation of IBN_N domains within a species is comparatively poor (Fig. 2). This diversity suggests that β-karyopherins may not all derive from one or a small number of ancestors; in other words, multiple sequence alignment of either the IBN_N domain or the full-length sequence within a species is not an appropriate basis for inferring the kinship of β-karyopherins. Exceptions were found in some pairs of β-karyopherins that share domain composition, length, and start sites (e.g. IPO7-IPO8, IPO13-TNPO3, RANBP5-RANBP6, TNPO1-TNPO2, and XPO7-RANBP17), suggesting that they are paralogous pairs. Although no IBN_N domain sequence is constant across all human β-karyopherins, several residues in this region are relatively conserved (Val/Ile/Leu46, Val/Ile/Leu74, and Lys/Arg75) (Fig. 2). These residues may be critical for maintaining domain structure and binding RanGTP. The IBN_N domain was also compared among species, using KPNB1 orthologs as an example (Fig. 3). In contrast to the within-species comparison, the IBN_N domain is strongly conserved among species, especially after D. melanogaster, in agreement with multiple sequence alignment of the full-length proteins (supplemental materials).
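The ":" and "." marks in Fig. 2 follow ClustalX's convention of "strong" and "weaker" amino acid groups. The sketch below reproduces that per-column marking using the standard Clustal group lists; treating any gapped column as unmarked is an assumption of this sketch, and the conservation score curve shown in Fig. 2 is not reproduced. The toy alignment echoes the Val/Ile/Leu and Lys/Arg positions discussed above but is otherwise hypothetical.

```python
from typing import List

# Clustal "strong" and "weaker" residue groups used for ":" and "." marks.
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
               "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column: str) -> str:
    """Return '*', ':', '.', or ' ' for one alignment column."""
    residues = set(column.upper())
    if "-" in residues:
        return " "  # gapped columns are left unmarked in this sketch
    if len(residues) == 1:
        return "*"  # fully conserved
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."
    return " "


def conservation_line(alignment: List[str]) -> str:
    """Build the symbol line printed under an alignment."""
    if not alignment or len({len(seq) for seq in alignment}) != 1:
        raise ValueError("need equal-length aligned sequences")
    return "".join(column_symbol("".join(col)) for col in zip(*alignment))


if __name__ == "__main__":
    toy = ["VKAG", "IRAW", "LKAS"]
    print(repr(conservation_line(toy)))
```

The example prints `'::* '`: Val/Ile/Leu and Lys/Arg each fall within a single strong group, the Ala column is identical, and the last column mixes unrelated residues.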

The Ancestors of β-Karyopherins—On the basis of multiple sequence alignments of full-length protein sequences, the β-karyopherins were allocated with high confidence (bootstrap values >50%) to 14 separate trees: CSE1L, IPO4, IPO7-IPO8, IPO9, IPO11, IPO13-TNPO3, KPNB1, RANBP5-RANBP6, TNPO1-TNPO2, XPO1, XPO4-XPO7-RANBP17, XPO5, XPO6, and XPOT. Interestingly, these 14 phylogenetic trees do not all correspond to ancestors in S. cerevisiae. As summarized by phylogenetic analyses (Fig. 1) and ortholog analyses (Table I), only 12 yeast β-karyopherin genes have orthologs in multicellular organisms; the other two genes, KAP122 and SXM1, do not. Before the β-karyopherin repertoire stabilized, gain and loss of genes was common. For example, seven β-karyopherin genes of S. cerevisiae (KAP114, KAP120, KAP122, KAP123, MSN5, NMD5, and SXM1) are absent in C. elegans, whereas two new genes were gained: C35A5.8 (the possible ancestor of human RANBP17 and XPO7) and Y69A2AR.16 (ancestor of human XPO4). Gain and loss were also observed in D. melanogaster, in which Y69A2AR.16 (ancestor of human XPO4) and LOS1 (ancestor of human XPOT) are absent and a new β-karyopherin gene, Exp6 (ancestor of human XPO6), emerges. These changes presumably equip each species with an ample but minimal repertoire of carriers to transport critical cargoes.

Evolution and Expression of Karyopherin β Proteins

FIG. 1. Phylogenetic trees constructed from the full-length protein sequences of karyopherin β superfamily members in eight organisms using the neighbor-joining method of MEGA4 under the Poisson correction amino acid substitution model with uniform rates among sites. Bootstrap values calculated by MEGA4 are shown beside their nodes. The β-karyopherins are arranged into 14 trees as shown. Yeast β-karyopherin genes are marked with solid circles. Species names are abbreviated as follows: Hs, H. sapiens; Mm, M. musculus; Gg, G. gallus; Xt, X. tropicalis; Dr, D. rerio; Dm, D. melanogaster; Ce, C. elegans; and Sc, S. cerevisiae.

Sequence analysis of seven other yeasts, Candida glabrata (14 genes), Debaryomyces hansenii (13 genes), Eremothecium gossypii (14 genes), Kluyveromyces lactis (13 genes), Yarrowia lipolytica (12 genes), Schizosaccharomyces pombe (13 genes), and Cryptococcus neoformans (nine genes), and of two plants, Arabidopsis thaliana (17 genes) and Oryza sativa (14 genes), showed that the absence of selected β-karyopherin genes is not rare during evolution. Gene loss may be related to unique aspects of physiology, behavior, and development. Because no new types of β-karyopherin genes were detected in D. rerio (although there are duplicates of some known β-karyopherin genes), the diversification and amplification of β-karyopherin genes became stable no later than D. rerio. However, this does not mean that no further gain or loss of genes occurred; for example, orthologs of IPO4 and TNPO2 are absent in G. gallus.

TABLE I
Summary of β-karyopherin genes of eight organisms, including H. sapiens, M. musculus, G. gallus, X. tropicalis, D. rerio, D. melanogaster, C. elegans, and S. cerevisiae

Orthologous gene groups are separated by shading. Note that IPO7 and IPO8 may derive from the same common ancestor gene, NMD5, in S. cerevisiae; IPO13 and TNPO3 from MTR10 in S. cerevisiae; RANBP6 and RANBP5 from PSE1 in S. cerevisiae; TNPO1 and TNPO2 from KAP104 in S. cerevisiae; and XPO7 and RANBP17 from C35A5.8 in C. elegans. Slashes indicate that no homologous gene was found.

It is still a mystery why no very rudimentary set of "ur-karyopherins" has been detected. However, simplified sets of karyopherins have been found in some protozoa, e.g. Giardia lamblia, and in C. elegans. G. lamblia has only four β-karyopherin genes (36). It is not possible to estimate how few β-karyopherins might suffice for survival, because knowledge of their transport specificity is incomplete and because there is redundancy: some cargoes can be recognized and transported by more than one β-karyopherin, and many β-karyopherins recognize multiple cargoes. Nevertheless, it is reasonable to postulate that the presence or absence of some β-karyopherins is critical for survival in all eukaryotes.

To search for possible ancestors in prokaryotes, we used BLAST to search both the RNA and protein sequences of all yeast β-karyopherins against 327 bacterial and 27 archaebacterial genomes. Mild similarity (26% sequence identity) was found between Sxm1p and a ~143-amino acid fragment of a sensory transduction histidine kinase in Clostridium acetobutylicum. Other β-karyopherins also show mild similarity (no more than 30% sequence identity) to some short prokaryotic sequences, many of which are fragments of uncharacterized hypothetical proteins. These sequences are remotely homologous to HEAT repeats.

FIG. 2. Comparison of the IBN_N domain (Pfam PF03810) in human β-karyopherins. Multiple sequence alignment of the IBN_N domain residues was performed using MUSCLE 3.6 and refined manually on the basis of a reference alignment (importin-β N-terminal domain profile, PROSITE PS50166) and the secondary structure of human KPNB1 (Protein Data Bank 1QGR) (79). The numbers under "Start site" are the positions of the first residue of the IBN_N domain in each protein; the numbers under "Length" are the numbers of residues in the IBN_N domain of each protein. A conservation estimate for each alignment position is plotted under the alignment: highly conserved positions receive a high score (the peaks), whereas low conservation or exceptional residues at a partially conserved position lower the score (the valleys). The ":" character indicates that one of the "strong" groups of amino acids is fully conserved, whereas "." indicates that one of the "weaker" groups of amino acids is conserved, as described in ClustalX. The consensus secondary structure predicted with JPred (80) is given above the alignment. Note the conservation of the IBN_N domain in the β-karyopherin pairs IPO7-IPO8, IPO13-TNPO3, RANBP5-RANBP6, TNPO1-TNPO2, and XPO7-RANBP17. Sequences were shaded according to the alignment consensus, which was calculated automatically by ClustalX.

FIG. 3. Comparison of the IBN_N domain (Pfam PF03810) in KPNB1 orthologs. Hs, H. sapiens; Mm, M. musculus; Gg, G. gallus; Xt, X. tropicalis; Dr, D. rerio; Dm, D. melanogaster; Ce, C. elegans; Sc, S. cerevisiae. The sequences were aligned using MUSCLE 3.6, and the symbols are defined as in Fig. 2; "*" indicates fully conserved positions. RanGTP binding sites of yeast Kap95p, as described previously (81), are marked with arrows. Note the conservation of the IBN_N domain across organisms, especially after D. rerio.

FIG. 4. The exon arrangements of 20 human β-karyopherin genes. The Ensembl transcript IDs are ENST00000261412, ENST00000356861, ENST00000379719, ENST00000256079, ENST00000252512, ENST00000377354, ENST00000265388, ENST00000372343, ENST00000290158, ENST00000354464, ENST00000361565, ENST00000389352, ENST00000389538, ENST00000255305, ENST00000265351, ENST00000304658, ENST00000389161, ENST00000262982, ENST00000357602, and ENST00000259569, respectively. The shaded blocks are translated regions, whereas the transparent blocks are non-translated regions. The numbers under the blocks are exon lengths (in bp), and the numbers in parentheses indicate the length of the translated region when an exon contains a non-translated region. Note the similarity within TNPO1-TNPO2, IPO7-IPO8, and XPO7-RANBP17. RANBP6 has only one exon and is thought to originate from RANBP5 through retroposition.

Gene Duplications—The expansion of the karyopherin β family is accompanied by repeated gene amplification. Sequence alignment shows five pairs of human β-karyopherins (10 proteins) that closely resemble each other: IPO7-IPO8, IPO13-TNPO3, RANBP5-RANBP6, TNPO1-TNPO2, and XPO7-RANBP17. Because each pair may derive from a common ancestor, exon analysis was undertaken within and among species (Figs. 4 and 5 and the supplemental materials). We found that these β-karyopherin genes are well conserved across species, judging from both the length and the nucleotide composition of each exon. Such conservation can be traced back to the earliest common ancestor, the PSE1 gene of S. cerevisiae. Comparison of exon arrangements among β-karyopherin genes shows that for IPO7-IPO8, TNPO1-TNPO2, and XPO7-RANBP17, both members of each pair have almost the same exon number, length, and order (Fig. 4). This observation strongly supports the hypothesis that these β-karyopherin pairs derive from common ancestors by gene duplication.

Pairs of duplicated β-karyopherin genes were also found in D. melanogaster (CG32164-CG32165 and Trn-CG8219) and in D. rerio (ipo8a-ipo8b, tnpo2a-tnpo2b, xpo1a-xpo1b, xpo4a-xpo4b, and xpo7a-xpo7b) (Table I). These gene pairs exhibit high protein sequence identity: 99.9% for ipo8a-ipo8b, 97.7% for CG32164-CG32165, 94.8% for xpo1a-xpo1b, 91.9% for tnpo2a-tnpo2b, 84.1% for xpo7a-xpo7b, 68.6% for Trn-CG8219, and 63.9% for xpo4a-xpo4b. Nevertheless, there is no evidence that these duplications persist in higher organisms, and some duplicated β-karyopherin gene pairs may diverge to the point that they acquire specialized neofunctions (37). Gene duplication in D. melanogaster and D. rerio could thus contribute to speciation (37).
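Pairwise identity values such as these depend on how gapped columns are counted, which the text does not specify. As one common convention (an assumption here), the sketch below counts identical residues over aligned columns in which neither sequence has a gap; other tools divide by the full alignment length or by the shorter sequence, which gives lower values for gapped alignments.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical residues over gap-free columns of a pairwise alignment."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Toy aligned fragments, not real beta-karyopherin sequences.
    print(percent_identity("MELIT-ILEK", "MELLTQILEK"))
```

The toy pair has 8 identical residues in 9 gap-free columns, about 88.9%; dividing by the full alignment length of 10 would give 80% instead.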

Retroposition—As discussed above, gene duplication is one of several mechanisms that could have allowed the karyopherin β family to expand. RANBP5 and RANBP6 are very closely related judging from sequence alignment and ortholog analysis. To trace the evolutionary path of RANBP5 and RANBP6, we studied their exon arrangements. Remarkably, Fig. 5 shows that the exon number grows from one exon in PSE1, the putative S. cerevisiae ancestor of human RANBP5, to 26 exons in D. rerio and finally to 29 exons in H. sapiens. This multiplication of exons presumably reflects the high phenotypic complexity of mammals (38). The enormous increase in the untranslated regions (from 0 bp in S. cerevisiae to about 2,709 bp in human) is likely to contribute to regulation of gene transcription (39, 40). However, unlike RANBP5 and other β-karyopherins, the RANBP6 gene has only one exon in both human and mouse (Fig. 5). Further BLAST searches against the Ensembl genome database found RANBP6 orthologs only in mammals. We therefore suggest that RANBP6 emerged by retroposition of the RANBP5 gene, which produced a new functional pseudogene with only a single exon (37). Judging from known RANBP6 orthologs, the retroposition that produced RANBP6 from an ancient RANBP5 may have happened before the divergence of mammals and birds. This inference is supported by previous evidence that RANBP6 is a functional retrocopy of RANBP5 (41). Additional ortholog analysis found that some RANBP6 orthologs contain several exons, e.g. in Canis familiaris (three exons), Oryctolagus cuniculus (six exons), and Sorex araneus (12 exons). Because no ortholog was found outside mammals, it is reasonable to conclude that the retroduplication event occurred during the speciation of mammals. After the retroposition, RANBP6 diversified further in mammals: introns were inserted species-specifically into the single exon, which may explain the multiple exons of some RANBP6 orthologs. Similar phenomena have been reported previously (42, 43).

FIG. 5. The exon arrangements of RANBP5 and RANBP6 in different organisms, including H. sapiens (Hs), M. musculus (Mm), G. gallus (Gg), X. tropicalis (Xt), D. rerio (Dr), D. melanogaster (Dm), C. elegans (Ce), and S. cerevisiae (Sc). The Ensembl or GenBank transcript IDs of these β-karyopherin genes are ENST00000357602, ENSMUST00000032898, XM_416978.2, ENSXETT00000050713, XM_692846.2, CG1059-RA, C53D5.6, and YMR308C for RANBP5 and ENST00000361966 and ENSMUST00000046742 for RANBP6. The shaded blocks are translated regions, whereas the transparent blocks are non-translated regions. The numbers under the blocks are exon lengths, and the numbers in parentheses are the lengths of translated regions when an exon contains a non-translated region. Note the conservation of RANBP5 exon arrangements after D. rerio and the single-exon RANBP6 in human and mouse after retroposition.

In conclusion, karyopherin β superfamily proteins are not found in prokaryotes. β-Karyopherin evolution appears to proceed along 14 lines from yeast, although two genes (KAP122 and SXM1) were lost, as summarized in Table I. Some yeast β-karyopherins (KAP95, KAP114, KAP120, KAP123, CSE1, MSN5, CRM1, and LOS1) can be directly linked to their respective human orthologs (KPNB1, IPO9, IPO11, IPO4, CSE1L, XPO5, XPO1, and XPOT). The other β-karyopherins (PSE1, NMD5, MTR10, and KAP104) underwent diversification during their evolution.

The selective pressure for efficient communication between the nucleus and the cytoplasm has likely driven the multiplication and diversification of karyopherins to such an extent that β-karyopherins within a species often share only weak sequence homology. Given the evolutionary history and functional importance of β-karyopherins, the karyopherin β superfamily may need to be reclassified. Classification should consider not only overall structural characteristics but also entire sequences, cargo specificity, and interactions with both Ran and nucleoporins.

Expression of the Karyopherin β Superfamily in Cells and Tissues

We reasoned that the functional importance of β-karyopherins should be reflected in their expression. We therefore examined the gene expression patterns of β-karyopherins by statistically analyzing gene expression microarray data, classifying them as comparatively high expression (SPM ≥ 4), above average (SPM ≥ 2), and below average (SPM < 2). β-Karyopherin genes are thought to be “housekeeping” genes, and Table II shows that they are ubiquitously expressed. Nevertheless, they are not evenly expressed in all human tissues. Twelve of 20 human β-karyopherin genes (CSE1L, IPO4, IPO7, IPO9, KPNB1, RANBP5, TNPO1, TNPO3, XPO1, XPO4, XPO5, and XPOT) show comparatively high expression in tissues that proliferate actively, e.g. lymphocytes, tumors, testis, and stem cells. The high expression of these genes may reflect their role in carrying cargoes needed for cell proliferation, such as histones, glucocorticoid receptors, and RNAs (7). For example, CSE1L, which transports importin-α (44), is overexpressed in colon cancer cell lines, breast cancers, and liver neoplasms (45), and excess CSE1L may reduce importin-α/β-dependent import. Higher expression of CSE1L in tumors may reflect deregulation of nuclear transport. Among the overexpressed genes, KPNB1, RANBP5, CSE1L, XPO1, XPO5, and TNPO3 are more highly expressed in lymphoblasts than in other proliferating tissues. A second group of karyopherin genes (IPO11, IPO13, RANBP6, TNPO2, and XPO6) is expressed more strongly in brain and spinal cord than in other tissues.

Different expression patterns of β-karyopherin genes among tissues were also documented in mouse microarray data; however, these patterns differ from those in man. Tissue expression screening of β-karyopherin genes in the mouse revealed restricted expression of Ipo8 in both oocytes and fertilized eggs, whereas, curiously, such restriction was not observed in man. Mouse Ranbp17 mRNA is abundantly expressed in testis and pancreas (46), a finding supported by our ongoing analysis of microarray data. We also observed high expression of Ranbp17 in brown fat and skeletal muscle.
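The three-way classification used for Table II amounts to two thresholds on a per-tissue SPM score. A minimal sketch, assuming the cut-offs are inclusive lower bounds (SPM ≥ 4 for comparatively high, SPM ≥ 2 for above average) and treating SPM as an already computed score; how that score is derived from the microarray signal is not reproduced here, and the tissue scores in the example are hypothetical.

```python
from typing import Dict, List

# Assumed cut-offs, treated as inclusive lower bounds on the SPM score.
HIGH_CUTOFF = 4.0
ABOVE_AVERAGE_CUTOFF = 2.0
CLASSES = ("comparatively high", "above average", "below average")


def expression_class(spm: float) -> str:
    """Map one SPM score to one of the three expression classes."""
    if spm >= HIGH_CUTOFF:
        return CLASSES[0]
    if spm >= ABOVE_AVERAGE_CUTOFF:
        return CLASSES[1]
    return CLASSES[2]


def group_tissues(scores: Dict[str, float]) -> Dict[str, List[str]]:
    """Group the tissues of one gene by expression class, keeping input order."""
    groups: Dict[str, List[str]] = {name: [] for name in CLASSES}
    for tissue, spm in scores.items():
        groups[expression_class(spm)].append(tissue)
    return groups


if __name__ == "__main__":
    # Hypothetical scores for a single gene; not values from the paper.
    example = {"lymphoblasts": 5.1, "testis": 2.7, "liver": 0.4}
    print(group_tissues(example))
```

Keeping the cut-offs as named constants makes it easy to rerun the grouping if the boundaries turn out to be exclusive rather than inclusive.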

Developmental Regulation of Karyopherin β Superfamily Gene Expression—During the cell cycle and during development, there is a continuous need for precise transport of large molecules into and out of the nucleus. Regulated expression of β-karyopherins could be important for these processes (16, 19). We therefore examined the gene expression patterns of β-karyopherins during mouse development (27) by analyzing gene microarray data (Fig. 6). Most β-karyopherin genes gradually increase their expression during development, reach peak expression between day 6.5 and day 9.5, and then decline by day 10.5. On the basis of the similarity of their expression profiles in the early embryonic period (from fertilized egg to day 10.5), we grouped 19 mouse β-karyopherin genes (Xpo6 data are not available) into six expression patterns. The most frequent pattern, shared by 10 genes (Cse1l, Ipo4, Ipo7, Ipo11, Ipo13, Ranbp5, Tnpo2, Tnpo3, Xpo1, and Xpot) (Fig. 6a), is one in which expression increases significantly from fertilized egg to blastocysts and then fluctuates slightly until day 10.5. Both the Ranbp6-Xpo5 and the Tnpo1-Xpo4 pairs show expression peaks; however, the former pair peaks at day 6.5 and day 9.5 (Fig. 6b), and the latter pair primarily at day 9.5 (Fig. 6c). Ipo9 and Ranbp17 are nearly constant (Fig. 6d), whereas Xpo7, Kpnb1, and Ipo8 decrease in blastocysts. Expression of Xpo7 and Kpnb1 then rebounds on day 6.5 and remains high until day 10.5 (Fig. 6e), whereas Ipo8 does not rebound (Fig. 6f). In conclusion, β-karyopherin transcript levels do vary during development. Differences in their cargo specificity may explain these changes.
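The text does not state which similarity measure was used to group the developmental profiles. One common choice can be sketched as follows: compute the Pearson correlation between log2 expression profiles over the seven time points (T1 to T7) and join genes whose profiles correlate at or above a threshold, directly or through a chain of such genes (single-linkage grouping). The correlation measure, the 0.9 threshold, and the example profiles are all assumptions for illustration, not the authors' procedure or data.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical log2 profiles over T1..T7 (fertilized egg to 10.5 dpc); not data from the paper.
EXAMPLE_PROFILES = {
    "GeneA": [0.0, 2.0, 2.2, 2.1, 2.3, 2.2, 2.0],   # rise, then plateau
    "GeneB": [0.1, 2.1, 2.0, 2.2, 2.1, 2.3, 1.9],   # rise, then plateau
    "GeneC": [0.0, -1.5, 1.0, 1.1, 1.2, 1.1, 1.0],  # dip in blastocysts, then rebound
    "GeneD": [0.2, -1.4, 0.9, 1.2, 1.0, 1.2, 1.1],  # dip in blastocysts, then rebound
}


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is perfectly flat."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def group_profiles(profiles: Dict[str, Sequence[float]], threshold: float = 0.9) -> List[List[str]]:
    """Single-linkage grouping: genes joined by any chain of pairs with r >= threshold share a group."""
    genes = list(profiles)
    parent = {g: g for g in genes}  # union-find forest

    def find(g: str) -> str:
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if pearson(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    # Largest groups first, ties broken alphabetically.
    return sorted(groups.values(), key=lambda grp: (-len(grp), grp))


if __name__ == "__main__":
    print(group_profiles(EXAMPLE_PROFILES))
```

Single linkage is used here only because it reduces to finding connected components; average linkage or k-means would be equally reasonable stand-ins.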

Cell Cycle-dependent Transcriptional Regulation of Karyopherin β Superfamily Genes—As nuclear import of macromolecules can be regulated during the cell cycle through alterations in the nuclear pore complex (47), it is of interest to ask whether expression of β-karyopherin genes is regulated in a cell cycle-dependent manner. Different computational approaches have been applied to microarray data sets from S. cerevisiae cells synchronized with mating factor or by means of cell cycle mutants to identify cell cycle-regulated genes. Among the β-karyopherin genes, KAP95 shows the most obvious changes during cell cycle progression (29, 30). In addition, a periodic least square regression suggests cell cycle regulation


TABLE II
Summary of gene expression of β-karyopherins in adult human tissues and during mouse embryonic development (from fertilized egg to day 10.5)

The expression levels are symbolized by the number of “+” symbols according to the log2 values of their average difference value. More “+” symbols indicate a higher expression level (27). A dash denotes that the expression data are not available. BM, bone marrow.

Human data
Genes | High expression | Above average | Below average
KPNB1 | 721_B_lymphoblasts | Lymphocytes, BM-stem cells, cancerous cells, smooth muscle | Neurons, internal organs, ovary, etc.
IPO7 | Cancer cells, BM-CD34+ cells, thyroid | | Internal organs, testis, neurons, etc.
IPO9 | Lymphocytes, prefrontal cortex, BM-stem cells | | Neurons, internal organs, testis, muscle, pancreas, etc.
RANBP5 | 721_B_lymphoblasts | Testis, BM-stem cells, cancer cells, bronchial epithelial cells, smooth muscle | Neuron, internal organ, uterus, etc.
TNPO1 | BM-CD34+, lymphocytes, smooth muscle, fetal thyroid, pituitary | | Brain, neuron, internal organs, skin, colorectal adenocarcinoma, etc.
CSE1L | 721_B_lymphoblasts | Testis, cancer cells, bronchial epithelial cells, BM-CD105+ endothelial | Brain, neurons, internal organs, glands, etc.
XPO1 | Lymphoblasts, BM-CD34+ | Lymphocytes, stem cells, cancer cells, prefrontal cortex | Brain, neurons, internal organs, testis, ovary, pancreas
XPO4 | BM-CD71+ early erythroid, pancreas, testis, lymphocytes, lymphoma | Testis, cancer cells | The other tissues
XPO5 | BM-CD34+, 721_B_lymphoblasts | | Neuron, internal organs, muscles, glands, ovary
XPO6 | Blood | Lymphocytes, stem cells | The other tissues
XPOT | Cancer cells | BM-CD34+, lymphoblasts, bronchial epithelial cells | Brain, neuron, internal organs, testis, ovary
TNPO3 | 721_B_lymphoblasts and other lymphocytes | Leukemia, smooth muscle, BM-stem cells, testis | The other tissues
IPO4 | Testis, lymphoblasts, cancer | | The other tissues
IPO11 | Trigeminal ganglion, testis, BM-stem cells | | Lymphocytes and lymphoma
IPO13 | Brain, spinal cord, testis, heart | | Stem cells, lymphocytes, uterus
RANBP6 | Prefrontal cortex | Brain, lymphocytes, BM-stem cells | Cancer, testis, tissues, etc.
TNPO2 | Prefrontal cortex | Whole brain | Ganglia, lymphocytes, muscles
XPO7 | BM-CD71+ early erythroid | | No expression in the other tissues
IPO8 | | | No obvious expression in all tissues
RANBP17 | | | No obvious expression in all tissues


of KAP123 and NMD5 (48), and an approach based on cubic splines suggests modulation of MTR10 (49). Bayesian modeling techniques also indicate that KAP114 and KAP95 are modulated (50). Thus, although additional data sets would be useful, cell cycle-dependent regulation of β-karyopherin expression could influence nuclear import and export during the cell cycle.
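Periodicity tests of the kind cited above ask whether an expression time course is better explained by a sinusoid at the cell cycle period than by a constant. A minimal harmonic-regression sketch, assuming a known period and one expression value per time point: it fits y = c + a·cos(ωt) + b·sin(ωt) by ordinary least squares and reports the amplitude and the fraction of variance explained. It is not the multivariate method of ref. 48, the spline approach of ref. 49, or the Bayesian model of ref. 50, and the example series is hypothetical.

```python
import math
from typing import Dict, List, Sequence


def solve_linear(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def fit_periodic(times: Sequence[float], values: Sequence[float], period: float) -> Dict[str, float]:
    """Least-squares fit of y = c + a*cos(w*t) + b*sin(w*t) with w = 2*pi/period."""
    w = 2 * math.pi / period
    rows = [(1.0, math.cos(w * t), math.sin(w * t)) for t in times]
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    c, a, b = solve_linear(xtx, xty)
    fitted = [c + a * r[1] + b * r[2] for r in rows]
    mean = sum(values) / len(values)
    ss_tot = sum((y - mean) ** 2 for y in values)
    ss_res = sum((y - f) ** 2 for y, f in zip(values, fitted))
    return {
        "mean": c,
        "amplitude": math.hypot(a, b),
        # Fraction of variance explained by the periodic model (0 for a flat series).
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical series sampled every 10 min over two assumed 60-min cycles.
    t = list(range(0, 130, 10))
    y = [1.0 + 2.0 * math.cos(2 * math.pi * ti / 60) for ti in t]
    print(fit_periodic(t, y, period=60))
```

Because the model is linear in c, a, and b, the fit needs no iterative optimizer; in practice the period itself would be scanned or estimated from the synchronization experiment.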

Mechanism of Regulation of Karyopherin β Superfamily Genes—DNA regulatory elements are crucial for understanding gene expression because the binding of the corresponding transcription factors determines the timing, location, and level of gene expression. Computational methods have been developed to identify and localize regulatory elements in a high-throughput manner (51, 52). Moreover, similar clusters of TFBSs can be found in the promoter regions of orthologous genes (53). A computational tool, MAPPER (31), was therefore adopted to identify putative TFBSs of human β-karyopherin genes (Fig. 7) and of their orthologs in the other model species listed in Table III (supplemental data and supplemental Table 2).
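To illustrate what computational TFBS prediction involves, the sketch below uses a position weight matrix (PWM): aligned binding sites are turned into log2-odds scores per position, and every window of a promoter sequence is scored against them. This is a widely used, simpler approach and is not MAPPER's own algorithm. The training sites, the pseudocount of 0.5, the uniform background, and the score threshold are all assumptions, and only the forward strand is scanned.

```python
import math
from typing import Dict, List, Optional, Tuple

BASES = "ACGT"


def build_pwm(sites: List[str], pseudocount: float = 0.5,
              background: Optional[Dict[str, float]] = None) -> List[Dict[str, float]]:
    """Log2-odds position weight matrix from equal-length aligned sites."""
    if background is None:
        background = {base: 0.25 for base in BASES}  # assumed uniform background
    pwm = []
    for pos in range(len(sites[0])):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({base: math.log2(counts[base] / total / background[base]) for base in BASES})
    return pwm


def scan(sequence: str, pwm: List[Dict[str, float]], threshold: float) -> List[Tuple[int, str, float]]:
    """Report (0-based start, window, score) for forward-strand windows scoring >= threshold."""
    width = len(pwm)
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Hypothetical aligned GC-box-like training sites; not data from the paper.
TRAINING_SITES = ["GGGCGG", "GGGCGG", "GGGCGG", "GAGCGG"]

if __name__ == "__main__":
    pwm = build_pwm(TRAINING_SITES)
    print(scan("TTATGGGCGGTTAA", pwm, threshold=8.0))
```

A real scan would also score the reverse complement and calibrate the threshold against background sequences, since a fixed cut-off controls neither sensitivity nor the false-positive rate.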

FIG. 6. Gene expression patterns of β-karyopherins in mouse early development. The vertical axis is the log2 ratio of gene expression levels; the horizontal axis is the developmental period, abbreviated as follows (days postcoitus (dpc)): T1, fertilized egg; T2, blastocysts; T3, dpc 6.5; T4, dpc 7.5; T5, dpc 8.5; T6, dpc 9.5; and T7, dpc 10.5. The gene expression of mouse β-karyopherin genes was grouped into six patterns: a, Cse1l, Ipo4, Ipo7, Ipo11, Ipo13, Ranbp5, Tnpo2, Tnpo3, Xpo1, and Xpot; b, Ranbp6 and Xpo5; c, Tnpo1 and Xpo4; d, Ipo9 and Ranbp17; e, Kpnb1 and Xpo7; and f, Ipo8.

FIG. 7. Transcription factor binding sites of human β-karyopherin genes predicted by the MAPPER search engine. The regulatory analyses of the 20 β-karyopherin genes are illustrated line by line. The numbers under the lines are the positions (in bp) from the start of the transcript (position +1). The TFBSs are indicated above or beneath the lines. The five most frequently predicted TFBSs (SP1, NRF-2, HEN-1, RREB-1, and CAAT box) are highlighted by symbols. USF, upstream stimulatory factor; PPAR, peroxisome proliferator-activated receptor; COUP-TF, chicken ovalbumin upstream promoter-transcription factor; CREB, cAMP-response element-binding protein; RXR, retinoid X receptor; VDR, vitamin D receptor; ER, estrogen receptor; NF, nuclear factor; SRY, sex-determining region Y-chromosome protein; FTF, α1-fetoprotein transcription factor; HLF, hepatic leukemia factor; ATF, activating transcription factor; CDP, CCAAT displacement protein; NF-GMa, nuclear factor for granulocyte/macrophage colony-stimulating factor; HLTF, helicase-like transcription factor.

We observed that the predicted promoter regions of β-karyopherin genes lack TATA boxes but are rich in GC content (Fig. 7 and supplemental materials). Although the TATA box is often required for transcription by nuclear RNA polymerases, several in vivo studies show that the TATA box is more specifically important for cellular proliferation, transformation, and control of the cell cycle (54, 55) than for regulation of housekeeping genes (56, 57). Mammalian promoters lacking a TATA box often contain SP1 (GGGCGG)/NF-1 (TGGNNNNNNGCCAA) binding sites and rely on them to initiate gene expression (58–60). SP1 is ubiquitously expressed and plays a key role in maintaining basal transcription of housekeeping genes (61). SP1 is also implicated in mediating development-specific gene expression (62). The SP1 binding site is predicted to be one of the most common TFBSs for β-karyopherin genes: 13 of 20 human β-karyopherin genes (CSE1L, IPO8, IPO9, IPO13, KPNB1, RANBP5, RANBP6, RANBP17, TNPO1, TNPO2, TNPO3, XPO1, and XPOT) are predicted to have one or multiple SP1 binding sites in their proximal regulatory regions (Table III and Fig. 7).

The CAAT box is another frequent candidate binding site for β-karyopherin genes: seven of 20 human β-karyopherin genes (CSE1L, IPO9, KPNB1, RANBP5, RANBP6, TNPO2, and XPO1) and 10 of 20 mouse β-karyopherin genes (Cse1l, Ipo9, Ipo11, Kpnb1, Ranbp5, Tnpo1, Tnpo2, Xpo1, Xpo4, and Xpo6) were predicted to possess a CAAT box in their upstream regulatory regions. The CAAT box is also ubiquitously distributed, being present in about 30% of eukaryotic promoters (63, 64). In higher eukaryotes, it occurs in many types of promoters: developmentally controlled (65), cell cycle-regulated (66), and housekeeping (67). Interestingly, SP1 can sometimes interact with nuclear factor Y (NF-Y), also known as CAAT box DNA-binding protein, to regulate downstream genes cooperatively (68, 69). Nuclear respiratory factor 2 (NRF-2), also known as GA-binding protein, was also reported to cooperate with SP1 in the activation of several widely expressed housekeeping genes and of genes that control the cell cycle, differentiation, development, and other key cellular functions (70). The TFBS for NRF-2 is also found frequently (12 of 20 genes) in promoters of human β-karyopherin genes (IPO4, IPO7, IPO8, IPO9, IPO11, IPO13, RANBP17, TNPO1, TNPO3, XPO5, XPO6, and XPOT). Some other TFBSs, e.g. the binding sites for Ras-responsive element-binding protein 1 (RREB-1), helix-loop-helix protein 1 (HEN-1), aryl hydrocarbon receptor nuclear transporter, and B-cell-specific activator protein, are also predicted to be potential TFBSs of human or mouse β-karyopherin genes. These transcription factors are known to participate in the regulation of development, differentiation, the cell cycle, and other karyopherin-related cellular processes (38, 71, 72). Moreover, several further TFBSs are predicted to be frequently involved in the regulation of β-karyopherin genes of different experimental models (Table III); e.g. five of 18 X. tropicalis β-karyopherin genes have predicted binding sites for Staf, 12 of 16 D. melanogaster β-karyopherin genes have predicted binding sites for Broad-complex isoform 4, three of 10 C. elegans β-karyopherin genes have predicted binding sites for UNC-86, and three of 14 S. cerevisiae β-karyopherin genes have predicted binding sites for Ste12p.
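The SP1 (GGGCGG) and NF-1 (TGGNNNNNNGCCAA) consensus sequences quoted above can be searched for directly. A minimal sketch of exact IUPAC consensus matching on both strands, together with a GC-content measure; it is much cruder than MAPPER's model-based prediction. The TATA-box consensus TATAWAW is a commonly cited form added here as an assumption for illustration, and the promoter fragment is hypothetical.

```python
import re
from typing import List, Tuple

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

SP1_GC_BOX = "GGGCGG"          # SP1 site as given in the text
NF1_SITE = "TGGNNNNNNGCCAA"    # NF-1 site as given in the text
TATA_BOX = "TATAWAW"           # commonly cited TATA-box consensus (assumption)


def reverse_complement(seq: str) -> str:
    """Reverse complement, valid for IUPAC ambiguity codes as well."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(seq: str, motif: str) -> List[Tuple[int, str]]:
    """Exact consensus matches on both strands as (0-based forward-strand start, strand).

    The regex lookahead makes overlapping matches visible."""
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif.upper()), ("-", reverse_complement(motif))):
        if strand == "-" and pattern == motif.upper():
            continue  # palindromic motif: minus-strand hits would duplicate plus-strand hits
        regex = re.compile("(?=(" + "".join(IUPAC[c] for c in pattern) + "))")
        hits.extend((m.start(), strand) for m in regex.finditer(seq))
    return sorted(hits)


def gc_content(seq: str) -> float:
    """Fraction of G or C among unambiguous bases."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    return sum(b in "GC" for b in bases) / len(bases) if bases else 0.0


# Hypothetical promoter fragment; not a sequence from the paper.
EXAMPLE_PROMOTER = "AAGGGCGGTTCCGCCCAT"

if __name__ == "__main__":
    for name, motif in (("SP1", SP1_GC_BOX), ("NF-1", NF1_SITE), ("TATA", TATA_BOX)):
        print(name, find_sites(EXAMPLE_PROMOTER, motif))
    print("GC content:", round(gc_content(EXAMPLE_PROMOTER), 3))
```

Exact consensus matching ignores how strongly each position is conserved, which is why tools such as MAPPER rely on statistical models of binding sites instead.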

Although selected TFBSs are shared among β-karyopherin genes within a species (Fig. 8 and supplemental materials), many of the putative TFBSs seem to be species-specific. For example, putative binding sites of SP1, NRF-2, HEN-1, RREB-1, and NF-Y occur frequently in upstream regions of β-karyopherin genes in human and mouse, whereas serial Broad-complex binding sites are preferred in the fly (Table III and supplemental materials). These results suggest that functional regulatory factors change quickly during evolution and that functional TFBSs are frequently gained and lost (73). These and related concepts will be refined as the β-karyopherins and β-karyopherin genes of additional species are characterized.
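The per-factor counts in Table III and the SP1/NRF-2 cooperation discussed in this section can be checked with plain set operations. The gene lists below are copied from the text and remain computational predictions; the dictionary layout and function names are illustrative only.

```python
# Human beta-karyopherin genes and the predicted-site lists quoted in the text.
HUMAN_GENES = {
    "CSE1L", "IPO4", "IPO7", "IPO8", "IPO9", "IPO11", "IPO13", "KPNB1",
    "RANBP5", "RANBP6", "RANBP17", "TNPO1", "TNPO2", "TNPO3",
    "XPO1", "XPO4", "XPO5", "XPO6", "XPO7", "XPOT",
}
PREDICTED_SITES = {
    "SP1": {"CSE1L", "IPO8", "IPO9", "IPO13", "KPNB1", "RANBP5", "RANBP6",
            "RANBP17", "TNPO1", "TNPO2", "TNPO3", "XPO1", "XPOT"},
    "NRF-2": {"IPO4", "IPO7", "IPO8", "IPO9", "IPO11", "IPO13", "RANBP17",
              "TNPO1", "TNPO3", "XPO5", "XPO6", "XPOT"},
    "CAAT box": {"CSE1L", "IPO9", "KPNB1", "RANBP5", "RANBP6", "TNPO2", "XPO1"},
}


def coverage(factor: str) -> str:
    """Table III-style count for one factor, e.g. 'SP1 (13 of 20)'."""
    return f"{factor} ({len(PREDICTED_SITES[factor])} of {len(HUMAN_GENES)})"


def co_occurrence(f1: str, f2: str):
    """Genes predicted to carry both sites, at least one site, and neither site."""
    a, b = PREDICTED_SITES[f1], PREDICTED_SITES[f2]
    return a & b, a | b, HUMAN_GENES - (a | b)


if __name__ == "__main__":
    for factor in PREDICTED_SITES:
        print(coverage(factor))
    both, either, neither = co_occurrence("SP1", "NRF-2")
    print(len(both), len(either), sorted(neither))
```

With these lists, seven genes carry predicted sites for both SP1 and NRF-2, 18 of the 20 carry at least one of them, and only XPO4 and XPO7 carry neither.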

Conclusions

We analyzed karyopherin β superfamily members with regard to their evolution and expression. Although bacteria do encode HEAT repeat proteins, such as CpcE and CpcF (36), the karyopherin β family itself appears to be absent from prokaryotes. Interestingly, yeast Ran proteins are extremely similar to mammalian Ran or “Ran proteins of other species” (74). Moreover, most nucleoporins have homologs in all extant eukaryotic lineages, and the existence of distant prokaryotic homologs of several nucleoporins has been proposed (75). By contrast, there is more variability among the β-karyopherins.

Given the incomplete evolutionary record, the 14 β-karyopherin members of S. cerevisiae appear to have arisen suddenly. Yet these 14 yeast β-karyopherins are not uniformly represented in higher organisms. Moreover, not all of them are required by eukaryotic cells, given that C. elegans and some protozoa (e.g. G. lamblia) lack many of them. From yeast to man, eight β-karyopherin genes appear to have evolved directly without diversification, whereas KAP104, MTR10, NMD5, and PSE1 diverged by gene duplication or retroposition. Although the number of β-karyopherins increases from 14 in yeast to 20 in human, only 10 β-karyopherins have been identified so far in C. elegans, even though the C. elegans genome (97 Mbp) is much larger than that of S. cerevisiae (12 Mbp). It will be of interest to learn how higher organisms that have lost β-karyopherins still maintain their normal physiological functions.

Expression analysis of β-karyopherins showed that their transcript levels vary with cell differentiation and proliferation. Moreover, in yeast, the titer of some β-karyopherin transcripts appears to be regulated during the cell cycle, and in mice, titers vary during development, suggesting changing roles of nucleocytoplasmic transport during cell differentiation. Virtual analysis of promoter binding elements suggested that combinations of the transcription factors SP1, NRF-2, HEN-1, RREB-1, and NF-Y regulate expression of most β-karyopherin genes.

We thus used bioinformatics techniques to give a systematic overview of β-karyopherins: their sequence and structural features, their evolution, their transcriptional patterns, and their gene regulatory motifs. Future studies will need to address how a full set of β-karyopherins first appeared and

TABLE III
The top putative TFBSs of β-karyopherin genes in eight organisms: H. sapiens (Hs), M. musculus (Mm), G. gallus (Gg), X. tropicalis (Xt), D. rerio (Dr), D. melanogaster (Dm), C. elegans (Ce), and S. cerevisiae (Sc)

The number following the abbreviated taxon indicates the number of β-karyopherin genes in the designated organism, whereas the number after a TFBS indicates the number of β-karyopherin genes that were predicted by the MAPPER search engine to be regulated by the TFBS. ARNT, aryl hydrocarbon receptor nuclear transporter; Bsap, B-cell-specific activator protein.

Species | TFBSs
Hs (20) | SP1 (13), NRF-2 (12), HEN-1 (10), RREB-1 (9), CAAT box (7)
Mm (20) | CAAT box (10), ARNT (9), Bsap (9)
Gg (17) | NF-κB (2)
Xt (18) | Staf (5)
Dr (23) | HNF-1 (11)
Dm (16) | Broad-complex_1 (12), Broad-complex_2 (7), Broad-complex_3 (7), Broad-complex_4 (13), Hunchback (6)
Ce (10) | UNC-86 (3)
Sc (14) | STE12 (3)


FIG. 8. Transcription factor binding sites of XPO1 orthologous genes predicted by the MAPPER search engine. The regulatory analyses of XPO1 across species are illustrated line by line: Hs, H. sapiens; Mm, M. musculus; Gg, G. gallus; Xt, X. tropicalis; Dr, D. rerio; Dm, D. melanogaster; Ce, C. elegans; and Sc, S. cerevisiae. The numbers under the lines are the positions (in bp) from the start of the transcript (position +1). The TFBSs are indicated above or beneath the lines. TF, transcription factor; PPAR, peroxisome proliferator-activated receptor; NF, nuclear factor; Bsap, B-cell-specific activator protein; NF-GMa, nuclear factor for granulocyte/macrophage colony-stimulating factor; DREF, DNA replication-related element binding factor; Su(H), suppressor of hairless protein; SU_h, suppressor of hairless protein.


diversified (from nine to 14 members) in yeast, whether there are additional functional β-karyopherins that have not yet been found, and how the β-karyopherins have co-evolved along with nucleoporins, components of the Ran GTPase cycle, and cargo diversity (70, 76–78).

* This work was supported by Grant 2006AA02A310 from the Ministry of Science and Technology, China; Grants 3047085, 20423002, and 90608007 from the National Natural Science Foundation of China; Grant C0510003 from the Natural Science Foundation of Fujian Province; Grant 2005-383 from the Ministry of Education of China; and Intramural Fund XK0014 from Xiamen University (to T. T.) and by Grant 30400573 from the National Natural Science Foundation of China and a grant from the Program for New Century Excellent Talents of Ministry of Education of China (to Z.-L. J.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The on-line version of this article (available at http://www.mcponline.org) contains supplemental material.

§ To whom correspondence may be addressed: School of Life Sciences, Xiamen University, Xiamen City, Fujian 361005, China. Tel./Fax: 86-592-2182880; E-mail: [email protected] (Tao Tao), [email protected] (Zhiliang Ji).

REFERENCES

1. Bednenko, J., Cingolani, G., and Gerace, L. (2003) Importin β contains a COOH-terminal nucleoporin binding region important for nuclear transport. J. Cell Biol. 162, 391–401

2. Chook, Y. M., and Blobel, G. (2001) Karyopherins and nuclear import. Curr. Opin. Struct. Biol. 11, 703–715

3. Gorlich, D., and Kutay, U. (1999) Transport between the cell nucleus and the cytoplasm. Annu. Rev. Cell Dev. Biol. 15, 607–660

4. Macara, I. G. (2001) Transport into and out of the nucleus. Microbiol. Mol. Biol. Rev. 65, 570–594

5. Mattaj, I. W., and Englmeier, L. (1998) Nucleocytoplasmic transport: the soluble phase. Annu. Rev. Biochem. 67, 265–306

6. Nakielny, S., and Dreyfuss, G. (1999) Transport of proteins and RNAs in and out of the nucleus. Cell 99, 677–690

7. Pemberton, L. F., and Paschal, B. M. (2005) Mechanisms of receptor-mediated nuclear import and nuclear export. Traffic 6, 187–198

8. Pemberton, L. F., and Paschal, B. M. (2006) Scientists share nuclear secrets at Jekyll Island. Traffic 7, 751–760

9. Poon, I. K., and Jans, D. A. (2005) Regulation of nuclear transport: central role in development and transformation? Traffic 6, 173–186

10. Conti, E., Muller, C. W., and Stewart, M. (2006) Karyopherin flexibility in nucleocytoplasmic transport. Curr. Opin. Struct. Biol. 16, 237–244

11. Hodges, J. L., Leslie, J. H., Mosammaparast, N., Guo, Y., Shabanowitz, J., Hunt, D. F., and Pemberton, L. F. (2005) Nuclear import of TFIIB is mediated by Kap114p, a karyopherin with multiple cargo-binding domains. Mol. Biol. Cell 16, 3200–3210

12. Mosammaparast, N., and Pemberton, L. F. (2004) Karyopherins: from nuclear-transport mediators to nuclear-function regulators. Trends Cell Biol. 14, 547–556

13. Kuersten, S., Ohno, M., and Mattaj, I. W. (2001) Nucleocytoplasmic transport: Ran, β and beyond. Trends Cell Biol. 11, 497–503

14. Macara, I. G. (1999) Nuclear transport: randy couples. Curr. Biol. 9, R436–R439

15. Hogarth, C. A., Calanni, S., Jans, D. A., and Loveland, K. L. (2006) Importin α mRNAs have distinct expression profiles during spermatogenesis. Dev. Dyn. 235, 253–262

16. Zhang, C., Sweezey, N. B., Gagnon, S., Muskat, B., Koehler, D., Post, M., and Kaplan, F. (2000) A novel karyopherin-β homolog is developmentally and hormonally regulated in fetal lung. Am. J. Respir. Cell Mol. Biol. 22, 451–459

17. Vrailas, A. D., Marenda, D. R., Cook, S. E., Powers, M. A., Lorenzen, J. A., Perkins, L. A., and Moses, K. (2006) smoothened and thickveins regulate Moleskin/Importin 7-mediated MAP kinase signaling in the developing Drosophila eye. Development 133, 1485–1494

18. Loveland, K. L., Hogarth, C., Szczepny, A., Prabhu, S. M., and Jans, D. A. (2006) Expression of nuclear transport importins β1 and β3 is regulated during rodent spermatogenesis. Biol. Reprod. 74, 67–74

19. Tao, T., Lan, J., Presley, J. F., Sweezey, N. B., and Kaplan, F. (2004) Nucleocytoplasmic shuttling of lgl2 is developmentally regulated in fetal lung. Am. J. Respir. Cell Mol. Biol. 30, 350–359

20. Tatusov, R. L., Fedorova, N. D., Jackson, J. D., Jacobs, A. R., Kiryutin, B., Koonin, E. V., Krylov, D. M., Mazumder, R., Mekhedov, S. L., Nikolskaya, A. N., Rao, B. S., Smirnov, S., Sverdlov, A. V., Vasudevan, S., Wolf, Y. I., Yin, J. J., and Natale, D. A. (2003) The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4, 41

21. Birney, E., Andrews, D., Caccamo, M., Chen, Y., Clarke, L., Coates, G., Cox, T., Cunningham, F., Curwen, V., Cutts, T., Down, T., Durbin, R., Fernandez-Suarez, X. M., Flicek, P., Graf, S., Hammond, M., Herrero, J., Howe, K., Iyer, V., Jekosch, K., Kahari, A., Kasprzyk, A., Keefe, D., Kokocinski, F., Kulesha, E., London, D., Longden, I., Melsopp, C., Meidl, P., Overduin, B., Parker, A., Proctor, G., Prlic, A., Rae, M., Rios, D., Redmond, S., Schuster, M., Sealy, I., Searle, S., Severin, J., Slater, G., Smedley, D., Smith, J., Stabenau, A., Stalker, J., Trevanion, S., Ureta-Vidal, A., Vogel, J., White, S., Woodwark, C., and Hubbard, T. J. (2006) Ensembl 2006. Nucleic Acids Res. 34, D556–D561

22. Edgar, R. C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797

23. Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F., and Higgins, D. G. (1997) The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25, 4876–4882

24. Kumar, S., Tamura, K., and Nei, M. (2004) MEGA3: integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment. Brief. Bioinform. 5, 150–163

25. Dohm, J. C., Vingron, M., and Staub, E. (2006) Horizontal gene transfer in aminoacyl-tRNA synthetases including leucine-specific subtypes. J. Mol. Evol. 63, 437–447

26. Montanini, B., Blaudez, D., Jeandroz, S., Sanders, D., and Chalot, M. (2007) Phylogenetic and functional analysis of the Cation Diffusion Facilitator (CDF) family: improved signature and prediction of substrate specificity. BMC Genomics 8, 107

27. Su, A. I., Cooke, M. P., Ching, K. A., Hakak, Y., Walker, J. R., Wiltshire, T., Orth, A. P., Vega, R. G., Sapinoso, L. M., Moqrich, A., Patapoutian, A., Hampton, G. M., Schultz, P. G., and Hogenesch, J. B. (2002) Large-scale analysis of the human and mouse transcriptomes. Proc. Natl. Acad. Sci. U. S. A. 99, 4465–4470

28. Wang, Y. P., Liang, L., Han, B. C., Quan, Y., Wang, X., Tao, T., and Ji, Z. L. (2006) GEPS: the Gene Expression Pattern Scanner. Nucleic Acids Res. 34, W492–W497

29. Spellman, P. T., Sherlock, G., Zhang, M. Q., Iyer, V. R., Anders, K., Eisen, M. B., Brown, P. O., Botstein, D., and Futcher, B. (1998) Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell 9, 3273–3297

30. de Lichtenberg, U., Jensen, L. J., Fausboll, A., Jensen, T. S., Bork, P., and Brunak, S. (2005) Comparison of computational methods for the identification of cell cycle-regulated genes. Bioinformatics 21, 1164–1171

31. Marinescu, V. D., Kohane, I. S., and Riva, A. (2005) MAPPER: a search engine for the computational identification of putative transcription factor binding sites in multiple genomes. BMC Bioinformatics 6, 79

32. Harel, A., and Forbes, D. J. (2004) Importin β: conducting a much larger cellular symphony. Mol. Cell 16, 319–330

33. Gorlich, D., Dabrowski, M., Bischoff, F. R., Kutay, U., Bork, P., Hartmann, E., Prehn, S., and Izaurralde, E. (1997) A novel class of RanGTP binding proteins. J. Cell Biol. 138, 65–80

34. Malik, H. S., Eickbush, T. H., and Goldfarb, D. S. (1997) Evolutionary specialization of the nuclear targeting apparatus. Proc. Natl. Acad. Sci. U. S. A. 94, 13738–13742

35. Bayliss, R., Littlewood, T., Strawn, L. A., Wente, S. R., and Stewart, M. (2002) GLFG and FxFG nucleoporins bind to overlapping sites on importin-β. J. Biol. Chem. 277, 50597–50606

36. Mans, B. J., Anantharaman, V., Aravind, L., and Koonin, E. V. (2004) Comparative genomics, evolution and origins of the nuclear envelope and nuclear pore complex. Cell Cycle 3, 1612–1637


37. Zhang, J. Z. (2003) Evolution by gene duplication: an update. Trends Ecol.Evol. 18, 292–298

38. Mukhopadhyay, N. K., Cinar, B., Mukhopadhyay, L., Lutchman, M., Ferdi-nand, A. S., Kim, J., Chung, L. W., Adam, R. M., Ray, S. K., Leiter, A. B.,Richie, J. P., Liu, B. C., and Freeman, M. R. (2007) The zinc finger proteinras-responsive element binding protein-1 is a coregulator of the andro-gen receptor: implications for the role of the Ras pathway in enhancingandrogenic signaling in prostate cancer. Mol Endocrinol. 21, 2056–2070

39. Kenyon, J. R., and Craig, I. W. (1999) Analysis of the 5� regulatory region ofthe human Norrie’s disease gene: evidence that a non-translated CTdinucleotide repeat in exon one has a role in controlling expression.Gene (Amst.) 227, 181–188

40. Seroussi, E., Shani, N., Ben-Meir, D., Chajut, A., Divinski, I., Faier, S., Gery,S., Karby, S., Kariv-Inbal, Z., Sella, O., Smorodinsky, N. I., and Lavi, S.(2001) Uniquely conserved non-translated regions are involved in gen-eration of the two major transcripts of protein phosphatase 2C�. J. Mol.Biol. 312, 439–451

41. Vinckenbosch, N., Dupanloup, I., and Kaessmann, H. (2006) Evolutionaryfate of retroposed gene copies in the human genome. Proc. Natl. Acad.Sci. U. S. A. 103, 3220–3225

42. Wang, P. J. (2004) X chromosomes, retrogenes and their role in malereproduction. Trends Endocrinol. Metab. 15, 79–83

43. Bai, Y., Casola, C., Feschotte, C., and Betran, E. (2007) Comparativegenomics reveals a constant rate of origination and convergent acquisi-tion of functional retrogenes in Drosophila. Genome Biol. 8, R11

44. Goldfarb, D. S., Corbett, A. H., Mason, D. A., Harreman, M. T., and Adam,S. A. (2004) Importin �: a multipurpose nuclear-transport receptor.Trends Cell Biol. 14, 505–514

45. Behrens, P., Brinkmann, U., and Wellmann, A. (2003) CSE1L/CAS: its rolein proliferation and apoptosis. Apoptosis 8, 39–44

46. Koch, P., Bohlmann, I., Schafer, M., Hansen-Hagge, T. E., Kiyoi, H., Wilda,M., Hameister, H., Bartram, C. R., and Janssen, J. W. (2000) Identifica-tion of a novel putative Ran-binding protein and its close homologue.Biochem. Biophys. Res. Commun. 278, 241–249

47. Makhnevych, T., Lusk, C. P., Anderson, A. M., Aitchison, J. D., andWozniak, R. W. (2003) Cell cycle regulated transport controlled by alter-ations in the nuclear pore complex. Cell 115, 813–823

48. Johansson, D., Lindgren, P., and Berglund, A. (2003) A multivariate ap-proach applied to microarray data for identification of genes with cellcycle-coupled transcription. Bioinformatics 19, 467–473

49. Luan, Y., and Li, H. (2004) Model-based methods for identifying periodicallyexpressed genes based on time course microarray gene expressiondata. Bioinformatics 20, 332–339

50. Lu, X., Zhang, W., Qin, Z. S., Kwast, K. E., and Liu, J. S. (2004) Statisticalresynchronization and Bayesian detection of periodically expressedgenes. Nucleic Acids Res. 32, 447–455

51. Bussemaker, H. J., Li, H., and Siggia, E. D. (2001) Regulatory elementdetection using correlation with expression. Nat. Genet. 27, 167–171

52. Liu, Y., Liu, X. S., Wei, L., Altman, R. B., and Batzoglou, S. (2004) Eukaryoticregulatory element conservation analysis and identification using com-parative genomics. Genome Res. 14, 451–458

53. Wray, G. A., Hahn, M. W., Abouheif, E., Balhoff, J. P., Pizer, M., Rockman,M. V., and Romano, L. A. (2003) The evolution of transcriptional regula-tion in eukaryotes. Mol. Biol. Evol. 20, 1377–1419

54. Davidson, I., Martianov, I., and Viville, S. (2004) TBP, a universal transcription factor? Med. Sci. (Paris) 20, 575–579

55. Johnson, S. A., Dubeau, L., White, R. J., and Johnson, D. L. (2003) The TATA-binding protein as a regulator of cellular transformation. Cell Cycle 2, 442–444

56. Wang, L. H., and Chen, L. (1996) Organization of the gene encoding human prostacyclin synthase. Biochem. Biophys. Res. Commun. 226, 631–637

57. Azizkhan, J. C., Jensen, D. E., Pierce, A. J., and Wade, M. (1993) Transcription from TATA-less promoters: dihydrofolate reductase as a model. Crit. Rev. Eukaryot. Gene Expr. 3, 229–254

58. Zhao, J., and Ennion, S. J. (2006) Sp1/3 and NF-1 mediate basal transcription of the human P2X1 gene in megakaryoblastic MEG-01 cells. BMC Mol. Biol. 7, 10

59. Emami, K. H., Burke, T. W., and Smale, S. T. (1998) Sp1 activation of a TATA-less promoter requires a species-specific interaction involving transcription factor IID. Nucleic Acids Res. 26, 839–846

60. Nichols, A. F., Itoh, T., Zolezzi, F., Hutsell, S., and Linn, S. (2003) Basal transcriptional regulation of human damage-specific DNA-binding protein genes DDB1 and DDB2 by Sp1, E2F, N-myc and NF1 elements. Nucleic Acids Res. 31, 562–569

61. Samson, S. L., and Wong, N. C. (2002) Role of Sp1 in insulin regulation of gene expression. J. Mol. Endocrinol. 29, 265–279

62. Thomas, K., Wu, J., Sung, D. Y., Thompson, W., Powell, M., McCarrey, J., Gibbs, R., and Walker, W. (2007) SP1 transcription factors in male germ cell development and differentiation. Mol. Cell. Endocrinol. 270, 1–7

63. Bucher, P. (1990) Weight matrix descriptions of four eukaryotic RNA polymerase II promoter elements derived from 502 unrelated promoter sequences. J. Mol. Biol. 212, 563–578

64. Mantovani, R. (1999) The molecular biology of the CCAAT-binding factor NF-Y. Gene (Amst.) 239, 15–27

65. Berry, M., Grosveld, F., and Dillon, N. (1992) A single point mutation is the cause of the Greek form of hereditary persistence of fetal haemoglobin. Nature 358, 499–502

66. Mantovani, R. (1998) A survey of 178 NF-Y binding CCAAT boxes. Nucleic Acids Res. 26, 1135–1143

67. Roy, B., and Lee, A. S. (1995) Transduction of calcium stress through interaction of the human transcription factor CBF with the proximal CCAAT regulatory element of the grp78/BiP promoter. Mol. Cell. Biol. 15, 2263–2274

68. Roder, K., Wolf, S. S., Beck, K. F., and Schweizer, M. (1997) Cooperative binding of NF-Y and Sp1 at the DNase I-hypersensitive site, fatty acid synthase insulin-responsive element 1, located at −500 in the rat fatty acid synthase promoter. J. Biol. Chem. 272, 21616–21624

69. Wright, K. L., Moore, T. L., Vilen, B. J., Brown, A. M., and Ting, J. P. (1995) Major histocompatibility complex class II-associated invariant chain gene expression is up-regulated by cooperative interactions of Sp1 and NF-Y. J. Biol. Chem. 270, 20978–20986

70. Rosmarin, A. G., Resendes, K. K., Yang, Z., McMillan, J. N., and Fleming, S. L. (2004) GA-binding protein transcription factor: a review of GABP as an integrator of intracellular signaling and protein-protein interactions. Blood Cells Mol. Dis. 32, 143–154

71. Zhang, S., Qian, X., Redman, C., Bliskovski, V., Ramsay, E. S., Lowy, D. R., and Mock, B. A. (2003) p16INK4a gene promoter variation and differential binding of a repressor, the ras-responsive zinc-finger transcription factor, RREB. Oncogene 22, 2285–2295

72. Brown, L., and Baer, R. (1994) HEN1 encodes a 20-kilodalton phosphoprotein that binds an extended E-box motif as a homodimer. Mol. Cell. Biol. 14, 1245–1255

73. Doniger, S. W., and Fay, J. C. (2007) Frequent gain and loss of functional transcription factor binding sites. PLoS Comput. Biol. 3, e99

74. Kadowaki, T., Goldfarb, D., Spitz, L. M., Tartakoff, A. M., and Ohno, M. (1993) Regulation of RNA processing and transport by a nuclear guanine nucleotide release protein and members of the Ras superfamily. EMBO J. 12, 2929–2937

75. Bapteste, E., Charlebois, R. L., MacLeod, D., and Brochier, C. (2005) The two tempos of nuclear pore complex evolution: highly adapting proteins in an ancient frozen structure. Genome Biol. 6, R85

76. Denning, D. P., and Rexach, M. F. (2007) Rapid evolution exposes the boundaries of domain structure and function in natively unfolded FG nucleoporins. Mol. Cell. Proteomics 6, 272–282

77. Devos, D., Dokudovskaya, S., Williams, R., Alber, F., Eswar, N., Chait, B. T., Rout, M. P., and Sali, A. (2006) Simple fold composition and modular architecture of the nuclear pore complex. Proc. Natl. Acad. Sci. U. S. A. 103, 2172–2177

78. Stavru, F., Nautrup-Pedersen, G., Cordes, V. C., and Gorlich, D. (2006) Nuclear pore complex assembly and maintenance in POM121- and gp210-deficient cells. J. Cell Biol. 173, 477–483

79. Cingolani, G., Petosa, C., Weis, K., and Muller, C. W. (1999) Structure of importin-β bound to the IBB domain of importin-α. Nature 399, 221–229

80. Cuff, J. A., Clamp, M. E., Siddiqui, A. S., Finlay, M., and Barton, G. J. (1998) JPred: a consensus secondary structure prediction server. Bioinformatics 14, 892–893

81. Lee, S. J., Matsuura, Y., Liu, S. M., and Stewart, M. (2005) Structural basis for nuclear import complex dissociation by RanGTP. Nature 435, 693–696
