The Mitochondrial Genome of the Gymnosperm Cycas taitungensis Contains a Novel Family of Short Interspersed Elements, Bpu Sequences, and Abundant RNA Editing Sites

Shu-Miaw Chaw,* Arthur Chun-Chieh Shih,† Daryi Wang,* Yu-Wei Wu,‡ Shu-Mei Liu,* and The-Yuan Chou§

*Research Center for Biodiversity, Academia Sinica, Taipei, Taiwan; †Institute of Information Science, Academia Sinica, Taipei, Taiwan; ‡Department of Informatics, Indiana University; and §Institute of Medical Biotechnology, Central Taiwan University of Science and Technology, Taichung City, Taiwan

The mtDNA of Cycas taitungensis is a circular molecule of 414,903 bp, making it 2- to 6-fold larger than the known mtDNAs of charophytes and bryophytes, but similar to the average of 7 elucidated angiosperm mtDNAs. It is characterized by abundant RNA editing sites (1,084), more than twice the number found in the angiosperm mtDNAs. The A + T content of Cycas mtDNA is 53.1%, the lowest among known land plants. About 5% of the Cycas mtDNA is composed of a novel family of mobile elements, which we designated as "Bpu sequences." They share a consensus sequence of 36 bp with 2 terminal direct repeats (AAGG) and a recognition site for the Bpu10I restriction endonuclease (CCTGAAGC). Comparison of the Cycas mtDNA with other plant mtDNAs revealed many new insights into the biology and evolution of land plant mtDNAs. For example, the noncoding sequences in mtDNAs have drastically expanded as land plants have evolved, with abrupt increases appearing in the bryophytes, and then in the seed plants. As a result, the genomic organizations of seed plant mtDNAs are much less compact than in other plants. Also, the Cycas mtDNA appears to have been exempted from the frequent gene loss observed in angiosperm mtDNAs. Similar to the angiosperms, the 3 Cycas genes nad1, nad2, and nad5 are disrupted by 5 group II intron sequences, which have brought the genes into trans-splicing arrangements.
The evolutionary origin and invasion/duplication mechanism of the Bpu sequences in Cycas mtDNA are hypothesized and discussed.

Introduction

Presently, complete mitochondrial genomes (mtDNAs) of land plants are known for only 1 liverwort (Marchantia polymorpha, Oda et al. 1992), 1 moss (Physcomitrella patens, Terasawa et al. 2007), and 7 angiosperms, including 4 eudicots (Arabidopsis thaliana, Unseld et al. 1997; sugar beet [Beta vulgaris], Kubo et al. 2000; rapeseed [Brassica napus], Handa 2003; tobacco [Nicotiana tabacum], Sugiyama et al. 2005) and 3 monocot crops (rice [Oryza sativa], Notsu et al. 2002; maize [Zea mays subsp. mays], Clifton et al. 2004; wheat [Triticum aestivum], Ogihara et al. 2005) (table 1). The evolution of mtDNAs is more complex than that of plastid genomes (cpDNAs) because mtDNAs show horizontal gene transfer, uptake of plastid and nuclear DNA sequences, extreme size and rate variations, rapid gene rearrangements, and more trans-splicing exons and RNA editing sites (Bergthorsson et al. 2004; Knoop 2004; Sugiyama et al. 2005; Wang et al. 2007). However, to date, no comparative mtDNA study has been performed in the 1,000+ species of gymnosperms, which are considered to be "inextricably related to the origins and early divergence of seed plants" (Chaw et al. 1997). In an effort to expand our understanding of mtDNA organization and evolution, we herein determined and examined the first gymnosperm mtDNA, from the Taitung cycad (Cycas taitungensis), an endangered species restricted to the mountain valleys of eastern Taiwan.

Cycads (Cycadales) appeared in the Pennsylvanian era, approximately 300 million years ago (MYA), and dominated the Mesozoic forests along with conifers and ginkgos. The extant cycads include 2 or 3 families with some 300 species in 10 genera (Chaw et al. 2005).
They are largely confined to the tropics and subtropics in both the Old and New Worlds, except for the genus Cycas (the only genus in Cycadaceae), which has the widest distribution, as far north as Japan (Jones 2002). Cycads are regarded as being closely linked with spore-producing ferns because their young leaves are circinate, their trunks lack axillary buds but show unique girdling leaf traces and dichotomous branching (vs. the axillary branching seen in other seed plants), their pollen tubes possess multiciliate sperms, and their ovules are borne on the margins of leaf-like megasporophylls (Stevenson 1990). However, recent phylogenetic analysis and comparative chloroplast genomics (e.g., Chaw et al. 1997; Wu et al. 2007) suggested that the cycads and Ginkgo are sister groups, which implies that the above characters should be regarded as plesiomorphic rather than apomorphic.

Here we report the complete mtDNA sequence of Cycas taitungensis and its surprising organization. Notably, it contains abundant short interspersed repetitive elements and RNA editing sites. Compared with the known mtDNAs from other land plants, that of Cycas has a particularly low A + T content, fewer gene losses, and no large repeats >2 kb. Moreover, we describe a novel family of mobile elements, herein termed Bpu sequences/elements, more than 500 variants of which are distributed across the Cycas mtDNA. Our evolutionary analysis further reveals that, within seed plants, the mtDNA of Cycas shows higher substitution rates for protein-coding genes than in other known plants. The evolutionary origin and invasion/duplication mechanisms of the Bpu sequences in Cycas mtDNA are hypothesized and discussed.

Materials and Methods

Determination of the Complete mtDNA Sequence of C. taitungensis

Young leaves (less than 10 days old) were collected from an 8-year-old C. taitungensis tree grown in the greenhouse of the Academia Sinica.
Key words: mitochondrial genome, Cycas, RNA editing sites, repeats, mobile elements.

E-mail: smchaw@sinica.edu.tw.

Mol. Biol. Evol. 25(3):603–615. 2008
doi:10.1093/molbev/msn009
Advance Access publication January 12, 2008

© 2008 The Authors. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Intact mitochondria were



isolated using the method described by Kadowaki et al. (1996). DNase I was used to digest any nuclear genome (nrDNA) contaminants. mtDNA was isolated according to a cetyltrimethylammonium bromide-based protocol (Stewart and Via 1993), sheared into random fragments of 2–3 kb with a Hydroshear device (Genomic Solutions Inc., Ann Arbor, MI), and then directly cloned into the EcoRV site of the pBluescriptSK vector to generate a shotgun library. Shotgun clones were sequenced as previously described (Wang et al. 2007), except that each nucleotide had about 10× coverage. Gaps were filled with specific primers designed based on the sequenced clones. The complete Cycas mtDNA sequence has been deposited in the DNA Data Bank of Japan under NCBI accession number BABI01000001.

Sequence Data Analysis

Annotation of Protein-Coding, rRNA, and tRNA Genes

A database search was carried out using the National Center for Biotechnology Information's (NCBI) Web-based Blast service (http://www.ncbi.nlm.nih.gov/BLAST), and genes with e-values smaller than 0.001 were selected. The exact gene and exon boundaries were determined by alignment of homologous genes from available and annotated plant mtDNAs (table 1). Multiple sequence alignments were performed using the MAP2 Web service (http://deepc2.psi.iastate.edu/aat/map2/map2.html). Alignments of both the nucleotide sequences and the translated amino acid sequences of the protein-coding genes were manually inspected. The tRNA genes were annotated using the tRNAscan-SE program (Lowe and Eddy 1997).

ORF Finding and Intron Identification

Identification of open reading frames (ORFs) was performed with the Web-based NCBI ORF Finder (http://www.ncbi.nlm.nih.gov/projects/gorf). The standard genetic code was applied. The intron types were identified on the basis of their sequences and secondary structures (Michel and Ferat 1995). The predicted introns were also verified by alignment of their sequences with orthologous sequences previously elucidated from other species.
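The ORF search described above can be illustrated with a minimal six-frame scan. This is only a sketch under stated assumptions: it is not the algorithm used by the NCBI ORF Finder, the function names and the `min_codons` threshold are hypothetical, and the only detail taken from the text is the use of the standard genetic code (stop codons TAA, TAG, and TGA).

```python
# Illustrative six-frame ORF scan; NOT the NCBI ORF Finder implementation.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Return (strand, frame, start, end) for each ATG...stop ORF.

    Coordinates are 0-based, end-exclusive, and refer to the scanned strand.
    min_codons (a hypothetical threshold) counts the start and stop codons.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # close the ORF and keep scanning
    return orfs


# Toy sequence (invented): one ORF, ATG AAA TAG, in frame 2 of the + strand.
toy_orfs = find_orfs("CCATGAAATAGCC")
print(toy_orfs)
```

A real annotation would additionally compare each candidate with homologous genes, as the text describes for gene and exon boundaries.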

RNA Editing Sites

Putative RNA editing sites in protein-coding genes were predicted using the PREP-mt Web-based program (http://www.prep-mt.net) (Mower 2005). To achieve a balanced trade-off between the number of false positive and false negative sites, the cutoff score (C-value) was set to 0.6, as suggested by the author. In addition, the top 4 genes with the highest number of editing sites were verified by reverse transcription–polymerase chain reaction (RT–PCR) with gene-specific primers (supplementary table S7, Supplementary Material online).
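To make the role of the cutoff concrete, the following sketch filters a list of predicted C-to-U sites by score and applies the retained edits to a coding sequence before translation. It does not reproduce the PREP-mt scoring method; the sequence, positions, and scores are invented, the helper names are hypothetical, and treating a score equal to the cutoff as accepted is an assumption.

```python
# Sketch only: apply predicted C-to-U edits that pass a score cutoff.
# The sequence, site positions, and scores below are invented for illustration.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG codon ordering.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate a DNA coding sequence codon by codon (standard code)."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def apply_editing(cds, predicted_sites, cutoff=0.6):
    """Return cds with C->T (U in the mRNA) at sites scoring >= cutoff.

    predicted_sites: iterable of (0-based position, score) pairs.
    Keeping sites with score == cutoff is an assumption of this sketch.
    """
    edited = list(cds)
    for position, score in predicted_sites:
        if score < cutoff:
            continue  # below the cutoff: treated as a likely false positive
        if edited[position] != "C":
            raise ValueError(f"position {position} is not a C")
        edited[position] = "T"
    return "".join(edited)


genomic = "ATGCCATCA"          # codons ATG CCA TCA
sites = [(4, 0.9), (7, 0.4)]   # only the first site passes the 0.6 cutoff
edited = apply_editing(genomic, sites)
print(translate(genomic), "->", translate(edited))
```

In this toy case the retained edit changes a proline codon (CCA) into a leucine codon (CUA), the kind of nonsynonymous change that editing-site prediction is designed to detect.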

Analysis of Repeats

Identification of repeats was carried out using the REPuter Web-based interface (http://bibiserv.techfak.uni-bielefeld.de/reputer) (Kurtz et al. 2001). Both sequence directions (forward and reverse complement) were searched. The maximum number of computed repeats was set to 5,000. Overlapping repeat sequences were manually removed from each result. Information on tandem repeats was obtained using the tandem repeats finder (http://tandem.bu.edu/trf/trf.html) (Benson 1999). Searches for repeated regions and tandem repeats were carried out on all elucidated plant mtDNAs.
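The two search directions mentioned above (forward and reverse complement) can be illustrated with a simple k-mer index. This is a deliberately naive sketch, not the REPuter algorithm: it reports only exact repeats of a fixed length `k`, does no overlap filtering, and all names and the toy sequence are hypothetical.

```python
# Naive exact-repeat search with a k-mer index; NOT the REPuter algorithm.
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_kmer_repeats(seq, k):
    """Return sorted (pos1, pos2, kind) for exact repeats of length k.

    kind is "forward" when the same k-mer occurs at both positions and
    "reverse_complement" when the k-mer at pos2 is the reverse complement
    of the k-mer at pos1. Positions are 0-based with pos1 < pos2.
    """
    seq = seq.upper()
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)

    repeats = []
    for kmer, positions in index.items():
        # Forward repeats: every pair of occurrences of the same k-mer.
        for a in range(len(positions)):
            for b in range(a + 1, len(positions)):
                repeats.append((positions[a], positions[b], "forward"))
        # Reverse-complement repeats: pair with occurrences of the revcomp.
        for j in index.get(reverse_complement(kmer), []):
            for i in positions:
                if i < j:  # report each pair once
                    repeats.append((i, j, "reverse_complement"))
    return sorted(repeats)


# Toy sequence (invented): AAGG at 0 and 12, its reverse complement CCTT at 8.
toy_repeats = find_kmer_repeats("AAGGTTTTCCTTAAGG", 4)
print(toy_repeats)
```

Note that palindromic k-mers are reported under both kinds in this sketch; a production tool would also merge adjacent hits into maximal repeats.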

Phylogenetic Analysis

The 22 protein-coding genes common to the 11 sampled mtDNAs (atp1, atp4, atp6, atp8, atp9, ccmB, cob, cox1, cox2, cox3, nad1, nad2, nad3, nad4, nad4L, nad5, nad6, nad9, mttB, rps3, rps4, and rps12) were extracted for phylogenetic analysis. The sequences were separately

Table 1
MtDNAs of the 11 Plants Examined in This Study

Classification        Scientific Name             Common Name       mtDNA Accession Number   Reference
Algae
  Charophyta          Chara vulgaris              Green algae       AY267353                 Turmel et al. (2003)
Land plants
  Bryophyta           Physcomitrella patens       Moss              NC_007945                Terasawa et al. (2007)
                      Marchantia polymorpha       Liverwort         NC_001660                Oda et al. (1992)
  Seed plants
    Gymnosperm        Cycas taitungensis          Taitung cycad     BABI01000001             Present study
    Angiosperm
      Monocots        Triticum aestivum           Wheat             NC_007579                Ogihara et al. (2005)
                      Oryza sativa sp. Japonica   Rice              BA000029                 Notsu et al. (2002)
                      Zea mays                    Maize             NC_007982                Clifton et al. (2004)
      Dicots          Beta vulgaris L.            Sugar beet        AP000396-7               Kubo et al. (2000)
                      Nicotiana tabacum           Tobacco           NC_006581                Sugiyama et al. (2005)
                      Brassica napus              Oilseed rape      AP006444                 Handa (2003)
                      Arabidopsis thaliana        Mouse-ear cress   NC_001284                Unseld et al. (1997)


aligned and then concatenated. Gaps and stop codons were removed manually. Divergence of nucleotide sequences between each pair of taxa was estimated in terms of the numbers of substitutions per synonymous (Ks) or nonsynonymous (Ka) site, based on the Pamilo–Bianchi–Li method implemented in the MEGA 3.0 program (Kumar et al. 2004). The Neighbor-Joining trees reconstructed with the Ka values and Ks values were rooted at Chara. The number of bootstrap replicates was set to 500. All phylogenetic analyses and tree reconstructions were performed using MEGA 3.0.
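The preparation step described above (per-gene alignment, removal of gaps and stop codons, concatenation) was done manually by the authors. As a rough programmatic analogue, the sketch below drops every codon column in which any taxon has a gap or a stop codon and then joins the cleaned genes. The column-wise removal rule, the function names, and the toy alignments are assumptions made for illustration; Ka/Ks estimation itself is not reproduced here.

```python
# Sketch: clean codon alignments and concatenate them per taxon.
# The "drop the whole codon column" rule is an assumption, not the authors' procedure.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def clean_codon_alignment(aligned):
    """Drop codon columns that contain a gap or a stop codon in any taxon.

    aligned: dict mapping taxon name -> aligned coding sequence. All
    sequences must share one length that is a multiple of 3.
    """
    taxa = list(aligned)
    length = len(aligned[taxa[0]])
    if length % 3 or any(len(aligned[t]) != length for t in taxa):
        raise ValueError("sequences must be codon-aligned and of equal length")
    kept = {t: [] for t in taxa}
    for i in range(0, length, 3):
        codons = [aligned[t][i:i + 3].upper() for t in taxa]
        if any("-" in c or c in STOP_CODONS for c in codons):
            continue  # skip columns with gaps or stop codons
        for t, c in zip(taxa, codons):
            kept[t].append(c)
    return {t: "".join(kept[t]) for t in taxa}


def concatenate(gene_alignments):
    """Join cleaned per-gene alignments into one sequence per taxon."""
    taxa = list(gene_alignments[0])
    return {t: "".join(g[t] for g in gene_alignments) for t in taxa}


# Toy alignments (invented) for two hypothetical genes and two taxa.
gene1 = clean_codon_alignment({"taxonA": "ATGAAATAA", "taxonB": "ATG---TAA"})
gene2 = clean_codon_alignment({"taxonA": "GCTGGA", "taxonB": "GCAGGA"})
supermatrix = concatenate([gene1, gene2])
print(supermatrix)
```

The resulting concatenated alignment keeps the reading frame intact, which is a prerequisite for codon-based distance estimates such as Ka and Ks.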

Results and Discussion

Evolution of mtDNA Organization in Land Plants

Characteristics of Cycas mtDNA and Insights into the Evolution of Land Plant mtDNAs

The complete mtDNA of C. taitungensis is a circular molecule of 414,903 bp (fig. 1). Table 2, which compares the main features of mtDNAs from a charophyte (Chara vulgaris), 2 bryophytes (Marchantia and Physcomitrella), and 7 angiosperms, shows that the Cycas mtDNA is about 6-, 2.2-, and 4.0-fold larger than those of Chara, Marchantia, and Physcomitrella, respectively, but does not significantly differ from the average (414 ± 102 kb) of the previously elucidated angiosperm mtDNAs (P = 0.502). The A + T content of the Cycas mtDNA is 53.1%, the lowest among known algae and land plants. As further shown in table 2, the total numbers of protein- and tRNA-coding genes decrease from charophytes (39 and 26, respectively) to seed plants (29–40 and 17–27, respectively) (detailed in supplementary table S2, Supplementary Material online). In contrast, the numbers of rRNA gene species remain the same in all lineages, with the exception of obvious gene duplications in the Poaceae (grass family) and Beta lineages, in which some tRNA genes are also duplicated.
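The size comparison in this paragraph can be checked directly from the genome sizes listed in table 2. The short calculation below is a sketch: the sizes are those given in table 2, while the use of the population standard deviation is an assumption (it appears to reproduce the reported ±102 kb; the text does not state which estimator was used).

```python
# Worked check of the genome-size comparison, using sizes (bp) from table 2.
from statistics import mean, pstdev

SIZES_BP = {
    "Chara": 67737, "Marchantia": 186609, "Physcomitrella": 105340,
    "Cycas": 414903, "Triticum": 452528, "Oryza": 490520, "Zea": 569630,
    "Beta": 368799, "Nicotiana": 430597, "Brassica": 221853,
    "Arabidopsis": 366924,
}
ANGIOSPERMS = ["Triticum", "Oryza", "Zea", "Beta", "Nicotiana",
               "Brassica", "Arabidopsis"]

# Fold difference between Cycas and each non-seed-plant genome.
fold = {name: SIZES_BP["Cycas"] / SIZES_BP[name]
        for name in ("Chara", "Marchantia", "Physcomitrella")}

# Mean and spread of the 7 angiosperm genomes, in kb.
angiosperm_kb = [SIZES_BP[name] / 1000 for name in ANGIOSPERMS]
avg_kb = mean(angiosperm_kb)
sd_kb = pstdev(angiosperm_kb)  # population SD: an assumption of this sketch

for name, value in fold.items():
    print(f"Cycas / {name}: {value:.2f}-fold")
print(f"angiosperm mean = {avg_kb:.0f} kb, SD = {sd_kb:.0f} kb")
```

The ratios come out at roughly 6.1, 2.2, and 3.9, in line with the approximate fold differences quoted in the text.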

Table 2 also shows that noncoding sequences (spacers, introns, and pseudogenes) account for 89.9% of the Cycas mtDNA sequence, consistent with the proportions found in the angiosperm mtDNAs (89.4 ± 3.1%). As land plants evolved from charophycean green algae (Chara, 9.3%), the closest living relatives of land plants (Karol et al. 2001), the noncoding sequences have drastically expanded in the mtDNA, showing abrupt increases in the bryophytes and then in the seed plants. As a result, the genomic organizations of mtDNAs are much less compact in seed plants when compared with lower plants.
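The noncoding proportions quoted here follow from the coding percentages in table 2 as 100 minus the coding fraction. The sketch below recomputes them; the coding values are read from table 2, and the population standard deviation is again an assumption chosen because it appears to match the reported ±3.1%.

```python
# Noncoding fraction = 100 - coding fraction, with coding (%) values from table 2.
from statistics import mean, pstdev

CODING_PCT = {
    "Chara": 90.7, "Marchantia": 20.3, "Physcomitrella": 37.0,
    "Cycas": 10.1, "Triticum": 8.6, "Oryza": 11.1, "Zea": 6.2,
    "Beta": 10.3, "Nicotiana": 9.9, "Brassica": 17.3, "Arabidopsis": 10.6,
}
ANGIOSPERMS = ["Triticum", "Oryza", "Zea", "Beta", "Nicotiana",
               "Brassica", "Arabidopsis"]

noncoding = {name: 100.0 - pct for name, pct in CODING_PCT.items()}

angiosperm_noncoding = [noncoding[name] for name in ANGIOSPERMS]
avg_noncoding = mean(angiosperm_noncoding)
sd_noncoding = pstdev(angiosperm_noncoding)  # population SD (assumption)

# Print the lineages in the order discussed in the text.
for name in ("Chara", "Marchantia", "Physcomitrella", "Cycas"):
    print(f"{name}: {noncoding[name]:.1f}% noncoding")
print(f"angiosperms: {avg_noncoding:.1f} ± {sd_noncoding:.1f}% noncoding")
```

Listing the values in this order makes the two jumps described in the text visible: from the charophyte to the bryophytes, and again from the bryophytes to the seed plants.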

Repeated sequences, which are present in the genome as multiple copies, comprise approximately 15.1% of the Cycas mtDNA (table 2). The repeats, very few of which are over 2 kb long, are evenly distributed across the genome and mainly occur in the noncoding regions, including the intergenic spacers and introns (fig. 1B). As shown in table 2, most mtDNAs of land plants (except for rice) have 2–5 times more repeated sequences than the mtDNA of Chara, whereas more than a quarter of the rice mtDNA consists of repeated sequences. Among the sampled plant mtDNAs, that of Cycas contains the highest percentage of tandem repeats (4.97%; fig. 2A; detailed in supplementary table S3 [Supplementary Material online]); these include a novel family of mobile elements, the Bpu sequences/elements (see below), which are mainly found in 3- or 4-copy arrays (fig. 2B). Ogihara et al. (2005) noted the presence of many repeats in wheat mtDNA and hypothesized that alternative physical structures may be adopted by wheat mtDNA. Therefore, it seems logical to hypothesize that alternative circular mtDNAs in various recombinant forms might coexist in Cycas cells in vivo.

No group I intron was detected in the mtDNA of Cycas or the other 7 previously elucidated angiosperm mtDNAs (table 2). This observation is consistent with Knoop's (2004) speculation that group I introns were lost from the common ancestor of hornworts and tracheophytes. Although Cho et al. (1998) demonstrated that a group I intron located in the cox1 gene is widespread among 48 angiosperm genera, the examined angiosperms represented an exceptionally patchy phylogenetic distribution and did not include any of our sampled 7 angiosperms. Furthermore, the authors estimated that the intron invaded the cox1 gene by cross-species horizontal transfer over 1,000 times during angiosperm evolution and "is of entirely recent occurrence."

In contrast, we found 20–25 group II introns in the mtDNAs of the examined land plants (table 2), nearly double the number found in those of their sister group, the charophytes (13). Similar to the angiosperms, the 3 genes nad1, nad2, and nad5 in Cycas are disrupted by 5 group II intron sequences, which have brought the genes into trans-splicing arrangements. Studies by Malek et al. (1997) and Malek and Knoop (1998) concluded that trans-spliced group II introns had evolved from formerly cis-spliced introns before the emergence of hornworts. More recently, this transition was dated to even before the emergence of mosses (Groth-Malonek et al. 2005).

Gene Content and Evolution of Gene Loss in the mtDNAs of Land Plants

Our phylogenetic analysis using either nonsynonymous (Ka) or synonymous (Ks) substitutions of 22 mitochondrial protein-coding genes shared by the 11 studied plants generated identical tree topologies. Figure 3A shows the Neighbor-Joining trees reconstructed using the Ka and Ks values, respectively, with Chara being designated as the outgroup. The topologies of these 2 trees strongly indicate that, excluding the bryophytes Physcomitrella and Marchantia, the seed plants (including Cycas and the 7 angiosperms) and the angiosperms form nested monophyletic clades. Within the angiosperms, the monocots and eudicots constitute 2 distinct subclades. Note that the sister relationship of Physcomitrella and Marchantia has to be treated with caution because the sampled seed plants share a long branch. In addition, a recent multigene analysis (Qiu et al. 2006) based on dense taxon sampling suggested that liverworts (including Marchantia) diverged before mosses (including Physcomitrella). The Ka- and Ks-derived branch lengths leading to Cycas are nearly equal (fig. 3A), whereas the Ka-based branch lengths for the other species are strikingly shorter than the corresponding Ks-based branches.


Statistical analysis using a Z-test further indicated that the Ka value for the Cycas branch is higher than the average Ka of the 7 studied angiosperms (P < 0.05, Z-test). The elevated Ka in Cycas suggests that a rapid evolution/divergence may have occurred in some of the protein-coding genes of the Cycas mtDNA. This result is consistent with the observation that some genes in the Cycas mtDNA contain abundant RNA editing sites (see below).

A total of 39 protein-coding genes were identified in the Cycas mtDNA, which is the highest gene number

FIG. 1.—(A) Gene map of the Cycas taitungensis mtDNA. Genes are color coded into 12 groups according to their biological functions. Genes on the outside and inside of the 2 circles are transcribed clockwise and counterclockwise, respectively. Predicted mtpts are indicated in deep green between the 2 circles. Intron-containing genes are indicated by asterisks and marked with exon numbers. Degrees on the circular genome correspond to the linear maps in (B). (B) Lengths and distribution of repeat sequences and single Bpu sequences across the entire Cycas mtDNA. Upper: linear presentation of (A). Middle: the position of each repeat sequence (in blue) mapped onto the genome. Lower: the locations and lengths of tandem repeats detected using the tandem repeats finder. Single Bpu sequences (including variants) and their tandem repeats are shown in red; other tandem repeats are shown in black.


Table 2
Comparison of Features among 11 Elucidated Land Plant Mitochondrial Genomes

Features                                  Chara       Marchantia  Physcomitrella  Cycas       Triticum    Oryza       Zea         Beta        Nicotiana   Brassica    Arabidopsis
Size (bp)                                 67,737      186,609     105,340         414,903     452,528     490,520     569,630     368,799     430,597     221,853     366,924
A + T content (%)                         59.1        57.6        59.4            53.1        55.7        56.2        56.1        56.1        55.0        54.8        55.2
Gene numberᵃ (protein-coding/tRNA/rRNA)   39/26/3     41/27/3     39/24/3         39/22/3     33/17/3     35/18/3     32/17/3ᵇ    29/18/3     36/21/3     33/17/3     32/17/3
                                          {39/26/3}   {41/29/3}   {39/24/3}       {39/26/3}   {35/25/8}   {40/27/6}   {34/22/5ᵇ}  {29/22/5}   {39/22/4}   {34/17/3}   {33/22/3}
Coding sequences (%)ᵃ                     90.7        20.3        37.0            10.1        8.6         11.1        6.2         10.3        9.9         17.3        10.6
Repeat sequences (%)ᶜ                     3.2         10.1        11.6            15.1        10.1        28.8        11.4        12.5        10.8        5.5         7.8
Tandem repeat sequences (%)ᵈ              0.22        0.49        0.14            4.97        0.42        0.09        0.35        0.71        0.05        0.36        0.34
Introns
  Group I                                 14          7           2               0           0           0           0           0           0           0           0
  Group II, cis-spliced                   13          25          25              20          17          17          15          14          17          18          18
  Group II, trans-spliced                 0           0           0               5           6           6           7           6           6           5           5
Cp-derived sequences (bp)                 —ᵉ          —ᵉ          —ᵉ              18,113      13,455      22,593      25,132      —ᵉ          9,942       7,950       3,958
Cp-derived sequences (%)                  —ᵉ          —ᵉ          —ᵉ              4.4         3.0         6.3         4.4         2.1         2.5         3.6         1.1
RNA editing sites                         0           0           —ᵉ              1,084ᶠ      —ᵉ          491         —ᵉ          370ᵍ        —ᵉ          427         441

ᵃ Pseudogenes and unique ORFs (such as ORF222 reported in Handa 2003) were excluded. Upper: duplicate genes were counted only once. Lower (within curly brackets): all duplicate genes were included.
ᵇ The plasmid-localized tRNA-Trp was included (Clifton et al. 2004).
ᶜ The REPuter program (http://bibiserv.techfak.uni-bielefeld.de/reputer) was used to obtain the estimates. All duplicate copies of repeats are included. Overlapped sequences were counted only once.
ᵈ The tandem repeats finder program (Benson 1999) was used to obtain the estimates.
ᵉ Data were not available from original papers.
ᶠ The PREP-mt program (Mower 2005) was used to obtain the estimates. The cutoff value of the reported score was set to 0.6.
ᵍ 370 C-to-U editing sites were identified in 28 gene/ORF transcripts of sugar beet (Kubo et al. 2000).


identified to date among the studied seed plant mtDNAs (fig. 3B). When the distributions of conserved genes in the mtDNA from the 11 sampled species (supplementary table S2, Supplementary Material online) are mapped to their respective branches in a maximum parsimony tree (fig. 3B, based on the data used in fig. 3A), it is possible then to estimate the time of loss of a particular gene. As shown in figure 3B, our analysis indicates that there have been at least 31 independent events of gene loss from all the land plant mtDNAs elucidated to date.

When a gene is missing from the mtDNA of a given species, it is generally believed that the original copy has been transferred to the nucleus, where it functions through cytosolic protein synthesis followed by transit peptide–assisted import back to the mitochondria (Adams and Palmer 2003; Knoop 2004). Frequent gene losses, especially of the ribosomal protein genes (30 of 34), appear to have occurred after the divergence of the angiosperm lineages, approximately 150 MYA (Chaw et al. 2004). However, only 1 gene loss was observed in the Cycas lineage after it branched off from the common ancestor of angiosperms, approximately 300 MYA. Adams and Palmer (2003) suggested that angiosperm mtDNAs have experienced a recent evolutionary surge of loss and/or transfer of genes (primarily those encoding ribosomal proteins) to the nucleus. Our data give additional support to their contention but suggest that the mtDNA of Cycas appears to have been excluded from this surge. Our results suggest that the Cycas mtDNA tends to evolutionarily maintain its gene diversity and/or enjoys less gene transfer than angiosperm mtDNAs (Selosse et al. 2001; Adams et al. 2002; Adams and Palmer 2003). Coincidentally, among the some 40 published cpDNAs of seed plants, that of Cycas has also undergone the least gene loss (Wu et al. 2007; supplementary fig. 1 [Supplementary Material online]). Future work will be required to examine why the genomes of these 2 Cycas organelles appear to have frozen after the cycads' divergence from angiosperms, especially in comparison to mtDNAs from other major gymnosperm clades such as Ginkgo, gnetophytes, and pines.

A Novel Family of Short Interspersed Mitochondrial Elements

Numerous Short Interspersed Mitochondrial Elements, Termed Bpu Sequences, Are Present in the Cycas mtDNA

Sequence analysis revealed that numerous copies of a 36-nt repeat, herein designated as a "Bpu sequence/element," are interspersed throughout the Cycas mtDNA (fig. 1B). Figure 4A shows the characteristic sequences of 500 Bpu elements having 0 to 4 mismatches. If up to a 7-nt mismatch to the dominant type is allowed, the total copy number of Bpu sequences increases to 512. The Bpu sequences feature 2 conserved terminal direct repeats (AAGG) and the recognition sequence for the restriction endonuclease Bpu10I (CCTGAAGC, nt 15–21). These repeat elements/sequences do not appear to have coding potential.
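Counting copies that differ from a dominant type by a bounded number of mismatches, as described above, amounts to a Hamming-distance scan. The sketch below shows the idea on invented data: `TOY_MOTIF` is a made-up 12-nt string and is not the 36-nt Bpu consensus (which is shown only in fig. 4A), the function names are hypothetical, and only one strand is scanned.

```python
# Sketch: find occurrences of a motif allowing up to k mismatches.
# TOY_MOTIF is invented for illustration; it is NOT the Bpu consensus sequence.
TOY_MOTIF = "AAGGCCTGAAGG"


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_approximate(seq, motif, max_mismatches):
    """Return (position, mismatches) for each window within the threshold.

    Positions are 0-based. Only the given strand is scanned; the reverse
    complement would have to be searched separately.
    """
    seq, motif = seq.upper(), motif.upper()
    m = len(motif)
    hits = []
    for i in range(len(seq) - m + 1):
        d = hamming(seq[i:i + m], motif)
        if d <= max_mismatches:
            hits.append((i, d))
    return hits


# Toy genome (invented): one copy with a single mismatch, one exact copy.
toy_genome = "TTT" + "AAGGCATGAAGG" + "CCC" + "AAGGCCTGAAGG"
exact_hits = find_approximate(toy_genome, TOY_MOTIF, 0)
relaxed_hits = find_approximate(toy_genome, TOY_MOTIF, 1)
print(exact_hits, relaxed_hits)
```

Raising the mismatch threshold can only add hits, which mirrors the increase from 500 to 512 copies reported in the text when the allowed divergence is relaxed.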

Because Bpu elements are extremely short in their lengths and terminal repeats, and contain 2 terminal direct repeats rather than the inverted repeats found in plant miniature inverted-repeat transposable elements (MITEs), they are here classified as short interspersed mitochondrial elements (SIMEs). This distinguishes them from the MITEs and short interspersed nuclear elements (SINEs; e.g., Alu DNA repeats in the genomes of primates) that have been extensively reported from the genomes of plants and animals (Feschotte et al. 2002). Based on the similarity between SIMEs and SINEs, however, we can hypothesize that Bpu elements are likely to be transposed through RNA without a requirement for reverse transcriptase.

The Bpu10I recognition site characteristic of the Bpu elements is highly conserved except at the very last base pair, that is, the 21st bp of the Bpu sequence (fig. 4A, lower chart). In contrast, the 5′ terminal direct repeat and its downstream 6 nt (nt 5–9) tend to be more variable than the 3′ terminal direct repeat. Enigmatically, the elements can form a secondary structure (fig. 4B) with a predicted free energy of 125 (kcal/mole), as calculated by the MFOLD program (http://frontend.bioinfo.rpi.edu/zukerm/comments/FAQs.html). The significance of this secondary structure remains to be elucidated. Comparison of the Bpu

FIG. 2.—Comparison of copy numbers among various tandem repeats in the mtDNAs of Cycas and 10 other land plants. (A) Percentage of total genome comprised of tandem repeat sequences. (B) The copy numbers of tandem repeat sequences in each genome (x axis) are separated into 5 groups (y axis), and their histograms (z axis) are shown. The single-copy Bpu sequences of Cycas taitungensis described in figure 1 are not included in the counts. The copy numbers may not be integers because the boundaries of tandem repeats were determined by a probabilistic method described by Benson (1999).


element insertion sites in the Cycas mtDNA with corresponding sites in the mtDNAs of other species reveals that the target sites for transpositions of Bpu elements are identical (within 1–2 mismatched base pairs) to the 5′ terminal direct repeat (fig. 4C–F).

Bpu Sequences Distinguish Cycas from Other Cycads and Seed Plants

Bpu sequences are present exclusively in the noncoding regions of the Cycas mtDNA, that is, within the introns and intergenic spacers, with 1 exception: a Bpu sequence is present within the coding region of rrn18, which codes for the 18S rRNA (fig. 4C). Among the many mitochondrial rrn18 (mt-rrn18) genes available in GenBank, only those of Cycas (including Cycas revoluta, GenBank accession number AB029356) and Ginkgo biloba (the only living species of the order Ginkgoales) have Bpu elements (fig. 4C). The Bpu insertion sites of the 2 Cycas taxa are orthologous (data not shown), whereas that of Ginkgo is different, indicating that the insertions of Bpu elements in rrn18 did not occur in the common ancestor of cycads and ginkgo. In sequenced mtDNAs, the oldest cluster of cpDNA-derived sequences (also termed mtpt), trnV(uac)–trnM(cau)–atpE–atpB–rbcL, is reported to have existed since the common ancestor of seed plants (Wang et al. 2007, fig. 3). Bpu elements are present in the Cycas mtpt-atpB (fig. 4D) and mtpt-rbcL (fig. 4E) sequences but not in the corresponding sequences of the 7 angiosperm mtDNAs elucidated to date (Wang et al. 2007), suggesting that these Bpu insertions took place after the split of the Cycas lineage from the other seed plants.

We further examined the occurrence of Bpu elements in 1 of the other 2 cycad families, Zamiaceae (including 1 species from each of Dioon, Macrozamia, and Zamia), by sequencing exons 1 and 2 of their nad2 genes. Whereas nad2i1 of the Cycas mtDNA contains 5 Bpu elements, such elements are absent from the nad2 genes of the 3 sampled Zamiaceae genera. However, we found 1, 2, and 6 Bpu elements in mitochondrial nad5i1 of Cycas panzhihuaensis (GenBank accession number AF43425) and in mitochondrial nad1i1 and the cp rps19-rpl16 spacers/intron of C. revoluta (GenBank accession number

FIG. 3. Phylogenies inferred from concatenated data from 22 protein-coding genes (see supplementary table S2, Supplementary Material online) common to the sequenced mtDNAs from 10 land plants and Chara (the outgroup). Nodes received 100% bootstrap support unless otherwise indicated. Branch lengths are drawn to scale except as otherwise noted. (A) Two superimposed Neighbor-Joining trees based on the Ka and Ks values, respectively. (B) Scenarios of gene losses (open bar) and splits (hatched triangle) along the single maximum parsimony tree (9,857 steps). The total numbers of protein-coding genes in each mtDNA species are given within parentheses.

Complete Mitochondria Genome of Cycas 609

FIG. 4. Bpu sequences and examples of their insertion loci. Target sites and terminal repeats are underlined. (A) Upper: a dominant sequence based on 500 Bpu sequences (see text). Numbers along the abscissa indicate each nucleotide position within the prototype Bpu sequence. Recognition sites for the Bpu10I restriction enzyme are marked with asterisks. The ordinate scales each nucleotide by the total bits of information multiplied by its relative occurrence at that position (Wasserman and Sandelin 2004). Lower: identities of Bpu sequences that differ from the dominant type by 0–4 bp. (B) Secondary structure of the dominant Bpu sequence. (C)–(F) Lowercase letters, periods, and dashes denote mismatches, identical bases, and deletions, respectively, compared with the uppermost sequence. (C) Partial alignment of mt-rrn18 sequences from Cycas and Ginkgo. A Bpu sequence and a reversely complementary Bpu sequence are present in Ginkgo and Cycas, respectively. (D) Partial alignment of mtpt-atpB sequences extracted from


AY354955, AY345867), respectively. These data seem to suggest that no Bpu sequence has successfully invaded the mtDNAs of Cycadales genera other than Cycas.

A Bpu-like sequence with a 1-bp insertion and 89% similarity to the dominant type in Cycas mtDNA was retrieved from the coding sequence of the mitochondrial rps11 gene from the core eudicot Weigela hortensis. Alignment of this Bpu-like sequence reveals that its predicted Bpu10I endonuclease recognition site differs from that of Cycas by 2 bp (gCTGAGt). Because this Bpu element-like sequence does not interrupt the reading frame of rps11, and because the mitochondrial rps11 gene of Weigela lacks a target site duplication, we do not consider that this Bpu-like element shares a common origin with the Bpu sequences of Cycas. We further postulate that Bpu elements are likely absent from, or very rare in, angiosperms.

Surprisingly, the cpDNA of Cycas also contains 2 Bpu elements, located in the petN-psbM and psbA-trnK spacers, respectively (see Wu et al. 2007). Additionally, we identified 1 Bpu sequence in the atpB-rbcL spacer of the cpDNA from Pinus luchuensis (GenBank accession number DQ196799). However, no Bpu sequence has been detected in the cpDNAs of the 3 other Pinaceae genera and 3 gnetophyte orders we have sequenced to date (Chaw SM, Wu CS, Lai YT, Wang YN, Lin CP, Liu SM, unpublished data). Collectively, the available evidence inclines us to believe that the Bpu sequences have proliferated specifically in the mtDNA of Cycas (or the Cycadaceae). The sporadic occurrence of Bpu elements in the cpDNAs of Cycas and Pinus suggests that they are likely derived from nonhomologous recombination with DNA fragments that leaked out of mitochondria in the former case and via lateral transfer in the latter case.
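As a rough illustration of how candidate Bpu loci might be flagged in a sequence, the following sketch scans both strands for the Bpu10I recognition site quoted in the abstract (CCTGAAGC) and keeps hits that have an AAGG direct repeat on each side. The function names, the 20-bp search window, and the toy sequence are assumptions made for this example; it is not the procedure used in this study.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq.upper()))


def find_all(seq, motif):
    """Yield 0-based start positions of every (possibly overlapping) motif hit."""
    start = seq.find(motif)
    while start != -1:
        yield start
        start = seq.find(motif, start + 1)


def candidate_bpu_sites(seq, site="CCTGAAGC", repeat="AAGG", window=20):
    """List (position, strand) pairs where `site` is flanked by `repeat`.

    A hit is kept only if `repeat` occurs within `window` bp upstream and
    within `window` bp downstream of the site on the strand being scanned.
    Positions on the '-' strand are given in reverse-complement coordinates.
    The window size is a hypothetical parameter, not a published value.
    """
    hits = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for pos in find_all(s, site):
            upstream = s[max(0, pos - window):pos]
            downstream = s[pos + len(site):pos + len(site) + window]
            if repeat in upstream and repeat in downstream:
                hits.append((pos, strand))
    return hits


# Toy sequence (invented): AAGG ... CCTGAAGC ... AAGG on the forward strand.
toy = "TTTTAAGGACGTCCTGAAGCTTGACAAGGTTTT"
print(candidate_bpu_sites(toy))
```

A real search would need to tolerate mismatches against the 36-bp consensus; exact string matching is used here only to keep the sketch short.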

The Tai Sequence and Its Association with Bpu Sequences

The second intron of the rps3 gene (rps3i2) is only found in the mtDNAs of Cycas (Regina et al. 2005), including those studied in the present work. Regina et al. (2005) suggested that rps3i2 is a group II intron that was independently gained in the gymnosperms, likely at or just after the divergence of the angiosperms. Moreover, the authors reported a high similarity between a partial segment of the Cycas rps3i2 and orf760 of the Chara mtDNA (Turmel et al. 2003). Orf760 harbors functional domains for a maturase and a reverse transcriptase, prompting Regina et al. (2005) to propose that the Cycas rps3i2 originally encoded a maturase and a reverse transcriptase but evolved over time into a partially degenerated ORF. Here, we further report that rps3i2 of the Cycas mtDNA contains a 900-bp fragment comprising an array of 4 Bpu elements, with the Bpu elements on each end lacking 1 terminal repeat, followed by a 440-bp fragment (designated as the "Tai" sequence), a Ψorf760 sequence, and 1 perfect Bpu sequence (fig. 5A). Most intriguingly, additional Tai sequences are scattered throughout the Cycas mtDNA; they occur more densely in longer spacers than in shorter ones (fig. 5B) and are generally found in close association with repeated Bpu sequences. These findings lend additional, robust support to the proposal of Regina et al. (2005).

In conclusion, we hypothesize that a Tai sequence and an orf760 flanked by a Bpu sequence at each end most likely constitute an ancestral retrotransposon. This hypothesis is founded on 2 observations: 1) Tai sequences are highly associated with Bpu elements and 2) small segments of Tai sequences are vigorously and diversely rearranged, deleted, or truncated, as illustrated in the 3 examples shown in figure 5B. These observations also suggest that an ancestral retrotransposon in Cycas mtDNA spread very actively, presumably after Cycas branched off from the 10 other extant cycad genera. Afterward, the offspring or duplicates of the ancestral transposon may have gradually lost their mobility through varying degrees of deletions/truncations from the 3′ region, which encoded both maturase and reverse transcriptase functions.

We further speculate that, even after the ancestral retrotransposon lost its transposon function, the remaining Bpu elements might have retained the ability to proliferate or amplify via unequal crossing-over or slipped-strand mispairing because their 2 direct terminal repeats can pair with each other's complementary strands. Future studies and sequencing of additional mtDNAs from basal Cycas will be required to confirm this hypothesis and may provide additional insight into the evolutionary origins of Bpu sequences and the molecular mechanisms underlying their amplification.

The Fates of CpDNA-Derived Sequences (mtpts) in Cycas mtDNA

Table 2 shows a comparative analysis of mtpts among the 11 studied plant mtDNAs. The total percentage of mtpts in Cycas (4.4%) is relatively high compared with that in dicots (1.1–3.6%) but falls within the range seen among monocots (3.0–6.3%). We previously discovered that the frequency of mtpt transfer is positively correlated with variations in mtDNA size (coefficient value r² = 0.47) (Wang et al. 2007). Here, we report that the Cycas mtDNA contains 8 protein-coding genes as well as 2 rRNA and 5 tRNA gene sequences originating from the cpDNA (fig. 1). However, frameshifts and indels within the protein-coding genes suggest that these Cycas mtpts have degenerated and are nonfunctional. In contrast, the 5 cpDNA-derived tRNAs in the Cycas mtDNA are able to fold into standard cloverleaf structures, as shown by tRNAscan-SE analysis (Lowe and Eddy 1997), and are thus likely to be functional. Previously, Sugiyama et al. (2005) used tRNAscan-SE to scan

Cycas, Oryza, Arabidopsis, and Nicotiana. Note that 2 Bpu sequence insertions are detected in Cycas but not in the other plants. (E) Partial alignment of mtpt-rbcL sequences extracted from Oryza, Arabidopsis, Nicotiana, and Cycas. The sequence from Cycas contains a Bpu-like sequence with a 5-bp insertion (shaded box) versus the dominant type. (F) Partial alignment of the mtpt-atpE and chloroplast atpE sequences of Cycas. Two reversely complementary Bpu sequences are found in the mtDNA. Note that the 5′ Bpu sequence is partly degenerated at its 3′ end.


the tobacco mtDNA and concluded that the 6 cp-derived tRNAs shared by angiosperms are functional. Because no mtpt was observed in Chara, Marchantia, or Physcomitrella, Wang et al. (2007) concluded that frequent DNA transfer from cpDNA to mtDNA has taken place no later than in the common ancestor of seed plants, approximately 300 MYA.

Abundant RNA Editing Sites in Cycas mtDNA

Among the land plant mtDNAs, Cycas has the most predicted RNA editing sites (1,084 sites; supplementary table S4, Supplementary Material online). It is commonly believed that RNA editing arose together with the first terrestrial plants (Steinhauser et al. 1999). By using the PREP-mt software (Mower 2005) with the cutoff score set to 0.6, 1,084 sites within the protein-coding genes of the Cycas mtDNA were predicted to be C-to-U RNA editing sites. This is more than double the number of predicted sites within the elucidated mtDNAs of other land plants (table 2). If the cutoff score, which indicates the degree of conservation of each editing site compared with those found in the other published plant mtDNAs, is set to the most stringent criterion in PREP-mt (i.e., = 1), the number of editing sites decreases to 738, which is still the largest number found among the land plant mtDNAs elucidated to date.
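The effect of the cutoff score on the number of reported sites can be pictured with a short sketch. The record layout, the toy scores, and the assumption that a site is kept when its score is greater than or equal to the cutoff are all illustrative; they are not taken from PREP-mt output.

```python
from collections import namedtuple

# One predicted editing site: gene name, codon index, and a conservation
# score between 0 and 1. All values below are invented toy data.
Site = namedtuple("Site", ["gene", "codon", "score"])


def sites_at_cutoff(sites, cutoff):
    """Keep predictions whose score is at least `cutoff` (inclusive by assumption)."""
    return [s for s in sites if s.score >= cutoff]


toy_predictions = [
    Site("nad4", 12, 1.0),
    Site("nad4", 57, 0.75),
    Site("nad5", 3, 0.6),
    Site("cox1", 88, 0.4),
    Site("rps10", 20, 1.0),
]

# A stricter cutoff can only shrink the list, as in the drop from 1,084 to 738.
for cutoff in (0.6, 1.0):
    kept = sites_at_cutoff(toy_predictions, cutoff)
    print(f"cutoff {cutoff}: {len(kept)} sites")
```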

It is believed that RNA editing is essential for functional protein expression, as it is required to modify amino acids or generate new start or stop codons (Hoch et al. 1991; Wintz and Hanson 1991; Kotera et al. 2005; Shikanai 2006). For this reason, the large number of RNA editing sites in the Cycas mtDNA may indicate higher complexity at the DNA level and formation of various transcripts through RNA editing, thus potentially reflecting rapid divergence in Cycas.

Although RNA editing sites are known to be sporadically distributed in the genomes of plant organelles/mitochondria from seed plants, the mechanisms underlying this distribution pattern are not yet known (Shikanai 2006;

FIG. 5. Genomic organization of the second rps3 intron (rps3i2) and association of Tai sequences with Bpu sequences. (A) Organization of rps3i2. The segment between Ψorf760 and the 4 tandem Bpu sequences is designated as a Tai sequence (gray). Arrowheads indicate a Bpu sequence lacking the 2 terminal repeats (banded bars) and its complementary sequence. (B) Upper: association of Tai sequences with Bpu tandem repeats across the genome (abscissa) and their respective lengths (ordinate). Lower: examples of 3 Tai variants (boxed), showing that the sequences are variously degenerate and truncated as compared with the typical Tai shown in (A) (broken line indicates mismatching sequences). Thin arrows indicate the orientations of homologous segments between Tai variants and the typical Tai.


Mulligan et al. 2007). Supplementary table S4 (Supplementary Material online) shows that, within the Cycas mtDNA, gene complex I (which includes 9 nad genes) is the most extensively edited gene category. Nad4 and nad5 show the first and second highest numbers of editing sites, followed by the cox1 gene of complex IV. The least edited gene is rps10, which contains only 6 predicted sites even when the cutoff score is set to 0.6. Mulligan et al. (2007) interpreted the sporadic pattern as a result of lineage-specific loss of editing sites through retroconversion, which could remove adjacent editing sites by replacement with the edited sequences. The abundant RNA editing sites in the Cycas mtDNA, however, are mysterious and completely deviate from the sporadic patterns reported in other seed plant mtDNAs. Therefore, the mechanism of RNA editing in Cycas mtDNA may have a different evolutionary history from those of other seed plants.

Previous studies (Steinhauser et al. 1999; Lurin et al. 2004; Shikanai 2006) led to the proposal that variation or multiplication of trans-acting factors (e.g., members of the pentatricopeptide repeat family) may allow rapid increases in the number of editing sites. This could possibly be correlated with the lineage-specific explosion of RNA editing sites observed in Cycas mtDNA. Furthermore, Handa's (2003) conclusion that the evolutionary speed in RNA editing is higher at the level of gene regulation than at the primary gene sequence level appears to provide additional support for our speculation. To improve understanding of the mechanisms involved in the abundant RNA editing sites in Cycas mtDNA, critical analysis of mtDNAs from ferns, conifers, and other cycads will be required.

In all plant lineages, RNA editing often alters the amino acid sequence (Adams and Palmer 2003). Supplementary table S5 (Supplementary Material online) summarizes the number of predicted preediting codons and potential edited codons in the Cycas mtDNA. The 2 most frequently edited codons are TCA (Ser) and TCT (Ser), which are predicted to be edited into TTA (Leu) and TTT (Phe), respectively. The least frequently edited codon is CAA (Gln), which would putatively be edited in only 2 cases, wherein the codon's first position would be altered to yield the stop codon TAA. Supplementary table S6 (Supplementary Material online) further depicts the formation of 4 start (from ACG to ATG) and 7 stop codons (from CGA to TGA) through RNA editing. To verify the accuracy of the editing sites predicted using PREP-mt, partial transcripts of the cox1, atp1, atp6, and ccmB genes were experimentally assayed using RT-PCR. Comparison of our RT-PCR sequences with the editing sites predicted by PREP-mt indicated accuracy values of 97.66%, 99.13%, 97.91%, and 99.25% for cox1, atp1, atp6, and ccmB, respectively.
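The codon changes named above can be checked with a few lines of code. The sketch below builds the standard genetic code, applies a single C-to-U edit (written as C to T in DNA notation) at a chosen codon position, and prints the amino acid before and after editing. The function name and the 0-based position convention are assumptions introduced for this example.

```python
# Standard genetic code, built in the conventional TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))  # '*' marks a stop codon


def edit_c_to_u(codon, position):
    """Apply one C-to-U edit at a 0-based codon position (DNA notation: C -> T)."""
    codon = codon.upper()
    if codon[position] != "C":
        raise ValueError(f"no C at position {position} of {codon}")
    return codon[:position] + "T" + codon[position + 1:]


# (codon, edited position) pairs taken from the examples in the text.
examples = [("TCA", 1), ("TCT", 1), ("CAA", 0), ("ACG", 1), ("CGA", 0)]
for codon, pos in examples:
    edited = edit_c_to_u(codon, pos)
    print(f"{codon} ({CODON_TABLE[codon]}) -> {edited} ({CODON_TABLE[edited]})")
```

Running it reproduces the Ser-to-Leu and Ser-to-Phe changes, the Gln-to-stop and Arg-to-stop conversions, and the creation of an ATG start codon from ACG.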

Our RT-PCR analysis additionally identified a U-to-C editing type in atp1; this was missed in the prediction because detection of the U-to-C editing type is limited when using PREP-mt (Mower 2005). Unfortunately, programs for predicting RNA editing sites in non-protein-coding genes, such as ribosomal RNA and tRNA genes, and for predicting silent editing sites (which alter the mRNA but do not change the translated amino acid) are not yet available. We are currently examining these additional editing types and the locations of such sites in the Cycas mtDNA.

No Large Repeats in Gymnosperms but Some in Angiosperms

In the mtDNAs of angiosperms, long repeated sequences vary in size from 2,427 bp in rapeseed to 127,600 bp in rice, and they show no homology to each other, implying that repeated sequences were independently acquired during the evolution from Marchantia to angiosperms (Sugiyama et al. 2005). In Cycas mtDNA, we only detected a few long repeated sequences, all less than 15 kb in length and mostly composed of Bpu sequences. It has been demonstrated that genes are continuously transferred from the nuclear genome and cpDNA to mtDNA (Knoop 2004). However, the mechanisms underlying the emergence of large repeats are not yet fully understood.

Conclusions

The complete Cycas mtDNA shows a number of unprecedented features that are atypical of the mtDNAs previously elucidated for other land plants, including the lowest A + T content, the highest proportion of tandem repeat sequences (mainly Bpu sequences) and gene number found so far, abundant RNA editing sites, and an exceptionally elevated Ka value for the protein-coding genes. The latter 2 features might be correlated. In comparison with the known angiosperm organelle genomes, the cpDNA and mtDNA of Cycas have experienced the least gene loss. Peculiarly, the cpDNA of Cycas has the fewest RNA editing sites among land plants (Wu et al. 2007), whereas its mtDNA has the most. On the other hand, some characteristics are shared by mtDNAs of Cycas and angiosperms but not bryophytes; these include common gene transfers from cpDNA, similar ratios of noncoding sequences, lack of group I introns, relatively low numbers of tRNAs, and concurrence of trans-spliced group II introns. Nevertheless, the above-mentioned Cycas-specific features suggest that the Cycas mtDNA has a lineage-specific evolutionary history.

A novel family of SIMEs, designated as Bpu elements/sequences, represents another unique feature of the Cycadaceae lineage. Bpu elements are widely distributed throughout the noncoding regions of Cycas mtDNA (over 500 copies), with only 1 occurrence in a coding region (rrn18). In the Cycas mtDNA, the highly conserved nature of Bpu sequences and their association with Tai sequences, as well as Ψorf760 (located in rps3i2), suggest that these sequences may have collectively originated from an ancestral retrotransposon. Further investigation into the origin, proliferation, and evolution of Bpu sequences will be desirable to help clarify the peculiar features of Cycas mtDNA.

Supplementary Material

Supplementary tables S1–S7 and figure 1 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

This work was supported by joint research grants from the Research Center for Biodiversity, Academia Sinica, to S.-M.C. and D.W. and partially by a National Science Council grant to S.-M.C. (94-2311-B001-059) and by a grant from the Institute of Information Science, Academia Sinica, to A.C.-C.S. We thank Chiao-Lei Cheng for sequencing the RT-PCR products. We are grateful to the 3 anonymous reviewers, who provided critical comments and valuable suggestions.

Literature Cited

Adams KL, Palmer JD. 2003. Evolution of mitochondrial gene content: gene loss and transfer to the nucleus. Mol Phylogenet Evol. 29:380–395.

Adams KL, Qiu YL, Stoutemyer M, Palmer JD. 2002. Punctuated evolution of mitochondrial gene content: high and variable rates of mitochondrial gene loss and transfer to the nucleus during angiosperm evolution. Proc Natl Acad Sci USA. 99:9905–9912.

Benson G. 1999. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27:573–580.

Bergthorsson U, Richardson AO, Young GJ, Goertzen LR, Palmer JD. 2004. Massive horizontal transfer of mitochondrial genes from diverse land plant donors to the basal angiosperm Amborella. Proc Natl Acad Sci USA. 101:17747–17752.

Chaw SM, Chang CC, Chen HL, Li WH. 2004. Dating the monocot-dicot divergence and the origin of core eudicots using whole chloroplast genomes. J Mol Evol. 58:424–441.

Chaw SM, Walters TW, Chang CC, Hu SH, Chen SH. 2005. A phylogeny of cycads (Cycadales) inferred from chloroplast matK gene, trnK intron, and nuclear rDNA ITS region. Mol Phylogenet Evol. 37:214–234.

Chaw SM, Zharkikh A, Sung HM, Lau TC, Li WH. 1997. Molecular phylogeny of extant gymnosperms and seed plant evolution: analysis of nuclear 18S rRNA sequences. Mol Biol Evol. 14:56–68.

Cho Y, Qiu YL, Kuhlman P, Palmer JD. 1998. Explosive invasion of plant mitochondria by a group I intron. Proc Natl Acad Sci USA. 95:14244–14249.

Clifton SW, Minx P, Fauron CM, et al. (13 co-authors). 2004. Sequence and comparative analysis of the maize NB mitochondrial genome. Plant Physiol. 136:3486–3503.

Feschotte S, Zhang X, Wessler SR. 2002. Miniature inverted-repeat transposable elements and their relationship to established DNA transposons. Washington (DC): ASM Press.

Groth-Malonek M, Pruchner D, Grewe F, Knoop V. 2005. Ancestors of trans-splicing mitochondrial introns support serial sister group relationships of hornworts and mosses with vascular plants. Mol Biol Evol. 22:117–125.

Handa H. 2003. The complete nucleotide sequence and RNA editing content of the mitochondrial genome of rapeseed (Brassica napus L.): comparative analysis of the mitochondrial genomes of rapeseed and Arabidopsis thaliana. Nucleic Acids Res. 31:5907–5916.

Hoch B, Maier RM, Appel K, Igloi GL, Kossel H. 1991. Editing of a chloroplast mRNA by creation of an initiation codon. Nature. 353:178–180.

Jones DL. 2002. Cycads of the World: ancient plant in today's landscape. 2nd ed. Sydney (Australia): Reed New Holland.

Kadowaki K, Kubo N, Ozawa K, Hirai A. 1996. Targeting presequence acquisition after mitochondrial gene transfer to the nucleus occurs by duplication of existing targeting signals. EMBO J. 15:6652–6661.

Karol KG, McCourt RM, Cimino MT, Delwiche CF. 2001. The closest living relatives of land plants. Science. 294:2351–2353.

Knoop V. 2004. The mitochondrial DNA of land plants: peculiarities in phylogenetic perspective. Curr Genet. 46:123–139.

Kotera E, Tasaka M, Shikanai T. 2005. A pentatricopeptide repeat protein is essential for RNA editing in chloroplasts. Nature. 433:326–330.

Kubo T, Nishizawa S, Sugawara A, Itchoda N, Estiati A, Mikami T. 2000. The complete nucleotide sequence of the mitochondrial genome of sugar beet (Beta vulgaris L.) reveals a novel gene for tRNA(Cys)(GCA). Nucleic Acids Res. 28:2571–2576.

Kumar S, Tamura K, Nei M. 2004. MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform. 5:150–163.

Kurtz S, Choudhuri JV, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R. 2001. REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 29:4633–4642.

Lowe TM, Eddy SR. 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25:955–964.

Lurin C, Andres C, Aubourg S, et al. (19 co-authors). 2004. Genome-wide analysis of Arabidopsis pentatricopeptide repeat proteins reveals their essential role in organelle biogenesis. Plant Cell. 16:2089–2103.

Malek O, Brennicke A, Knoop V. 1997. Evolution of trans-splicing plant mitochondrial introns in pre-Permian times. Proc Natl Acad Sci USA. 94:553–558.

Malek O, Knoop V. 1998. Trans-splicing group II introns in plant mitochondria: the complete set of cis-arranged homologs in ferns, fern allies, and a hornwort. RNA. 4:1599–1609.

Michel F, Ferat JL. 1995. Structure and activities of group II introns. Annu Rev Biochem. 64:435–461.

Mower JP. 2005. PREP-Mt: predictive RNA editor for plant mitochondrial genes. BMC Bioinformatics. 6:96.

Mulligan RM, Chang KL, Chou CC. 2007. Computational analysis of RNA editing sites in plant mitochondrial genomes reveals similar information content and a sporadic distribution of editing sites. Mol Biol Evol. 24:1971–1981.

Notsu Y, Masood S, Nishikawa T, Kubo N, Akiduki G, Nakazono M, Hirai A, Kadowaki K. 2002. The complete sequence of the rice (Oryza sativa L.) mitochondrial genome: frequent DNA sequence acquisition and loss during the evolution of flowering plants. Mol Genet Genomics. 268:434–445.

Oda K, Kohchi T, Ohyama K. 1992. Mitochondrial DNA of Marchantia polymorpha as a single circular form with no incorporation of foreign DNA. Biosci Biotechnol Biochem. 56:132–135.

Ogihara Y, Yamazaki Y, Murai K, et al. (14 co-authors). 2005. Structural dynamics of cereal mitochondrial genomes as revealed by complete nucleotide sequencing of the wheat mitochondrial genome. Nucleic Acids Res. 33:6235–6250.

Qiu YL, Li L, Wang B, et al. (13 co-authors). 2006. The deepest divergences in land plants inferred from phylogenomic evidence. Proc Natl Acad Sci USA. 103:15511–15516.

Regina TM, Picardi E, Lopez L, Pesole G, Quagliariello C. 2005. A novel additional group II intron distinguishes the mitochondrial rps3 gene in gymnosperms. J Mol Evol. 60:196–206.

Selosse M, Albert B, Godelle B. 2001. Reducing the genome size of organelles favours gene transfer to the nucleus. Trends Ecol Evol. 16:135–141.

Shikanai T. 2006. RNA editing in plant organelles: machinery, physiological function and evolution. Cell Mol Life Sci. 63:698–708.

Steinhauser S, Beckert S, Capesius I, Malek O, Knoop V. 1999. Plant mitochondrial RNA editing. J Mol Evol. 48:303–312.

Stevenson DW. 1990. Morphology and systematics of the Cycadales. Mem N Y Bot Gard. 57:8–15.


Stewart CN Jr, Via LE. 1993. A rapid CTAB DNA isolation technique useful for RAPD fingerprinting and other PCR applications. Biotechniques. 14:748–750.

Sugiyama Y, Watase Y, Nagase M, Makita N, Yagura S, Hirai A, Sugiura M. 2005. The complete nucleotide sequence and multipartite organization of the tobacco mitochondrial genome: comparative analysis of mitochondrial genomes in higher plants. Mol Genet Genomics. 272:603–615.

Terasawa K, Odahara M, Kabeya Y, Kikugawa T, Sekine Y, Fujiwara M, Sato N. 2007. The mitochondrial genome of the moss Physcomitrella patens sheds new light on mitochondrial evolution in land plants. Mol Biol Evol. 24:699–709.

Turmel M, Otis C, Lemieux C. 2003. The mitochondrial genome of Chara vulgaris: insights into the mitochondrial DNA architecture of the last common ancestor of green algae and land plants. Plant Cell. 15:1888–1903.

Unseld M, Marienfeld JR, Brandt P, Brennicke A. 1997. The mitochondrial genome of Arabidopsis thaliana contains 57 genes in 366,924 nucleotides. Nat Genet. 15:57–61.

Wang D, Wu YW, Chun-Chieh Shih A, Wu CS, Wang YN, Chaw SM. 2007. Transfer of chloroplast genomic DNA to mitochondrial genome occurred at least 300 MYA. Mol Biol Evol. 24:2040–2048.

Wasserman WW, Sandelin A. 2004. Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet. 5:276–287.

Wintz H, Hanson MR. 1991. A termination codon is created by RNA editing in the petunia atp9 transcript. Curr Genet. 19:61–64.

Wu CS, Wang YN, Liu SM, Chaw SM. 2007. Chloroplast genome (cpDNA) of Cycas taitungensis and 56 cp protein-coding genes of Gnetum parvifolium: insights into cpDNA evolution and phylogeny of extant seed plants. Mol Biol Evol. 24:1366–1379.

William Martin, Associate Editor

Accepted January 6, 2008


isolated using the method described by Kadowaki et al. (1996). DNase I was used to digest any nuclear genome (nrDNA) contaminants. mtDNA was isolated according to a cetyltrimethylammonium bromide-based protocol (Stewart and Via 1993), sheared into random fragments of 2–3 kb with a Hydroshear device (Genomic Solutions Inc., Ann Arbor, MI), and then directly cloned into the EcoRV site of the pBluescriptSK vector to generate a shotgun library. Shotgun clones were sequenced as previously described (Wang et al. 2007), except that each nucleotide had about 10× coverage. Gaps were filled with specific primers designed based on the sequenced clones. The complete Cycas mtDNA sequence has been deposited in the DNA Data Bank of Japan under NCBI accession number BABI01000001.

Sequence Data Analysis

Annotation of Protein-Coding, rRNA, and tRNA Genes

A database search was carried out by using the National Center for Biotechnology Information's (NCBI) Web-based Blast service (http://www.ncbi.nlm.nih.gov/BLAST/), and genes with e-values smaller than 0.001 were selected. The exact gene and exon boundaries were determined by alignment of homologous genes from available and annotated plant mtDNAs (table 1). Multiple sequence alignments were performed using the MAP2 Web service (http://deepc2.psi.iastate.edu/aat/map2/map2.html). Alignments of both nucleotide sequences and the translated amino acid sequences of the protein-coding genes were manually inspected. The tRNA genes were annotated using the tRNAscan-SE program (Lowe and Eddy 1997).

ORF Finding and Intron Identification

Identification of open reading frames (ORFs) was performed with the Web-based NCBI ORF Finder (http://www.ncbi.nlm.nih.gov/projects/gorf/). The standard genetic code was applied. The intron types were identified on the basis of their sequences and secondary structures (Michel and Ferat 1995). The predicted introns were also verified by alignment of their sequences with orthologous sequences previously elucidated from other species.
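To make the ORF search concrete, here is a minimal sketch of a six-frame scan under the standard genetic code. It is not the NCBI ORF Finder; the function names, the minimum-length parameter, and the toy sequence are assumptions chosen for illustration, and only ATG-initiated ORFs that end in an in-frame stop codon are reported.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq.upper()))


def find_orfs(seq, min_codons=3):
    """Return (start, end, strand) for each ATG...stop ORF in all 6 frames.

    `start` is 0-based, `end` is exclusive and includes the stop codon.
    `min_codons` counts codons before the stop. Coordinates on the '-'
    strand refer to the reverse-complemented string.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame ATG opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((start, i + 3, strand))
                    start = None  # look for the next ORF in this frame
    return orfs


# Toy sequence (invented): ATG AAA CCC GGG TAG starting at position 2.
toy = "CCATGAAACCCGGGTAGTT"
print(find_orfs(toy))
```

ORFs that run off the end of the sequence without a stop codon are ignored in this sketch, which is a simplification.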

RNA Editing Sites

Putative RNA editing sites in protein-coding genes were predicted using the PREP-mt Web-based program (http://www.prep-mt.net/) (Mower 2005). To achieve a balanced trade-off between the number of false positive and false negative sites, the cutoff score (C-value) was set to 0.6, as suggested by the author. In addition, the top 4 genes with the highest number of editing sites were verified by reverse transcription–polymerase chain reaction (RT–PCR) with gene-specific primers (supplementary table S7, Supplementary Material online).

Analysis of Repeats

Identification of repeats was carried out using the REPuter Web-based interface (http://bibiserv.techfak.uni-bielefeld.de/reputer/) (Kurtz et al. 2001). Both sequence directions (forward and reverse complement) were searched. The number of maximum computed repeats was set to 5,000. Overlapped repeating sequences were manually removed from each result. Information on tandem repeats was obtained using the tandem repeats finder (http://tandem.bu.edu/trf/trf.html) (Benson 1999). Searches for repeated regions and tandem repeats were carried out on all elucidated plant mtDNAs.
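A simplified picture of a tandem repeat copy number is sketched below. The function extends an exact repeat of a given period base by base and divides the matched length by the period, so a partial final copy yields a fractional value. This is only an exact-match illustration with hypothetical names; the tandem repeats finder itself uses a probabilistic model that tolerates mismatches and indels.

```python
def tandem_copy_number(seq, start, period):
    """Copy number of the exact tandem repeat whose unit is seq[start:start+period].

    The repeat is extended one base at a time for as long as each base equals
    the base one period earlier; the result is (matched length) / period.
    """
    if period <= 0 or start < 0 or start + period > len(seq):
        raise ValueError("repeat unit must lie inside the sequence")
    end = start + period
    while end < len(seq) and seq[end] == seq[end - period]:
        end += 1
    return (end - start) / period


# Toy example (invented): ACG ACG AC followed by unrelated bases.
print(f"{tandem_copy_number('ACGACGACTT', 0, 3):.2f}")
```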

Phylogenetic Analysis

The 22 protein-coding genes common to the 11 sampled mtDNAs (atp1, atp4, atp6, atp8, atp9, ccmB, cob, cox1, cox2, cox3, nad1, nad2, nad3, nad4, nad4L, nad5, nad6, nad9, mttB, rps3, rps4, and rps12) were extracted for phylogenetic analysis. The sequences were separately

Table 1
MtDNAs of the 11 Plants Examined in This Study

Classification        Scientific Name             Common Name       mtDNA Accession Number   Reference
Algae
  Charophyta          Chara vulgaris              Green algae       AY267353                 Turmel et al. (2003)
Land plants
  Bryophyta           Physcomitrella patens       Moss              NC_007945                Terasawa et al. (2007)
                      Marchantia polymorpha       Liverwort         NC_001660                Oda et al. (1992)
  Seed plants
    Gymnosperm        Cycas taitungensis          Taitung cycad     BABI01000001             Present study
    Angiosperm
      Monocots        Triticum aestivum           Wheat             NC_007579                Ogihara et al. (2005)
                      Oryza sativa sp. Japonica   Rice              BA000029                 Notsu et al. (2002)
                      Zea mays                    Maize             NC_007982                Clifton et al. (2004)
      Dicots          Beta vulgaris L.            Sugar beet        AP000396-7               Kubo et al. (2000)
                      Nicotiana tabacum           Tobacco           NC_006581                Sugiyama et al. (2005)
                      Brassica napus              Oilseed rape      AP006444                 Handa (2003)
                      Arabidopsis thaliana        Mouse-ear cress   NC_001284                Unseld et al. (1997)


aligned and then concatenated. Gaps and stop codons were removed manually. Divergence of nucleotide sequences between each pair of taxa was estimated in terms of the numbers of substitutions per synonymous (Ks) or nonsynonymous (Ka) site, based on the Pamilo–Bianchi–Li method implemented in the MEGA 3.0 program (Kumar et al. 2004). The Neighbor-Joining trees reconstructed with the Ka values and Ks values were rooted at Chara. The number of bootstrap replicates was set to 500. All phylogenetic analyses and tree reconstructions were performed using MEGA 3.0.
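The preparation of the concatenated data set can be sketched as follows. Gaps and stop codons were removed manually in this study; the code below only shows one programmatic equivalent, in which any codon column containing a gap or a stop codon in any taxon is dropped before the per-gene alignments are joined. The function names, the dictionary layout, and the toy alignments are assumptions made for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def clean_codon_columns(alignment):
    """Drop codon columns that contain a gap or a stop codon in any taxon.

    `alignment` maps taxon name -> aligned, in-frame coding sequence; all
    sequences must have the same length, divisible by 3.
    """
    taxa = list(alignment)
    length = len(alignment[taxa[0]])
    if length % 3 or any(len(alignment[t]) != length for t in taxa):
        raise ValueError("sequences must be equal-length codon alignments")
    kept = {t: [] for t in taxa}
    for i in range(0, length, 3):
        column = [alignment[t][i:i + 3].upper() for t in taxa]
        if any("-" in codon or codon in STOP_CODONS for codon in column):
            continue  # skip gapped or stop-containing codon columns
        for taxon, codon in zip(taxa, column):
            kept[taxon].append(codon)
    return {t: "".join(codons) for t, codons in kept.items()}


def concatenate(alignments):
    """Clean each gene alignment, then join the genes taxon by taxon."""
    cleaned = [clean_codon_columns(a) for a in alignments]
    return {t: "".join(c[t] for c in cleaned) for t in cleaned[0]}


# Toy alignments (invented) for two genes and two taxa.
gene1 = {"Cycas": "ATGGCT---TAA", "Chara": "ATGGCAGGTTAA"}
gene2 = {"Cycas": "ATGTTT", "Chara": "ATGTTC"}
print(concatenate([gene1, gene2]))
```

The cleaned, concatenated sequences would then be passed to a program such as MEGA for Ka and Ks estimation; that step is not reproduced here.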

Results and Discussion

Evolution of mtDNA Organization in Land Plants

Characteristics of Cycas mtDNA and Insights into the Evolution of Land Plant mtDNAs

The complete mtDNA of C. taitungensis is a circular molecule of 414,903 bp (fig. 1). Table 2, which compares the main features of mtDNAs from a charophyte (Chara vulgaris), 2 bryophytes (Marchantia and Physcomitrella), and 7 angiosperms, shows that the Cycas mtDNA is about 6-, 2.2-, and 4.0-fold larger than those of Chara, Marchantia, and Physcomitrella, respectively, but does not significantly differ from the average (414 ± 102 kb) of the previously elucidated angiosperm mtDNAs (P = 0.502). The A + T content of the Cycas mtDNA is 53.1%, the lowest among known algae and land plants. As further shown in table 2, the total numbers of protein- and tRNA-coding genes decrease from charophytes (39 and 26, respectively) to seed plants (29–40 and 17–27, respectively) (detailed in supplementary table S2, Supplementary Material online). In contrast, the numbers of rRNA gene species remain the same in all lineages, with the exception of obvious gene duplications in the Poaceae (grass family) and Beta lineages, in which some tRNA genes are also duplicated.
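For readers who wish to reproduce a base-composition figure such as the A + T content, a minimal sketch is given below. The function name and the decision to ignore ambiguous bases (e.g., N) in the denominator are assumptions made for this example; the toy sequences are invented.

```python
def at_content(seq):
    """Fraction of A and T among the unambiguous bases (A, C, G, T) of `seq`."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return at / (at + gc)


# Toy sequence: 4 of 8 bases are A or T.
print(f"{at_content('ATGCGCAT'):.1%}")
```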

Table 2 also shows that noncoding sequences (spacers, introns, and pseudogenes) account for 89.9% of the Cycas mtDNA sequence, consistent with the proportions found in angiosperm mtDNAs (89.4 ± 3.1%). As land plants evolved from charophycean green algae (Chara, 9.3%), the closest living relatives of land plants (Karol et al. 2001), the noncoding sequences have drastically expanded in the mtDNA, showing abrupt increases in the bryophytes and then in the seed plants. As a result, the genomic organizations of mtDNAs are much less compact in seed plants when compared with lower plants.

Repeated sequences that are present in the genome as multiple copies comprise approximately 15.1% of the Cycas mtDNA (table 2). The repeats, very few of which are over 2 kb long, are evenly distributed across the genome and mainly occur in the noncoding regions, including the intergenic spacers and introns (fig. 1B). As shown in table 2, most mtDNAs of land plants (except for rice) have 2–5 times more repeated sequences than the mtDNA of Chara, whereas more than a quarter of the rice mtDNA consists of repeated sequences. Among the sampled plant mtDNAs, that of Cycas contains the highest percentage of tandem repeats (4.97%; fig. 2A; detailed in supplementary table S3, Supplementary Material online); these include a novel family of mobile elements, the Bpu sequences/elements (see below), which are mainly found in 3- or 4-copy arrays (fig. 2B). Ogihara et al. (2005) noted the presence of many repeats in wheat mtDNA and hypothesized that alternative physical structures may be adopted by wheat mtDNA. Therefore, it seems logical to hypothesize that alternative circular mtDNAs in various recombinant forms might coexist in Cycas cells in vivo.

No group I intron was detected in the mtDNA of Cycas or in the 7 previously elucidated angiosperm mtDNAs (table 2). This observation is consistent with Knoop's (2004) speculation that group I introns were lost from the common ancestor of hornworts and tracheophytes. Although Cho et al. (1998) demonstrated that a group I intron located in the cox1 gene is widespread among 48 angiosperm genera, the examined angiosperms represented an exceptionally patchy phylogenetic distribution and did not include any of our 7 sampled angiosperms. Furthermore, the authors estimated that the intron invaded the cox1 gene by cross-species horizontal transfer over 1,000 times during angiosperm evolution and "is of entirely recent occurrence."

In contrast, we found 20–25 group II introns in the mtDNAs of the examined land plants (table 2), nearly double the number found in those of their sister group, the charophytes (13). Similar to the angiosperms, the 3 genes nad1, nad2, and nad5 in Cycas are disrupted by 5 group II intron sequences, which have brought the genes into trans-splicing arrangements. Malek et al. (1997) and Malek and Knoop (1998) concluded that trans-spliced group II introns had evolved from formerly cis-spliced introns before the emergence of hornworts. More recently, this transition was found to predate even the emergence of mosses (Groth-Malonek et al. 2005).

Gene Content and Evolution of Gene Loss in the mtDNAs of Land Plants

Our phylogenetic analysis using either nonsynonymous (Ka) or synonymous (Ks) substitutions of 22 mitochondrial protein-coding genes shared by the 11 studied plants generated identical tree topologies. Figure 3A shows the Neighbor-Joining trees reconstructed using the Ka and Ks values, respectively, with Chara being designated as the outgroup. The topologies of these 2 trees strongly indicate that, excluding the bryophytes Physcomitrella and Marchantia, the seed plants (including Cycas and the 7 angiosperms) and the angiosperms form nested monophyletic clades. Within the angiosperms, the monocots and eudicots constitute 2 distinct subclades. Note that the sister relationship of Physcomitrella and Marchantia has to be treated with caution because the sampled seed plants share a long branch. In addition, a recent multigene analysis (Qiu et al. 2006) based on dense taxon sampling suggested that liverworts (including Marchantia) diverged before mosses (including Physcomitrella). The Ka- and Ks-derived branch lengths leading to Cycas are nearly equal (fig. 3A), whereas the Ka-based branch lengths for the other species are strikingly shorter than the corresponding Ks-based branches.

Statistical analysis using a Z-test further indicated that the Ka value for the Cycas branch is higher than the average Ka of the 7 studied angiosperms (P < 0.05, Z-test). The elevated Ka in Cycas suggests that rapid evolution/divergence may have occurred in some of the protein-coding genes of the Cycas mtDNA. This result is consistent with the observation that some genes in the Cycas mtDNA contain abundant RNA editing sites (see below).

A total of 39 protein-coding genes were identified in the Cycas mtDNA, which is the highest gene number

FIG. 1. (A) Gene map of the Cycas taitungensis mtDNA. Genes are color coded into 12 groups according to their biological functions. Genes on the outside and inside of the 2 circles are transcribed clockwise and counterclockwise, respectively. Predicted mtpts are indicated in deep green between the 2 circles. Intron-containing genes are indicated by asterisks and marked with exon numbers. Degrees on the circular genome correspond to the linear maps in (B). (B) Lengths and distribution of repeat sequences and single Bpu sequences across the entire Cycas mtDNA. Upper: linear presentation of (A). Middle: the position of each repeat sequence (in blue) mapped onto the genome. Lower: the locations and lengths of tandem repeats detected using the tandem repeats finder. Single Bpu sequences (including variants) and their tandem repeats are shown in red; other tandem repeats are shown in black.


Table 2. Comparison of Features among 11 Elucidated Land Plant Mitochondrial Genomes

| Features | Chara | Marchantia | Physcomitrella | Cycas | Triticum | Oryza | Zea | Beta | Nicotiana | Brassica | Arabidopsis |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Size (bp) | 67,737 | 186,609 | 105,340 | 414,903 | 452,528 | 490,520 | 569,630 | 368,799 | 430,597 | 221,853 | 366,924 |
| A + T content (%) | 59.1 | 57.6 | 59.4 | 53.1 | 55.7 | 56.2 | 56.1 | 56.1 | 55.0 | 54.8 | 55.2 |
| Gene number [a] (protein-coding/tRNA/rRNA), duplicates counted once | 39/26/3 | 41/27/3 | 39/24/3 | 39/22/3 | 33/17/3 | 35/18/3 | 32/17/3 [b] | 29/18/3 | 36/21/3 | 33/17/3 | 32/17/3 |
| Gene number [a] (protein-coding/tRNA/rRNA), all duplicates included | {39/26/3} | {41/29/3} | {39/24/3} | {39/26/3} | {35/25/8} | {40/27/6} | {34/22/5} [b] | {29/22/5} | {39/22/4} | {34/17/3} | {33/22/3} |
| Coding sequences (%) [a] | 90.7 | 20.3 | 37.0 | 10.1 | 8.6 | 11.1 | 6.2 | 10.3 | 9.9 | 17.3 | 10.6 |
| Repeat sequences (%) [c] | 3.2 | 10.1 | 11.6 | 15.1 | 10.1 | 28.8 | 11.4 | 12.5 | 10.8 | 5.5 | 7.8 |
| Tandem repeat sequences (%) [d] | 0.22 | 0.49 | 0.14 | 4.97 | 0.42 | 0.09 | 0.35 | 0.71 | 0.05 | 0.36 | 0.34 |
| Introns: group I | 14 | 7 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Introns: group II, cis-spliced | 13 | 25 | 25 | 20 | 17 | 17 | 15 | 14 | 17 | 18 | 18 |
| Introns: group II, trans-spliced | 0 | 0 | 0 | 5 | 6 | 6 | 7 | 6 | 6 | 5 | 5 |
| Cp-derived sequences (bp) | - [e] | - [e] | - [e] | 18,113 | 13,455 | 22,593 | 25,132 | - [e] | 9,942 | 7,950 | 3,958 |
| Cp-derived sequences (%) | - [e] | - [e] | - [e] | 4.4 | 3.0 | 6.3 | 4.4 | 2.1 | 2.5 | 3.6 | 1.1 |
| RNA editing sites | 0 | 0 | - [e] | 1,084 [f] | - [e] | 491 | - [e] | 370 [g] | - [e] | 427 | 441 |

[a] Pseudogenes and unique ORFs (such as ORF222 reported in Handa 2003) were excluded. Upper row: duplicate genes were counted only once. Lower row (within curly brackets): all duplicate genes were included.
[b] The plasmid-localized tRNA-Trp was included (Clifton et al. 2004).
[c] The REPuter program (http://bibiserv.techfak.uni-bielefeld.de/reputer) was used to obtain the estimates. All duplicate copies of repeats are included. Overlapped sequences were counted only once.
[d] The tandem repeats finder program (Benson 1999) was used to obtain the estimates.
[e] Data were not available from original papers.
[f] The PREP-mt program (Mower 2005) was used to obtain the estimates. The cutoff value of the reported score was set to 0.6.
[g] 370 C-to-U editing sites were identified in 28 gene/ORF transcripts of sugar beet (Kubo et al. 2000).


identified to date among the studied seed plant mtDNAs (fig. 3B). When the distributions of conserved genes in the mtDNA from the 11 sampled species (supplementary table S2, Supplementary Material online) are mapped to their respective branches in a maximum parsimony tree (fig. 3B; based on the data used in fig. 3A), it is then possible to estimate the time of loss of a particular gene. As shown in figure 3B, our analysis indicates that there have been at least 31 independent events of gene loss from all the land plant mtDNAs elucidated to date.

When a gene is missing from the mtDNA of a given species, it is generally believed that the original copy has been transferred to the nucleus, where it functions through cytosolic protein synthesis followed by transit peptide–assisted import back to the mitochondria (Adams and Palmer 2003; Knoop 2004). Frequent gene losses, especially of the ribosomal protein genes (30 of 34), appear to have occurred after the divergence of the angiosperm lineages approximately 150 MYA (Chaw et al. 2004). However, only 1 gene loss was observed in the Cycas lineage after it branched off from the common ancestor of angiosperms approximately 300 MYA. Adams and Palmer (2003) suggested that angiosperm mtDNAs have experienced a recent evolutionary surge of loss and/or transfer of genes (primarily those encoding ribosomal proteins) to the nucleus. Our data give additional support to their contention but suggest that the mtDNA of Cycas has been excluded from this surge. Our results suggest that the Cycas mtDNA tends to maintain its gene diversity over evolutionary time and/or undergoes less gene transfer than angiosperm mtDNAs (Selosse et al. 2001; Adams et al. 2002; Adams and Palmer 2003). Coincidentally, among the some 40 published cpDNAs of seed plants, that of Cycas has also undergone the least gene loss (Wu et al. 2007; supplementary fig. 1 [Supplementary Material online]). Future work will be required to examine why the genomes of these 2 Cycas organelles appear to have frozen after the cycads' divergence from angiosperms, especially in comparison to mtDNAs from other major gymnosperm clades such as Ginkgo, gnetophytes, and pines.

A Novel Family of Short Interspersed Mitochondrial Elements

Numerous Short Interspersed Mitochondrial Elements, Termed Bpu Sequences, Are Present in the Cycas mtDNA

Sequence analysis revealed that numerous copies of a 36-nt repeat, herein designated as a "Bpu sequence/element," are interspersed throughout the Cycas mtDNA (fig. 1B). Figure 4A shows the characteristic sequences of 500 Bpu elements having 0 to 4 mismatches. If up to 7 mismatched nucleotides relative to the dominant type are allowed, the total copy number of Bpu sequences increases to 512. The Bpu sequences feature 2 conserved terminal direct repeats (AAGG) and the recognition sequence for the restriction endonuclease Bpu10I (CCTGAAGC; nt 15–21). These repeat elements/sequences do not appear to have coding potential.
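Counting copies that differ from the dominant type by a bounded number of mismatches can be illustrated with a simple sliding-window Hamming-distance scan. The Python sketch below is only an illustration of that idea, not the procedure used in this study. The 36-nt DOMINANT string is a placeholder: it carries the AAGG terminal repeats and the recognition sequence as printed above, but the filler bases, the toy genome, and all function names are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def scan_for_elements(genome: str, dominant: str, max_mismatch: int) -> list:
    """Return (start, mismatches) for every window within max_mismatch of dominant."""
    k = len(dominant)
    hits = []
    for start in range(len(genome) - k + 1):
        distance = hamming(genome[start:start + k], dominant)
        if distance <= max_mismatch:
            hits.append((start, distance))
    return hits


def mutate(seq: str, positions: list) -> str:
    """Substitute the base at each 0-based position with a different base."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    bases = list(seq)
    for p in positions:
        bases[p] = swap[bases[p]]
    return "".join(bases)


# Placeholder dominant type: AAGG + filler + site + filler + AAGG (36 nt).
# The filler bases are invented for illustration only.
DOMINANT = "AAGG" + "TCTACGTCAG" + "CCTGAAGC" + "TGACTAGTCA" + "AAGG"

# Toy genome: an exact copy, a copy with 2 mismatches, and one with 8.
spacer = "N" * 20
toy_genome = (
    spacer + DOMINANT
    + spacer + mutate(DOMINANT, [5, 30])
    + spacer + mutate(DOMINANT, [1, 6, 10, 14, 18, 22, 26, 33])
    + spacer
)

hits_4 = dict(scan_for_elements(toy_genome, DOMINANT, 4))
hits_8 = dict(scan_for_elements(toy_genome, DOMINANT, 8))
print(sorted(hits_4.items()))
print(sorted(hits_8.items()))
```

Relaxing the mismatch threshold recovers the more divergent copy, which mirrors how the copy number grows when more mismatches are tolerated. A scan of a real genome would also need to examine the reverse-complement strand and handle insertions and deletions, which a plain Hamming distance does not.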

Because Bpu elements are extremely short, both overall and in their terminal repeats, and contain 2 terminal direct repeats rather than the inverted repeats found in plant miniature inverted-repeat transposable elements (MITEs), they are here classified as short interspersed mitochondrial elements (SIMEs). This distinguishes them from the MITEs and short interspersed nuclear elements (SINEs; e.g., Alu DNA repeats in the genomes of primates) that have been extensively reported from the genomes of plants and animals (Feschotte et al. 2002). Based on the similarity between SIMEs and SINEs, however, we can hypothesize that Bpu elements are likely to be transposed through RNA without a requirement for reverse transcriptase.

The Bpu10I recognition site characteristic of the Bpu elements is highly conserved except at the very last base pair, that is, the 21st bp of the Bpu sequence (fig. 4A, lower chart). In contrast, the 5' terminal direct repeat and its downstream 6 nt (nt 5–9) tend to be more variable than the 3' terminal direct repeat. Enigmatically, the elements can form a secondary structure (fig. 4B) with a predicted free energy of 12.5 kcal/mole, as calculated by the MFOLD program (http://frontend.bioinfo.rpi.edu/zukerm/comments/FAQs.html). The significance of this secondary structure remains to be elucidated. Comparison of the Bpu

FIG. 2. Comparison of copy numbers among various tandem repeats in the mtDNAs of Cycas and 10 other land plants. (A) Percentage of total genome comprised of tandem repeat sequences. (B) The copy numbers of tandem repeat sequences in each genome (x axis) are separated into 5 groups (y axis), and their histograms (z axis) are shown. The single-copy Bpu sequences of Cycas taitungensis described in figure 1 are not included in the counts. The copy numbers may not be integers because the boundaries of tandem repeats were determined by a probabilistic method described by Benson (1999).


element insertion sites in the Cycas mtDNA with corresponding sites in the mtDNAs of other species reveals that the target sites for transposition of Bpu elements are identical (within 1–2 mismatched base pairs) to the 5' terminal direct repeat (fig. 4C–F).

Bpu Sequences Distinguish Cycas from Other Cycads and Seed Plants

Bpu sequences are present exclusively in the noncoding regions of the Cycas mtDNA, that is, within the introns and intergenic spacers, with 1 exception: a Bpu sequence is present within the coding region of rrn18, which codes for the 18S rRNA (fig. 4C). Among the many mitochondrial rrn18 (mt-rrn18) genes available in GenBank, only those of Cycas (including Cycas revoluta, GenBank accession number AB029356) and Ginkgo biloba (the only living species of the order Ginkgoales) have Bpu elements (fig. 4C). The Bpu insertion sites of the 2 Cycas taxa are orthologous (data not shown), whereas that of Ginkgo is different, indicating that the insertions of Bpu elements in rrn18 did not occur in the common ancestor of cycads and ginkgo. In sequenced mtDNAs, the oldest cluster of cpDNA-derived sequences (also termed mtpts), trnV(uac)-trnM(cau)-atpE-atpB-rbcL, is reported to have existed since the common ancestor of seed plants (Wang et al. 2007, fig. 3). Bpu elements are present in the Cycas mtpt-atpB (fig. 4D) and mtpt-rbcL (fig. 4E) sequences but not in the corresponding sequences of the 7 angiosperm mtDNAs elucidated to date (Wang et al. 2007), suggesting that these Bpu insertions took place after the split of the Cycas lineage from the other seed plants.

We further examined the occurrence of Bpu elements in 1 of the other 2 cycad families, Zamiaceae (including 1 species from each of Dioon, Macrozamia, and Zamia), by sequencing exons 1 and 2 of their nad2 genes. Whereas nad2i1 of the Cycas mtDNA contains 5 Bpu elements, such elements are absent from the nad2 genes of the 3 sampled Zamiaceae genera. However, we found 12 and 6 Bpu elements in mitochondrial nad5i1 of Cycas panzhihuaensis (GenBank accession number AF43425) and mitochondrial nad1i1 and cp rps19-rpl16 spacer/intron of C. revoluta (GenBank accession number

FIG. 3. Phylogenies inferred from concatenated data from 22 protein-coding genes (see supplementary table S2, Supplementary Material online) common to the sequenced mtDNAs from 10 land plants and Chara (the outgroup). Nodes received 100% bootstrap support unless indicated. Branch lengths are drawn to scale except as otherwise noted. (A) Two superimposed Neighbor-Joining trees based on the Ka and Ks values, respectively. (B) Scenarios of gene losses (open bar) and splits (hatched triangle) along the single maximum parsimony tree (9,857 steps). The total numbers of protein-coding genes in each mtDNA species are given within parentheses.


FIG. 4. Bpu sequences and examples of their insertion loci. Target sites and terminal repeats are underlined. (A) Upper: a dominant sequence based on 500 Bpu sequences (see text). Numbers along the abscissa indicate each nucleotide position within the prototype Bpu sequence. Recognition sites for the Bpu10I restriction enzyme are marked with asterisks. The ordinate scales each nucleotide by the total bits of information multiplied by its relative occurrence at that position (Wasserman and Sandelin 2004). Lower: identities of Bpu sequences that differ from the dominant type by 0–4 bp. (B) Secondary structure of the dominant Bpu sequence. (C)–(F) Lowercase letters, periods, and dashes denote mismatches, identical bases, and deletions, respectively, compared with the uppermost sequence. (C) Partial alignment of mt-rrn18 sequences from Cycas and Ginkgo. A Bpu sequence and a reversely complementary Bpu sequence are present in Ginkgo and Cycas, respectively. (D) Partial alignment of mtpt-atpB sequences extracted from


AY354955, AY345867), respectively. These data seem to suggest that no Bpu sequence has successfully invaded the mtDNAs of Cycadales genera other than Cycas.

A Bpu-like sequence with a 1-bp insertion and 89% similarity to the dominant type in Cycas mtDNA was retrieved from the coding sequence of the mitochondrial rps11 gene from the core eudicot Weigela hortensis. Alignment of this Bpu-like sequence reveals that its predicted Bpu10I endonuclease recognition site differs from that of Cycas by 2 bp (gCTGAGt). Because this Bpu element-like sequence does not interrupt the reading frame of rps11, and because the mitochondrial rps11 gene of Weigela lacks a target duplication site, we do not consider that this Bpu-like element shares a common origin with the Bpu sequences of Cycas. We further postulate that Bpu elements are likely absent from, or very rare in, angiosperms.

Surprisingly, the cpDNA of Cycas also contains 2 Bpu elements, located in the petN-psbM and psbA-trnK spacers, respectively (see Wu et al. 2007). Additionally, we identified 1 Bpu sequence in the atpB-rbcL spacer of the cpDNA from Pinus luchuensis (GenBank accession number DQ196799). However, no Bpu sequence has been detected in the cpDNAs of the 3 other Pinaceae genera and 3 gnetophyte orders we have sequenced to date (Chaw SM, Wu CS, Lai YT, Wang YN, Lin CP, Liu SM, unpublished data). Collectively, the available evidence inclines us to believe that the Bpu sequences have proliferated specifically in the mtDNA of Cycas (or the Cycadaceae). The sporadic occurrence of Bpu elements in the cpDNAs of Cycas and Pinus suggests that they are likely derived from nonhomologous recombination with DNA fragments that leaked out of mitochondria in the former case and via lateral transfer in the latter case.

The Tai Sequence and Its Association with Bpu Sequences

The second intron of the rps3 gene (rps3i2) is only found in the mtDNAs of Cycas (Regina et al. 2005), including those studied in the present work. Regina et al. (2005) suggested that rps3i2 is a group II intron that was independently gained in the gymnosperms, likely at or just after the divergence of the angiosperms. Moreover, the authors reported a high similarity between a partial segment of the Cycas rps3i2 and orf760 of the Chara mtDNA (Turmel et al. 2003). Orf760 harbors functional domains for a maturase and a reverse transcriptase, prompting Regina et al. (2005) to propose that the Cycas rps3i2 originally encoded a maturase and a reverse transcriptase but evolved over time into a partially degenerated ORF. Here, we further report that rps3i2 of the Cycas mtDNA contains a 900-bp fragment comprising an array of 4 Bpu elements, with the Bpu elements on each end lacking 1 terminal repeat, followed by a 440-bp fragment (designated as the "Tai" sequence), a Ψorf760 sequence, and 1 perfect Bpu sequence (fig. 5A). Most intriguingly, additional Tai sequences are scattered throughout the Cycas mtDNA; they occur more densely in longer spacers than in shorter ones (fig. 5B) and are generally found in close association with repeated Bpu sequences. These findings lend additional, robust support to the proposal of Regina et al. (2005).

In conclusion, we hypothesize that a Tai sequence and an orf760, flanked by a Bpu sequence at each end, most likely constitute an ancestral retrotransposon. This hypothesis is founded on 2 observations: 1) Tai sequences are highly associated with Bpu elements and 2) small segments of Tai sequences are vigorously and diversely rearranged, deleted, or truncated, as illustrated in the 3 examples shown in figure 5B. These observations also suggest that an ancestral retrotransposon in Cycas mtDNA spread very actively, presumably after Cycas branched off from the 10 other extant cycad genera. Afterward, the offspring or duplicates of the ancestral transposon may have gradually lost their mobility through varying degrees of deletions/truncations from the 3' region, which encoded both maturase and reverse transcriptase functions.

We further speculate that even after the ancestral retrotransposon lost its transposon function, the remaining Bpu elements might have retained the ability to proliferate or amplify via unequal crossing-over or slipped-strand mispairing because their 2 direct terminal repeats can pair with each other's complementary strands. Future studies and sequencing of additional mtDNAs from basal Cycas will be required to confirm this hypothesis and may provide additional insight into the evolutionary origins of Bpu sequences and the molecular mechanisms underlying their amplification.

The Fates of CpDNA-Derived Sequences (mtpts) in Cycas mtDNA

Table 2 shows a comparative analysis of mtpts among the 11 studied plant mtDNAs. The total percentage of mtpts in Cycas (4.4%) is relatively high compared with that in dicots (1.1–3.6%) but falls within the range seen among monocots (3.0–6.3%). We previously discovered that the frequency of mtpt transfer is positively correlated with variations in mtDNA size (coefficient value r² = 0.47) (Wang et al. 2007). Here, we report that the Cycas mtDNA contains 8 protein-coding genes as well as 2 rRNA and 5 tRNA gene sequences originating from the cpDNA (fig. 1). However, frameshifts and indels within the protein-coding genes suggest that these Cycas mtpts have degenerated and are nonfunctional. In contrast, the 5 cpDNA-derived tRNAs in the Cycas mtDNA are able to fold into standard cloverleaf structures, as shown by tRNAscan-SE analysis (Lowe and Eddy 1997), and are thus likely to be functional. Previously, Sugiyama et al. (2005) used tRNAscan-SE to scan

Cycas, Oryza, Arabidopsis, and Nicotiana. Note that 2 Bpu sequence insertions are detected in Cycas but not in the other plants. (E) Partial alignment of mtpt-rbcL sequences extracted from Oryza, Arabidopsis, Nicotiana, and Cycas. The sequence from Cycas contains a Bpu-like sequence with a 5-bp insertion (shaded box) versus the dominant type. (F) Partial alignment of the mtpt-atpE and chloroplast atpE sequences of Cycas. Two reversely complementary Bpu sequences are found in the mtDNA. Note that the 5' Bpu sequence is partly degenerated at its 3' end.


the tobacco mtDNA and concluded that the 6 cp-derived tRNAs shared by angiosperms are functional. Because no mtpt was observed in Chara, Marchantia, or Physcomitrella, Wang et al. (2007) concluded that frequent DNA transfer from cpDNA to mtDNA has taken place no later than in the common ancestor of seed plants, approximately 300 MYA.

Abundant RNA Editing Sites in Cycas mtDNA

Among the land plant mtDNAs, that of Cycas has the most predicted RNA editing sites (1,084 sites; supplementary table S4 [Supplementary Material online]). It is commonly believed that RNA editing arose together with the first terrestrial plants (Steinhauser et al. 1999). By using the PREP-mt software (Mower 2005) with the cutoff score set to 0.6, 1,084 sites within the protein-coding genes of the Cycas mtDNA were predicted to be C-to-U RNA editing sites. This is more than double the number of predicted sites within the elucidated mtDNAs of other land plants (table 2). If the cutoff score, which indicates the degree of conservation of each editing site compared with those found in the other published plant mtDNAs, is set to the most stringent criterion in PREP-mt (i.e., = 1), the number of editing sites decreases to 738, which is still the largest number found among the land plant mtDNAs elucidated to date.
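The effect of the cutoff score on the number of predicted sites can be pictured as a simple threshold filter over scored predictions. The Python sketch below is illustrative only: the record layout, the example sites, and the ">= cutoff" rule are our assumptions and are not taken from the PREP-mt software itself.

```python
from typing import NamedTuple


class PredictedSite(NamedTuple):
    gene: str
    position: int   # 1-based nucleotide position within the coding sequence
    score: float    # conservation score between 0 and 1 (hypothetical values)


def sites_at_cutoff(sites, cutoff):
    """Keep only predicted sites whose score meets the cutoff (assumed rule)."""
    return [site for site in sites if site.score >= cutoff]


# Invented example predictions, for illustration only.
example_sites = [
    PredictedSite("nad4", 77, 1.0),
    PredictedSite("nad4", 158, 0.8),
    PredictedSite("nad5", 242, 0.6),
    PredictedSite("cox1", 31, 0.4),
]

lenient = sites_at_cutoff(example_sites, 0.6)
stringent = sites_at_cutoff(example_sites, 1.0)
print(len(lenient), "sites at cutoff 0.6;", len(stringent), "sites at cutoff 1")
```

Raising the threshold can only shrink the retained set, which is why the count reported at the most stringent setting is smaller than the count at 0.6.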

It is believed that RNA editing is essential for functional protein expression, as it is required to modify amino acids or generate new start or stop codons (Hoch et al. 1991; Wintz and Hanson 1991; Kotera et al. 2005; Shikanai 2006). For this reason, the large number of RNA editing sites in the Cycas mtDNA may indicate higher complexity at the DNA level and the formation of various transcripts through RNA editing, thus potentially reflecting rapid divergence in Cycas.

Although RNA editing sites are known to be sporadically distributed in the genomes of plant organelles/mitochondria from seed plants, the mechanisms underlying this distribution pattern are not yet known (Shikanai 2006;

FIG. 5. Genomic organization of the second rps3 intron (rps3i2) and association of Tai sequences with Bpu sequences. (A) Organization of rps3i2. The segment between Ψorf760 and the 4 tandem Bpu sequences is designated as a Tai sequence (gray). Arrowheads indicate a Bpu sequence lacking the 2 terminal repeats (banded bars) and its complementary sequence. (B) Upper: association of Tai sequences with Bpu tandem repeats across the genome (abscissa) and their respective lengths (ordinate). Lower: examples of 3 Tai variants (boxed) showing that the sequences are variously degenerate and truncated as compared with the typical Tai shown in (A) (broken line indicates mismatching sequences). Thin arrows indicate the orientations of homologous segments between Tai variants and the typical Tai.


Mulligan et al. 2007). Supplementary table S4 (Supplementary Material online) shows that within the Cycas mtDNA, gene complex I (which includes 9 nad genes) is the most extensively edited gene category. Nad4 and nad5 show the first and second highest numbers of editing sites, followed by the cox1 gene of complex IV. The least edited gene is rps10, which contains only 6 predicted sites even when the cutoff score is set to 0.6. Mulligan et al. (2007) interpreted the sporadic pattern as a result of lineage-specific loss of editing sites through retroconversion, which could remove adjacent editing sites by replacement with the edited sequences. The abundant RNA editing sites in the Cycas mtDNA, however, are mysterious and completely deviate from the sporadic patterns reported in other seed plant mtDNAs. Therefore, the mechanism of RNA editing in Cycas mtDNA may have a different evolutionary history from those of other seed plants.

Previous studies (Steinhauser et al. 1999; Lurin et al. 2004; Shikanai 2006) led to the proposal that variation or multiplication of trans-acting factors (e.g., members of the pentatricopeptide repeat family) may allow rapid increases in the number of editing sites. This could possibly be correlated with the lineage-specific explosion of RNA editing sites observed in Cycas mtDNA. Furthermore, Handa's (2003) conclusion that the evolutionary speed in RNA editing is higher at the level of gene regulation than at the primary gene sequence level appears to provide additional support for our speculation. To improve understanding of the mechanisms involved in the abundant RNA editing sites in Cycas mtDNA, critical analysis of mtDNAs from ferns, conifers, and other cycads will be required.

In all plant lineages, RNA editing often alters the amino acid sequence (Adams and Palmer 2003). Supplementary table S5 (Supplementary Material online) summarizes the number of predicted preediting codons and potential edited codons in the Cycas mtDNA. The 2 most frequently edited codons are TCA (Ser) and TCT (Ser), which are predicted to be edited into TTA (Leu) and TTT (Phe), respectively. The least frequently edited codon is CAA (Gln), which would putatively be edited in only 2 cases, wherein the codon's first position would be altered to yield the stop codon TAA. Supplementary table S6 (Supplementary Material online) further depicts the formation of 4 start (from ACG to ATG) and 7 stop codons (from CGA to TGA) through RNA editing. To verify the accuracy of the editing sites predicted using PREP-mt, partial transcripts of the cox1, atp1, atp6, and ccmB genes were experimentally assayed using RT-PCR. Comparison of our RT-PCR sequences with the editing sites predicted by PREP-mt indicated accuracy values of 97.66%, 99.13%, 97.91%, and 99.25% for cox1, atp1, atp6, and ccmB, respectively.
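The codon changes listed above can be checked with a few lines of Python. This is a minimal sketch, assuming the standard genetic code and writing edited bases in the DNA alphabet (T standing for U); the table holds only the codons mentioned in the text, and the function name is illustrative.

```python
# Subset of the standard genetic code covering the codons named in the text.
CODON_TABLE = {
    "TCA": "Ser", "TTA": "Leu",
    "TCT": "Ser", "TTT": "Phe",
    "CAA": "Gln", "TAA": "Stop",
    "ACG": "Thr", "ATG": "Met",
    "CGA": "Arg", "TGA": "Stop",
}


def edit_c_to_u(codon: str, position: int) -> str:
    """Apply a C-to-U edit at a 1-based codon position (T is written for U)."""
    index = position - 1
    if codon[index] != "C":
        raise ValueError(f"position {position} of {codon} is not a C")
    return codon[:index] + "T" + codon[index + 1:]


# (pre-edit codon, edited position) pairs taken from the examples in the text.
examples = [("TCA", 2), ("TCT", 2), ("CAA", 1), ("ACG", 2), ("CGA", 1)]

results = {}
for codon, position in examples:
    edited = edit_c_to_u(codon, position)
    results[codon] = (edited, CODON_TABLE[codon], CODON_TABLE[edited])
    print(f"{codon} ({CODON_TABLE[codon]}) -> {edited} ({CODON_TABLE[edited]})")
```

An edit at the second codon position changes the encoded amino acid in the first two examples and creates a start codon from ACG, whereas first-position edits of CAA and CGA create stop codons.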

Our RT-PCR analysis additionally identified a U-to-C editing type in atp1; this was missed in the prediction because detection of the U-to-C editing type is limited when using PREP-mt (Mower 2005). Unfortunately, programs for predicting RNA editing sites in nonprotein-coding genes, such as ribosomal RNA and tRNA genes, and for predicting silent editing sites (which alter the mRNA but do not change the translated amino acid) are not yet available. We are currently examining these additional editing types and the locations of such sites in the Cycas mtDNA.

No Large Repeats in Gymnosperms but Some in Angiosperms

In the mtDNAs of angiosperms, long repeated sequences vary in size from 2,427 bp in rapeseed to 127,600 bp in rice, and they show no homology to each other, implying that repeated sequences were independently acquired during the evolution from Marchantia to angiosperms (Sugiyama et al. 2005). In Cycas mtDNA, we detected only a few long repeated sequences, all less than 15 kb in length and mostly composed of Bpu sequences. It has been demonstrated that genes are continuously transferred from the nuclear genome and cpDNA to mtDNA (Knoop 2004). However, the mechanisms underlying the emergence of large repeats are not yet fully understood.

Conclusions

The complete Cycas mtDNA shows a number of unprecedented features that are atypical of the mtDNAs previously elucidated for other land plants, including the lowest A + T content, the highest proportion of tandem repeat sequences (mainly Bpu sequences) and the highest gene number found so far, abundant RNA editing sites, and an exceptionally elevated Ka value for the protein-coding genes. The latter 2 features might be correlated. In comparison with the known angiosperm organelle genomes, the cpDNA and mtDNA of Cycas have experienced the least gene loss. Peculiarly, the cpDNA of Cycas has the fewest RNA editing sites among land plants (Wu et al. 2007), whereas its mtDNA has the most. On the other hand, some characteristics are shared by mtDNAs of Cycas and angiosperms but not bryophytes; these include common gene transfers from cpDNA, similar ratios of noncoding sequences, lack of group I introns, relatively low numbers of tRNAs, and concurrence of trans-spliced group II introns. Nevertheless, the above-mentioned Cycas-specific features suggest that the Cycas mtDNA has a lineage-specific evolutionary history.

A novel family of SIMEs, designated as Bpu elements/sequences, represents another unique feature of the Cycadaceae lineage. Bpu elements are widely distributed throughout the noncoding regions of Cycas mtDNA (over 500 copies), with only 1 occurrence in a coding region (rrn18). In the Cycas mtDNA, the highly conserved nature of Bpu sequences and their association with Tai sequences as well as Ψorf760 (located in rps3i2) suggest that these sequences may have collectively originated from an ancestral retrotransposon. Further investigation into the origin, proliferation, and evolution of Bpu sequences will be desirable to help clarify the peculiar features of Cycas mtDNA.

Supplementary Material

Supplementary tables S1–S7 and figure 1 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

This work was supported by joint research grants from the Research Center for Biodiversity, Academia Sinica, to S.-M.C. and D.W., and partially by a National Science Council grant to S.-M.C. (94-2311-B001-059) and by a grant from the Institute of Information Science, Academia Sinica, to A.C.-C.S. We thank Chiao-Lei Cheng for sequencing the RT-PCR products. We are grateful to the 3 anonymous reviewers who provided critical comments and valuable suggestions.

Literature Cited

Adams KL, Palmer JD. 2003. Evolution of mitochondrial gene content: gene loss and transfer to the nucleus. Mol Phylogenet Evol. 29:380–395.

Adams KL, Qiu YL, Stoutemyer M, Palmer JD. 2002. Punctuated evolution of mitochondrial gene content: high and variable rates of mitochondrial gene loss and transfer to the nucleus during angiosperm evolution. Proc Natl Acad Sci USA. 99:9905–9912.

Benson G. 1999. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27:573–580.

Bergthorsson U, Richardson AO, Young GJ, Goertzen LR, Palmer JD. 2004. Massive horizontal transfer of mitochondrial genes from diverse land plant donors to the basal angiosperm Amborella. Proc Natl Acad Sci USA. 101:17747–17752.

Chaw SM, Chang CC, Chen HL, Li WH. 2004. Dating the monocot-dicot divergence and the origin of core eudicots using whole chloroplast genomes. J Mol Evol. 58:424–441.

Chaw SM, Walters TW, Chang CC, Hu SH, Chen SH. 2005. A phylogeny of cycads (Cycadales) inferred from chloroplast matK gene, trnK intron, and nuclear rDNA ITS region. Mol Phylogenet Evol. 37:214–234.

Chaw SM, Zharkikh A, Sung HM, Lau TC, Li WH. 1997. Molecular phylogeny of extant gymnosperms and seed plant evolution: analysis of nuclear 18S rRNA sequences. Mol Biol Evol. 14:56–68.

Cho Y, Qiu YL, Kuhlman P, Palmer JD. 1998. Explosive invasion of plant mitochondria by a group I intron. Proc Natl Acad Sci USA. 95:14244–14249.

Clifton SW, Minx P, Fauron CM, et al. (13 co-authors). 2004. Sequence and comparative analysis of the maize NB mitochondrial genome. Plant Physiol. 136:3486–3503.

Feschotte S, Zhang X, Wessler SR. 2002. Miniature inverted-repeat transposable elements and their relationship to established DNA transposons. Washington (DC): ASM Press.

Groth-Malonek M, Pruchner D, Grewe F, Knoop V. 2005. Ancestors of trans-splicing mitochondrial introns support serial sister group relationships of hornworts and mosses with vascular plants. Mol Biol Evol. 22:117–125.

Handa H. 2003. The complete nucleotide sequence and RNA editing content of the mitochondrial genome of rapeseed (Brassica napus L.): comparative analysis of the mitochondrial genomes of rapeseed and Arabidopsis thaliana. Nucleic Acids Res. 31:5907–5916.

Hoch B, Maier RM, Appel K, Igloi GL, Kossel H. 1991. Editing of a chloroplast mRNA by creation of an initiation codon. Nature. 353:178–180.

Jones DL. 2002. Cycads of the World: ancient plant in today's landscape. 2nd ed. Sydney (Australia): Reed New Holland.

Kadowaki K, Kubo N, Ozawa K, Hirai A. 1996. Targeting presequence acquisition after mitochondrial gene transfer to the nucleus occurs by duplication of existing targeting signals. EMBO J. 15:6652–6661.

Karol KG, McCourt RM, Cimino MT, Delwiche CF. 2001. The closest living relatives of land plants. Science. 294:2351–2353.

Knoop V. 2004. The mitochondrial DNA of land plants: peculiarities in phylogenetic perspective. Curr Genet. 46:123–139.

Kotera E, Tasaka M, Shikanai T. 2005. A pentatricopeptide repeat protein is essential for RNA editing in chloroplasts. Nature. 433:326–330.

Kubo T, Nishizawa S, Sugawara A, Itchoda N, Estiati A, Mikami T. 2000. The complete nucleotide sequence of the mitochondrial genome of sugar beet (Beta vulgaris L.) reveals a novel gene for tRNA(Cys)(GCA). Nucleic Acids Res. 28:2571–2576.

Kumar S, Tamura K, Nei M. 2004. MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform. 5:150–163.

Kurtz S, Choudhuri JV, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R. 2001. REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 29:4633–4642.

Lowe TM, Eddy SR. 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25:955–964.

Lurin C, Andres C, Aubourg S, et al. (19 co-authors). 2004. Genome-wide analysis of Arabidopsis pentatricopeptide repeat proteins reveals their essential role in organelle biogenesis. Plant Cell. 16:2089–2103.

Malek O, Brennicke A, Knoop V. 1997. Evolution of trans-splicing plant mitochondrial introns in pre-Permian times. Proc Natl Acad Sci USA. 94:553–558.

Malek O, Knoop V. 1998. Trans-splicing group II introns in plant mitochondria: the complete set of cis-arranged homologs in ferns, fern allies, and a hornwort. RNA. 4:1599–1609.

Michel F, Ferat JL. 1995. Structure and activities of group II introns. Annu Rev Biochem. 64:435–461.

Mower JP. 2005. PREP-Mt: predictive RNA editor for plant mitochondrial genes. BMC Bioinformatics. 6:96.

Mulligan RM, Chang KL, Chou CC. 2007. Computational analysis of RNA editing sites in plant mitochondrial genomes reveals similar information content and a sporadic distribution of editing sites. Mol Biol Evol. 24:1971–1981.

Notsu Y, Masood S, Nishikawa T, Kubo N, Akiduki G, Nakazono M, Hirai A, Kadowaki K. 2002. The complete sequence of the rice (Oryza sativa L.) mitochondrial genome: frequent DNA sequence acquisition and loss during the evolution of flowering plants. Mol Genet Genomics. 268:434–445.

Oda K, Kohchi T, Ohyama K. 1992. Mitochondrial DNA of Marchantia polymorpha as a single circular form with no incorporation of foreign DNA. Biosci Biotechnol Biochem. 56:132–135.

Ogihara Y, Yamazaki Y, Murai K, et al. (14 co-authors). 2005. Structural dynamics of cereal mitochondrial genomes as revealed by complete nucleotide sequencing of the wheat mitochondrial genome. Nucleic Acids Res. 33:6235–6250.

Qiu YL, Li L, Wang B, et al. (13 co-authors). 2006. The deepest divergences in land plants inferred from phylogenomic evidence. Proc Natl Acad Sci USA. 103:15511–15516.

Regina TM, Picardi E, Lopez L, Pesole G, Quagliariello C. 2005. A novel additional group II intron distinguishes the mitochondrial rps3 gene in gymnosperms. J Mol Evol. 60:196–206.

Selosse M, Albert B, Godelle B. 2001. Reducing the genome size of organelles favours gene transfer to the nucleus. Trends Ecol Evol. 16:135–141.

Shikanai T. 2006. RNA editing in plant organelles: machinery, physiological function and evolution. Cell Mol Life Sci. 63:698–708.

Steinhauser S, Beckert S, Capesius I, Malek O, Knoop V. 1999. Plant mitochondrial RNA editing. J Mol Evol. 48:303–312.

Stevenson DW. 1990. Morphology and systematics of the Cycadales. Mem N Y Bot Gard. 57:8–15.

614 Chaw et al

Stewart CN Jr, Via LE. 1993. A rapid CTAB DNA isolation technique useful for RAPD fingerprinting and other PCR applications. Biotechniques. 14:748–750.

Sugiyama Y, Watase Y, Nagase M, Makita N, Yagura S, Hirai A, Sugiura M. 2005. The complete nucleotide sequence and multipartite organization of the tobacco mitochondrial genome: comparative analysis of mitochondrial genomes in higher plants. Mol Genet Genomics. 272:603–615.

Terasawa K, Odahara M, Kabeya Y, Kikugawa T, Sekine Y, Fujiwara M, Sato N. 2007. The mitochondrial genome of the moss Physcomitrella patens sheds new light on mitochondrial evolution in land plants. Mol Biol Evol. 24:699–709.

Turmel M, Otis C, Lemieux C. 2003. The mitochondrial genome of Chara vulgaris: insights into the mitochondrial DNA architecture of the last common ancestor of green algae and land plants. Plant Cell. 15:1888–1903.

Unseld M, Marienfeld JR, Brandt P, Brennicke A. 1997. The mitochondrial genome of Arabidopsis thaliana contains 57 genes in 366,924 nucleotides. Nat Genet. 15:57–61.

Wang D, Wu YW, Chun-Chieh Shih A, Wu CS, Wang YN, Chaw SM. 2007. Transfer of chloroplast genomic DNA to mitochondrial genome occurred at least 300 MYA. Mol Biol Evol. 24:2040–2048.

Wasserman WW, Sandelin A. 2004. Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet. 5:276–287.

Wintz H, Hanson MR. 1991. A termination codon is created by RNA editing in the petunia atp9 transcript. Curr Genet. 19:61–64.

Wu CS, Wang YN, Liu SM, Chaw SM. 2007. Chloroplast genome (cpDNA) of Cycas taitungensis and 56 cp protein-coding genes of Gnetum parvifolium: insights into cpDNA evolution and phylogeny of extant seed plants. Mol Biol Evol. 24:1366–1379.

William Martin, Associate Editor

Accepted January 6, 2008

Complete Mitochondria Genome of Cycas 615

aligned and then concatenated. Gaps and stop codons were removed manually. Divergence of nucleotide sequence between each pair of taxa was estimated in terms of the numbers of substitutions per synonymous (Ks) or nonsynonymous (Ka) site, based on the Pamilo–Bianchi–Li method implemented in the MEGA 3.0 program (Kumar et al. 2004). The Neighbor-Joining trees reconstructed with the Ka values and Ks values were rooted at Chara. The number of bootstrap replicates was set to 500. All phylogenetic analyses and tree reconstructions were performed using MEGA 3.0.
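The gap and stop-codon removal described above was done manually by the authors. As a rough illustration only, the same filtering could be sketched in Python as below. The function names, the toy alignments, and the use of the standard stop codons (TAA, TAG, TGA) are assumptions made for this sketch and are not taken from the original analysis.

```python
# Hypothetical sketch: drop codon columns that contain a gap or a stop codon
# in any taxon, then concatenate the cleaned per-gene alignments.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def clean_codon_alignment(seqs):
    """Return the alignment with gap- or stop-containing codon columns removed.

    `seqs` maps taxon name -> aligned, in-frame nucleotide sequence.
    """
    length = len(next(iter(seqs.values())))
    if length % 3 or any(len(s) != length for s in seqs.values()):
        raise ValueError("aligned sequences must share a length divisible by 3")
    keep = []
    for i in range(0, length, 3):
        codons = [s[i:i + 3].upper() for s in seqs.values()]
        if all("-" not in c and c not in STOP_CODONS for c in codons):
            keep.append(i)
    return {name: "".join(s[i:i + 3] for i in keep) for name, s in seqs.items()}


def concatenate(gene_alignments):
    """Join per-gene alignments (same taxa in each) into one supermatrix."""
    taxa = list(gene_alignments[0])
    return {t: "".join(aln[t] for aln in gene_alignments) for t in taxa}


if __name__ == "__main__":
    # Toy data, invented for illustration.
    gene1 = {"Chara": "ATGGCT---TAA", "Cycas": "ATGGCAGGTTAA"}
    gene2 = {"Chara": "ATGAAA", "Cycas": "ATGAAG"}
    matrix = concatenate([clean_codon_alignment(gene1), clean_codon_alignment(gene2)])
    for taxon, seq in matrix.items():
        print(taxon, seq)
```

Filtering whole codons rather than single columns keeps the reading frame intact, which matters for any downstream Ka and Ks estimation.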

Results and Discussion

Evolution of mtDNA Organization in Land Plants

Characteristics of Cycas mtDNA and Insights into the Evolution of Land Plant mtDNAs

The complete mtDNA of C. taitungensis is a circular molecule of 414,903 bp (fig. 1). Table 2, which compares the main features of mtDNAs from a charophyte (Chara vulgaris), 2 bryophytes (Marchantia and Physcomitrella), and 7 angiosperms, shows that the Cycas mtDNA is about 6-, 2.2-, and 4.0-fold larger than those of Chara, Marchantia, and Physcomitrella, respectively, but does not significantly differ from the average (414 ± 102 kb) of the previously elucidated angiosperm mtDNAs (P = 0.502). The A + T content of the Cycas mtDNA is 53.1%, the lowest among known algae and land plants. As further shown in table 2, the total numbers of protein- and tRNA-coding genes decrease from charophytes (39 and 26, respectively) to seed plants (29–40 and 17–27, respectively) (detailed in supplementary table S2, Supplementary Material online). In contrast, the numbers of rRNA gene species remain the same in all lineages, with the exception of obvious gene duplications in the Poaceae (grass family) and Beta lineages, in which some tRNA genes are also duplicated.

Table 2 also shows that noncoding sequences (spacers, introns, and pseudogenes) account for 89.9% of the Cycas mtDNA sequence, consistent with the proportions found in other angiosperm mtDNAs (89.4 ± 3.1%). As land plants evolved from charophycean green algae (Chara, 9.3%), the closest living relatives of land plants (Karol et al. 2001), the noncoding sequences have drastically expanded in the mtDNA, showing abrupt increases in the bryophytes and then in the seed plants. As a result, the genomic organizations of mtDNAs are much less compact in seed plants when compared with lower plants.
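The size and noncoding comparisons in the two preceding paragraphs can be recomputed from the table 2 values. The short Python sketch below does this. The variable and function names are illustrative, and the use of the population standard deviation is an assumption of this sketch: the text does not state which estimator was used, but with these inputs the population form gives spreads of about 102 kb and 3.1%. The significance test behind the reported P value is not attempted here.

```python
from statistics import mean, pstdev

# Genome sizes (bp) and coding fractions (%) transcribed from table 2.
SIZE_BP = {
    "Chara": 67737, "Marchantia": 186609, "Physcomitrella": 105340,
    "Cycas": 414903, "Triticum": 452528, "Oryza": 490520, "Zea": 569630,
    "Beta": 368799, "Nicotiana": 430597, "Brassica": 221853,
    "Arabidopsis": 366924,
}
CODING_PCT = {
    "Chara": 90.7, "Marchantia": 20.3, "Physcomitrella": 37.0,
    "Cycas": 10.1, "Triticum": 8.6, "Oryza": 11.1, "Zea": 6.2,
    "Beta": 10.3, "Nicotiana": 9.9, "Brassica": 17.3, "Arabidopsis": 10.6,
}
ANGIOSPERMS = ["Triticum", "Oryza", "Zea", "Beta",
               "Nicotiana", "Brassica", "Arabidopsis"]


def fold_larger(taxon, other):
    """How many times larger the mtDNA of `taxon` is than that of `other`."""
    return SIZE_BP[taxon] / SIZE_BP[other]


def noncoding_pct(taxon):
    """Noncoding fraction, taken as the complement of the coding fraction."""
    return 100.0 - CODING_PCT[taxon]


def mean_and_spread(values):
    """Mean and population standard deviation (assumed estimator)."""
    return mean(values), pstdev(values)


if __name__ == "__main__":
    for other in ("Chara", "Marchantia", "Physcomitrella"):
        print(f"Cycas vs {other}: {fold_larger('Cycas', other):.1f}-fold")
    size_mean, size_sd = mean_and_spread([SIZE_BP[t] / 1000 for t in ANGIOSPERMS])
    nc_mean, nc_sd = mean_and_spread([noncoding_pct(t) for t in ANGIOSPERMS])
    print(f"angiosperm size: {size_mean:.0f} +/- {size_sd:.0f} kb")
    print(f"angiosperm noncoding: {nc_mean:.1f} +/- {nc_sd:.1f} %")
```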

Repeated sequences that are present in the genome as multiple copies comprise approximately 15.1% of the Cycas mtDNA (table 2). The repeats, very few of which are over 2 kb long, are evenly distributed across the genome and mainly occur in the noncoding regions, including the intergenic spacers and introns (fig. 1B). As shown in table 2, most mtDNAs of land plants (except for rice) have 2–5 times more repeated sequences than the mtDNA of Chara, whereas more than a quarter of the rice mtDNA consists of repeated sequences. Among the sampled plant mtDNAs, that of Cycas contains the highest percentage of tandem repeats (4.97%; fig. 2A; detailed in supplementary table S3 [Supplementary Material online]); these include a novel family of mobile elements, the Bpu sequences/elements (see below), which are mainly found in 3- or 4-copy arrays (fig. 2B). Ogihara et al. (2005) noted the presence of many repeats in wheat mtDNA and hypothesized that alternative physical structures may be adopted by wheat mtDNA. Therefore, it seems logical to hypothesize that alternative circular mtDNAs in various recombinant forms might coexist in Cycas cells in vivo.

No group I intron was detected in the mtDNA of Cycas or the other 7 previously elucidated angiosperm mtDNAs (table 2). This observation is consistent with Knoop's (2004) speculation that group I introns were lost from the common ancestor of hornworts and tracheophytes. Although Cho et al. (1998) demonstrated that a group I intron located in the cox1 gene is widespread among 48 angiosperm genera, the examined angiosperms represented an exceptionally patchy phylogenetic distribution and did not include any of our sampled 7 angiosperms. Furthermore, the authors estimated that the intron invaded the cox1 gene by cross-species horizontal transfer over 1,000 times during angiosperm evolution and "is of entirely recent occurrence."

In contrast, we found 20–25 group II introns in the mtDNAs of the examined land plants (table 2), nearly double the number found in those of their sister group, the charophytes (13). Similar to the angiosperms, the 3 genes nad1, nad2, and nad5 in Cycas are disrupted by 5 group II intron sequences, which have brought the genes into trans-splicing arrangements. Studies by Malek et al. (1997) and Malek and Knoop (1998) concluded that trans-spliced group II introns had evolved from formerly cis-spliced introns before the emergence of hornworts. More recently, this transition was dated to even before the emergence of mosses (Groth-Malonek et al. 2005).

Gene Content and Evolution of Gene Loss in the mtDNAs of Land Plants

Our phylogenetic analysis using either nonsynonymous (Ka) or synonymous (Ks) substitutions of 22 mitochondrial protein-coding genes shared by the 11 studied plants generated identical tree topologies. Figure 3A shows the Neighbor-Joining trees reconstructed using the Ka and Ks values, respectively, with Chara being designated as the outgroup. The topologies of these 2 trees strongly indicate that, excluding the bryophytes Physcomitrella and Marchantia, the seed plants (including Cycas and the 7 angiosperms) and the angiosperms form nested monophyletic clades. Within the angiosperms, the monocots and eudicots constitute 2 distinct subclades. Note that the sister relationship of Physcomitrella and Marchantia has to be treated with caution because the sampled seed plants share a long branch. In addition, a recent multigene analysis (Qiu et al. 2006) based on dense taxon sampling suggested that liverworts (including Marchantia) diverged before mosses (including Physcomitrella). The Ka- and Ks-derived branch lengths leading to Cycas are nearly equal (fig. 3A), whereas the Ka-based branch lengths for the other species are strikingly shorter than the corresponding Ks-based branches.


Statistical analysis using a Z-test further indicated that the Ka value for the Cycas branch is higher than the average Ka of the 7 studied angiosperms (P < 0.05, Z-test). The elevated Ka in Cycas suggests that a rapid evolution/divergence may have occurred in some of the protein-coding genes of the Cycas mtDNA. This result is consistent with the observation that some genes in the Cycas mtDNA contain abundant RNA editing sites (see below).

A total of 39 protein-coding genes were identified in the Cycas mtDNA, which is the highest gene number

FIG. 1. (A) Gene map of the Cycas taitungensis mtDNA. Genes are color coded into 12 groups according to their biological functions. Genes on the outside and inside of the 2 circles are transcribed clockwise and counterclockwise, respectively. Predicted mtpts are indicated in deep green between the 2 circles. Intron-containing genes are indicated by asterisks and marked with exon numbers. Degrees on the circular genome correspond to the linear maps in (B). (B) Lengths and distribution of repeat sequences and single Bpu sequences across the entire Cycas mtDNA. Upper: linear presentation of (A). Middle: the position of each repeat sequence (in blue) mapped onto the genome. Lower: the locations and lengths of tandem repeats detected using the tandem repeats finder. Single Bpu sequences (including variants) and their tandem repeats are shown in red; other tandem repeats are shown in black.


Table 2
Comparison of Features among 11 Elucidated Land Plant Mitochondrial Genomes

| Features | Chara | Marchantia | Physcomitrella | Cycas | Triticum | Oryza | Zea | Beta | Nicotiana | Brassica | Arabidopsis |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Size (bp) | 67,737 | 186,609 | 105,340 | 414,903 | 452,528 | 490,520 | 569,630 | 368,799 | 430,597 | 221,853 | 366,924 |
| A + T content (%) | 59.1 | 57.6 | 59.4 | 53.1 | 55.7 | 56.2 | 56.1 | 56.1 | 55.0 | 54.8 | 55.2 |
| Gene number^a (protein-coding/tRNA/rRNA) | 39/26/3 {39/26/3} | 41/27/3 {41/29/3} | 39/24/3 {39/24/3} | 39/22/3 {39/26/3} | 33/17/3 {35/25/8} | 35/18/3 {40/27/6} | 32/17/3^b {34/22/5^b} | 29/18/3 {29/22/5} | 36/21/3 {39/22/4} | 33/17/3 {34/17/3} | 32/17/3 {33/22/3} |
| Coding sequences (%)^a | 90.7 | 20.3 | 37.0 | 10.1 | 8.6 | 11.1 | 6.2 | 10.3 | 9.9 | 17.3 | 10.6 |
| Repeat sequences (%)^c | 3.2 | 10.1 | 11.6 | 15.1 | 10.1 | 28.8 | 11.4 | 12.5 | 10.8 | 5.5 | 7.8 |
| Tandem repeat sequences (%)^d | 0.22 | 0.49 | 0.14 | 4.97 | 0.42 | 0.09 | 0.35 | 0.71 | 0.05 | 0.36 | 0.34 |
| Introns: group I | 14 | 7 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Introns: group II, cis-spliced | 13 | 25 | 25 | 20 | 17 | 17 | 15 | 14 | 17 | 18 | 18 |
| Introns: group II, trans-spliced | 0 | 0 | 0 | 5 | 6 | 6 | 7 | 6 | 6 | 5 | 5 |
| Cp-derived sequences (bp) | NA^e | NA^e | NA^e | 18,113 | 13,455 | 22,593 | 25,132 | NA^e | 9,942 | 7,950 | 3,958 |
| Cp-derived sequences (%) | NA^e | NA^e | NA^e | 4.4 | 3.0 | 6.3 | 4.4 | 2.1 | 2.5 | 3.6 | 1.1 |
| RNA editing sites | 0 | 0 | NA^e | 1,084^f | NA^e | 491 | NA^e | 370^g | NA^e | 427 | 441 |

a Pseudogenes and unique ORFs (such as ORF222 reported in Handa 2003) were excluded. Values outside curly brackets: duplicate genes were counted only once. Values within curly brackets: all duplicate genes were included.
b The plasmid-localized tRNA-Trp was included (Clifton et al. 2004).
c The REPuter program (http://bibiserv.techfak.uni-bielefeld.de/reputer/) was used to obtain the estimates. All duplicate copies of repeats are included. Overlapped sequences were counted only once.
d The tandem repeats finder program (Benson 1999) was used to obtain the estimates.
e NA: data were not available from original papers.
f The PREP-mt program (Mower 2005) was used to obtain the estimates. The cutoff value of the reported score was set to 0.6.
g 370 C-to-U editing sites were identified in 28 gene/ORF transcripts of sugar beet (Kubo et al. 2000).


identified to date among the studied seed plant mtDNAs (fig. 3B). When the distributions of conserved genes in the mtDNA from the 11 sampled species (supplementary table S2, Supplementary Material online) are mapped to their respective branches in a maximum parsimony tree (fig. 3B, based on the data used in fig. 3A), it is possible then to estimate the time of loss of a particular gene. As shown in figure 3B, our analysis indicates that there have been at least 31 independent events of gene loss from all the land plant mtDNAs elucidated to date.

When a gene is missing from the mtDNA of a given species, it is generally believed that the original copy has been transferred to the nucleus, where it functions through cytosolic protein synthesis followed by transit peptide–assisted import back to the mitochondria (Adams and Palmer 2003; Knoop 2004). Frequent gene losses, especially of the ribosomal protein genes (30 of 34), appear to have occurred after the divergence of the angiosperm lineages approximately 150 MYA (Chaw et al. 2004). However, only 1 gene loss was observed in the Cycas lineage after it branched off from the common ancestor of angiosperms approximately 300 MYA. Adams and Palmer (2003) suggested that angiosperm mtDNAs have experienced a recent evolutionary surge of loss and/or transfer of genes (primarily those encoding ribosomal proteins) to the nucleus. Our data give additional support to their contention but suggest that the mtDNA of Cycas appears to have been excluded from this surge. Our results suggest that the Cycas mtDNA tends to evolutionarily maintain its gene diversity and/or enjoys less gene transfer than angiosperm mtDNAs (Selosse et al. 2001; Adams et al. 2002; Adams and Palmer 2003). Coincidentally, among the some 40 published cpDNAs of seed plants, that of Cycas has also undergone the least gene loss (Wu et al. 2007, supplementary fig. 1 [Supplementary Material online]). Future work will be required to examine why the genomes of these 2 Cycas organelles appear to have frozen after the cycads' divergence from angiosperms, especially in comparison to mtDNAs from other major gymnosperm clades such as Ginkgo, gnetophytes, and pines.

A Novel Family of Short Interspersed Mitochondrial Elements

Numerous Short Interspersed Mitochondrial Elements, Termed Bpu Sequences, Are Present in the Cycas mtDNA

Sequence analysis revealed that numerous copies of a 36-nt repeat, herein designated as a "Bpu sequence/element," are interspersed throughout the Cycas mtDNA (fig. 1B). Figure 4A shows the characteristic sequences of 500 Bpu elements having 0 to 4 mismatches. If up to a 7-nt mismatch to the dominant type is allowed, the total copy number of Bpu sequences increases to 512. The Bpu sequences feature 2 conserved terminal direct repeats (AAGG) and the recognition sequence for the restriction endonuclease Bpu10I (CCTGAAGC; nt 15–21). These repeat elements/sequences do not appear to have coding potential.
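Counting copies "with up to k mismatches to the dominant type" amounts to an approximate-match scan of the genome. The Python sketch below illustrates the idea under stated assumptions. The 36-nt `PLACEHOLDER_BPU` string is invented for this example (it only mimics the described features: AAGG at both ends and an internal CCTGAAGC) and is not the published consensus. The scan treats the genome as linear, uses substitutions only (no insertions or deletions), and checks both strands. None of this is claimed to be the authors' procedure.

```python
# Invented 36-nt stand-in for the dominant Bpu type (NOT the real consensus):
# AAGG + 10 nt filler + CCTGAAGC + 10 nt filler + AAGG.
PLACEHOLDER_BPU = "AAGGTCTACGATCACCTGAAGCTTGACGTACTAAGG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_bpu_like(genome, consensus, max_mismatches):
    """Return (start, strand, mismatches) for every window within the limit."""
    k = len(consensus)
    targets = {"+": consensus, "-": reverse_complement(consensus)}
    hits = []
    for start in range(len(genome) - k + 1):
        window = genome[start:start + k]
        for strand, target in targets.items():
            d = hamming(window, target)
            if d <= max_mismatches:
                hits.append((start, strand, d))
    return hits


if __name__ == "__main__":
    # Toy genome: one exact copy, one copy with 2 substitutions,
    # and one exact copy on the opposite strand.
    variant = PLACEHOLDER_BPU[:5] + "G" + PLACEHOLDER_BPU[6:29] + "T" + PLACEHOLDER_BPU[30:]
    genome = ("TTTT" + PLACEHOLDER_BPU + "CCCC" + variant + "GGGG"
              + reverse_complement(PLACEHOLDER_BPU) + "TT")
    for hit in find_bpu_like(genome, PLACEHOLDER_BPU, max_mismatches=4):
        print(hit)
```

Raising `max_mismatches` widens the family, in the same spirit as the step from 500 to 512 copies reported above. A real analysis would also need to handle the circular junction and indel-containing variants.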

Because Bpu elements are extremely short in their lengths and terminal repeats and contain 2 terminal direct repeats rather than the inverted repeats found in plant miniature inverted-repeat transposable elements (MITEs), they are here classified as short interspersed mitochondrial elements (SIMEs). This distinguishes them from the MITEs and short interspersed nuclear elements (SINEs; e.g., Alu DNA repeats in the genomes of primates) that have been extensively reported from the genomes of plants and animals (Feschotte et al. 2002). Based on the similarity between SIMEs and SINEs, however, we can hypothesize that Bpu elements are likely to be transposed through RNA without a requirement for reverse transcriptase.

The Bpu10I recognition site characteristic of the Bpu elements is highly conserved, except at the very last base pair, that is, the 21st bp of the Bpu sequence (fig. 4A, lower chart). In contrast, the 5′ terminal direct repeat and its downstream 6 nt (nt 5–9) tend to be more variable than the 3′ terminal direct repeat. Enigmatically, the elements can form a secondary structure (fig. 4B) with a predicted free energy of -12.5 kcal/mole, as calculated by the MFOLD program (http://frontend.bioinfo.rpi.edu/zukerm/comments/FAQs.html). The significance of this secondary structure remains to be elucidated. Comparison of the Bpu

FIG. 2. Comparison of copy numbers among various tandem repeats in the mtDNAs of Cycas and 10 other land plants. (A) Percentage of total genome comprised of tandem repeat sequences. (B) The copy numbers of tandem repeat sequences in each genome (x axis) are separated into 5 groups (y axis), and their histograms (z axis) are shown. The single-copy Bpu sequences of Cycas taitungensis described in figure 1 are not included in the counts. The copy numbers may not be integers because the boundaries of tandem repeats were determined by a probabilistic method described by Benson (1999).


element insertion sites in the Cycas mtDNA with corresponding sites in the mtDNAs of other species reveals that the target sites for transpositions of Bpu elements are identical (within 1–2 mismatched base pairs) to the 5′ terminal direct repeat (fig. 4C–F).
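The position-by-position conservation discussed above (and plotted in figure 4A as bits of information scaled by relative occurrence) can be illustrated with a small Python sketch. It uses the common definition of information content for DNA, 2 minus the Shannon entropy of the column, without any small-sample correction. The function names and the toy alignment are assumptions of this sketch, not the authors' data or code.

```python
from collections import Counter
from math import log2


def column_information(column):
    """Information content (bits) of one alignment column of A/C/G/T."""
    n = len(column)
    counts = Counter(column)
    entropy = -sum((c / n) * log2(c / n) for c in counts.values())
    return 2.0 - entropy  # 2 bits is the maximum for a 4-letter alphabet


def logo_heights(sequences):
    """Per-position letter heights: relative frequency times column bits."""
    heights = []
    for column in zip(*sequences):
        bits = column_information(column)
        n = len(column)
        heights.append({base: (c / n) * bits for base, c in Counter(column).items()})
    return heights


if __name__ == "__main__":
    # Toy 4-nt alignment, invented for illustration only.
    toy = ["AAGG", "AAGG", "AAGC", "AAGT"]
    for pos, h in enumerate(logo_heights(toy), start=1):
        print(pos, h)
```

A fully conserved position scores 2 bits, whereas a position where all 4 bases are equally frequent scores 0, so variable positions such as the last base of the recognition site would appear as short stacks.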

Bpu Sequences Distinguish Cycas from Other Cycads and Seed Plants

Bpu sequences are present exclusively in the noncoding regions of the Cycas mtDNA, that is, within the introns and intergenic spacers, with 1 exception: a Bpu sequence is present within the coding region of rrn18, which encodes the 18S rRNA (fig. 4C). Among the many mitochondrial rrn18 (mt-rrn18) genes available in GenBank, only those of Cycas (including Cycas revoluta, GenBank accession number AB029356) and Ginkgo biloba (the only living species of the order Ginkgoales) have Bpu elements (fig. 4C). The Bpu insertion sites of the 2 Cycas taxa are orthologous (data not shown), whereas that of Ginkgo is different, indicating that the insertions of Bpu elements in rrn18 did not occur in the common ancestor of cycads and ginkgo. In sequenced mtDNAs, the oldest cpDNA-derived sequence (also termed mtpt) cluster, trnV(uac)-trnM(cau)-atpE-atpB-rbcL, is reported to have existed since the common ancestor of seed plants (Wang et al. 2007, fig. 3). Bpu elements are present in the Cycas mtpt-atpB (fig. 4D) and mtpt-rbcL (fig. 4E) sequences but not in the corresponding sequences of the 7 angiosperm mtDNAs elucidated to date (Wang et al. 2007), suggesting that these Bpu insertions took place after the split of the Cycas lineage from the other seed plants.

We further examined the occurrence of Bpu elements in 1 of the other 2 cycad families, Zamiaceae (including 1 species from each of Dioon, Macrozamia, and Zamia), by sequencing exons 1 and 2 of their nad2 genes. Whereas nad2i1 of the Cycas mtDNA contains 5 Bpu elements, such elements are absent from the nad2 genes of the 3 sampled Zamiaceae genera. However, we found 1, 2, and 6 Bpu elements in mitochondrial nad5i1 of Cycas panzhihuaensis (GenBank accession number AF43425) and mitochondrial nad1i1 and cp rps19-rpl16 spacers/intron of C. revoluta (GenBank accession number

FIG. 3. Phylogenies inferred from concatenated data from 22 protein-coding genes (see supplementary table S2, Supplementary Material online) common to the sequenced mtDNAs from 10 land plants and Chara (the outgroup). Nodes received 100% bootstrap support unless indicated. Branch lengths are drawn to scale, except as otherwise noted. (A) Two superimposed Neighbor-Joining trees based on the Ka and Ks values, respectively. (B) Scenarios of gene losses (open bar) and splits (hatch triangle) along the single maximum parsimony tree (9,857 steps). The total numbers of protein-coding genes in each mtDNA species are given within parentheses.


FIG. 4. Bpu sequences and examples of their insertion loci. Target sites and terminal repeats are underlined. (A) Upper: a dominant sequence based on 500 Bpu sequences (see text). Numbers along the abscissa indicate each nucleotide position within the prototype Bpu sequence. Recognition sites for the Bpu10I restriction enzyme are marked with asterisks. The ordinate scales each nucleotide by the total bits of information multiplied by its relative occurrence at that position (Wasserman and Sandelin 2004). Lower: identities of Bpu sequences that differ from the dominant type by 0–4 bp. (B) Secondary structure of the dominant Bpu sequence. (C)–(F) Lowercase letters, periods, and dashes denote mismatches, identical bps, and deletions, respectively, compared with the uppermost sequence. (C) Partial alignment of mt-rrn18 sequences from Cycas and Ginkgo. A Bpu sequence and a reversely complementary Bpu sequence are present in Ginkgo and Cycas, respectively. (D) Partial alignment of mtpt-atpB sequences extracted from


AY354955, AY345867), respectively. These data seem to suggest that no Bpu sequence has successfully invaded the mtDNAs of Cycadales genera other than Cycas.

A Bpu-like sequence with a 1-bp insertion and 89% similarity to the dominant type in Cycas mtDNA was retrieved from the coding sequence of the mitochondrial rps11 gene from the core eudicot Weigela hortensis. Alignment of this Bpu-like sequence reveals that its predicted Bpu10I endonuclease recognition site differs from that of Cycas by 2 bp (gCTGAGt). Because this Bpu element-like sequence does not interrupt the reading frame of rps11 and because the mitochondrial rps11 gene of Weigela lacks a target duplication site, we do not consider that this Bpu-like element shares a common origin with the Bpu sequences of Cycas. We further postulate that Bpu elements are likely absent from, or very rare in, angiosperms.

Surprisingly, the cpDNA of Cycas also contains 2 Bpu elements, which are located in the petN-psbM and psbA-trnK spacers, respectively (see Wu et al. 2007). Additionally, we identified 1 Bpu sequence in the atpB-rbcL spacer of the cpDNA from Pinus luchuensis (GenBank accession number DQ196799). However, no Bpu sequence has been detected in the cpDNAs of the 3 other Pinaceae genera and 3 gnetophyte orders we have sequenced to date (Chaw SM, Wu CS, Lai YT, Wang YN, Lin CP, Liu SM, unpublished data). Collectively, the available evidence inclines us to believe that the Bpu sequences have proliferated specifically in the mtDNA of Cycas (or the Cycadaceae). The sporadic occurrence of Bpu elements in the cpDNAs of Cycas and Pinus suggests that they are likely derived from nonhomologous recombination with DNA fragments that leaked out of mitochondria in the former case and via lateral transfer in the latter case.

The Tai Sequence and Its Association with Bpu Sequences

The second intron of the rps3 gene (rps3i2) is only found in the mtDNAs of Cycas (Regina et al. 2005), including those studied in the present work. Regina et al. (2005) suggested that rps3i2 is a group II intron that was independently gained in the gymnosperms, likely at or just after the divergence of the angiosperms. Moreover, the authors reported a high similarity between a partial segment of the Cycas rps3i2 and orf760 of the Chara mtDNA (Turmel et al. 2003). Orf760 harbors functional domains for a maturase and a reverse transcriptase, prompting Regina et al. (2005) to propose that the Cycas rps3i2 gene originally encoded a maturase and a reverse transcriptase but evolved over time into a partially degenerated ORF. Here, we further report that rps3i2 of the Cycas mtDNA contains a 900-bp fragment comprising an array of 4 Bpu elements, with the Bpu elements on each end lacking 1 terminal repeat, followed by a 440-bp fragment (designated as "Tai" sequence), a Ψorf760 sequence, and 1 perfect Bpu sequence (fig. 5A). Most intriguingly, additional Tai sequences are scattered throughout the Cycas mtDNA; they occur more densely in longer spacers than in shorter ones (fig. 5B) and are generally found in close association with repeated Bpu sequences. These findings lend additional, robust support to the proposal of Regina et al. (2005).

In conclusion, we hypothesize that a Tai sequence and an orf760, flanked by a Bpu sequence at each end, most likely constitute an ancestral retrotransposon. This hypothesis is founded on 2 observations: 1) Tai sequences are highly associated with Bpu elements and 2) small segments of Tai sequences are vigorously and diversely rearranged, deleted, or truncated, as illustrated in the 3 examples shown in figure 5B. These observations also suggest that an ancestral retrotransposon in Cycas mtDNA spread very actively, presumably after Cycas branched off from the 10 other extant cycad genera. Afterward, the offspring or duplicates of the ancestral transposon may have gradually lost their mobility through varying degrees of deletions/truncations from the 3′ region, which encoded both maturase and reverse transcriptase functions.

We further speculate that even after the ancestral retrotransposon lost its transposon function, the remaining Bpu elements might have retained the ability to proliferate or amplify via unequal crossing-over or slipped-strand mispairing because their 2 direct terminal repeats can pair with each other's complementary strands. Future studies and sequencing of additional mtDNAs from basal Cycas will be required to confirm this hypothesis and may provide additional insight into the evolutionary origins of Bpu sequences and the molecular mechanisms underlying their amplification.

The Fates of CpDNA-Derived Sequences (mtpts) in Cycas mtDNA

Table 2 shows a comparative analysis of mtpts among the 11 studied plant mtDNAs. The total percentage of mtpts in Cycas (4.4%) is relatively high compared with that in dicots (1.1–3.6%) but falls within the range seen among monocots (3.0–6.3%). We previously discovered that the frequency of mtpt transfer is positively correlated with variations in mtDNA size (coefficient value r² = 0.47) (Wang et al. 2007). Here, we report that the Cycas mtDNA contains 8 protein-coding genes as well as 2 rRNA and 5 tRNA gene sequences originating from the cpDNA (fig. 1). However, frameshifts and indels within the protein-coding genes suggest that these Cycas mtpts have degenerated and are nonfunctional. In contrast, the 5 cpDNA-derived tRNAs in the Cycas mtDNA are able to fold into standard cloverleaf structures, as shown by tRNAscan-SE analysis (Lowe and Eddy 1997), and are thus likely to be functional. Previously, Sugiyama et al. (2005) used tRNAscan-SE to scan

Cycas, Oryza, Arabidopsis, and Nicotiana. Note that 2 Bpu sequence insertions are detected in Cycas but not in the other plants. (E) Partial alignment of mtpt-rbcL sequences extracted from Oryza, Arabidopsis, Nicotiana, and Cycas. The sequence from Cycas contains a Bpu-like sequence with a 5-bp insertion (shaded box) versus the dominant type. (F) Partial alignment of the mtpt-atpE and chloroplast atpE sequences of Cycas. Two reversely complementary Bpu sequences are found in the mtDNA. Note that the 5′ Bpu sequence is partly degenerated at its 3′ end.


the tobacco mtDNA and concluded that the 6 cp-derived tRNAs shared by angiosperms are functional. Because no mtpt was observed in Chara, Marchantia, or Physcomitrella, Wang et al. (2007) concluded that frequent DNA transfer from cpDNA to mtDNA has taken place no later than in the common ancestor of seed plants, approximately 300 MYA.

Abundant RNA Editing Sites in Cycas mtDNA

Among the land plant mtDNAs, that of Cycas has the most predicted RNA editing sites (1,084 sites; supplementary table S4 [Supplementary Material online]). It is commonly believed that RNA editing arose together with the first terrestrial plants (Steinhauser et al. 1999). By using the PREP-mt software (Mower 2005) with the cutoff score set to 0.6, 1,084 sites within the protein-coding genes of the Cycas mtDNA were predicted to be C-to-U RNA editing sites. This is more than double the number of predicted sites within the elucidated mtDNAs of other land plants (table 2).

If the cutoff score, which indicates the conservation degree of each editing site compared with those found in the other published plant mtDNAs, is set to the most stringent criterion in PREP-mt (i.e., = 1), the number of editing sites decreases to 738, which is still the largest number found among the land plant mtDNAs elucidated to date.
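The effect of the cutoff score on the number of predicted sites can be pictured as a simple threshold filter. The Python sketch below is illustrative only: the record layout (gene, position, score), the toy records, and the function names are hypothetical and do not reflect the actual PREP-mt output format or the Cycas predictions.

```python
from collections import Counter


def count_sites(predictions, cutoff):
    """Number of predicted editing sites whose score meets the cutoff."""
    return sum(1 for p in predictions if p["score"] >= cutoff)


def sites_per_gene(predictions, cutoff):
    """Per-gene tally of predicted sites at a given cutoff."""
    return Counter(p["gene"] for p in predictions if p["score"] >= cutoff)


if __name__ == "__main__":
    # Hypothetical records, invented for illustration.
    toy_predictions = [
        {"gene": "nad4", "position": 77, "score": 1.0},
        {"gene": "nad4", "position": 158, "score": 0.8},
        {"gene": "nad5", "position": 23, "score": 0.6},
        {"gene": "rps10", "position": 5, "score": 0.4},
    ]
    for cutoff in (0.6, 1.0):
        print(cutoff, count_sites(toy_predictions, cutoff),
              dict(sites_per_gene(toy_predictions, cutoff)))
```

For the reported Cycas counts, tightening the cutoff from 0.6 to 1 retains 738 of 1,084 sites, or roughly two-thirds.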

It is believed that RNA editing is essential for functional protein expression, as it is required to modify amino acids or generate new start or stop codons (Hoch et al. 1991; Wintz and Hanson 1991; Kotera et al. 2005; Shikanai 2006). For this reason, the large number of RNA editing sites in the Cycas mtDNA may indicate higher complexity at the DNA level and formation of various transcripts through RNA editing, thus potentially reflecting rapid divergence in Cycas.

Although RNA editing sites are known to be sporadically distributed in the genomes of plant organelles/mitochondria from seed plants, the mechanisms underlying this distribution pattern are not yet known (Shikanai 2006; Mulligan et al. 2007). Supplementary table S4 (Supplementary Material online) shows that, within the Cycas mtDNA, gene complex I (which includes 9 nad genes) is the most extensively edited gene category. Nad4 and nad5 show the first and second highest number of editing sites, followed by the cox1 gene of complex IV. The least edited gene is rps10, which contains only 6 predicted sites even when the cutoff score is set to 0.6. Mulligan et al. (2007) interpreted the sporadic pattern as a result of lineage-specific loss of editing sites through retroconversion, which could remove adjacent editing sites by replacement with the edited sequences. The abundant RNA editing sites in the Cycas mtDNA, however, are mysterious and completely deviate from the sporadic patterns reported in other seed plant mtDNAs. Therefore, the mechanism of RNA editing in Cycas mtDNA may have a different evolutionary history from those of other seed plants.

FIG. 5. Genomic organization of the second rps3 intron (rps3i2) and association of Tai sequences with Bpu sequences. (A) Organization of rps3i2. The segment between worf760 and the 4 tandem Bpu sequences is designated as a Tai sequence (gray). Arrowheads indicate a Bpu sequence lacking the 2 terminal repeats (banded bars) and its complementary sequence. (B) Upper: association of Tai sequences with Bpu tandem repeats across the genome (abscissa) and their respective lengths (ordinate). Lower: examples of 3 Tai variants (boxed) showing that the sequences are variously degenerate and truncated as compared with the typical Tai shown in (A) (broken line indicates mismatching sequences). Thin arrows indicate the orientations of homologous segments between Tai variants and the typical Tai.

Previous studies (Steinhauser et al. 1999; Lurin et al. 2004; Shikanai 2006) led to the proposal that variation or multiplication of trans-acting factors (e.g., members of the pentatricopeptide repeat family) may allow rapid increases in the number of editing sites. This could possibly be correlated to the lineage-specific explosion of RNA editing sites observed in Cycas mtDNA. Furthermore, Handa's (2003) conclusion that the evolutionary speed in RNA editing is higher at the level of gene regulation than at the primary gene sequence level appears to provide additional support for our speculation. To improve understanding of the mechanisms involved in abundant RNA editing sites in Cycas mtDNA, critical analysis of mtDNAs from ferns, conifers, and other cycads will be required.

In all plant lineages, RNA editing often alters the amino acid sequence (Adams and Palmer 2003). Supplementary table S5 (Supplementary Material online) summarizes the number of predicted preediting codons and potential edited codons in the Cycas mtDNA. The 2 most frequent edited codons are TCA (Ser) and TCT (Ser), which are predicted to be edited into TTA (Leu) and TTT (Phe), respectively. The least frequently edited codon is CAA (Gln), which would putatively be edited in only 2 cases, wherein the codon's first position would be altered to yield the stop codon TAA. Supplementary table S6 (Supplementary Material online) further depicts the formation of 4 start (from ACG to ATG) and 7 stop codons (from CGA to TGA) through RNA editing. To verify the accuracy of the editing sites predicted using PREP-mt, partial transcripts of the cox1, atp1, atp6, and ccmB genes were experimentally assayed using RT-PCR. Comparison of our RT-PCR sequences with the editing sites predicted by PREP-mt indicated accuracy values of 97.66%, 99.13%, 97.91%, and 99.25% for cox1, atp1, atp6, and ccmB, respectively.
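The codon changes named above can be checked with a short Python sketch. It is an illustration only: the function names are hypothetical, the codon table covers just the codons mentioned in this paragraph (standard genetic code), and edited codons are written in the DNA alphabet (T standing for U).

```python
# Standard-code meanings for only the codons discussed in the text.
CODON_TABLE = {
    "TCA": "Ser", "TTA": "Leu",
    "TCT": "Ser", "TTT": "Phe",
    "CAA": "Gln", "TAA": "Stop",
    "ACG": "Thr", "ATG": "Met",
    "CGA": "Arg", "TGA": "Stop",
}


def edit_c_to_u(codon, position):
    """Apply a C-to-U edit at a 0-based codon position (U written as T)."""
    if codon[position] != "C":
        raise ValueError(f"no C at position {position} of {codon}")
    return codon[:position] + "T" + codon[position + 1:]


def describe_edit(codon, position):
    """Summarize the amino acid consequence of one C-to-U edit."""
    edited = edit_c_to_u(codon, position)
    return f"{codon} ({CODON_TABLE[codon]}) -> {edited} ({CODON_TABLE[edited]})"


examples = [
    describe_edit("TCA", 1),  # second-position edit
    describe_edit("TCT", 1),  # second-position edit
    describe_edit("CAA", 0),  # first-position edit creating a stop codon
    describe_edit("ACG", 1),  # edit creating a start codon
    describe_edit("CGA", 0),  # edit creating a stop codon
]
```

Each of the five edits changes the encoded residue or creates a start or stop signal, consistent with the statement that editing often alters the amino acid sequence.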

Our RT-PCR analysis additionally identified a U-to-C editing type in atp1; this was missed in the prediction because detection of the U-to-C editing type is limited when using PREP-mt (Mower 2005). Unfortunately, programs for predicting RNA editing sites in nonprotein-coding genes, such as ribosomal RNA and tRNA genes, and for predicting silent editing sites (which alter the mRNA but do not change the translated amino acid) are not yet available. We are currently examining these additional editing types and the locations of such sites in the Cycas mtDNA.

No Large Repeats in Gymnosperms but Some in Angiosperms

In the mtDNAs of angiosperms, long repeated sequences vary in size from 2,427 bp in rapeseed to 127,600 bp in rice, and they show no homology to each other, implying that repeated sequences were independently acquired during the evolution from Marchantia to angiosperms (Sugiyama et al. 2005). In Cycas mtDNA, we only detected a few long repeated sequences, all less than 15 kb in length and mostly composed of Bpu sequences. It has been demonstrated that genes are continuously transferred from the nuclear genome and cpDNA to mtDNA (Knoop 2004). However, the mechanisms underlying the emergence of large repeats are not yet fully understood.

Conclusions

The complete Cycas mtDNA shows a number of unprecedented features that are atypical of the mtDNAs previously elucidated for other land plants, including the lowest A + T content, the highest proportion of tandem repeat sequences (mainly Bpu sequences) and gene number found so far, abundant RNA editing sites, and an exceptionally elevated Ka value for the protein-coding genes. The latter 2 features might be correlated. In comparison with the known angiosperm organelle genomes, the cpDNA and mtDNA of Cycas have experienced the least gene loss. Peculiarly, the cpDNA of Cycas has the fewest RNA editing sites among land plants (Wu et al. 2007), whereas its mtDNA has the most. On the other hand, some characteristics are shared by mtDNAs of Cycas and angiosperms but not bryophytes; these include common gene transfers from cpDNA, similar ratios of noncoding sequences, lack of group I introns, relatively low numbers of tRNAs, and concurrence of trans-spliced group II introns. Nevertheless, the above-mentioned Cycas-specific features suggest that the Cycas mtDNA has a lineage-specific evolutionary history.

A novel family of SIMEs, designated as Bpu elements/sequences, represents another unique feature of the Cycadaceae lineage. Bpu elements are widely distributed throughout the noncoding regions of Cycas mtDNA (over 500 copies), with only 1 occurrence in a coding region (rrn18). In the Cycas mtDNA, the highly conserved nature of Bpu sequences and their association with Tai sequences as well as Worf760 (located in rps3i2) suggest that these sequences may have collectively originated from an ancestral retrotransposon. Further investigation into the origin, proliferation, and evolution of Bpu sequences will be desirable to help clarify the peculiar features of Cycas mtDNA.

Supplementary Material

Supplementary tables S1–S7 and figure 1 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

This work was supported by joint research grants from the Research Center for Biodiversity, Academia Sinica, to S.-M.C. and D.W., and partially by a National Science Council grant to S.-M.C. (94-2311-B001-059) and by a grant from the Institute of Information Science, Academia Sinica, to A.C.-C.S. We thank Chiao-Lei Cheng for sequencing the RT-PCR products. We are grateful for the 3 anonymous reviewers who provided critical comments and valuable suggestions.

Literature Cited

Adams KL, Palmer JD. 2003. Evolution of mitochondrial gene content: gene loss and transfer to the nucleus. Mol Phylogenet Evol. 29:380–395.

Adams KL, Qiu YL, Stoutemyer M, Palmer JD. 2002. Punctuated evolution of mitochondrial gene content: high and variable rates of mitochondrial gene loss and transfer to the nucleus during angiosperm evolution. Proc Natl Acad Sci USA. 99:9905–9912.

Benson G. 1999. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27:573–580.

Bergthorsson U, Richardson AO, Young GJ, Goertzen LR, Palmer JD. 2004. Massive horizontal transfer of mitochondrial genes from diverse land plant donors to the basal angiosperm Amborella. Proc Natl Acad Sci USA. 101:17747–17752.

Chaw SM, Chang CC, Chen HL, Li WH. 2004. Dating the monocot-dicot divergence and the origin of core eudicots using whole chloroplast genomes. J Mol Evol. 58:424–441.

Chaw SM, Walters TW, Chang CC, Hu SH, Chen SH. 2005. A phylogeny of cycads (Cycadales) inferred from chloroplast matK gene, trnK intron, and nuclear rDNA ITS region. Mol Phylogenet Evol. 37:214–234.

Chaw SM, Zharkikh A, Sung HM, Lau TC, Li WH. 1997. Molecular phylogeny of extant gymnosperms and seed plant evolution: analysis of nuclear 18S rRNA sequences. Mol Biol Evol. 14:56–68.

Cho Y, Qiu YL, Kuhlman P, Palmer JD. 1998. Explosive invasion of plant mitochondria by a group I intron. Proc Natl Acad Sci USA. 95:14244–14249.

Clifton SW, Minx P, Fauron CM, et al. (13 co-authors). 2004. Sequence and comparative analysis of the maize NB mitochondrial genome. Plant Physiol. 136:3486–3503.

Feschotte S, Zhang X, Wessler SR. 2002. Miniature inverted-repeat transposable elements and their relationship to established DNA transposons. Washington (DC): ASM Press.

Groth-Malonek M, Pruchner D, Grewe F, Knoop V. 2005. Ancestors of trans-splicing mitochondrial introns support serial sister group relationships of hornworts and mosses with vascular plants. Mol Biol Evol. 22:117–125.

Handa H. 2003. The complete nucleotide sequence and RNA editing content of the mitochondrial genome of rapeseed (Brassica napus L.): comparative analysis of the mitochondrial genomes of rapeseed and Arabidopsis thaliana. Nucleic Acids Res. 31:5907–5916.

Hoch B, Maier RM, Appel K, Igloi GL, Kossel H. 1991. Editing of a chloroplast mRNA by creation of an initiation codon. Nature. 353:178–180.

Jones DL. 2002. Cycads of the World: ancient plant in today's landscape. 2nd ed. Sydney (Australia): Reed New Holland.

Kadowaki K, Kubo N, Ozawa K, Hirai A. 1996. Targeting presequence acquisition after mitochondrial gene transfer to the nucleus occurs by duplication of existing targeting signals. EMBO J. 15:6652–6661.

Karol KG, McCourt RM, Cimino MT, Delwiche CF. 2001. The closest living relatives of land plants. Science. 294:2351–2353.

Knoop V. 2004. The mitochondrial DNA of land plants: peculiarities in phylogenetic perspective. Curr Genet. 46:123–139.

Kotera E, Tasaka M, Shikanai T. 2005. A pentatricopeptide repeat protein is essential for RNA editing in chloroplasts. Nature. 433:326–330.

Kubo T, Nishizawa S, Sugawara A, Itchoda N, Estiati A, Mikami T. 2000. The complete nucleotide sequence of the mitochondrial genome of sugar beet (Beta vulgaris L.) reveals a novel gene for tRNA(Cys)(GCA). Nucleic Acids Res. 28:2571–2576.

Kumar S, Tamura K, Nei M. 2004. MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform. 5:150–163.

Kurtz S, Choudhuri JV, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R. 2001. REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 29:4633–4642.

Lowe TM, Eddy SR. 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25:955–964.

Lurin C, Andres C, Aubourg S, et al. (19 co-authors). 2004. Genome-wide analysis of Arabidopsis pentatricopeptide repeat proteins reveals their essential role in organelle biogenesis. Plant Cell. 16:2089–2103.

Malek O, Brennicke A, Knoop V. 1997. Evolution of trans-splicing plant mitochondrial introns in pre-Permian times. Proc Natl Acad Sci USA. 94:553–558.

Malek O, Knoop V. 1998. Trans-splicing group II introns in plant mitochondria: the complete set of cis-arranged homologs in ferns, fern allies, and a hornwort. RNA. 4:1599–1609.

Michel F, Ferat JL. 1995. Structure and activities of group II introns. Annu Rev Biochem. 64:435–461.

Mower JP. 2005. PREP-Mt: predictive RNA editor for plant mitochondrial genes. BMC Bioinformatics. 6:96.

Mulligan RM, Chang KL, Chou CC. 2007. Computational analysis of RNA editing sites in plant mitochondrial genomes reveals similar information content and a sporadic distribution of editing sites. Mol Biol Evol. 24:1971–1981.

Notsu Y, Masood S, Nishikawa T, Kubo N, Akiduki G, Nakazono M, Hirai A, Kadowaki K. 2002. The complete sequence of the rice (Oryza sativa L.) mitochondrial genome: frequent DNA sequence acquisition and loss during the evolution of flowering plants. Mol Genet Genomics. 268:434–445.

Oda K, Kohchi T, Ohyama K. 1992. Mitochondrial DNA of Marchantia polymorpha as a single circular form with no incorporation of foreign DNA. Biosci Biotechnol Biochem. 56:132–135.

Ogihara Y, Yamazaki Y, Murai K, et al. (14 co-authors). 2005. Structural dynamics of cereal mitochondrial genomes as revealed by complete nucleotide sequencing of the wheat mitochondrial genome. Nucleic Acids Res. 33:6235–6250.

Qiu YL, Li L, Wang B, et al. (13 co-authors). 2006. The deepest divergences in land plants inferred from phylogenomic evidence. Proc Natl Acad Sci USA. 103:15511–15516.

Regina TM, Picardi E, Lopez L, Pesole G, Quagliariello C. 2005. A novel additional group II intron distinguishes the mitochondrial rps3 gene in gymnosperms. J Mol Evol. 60:196–206.

Selosse M, Albert B, Godelle B. 2001. Reducing the genome size of organelles favours gene transfer to the nucleus. Trends Ecol Evol. 16:135–141.

Shikanai T. 2006. RNA editing in plant organelles: machinery, physiological function and evolution. Cell Mol Life Sci. 63:698–708.

Steinhauser S, Beckert S, Capesius I, Malek O, Knoop V. 1999. Plant mitochondrial RNA editing. J Mol Evol. 48:303–312.

Stevenson DW. 1990. Morphology and systematics of the Cycadales. Mem N Y Bot Gard. 57:8–15.


Stewart CN Jr, Via LE. 1993. A rapid CTAB DNA isolation technique useful for RAPD fingerprinting and other PCR applications. Biotechniques. 14:748–750.

Sugiyama Y, Watase Y, Nagase M, Makita N, Yagura S, Hirai A, Sugiura M. 2005. The complete nucleotide sequence and multipartite organization of the tobacco mitochondrial genome: comparative analysis of mitochondrial genomes in higher plants. Mol Genet Genomics. 272:603–615.

Terasawa K, Odahara M, Kabeya Y, Kikugawa T, Sekine Y, Fujiwara M, Sato N. 2007. The mitochondrial genome of the moss Physcomitrella patens sheds new light on mitochondrial evolution in land plants. Mol Biol Evol. 24:699–709.

Turmel M, Otis C, Lemieux C. 2003. The mitochondrial genome of Chara vulgaris: insights into the mitochondrial DNA architecture of the last common ancestor of green algae and land plants. Plant Cell. 15:1888–1903.

Unseld M, Marienfeld JR, Brandt P, Brennicke A. 1997. The mitochondrial genome of Arabidopsis thaliana contains 57 genes in 366,924 nucleotides. Nat Genet. 15:57–61.

Wang D, Wu YW, Chun-Chieh Shih A, Wu CS, Wang YN, Chaw SM. 2007. Transfer of chloroplast genomic DNA to mitochondrial genome occurred at least 300 MYA. Mol Biol Evol. 24:2040–2048.

Wasserman WW, Sandelin A. 2004. Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet. 5:276–287.

Wintz H, Hanson MR. 1991. A termination codon is created by RNA editing in the petunia atp9 transcript. Curr Genet. 19:61–64.

Wu CS, Wang YN, Liu SM, Chaw SM. 2007. Chloroplast genome (cpDNA) of Cycas taitungensis and 56 cp protein-coding genes of Gnetum parvifolium: insights into cpDNA evolution and phylogeny of extant seed plants. Mol Biol Evol. 24:1366–1379.

William Martin, Associate Editor

Accepted January 6, 2008


Statistical analysis using a Z-test further indicated that the Ka value for the Cycas branch is higher than the average Ka of the 7 studied angiosperms (P < 0.05, Z-test). The elevated Ka in Cycas suggests that a rapid evolution/divergence may have occurred in some of the protein-coding genes of the Cycas mtDNA. This result is consistent with the observation that some genes in the Cycas mtDNA contain abundant RNA editing sites (see below).

A total of 39 protein-coding genes were identified in the Cycas mtDNA, which is the highest gene number identified to date among the studied seed plant mtDNAs (fig. 3B). When the distributions of conserved genes in the mtDNA from the 11 sampled species (supplementary table S2, Supplementary Material online) are mapped to their respective branches in a maximum parsimony tree (fig. 3B, based on the data used in fig. 3A), it is possible then to estimate the time of loss of a particular gene. As shown in figure 3B, our analysis indicates that there have been at least 31 independent events of gene loss from all the land plant mtDNAs elucidated to date.

FIG. 1. (A) Gene map of the Cycas taitungensis mtDNA. Genes are color coded into 12 groups according to their biological functions. Genes on the outside and inside of the 2 circles are transcribed clockwise and counterclockwise, respectively. Predicted mtpts are indicated in deep green between the 2 circles. Intron-containing genes are indicated by asterisks and marked with exon numbers. Degrees on the circular genome correspond to the linear maps in (B). (B) Lengths and distribution of repeat sequences and single Bpu sequences across the entire Cycas mtDNA. Upper: linear presentation of (A). Middle: the position of each repeat sequence (in blue) mapped onto the genome. Lower: the locations and lengths of tandem repeats detected using the tandem repeats finder. Single Bpu sequences (including variants) and their tandem repeats are shown in red; other tandem repeats are shown in black.

Table 2. Comparison of Features among 11 Elucidated Land Plant Mitochondrial Genomes

Features                                    Chara          Marchantia     Physcomitrella Cycas          Triticum       Oryza          Zea            Beta           Nicotiana      Brassica       Arabidopsis
Size (bp)                                   67,737         186,609        105,340        414,903        452,528        490,520        569,630        368,799        430,597        221,853        366,924
A + T content (%)                           59.1           57.6           59.4           53.1           55.7           56.2           56.1           56.1           55.0           54.8           55.2
Gene number(a) (protein-coding/tRNA/rRNA)   39/26/3        41/27/3        39/24/3        39/22/3        33/17/3        35/18/3        32/17/3(b)     29/18/3        36/21/3        33/17/3        32/17/3
  {all duplicate genes included}            {39/26/3}      {41/29/3}      {39/24/3}      {39/26/3}      {35/25/8}      {40/27/6}      {34/22/5}(b)   {29/22/5}      {39/22/4}      {34/17/3}      {33/22/3}
Coding sequences (%)(a)                     90.7           20.3           37.0           10.1           8.6            11.1           6.2            10.3           9.9            17.3           10.6
Repeat sequences (%)(c)                     3.2            10.1           11.6           15.1           10.1           28.8           11.4           12.5           10.8           5.5            7.8
Tandem repeat sequences (%)(d)              0.22           0.49           0.14           4.97           0.42           0.09           0.35           0.71           0.05           0.36           0.34
Introns
  Group I                                   14             7              2              0              0              0              0              0              0              0              0
  Group II, cis-spliced                     13             25             25             20             17             17             15             14             17             18             18
  Group II, trans-spliced                   0              0              0              5              6              6              7              6              6              5              5
Cp-derived sequences (bp)                   NA(e)          NA(e)          NA(e)          18,113         13,455         22,593         25,132         NA(e)          9,942          7,950          3,958
Cp-derived sequences (%)                    NA(e)          NA(e)          NA(e)          4.4            3.0            6.3            4.4            2.1            2.5            3.6            1.1
RNA editing sites                           0              0              NA(e)          1,084(f)       NA(e)          491            NA(e)          370(g)         NA(e)          427            441

(a) Pseudogenes and unique ORFs (such as ORF222 reported in Handa 2003) were excluded. Upper row: duplicate genes were counted only once. Lower row (within curly brackets): all duplicate genes were included.
(b) The plasmid-localized tRNA-Trp was included (Clifton et al. 2004).
(c) The REPuter program (http://bibiserv.techfak.uni-bielefeld.de/reputer/) was used to obtain the estimates. All duplicate copies of repeats are included. Overlapped sequences were counted only once.
(d) The tandem repeats finder program (Benson 1999) was used to obtain the estimates.
(e) Data were not available from original papers.
(f) The PREP-mt program (Mower 2005) was used to obtain the estimates. The cutoff value of the reported score was set to 0.6.
(g) 370 C-to-U editing sites were identified in 28 gene/ORF transcripts of sugar beet (Kubo et al. 2000).

When a gene is missing from the mtDNA of a given species, it is generally believed that the original copy has been transferred to the nucleus, where it functions through cytosolic protein synthesis followed by transit peptide–assisted import back to the mitochondria (Adams and Palmer 2003; Knoop 2004). Frequent gene losses, especially of the ribosomal protein genes (30 of 34), appear to have occurred after the divergence of the angiosperm lineages approximately 150 MYA (Chaw et al. 2004). However, only 1 gene loss was observed in the Cycas lineage after it branched off from the common ancestor of angiosperms approximately 300 MYA. Adams and Palmer (2003) suggested that angiosperm mtDNAs have experienced a recent evolutionary surge of loss and/or transfer of genes (primarily those encoding ribosomal proteins) to the nucleus. Our data give additional support to their contention but suggest that the mtDNA of Cycas appears to have been excluded from this surge. Our results suggest that the Cycas mtDNA tends to evolutionarily maintain its gene diversity and/or enjoys less gene transfer than angiosperm mtDNAs (Selosse et al. 2001; Adams et al. 2002; Adams and Palmer 2003). Coincidentally, among the some 40 published cpDNAs of seed plants, that of Cycas has also undergone the least gene loss (Wu et al. 2007; supplementary fig. 1 [Supplementary Material online]). Future work will be required to examine why the genomes of these 2 Cycas organelles appear to have frozen after the cycads' divergence from angiosperms, especially in comparison to mtDNAs from other major gymnosperm clades, such as Ginkgo, gnetophytes, and pines.

A Novel Family of Short Interspersed Mitochondrial Elements

Numerous Short Interspersed Mitochondrial Elements, Termed Bpu Sequences, Are Present in the Cycas mtDNA

Sequence analysis revealed that numerous copies of a 36-nt repeat, herein designated as a "Bpu sequence/element," are interspersed throughout the Cycas mtDNA (fig. 1B). Figure 4A shows the characteristic sequences of 500 Bpu elements having 0 to 4 mismatches. If up to a 7-nt mismatch to the dominant type is allowed, the total copy number of Bpu sequences increases to 512. The Bpu sequences feature 2 conserved terminal direct repeats (AAGG) and the recognition sequence for the restriction endonuclease Bpu10I (CCTGAAGC; nt 15–21). These repeat elements/sequences do not appear to have coding potential.
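Counting copies under a mismatch allowance, as described above, can be sketched in Python with a sliding-window Hamming-distance scan. This is a minimal illustration, not the procedure used in this study: the 36-nt motif below is a placeholder (AAGG terminal repeats around an arbitrary core containing the CCTGAAGC string), not the real Bpu consensus, and only the forward strand is scanned.

```python
# Placeholder 36-nt motif, NOT the real Bpu consensus: AAGG terminal direct
# repeats flank an arbitrary core that contains the CCTGAAGC string.
MOTIF = "AAGG" + "TCTACGTTCA" + "CCTGAAGC" + "GTCATTCGAT" + "AAGG"


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_copies(genome, motif, max_mismatches):
    """Return (start, mismatches) for each forward-strand window within the limit."""
    hits = []
    width = len(motif)
    for start in range(len(genome) - width + 1):
        distance = hamming(genome[start:start + width], motif)
        if distance <= max_mismatches:
            hits.append((start, distance))
    return hits


# Toy sequence: one exact copy and one copy carrying 2 substitutions.
variant = MOTIF[:5] + "G" + MOTIF[6:30] + "C" + MOTIF[31:]
toy_genome = "ACGT" * 5 + MOTIF + "TTTT" + variant + "ACGT" * 5

exact_hits = find_copies(toy_genome, MOTIF, 0)    # only the perfect copy
relaxed_hits = find_copies(toy_genome, MOTIF, 4)  # also finds the variant
```

Loosening the mismatch limit can only add hits, which mirrors the increase from 500 to 512 copies when up to 7 mismatches are allowed. Copies on the opposite strand could be found by scanning with the reverse complement of the motif.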

Because Bpu elements are extremely short in their lengths and terminal repeats, and contain 2 terminal direct repeats rather than the inverted repeats found in plant miniature inverted-repeat transposable elements (MITEs), they are here classified as short interspersed mitochondrial elements (SIMEs). This distinguishes them from the MITEs and short interspersed nuclear elements (SINEs; e.g., Alu DNA repeats in the genomes of primates) that have been extensively reported from the genomes of plants and animals (Feschotte et al. 2002). Based on the similarity between SIMEs and SINEs, however, we can hypothesize that Bpu elements are likely to be transposed through RNA without a requirement for reverse transcriptase.

The Bpu10I recognition site characteristic of the Bpu elements is highly conserved, except at the very last base pair, that is, the 21st bp of the Bpu sequence (fig. 4A, lower chart). In contrast, the 5′ terminal direct repeat and its downstream 6 nt (nt 5–9) tend to be more variable than the 3′ terminal direct repeat. Enigmatically, the elements can form a secondary structure (fig. 4B) with a predicted free energy of -12.5 kcal/mole, as calculated by the MFOLD program (http://frontend.bioinfo.rpi.edu/zukerm/comments/FAQs.html). The significance of this secondary structure remains to be elucidated. Comparison of the Bpu element insertion sites in the Cycas mtDNA with corresponding sites in the mtDNAs of other species reveals that the target sites for transpositions of Bpu elements are identical (within 1–2 mismatched base pairs) to the 5′ terminal direct repeat (fig. 4C–F).

FIG. 2. Comparison of copy numbers among various tandem repeats in the mtDNAs of Cycas and 10 other land plants. (A) Percentage of total genome comprised of tandem repeat sequences. (B) The copy numbers of tandem repeat sequences in each genome (x axis) are separated into 5 groups (y axis), and their histograms (z axis) are shown. The single-copy Bpu sequences of Cycas taitungensis described in figure 1 are not included in the counts. The copy numbers may not be integers because the boundaries of tandem repeats were determined by a probabilistic method described by Benson (1999).
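The per-position conservation discussed for figure 4A (total bits of information at a position, with each nucleotide scaled by its relative occurrence) can be computed as in the following Python sketch. It assumes the common sequence-logo definition of 2 bits minus the Shannon entropy for a 4-letter alphabet, applies no small-sample correction, and uses a hypothetical toy alignment rather than real Bpu copies.

```python
import math
from collections import Counter


def column_information(column):
    """Return (information in bits, base frequencies) for one alignment column."""
    total = len(column)
    freqs = {base: count / total for base, count in Counter(column).items()}
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    return 2.0 - entropy, freqs


def letter_heights(column):
    """Scale each base by total information times its relative occurrence."""
    info, freqs = column_information(column)
    return {base: p * info for base, p in freqs.items()}


# Hypothetical aligned copies: four invariant positions, one variable position.
toy_alignment = ["AAGGC", "AAGGT", "AAGGC", "AAGGA"]
profile = [column_information("".join(col))[0] for col in zip(*toy_alignment)]
```

Invariant positions reach the 2-bit maximum, whereas variable positions score lower, which is how a highly conserved recognition site and a more variable terminal repeat would be told apart in such a profile.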

Bpu Sequences Distinguish Cycas from Other Cycads and Seed Plants

Bpu sequences are present exclusively in the noncoding regions of the Cycas mtDNA, that is, within the introns and intergenic spacers, with 1 exception: a Bpu sequence is present within the coding region of rrn18, which codes for the 18S rRNA (fig. 4C). Among the many mitochondrial rrn18 (mt-rrn18) genes available in GenBank, only those of Cycas (including Cycas revoluta; GenBank accession number AB029356) and Ginkgo biloba (the only living species of the order Ginkgoales) have Bpu elements (fig. 4C). The Bpu insertion sites of the 2 Cycas taxa are orthologous (data not shown), whereas that of Ginkgo is different, indicating that the insertions of Bpu elements in rrn18 did not occur in the common ancestor of cycads and ginkgo. In sequenced mtDNAs, the oldest cpDNA-derived sequence (also termed mtpt) cluster, trnV(uac)-trnM(cau)-atpE-atpB-rbcL, is reported to have existed since the common ancestor of seed plants (Wang et al. 2007, fig. 3). Bpu elements are present in the Cycas mtpt-atpB (fig. 4D) and mtpt-rbcL (fig. 4E) sequences but not in the corresponding sequences of the 7 angiosperm mtDNAs elucidated to date (Wang et al. 2007), suggesting that these Bpu insertions took place after the split of the Cycas lineage from the other seed plants.

We further examined the occurrence of Bpu elements in 1 of the other 2 cycad families, Zamiaceae (including 1 species from each of Dioon, Macrozamia, and Zamia), by sequencing the exons 1 and 2 of their nad2 genes. Whereas the nad2i1 gene of Cycas mtDNA contains 5 Bpu elements, such elements are absent from the nad2 genes of the 3 sampled Zamiaceae genera. However, we found 1, 2, and 6 Bpu elements in mitochondrial nad5i1 of Cycas panzhihuaensis (GenBank accession number AF43425) and mitochondrial nad1i1 and cp rps19-rpl16 spacer/intron of C. revoluta (GenBank accession numbers AY354955 and AY345867), respectively. These data seem to suggest that no Bpu sequence has successfully invaded the mtDNAs of Cycadales genera other than Cycas.

FIG. 3. Phylogenies inferred from concatenated data from 22 protein-coding genes (see supplementary table S2, Supplementary Material online) common to the sequenced mtDNAs from 10 land plants and Chara (the outgroup). Nodes received 100% bootstrap support unless indicated. Branch lengths are drawn to scale, except as otherwise noted. (A) Two superimposed Neighbor-Joining trees based on the Ka and Ks values, respectively. (B) Scenarios of gene losses (open bar) and splits (hatch triangle) along the single maximum parsimony tree (9,857 steps). The total numbers of protein-coding genes in each mtDNA species are given within parentheses.

FIG. 4. Bpu sequences and examples of their insertion loci. Target sites and terminal repeats are underlined. (A) Upper: a dominant sequence based on 500 Bpu sequences (see text). Numbers along the abscissa indicate each nucleotide position within the prototype Bpu sequence. Recognition sites for the Bpu10I restriction enzyme are marked with asterisks. The ordinate scales each nucleotide by the total bits of information multiplied by its relative occurrence at that position (Wasserman and Sandelin 2004). Lower: identities of Bpu sequences that differ from the dominant type by 0–4 bp. (B) Secondary structure of the dominant Bpu sequence. (C)–(F) Lowercase letters, periods, and dashes denote mismatches, identical bps, and deletions, respectively, compared with the uppermost sequence. (C) Partial alignment of mt-rrn18 sequences from Cycas and Ginkgo. A Bpu sequence and a reversely complementary Bpu sequence are present in Ginkgo and Cycas, respectively. (D) Partial alignment of mtpt-atpB sequences extracted from Cycas, Oryza, Arabidopsis, and Nicotiana. Note that 2 Bpu sequence insertions are detected in Cycas but not in the other plants. (E) Partial alignment of mtpt-rbcL sequences extracted from Oryza, Arabidopsis, Nicotiana, and Cycas. The sequence from Cycas contains a Bpu-like sequence with a 5-bp insertion (shaded box) versus the dominant type. (F) Partial alignment of the mtpt-atpE and chloroplast atpE sequences of Cycas. Two reversely complementary Bpu sequences are found in the mtDNA. Note that the 5′ Bpu sequence is partly degenerated at its 3′ end.

A Bpu-like sequence with a 1-bp insertion and 89% similarity to the dominant type in Cycas mtDNA was retrieved from the coding sequence of the mitochondrial rps11 gene from the core eudicot Weigela hortensis. Alignment of this Bpu-like sequence reveals that its predicted Bpu10I endonuclease recognition site differs from that of Cycas by 2 bp (gCTGAGt). Because this Bpu element-like sequence does not interrupt the reading frame of rps11, and because the mitochondrial rps11 gene of Weigela lacks a target duplication site, we do not consider that this Bpu-like element shares a common origin with the Bpu sequences of Cycas. We further postulate that Bpu elements are likely absent from, or very rare in, angiosperms.

Surprisingly, the cpDNA of Cycas also contains 2 Bpu elements, located in the petN-psbM and psbA-trnK spacers, respectively (see Wu et al. 2007). Additionally, we identified 1 Bpu sequence in the atpB-rbcL spacer of the cpDNA from Pinus luchuensis (GenBank accession number DQ196799). However, no Bpu sequence has been detected in the cpDNAs of the 3 other Pinaceae genera and 3 gnetophyte orders we have sequenced to date (Chaw SM, Wu CS, Lai YT, Wang YN, Lin CP, Liu SM, unpublished data). Collectively, the available evidence inclines us to believe that the Bpu sequences have proliferated specifically in the mtDNA of Cycas (or the Cycadaceae). The sporadic occurrence of Bpu elements in the cpDNAs of Cycas and Pinus suggests that they are likely derived from nonhomologous recombination with DNA fragments that leaked out of mitochondria in the former case and via lateral transfer in the latter case.

The Tai Sequence and Its Association with Bpu Sequences

The second intron of the rps3 gene (rps3i2) is only found in the mtDNAs of Cycas (Regina et al. 2005), including those studied in the present work. Regina et al. (2005) suggested that rps3i2 is a group II intron that was independently gained in the gymnosperms, likely at or just after the divergence of the angiosperms. Moreover, the authors reported a high similarity between a partial segment of the Cycas rps3i2 and orf760 of the Chara mtDNA (Turmel et al. 2003). Orf760 harbors functional domains for a maturase and a reverse transcriptase, prompting Regina et al. (2005) to propose that the Cycas rps3i2 gene originally encoded a maturase and a reverse transcriptase but evolved over time into a partially degenerated ORF. Here, we further report that rps3i2 of the Cycas mtDNA contains a 900-bp fragment comprising an array of 4 Bpu elements, with the Bpu elements on each end lacking 1 terminal repeat, followed by a 440-bp fragment (designated as "Tai" sequence), a Worf760 sequence, and 1 perfect Bpu sequence (fig. 5A). Most intriguingly, additional Tai sequences are scattered throughout the Cycas mtDNA; they occur more densely in longer spacers than in shorter ones (fig. 5B) and are generally found in close association with repeated Bpu sequences. These findings lend extra and robust support to the proposal of Regina et al. (2005).

In conclusion, we hypothesize that a Tai sequence and an orf760 flanked by a Bpu sequence at each end most likely constitute an ancestral retrotransposon. This hypothesis is founded on 2 observations: 1) Tai sequences are highly associated with Bpu elements and 2) small segments of Tai sequences are vigorously and diversely rearranged, deleted, or truncated, as illustrated in the 3 examples shown in figure 5B. These observations also suggest that an ancestral retrotransposon in Cycas mtDNA spread very actively, presumably after Cycas branched off from the 10 other extant cycad genera. Afterward, the offspring or duplicates of the ancestral transposon may have gradually lost their mobility through varying degrees of deletions/truncations from the 3′ region, which encoded both maturase and reverse transcriptase functions.

We further speculate that even after the ancestral retrotransposon lost its transposon function, the remaining Bpu elements might have retained the ability to proliferate or amplify via unequal crossing-over or slipped-strand mispairing because their 2 direct terminal repeats can pair with each other's complementary strands. Future studies and sequencing of additional mtDNAs from basal Cycas will be required to confirm this hypothesis and may provide additional insight into the evolutionary origins of Bpu sequences and the molecular mechanisms underlying their amplification.

The Fates of CpDNA-Derived Sequences (mtpts) in Cycas mtDNA

Table 2 shows a comparative analysis of mtpts amongthe 11 studied plant mtDNAs The total percentage of mtptsin Cycas (44) is relatively high compared with that indicots (11ndash36) but falls within the range seen amongmonocots (30ndash63) We previously discovered that thefrequency of mtpt transfer is positively correlated with var-iations in mtDNA size (coefficient value r2 5 047) (Wanget al 2007) Here we report that the Cycas mtDNA con-tains 8 protein-coding genes as well as 2 rRNA and 5 tRNAgene sequences originating from the cpDNA (fig 1) How-ever frameshifts and indels within the protein-coding genessuggest that these Cycas mtpts have degenerated and arenonfunctional In contrast the 5 cpDNA-derived tRNAsin the Cycas mtDNA are able to fold into standard clover-leaf structures as shown by tRNAscan-SE analysis (Loweand Eddy 1997) and are thus likely to be functional Pre-viously Sugiyama et al (2005) used tRNAscan-SE to scan

Cycas, Oryza, Arabidopsis, and Nicotiana. Note that 2 Bpu sequence insertions are detected in Cycas but not in the other plants. (E) Partial alignment of mtpt-rbcL sequences extracted from Oryza, Arabidopsis, Nicotiana, and Cycas. The sequence from Cycas contains a Bpu-like sequence with a 5-bp insertion (shaded box) versus the dominant type. (F) Partial alignment of the mtpt-atpE and chloroplast atpE sequences of Cycas. Two reversely complementary Bpu sequences are found in the mtDNA. Note that the 5′ Bpu sequence is partly degenerated at its 3′ end.


the tobacco mtDNA and concluded that the 6 cp-derived tRNAs shared by angiosperms are functional. Because no mtpt was observed in Chara, Marchantia, or Physcomitrella, Wang et al. (2007) concluded that frequent DNA transfer from cpDNA to mtDNA has taken place no later than in the common ancestor of seed plants, approximately 300 MYA.
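As a rough arithmetic check on the mtpt percentages quoted from table 2, the sketch below divides the cp-derived length by the genome size for a subset of taxa whose values are listed in that table. The dictionary and function names are illustrative assumptions and are not part of the original analysis.

```python
# Genome sizes and cp-derived sequence lengths (bp), copied from table 2
# for a subset of taxa. Names below are illustrative, not from the paper.
GENOME_BP = {
    "Cycas": 414903,
    "Triticum": 452528,
    "Zea": 569630,
    "Brassica": 221853,
    "Arabidopsis": 366924,
}
MTPT_BP = {
    "Cycas": 18113,
    "Triticum": 13455,
    "Zea": 25132,
    "Brassica": 7950,
    "Arabidopsis": 3958,
}


def mtpt_percent(taxon: str) -> float:
    """Return cp-derived sequence length as a percentage of the mtDNA size."""
    return 100.0 * MTPT_BP[taxon] / GENOME_BP[taxon]


if __name__ == "__main__":
    for taxon in GENOME_BP:
        print(f"{taxon}: {mtpt_percent(taxon):.1f}%")
```

Rounded to one decimal place, these ratios reproduce the percentages reported in table 2 for the taxa included here.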

Abundant RNA Editing Sites in Cycas mtDNA

Among the land plant DNAs, Cycas has the most predicted RNA editing sites (1,084 sites; supplementary table S4 [Supplementary Material online]). It is commonly believed that RNA editing arose together with the first terrestrial plants (Steinhauser et al. 1999). With the cutoff score set to 0.6, the PREP-mt software (Mower 2005) predicted 1,084 C-to-U RNA editing sites within the protein-coding genes of the Cycas mtDNA. This is more than double the number of predicted sites within the elucidated mtDNAs of other land plants (table 2).

If the cutoff score, which indicates the degree of conservation of each editing site compared with those found in the other published plant mtDNAs, is set to the most stringent criterion in PREP-mt (i.e., = 1), the number of editing sites decreases to 738, which is still the largest number found among the land plant mtDNAs elucidated to date.
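The effect of the cutoff can be pictured as a simple threshold on per-site scores. The sketch below is only an illustration of that thresholding step: the scores are made-up toy values, the function name is hypothetical, and the way PREP-mt itself computes its scores is not reproduced here.

```python
from typing import Iterable


def count_sites(scores: Iterable[float], cutoff: float) -> int:
    """Count candidate editing sites whose score is at least `cutoff`."""
    return sum(1 for score in scores if score >= cutoff)


if __name__ == "__main__":
    # Toy scores for six hypothetical candidate sites (not real data).
    toy_scores = [1.0, 0.8, 0.6, 0.55, 1.0, 0.72]
    print(count_sites(toy_scores, 0.6))  # permissive cutoff
    print(count_sites(toy_scores, 1.0))  # most stringent cutoff
```

Raising the cutoff can only keep or reduce the number of retained sites, which is the behavior described above for the change from 0.6 to 1.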

It is believed that RNA editing is essential for functional protein expression, as it is required to modify amino acids or generate new start or stop codons (Hoch et al. 1991; Wintz and Hanson 1991; Kotera et al. 2005; Shikanai 2006). For this reason, the large number of RNA editing sites in the Cycas mtDNA may indicate higher complexity at the DNA level and the formation of various transcripts through RNA editing, thus potentially reflecting rapid divergence in Cycas.

Although RNA editing sites are known to be sporadically distributed in the genomes of plant organelles/mitochondria from seed plants, the mechanisms underlying this distribution pattern are not yet known (Shikanai 2006;

FIG. 5. Genomic organization of the second rps3 intron (rps3i2) and association of Tai sequences with Bpu sequences. (A) Organization of rps3i2. The segment between Ψorf760 and the 4 tandem Bpu sequences is designated as a Tai sequence (gray). Arrowheads indicate a Bpu sequence lacking the 2 terminal repeats (banded bars) and its complementary sequence. (B) Upper: association of Tai sequences with Bpu tandem repeats across the genome (abscissa) and their respective lengths (ordinate). Lower: examples of 3 Tai variants (boxed), showing that the sequences are variously degenerate and truncated as compared with the typical Tai shown in (A) (broken line indicates mismatching sequences). Thin arrows indicate the orientations of homologous segments between Tai variants and the typical Tai.


Mulligan et al. 2007). Supplementary table S4 (Supplementary Material online) shows that, within the Cycas mtDNA, gene complex I (which includes 9 nad genes) is the most extensively edited gene category. The nad4 and nad5 genes show the highest and second-highest numbers of editing sites, followed by the cox1 gene of complex IV. The least edited gene is rps10, which contains only 6 predicted sites even when the cutoff score is set to 0.6. Mulligan et al. (2007) interpreted the sporadic pattern as a result of lineage-specific loss of editing sites through retroconversion, which could remove adjacent editing sites by replacement with the edited sequences. The abundant RNA editing sites in the Cycas mtDNA, however, are mysterious and completely deviate from the sporadic patterns reported in other seed plant mtDNAs. Therefore, the mechanism of RNA editing in Cycas mtDNA may have a different evolutionary history from those of other seed plants.

Previous studies (Steinhauser et al. 1999; Lurin et al. 2004; Shikanai 2006) led to the proposal that variation or multiplication of trans-acting factors (e.g., members of the pentatricopeptide repeat family) may allow rapid increases in the number of editing sites. This could possibly be correlated with the lineage-specific explosion of RNA editing sites observed in Cycas mtDNA. Furthermore, Handa's (2003) conclusion that the evolutionary speed in RNA editing is higher at the level of gene regulation than at the primary gene sequence level appears to provide additional support for our speculation. To improve understanding of the mechanisms involved in the abundant RNA editing sites in Cycas mtDNA, critical analysis of mtDNAs from ferns, conifers, and other cycads will be required.

In all plant lineages, RNA editing often alters the amino acid sequence (Adams and Palmer 2003). Supplementary table S5 (Supplementary Material online) summarizes the number of predicted preediting codons and potential edited codons in the Cycas mtDNA. The 2 most frequently edited codons are TCA (Ser) and TCT (Ser), which are predicted to be edited into TTA (Leu) and TTT (Phe), respectively. The least frequently edited codon is CAA (Gln), which would putatively be edited in only 2 cases, wherein the codon's first position would be altered to yield the stop codon TAA. Supplementary table S6 (Supplementary Material online) further depicts the formation of 4 start codons (from ACG to ATG) and 7 stop codons (from CGA to TGA) through RNA editing. To verify the accuracy of the editing sites predicted using PREP-mt, partial transcripts of the cox1, atp1, atp6, and ccmB genes were experimentally assayed using RT-PCR. Comparison of our RT-PCR sequences with the editing sites predicted by PREP-mt indicated accuracy values of 97.66%, 99.13%, 97.91%, and 99.25% for cox1, atp1, atp6, and ccmB, respectively.
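The codon changes named above can be checked with a few lines of code. The sketch below writes codons in the DNA alphabet, so that a C-to-U edit in the transcript appears as C to T, and uses only the subset of the standard genetic code needed for these examples. The function and table names are illustrative assumptions.

```python
# Subset of the standard genetic code covering the codons discussed in the text.
CODON_TABLE = {
    "TCA": "Ser", "TTA": "Leu",
    "TCT": "Ser", "TTT": "Phe",
    "CAA": "Gln", "TAA": "Stop",
    "ACG": "Thr", "ATG": "Met",
    "CGA": "Arg", "TGA": "Stop",
}


def edit_c_to_u(codon: str, position: int) -> str:
    """Apply a C-to-U edit (written as C-to-T) at a 0-based codon position."""
    if codon[position] != "C":
        raise ValueError(f"no C at position {position} of {codon}")
    return codon[:position] + "T" + codon[position + 1:]


def describe_edit(codon: str, position: int) -> str:
    """Summarize the codon and amino acid change caused by one edit."""
    edited = edit_c_to_u(codon, position)
    return f"{codon} ({CODON_TABLE[codon]}) -> {edited} ({CODON_TABLE[edited]})"


if __name__ == "__main__":
    for codon, position in [("TCA", 1), ("TCT", 1), ("CAA", 0), ("ACG", 1), ("CGA", 0)]:
        print(describe_edit(codon, position))
```

Edits at the second codon position change the encoded amino acid (Ser to Leu or Phe, Thr to Met), whereas the first-position edits shown here create stop codons.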

Our RT-PCR analysis additionally identified a U-to-C editing type in atp1; this was missed in the prediction because detection of the U-to-C editing type is limited when using PREP-mt (Mower 2005). Unfortunately, programs for predicting RNA editing sites in non-protein-coding genes, such as ribosomal RNA and tRNA genes, and for predicting silent editing sites (which alter the mRNA but do not change the translated amino acid) are not yet available. We are currently examining these additional editing types and the locations of such sites in the Cycas mtDNA.

No Large Repeats in Gymnosperms but Some in Angiosperms

In the mtDNAs of angiosperms, long repeated sequences vary in size from 2,427 bp in rapeseed to 127,600 bp in rice, and they show no homology to each other, implying that repeated sequences were independently acquired during the evolution from Marchantia to angiosperms (Sugiyama et al. 2005). In Cycas mtDNA, we only detected a few long repeated sequences, all less than 1.5 kb in length and mostly composed of Bpu sequences. It has been demonstrated that genes are continuously transferred from the nuclear genome and cpDNA to mtDNA (Knoop 2004). However, the mechanisms underlying the emergence of large repeats are not yet fully understood.

Conclusions

The complete Cycas mtDNA shows a number of unprecedented features that are atypical of the mtDNAs previously elucidated for other land plants, including the lowest A + T content, the highest proportion of tandem repeat sequences (mainly Bpu sequences) and gene number found so far, abundant RNA editing sites, and an exceptionally elevated Ka value for the protein-coding genes. The latter 2 features might be correlated. In comparison with the known angiosperm organelle genomes, the cpDNA and mtDNA of Cycas have experienced the least gene loss. Peculiarly, the cpDNA of Cycas has the fewest RNA editing sites among land plants (Wu et al. 2007), whereas its mtDNA has the most. On the other hand, some characteristics are shared by the mtDNAs of Cycas and angiosperms but not bryophytes; these include common gene transfers from cpDNA, similar ratios of noncoding sequences, lack of group I introns, relatively low numbers of tRNAs, and concurrence of trans-spliced group II introns. Nevertheless, the above-mentioned Cycas-specific features suggest that the Cycas mtDNA has a lineage-specific evolutionary history.

A novel family of SIMEs, designated as Bpu elements/sequences, represents another unique feature of the Cycadaceae lineage. Bpu elements are widely distributed throughout the noncoding regions of Cycas mtDNA (over 500 copies), with only 1 occurrence in a coding region (rrn18). In the Cycas mtDNA, the highly conserved nature of Bpu sequences and their association with Tai sequences, as well as with Ψorf760 (located in rps3i2), suggest that these sequences may have collectively originated from an ancestral retrotransposon. Further investigation into the origin, proliferation, and evolution of Bpu sequences will be desirable to help clarify the peculiar features of Cycas mtDNA.

Supplementary Material

Supplementary tables S1–S7 and figure 1 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

This work was supported by joint research grants from the Research Center for Biodiversity, Academia Sinica, to S.-M.C. and D.W., and partially by a National Science Council grant to S.-M.C. (94-2311-B001-059) and by a grant from the Institute of Information Science, Academia Sinica, to A.C.-C.S. We thank Chiao-Lei Cheng for sequencing the RT-PCR products. We are grateful to the 3 anonymous reviewers who provided critical comments and valuable suggestions.

Literature Cited

Adams KL, Palmer JD. 2003. Evolution of mitochondrial gene content: gene loss and transfer to the nucleus. Mol Phylogenet Evol. 29:380–395.

Adams KL, Qiu YL, Stoutemyer M, Palmer JD. 2002. Punctuated evolution of mitochondrial gene content: high and variable rates of mitochondrial gene loss and transfer to the nucleus during angiosperm evolution. Proc Natl Acad Sci USA. 99:9905–9912.

Benson G. 1999. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27:573–580.

Bergthorsson U, Richardson AO, Young GJ, Goertzen LR, Palmer JD. 2004. Massive horizontal transfer of mitochondrial genes from diverse land plant donors to the basal angiosperm Amborella. Proc Natl Acad Sci USA. 101:17747–17752.

Chaw SM, Chang CC, Chen HL, Li WH. 2004. Dating the monocot-dicot divergence and the origin of core eudicots using whole chloroplast genomes. J Mol Evol. 58:424–441.

Chaw SM, Walters TW, Chang CC, Hu SH, Chen SH. 2005. A phylogeny of cycads (Cycadales) inferred from chloroplast matK gene, trnK intron, and nuclear rDNA ITS region. Mol Phylogenet Evol. 37:214–234.

Chaw SM, Zharkikh A, Sung HM, Lau TC, Li WH. 1997. Molecular phylogeny of extant gymnosperms and seed plant evolution: analysis of nuclear 18S rRNA sequences. Mol Biol Evol. 14:56–68.

Cho Y, Qiu YL, Kuhlman P, Palmer JD. 1998. Explosive invasion of plant mitochondria by a group I intron. Proc Natl Acad Sci USA. 95:14244–14249.

Clifton SW, Minx P, Fauron CM, et al. (13 co-authors). 2004. Sequence and comparative analysis of the maize NB mitochondrial genome. Plant Physiol. 136:3486–3503.

Feschotte S, Zhang X, Wessler SR. 2002. Miniature inverted-repeat transposable elements and their relationship to established DNA transposons. Washington (DC): ASM Press.

Groth-Malonek M, Pruchner D, Grewe F, Knoop V. 2005. Ancestors of trans-splicing mitochondrial introns support serial sister group relationships of hornworts and mosses with vascular plants. Mol Biol Evol. 22:117–125.

Handa H. 2003. The complete nucleotide sequence and RNA editing content of the mitochondrial genome of rapeseed (Brassica napus L.): comparative analysis of the mitochondrial genomes of rapeseed and Arabidopsis thaliana. Nucleic Acids Res. 31:5907–5916.

Hoch B, Maier RM, Appel K, Igloi GL, Kossel H. 1991. Editing of a chloroplast mRNA by creation of an initiation codon. Nature. 353:178–180.

Jones DL. 2002. Cycads of the World: ancient plant in today's landscape. 2nd ed. Sydney (Australia): Reed New Holland.

Kadowaki K, Kubo N, Ozawa K, Hirai A. 1996. Targeting presequence acquisition after mitochondrial gene transfer to the nucleus occurs by duplication of existing targeting signals. EMBO J. 15:6652–6661.

Karol KG, McCourt RM, Cimino MT, Delwiche CF. 2001. The closest living relatives of land plants. Science. 294:2351–2353.

Knoop V. 2004. The mitochondrial DNA of land plants: peculiarities in phylogenetic perspective. Curr Genet. 46:123–139.

Kotera E, Tasaka M, Shikanai T. 2005. A pentatricopeptide repeat protein is essential for RNA editing in chloroplasts. Nature. 433:326–330.

Kubo T, Nishizawa S, Sugawara A, Itchoda N, Estiati A, Mikami T. 2000. The complete nucleotide sequence of the mitochondrial genome of sugar beet (Beta vulgaris L.) reveals a novel gene for tRNA(Cys)(GCA). Nucleic Acids Res. 28:2571–2576.

Kumar S, Tamura K, Nei M. 2004. MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform. 5:150–163.

Kurtz S, Choudhuri JV, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R. 2001. REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 29:4633–4642.

Lowe TM, Eddy SR. 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25:955–964.

Lurin C, Andres C, Aubourg S, et al. (19 co-authors). 2004. Genome-wide analysis of Arabidopsis pentatricopeptide repeat proteins reveals their essential role in organelle biogenesis. Plant Cell. 16:2089–2103.

Malek O, Brennicke A, Knoop V. 1997. Evolution of trans-splicing plant mitochondrial introns in pre-Permian times. Proc Natl Acad Sci USA. 94:553–558.

Malek O, Knoop V. 1998. Trans-splicing group II introns in plant mitochondria: the complete set of cis-arranged homologs in ferns, fern allies, and a hornwort. RNA. 4:1599–1609.

Michel F, Ferat JL. 1995. Structure and activities of group II introns. Annu Rev Biochem. 64:435–461.

Mower JP. 2005. PREP-Mt: predictive RNA editor for plant mitochondrial genes. BMC Bioinformatics. 6:96.

Mulligan RM, Chang KL, Chou CC. 2007. Computational analysis of RNA editing sites in plant mitochondrial genomes reveals similar information content and a sporadic distribution of editing sites. Mol Biol Evol. 24:1971–1981.

Notsu Y, Masood S, Nishikawa T, Kubo N, Akiduki G, Nakazono M, Hirai A, Kadowaki K. 2002. The complete sequence of the rice (Oryza sativa L.) mitochondrial genome: frequent DNA sequence acquisition and loss during the evolution of flowering plants. Mol Genet Genomics. 268:434–445.

Oda K, Kohchi T, Ohyama K. 1992. Mitochondrial DNA of Marchantia polymorpha as a single circular form with no incorporation of foreign DNA. Biosci Biotechnol Biochem. 56:132–135.

Ogihara Y, Yamazaki Y, Murai K, et al. (14 co-authors). 2005. Structural dynamics of cereal mitochondrial genomes as revealed by complete nucleotide sequencing of the wheat mitochondrial genome. Nucleic Acids Res. 33:6235–6250.

Qiu YL, Li L, Wang B, et al. (13 co-authors). 2006. The deepest divergences in land plants inferred from phylogenomic evidence. Proc Natl Acad Sci USA. 103:15511–15516.

Regina TM, Picardi E, Lopez L, Pesole G, Quagliariello C. 2005. A novel additional group II intron distinguishes the mitochondrial rps3 gene in gymnosperms. J Mol Evol. 60:196–206.

Selosse M, Albert B, Godelle B. 2001. Reducing the genome size of organelles favours gene transfer to the nucleus. Trends Ecol Evol. 16:135–141.

Shikanai T. 2006. RNA editing in plant organelles: machinery, physiological function and evolution. Cell Mol Life Sci. 63:698–708.

Steinhauser S, Beckert S, Capesius I, Malek O, Knoop V. 1999. Plant mitochondrial RNA editing. J Mol Evol. 48:303–312.

Stevenson DW. 1990. Morphology and systematics of the Cycadales. Mem N Y Bot Gard. 57:8–15.


Stewart CN Jr, Via LE. 1993. A rapid CTAB DNA isolation technique useful for RAPD fingerprinting and other PCR applications. Biotechniques. 14:748–750.

Sugiyama Y, Watase Y, Nagase M, Makita N, Yagura S, Hirai A, Sugiura M. 2005. The complete nucleotide sequence and multipartite organization of the tobacco mitochondrial genome: comparative analysis of mitochondrial genomes in higher plants. Mol Genet Genomics. 272:603–615.

Terasawa K, Odahara M, Kabeya Y, Kikugawa T, Sekine Y, Fujiwara M, Sato N. 2007. The mitochondrial genome of the moss Physcomitrella patens sheds new light on mitochondrial evolution in land plants. Mol Biol Evol. 24:699–709.

Turmel M, Otis C, Lemieux C. 2003. The mitochondrial genome of Chara vulgaris: insights into the mitochondrial DNA architecture of the last common ancestor of green algae and land plants. Plant Cell. 15:1888–1903.

Unseld M, Marienfeld JR, Brandt P, Brennicke A. 1997. The mitochondrial genome of Arabidopsis thaliana contains 57 genes in 366,924 nucleotides. Nat Genet. 15:57–61.

Wang D, Wu YW, Chun-Chieh Shih A, Wu CS, Wang YN, Chaw SM. 2007. Transfer of chloroplast genomic DNA to mitochondrial genome occurred at least 300 MYA. Mol Biol Evol. 24:2040–2048.

Wasserman WW, Sandelin A. 2004. Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet. 5:276–287.

Wintz H, Hanson MR. 1991. A termination codon is created by RNA editing in the petunia atp9 transcript. Curr Genet. 19:61–64.

Wu CS, Wang YN, Liu SM, Chaw SM. 2007. Chloroplast genome (cpDNA) of Cycas taitungensis and 56 cp protein-coding genes of Gnetum parvifolium: insights into cpDNA evolution and phylogeny of extant seed plants. Mol Biol Evol. 24:1366–1379.

William Martin, Associate Editor

Accepted January 6, 2008


Table 2. Comparison of Features among 11 Elucidated Land Plant Mitochondrial Genomes

| Features | Chara | Marchantia | Physcomitrella | Cycas | Triticum | Oryza | Zea | Beta | Nicotiana | Brassica | Arabidopsis |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Size (bp) | 67,737 | 186,609 | 105,340 | 414,903 | 452,528 | 490,520 | 569,630 | 368,799 | 430,597 | 221,853 | 366,924 |
| A + T content (%) | 59.1 | 57.6 | 59.4 | 53.1 | 55.7 | 56.2 | 56.1 | 56.1 | 55.0 | 54.8 | 55.2 |
| Gene number [a] (protein-coding/tRNA/rRNA), duplicates counted once | 39/26/3 | 41/27/3 | 39/24/3 | 39/22/3 | 33/17/3 | 35/18/3 | 32/17/3 [b] | 29/18/3 | 36/21/3 | 33/17/3 | 32/17/3 |
| Gene number [a] (protein-coding/tRNA/rRNA), all duplicates included | {39/26/3} | {41/29/3} | {39/24/3} | {39/26/3} | {35/25/8} | {40/27/6} | {34/22/5} [b] | {29/22/5} | {39/22/4} | {34/17/3} | {33/22/3} |
| Coding sequences (%) [a] | 90.7 | 20.3 | 37.0 | 10.1 | 8.6 | 11.1 | 6.2 | 10.3 | 9.9 | 17.3 | 10.6 |
| Repeat sequences (%) [c] | 3.2 | 10.1 | 11.6 | 15.1 | 10.1 | 28.8 | 11.4 | 12.5 | 10.8 | 5.5 | 7.8 |
| Tandem repeat sequences (%) [d] | 0.22 | 0.49 | 0.14 | 4.97 | 0.42 | 0.09 | 0.35 | 0.71 | 0.05 | 0.36 | 0.34 |
| Introns: group I | 14 | 7 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Introns: group II, cis-spliced | 13 | 25 | 25 | 20 | 17 | 17 | 15 | 14 | 17 | 18 | 18 |
| Introns: group II, trans-spliced | 0 | 0 | 0 | 5 | 6 | 6 | 7 | 6 | 6 | 5 | 5 |
| Cp-derived sequences (bp) | NA [e] | NA [e] | NA [e] | 18,113 | 13,455 | 22,593 | 25,132 | NA [e] | 9,942 | 7,950 | 3,958 |
| Cp-derived sequences (%) | NA [e] | NA [e] | NA [e] | 4.4 | 3.0 | 6.3 | 4.4 | 2.1 | 2.5 | 3.6 | 1.1 |
| RNA editing sites | 0 | 0 | NA [e] | 1,084 [f] | NA [e] | 491 | NA [e] | 370 [g] | NA [e] | 427 | 441 |

[a] Pseudogenes and unique ORFs (such as ORF222 reported in Handa 2003) were excluded. Upper: duplicate genes were counted only once. Lower, within curly brackets: all duplicate genes were included.
[b] The plasmid-localized tRNA-Trp was included (Clifton et al. 2004).
[c] The REPuter program (http://bibiserv.techfak.uni-bielefeld.de/reputer/) was used to obtain the estimates. All duplicate copies of repeats are included. Overlapped sequences were counted only once.
[d] The tandem repeats finder program (Benson 1999) was used to obtain the estimates.
[e] Data were not available from original papers.
[f] The PREP-mt program (Mower 2005) was used to obtain the estimates. The cutoff value of the reported score was set to 0.6.
[g] 370 C-to-U editing sites were identified in 28 gene/ORF transcripts of sugar beet (Kubo et al. 2000).


identified to date among the studied seed plant mtDNAs (fig. 3B). When the distributions of conserved genes in the mtDNA from the 11 sampled species (supplementary table S2, Supplementary Material online) are mapped to their respective branches in a maximum parsimony tree (fig. 3B, based on the data used in fig. 3A), it is then possible to estimate the time of loss of a particular gene. As shown in figure 3B, our analysis indicates that there have been at least 31 independent events of gene loss from all the land plant mtDNAs elucidated to date.

When a gene is missing from the mtDNA of a given species, it is generally believed that the original copy has been transferred to the nucleus, where it functions through cytosolic protein synthesis followed by transit peptide–assisted import back to the mitochondria (Adams and Palmer 2003; Knoop 2004). Frequent gene losses, especially of the ribosomal protein genes (30 of 34), appear to have occurred after the divergence of the angiosperm lineages approximately 150 MYA (Chaw et al. 2004). However, only 1 gene loss was observed in the Cycas lineage after it branched off from the common ancestor of angiosperms approximately 300 MYA. Adams and Palmer (2003) suggested that angiosperm mtDNAs have experienced a recent evolutionary surge of loss and/or transfer of genes (primarily those encoding ribosomal proteins) to the nucleus. Our data give additional support to their contention but suggest that the mtDNA of Cycas has been excluded from this surge. Our results suggest that the Cycas mtDNA tends to maintain its gene diversity over evolutionary time and/or undergoes less gene transfer than angiosperm mtDNAs (Selosse et al. 2001; Adams et al. 2002; Adams and Palmer 2003). Coincidentally, among the some 40 published cpDNAs of seed plants, that of Cycas has also undergone the least gene loss (Wu et al. 2007; supplementary fig. 1 [Supplementary Material online]). Future work will be required to examine why the genomes of these 2 Cycas organelles appear to have frozen after the cycads' divergence from angiosperms, especially in comparison to mtDNAs from other major gymnosperm clades such as Ginkgo, gnetophytes, and pines.

A Novel Family of Short Interspersed Mitochondrial Elements

Numerous Short Interspersed Mitochondrial Elements, Termed Bpu Sequences, Are Present in the Cycas mtDNA

Sequence analysis revealed that numerous copies of a 36-nt repeat, herein designated as a "Bpu sequence/element," are interspersed throughout the Cycas mtDNA (fig. 1B). Figure 4A shows the characteristic sequences of 500 Bpu elements having 0 to 4 mismatches. If up to a 7-nt mismatch to the dominant type is allowed, the total copy number of Bpu sequences increases to 512. The Bpu sequences feature 2 conserved terminal direct repeats (AAGG) and the recognition sequence for the restriction endonuclease Bpu10I (CCTGAAGC; nt 15–21). These repeat elements/sequences do not appear to have coding potential.
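Copy counts that depend on an allowed number of mismatches can be illustrated with a sliding-window Hamming-distance scan. The sketch below is a minimal illustration under stated assumptions: the 16-nt consensus is a shortened placeholder that merely carries AAGG at both ends and is not the real 36-nt Bpu sequence, the toy genome is invented, and the function names are hypothetical. Insertions and deletions, such as the 5-bp insertion variant shown in figure 4E, are not handled by this approach.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_hits(genome: str, consensus: str, max_mismatches: int) -> List[int]:
    """Return 0-based start positions of windows within the mismatch limit."""
    width = len(consensus)
    return [
        start
        for start in range(len(genome) - width + 1)
        if hamming(genome[start:start + width], consensus) <= max_mismatches
    ]


if __name__ == "__main__":
    # Placeholder consensus (NOT the real Bpu sequence): AAGG at both ends.
    consensus = "AAGGCCTGAAGCAAGG"
    variant = "AAGGCCTGATGCAAGG"  # one substitution relative to the placeholder
    genome = "TTTT" + consensus + "CCCCCC" + variant + "TTTT"
    print(find_hits(genome, consensus, 0))
    print(find_hits(genome, consensus, 1))
```

Relaxing the mismatch limit can only add hits, which mirrors the increase from 500 to 512 copies when the tolerance is raised from 4 to 7 nt.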

Because Bpu elements are extremely short in their lengths and terminal repeats, and contain 2 terminal direct repeats rather than the inverted repeats found in plant miniature inverted-repeat transposable elements (MITEs), they are here classified as short interspersed mitochondrial elements (SIMEs). This distinguishes them from the MITEs and short interspersed nuclear elements (SINEs; e.g., Alu DNA repeats in the genomes of primates) that have been extensively reported from the genomes of plants and animals (Feschotte et al. 2002). Based on the similarity between SIMEs and SINEs, however, we can hypothesize that Bpu elements are likely to be transposed through RNA without a requirement for reverse transcriptase.

The Bpu10I recognition site characteristic of the Bpu elements is highly conserved except at the very last base pair, that is, the 21st bp of the Bpu sequence (fig. 4A, lower chart). In contrast, the 5′ terminal direct repeat and its downstream 6 nt (nt 5–9) tend to be more variable than the 3′ terminal direct repeat. Enigmatically, the elements can form a secondary structure (fig. 4B) with a predicted free energy of -12.5 kcal/mole, as calculated by the MFOLD program (http://frontend.bioinfo.rpi.edu/zukerm/comments/FAQs.html). The significance of this secondary structure remains to be elucidated. Comparison of the Bpu

FIG. 2. Comparison of copy numbers among various tandem repeats in the mtDNAs of Cycas and 10 other land plants. (A) Percentage of total genome comprised of tandem repeat sequences. (B) The copy numbers of tandem repeat sequences in each genome (x axis) are separated into 5 groups (y axis), and their histograms (z axis) are shown. The single-copy Bpu sequences of Cycas taitungensis described in figure 1 are not included in the counts. The copy numbers may not be integers because the boundaries of tandem repeats were determined by a probabilistic method described by Benson (1999).


element insertion sites in the Cycas mtDNA with corresponding sites in the mtDNAs of other species reveals that the target sites for transpositions of Bpu elements are identical (within 1–2 mismatched base pairs) to the 5′ terminal direct repeat (fig. 4C–F).

Bpu Sequences Distinguish Cycas from Other Cycads and Seed Plants

Bpu sequences are present exclusively in the noncoding regions of the Cycas mtDNA, that is, within the introns and intergenic spacers, with 1 exception: a Bpu sequence is present within the coding region of rrn18, which codes for the 18S rRNA (fig. 4C). Among the many mitochondrial rrn18 (mt-rrn18) genes available in GenBank, only those of Cycas (including Cycas revoluta, GenBank accession number AB029356) and Ginkgo biloba (the only living species of the order Ginkgoales) have Bpu elements (fig. 4C). The Bpu insertion sites of the 2 Cycas taxa are orthologous (data not shown), whereas that of Ginkgo is different, indicating that the insertions of Bpu elements in rrn18 did not occur in

the common ancestor of cycads and ginkgo. In sequenced mtDNAs, the oldest cpDNA-derived sequence (also termed mtpt) cluster, trnV(uac)-trnM(cau)-atpE-atpB-rbcL, is reported to have existed since the common ancestor of seed plants (Wang et al. 2007, fig. 3). Bpu elements are present in the Cycas mtpt-atpB (fig. 4D) and mtpt-rbcL (fig. 4E) sequences but not in the corresponding sequences of the 7 angiosperm mtDNAs elucidated to date (Wang et al. 2007), suggesting that these Bpu insertions took place after the split of the Cycas lineage from the other seed plants.

We further examined the occurrence of Bpu elements in 1 of the other 2 cycad families, Zamiaceae (including 1 species from each of Dioon, Macrozamia, and Zamia), by sequencing exons 1 and 2 of their nad2 genes. Whereas the nad2i1 gene of Cycas mtDNA contains 5 Bpu elements, such elements are absent from the nad2 genes of the 3 sampled Zamiaceae genera. However, we found 1, 2, and 6 Bpu elements in mitochondrial nad5i1 of Cycas panzhihuaensis (GenBank accession number AF43425) and in mitochondrial nad1i1 and cp rps19-rpl16 spacers/intron of C. revoluta (GenBank accession number

FIG. 3. Phylogenies inferred from concatenated data from 22 protein-coding genes (see supplementary table S2, Supplementary Material online) common to the sequenced mtDNAs from 10 land plants and Chara (the outgroup). Nodes received 100% bootstrap support unless indicated. Branch lengths are drawn to scale except as otherwise noted. (A) Two superimposed Neighbor-Joining trees based on the Ka and Ks values, respectively. (B) Scenarios of gene losses (open bar) and splits (hatch triangle) along the single maximum parsimony tree (9,857 steps). The total numbers of protein-coding genes in each mtDNA species are given within parentheses.


FIG. 4. Bpu sequences and examples of their insertion loci. Target sites and terminal repeats are underlined. (A) Upper: a dominant sequence based on 500 Bpu sequences (see text). Numbers along the abscissa indicate each nucleotide position within the prototype Bpu sequence. Recognition sites for the Bpu10I restriction enzyme are marked with asterisks. The ordinate scales each nucleotide by the total bits of information multiplied by its relative occurrence at that position (Wasserman and Sandelin 2004). Lower: identities of Bpu sequences that differ from the dominant type by 0–4 bp. (B) Secondary structure of the dominant Bpu sequence. (C)–(F) Lowercase letters, periods, and dashes denote mismatches, identical bps, and deletions, respectively, compared with the uppermost sequence. (C) Partial alignment of mt-rrn18 sequences from Cycas and Ginkgo. A Bpu sequence and a reversely complementary Bpu sequence are present in Ginkgo and Cycas, respectively. (D) Partial alignment of mtpt-atpB sequences extracted from


AY354955, AY345867), respectively. These data seem to suggest that no Bpu sequence has successfully invaded the mtDNAs of Cycadales genera other than Cycas.

A Bpu-like sequence with a 1-bp insertion and 89% similarity to the dominant type in Cycas mtDNA was retrieved from the coding sequence of the mitochondrial rps11 gene from the core eudicot Weigela hortensis. Alignment of this Bpu-like sequence reveals that its predicted Bpu10I endonuclease recognition site differs from that of Cycas by 2 bp (gCTGAGt). Because this Bpu element-like sequence does not interrupt the reading frame of rps11, and because the mitochondrial rps11 gene of Weigela lacks a target duplication site, we do not consider that this Bpu-like element shares a common origin with the Bpu sequences of Cycas. We further postulate that Bpu elements are likely absent from, or very rare in, angiosperms.

Surprisingly, the cpDNA of Cycas also contains 2 Bpu elements, located in the petN-psbM and psbA-trnK spacers, respectively (see Wu et al. 2007). Additionally, we identified 1 Bpu sequence in the atpB-rbcL spacer of the cpDNA from Pinus luchuensis (GenBank accession number DQ196799). However, no Bpu sequence has been detected in the cpDNAs of the 3 other Pinaceae genera and 3 gnetophyte orders we have sequenced to date (Chaw SM, Wu CS, Lai YT, Wang YN, Lin CP, Liu SM, unpublished data). Collectively, the available evidence inclines us to believe that the Bpu sequences have proliferated specifically in the mtDNA of Cycas (or the Cycadaceae). The sporadic occurrence of Bpu elements in the cpDNAs of Cycas and Pinus suggests that they are likely derived from nonhomologous recombination with DNA fragments that leaked out of mitochondria in the former case and via lateral transfer in the latter case.

The Tai Sequence and Its Association with Bpu Sequences

The second intron of the rps3 gene (rps3i2) is found only in the mtDNAs of Cycas (Regina et al. 2005), including those studied in the present work. Regina et al. (2005) suggested that rps3i2 is a group II intron that was independently gained in the gymnosperms, likely at or just after the divergence of the angiosperms. Moreover, the authors reported a high similarity between a partial segment of the Cycas rps3i2 and orf760 of the Chara mtDNA (Turmel et al. 2003). Orf760 harbors functional domains for a maturase and a reverse transcriptase, prompting Regina et al. (2005) to propose that the Cycas rps3i2 gene originally encoded a maturase and a reverse transcriptase but evolved over time into a partially degenerated ORF. Here, we further report that rps3i2 of the Cycas mtDNA contains a 900-bp fragment comprising an array of 4 Bpu elements, with the Bpu elements on each end lacking 1 terminal repeat, followed by a 440-bp fragment (designated as "Tai"

sequence), a Ψorf760 sequence, and 1 perfect Bpu sequence (fig. 5A). Most intriguingly, additional Tai sequences are scattered throughout the Cycas mtDNA; they occur more densely in longer spacers than in shorter ones (fig. 5B) and are generally found in close association with repeated Bpu sequences. These findings lend extra and robust support to the proposal of Regina et al. (2005).

In conclusion we hypothesize that a Tai sequence andan orf760 flanked by a Bpu sequence at each end mostlikely constitute an ancestral retrotransposon This hypoth-esis is founded on 2 observations 1) Tai sequences arehighly associated with Bpu elements and 2) small segmentsof Tai sequences are vigorously and diversely rearrangeddeleted or truncated as illustrated in the 3 examples shownin figure 5B These observations also suggest that an ances-tral retrotransposon in Cycas mtDNA spread very activelypresumably after Cycas branched off from the 10 other ex-tant cycad genera Afterward the offspring or duplicates ofthe ancestral transposon gradually may have lost their mo-bility through varying degrees of deletionstruncations fromthe 3 region which encoded both maturase and reversetranscriptase functions

We further speculate that, even after the ancestral retrotransposon lost its transposon function, the remaining Bpu elements might have retained the ability to proliferate or amplify via unequal crossing-over or slipped-strand mispairing because their 2 direct terminal repeats can pair with each other's complementary strands. Future studies and sequencing of additional mtDNAs from basal Cycas will be required to confirm this hypothesis and may provide additional insight into the evolutionary origins of Bpu sequences and the molecular mechanisms underlying their amplification.

The Fates of CpDNA-Derived Sequences (mtpts) in Cycas mtDNA

Table 2 shows a comparative analysis of mtpts among the 11 studied plant mtDNAs. The total percentage of mtpts in Cycas (4.4%) is relatively high compared with that in dicots (1.1–3.6%) but falls within the range seen among monocots (3.0–6.3%). We previously discovered that the frequency of mtpt transfer is positively correlated with variations in mtDNA size (coefficient value r² = 0.47) (Wang et al. 2007). Here, we report that the Cycas mtDNA contains 8 protein-coding genes as well as 2 rRNA and 5 tRNA gene sequences originating from the cpDNA (fig. 1). However, frameshifts and indels within the protein-coding genes suggest that these Cycas mtpts have degenerated and are nonfunctional. In contrast, the 5 cpDNA-derived tRNAs in the Cycas mtDNA are able to fold into standard cloverleaf structures, as shown by tRNAscan-SE analysis (Lowe and Eddy 1997), and are thus likely to be functional. Previously, Sugiyama et al. (2005) used tRNAscan-SE to scan the tobacco mtDNA and concluded that the 6 cp-derived tRNAs shared by angiosperms are functional. Because no mtpt was observed in Chara, Marchantia, or Physcomitrella, Wang et al. (2007) concluded that frequent DNA transfer from cpDNA to mtDNA has taken place no later than in the common ancestor of seed plants, approximately 300 MYA.

FIG. 4 (continued). (D) Partial alignment of mtpt-atpB sequences extracted from Cycas, Oryza, Arabidopsis, and Nicotiana. Note that 2 Bpu sequence insertions are detected in Cycas but not in the other plants. (E) Partial alignment of mtpt-rbcL sequences extracted from Oryza, Arabidopsis, Nicotiana, and Cycas. The sequence from Cycas contains a Bpu-like sequence with a 5-bp insertion (shaded box) versus the dominant type. (F) Partial alignment of the mtpt-atpE and chloroplast atpE sequences of Cycas. Two reversely complementary Bpu sequences are found in the mtDNA. Note that the 5′ Bpu sequence is partly degenerated at its 3′ end.

Abundant RNA Editing Sites in Cycas mtDNA

Among the land plant mtDNAs, Cycas has the most predicted RNA editing sites (1,084 sites; supplementary table S4 [Supplementary Material online]). It is commonly believed that RNA editing arose together with the first terrestrial plants (Steinhauser et al. 1999). By using the PREP-mt software (Mower 2005) with the cutoff score set to 0.6, 1,084 sites within the protein-coding genes of the Cycas mtDNA were predicted to be C-to-U RNA editing sites. This is more than double the number of predicted sites within the elucidated mtDNAs of other land plants (table 2).

If the cutoff score, which indicates the degree of conservation of each editing site compared with those found in the other published plant mtDNAs, is set to the most stringent criterion in PREP-mt (i.e., = 1), the number of editing sites decreases to 738, which is still the largest number found among the land plant mtDNAs elucidated to date.

It is believed that RNA editing is essential for functional protein expression, as it is required to modify amino acids or generate new start or stop codons (Hoch et al. 1991; Wintz and Hanson 1991; Kotera et al. 2005; Shikanai 2006). For this reason, the large number of RNA editing sites in the Cycas mtDNA may indicate higher complexity at the DNA level and formation of various transcripts through RNA editing, thus potentially reflecting rapid divergence in Cycas.

Although RNA editing sites are known to be sporadically distributed in the genomes of plant organelles/mitochondria from seed plants, the mechanisms underlying this distribution pattern are not yet known (Shikanai 2006; Mulligan et al. 2007). Supplementary table S4 (Supplementary Material online) shows that, within the Cycas mtDNA, gene complex I (which includes 9 nad genes) is the most extensively edited gene category. Nad4 and nad5 show the first and second highest numbers of editing sites, followed by the cox1 gene of complex IV. The least edited gene is rps10, which contains only 6 predicted sites even when the cutoff score is set to 0.6. Mulligan et al. (2007) interpreted the sporadic pattern as a result of lineage-specific loss of editing sites through retroconversion, which could remove adjacent editing sites by replacement with the edited sequences. The abundant RNA editing sites in the Cycas mtDNA, however, are mysterious and completely deviate from the sporadic patterns reported in other seed plant mtDNAs. Therefore, the mechanism of RNA editing in Cycas mtDNA may have a different evolutionary history from those of other seed plants.

FIG. 5. Genomic organization of the second rps3 intron (rps3i2) and association of Tai sequences with Bpu sequences. (A) Organization of rps3i2. The segment between Ψorf760 and the 4 tandem Bpu sequences is designated as a Tai sequence (gray). Arrowheads indicate a Bpu sequence lacking the 2 terminal repeats (banded bars) and its complementary sequence. (B) Upper: association of Tai sequences with Bpu tandem repeats across the genome (abscissa) and their respective lengths (ordinate). Lower: examples of 3 Tai variants (boxed), showing that the sequences are variously degenerate and truncated as compared with the typical Tai shown in (A) (broken line indicates mismatching sequences). Thin arrows indicate the orientations of homologous segments between Tai variants and the typical Tai.

Previous studies (Steinhauser et al. 1999; Lurin et al. 2004; Shikanai 2006) led to the proposal that variation or multiplication of trans-acting factors (e.g., members of the pentatricopeptide repeat family) may allow rapid increases in the number of editing sites. This could possibly be correlated with the lineage-specific explosion of RNA editing sites observed in Cycas mtDNA. Furthermore, Handa's (2003) conclusion that the evolutionary speed in RNA editing is higher at the level of gene regulation than at the primary gene sequence level appears to provide additional support for our speculation. To improve understanding of the mechanisms involved in abundant RNA editing sites in Cycas mtDNA, critical analysis of mtDNAs from ferns, conifers, and other cycads will be required.

In all plant lineages, RNA editing often alters the amino acid sequence (Adams and Palmer 2003). Supplementary table S5 (Supplementary Material online) summarizes the number of predicted preediting codons and potential edited codons in the Cycas mtDNA. The 2 most frequently edited codons are TCA (Ser) and TCT (Ser), which are predicted to be edited into TTA (Leu) and TTT (Phe), respectively. The least frequently edited codon is CAA (Gln), which would putatively be edited in only 2 cases, wherein the codon's first position would be altered to yield the stop codon TAA. Supplementary table S6 (Supplementary Material online) further depicts the formation of 4 start (from ACG to ATG) and 7 stop codons (from CGA to TGA) through RNA editing. To verify the accuracy of the editing sites predicted using PREP-mt, partial transcripts of the cox1, atp1, atp6, and ccmB genes were experimentally assayed using RT-PCR. Comparison of our RT-PCR sequences with the editing sites predicted by PREP-mt indicated accuracy values of 97.66%, 99.13%, 97.91%, and 99.25% for cox1, atp1, atp6, and ccmB, respectively.
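The codon changes listed above can be checked with a short sketch. It is illustrative only and is not part of PREP-mt: the function names are hypothetical, C-to-U editing is represented at the DNA level as C-to-T, and the standard genetic code is assumed to apply to these mitochondrial genes.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def edit_c_to_u(codon, position):
    """Return the codon after a C-to-U edit (written as T) at a 0-based position."""
    if codon[position] != "C":
        raise ValueError("position %d of %s is not a C" % (position, codon))
    return codon[:position] + "T" + codon[position + 1:]


def editing_effect(codon, position):
    """Return (edited codon, amino acid before editing, amino acid after editing)."""
    edited = edit_c_to_u(codon, position)
    return edited, CODON_TABLE[codon], CODON_TABLE[edited]


# Codon edits named in the text; '*' denotes a stop codon.
EXAMPLES = [("TCA", 1), ("TCT", 1), ("CAA", 0), ("ACG", 1), ("CGA", 0)]

if __name__ == "__main__":
    for codon, pos in EXAMPLES:
        edited, before, after = editing_effect(codon, pos)
        print(codon, "->", edited, ":", before, "->", after)
```

Running the loop reproduces the Ser-to-Leu and Ser-to-Phe changes, the two stop codons (TAA, TGA), and the ATG start codon described in the paragraph.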

Our RT-PCR analysis additionally identified a U-to-C editing type in atp1; this was missed in the prediction because detection of the U-to-C editing type is limited when using PREP-mt (Mower 2005). Unfortunately, programs for predicting RNA editing sites in nonprotein-coding genes, such as ribosomal RNA and tRNA genes, and for predicting silent editing sites (which alter the mRNA but do not change the translated amino acid) are not yet available. We are currently examining these additional editing types and the locations of such sites in the Cycas mtDNA.

No Large Repeats in Gymnosperms but Some in Angiosperms

In the mtDNAs of angiosperms, long repeated sequences vary in size from 2,427 bp in rapeseed to 127,600 bp in rice, and they show no homology to each other, implying that repeated sequences were independently acquired during the evolution from Marchantia to angiosperms (Sugiyama et al. 2005). In Cycas mtDNA, we only detected a few long repeated sequences, all less than 1.5 kb in length and mostly composed of Bpu sequences. It has been demonstrated that genes are continuously transferred from the nuclear genome and cpDNA to mtDNA (Knoop 2004). However, the mechanisms underlying the emergence of large repeats are not yet fully understood.

Conclusions

The complete Cycas mtDNA shows a number of unprecedented features that are atypical of the mtDNAs previously elucidated for other land plants, including the lowest A + T content, the highest proportion of tandem repeat sequences (mainly Bpu sequences) and gene number found so far, abundant RNA editing sites, and an exceptionally elevated Ka value for the protein-coding genes. The latter 2 features might be correlated. In comparison with the known angiosperm organelle genomes, the cpDNA and mtDNA of Cycas have experienced the least gene loss. Peculiarly, the cpDNA of Cycas has the fewest RNA editing sites among land plants (Wu et al. 2007), whereas its mtDNA has the most. On the other hand, some characteristics are shared by mtDNAs of Cycas and angiosperms but not bryophytes; these include common gene transfers from cpDNA, similar ratios of noncoding sequences, lack of group I introns, relatively low numbers of tRNAs, and concurrence of trans-spliced group II introns. Nevertheless, the above-mentioned Cycas-specific features suggest that the Cycas mtDNA has a lineage-specific evolutionary history.

A novel family of SIMEs, designated as Bpu elements/sequences, represents another unique feature of the Cycadaceae lineage. Bpu elements are widely distributed throughout the noncoding regions of Cycas mtDNA (over 500 copies), with only 1 occurrence in a coding region (rrn18). In the Cycas mtDNA, the highly conserved nature of Bpu sequences and their association with Tai sequences, as well as Ψorf760 (located in rps3i2), suggest that these sequences may have collectively originated from an ancestral retrotransposon. Further investigation into the origin, proliferation, and evolution of Bpu sequences will be desirable to help clarify the peculiar features of Cycas mtDNA.

Supplementary Material

Supplementary tables S1–S7 and figure 1 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

This work was supported by joint research grants from the Research Center for Biodiversity, Academia Sinica, to S.-M.C. and D.W., and partially by a National Science Council grant to S.-M.C. (94-2311-B001-059) and by a grant from the Institute of Information Science, Academia Sinica, to A.C.-C.S. We thank Chiao-Lei Cheng for sequencing the RT-PCR products. We are grateful to the 3 anonymous reviewers who provided critical comments and valuable suggestions.

Literature Cited

Adams KL, Palmer JD. 2003. Evolution of mitochondrial gene content: gene loss and transfer to the nucleus. Mol Phylogenet Evol. 29:380–395.

Adams KL, Qiu YL, Stoutemyer M, Palmer JD. 2002. Punctuated evolution of mitochondrial gene content: high and variable rates of mitochondrial gene loss and transfer to the nucleus during angiosperm evolution. Proc Natl Acad Sci USA. 99:9905–9912.

Benson G. 1999. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27:573–580.

Bergthorsson U, Richardson AO, Young GJ, Goertzen LR, Palmer JD. 2004. Massive horizontal transfer of mitochondrial genes from diverse land plant donors to the basal angiosperm Amborella. Proc Natl Acad Sci USA. 101:17747–17752.

Chaw SM, Chang CC, Chen HL, Li WH. 2004. Dating the monocot-dicot divergence and the origin of core eudicots using whole chloroplast genomes. J Mol Evol. 58:424–441.

Chaw SM, Walters TW, Chang CC, Hu SH, Chen SH. 2005. A phylogeny of cycads (Cycadales) inferred from chloroplast matK gene, trnK intron, and nuclear rDNA ITS region. Mol Phylogenet Evol. 37:214–234.

Chaw SM, Zharkikh A, Sung HM, Lau TC, Li WH. 1997. Molecular phylogeny of extant gymnosperms and seed plant evolution: analysis of nuclear 18S rRNA sequences. Mol Biol Evol. 14:56–68.

Cho Y, Qiu YL, Kuhlman P, Palmer JD. 1998. Explosive invasion of plant mitochondria by a group I intron. Proc Natl Acad Sci USA. 95:14244–14249.

Clifton SW, Minx P, Fauron CM, et al. (13 co-authors). 2004. Sequence and comparative analysis of the maize NB mitochondrial genome. Plant Physiol. 136:3486–3503.

Feschotte S, Zhang X, Wessler SR. 2002. Miniature inverted-repeat transposable elements and their relationship to established DNA transposons. Washington (DC): ASM Press.

Groth-Malonek M, Pruchner D, Grewe F, Knoop V. 2005. Ancestors of trans-splicing mitochondrial introns support serial sister group relationships of hornworts and mosses with vascular plants. Mol Biol Evol. 22:117–125.

Handa H. 2003. The complete nucleotide sequence and RNA editing content of the mitochondrial genome of rapeseed (Brassica napus L.): comparative analysis of the mitochondrial genomes of rapeseed and Arabidopsis thaliana. Nucleic Acids Res. 31:5907–5916.

Hoch B, Maier RM, Appel K, Igloi GL, Kossel H. 1991. Editing of a chloroplast mRNA by creation of an initiation codon. Nature. 353:178–180.

Jones DL. 2002. Cycads of the World: ancient plant in today's landscape. 2nd ed. Sydney (Australia): Reed New Holland.

Kadowaki K, Kubo N, Ozawa K, Hirai A. 1996. Targeting presequence acquisition after mitochondrial gene transfer to the nucleus occurs by duplication of existing targeting signals. EMBO J. 15:6652–6661.

Karol KG, McCourt RM, Cimino MT, Delwiche CF. 2001. The closest living relatives of land plants. Science. 294:2351–2353.

Knoop V. 2004. The mitochondrial DNA of land plants: peculiarities in phylogenetic perspective. Curr Genet. 46:123–139.

Kotera E, Tasaka M, Shikanai T. 2005. A pentatricopeptide repeat protein is essential for RNA editing in chloroplasts. Nature. 433:326–330.

Kubo T, Nishizawa S, Sugawara A, Itchoda N, Estiati A, Mikami T. 2000. The complete nucleotide sequence of the mitochondrial genome of sugar beet (Beta vulgaris L.) reveals a novel gene for tRNA(Cys)(GCA). Nucleic Acids Res. 28:2571–2576.

Kumar S, Tamura K, Nei M. 2004. MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform. 5:150–163.

Kurtz S, Choudhuri JV, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R. 2001. REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 29:4633–4642.

Lowe TM, Eddy SR. 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25:955–964.

Lurin C, Andres C, Aubourg S, et al. (19 co-authors). 2004. Genome-wide analysis of Arabidopsis pentatricopeptide repeat proteins reveals their essential role in organelle biogenesis. Plant Cell. 16:2089–2103.

Malek O, Brennicke A, Knoop V. 1997. Evolution of trans-splicing plant mitochondrial introns in pre-Permian times. Proc Natl Acad Sci USA. 94:553–558.

Malek O, Knoop V. 1998. Trans-splicing group II introns in plant mitochondria: the complete set of cis-arranged homologs in ferns, fern allies, and a hornwort. RNA. 4:1599–1609.

Michel F, Ferat JL. 1995. Structure and activities of group II introns. Annu Rev Biochem. 64:435–461.

Mower JP. 2005. PREP-Mt: predictive RNA editor for plant mitochondrial genes. BMC Bioinformatics. 6:96.

Mulligan RM, Chang KL, Chou CC. 2007. Computational analysis of RNA editing sites in plant mitochondrial genomes reveals similar information content and a sporadic distribution of editing sites. Mol Biol Evol. 24:1971–1981.

Notsu Y, Masood S, Nishikawa T, Kubo N, Akiduki G, Nakazono M, Hirai A, Kadowaki K. 2002. The complete sequence of the rice (Oryza sativa L.) mitochondrial genome: frequent DNA sequence acquisition and loss during the evolution of flowering plants. Mol Genet Genomics. 268:434–445.

Oda K, Kohchi T, Ohyama K. 1992. Mitochondrial DNA of Marchantia polymorpha as a single circular form with no incorporation of foreign DNA. Biosci Biotechnol Biochem. 56:132–135.

Ogihara Y, Yamazaki Y, Murai K, et al. (14 co-authors). 2005. Structural dynamics of cereal mitochondrial genomes as revealed by complete nucleotide sequencing of the wheat mitochondrial genome. Nucleic Acids Res. 33:6235–6250.

Qiu YL, Li L, Wang B, et al. (13 co-authors). 2006. The deepest divergences in land plants inferred from phylogenomic evidence. Proc Natl Acad Sci USA. 103:15511–15516.

Regina TM, Picardi E, Lopez L, Pesole G, Quagliariello C. 2005. A novel additional group II intron distinguishes the mitochondrial rps3 gene in gymnosperms. J Mol Evol. 60:196–206.

Selosse M, Albert B, Godelle B. 2001. Reducing the genome size of organelles favours gene transfer to the nucleus. Trends Ecol Evol. 16:135–141.

Shikanai T. 2006. RNA editing in plant organelles: machinery, physiological function and evolution. Cell Mol Life Sci. 63:698–708.

Steinhauser S, Beckert S, Capesius I, Malek O, Knoop V. 1999. Plant mitochondrial RNA editing. J Mol Evol. 48:303–312.

Stevenson DW. 1990. Morphology and systematics of the Cycadales. Mem N Y Bot Gard. 57:8–15.

Stewart CN Jr, Via LE. 1993. A rapid CTAB DNA isolation technique useful for RAPD fingerprinting and other PCR applications. Biotechniques. 14:748–750.

Sugiyama Y, Watase Y, Nagase M, Makita N, Yagura S, Hirai A, Sugiura M. 2005. The complete nucleotide sequence and multipartite organization of the tobacco mitochondrial genome: comparative analysis of mitochondrial genomes in higher plants. Mol Genet Genomics. 272:603–615.

Terasawa K, Odahara M, Kabeya Y, Kikugawa T, Sekine Y, Fujiwara M, Sato N. 2007. The mitochondrial genome of the moss Physcomitrella patens sheds new light on mitochondrial evolution in land plants. Mol Biol Evol. 24:699–709.

Turmel M, Otis C, Lemieux C. 2003. The mitochondrial genome of Chara vulgaris: insights into the mitochondrial DNA architecture of the last common ancestor of green algae and land plants. Plant Cell. 15:1888–1903.

Unseld M, Marienfeld JR, Brandt P, Brennicke A. 1997. The mitochondrial genome of Arabidopsis thaliana contains 57 genes in 366,924 nucleotides. Nat Genet. 15:57–61.

Wang D, Wu YW, Chun-Chieh Shih A, Wu CS, Wang YN, Chaw SM. 2007. Transfer of chloroplast genomic DNA to mitochondrial genome occurred at least 300 MYA. Mol Biol Evol. 24:2040–2048.

Wasserman WW, Sandelin A. 2004. Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet. 5:276–287.

Wintz H, Hanson MR. 1991. A termination codon is created by RNA editing in the petunia atp9 transcript. Curr Genet. 19:61–64.

Wu CS, Wang YN, Liu SM, Chaw SM. 2007. Chloroplast genome (cpDNA) of Cycas taitungensis and 56 cp protein-coding genes of Gnetum parvifolium: insights into cpDNA evolution and phylogeny of extant seed plants. Mol Biol Evol. 24:1366–1379.

William Martin, Associate Editor

Accepted January 6, 2008


identified to date among the studied seed plant mtDNAs (fig. 3B). When the distributions of conserved genes in the mtDNA from the 11 sampled species (supplementary table S2, Supplementary Material online) are mapped to their respective branches in a maximum parsimony tree (fig. 3B, based on the data used in fig. 3A), it is then possible to estimate the time of loss of a particular gene. As shown in figure 3B, our analysis indicates that there have been at least 31 independent events of gene loss from all the land plant mtDNAs elucidated to date.

When a gene is missing from the mtDNA of a given species, it is generally believed that the original copy has been transferred to the nucleus, where it functions through cytosolic protein synthesis followed by transit peptide–assisted import back to the mitochondria (Adams and Palmer 2003; Knoop 2004). Frequent gene losses, especially of the ribosomal protein genes (30 of 34), appear to have occurred after the divergence of the angiosperm lineages approximately 150 MYA (Chaw et al. 2004). However, only 1 gene loss was observed in the Cycas lineage after it branched off from the common ancestor of angiosperms approximately 300 MYA. Adams and Palmer (2003) suggested that angiosperm mtDNAs have experienced a recent evolutionary surge of loss and/or transfer of genes (primarily those encoding ribosomal proteins) to the nucleus. Our data give additional support to their contention but suggest that the mtDNA of Cycas has been excluded from this surge. Our results suggest that the Cycas mtDNA tends to evolutionarily maintain its gene diversity and/or undergoes less gene transfer than angiosperm mtDNAs (Selosse et al. 2001; Adams et al. 2002; Adams and Palmer 2003). Coincidentally, among the some 40 published cpDNAs of seed plants, that of Cycas has also undergone the least gene loss (Wu et al. 2007; supplementary fig. 1 [Supplementary Material online]). Future work will be required to examine why the genomes of these 2 Cycas organelles appear to have frozen after the cycads' divergence from angiosperms, especially in comparison to mtDNAs from other major gymnosperm clades, such as Ginkgo, gnetophytes, and pines.

A Novel Family of Short Interspersed Mitochondrial Elements

Numerous Short Interspersed Mitochondrial Elements, Termed Bpu Sequences, Are Present in the Cycas mtDNA

Sequence analysis revealed that numerous copies of a 36-nt repeat, herein designated as a "Bpu sequence/element," are interspersed throughout the Cycas mtDNA (fig. 1B). Figure 4A shows the characteristic sequences of 500 Bpu elements having 0 to 4 mismatches. If up to a 7-nt mismatch to the dominant type is allowed, the total copy number of Bpu sequences increases to 512. The Bpu sequences feature 2 conserved terminal direct repeats (AAGG) and the recognition sequence for the restriction endonuclease Bpu10I (CCTGAAGC; nt 15–21). These repeat elements/sequences do not appear to have coding potential.
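The mismatch-tolerant copy counts described above (500 copies at 0 to 4 mismatches, 512 at up to 7) can be pictured as a sliding-window comparison against a consensus on both strands. The following is only a rough sketch under stated assumptions and is not the authors' procedure: the function names are hypothetical, TOY_CONSENSUS is an invented 12-nt stand-in (the real 36-nt dominant type is shown in figure 4A and is not reproduced here), and insertions or deletions relative to the consensus are not handled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Count mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def scan_for_elements(genome, consensus, max_mismatches):
    """Return (start, strand, mismatches) for every window of the genome that
    lies within max_mismatches of the consensus or of its reverse complement."""
    n = len(consensus)
    patterns = (("+", consensus), ("-", reverse_complement(consensus)))
    hits = []
    for start in range(len(genome) - n + 1):
        window = genome[start:start + n]
        for strand, pattern in patterns:
            d = hamming(window, pattern)
            if d <= max_mismatches:
                hits.append((start, strand, d))
    return hits


# Invented toy data: a short consensus with AAGG at both ends, one exact copy,
# and one copy carrying a single substitution.
TOY_CONSENSUS = "AAGGCTGAAAGG"
TOY_VARIANT = "AAGGCTGTAAGG"
TOY_GENOME = "TTTT" + TOY_CONSENSUS + "CCCC" + TOY_VARIANT + "TTTT"

if __name__ == "__main__":
    for hit in scan_for_elements(TOY_GENOME, TOY_CONSENSUS, max_mismatches=1):
        print(hit)
```

Raising max_mismatches admits more divergent copies, which mirrors how the reported copy number grows when the mismatch allowance is relaxed.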

Because Bpu elements are extremely short in their lengths and terminal repeats and contain 2 terminal direct repeats rather than the inverted repeats found in plant miniature inverted-repeat transposable elements (MITEs), they are here classified as short interspersed mitochondrial elements (SIMEs). This distinguishes them from the MITEs and short interspersed nuclear elements (SINEs; e.g., Alu DNA repeats in the genomes of primates) that have been extensively reported from the genomes of plants and animals (Feschotte et al. 2002). Based on the similarity between SIMEs and SINEs, however, we can hypothesize that Bpu elements are likely to be transposed through RNA without a requirement for reverse transcriptase.

FIG. 2. Comparison of copy numbers among various tandem repeats in the mtDNAs of Cycas and 10 other land plants. (A) Percentage of total genome comprised of tandem repeat sequences. (B) The copy numbers of tandem repeat sequences in each genome (x axis) are separated into 5 groups (y axis), and their histograms (z axis) are shown. The single-copy Bpu sequences of Cycas taitungensis described in figure 1 are not included in the counts. The copy numbers may not be integers because the boundaries of tandem repeats were determined by a probabilistic method described by Benson (1999).

The Bpu10I recognition site characteristic of the Bpu elements is highly conserved, except at the very last base pair, that is, the 21st bp of the Bpu sequence (fig. 4A, lower chart). In contrast, the 5′ terminal direct repeat and its downstream 6 nt (nt 5–9) tend to be more variable than the 3′ terminal direct repeat. Enigmatically, the elements can form a secondary structure (fig. 4B) with a predicted free energy of 12.5 (kcal/mole), as calculated by the MFOLD program (http://frontend.bioinfo.rpi.edu/zukerm/comments/FAQs.html). The significance of this secondary structure remains to be elucidated. Comparison of the Bpu element insertion sites in the Cycas mtDNA with corresponding sites in the mtDNAs of other species reveals that the target sites for transpositions of Bpu elements are identical (within 1–2 mismatched base pairs) to the 5′ terminal direct repeat (fig. 4C–F).
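Position-wise conservation of the kind plotted in figure 4A (bits of information per position, with each nucleotide scaled by its relative occurrence) can be computed from an alignment of equal-length copies. The sketch below is a minimal illustration, assuming a 4-letter DNA alphabet and no small-sample correction; the function names and the TOY_ALIGNMENT are hypothetical and are not data from the paper.

```python
import math
from collections import Counter


def position_information(aligned_seqs):
    """Return the information content (bits) of each alignment column,
    computed as 2 - H, where H is the Shannon entropy of the column."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all sequences must have the same length")
    bits = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in aligned_seqs)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        bits.append(2.0 - entropy)
    return bits


def letter_heights(aligned_seqs):
    """Return, per column, {base: relative frequency * column information}."""
    bits = position_information(aligned_seqs)
    heights = []
    for pos, column_bits in enumerate(bits):
        counts = Counter(seq[pos] for seq in aligned_seqs)
        total = sum(counts.values())
        heights.append({b: (c / total) * column_bits for b, c in counts.items()})
    return heights


# Invented toy alignment: three invariant columns and one variable last column.
TOY_ALIGNMENT = ["AAGG", "AAGG", "AAGC", "AAGT"]

if __name__ == "__main__":
    for pos, column in enumerate(letter_heights(TOY_ALIGNMENT), start=1):
        print(pos, column)
```

An invariant column scores the maximum of 2 bits, whereas a variable column such as the last one in the toy alignment scores lower, which is the pattern the text describes for the variable 21st bp of the recognition site.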

Bpu Sequences Distinguish Cycas from Other Cycads and Seed Plants

Bpu sequences are present exclusively in the noncoding regions of the Cycas mtDNA, that is, within the introns and intergenic spacers, with 1 exception: a Bpu sequence is present within the coding region of rrn18, which codes for the 18S rRNA (fig. 4C). Among the many mitochondrial rrn18 (mt-rrn18) genes available in GenBank, only those of Cycas (including Cycas revoluta, GenBank accession number AB029356) and Ginkgo biloba (the only living species of the order Ginkgoales) have Bpu elements (fig. 4C). The Bpu insertion sites of the 2 Cycas taxa are orthologous (data not shown), whereas that of Ginkgo is different, indicating that the insertions of Bpu elements in rrn18 did not occur in the common ancestor of cycads and ginkgo. In sequenced mtDNAs, the oldest cluster of cpDNA-derived sequences (also termed mtpts), trnV(uac)-trnM(cau)-atpE-atpB-rbcL, is reported to have existed since the common ancestor of seed plants (Wang et al. 2007, fig. 3). Bpu elements are present in the Cycas mtpt-atpB (fig. 4D) and mtpt-rbcL (fig. 4E) sequences but not in the corresponding sequences of the 7 angiosperm mtDNAs elucidated to date (Wang et al. 2007), suggesting that these Bpu insertions took place after the split of the Cycas lineage from the other seed plants.

We further examined the occurrence of Bpu elements in 1 of the other 2 cycad families, Zamiaceae (including 1 species from each of Dioon, Macrozamia, and Zamia), by sequencing exons 1 and 2 of their nad2 genes. Whereas the nad2i1 of Cycas mtDNA contains 5 Bpu elements, such elements are absent from the nad2 genes of the 3 sampled Zamiaceae genera. However, we found 1, 2, and 6 Bpu elements in mitochondrial nad5i1 of Cycas panzhihuaensis (GenBank accession number AF43425) and mitochondrial nad1i1 and cp rps19-rpl16 spacer/intron of C. revoluta (GenBank accession numbers AY354955, AY345867), respectively. These data seem to suggest that no Bpu sequence has successfully invaded the mtDNAs of Cycadales genera other than Cycas.

FIG. 3. Phylogenies inferred from concatenated data from 22 protein-coding genes (see supplementary table S2, Supplementary Material online) common to the sequenced mtDNAs from 10 land plants and Chara (the outgroup). Nodes received 100% bootstrap support unless indicated. Branch lengths are drawn to scale, except as otherwise noted. (A) Two superimposed Neighbor-Joining trees based on the Ka and Ks values, respectively. (B) Scenarios of gene losses (open bar) and splits (hatch triangle) along the single maximum parsimony tree (9,857 steps). The total numbers of protein-coding genes in each mtDNA species are given within parentheses.

FIG. 4. Bpu sequences and examples of their insertion loci. Target sites and terminal repeats are underlined. (A) Upper: a dominant sequence based on 500 Bpu sequences (see text). Numbers along the abscissa indicate each nucleotide position within the prototype Bpu sequence. Recognition sites for the Bpu10I restriction enzyme are marked with asterisks. The ordinate scales each nucleotide by the total bits of information multiplied by its relative occurrence at that position (Wasserman and Sandelin 2004). Lower: identities of Bpu sequences that differ from the dominant type by 0–4 bp. (B) Secondary structure of the dominant Bpu sequence. (C)–(F) Lowercase letters, periods, and dashes denote mismatches, identical bps, and deletions, respectively, compared with the uppermost sequence. (C) Partial alignment of mt-rrn18 sequences from Cycas and Ginkgo. A Bpu sequence and a reversely complementary Bpu sequence are present in Ginkgo and Cycas, respectively. (D) Partial alignment of mtpt-atpB sequences extracted from Cycas, Oryza, Arabidopsis, and Nicotiana. Note that 2 Bpu sequence insertions are detected in Cycas but not in the other plants. (E) Partial alignment of mtpt-rbcL sequences extracted from Oryza, Arabidopsis, Nicotiana, and Cycas. The sequence from Cycas contains a Bpu-like sequence with a 5-bp insertion (shaded box) versus the dominant type. (F) Partial alignment of the mtpt-atpE and chloroplast atpE sequences of Cycas. Two reversely complementary Bpu sequences are found in the mtDNA. Note that the 5′ Bpu sequence is partly degenerated at its 3′ end.

A Bpu-like sequence with a 1-bp insertion and 89% similarity to the dominant type in Cycas mtDNA was retrieved from the coding sequence of the mitochondrial rps11 gene from the core eudicot Weigela hortensis. Alignment of this Bpu-like sequence reveals that its predicted Bpu10I endonuclease recognition site differs from that of Cycas by 2 bp (gCTGAGt). Because this Bpu element-like sequence does not interrupt the reading frame of rps11 and because the mitochondrial rps11 gene of Weigela lacks a target duplication site, we do not consider that this Bpu-like element shares a common origin with the Bpu sequences of Cycas. We further postulate that Bpu elements are likely absent from, or very rare in, angiosperms.

Surprisingly, the cpDNA of Cycas also contains 2 Bpu elements, located in the petN-psbM and psbA-trnK spacers, respectively (see Wu et al. 2007). Additionally, we identified 1 Bpu sequence in the atpB-rbcL spacer of the cpDNA from Pinus luchuensis (GenBank accession number DQ196799). However, no Bpu sequence has been detected in the cpDNAs of the 3 other Pinaceae genera and 3 gnetophyte orders we have sequenced to date (Chaw SM, Wu CS, Lai YT, Wang YN, Lin CP, Liu SM, unpublished data). Collectively, the available evidence inclines us to believe that the Bpu sequences have proliferated specifically in the mtDNA of Cycas (or the Cycadaceae). The sporadic occurrence of Bpu elements in the cpDNAs of Cycas and Pinus suggests that they are likely derived from nonhomologous recombination with DNA fragments that leaked out of mitochondria in the former case and via lateral transfer in the latter case.

The Tai Sequence and Its Association with BpuSequences

The second intron of the rps3 gene (rps3i2) is onlyfound in the mtDNAs of Cycas (Regina et al 2005) includ-ing those studied in the present work Regina et al (2005)suggested that rps3i2 is a group II intron that was indepen-dently gained in the gymnosperms likely at or just after thedivergence of the angiosperms Moreover the authors re-ported a high similarity between a partial segment of theCycas rps3i2 and orf760 of the Chara mtDNA (Turmelet al 2003) Orf760 harbors functional domains for a matur-ase and a reverse transcriptase prompting Regina et al(2005) to propose that the Cycas rps3i2 gene originally en-coded a maturase and a reverse transcriptase but evolvedover time into a partially degenerated ORF Here we fur-ther report that rps3i2 of the Cycas mtDNA contains a900-bp fragment comprising an array of 4 Bpu elementswith the Bpu elements on each end lacking 1 terminal re-peat followed by a 440-bp fragment (designated as lsquolsquoTairsquorsquo

sequence) a Worf760 sequence and 1 perfect Bpu se-quence (fig 5A) Most intriguingly additional Tai sequen-ces are scattered throughout the Cycas mtDNA they occurmore densely in longer spacers than in shorter ones (fig 5B)and are generally found in close association with repeatedBpu sequences These findings lend extra and robust sup-port to the proposal of Regina et al (2005)

In conclusion we hypothesize that a Tai sequence andan orf760 flanked by a Bpu sequence at each end mostlikely constitute an ancestral retrotransposon This hypoth-esis is founded on 2 observations 1) Tai sequences arehighly associated with Bpu elements and 2) small segmentsof Tai sequences are vigorously and diversely rearrangeddeleted or truncated as illustrated in the 3 examples shownin figure 5B These observations also suggest that an ances-tral retrotransposon in Cycas mtDNA spread very activelypresumably after Cycas branched off from the 10 other ex-tant cycad genera Afterward the offspring or duplicates ofthe ancestral transposon gradually may have lost their mo-bility through varying degrees of deletionstruncations fromthe 3 region which encoded both maturase and reversetranscriptase functions

We further speculate that even after the ancestral ret-rotransposon lost its transposon function the remainingBpu elements might have retained the ability to proliferateor amplify via unequal crossing-over or slipped-strand mis-pairing because their 2 direct terminal repeats can pair witheach otherrsquos complimentary strands Future studies and se-quencing of additional mtDNAs from basal Cycas will berequired to confirm this hypothesis and may provide addi-tional insight into the evolutionary origins of Bpu sequen-ces and the molecular mechanisms underlying theiramplification

The Fates of cpDNA-Derived Sequences (mtpts) in Cycas mtDNA

Table 2 shows a comparative analysis of mtpts among the 11 studied plant mtDNAs. The total percentage of mtpts in Cycas (4.4%) is relatively high compared with that in dicots (1.1–3.6%) but falls within the range seen among monocots (3.0–6.3%). We previously discovered that the frequency of mtpt transfer is positively correlated with variations in mtDNA size (coefficient value r² = 0.47) (Wang et al. 2007). Here, we report that the Cycas mtDNA contains 8 protein-coding genes, as well as 2 rRNA and 5 tRNA gene sequences, originating from the cpDNA (fig. 1). However, frameshifts and indels within the protein-coding genes suggest that these Cycas mtpts have degenerated and are nonfunctional. In contrast, the 5 cpDNA-derived tRNAs in the Cycas mtDNA are able to fold into standard cloverleaf structures, as shown by tRNAscan-SE analysis (Lowe and Eddy 1997), and are thus likely to be functional. Previously, Sugiyama et al. (2005) used tRNAscan-SE to scan the tobacco mtDNA and concluded that the 6 cp-derived tRNAs shared by angiosperms are functional. Because no mtpt was observed in Chara, Marchantia, or Physcomitrella, Wang et al. (2007) concluded that frequent DNA transfer from cpDNA to mtDNA has taken place no later than in the common ancestor of seed plants, approximately 300 MYA.

FIG. 4 (continued).—(D) Partial alignment of mtpt-atpB sequences extracted from Cycas, Oryza, Arabidopsis, and Nicotiana. Note that 2 Bpu sequence insertions are detected in Cycas but not in the other plants. (E) Partial alignment of mtpt-rbcL sequences extracted from Oryza, Arabidopsis, Nicotiana, and Cycas. The sequence from Cycas contains a Bpu-like sequence with a 5-bp insertion (shaded box) versus the dominant type. (F) Partial alignment of the mtpt-atpE and chloroplast atpE sequences of Cycas. Two reversely complementary Bpu sequences are found in the mtDNA. Note that the 5′ Bpu sequence is partly degenerated at its 3′ end.

Abundant RNA Editing Sites in Cycas mtDNA

Among the land plant mtDNAs, Cycas has the most predicted RNA editing sites (1,084 sites; supplementary table S4, Supplementary Material online). It is commonly believed that RNA editing arose together with the first terrestrial plants (Steinhauser et al. 1999). By using the PREP-mt software (Mower 2005) with the cutoff score set to 0.6, 1,084 sites within the protein-coding genes of the Cycas mtDNA were predicted to be C-to-U RNA editing sites. This is more than double the number of predicted sites within the elucidated mtDNAs of other land plants (table 2). If the cutoff score, which indicates the degree of conservation of each editing site compared with those found in the other published plant mtDNAs, is set to the most stringent criterion in PREP-mt (i.e., = 1), the number of editing sites decreases to 738, which is still the largest number found among the land plant mtDNAs elucidated to date.
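The effect of the cutoff score on the number of retained sites can be illustrated with a minimal sketch. It assumes that a predicted site is kept when its score is at least the chosen cutoff; the `PredictedSite` record, the `filter_sites` helper, and the toy scores are hypothetical and are not PREP-mt output or part of its interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictedSite:
    """Hypothetical record for one predicted C-to-U editing site."""
    gene: str
    position: int  # 1-based nucleotide position in the coding sequence
    score: float   # conservation-based score between 0 and 1


def filter_sites(sites, cutoff):
    """Keep sites whose score is at least `cutoff` (assumed rule)."""
    return [site for site in sites if site.score >= cutoff]


# Invented values for illustration only; not taken from the Cycas data.
toy_sites = [
    PredictedSite("nad4", 77, 1.0),
    PredictedSite("nad4", 158, 0.8),
    PredictedSite("cox1", 31, 0.6),
    PredictedSite("rps10", 5, 0.4),
]

relaxed = filter_sites(toy_sites, 0.6)  # cutoff used for the 1,084-site count
strict = filter_sites(toy_sites, 1.0)   # most stringent setting

print(len(relaxed), len(strict))
```

Raising the cutoff can only shrink the retained set, which is the pattern behind the drop from 1,084 to 738 predicted sites reported above.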

It is believed that RNA editing is essential for functional protein expression, as it is required to modify amino acids or generate new start or stop codons (Hoch et al. 1991; Wintz and Hanson 1991; Kotera et al. 2005; Shikanai 2006). For this reason, the large number of RNA editing sites in the Cycas mtDNA may indicate higher complexity at the DNA level and the formation of various transcripts through RNA editing, thus potentially reflecting rapid divergence in Cycas.

Although RNA editing sites are known to be sporadically distributed in the genomes of plant organelles/mitochondria from seed plants, the mechanisms underlying this distribution pattern are not yet known (Shikanai 2006; Mulligan et al. 2007). Supplementary table S4 (Supplementary Material online) shows that, within the Cycas mtDNA, complex I (which includes 9 nad genes) is the most extensively edited gene category. Nad4 and nad5 show the first and second highest numbers of editing sites, followed by the cox1 gene of complex IV. The least edited gene is rps10, which contains only 6 predicted sites even when the cutoff score is set to 0.6. Mulligan et al. (2007) interpreted the sporadic pattern as a result of lineage-specific loss of editing sites through retroconversion, which could remove adjacent editing sites by replacement with the edited sequences. The abundant RNA editing sites in the Cycas mtDNA, however, remain unexplained and deviate completely from the sporadic patterns reported in other seed plant mtDNAs. Therefore, RNA editing in Cycas mtDNA may have a different evolutionary history from that in other seed plants.

FIG. 5.—Genomic organization of the second rps3 intron (rps3i2) and association of Tai sequences with Bpu sequences. (A) Organization of rps3i2. The segment between Ψorf760 and the 4 tandem Bpu sequences is designated as a Tai sequence (gray). Arrowheads indicate a Bpu sequence lacking the 2 terminal repeats (banded bars) and its complementary sequence. (B) Upper: association of Tai sequences with Bpu tandem repeats across the genome (abscissa) and their respective lengths (ordinate). Lower: examples of 3 Tai variants (boxed), showing that the sequences are variously degenerate and truncated as compared with the typical Tai shown in (A) (broken line indicates mismatching sequences). Thin arrows indicate the orientations of homologous segments between Tai variants and the typical Tai.

Previous studies (Steinhauser et al. 1999; Lurin et al. 2004; Shikanai 2006) led to the proposal that variation or multiplication of trans-acting factors (e.g., members of the pentatricopeptide repeat family) may allow rapid increases in the number of editing sites. This could possibly be correlated with the lineage-specific explosion of RNA editing sites observed in Cycas mtDNA. Furthermore, Handa’s (2003) conclusion that the evolutionary speed in RNA editing is higher at the level of gene regulation than at the primary gene sequence level appears to provide additional support for our speculation. To improve understanding of the mechanisms involved in the abundant RNA editing sites in Cycas mtDNA, critical analysis of mtDNAs from ferns, conifers, and other cycads will be required.

In all plant lineages, RNA editing often alters the amino acid sequence (Adams and Palmer 2003). Supplementary table S5 (Supplementary Material online) summarizes the number of predicted preediting codons and potential edited codons in the Cycas mtDNA. The 2 most frequently edited codons are TCA (Ser) and TCT (Ser), which are predicted to be edited into TTA (Leu) and TTT (Phe), respectively. The least frequently edited codon is CAA (Gln), which would putatively be edited in only 2 cases, wherein the codon’s first position would be altered to yield the stop codon TAA. Supplementary table S6 (Supplementary Material online) further depicts the formation of 4 start (from ACG to ATG) and 7 stop codons (from CGA to TGA) through RNA editing. To verify the accuracy of the editing sites predicted using PREP-mt, partial transcripts of the cox1, atp1, atp6, and ccmB genes were experimentally assayed using RT-PCR. Comparison of our RT-PCR sequences with the editing sites predicted by PREP-mt indicated accuracy values of 97.66%, 99.13%, 97.91%, and 99.25% for cox1, atp1, atp6, and ccmB, respectively.
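The codon changes listed above can be reproduced with a short sketch. It applies a single C-to-U edit (written as C-to-T in DNA notation) at a chosen codon position and looks up the amino acid before and after the edit in the standard genetic code. The function names and the reduced codon table are illustrative choices, not part of PREP-mt.

```python
# Subset of the standard genetic code covering only the codons named in the text.
CODON_TABLE = {
    "TCA": "Ser", "TCT": "Ser", "TTA": "Leu", "TTT": "Phe",
    "CAA": "Gln", "TAA": "Stop", "ACG": "Thr", "ATG": "Met",
    "CGA": "Arg", "TGA": "Stop",
}


def edit_c_to_u(codon, index):
    """Return the codon after a C-to-U edit at 0-based `index` (DNA notation)."""
    if codon[index] != "C":
        raise ValueError(f"position {index} of {codon} is not a C")
    return codon[:index] + "T" + codon[index + 1:]


def describe(codon, index):
    """Format one edit as 'before (aa) -> after (aa)'."""
    edited = edit_c_to_u(codon, index)
    return f"{codon} ({CODON_TABLE[codon]}) -> {edited} ({CODON_TABLE[edited]})"


# (codon, edited position) pairs taken from the paragraph above.
EXAMPLES = [("TCA", 1), ("TCT", 1), ("CAA", 0), ("ACG", 1), ("CGA", 0)]

for codon, index in EXAMPLES:
    print(describe(codon, index))
```

Edits at the second codon position change the encoded amino acid in these examples (Ser to Leu or Phe, Thr to Met), whereas the two first-position edits shown here create stop codons.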

Our RT-PCR analysis additionally identified a U-to-C editing type in atp1; this was missed in the prediction because detection of the U-to-C editing type is limited when using PREP-mt (Mower 2005). Unfortunately, programs for predicting RNA editing sites in nonprotein-coding genes, such as ribosomal RNA and tRNA genes, and for predicting silent editing sites (which alter the mRNA but do not change the translated amino acid) are not yet available. We are currently examining these additional editing types and the locations of such sites in the Cycas mtDNA.

No Large Repeats in Gymnosperms but Some in Angiosperms

In the mtDNAs of angiosperms, long repeated sequences vary in size from 2,427 bp in rapeseed to 127,600 bp in rice, and they show no homology to each other, implying that repeated sequences were independently acquired during the evolution from Marchantia to angiosperms (Sugiyama et al. 2005). In Cycas mtDNA, we only detected a few long repeated sequences, all less than 1.5 kb in length and mostly composed of Bpu sequences. It has been demonstrated that genes are continuously transferred from the nuclear genome and cpDNA to mtDNA (Knoop 2004). However, the mechanisms underlying the emergence of large repeats are not yet fully understood.

Conclusions

The complete Cycas mtDNA shows a number of unprecedented features that are atypical of the mtDNAs previously elucidated for other land plants, including the lowest A + T content, the highest proportion of tandem repeat sequences (mainly Bpu sequences) and gene number found so far, abundant RNA editing sites, and an exceptionally elevated Ka value for the protein-coding genes. The latter 2 features might be correlated. In comparison with the known angiosperm organelle genomes, the cpDNA and mtDNA of Cycas have experienced the least gene loss. Peculiarly, the cpDNA of Cycas has the fewest RNA editing sites among land plants (Wu et al. 2007), whereas its mtDNA has the most. On the other hand, some characteristics are shared by the mtDNAs of Cycas and angiosperms but not bryophytes; these include common gene transfers from cpDNA, similar ratios of noncoding sequences, lack of group I introns, relatively low numbers of tRNAs, and concurrence of trans-spliced group II introns. Nevertheless, the above-mentioned Cycas-specific features suggest that the Cycas mtDNA has a lineage-specific evolutionary history.

A novel family of SIMEs, designated as Bpu elements/sequences, represents another unique feature of the Cycadaceae lineage. Bpu elements are widely distributed throughout the noncoding regions of Cycas mtDNA (over 500 copies), with only 1 occurrence in a coding region (rrn18). In the Cycas mtDNA, the highly conserved nature of Bpu sequences and their association with Tai sequences, as well as Ψorf760 (located in rps3i2), suggest that these sequences may have collectively originated from an ancestral retrotransposon. Further investigation into the origin, proliferation, and evolution of Bpu sequences will be desirable to help clarify the peculiar features of Cycas mtDNA.

Supplementary Material

Supplementary tables S1–S7 and figure 1 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

This work was supported by joint research grants from the Research Center for Biodiversity, Academia Sinica, to S.-M.C. and D.W., and partially by a National Science Council grant to S.-M.C. (94-2311-B001-059) and by a grant from the Institute of Information Science, Academia Sinica, to A.C.-C.S. We thank Chiao-Lei Cheng for sequencing the RT-PCR products. We are grateful to the 3 anonymous reviewers who provided critical comments and valuable suggestions.

Literature Cited

Adams KL, Palmer JD. 2003. Evolution of mitochondrial gene content: gene loss and transfer to the nucleus. Mol Phylogenet Evol. 29:380–395.

Adams KL, Qiu YL, Stoutemyer M, Palmer JD. 2002. Punctuated evolution of mitochondrial gene content: high and variable rates of mitochondrial gene loss and transfer to the nucleus during angiosperm evolution. Proc Natl Acad Sci USA. 99:9905–9912.

Benson G. 1999. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27:573–580.

Bergthorsson U, Richardson AO, Young GJ, Goertzen LR, Palmer JD. 2004. Massive horizontal transfer of mitochondrial genes from diverse land plant donors to the basal angiosperm Amborella. Proc Natl Acad Sci USA. 101:17747–17752.

Chaw SM, Chang CC, Chen HL, Li WH. 2004. Dating the monocot-dicot divergence and the origin of core eudicots using whole chloroplast genomes. J Mol Evol. 58:424–441.

Chaw SM, Walters TW, Chang CC, Hu SH, Chen SH. 2005. A phylogeny of cycads (Cycadales) inferred from chloroplast matK gene, trnK intron, and nuclear rDNA ITS region. Mol Phylogenet Evol. 37:214–234.

Chaw SM, Zharkikh A, Sung HM, Lau TC, Li WH. 1997. Molecular phylogeny of extant gymnosperms and seed plant evolution: analysis of nuclear 18S rRNA sequences. Mol Biol Evol. 14:56–68.

Cho Y, Qiu YL, Kuhlman P, Palmer JD. 1998. Explosive invasion of plant mitochondria by a group I intron. Proc Natl Acad Sci USA. 95:14244–14249.

Clifton SW, Minx P, Fauron CM, et al. (13 co-authors). 2004. Sequence and comparative analysis of the maize NB mitochondrial genome. Plant Physiol. 136:3486–3503.

Feschotte S, Zhang X, Wessler SR. 2002. Miniature inverted-repeat transposable elements and their relationship to established DNA transposons. Washington (DC): ASM Press.

Groth-Malonek M, Pruchner D, Grewe F, Knoop V. 2005. Ancestors of trans-splicing mitochondrial introns support serial sister group relationships of hornworts and mosses with vascular plants. Mol Biol Evol. 22:117–125.

Handa H. 2003. The complete nucleotide sequence and RNA editing content of the mitochondrial genome of rapeseed (Brassica napus L.): comparative analysis of the mitochondrial genomes of rapeseed and Arabidopsis thaliana. Nucleic Acids Res. 31:5907–5916.

Hoch B, Maier RM, Appel K, Igloi GL, Kossel H. 1991. Editing of a chloroplast mRNA by creation of an initiation codon. Nature. 353:178–180.

Jones DL. 2002. Cycads of the World—ancient plant in today’s landscape. 2nd ed. Sydney (Australia): Reed New Holland.

Kadowaki K, Kubo N, Ozawa K, Hirai A. 1996. Targeting presequence acquisition after mitochondrial gene transfer to the nucleus occurs by duplication of existing targeting signals. EMBO J. 15:6652–6661.

Karol KG, McCourt RM, Cimino MT, Delwiche CF. 2001. The closest living relatives of land plants. Science. 294:2351–2353.

Knoop V. 2004. The mitochondrial DNA of land plants: peculiarities in phylogenetic perspective. Curr Genet. 46:123–139.

Kotera E, Tasaka M, Shikanai T. 2005. A pentatricopeptide repeat protein is essential for RNA editing in chloroplasts. Nature. 433:326–330.

Kubo T, Nishizawa S, Sugawara A, Itchoda N, Estiati A, Mikami T. 2000. The complete nucleotide sequence of the mitochondrial genome of sugar beet (Beta vulgaris L.) reveals a novel gene for tRNA(Cys)(GCA). Nucleic Acids Res. 28:2571–2576.

Kumar S, Tamura K, Nei M. 2004. MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform. 5:150–163.

Kurtz S, Choudhuri JV, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R. 2001. REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 29:4633–4642.

Lowe TM, Eddy SR. 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25:955–964.

Lurin C, Andres C, Aubourg S, et al. (19 co-authors). 2004. Genome-wide analysis of Arabidopsis pentatricopeptide repeat proteins reveals their essential role in organelle biogenesis. Plant Cell. 16:2089–2103.

Malek O, Brennicke A, Knoop V. 1997. Evolution of trans-splicing plant mitochondrial introns in pre-Permian times. Proc Natl Acad Sci USA. 94:553–558.

Malek O, Knoop V. 1998. Trans-splicing group II introns in plant mitochondria: the complete set of cis-arranged homologs in ferns, fern allies, and a hornwort. RNA. 4:1599–1609.

Michel F, Ferat JL. 1995. Structure and activities of group II introns. Annu Rev Biochem. 64:435–461.

Mower JP. 2005. PREP-Mt: predictive RNA editor for plant mitochondrial genes. BMC Bioinformatics. 6:96.

Mulligan RM, Chang KL, Chou CC. 2007. Computational analysis of RNA editing sites in plant mitochondrial genomes reveals similar information content and a sporadic distribution of editing sites. Mol Biol Evol. 24:1971–1981.

Notsu Y, Masood S, Nishikawa T, Kubo N, Akiduki G, Nakazono M, Hirai A, Kadowaki K. 2002. The complete sequence of the rice (Oryza sativa L.) mitochondrial genome: frequent DNA sequence acquisition and loss during the evolution of flowering plants. Mol Genet Genomics. 268:434–445.

Oda K, Kohchi T, Ohyama K. 1992. Mitochondrial DNA of Marchantia polymorpha as a single circular form with no incorporation of foreign DNA. Biosci Biotechnol Biochem. 56:132–135.

Ogihara Y, Yamazaki Y, Murai K, et al. (14 co-authors). 2005. Structural dynamics of cereal mitochondrial genomes as revealed by complete nucleotide sequencing of the wheat mitochondrial genome. Nucleic Acids Res. 33:6235–6250.

Qiu YL, Li L, Wang B, et al. (13 co-authors). 2006. The deepest divergences in land plants inferred from phylogenomic evidence. Proc Natl Acad Sci USA. 103:15511–15516.

Regina TM, Picardi E, Lopez L, Pesole G, Quagliariello C. 2005. A novel additional group II intron distinguishes the mitochondrial rps3 gene in gymnosperms. J Mol Evol. 60:196–206.

Selosse M, Albert B, Godelle B. 2001. Reducing the genome size of organelles favours gene transfer to the nucleus. Trends Ecol Evol. 16:135–141.

Shikanai T. 2006. RNA editing in plant organelles: machinery, physiological function and evolution. Cell Mol Life Sci. 63:698–708.

Steinhauser S, Beckert S, Capesius I, Malek O, Knoop V. 1999. Plant mitochondrial RNA editing. J Mol Evol. 48:303–312.

Stevenson DW. 1990. Morphology and systematics of the Cycadales. Mem N Y Bot Gard. 57:8–15.

Stewart CN Jr, Via LE. 1993. A rapid CTAB DNA isolation technique useful for RAPD fingerprinting and other PCR applications. Biotechniques. 14:748–750.

Sugiyama Y, Watase Y, Nagase M, Makita N, Yagura S, Hirai A, Sugiura M. 2005. The complete nucleotide sequence and multipartite organization of the tobacco mitochondrial genome: comparative analysis of mitochondrial genomes in higher plants. Mol Genet Genomics. 272:603–615.

Terasawa K, Odahara M, Kabeya Y, Kikugawa T, Sekine Y, Fujiwara M, Sato N. 2007. The mitochondrial genome of the moss Physcomitrella patens sheds new light on mitochondrial evolution in land plants. Mol Biol Evol. 24:699–709.

Turmel M, Otis C, Lemieux C. 2003. The mitochondrial genome of Chara vulgaris: insights into the mitochondrial DNA architecture of the last common ancestor of green algae and land plants. Plant Cell. 15:1888–1903.

Unseld M, Marienfeld JR, Brandt P, Brennicke A. 1997. The mitochondrial genome of Arabidopsis thaliana contains 57 genes in 366,924 nucleotides. Nat Genet. 15:57–61.

Wang D, Wu YW, Chun-Chieh Shih A, Wu CS, Wang YN, Chaw SM. 2007. Transfer of chloroplast genomic DNA to mitochondrial genome occurred at least 300 MYA. Mol Biol Evol. 24:2040–2048.

Wasserman WW, Sandelin A. 2004. Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet. 5:276–287.

Wintz H, Hanson MR. 1991. A termination codon is created by RNA editing in the petunia atp9 transcript. Curr Genet. 19:61–64.

Wu CS, Wang YN, Liu SM, Chaw SM. 2007. Chloroplast genome (cpDNA) of Cycas taitungensis and 56 cp protein-coding genes of Gnetum parvifolium: insights into cpDNA evolution and phylogeny of extant seed plants. Mol Biol Evol. 24:1366–1379.

William Martin, Associate Editor

Accepted January 6, 2008


element insertion sites in the Cycas mtDNA with corresponding sites in the mtDNAs of other species reveals that the target sites for transpositions of Bpu elements are identical (within 1–2 mismatched base pairs) to the 5′ terminal direct repeat (fig. 4C–F).

Bpu Sequences Distinguish Cycas from Other Cycads and Seed Plants

Bpu sequences are present exclusively in the noncoding regions of the Cycas mtDNA, that is, within the introns and intergenic spacers, with 1 exception: a Bpu sequence is present within the coding region of rrn18, which codes for the 18S rRNA (fig. 4C). Among the many mitochondrial rrn18 (mt-rrn18) genes available in GenBank, only those of Cycas (including Cycas revoluta, GenBank accession number AB029356) and Ginkgo biloba (the only living species of the order Ginkgoales) have Bpu elements (fig. 4C). The Bpu insertion sites of the 2 Cycas taxa are orthologous (data not shown), whereas that of Ginkgo is different, indicating that the insertions of Bpu elements in rrn18 did not occur in the common ancestor of cycads and ginkgo. In sequenced mtDNAs, the oldest cluster of cpDNA-derived sequences (also termed mtpts), trnV(uac)-trnM(cau)-atpE-atpB-rbcL, is reported to have existed since the common ancestor of seed plants (Wang et al. 2007; fig. 3). Bpu elements are present in the Cycas mtpt-atpB (fig. 4D) and mtpt-rbcL (fig. 4E) sequences but not in the corresponding sequences of the 7 angiosperm mtDNAs elucidated to date (Wang et al. 2007), suggesting that these Bpu insertions took place after the split of the Cycas lineage from the other seed plants.

We further examined the occurrence of Bpu elements in 1 of the other 2 cycad families, Zamiaceae (including 1 species from each of Dioon, Macrozamia, and Zamia), by sequencing exons 1 and 2 of their nad2 genes. Whereas nad2i1 of the Cycas mtDNA contains 5 Bpu elements, such elements are absent from the nad2 genes of the 3 sampled Zamiaceae genera. However, we found 1, 2, and 6 Bpu elements in mitochondrial nad5i1 of Cycas panzhihuaensis (GenBank accession number AF43425) and in mitochondrial nad1i1 and the cp rps19-rpl16 spacer/intron of C. revoluta (GenBank accession numbers AY354955, AY345867), respectively. These data seem to suggest that no Bpu sequence has successfully invaded the mtDNAs of Cycadales genera other than Cycas.

FIG. 3.—Phylogenies inferred from concatenated data from 22 protein-coding genes (see supplementary table S2, Supplementary Material online) common to the sequenced mtDNAs from 10 land plants and Chara (the outgroup). Nodes received 100% bootstrap support unless indicated. Branch lengths are drawn to scale, except as otherwise noted. (A) Two superimposed Neighbor-Joining trees based on the Ka and Ks values, respectively. (B) Scenarios of gene losses (open bar) and splits (hatched triangle) along the single maximum parsimony tree (9,857 steps). The total numbers of protein-coding genes in each mtDNA species are given within parentheses.

FIG. 4.—Bpu sequences and examples of their insertion loci. Target sites and terminal repeats are underlined. (A) Upper: a dominant sequence based on 500 Bpu sequences (see text). Numbers along the abscissa indicate each nucleotide position within the prototype Bpu sequence. Recognition sites for the Bpu10I restriction enzyme are marked with asterisks. The ordinate scales each nucleotide by the total bits of information multiplied by its relative occurrence at that position (Wasserman and Sandelin 2004). Lower: identities of Bpu sequences that differ from the dominant type by 0–4 bp. (B) Secondary structure of the dominant Bpu sequence. (C)–(F) Lowercase letters, periods, and dashes denote mismatches, identical bps, and deletions, respectively, compared with the uppermost sequence. (C) Partial alignment of mt-rrn18 sequences from Cycas and Ginkgo. A Bpu sequence and a reversely complementary Bpu sequence are present in Ginkgo and Cycas, respectively. (D) Partial alignment of mtpt-atpB sequences extracted from Cycas, Oryza, Arabidopsis, and Nicotiana. Note that 2 Bpu sequence insertions are detected in Cycas but not in the other plants. (E) Partial alignment of mtpt-rbcL sequences extracted from Oryza, Arabidopsis, Nicotiana, and Cycas. The sequence from Cycas contains a Bpu-like sequence with a 5-bp insertion (shaded box) versus the dominant type. (F) Partial alignment of the mtpt-atpE and chloroplast atpE sequences of Cycas. Two reversely complementary Bpu sequences are found in the mtDNA. Note that the 5′ Bpu sequence is partly degenerated at its 3′ end.

A Bpu-like sequence with a 1-bp insertion and 89% similarity to the dominant type in Cycas mtDNA was retrieved from the coding sequence of the mitochondrial rps11 gene from the core eudicot Weigela hortensis. Alignment of this Bpu-like sequence reveals that its predicted Bpu10I endonuclease recognition site differs from that of Cycas by 2 bp (gCTGAGt). Because this Bpu element-like sequence does not interrupt the reading frame of rps11, and because the mitochondrial rps11 gene of Weigela lacks a target duplication site, we do not consider that this Bpu-like element shares a common origin with the Bpu sequences of Cycas. We further postulate that Bpu elements are likely absent from, or very rare in, angiosperms.
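The 2-bp difference noted for the Weigela site can be checked with a simple position-by-position comparison. In the sketch below, the Weigela string is taken from the text, with lowercase letters marking the differing bases; the 7-bp Cycas site (CCTGAGC) is an assumption made for illustration, obtained by restoring the two lowercase positions to C, and is not quoted from the alignment. The helper name is likewise hypothetical.

```python
def mismatch_positions(seq_a, seq_b):
    """Return the 0-based positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    pairs = zip(seq_a.upper(), seq_b.upper())
    return [i for i, (a, b) in enumerate(pairs) if a != b]


weigela_site = "gCTGAGt"        # as printed in the text
assumed_cycas_site = "CCTGAGC"  # assumption for illustration only

differences = mismatch_positions(weigela_site, assumed_cycas_site)
print(differences, len(differences))
```

Under this assumption, the two sites differ only at their first and last positions, in agreement with the 2-bp difference stated above.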

Surprisingly, the cpDNA of Cycas also contains 2 Bpu elements, located in the petN-psbM and psbA-trnK spacers, respectively (see Wu et al. 2007). Additionally, we identified 1 Bpu sequence in the atpB-rbcL spacer of the cpDNA from Pinus luchuensis (GenBank accession number DQ196799). However, no Bpu sequence has been detected in the cpDNAs of the 3 other Pinaceae genera and 3 gnetophyte orders we have sequenced to date (Chaw SM, Wu CS, Lai YT, Wang YN, Lin CP, Liu SM, unpublished data). Collectively, the available evidence inclines us to believe that the Bpu sequences have proliferated specifically in the mtDNA of Cycas (or the Cycadaceae). The sporadic occurrence of Bpu elements in the cpDNAs of Cycas and Pinus suggests that they are likely derived from nonhomologous recombination with DNA fragments that leaked out of mitochondria in the former case and via lateral transfer in the latter case.

The Tai Sequence and Its Association with BpuSequences

The second intron of the rps3 gene (rps3i2) is onlyfound in the mtDNAs of Cycas (Regina et al 2005) includ-ing those studied in the present work Regina et al (2005)suggested that rps3i2 is a group II intron that was indepen-dently gained in the gymnosperms likely at or just after thedivergence of the angiosperms Moreover the authors re-ported a high similarity between a partial segment of theCycas rps3i2 and orf760 of the Chara mtDNA (Turmelet al 2003) Orf760 harbors functional domains for a matur-ase and a reverse transcriptase prompting Regina et al(2005) to propose that the Cycas rps3i2 gene originally en-coded a maturase and a reverse transcriptase but evolvedover time into a partially degenerated ORF Here we fur-ther report that rps3i2 of the Cycas mtDNA contains a900-bp fragment comprising an array of 4 Bpu elementswith the Bpu elements on each end lacking 1 terminal re-peat followed by a 440-bp fragment (designated as lsquolsquoTairsquorsquo

sequence) a Worf760 sequence and 1 perfect Bpu se-quence (fig 5A) Most intriguingly additional Tai sequen-ces are scattered throughout the Cycas mtDNA they occurmore densely in longer spacers than in shorter ones (fig 5B)and are generally found in close association with repeatedBpu sequences These findings lend extra and robust sup-port to the proposal of Regina et al (2005)

In conclusion we hypothesize that a Tai sequence andan orf760 flanked by a Bpu sequence at each end mostlikely constitute an ancestral retrotransposon This hypoth-esis is founded on 2 observations 1) Tai sequences arehighly associated with Bpu elements and 2) small segmentsof Tai sequences are vigorously and diversely rearrangeddeleted or truncated as illustrated in the 3 examples shownin figure 5B These observations also suggest that an ances-tral retrotransposon in Cycas mtDNA spread very activelypresumably after Cycas branched off from the 10 other ex-tant cycad genera Afterward the offspring or duplicates ofthe ancestral transposon gradually may have lost their mo-bility through varying degrees of deletionstruncations fromthe 3 region which encoded both maturase and reversetranscriptase functions

We further speculate that even after the ancestral ret-rotransposon lost its transposon function the remainingBpu elements might have retained the ability to proliferateor amplify via unequal crossing-over or slipped-strand mis-pairing because their 2 direct terminal repeats can pair witheach otherrsquos complimentary strands Future studies and se-quencing of additional mtDNAs from basal Cycas will berequired to confirm this hypothesis and may provide addi-tional insight into the evolutionary origins of Bpu sequen-ces and the molecular mechanisms underlying theiramplification

The Fates of CpDNA-Derived Sequences (mtpts) inCycas mtDNA

Table 2 shows a comparative analysis of mtpts amongthe 11 studied plant mtDNAs The total percentage of mtptsin Cycas (44) is relatively high compared with that indicots (11ndash36) but falls within the range seen amongmonocots (30ndash63) We previously discovered that thefrequency of mtpt transfer is positively correlated with var-iations in mtDNA size (coefficient value r2 5 047) (Wanget al 2007) Here we report that the Cycas mtDNA con-tains 8 protein-coding genes as well as 2 rRNA and 5 tRNAgene sequences originating from the cpDNA (fig 1) How-ever frameshifts and indels within the protein-coding genessuggest that these Cycas mtpts have degenerated and arenonfunctional In contrast the 5 cpDNA-derived tRNAsin the Cycas mtDNA are able to fold into standard clover-leaf structures as shown by tRNAscan-SE analysis (Loweand Eddy 1997) and are thus likely to be functional Pre-viously Sugiyama et al (2005) used tRNAscan-SE to scan

Cycas Oryza Arabidopsis and Nicotiana Note that 2 Bpu sequence insertions are detected in Cycas but not in the other plants (E) Partial alignment ofmtpt-rbcL sequences extracted from Oryza Arabidopsis Nicotiana and Cycas The sequence from Cycas contains a Bpu-like sequence with a 5-bpinsertion (shaded box) versus the dominant type (F) Partial alignment of the mtpt-atpE and chloroplast atpE sequences of Cycas Two reverselycomplementary Bpu sequences are found in the mtDNA Note that the 5 Bpu sequence is partly degenerated at its 3 end

Complete Mitochondria Genome of Cycas 611

the tobacco mtDNA and concluded that the 6 cp-derivedtRNAs shared by angiosperms are functional Because nomtptwasobserved inCharaMarchantia orPhyscomitrellaWangetal (2007)concludedthatfrequentDNAtransferfromcpDNA to mtDNA has taken place no later than in the com-mon ancestor of seed plants approximately 300 MYA

Abundant RNA Editing Sites in Cycas mtDNA

Among the land plant DNAs Cycas has the most pre-dicted RNA editing sites (1084 sites supplementary tableS4 [Supplementary Material online]) It is commonly be-lieved that RNA editing arose together with the first terres-trial plants (Steinhauser et al 1999) By using the PREP-mtsoftware (Mower 2005) with the cutoff score set to 061084 sites within the protein-coding genes of the CycasmtDNA were predicted to be C-to-U RNA editing sitesThis is more than double the number of predicted siteswithin the elucidated mtDNAs of other land plants (table 2)

If the cutoff score which indicates the conservation degreeof each editing site compared those found in the other pub-lished plant mtDNAs is set to the most stringent criterion inPREP-mt (ie5 1) the number of editing sites decreases to738 which is still the largest number found among the landplant mtDNAs elucidated to date

It is believed that RNA editing is essential for func-tional protein expression as it is required to modify aminoacids or generate new start or stop codons (Hoch et al 1991Wintz and Hanson 1991 Kotera et al 2005 Shikanai2006) For this reason the large number of RNA editingsites in the Cycas mtDNA may indicate higher complexityat the DNA level and formation of various transcriptsthrough RNA editing thus potentially reflecting rapid di-vergence in Cycas

Although RNA editing sites are known to be sporad-ically distributed in the genomes of plant organellesmito-chondria from seed plants the mechanisms underlying thisdistribution pattern are not yet known (Shikanai 2006

FIG 5mdashGenomic organization of the second rps3 intron (rps3i2) and association of Tai sequences with Bpu sequences (A) Organization ofrps3i2 The segment between worf760 and the 4 tandem Bpu sequences is designated as a Tai sequence (gray) Arrowheads indicate a Bpu sequencelacking the 2 terminal repeats (banded bars) and its complementary sequence (B) Upper association of Tai sequences with Bpu tandem repeats acrossthe genome (abscissa) and their respective lengths (ordinate) Lower examples of 3 Tai variants (boxed) showing that the sequences are variouslydegenerate and truncated as compared with the typical Tai shown in (A) (broken line indicates mismatching sequences) Thin arrows indicate theorientations of homologous segments between Tai variants and the typical Tai

612 Chaw et al

Mulligan et al 2007) Supplementary table S4 (Supplemen-tary Material online) shows that within the Cycas mtDNAgene complex I (which includes 9 nad genes) is the mostextensively edited gene category Nad4 and nad5 show thefirst and second highest number of editing sites followedby the cox1 gene of complex III The least edited gene isrps10 which contains only 6 predicted sites even when thecutoff score is set to 06 Mulligan et al (2007) interpretedthe sporadic pattern as a result of lineage-specific loss ofediting sites through retroconversion which could removeadjacent editing sites by replacement with the edited se-quences The abundant RNA editing sites in the CycasmtDNA however are mysterious and completely deviatefrom the sporadic patterns reported in other seed plantmtDNAs Therefore the mechanism of RNA editing inCycas mtDNA may have a different evolutionary historyfrom those of other seed plants

Previous studies (Steinhauser et al. 1999; Lurin et al. 2004; Shikanai 2006) led to the proposal that variation or multiplication of trans-acting factors (e.g., members of the pentatricopeptide repeat family) may allow rapid increases in the number of editing sites. This could possibly be correlated to the lineage-specific explosion of RNA editing sites observed in Cycas mtDNA. Furthermore, Handa's (2003) conclusion that the evolutionary speed in RNA editing is higher at the level of gene regulation than at the primary gene sequence level appears to provide additional support for our speculation. To improve understanding of the mechanisms involved in abundant RNA editing sites in Cycas mtDNA, critical analysis of mtDNAs from ferns, conifers, and other cycads will be required.

In all plant lineages, RNA editing often alters the amino acid sequence (Adams and Palmer 2003). Supplementary table S5 (Supplementary Material online) summarizes the number of predicted preediting codons and potential edited codons in the Cycas mtDNA. The 2 most frequent edited codons are TCA (Ser) and TCT (Ser), which are predicted to be edited into TTA (Leu) and TTT (Phe), respectively. The least frequently edited codon is CAA (Gln), which would putatively be edited in only 2 cases, wherein the codon's first position would be altered to yield the stop codon TAA. Supplementary table S6 (Supplementary Material online) further depicts the formation of 4 start (from ACG to ATG) and 7 stop codons (from CGA to TGA) through RNA editing. To verify the accuracy of the editing sites predicted using PREP-mt, partial transcripts of the cox1, atp1, atp6, and ccmB genes were experimentally assayed using RT-PCR. Comparison of our RT-PCR sequences with the editing sites predicted by PREP-mt indicated accuracy values of 97.66%, 99.13%, 97.91%, and 99.25% for cox1, atp1, atp6, and ccmB, respectively.
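The codon changes described above can be reproduced with a short sketch. It is illustrative only and is not part of the PREP-mt analysis: the function names are hypothetical, U is written as T to match the DNA-style codons used in the text, and the lookup table covers only the codons mentioned here (standard genetic code).

```python
# Standard genetic code, restricted to the codons discussed in the text.
CODON_TABLE = {
    "TCA": "Ser", "TCT": "Ser", "TTA": "Leu", "TTT": "Phe",
    "CAA": "Gln", "TAA": "Stop", "ACG": "Thr", "ATG": "Met",
    "CGA": "Arg", "TGA": "Stop",
}


def edit_c_to_u(codon: str, position: int) -> str:
    """Apply C-to-U editing at a 1-based codon position (U written as T)."""
    index = position - 1
    if codon[index] != "C":
        raise ValueError(f"position {position} of {codon} is not a C")
    return codon[:index] + "T" + codon[index + 1:]


def describe_edit(codon: str, position: int) -> str:
    """Summarize the amino acid change caused by one C-to-U edit."""
    edited = edit_c_to_u(codon, position)
    return f"{codon} ({CODON_TABLE[codon]}) -> {edited} ({CODON_TABLE[edited]})"


# (codon, edited position) pairs taken from the examples in the text.
EXAMPLES = [("TCA", 2), ("TCT", 2), ("CAA", 1), ("ACG", 2), ("CGA", 1)]

for codon, position in EXAMPLES:
    print(describe_edit(codon, position))
```

The last three examples show why a single C-to-U change can matter so much: editing the first position of CAA or CGA creates a stop codon, and editing the second position of ACG creates the start codon ATG.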

Our RT-PCR analysis additionally identified a U-to-C editing type in atp1; this was missed in the prediction because detection of the U-to-C editing type is limited when using PREP-mt (Mower 2005). Unfortunately, programs for predicting RNA editing sites in nonprotein-coding genes, such as ribosomal RNA and tRNA genes, and for predicting silent editing sites (which alter the mRNA but do not change the translated amino acid) are not yet available. We are currently examining these additional editing types and the locations of such sites in the Cycas mtDNA.

No Large Repeats in Gymnosperms but Some in Angiosperms

In the mtDNAs of angiosperms, long repeated sequences vary in size from 2,427 bp in rapeseed to 127,600 bp in rice, and they show no homology to each other, implying that repeated sequences were independently acquired during the evolution from Marchantia to angiosperms (Sugiyama et al. 2005). In Cycas mtDNA, we only detected a few long repeated sequences, all less than 15 kb in length and mostly composed of Bpu sequences. It has been demonstrated that genes are continuously transferred from the nuclear genome and cpDNA to mtDNA (Knoop 2004). However, the mechanisms underlying the emergence of large repeats are not yet fully understood.

Conclusions

The complete Cycas mtDNA shows a number of unprecedented features that are atypical of the mtDNAs previously elucidated for other land plants, including the lowest A + T content, the highest proportion of tandem repeat sequences (mainly Bpu sequences) and gene number found so far, abundant RNA editing sites, and an exceptionally elevated Ka value for the protein-coding genes. The latter 2 features might be correlated. In comparison with the known angiosperm organelle genomes, the cpDNA and mtDNA of Cycas have experienced the least gene loss. Peculiarly, the cpDNA of Cycas has the fewest RNA editing sites among land plants (Wu et al. 2007), whereas its mtDNA has the most. On the other hand, some characteristics are shared by mtDNAs of Cycas and angiosperms but not bryophytes; these include common gene transfers from cpDNA, similar ratios of noncoding sequences, lack of group I introns, relatively low numbers of tRNAs, and concurrence of trans-spliced group II introns. Nevertheless, the above-mentioned Cycas-specific features suggest that the Cycas mtDNA has a lineage-specific evolutionary history.

A novel family of SIMEs, designated as Bpu elements/sequences, represents another unique feature of the Cycadaceae lineage. Bpu elements are widely distributed throughout the noncoding regions of Cycas mtDNA (over 500 copies), with only 1 occurrence in a coding region (rrn18). In the Cycas mtDNA, the highly conserved nature of Bpu sequences and their association with Tai sequences, as well as Worf760 (located in rps3i2), suggest that these sequences may have collectively originated from an ancestral retrotransposon. Further investigation into the origin, proliferation, and evolution of Bpu sequences will be desirable to help clarify the peculiar features of Cycas mtDNA.

Supplementary Material

Supplementary tables S1–S7 and figure 1 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

This work was supported by joint research grants from the Research Center for Biodiversity, Academia Sinica, to S.-M.C. and D.W., and partially by a National Science Council grant to S.-M.C. (94-2311-B001-059) and by a grant from the Institute of Information Science, Academia Sinica, to A.C.-C.S. We thank Chiao-Lei Cheng for sequencing the RT-PCR products. We are grateful to the 3 anonymous reviewers who provided critical comments and valuable suggestions.

Literature Cited

Adams KL, Palmer JD. 2003. Evolution of mitochondrial gene content: gene loss and transfer to the nucleus. Mol Phylogenet Evol. 29:380–395.

Adams KL, Qiu YL, Stoutemyer M, Palmer JD. 2002. Punctuated evolution of mitochondrial gene content: high and variable rates of mitochondrial gene loss and transfer to the nucleus during angiosperm evolution. Proc Natl Acad Sci USA. 99:9905–9912.

Benson G. 1999. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27:573–580.

Bergthorsson U, Richardson AO, Young GJ, Goertzen LR, Palmer JD. 2004. Massive horizontal transfer of mitochondrial genes from diverse land plant donors to the basal angiosperm Amborella. Proc Natl Acad Sci USA. 101:17747–17752.

Chaw SM, Chang CC, Chen HL, Li WH. 2004. Dating the monocot-dicot divergence and the origin of core eudicots using whole chloroplast genomes. J Mol Evol. 58:424–441.

Chaw SM, Walters TW, Chang CC, Hu SH, Chen SH. 2005. A phylogeny of cycads (Cycadales) inferred from chloroplast matK gene, trnK intron, and nuclear rDNA ITS region. Mol Phylogenet Evol. 37:214–234.

Chaw SM, Zharkikh A, Sung HM, Lau TC, Li WH. 1997. Molecular phylogeny of extant gymnosperms and seed plant evolution: analysis of nuclear 18S rRNA sequences. Mol Biol Evol. 14:56–68.

Cho Y, Qiu YL, Kuhlman P, Palmer JD. 1998. Explosive invasion of plant mitochondria by a group I intron. Proc Natl Acad Sci USA. 95:14244–14249.

Clifton SW, Minx P, Fauron CM, et al. (13 co-authors). 2004. Sequence and comparative analysis of the maize NB mitochondrial genome. Plant Physiol. 136:3486–3503.

Feschotte S, Zhang X, Wessler SR. 2002. Miniature inverted-repeat transposable elements and their relationship to established DNA transposons. Washington (DC): ASM Press.

Groth-Malonek M, Pruchner D, Grewe F, Knoop V. 2005. Ancestors of trans-splicing mitochondrial introns support serial sister group relationships of hornworts and mosses with vascular plants. Mol Biol Evol. 22:117–125.

Handa H. 2003. The complete nucleotide sequence and RNA editing content of the mitochondrial genome of rapeseed (Brassica napus L.): comparative analysis of the mitochondrial genomes of rapeseed and Arabidopsis thaliana. Nucleic Acids Res. 31:5907–5916.

Hoch B, Maier RM, Appel K, Igloi GL, Kossel H. 1991. Editing of a chloroplast mRNA by creation of an initiation codon. Nature. 353:178–180.

Jones DL. 2002. Cycads of the World: ancient plant in today's landscape. 2nd ed. Sydney (Australia): Reed New Holland.

Kadowaki K, Kubo N, Ozawa K, Hirai A. 1996. Targeting presequence acquisition after mitochondrial gene transfer to the nucleus occurs by duplication of existing targeting signals. EMBO J. 15:6652–6661.

Karol KG, McCourt RM, Cimino MT, Delwiche CF. 2001. The closest living relatives of land plants. Science. 294:2351–2353.

Knoop V. 2004. The mitochondrial DNA of land plants: peculiarities in phylogenetic perspective. Curr Genet. 46:123–139.

Kotera E, Tasaka M, Shikanai T. 2005. A pentatricopeptide repeat protein is essential for RNA editing in chloroplasts. Nature. 433:326–330.

Kubo T, Nishizawa S, Sugawara A, Itchoda N, Estiati A, Mikami T. 2000. The complete nucleotide sequence of the mitochondrial genome of sugar beet (Beta vulgaris L.) reveals a novel gene for tRNA(Cys)(GCA). Nucleic Acids Res. 28:2571–2576.

Kumar S, Tamura K, Nei M. 2004. MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform. 5:150–163.

Kurtz S, Choudhuri JV, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R. 2001. REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 29:4633–4642.

Lowe TM, Eddy SR. 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25:955–964.

Lurin C, Andres C, Aubourg S, et al. (19 co-authors). 2004. Genome-wide analysis of Arabidopsis pentatricopeptide repeat proteins reveals their essential role in organelle biogenesis. Plant Cell. 16:2089–2103.

Malek O, Brennicke A, Knoop V. 1997. Evolution of trans-splicing plant mitochondrial introns in pre-Permian times. Proc Natl Acad Sci USA. 94:553–558.

Malek O, Knoop V. 1998. Trans-splicing group II introns in plant mitochondria: the complete set of cis-arranged homologs in ferns, fern allies, and a hornwort. RNA. 4:1599–1609.

Michel F, Ferat JL. 1995. Structure and activities of group II introns. Annu Rev Biochem. 64:435–461.

Mower JP. 2005. PREP-Mt: predictive RNA editor for plant mitochondrial genes. BMC Bioinformatics. 6:96.

Mulligan RM, Chang KL, Chou CC. 2007. Computational analysis of RNA editing sites in plant mitochondrial genomes reveals similar information content and a sporadic distribution of editing sites. Mol Biol Evol. 24:1971–1981.

Notsu Y, Masood S, Nishikawa T, Kubo N, Akiduki G, Nakazono M, Hirai A, Kadowaki K. 2002. The complete sequence of the rice (Oryza sativa L.) mitochondrial genome: frequent DNA sequence acquisition and loss during the evolution of flowering plants. Mol Genet Genomics. 268:434–445.

Oda K, Kohchi T, Ohyama K. 1992. Mitochondrial DNA of Marchantia polymorpha as a single circular form with no incorporation of foreign DNA. Biosci Biotechnol Biochem. 56:132–135.

Ogihara Y, Yamazaki Y, Murai K, et al. (14 co-authors). 2005. Structural dynamics of cereal mitochondrial genomes as revealed by complete nucleotide sequencing of the wheat mitochondrial genome. Nucleic Acids Res. 33:6235–6250.

Qiu YL, Li L, Wang B, et al. (13 co-authors). 2006. The deepest divergences in land plants inferred from phylogenomic evidence. Proc Natl Acad Sci USA. 103:15511–15516.

Regina TM, Picardi E, Lopez L, Pesole G, Quagliariello C. 2005. A novel additional group II intron distinguishes the mitochondrial rps3 gene in gymnosperms. J Mol Evol. 60:196–206.

Selosse M, Albert B, Godelle B. 2001. Reducing the genome size of organelles favours gene transfer to the nucleus. Trends Ecol Evol. 16:135–141.

Shikanai T. 2006. RNA editing in plant organelles: machinery, physiological function and evolution. Cell Mol Life Sci. 63:698–708.

Steinhauser S, Beckert S, Capesius I, Malek O, Knoop V. 1999. Plant mitochondrial RNA editing. J Mol Evol. 48:303–312.

Stevenson DW. 1990. Morphology and systematics of the Cycadales. Mem N Y Bot Gard. 57:8–15.

Stewart CN Jr, Via LE. 1993. A rapid CTAB DNA isolation technique useful for RAPD fingerprinting and other PCR applications. Biotechniques. 14:748–750.

Sugiyama Y, Watase Y, Nagase M, Makita N, Yagura S, Hirai A, Sugiura M. 2005. The complete nucleotide sequence and multipartite organization of the tobacco mitochondrial genome: comparative analysis of mitochondrial genomes in higher plants. Mol Genet Genomics. 272:603–615.

Terasawa K, Odahara M, Kabeya Y, Kikugawa T, Sekine Y, Fujiwara M, Sato N. 2007. The mitochondrial genome of the moss Physcomitrella patens sheds new light on mitochondrial evolution in land plants. Mol Biol Evol. 24:699–709.

Turmel M, Otis C, Lemieux C. 2003. The mitochondrial genome of Chara vulgaris: insights into the mitochondrial DNA architecture of the last common ancestor of green algae and land plants. Plant Cell. 15:1888–1903.

Unseld M, Marienfeld JR, Brandt P, Brennicke A. 1997. The mitochondrial genome of Arabidopsis thaliana contains 57 genes in 366,924 nucleotides. Nat Genet. 15:57–61.

Wang D, Wu YW, Chun-Chieh Shih A, Wu CS, Wang YN, Chaw SM. 2007. Transfer of chloroplast genomic DNA to mitochondrial genome occurred at least 300 MYA. Mol Biol Evol. 24:2040–2048.

Wasserman WW, Sandelin A. 2004. Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet. 5:276–287.

Wintz H, Hanson MR. 1991. A termination codon is created by RNA editing in the petunia atp9 transcript. Curr Genet. 19:61–64.

Wu CS, Wang YN, Liu SM, Chaw SM. 2007. Chloroplast genome (cpDNA) of Cycas taitungensis and 56 cp protein-coding genes of Gnetum parvifolium: insights into cpDNA evolution and phylogeny of extant seed plants. Mol Biol Evol. 24:1366–1379.

William Martin, Associate Editor

Accepted January 6, 2008


FIG. 4. Bpu sequences and examples of their insertion loci. Target sites and terminal repeats are underlined. (A) Upper: a dominant sequence based on 500 Bpu sequences (see text). Numbers along the abscissa indicate each nucleotide position within the prototype Bpu sequence. Recognition sites for the Bpu10I restriction enzyme are marked with asterisks. The ordinate scales each by the total bits of information multiplied by its relative occurrence at that position (Wasserman and Sandelin 2004). Lower: identities of Bpu sequences that differ from the dominant type by 0–4 bp. (B) Secondary structure of the dominant Bpu sequence. (C)–(F) Lower cases, periods, and dashes denote mismatches, identical bps, and deletions, respectively, compared with the uppermost sequence. (C) Partial alignment of mt-rrn18 sequences from Cycas and Ginkgo. A Bpu sequence and a reversely complementary Bpu sequence are present in Ginkgo and Cycas, respectively. (D) Partial alignment of mtpt-atpB sequences extracted from


AY354955, AY345867), respectively. These data seem to suggest that no Bpu sequence has successfully invaded the mtDNAs of Cycadales genera other than Cycas.

A Bpu-like sequence with a 1-bp insertion and 89% similarity to the dominant type in Cycas mtDNA was retrieved from the coding sequence of the mitochondrial rps11 gene from the core eudicot Weigela hortensis. Alignment of this Bpu-like sequence reveals that its predicted Bpu10I endonuclease recognition site differs from that of Cycas by 2 bp (gCTGAGt). Because this Bpu element-like sequence does not interrupt the reading frame of rps11, and because the mitochondrial rps11 gene of Weigela lacks a target duplication site, we do not consider that this Bpu-like element shares a common origin with the Bpu sequences of Cycas. We further postulate that Bpu elements are likely absent from, or very rare in, angiosperms.
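The 2-bp difference noted for the Weigela site can be checked with a small mismatch counter. This is a sketch under a stated assumption: the pattern CCTNAGC is taken from enzyme catalog listings for Bpu10I rather than quoted from this paper, and the function name is hypothetical. N is treated as matching any base.

```python
# IUPAC codes needed for the pattern; N matches any of the four bases.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}

# Assumption: catalog recognition sequence of Bpu10I, not quoted from the paper.
BPU10I_SITE = "CCTNAGC"


def count_mismatches(pattern: str, site: str) -> int:
    """Count positions where `site` is not allowed by the IUPAC `pattern`."""
    if len(pattern) != len(site):
        raise ValueError("pattern and site must have the same length")
    return sum(
        base not in IUPAC[code]
        for code, base in zip(pattern, site.upper())
    )


# Site as printed in the text; lowercase letters mark the differing bases.
weigela_site = "gCTGAGt"
print(count_mismatches(BPU10I_SITE, weigela_site))  # prints 2
```

Under this assumption the two lowercase bases at the ends of the Weigela site are exactly the positions that fail to match, which is consistent with the 2-bp difference reported in the text.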

Surprisingly, the cpDNA of Cycas also contains 2 Bpu elements, located in the petN-psbM and psbA-trnK spacers, respectively (see Wu et al. 2007). Additionally, we identified 1 Bpu sequence in the atpB-rbcL spacer of the cpDNA from Pinus luchuensis (GenBank accession number DQ196799). However, no Bpu sequence has been detected in the cpDNAs of the 3 other Pinaceae genera and 3 gnetophyte orders we have sequenced to date (Chaw SM, Wu CS, Lai YT, Wang YN, Lin CP, Liu SM, unpublished data). Collectively, the available evidence inclines us to believe that the Bpu sequences have proliferated specifically in the mtDNA of Cycas (or the Cycadaceae). The sporadic occurrence of Bpu elements in the cpDNAs of Cycas and Pinus suggests that they are likely derived from nonhomologous recombination with DNA fragments that leaked out of mitochondria in the former case and via lateral transfer in the latter case.

The Tai Sequence and Its Association with Bpu Sequences

The second intron of the rps3 gene (rps3i2) is only found in the mtDNAs of Cycas (Regina et al. 2005), including those studied in the present work. Regina et al. (2005) suggested that rps3i2 is a group II intron that was independently gained in the gymnosperms, likely at or just after the divergence of the angiosperms. Moreover, the authors reported a high similarity between a partial segment of the Cycas rps3i2 and orf760 of the Chara mtDNA (Turmel et al. 2003). Orf760 harbors functional domains for a maturase and a reverse transcriptase, prompting Regina et al. (2005) to propose that the Cycas rps3i2 gene originally encoded a maturase and a reverse transcriptase but evolved over time into a partially degenerated ORF. Here, we further report that rps3i2 of the Cycas mtDNA contains a 900-bp fragment comprising an array of 4 Bpu elements (with the Bpu elements on each end lacking 1 terminal repeat), followed by a 440-bp fragment (designated as the "Tai" sequence), a Worf760 sequence, and 1 perfect Bpu sequence (fig. 5A). Most intriguingly, additional Tai sequences are scattered throughout the Cycas mtDNA; they occur more densely in longer spacers than in shorter ones (fig. 5B) and are generally found in close association with repeated Bpu sequences. These findings lend extra and robust support to the proposal of Regina et al. (2005).

In conclusion, we hypothesize that a Tai sequence and an orf760, flanked by a Bpu sequence at each end, most likely constitute an ancestral retrotransposon. This hypothesis is founded on 2 observations: 1) Tai sequences are highly associated with Bpu elements and 2) small segments of Tai sequences are vigorously and diversely rearranged, deleted, or truncated, as illustrated in the 3 examples shown in figure 5B. These observations also suggest that an ancestral retrotransposon in Cycas mtDNA spread very actively, presumably after Cycas branched off from the 10 other extant cycad genera. Afterward, the offspring or duplicates of the ancestral transposon may have gradually lost their mobility through varying degrees of deletions/truncations from the 3' region, which encoded both maturase and reverse transcriptase functions.

We further speculate that even after the ancestral retrotransposon lost its transposon function, the remaining Bpu elements might have retained the ability to proliferate or amplify via unequal crossing-over or slipped-strand mispairing because their 2 direct terminal repeats can pair with each other's complementary strands. Future studies and sequencing of additional mtDNAs from basal Cycas will be required to confirm this hypothesis and may provide additional insight into the evolutionary origins of Bpu sequences and the molecular mechanisms underlying their amplification.

The Fates of CpDNA-Derived Sequences (mtpts) in Cycas mtDNA

Table 2 shows a comparative analysis of mtpts among the 11 studied plant mtDNAs. The total percentage of mtpts in Cycas (4.4%) is relatively high compared with that in dicots (1.1–3.6%) but falls within the range seen among monocots (3.0–6.3%). We previously discovered that the frequency of mtpt transfer is positively correlated with variations in mtDNA size (coefficient value, r² = 0.47) (Wang et al. 2007). Here, we report that the Cycas mtDNA contains 8 protein-coding genes as well as 2 rRNA and 5 tRNA gene sequences originating from the cpDNA (fig. 1). However, frameshifts and indels within the protein-coding genes suggest that these Cycas mtpts have degenerated and are nonfunctional. In contrast, the 5 cpDNA-derived tRNAs in the Cycas mtDNA are able to fold into standard cloverleaf structures, as shown by tRNAscan-SE analysis (Lowe and Eddy 1997), and are thus likely to be functional. Previously, Sugiyama et al. (2005) used tRNAscan-SE to scan the tobacco mtDNA and concluded that the 6 cp-derived tRNAs shared by angiosperms are functional. Because no mtpt was observed in Chara, Marchantia, or Physcomitrella, Wang et al. (2007) concluded that frequent DNA transfer from cpDNA to mtDNA has taken place no later than in the common ancestor of seed plants, approximately 300 MYA.

FIG. 4 (continued). (D, continued) Cycas, Oryza, Arabidopsis, and Nicotiana. Note that 2 Bpu sequence insertions are detected in Cycas but not in the other plants. (E) Partial alignment of mtpt-rbcL sequences extracted from Oryza, Arabidopsis, Nicotiana, and Cycas. The sequence from Cycas contains a Bpu-like sequence with a 5-bp insertion (shaded box) versus the dominant type. (F) Partial alignment of the mtpt-atpE and chloroplast atpE sequences of Cycas. Two reversely complementary Bpu sequences are found in the mtDNA. Note that the 5' Bpu sequence is partly degenerated at its 3' end.

Abundant RNA Editing Sites in Cycas mtDNA

Among the land plant mtDNAs, Cycas has the most predicted RNA editing sites (1,084 sites; supplementary table S4 [Supplementary Material online]). It is commonly believed that RNA editing arose together with the first terrestrial plants (Steinhauser et al. 1999). By using the PREP-mt software (Mower 2005) with the cutoff score set to 0.6, 1,084 sites within the protein-coding genes of the Cycas mtDNA were predicted to be C-to-U RNA editing sites. This is more than double the number of predicted sites within the elucidated mtDNAs of other land plants (table 2). If the cutoff score, which indicates the conservation degree of each editing site compared with those found in the other published plant mtDNAs, is set to the most stringent criterion in PREP-mt (i.e., 1), the number of editing sites decreases to 738, which is still the largest number found among the land plant mtDNAs elucidated to date.
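The effect of the cutoff score on the number of predicted sites can be illustrated with a minimal sketch. It assumes that a candidate site is reported when its score is greater than or equal to the cutoff; this thresholding rule, the function name, and the toy scores below are all hypothetical and are not taken from PREP-mt output or from the Cycas data.

```python
from typing import Iterable


def count_predicted_sites(scores: Iterable[float], cutoff: float) -> int:
    """Count candidate sites whose conservation score meets the cutoff."""
    return sum(score >= cutoff for score in scores)


# Hypothetical per-site scores for illustration only.
toy_scores = [1.0, 0.95, 0.6, 0.72, 1.0, 0.41, 0.88, 0.6]

for cutoff in (0.6, 1.0):
    print(f"cutoff {cutoff}: {count_predicted_sites(toy_scores, cutoff)} sites")
```

Raising the cutoff can only keep or reduce the count, which is the same direction as the drop from 1,084 sites at 0.6 to 738 sites at the most stringent setting reported above.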


Notsu Y Masood S Nishikawa T Kubo N Akiduki GNakazono M Hirai A Kadowaki K 2002 The completesequence of the rice (Oryza sativa L) mitochondrial genomefrequent DNA sequence acquisition and loss during the evolutionof flowering plants Mol Genet Genomics 268434ndash445

Oda K Kohchi T Ohyama K 1992 Mitochondrial DNA ofMarchantia polymorpha as a single circular form with noincorporation of foreign DNA Biosci Biotechnol Biochem56132ndash135

Ogihara Y Yamazaki Y Murai K et al (14 co-authors) 2005Structural dynamics of cereal mitochondrial genomes asrevealed by complete nucleotide sequencing of the wheatmitochondrial genome Nucleic Acids Res 336235ndash6250

Qiu YL Li L Wang B et al (13 co-authors) 2006 The deepestdivergences in land plants inferred from phylogenomicevidence Proc Natl Acad Sci USA 10315511ndash15516

Regina TM Picardi E Lopez L Pesole G Quagliariello C 2005A novel additional group II intron distinguishes the mitochon-drial rps3 gene in gymnosperms J Mol Evol 60196ndash206

Selosse M Albert B Godelle B 2001 Reducing the genome sizeof organelles favours gene transfer to the nucleus Trends EcolEvol 16135ndash141

Shikanai T 2006 RNA editing in plant organelles machineryphysiological function and evolution Cell Mol Life Sci63698ndash708

Steinhauser S Beckert S Capesius I Malek O Knoop V 1999Plant mitochondrial RNA editing J Mol Evol 48303ndash312

Stevenson DW 1990 Morphology and systematics of theCycadales Mem N Y Bot Gard 578ndash15

614 Chaw et al

Stewart CN Jr Via LE 1993 A rapid CTAB DNA isolationtechnique useful for RAPD fingerprinting and other PCRapplications Biotechniques 14748ndash750

Sugiyama Y Watase Y Nagase M Makita N Yagura S Hirai ASugiura M 2005 The complete nucleotide sequence andmultipartite organization of the tobacco mitochondrialgenome comparative analysis of mitochondrial genomes inhigher plants Mol Genet Genomics 272603ndash615

Terasawa K Odahara M Kabeya Y Kikugawa T Sekine YFujiwara M Sato N 2007 The mitochondrial genome of themoss Physcomitrella patens sheds new light on mitochondrialevolution in land plants Mol Biol Evol 24699ndash709

Turmel M Otis C Lemieux C 2003 The mitochondrial genomeof Chara vulgaris insights into the mitochondrial DNAarchitecture of the last common ancestor of green algae andland plants Plant Cell 151888ndash1903

Unseld M Marienfeld JR Brandt P Brennicke A 1997 Themitochondrial genome of Arabidopsis thaliana contains 57genes in 366924 nucleotides Nat Genet 1557ndash61

Wang D Wu YW Chun-Chieh Shih A Wu CS Wang YNChaw SM 2007 Transfer of chloroplast genomic DNA tomitochondrial genome occurred at least 300 MYA Mol BiolEvol 242040ndash2048

Wasserman WW Sandelin A 2004 Applied bioinformatics forthe identification of regulatory elements Nat Rev Genet5276ndash287

Wintz H Hanson MR 1991 A termination codon is created by RNAediting in the petunia atp9 transcript Curr Genet 1961ndash64

Wu CS Wang YN Liu SM Chaw SM 2007 Chloroplastgenome (cpDNA) of Cycas taitungensis and 56 cp protein-coding genes of Gnetum parvifolium insights into cpDNAevolution and phylogeny of extant seed plants Mol Biol Evol241366ndash1379

William Martin Associate Editor

Accepted January 6 2008

Complete Mitochondria Genome of Cycas 615

AY354955, AY345867), respectively. These data seem to suggest that no Bpu sequence has successfully invaded the mtDNAs of Cycadales genera other than Cycas.

A Bpu-like sequence with a 1-bp insertion and 89% similarity to the dominant type in Cycas mtDNA was retrieved from the coding sequence of the mitochondrial rps11 gene from the core eudicot Weigela hortensis. Alignment of this Bpu-like sequence reveals that its predicted Bpu10I endonuclease recognition site differs from that of Cycas by 2 bp (gCTGAGt). Because this Bpu element-like sequence does not interrupt the reading frame of rps11, and because the mitochondrial rps11 gene of Weigela lacks a target duplication site, we do not consider that this Bpu-like element shares a common origin with the Bpu sequences of Cycas. We further postulate that Bpu elements are likely absent from, or very rare in, angiosperms.
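The 2-bp difference noted above can be checked by counting mismatches against a recognition pattern. The following is only a minimal sketch: it assumes the 7-bp degenerate form CCTNAGC for the Bpu10I site (N = any base), which is not spelled out in this section, and it reads the two lowercase letters in gCTGAGt as the differing positions. The function name is hypothetical.

```python
def count_mismatches(site: str, pattern: str) -> int:
    """Count positions where `site` disagrees with `pattern` ('N' matches any base)."""
    if len(site) != len(pattern):
        raise ValueError("site and pattern must have the same length")
    return sum(
        1
        for base, expected in zip(site.upper(), pattern.upper())
        if expected != "N" and base != expected
    )


# Assumption: 7-bp degenerate form of the Bpu10I recognition site.
BPU10I_PATTERN = "CCTNAGC"
# Site reported for the Weigela rps11 Bpu-like sequence in the text.
weigela_site = "gCTGAGt"

print(count_mismatches(weigela_site, BPU10I_PATTERN))
```

Under these assumptions the mismatches fall on the first and last positions of the site, in line with the 2-bp difference reported in the text.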

Surprisingly, the cpDNA of Cycas also contains 2 Bpu elements, located in the petN-psbM and psbA-trnK spacers, respectively (see Wu et al. 2007). Additionally, we identified 1 Bpu sequence in the atpB-rbcL spacer of the cpDNA from Pinus luchuensis (GenBank accession number DQ196799). However, no Bpu sequence has been detected in the cpDNAs of the 3 other Pinaceae genera and 3 gnetophyte orders we have sequenced to date (Chaw SM, Wu CS, Lai YT, Wang YN, Lin CP, Liu SM, unpublished data). Collectively, the available evidence inclines us to believe that the Bpu sequences have proliferated specifically in the mtDNA of Cycas (or the Cycadaceae). The sporadic occurrence of Bpu elements in the cpDNAs of Cycas and Pinus suggests that they are likely derived from nonhomologous recombination with DNA fragments that leaked out of mitochondria in the former case and via lateral transfer in the latter case.

The Tai Sequence and Its Association with Bpu Sequences

The second intron of the rps3 gene (rps3i2) is only found in the mtDNAs of Cycas (Regina et al. 2005), including those studied in the present work. Regina et al. (2005) suggested that rps3i2 is a group II intron that was independently gained in the gymnosperms, likely at or just after the divergence of the angiosperms. Moreover, the authors reported a high similarity between a partial segment of the Cycas rps3i2 and orf760 of the Chara mtDNA (Turmel et al. 2003). Orf760 harbors functional domains for a maturase and a reverse transcriptase, prompting Regina et al. (2005) to propose that the Cycas rps3i2 gene originally encoded a maturase and a reverse transcriptase but evolved over time into a partially degenerated ORF. Here, we further report that rps3i2 of the Cycas mtDNA contains a 900-bp fragment comprising an array of 4 Bpu elements, with the Bpu elements on each end lacking 1 terminal repeat, followed by a 440-bp fragment (designated as the "Tai" sequence), a Ψorf760 sequence, and 1 perfect Bpu sequence (fig. 5A). Most intriguingly, additional Tai sequences are scattered throughout the Cycas mtDNA; they occur more densely in longer spacers than in shorter ones (fig. 5B) and are generally found in close association with repeated Bpu sequences. These findings lend further robust support to the proposal of Regina et al. (2005).

In conclusion, we hypothesize that a Tai sequence and an orf760, flanked by a Bpu sequence at each end, most likely constitute an ancestral retrotransposon. This hypothesis is founded on 2 observations: 1) Tai sequences are highly associated with Bpu elements and 2) small segments of Tai sequences are vigorously and diversely rearranged, deleted, or truncated, as illustrated in the 3 examples shown in figure 5B. These observations also suggest that an ancestral retrotransposon in Cycas mtDNA spread very actively, presumably after Cycas branched off from the 10 other extant cycad genera. Afterward, the offspring or duplicates of the ancestral transposon may gradually have lost their mobility through varying degrees of deletions/truncations from the 3′ region, which encoded both maturase and reverse transcriptase functions.

We further speculate that even after the ancestral retrotransposon lost its transposon function, the remaining Bpu elements might have retained the ability to proliferate or amplify via unequal crossing-over or slipped-strand mispairing because their 2 direct terminal repeats can pair with each other's complementary strands. Future studies and sequencing of additional mtDNAs from basal Cycas will be required to test this hypothesis and may provide additional insight into the evolutionary origins of Bpu sequences and the molecular mechanisms underlying their amplification.

The Fates of CpDNA-Derived Sequences (mtpts) in Cycas mtDNA

Table 2 shows a comparative analysis of mtpts among the 11 studied plant mtDNAs. The total percentage of mtpts in Cycas (4.4%) is relatively high compared with that in dicots (1.1–3.6%) but falls within the range seen among monocots (3.0–6.3%). We previously discovered that the frequency of mtpt transfer is positively correlated with variations in mtDNA size (coefficient value r² = 0.47) (Wang et al. 2007). Here, we report that the Cycas mtDNA contains 8 protein-coding genes as well as 2 rRNA and 5 tRNA gene sequences originating from the cpDNA (fig. 1). However, frameshifts and indels within the protein-coding genes suggest that these Cycas mtpts have degenerated and are nonfunctional. In contrast, the 5 cpDNA-derived tRNAs in the Cycas mtDNA are able to fold into standard cloverleaf structures, as shown by tRNAscan-SE analysis (Lowe and Eddy 1997), and are thus likely to be functional. Previously, Sugiyama et al. (2005) used tRNAscan-SE to scan the tobacco mtDNA and concluded that the 6 cp-derived tRNAs shared by angiosperms are functional. Because no mtpt was observed in Chara, Marchantia, or Physcomitrella, Wang et al. (2007) concluded that frequent DNA transfer from cpDNA to mtDNA has taken place no later than in the common ancestor of seed plants, approximately 300 MYA.

Cycas, Oryza, Arabidopsis, and Nicotiana. Note that 2 Bpu sequence insertions are detected in Cycas but not in the other plants. (E) Partial alignment of mtpt-rbcL sequences extracted from Oryza, Arabidopsis, Nicotiana, and Cycas. The sequence from Cycas contains a Bpu-like sequence with a 5-bp insertion (shaded box) versus the dominant type. (F) Partial alignment of the mtpt-atpE and chloroplast atpE sequences of Cycas. Two reversely complementary Bpu sequences are found in the mtDNA. Note that the 5′ Bpu sequence is partly degenerated at its 3′ end.

Abundant RNA Editing Sites in Cycas mtDNA

Among land plant mtDNAs, Cycas has the most predicted RNA editing sites (1,084 sites; supplementary table S4 [Supplementary Material online]). It is commonly believed that RNA editing arose together with the first terrestrial plants (Steinhauser et al. 1999). By using the PREP-Mt software (Mower 2005) with the cutoff score set to 0.6, 1,084 sites within the protein-coding genes of the Cycas mtDNA were predicted to be C-to-U RNA editing sites. This is more than double the number of predicted sites within the elucidated mtDNAs of other land plants (table 2). If the cutoff score, which indicates the degree of conservation of each editing site compared with those found in the other published plant mtDNAs, is set to the most stringent criterion in PREP-Mt (i.e., = 1), the number of editing sites decreases to 738, which is still the largest number found among the land plant mtDNAs elucidated to date.
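The effect of the cutoff score on the site count can be illustrated with a small sketch. It assumes that each predicted site carries a score between 0 and 1 and that a site is retained when its score is at least the cutoff; the scores, site labels, and function name below are hypothetical and are not taken from the PREP-Mt output for Cycas.

```python
from typing import Dict


def count_sites_at_cutoff(scores: Dict[str, float], cutoff: float) -> int:
    """Count predicted editing sites whose score is at least `cutoff`."""
    return sum(1 for score in scores.values() if score >= cutoff)


# Hypothetical scores keyed by "gene:position" (illustration only).
toy_scores = {
    "nad4:77": 1.0,
    "nad4:158": 0.8,
    "nad5:242": 0.6,
    "cox1:500": 0.4,
    "rps10:12": 1.0,
}

for cutoff in (0.6, 1.0):
    print(cutoff, count_sites_at_cutoff(toy_scores, cutoff))
```

Raising the cutoff can only keep or reduce the count, which is the pattern seen in the drop from 1,084 to 738 predicted sites.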

It is believed that RNA editing is essential for functional protein expression, as it is required to modify amino acids or generate new start or stop codons (Hoch et al. 1991; Wintz and Hanson 1991; Kotera et al. 2005; Shikanai 2006). For this reason, the large number of RNA editing sites in the Cycas mtDNA may indicate higher complexity at the DNA level and formation of various transcripts through RNA editing, thus potentially reflecting rapid divergence in Cycas.

Although RNA editing sites are known to be sporadically distributed in the genomes of plant organelles/mitochondria from seed plants, the mechanisms underlying this distribution pattern are not yet known (Shikanai 2006; Mulligan et al. 2007). Supplementary table S4 (Supplementary Material online) shows that within the Cycas mtDNA, gene complex I (which includes 9 nad genes) is the most extensively edited gene category. Nad4 and nad5 show the first and second highest number of editing sites, followed by the cox1 gene of complex IV. The least edited gene is rps10, which contains only 6 predicted sites, even when the cutoff score is set to 0.6. Mulligan et al. (2007) interpreted the sporadic pattern as a result of lineage-specific loss of editing sites through retroconversion, which could remove adjacent editing sites by replacement with the edited sequences. The abundant RNA editing sites in the Cycas mtDNA, however, are mysterious and completely deviate from the sporadic patterns reported in other seed plant mtDNAs. Therefore, the mechanism of RNA editing in Cycas mtDNA may have a different evolutionary history from those of other seed plants.

FIG. 5. Genomic organization of the second rps3 intron (rps3i2) and association of Tai sequences with Bpu sequences. (A) Organization of rps3i2. The segment between Ψorf760 and the 4 tandem Bpu sequences is designated as a Tai sequence (gray). Arrowheads indicate a Bpu sequence lacking the 2 terminal repeats (banded bars) and its complementary sequence. (B) Upper: association of Tai sequences with Bpu tandem repeats across the genome (abscissa) and their respective lengths (ordinate). Lower: examples of 3 Tai variants (boxed), showing that the sequences are variously degenerate and truncated as compared with the typical Tai shown in (A) (broken line indicates mismatching sequences). Thin arrows indicate the orientations of homologous segments between Tai variants and the typical Tai.

Previous studies (Steinhauser et al. 1999; Lurin et al. 2004; Shikanai 2006) led to the proposal that variation or multiplication of trans-acting factors (e.g., members of the pentatricopeptide repeat family) may allow rapid increases in the number of editing sites. This could possibly be correlated with the lineage-specific explosion of RNA editing sites observed in Cycas mtDNA. Furthermore, Handa's (2003) conclusion that the evolutionary speed in RNA editing is higher at the level of gene regulation than at the primary gene sequence level appears to provide additional support for our speculation. To improve understanding of the mechanisms involved in the abundant RNA editing sites in Cycas mtDNA, critical analysis of mtDNAs from ferns, conifers, and other cycads will be required.

In all plant lineages, RNA editing often alters the amino acid sequence (Adams and Palmer 2003). Supplementary table S5 (Supplementary Material online) summarizes the number of predicted preediting codons and potential edited codons in the Cycas mtDNA. The 2 most frequently edited codons are TCA (Ser) and TCT (Ser), which are predicted to be edited into TTA (Leu) and TTT (Phe), respectively. The least frequently edited codon is CAA (Gln), which would putatively be edited in only 2 cases, wherein the codon's first position would be altered to yield the stop codon TAA. Supplementary table S6 (Supplementary Material online) further depicts the formation of 4 start (from ACG to ATG) and 7 stop codons (from CGA to TGA) through RNA editing. To verify the accuracy of the editing sites predicted using PREP-Mt, partial transcripts of the cox1, atp1, atp6, and ccmB genes were experimentally assayed using RT-PCR. Comparison of our RT-PCR sequences with the editing sites predicted by PREP-Mt indicated accuracy values of 97.66%, 99.13%, 97.91%, and 99.25% for cox1, atp1, atp6, and ccmB, respectively.
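The codon changes listed above can be reproduced with a short sketch that applies a single C-to-U edit (written as C-to-T in DNA letters) and translates the codon before and after. It assumes the standard genetic code, which land plant mitochondria are generally taken to use; the function and variable names are illustrative only.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def edit_codon(codon: str, position: int) -> str:
    """Apply a C-to-U edit (C-to-T in DNA letters) at a 0-based codon position."""
    if codon[position] != "C":
        raise ValueError(f"no C at position {position} of {codon}")
    return codon[:position] + "T" + codon[position + 1:]


# Codon and edited position for each case mentioned in the text.
examples = [("TCA", 1), ("TCT", 1), ("CAA", 0), ("ACG", 1), ("CGA", 0)]
for codon, pos in examples:
    edited = edit_codon(codon, pos)
    print(f"{codon} ({CODON_TABLE[codon]}) -> {edited} ({CODON_TABLE[edited]})")
```

In the one-letter output, `*` marks a stop codon and `M` (ATG) the start codon created from ACG.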

Our RT-PCR analysis additionally identified a U-to-C editing type in atp1; this was missed in the prediction because detection of the U-to-C editing type is limited when using PREP-Mt (Mower 2005). Unfortunately, programs for predicting RNA editing sites in nonprotein-coding genes, such as ribosomal RNA and tRNA genes, and for predicting silent editing sites (which alter the mRNA but do not change the translated amino acid) are not yet available. We are currently examining these additional editing types and the locations of such sites in the Cycas mtDNA.
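Both editing types show up as simple base differences when a genomic sequence is compared with its RT-PCR (cDNA) counterpart. The sketch below assumes the two sequences are already aligned, of equal length, and free of gaps; the sequences and the function name are made up for illustration and are not Cycas data.

```python
from typing import List, Tuple


def classify_edits(genomic: str, cdna: str) -> List[Tuple[int, str]]:
    """Return (1-based position, edit type) for each genomic/cDNA difference."""
    if len(genomic) != len(cdna):
        raise ValueError("sequences must be aligned to the same length")
    edits = []
    for pos, (g, c) in enumerate(zip(genomic.upper(), cdna.upper()), start=1):
        if g == c:
            continue
        if (g, c) == ("C", "T"):
            edits.append((pos, "C-to-U"))   # the type predicted by C-to-U tools
        elif (g, c) == ("T", "C"):
            edits.append((pos, "U-to-C"))   # reverse type, found only by sequencing
        else:
            edits.append((pos, "other"))    # likely a sequencing or alignment issue
    return edits


# Toy aligned sequences (hypothetical).
genomic = "ATGTCATTTCGA"
cdna = "ATGTTATCTCGA"
print(classify_edits(genomic, cdna))
```

A comparison of this kind reports U-to-C differences alongside C-to-U ones, which is how a reverse edit such as the one in atp1 could be picked up experimentally even though a C-to-U predictor would not list it.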

No Large Repeats in Gymnosperms but Some in Angiosperms

In the mtDNAs of angiosperms, long repeated sequences vary in size from 2,427 bp in rapeseed to 127,600 bp in rice, and they show no homology to each other, implying that repeated sequences were independently acquired during the evolution from Marchantia to angiosperms (Sugiyama et al. 2005). In Cycas mtDNA, we only detected a few long repeated sequences, all less than 1.5 kb in length and mostly composed of Bpu sequences. It has been demonstrated that genes are continuously transferred from the nuclear genome and cpDNA to mtDNA (Knoop 2004). However, the mechanisms underlying the emergence of large repeats are not yet fully understood.

Conclusions

The complete Cycas mtDNA shows a number of unprecedented features that are atypical of the mtDNAs previously elucidated for other land plants, including the lowest A + T content, the highest proportion of tandem repeat sequences (mainly Bpu sequences) and gene number found so far, abundant RNA editing sites, and an exceptionally elevated Ka value for the protein-coding genes. The latter 2 features might be correlated. In comparison with the known angiosperm organelle genomes, the cpDNA and mtDNA of Cycas have experienced the least gene loss. Peculiarly, the cpDNA of Cycas has the fewest RNA editing sites among land plants (Wu et al. 2007), whereas its mtDNA has the most. On the other hand, some characteristics are shared by mtDNAs of Cycas and angiosperms but not bryophytes; these include common gene transfers from cpDNA, similar ratios of noncoding sequences, lack of group I introns, relatively low numbers of tRNAs, and concurrence of trans-spliced group II introns. Nevertheless, the above-mentioned Cycas-specific features suggest that the Cycas mtDNA has a lineage-specific evolutionary history.

A novel family of SIMEs, designated as Bpu elements/sequences, represents another unique feature of the Cycadaceae lineage. Bpu elements are widely distributed throughout the noncoding regions of Cycas mtDNA (over 500 copies), with only 1 occurrence in a coding region (rrn18). In the Cycas mtDNA, the highly conserved nature of Bpu sequences and their association with Tai sequences, as well as Ψorf760 (located in rps3i2), suggest that these sequences may have collectively originated from an ancestral retrotransposon. Further investigation into the origin, proliferation, and evolution of Bpu sequences will be desirable to help clarify the peculiar features of Cycas mtDNA.

Supplementary Material

Supplementary tables S1–S7 and figure 1 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

This work was supported by joint research grants from the Research Center for Biodiversity, Academia Sinica, to S.-M.C. and D.W., and partially by a National Science Council grant to S.-M.C. (94-2311-B001-059) and by a grant from the Institute of Information Science, Academia Sinica, to A.C.-C.S. We thank Chiao-Lei Cheng for sequencing the RT-PCR products. We are grateful to the 3 anonymous reviewers who provided critical comments and valuable suggestions.

Literature Cited

Adams KL, Palmer JD. 2003. Evolution of mitochondrial gene content: gene loss and transfer to the nucleus. Mol Phylogenet Evol. 29:380–395.

Adams KL, Qiu YL, Stoutemyer M, Palmer JD. 2002. Punctuated evolution of mitochondrial gene content: high and variable rates of mitochondrial gene loss and transfer to the nucleus during angiosperm evolution. Proc Natl Acad Sci USA. 99:9905–9912.

Benson G. 1999. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27:573–580.

Bergthorsson U, Richardson AO, Young GJ, Goertzen LR, Palmer JD. 2004. Massive horizontal transfer of mitochondrial genes from diverse land plant donors to the basal angiosperm Amborella. Proc Natl Acad Sci USA. 101:17747–17752.

Chaw SM, Chang CC, Chen HL, Li WH. 2004. Dating the monocot-dicot divergence and the origin of core eudicots using whole chloroplast genomes. J Mol Evol. 58:424–441.

Chaw SM, Walters TW, Chang CC, Hu SH, Chen SH. 2005. A phylogeny of cycads (Cycadales) inferred from chloroplast matK gene, trnK intron, and nuclear rDNA ITS region. Mol Phylogenet Evol. 37:214–234.

Chaw SM, Zharkikh A, Sung HM, Lau TC, Li WH. 1997. Molecular phylogeny of extant gymnosperms and seed plant evolution: analysis of nuclear 18S rRNA sequences. Mol Biol Evol. 14:56–68.

Cho Y, Qiu YL, Kuhlman P, Palmer JD. 1998. Explosive invasion of plant mitochondria by a group I intron. Proc Natl Acad Sci USA. 95:14244–14249.

Clifton SW, Minx P, Fauron CM, et al. (13 co-authors). 2004. Sequence and comparative analysis of the maize NB mitochondrial genome. Plant Physiol. 136:3486–3503.

Feschotte S, Zhang X, Wessler SR. 2002. Miniature inverted-repeat transposable elements and their relationship to established DNA transposons. Washington (DC): ASM Press.

Groth-Malonek M, Pruchner D, Grewe F, Knoop V. 2005. Ancestors of trans-splicing mitochondrial introns support serial sister group relationships of hornworts and mosses with vascular plants. Mol Biol Evol. 22:117–125.

Handa H. 2003. The complete nucleotide sequence and RNA editing content of the mitochondrial genome of rapeseed (Brassica napus L.): comparative analysis of the mitochondrial genomes of rapeseed and Arabidopsis thaliana. Nucleic Acids Res. 31:5907–5916.

Hoch B, Maier RM, Appel K, Igloi GL, Kossel H. 1991. Editing of a chloroplast mRNA by creation of an initiation codon. Nature. 353:178–180.

Jones DL. 2002. Cycads of the World: ancient plant in today's landscape. 2nd ed. Sydney (Australia): Reed New Holland.

Kadowaki K, Kubo N, Ozawa K, Hirai A. 1996. Targeting presequence acquisition after mitochondrial gene transfer to the nucleus occurs by duplication of existing targeting signals. EMBO J. 15:6652–6661.

Karol KG, McCourt RM, Cimino MT, Delwiche CF. 2001. The closest living relatives of land plants. Science. 294:2351–2353.

Knoop V. 2004. The mitochondrial DNA of land plants: peculiarities in phylogenetic perspective. Curr Genet. 46:123–139.

Kotera E, Tasaka M, Shikanai T. 2005. A pentatricopeptide repeat protein is essential for RNA editing in chloroplasts. Nature. 433:326–330.

Kubo T, Nishizawa S, Sugawara A, Itchoda N, Estiati A, Mikami T. 2000. The complete nucleotide sequence of the mitochondrial genome of sugar beet (Beta vulgaris L.) reveals a novel gene for tRNA(Cys)(GCA). Nucleic Acids Res. 28:2571–2576.

Kumar S, Tamura K, Nei M. 2004. MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform. 5:150–163.

Kurtz S, Choudhuri JV, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R. 2001. REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 29:4633–4642.

Lowe TM, Eddy SR. 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25:955–964.

Lurin C, Andres C, Aubourg S, et al. (19 co-authors). 2004. Genome-wide analysis of Arabidopsis pentatricopeptide repeat proteins reveals their essential role in organelle biogenesis. Plant Cell. 16:2089–2103.

Malek O, Brennicke A, Knoop V. 1997. Evolution of trans-splicing plant mitochondrial introns in pre-Permian times. Proc Natl Acad Sci USA. 94:553–558.

Malek O, Knoop V. 1998. Trans-splicing group II introns in plant mitochondria: the complete set of cis-arranged homologs in ferns, fern allies, and a hornwort. RNA. 4:1599–1609.

Michel F, Ferat JL. 1995. Structure and activities of group II introns. Annu Rev Biochem. 64:435–461.

Mower JP. 2005. PREP-Mt: predictive RNA editor for plant mitochondrial genes. BMC Bioinformatics. 6:96.

Mulligan RM, Chang KL, Chou CC. 2007. Computational analysis of RNA editing sites in plant mitochondrial genomes reveals similar information content and a sporadic distribution of editing sites. Mol Biol Evol. 24:1971–1981.

Notsu Y, Masood S, Nishikawa T, Kubo N, Akiduki G, Nakazono M, Hirai A, Kadowaki K. 2002. The complete sequence of the rice (Oryza sativa L.) mitochondrial genome: frequent DNA sequence acquisition and loss during the evolution of flowering plants. Mol Genet Genomics. 268:434–445.

Oda K, Kohchi T, Ohyama K. 1992. Mitochondrial DNA of Marchantia polymorpha as a single circular form with no incorporation of foreign DNA. Biosci Biotechnol Biochem. 56:132–135.

Ogihara Y, Yamazaki Y, Murai K, et al. (14 co-authors). 2005. Structural dynamics of cereal mitochondrial genomes as revealed by complete nucleotide sequencing of the wheat mitochondrial genome. Nucleic Acids Res. 33:6235–6250.

Qiu YL, Li L, Wang B, et al. (13 co-authors). 2006. The deepest divergences in land plants inferred from phylogenomic evidence. Proc Natl Acad Sci USA. 103:15511–15516.

Regina TM, Picardi E, Lopez L, Pesole G, Quagliariello C. 2005. A novel additional group II intron distinguishes the mitochondrial rps3 gene in gymnosperms. J Mol Evol. 60:196–206.

Selosse M, Albert B, Godelle B. 2001. Reducing the genome size of organelles favours gene transfer to the nucleus. Trends Ecol Evol. 16:135–141.

Shikanai T. 2006. RNA editing in plant organelles: machinery, physiological function and evolution. Cell Mol Life Sci. 63:698–708.

Steinhauser S, Beckert S, Capesius I, Malek O, Knoop V. 1999. Plant mitochondrial RNA editing. J Mol Evol. 48:303–312.

Stevenson DW. 1990. Morphology and systematics of the Cycadales. Mem N Y Bot Gard. 57:8–15.

Stewart CN Jr, Via LE. 1993. A rapid CTAB DNA isolation technique useful for RAPD fingerprinting and other PCR applications. Biotechniques. 14:748–750.

Sugiyama Y, Watase Y, Nagase M, Makita N, Yagura S, Hirai A, Sugiura M. 2005. The complete nucleotide sequence and multipartite organization of the tobacco mitochondrial genome: comparative analysis of mitochondrial genomes in higher plants. Mol Genet Genomics. 272:603–615.

Terasawa K, Odahara M, Kabeya Y, Kikugawa T, Sekine Y, Fujiwara M, Sato N. 2007. The mitochondrial genome of the moss Physcomitrella patens sheds new light on mitochondrial evolution in land plants. Mol Biol Evol. 24:699–709.

Turmel M, Otis C, Lemieux C. 2003. The mitochondrial genome of Chara vulgaris: insights into the mitochondrial DNA architecture of the last common ancestor of green algae and land plants. Plant Cell. 15:1888–1903.

Unseld M, Marienfeld JR, Brandt P, Brennicke A. 1997. The mitochondrial genome of Arabidopsis thaliana contains 57 genes in 366,924 nucleotides. Nat Genet. 15:57–61.

Wang D, Wu YW, Chun-Chieh Shih A, Wu CS, Wang YN, Chaw SM. 2007. Transfer of chloroplast genomic DNA to mitochondrial genome occurred at least 300 MYA. Mol Biol Evol. 24:2040–2048.

Wasserman WW, Sandelin A. 2004. Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet. 5:276–287.

Wintz H, Hanson MR. 1991. A termination codon is created by RNA editing in the petunia atp9 transcript. Curr Genet. 19:61–64.

Wu CS, Wang YN, Liu SM, Chaw SM. 2007. Chloroplast genome (cpDNA) of Cycas taitungensis and 56 cp protein-coding genes of Gnetum parvifolium: insights into cpDNA evolution and phylogeny of extant seed plants. Mol Biol Evol. 24:1366–1379.

William Martin, Associate Editor

Accepted January 6, 2008

Complete Mitochondria Genome of Cycas 615

the tobacco mtDNA and concluded that the 6 cp-derivedtRNAs shared by angiosperms are functional Because nomtptwasobserved inCharaMarchantia orPhyscomitrellaWangetal (2007)concludedthatfrequentDNAtransferfromcpDNA to mtDNA has taken place no later than in the com-mon ancestor of seed plants approximately 300 MYA

Abundant RNA Editing Sites in Cycas mtDNA

Among the land plant DNAs Cycas has the most pre-dicted RNA editing sites (1084 sites supplementary tableS4 [Supplementary Material online]) It is commonly be-lieved that RNA editing arose together with the first terres-trial plants (Steinhauser et al 1999) By using the PREP-mtsoftware (Mower 2005) with the cutoff score set to 061084 sites within the protein-coding genes of the CycasmtDNA were predicted to be C-to-U RNA editing sitesThis is more than double the number of predicted siteswithin the elucidated mtDNAs of other land plants (table 2)

If the cutoff score which indicates the conservation degreeof each editing site compared those found in the other pub-lished plant mtDNAs is set to the most stringent criterion inPREP-mt (ie5 1) the number of editing sites decreases to738 which is still the largest number found among the landplant mtDNAs elucidated to date

It is believed that RNA editing is essential for func-tional protein expression as it is required to modify aminoacids or generate new start or stop codons (Hoch et al 1991Wintz and Hanson 1991 Kotera et al 2005 Shikanai2006) For this reason the large number of RNA editingsites in the Cycas mtDNA may indicate higher complexityat the DNA level and formation of various transcriptsthrough RNA editing thus potentially reflecting rapid di-vergence in Cycas

Although RNA editing sites are known to be sporad-ically distributed in the genomes of plant organellesmito-chondria from seed plants the mechanisms underlying thisdistribution pattern are not yet known (Shikanai 2006

FIG 5mdashGenomic organization of the second rps3 intron (rps3i2) and association of Tai sequences with Bpu sequences (A) Organization ofrps3i2 The segment between worf760 and the 4 tandem Bpu sequences is designated as a Tai sequence (gray) Arrowheads indicate a Bpu sequencelacking the 2 terminal repeats (banded bars) and its complementary sequence (B) Upper association of Tai sequences with Bpu tandem repeats acrossthe genome (abscissa) and their respective lengths (ordinate) Lower examples of 3 Tai variants (boxed) showing that the sequences are variouslydegenerate and truncated as compared with the typical Tai shown in (A) (broken line indicates mismatching sequences) Thin arrows indicate theorientations of homologous segments between Tai variants and the typical Tai

612 Chaw et al

Mulligan et al 2007) Supplementary table S4 (Supplemen-tary Material online) shows that within the Cycas mtDNAgene complex I (which includes 9 nad genes) is the mostextensively edited gene category Nad4 and nad5 show thefirst and second highest number of editing sites followedby the cox1 gene of complex III The least edited gene isrps10 which contains only 6 predicted sites even when thecutoff score is set to 06 Mulligan et al (2007) interpretedthe sporadic pattern as a result of lineage-specific loss ofediting sites through retroconversion which could removeadjacent editing sites by replacement with the edited se-quences The abundant RNA editing sites in the CycasmtDNA however are mysterious and completely deviatefrom the sporadic patterns reported in other seed plantmtDNAs Therefore the mechanism of RNA editing inCycas mtDNA may have a different evolutionary historyfrom those of other seed plants

Previous studies (Steinhauser et al. 1999; Lurin et al. 2004; Shikanai 2006) led to the proposal that variation or multiplication of trans-acting factors (e.g., members of the pentatricopeptide repeat family) may allow rapid increases in the number of editing sites. This could possibly be correlated with the lineage-specific explosion of RNA editing sites observed in Cycas mtDNA. Furthermore, Handa's (2003) conclusion that the evolutionary speed in RNA editing is higher at the level of gene regulation than at the primary gene sequence level appears to provide additional support for our speculation. To improve understanding of the mechanisms involved in abundant RNA editing sites in Cycas mtDNA, critical analysis of mtDNAs from ferns, conifers, and other cycads will be required.

In all plant lineages, RNA editing often alters the amino acid sequence (Adams and Palmer 2003). Supplementary table S5 (Supplementary Material online) summarizes the number of predicted preediting codons and potential edited codons in the Cycas mtDNA. The 2 most frequently edited codons are TCA (Ser) and TCT (Ser), which are predicted to be edited into TTA (Leu) and TTT (Phe), respectively. The least frequently edited codon is CAA (Gln), which would putatively be edited in only 2 cases, wherein the codon's first position would be altered to yield the stop codon TAA. Supplementary table S6 (Supplementary Material online) further depicts the formation of 4 start codons (from ACG to ATG) and 7 stop codons (from CGA to TGA) through RNA editing. To verify the accuracy of the editing sites predicted using PREP-Mt, partial transcripts of the cox1, atp1, atp6, and ccmB genes were experimentally assayed using RT-PCR. Comparison of our RT-PCR sequences with the editing sites predicted by PREP-Mt indicated accuracy values of 97.66%, 99.13%, 97.91%, and 99.25% for cox1, atp1, atp6, and ccmB, respectively.
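The codon changes described above can be illustrated with a minimal Python sketch. It is not part of the original analysis and does not reproduce PREP-Mt; the function names (`edit_c_to_u`, `describe_edit`) and the small codon table are assumptions introduced here for illustration only. The sketch applies a C-to-U edit (written as C-to-T at the DNA level) to a chosen codon position and reports the resulting amino acid.

```python
# Partial codon table: only the codons mentioned in the text (standard genetic code).
CODON_TABLE = {
    "TCA": "Ser", "TCT": "Ser", "TTA": "Leu", "TTT": "Phe",
    "CAA": "Gln", "TAA": "Stop", "ACG": "Thr", "ATG": "Met",
    "CGA": "Arg", "TGA": "Stop",
}


def edit_c_to_u(codon: str, position: int) -> str:
    """Apply a C-to-U edit at a 1-based codon position (U is written as T)."""
    index = position - 1
    if codon[index] != "C":
        raise ValueError(f"no C at position {position} of {codon}")
    return codon[:index] + "T" + codon[index + 1:]


def describe_edit(codon: str, position: int) -> str:
    """Summarize the codon and amino acid before and after editing."""
    edited = edit_c_to_u(codon, position)
    return f"{codon} ({CODON_TABLE[codon]}) -> {edited} ({CODON_TABLE[edited]})"


# (codon, edited position) pairs taken from the examples in the text.
EXAMPLES = [("TCA", 2), ("TCT", 2), ("CAA", 1), ("ACG", 2), ("CGA", 1)]

for codon, position in EXAMPLES:
    print(describe_edit(codon, position))
```

Editing the second position of a serine codon changes the encoded amino acid, whereas editing the first position of CAA or CGA creates a stop codon, which is why a single C-to-U change can alter both protein sequence and reading-frame boundaries.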

Our RT-PCR analysis additionally identified a U-to-C editing type in atp1; this was missed in the prediction because detection of the U-to-C editing type is limited when using PREP-Mt (Mower 2005). Unfortunately, programs for predicting RNA editing sites in nonprotein-coding genes, such as ribosomal RNA and tRNA genes, and for predicting silent editing sites (which alter the mRNA but do not change the translated amino acid) are not yet available. We are currently examining these additional editing types and the locations of such sites in the Cycas mtDNA.

No Large Repeats in Gymnosperms but Some in Angiosperms

In the mtDNAs of angiosperms, long repeated sequences vary in size from 2,427 bp in rapeseed to 127,600 bp in rice, and they show no homology to each other, implying that repeated sequences were independently acquired during the evolution from Marchantia to angiosperms (Sugiyama et al. 2005). In Cycas mtDNA, we detected only a few long repeated sequences, all less than 1.5 kb in length and mostly composed of Bpu sequences. It has been demonstrated that genes are continuously transferred from the nuclear genome and cpDNA to mtDNA (Knoop 2004). However, the mechanisms underlying the emergence of large repeats are not yet fully understood.

Conclusions

The complete Cycas mtDNA shows a number of unprecedented features that are atypical of the mtDNAs previously elucidated for other land plants, including the lowest A + T content, the highest proportion of tandem repeat sequences (mainly Bpu sequences) and the highest gene number found so far, abundant RNA editing sites, and an exceptionally elevated Ka value for the protein-coding genes. The latter 2 features might be correlated. In comparison with the known angiosperm organelle genomes, the cpDNA and mtDNA of Cycas have experienced the least gene loss. Peculiarly, the cpDNA of Cycas has the fewest RNA editing sites among land plants (Wu et al. 2007), whereas its mtDNA has the most. On the other hand, some characteristics are shared by mtDNAs of Cycas and angiosperms but not bryophytes; these include common gene transfers from cpDNA, similar ratios of noncoding sequences, lack of group I introns, relatively low numbers of tRNAs, and concurrence of trans-spliced group II introns. Nevertheless, the above-mentioned Cycas-specific features suggest that the Cycas mtDNA has a lineage-specific evolutionary history.

A novel family of SIMEs, designated as Bpu elements/sequences, represents another unique feature of the Cycadaceae lineage. Bpu elements are widely distributed throughout the noncoding regions of Cycas mtDNA (over 500 copies), with only 1 occurrence in a coding region (rrn18). In the Cycas mtDNA, the highly conserved nature of Bpu sequences and their association with Tai sequences as well as Worf760 (located in rps3i2) suggest that these sequences may have collectively originated from an ancestral retrotransposon. Further investigation into the origin, proliferation, and evolution of Bpu sequences will be desirable to help clarify the peculiar features of Cycas mtDNA.
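As a rough illustration of how copies of a short element might be located across a genome, the following Python sketch scans both strands of a sequence for a fixed motif. It is not the procedure used in this study. The motif is the Bpu 10I recognition site quoted in the abstract (CCTGAAGC); the toy sequence and the function names (`reverse_complement`, `find_motif`) are invented here for demonstration.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def find_motif(genome: str, motif: str):
    """Return sorted (0-based start, strand) pairs for exact matches on either strand.

    Note: a palindromic motif would be reported once per strand at the same position.
    """
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = genome.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = genome.find(pattern, start + 1)  # allow overlapping matches
    return sorted(hits)


BPU_SITE = "CCTGAAGC"  # recognition site as quoted in the abstract

# Hypothetical toy sequence: one forward copy and one reverse-complement copy.
toy_sequence = "AAGG" + "CCTGAAGC" + "TTTT" + "GCTTCAGG" + "AA"

for start, strand in find_motif(toy_sequence, BPU_SITE):
    print(f"match at {start} on strand {strand}")
```

An exact-match scan of this kind would miss degenerate or truncated copies, so a real survey of Bpu elements would need an approach that tolerates mismatches.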

Supplementary Material

Supplementary tables S1–S7 and figure 1 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

This work was supported by joint research grants from the Research Center for Biodiversity, Academia Sinica, to S.-M.C. and D.W., and partially by a National Science Council grant to S.-M.C. (94-2311-B001-059) and by a grant from the Institute of Information Science, Academia Sinica, to A.C.-C.S. We thank Chiao-Lei Cheng for sequencing the RT-PCR products. We are grateful to the 3 anonymous reviewers who provided critical comments and valuable suggestions.

Literature Cited

Adams KL, Palmer JD. 2003. Evolution of mitochondrial gene content: gene loss and transfer to the nucleus. Mol Phylogenet Evol. 29:380–395.

Adams KL, Qiu YL, Stoutemyer M, Palmer JD. 2002. Punctuated evolution of mitochondrial gene content: high and variable rates of mitochondrial gene loss and transfer to the nucleus during angiosperm evolution. Proc Natl Acad Sci USA. 99:9905–9912.

Benson G. 1999. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27:573–580.

Bergthorsson U, Richardson AO, Young GJ, Goertzen LR, Palmer JD. 2004. Massive horizontal transfer of mitochondrial genes from diverse land plant donors to the basal angiosperm Amborella. Proc Natl Acad Sci USA. 101:17747–17752.

Chaw SM, Chang CC, Chen HL, Li WH. 2004. Dating the monocot-dicot divergence and the origin of core eudicots using whole chloroplast genomes. J Mol Evol. 58:424–441.

Chaw SM, Walters TW, Chang CC, Hu SH, Chen SH. 2005. A phylogeny of cycads (Cycadales) inferred from chloroplast matK gene, trnK intron, and nuclear rDNA ITS region. Mol Phylogenet Evol. 37:214–234.

Chaw SM, Zharkikh A, Sung HM, Lau TC, Li WH. 1997. Molecular phylogeny of extant gymnosperms and seed plant evolution: analysis of nuclear 18S rRNA sequences. Mol Biol Evol. 14:56–68.

Cho Y, Qiu YL, Kuhlman P, Palmer JD. 1998. Explosive invasion of plant mitochondria by a group I intron. Proc Natl Acad Sci USA. 95:14244–14249.

Clifton SW, Minx P, Fauron CM, et al. (13 co-authors). 2004. Sequence and comparative analysis of the maize NB mitochondrial genome. Plant Physiol. 136:3486–3503.

Feschotte S, Zhang X, Wessler SR. 2002. Miniature inverted-repeat transposable elements and their relationship to established DNA transposons. Washington (DC): ASM Press.

Groth-Malonek M, Pruchner D, Grewe F, Knoop V. 2005. Ancestors of trans-splicing mitochondrial introns support serial sister group relationships of hornworts and mosses with vascular plants. Mol Biol Evol. 22:117–125.

Handa H. 2003. The complete nucleotide sequence and RNA editing content of the mitochondrial genome of rapeseed (Brassica napus L.): comparative analysis of the mitochondrial genomes of rapeseed and Arabidopsis thaliana. Nucleic Acids Res. 31:5907–5916.

Hoch B, Maier RM, Appel K, Igloi GL, Kossel H. 1991. Editing of a chloroplast mRNA by creation of an initiation codon. Nature. 353:178–180.

Jones DL. 2002. Cycads of the World: ancient plant in today's landscape. 2nd ed. Sydney (Australia): Reed New Holland.

Kadowaki K, Kubo N, Ozawa K, Hirai A. 1996. Targeting presequence acquisition after mitochondrial gene transfer to the nucleus occurs by duplication of existing targeting signals. EMBO J. 15:6652–6661.

Karol KG, McCourt RM, Cimino MT, Delwiche CF. 2001. The closest living relatives of land plants. Science. 294:2351–2353.

Knoop V. 2004. The mitochondrial DNA of land plants: peculiarities in phylogenetic perspective. Curr Genet. 46:123–139.

Kotera E, Tasaka M, Shikanai T. 2005. A pentatricopeptide repeat protein is essential for RNA editing in chloroplasts. Nature. 433:326–330.

Kubo T, Nishizawa S, Sugawara A, Itchoda N, Estiati A, Mikami T. 2000. The complete nucleotide sequence of the mitochondrial genome of sugar beet (Beta vulgaris L.) reveals a novel gene for tRNA(Cys)(GCA). Nucleic Acids Res. 28:2571–2576.

Kumar S, Tamura K, Nei M. 2004. MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform. 5:150–163.

Kurtz S, Choudhuri JV, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R. 2001. REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 29:4633–4642.

Lowe TM, Eddy SR. 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25:955–964.

Lurin C, Andres C, Aubourg S, et al. (19 co-authors). 2004. Genome-wide analysis of Arabidopsis pentatricopeptide repeat proteins reveals their essential role in organelle biogenesis. Plant Cell. 16:2089–2103.

Malek O, Brennicke A, Knoop V. 1997. Evolution of trans-splicing plant mitochondrial introns in pre-Permian times. Proc Natl Acad Sci USA. 94:553–558.

Malek O, Knoop V. 1998. Trans-splicing group II introns in plant mitochondria: the complete set of cis-arranged homologs in ferns, fern allies, and a hornwort. RNA. 4:1599–1609.

Michel F, Ferat JL. 1995. Structure and activities of group II introns. Annu Rev Biochem. 64:435–461.

Mower JP. 2005. PREP-Mt: predictive RNA editor for plant mitochondrial genes. BMC Bioinformatics. 6:96.

Mulligan RM, Chang KL, Chou CC. 2007. Computational analysis of RNA editing sites in plant mitochondrial genomes reveals similar information content and a sporadic distribution of editing sites. Mol Biol Evol. 24:1971–1981.

Notsu Y, Masood S, Nishikawa T, Kubo N, Akiduki G, Nakazono M, Hirai A, Kadowaki K. 2002. The complete sequence of the rice (Oryza sativa L.) mitochondrial genome: frequent DNA sequence acquisition and loss during the evolution of flowering plants. Mol Genet Genomics. 268:434–445.

Oda K, Kohchi T, Ohyama K. 1992. Mitochondrial DNA of Marchantia polymorpha as a single circular form with no incorporation of foreign DNA. Biosci Biotechnol Biochem. 56:132–135.

Ogihara Y, Yamazaki Y, Murai K, et al. (14 co-authors). 2005. Structural dynamics of cereal mitochondrial genomes as revealed by complete nucleotide sequencing of the wheat mitochondrial genome. Nucleic Acids Res. 33:6235–6250.

Qiu YL, Li L, Wang B, et al. (13 co-authors). 2006. The deepest divergences in land plants inferred from phylogenomic evidence. Proc Natl Acad Sci USA. 103:15511–15516.

Regina TM, Picardi E, Lopez L, Pesole G, Quagliariello C. 2005. A novel additional group II intron distinguishes the mitochondrial rps3 gene in gymnosperms. J Mol Evol. 60:196–206.

Selosse M, Albert B, Godelle B. 2001. Reducing the genome size of organelles favours gene transfer to the nucleus. Trends Ecol Evol. 16:135–141.

Shikanai T. 2006. RNA editing in plant organelles: machinery, physiological function and evolution. Cell Mol Life Sci. 63:698–708.

Steinhauser S, Beckert S, Capesius I, Malek O, Knoop V. 1999. Plant mitochondrial RNA editing. J Mol Evol. 48:303–312.

Stevenson DW. 1990. Morphology and systematics of the Cycadales. Mem N Y Bot Gard. 57:8–15.

Stewart CN Jr, Via LE. 1993. A rapid CTAB DNA isolation technique useful for RAPD fingerprinting and other PCR applications. Biotechniques. 14:748–750.

Sugiyama Y, Watase Y, Nagase M, Makita N, Yagura S, Hirai A, Sugiura M. 2005. The complete nucleotide sequence and multipartite organization of the tobacco mitochondrial genome: comparative analysis of mitochondrial genomes in higher plants. Mol Genet Genomics. 272:603–615.

Terasawa K, Odahara M, Kabeya Y, Kikugawa T, Sekine Y, Fujiwara M, Sato N. 2007. The mitochondrial genome of the moss Physcomitrella patens sheds new light on mitochondrial evolution in land plants. Mol Biol Evol. 24:699–709.

Turmel M, Otis C, Lemieux C. 2003. The mitochondrial genome of Chara vulgaris: insights into the mitochondrial DNA architecture of the last common ancestor of green algae and land plants. Plant Cell. 15:1888–1903.

Unseld M, Marienfeld JR, Brandt P, Brennicke A. 1997. The mitochondrial genome of Arabidopsis thaliana contains 57 genes in 366,924 nucleotides. Nat Genet. 15:57–61.

Wang D, Wu YW, Chun-Chieh Shih A, Wu CS, Wang YN, Chaw SM. 2007. Transfer of chloroplast genomic DNA to mitochondrial genome occurred at least 300 MYA. Mol Biol Evol. 24:2040–2048.

Wasserman WW, Sandelin A. 2004. Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet. 5:276–287.

Wintz H, Hanson MR. 1991. A termination codon is created by RNA editing in the petunia atp9 transcript. Curr Genet. 19:61–64.

Wu CS, Wang YN, Liu SM, Chaw SM. 2007. Chloroplast genome (cpDNA) of Cycas taitungensis and 56 cp protein-coding genes of Gnetum parvifolium: insights into cpDNA evolution and phylogeny of extant seed plants. Mol Biol Evol. 24:1366–1379.

William Martin, Associate Editor

Accepted January 6, 2008

Complete Mitochondria Genome of Cycas 615

Mulligan et al 2007) Supplementary table S4 (Supplemen-tary Material online) shows that within the Cycas mtDNAgene complex I (which includes 9 nad genes) is the mostextensively edited gene category Nad4 and nad5 show thefirst and second highest number of editing sites followedby the cox1 gene of complex III The least edited gene isrps10 which contains only 6 predicted sites even when thecutoff score is set to 06 Mulligan et al (2007) interpretedthe sporadic pattern as a result of lineage-specific loss ofediting sites through retroconversion which could removeadjacent editing sites by replacement with the edited se-quences The abundant RNA editing sites in the CycasmtDNA however are mysterious and completely deviatefrom the sporadic patterns reported in other seed plantmtDNAs Therefore the mechanism of RNA editing inCycas mtDNA may have a different evolutionary historyfrom those of other seed plants

Previous studies (Steinhauser et al 1999 Lurin et al2004 Shikanai 2006) led to the proposal that variation ormultiplication of trans-acting factors (eg members of thepentatricopeptide repeat family) may allow rapid increasesin the number of editing sites This could possibly be cor-related to the lineage-specific explosion of RNA editingsites observed in Cycas mtDNA Furthermore Handarsquos(2003) conclusion that the evolutionary speed in RNA edit-ing is higher at the level of gene regulation than at the pri-mary gene sequence level appears to provide additionalsupport for our speculation To improve understandingof the mechanisms involved in abundant RNA editing sitesin Cycas mtDNA critical analysis of mtDNAs from fernsconifers and other cycads will be required

In all plant lineages RNA editing often alters theamino acid sequence (Adams and Palmer 2003) Supple-mentary table S5 (Supplementary Material online) summa-rizes the number of predicted preediting codons andpotential edited codons in the Cycas mtDNA The 2 mostfrequent edited codons are TCA (Ser) and TCT (Ser) whichare predicted to be edited into TTA (Leu) and TTT (Phe)respectively The least frequently edited codon is CAA(Gln) which would putatively be edited in only 2 caseswherein the codonrsquos first position would be altered to yieldthe stop codon TAA Supplementary table S6 (Supplemen-tary Material online) further depicts the formation of 4 start(from ACG to ATG) and 7 stop codons (from CGA to TGA)through RNA editing To verify the accuracy of the editingsites predicted using PREP-mt partial transcripts of thecox1 atp1 atp6 and ccmB genes were experimentally as-sayed using RT-PCR Comparison of our RT-PCR se-quences with the editing sites predicted by PREP-mtindicated accuracy values of 9766 9913 9791and 9925 for cox1 atp1 atp6 and ccmB respectively

Our RT-PCR analysis additionally identified a U-to-Cediting type in atp1 this was missed in the prediction be-cause detection of the U-to-C editing type is limited whenusing PREP-mt (Mower 2005) Unfortunately programsfor predicting RNA editing sites in nonprotein-codinggenes such as ribosomal RNA and tRNA genes and thatof silent editing sites (which alter the mRNA but do notchange the translated amino acid) are not yet availableWe are currently examining these additional editing typesand the locations of such sites in the Cycas mtDNA

No Large Repeats in Gymnosperms but Some inAngiosperms

In the mtDNAs of angiosperms long repeated sequen-ces vary in size from 2427 bp in rapeseed to 127600 bp inrice and they show no homology to each other implyingthat repeated sequences were independently acquired duringthe evolution from Marchantia to angiosperms (Sugiyamaet al 2005) In Cycas mtDNA we only detected a few longrepeated sequences all less than 15 kb in length and mostlycomposed of Bpu sequences It has been demonstrated thatgenes are continuously transferred from the nuclear genomeand cpDNA to mtDNA (Knoop 2004) However the mech-anisms underlying the emergence of large repeats are not yetfully understood

Conclusions

The complete Cycas mtDNA shows a number of un-precedented features that are atypical of the mtDNAs pre-viously elucidated for other land plants including thelowest A thorn T content the highest proportion of tandem re-peat sequences (mainly Bpu sequences) and gene numberfound so far abundant RNA editing sites and an exception-ally elevated Ka value for the protein-coding genes The lat-ter 2 features might be correlated In comparison with theother known angiosperm organelle genomes the cpDNAand mtDNA of Cycas have experienced the least gene lossPeculiarly the cpDNA of Cycas has the fewest RNA edit-ing sites among land plants (Wu et al 2007) whereas itsmtDNA has the most On the other hand some character-istics are shared by mtDNAs of Cycas and angiosperms butnot bryophytes these include common gene transfers fromcpDNA similar ratios of noncoding sequences lack ofgroup I introns relatively low numbers of tRNAs and con-currence of trans-spliced group II introns Nevertheless theabove-mentioned Cycas-specific features suggest that theCycas mtDNA has a lineage-specific evolutionary history

A novel family of SIMEs designated as Bpu elementssequences represents another unique feature of the Cyca-daceae lineage Bpu elements are widely distributedthroughout the noncoding regions of Cycas mtDNA (over500 copies) with only 1 occurrence in a coding region (rrn18) In the Cycas mtDNA the highly conserved nature ofBpu sequences and their association with Tai sequences aswell as Worf760 (located in rps3i2) suggest that these se-quences may have collectively originated from an ancestralretrotransposon Further investigation into the origin pro-liferation and evolution of Bpu sequences will be desirableto help clarify the peculiar features of Cycas mtDNA

Supplementary Material

Supplementary tables S1ndashS7 and figure 1 are availableat Molecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

This work was supported by joint research grants fromthe Research Center for Biodiversity Academia Sinica to

Complete Mitochondria Genome of Cycas 613

S-MC and DW and partially by National Science Coun-cil grant to S-MC (94-2311-B001-059) and by a grantfrom the Institute of Information Science Academia Sinicato AC-CS We thank Chiao-Lei Cheng for sequencing theRT-PCR products We are grateful for the 3 anonymousreviewers who provided critical comments and valuablesuggestions

Literature Cited

Adams KL Palmer JD 2003 Evolution of mitochondrial genecontent gene loss and transfer to the nucleus Mol PhylogenetEvol 29380ndash395

Adams KL Qiu YL Stoutemyer M Palmer JD 2002Punctuated evolution of mitochondrial gene content highand variable rates of mitochondrial gene loss and transfer tothe nucleus during angiosperm evolution Proc Natl Acad SciUSA 999905ndash9912

Benson G 1999 Tandem repeats finder a program to analyzeDNA sequences Nucleic Acids Res 27573ndash580

Bergthorsson U Richardson AO Young GJ Goertzen LRPalmer JD 2004 Massive horizontal transfer of mitochon-drial genes from diverse land plant donors to the basalangiosperm Amborella Proc Natl Acad Sci USA 10117747ndash17752

Chaw SM Chang CC Chen HL Li WH 2004 Dating themonocot-dicot divergence and the origin of core eudicotsusing whole chloroplast genomes J Mol Evol 58424ndash441

Chaw SM Walters TW Chang CC Hu SH Chen SH 2005 Aphylogeny of cycads (Cycadales) inferred from chloroplastmatK gene trnK intron and nuclear rDNA ITS region MolPhylogenet Evol 37214ndash234

Chaw SM Zharkikh A Sung HM Lau TC Li WH 1997Molecular phylogeny of extant gymnosperms and seed plantevolution analysis of nuclear 18S rRNA sequences Mol BiolEvol 1456ndash68

Cho Y Qiu YL Kuhlman P Palmer JD 1998 Explosiveinvasion of plant mitochondria by a group I intron Proc NatlAcad Sci USA 9514244ndash14249

Clifton SW Minx P Fauron CM et al (13 co-authors) 2004Sequence and comparative analysis of the maize NBmitochondrial genome Plant Physiol 1363486ndash3503

Feschotte S Zhang X Wessler SR 2002 Miniature inverted-repeat transposable elements and their relationship toestablished DNA transposons Washington (DC) ASM Press

Groth-Malonek M Pruchner D Grewe F Knoop V 2005Ancestors of trans-splicing mitochondrial introns supportserial sister group relationships of hornworts and mosses withvascular plants Mol Biol Evol 22117ndash125

Handa H 2003 The complete nucleotide sequence and RNAediting content of the mitochondrial genome of rapeseed(Brassica napus L) comparative analysis of the mitochon-drial genomes of rapeseed and Arabidopsis thaliana NucleicAcids Res 315907ndash5916

Hoch B Maier RM Appel K Igloi GL Kossel H 1991 Editingof a chloroplast mRNA by creation of an initiation codonNature 353178ndash180

Jones DL 2002 Cycads of the Worldmdashancient plant in todayrsquoslandscape 2nd ed Sydney (Australia) Reed New Holland

Kadowaki K Kubo N Ozawa K Hirai A 1996 Targetingpresequence acquisition after mitochondrial gene transfer tothe nucleus occurs by duplication of existing targeting signalsEMBO J 156652ndash6661

Karol KG McCourt RM Cimino MT Delwiche CF 2001 Theclosest living relatives of land plants Science 2942351ndash2353

Knoop V 2004 The mitochondrial DNA of land plants peculiaritiesin phylogenetic perspective Curr Genet 46123ndash139

Kotera E Tasaka M Shikanai T 2005 A pentatricopeptiderepeat protein is essential for RNA editing in chloroplastsNature 433326ndash330

Kubo T Nishizawa S Sugawara A Itchoda N Estiati AMikami T 2000 The complete nucleotide sequence of themitochondrial genome of sugar beet (Beta vulgaris L) revealsa novel gene for tRNA(Cys)(GCA) Nucleic Acids Res282571ndash2576

Kumar S Tamura K Nei M 2004 MEGA3 integrated softwarefor molecular evolutionary genetics analysis and sequencealignment Brief Bioinform 5150ndash163

Kurtz S Choudhuri JV Ohlebusch E Schleiermacher C Stoye JGiegerich R 2001 REPuter the manifold applications ofrepeat analysis on a genomic scale Nucleic Acids Res294633ndash4642

Lowe TM Eddy SR 1997 tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 25955ndash964

Lurin C Andres C Aubourg S et al 19 co-authors 2004Genome-wide analysis of Arabidopsis pentatricopeptide re-peat proteins reveals their essential role in organelle bio-genesis Plant Cell 162089ndash2103

Malek O Brennicke A Knoop V 1997 Evolution of trans-splicing plant mitochondrial introns in pre-Permian timesProc Natl Acad Sci USA 94553ndash558

Malek O Knoop V 1998 Trans-splicing group II introns in plantmitochondria the complete set of cis-arranged homologs inferns fern allies and a hornwort RNA 41599ndash1609

Michel F Ferat JL 1995 Structure and activities of group IIintrons Annu Rev Biochem 64435ndash461

Mower JP 2005 PREP-Mt predictive RNA editor for plantmitochondrial genes BMC Bioinformatics 696

Mulligan RM Chang KL Chou CC 2007 Computationalanalysis of RNA editing sites in plant mitochondrial genomesreveals similar information content and a sporadic distributionof editing sites Mol Biol Evol 241971ndash1981

Notsu Y Masood S Nishikawa T Kubo N Akiduki GNakazono M Hirai A Kadowaki K 2002 The completesequence of the rice (Oryza sativa L) mitochondrial genomefrequent DNA sequence acquisition and loss during the evolutionof flowering plants Mol Genet Genomics 268434ndash445

Oda K Kohchi T Ohyama K 1992 Mitochondrial DNA ofMarchantia polymorpha as a single circular form with noincorporation of foreign DNA Biosci Biotechnol Biochem56132ndash135

Ogihara Y Yamazaki Y Murai K et al (14 co-authors) 2005Structural dynamics of cereal mitochondrial genomes asrevealed by complete nucleotide sequencing of the wheatmitochondrial genome Nucleic Acids Res 336235ndash6250

Qiu YL Li L Wang B et al (13 co-authors) 2006 The deepestdivergences in land plants inferred from phylogenomicevidence Proc Natl Acad Sci USA 10315511ndash15516

Regina TM Picardi E Lopez L Pesole G Quagliariello C 2005A novel additional group II intron distinguishes the mitochon-drial rps3 gene in gymnosperms J Mol Evol 60196ndash206

Selosse M Albert B Godelle B 2001 Reducing the genome sizeof organelles favours gene transfer to the nucleus Trends EcolEvol 16135ndash141

Shikanai T 2006 RNA editing in plant organelles machineryphysiological function and evolution Cell Mol Life Sci63698ndash708

Steinhauser S Beckert S Capesius I Malek O Knoop V 1999Plant mitochondrial RNA editing J Mol Evol 48303ndash312

Stevenson DW 1990 Morphology and systematics of theCycadales Mem N Y Bot Gard 578ndash15

614 Chaw et al

Stewart CN Jr Via LE 1993 A rapid CTAB DNA isolationtechnique useful for RAPD fingerprinting and other PCRapplications Biotechniques 14748ndash750

Sugiyama Y Watase Y Nagase M Makita N Yagura S Hirai ASugiura M 2005 The complete nucleotide sequence andmultipartite organization of the tobacco mitochondrialgenome comparative analysis of mitochondrial genomes inhigher plants Mol Genet Genomics 272603ndash615

Terasawa K Odahara M Kabeya Y Kikugawa T Sekine YFujiwara M Sato N 2007 The mitochondrial genome of themoss Physcomitrella patens sheds new light on mitochondrialevolution in land plants Mol Biol Evol 24699ndash709

Turmel M Otis C Lemieux C 2003 The mitochondrial genomeof Chara vulgaris insights into the mitochondrial DNAarchitecture of the last common ancestor of green algae andland plants Plant Cell 151888ndash1903

Unseld M Marienfeld JR Brandt P Brennicke A 1997 Themitochondrial genome of Arabidopsis thaliana contains 57genes in 366924 nucleotides Nat Genet 1557ndash61

Wang D Wu YW Chun-Chieh Shih A Wu CS Wang YNChaw SM 2007 Transfer of chloroplast genomic DNA tomitochondrial genome occurred at least 300 MYA Mol BiolEvol 242040ndash2048

Wasserman WW Sandelin A 2004 Applied bioinformatics forthe identification of regulatory elements Nat Rev Genet5276ndash287

Wintz H Hanson MR 1991 A termination codon is created by RNAediting in the petunia atp9 transcript Curr Genet 1961ndash64

Wu CS Wang YN Liu SM Chaw SM 2007 Chloroplastgenome (cpDNA) of Cycas taitungensis and 56 cp protein-coding genes of Gnetum parvifolium insights into cpDNAevolution and phylogeny of extant seed plants Mol Biol Evol241366ndash1379

William Martin Associate Editor

Accepted January 6 2008

Complete Mitochondria Genome of Cycas 615

S-MC and DW and partially by National Science Coun-cil grant to S-MC (94-2311-B001-059) and by a grantfrom the Institute of Information Science Academia Sinicato AC-CS We thank Chiao-Lei Cheng for sequencing theRT-PCR products We are grateful for the 3 anonymousreviewers who provided critical comments and valuablesuggestions

Literature Cited

Adams KL Palmer JD 2003 Evolution of mitochondrial genecontent gene loss and transfer to the nucleus Mol PhylogenetEvol 29380ndash395

Adams KL Qiu YL Stoutemyer M Palmer JD 2002Punctuated evolution of mitochondrial gene content highand variable rates of mitochondrial gene loss and transfer tothe nucleus during angiosperm evolution Proc Natl Acad SciUSA 999905ndash9912

Benson G 1999 Tandem repeats finder a program to analyzeDNA sequences Nucleic Acids Res 27573ndash580

Bergthorsson U Richardson AO Young GJ Goertzen LRPalmer JD 2004 Massive horizontal transfer of mitochon-drial genes from diverse land plant donors to the basalangiosperm Amborella Proc Natl Acad Sci USA 10117747ndash17752

Chaw SM Chang CC Chen HL Li WH 2004 Dating themonocot-dicot divergence and the origin of core eudicotsusing whole chloroplast genomes J Mol Evol 58424ndash441

Chaw SM Walters TW Chang CC Hu SH Chen SH 2005 Aphylogeny of cycads (Cycadales) inferred from chloroplastmatK gene trnK intron and nuclear rDNA ITS region MolPhylogenet Evol 37214ndash234

Chaw SM Zharkikh A Sung HM Lau TC Li WH 1997Molecular phylogeny of extant gymnosperms and seed plantevolution analysis of nuclear 18S rRNA sequences Mol BiolEvol 1456ndash68

Cho Y Qiu YL Kuhlman P Palmer JD 1998 Explosiveinvasion of plant mitochondria by a group I intron Proc NatlAcad Sci USA 9514244ndash14249

Clifton SW Minx P Fauron CM et al (13 co-authors) 2004Sequence and comparative analysis of the maize NBmitochondrial genome Plant Physiol 1363486ndash3503

Feschotte S Zhang X Wessler SR 2002 Miniature inverted-repeat transposable elements and their relationship toestablished DNA transposons Washington (DC) ASM Press

Groth-Malonek M Pruchner D Grewe F Knoop V 2005Ancestors of trans-splicing mitochondrial introns supportserial sister group relationships of hornworts and mosses withvascular plants Mol Biol Evol 22117ndash125

Handa H 2003 The complete nucleotide sequence and RNAediting content of the mitochondrial genome of rapeseed(Brassica napus L) comparative analysis of the mitochon-drial genomes of rapeseed and Arabidopsis thaliana NucleicAcids Res 315907ndash5916

Hoch B Maier RM Appel K Igloi GL Kossel H 1991 Editingof a chloroplast mRNA by creation of an initiation codonNature 353178ndash180

Jones DL 2002 Cycads of the Worldmdashancient plant in todayrsquoslandscape 2nd ed Sydney (Australia) Reed New Holland

Kadowaki K Kubo N Ozawa K Hirai A 1996 Targetingpresequence acquisition after mitochondrial gene transfer tothe nucleus occurs by duplication of existing targeting signalsEMBO J 156652ndash6661

Karol KG McCourt RM Cimino MT Delwiche CF 2001 Theclosest living relatives of land plants Science 2942351ndash2353

Knoop V 2004 The mitochondrial DNA of land plants peculiaritiesin phylogenetic perspective Curr Genet 46123ndash139

Kotera E Tasaka M Shikanai T 2005 A pentatricopeptiderepeat protein is essential for RNA editing in chloroplastsNature 433326ndash330

Kubo T Nishizawa S Sugawara A Itchoda N Estiati AMikami T 2000 The complete nucleotide sequence of themitochondrial genome of sugar beet (Beta vulgaris L) revealsa novel gene for tRNA(Cys)(GCA) Nucleic Acids Res282571ndash2576

Kumar S Tamura K Nei M 2004 MEGA3 integrated softwarefor molecular evolutionary genetics analysis and sequencealignment Brief Bioinform 5150ndash163

Kurtz S Choudhuri JV Ohlebusch E Schleiermacher C Stoye JGiegerich R 2001 REPuter the manifold applications ofrepeat analysis on a genomic scale Nucleic Acids Res294633ndash4642

Lowe TM Eddy SR 1997 tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 25955ndash964

Lurin C Andres C Aubourg S et al 19 co-authors 2004Genome-wide analysis of Arabidopsis pentatricopeptide re-peat proteins reveals their essential role in organelle bio-genesis Plant Cell 162089ndash2103

Malek O Brennicke A Knoop V 1997 Evolution of trans-splicing plant mitochondrial introns in pre-Permian timesProc Natl Acad Sci USA 94553ndash558

Malek O Knoop V 1998 Trans-splicing group II introns in plantmitochondria the complete set of cis-arranged homologs inferns fern allies and a hornwort RNA 41599ndash1609

Michel F, Ferat JL. 1995. Structure and activities of group II introns. Annu Rev Biochem. 64:435–461.

Mower JP. 2005. PREP-Mt: predictive RNA editor for plant mitochondrial genes. BMC Bioinformatics. 6:96.

Mulligan RM, Chang KL, Chou CC. 2007. Computational analysis of RNA editing sites in plant mitochondrial genomes reveals similar information content and a sporadic distribution of editing sites. Mol Biol Evol. 24:1971–1981.

Notsu Y, Masood S, Nishikawa T, Kubo N, Akiduki G, Nakazono M, Hirai A, Kadowaki K. 2002. The complete sequence of the rice (Oryza sativa L.) mitochondrial genome: frequent DNA sequence acquisition and loss during the evolution of flowering plants. Mol Genet Genomics. 268:434–445.

Oda K, Kohchi T, Ohyama K. 1992. Mitochondrial DNA of Marchantia polymorpha as a single circular form with no incorporation of foreign DNA. Biosci Biotechnol Biochem. 56:132–135.

Ogihara Y, Yamazaki Y, Murai K, et al. (14 co-authors). 2005. Structural dynamics of cereal mitochondrial genomes as revealed by complete nucleotide sequencing of the wheat mitochondrial genome. Nucleic Acids Res. 33:6235–6250.

Qiu YL, Li L, Wang B, et al. (13 co-authors). 2006. The deepest divergences in land plants inferred from phylogenomic evidence. Proc Natl Acad Sci USA. 103:15511–15516.

Regina TM, Picardi E, Lopez L, Pesole G, Quagliariello C. 2005. A novel additional group II intron distinguishes the mitochondrial rps3 gene in gymnosperms. J Mol Evol. 60:196–206.

Selosse M, Albert B, Godelle B. 2001. Reducing the genome size of organelles favours gene transfer to the nucleus. Trends Ecol Evol. 16:135–141.

Shikanai T. 2006. RNA editing in plant organelles: machinery, physiological function and evolution. Cell Mol Life Sci. 63:698–708.

Steinhauser S, Beckert S, Capesius I, Malek O, Knoop V. 1999. Plant mitochondrial RNA editing. J Mol Evol. 48:303–312.

Stevenson DW. 1990. Morphology and systematics of the Cycadales. Mem N Y Bot Gard. 57:8–15.

Stewart CN Jr, Via LE. 1993. A rapid CTAB DNA isolation technique useful for RAPD fingerprinting and other PCR applications. Biotechniques. 14:748–750.

Sugiyama Y, Watase Y, Nagase M, Makita N, Yagura S, Hirai A, Sugiura M. 2005. The complete nucleotide sequence and multipartite organization of the tobacco mitochondrial genome: comparative analysis of mitochondrial genomes in higher plants. Mol Genet Genomics. 272:603–615.

Terasawa K, Odahara M, Kabeya Y, Kikugawa T, Sekine Y, Fujiwara M, Sato N. 2007. The mitochondrial genome of the moss Physcomitrella patens sheds new light on mitochondrial evolution in land plants. Mol Biol Evol. 24:699–709.

Turmel M, Otis C, Lemieux C. 2003. The mitochondrial genome of Chara vulgaris: insights into the mitochondrial DNA architecture of the last common ancestor of green algae and land plants. Plant Cell. 15:1888–1903.

Unseld M, Marienfeld JR, Brandt P, Brennicke A. 1997. The mitochondrial genome of Arabidopsis thaliana contains 57 genes in 366,924 nucleotides. Nat Genet. 15:57–61.

Wang D, Wu YW, Chun-Chieh Shih A, Wu CS, Wang YN, Chaw SM. 2007. Transfer of chloroplast genomic DNA to mitochondrial genome occurred at least 300 MYA. Mol Biol Evol. 24:2040–2048.

Wasserman WW, Sandelin A. 2004. Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet. 5:276–287.

Wintz H, Hanson MR. 1991. A termination codon is created by RNA editing in the petunia atp9 transcript. Curr Genet. 19:61–64.

Wu CS, Wang YN, Liu SM, Chaw SM. 2007. Chloroplast genome (cpDNA) of Cycas taitungensis and 56 cp protein-coding genes of Gnetum parvifolium: insights into cpDNA evolution and phylogeny of extant seed plants. Mol Biol Evol. 24:1366–1379.

William Martin, Associate Editor

Accepted January 6, 2008
