RESEARCH ARTICLE
A genome-wide identification and analysis
of the DYW-deaminase genes in the
pentatricopeptide repeat gene family in
cotton (Gossypium spp.)
Bingbing Zhang1, Guoyuan Liu1, Xue Li1, Liping Guo1, Xuexian Zhang1, Tingxiang Qi1,
Hailin Wang1, Huini Tang1, Xiuqin Qiao1, Jinfa Zhang2, Chaozhu Xing1*, Jianyong Wu1*
1 State Key Laboratory of Cotton Biology (China)/Institute of Cotton Research of Chinese Academy of
Agricultural Science, Anyang, Henan, China, 2 Department of Plant and Environmental Sciences, New
Mexico State University, Las Cruces, New Mexico, United States of America
* [email protected] (JW); [email protected] (CX)
Abstract
The RNA editing occurring in plant organellar genomes mainly involves the change of cyti-
dine to uridine. This process involves a deamination reaction, with cytidine deaminase as
the catalyst. Pentatricopeptide repeat (PPR) proteins with a C-terminal DYW domain are
reportedly associated with cytidine deamination, similar to members of the deaminase
superfamily. PPR genes are involved in many cellular functions and biological processes
including fertility restoration to cytoplasmic male sterility (CMS) in plants. In this study, we
identified 227 and 211 DYW deaminase-coding PPR genes for the cultivated tetraploid cot-
ton species G. hirsutum and G. barbadense (2n = 4x = 52), respectively, as well as 126 and
97 DYW deaminase-coding PPR genes in the ancestral diploid species G. raimondii and
G. arboreum (2n = 26), respectively. The 227 G. hirsutum PPR genes were predicted to
encode 52–2016 amino acids, 203 of which were mapped onto 26 chromosomes. Most
DYW deaminase genes lacked introns, and their proteins were predicted to target the mito-
chondria or chloroplasts. Additionally, the DYW domain differed from the complete DYW
deaminase domain, which contained part of the E domain and the entire E+ domain. The
types and number of DYW tripeptides may have been influenced by evolutionary processes,
with some tripeptides being lost. Furthermore, a gene ontology analysis revealed that DYW
deaminase functions were mainly related to binding as well as hydrolase and transferase
activities. The G. hirsutum DYW deaminase expression profiles varied among different cot-
ton tissues and developmental stages, and no differentially expressed DYW deaminase-
coding PPRs were directly associated with the male sterility and restoration in the CMS-D2
system. Our current study provides an important piece of information regarding the struc-
tural and evolutionary characteristics of Gossypium DYW-containing PPR genes coding for
deaminases and will be useful for characterizing the DYW deaminase gene family in cotton
biology and breeding.
PLOS ONE | https://doi.org/10.1371/journal.pone.0174201 March 24, 2017 1 / 21
OPEN ACCESS
Citation: Zhang B, Liu G, Li X, Guo L, Zhang X, Qi
T, et al. (2017) A genome-wide identification and
analysis of the DYW-deaminase genes in the
pentatricopeptide repeat gene family in cotton
(Gossypium spp.). PLoS ONE 12(3): e0174201.
https://doi.org/10.1371/journal.pone.0174201
Editor: David D. Fang, USDA-ARS Southern
Regional Research Center, UNITED STATES
Received: November 30, 2016
Accepted: March 6, 2017
Published: March 24, 2017
Copyright: © 2017 Zhang et al. This is an open
access article distributed under the terms of the
Creative Commons Attribution License, which
permits unrestricted use, distribution, and
reproduction in any medium, provided the original
author and source are credited.
Data Availability Statement: All relevant data are
within the paper and its Supporting Information
files. All of the raw sequence data from the Illumina
sequencing platform can be found in the Short
Read Archive (SRA) database of the National
Center for Biotechnology Information (NCBI) under
accession number SRX2578795.
Funding: This research was financed by funds
from the National Key Research and Development
program of China (2016YFD0101400) and Cotton
Germplasm Innovation and the Molecular breeding
Introduction
RNA editing is a post-transcriptional process affecting organellar transcripts that results in
nucleotide changes from DNA to RNA. There are various types of RNA editing in different
organisms. In mammals, a zinc-dependent enzyme related to cytidine deaminase is responsible
for converting cytidines (C) to uridines (U) in apoB mRNA [1, 2]. In vascular plants, RNA
editing occurs in chloroplasts and mitochondria, and almost exclusively involves C-to-U
changes [2]. Additionally, nuclear genes encoding proteins involved in RNA editing belong to
the gene subfamily associated with pentatricopeptide repeat (PPR) proteins.
The PPR protein family is one of the largest protein families in higher plants [3]. The PPR
proteins usually have 2–27 repeating motifs consisting of 35 amino acids arranged in clusters
[3, 4]. These motifs are divided into three classes. The canonical PPR motifs (i.e., P motifs) are
common among eukaryotes, while the short PPR-like (i.e., S motifs) and long PPR-like motifs
(i.e., L motifs) are present only in higher plants [4]. The PPR protein family is divided into two
subfamilies (i.e., P and PLS subfamilies) based on the different motifs. The P subfamily mem-
bers contain only the P motif, while the PLS subfamily members usually consist of a tandem
array of degenerated P, L, and S motifs. The PLS subfamily proteins are plant-specific and usu-
ally contain an additional C-terminal extension with E, E+, and DYW domains. According to
the C-terminal extension, this subfamily can be divided into the following four subgroups:
PLS, E, E+, and DYW [4]. Arabidopsis thaliana carries 87 genes encoding PPR proteins with
the DYW domain. The DYW domain is named for the frequent presence of an Asp–Tyr–Trp
tripeptide at the C-terminus, and has been detected only in the proteins of land plants. The
E and E+ domains are usually highly degenerate and difficult to identify. The DYW domain
contains a highly conserved region with α-β-α secondary structure consisting of conserved
amino acids (e.g., Cys and His) [5].
The DYW domain is the candidate catalytic domain for cytidine deaminase. The HxExx. . .
CxxCH motifs in the DYW domain align with the active site of cytidine deaminase (i.e., C/
HxExx. . .xPCxxC). Additionally, the DYW domain is highly evolutionarily correlated with
RNA editing. Thus, proteins containing the DYW domain have been classified as deaminases
involved in RNA editing [6, 7]. Such proteins are grouped into the nucleic acid deaminase
superfamily (i.e., DYW deaminase family; Pfam: PF14432) according to DYW domain characteristics.
The DYW deaminase domain contains part of the E domain, the whole E+ domain,
and most of the DYW domain [7]. Recently, Hayes et al. (2015) suggested that the conserved
Glu residue of the HXE motif in the DYW domain is necessary for RNA editing. When the
codon for the Glu residue in the DYW domain of OTP84 and CREF7 was mutagenized, the
transgenic plants containing the mutated genes could not efficiently edit the cognate editing
sites [8]. Wagoner et al. (2015) proposed that the DYW domain and the E (or E+) domain are
essential for cytidine deaminase activity because truncating either domain resulted in a QED1
protein that completely lacked any editing activity. In contrast, the DYW domain was revealed
to be unnecessary for four RNA editing factors (i.e., CRR22, CRR28, OTP82, and ELI1) [9].
The DYW deaminase domain is different from the DYW domain in PPR proteins, which is
essential for the RNA editing activity of some enzymes but not for others. Thus, the role of the
DYW domain during RNA editing has not been fully characterized. Boussardon et al. (2012)
suggested that the interaction between CRR4 and DYW1 contributes to the editing of the
ndhD-1 site. The DYW domain, which functioned as the catalytic domain, may have been lost
in CRR4, which now requires the DYW domain from DYW1 to provide catalytic activity. The
DYW1 protein carrying the E+ and DYW domains may interact with CRR4 to edit the ndhD
C2 site [10]. In this case, CRR4 may use the DYW1 as the catalytic domain. In summary, as
of High yield varieties in National Natural Science
Foundation of China (31621005).
Competing interests: The authors have declared
that no competing interests exist.
one of the most specific subgroups of the PPR protein family, the DYW deaminase domain-
containing PPR proteins are particularly striking because of their RNA editing activities.
Cotton is an economically important fiber crop as the source of the most important natural
textile fiber. The genus Gossypium contains approximately 45 diploid species and five polyploid
species. Crossing between the ancestors of the extant diploid species G. arboreum and G. raimondii,
followed by chromosome doubling 1–2 million years ago, resulted in the emergence of allotetraploid
cotton species, including cultivated G. hirsutum (accounting for ca. 95% of world cotton production)
and G. barbadense [11]. The genomes of the two ancestral diploids (i.e., G. arboreum and
G. raimondii) and the two tetraploids (i.e., G. hirsutum and G. barbadense) have recently been
sequenced, providing an opportunity to study cotton gene families. PPR genes, including those
responsible for fertility restoration of cytoplasmic male sterility (CMS), have been cloned and
functionally analyzed in several plant species. For instance, Rf4, which encodes a PPR protein for wild
abortive-type CMS of rice, was confirmed to reduce the orf352-containing transcripts to
restore pollen fertility [12], and the PPR protein Rf6 functions with hexokinase 6 (OsHXK6) to
promote the processing of the aberrant CMS-associated transcript atp6-orfH79 to rescue
HL-CMS of rice [13]. In sorghum, two PPR genes were identified for fertility restoration of A1
cytoplasm [14, 15]. However, no such information is currently available in cotton. In this
study, we performed an in silico analysis of the four Gossypium genome sequences to identify
PPR genes encoding the DYW-containing deaminases. We focused on gene structure and
expression, with particular attention to CMS and fertility restoration in G. hirsutum. The results
of this study will contribute to the understanding of the distribution and functions of DYW
deaminase genes in cotton.
Materials and methods
Materials
The Gossypium hirsutum variety Zhong10 was used for expression analysis in different tissues.
Plants were grown in a greenhouse. Roots, stems and leaves were collected from
one-month-old seedlings, and flowers were collected from two-month-old plants on the day of
flowering; all tissue samples were obtained from three individual plants. The cytoplasmic male sterile
(CMS-D2) line A [S(rf1rf1)], with cytoplasm from Gossypium harknessii, and its isogenic restorer line
R [S(Rf1rf1)] were grown under normal field conditions. Floral buds about 3 mm in
length (approximately the stage of male meiosis) were collected from the near-isogenic lines
with three biological replicates, as in previous studies [16, 17]. All collected samples
were immediately frozen in liquid nitrogen and stored at -76˚C before use. All
materials were provided by the Cotton Research Institute (CRI), Chinese Academy of Agricultural Science
(CAAS), Anyang, Henan, China.
Local databases of sequence data and RNA-seq
Local databases containing the genome sequences and annotation files of Gossypium
species (G. raimondii, G. arboreum, G. hirsutum and G. barbadense (xinhai)) and other organisms
(Cyanidioschyzon merolae, Physcomitrella patens, Selaginella moellendorffii, Sorghum
bicolor, and Oryza sativa Japonica Group) were established. The genome sequences and annotation
information of the Gossypium species were downloaded from CottonGen (https://www.cottongen.org/).
Genome information for the other organisms was downloaded from the NCBI. The Arabidopsis
thaliana genome sequence was downloaded from TAIR (http://www.arabidopsis.org/).
The extraction of total RNA from flower buds about 3 mm in length (at roughly the stage of male
meiosis) of CMS cotton (CMS-D2) A and its isogenic restorer line R was performed by using
Spectrum™ Plant Total RNA Kit according to the manufacturer’s guide. RNA concentration was
measured using a NanoDrop 2000. Six transcriptome libraries (A1-3, R1-3) were constructed
using equal amounts of RNA from the three biological replicates. Library
preparation and sequencing were performed following standard Illumina methods on the
Illumina HiSeq™ 2500 sequencing platform (2×150 bp read length). All of the raw
sequence data from the Illumina sequencing platform can be found in the Short Read
Archive (SRA) database of the National Center for Biotechnology Information (NCBI)
under accession number SRX2578795. The FPKM values (mean of three biological replicates)
of each gene were then used for differential expression analysis. The criterion
was set as |log2(sample1/sample2)| ≥ 1 and P ≤ 0.05. The differentially
expressed genes between CMS cotton (CMS-D2) A and its isogenic restorer line R are
shown in S1 Table. RNA-seq data of gene expression during fiber development stages at 5,
10, 20, 25 dpa and in root, stem, leaf, petal, stamen, and pistil in G. hirsutum were down-
loaded from ccNET database (http://structuralbiology.cau.edu.cn/gossypium/) [18].
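The differential expression criterion described above can be illustrated with a minimal Python sketch. It assumes that a P value for each gene has already been obtained from whichever statistical test was applied (the test is not specified here). The function names and the small pseudocount that avoids taking the logarithm of zero are hypothetical choices for illustration, not part of the original pipeline.

```python
import math
from statistics import mean


def log2_fold_change(fpkm_sample1, fpkm_sample2, pseudocount=0.01):
    """log2 ratio of the mean FPKM of two samples.

    pseudocount is an illustrative assumption that keeps the ratio
    defined when a gene has zero FPKM in one sample.
    """
    numerator = mean(fpkm_sample1) + pseudocount
    denominator = mean(fpkm_sample2) + pseudocount
    return math.log2(numerator / denominator)


def is_differentially_expressed(fpkm_sample1, fpkm_sample2, p_value,
                                min_abs_log2fc=1.0, max_p=0.05):
    """Apply the criterion |log2(sample1/sample2)| >= 1 and P <= 0.05."""
    lfc = log2_fold_change(fpkm_sample1, fpkm_sample2)
    return abs(lfc) >= min_abs_log2fc and p_value <= max_p
```

A gene with replicate FPKMs of 8 in one line and 2 in the other passes the fold-change threshold, whereas 3 versus 2 does not, regardless of the P value.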
DYW-deaminase proteins sequence identification and domain analysis
The hidden Markov model (PF14432) of the unique DYW deaminase
domain was downloaded from the Pfam 29.0 database and used to search the annotated protein
sequence files of the Gossypium species and the other organisms with the hmmsearch program (default
E-value) in the HMMER 3.0 software; sequences scoring above the threshold were retained from
the complete protein sets. Each candidate protein sequence was then checked manually by
domain analysis with the online tool SMART (http://smart.embl-heidelberg.de/), and
sequences without the DYW deaminase domain were removed.
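The retention of candidate sequences from the search output can be sketched as follows. This is a minimal example that assumes hmmsearch was run with its tabular output option (--tblout), in which each non-comment row is whitespace-delimited with the target name in the first column and the full-sequence E-value in the fifth. The function name, the example identifiers and the default threshold of 10.0 (intended to mirror the program's default reporting threshold) are assumptions for illustration.

```python
def parse_tblout(lines, max_evalue=10.0):
    """Return {target name: best full-sequence E-value} for rows passing the threshold.

    Assumes the HMMER3 --tblout layout: target name in column 1 and
    full-sequence E-value in column 5; lines starting with '#' are comments.
    """
    hits = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target = fields[0]
        evalue = float(fields[4])
        if evalue <= max_evalue:
            # Keep the smallest E-value if a target appears more than once.
            hits[target] = min(evalue, hits.get(target, evalue))
    return hits


# Hypothetical rows, for illustration only (not real search results).
example_rows = [
    "# target name  accession  query name  accession  E-value  score  bias ...",
    "seqA - DYW_deaminase PF14432.5 1.2e-40 140.1 0.3 2.0e-40 139.4 0.3 1.5 1 0 0 1 1 1 1 -",
    "seqB - DYW_deaminase PF14432.5 25 3.1 0.0 30 2.9 0.0 1.0 1 0 0 1 1 0 0 -",
]
candidates = parse_tblout(example_rows)
```

The candidates retained in this way would still need the manual domain check described above.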
Genome wide synteny analysis of DYW deaminase genes
A comparative syntenic map of the DYW deaminase genes from the tetraploid cotton species and the
two diploid cotton species was constructed using the circos-0.69-3 software package with
default parameters.
Conserved domain alignment and analysis of DYW deaminase proteins
in Gossypium
DYW deaminase protein sequences were analyzed with the MEME online software to
obtain the consensus sequences and logos of the DYW domain and the DYW deaminase domain, and
the consensus sequences were compared with those of tomato and Arabidopsis thaliana.
Phylogenetic tree construction, gene structure analysis, chromosomal
and subcellular location of DYW-deaminase proteins
The multiple sequence alignment of the DYW deaminase domain sequences from the four cotton species
was performed using ClustalX2 with default parameters; an unrooted phylogenetic
tree was constructed by the neighbor-joining (NJ) method with 1000 bootstrap
replications in MEGA 6.
Based on the GFF information, the Gene Structure Display Server of Peking University
(http://gsds.cbi.pku.edu.cn/) was used to draw the gene structures, and the DYW deaminase genes
were mapped onto chromosomes using MapChart.
The signal peptide prediction program TargetP was used to predict the subcellular location of
DYW-deaminase proteins.
The GO and expression analysis of DYW-deaminase genes
GO analysis of the DYW-deaminase genes was performed with Blast2GO, and the figure was
drawn with the online tool WEGO (http://wego.genomics.org.cn/cgi-bin/wego/index.pl).
RNA was extracted from root, stem, leaf and flower using the RNAprep
Pure Plant kit (TIANGEN, China). Equal amounts of RNA from three biological replicates
were pooled to constitute the total RNA of each sample. Reverse transcription
was conducted using 0.5 μg total RNA with the PrimeScript™ RT reagent kit (Takara Bio
Inc., Dalian, China) in a 10 μl reaction volume following the manufacturer’s instructions, and the product was diluted
to 100 μl before use. For real-time PCR, 1 μl diluted cDNA was mixed with 10 μl 2 × SYBR
Green Mix (Takara) and 1.2 μl primers in a total volume of 20 μl. The PCR was performed
using SYBR Green as fluorescence dye and run on Eppendorf Real plex system with melt curve
program. The PCR amplification reaction was performed as follows: 94˚C for 3 min, 40 cycles
of 94˚C for 5 s, 56˚C for 20 s and 72˚C for 25 s, followed by a melting curve. Ghhis3
was used as the reference gene for normalization. All reactions were done in three biological
replicates, and the melting curve of each qRT-PCR was used to check PCR performance.
The relative expression of each gene in the four tissues was calculated from the Ct values of
the three biological replicates using the 2^(-ΔΔCt) method [19]. Primers for the 16 selected
DYW deaminase genes were designed with Primer Premier 5.0 (Table 1).
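The relative quantification step can be written out as a short Python sketch of the 2^(-ΔΔCt) calculation, with Ghhis3 as the reference gene. The choice of calibrator tissue is not stated in the text, so the calibrator arguments below are an assumption, and the function name and example Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference,
                        ct_target_calibrator, ct_reference_calibrator):
    """Relative expression by the 2^(-ddCt) method.

    Each argument is a list of replicate Ct values. The calibrator is the
    sample against which expression is expressed (an assumption here).
    """
    # dCt: target gene Ct normalized to the reference gene, per sample.
    delta_ct_sample = mean(ct_target) - mean(ct_reference)
    delta_ct_calibrator = mean(ct_target_calibrator) - mean(ct_reference_calibrator)
    # ddCt: sample dCt relative to calibrator dCt.
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target amplifies two cycles earlier in the
# sample than in the calibrator, while the reference gene is unchanged.
fold = relative_expression([24.0, 24.0, 24.0], [18.0, 18.0, 18.0],
                           [26.0, 26.0, 26.0], [18.0, 18.0, 18.0])
```

A two-cycle difference in normalized Ct corresponds to a four-fold difference in relative expression, assuming amplification efficiency close to 100%.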
Results
Identification and structural analysis of Gossypium genes encoding
DYW deaminases
A total of 227 G. hirsutum genes were identified as encoding DYW deaminases belonging to the
PPR family. Most of these proteins contained multiple PPR domains, and 190 proteins contained
intact DYW deaminase domain structures. An analysis of the 227 predicted G. hirsutum
DYW deaminases indicated that the number of amino acids in the protein family ranged from
52 to 2016. Furthermore, we identified 126, 97, and 211 DYW deaminases in G. raimondii, G.
arboreum, and G. barbadense, respectively. Information about the DYW deaminase genes in
Table 1. The primers of 16 selected PPR DYW deaminase genes for quantitative RT-PCR.
Gene ID        Forward primer (5’-3’)       Reverse primer (5’-3’)
Gh_D04G0152    CTCCCCTTTGGATTCTCGTT         ACCAAGTAAACCCAACTTCGC
Gh_D09G0761    AAATGCCTGGACGGAATGTT         AACCGCCTCATCAACTCTACC
Gh_A02G0419    TCCCATCACCTAACATCGTCTC       TGGTTGGGGAGAATAGGGTC
Gh_D08G2381    TTTACATGGTCGAGGAAGGGA        TGAAGTGCTCAACTCCAGGCT
Gh_D05G0911    AAGCAGGTCTACTGGAAAACGC       CGTATATTTTTTCAGATTCGGGGTG
Gh_A03G1665    TGTCGGAACCTGCCTCATT          TGACCCGCTTCAACTAAACCT
Gh_A01G1752    CTTGTTCAAAATTGGGTGCC         TGCTTTCTTGCCTTGACCAT
Gh_A07G1662    TGTTATCAGTGCTTGTGCGGG        CCATTCTTACTGCCCCTGCT
Gh_A12G0594    AACAAACCCGAAACAGCCC          CATCAAGTTCCAAGAAACGACG
Gh_D05G0556    TTGCTCTTATGAATACTCCACCAGG    CCTTGAAATGGTGAAACCGACT
Gh_D02G1741    GTTACTTGAAGACGGGCGACA        GTTTGCTTGATTACTCCCCGTT
Gh_D01G0153    TGGACCCTGATGATACTGCG         TCTTGCACCCATCACGCTTC
Gh_A02G0212    TGGTCGTTGATTTGTTAGGCAG       GCTCCCAGGATGTTTAGGTTCA
Gh_A12G0486    GCATACGGATGTAGAGGATTTGG      ACGGCAGACCTGGCTGTTAT
Gh_A13G1386    AACCGAATGGAGGGCGTAA          CCGATGTCCCACAAAGTTCA
Gh_A01G1103    GGGAAACTCTTCATGCTTACACC      TGCTGCTTTACCTTGACCGT
https://doi.org/10.1371/journal.pone.0174201.t001
Gossypium is summarized in S2 Table. All of the DYW deaminases in the four sequenced cotton
species had similar protein structures and contained the DYW deaminase domain. Addition-
ally, most of the DYW deaminases also contained PPR domain tandem repeats. Most of the
PPR genes had no introns, which is also an important structural feature of the sequences in
DYW deaminase genes. In G. hirsutum, 81% of the genes lacked an intron, while 12% contained
only one intron and 7% consisted of two or more introns (S1 and S2 Figs). Similar proportions
were observed in G. raimondii (i.e., 75, 16, and 9%, respectively) and G. arboreum (i.e., 81, 12
and 7%, respectively). However, in G. barbadense, only 47% of the DYW deaminase genes
lacked introns, while the proportion of genes with introns was relatively high (i.e., 24% with one
intron and 27% with two or more introns).
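The intron proportions reported above can in principle be derived from the GFF annotation by counting exon features per transcript. The sketch below is a minimal illustration under stated assumptions: a GFF3-style file with nine tab-separated columns, exon features that carry a Parent attribute, and one transcript per gene. The function names and the example rows are hypothetical and do not reproduce the original annotation files.

```python
from collections import Counter, defaultdict


def intron_counts(gff_lines, feature="exon"):
    """Return {parent ID: number of introns}, taken as (number of exons - 1)."""
    exons = defaultdict(int)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature:
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        parent = attrs.get("Parent")
        if parent:
            exons[parent] += 1
    return {parent: n - 1 for parent, n in exons.items()}


def intron_class_percentages(counts):
    """Percentage of genes with 0, 1, or 2+ introns."""
    if not counts:
        return {}
    tally = Counter("0" if n == 0 else "1" if n == 1 else "2+"
                    for n in counts.values())
    total = sum(tally.values())
    return {cls: 100.0 * n / total for cls, n in tally.items()}


def _row(parent, start, end):
    # Helper that builds one hypothetical exon row.
    return "\t".join(["chr1", "example", "exon", str(start), str(end),
                      ".", "+", ".", "ID=x;Parent=" + parent])


example_gff = [
    "##gff-version 3",
    _row("tA", 1, 100),
    _row("tB", 1, 100), _row("tB", 200, 300),
    _row("tC", 1, 100), _row("tC", 200, 300), _row("tC", 400, 500),
    _row("tD", 1, 100),
]
percentages = intron_class_percentages(intron_counts(example_gff))
```

With real annotations, genes that have several transcripts would need one representative transcript to be chosen first.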
Conserved sequences in the DYW and DYW deaminase domains
All of the 227 analyzed Upland cotton protein sequences contained the DYW deaminase domain.
Additionally, 190 proteins contained a complete and conserved DYW domain, while the remain-
ing 37 proteins contained only a degenerated E+ domain or part of the DYW domain. Using
Upland cotton as an example, we obtained the consensus sequences and motif logos for the DYW
domains of 68 randomly selected G. hirsutum protein sequences using the MEME online pro-
gram. We also compared the sequences of these proteins with the DYW domain consensus
sequences from tomato and Arabidopsis thaliana (Fig 1). The obtained consensus sequence of the
G. hirsutum DYW domain was as follows: AGYvPDTSFVLHDveEEeKEXMLXYHSEkLAiAFGlis
TPpGTPIRifKNLRVCGDCHtAiKlISKItGREIiVRDsNRFHHFKdGSCSCGDYW.
A comparison of the domain sequences revealed that the DYW domains of the above three
different species were similar (Fig 1). The sequences contained conserved HxEx. . .CxxCH
motifs, which corresponded to the cytidine deaminase active site (i.e., C/HxEx. . .CxxC). Cyti-
dine deaminase is a Zn-dependent enzyme in which the CxxCH motif functions as a binding
site [20]. The conserved His and Cys residues in the DYW domain combine with Zn ions,
while the conserved Glu is required for deamination [8]. Thus, the DYW domain may be
involved in nucleotide deaminations. Additionally, a previous study revealed that when CxxCH in
the DYW domain was mutated into GxxGH, the RNA cleavage activity decreased significantly [5],
suggesting that the CxxCH motif in the DYW domain is necessary for endonuclease activity.
Intriguingly, we detected a CxCx motif near the conserved DYW tripeptide or its variants,
which may be crucial for protein functions. These results indicate that the DYW domain may
have multiple functions.
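The motif features discussed above can be checked on a single sequence with simple regular expressions. The following Python sketch is illustrative only: it applies the patterns to the G. hirsutum DYW domain consensus given in the text (treating lowercase, less conserved positions as ordinary residues), and the function name and the exact regular expressions are assumptions rather than the method used to generate the logos.

```python
import re

# HxE followed, further downstream, by CxxCH.
HXE_CXXCH = re.compile(r"H.E.+C..CH")
# CxCx immediately preceding the C-terminal tripeptide.
CXCX_TRIPEPTIDE = re.compile(r"C.C.[A-Z]{3}$")


def scan_dyw_domain(sequence):
    """Report motif presence and the C-terminal tripeptide of a sequence."""
    seq = sequence.upper()
    return {
        "hxe_cxxch": bool(HXE_CXXCH.search(seq)),
        "cxcx_before_tripeptide": bool(CXCX_TRIPEPTIDE.search(seq)),
        "c_terminal_tripeptide": seq[-3:],
    }


# Consensus sequence of the G. hirsutum DYW domain as given in the text.
G_HIRSUTUM_CONSENSUS = (
    "AGYvPDTSFVLHDveEEeKEXMLXYHSEkLAiAFGlis"
    "TPpGTPIRifKNLRVCGDCHtAiKlISKItGREIiVRDsNRFHHFKdGSCSCGDYW"
)
result = scan_dyw_domain(G_HIRSUTUM_CONSENSUS)
```

In the consensus, the HxE pattern is matched by HSE, the CxxCH pattern by CGDCH, and the CxCx motif by CSCG directly before the DYW tripeptide.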
Fig 1. a: The logo of the DYW domain in G. hirsutum. The taller the letter, the higher the conservation; b: Comparison of the DYW domain consensus
sequences in Upland cotton, tomato and Arabidopsis. Capital letters represent highly conserved residues; lowercase letters indicate lower conservation.
https://doi.org/10.1371/journal.pone.0174201.g001
We also obtained the complete consensus sequence of the G. hirsutum DYW deaminase
domain containing part of the E domain, the whole E+ domain, and the DYW domain (Fig 2).
We compared this sequence with the DYW deaminase domain sequences from the other three
cotton species. Furthermore, we obtained a conserved PG box sequence of the E and E+
domains as described by Hayes et al. (2013). The PG box may bind to different factors to
generate editing components, although the role of the PG box in editing events has not been
clarified [21]. The intact DYW deaminase domain sequences were highly similar among the
different analyzed cotton species, and contained the PG box, HxEx. . .CxxCH, and CxCxDYW
motifs. These results indicate the DYW deaminase and DYW domains may exhibit similar
molecular characteristics and functions.
Fig 2. a: The alignment of the DYW deaminase domain in G. hirsutum; b: The logo of the DYW deaminase domain in G. hirsutum; c: The
consensus sequence in G. hirsutum and comparison of the DYW deaminase domain in Gossypium. Capital letters represent highly
conserved residues; lowercase letters indicate lower conservation.
https://doi.org/10.1371/journal.pone.0174201.g002
Chromosomal localization and evolutionary relationships among
Gossypium DYW deaminase genes
Of the identified 227 G. hirsutum DYW deaminase genes, 203 genes were mapped to chromo-
somes, with 97 genes localized in the A subgenome and 106 genes localized in the D subgenome
(Fig 3). Each chromosome contained an average of 7.8 DYW deaminase genes. In the A subge-
nome, chromosomes 4 and 10 carried the fewest DYW deaminase genes (i.e., only four each),
while chromosome 12 contained the most (i.e., 14). In the D subgenome, chromosomes 3 and 10
had the fewest DYW deaminase genes (i.e., only 3 and 4, respectively), and chromosome 12 carried
the most DYW deaminase genes (i.e., 12). Furthermore, some genes clustered together on
chromosomes. For example, four DYW deaminase genes localized within a 3.2-Mb region on
chromosome 5 in the D subgenome (i.e., Gh_D05G0556,Gh_D05G0696,Gh_D05G0877, and
Gh_D05G0911), while five DYW deaminase genes were mapped to a 2.9-Mb region on chromo-
some 12 of the same subgenome. Additionally, in G. hirsutum, the location of genes on chromo-
somes 9, 10, and 13 in the A subgenome exhibited a good collinearity with that on homoeologous
chromosomes 9, 10, and 13 in the D subgenome. In contrast, the chromosomal localization of
DYW deaminase genes in G. raimondii and G. arboreum did not exhibit a good collinearity with
that in G. hirsutum because of differences in gene evolution (S2 Fig).
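The per-chromosome counts summarized above can be tallied directly from gene identifiers. The sketch below assumes, based on the identifiers shown in this article (e.g., Gh_D05G0556), that a G. hirsutum gene ID encodes the subgenome (A or D) and a two-digit chromosome number; this naming pattern, the handling of unmatched identifiers as unplaced, and the function names are assumptions for illustration.

```python
import re
from collections import Counter

# Assumed ID layout: Gh_<subgenome><two-digit chromosome>G<serial number>.
GENE_ID = re.compile(r"^Gh_([AD])(\d{2})G\d+$")


def tally_by_chromosome(gene_ids):
    """Count genes per (subgenome, chromosome); unmatched IDs are 'unplaced'."""
    tally = Counter()
    for gene_id in gene_ids:
        match = GENE_ID.match(gene_id)
        if match:
            key = (match.group(1), int(match.group(2)))
        else:
            key = ("unplaced", 0)
        tally[key] += 1
    return tally


def mean_genes_per_chromosome(n_mapped, n_chromosomes):
    """Average number of mapped genes per chromosome, to one decimal place."""
    return round(n_mapped / n_chromosomes, 1)


# The four clustered genes on chromosome 5 of the D subgenome named in the text.
cluster = ["Gh_D05G0556", "Gh_D05G0696", "Gh_D05G0877", "Gh_D05G0911"]
cluster_tally = tally_by_chromosome(cluster)
average = mean_genes_per_chromosome(203, 26)
```

For the 203 mapped genes on 26 chromosomes this gives the average of 7.8 genes per chromosome stated above.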
Owing to the evolutionary relationships between diploid and tetraploid cotton species, the DYW
deaminases in the A and D subgenomes of the tetraploid species are homologous to those in the diploid
species G. arboreum and G. raimondii, respectively. We analyzed the evolutionary relationships
between the G. hirsutum and G. barbadense DYW deaminases and their corresponding
proteins in G. raimondii and G. arboreum. An analysis of the DYW deaminases in the two tetra-
ploid cotton species revealed that the G. hirsutum proteins had more tandem duplications. A com-
parative syntenic map of the DYW deaminase genes from tetraploid cotton species and the two
diploid cotton species was constructed for a further analysis of genetic origins and evolution (Fig
4). During the evolution of Gossypium DYW deaminase genes, one G. hirsutum gene and seven
G. barbadense genes did not have homologous genes in the two diploid cotton species (S2 Table).
These genes may be new, with distinct functions in tetraploid cotton species. Additionally, 19 and
36 G. raimondii genes and 11 and 25 G. arboreum genes are not present in the G. hirsutum and G.
[Fig 3 image: map of G. hirsutum DYW deaminase gene IDs and their positions along chromosomes A1–A13 (Chr1–Chr13) and D1–D13 (Chr15, Chr14, Chr17, Chr22, Chr19, Chr25, Chr16, Chr24, Chr23, Chr20, Chr21, Chr26 and Chr18, respectively).]
Fig 3. Mapping of the DYW deaminase genes on the G. hirsutum chromosomes. Some DYW deaminase genes are localized on
scaffolds.
https://doi.org/10.1371/journal.pone.0174201.g003
barbadense genomes, respectively. Furthermore, 32 and 30 G. raimondii genes and 15 G. arboreum genes had more than one homologous gene in G. hirsutum and G. barbadense, respectively. We
also observed that 75 and 60 G. raimondii genes and 71 and 57 G. arboreum genes each had one
homologous gene in G. hirsutum and G. barbadense, respectively. Further studies are necessary to
clarify the functions of the unique genes in the two tetraploid cotton species.
Conserved DYW tripeptides and disordered amino acids in the C-
terminal of DYW deaminases
In this study, we observed that the DYW and DFW tripeptides were the most frequently
observed tripeptides in the C-terminal regions of DYW deaminases (Table 2). For example,
there were 131 and 27 proteins that contained the DYW and DFW tripeptides in the C-termi-
nal of G. hirsutum DYW deaminases, respectively. There were 113 and 28 G. barbadense DYW
deaminases with DYW and DFW tripeptides, respectively. Of the diploid Gossypium species,
G. raimondii had 74 and 19 proteins and G. arboreum had 58 and 8 proteins containing the DYW and DFW
tripeptides, respectively. Though the number of DYW tripeptides differed among Gossypium species,
their proportions in the whole gene family were similar, ranging from 53.55% to 59.79%.
The types and number of tripeptides present in the C-terminal of DYW deaminases vary
depending on the organism, including Cyanidioschyzon merolae, Physcomitrella patens, Selaginella
moellendorffii, A. thaliana, Oryza sativa, and Sorghum bicolor (Table 2). Interestingly,
some sequences contained other types of C-terminal tripeptides (with disordered amino acid
residues), accounting for different proportions of the tripeptides in diverse organisms. For
example, there are no DYW deaminases in C. merolae, while P. patens contained only DYW
deaminases with C-terminal DYW and DFW tripeptides. However, 15 DYW deaminases in S.
moellendorffii (i.e., 8.67%) contained a disordered amino acid residue in the C-terminal region.
Fig 4. Genome-wide synteny analysis of DYW deaminase genes. a: Synteny analysis between G. hirsutum and two diploid cotton
species; b: Synteny analysis between G. barbadense and two diploid cotton species. Blue lines indicate syntenic regions between G.
arboreum and tetraploid cotton species, red lines between G. raimondii and tetraploid cotton species.
https://doi.org/10.1371/journal.pone.0174201.g004
Additionally, 17.7% of the Gossypium DYW deaminases and 5.38% of the A. thaliana DYW
deaminases contained a disordered amino acid residue.
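The percentages quoted in this subsection can be re-derived from the counts in Table 2. The following is a minimal sketch of that arithmetic; the counts are copied from Table 2, while the dictionary layout and the helper name `percent` are illustrative choices, not part of the original analysis.

```python
# Counts copied from Table 2:
# species -> (total DYW deaminases, C-terminal DYW tripeptide, disordered C-terminus)
TABLE2 = {
    "G. raimondii": (126, 74, 17),
    "G. arboreum": (97, 58, 18),
    "G. hirsutum": (227, 131, 37),
    "G. barbadense": (211, 113, 45),
    "S. moellendorffii": (173, 110, 15),
    "A. thaliana": (93, 57, 5),
}

COTTON = ["G. raimondii", "G. arboreum", "G. hirsutum", "G. barbadense"]


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Share of proteins ending in DYW, per cotton species.
dyw_share = {sp: percent(TABLE2[sp][1], TABLE2[sp][0]) for sp in COTTON}

# Share of proteins with a disordered C-terminal residue, pooled over the four cotton species.
pooled_total = sum(TABLE2[sp][0] for sp in COTTON)
pooled_disorder = sum(TABLE2[sp][2] for sp in COTTON)
cotton_disorder_share = percent(pooled_disorder, pooled_total)

for species, share in dyw_share.items():
    print(f"{species}: {share}% DYW")
print(f"Gossypium (pooled): {cotton_disorder_share}% disordered")
```

Pooling the four cotton species before dividing is what reproduces the single Gossypium figure given in the text; averaging the four per-species percentages would give a slightly different number.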
Subcellular localization and gene ontology analysis of DYW deaminases
To predict the subcellular locations of DYW deaminases, the TargetP online program was
used to examine protein targeting. The DYW deaminases were predicted to occupy similar
subcellular locations in all four analyzed cotton species. In G. hirsutum, 120 DYW deaminases (i.e., 53%)
Table 2. Summary of conserved tripeptides in cotton and other organisms.
Columns (left to right): G. raimondii, G. arboreum, G. hirsutum, G. barbadense, Cyanidioschyzon merolae, Physcomitrella patens, Selaginella moellendorffii, Arabidopsis thaliana, Oryza sativa, Sorghum bicolor.
Total 126 97 227 211 0 10 173 93 82 65
DYW 74 58 131 113 0 7 110 57 42 30
DFW 19 8 27 28 0 3 16 14 12 6
Disorder a 17 18 37 45 0 0 15 5 17 16
DCW 1 1 2 2 0 0 10 0 0 0
CXCX_ 0 0 0 0 0 0 8 1 0 1
DHW 1 2 3 2 0 0 6 1 0 1
DYC 2 1 4 5 0 0 2 0 0 0
NIW 0 0 0 0 0 0 2 0 0 0
DTW 0 0 0 0 0 0 2 0 0 0
NFW 2 0 3 2 0 0 1 1 0 0
DIW 1 1 1 0 0 0 1 0 0 0
GYW 1 1 2 1 0 0 0 2 3 2
EYW 0 0 0 0 0 0 0 1 2 2
GFW 1 1 2 2 0 0 0 2 1 1
CXCXG_ _ 1 1 2 1 0 0 0 0 1 1
EFW 0 0 0 0 0 0 0 1 1 1
NYW 0 0 1 1 0 0 0 1 1 1
CXCXD_ _ 0 1 0 0 0 0 0 1 0 1
DMW 0 0 0 0 0 0 0 0 1 0
DYG 2 1 2 2 0 0 0 0 0 0
DLW 1 0 1 1 0 0 0 1 0 0
DSW 1 1 2 0 0 0 0 2 0 0
DKW 1 0 2 2 0 0 0 0 0 0
DVE 1 0 2 2 0 0 0 0 0 0
DRW 0 0 0 0 0 0 0 1 0 0
NLW 0 0 0 0 0 0 0 1 0 0
DNW 0 1 0 0 0 0 0 0 0 0
GLW 0 1 1 1 0 0 0 0 0 0
YFW 0 0 1 0 0 0 0 0 0 0
DIC 0 0 1 0 0 0 0 0 0 0
DYR 0 0 0 1 0 0 0 0 0 0
DFC 0 0 0 0 0 0 0 1 0 0
NCW 0 0 0 0 0 0 0 0 0 1
DAW 0 0 0 0 0 0 0 0 0 1
DFG 0 0 0 0 0 0 0 0 1 0
a Disorder means disordered amino acid residues contained in C-terminal.
https://doi.org/10.1371/journal.pone.0174201.t002
were predicted to localize to the mitochondria or chloroplasts, while 16 proteins (i.e., 7%)
containing an N-terminal signal peptide were predicted to be transported to other locations.
However, we were unable to predict the subcellular locations of the remaining 40% of the
DYW deaminases, which require further investigation. In the other cotton species, the
distribution of DYW deaminases among subcellular locations was similar to that in G. hirsutum (S3 Fig).
Gene ontology (GO) analysis revealed that G. hirsutum DYW deaminases may be important
for developmental processes (Fig 5). The cellular component terms were mainly associated
with intracellular or membrane-bound organelles, which is consistent with the subcellular
localization results for DYW deaminases. The primary molecular functions included binding
as well as hydrolase and transferase activities. The main biological process terms were related
to cellular metabolic processes, macromolecule metabolic processes, nitrogen compound
metabolic processes, and primary metabolic processes.
Phylogenetic analysis of Gossypium DYW deaminases
To clarify the evolutionary relationships among Gossypium DYW deaminases, we constructed
a phylogenetic tree using the MEGA 6 program (Fig 6). The DYW deaminases were divided
into five groups (i.e., A–E) based on protein sequence similarities, with groups C and E con-
taining the most and fewest proteins, respectively.
Gossypium DYW deaminase gene expression patterns at different
developmental stages and tissues
To understand the temporal and spatial expression patterns of these PPR DYW deaminase
genes in Upland cotton, publicly deposited RNA-seq data were used to assess expression
across tissues and developmental stages. Among the 227 PPR DYW deaminase genes, there
Fig 5. The GO analysis of DYW deaminase proteins in G. hirsutum.
https://doi.org/10.1371/journal.pone.0174201.g005
were 219 genes with an FPKM > 0 in at least one of the 10 investigated tissues and
developmental stages (Fig 7). Expression of the remaining 8 genes was not detected, and they
are not discussed further in this study. We observed that the expression of DYW deaminase
domain-containing PPR genes differed among tissues and developmental stages. For example,
most of the genes were up-regulated in root, stem, leaf, pistil, and fiber tissues but
down-regulated in petal and stamen tissues, except for Gh_A12G0429, Gh_A13G1155,
Gh_D08G0677, and Gh_D12G2546. Some genes were expressed in a tissue-specific manner. For
example, Gh_A13G1722 was petal-specific and Gh_D05G3688 was root-specific, as they
were highly expressed in petal and root, respectively.
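The filtering step described above (keep a gene if its FPKM is above zero in at least one sample) can be sketched as follows. The row-wise z-score scaling is an assumption suggested by the −2 to 2 color bar of Fig 7, not a method stated in this section, and the gene names and FPKM values are invented for illustration.

```python
from statistics import mean, pstdev


def expressed_genes(fpkm, threshold=0.0):
    """Keep genes whose FPKM exceeds the threshold in at least one sample."""
    return {gene: values for gene, values in fpkm.items()
            if any(v > threshold for v in values)}


def row_zscores(values):
    """Scale one gene's values to mean 0 and unit (population) standard deviation."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # A flat profile carries no tissue contrast; report zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


# Hypothetical FPKM values for three made-up genes across four samples.
toy = {
    "gene_1": [0.0, 0.0, 0.0, 0.0],   # never detected, so it is dropped
    "gene_2": [1.0, 3.0, 5.0, 7.0],
    "gene_3": [0.0, 0.0, 2.5, 0.0],   # detected in a single sample, so it is kept
}

kept = expressed_genes(toy)
scaled = {gene: row_zscores(values) for gene, values in kept.items()}
print(sorted(kept))
```

Scaling each gene separately makes tissue-specific peaks visible even when absolute expression levels differ by orders of magnitude between genes.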
Quantitative real-time PCR (qRT-PCR) analysis
Quantitative real-time PCR was used to investigate the expression levels of 16 randomly selected
PPR DYW deaminase genes in roots, stems, leaves, and flowers of Upland cotton (Fig 8). All of
Fig 6. Phylogenetic tree of DYW deaminase genes in Gossypium.
https://doi.org/10.1371/journal.pone.0174201.g006
the 16 selected genes were expressed in the four analyzed tissues, with different tissue-specific
expression profiles. Among these genes, Gh_D09G0761, Gh_D05G0556, and Gh_A03G1665 were
most highly expressed in flowers. Gh_D09G0761 and Gh_A03G1665 showed similar expression
profiles (i.e., a gradual increase across the four examined tissues). These results imply that
these genes may have a special role during flower development. Gh_A01G1752, Gh_D08G2381,
Gh_D04G0152, Gh_A12G0594, Gh_A02G0419, Gh_A07G1662, and Gh_A13G1386 were most
highly expressed in leaves. Additionally, the highest expression levels of Gh_D02G1741,
Gh_D01G0153, Gh_A12G0486, Gh_A01G1103, and Gh_A02G0212 were observed in stems.
Finally, Gh_D05G0911 was highly expressed in the roots. These results indicate that DYW
deaminases may be important for cotton development.
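This section does not state how relative expression was calculated from the qRT-PCR data. A widely used approach is the 2^-ΔΔCt method, and the sketch below assumes it purely for illustration; the function name, the choice of root as calibrator tissue, and all Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Relative expression by the 2^-ddCt method against a calibrator sample.

    ct_target / ct_reference: Ct of the gene of interest and of the reference
    gene in the sample; the *_cal arguments are the same two values in the
    calibrator sample.
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_cal - ct_reference_cal
    return 2.0 ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values: root is taken as the calibrator tissue.
root = relative_expression(28.0, 18.0, 28.0, 18.0)
flower = relative_expression(25.0, 18.0, 28.0, 18.0)
print(root, flower)
```

With these made-up values the target gene reaches its threshold three cycles earlier in flower than in root, which corresponds to an eight-fold higher relative expression.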
Differentially expressed PPR genes between CMS-D2 and restorer
In a genome-wide transcriptome analysis of floral buds from the near-isogenic CMS-D2 line
(i.e., A line) and its restorer (R line), which differ at the Rf1 locus, 6 differentially expressed
(DE) PPR genes were identified (S1 Table). However, none of these 6 PPR genes contained a
C-terminal DYW deaminase domain. Among them, Gh_A02G1102 (PLS subgroup), Gh_A11G0734
and Gh_D11G0852 (E subgroup) belong to the PLS subfamily, while Gh_A06G0542,
Gh_D05G3392 and Gh_D06G0610 belong to the P subfamily. Interestingly, compared with the
R line, the 3 PLS subfamily genes (Gh_A02G1102, Gh_A11G0734 and Gh_D11G0852)
were up-regulated and the 3 P subfamily genes (Gh_A06G0542, Gh_D05G3392 and Gh_D06G0610)
were down-regulated in the A line. Furthermore, the PPR gene Gh_D05G3392 is located on the
chromosome where the restorer gene Rf1 resides. The Gh_D05G3392 protein belongs to the P
subfamily and contains 14 PPR motif repeats. According to previous studies [22–26], we
Fig 7. Expression profiles of DYW deaminase genes during fiber development and in different
tissues. Heatmap columns: root, stem, leaf, pistil, stamen, petal, and fibre at 5, 10, 20 and 25 dpa.
The color bar (−2 to 2) represents the expression values.
https://doi.org/10.1371/journal.pone.0174201.g007
adopted the three-number PPR code proposed by Yin et al. [24] and predicted the amino acids
at 3 particular positions (1, 4 and ii; residues 2, 5 and 35, respectively, as used in this study) within
the Gh_D05G3392 protein motifs. In Fig 9, we aligned the PPR motifs and highlighted the residues
at positions 2, 5 and 35. We also predicted the RNA sequence (UUUUUUUUUUCUU)
recognized by Gh_D05G3392 according to the rule proposed by Yagi et al. [23]. Interestingly,
one of the BLAST hits at NCBI showed that this sequence (UUUUUUUUUUCUU)
is a partial sequence of a pseudogene in G. hirsutum and of the SSR marker GNCOT-C1-F (GenBank:
KX090570.1). However, Gh_D05G3392 is outside of the Rf1 locus [27], and the other five genes
were not even located on the Rf1-carrying chromosome (Gh_D05). Thus, with the introduction of
the dominant restorer gene Rf1 to the A line containing the CMS-inducing cytoplasm, the expression of the
Fig 8. Expression analysis of 16 selected DYW deaminase genes in different tissues through
qRT-PCR. Each panel shows relative expression in root, stem, leaf and flower. a: High expression
in the flower (Gh_D09G0761, Gh_D05G0556, Gh_A03G1665); b: Expression peaks in the leaf
(Gh_A01G1752, Gh_D08G2381, Gh_D04G0152, Gh_A12G0594, Gh_A02G0419, Gh_A07G1662,
Gh_A13G1386); c: High expression in the stem (Gh_D02G1741, Gh_D01G0153, Gh_A12G0486,
Gh_A02G0212, Gh_A01G1103); d: High expression in the root (Gh_D05G0911).
https://doi.org/10.1371/journal.pone.0174201.g008
three PLS subfamily genes (PLS and E subgroups) was suppressed, while the other three genes,
which belong to the P subfamily, were activated for male fertility restoration in the R line. These
results indicate that DYW deaminase domain-containing PPR genes may not function directly in
CMS occurrence or fertility restoration. This is consistent with previous results for restorer genes
cloned from other plant species.
Discussion
Relationship between DYW deaminase and DYW domains
In this study, we identified 227 DYW deaminase genes in the G. hirsutum genome. We also
obtained the consensus sequence of the Gossypium DYW deaminase domain. Compared with
the DYW domain, the complete DYW deaminase domain contained the E and E+ domains,
which was consistent with the findings of a previous study [7]. Additionally, a PG-box was
detected in the DYW deaminase domain, spanning the E and E+ domains. The functions of
this PG-box have not been characterized.
Molecular evolution of DYW deaminases
Allotetraploid Gossypium species evolved from two diploid cotton species through hybridiza-
tions and genome doubling that occurred 1–2 million years ago [11]. Therefore, diploid and
allotetraploid cotton species are useful models for investigating the evolution of plants. The
whole genome sequences of Gossypium species have enabled researchers to study the molecu-
lar evolutionary characteristics of Gossypium DYW deaminases. In this study, we analyzed the
types and number of DYW tripeptides or their variants in four Gossypium species (Table 2).
We observed that DYW tripeptides are the most conserved and abundant, followed by the
DFW tripeptide. Additionally, some tripeptide types are specific to different Gossypium spe-
cies. Furthermore, in contrast to the diploid species, the number of tripeptide types increased
and the proportion of conserved tripeptides in the whole DYW deaminase family decreased in
allotetraploid Gossypium species.
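Counting tripeptide types per species, as summarized in Table 2, amounts to reading the last three residues of each protein and tallying them. The sketch below illustrates only that step; the sequences are invented, the function names are hypothetical, and the "Disorder" and CXCX-type categories of Table 2 are not modeled.

```python
from collections import Counter


def c_terminal_tripeptide(protein):
    """Return the last three residues of a protein, ignoring a trailing stop symbol '*'."""
    seq = protein.strip().upper().rstrip("*")
    if len(seq) < 3:
        raise ValueError("sequence shorter than three residues")
    return seq[-3:]


def tally_tripeptides(proteins):
    """Map each C-terminal tripeptide to (count, percentage of all proteins)."""
    counts = Counter(c_terminal_tripeptide(p) for p in proteins.values())
    total = sum(counts.values())
    return {tri: (n, round(100.0 * n / total, 2)) for tri, n in counts.most_common()}


# Invented toy sequences; real input would be the predicted DYW deaminase proteins.
toy_proteins = {
    "p1": "MKLCHADYW*",
    "p2": "MSSCHGDYW",
    "p3": "MAACHDFW",
    "p4": "mtgchdyw",
}

summary = tally_tripeptides(toy_proteins)
print(summary)
```

Running the same tally on each species separately would give the per-species counts and proportions that the comparison between diploid and allotetraploid cotton relies on.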
By comparing the types and number of tripeptides present at the C-terminus of DYW
deaminases from red algae (C. merolae), bryophytes (P. patens), lycophytes (S. moellendorffii),
monocots (Sorghum bicolor and Oryza sativa), and eudicots (A. thaliana and Gossypium
species), we determined that diverse plant species evolved differently in terms of the types and
number of tripeptides. For example, C. merolae lacks DYW deaminases, while P. patens
produces 10 DYW deaminases with a conserved C-terminal tripeptide (i.e., either DYW or
DFW). Selaginella moellendorffii rapidly evolved 10 types of tripeptides, and the proportion of
Fig 9. Sequence alignment of the 14 repeats of Gh_D05G3392. The residues at positions 2, 5 and 35,
which were predicted to be the molecular determinants of RNA-binding specificity, are shown in red.
The RNA sequences recognized are listed on the right, 5’ to 3’ from top to bottom.
https://doi.org/10.1371/journal.pone.0174201.g009
conserved DYW tripeptides decreased. Additionally, a disordered amino acid residue is pres-
ent in the C-terminal of some S. moellendorffii DYW deaminases, similar to corresponding
DYW deaminases in monocots and eudicots. This indicates that the types of conserved DYW
tripeptide increased during the evolution from algae to angiosperms. Selection pressures may
have induced the adaptation of DYW tripeptide types to evolutionary processes. Previous stud-
ies indicated that the phylogenetic distribution of DYW domains is highly correlated with
RNA editing, with the number of RNA editing events increasing during evolution [6, 28].
Thus, the types and number of conserved tripeptides in DYW domains may be highly corre-
lated with RNA editing events.
The frequency of residue changes differs among conserved tripeptides. In the DYW tripep-
tide, Trp is the most conserved residue, followed by Asp. We observed that most DYW deami-
nases contain a CxCx motif near the conserved tripeptide. However, some protein sequences
lacking the CxCx motif may also be missing the HxExx...CxxCH motif. This may be because,
during early plant evolution, the conserved DYW tripeptide sequence became flexible, and
residue changes produced distinct tripeptides that increased the number of tripeptide
types. Some flexible tripeptides were gradually lost during evolution, which resulted in the loss
of the CxCx motif and other sequences in the DYW domain following the tripeptide. Based on
the relationship between the conserved DYW tripeptide and CxCx motif, we hypothesized that
the conserved DYW tripeptide functions like a protective cap to prevent the loss of the CxCx
motif. As distinct parts of DYW deaminases, the roles of the CxCx motif and conserved DYW
tripeptide should be more thoroughly characterized in future studies.
PPR gene family and male fertility restoration
The PPR gene family encodes a large family of RNA-binding proteins in plants that are involved
in many cellular functions and biological processes in organelles, including gene expression, RNA
stabilization, RNA cleavage and RNA editing [25, 29]. Previous studies indicated that the PPR
proteins involved in RNA editing mostly share a characteristic architecture, consisting of a PPR
tract followed by C-terminal motifs (E, E+ and DYW), although the E+ and DYW domains are often
missing [30]. Additionally, a conserved HxE(x)nCxxCH motif in the DYW domain is required
for C-to-U conversion rather than for recognition of the editing site [31]. In this study, we found
that the complete DYW deaminase domain contained part of the E domain as well as the E+ and
DYW domains, and that most DYW deaminases contain a CxCx motif near the conserved tripeptide.
However, some protein sequences lacking the CxCx motif may also be missing all or part of the
HxExx...CxxCH motif. The DYW deaminase domain, as the catalytic domain, may promote C-to-U
conversion. Regarding the tandem arrays of PPR motifs, Yagi et al. (2013) found that amino acid
residues at 3 particular positions (1, 4, and ii) within the PPR motif recognize the nucleotide
sequence upstream of the editing sites [23]. Furthermore, Takenaka et al. (2013) found that
combining the P, L and S motifs improves the prediction of RNA editing target sites [26]. In this
study, we predicted the PPR code at 3 particular positions (1, 4 and ii; residues 2, 5 and 35,
respectively, as used in this study) within the Gh_D05G3392 motifs and predicted the RNA sequence
(UUUUUUUUUUCUU) recognized by Gh_D05G3392, which is part of a pseudogene in Gossypium
hirsutum and of the SSR marker GNCOT-C1-F (GenBank: KX090570.1).
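The idea of reading an RNA target off a PPR tract can be sketched as a lookup from residue pairs to nucleotides. The table below is a simplified assumption that uses only residues 5 and 35 and a handful of residue combinations commonly reported in the PPR-code literature; it is not the full rule set of the cited studies, it ignores residue 2, and it assumes every motif is at least 35 residues long. The helper `make_motif` and its filler residues are purely hypothetical.

```python
# Simplified lookup: (residue 5, residue 35) -> predicted nucleotide.
# Assumed, incomplete subset of commonly reported combinations.
PPR_CODE = {
    ("T", "D"): "G",
    ("T", "N"): "A",
    ("N", "D"): "U",
    ("N", "S"): "C",
    ("N", "N"): "C",  # often described as C with some tolerance for U
}


def predict_rna(motifs):
    """Predict one nucleotide per PPR motif; '?' marks motifs the table cannot decode."""
    bases = []
    for motif in motifs:
        if len(motif) < 35:
            bases.append("?")
            continue
        key = (motif[4].upper(), motif[34].upper())  # 1-based positions 5 and 35
        bases.append(PPR_CODE.get(key, "?"))
    return "".join(bases)


def make_motif(res5, res35):
    """Build a hypothetical 35-residue motif with alanine filler around the two code residues."""
    return "A" * 4 + res5 + "A" * 29 + res35


toy_tract = [
    make_motif("N", "D"),
    make_motif("N", "S"),
    make_motif("T", "D"),
    "SHORT",
    make_motif("K", "K"),
]
print(predict_rna(toy_tract))
```

Returning "?" for undecodable motifs keeps the predicted sequence aligned with the motif order, which matters when the result is compared with sequences upstream of candidate editing sites.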
CMS in flowering plants is a maternally inherited trait characterized by the lack of functional
pollen grains [32], and fertility can be restored by nuclear genes known as restorer-of-fertility
(Rf) genes [33]. Previous studies indicated that RNA editing events may be related to CMS
occurrence in many crops. For example, Wei et al. (2008) found that editing of atp9 changes an
arginine codon into a termination codon, shortening the protein to the standard length in
Ying xiang B, whereas in Ying xiang A, which has no termination codon, a normal protein
cannot be produced [34]. Wang et al. (2009) found that RNA editing rates in atp6 and
cox2 generally increase in sterile lines [35]. Chakraborty et al. (2015) indicated that
over-expression of the unedited mitochondrial orfB gene generated male sterility in fertile rice lines
in a dose-dependent manner, in which increased expression of the unedited orfB gene resulted
in a low level of F1F0-ATP synthase (complex V) activity [36]. In cotton,
Suzuki et al. (2013) identified RNA editing sites in eight mitochondrial genes among CMS-D8
three-line systems and found that the restorer gene may change RNA editing efficiency [37].
However, little is known about the RNA editing events of mitochondrial genes in CMS-D2.
Wu et al. (2011) found a C-U and a C-G editing site at positions 1178 and 1214 bp downstream
of the ATG codon in atpA, and an A-U and a C-U editing site at positions 18 and 35 bp upstream
of the ATG codon in nad6, between maintainer and CMS-D2 lines, respectively [16]. Thus, we
speculate that RNA editing events in D2 mitochondrial genes may regulate ATP synthase
activity, and that impaired ATP synthesis results in CMS occurrence.
Fertility restoration generally requires the reduction of CMS-associated RNAs or proteins
by the action of the Rf gene(s). Most of the restorer genes cloned so far are reported to encode
a PPR protein. For instance, the radish Rfo gene [38], two PPR genes in sorghum [14, 15], and
the Rf4 and Rf6 genes in rice [12, 13] each encode a different PPR protein. Additionally, previous
studies indicated that most of them belong to the P subfamily, except for the Rf1 gene in sorghum,
which encodes a PPR protein classified as a member of the E subgroup of the PLS subfamily
[14, 33, 39]. In this study, six differentially expressed (DE) PPR genes were identified
through a genome-wide transcriptome analysis between A and R lines. Interestingly, three PLS
subfamily genes (Gh_A02G1102, Gh_A11G0734 and Gh_D11G0852) were upregulated and three P
subfamily genes (Gh_A06G0542, Gh_D05G3392 and Gh_D06G0610) were downregulated in
the A line. These results indicated that DYW deaminase domain-containing PPR genes may
not function directly in CMS occurrence or fertility restoration, whereas P subfamily genes
might play a critical role in fertility restoration. This is consistent with previous studies
showing that most Rf genes belong to the P subfamily [39]. Previous studies also indicated that
plant genomes contain a small subgroup of PPR genes that share high similarity with the
identified Rf-PPRs, termed Rf-like PPR (RFL) genes [38, 40]. For example, three similar PPRs
were identified at the radish Rfo locus, which may have arisen through gene duplication events
[38]. In this study, five of the six DE PPR genes were not located on the Rf1-carrying
chromosome (Gh_D05), whereas the PPR gene Gh_D05G3392 is located on the chromosome where the
restorer gene Rf1 resides [27]. Although the physical distance between them is short,
Gh_D05G3392 lies outside of the Rf1 locus, and its protein was predicted to target the
chloroplasts. Therefore, we speculate that the Gh_D05G3392 gene may share a high similarity
with Rf1 in cotton but is not the candidate gene for Rf1. As for the mechanism of CMS
restoration in cotton, given the limited number of studies, we speculate that Rf1 may be a PPR
gene belonging to the P subfamily. Takenaka et al. (2013) proposed that the P subfamily PPR
proteins often bind tightly to their RNA targets and cannot be removed [26], and Suzuki et al.
(2013) indicated that the restorer gene may alter RNA editing efficiency in the CMS-D8
three-line system [37]. Thus, the Rf1 PPR protein may bind tightly to the CMS gene or its
product, so that the DYW deaminase cannot edit the CMS gene product and male fertility is
restored. However, the RNA editing events in CMS-D2 cotton and the role of the Rf1 gene in
RNA editing require further study.
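The grouping of the six DE PPR genes described above can be tabulated with a short Python sketch. The gene identifiers, subfamilies, and directions of regulation in the A line are taken from the text; the assumption that the chromosome can be read from the identifier (e.g., Gh_D05G3392 on D05) and all function and variable names are illustrative only.

```python
import re
from collections import Counter

# Gene ID -> (subfamily, direction in the A line), as listed in the text.
DE_PPR_GENES = {
    "Gh_A02G1102": ("PLS", "up"),
    "Gh_A11G0734": ("PLS", "up"),
    "Gh_D11G0852": ("PLS", "up"),
    "Gh_A06G0542": ("P", "down"),
    "Gh_D05G3392": ("P", "down"),
    "Gh_D06G0610": ("P", "down"),
}
RF1_CHROMOSOME = "D05"  # chromosome reported to carry Rf1


def chromosome_of(gene_id):
    """Extract the chromosome label from an ID such as 'Gh_D05G3392'.

    Assumes the naming pattern Gh_<subgenome><chromosome>G<number>.
    """
    match = re.fullmatch(r"Gh_([AD]\d{2})G\d+", gene_id)
    if match is None:
        raise ValueError(f"unexpected gene ID format: {gene_id}")
    return match.group(1)


def summarize(genes, rf_chromosome=RF1_CHROMOSOME):
    """Count genes per (subfamily, direction) and list those on rf_chromosome."""
    counts = Counter(genes.values())
    on_rf_chromosome = [g for g in genes if chromosome_of(g) == rf_chromosome]
    return counts, on_rf_chromosome


counts, on_rf1_chromosome = summarize(DE_PPR_GENES)
print(dict(counts))
print(on_rf1_chromosome)
```

Sharing a chromosome with Rf1 is only a coarse filter; whether a gene lies within the Rf1 locus depends on physical coordinates that are not reproduced here.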
Conclusion
We identified DYW deaminases in the genomes of four sequenced Gossypium species. We
conducted chromosomal and subcellular localization analyses as well as gene structure,
expression, and GO analyses for the identified genes and proteins. Most of the Gossypium
DYW deaminases contained the conserved Asp-Tyr-Trp tripeptide (or variants) at the C
terminus. The DYW deaminases lacking the conserved tripeptide may have been gradually lost
during evolution. The PPR proteins containing the DYW domain are ancient RNA-editing
proteins. With increasing numbers of editing sites, some amino acids in the DYW domain
were gradually lost during evolution, resulting in an increase in the abundance of proteins
from other subgroups (e.g., when the DYW domain was converted to an E domain). Therefore,
the DYW deaminase domain might influence nucleotide deamination, which is consistent
with the molecular function predicted by the GO analysis. Furthermore, regarding
subcellular localization, DYW deaminases were predicted to localize mainly in mitochondria
or chloroplasts, which is consistent with the GO analysis results for cellular components. In
addition, the transcriptome analysis indicated that no differentially expressed DYW
deaminase-coding PPR genes were directly associated with male sterility and restoration in
the CMS-D2 system. In summary, our study provides basic information regarding the structural
and evolutionary characteristics of DYW deaminases in cotton. Our findings may be useful for
improving cotton production in future studies.
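As a simple illustration of the C-terminal tripeptide survey summarized above, the following Python sketch tallies the last three residues of a set of protein sequences and checks them against a list of accepted tripeptides. The function names and toy sequences are hypothetical, and the set of accepted variants is left as a parameter because the variants are not enumerated in this section.

```python
from collections import Counter


def c_terminal_tripeptide(sequence):
    """Return the last three residues, ignoring whitespace and a trailing stop '*'."""
    cleaned = sequence.strip().rstrip("*").upper()
    return cleaned[-3:] if len(cleaned) >= 3 else ""


def tally_tripeptides(proteins):
    """Count C-terminal tripeptides over a mapping of name -> sequence."""
    return Counter(c_terminal_tripeptide(seq) for seq in proteins.values())


def ends_with_accepted(sequence, accepted=frozenset({"DYW"})):
    """True if the C-terminal tripeptide is in the accepted set (default: DYW only)."""
    return c_terminal_tripeptide(sequence) in accepted


# Toy sequences for illustration only; they are not cotton proteins.
toy_proteins = {
    "toy1": "MKLSCEHAEDYW",
    "toy2": "MSTHCGDYW*",
    "toy3": "MAAGHDFW",
}
print(tally_tripeptides(toy_proteins))
print(ends_with_accepted(toy_proteins["toy3"]))
```

Passing a larger `accepted` set allows variant tripeptides to be counted alongside the canonical Asp-Tyr-Trp motif.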
Supporting information
S1 Fig. The number of introns in DYW deaminase genes in Gossypium.
(TIF)
S2 Fig. The gene structure of DYW deaminase in Gossypium.
(EPS)
S3 Fig. Mapping of the DYW deaminase genes in the chromosomes of G. arboreum, G. raimondii and G. barbadense.
(EPS)
S4 Fig. Subcellular localization of DYW deaminases in Gossypium.
(TIF)
S1 Table. Differentially expressed genes between CMS-D2 and restorer.
(XLSX)
S2 Table. Information about the DYW deaminase genes in Gossypium.
(XLSX)
Acknowledgments
We thank Xihua Li, Nuohan Wang, and Qi Gong for analyzing the genome data and figures and
for helpful comments on the manuscript.
Author Contributions
Conceptualization: CX JW.
Data curation: CX JW JZ.
Formal analysis: BZ GL XL.
Funding acquisition: CX JW.
Investigation: BZ GL XL LG XZ.
Methodology: CX JW TQ HW.
Project administration: CX JW LG.
Resources: XZ HW HT XQ.
Supervision: LG TQ XZ.
Visualization: BZ GL XL JZ.
Writing – original draft: BZ JW JZ.
Writing – review & editing: BZ JW JZ.
References
1. Wedekind JE, Dance GSC, Sowden MP, Smith HC. Messenger RNA editing in mammals: new mem-
bers of the APOBEC family seeking roles in the family business. Trends in Genetics. 2003; 19(4):207–
16. https://doi.org/10.1016/S0168-9525(03)00054-4 PMID: 12683974
2. Shikanai T. RNA editing in plants: Machinery and flexibility of site recognition. Biochimica et Biophysica
Acta (BBA)—Bioenergetics. 2015; 1847(9):779–85.
3. Small ID, Peeters N. The PPR motif–a TPR-related motif prevalent in plant organellar proteins. Trends
in Biochemical Sciences. 2000; 25(2):45–7.
4. Lurin C, Andres C, Aubourg S, Bellaoui M, Bitton F, Bruyere C, et al. Genome-wide analysis of
Arabidopsis pentatricopeptide repeat proteins reveals their essential role in organelle biogenesis.
Plant Cell. 2004; 16(8):2089–103. Epub 2004/07/23. PubMed Central PMCID: PMC519200.
https://doi.org/10.1105/tpc.104.022236 PMID: 15269332
5. Nakamura T, Sugita M. A conserved DYW domain of the pentatricopeptide repeat protein possesses a
novel endoribonuclease activity. FEBS Lett. 2008; 582(30):4163–8. Epub 2008/12/02. https://doi.org/
10.1016/j.febslet.2008.11.017 PMID: 19041647
6. Salone V, Rudinger M, Polsakiewicz M, Hoffmann B, Groth-Malonek M, Szurek B, et al. A hypothesis
on the identification of the editing enzyme in plant organelles. FEBS Lett. 2007; 581(22):4132–8. Epub
2007/08/21. https://doi.org/10.1016/j.febslet.2007.07.075 PMID: 17707818
7. Iyer LM, Zhang D, Rogozin IB, Aravind L. Evolution of the deaminase fold and multiple origins of
eukaryotic editing and mutagenic nucleic acid deaminases from bacterial toxin systems. Nucleic Acids
Res. 2011; 39(22):9473–97. Epub 2011/09/06. PubMed Central PMCID: PMC3239186.
https://doi.org/10.1093/nar/gkr691 PMID: 21890906
8. Hayes ML, Dang KN, Diaz MF, Mulligan RM. A conserved glutamate residue in the C-terminal
deaminase domain of pentatricopeptide repeat proteins is required for RNA editing activity. J Biol
Chem. 2015; 290(16):10136–42. Epub 2015/03/06. PubMed Central PMCID: PMC4400329.
https://doi.org/10.1074/jbc.M114.631630 PMID: 25739442
9. Wagoner JA, Sun T, Lin L, Hanson MR. Cytidine deaminase motifs within the DYW domain of two
pentatricopeptide repeat-containing proteins are required for site-specific chloroplast RNA editing.
J Biol Chem. 2015; 290(5):2957–68. Epub 2014/12/17. PubMed Central PMCID: PMC4317000.
https://doi.org/10.1074/jbc.M114.622084 PMID: 25512379
10. Boussardon C, Salone V, Avon A, Berthome R, Hammani K, Okuda K, et al. Two interacting proteins
are necessary for the editing of the NdhD-1 site in Arabidopsis plastids. Plant Cell. 2012; 24(9):3684–
94. https://doi.org/10.1105/tpc.112.099507 PMID: 23001034
11. Hu G, Hawkins JS, Grover CE, Wendel JF. The history and disposition of transposable elements in
polyploid Gossypium. Genome. 2010; 53(8):599–607. https://doi.org/10.1139/g10-038 PMID: 20725147
12. Kazama T, Toriyama K. A fertility restorer gene, Rf4, widely used for hybrid rice breeding encodes a
pentatricopeptide repeat protein. Rice. 2014; 7(1):1–5.
13. Huang W, Yu C, Hu J, Wang L, Dan Z, Zhou W, et al. Pentatricopeptide-repeat family protein RF6 func-
tions with hexokinase 6 to rescue rice cytoplasmic male sterility. Proc Natl Acad Sci U S A. 2015; 112
(48):14984–9. https://doi.org/10.1073/pnas.1511748112 PMID: 26578814
14. Klein RR, Klein PE, Mullet JE, Minx P, Rooney WL, Schertz KF. Fertility restorer locus Rf1 of sorghum
(Sorghum bicolor L.) encodes a pentatricopeptide repeat protein not present in the colinear region of
rice chromosome 12. Theoretical and Applied Genetics. 2005; 111(6):994–1012.
https://doi.org/10.1007/s00122-005-2011-y PMID: 16078015
15. Jordan DR, Mace ES, Henzell RG, Klein PE, Klein RR. Molecular mapping and candidate gene identifi-
cation of the Rf2 gene for pollen fertility restoration in sorghum [Sorghum bicolor (L.) Moench].
Theoretical and Applied Genetics. 2010; 120(7):1279–87. https://doi.org/10.1007/s00122-009-1255-3
PMID: 20091293
16. Wu J, Gong Y, Cui M, Qi T, Guo L, Zhang J, et al. Molecular characterization of cytoplasmic male steril-
ity conditioned by Gossypium harknessii cytoplasm (CMS-D2) in upland cotton. Euphytica. 2011; 181
(1):17–29.
17. Suzuki H, Rodriguez-Uribe L, Xu J, Zhang J. Transcriptome analysis of cytoplasmic male sterility and
restoration in CMS-D8 cotton. Plant Cell Reports. 2013; 32(10):1531–42. https://doi.org/10.1007/
s00299-013-1465-7 PMID: 23743655
18. You Q, Xu W, Zhang K, Zhang L, Yi X, Yao D, et al. ccNET: Database of co-expression networks with
functional modules for diploid and polyploid Gossypium. Nucleic Acids Research. 2017; 45(Database
issue):D1090–D9.
19. Livak KJ, Schmittgen TD. Analysis of relative gene expression data using real-time quantitative PCR
and the 2(-Delta Delta C(T)) Method. Methods. 2001; 25(4):402–8. https://doi.org/10.1006/meth.2001.
1262 PMID: 11846609
20. Sugita M, Ichinose M, Ide M, Sugita C. Architecture of the PPR gene family in the moss Physcomitrella
patens. RNA Biol. 2013; 10(9):1439–45. Epub 2013/05/07. PubMed Central PMCID: PMC3858427.
https://doi.org/10.4161/rna.24772 PMID: 23645116
21. Hayes ML, Giang K, Berhane B, Mulligan RM. Identification of two pentatricopeptide repeat genes
required for RNA editing and zinc binding by C-terminal cytidine deaminase-like domains. Journal of Bio-
logical Chemistry. 2013; 288(51):36519–29. https://doi.org/10.1074/jbc.M113.485755 PMID: 24194514
22. Shen C, Zhang D, Guan Z, Liu Y, Yang Z, Yang Y, et al. Structural basis for specific single-stranded
RNA recognition by designer pentatricopeptide repeat proteins. Nature Communications. 2016;
7:11285. https://doi.org/10.1038/ncomms11285 PMID: 27088764
23. Yagi Y, Hayashi S, Kobayashi K, Hirayama T, Nakamura T. Elucidation of the RNA recognition code for
pentatricopeptide repeat proteins involved in organelle RNA editing in plants. PLoS One. 2013; 8(3):
e57286. Epub 2013/03/09. PubMed Central PMCID: PMC3589468.
https://doi.org/10.1371/journal.pone.0057286 PMID: 23472078
24. Yin P. Structural basis for the modular recognition of single-stranded RNA by PPR proteins. Nature.
2013; 504(7478):168. https://doi.org/10.1038/nature12651 PMID: 24162847
25. Prikryl J, Rojas M, Schuster G, Barkan A. Mechanism of RNA stabilization and translational activation
by a pentatricopeptide repeat protein. Proc Natl Acad Sci U S A. 2011; 108(1):415–20. https://doi.org/
10.1073/pnas.1012076108 PMID: 21173259
26. Takenaka M, Zehrmann A, Brennicke A, Graichen K. Improved computational target site prediction for
pentatricopeptide repeat RNA editing factors. PLoS One. 2013; 8(6):e65343.
https://doi.org/10.1371/journal.pone.0065343 PMID: 23762347
27. Wu J, Cao X, Guo L, Qi T, Wang H, Tang H, et al. Development of a candidate gene marker for Rf1
based on a PPR gene in cytoplasmic male sterile CMS-D2 Upland cotton. Molecular Breeding. 2014; 34
(1):231–40.
28. Knoop V, Rudinger M. DYW-type PPR proteins in a heterolobosean protist: plant RNA editing factors
involved in an ancient horizontal gene transfer? FEBS Lett. 2010; 584(20):4287–91. Epub 2010/10/05.
https://doi.org/10.1016/j.febslet.2010.09.041 PMID: 20888816
29. Schmitz-Linneweber C, Small I. Pentatricopeptide repeat proteins: a socket set for organelle gene
expression. Trends in Plant Science. 2008; 13(12):663–70.
https://doi.org/10.1016/j.tplants.2008.10.001 PMID: 19004664
30. Okuda K, Myouga F, Motohashi R, Shinozaki K, Shikanai T. Conserved domain structure of
pentatricopeptide repeat proteins involved in chloroplast RNA editing. Proc Natl Acad Sci U S A.
2007; 104(19):8178–83. Epub 2007/05/08. PubMed Central PMCID: PMC1876591.
https://doi.org/10.1073/pnas.0700865104 PMID: 17483454
31. Boussardon C, Avon A, Kindgren P, Bond CS, Challenor M, Lurin C, et al. The cytidine deaminase sig-
nature HxE(x)n CxxC of DYW1 binds zinc and is necessary for RNA editing of ndhD-1. New Phytologist.
2014; 203(4):1090–5. https://doi.org/10.1111/nph.12928 PMID: 25041347
32. Hanson MR, Bentolila S. Interactions of Mitochondrial and Nuclear Genes That Affect Male Gameto-
phyte Development. The Plant Cell. 2004; 16(suppl 1):S154–S69.
33. Wang F, Stewart JM, Zhang J. Molecular markers linked to the Rf2 fertility restorer gene in cotton.
Genome / National Research Council Canada = Genome / Conseil national de recherches Canada.
2007; 50(9):818–24.
34. Wei L, Ding YY. Mitochondrial RNA editing of F0-ATPase subunit 9 gene (atp9) transcripts of Yunnan
purple rice cytoplasmic male sterile line and its maintainer line. Acta Physiologiae Plantarum. 2008; 30
(5):657–62.
35. Wang J, Cao MJ, Pan GT, Lu YL, Rong TZ. RNA editing of mitochondrial functional genes atp6 and
cox2 in maize (Zea mays L.). Mitochondrion. 2009; 9(5):364–9. https://doi.org/10.1016/j.mito.2009.07.
005 PMID: 19666144
36. Chakraborty A, Mitra J, Bhattacharyya J, Pradhan S, Sikdar N, Das S, et al. Transgenic expression of
an unedited mitochondrial orfB gene product from wild abortive (WA) cytoplasm of rice (Oryza sativa L.)
generates male sterility in fertile rice lines. Planta. 2015; 241(6):1463–79. https://doi.org/10.1007/
s00425-015-2269-5 PMID: 25754232
37. Suzuki H, Yu J, Ness SA, O’Connell MA, Zhang J. RNA editing events in mitochondrial genes by ultra-
deep sequencing methods: a comparison of cytoplasmic male sterile, fertile and restored genotypes in
cotton. Molecular Genetics and Genomics. 2013; 288(9):445–57. https://doi.org/10.1007/s00438-013-
0764-6 PMID: 23812672
38. Brown GG, Formanova N, Jin H, Wargachuk R, Dendy C, Patil P, et al. The radish Rfo restorer gene of
Ogura cytoplasmic male sterility encodes a protein with multiple pentatricopeptide repeats. Plant Jour-
nal. 2003; 35(2):262–72. PMID: 12848830
39. Dahan J, Mireau H. The Rf and Rf-like PPR in higher plants, a fast-evolving subclass of PPR genes.
RNA Biology. 2012; 10(9):1469–76.
40. Fujii S, Bond CS, Small ID. Selection patterns on restorer-like genes reveal a conflict between nuclear
and mitochondrial genomes throughout angiosperm evolution. Proc Natl Acad Sci U S A. 2011; 108
(4):1723–8. https://doi.org/10.1073/pnas.1007667108 PMID: 21220331