
Codon usage is an important determinant of gene expression levels largely through its effects on transcription

Zhipeng Zhou (a,1), Yunkun Dang (a,1), Mian Zhou (a,b), Lin Li (c), Chien-hung Yu (a), Jingjing Fu (a), She Chen (c), and Yi Liu (a,2)

(a) Department of Physiology, The University of Texas Southwestern Medical Center, Dallas, TX 75390; (b) State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai 200237, China; and (c) National Institute of Biological Sciences, Changping District, Beijing 102206, China

Edited by Jay C. Dunlap, Geisel School of Medicine at Dartmouth, Hanover, NH, and approved August 11, 2016 (received for review April 27, 2016)

Codon usage biases are found in all eukaryotic and prokaryotic genomes, and preferred codons are more frequently used in highly expressed genes. The effects of codon usage on gene expression were previously thought to be mainly mediated by its impacts on translation. Here, we show that codon usage strongly correlates with both protein and mRNA levels genome-wide in the filamentous fungus Neurospora. Gene codon optimization also results in strong up-regulation of protein and RNA levels, suggesting that codon usage is an important determinant of gene expression. Surprisingly, we found that the impact of codon usage on gene expression results mainly from effects on transcription and is largely independent of mRNA translation and mRNA stability. Furthermore, we show that histone H3 lysine 9 trimethylation is one of the mechanisms responsible for the codon usage-mediated transcriptional silencing of some genes with nonoptimal codons. Together, these results uncovered an unexpected important role of codon usage in ORF sequences in determining transcription levels and suggest that codon biases are an adaptation of protein coding sequences to both transcription and translation machineries. Therefore, synonymous codons not only specify protein sequences and translation dynamics, but also help determine gene expression levels.

Neurospora | codon usage | transcription

Gene expression is regulated by transcriptional and post-transcriptional mechanisms. Promoter strength and RNA stability are thought to be the major determinants of mRNA levels, whereas transcript levels and protein stability are proposed to be largely responsible for protein levels in cells. The use of synonymous codons in gene coding regions is not random, and codon usage bias is an essential feature of most genomes (1–4). Selection for efficient and accurate translation is thought to be the major cause of codon usage bias (4–9). Recent experimental studies demonstrated that codon usage regulates translation elongation speed and cotranslational protein folding (10–13). This effect of codon biases on translation elongation speed, together with the correlations between codon usage and certain protein structural motifs, suggested that codon usage regulates translation elongation rates to optimize cotranslational protein folding (11, 12, 14–16).

Codon optimization has long been used to enhance protein expression for heterologous gene expression. In addition, the observation that highly expressed proteins are mostly encoded by genes with predominantly optimal codons led to the hypothesis that codon usage influences protein expression levels by affecting translation efficiency (7, 17–19). Recent studies, however, suggested that overall translation efficiency is mainly determined by the efficiency of translation initiation, a process that is mostly determined by RNA structure rather than by codon usage near the translational start site (20–22). Recently, codon usage was suggested to be an important determinant of mRNA stability in Saccharomyces cerevisiae and Escherichia coli through an effect on translation elongation (23, 24). It is important to note that S. cerevisiae prefers A or T at wobble positions, whereas mammals, Drosophila, and many other fungi prefer C or G. Because such differences may alter how RNA decay pathways recognize mRNAs, it is not known whether a similar mechanism exists in other eukaryotic organisms.

The filamentous fungus Neurospora crassa exhibits a strong codon usage bias for C or G at wobble positions (16, 25). Codon optimization has been shown to enhance protein expression of heterologous genes in Neurospora (26, 27). We recently demonstrated that codon usage is important for the expression, function, and structure of the clock protein FRQ (13, 16). Codon usage affects local rates of translation elongation in Neurospora (preferred codons speed up elongation and rare codons slow it down) and also cotranslational protein folding (12). However, how codon usage regulates gene expression remains unclear.

Results

Codon Usage Biases Strongly Correlate with Protein and Transcript Levels Genome-Wide in Neurospora. To determine the effect of codon usage on protein expression genome-wide, we performed quantitative whole-proteome analyses of Neurospora whole-cell extracts by mass spectrometry. These analyses identified and quantified ∼4,000 Neurospora proteins based on their emPAI (exponentially modified protein abundance index) values (28), which are proportional to their relative abundances in a protein mixture. As shown in SI Appendix, Fig. S1, the results obtained from analyses of two independent replicate samples were highly consistent, indicating the reliability and sensitivity of the

Significance

Codon usage bias is an essential feature of all genomes. The effects of codon usage biases on gene expression were previously thought to be mainly due to their impacts on translation. Here, we show that codon usage bias strongly correlates with protein and mRNA levels genome-wide in the filamentous fungus Neurospora, and that codon usage is an important determinant of gene expression. Surprisingly, we found that the impacts of codon usage on gene expression are mainly due to effects on transcription and are largely independent of translation. Together, these results uncovered an unexpected role of codon biases in determining transcription levels by affecting chromatin structures and suggest that codon biases are results of genome adaptation to both transcription and translation machineries.

Author contributions: Z.Z., Y.D., and Y.L. designed research; Z.Z., Y.D., M.Z., L.L., and S.C. performed research; C.-h.Y. and J.F. contributed new reagents/analytic tools; Z.Z., Y.D., M.Z., S.C., and Y.L. analyzed data; and Z.Z., Y.D., and Y.L. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

1 Z.Z. and Y.D. contributed equally to this study.
2 To whom correspondence should be addressed. Email: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1606724113/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1606724113 PNAS | Published online September 26, 2016 | E6117–E6125

GENETICS | PNAS PLUS


method. In addition, RNA sequencing (RNA-seq) of Neurospora mRNA was performed to determine correlations between mRNA levels and codon usage biases. To determine the codon usage bias of Neurospora genes, the codon bias index (CBI) was calculated for every protein-coding gene in the genome. CBI ranges from −1, indicating that all codons within a gene are nonpreferred, to +1, indicating that all codons are the most preferred, with a value of 0 indicative of random use (29). Because CBI estimates the codon bias for each gene rather than for individual codons, the relative codon biases of different genes can be compared.

For the ∼4,000 proteins detected by mass spectrometry, which account for more than 40% of the predicted protein-encoding genes in the Neurospora genome, there is a strong positive correlation (Pearson's product-moment correlation coefficient r = 0.74) between relative protein abundances and mRNA levels (Fig. 1A and Dataset S1), suggesting that transcript levels largely determine protein levels. Importantly, we also observed a strong positive correlation (r = 0.64) between relative protein abundances and CBI values (Fig. 1B). Interestingly, a similarly strong positive correlation (r = 0.62) was seen between CBI and relative mRNA levels (Fig. 1C). Because codon usage was previously hypothesized to affect translation efficiency, we asked whether mRNA levels could better predict protein levels if codon usage scores were taken into account. Surprisingly, compared with mRNA levels alone, the two factors together (CBI × mRNA) did not markedly improve the correlation with protein levels (r = 0.77 vs. 0.74; Fig. 1D). These results suggest that codon usage may be an important determinant of protein production genome-wide, mainly through its role in affecting mRNA levels.

Based on phylogenetic distribution, Neurospora protein-encoding genes can be classified into five mutually exclusive lineage-specificity groups: eukaryote/prokaryote-core (conserved in nonfungal eukaryotes and/or prokaryotes), dikarya-core (conserved in Basidiomycota and Ascomycota species), Ascomycota-core, Pezizomycotina-specific, and N. crassa-specific genes (30). The median CBI value of each group decreases as lineage specificity increases (SI Appendix, Fig. S1B), with the eukaryote/prokaryote-core genes having the highest average CBI values and the N. crassa-specific genes the lowest. Interestingly, the differences in median mRNA levels among the gene groups track the differences in group median CBI values (SI Appendix, Fig. S1C). These results suggest that codon usage may regulate gene expression by enhancing the expression of highly conserved genes and/or limiting that of evolutionarily recent genes.

Transcript levels are thought to be mainly determined by promoter strength. It is therefore surprising that codon usage, an intrinsic feature of ORFs, correlates so strongly with mRNA levels. Because Neurospora codons, unlike those of S. cerevisiae, prefer C or G at wobble positions, the observed effects could instead reflect gene GC content. However, at the genome-wide level, gene GC content [calculated from the transcription start site (TSS) to the transcription end site (TES)] showed no correlation with protein levels (r = −0.04) and only a weak negative correlation with mRNA levels (r = −0.10) (Fig. 1 E and F). By contrast, GC content at the third position of codons (GC3), which strongly correlates with

[Fig. 1 scatter plots not reproduced. Panels: (A) protein vs. mRNA levels, r = 0.74; (B) protein levels vs. CBI, r = 0.64; (C) mRNA levels vs. CBI, r = 0.62; (D) protein levels vs. CBI × mRNA, r = 0.77; (E) protein levels vs. gene GC content, r = −0.04; (F) mRNA levels vs. gene GC content, r = −0.10; (G) protein levels vs. GC3, r = 0.49; (H) mRNA levels vs. GC3, r = 0.42; (I) mRNA levels vs. CBI in gene GC-content groups 0.46–0.53, 0.53–0.54, 0.54–0.55, and 0.55–0.64, r = 0.64, 0.66, 0.68, and 0.60, respectively.]

Fig. 1. Codon usage but not gene GC content correlates with protein and mRNA levels in Neurospora. (A) Scatter plot of protein levels (log10 emPAI) vs. mRNA levels (log10 RPKM). P < 2.2 × 10^−16, n = 4,013. (B) Plot of protein levels vs. CBI. P < 2.2 × 10^−16, n = 4,013. (C) Scatter analysis of mRNA levels vs. CBI. P < 2.2 × 10^−16, n = 4,013. (D) Scatter plot of protein levels vs. CBI × mRNA. P < 2.2 × 10^−16, n = 4,013. (E and F) Scatter plot of protein levels (E) or mRNA levels (F) vs. gene GC content, n = 4,013. (G and H) Scatter plot of protein levels (G) or mRNA levels (H) vs. GC3. P < 2.2 × 10^−16, n = 4,013. (I) Plots of mRNA levels vs. CBI in four groups of genes with different gene GC content. Neurospora genes were ranked based on gene GC content, the outlier was removed, and the genes were divided into four groups with equal numbers of genes based on their gene GC contents. First group, gene GC content 0.46–0.53, n = 987; second, GC content 0.53–0.54, n = 986; third, GC content 0.54–0.55, n = 987; fourth, GC content 0.55–0.64, n = 986.


codon usage (SI Appendix, Fig. S1D), positively correlates with both protein and mRNA levels (Fig. 1 G and H). Furthermore, when Neurospora genes were split into four groups (each with the same number of genes) based on their GC content, a similarly strong positive correlation between CBI and mRNA levels was seen in each group (Fig. 1I). Notably, gene GC content was almost identical for the two middle groups of genes (0.53–0.54 and 0.54–0.55). Together, these data suggest that codon usage, and not merely gene GC content, is an important determinant of mRNA expression levels in Neurospora.
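The three per-gene metrics contrasted in this section (CBI, whole-gene GC content, and GC3) can be made concrete with a short sketch. It assumes the Bennetzen and Hall form of the index, CBI = (N_pfr − N_rand)/(N_tot − N_rand), where N_pfr counts preferred codons, N_tot counts all scored codons, and N_rand is the number of preferred codons expected if synonymous codons were used at random; the implementation behind reference (29) may differ in details such as which amino acids are scored. The codon tables and the "preferred" codons below are hypothetical illustrations, not the Neurospora codon usage table.

```python
from collections import Counter

# Hypothetical toy tables for illustration only (NOT the Neurospora codon usage table).
SYNONYMS = {
    "F": ["TTT", "TTC"],                # Phe, twofold degenerate
    "K": ["AAA", "AAG"],                # Lys, twofold degenerate
    "G": ["GGT", "GGC", "GGA", "GGG"],  # Gly, fourfold degenerate
}
PREFERRED = {"F": {"TTC"}, "K": {"AAG"}, "G": {"GGC"}}  # assumed C/G-ending preferences


def split_codons(orf):
    """Split an in-frame ORF into codons, dropping any trailing partial codon."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3
    return [orf[i:i + 3] for i in range(0, usable, 3)]


def gc_content(seq):
    """Fraction of G or C bases over the whole sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def gc3(codons):
    """Fraction of codons with G or C at the third (wobble) position."""
    return sum(codon[2] in "GC" for codon in codons) / len(codons)


def codon_bias_index(codons, synonyms=SYNONYMS, preferred=PREFERRED):
    """CBI = (N_pfr - N_rand) / (N_tot - N_rand); codons outside `synonyms` are ignored."""
    scored = {c for group in synonyms.values() for c in group}
    counts = Counter(c for c in codons if c in scored)
    n_tot = n_pfr = 0
    n_rand = 0.0
    for aa, group in synonyms.items():
        n_aa = sum(counts[c] for c in group)
        n_tot += n_aa
        n_pfr += sum(counts[c] for c in preferred[aa])
        # Preferred codons expected if this amino acid's synonyms were used at random.
        n_rand += n_aa * len(preferred[aa]) / len(group)
    if n_tot == n_rand:
        raise ValueError("CBI is undefined for this codon set")
    return (n_pfr - n_rand) / (n_tot - n_rand)


orf = "ATGTTCAAGGGC"  # Met-Phe-Lys-Gly; ATG has no synonyms and is not scored
codons = split_codons(orf)
print(round(gc_content(orf), 2), gc3(codons), codon_bias_index(codons))
```

For this toy ORF the whole-sequence GC content is 0.5, while GC3 and CBI are both 1.0. This shows how gene GC content and codon-level metrics can diverge. Under this formula, a gene that uses only nonpreferred codons scores exactly −1 when, for example, every scored amino acid is twofold degenerate with a single preferred codon. For other degeneracies, the minimum lies above −1.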

Codon Optimization Strongly Enhances Protein and mRNA Levels. To directly determine the effect of codon usage on gene expression, we codon-optimized (opt) eight Neurospora genes and two heterologous reporter genes, firefly luciferase (luc) and S. cerevisiae I-SceI, based on the Neurospora codon usage table (SI Appendix, Table S1). These genes [wild-type (wt) or opt] were placed under the control of the Neurospora ccg-1 or qa promoters and targeted to the his-3 locus of Neurospora by homologous recombination, and homokaryotic transformants were obtained. Codon optimization resulted in marked increases in protein levels for each of the 10 genes (Fig. 2A and SI Appendix, Fig. S2A). For the eight Neurospora genes, codon optimization increased protein levels up to 25-fold. For the two heterologous genes, luciferase and I-SceI proteins were barely detectable when encoded by wild-type codons; levels were 70- to more than 100-fold higher for the codon-optimized versions.

Comparison of mRNA levels in these strains showed that codon optimization also markedly increased the corresponding mRNA levels for each of the 10 genes; the fold increases in mRNA levels were comparable to the fold increases in protein levels (Fig. 2B and SI Appendix, Fig. S2B). For the luc and I-SceI genes (under the control of the ccg-1 promoter), the mRNA levels of the codon-optimized genes were more than 70-fold higher than those of the wild-type genes.

To determine whether the codon effect depended on the promoter used to drive the transgene, we used a construct (Pfrq-luc) in which the luc gene is under the control of the frequency (frq) promoter (a weak promoter in Neurospora) at the his-3 locus. Codon optimization again increased both LUC protein and luc mRNA levels more than 100-fold (Fig. 2C). A side-by-side comparison of luc mRNA levels in the Pfrq-luc and Pccg-1-luc strains showed that codon optimization had a much stronger effect on mRNA levels than changing the promoter did (Fig. 2D). Moreover, when the gene encoding the septal pore-associated protein (SPA) 16 (spa16) was placed under the control of the vvd promoter, which is activated only after light induction (31), codon optimization led to more than 40-fold higher protein and mRNA levels after light treatment than in the version with wild-type codon usage (Fig. 2E). These results indicate that the effect of codon optimization on gene expression is independent of the promoter used.

Furthermore, similar effects of codon optimization on mRNA levels were seen when the Pccg-1-luc transgene was targeted to the csr-1 locus (SI Appendix, Fig. S2C), indicating that the effect of codon usage is independent of genomic locus. Together, these results suggest that gene codon usage has an important role in determining gene expression levels in Neurospora, an effect that is largely due to changes in transcript levels.

Because Neurospora codon usage is biased toward C at wobble positions, codon optimization increases GC content. It is therefore possible that GC content, rather than codon usage, is responsible for the increased gene expression. To rule out this possibility, we created suboptimal luc and I-SceI genes in which some of the most preferred codons with C at the wobble position were replaced by less preferred synonymous codons with G. As a result, the GC contents of these suboptimal genes are the same as those of the fully optimized genes, but their codon usage scores are reduced. As shown in Fig. 2 F and G, both protein and mRNA levels of these suboptimal genes are much lower than those of the fully optimized genes. These results suggest that codon usage, and not merely GC content, is important for the expression levels of these genes.
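The design logic of the suboptimal constructs is that swapping C-ending codons for G-ending synonyms keeps GC content fixed while lowering codon optimality. A short sketch can check this. The swap table covers fourfold-degenerate codon boxes, where any third-position base encodes the same amino acid. Treating the C-ending codons as "preferred" and swapping every second eligible codon are hypothetical choices for illustration, not the actual luc or I-SceI designs.

```python
# C-ending -> G-ending synonyms within fourfold-degenerate codon boxes
# (Leu, Val, Ala, Gly, Pro, Thr, Ser, Arg): the encoded amino acid is unchanged.
C_TO_G = {
    "CTC": "CTG", "GTC": "GTG", "GCC": "GCG", "GGC": "GGG",
    "CCC": "CCG", "ACC": "ACG", "TCC": "TCG", "CGC": "CGG",
}
# Hypothetical: treat the C-ending codons as the "most preferred" ones.
PREFERRED = set(C_TO_G)


def suboptimize(codons, step=2):
    """Swap every `step`-th eligible C-ending codon for its G-ending synonym."""
    out, eligible_seen = [], 0
    for codon in codons:
        if codon in C_TO_G:
            if eligible_seen % step == 0:
                codon = C_TO_G[codon]
            eligible_seen += 1
        out.append(codon)
    return out


def gc_content(codons):
    """GC fraction of the concatenated codons."""
    seq = "".join(codons)
    return sum(base in "GC" for base in seq) / len(seq)


def preferred_fraction(codons):
    """Fraction of codons that belong to the (hypothetical) preferred set."""
    return sum(codon in PREFERRED for codon in codons) / len(codons)


opt = ["CTC", "GTC", "GCC", "GGC", "AAG"]
sub = suboptimize(opt)
print(sub)
print(gc_content(opt) == gc_content(sub))
print(preferred_fraction(opt), preferred_fraction(sub))
```

In this toy case, two of the four C-ending codons become G-ending synonyms. The GC content stays identical, the first two bases of every codon are unchanged, and the fraction of "preferred" codons halves. This mirrors the comparison made with the suboptimal luc and I-SceI genes.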

Codon Usage Does Not Consistently Influence mRNA Stability. Nonoptimal codons in an mRNA have been suggested to destabilize the mRNA during translation in yeast and, more recently, in zebrafish (23, 32, 33). To determine whether the effect of codon usage in Neurospora is due to effects on mRNA stability, we compared the mRNA decay rates of seven wild-type and codon-optimized

[Fig. 2 bar graphs and Western blots not reproduced. Panels: (A) relative protein levels of LUC, I-SceI, PKAC-1, and SPA16 in wt vs. opt strains, with membrane staining (mem) as loading control; (B) relative mRNA levels of luc, I-SceI, pkac-1, and spa16; (C) Pfrq-luc at his-3; (D) luc mRNA under Pfrq vs. Pccg-1; (E) Pvvd-spa16 at his-3; (F and G) opt vs. sub-opt luc and I-SceI.]

Fig. 2. Codon optimization results in increased levels of both protein and mRNA in Neurospora. (A) Western blot assays showing the effect of codon optimization on protein expression levels in the indicated Neurospora strains. An anti-luciferase antibody was used to detect LUC, and an anti-Myc antibody was used for the other Myc-fusion proteins. (A, Bottom) A representative Western blot showing protein levels of wt or opt strains. (A, Top) Densitometric analyses of three independent samples. The luc, I-SceI, and spa16 genes are driven by the ccg-1 promoter, whereas pkac-1 is under the control of the qa-2 promoter. The stained membrane served as a loading control. (B) Quantitative RT-PCR results showing the relative mRNA levels of the indicated wt or opt luc, I-SceI, pkac-1, and spa16 strains. (C) The relative protein (Left) and mRNA (Right) levels of wt or opt luc under the control of the frq promoter at the his-3 locus. (D) The mRNA levels of wt or opt luc under the control of the frq promoter or the ccg-1 promoter at the his-3 locus. (E) The relative protein (Left) and mRNA (Right) levels of wt or opt spa16 under the control of the vvd promoter at the his-3 locus. The tissues were first cultured in constant darkness for 24 h, then transferred to light for 1 h and harvested. (F) The relative protein (Left) and mRNA (Right) levels of the opt or suboptimized (subopt) luc gene. (G) The relative protein (Left) and mRNA (Right) levels of the opt or subopt I-SceI gene. Error bars in all graphs are SDs of the means (n = 3). **P < 0.01; ***P < 0.001.


Neurospora and heterologous genes after the addition of thiolutin, a transcription inhibitor in Neurospora (34). For five of these genes, codon optimization did not significantly change mRNA decay rates (Fig. 3). The codon-optimized luc mRNA was more stable than the wild-type luc mRNA; however, this modest change in mRNA stability cannot explain the more than 100-fold difference in mRNA levels observed (Fig. 2B). In contrast, the codon-optimized NCU02621 mRNA was less stable than the wild-type mRNA. Therefore, unlike in yeast, codon usage does not appear to have a major or universal effect on mRNA stability in Neurospora.
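The argument that a modest stability change cannot account for a >100-fold difference in mRNA levels can be made quantitative under a simple model. This sketch makes the following assumptions:

- Decay is first order once thiolutin fully blocks transcription.
- The decay constant k is estimated from a least-squares fit of ln(level) against time.
- Steady-state mRNA equals k_syn / k_deg. With unchanged synthesis, a stability change alone therefore shifts steady-state levels by k_deg(wt) / k_deg(opt).

The time points mirror Fig. 3 (0 to 90 min), but the mRNA levels are invented toy values, not the measured data.

```python
import math


def decay_constant(times, levels):
    """First-order decay constant k (per min) from a least-squares fit of ln(level) vs. time."""
    logs = [math.log(y) for y in levels]
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
    return -sxy / sxx  # the slope of ln(level) is -k


def half_life(k):
    """Half-life (min) for first-order decay with constant k."""
    return math.log(2) / k


def steady_state_fold_from_stability(k_wt, k_opt):
    """Fold change in steady-state mRNA if only decay changes (level = k_syn / k_deg)."""
    return k_wt / k_opt


times = [0, 30, 60, 90]                  # minutes after thiolutin addition
wt = [1.0, 0.5, 0.25, 0.125]             # toy data: 30-min half-life
opt = [1.0, 2 ** -0.5, 0.5, 2 ** -1.5]   # toy data: 60-min half-life

k_wt, k_opt = decay_constant(times, wt), decay_constant(times, opt)
print(round(half_life(k_wt)), round(half_life(k_opt)))
print(round(steady_state_fold_from_stability(k_wt, k_opt), 2))
```

In this model, even doubling the mRNA half-life raises steady-state levels only about 2-fold. That is far short of the >100-fold differences in Fig. 2, so a difference of that size points to a change in synthesis rather than decay.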

The Effect of Codon Usage Does Not Require Translation and Is Regulated at the Level of Transcription. We have recently shown that codon usage influences the rate of translation elongation (12). Thus, the effects of codon usage on gene expression could be mediated by its role in mRNA translation. Three separate lines of evidence, however, indicate that the codon effect on RNA levels is not due to its role in translation. First, treating Neurospora cultures with the protein synthesis inhibitor cycloheximide (CHX) to block translation did not change the effect of codon optimization on luc or I-SceI mRNAs (Fig. 4A). Second, we created strains in which a stop codon was placed at the 14th amino acid position of the luc and I-SceI transgenes. This terminates translation for the rest of the mRNA and should thus eliminate any translation-mediated codon effect. Although the introduction of the stop codon completely abolished production of both wild-type and codon-optimized LUC and I-SceI proteins (SI Appendix, Fig. S3 A and B), it did not reverse the dramatic effect of codon optimization on luc and I-SceI mRNA levels, as measured by both quantitative RT-PCR (qRT-PCR) and Northern blot analysis (Fig. 4B and SI Appendix, Fig. S3 C and D).

Third, we introduced a stable stem loop, which has been shown to block translation initiation (35), into the 5′ UTR of the I-SceI transgene. Although the stem loop completely abolished production of I-SceI protein (Fig. 4C) and did not significantly affect RNA stability (SI Appendix, Fig. S3E), it did not change the dramatic effect of codon optimization on I-SceI mRNA (Fig. 4D). Together, these results demonstrate that the effects of codon usage on mRNA expression do not depend on translation.

The independence of the codon effect from translation prompted us to test whether codon usage affects transcription. We determined the levels of nuclear RNA, which reflect gene transcription better than total RNA does. As expected, frq pre-mRNA was enriched in nuclear RNA compared with total RNA (SI Appendix, Fig. S3F). Codon optimization resulted in a dramatic increase in nuclear transcript levels of both luc and I-SceI (Fig. 4E), suggesting that the effect of codon usage on RNA levels is mainly due to an effect on transcription.

To further confirm this conclusion, we introduced an intron (from the pkac-1 gene) into the 5′ UTR of the vvd promoter-driven spa16 gene so that the level of the intron-containing pre-mRNA could be determined. As shown in SI Appendix, Fig. S3G, the intron was efficiently spliced in total RNA. As shown in Fig. 4F, the levels of both mRNA and pre-mRNA of the optimized spa16 were ∼10-fold higher than those of wild-type spa16. Together, these data indicate that codon optimization results in increased transcription.
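This section reports relative RNA levels from qRT-PCR without stating how they were normalized. One common approach is the 2^−ΔΔCt method. It assumes near-100% amplification efficiency and a stable reference gene. The sketch below applies it to hypothetical Ct values to compare mature mRNA with intron-containing pre-mRNA.

The comparison matters for the argument. If codon optimization acted only after splicing (for example, on mRNA decay), mature mRNA would rise while pre-mRNA would not. Similar fold changes for both, as reported for spa16, are what increased transcription predicts. The Ct values and the reference gene are illustrative assumptions, not data from the paper.

```python
def fold_change_ddct(ct_target_opt, ct_ref_opt, ct_target_wt, ct_ref_wt):
    """2^-ddCt fold change of the opt strain over wt, normalized to a reference gene.

    Assumes ~100% PCR efficiency (one doubling per cycle) for target and reference.
    """
    d_ct_opt = ct_target_opt - ct_ref_opt
    d_ct_wt = ct_target_wt - ct_ref_wt
    return 2 ** -(d_ct_opt - d_ct_wt)


# Hypothetical Ct values for a reference gene and for the target transcripts.
ref = {"wt": 18.0, "opt": 18.0}
mrna = {"wt": 25.0, "opt": 21.7}
pre_mrna = {"wt": 30.0, "opt": 26.7}

mrna_fold = fold_change_ddct(mrna["opt"], ref["opt"], mrna["wt"], ref["wt"])
pre_fold = fold_change_ddct(pre_mrna["opt"], ref["opt"], pre_mrna["wt"], ref["wt"])
print(f"mRNA {mrna_fold:.1f}-fold, pre-mRNA {pre_fold:.1f}-fold")
```

If amplification efficiencies differ between primer pairs, an efficiency-corrected model would be more appropriate than this simple form.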

Codon Optimality Promotes Enrichment of Active RNA Polymerase II. To determine how transcription is affected, we created strains containing a frq promoter-driven luc gene (Fig. 5A, Top; the middle gray section of the luc ORF uses either wild-type or optimized codons). Under constant light, transcription initiation from the frq promoter is mainly triggered by binding of the WC complex to the pLRE box (36). Chromatin immunoprecipitation (ChIP) assays using a WC-2 antibody showed that codon optimization of luc significantly increased WC-2 binding at the pLRE locus (Fig. 5A). Because histone density has been shown to affect recruitment of the WC complex to the frq promoter (37), we asked whether histone density was altered by codon optimization. Indeed, ChIP assays showed that histone H3 enrichment on the opt-luc gene was significantly lower than on wt-luc, especially around the TSS region (Fig. 5B).

Nucleosome density is known to affect RNA polymerase II (Pol II)-mediated transcription (38, 39). To confirm the effect of codon usage on transcription, we performed ChIP assays using antibodies against the nonphosphorylated and Ser-2-phosphorylated (S2P) Pol II C-terminal domain (CTD) to compare their enrichment on wild-type and codon-optimized genes. As shown in SI Appendix, Fig. S4, ChIP with these and other antibodies produced enrichment profiles at an endogenous gene locus similar to those reported in other organisms (38). As shown in Fig. 5 C and D and SI Appendix, Fig. S5 A and B, codon optimization resulted in significant enrichment of Pol II CTD and S2P-CTD not only within the codon-optimized ORF region but also in adjacent regions of luc, I-SceI, and two Neurospora genes (spa16 and NCU02621).

We also performed next-generation sequencing of ChIP samples for both Ser-2-phosphorylated and nonphosphorylated Pol II and determined the relative enrichment of phosphorylated Pol II for

wtopt

** ** **wtopt

wtopt

wtopt

wtopt

wtopt

spa16

Addition of thiolutin (min)

pkac-1I-SceIluc

Rel

ativ

e m

RN

A le

vels

Rel

ativ

e m

RN

A le

vels NCU05881NCU05196NCU02621

0

0.5

1.0

1.5

0 30 60 900

0.5

1.0

1.5

0 30 60 900

0.5

1.0

1.5

0 30 60 900

0.5

1.0

1.5

0 30 60 90

0

0.5

1.0

1.5

0 30 60 900

0.5

1.0

1.5

0 30 60 900

0.5

1.0

1.5

0 30 60 90

Addition of thiolutin (min) Addition of thiolutin (min) Addition of thiolutin (min)

Addition of thiolutin (min) Addition of thiolutin (min) Addition of thiolutin (min)

** **wtopt

Fig. 3. Stability of mRNA is not consistently altered by codon optimization. The decay of the indicated mRNAs in the wt and opt strains is shown at theindicated time points after addition of thiolutin, a potent RNA synthesis inhibitor. The densitometric analyses of the results from three independent ex-periments are shown. **P < 0.01.

E6120 | www.pnas.org/cgi/doi/10.1073/pnas.1606724113 Zhou et al.

Dow

nloa

ded

by g

uest

on

Janu

ary

23, 2

020

Page 5: Codon usage is an important determinant of gene expression … · Codon usage is an important determinant of gene expression levels largely through its effects on transcription Zhipeng

all annotated Neurospora genes. As shown in SI Appendix, Fig. S5C and D, the enrichment of Pol II positively correlates withmRNA levels genome-wide. Importantly, CBI values also posi-tively correlate with enrichment of S2 phosphorylated (Fig. 5E)and nonphosphorylated (SI Appendix, Fig. S5E) CTD genome-wide, indicating that genes with more optimal codons are asso-ciated with higher levels of transcription. These results suggestthat gene codon usage is an important determinant of genetranscription levels in Neurospora.To determine whether the transcriptional effect by codon us-

age is due to changes in DNA sequences that may influencetranscription efficiency or elongation (40), we compared thetranscription efficiency of wild-type or optimized luc or I-SceIgenes in a well-established Neurospora in vitro transcription systemusing linearized DNA as templates (41). Surprisingly, codon

optimization of these genes had no effect on transcript abun-dance in this system (Fig. 5F). Together, these results indicatethat the effect of codon usage on transcription does not dependon DNA sequences per se.

H3K9me3 Is Responsible for the Codon Usage-Mediated Transcriptional Suppression of a Subset of Genes. The fact that codon usage did not affect transcription efficiency in the in vitro transcription system raised the possibility that it may influence chromatin structure in vivo. After treating Neurospora cultures with trichostatin A (TSA), an inhibitor of the class I and II histone deacetylase (HDAC) families but not of class III HDACs, we found that the effects of codon optimization on luciferase mRNA and protein were mostly abolished (SI Appendix, Fig. S6 A and B). TSA treatment resulted in a dramatic up-regulation of luciferase protein expression in the wild-type luc strain but had little influence on protein expression in the optimized luc strain. However, TSA did not affect the expression of two endogenous genes (SI Appendix, Fig. S5 C and D). Because TSA was previously shown to cause the loss of DNA methylation, a process that requires the heterochromatin mark histone H3 lysine 9 trimethylation (H3K9me3) in Neurospora (42–44), we used ChIP assays to examine whether the luc transgene at the his-3 locus is associated with H3K9me3 in the wild-type and optimized luc strains (Fig. 6A and SI Appendix, Figs. S6 E and F and S7A). Only background signals were detected at the luc locus in the optimized luc strain, but high levels of H3K9me3, similar to those of known heterochromatin regions (SI Appendix, Fig. S6E), were seen in the strain carrying luc with wild-type codons. H3K9me3 was not limited to the wild-type luc gene region; it was also found in the promoter and at the 3′ end of the luc gene.

To determine the role of H3K9me3 in regulating luc expression, we then introduced the wild-type and optimized constructs into the dim-5KO strain, in which H3K9me3 is completely abolished (43, 45). Remarkably, in the dim-5KO strain, the effects of codon usage on luciferase protein and RNA were almost completely abolished (Fig. 6B), demonstrating that H3K9me3 is responsible for the codon usage effect on luc gene expression.

H3K9me3 was also detected at the I-SceI locus in the strain containing the wild-type I-SceI transgene (Fig. 6C and SI Appendix, Fig. S6G). As expected, the effect of codon usage on I-SceI was significantly reduced in the dim-5KO strain (Fig. 6D). However, codon optimization still resulted in more than 20-fold up-regulation of I-SceI mRNA and protein. These results indicate that, in addition to H3K9me3, other unidentified mechanism(s) also contribute to the codon usage effect on I-SceI transcription. In Neurospora, there are two known types of H3K9me3 loci. Most H3K9me3 sites are within transposon relics of repeat-induced point mutation (RIP) loci (44, 46, 47). In addition, convergent transcription can also trigger H3K9me3 and DNA methylation at certain loci (48). Neither the wild-type luc nor the I-SceI transgene locus resembles a typical RIP’d locus (SI Appendix, Fig. S7 A–C) (49), and neither has convergent transcription. Although the wild-type luc and I-SceI sequences have modestly lower GC content than the optimized sequences, other regions with similar GC content around the transgene locus have no detectable H3K9me3 (SI Appendix, Fig. S7A). This observation is consistent with the bioinformatic results showing that codon usage, but not GC content, tightly correlates with gene expression levels (Fig. 1). Therefore, an additional mechanism is responsible for the establishment of H3K9me3 triggered by nonoptimal codon usage.

[Fig. 4 panels: (A) relative luc and I-SceI mRNA levels, control vs. CHX; (B) relative luc and I-SceI mRNA levels, control vs. stop codon; (C) I-SceI constructs with or without a 5′ UTR stem loop, with anti-Myc Western blot; (D) relative I-SceI mRNA levels, control vs. stem loop; (E) relative nuclear luc and I-SceI RNA levels; (F) Pvvd-intron-spa16 construct with relative spa16 mRNA and pre-mRNA levels; wt vs. opt throughout.]

Fig. 4. The codon effects on gene expression do not require translation. (A) qRT-PCR experiments showing the relative levels of the indicated mRNAs in the wt and opt strains. CHX was added to block protein synthesis. (B) qRT-PCR experiments showing the relative levels of the indicated mRNAs in the wt and opt strains with or without a stop codon inserted at the 14th amino acid. (C, Top) Diagram depicting the I-SceI (wt or opt) constructs with or without a stem loop in the 5′ UTR. (C, Bottom) Western blot results using anti-Myc antibody showing that introduction of the stem loop abolished the production of full-length I-SceI protein. (D) qRT-PCR results showing the relative wt and opt I-SceI mRNA levels with or without the stem loop in the 5′ UTR. (E) qRT-PCR results showing the relative levels of nuclear RNA in the indicated wt and opt luc and I-SceI strains. (F, Top) Diagram of the constructs in which an intron was introduced into the 5′ UTR of the wt or opt spa16. (F, Bottom) Relative mRNA (Left) and pre-mRNA (Right) levels of the wt and opt spa16 detected by qRT-PCR and strand-specific qRT-PCR, respectively. The tissues were cultured under constant light for 24 h before harvest. Error bars show the SDs of the means (n = 3). **P < 0.01; ***P < 0.001.

Zhou et al. PNAS | Published online September 26, 2016 | E6121

GENETICS PNAS PLUS

Discussion

By codon manipulation in vivo and by examining the genome-wide correlations between codon usage and protein and RNA expression levels, we showed that codon usage is a major determinant of protein expression levels in Neurospora through its effects on mRNA levels. Surprisingly, such effects are mostly due to changes in transcription. Furthermore, we identified the chromatin modification H3K9me3 as one of the mechanisms that contribute to the effect of codon usage on transcription.

It was recently shown that codon usage is a major determinant of RNA stability in budding yeast through its effect on translation (23). Both that study and ours demonstrate genome-wide effects of codon usage on mRNA levels. Therefore, in addition to the “codes” embedded in protein elongation rates, codon usage biases may represent another “code” within ORFs that determines transcript levels by affecting mRNA stability (budding yeast) or transcription efficiency (Neurospora). Codon usage is thus part of the transcriptional and posttranscriptional mechanisms that control the expression levels of individual genes. Unlike in S. cerevisiae, however, codon usage in Neurospora does not have consistent impacts on mRNA stability, and its effect does not appear to require translation. This difference may be partly due to the almost opposite codon usage preferences of the two organisms: S. cerevisiae prefers A or T at the wobble position, whereas Neurospora strongly prefers C or G.

Codon usage does not have significant effects on mRNA stability for most tested Neurospora genes. Consistent with a transcriptional effect of codon usage, it was previously shown that mammalian genes with high GC content, which reflects the use of more preferred codons, had higher expression levels than those with lower GC content, and that this difference was not a result of differences in mRNA degradation rates (50, 51). More recently, codon usage was shown to contribute to the balanced expression of Toll-like receptors in mammals by affecting transcription rather than translation (52).

Our results in Neurospora suggest that the codon usage of an individual gene results from coevolution of coding region sequences with the transcription and translation machineries. The effect of codon usage on translation elongation and efficiency selected codons that are optimized for accurate and efficient translation and that enhance cotranslational protein folding. However, the demand for an optimal amount of each protein selected certain codons that are optimized either for activating or suppressing transcription or for proper mRNA stability. As a result, codon usage is adapted to both translation and transcription processes; codon information is also read by the transcription machinery in the form of DNA elements, which are used to suppress or activate transcription. Although most known transcriptional regulatory elements reside in promoter regions, our results demonstrate that coding sequences can also play a major role in transcriptional regulation. Consistent with this conclusion, a significant portion of transcription factor recognition sites was shown to reside within plant and human exonic regions, suggesting that the adaptation of coding region sequences to transcription factor binding is an important evolutionary force that drives codon usage biases (53, 54).

Our results also suggest that codon usage impacts chromatin modifications and that this mechanism is primarily responsible for the codon usage effects we observed on transcription in Neurospora. H3K9me3 is one of the mechanisms that suppress transcription of some endogenous genes with poor codon usage. How poor codon usage results in H3K9me3 is unclear. In Neurospora, most known H3K9me3 sites are within transposon relics of repeat-induced point mutation loci (44, 46, 47). It was proposed that these RIP’d sequences recruit chromatin-modifying enzymes, resulting in de novo H3K9 trimethylation. Sequence analyses of the wild-type luc and I-SceI genes showed that they are different from typical RIP’d sequences (SI Appendix, Fig. S7). In addition, nearby sequences with similar GC content do not acquire H3K9me3. Therefore, it is likely that different mechanisms are involved in H3K9me3 establishment at these gene loci and at the RIP’d loci. Although H3K9me3 is almost completely responsible for the codon usage effects on expression from the luciferase gene, it only partially contributes to the codon usage effect on I-SceI and other genes and has no impact on some (Fig. 6 and SI Appendix, Fig. S6). Therefore, multiple mechanisms mediated by DNA elements specified by codon sequences regulate transcription levels.

[Fig. 5 panels: (A) Pfrq-luc construct (C-box, pLRE) and WC-2 ChIP (input %); (B) H3 ChIP (input %) at regions 1–4 of luc; (C and D) S2P and CTD ChIP (input %) at the indicated regions of the luc and Pvvd-spa16 transgenes; (E) relative S2P levels vs. CBI (r = 0.36); (F) in vitro transcription of wt and opt luc and I-SceI templates (RT and no-RT).]

Fig. 5. Codon optimization resulted in enrichment of active Pol II. (A) WC-2 ChIP assay results showing the relative enrichment of the transcription activator WC-2 at the frq promoter for the wt and opt luc genes expressed in frq knockout strains. The ChIP results were normalized to input DNA and presented as input %. (B) ChIP assay results showing the levels of histone H3 on different regions of the wt and opt luc genes. (C and D) ChIP assays showing the relative enrichment of Ser-2-phosphorylated (S2P) and nonphosphorylated Pol II CTD on the indicated regions. The top of the figure shows the indicated transgene at the his-3 locus. Within each indicated ORF, the gray regions are encoded by optimized or wild-type codons in the opt or wt strains, respectively. The blank regions in luc were codon optimized. (E) Scatter plot showing the correlation of S2P levels with CBI. Pearson’s r = 0.36, P < 2.2 × 10⁻¹⁶, n = 4,013. (F) qRT-PCR results showing the relative RNA levels after in vitro transcription assays using linearized wt and opt luc and I-SceI plasmids as templates. Error bars show the SDs of the means (n = 3).

[Fig. 6 panels: (A) H3K9me3 ChIP at regions 1–4 of the Pfrq-luc transgene; (B) relative LUC protein and luc mRNA levels in wt and dim-5KO backgrounds; (C) H3K9me3 ChIP at regions 1–3 of the Pccg-1-I-SceI transgene; (D) relative I-SceI protein and mRNA levels in wt and dim-5KO backgrounds; wt vs. opt throughout.]

Fig. 6. H3K9me3 is responsible for the codon effects on luc and I-SceI transcription. (A) H3K9me3 ChIP assays using anti-H3K9me3 antibody (Active Motif, 39161) showing the relative H3K9me3 levels at the wt and opt luc transgene locus (luc driven by the frq promoter at the his-3 locus in the frqKO strain). H3K9me3 enrichment was normalized to tubulin and represented as relative H3K9me3 levels. (B) Comparison of the relative LUC protein and RNA levels between the wt and dim-5KO strains. (C) ChIP assays showing the relative H3K9me3 levels on the wt and opt I-SceI genes driven by the ccg-1 promoter and expressed from the his-3 locus. (D) Comparison of the relative I-SceI protein and RNA levels between the wt and dim-5KO strains. Error bars show the SDs of the means (n = 3). *P < 0.05; **P < 0.01; ***P < 0.001.

Materials and Methods

Strains and Culture Conditions. In this study, FGSC 4200 (a) was used as the wild-type strain for the proteomic, RNA-seq, and ChIP-seq analyses. The 301–15 (bd, his-3, a), 303–3 (bd, frq10, his-3) (55), pkac-1KO (bd, his-3) (56), and dim-5KO (bd, his-3) (57) strains were the host strains for his-3 targeting constructs. A bd ku70RIP strain was used for the csr-1 targeting transformation (58).

Culture conditions have been described (59). Neurospora mats were cut into discs and transferred to flasks with minimal medium [1× Vogel’s, 2% (wt/wt) glucose]. After 24 h, the tissues were harvested. To induce the expression of pkac-1, liquid cultures were grown in 10⁻⁵ M quinic acid, pH 5.8, 1× Vogel’s, 0.1% glucose, and 0.17% arginine. To induce the expression of spa16, discs were either cultured in constant darkness for 24 h and then transferred to light for 1 h before harvest (experiment in Fig. 2E) or cultured in constant light for 24 h before harvest (experiments in Figs. 4F and 5D and SI Appendix, Fig. S3G). For TSA treatment, 5 × 10⁶ fresh conidia were directly inoculated into minimal medium with or without 2 μg/mL TSA (42). The tissues were harvested after 24 h, and protein and RNA analyses were performed as described below.

Codon Optimization, Plasmid Constructs, and Neurospora Transformation. Codon optimization was performed as described (13). Codons were optimized based on the N. crassa codon usage frequency: each codon in the optimized region was changed to the most preferred synonymous codon without changing the amino acid sequence. For the optimized luciferase gene, all codons (550 codons) were the most preferred codons (12). The middle region of the optimized luc gene (nucleotides 670–1292) was replaced with the original firefly codons, and this construct was used as the wild-type luc in this study. The optimized gene regions are as follows: I-SceI, nucleotides 6–678 (of 678 nt in ORF); pkac-1, nucleotides 226–954 (of 1,787 nt in ORF); spa16, nucleotides 31–1794 (of 1,797 nt in ORF); NCU02621, nucleotides 31–756 and 856–1941 (of 2,127 nt in ORF); NCU03855, nucleotides 742–1509 (of 1,920 nt in ORF); NCU05196, nucleotides 34–564 and 1267–1569 (of 1,593 nt in ORF); NCU05881, nucleotides 31–465, 520–603, 814–1008, and 1219–2103 (of 2,103 nt in ORF); spa1, nucleotides 31–858 and 1087–1272 (of 1,707 nt in ORF); spa8, nucleotides 34–1788 (of 1,794 nt in ORF).
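The codon replacement rule described above can be sketched in Python. This is a minimal illustration, not the authors' pipeline: the helper names (`preferred_codons`, `optimize_region`, `translate`) and the toy frequency table `TOY_USAGE` are hypothetical, the standard genetic code is assumed, stop codons are left unchanged, and any codon overlapping the stated 1-based nucleotide range is treated as part of the optimized region. A real run would substitute the N. crassa codon usage table.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order ('*' marks stops).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical toy frequencies (per 1,000 codons); NOT real N. crassa values.
TOY_USAGE = {"TTA": 2.0, "CTC": 30.0, "CTG": 40.0,
             "AAA": 10.0, "AAG": 35.0, "GCC": 40.0, "GCA": 5.0}


def translate(orf):
    """Translate an in-frame ORF with the standard genetic code."""
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf), 3))


def preferred_codons(usage):
    """Return {amino acid: most frequent synonymous codon}; missing codons count as 0."""
    best = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in best or usage.get(codon, 0.0) > usage.get(best[aa], 0.0):
            best[aa] = codon
    return best


def optimize_region(orf, start, end, usage):
    """Swap every sense codon overlapping nucleotides start..end (1-based,
    inclusive) for its most preferred synonym, leaving the protein unchanged."""
    orf = orf.upper()
    if len(orf) % 3:
        raise ValueError("ORF length must be a multiple of 3")
    best = preferred_codons(usage)
    out = []
    for i in range(0, len(orf), 3):
        codon = orf[i:i + 3]
        aa = CODON_TABLE[codon]
        overlaps = i + 3 >= start and i + 1 <= end  # codon spans i+1..i+3
        out.append(best[aa] if overlaps and aa != "*" else codon)
    optimized = "".join(out)
    assert translate(optimized) == translate(orf)  # synonymous changes only
    return optimized


print(optimize_region("ATGTTAAAAGCATAA", 4, 12, TOY_USAGE))  # ATGCTGAAGGCCTAA
```

Because every replacement is synonymous, the final assertion inside `optimize_region` doubles as a check that the amino acid sequence is preserved, which is the constraint stated in the Methods.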

The pMF272.LUC-M-wt and pMF272.LUC-opt constructs, in which the luc gene was driven by the ccg-1 promoter with a his-3 targeting sequence, were generated previously (12). PCR fragments containing the ccg-1 promoter and the wild-type or optimized luc ORF were inserted into pCSR1 (58) between the NotI and EcoRI sites to generate the pCSR1.LUC-M-wt and pCSR1.LUC-opt constructs. The frq promoter was amplified and inserted into pBM61 (60) by using the NotI and XbaI sites to generate the pBM61.frq construct. The ORF of the wild-type or optimized luc was inserted into pBM61.frq between the XbaI and SmaI sites to generate the pBM61.frq.LUC-M-wt and pBM61.frq.LUC-opt constructs. The suboptimal luc gene was synthesized by Genscript and inserted into pBM61.frq to create the pBM61.frq.LUC-subopt construct. The construct pqa-5Myc-6His-PKAC-1 was generated previously (56). The optimized region of pkac-1 was synthesized (Genscript) and used to replace the corresponding region of pqa-5Myc-6His-PKAC-1 by using a homologous recombination-based cloning method (In-Fusion HD cloning kit; Clontech) to generate pqa-5Myc-6His-PKAC-1-opt. To create pMF272-Myc, a DNA fragment encoding five copies of the c-Myc peptide tag was added at the 3′ end of the GFP sequence in the plasmid pMF272 (61), which contains the ccg-1 promoter and results in a GFP tag at the C-terminal end of the protein of interest. The pqa-5Myc-6His-I-SceI-wt and pqa-5Myc-6His-I-SceI-opt constructs were generated previously (62). PCR fragments containing the I-SceI-wt or I-SceI-opt ORF were inserted into pMF272-Myc between the XbaI and XmaI sites to generate the pMF272-Myc-I-SceI-wt and pMF272-Myc-I-SceI-opt constructs. The suboptimal I-SceI gene was synthesized by Genscript and inserted into pMF272-Myc to create the pMF272-Myc-I-SceI-subopt construct. The cDNAs for NCU02621, NCU03855, NCU05196, NCU05881, spa1, spa8, and spa16 were obtained by RT-PCR and inserted into the pMF272-Myc vector.
Part or all of the wild-type ORFs of each of these seven genes were replaced by synthesized fragments containing optimized codons (Genscript) using appropriate restriction sites. The vvd promoter was amplified and inserted into pBM61 (60) by using the NotI and XbaI sites to generate the pBM61.vvd construct. The ORF of the wild-type or optimized spa16 was inserted into pBM61.vvd between the SpeI and EcoRI sites to generate the pBM61.vvd.spa16-wt and pBM61.vvd.spa16-opt constructs. The second intron of the pkac-1 ORF was amplified and inserted into the 5′ UTR, downstream of the vvd promoter, of the pBM61.vvd.spa16-wt and pBM61.vvd.spa16-opt constructs by using the In-Fusion HD cloning kit (Clontech). The pMF272.LUC-M-wt-stop, pMF272.LUC-opt-stop, pMF272-Myc-I-SceI-wt-stop, and pMF272-Myc-I-SceI-opt-stop constructs were generated by site-directed mutagenesis. To generate the pMF272-Myc-I-SceI-wt-stem loop and pMF272-Myc-I-SceI-opt-stem loop constructs, the stem loop was inserted into the 5′ UTR, downstream of the ccg-1 promoter, as described (35). The resulting constructs were transformed into the host strains by electroporation as described (58, 63). Homokaryotic transformants were obtained by microconidia purification and confirmed by quantitative PCR or Southern blot analysis. The strains used in this study are listed in SI Appendix, Table S2.

Protein and Proteomic Analyses. Tissue harvest, protein extraction, and Western blot analysis were performed as described (64–66). For Western blot analyses, equal amounts of total protein (50 μg) were loaded in each lane. After electrophoresis, proteins were transferred onto PVDF membranes, and Western blot analysis was performed. Anti-luciferase antibody (L2164, Sigma) was used to detect LUC, and anti-Myc antibody (M4439, Sigma) was used to detect Myc-fusion proteins in this study.

For proteomic analyses, mats of the wild-type strain FGSC 4200 were cut into discs and cultured for 2 d in minimal medium at room temperature. Proteins were extracted by using RIPA buffer (50 mM Tris·HCl, pH 8, 150 mM NaCl, 1% Nonidet P-40, 0.5% sodium deoxycholate, and 0.1% SDS) and precipitated by the addition of trichloroacetic acid/acetone. The pellet was dissolved in urea lysis buffer (8 M urea, 75 mM NaCl, 50 mM Tris, pH 8.2) and digested by using sequencing-grade modified trypsin (Promega) at 37 °C overnight. Tryptic peptides were fractionated by strong cation exchange chromatography. Each fraction was subjected to LC-MS/MS analysis (13), and emPAI values were calculated for each protein (28). A detailed mass spectrometry protocol for Neurospora is available upon request from S.C.
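For reference, emPAI as defined by Ishihama and colleagues is 10^PAI − 1, where PAI is the number of observed peptides divided by the number of observable peptides for a protein; emPAI is approximately proportional to protein abundance, and dividing each value by the sum gives an approximate molar percentage. The sketch below assumes the peptide counts are already available (enumerating observable tryptic peptides is not shown), and the function names and example counts are illustrative only.

```python
def empai(n_observed, n_observable):
    """Exponentially modified protein abundance index: 10**(observed/observable) - 1."""
    if n_observable <= 0:
        raise ValueError("n_observable must be positive")
    return 10 ** (n_observed / n_observable) - 1


def molar_percent(empai_by_protein):
    """Approximate mol% per protein: emPAI / sum(emPAI) * 100."""
    total = sum(empai_by_protein.values())
    return {name: 100.0 * value / total for name, value in empai_by_protein.items()}


# Hypothetical peptide counts for two proteins.
values = {"protein_a": empai(10, 10), "protein_b": empai(3, 10)}
print(molar_percent(values))
```

In the genome-wide comparisons, such per-protein values would then be log10-transformed, as stated under Data Analyses.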

RNA and RNA-seq Analyses. RNA extraction, qRT-PCR, strand-specific qRT-PCR, and Northern blot analysis were performed as described (67). Total RNA was extracted by using TRIzol and then purified with 2.5 M LiCl (67). cDNA was obtained by reverse transcription using a High-Capacity cDNA Reverse Transcription Kit (ABI) in accordance with the manufacturer’s instructions and subjected to real-time PCR analysis. β-Tubulin was used as the internal control. The primer sequences are listed in SI Appendix, Table S3.
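The Methods do not state how relative levels were derived from the qRT-PCR data. A common choice with a single internal control such as β-tubulin is the 2^−ΔΔCt method, sketched below purely as an assumption: it presumes roughly 100% amplification efficiency for both primer pairs, uses the wt sample as the calibrator, and summarizes three replicates as mean and sample SD (matching the "SDs of the means (n = 3)" in the figure legends). The function names and example Ct values are hypothetical.

```python
from statistics import mean, stdev


def relative_expression(ct_target, ct_tubulin, ct_target_cal, ct_tubulin_cal):
    """2^-ddCt fold change of a sample relative to a calibrator (e.g. wt),
    normalized to the beta-tubulin internal control (assumes ~100% efficiency)."""
    delta_sample = ct_target - ct_tubulin
    delta_calibrator = ct_target_cal - ct_tubulin_cal
    return 2.0 ** -(delta_sample - delta_calibrator)


def summarize(values):
    """Mean and sample SD of replicate measurements (needs n >= 2)."""
    return mean(values), stdev(values)


# Hypothetical Ct values: opt sample vs. wt calibrator, three replicates.
folds = [
    relative_expression(22.0, 18.0, 25.0, 18.0),
    relative_expression(22.3, 18.1, 25.1, 18.0),
    relative_expression(21.8, 17.9, 24.9, 18.1),
]
print(summarize(folds))
```

If primer efficiencies differ appreciably from 2-fold per cycle, an efficiency-corrected model would be the safer choice.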

The mRNA decay assay was performed as described (34). The tissues were harvested at different time points after the addition of thiolutin (final concentration 12 μg/mL), and Northern blot analyses were performed.
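Fig. 3 reports decay as relative mRNA levels after thiolutin addition. One way to condense such densitometry values into a single decay constant, assuming first-order (exponential) decay, is to normalize to the 0-min time point and fit ln(relative level) against time by least squares; the half-life is then ln 2/k. This is a hypothetical sketch, not a procedure stated in the Methods, and the example signals are made up.

```python
import math


def relative_levels(signals):
    """Normalize densitometry signals to the first (0-min) time point."""
    return [s / signals[0] for s in signals]


def half_life(times_min, signals):
    """Half-life (min) from a least-squares fit of ln(level) vs. time,
    assuming first-order decay; returns inf if no decay is detected."""
    ys = [math.log(r) for r in relative_levels(signals)]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(ys) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, ys))
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    k = -sxy / sxx  # decay rate constant (per min)
    return math.log(2) / k if k > 0 else math.inf


# Hypothetical signals at the 0-, 30-, 60-, and 90-min time points.
print(half_life([0, 30, 60, 90], [100.0, 52.0, 26.0, 13.5]))
```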

The nascent nuclear transcripts were isolated as described (67). Briefly, nuclei were isolated, and nuclear RNA was extracted by using TRIzol. Contaminating DNA was removed by using TURBO DNase (Ambion), and the resulting RNA was used for qRT-PCR as described above.

The in vitro transcription assay was performed as described (41). Briefly, the transcription extract was prepared from freshly germinated conidia. The reaction mixtures (total volume 25 μL) contained 10 mM K Hepes, pH 7.9, 73 mM potassium acetate, 2.5 mM DTT, 400 mM each of GTP, CTP, ATP, and UTP, 2 μL of transcription extract, 0.6 μL of creatine kinase (1 μg/μL), 0.1 μL of creatine phosphate (1 mM), and 0.5 μg of linearized plasmid as template. Reactions were incubated at room temperature for 30 min and stopped by the addition of 100 μL of extraction buffer [100 mM sodium acetate, pH 5, 10 mM EDTA, 4% (wt/wt) SDS, and 4 mM urea], followed by a single extraction with 500 μL of hot acid phenol. The supernatant was transferred to a new tube, and 3.5 volumes of pellet buffer [1 M ammonium acetate, 85% (wt/wt) ethanol] and 6 μL of GlycoBlue were added to precipitate the RNA. Contaminating DNA was removed by TURBO DNase (Ambion), and the resulting RNA was used for qRT-PCR as described above. Samples without reverse transcriptase were also included as negative controls.

For strand-specific mRNA-seq, total mRNA was extracted from the wild-type strain (FGSC 4200). Strand-specific mRNA sequencing library construction and sequencing were performed as described (62). Briefly, total mRNA was purified with NEBNext Oligo d(T)25 (NEB), fragmented to ∼200-bp lengths, and cloned with the NEBNext Ultra Directional RNA Library Prep Kit (NEB no. 7420S). Paired-end sequencing was performed by BGI, and short reads were mapped with tophat (version 2.1.1) to the N. crassa genome (Broad Institute, version 10). Guided by the version 10 annotation, we analyzed transcript levels [fragments per kilobase of exon per million fragments mapped (FPKM)] and reannotated the TSS and TES (transcriptional start and end sites) based on the RNA sequencing results, with Cufflinks (version 2.0.9) as described (68). The GC content of each gene was calculated from TSS to TES, including introns. We also calculated the GC content at the third position of codons for each gene (i.e., the number of codons with G/C at the third position divided by the total number of codons). For genes with alternative splicing (∼5% of all annotated genes), the first annotated transcripts were used for analyses.
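The two GC metrics defined above can be computed as follows. This is a minimal sketch: `gene_span` stands for the genomic sequence from the reannotated TSS to TES (introns included) and `cds` for the spliced, in-frame coding sequence of the first annotated transcript; both names are hypothetical, and trailing bases that do not form a complete codon are ignored.

```python
def gc_content(gene_span):
    """Fraction of G/C bases over the whole TSS-to-TES span, introns included."""
    seq = gene_span.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc3(cds):
    """Codons with G/C at the third position divided by the total number of codons."""
    seq = cds.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if not codons:
        raise ValueError("no complete codons")
    return sum(codon[2] in "GC" for codon in codons) / len(codons)


print(gc_content("ATGCGC"), gc3("ATGGCCAAA"))
```

Keeping the two functions separate mirrors the distinction in the text: overall GC content is measured on the unspliced gene span, whereas GC3 depends on the reading frame and therefore needs the coding sequence.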

Chromatin Immunoprecipitation Assay and ChIP-seq Analyses. ChIP assays were performed as described with some modifications (67, 69). The tissues were cultured in 50 mL of minimal medium and fixed by adding 1% formaldehyde for 15 min at room temperature. The tissues were harvested and ground in liquid nitrogen. One hundred to two hundred microliters of tissue powder was suspended in 300 μL of ChIP lysis buffer. Chromatin was sheared to approximately 500 bp by sonication. In each reaction, 500 μg of total lysate and 2 μL of antibody were added and incubated at 4 °C overnight. The antibodies used in this study were as follows: WC-2 (65), Pol II S2P (Abcam; ab5095), Pol II CTD (Abcam; ab26721), H3 (Abcam; ab1791), H3K9me3 (Active Motif; 39161), and Flag (Sigma; F3165, as negative control). Fifty micrograms of total lysate was saved as input. Twenty-five microliters of protein G-coupled beads were added and incubated for 2 h. The beads were washed at 4 °C for 5 min with the following buffers: ChIP lysis buffer, low-salt buffer, high-salt buffer, LNDET buffer (0.25 M LiCl, 1% NP40, 1% deoxycholate, 1 mM EDTA), and twice with 10 mM Tris, 1 mM EDTA buffer. One hundred forty microliters of 10% Chelex beads (Sigma) were added and heated at 96 °C for 20 min. After centrifugation, 100 μL of supernatant was transferred to a new tube. The input DNA was de-cross-linked and extracted with phenol. Immunoprecipitated DNA was quantified by real-time quantitative PCR. For CTD, S2P, H3, and WC-2 ChIP, the results were normalized to input DNA and presented as input %. For H3K9me3 ChIP, the results were further normalized to the internal control tubulin and represented as relative H3K9me3 levels. The primers used are listed in SI Appendix, Table S3. Each experiment was performed independently three times.
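The input % normalization above can be sketched as follows. The Methods do not give the formula, so several points are assumptions: qPCR amplification efficiency is ~100% (a factor of 2 per cycle), the 50 μg saved as input represents one-tenth of the 500 μg of lysate used per immunoprecipitation and was carried through to qPCR equivalently, and tubulin normalization for H3K9me3 is a simple ratio of input % values. The function names and Ct values are hypothetical.

```python
INPUT_FRACTION = 50 / 500  # 50 ug saved as input vs. 500 ug lysate per ChIP (assumed carried through equally)


def percent_input(ct_ip, ct_input, input_fraction=INPUT_FRACTION):
    """ChIP signal as % of total chromatin in the IP (assumes 2-fold per cycle).

    The input sample holds only `input_fraction` of the chromatin used in the
    IP, so total chromatin = input / input_fraction, giving
    100 * input_fraction * 2**(Ct_input - Ct_IP).
    """
    return 100.0 * input_fraction * 2.0 ** (ct_input - ct_ip)


def relative_h3k9me3(pct_locus, pct_tubulin):
    """H3K9me3 at a locus expressed relative to the tubulin internal control."""
    return pct_locus / pct_tubulin


ip = percent_input(ct_ip=28.0, ct_input=25.0)   # hypothetical Ct values
tub = percent_input(ct_ip=30.0, ct_input=25.0)
print(ip, relative_h3k9me3(ip, tub))
```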

For semiquantitative PCR, 1 μL of immunoprecipitated or input DNA was used as the template. Primer sets for both the gene of interest and tubulin were added to the same PCR. PCR conditions were as follows: 4 min at 94 °C and 26 cycles of 94 °C for 30 s, 55 °C for 30 s, and 72 °C for 30 s. PCR products were resolved by electrophoresis on 2% (wt/wt) agarose gels.

For ChIP-seq analyses, nuclear lysate was used. Nuclei were extracted as described above and resuspended in 300 μL of ChIP lysis buffer. Chromatin was sheared by sonication to ∼200- to 500-bp fragments. For each immunoprecipitation assay, the total volume was 200 μL, and 50 μL of nuclear lysate and 2 μL of Pol II S2P or Pol II CTD antibody were used. The precipitated DNA was used to make the sequencing library in accordance with the manufacturer’s protocol (iDeal Library Preparation Kit; Diagenode). Illumina sequencing was performed by BGI on an Illumina HiSeq 2000 platform.

Data Analyses. Raw reads for both ChIP-seq and RNA-seq were generated on a HiSeq 2000. The raw ChIP-seq data were aligned to the Neurospora crassa genome (version 10, Neurospora crassa Database; Broad Institute) with bowtie (version 1.1.1). Two mismatches were allowed, and the settings -m 1 --best --strata were used to retain only one best hit for each read. PCR duplicates were then removed from the alignment results with Samtools. Bedgraph files were generated with bedtools for visualization in IGV. To estimate the relative ChIP signal of each gene, we counted the number of reads within the span of each annotated gene in the ChIP sample and normalized the counts to library size (i.e., reads per million, RPM); this value was then divided by the corresponding value from the control (input) sample processed in parallel. Raw mRNA-seq data were aligned to the N. crassa genome with tophat (version 2.0.13) and then processed with Cufflinks (version 2.1.1) as described to obtain gene expression data (68). The mRNA level (RPKM), protein level [exponentially modified protein abundance index (emPAI)], and relative ChIP level were log10-transformed.
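The per-gene relative ChIP signal and its genome-wide comparison with codon usage (Fig. 5E) can be sketched as follows. This is an illustration under assumptions: read counts per gene span are taken as given (the counting step is not shown), a pseudocount of 1 is added to avoid division by zero (the Methods do not mention one), CBI values are supplied from elsewhere, and all names and numbers are hypothetical. The Pearson coefficient is computed on log10-transformed relative ChIP levels, as described above.

```python
import math


def rpm(count, library_size):
    """Reads per million mapped reads."""
    return count * 1e6 / library_size


def relative_chip(chip_counts, chip_total, input_counts, input_total, pseudocount=1):
    """Per-gene ChIP RPM divided by input RPM (the pseudocount is an assumption)."""
    return {
        gene: rpm(n + pseudocount, chip_total)
        / rpm(input_counts.get(gene, 0) + pseudocount, input_total)
        for gene, n in chip_counts.items()
    }


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical read counts and CBI values for three genes.
chip = {"g1": 99, "g2": 399, "g3": 1599}
inp = {"g1": 49, "g2": 99, "g3": 199}
cbi = {"g1": 0.05, "g2": 0.20, "g3": 0.35}
rel = relative_chip(chip, 2_000_000, inp, 1_000_000)
genes = sorted(rel)
print(pearson_r([cbi[g] for g in genes], [math.log10(rel[g]) for g in genes]))
```

Dividing by the input RPM rather than using raw ChIP counts corrects for gene-to-gene differences in mappability and chromatin accessibility that also appear in the input library.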

ACKNOWLEDGMENTS. We thank Drs. Bing Li, Chen-Ming Chiang, and Shwu-Yuan Wu for helpful discussion; Dr. Qun He for providing the dim-5 KO strains; and the members of our laboratory for technical assistance. This work is supported by National Institutes of Health Grants GM068496, GM084283, and 1R35GM118118 and Welch Foundation Grant I-1560 (to Y.L.).

E6124 | www.pnas.org/cgi/doi/10.1073/pnas.1606724113 Zhou et al.



1. Ikemura T (1985) Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol 2(1):13–34.

2. Sharp PM, Tuohy TM, Mosurski KR (1986) Codon usage in yeast: Cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res 14(13):5125–5143.

3. Comeron JM (2004) Selective and mutational patterns associated with gene expression in humans: Influences on synonymous composition and intron presence. Genetics 167(3):1293–1304.

4. Plotkin JB, Kudla G (2011) Synonymous but not the same: The causes and consequences of codon bias. Nat Rev Genet 12(1):32–42.

5. Gingold H, Pilpel Y (2011) Determinants of translation efficiency and accuracy. Mol Syst Biol 7:481.

6. Akashi H (1994) Synonymous codon usage in Drosophila melanogaster: Natural selection and translational accuracy. Genetics 136(3):927–935.

7. Hershberg R, Petrov DA (2008) Selection on codon bias. Annu Rev Genet 42:287–299.

8. Drummond DA, Wilke CO (2008) Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell 134(2):341–352.

9. Qian W, Yang JR, Pearson NM, Maclean C, Zhang J (2012) Balanced codon usage optimizes eukaryotic translational efficiency. PLoS Genet 8(3):e1002603.

10. Spencer PS, Siller E, Anderson JF, Barral JM (2012) Silent substitutions predictably alter translation elongation rates and protein folding efficiencies. J Mol Biol 422(3):328–335.

11. Pechmann S, Chartron JW, Frydman J (2014) Local slowdown of translation by nonoptimal codons promotes nascent-chain recognition by SRP in vivo. Nat Struct Mol Biol 21(12):1100–1105.

12. Yu CH, et al. (2015) Codon usage influences the local rate of translation elongation to regulate co-translational protein folding. Mol Cell 59(5):744–754.

13. Zhou M, et al. (2013) Non-optimal codon usage affects expression, structure and function of clock protein FRQ. Nature 495(7439):111–115.

14. Zhou T, Weems M, Wilke CO (2009) Translationally optimal codons associate with structurally sensitive sites in proteins. Mol Biol Evol 26(7):1571–1580.

15. Pechmann S, Frydman J (2013) Evolutionary conservation of codon optimality reveals hidden signatures of cotranslational folding. Nat Struct Mol Biol 20(2):237–243.

16. Zhou M, Wang T, Fu J, Xiao G, Liu Y (2015) Nonoptimal codon usage influences protein structure in intrinsically disordered regions. Mol Microbiol 97(5):974–987.

17. Quax TE, Claassens NJ, Söll D, van der Oost J (2015) Codon bias as a means to fine-tune gene expression. Mol Cell 59(2):149–161.

18. Duret L, Mouchiroud D (1999) Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis. Proc Natl Acad Sci USA 96(8):4482–4487.

19. Hiraoka Y, Kawamata K, Haraguchi T, Chikashige Y (2009) Codon usage bias is correlated with gene expression levels in the fission yeast Schizosaccharomyces pombe. Genes Cells 14(4):499–509.

20. Kudla G, Murray AW, Tollervey D, Plotkin JB (2009) Coding-sequence determinants of gene expression in Escherichia coli. Science 324(5924):255–258.

21. Pop C, et al. (2014) Causal signals between codon bias, mRNA structure, and the efficiency of translation and elongation. Mol Syst Biol 10:770.

22. Tuller T, et al. (2010) An evolutionarily conserved mechanism for controlling the efficiency of protein translation. Cell 141(2):344–354.

23. Presnyak V, et al. (2015) Codon optimality is a major determinant of mRNA stability. Cell 160(6):1111–1124.

24. Boël G, et al. (2016) Codon influence on protein expression in E. coli correlates with mRNA levels. Nature 529(7586):358–363.

25. Radford A, Parish JH (1997) The genome and genes of Neurospora crassa. Fungal Genet Biol 21(3):258–266.

26. Gooch VD, et al. (2008) Fully codon-optimized luciferase uncovers novel temperature characteristics of the Neurospora clock. Eukaryot Cell 7(1):28–37.

27. Morgan LW, Greene AV, Bell-Pedersen D (2003) Circadian and light-induced expression of luciferase in Neurospora crassa. Fungal Genet Biol 38(3):327–332.

28. Ishihama Y, et al. (2005) Exponentially modified protein abundance index (emPAI) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein. Mol Cell Proteomics 4(9):1265–1272.

29. Bennetzen JL, Hall BD (1982) Codon selection in yeast. J Biol Chem 257(6):3026–3031.

30. Kasuga T, Mannhaupt G, Glass NL (2009) Relationship between phylogenetic distribution and genomic features in Neurospora crassa. PLoS One 4(4):e5286.

31. Hurley JM, Chen CH, Loros JJ, Dunlap JC (2012) Light-inducible system for tunable protein expression in Neurospora crassa. G3 (Bethesda) 2(10):1207–1212.

32. Mishima Y, Tomari Y (2016) Codon usage and 3′ UTR length determine maternal mRNA stability in zebrafish. Mol Cell 61(6):874–885.

33. Bazzini AA, et al. (2016) Codon identity regulates mRNA stability and translation efficiency during the maternal-to-zygotic transition. EMBO J, e201694699.

34. Guo J, Cheng P, Yuan H, Liu Y (2009) The exosome regulates circadian gene expression in a posttranscriptional negative feedback loop. Cell 138(6):1236–1246.

35. Doma MK, Parker R (2006) Endonucleolytic cleavage of eukaryotic mRNAs with stalls in translation elongation. Nature 440(7083):561–564.

36. Froehlich AC, Liu Y, Loros JJ, Dunlap JC (2002) White Collar-1, a circadian blue light photoreceptor, binding to the frequency promoter. Science 297(5582):815–819.

37. Cha J, Zhou M, Liu Y (2013) CATP is a critical component of the Neurospora circadian clock by regulating the nucleosome occupancy rhythm at the frequency locus. EMBO Rep 14(10):923–930.

38. Komarnitsky P, Cho EJ, Buratowski S (2000) Different phosphorylated forms of RNA polymerase II and associated mRNA processing factors during transcription. Genes Dev 14(19):2452–2460.

39. Hsin JP, Manley JL (2012) The RNA polymerase II CTD coordinates transcription and RNA processing. Genes Dev 26(19):2119–2137.

40. Zamft B, Bintu L, Ishibashi T, Bustamante C (2012) Nascent RNA structure modulates the transcriptional dynamics of RNA polymerases. Proc Natl Acad Sci USA 109(23):8948–8953.

41. Tyler BM, Giles NH (1985) Accurate transcription of cloned Neurospora RNA polymerase II-dependent genes in vitro by homologous soluble extracts. Proc Natl Acad Sci USA 82(16):5450–5454.

42. Selker EU (1998) Trichostatin A causes selective loss of DNA methylation in Neurospora. Proc Natl Acad Sci USA 95(16):9430–9435.

43. Tamaru H, Selker EU (2001) A histone H3 methyltransferase controls DNA methylation in Neurospora crassa. Nature 414(6861):277–283.

44. Lewis ZA, et al. (2009) Relics of repeat-induced point mutation direct heterochromatin formation in Neurospora crassa. Genome Res 19(3):427–437.

45. Lewis ZA, Adhvaryu KK, Honda S, Shiver AL, Selker EU (2010) Identification of DIM-7, a protein required to target the DIM-5 H3 methyltransferase to chromatin. Proc Natl Acad Sci USA 107(18):8310–8315.

46. Rountree MR, Selker EU (2010) DNA methylation and the formation of heterochromatin in Neurospora crassa. Heredity (Edinb) 105(1):38–44.

47. Selker EU, Fritz DY, Singer MJ (1993) Dense nonsymmetrical DNA methylation resulting from repeat-induced point mutation in Neurospora. Science 262(5140):1724–1728.

48. Dang Y, Li L, Guo W, Xue Z, Liu Y (2013) Convergent transcription induces dynamic DNA methylation at disiRNA loci. PLoS Genet 9(9):e1003761.

49. Margolin BS, et al. (1998) A methylated Neurospora 5S rRNA pseudogene contains a transposable element inactivated by repeat-induced point mutation. Genetics 149(4):1787–1797.

50. Kudla G, Lipinski L, Caffin F, Helwak A, Zylicz M (2006) High guanine and cytosine content increases mRNA levels in mammalian cells. PLoS Biol 4(6):e180.

51. Krinner S, et al. (2014) CpG domains downstream of TSSs promote high levels of gene expression. Nucleic Acids Res 42(6):3551–3564.

52. Newman ZR, Young JM, Ingolia NT, Barton GM (2016) Differences in codon bias and GC content contribute to the balanced expression of TLR7 and TLR9. Proc Natl Acad Sci USA 113(10):E1362–E1371.

53. Stergachis AB, et al. (2013) Exonic transcription factor binding directs codon choice and affects protein evolution. Science 342(6164):1367–1372.

54. Sullivan AM, et al. (2014) Mapping and dynamics of regulatory DNA and transcription factor networks in A. thaliana. Cell Reports 8(6):2015–2030.

55. Cha J, Yuan H, Liu Y (2011) Regulation of the activity and cellular localization of the circadian clock protein FRQ. J Biol Chem 286(13):11469–11478.

56. Huang G, et al. (2007) Protein kinase A and casein kinases mediate sequential phosphorylation events in the circadian negative feedback loop. Genes Dev 21(24):3283–3295.

57. Xu H, et al. (2010) DCAF26, an adaptor protein of Cul4-based E3, is essential for DNA methylation in Neurospora crassa. PLoS Genet 6(9):e1001132.

58. Bardiya N, Shiu PK (2007) Cyclosporin A-resistance based gene placement system for Neurospora crassa. Fungal Genet Biol 44(5):307–314.

59. Aronson BD, Johnson KA, Loros JJ, Dunlap JC (1994) Negative feedback defining a circadian clock: Autoregulation of the clock gene frequency. Science 263(5153):1578–1584.

60. Honda S, Selker EU (2009) Tools for fungal proteomics: Multifunctional Neurospora vectors for gene replacement, protein expression and protein purification. Genetics 182(1):11–23.

61. Freitag M, Hickey PC, Raju NB, Selker EU, Read ND (2004) GFP as a tool to analyze the organization, dynamics and function of nuclei and microtubules in Neurospora crassa. Fungal Genet Biol 41(10):897–910.

62. Yang Q, Ye QA, Liu Y (2015) Mechanism of siRNA production from repetitive DNA. Genes Dev 29(5):526–537.

63. Bell-Pedersen D, Dunlap JC, Loros JJ (1996) Distinct cis-acting elements mediate clock, light, and developmental regulation of the Neurospora crassa eas (ccg-2) gene. Mol Cell Biol 16(2):513–521.

64. Garceau NY, Liu Y, Loros JJ, Dunlap JC (1997) Alternative initiation of translation and time-specific phosphorylation yield multiple forms of the essential clock protein FREQUENCY. Cell 89(3):469–476.

65. Cheng P, Yang Y, Heintzen C, Liu Y (2001) Coiled-coil domain-mediated FRQ-FRQ interaction is essential for its circadian clock function in Neurospora. EMBO J 20(1–2):101–108.

66. Zhou Z, Wang Y, Cai G, He Q (2012) Neurospora COP9 signalosome integrity plays major roles for hyphal growth, conidial development, and circadian function. PLoS Genet 8(5):e1002712.

67. Xue Z, et al. (2014) Transcriptional interference by antisense RNA is required for circadian clock function. Nature 514(7524):650–653.

68. Trapnell C, et al. (2012) Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 7(3):562–578.

69. Zhou Z, et al. (2013) Suppression of WC-independent frequency transcription by RCO-1 is essential for Neurospora circadian clock. Proc Natl Acad Sci USA 110(50):E4867–E4874.

Zhou et al. PNAS | Published online September 26, 2016 | E6125

