Temperature-Dependent Patterns of Gene Expression in Caenorhabditis briggsae
by
Stephanie Mark
A thesis submitted in conformity with the requirements for the degree of Master of Science
Department of Ecology and Evolutionary Biology, University of Toronto
© Copyright by Stephanie Mark 2017
Abstract
Discerning the genetic basis of adaptive phenotypes is a fundamental problem in biology that
remains an open question. Studies using high-throughput sequencing methods of gene expression
have contributed greatly to our understanding of how genotype becomes phenotype by treating
gene expression as an intermediary phenotype, especially under variable environmental
conditions. Using whole genome high-throughput RNAseq data, I characterized the responses of
Temperate and Tropical genotypes of Caenorhabditis briggsae to chronic temperature stress.
These genotypes show evidence of local adaptation, suggesting that differences in their
responses to temperature may underlie adaptive phenotypes. I discovered that a large proportion
of genes show genotype-specific changes in gene expression in response to temperature
(genotype-by-temperature interactions), and that most of these genotype-specific responses occur
under heat stress. These results suggest that responses to cold stress and heat stress are
qualitatively different, and identify sets of genes that suggest further study into temperature-
adapted phenotypes.
Acknowledgments
I would like to acknowledge the efforts of Joerg Weiss and Julie Claycomb in designing the
experiment and collecting the data on which my thesis work is based. I would also like to thank
past and present members of the Cutter Lab, especially Jeremy Gray, Richard Jovelin, Rebecca
Schalkowski, Janice Ting, Joanna Bundus, and Gregory Stegeman, for their advice and
encouragement. I am also extremely grateful for the help and guidance of two bioinformaticians,
Wei Wang and Ting Liu, for answering my many questions and mentoring me as a new
bioinformatician. I would also like to thank my supervisory committee, Nicholas Provart and
Stephen Wright for their constructive criticism and Helen Rodd for her assistance and kindness.
Finally, I would like to thank my advisor, Asher Cutter, for keeping me on track, for always
making time for my questions, and for giving me the opportunity to learn and grow while doing
research in his lab.
Table of Contents
Acknowledgments
Table of Contents
List of Tables
List of Figures
Introduction
  How do genotypes give rise to phenotypes?
  Using local adaptation to untangle environmental effects on phenotype
  Caenorhabditis briggsae as a model organism for GxE
  Gene expression as a proxy for phenotype
  Exploring temperature-dependent patterns of gene expression in C. briggsae
Methods
  Experimental design and data provenance
  Processing of raw data
  Mapping reads to the genome
  Quantifying expression: Counting reads
  Exploratory data analysis of count data
  Analysis for genes with differential expression
  Co-expression clustering of genes with similar expression profiles
  Gene Ontology analysis
  Heat shock proteins
  Chromosomal domain analysis
Results
  Characterizing differential gene expression in response to genotype & temperature
    Statistical analysis with limma
    Differential expression in response to extreme temperature
  Defining co-expression clusters
    Distribution of genes in modules
    Genes with differential expression in modules and module eigengene expression
  Analysis of heat shock protein genes
    Differential expression of hsp genes
    Clustering and expression patterns of hsp genes
  Analysis of differential expression across chromosomal domains
    Chromosomal domain enrichment of differentially expressed genes
    Chromosomal domain enrichment of genes for Representative Modules
Discussion
  Genes with genotype-by-environment interactions in gene expression
  Gene expression responses to chronic cold versus chronic heat stress
  Chromosomal domains and differentially expressed genes
  Small RNAs and temperature-sensitive regulation of gene expression
  Conclusion
References
List of Tables
Table 1. Number of raw and cleaned reads in FASTQ files from Genome Quebec.
Table 2. Number and percentage of reads that mapped to unique locations (i.e. one
location in the genome) with STAR.
Table 3. Values for each soft threshold power that was tested.
Table 4. G-test p-values for tests of whether the proportion of differentially
expressed genes in a module differed significantly from genome-wide proportions.
Table 5. G-test p-values for tests of whether the proportion of genes located
in chromosome arms or centres differed from expected proportions for each differential
expression group.
Table 6. G-test p-values for tests of whether the proportion of genes located
in autosome arms or centres differed from expected proportions for each of the 22
modules identified by co-expression clustering.
Table 7. G-test p-values for tests of whether the proportions of genes on
autosomes and the X-chromosome were significantly different from expectations for each
differential expression group.
Table 8. G-test p-values for tests of whether the proportions of genes on
autosomes and the X-chromosome were significantly different from expectations for each
co-expression module.
List of Figures
Figure 1. Number of reads from fastq files for each data file
Figure 2. Distribution of intron lengths in C. briggsae reference genome
Figure 3. Ratio of average number of uniquely mapped reads in Tropical and Temperate Genotypes
Figure 4. Percentage of uniquely mapped reads with STAR per biological replicate
Figure 5. Reads counted by htseq-count per replicate
Figure 6. Multi-dimensional scaling plot (MDS) of filtered, normalized, and log transformed count data
Figure 7. Distributions of p-values from preliminary tests
Figure 8. Analysis for differential expression with edgeR versus limma
Figure 9. Quantile-quantile plot for normalized, voom-transformed count data
Figure 10. Dendrogram and heatmap of normalized count data
Figure 11. Fit of scale-free topology generated by soft-thresholding powers
Figure 12. Mean connectivity of soft-thresholding powers
Figure 13. Dendrogram of initial 124 modules from co-expression clustering
Figure 14. Similarity heatmap of initial 124 modules from co-expression clustering
Figure 15. Dendrogram of merged co-expression clusters
Figure 16. Heatmap of merged co-expression clusters
Figure 17. Numbers of differentially expressed genes
Figure 18. Proportions of genes expressed under chronic cold versus heat stress
Figure 19. Proportions of genes that increase or decrease expression under chronic cold versus heat stress
Figure 20. Magnitude of change in expression under chronic cold versus heat stress
Figure 21. Distribution of differential expression groups within co-expression modules
Figure 22. Proportions of differentially expressed genes within co-expression modules
Figure 23. Module eigengene expression plots for Genotype modules
Figure 24. Module eigengene expression plots for Temperature modules
Figure 25. Module eigengene expression plots for GxT modules
Figure 26. Module eigengene expression plots for genes with no differential expression
Figure 27. Proportion of differentially expressed genes in arm versus centre domains of autosomes
Figure 28. Proportion of differentially expressed genes in arm versus centre domains of the X-chromosome
Figure 29. Proportion of module genes in arm versus centre domains of autosomes
Figure 30. Proportion of module genes in arm versus centre domains of X-chromosome
Figure 31. Proportion of genes on autosomes and the X-chromosome for each differential expression group
Figure 32. Proportion of genes on autosomes and the X-chromosome for each co-expression module
Supplementary Figure 1
Supplementary Figure 2
Supplementary Figure 3
Supplementary Figure 4
Supplementary Figure 5
Supplementary Figure 6
Supplementary Figure 7
Supplementary Figure 8
Supplementary Figure 9
Supplementary Figure 10
Supplementary Figure 11
Introduction
How do genotypes give rise to phenotypes?
Understanding the relationship between genotype and phenotype remains a fundamental
problem in biology. While much technological progress has been made in revealing the
sequences of genomes (Pruitt et al. 2014), the details of how DNA sequences give rise to
the different forms and functions of life remain largely an open question. This question
is of particular interest in evolutionary biology, which seeks to explain how adaptive
phenotypes arise from the information encoded in genomes as they get shaped by natural
selection. Many basic questions regarding the underlying genetic architecture of adaptive
phenotypes are the subject of active research (Grillo et al. 2013, Lasky et al. 2014, Chen
et al. 2015). For example, do mutations in certain functional regions of genes contribute
to adaptive phenotypes more than others? Even more basic questions include: What
proportion of the genome is involved in producing adaptive phenotypes? Are some
locations in the genome more likely to be involved in adaptive phenotypes than others?
Early research pointed to surprising patterns about the connection between genotype and
phenotype (Jacob and Monod 1961, Britten and Davidson 1969). Differences in
phenotype were initially assumed to result from proportional differences in genotype.
Contrary to the initial intuition that different species would have very divergent protein
coding sequences, there is a high degree of similarity among the orthologous coding
sequences of species that differ in morphology, behaviour, and physiology (King
and Wilson 1975). This discovery led to the hypothesis that many differences between
species originate not from differences within genes themselves but rather arise from
differences in the timing, location, and levels to which genes are expressed (Britten and
Davidson 1971). Mutations in cis-regulatory controls have since been linked to
phenotypic variation in a wide variety of taxa, providing support for this gene regulation
model (Wray 2007). Proponents of this model have also argued that adaptation is most
likely to occur through mutations in cis-regulatory regions in particular (Wray et al. 2003,
Carroll 2005). Cis-regulatory regions are defined as the regions adjacent to a gene that
directly control its transcription, such as promoter sequences (Wray 2007). This argument
for adaptation through cis-regulatory regions is primarily based on the assumption that
mutations in these regions should result in fewer pleiotropic effects (Carroll 2005, Wray
2007). For example, changes to protein-coding sequences could influence the function of
all proteins in the same network by altering the way they interact. Conversely, by
changing only spatial or temporal patterns of gene expression, cis-regulatory mutations
could alter phenotypes while avoiding potentially deleterious changes to interactions
between proteins (Wray 2007). However, given that there are other avenues for avoiding
pleiotropic effects such as gene duplication and alternative splicing, the cis-regulatory
model of adaptive evolution by itself likely only provides a partial explanation (Hoekstra
and Coyne 2007). Nevertheless, changes to cis-regulatory regions have been linked to
several forms of phenotypic variation, particularly in Drosophila (Massey and Wittkopp
2016). For example, 40 different loci have been linked to pigmentation variation to date
both within and between species and the vast majority of mutations that cause these
phenotypic differences occur in cis-regulatory regions. Furthermore, cis-regulatory
mutations seem to be more common in phenotype divergence between species than as
polymorphisms within species, which strongly suggests that the regulation of gene
expression plays a key role in adaptation and speciation (Coolon et al. 2014, Massey and
Wittkopp 2016).
Using local adaptation to untangle environmental effects on phenotype
The relationship between genotype and phenotype can be modulated further by the
environment (Gibson 2008). When subject to different environmental conditions, one
genotype can produce multiple phenotypes, resulting in phenotypic plasticity. This
capacity to generate multiple phenotypes can allow an organism to successfully navigate
the challenges of variable environments when that plasticity is adaptive (Pigliucci et al.
2006). For example, heat shock proteins (hsp) are upregulated in response to diverse
forms of stress, and to heat stress in particular (Lindquist 1986). Heat stress causes the
misfolding of outer membrane proteins, which act as a signal to initiate the cascade that
results in the production of more heat shock proteins (Walsh et al. 2003). Because heat
shock proteins can shield other proteins from the effects of heat stress, the regulation of
hsp gene expression can also influence functions that are related to fitness, such as
spermatogenesis (Sarge et al. 1994). The path from genotype to phenotype can therefore
be the result of the coordinated activity of many genes in many pathways simultaneously
responding to changing environmental conditions. To more fully understand how
genotypes generate phenotypes it is necessary to untangle the consequences of
environmental changes for different genes.
One way to isolate the genes involved in adaptive responses to the environment is to
study populations that have adapted to different local conditions (local adaptation). In
local adaptation, populations of the same species undergo divergent natural selection due
to differences in local environmental conditions (Kawecki and Ebert 2004). Responses to
divergent selection manifest as the “local” population having higher fitness than the
“foreign” population when tested in each other’s habitats. At the same time, these
populations are otherwise very similar because of gene flow and recent common
ancestry. Therefore, the differences between them, particularly with respect to
ecologically relevant traits, are especially likely to reflect selection for local conditions in
their respective habitats (Des Marais et al. 2013). By examining traits that show GxE in
locally adapted populations, we can identify the subset of genes that are potential
candidates for contributing to adaptive phenotypes (Thomas 2010).
Caenorhabditis briggsae as a model organism for GxE
Local adaptation and GxE are often studied in organisms whose range spans across
latitudes as habitats at different latitudes provide many opportunities for divergent natural
selection (Hurme et al. 1997, Stinchcombe et al. 2004, Gilchrist and Huey 2004,
Fournier-Level et al. 2011). One such organism is Caenorhabditis briggsae, a nematode
worm with a worldwide distribution and a close relative of the model organism
Caenorhabditis elegans. Like C. elegans, C. briggsae has a mature and highly contiguous
reference genome sequence, gene annotations, and genetic map (Stein et al. 2003, Hillier
et al. 2007, Ross et al. 2011). While temperature affects both C. elegans and C. briggsae
during development (Moss et al. 1997, Matsuba et al. 2013), only C. briggsae shows
evidence of local adaptation. Populations of C. briggsae show striking patterns of genetic
differentiation that mirror its geographic distribution (Cutter et al. 2006, Thomas et al.
2015). Populations from temperate latitudes, including Japan, Europe, and USA, are more
genetically similar to each other than populations from tropical latitudes including Africa,
Asia, and South America. C. briggsae isolates from these regions form two distinct
“Temperate” and “Tropical” clades in neighbour networks that make up the vast majority
of C. briggsae strains collected from nature (Cutter et al. 2006, Felix et al. 2013, Cutter
2015). This pattern suggests that ecological correlates of latitude could play a role in
shaping genetic divergence between Temperate and Tropical populations.
There is also strong experimental evidence for local adaptation in C. briggsae. When C.
briggsae strains of Temperate and Tropical populations are reared in the lab at a range of
temperatures, Temperate strains have higher lifetime fecundity at 14°C while Tropical
strains have higher lifetime fecundity at 30°C (Prasad et al. 2011). This pattern of strain-
specific fecundity responses to different temperatures demonstrates a genotype-by-
environment interaction that is consistent with local adaptation. This interaction also
points to temperature as an ecologically relevant factor that could be responsible for
divergent natural selection in this species. Furthermore, sperm number in C. briggsae is
reduced in response to heat stress whereas sperm number is not affected in C. elegans, a
species that does not show evidence of local adaptation (Poullet et al. 2015). Strains of C.
briggsae from different latitudes also show greater within-species than between-species
differences in thermotaxis (Stegeman et al. 2013). Finally, 20% of F2 offspring obtained
from crossing Temperate strains and the AF16 Tropical strain of C. briggsae show a
delay in development due to dysgenic interactions between maternal and zygotic loci,
showing the beginnings of reproductive isolation (Baird and Stonesifer 2012). Taken
together, this support for local adaptation combined with a high-quality genome assembly
that has been experimentally verified to the chromosome level makes C. briggsae an
excellent organism in which to study adaptive gene expression differences.
Gene expression as a proxy for phenotype
Many traits have plastic responses to changes in the environment and could therefore be
used to study GxE. However, using genome-wide gene expression to quantify responses
to the environment has several advantages. Firstly, the initial step in the process from
genotype to phenotype is transcription of a gene into mRNA. It is therefore
straightforward to infer which genes have a response to the environment by matching
mRNA sequences to the sequences of their genes of origin (Wang et al. 2009).
Furthermore, genome-wide measures of gene expression can identify genes that may
contribute to GxE without relying on prior expectations of gene function because
transcription of all genes can be captured at once (Marioni et al. 2008). The cost
effectiveness of high-throughput sequencing has also made RNA-seq studies a practical
way to obtain a vast amount of unbiased data to assess gene expression differences
among experimental treatments for the entire genome (Wang et al. 2009, Des Marais et
al. 2013).
Microarray and high-throughput sequencing techniques have contributed several
important insights about variation in plasticity and mechanisms of gene regulation by
measuring gene expression responses to the environment and to temperature in particular.
For example, by characterizing the plasticity of expression in response to temperature
change and linking expression patterns to regulatory architecture in D. melanogaster,
transcription factors and microRNAs were found to have opposing effects on gene
expression (Chen et al. 2015). Other investigations into expression plasticity across
latitudinal gradients in D. melanogaster revealed a “directionality” to plasticity such that
more genes were downregulated when reared in foreign temperature conditions than local
(Levine et al. 2010). Whole genome expression data were also used to quantify variance
in expression plasticity in response to environmental stress in A. thaliana, illustrating that
such variance has a key role in the genetic basis of local adaptation (Lasky et al. 2014).
Exploring temperature-dependent patterns of gene expression in C. briggsae
In order to characterize how organisms’ transcriptomes respond to temperature stress in
two locally adapted populations of C. briggsae, we performed a genome-wide survey of
differences in and patterns of gene expression. We analyse representative genotypes from
each of the Temperate and Tropical phylogeographic groups: one Temperate strain from
Okayama, Japan (HK104) and one Tropical strain from Ahmedabad, India (AF16). These
strains also provide the parental progenitors of a collection of recombinant inbred lines
(RILs) used in genetic mapping (Ross et al. 2011), making their gene expression
differences a valuable resource for future research in this study system. We reared
replicated pools of isogenic individuals from both Temperate and Tropical strains at 14,
20, and 30°C as treatments for differential expression analysis. These temperatures
encompass the range of temperatures for which strain-specific responses were observed
(Prasad et al. 2011). We then collected, sequenced and analysed mRNA of young adult
hermaphrodite animals to determine the effects that genotype, temperature, and the
interaction between genotype and temperature (“GxT”) had on the expression of all genes
as assayed by RNA-seq. We also performed co-expression clustering on gene expression
to reveal expression patterns across temperatures and to gain insight into the specific
effects that temperature exerts on different groups of genes. Finally, we examined the
genomic location of genes whose expression was significantly affected by genotype,
temperature, or GxT, as well as groups of genes with similar expression patterns.
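
As a compact way to read the three effects described above, the expression of a single gene can be written as a two-factor model with an interaction term. This is only an illustrative sketch: the symbols below are assumptions introduced here, and the parameterization is not necessarily the one fitted in the statistical analysis.

```latex
% Illustrative two-factor model for one gene g (notation assumed, not taken from the thesis)
% i indexes genotype (Temperate, Tropical), j indexes temperature (14, 20, 30 degrees C),
% k indexes biological replicate (1, 2, 3)
\[
  y_{gijk} = \mu_g + G_{gi} + T_{gj} + (GT)_{gij} + \varepsilon_{gijk}
\]
% G_{gi}: genotype effect; T_{gj}: temperature effect;
% (GT)_{gij}: genotype-by-temperature ("GxT") interaction; \varepsilon_{gijk}: residual error
```

Under this reading, a gene shows GxT when the interaction term is non-zero, meaning that the change in expression across temperatures differs between the Temperate and Tropical genotypes.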
Methods
Experimental design and data provenance
The raw data that were analysed in this project originated from an experiment that was
designed by a former student in the Cutter Lab, Joerg Weiss. Joerg Weiss also isolated
the RNA and collected the data in collaboration with Julie Claycomb in the Department
of Molecular Genetics at the University of Toronto. Briefly, the experimental design that
they implemented to isolate and sequence mRNA is as follows.
In order to determine the genome-wide effect of different temperatures and different
genotypes on gene expression changes, hermaphrodites from two isogenic strains (AF16
= “Tropical” strain, HK104 = “Temperate” strain) were raised at 14°C, 20°C, and 30°C
from egg to adult. Prior to this experiment, previous generations of both genotypes had
been raised at 20°C. Individuals were synchronized to be at the same stage of
development using the standard C. elegans sodium hypochlorite (“bleaching”) protocol.
After reaching young adulthood, total RNA was isolated from each strain at each
temperature by mashing worms into a slurry, vortexing the slurry, adding
1-bromo-3-chloropropane, separating organic and aqueous phases, and then precipitating RNA with
a glycogen and isopropanol solution overnight. Each of the treatments had three
biological replicates, yielding a total of 18 samples. The mRNA was then separated from
rRNA and small RNA fractions.
The mRNA was then sequenced as 100 base-pair, single end reads using Illumina HiSeq
2000 at the Genome Quebec facility in Montreal, Quebec. mRNA from each of the 18
samples was sequenced across 2 lanes to control for lane effects (Fang and Cui 2011).
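
The layout of the design can be made explicit with a short enumeration. The sketch below is illustrative only: the sample and file naming scheme is hypothetical and does not reflect the names used by the sequencing facility.

```python
from itertools import product

# Factors of the design as described in the text.
STRAINS = {"AF16": "Tropical", "HK104": "Temperate"}
TEMPERATURES_C = (14, 20, 30)
REPLICATES = (1, 2, 3)
LANES = (1, 2)


def sample_ids():
    """One identifier per biological replicate (hypothetical naming scheme)."""
    return [
        f"{strain}_{temp}C_rep{rep}"
        for strain, temp, rep in product(STRAINS, TEMPERATURES_C, REPLICATES)
    ]


def lane_files():
    """One FASTQ file name per sample per lane (hypothetical naming scheme)."""
    return [f"{sid}_lane{lane}.fastq" for sid in sample_ids() for lane in LANES]


print(len(sample_ids()), "samples;", len(lane_files()), "lane-level FASTQ files")
```

Two strains, three temperatures, and three replicates give the 18 samples, and sequencing each sample on two lanes gives 36 lane-level files.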
Processing of raw data
The number of reads per sample per lane in raw FASTQ files generated by Genome
Quebec ranged from 17.2 million to 36.7 million and had an average of 25.7 million
reads across all raw data files. The number of raw reads was similar across strains,
temperatures, biological replicates, and lanes, indicating roughly equal coverage across
all variables and that biases in sequencing depth were minimal (Table 1).
Cleaning and trimming of raw FASTQ files must be done in order to remove potential
artifacts of the sequencing process, such as adapter sequences. To identify and remove
the Illumina TruSeq 3 single-end adapters that are used in Illumina HiSeq 2000, I used
Trimmomatic 0.36 (Bolger et al. 2014). To ensure that any adapter sequences that
differed slightly due to technical sequencing variability could be identified, I chose a
seed mismatch allowance of 2 and a simple clip threshold of 10. Reads shorter than 60 base
pairs were also discarded, and bases were trimmed from the 5’ and 3’ ends if they had
Phred quality scores (phred33 encoding) lower than 3. After cleaning and trimming, over 90% of reads were kept
from each FASTQ file and the average number of reads in each of the 36 cleaned files
was 25.4 million reads (Figure 1).
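
The trimming settings above can be sketched as a Trimmomatic single-end command assembled in Python. This is a hedged reconstruction, not the exact command that was run: the jar and adapter file names are assumptions, and the palindrome clip threshold (relevant only to paired-end data) is a placeholder because it is not stated in the text.

```python
def trimmomatic_se_command(
    fastq_in,
    fastq_out,
    jar="trimmomatic-0.36.jar",      # assumed jar file name
    adapters="TruSeq3-SE.fa",        # assumed adapter FASTA name
    seed_mismatches=2,               # from the text
    palindrome_clip_threshold=30,    # placeholder: not stated in the text
    simple_clip_threshold=10,        # from the text
    min_quality=3,                   # from the text
    min_length=60,                   # from the text
):
    """Build (but do not run) a Trimmomatic single-end command line."""
    clip = (
        f"ILLUMINACLIP:{adapters}:{seed_mismatches}:"
        f"{palindrome_clip_threshold}:{simple_clip_threshold}"
    )
    return [
        "java", "-jar", jar, "SE", "-phred33",
        fastq_in, fastq_out,
        clip,
        f"LEADING:{min_quality}",    # trim low-quality bases from the 5' end
        f"TRAILING:{min_quality}",   # trim low-quality bases from the 3' end
        f"MINLEN:{min_length}",      # drop reads shorter than this after trimming
    ]


print(" ".join(trimmomatic_se_command("sample_lane1.fastq", "sample_lane1.clean.fastq")))
```

The resulting list could be passed to `subprocess.run` once the paths are adjusted to a local installation.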
Once these raw FASTQ files had been cleaned, I merged files that corresponded to the
same biological replicate but that had been sequenced in different lanes by concatenating
them into a single FASTQ file. This single file therefore represented the complete mRNA
data for one biological replicate (e.g. the first replicate of AF16 reared at 14°C).
Following this concatenation of files, the average number of reads in a biological
replicate (or “sample”) was 49.7 million reads and ranged from 33.3 million reads to 70.7
million reads. All subsequent analyses were performed on these 18 cleaned and merged
sample files.
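
Merging the two lane files of one replicate amounts to appending one file to the other. The following is a minimal sketch under the assumption of uncompressed, four-line-per-record FASTQ files; the function names are hypothetical.

```python
import shutil
from pathlib import Path


def merge_lanes(lane_paths, merged_path):
    """Concatenate the lane-level FASTQ files of one biological replicate."""
    with open(merged_path, "wb") as out:
        for path in lane_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(merged_path)


def count_fastq_reads(path):
    """Count reads, assuming uncompressed FASTQ with exactly four lines per read."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle) // 4
```

Counting reads before and after merging is a simple check that no records were lost: the merged count should equal the sum of the two lane counts.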
Mapping reads to the genome
In order to identify which genes had been expressed as mRNA in each sample, read
sequences were aligned (or “mapped”) to the latest C. briggsae genome available on
wormbase.org (WS253), which is based on the AF16 (Tropical genotype) genome.
Mapping is an essential step in gene expression analysis as each mRNA transcript is cut
into smaller “reads” to increase the speed and depth with which genomic data can be
sequenced. The short length of the reads and the absence of introns in mRNA present
complications in identifying the locations of origin of reads in the genome, with many
bioinformatics mapping tools using different algorithms to address these challenges. I
evaluated several mapping approaches and settled on STAR as the best match of speed
and quality.
Because each read originated from a single location in the genome, one measure of
mapping efficacy is the proportion of reads in a sample for which a unique location in the
genome has been identified as the origin. I chose the software STAR (Dobin et al. 2013)
for mapping because of its ability to reliably identify unique locations of origin for a
large proportion of reads. As the STAR alignment algorithm is based on using the longest
matching sequence as an anchor from which splicing is inferred, it is important to specify
the number of mismatches that should be allowed between the read and the genome, as
well as the maximum intron size. To account for read sequence variability due to 1)
technical error in library preparation and sequencing and 2) intrinsic differences between
the Tropical and Temperate genotypes, a limit of 10 mismatches was selected. Although a
limit of 10 mismatches can be considered to be lax, I chose this parameter to minimize the
potential bias towards the Tropical genotype that could overestimate the effect of
differential expression due to genotype (Figure 3). The maximum intron size was set at
5000 base pairs which includes 99.6% of all intron annotations in the C. briggsae
reference genome (Figure 2).
Of the 894 million total reads, over 90% mapped to unique locations in every sample except
one, and the average number of reads per sample that mapped to unique locations was
45.9 million reads (Table 2). The third replicate of HK104 at 30°C was the only sample
with a relatively low proportion of reads that mapped uniquely to the genome (73.86%)
(Figure 4). However, because this sample nevertheless contained a substantial number of
reads (35.6 million), comparable to the number of reads in other samples, I retained this
sample in downstream analyses.
Quantifying expression: Counting reads
After locating the origin of each read in the genome, I counted the number of reads that
correspond to each gene to obtain a quantitative measure of expression for each gene.
Gene locations on each chromosome are defined by GFF/GTF (General Feature
Format/General Transfer Format) annotation files. As the publicly available annotation
file for C. briggsae from wormbase.org (version WS253) contains annotations for many
different types of genomic features that are not relevant to this analysis including
deprecated historical gene locations and BLAST matches, a custom annotation file was
parsed from it that consisted only of exons that were annotated on wormbase.org. The
number of reads that mapped to each exon was counted with htseq-count (Anders et al.
2014) and summed over all exons in a gene to give a measure of expression for each gene
in each sample.
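The two steps described above (keeping only exon records, then summing exon counts per gene) can be sketched as follows. This is an illustration under stated assumptions, not the script used for the thesis: the function names are hypothetical, the input is assumed to use the standard nine tab-separated GFF columns, and the source label `"WormBase"` used to pick out curated exons is an assumption about the annotation file.

```python
from collections import defaultdict

def filter_exons(gff_lines, source="WormBase"):
    """Keep only exon records from one annotation source (source is an assumption)."""
    kept = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment and blank lines
        fields = line.rstrip("\n").split("\t")
        # GFF columns: seqid, source, type, start, end, score, strand, phase, attributes
        if len(fields) == 9 and fields[1] == source and fields[2] == "exon":
            kept.append(line)
    return kept

def sum_exon_counts(exon_counts, exon_to_gene):
    """Sum per-exon read counts into one count per gene."""
    gene_counts = defaultdict(int)
    for exon_id, n_reads in exon_counts.items():
        gene_counts[exon_to_gene[exon_id]] += n_reads
    return dict(gene_counts)
```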
Ambiguity in counting reads can arise when genes have alternative splicing isoforms that
consist of some of the same exons and when there are multiple genes that overlap in the
same location. Because it often is impossible to determine the isoform of origin of
individual reads, I neglected alternative splicing isoforms for this analysis such that
separate isoforms were treated as the same gene. To resolve the ambiguity of overlapping
genes, the “mode” parameter in htseq-count was set to “intersection-nonempty”. For
example, if Gene A and Gene B overlap in the genome annotation, this parameter would
count a read as coming from Gene A if the majority of the length of the read was located
within Gene A as opposed to Gene B. However, if a read fell equally within Gene A and
Gene B, it was deemed ambiguous and not counted at all.
There were 23 267 genes in the genome annotation used for this analysis. Among all the
reads whose origin in the genome was successfully identified (mapped reads), 82-85% of
them were assigned successfully to a particular gene in all samples but one. Again, the
third replicate of HK104 at 30°C fared the worst in this analysis, with only 58% of reads
assigned to genes (Figure 5b). However, I continued to retain this replicate for
downstream analyses because the absolute number of reads assigned to genes (24.0
million) was comparable to other samples (Figure 5a). Among the reads that were not
assigned to genes, most of them (9% on average) could not be associated with any exon
or were counted in multiple locations (8% on average) and less than 0.1% were
ambiguous.
Exploratory data analysis of count data
Once I had obtained gene expression counts for genes in each sample, I performed
preliminary data analyses to better understand the overall quality of the data and to
evaluate the chances of being able to detect differential gene expression among samples.
To gauge the degree of similarity between samples and the consistency between
biological replicates in particular, I visualized the gene expression counts in each sample
in a Multi-Dimensional Scaling (MDS) plot. If samples are highly similar, they will have
similar values on both x- and y-axes and therefore cluster together in an MDS plot
(Nikolayeva and Robinson 2014). Most biological replicates showed strong similarity to
each other in an MDS plot (Figure 6). However, the biological replicates for 3
experimental combinations of genotype and temperature showed more variability among
replicates than the other treatments (AF16 at 14°C, HK104 at 20°C, and HK104 at 30°C).
In each of these three cases, one replicate appeared to be set apart while the other two
replicates clustered together well. Nevertheless, these replicates still grouped well with
other samples from the same genotype and so were retained in downstream analyses.
When looking for differential expression across a large number of genes, it is helpful to
perform preliminary tests to determine whether there is a strong enough signal in the data
set to detect significant effects of experimental variables of interest. If there is a strong
signal in the data set, the distribution of the p-values of these preliminary tests will be
heavily right-tailed and have a large peak for values very close to zero (Hung et al. 1997).
Distributions of p-values from preliminary t-tests for the effect of genotype and F-tests
for the effect of temperature on the count data both showed strong peaks near zero and
therefore confirmed that many genes within the data set would be identified as being
significantly differentially expressed (Figures 7a & 7b).
Gene expression quantified as counts of reads per gene does not follow a normal
distribution. Therefore, in order to perform statistical analyses for differential expression
of genes, it is necessary to either use statistical methods that apply a distribution that is
suitable for count data (e.g. negative binomial distribution) or to transform the count data
so that they closely approximate a normal distribution. While there are many statistical
packages for both approaches, I chose to transform the expression counts for this analysis
because one approach in particular, limma with voom (Smyth 2005, Law et al. 2014), has
been shown to be better at controlling Type I error and at detecting true positives than
many popular software packages based on count distributions (Law et al. 2014). Limma
was also more conservative with my data than edgeR (Robinson et al. 2010) (Figure 8).
Because the power of limma depends on the data being normally distributed, it was
important to ensure that the count data were transformed such that they approximated a
normal distribution as closely as possible. Histograms and quantile-quantile plots (or “Q-
Q plots”) are effective ways to visually inspect the normality of data. Histograms of
normally distributed data should approximate a bell shape while data points in Q-Q plots
should fall on the 1-to-1 line. A common transformation to approximate normality with
count data is the log transformation. The histogram of log2 transformation of gene counts
began to approximate a bell shape but revealed a large number of genes with very few to
no counts. To remove genes with extremely low levels of expression I first changed gene
counts to counts per million based on library size using the “cpm” function in edgeR
(Robinson et al. 2010). Only genes with at least 1 cpm in 3 or more libraries (i.e. as many
libraries as there are biological replicates in one treatment) were kept. There were 7068
genes that did not have at least 1 count-per-million across at least 3 samples and were
excluded from further analysis. Once
genes with low to no expression were removed from the data set, I applied the voom
transformation from the limma package to the remaining set of 16 199 genes. The Q-Q
plot showed that the data closely approximated a normal distribution after filtering and
voom-transformation (Figure 9).
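The filtering rule (at least 1 cpm in 3 or more libraries) can be sketched in a few lines. This is not edgeR's `cpm` function itself: the sketch divides by raw library totals only, and the function name `cpm_filter` and the toy counts are hypothetical.

```python
def cpm_filter(counts, min_cpm=1.0, min_libraries=3):
    """Return genes with at least `min_cpm` in at least `min_libraries` libraries.

    `counts` maps gene name -> list of raw read counts, one per library.
    """
    n_libraries = len(next(iter(counts.values())))
    # Library size = total reads counted in each library.
    library_sizes = [sum(row[i] for row in counts.values())
                     for i in range(n_libraries)]
    kept = []
    for gene, row in counts.items():
        cpm = [1e6 * c / size for c, size in zip(row, library_sizes)]
        if sum(value >= min_cpm for value in cpm) >= min_libraries:
            kept.append(gene)
    return kept

# Hypothetical toy data: four libraries of roughly one million reads each.
toy_counts = {
    "gene_a": [1_000_000, 1_000_000, 1_000_000, 1_000_000],
    "gene_b": [10, 10, 10, 0],   # passes: about 10 cpm in three libraries
    "gene_c": [5, 0, 0, 0],      # fails: expressed in one library only
}
kept_genes = cpm_filter(toy_counts)
```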
Analysis for genes with differential expression
I tested the voom-transformed data for the remaining 16 199 genes for differential
expression using the package limma. Expression for each gene was fit to a linear model
(expression ~ strain + temperature + interaction) and tested. The intercept was set as
expression for the Tropical strain at 20°C. p-values were adjusted for multiple testing
using the Benjamini-Hochberg correction and the significance level was set at FDR=0.05
(Benjamini and Hochberg 1995).
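Two pieces of this step lend themselves to a short sketch: how a sample is coded so that the intercept corresponds to the Tropical genotype at 20°C, and how the Benjamini-Hochberg adjustment works. The treatment coding shown here (temperature as a three-level factor with two indicator columns) is an assumption about how the model was parameterized, and both function names are hypothetical; limma performs these steps internally.

```python
def design_row(genotype, temperature):
    """Code one sample for expression ~ genotype + temperature + interaction.

    The reference level (all indicators zero) is the Tropical genotype at 20 C.
    """
    temperate = 1 if genotype == "Temperate" else 0
    cold = 1 if temperature == 14 else 0
    hot = 1 if temperature == 30 else 0
    # Columns: intercept, genotype, cold, hot, genotype:cold, genotype:hot
    return [1, temperate, cold, hot, temperate * cold, temperate * hot]

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A gene is then called significant when its adjusted p-value is at or below the chosen FDR (0.05 here).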
In genes with a significant effect of temperature (either simple or with an interaction),
differential expression can be driven by exposure to cold, exposure to heat, or exposure to
both cold and heat. In order to distinguish which genes were responding to which
temperature conditions, I performed post-hoc tests on the individual temperature
coefficients for genes with a significant effect of temperature. Again, the Benjamini-
Hochberg correction was applied to p-values to maintain the false discovery rate (FDR)
at 0.05.
I then divided genes into categories based on whether they showed significant differential
expression due to genotype only (“genotype genes”), temperature only (“temperature
genes”), genotype and temperature independently (“G&T genes”), an interaction between
genotype and temperature (“GxT genes”), or no differential expression.
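The categories can be expressed as a small decision rule. The sketch assumes that a significant interaction takes precedence over the main effects, which matches how the categories are reported later (none of the genotype, temperature, or G&T genes has a significant interaction); the function name is hypothetical.

```python
def classify_gene(genotype_sig, temperature_sig, interaction_sig):
    """Assign a gene to one expression category from three significance flags."""
    if interaction_sig:
        return "GxT"            # interaction takes precedence (assumption)
    if genotype_sig and temperature_sig:
        return "G&T"            # independent effects of both factors
    if genotype_sig:
        return "genotype"
    if temperature_sig:
        return "temperature"
    return "no differential expression"
```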
Co-expression clustering of genes with similar expression profiles
To illustrate the ways in which expression was changing in response to temperature in the
Temperate versus the Tropical genotype, I performed a co-expression clustering analysis
of gene expression using the Weighted Gene Correlation Network Analysis (WGCNA)
package (Langfelder and Horvath 2008). Co-expression clustering analysis identifies
genes with similar expression patterns across genotypes and temperatures to group them
together. This approach captures genes that change expression in similar ways even if
they are not identified as having individually statistically significant differential
expression. I therefore used co-expression clustering as a complementary approach to
differential expression testing to characterize gene activity in response to temperature.
Because WGCNA works best with normally distributed expression values, I again used
the same voom-transformed expression values for 16 199 filtered genes for this analysis
as for the differential expression analysis. Before performing co-expression analysis, I
determined whether the data were homogeneous or whether the data had a tendency to
cluster non-randomly (e.g. due to batch effects) by performing hierarchical clustering on
the samples. This procedure rejected batch effects but did identify genotype and
temperature as likely sources of heterogeneity in the data (Figure 10). Because the
experiment was designed to test these factors, they represent legitimate sources of
heterogeneity and contribute to biologically interesting expression similarities among
genes.
Because WGCNA defines cluster membership on a graded scale rather than an absolute
one, it is necessary to set a “soft-threshold” to define similarity. The soft threshold is
defined by the “soft-thresholding power”, the exponent that, when similarity values are
raised by it, results in the closest approximation to a scale-free network topology, the
topology that is assumed by WGCNA. To determine the best soft-thresholding power for
our data, I computed the fits (R2 correlation with a scale-free network topology) of a
range of values from 1 to 10, then even numbers from 12 to 42 and determined which
value resulted in the best fit (Table 3). The authors of WGCNA recommend using soft-
thresholding powers that result in fits whose R2 correlation with the theoretical scale-free
topology with the same power is at least 0.8 (Zhang and Horvath 2005). The best fit for
our data corresponded to a soft-thresholding power of 30 (R2 = 0.75) (Figure 11). This
R2 value, although lower than recommended, is likely the result of the biologically
legitimate heterogeneity of our data. The soft-thresholding power of 30 also yielded an acceptable
level of mean connectivity (k = 115), which is again central to the assumptions of the
WGCNA model (Figure 12). I therefore set the soft-thresholding power to 30.
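The effect of the soft-thresholding power can be illustrated with a minimal sketch. It assumes an unsigned network, in which adjacency is the absolute correlation raised to the power; the text does not state the network type, and the function names are hypothetical. The scale-free fit (R2) itself is not computed here.

```python
def candidate_powers():
    """Powers examined: 1 to 10, then even numbers from 12 to 42."""
    return list(range(1, 11)) + list(range(12, 43, 2))

def soft_threshold_adjacency(correlation, power):
    """Unsigned soft-threshold adjacency: |correlation| raised to `power`."""
    return abs(correlation) ** power

def connectivity(correlations_with_other_genes, power):
    """Sum of one gene's adjacencies to every other gene."""
    return sum(soft_threshold_adjacency(r, power)
               for r in correlations_with_other_genes)
```

With a power as high as 30, only very strong correlations contribute appreciably to connectivity: a correlation of 0.9 gives an adjacency of about 0.04, whereas 0.99 gives about 0.74.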
Initial clustering of genes using WGCNA resulted in 124 clusters or “modules” in which
all member genes showed similar patterns of expression (Figures 13 & 14). To
consolidate the number of modules further, I merged similar modules, defined as those
with a correlation of 0.75 or higher with each other. This procedure produced 23 modules
sorted by size (e.g. Module 1 contains the largest number of genes) (Figures 15 & 16).
One module consisted of genes whose expression patterns were not similar to the
expression pattern of any other module and could not be grouped into a co-expression
cluster (Module 0, with 37 genes).
WGCNA generates a composite representative of the expression patterns of all the genes
in a module, the “module eigengene”, which is defined as the first principal component of
the module. Overall patterns of gene expression for all genes in the same
module can be well-characterized by the expression pattern of the module eigengene
(Langfelder and Horvath, 2007). To visualize the expression profiles of each module, I
plotted the module eigengene expression values averaged across the three biological
replicates as a function of rearing temperature for each genotype separately.
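To make the idea of a module eigengene concrete, the sketch below computes the first principal component across samples for a small genes-by-samples matrix using power iteration. It is an illustration only and not WGCNA's implementation: the function names are hypothetical, genes are assumed to be non-constant, and the sign is oriented to agree with the average standardized expression as a convenience.

```python
import math

def standardize(row):
    """Centre one gene's values on 0 and scale to unit variance."""
    n = len(row)
    mean = sum(row) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in row) / (n - 1))
    return [(x - mean) / sd for x in row]

def module_eigengene(expr, n_iter=200):
    """First principal component across samples of a genes x samples matrix."""
    z = [standardize(row) for row in expr]
    n_samples = len(z[0])
    # Sample-by-sample cross-product matrix S = Z^T Z.
    s = [[sum(g[i] * g[j] for g in z) for j in range(n_samples)]
         for i in range(n_samples)]
    # Power iteration. The start vector must not be constant, because every
    # standardized gene sums to zero and S would map a constant vector to 0.
    v = [float(i + 1) for i in range(n_samples)]
    for _ in range(n_iter):
        w = [sum(s[i][j] * v[j] for j in range(n_samples))
             for i in range(n_samples)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Orient the eigengene so it correlates positively with mean expression.
    mean_profile = [sum(g[i] for g in z) / len(z) for i in range(n_samples)]
    if sum(a * b for a, b in zip(v, mean_profile)) < 0:
        v = [-x for x in v]
    return v

# Hypothetical toy module: three genes that all rise across three samples.
toy_module = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [10.0, 11.0, 12.5]]
toy_eigengene = module_eigengene(toy_module)
```

Averaging the eigengene values over the three replicates of each genotype-by-temperature combination then gives one profile per genotype to plot against temperature.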
Gene Ontology analysis
To generate hypotheses about the potential functions of modules comprised of genes that
showed similar expression patterns, I performed statistical overrepresentation tests of
Gene Ontology terms associated with gene lists of each module using PANTHER (Mi et
al. 2009). I tested the list of genes in each module against all four PANTHER lists
available for C. briggsae: PANTHER Pathways, PANTHER GO-slim Molecular
Function, PANTHER GO-slim Biological Process, and PANTHER GO-slim Cellular
Components. P-values were adjusted for multiple testing with the Bonferroni correction
and the significance level was set at p = 0.05.
After I obtained lists of significantly overrepresented GO terms for each module, I used
REVIGO (Supek et al. 2011) to simplify GO terms by combining terms with high
semantic similarity. I also used output from REVIGO to visualize the fold change of the
significant terms as tree maps such that terms that had a higher fold increase in
overrepresentation had a greater area in the tree map.
Heat shock proteins
Because I am investigating the effects of temperature on gene expression, I am especially
interested in understanding whether the heat shock protein genes were differentially
expressed in the experiment and what their patterns of expression were. I identified 27
heat shock protein genes in the C. briggsae genome by querying “hsp” on wormbase.org
and determined whether any of these showed significant differential expression in my
analysis with limma. I then assessed whether any of the significantly differentially
expressed heat shock protein genes were clustered into modules that had over 50%
differentially expressed genes or at least 1000 genes.
After genes with very low expression had been removed from the initial set of 23 267
genes, 24 heat shock protein genes remained for analysis.
Chromosomal domain analysis
Distinct recombination domains of C. briggsae chromosomes separate the tips, arms, and
centres of chromosomes (Ross et al. 2011). The centres of chromosomes have lower rates
of recombination and lower nucleotide polymorphism than the arms (Thomas et al.
2015). I therefore tested whether genes that showed differential expression occurred
preferentially in a particular chromosomal domain. I counted the numbers of each kind
of differentially expressed gene (genotype genes, temperature genes, genotype and
temperature genes, GxT genes, non-differentially expressed genes, modules) in each
chromosomal domain and compared them to the number of genes found in centres and
arms for autosomes and the X chromosome separately because chromosomal
recombination domains are less distinct on the X chromosome. I performed G-tests to
determine whether the proportions of genes in arms versus centres in each group were
statistically different from genome-wide proportions.
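A G-test of this kind can be sketched as a goodness-of-fit test with two categories (arms and centres), comparing observed counts in a gene group against genome-wide proportions. The goodness-of-fit form is an assumption, since the text does not specify the exact layout of the test; the function name and the counts in the usage note are hypothetical.

```python
import math

def g_test_two_categories(observed, expected_proportions):
    """Goodness-of-fit G-test for two categories (1 degree of freedom).

    G = 2 * sum(O * ln(O / E)), with E = total * expected proportion.
    """
    total = sum(observed)
    g = 0.0
    for obs, prop in zip(observed, expected_proportions):
        expected = total * prop
        if obs > 0:  # a zero count contributes nothing to the sum
            g += obs * math.log(obs / expected)
    g *= 2.0
    # Chi-square survival function with 1 df: P(X > g) = erfc(sqrt(g / 2)).
    p_value = math.erfc(math.sqrt(max(g, 0.0) / 2.0))
    return g, p_value
```

For example, a group with 60 genes in arms and 40 in centres, tested against genome-wide proportions of 0.5 and 0.5, gives G of roughly 4.03.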
Results
Characterizing differential gene expression in response to genotype & temperature
Statistical analysis with limma
Over half (54%) of all 16 199 genes that I tested for differential expression with limma
differed significantly in expression across genotypes, temperatures, or both. The majority
of genes that were differentially expressed had a significant interaction between genotype
and temperature (“GxT genes”), comprising 56% of the 8795 differentially expressed
genes. There were 1.3 times more genes with a significant interaction (n=4919) than
genes that were differentially expressed but that did not show an interaction (n=3876)
(Figure 17).
Among genes with differential expression but no interaction, genes with a significant
effect of temperature only (“temperature genes”, n=1987) outnumbered by 2.6-fold those
genes with a significant effect of genotype only (“genotype genes”, n=770). Finally, a
moderate number of genes had significant differential expression that resulted from the
independent effects of genotype and temperature (“G & T genes”, n= 1119). In other
words, 1987 + 1119 = 3106 genes overall showed a significant response to temperature
and 770 + 1119 = 1889 genes showed a significant response to genotype and none of
these genes showed a significant interaction effect (Figure 17).
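The counts reported in this section can be checked against one another with simple arithmetic. The sketch uses only the numbers given in the text; the variable names are for illustration.

```python
# Category counts as reported in the text.
category_counts = {"genotype": 770, "temperature": 1987, "G & T": 1119, "GxT": 4919}
genes_tested = 16199

n_differential = sum(category_counts.values())            # all differentially expressed genes
n_no_interaction = n_differential - category_counts["GxT"]

percent_differential = 100 * n_differential / genes_tested
percent_gxt = 100 * category_counts["GxT"] / n_differential
gxt_to_no_interaction = category_counts["GxT"] / n_no_interaction

# Genes responding to each factor without an interaction.
temperature_responders = category_counts["temperature"] + category_counts["G & T"]
genotype_responders = category_counts["genotype"] + category_counts["G & T"]
```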
Differential expression in response to extreme temperature
Among genes for which a change in temperature caused differential expression,
responses to rearing under cold stress (14°C) differed from responses to rearing under
heat stress (30°C) relative to benign rearing conditions (20°C) in terms of the number of
genes involved, whether genes increased or decreased expression, and in terms of the
magnitude of expression change. For example, for the majority of genes with an effect of
temperature but no interaction (temperature genes and G & T genes), expression changed
significantly in response to rearing under cold stress (2308 out of 3106 genes, 74%)
whereas for GxT genes, the largest number of genes responded significantly to rearing
under heat stress only (2393 out of 4919 genes, 49%) (Figure 18).
However, when considering whether genes increased or decreased expression in response
to chronic cold stress or heat stress, genes with an effect of temperature and no
interaction were represented in similar proportions to GxT genes. There was a bias
towards genes that decreased expression as opposed to increased in response to cold
rearing, both for genes with an effect of temperature and GxT genes (Figure 19: 1.05
times more temperature genes decreased, 1.3 times more GxT genes decreased).
Similarly, in response to rearing under heat stress, there was a bias towards genes that
increased expression as opposed to decreased it for both genes with an effect of
temperature and for GxT genes (Figure 19: 6.8 times more temperature genes increased,
1.2 times more GxT genes increased). This suggests that, among all genes that respond
significantly to temperature in some way, more genes reduced their expression at cool
temperatures and more genes elevated their expression at hot temperatures.
In contrast, the magnitude of expression changes in response to rearing under cold stress
versus heat stress differed for those genes that showed GxT versus those that exhibited an
effect of temperature but no interaction (Figure 20). GxT genes showed no consistency in
the median fold change in expression between genes that were differentially expressed
under cold stress versus heat stress; the magnitude of expression change differed between
the two temperature extremes. Specifically, for GxT genes that increased expression,
chronic heat stress caused a larger magnitude change in expression than chronic cold
stress. Conversely, for GxT genes that decreased expression, chronic cold stress caused a
larger magnitude change in expression than chronic heat stress. The median fold change
in expression of GxT genes that increased expression under heat stress was 8.57 while the
median fold change in expression of GxT genes that increased under cold stress was
3.73. There was a similarly large discrepancy between magnitudes of expression change
under stress at both temperature extremes for GxT genes that decreased expression. For
GxT genes that decreased expression, the median fold change in response to cooling was
9.19 while the median fold change in response to heating was 6.50.
By contrast, for genes with an effect of temperature but no interaction, the magnitude of
increased or decreased expression was similar under chronic exposure to both
temperature extremes. Specifically, gene expression levels showed a similar fold change
regardless of the direction of the temperature change (Figure 20). For instance, the
median fold change of genes that increased expression under cold stress was 6.06 while
the median fold change of genes that increased expression under heat stress was 6.50 for
genes with an effect of temperature but no interaction. Similarly, among these same
genes, the median fold change of genes that decreased expression in response to rearing
under cold stress was 3.03 while the median fold change of genes that decreased
expression under heat stress was 3.73.
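The median fold changes quoted in this section each coincide with 2 raised to a log2 value with one decimal place, which is what one would expect if they were back-transformed from log2 fold changes. That reading is an observation about the numbers and an assumption about how they were derived; the text does not state it. A minimal sketch of the back-transformation:

```python
def fold_change(log2_fold_change):
    """Convert a log2 fold change to a fold change on the linear scale."""
    return 2.0 ** log2_fold_change

# Reported median fold changes keyed by the log2 value that reproduces them.
reported_medians = {3.1: 8.57, 1.9: 3.73, 3.2: 9.19, 2.7: 6.50, 2.6: 6.06, 1.6: 3.03}

reproduced = {log2_fc: round(fold_change(log2_fc), 2) for log2_fc in reported_medians}
```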
Altogether, these results suggest that temperature shifts cause a wider range of expression
responses in terms of the magnitude of expression change for genes with a significant
interaction between genotype and temperature than for genes that show a simple
independent effect of temperature.
Defining co-expression clusters
Distribution of genes in modules
In order to characterize patterns of gene expression change in response to temperature in
both Temperate and Tropical strains, I clustered genes into “modules” based on similarity
of expression. Each gene was placed into one of 23 modules that were defined by co-
expression clustering.
Genes were not equally distributed among the 23 modules produced by WGCNA (Figure
21). The first six modules contained 75% of all 16 199 genes included in this analysis,
with each containing 1000 or more genes. Specifically, Modules 1 and 2 contributed 20%
and 18% of all genes, respectively, while Modules 3 to 6 had 8-10% each. Only three
modules contained fewer than 100 genes (Modules 21, 22, and 0). Module 0 represented
those 37 genes whose expression patterns were not sufficiently similar to any others to
cluster into a well-defined co-expression module.
Genes with differential expression in modules and module eigengene
expression
Cross-referencing module membership with lists of genes whose individual expression
differed significantly across treatments revealed that those genes with differential
expression were not equally distributed among modules (Figure 22). The proportion of
genes that were individually differentially expressed at FDR = 0.05 within each module
ranged from 6% (Module 13) to 84% (Module 15), with a mean proportion of 46%.
Within the six largest modules, Module 5 had the highest proportion of differentially
expressed genes (81%) followed by Modules 1 and 4 (each with 55%), Module 6 (47%),
Module 3 (45%) and Module 2 (38%) (Figure 22).
Although Modules 7 to 22 accounted for only 25% of all genes, several of these smaller
modules showed a striking concentration of genes that were differentially expressed due
to a specific experimental variable (Figure 22). For example, two modules, Module 10
and Module 7, contained the highest proportions of genotype genes that together
comprised 44% of all 770 genotype genes genome-wide. Similarly, two other modules,
Modules 15 and 12, consisted of over 50% temperature genes. However, these two
predominantly temperature gene modules accounted for just 12% of all 1987 temperature
genes. Finally, Modules 9, 14, 16, and 22 showed high proportions of GxT genes whereas
the remaining modules (8, 11, 13, 17, 18, 19, 20, 21) consisted primarily of genes with no
individually significant differential expression.
The expression profile of each module can be represented by its “module eigengene”,
which is defined as the first principal component of each module. In my analysis, module
eigengene plots also illustrate the trends of differentially expressed genes within each
module (Figures 23, 24, 25, 26). For instance, for modules with a high proportion of
genotype genes, the module eigengene expression for each genotype was distinct and did
not change across temperatures (Figure 23). Similarly, for modules with a high
proportion of temperature genes, module eigengene expression was very similar for each
genotype such that the genotype expression profiles overlapped and changed
synchronously across temperatures (Figure 24).
By contrast, modules with an abundance of non-differentially expressed genes did not
show distinct eigengene profiles (Figure 26), with relatively modest differences between
genotypes and temperatures, particularly in small modules (n < 1000). This lack of a
strong pattern in eigengenes of modules with many non-differentially expressed genes
likely reflected a preponderance of expression changes that were too slight to be
statistically distinguishable from random fluctuation on a per-gene basis. Therefore, for
the remainder of my analysis with co-expression clustering, I focused on the six largest
modules and the 8 modules with at least 50% genes that were identified as having
significantly different expression with limma (collectively referred to as “Representative
Modules”).
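The selection rule for Representative Modules (the largest modules plus any module in which at least half of the genes are differentially expressed) can be sketched as a set union. The function name and the toy module sizes and proportions are hypothetical and are not values from the analysis.

```python
def representative_modules(modules, n_largest=6, min_de_fraction=0.5):
    """Select the largest modules plus those enriched for differential expression.

    `modules` maps module name -> (number of genes, fraction differentially expressed).
    """
    by_size = sorted(modules, key=lambda name: modules[name][0], reverse=True)
    largest = set(by_size[:n_largest])
    enriched = {name for name, (_, fraction) in modules.items()
                if fraction >= min_de_fraction}
    return sorted(largest | enriched)

# Hypothetical toy input, using n_largest=2 to keep the example small.
toy_modules = {
    "A": (3000, 0.55),  # large and enriched
    "B": (2500, 0.40),  # large only
    "C": (300, 0.80),   # small but enriched
    "D": (200, 0.10),   # neither
}
toy_selection = representative_modules(toy_modules, n_largest=2)
```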
Among the 14 Representative Modules, the proportion of genotype genes, temperature
genes, G & T genes, GxT genes, and non-differentially expressed genes differed
significantly from genome-wide proportions for all modules (Table 4). Again, genotype
genes were especially common in two modules (Module 7, Module 10), which together
contained nearly half of all genotype genes genome-wide. The 638 genes in Module 7
yielded an eigengene expression pattern of greater expression in the Temperate genotype
whereas Module 10 (338 genes) expression was characteristic of greater expression in the
Tropical (Figure 23). Significantly overrepresented GO terms with the highest fold
enrichment associated with Module 7 such as “GABA receptor activity” and
“acetylcholine receptor activity” suggested that some genes in this module were related to
nervous system function (Supplementary Figure 1). In contrast, GO terms associated with
Module 10 such as “regulation of liquid surface tension” and “homeostatic process”
suggested that many genes in Module 10 were involved in basic physiological processes
(Supplementary Figure 2).
The Representative Modules that showed a preponderance of temperature genes together
consisted of 35% of all temperature genes genome-wide (Modules 6, 12, and 15; Figure
24). Although Modules 12 and 15 only accounted for 12% of all temperature genes, they
each had a very high proportion of temperature genes and each of their module
eigengenes described expression patterns that were strongly characteristic of genes whose
response to temperature is not genotype-specific (Figure 24). For example, genes in
Module 12 (n = 245) had their lowest expression at 14°C and highest expression at 30°C
and there was little to no expression difference between genotypes. For genes in Module
12, the increase in expression between 14°C and 20°C was greater than the increase
between 20°C and 30°C, suggesting that genes in this module are strongly downregulated
at low temperatures (Figure 24). Only one GO term, “chromatin binding”, was
significantly overrepresented in Module 12.
Module 15 was similar to Module 12 in that genes in this module show their highest
expression at 30°C (Figure 24). However, expression for genes in Module 15 differed in
that expression at 14°C and 20°C was equally low; there was only a very slight decrease
in expression from 14°C to 20°C and the greatest increase in expression occurred
between 20°C and 30°C. This suggests that genes in Module 15 are strongly upregulated
when reared under chronic heat stress but are not affected when reared under chronic
cold stress or benign temperatures. However, functions of genes in Module 15 were
unknown as it yielded no significantly overrepresented GO terms, providing little clue as
to whether these heat-sensitive genes act in related functional pathways.
Although 31% of genes in Module 6 were temperature genes, over half of the total
number of genes in this module were also non-differentially expressed, making its
module eigengene somewhat difficult to interpret (Figure 22). However, one general
trend in the Module 6 eigengene is that expression increased in both the Temperate and
Tropical genotypes as temperature increased (Figure 24). GO terms associated with
Module 6 suggest that genes in this module are used in mitochondrial activity
(“mitochondrial transport”, “mitochondrion organization”) and also in translation (“RNA
splicing”, “translation”, “mRNA binding”) (Supplementary Figure 3).
Two Representative Modules, Module 4 and Module 5, had a high proportion of G & T
genes that was also reflected in their module eigengene expression profiles (Figure 22).
These two modules accounted for over half (56%) of all G & T genes genome-wide.
Genes in Module 4 (n = 1592) had higher expression in the Temperate genotype at all
temperatures (Figure 24). Module 4 genes also had their highest expression at 14°C,
which decreased as the temperature reached 20°C. However, as temperatures increased to
30°C, there was no change in expression from 20°C. This expression pattern suggests that
genes in Module 4 exhibit elevated expression at cool temperatures only, and that they
are consistently more strongly expressed in the Temperate genotype than the Tropical.
GO terms associated with Module 4 suggest that several genes in this module are
involved in basic physiological processes, and in the processing of fats in particular
(“lipid transport”, “cholesterol metabolism”, “regulation of liquid surface tension”)
(Supplementary Figure 4).
Expression in Module 5 (n = 1390) was similar to that in Module 4 in that expression was
highest at 14°C (Figure 24). However, in Module 5, expression was higher in the
Tropical genotype at all temperatures as compared to the Temperate strain. While
expression decreased with a shift in temperature from 14°C to 20°C, as in Module 4,
expression rebounded by increasing again strongly from 20°C to 30°C. This pattern
suggests that genes in Module 5 increase expression at extreme temperatures, and that
expression is greater in Tropical than Temperate genotypes. GO terms associated with
Module 5 are mainly indicative of intracellular activity, and of “regulation of
carbohydrate metabolism”, “cellular component morphogenesis”, and “cell proliferation”
in particular, suggesting that genes in this module are involved in the regulation of
cellular activity (Supplementary Figure 5).
The remaining seven modules (1, 2, 3, 9, 14, 16, 22) were composed of a large proportion
of GxT genes, indicating that the majority of genes in these modules had genotype-
specific responses to temperature (Figure 22). These seven modules accounted for 69%
of all GxT genes genome-wide.
Four modules, Modules 1, 2, 3, and 16, were similar in that both genotypes responded to
temperature by changing expression in the same direction (e.g. increasing in response to
heat). However, the interaction between genotype and temperature arose from expression
changing at different rates in the different genotypes. Specifically, the Temperate
genotype showed more drastic expression changes in response to heat in particular. For
example, all genes in Module 1 (n = 3203) have their peak expression at benign
temperatures. While both the Temperate and the Tropical genotype decrease expression
under heat stress, the Temperate genotype does so at a much faster rate (Figure 25). Some
GO terms associated with Module 1 that had the highest fold enrichment included “DNA
repair”, “RNA polymerase II transcription”, “chromatin segregation”, and “double-
stranded DNA binding”, suggesting that many genes in Module 1 interact with DNA, and
particularly in the context of transcription (Supplementary Figure 6).
In contrast to Module 1, genes in Module 2 (n = 2863) had their lowest expression at
20°C and increased expression at extreme temperatures (Figure 25). Again, in Module 2,
the Temperate genotype changed expression more drastically than the Tropical genotype;
expression in the Tropical genotype was lower than the Temperate at all temperatures and
expression only increased slightly at extreme temperatures. Also, the increase in
expression in the Temperate genotype was more drastic from 20°C to 30°C than from
20°C to 14°C. This suggests that genes in Module 2 increase expression in response to
extreme temperatures, but only very slightly in the Tropical genotype. Conversely,
Module 2 genes in the Temperate genotype increase expression more strongly in response
to extreme temperatures, and to heat in particular. Similarly to Module 1, Module 2 had
enriched GO terms that were related to transcription (“RNA polymerase II transcription”)
but also “translation” and “muscle contraction” (Supplementary Figure 7).
Expression in Module 3 (n = 1609) was similar to that in Module 2 in that expression
increased in response to extreme temperatures in both genotypes. Expression was also
higher in the Temperate genotype than the Tropical at all temperatures in Module 3, and
expression increased more drastically in the Temperate genotype than the Tropical.
However, what distinguished Module 3 from Module 2 was that expression in the
Tropical genotype in Module 3 was more similar to the Temperate genotype in response
to cool temperatures (Figure 25). When temperatures shifted from 20°C to 14°C, both the
Temperate and Tropical genotypes increased expression at a similar rate. This suggests
that, for genes in Module 3, genotype-specific expression happens particularly in
response to heat. Some highly enriched GO terms associated with Module 3 were
“neuron-neuron synaptic transmission”, “neurological system process”, “voltage-gated
potassium channel activity”, and “acetylcholine receptor activity”, suggesting that many
genes in Module 3 are involved in nervous system processes (Supplementary Figure 8).
Module 16 (n = 168) was the last GxT module in which the genotype-by-temperature
interaction resulted from a difference in rate of expression change (Figure 25). As in the
previous three modules, the GxT interaction resulted primarily from a drastic expression
change in the Temperate genotype in response to heat. For example, although the
Tropical genotype had higher expression than the Temperate at 14°C and 20°C,
expression in both genotypes remained steady. Both genotypes increased expression from
20°C to 30°C, but the Temperate genotype increased expression much more strongly and
reached the same level of expression as the Tropical at 30°C. This suggests that genes in
Module 16 do not respond to cool temperatures, but increase expression in response to
heat, particularly in the Temperate genotype. However, possible common functions of
genes in Module 16 remain unknown as there were no significantly enriched GO terms
associated with these genes.
The remaining three Representative Modules with a high proportion of GxT genes,
Modules 9, 14, and 22, were similar in that genes within these modules changed
expression in opposite directions in different genotypes in response to temperature
(Figure 25). For instance, when temperatures cooled from 20°C to 14°C, genes in Module
9 (n = 562) increased expression in the Tropical genotype but decreased expression in the
Temperate genotype. The Tropical genotype had its highest expression at 14°C which
declined strongly at 20°C and decreased only slightly at 30°C. In contrast, the Temperate
genotype had its lowest expression at 14°C, then increased expression at 20°C and
decreased slightly at 30°C. This suggests that, at cool temperatures, genes in Module 9
have low expression in Temperate genotypes and high expression in Tropical genotypes
but that expression does not change much relative to benign conditions for either
genotype under chronic heat stress. Many GO terms associated with Module 9 suggested
that these genes were involved in transcription (“regulation of transcription from RNA
polymerase II promoter”, “sequence-specific DNA binding transcription factor activity”,
“nucleic acid binding transcription factor activity”) and “biological regulation” was also
indicated (Supplementary Figure 9).
In Module 14 (n = 228), expression in the Tropical genotype peaked at 20°C while this
same temperature in the Temperate genotype caused expression to reach its lowest point
(Figure 25). In the Tropical genotype, expression decreased in response to extreme
temperatures while in the Temperate genotype, expression increased in response to
extreme temperatures such that the Temperate genotype had higher expression than the
Tropical at both 14°C and 30°C. This suggests that for genes in Module 14, Tropical and
Temperate genotypes have opposite responses to extreme temperatures.
This same opposing pattern of expression across temperatures for the two genotypes was
seen in Module 22 (n = 49). However, expression in Module 22 differed in that
expression in the Temperate genotype was highest at 14°C instead of 30°C as in Module
14 (Figure 25). Changes in expression at extreme temperatures were also more drastic in
Module 22 than in Module 14. This suggests that genes in Module 22 had a stronger
response to extreme temperatures, and that expression increased greatly in the Temperate
genotype in response to cold in particular. Both Modules 14 and 22 were enriched for GO
terms related to translation, translation initiation, and the ribosome, suggesting that genes
in these modules are responsible for the conversion of mRNA transcripts into functional
proteins (Supplementary Figures 10 & 11).
Analysis of heat shock protein genes
Differential expression of hsp genes
Of the 24 hsp genes that I tested for differential expression with limma, 8 showed
significant differential expression. Surprisingly, this proportion (33%) is significantly less
than the proportion of genes with differential expression genome-wide (54%) (G = 4.267,
df = 1, p-value = 0.039). Among the 8 significantly differentially expressed genes, there
were 1 genotype gene, 2 temperature genes, and 5 GxT genes, which is not significantly
different from the distribution of differential expression classes throughout the genome
(Fisher Exact Test, p-value = 0.35). This suggests that while there were fewer hsp genes
with differential expression than expected, no one type of differential expression (e.g. due
to temperature only) was predominant among hsp genes.
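The comparison of hsp genes with the genome-wide proportion can be sketched as a G-test of goodness of fit. This is a minimal sketch under two assumptions that are not stated in the text: that the expected proportion is the genome-wide fraction of differentially expressed genes (8795 of 16 199, as reported in the Discussion), and that no continuity correction is applied. The result should therefore agree only approximately with the reported G = 4.267.

```python
import math

def g_test_two_categories(observed, expected_props):
    """Return (G, p) for a G-test of goodness of fit with two categories (df = 1)."""
    total = sum(observed)
    g = 0.0
    for obs, prop in zip(observed, expected_props):
        expected = total * prop
        if obs > 0:  # a zero count contributes nothing to G
            g += obs * math.log(obs / expected)
    g *= 2.0
    # For df = 1, the chi-square upper tail probability is erfc(sqrt(G / 2)).
    p = math.erfc(math.sqrt(g / 2.0))
    return g, p

# Counts reported in the text.
hsp_de, hsp_total = 8, 24
genome_de, genome_total = 8795, 16199

# Assumption: the genome-wide fraction serves as the expected proportion.
p_de = genome_de / genome_total
g_stat, p_value = g_test_two_categories(
    [hsp_de, hsp_total - hsp_de], [p_de, 1.0 - p_de]
)
print(f"G = {g_stat:.3f}, p = {p_value:.3f}")
```

A test of independence on the full 2 x 2 table of hsp versus non-hsp genes would be an alternative formulation and would give a slightly different statistic.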
Clustering and expression patterns of hsp genes
Among the 24 hsp genes, 15 genes clustered into Representative Modules (Module 2: 3,
Module 3: 5, Module 4: 4, Module 5: 2, Module 7: 1). Of these 15 genes, 8 were in
predominantly GxT modules, 6 were in predominantly G&T modules, and 1 was in a
predominantly genotype module.
Given the module eigengenes of these Representative Modules, expression in 13 of these
15 genes was higher in the Temperate genotype than the Tropical. Four genes had their
highest expression at 14°C, 8 had their highest expression levels at 30°C, 2 genes
increased expression at both temperature extremes, and one gene (genotype gene) did not
change expression across temperatures. No genes were most highly expressed at 20°C,
suggesting that hsp genes that respond to temperature changes (either cooling or heating)
do so by increasing expression. Furthermore, many hsp genes were more highly
expressed in the Temperate genotype than the Tropical.
Analysis of differential expression across chromosomal domains
Chromosomal domains are characterized by distinct recombination rates in C. briggsae
(Ross et al. 2011). Arm regions of chromosomes experience high recombination rates
whereas centre regions have relatively low recombination rates. This pattern is true
across the autosomes, but the X chromosome has a less pronounced difference in
recombination rates between arms and the centre. Consequently, I tested whether
different groups of genes were located preferentially in the arms or centres of
chromosomes, and examined the autosomes and X chromosome separately. Among
groups of differentially expressed genes, Genotype genes and genes with no differential
expression were significantly enriched on the X chromosome whereas G&T and GxT
genes were enriched on autosomes (Figure 31). Among Representative Modules,
Modules 1, 5, 6, and 14 were significantly enriched on autosomes, whereas Modules 3, 4,
9, 10, and 12 were significantly enriched on the X chromosome (Figure 32).
Chromosomal domain enrichment of differentially expressed genes
Across autosomes, Genotype and GxT groups of differentially expressed genes were
significantly enriched in the arms by 1.22- and 1.04-fold, respectively. However, G&T
genes were enriched in centres by 1.15-fold. Temperature genes and genes with no
differential expression were not enriched in either domain (Figure 27, Table 5). By
contrast, no differentially expressed gene groups were enriched in either arm or centre
domains on the X chromosome (Figure 28).
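One way to read a fold-enrichment value such as 1.22 is as the fraction of a gene group located in a domain divided by the fraction of all tested genes located in that domain. The sketch below illustrates that reading only. The definition is an assumption, since the calculation is not spelled out in this section, and every count in it is hypothetical.

```python
def fold_enrichment(group_in_domain, group_total, all_in_domain, all_total):
    """Fraction of a gene group in a domain divided by the background fraction."""
    observed_fraction = group_in_domain / group_total
    background_fraction = all_in_domain / all_total
    return observed_fraction / background_fraction

# Hypothetical counts, chosen only to illustrate the calculation.
autosomal_genes = 10000      # all tested autosomal genes (assumed)
autosomal_arm_genes = 4000   # of which located in arm domains (assumed)
group_genes = 500            # genes in one differential expression group (assumed)
group_arm_genes = 244        # of which located in arm domains (assumed)

fold = fold_enrichment(group_arm_genes, group_genes,
                       autosomal_arm_genes, autosomal_genes)
print(f"arm fold enrichment = {fold:.2f}")
```

Under this definition, a value above 1 indicates overrepresentation in the domain and a value below 1 indicates underrepresentation.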
Chromosomal domain enrichment of genes for Representative Modules
Among the 22 co-expression modules (Module 0 was not included as membership in this
module was not based on expression similarity between genes), 14 modules had
significant enrichment in either arm or centre chromosomal domains on autosomes. Of
these 14 modules, 9 also were Representative Modules containing a large complement of
a particular class of differentially expressed genes (i.e. Temperature only, Genotype only,
GxT, G&T; Modules 1, 3, 6, 9, 10, 12, 14, 15, 16; Figure 29, Table 6). Among the 9
Representative Modules with significant chromosome domain enrichment on autosomes,
3 were enriched in the centres (Modules 3, 6, 14): 2 were predominantly GxT
modules (Modules 3, 14) and the third was composed predominantly of
temperature genes (Module 6). The remaining 6 Representative Modules were enriched
in arm domains, corresponding to temperature (Modules 12 and 15), GxT (Modules 1, 9,
and 16), and genotype (Module 10) modules.
On the X chromosome, three co-expression modules were significantly enriched in either
arm or centre chromosomal domains, and all were Representative Modules (Modules 1,
7, 12, Figure 30). Modules 1 and 12 were enriched on the arms while Module 7 was
enriched in the centre. All three of these modules represented different differential
expression groups: Module 1 was a GxT module, Module 7 a Genotype module, and Module 12 a
Temperature module.
Overall, more modules showed enrichment in arms than in centres for both autosomes
and the X chromosome (Figure 32). All modules that consisted primarily of Genotype
or Temperature genes were significantly enriched in either the arms or centre. Module 6,
a Temperature module, was enriched in the centres whereas Modules 12 and 15 were
enriched in the arms on autosomes. Module 12 was enriched in the arms on both
autosomes and the X chromosome. Similarly, all Genotype modules were enriched in
either the arms or centres. Module 10 was enriched in the arms of autosomes while
Module 7 was enriched in the centre of the X chromosome. Conversely, only 5 of the 7
GxT Representative Modules were significantly enriched in either arm or centre regions.
Among autosomes, three GxT modules were enriched in the arms (Modules 1, 9, and 16)
and two were enriched in the centres (Modules 3 and 14). Module 1 was significantly
enriched in the arms on both autosomes and
the X chromosome. In terms of function, many GO terms associated with Representative
Modules with enrichment in centres were related to translation whereas this was not true
of modules with enrichment in the arms.
Discussion
I tested 16 199 genes for differential expression across three temperatures and between
two genotypes that derived from different latitudes of origin and discovered that over half
of all transcripts (54%) showed differential expression. The majority of differentially
expressed genes showed an interaction between genotype and temperature (4919 of
8795). To characterize the ways in which expression changes with temperature, and to
contrast these patterns between the Temperate and Tropical genotypes, I used
co-expression clustering to identify 22 distinct expression profiles, for 14 of which I
considered the module eigengene to be representative of all genes in the module. Finally, to
investigate the genomic architecture of genes whose expression is controlled by different
variables, I asked whether certain genes were preferentially located in certain regions of
chromosomes and discovered a consistent overrepresentation in arm domains of
genotype-dependent differential expression.
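The proportions quoted in this Discussion follow directly from the reported counts. The short check below uses only numbers given in the text; the variable names are illustrative.

```python
tested = 16199      # genes tested for differential expression
de_genes = 8795     # genes showing any differential expression
gxt_genes = 4919    # genes with a genotype-by-temperature interaction

frac_de = de_genes / tested              # share of tested genes that are differentially expressed
frac_gxt_of_de = gxt_genes / de_genes    # share of differentially expressed genes that are GxT
frac_gxt_of_tested = gxt_genes / tested  # share of all tested genes that are GxT

print(f"differentially expressed: {frac_de:.1%} of tested genes")
print(f"GxT: {frac_gxt_of_de:.1%} of differentially expressed genes")
print(f"GxT: {frac_gxt_of_tested:.1%} of tested genes")
```

The second figure is what makes GxT genes the majority class among differentially expressed genes, and the third corresponds to the 30% compared with other studies below.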
Genes with genotype-by-environment interactions in gene expression
Advances in sequencing technology have enabled the study of gene expression on a
genomic scale, making it possible to investigate the effects of genetics and environmental
conditions on whole organisms. Whole-genome expression studies also make it possible
to quantify genotype-by-environment interactions (GxE) and to shed light on the relative
proportion of genes that show genotype-specific responses to environmental variables.
Interaction effects on expression between genotype and environment can be very
common, with most studies showing over 20% of genes having GxE responses. However,
the proportion of genes with GxE effects tends to be variable. For example, 47% of yeast
transcripts showed strain-condition interactions in response to conditions with different
glucose concentrations (Smith and Kruglyak 2008). In contrast, 59% of genes in C.
elegans showed eQTL-by-environment interactions in response to changes in temperature
(Li et al. 2006) and 21% of genes showed GxE in A. thaliana in response to soil drying
(Des Marais et al. 2012). Similarly, an investigation into the interactions between
temperature and geographic origin in Drosophila melanogaster identified 56 genes with
an interaction out of 1,760 assayed at an FDR of 0.10 (Levine et al. 2011). Results from
my study fall within the wide range seen in other investigations; 30% of all genes tested
showed GxE. However, there are many technical considerations that could affect the
proportion of genes with interactions. For example, some experimental designs include
many genotypes (Des Marais et al. 2012) or instead focus on contrasting two isogenic
genotypes, as was done in my study. Also, microarray experiments might provide
different results than RNA-seq studies (Marioni et al. 2008, Wang et al. 2009).
Comparing results between species may also be difficult due to inherent differences in
ecology that could influence gene expression. For instance, genes involved in
reproduction in an animal that is capable of self-fertilization could feasibly be expressed
in different proportions than genes with similar function in an animal that shows a high
degree of sexual conflict. Furthermore, different proportions of genes could be involved
in behavioural responses in animals compared to chemical responses in plants, making it
difficult to compare proportions of genes with GxE effects across species. Finally, even
within-species comparisons yield different proportions of GxE when testing different
environmental factors. In A. thaliana, for example, cold stress elicits more genotype-by-
environment interactions than drought stress (Lasky et al. 2014).
While numerous variables make it difficult to compare the proportions of interactions
observed, a common pattern among all previous studies investigating GxE in gene
expression is that interactions comprise a minority of genes among all genes that are
differentially expressed. Genes that are differentially expressed due to genotype or
environment generally outnumber those with significant interaction effects. By contrast,
in my study, the highest number of genes showed an interaction between genotype and
temperature, followed by genes that were differentially expressed due to temperature.
While part of this discrepancy may be due to true biological differences between study
organisms, it is again possible that experimental design and other technical factors might
contribute. For example, the environmental change from 14°C to 30°C may elicit a
greater response than the environmental difference between glucose concentrations for
yeast and could result in more power to detect interactions in my study. Furthermore,
gene expression was measured for single genotypes only in this study. When
investigating GxE in the context of local adaptation, it is common to measure expression
for ecotypes, often in natural habitats, that comprise several individuals that may not be
genetically identical (Des Marais et al. 2012). Expression for each ecotype could then
potentially be averaged across several genotypes, dampening stronger reactions and
decreasing the chance of detecting statistically significant interactions.
While averaging expression within ecotypes or across multiple strains (Smith and
Kruglyak 2008) likely gives a more realistic representation of overall patterns, focusing
on isogenic genotypes has the advantage of increasing resolution at the sequence level.
For example, it is possible to link SNPs directly to expression differences because
sequences are identical across all replicates within a genotype. This advantage could be
particularly useful in identifying the sequence changes that are associated with genes that
show an interaction between genotype and environment because it could allow us to
home in on the sequence differences that cause genes to behave differently in different
genotypic backgrounds.
Finally, many studies investigate GxE at only two environmental levels. In contrast, my
study measured expression at three different temperatures. This again could have the
effect of increasing the number of significant interactions identified in my analysis
because a gene could have an interaction at cold temperatures, hot temperatures, or at
both. Studies for which the environmental variable only has two levels have just one
opportunity for an interaction to manifest. If my analysis had only examined either hot or
cold stress, the number of significant interactions observed may well decrease and reflect
similar proportions to other studies of expression GxE.
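The point about three temperature levels can be illustrated with interaction contrasts. With 20°C as the reference, a two-genotype design yields one contrast for cold (14°C) and one for heat (30°C), and a gene can show an interaction in either or both. The sketch below shows only the arithmetic of these contrasts. It is not the limma model used in the analysis, the choice of 20°C as reference is an assumption, and the expression values are hypothetical.

```python
# Hypothetical mean log-expression values for one gene (not taken from the thesis data).
mean_expr = {
    ("Temperate", 14): 5.1, ("Temperate", 20): 5.0, ("Temperate", 30): 7.4,
    ("Tropical", 14): 5.0, ("Tropical", 20): 4.9, ("Tropical", 30): 5.6,
}

def interaction_contrast(expr, stress_temp, benign_temp=20):
    """Difference between genotypes in their response to a temperature shift."""
    temperate_response = expr[("Temperate", stress_temp)] - expr[("Temperate", benign_temp)]
    tropical_response = expr[("Tropical", stress_temp)] - expr[("Tropical", benign_temp)]
    return temperate_response - tropical_response

cold_contrast = interaction_contrast(mean_expr, 14)  # interaction under cold
heat_contrast = interaction_contrast(mean_expr, 30)  # interaction under heat

print(f"cold contrast = {cold_contrast:.2f}, heat contrast = {heat_contrast:.2f}")
```

In this toy gene both genotypes respond identically to cold, so only the heat contrast departs from zero. A study with only two temperature levels would have a single such contrast to test.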
Gene expression responses to chronic cold versus chronic heat stress
Although chronic cold stress and chronic heat stress are both induced by changes in
temperature, the challenges they pose to organisms are qualitatively different (Deutsch et
al. 2008, MacMillan and Sinclair 2011). This is especially true of ectotherms whose
internal temperature fluctuates with ambient temperature (Martin and Huey 2008). Traits
that vary with temperature often do so by following a characteristic thermal performance
curve that approximates the shape of a concave parabola where the peak of the curve
represents the optimal temperature for that trait (Angilletta 2009). While performance
declines when moving away from the optimal temperature in either direction, it does so at
different rates, prompting the hypothesis that cold stress affects organisms in different
ways than heat stress and operates through different mechanisms (Roitberg and Mangel
2016). If this is indeed the case, then I would expect to see very few genes that were
differentially expressed due to both very low and very high temperatures.
The results from my analysis provide support for the idea that cold stress and heat stress
elicit different responses, especially at the level of gene expression. Of the 3106 genes
that changed expression in response to temperature (with no interaction), fewer than 20%
responded to both cold and heat stress while nearly 80% responded to cold alone. This
result is consistent with the idea that one set of genes in particular responds to cold stress.
Interestingly, genes that had a genotype-specific response to temperature again had very
few genes that responded to both cold and heat, but in contrast to temperature genes, the
majority of GxT genes responded to heat alone, indicating that interactions happen more
frequently at increased temperatures. Taken together, these results suggest that most
genes whose expression changes in response to temperature respond in the same way
when temperatures decrease but have genotype-dependent responses when temperatures
increase. This pattern of expression was also borne out in module eigengene expression.
Modules 4, 5, and 6 represented the majority of temperature genes and Modules 4 and 6
in particular showed a change in expression only at decreasing temperatures. Similarly,
Modules 1, 2, and 3 comprised the majority of GxT genes and illustrated similar
expression levels between genotypes at decreased temperatures but disparate expression
levels at increased temperatures.
The differences in cold and heat response could be due to both mechanistic and ecological
factors. For example, it has been suggested that as temperatures decrease from the
thermal optimum, performance is described by the Arrhenius relationship, indicating that
as temperatures cool, performance decreases simply because the reaction rates of
enzymes slow down (Brown et al. 2004, Roitberg and Mangel 2016). In contrast, high
temperatures disrupt diverse essential functions that affect performance in a complex
way. For example, exposure to high temperatures lowers the ability of mitochondria to
process oxygen in aquatic ectotherms (Portner 2010), may compromise immune function
in insects (Karl et al. 2011), and causes neuron death in D. melanogaster due to an
imbalance in ion homeostasis (Robertson and Money 2012). If it is true that lowering
temperatures simply slows temperature-dependent reactions whereas increasing
temperature results in complex instabilities, it could explain my observations of relatively
consistent responses to cold in both Temperate and Tropical genotypes versus the varied
responses to heat.
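The Arrhenius relationship invoked above can be made concrete with a short calculation of how much a single temperature-dependent reaction would slow between 20°C and 14°C. This is a sketch only. The activation energy of 0.65 eV is an assumed, illustrative value and is not taken from this study.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K

def arrhenius_rate_ratio(temp_c, ref_temp_c, activation_energy_ev):
    """Reaction rate at temp_c relative to the rate at ref_temp_c (Arrhenius)."""
    temp_k = temp_c + 273.15
    ref_k = ref_temp_c + 273.15
    exponent = -activation_energy_ev / BOLTZMANN_EV_PER_K * (1.0 / temp_k - 1.0 / ref_k)
    return math.exp(exponent)

# Assumed activation energy (illustrative only).
ACTIVATION_ENERGY_EV = 0.65

ratio_cold = arrhenius_rate_ratio(14.0, 20.0, ACTIVATION_ENERGY_EV)
print(f"rate at 14°C relative to 20°C: {ratio_cold:.2f}")
```

Under this assumption, cooling lowers the rate smoothly and predictably, which is the simple behaviour contrasted in the text with the more complex disruptions caused by heat.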
In spite of Temperate and Tropical strains showing similar expression patterns under cold
stress and distinct patterns under heat stress, I observed phenotypic differences at both
temperature extremes, suggesting that expression is not simply the result of temperature-
dependent enzyme activity. At 14°C, the Temperate genotype has higher fecundity than
the Tropical genotype whereas at 30°C, the Tropical genotype is more fecund than the
Temperate (Prasad et al. 2011). This suggests that the relationship between gene
expression and phenotype at the organism level is not straightforward and that certain
genes influence fitness differently depending on the genotype in which they are expressed. A
possible explanation for this complex relationship is a scenario in which different alleles
fix in different populations because they are each beneficial in their local environments
but have either no effect or negative effects in the other, known as conditional neutrality
and antagonistic pleiotropy, respectively (Anderson et al. 2013). A meta-analysis of QTL
studies in A. thaliana found that antagonistic pleiotropy underlies at least 60% of
instances of GxE (Des Marais et al. 2013), suggesting that antagonistic pleiotropy is very
common, and could be responsible for the pattern observed in our Temperate and
Tropical genotypes.
Chromosomal domains and differentially expressed genes
Given the well-described structure of C. briggsae chromosomes, I wanted to determine
whether there was a relationship between gene expression pattern and physical location
on chromosomes. Like C. elegans, chromosomes in C. briggsae lack centromeres and are
organized into clear domains that are defined by distinct rates of recombination
(Rockman and Kruglyak 2009, Ross et al. 2011). Centre domains have relatively low
rates of recombination while arm domains have much higher rates of recombination
(Cutter and Choi 2010). Arm domains also are the most genetically variable regions, in
terms of both functional and silent site diversity, whereas centre domains have the highest
gene density and lower polymorphism (Thomas et al. 2015). These patterns suggest that
if sequence differences drive differential expression, then the expression of genes that are
primarily located in chromosome centres is likely regulated by trans-acting factors.
Conversely, genes that are differentially expressed that are located in arm regions could
potentially be regulated in cis, given the higher polymorphism in these domains.
Furthermore, genes whose expression is more consistent across environments, or whose
expression changes in a similar manner across environments tend to be cis-regulated
(Smith and Kruglyak 2008) whereas genes whose expression is more variable across
different environments tend to be trans-regulated in C. elegans and in yeast (Li et al.
2006, Smith and Kruglyak 2008). Taken together, these observations suggest the
hypothesis that genes identified as having a significant effect of genotype are more likely
to be cis-regulated and located in arm domains. Additionally, I would expect to see most
Temperature genes in centre regions because they have similar responses in both
genotypes and therefore should have fewer cis-acting regulatory polymorphisms. Finally,
given that variable responses to the environment, especially those that constitute a change
in direction of expression between genotypes (i.e. increase in one and decrease in the
other), are regulated by trans-acting factors (Li et al. 2006, Smith and Kruglyak 2008), I
would expect that most GxT genes would be located in the centres of chromosomes.
Consistent with this hypothesis, my analysis revealed that, on the whole, autosomal genes
that were differentially expressed due to genotype were overrepresented in arm domains
by almost 20%. Contrary to my expectations, GxT genes were also enriched in autosome
arm domains, albeit to a lesser extent (1.04-fold enrichment). Interestingly, G&T genes
were enriched in autosome centres.
The unexpected enrichment of GxT genes in autosome arms may be the result of
considering all GxT genes together, regardless of their expression patterns. For example,
whereas differential expression in Genotype genes can have only two patterns (i.e. higher
expression in Tropical than Temperate or vice versa), numerous distinct expression
profiles can each produce a significant interaction between genotype and temperature.
Indeed, when arm versus centre enrichment for GxT genes was examined separately for each
co-expression module, the results could be better explained. Previous work has shown that
the expression of cis-regulated genes tends to change less across environments, whereas
expression of trans-regulated genes tends to change more across environments and to
show the crossing reaction norms characteristic of GxE in local adaptation (Smith
and Kruglyak 2008, Kawecki and Ebert 2004). For instance, Modules 1, 9, and 16 were
enriched in autosome arms while Modules 3 and 14 were enriched in the centre. Module
eigengenes for the modules enriched in the arms could be interpreted as having regions in
which expression stays relatively consistent, a feature of cis-regulated genes (Smith and
Kruglyak 2008). For example, gene expression in Modules 1 and 16 does not change
drastically between 14°C and 20°C and expression in Module 9 is relatively unchanged
between 20°C and 30°C. Expression patterns between these temperatures are similar to
the expression profiles of Genotype genes, which were also enriched in arm domains. In
contrast, expression of module eigengenes of modules enriched in the centres shows
more drastic changes in expression, consistent with genes regulated in trans (Smith and
Kruglyak 2008). For instance, in Module 14, expression is opposite in the genotypes,
showing the most drastically different expression pattern between Temperate and
Tropical.
The enrichment of G&T genes in autosome centres can also be interpreted as conforming
to expectations. Although these genes have significant effects of Genotype and
Temperature on their expression, the effects of these factors are independent. When G&T
genes are considered as genes that are significantly differentially expressed due to
temperature, it is expected that they be primarily trans-regulated and consequently that
they be enriched in chromosome centres, where polymorphism is low. The difference in
expression between the genotypes that is maintained across temperatures could be caused
by a polymorphism in the promoter of a trans-acting factor that regulates expression in
the same way in both the Temperate and Tropical genotypes.
Less enrichment of gene groups on the X chromosome overall is unsurprising under the
assumption that differences in recombination rates drive differences in gene expression
and function. For example, genes that are specific to germline function are absent on the
X chromosome (Reinke et al. 2000). Gene density and recombination rates are both more
uniform across domains on the X chromosome (Hillier et al. 2007, Andersen et al. 2012),
suggesting that differential expression groups would not be preferentially located in
either the arms or the centre. At the module level, three modules (1, 7, 12) were enriched
in X chromosome domains. However, Modules 1 and 12 show the same pattern of
enrichment in the autosomes, indicating that these two modules are overrepresented in
the arms in all chromosomes. Only Module 7, a Genotype module, was unique in being
enriched in the centre on the X. While it would be unexpected that a Genotype module be
enriched in the centre on an autosome, it is less surprising on the X chromosome where
the domains are less distinct.
My results are therefore consistent with the idea that higher nucleotide polymorphism
creates more functional variation in the form of genetic variation for gene expression.
More enrichment in distinct chromosomal domains was observed in the autosomes,
where regions of polymorphism are most pronounced. Given that these patterns of
enrichment are also consistent with expectations for genes that are regulated by cis- and
trans-acting factors, it would be interesting to further explore the question of gene
regulation with this dataset.
It would be possible to gain insight into the potential cis- and trans-regulation of these
differentially expressed genes by integrating data from my analysis with SNP
information. Using SNP variant data for the Temperate genotype (the C. briggsae
reference genome is based on the Tropical genotype), the SNP density could be
quantified for upstream regions of differentially expressed genes. If the expression of
genotype genes is primarily regulated in cis, then the density of SNPs in promoter regions
ought to be higher than expected. This information would be particularly interesting
given that the Temperate and Tropical genotypes are considered by some researchers to
represent the early stages of speciation (Baird and Stonesifer 2012, Abbott et al. 2013,
Chang et al. 2016). Interspecific differences in gene expression are caused primarily by
differences in cis-regulatory regions whereas intraspecific differences tend to be driven by
trans-regulatory regions (Wittkopp et al. 2008, Tirosh et al. 2009). If genotype genes
have higher SNP density in promoter regions, they could represent genes that are
contributing to expression differences between incipient species.
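The upstream SNP count proposed here could be sketched as follows. This is a minimal illustration in Python, not part of the analysis reported in this thesis: the 1000 bp upstream window, the gene coordinates, and the SNP positions are all hypothetical placeholders. Real input would come from a gene annotation and from variant calls for the Temperate genotype.

```python
from bisect import bisect_left, bisect_right


def upstream_window(start, end, strand, width=1000):
    """Return the 1-based inclusive window `width` bp upstream of a gene.

    The window width is an assumed value, not one taken from the thesis.
    """
    if strand == "+":
        lo, hi = start - width, start - 1
    else:  # genes on the minus strand have their upstream region after `end`
        lo, hi = end + 1, end + width
    return max(lo, 1), hi


def snp_density(sorted_snps, start, end, strand, width=1000):
    """SNPs per kb in the upstream window; `sorted_snps` must be sorted."""
    lo, hi = upstream_window(start, end, strand, width)
    if hi < lo:
        return 0.0
    n_snps = bisect_right(sorted_snps, hi) - bisect_left(sorted_snps, lo)
    return 1000.0 * n_snps / (hi - lo + 1)


# Hypothetical SNP positions on one chromosome and two hypothetical genes.
snps = [150, 400, 900, 1200, 2500, 2600, 2950, 5100]
plus_gene_density = snp_density(snps, 1001, 2000, "+")   # window 1..1000
minus_gene_density = snp_density(snps, 3000, 4100, "-")  # window 4101..5100
print(plus_gene_density, minus_gene_density)
```

Comparing these densities between genotype genes and the genome-wide background would then be a separate statistical step.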
Small RNAs and temperature-sensitive regulation of gene expression
Small RNA sequencing data were also collected at the same time as the mRNA data used
in this study, with the aim of shedding light on the mechanisms that regulate gene
expression in response to chronic temperature stress. Small RNAs are encoded within the
genome and bind to mRNA targets after being transcribed themselves (Claycomb 2012).
Typically, small RNAs silence their target genes by cleaving or binding to transcripts to
prevent subsequent translation into proteins. Certain small RNAs are also sensitive to
temperature, particularly those involved in the maintenance of fertility, such as Piwi-
interacting small RNAs (piRNAs) (Conine et al. 2010, Batista et al. 2008). piRNAs target
foreign genetic sequences such as transgenes and transposons in the germline during
development and are more active at increased temperatures in C. elegans (Batista et al.
2008, Lee et al. 2012). Similar piRNAs are found in many species within Caenorhabditis,
including C. briggsae (Shi et al. 2013, Tu et al. 2015). Given that most genotype-specific
responses are observed under heat stress, it is possible that post-transcriptional regulation
by piRNAs provides another pathway through which different expression patterns are
produced between the Temperate and Tropical genotypes. Future analyses that relate
small RNA expression to the patterns of mRNA expression revealed through this analysis
could help explain genotype-specific gene regulation, particularly under chronic heat
stress.
Conclusion
My analysis of temperature-dependent patterns of gene expression in Temperate and
Tropical populations of C. briggsae revealed several surprising results. For example, both
my differential expression and my co-expression clustering analyses showed that the
response to rearing under cold stress was qualitatively different from the response to
rearing under heat stress. A small proportion of genes responded to both cold stress and
heat stress whereas most Temperature genes responded to cold stress only and most GxT
genes responded to heat stress only. Genotype-specific responses to temperature were
also relatively common throughout the genome, occurring in 30% of all genes tested.
Visualization of module eigengenes of co-expression clusters corroborated my
differential expression results and revealed that the majority of genes that have GxT
responses showed genotype-specific expression only in response to heat stress.
Expression changed in the same direction in both the Temperate and Tropical genotypes, but
under heat stress the change was more drastic in the Temperate genotype than in the
Tropical genotype. Finally, the enrichment of Genotype
genes and GxT genes in the more polymorphic arm domains of autosomal chromosomes
suggests that these genes tend to be regulated by cis-acting factors.
Results from this study point to the potential for future investigations. For example, given
that the Temperate and Tropical genotypes are considered by some to be undergoing
speciation and that most expression differences observed between species are cis-
regulated, it would be interesting to quantify the actual number of SNPs in regions
upstream of the differential expression groups to validate the results from analyses done
at the chromosome scale. In particular, genes with an effect of genotype that have a high
number of SNPs in upstream regions could be identified as likely being regulated in cis
and potentially contributing to expression differences between the incipient species. Such
genes may be considered candidate genes for investigations into the genome architecture
of speciation. Finally, the integration of the mRNA expression data with small RNA
expression data from the same experiment could shed light on patterns of post-
transcriptional regulation.
Table 1. Number of raw and cleaned reads in fastq files from Genome Quebec.
sample raw clean
AF14-1.1 17830362 17168256
AF14-1.2 17782202 17120594
AF14-2.1 26025046 25044050
AF14-2.2 26054157 25056610
AF14-3.1 29963016 29129264
AF14-3.2 29612363 28780806
AF20-1.1 31872995 30736051
AF20-1.2 31763308 30628482
AF20-2.1 34508074 33251105
AF20-2.2 34522693 33241950
AF20-3.1 24025150 23389721
AF20-3.2 23746489 23115267
AF30-1.1 36732913 35418582
AF30-1.2 36617640 35307045
AF30-2.1 19787947 19103142
AF30-2.2 19811944 19115062
AF30-3.1 18967224 18475211
AF30-3.2 18735493 18246983
HK14-1.1 28556436 27483161
HK14-1.2 28482204 27409768
HK14-2.1 24166460 23286124
HK14-2.2 24185031 23286128
HK14-3.1 26809455 26054377
HK14-3.2 26491330 25742077
HK20-1.1 17304708 16698014
HK20-1.2 17251430 16646977
HK20-2.1 27441709 26350515
HK20-2.2 27480811 26366445
HK20-3.1 22694619 22052191
HK20-3.2 22434965 21794800
HK30-1.1 29222885 28223148
HK30-1.2 29132626 28137121
HK30-2.1 22879967 22100957
HK30-2.2 22876949 22086416
HK30-3.1 24915269 24343611
HK30-3.2 24623445 24056254
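As a worked check on how the read counts in Table 1 relate to the mapping percentages in Table 2, the sketch below pools the two lanes of a biological replicate and divides the uniquely mapped count by the pooled clean reads. The relationship is inferred from the numbers rather than stated in the text, so treat it as an assumption. Only two replicates are copied in, and the function names are illustrative.

```python
# (raw, clean) read counts copied from Table 1 for two biological replicates,
# each sequenced across two lanes (suffixes .1 and .2).
TABLE1 = {
    "AF14-1.1": (17830362, 17168256),
    "AF14-1.2": (17782202, 17120594),
    "HK30-3.1": (24915269, 24343611),
    "HK30-3.2": (24623445, 24056254),
}

# Uniquely mapped read counts copied from Table 2 for the same replicates.
UNIQUELY_MAPPED = {"AF14-1": 32007586, "HK30-3": 35632338}


def retained_fraction(raw, clean):
    """Fraction of raw reads that survive cleaning."""
    return clean / raw


def pooled_clean(replicate):
    """Sum clean reads over the lanes that belong to one biological replicate."""
    return sum(
        clean
        for name, (_, clean) in TABLE1.items()
        if name.rsplit(".", 1)[0] == replicate
    )


def percent_uniquely_mapped(replicate):
    """Uniquely mapped reads as a percentage of pooled clean reads."""
    return 100.0 * UNIQUELY_MAPPED[replicate] / pooled_clean(replicate)


for rep in UNIQUELY_MAPPED:
    print(rep, pooled_clean(rep), round(percent_uniquely_mapped(rep), 2))
```

For both replicates the computed percentage agrees with the value listed in Table 2 (93.35% and 73.62%).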
Figure 1. Number of reads from fastq files for each data file (each biological replicate was sampled across 2 lanes) before and after cleaning with Trimmomatic. Replicates that begin with “AF” denote Tropical genotypes and “HK” denotes Temperate genotypes.
[Plot axes: x, data files AF14-1.1 to HK30-3.2; y, Reads (10^6), 0 to 40; legend: raw, clean.]
Figure 2. Distribution of intron lengths in C. briggsae reference genome (WS253). Counts left of the red line represent 99% of all introns.
[Plot axes: x, log2 intron length (bp), 0 to 15; y, Count (millions), 0 to 5.]
Figure 3. Ratio of average number of uniquely mapped reads in Tropical and Temperate Genotypes. To minimize the bias towards reads mapped from the reference genotype, up to 10 mismatches were allowed per read.
[Plot axes: x, Number of Mismatches, 0 to 10; y, Ratio of Tropical to Temperate avg. unique reads mapped, 1.06 to 1.18.]
Table 2. Number and percentage of reads that mapped to unique locations (i.e. one location in the genome) with STAR.
Sample Uniquely Mapped Reads % Uniquely Mapped Reads
AF14-1 32007586 93.35%
AF14-2 47012080 93.84%
AF14-3 54523984 94.15%
AF20-1 57787850 94.17%
AF20-2 62122726 93.43%
AF20-3 43596994 93.75%
AF30-1 66295835 93.74%
AF30-2 35618842 93.20%
AF30-3 33087508 90.10%
HK14-1 51264144 93.39%
HK14-2 43616826 93.65%
HK14-3 48035023 92.74%
HK20-1 30262908 90.76%
HK20-2 49528582 93.95%
HK20-3 41332043 94.26%
HK30-1 51537189 91.44%
HK30-2 40748065 92.22%
HK30-3 35632338 73.62%
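The two summaries drawn from these mapping results can be reproduced with a short sketch: the Tropical-to-Temperate ratio of average uniquely mapped reads, and the check that the sample with the lowest mapping percentage still has a comparable absolute count. The values are copied from Table 2. Treating this table as the 10-mismatch setting is an assumption based on the Figure 4 caption, and the variable names are illustrative.

```python
from statistics import mean

# sample: (uniquely mapped reads, % uniquely mapped), copied from Table 2.
UNIQUELY_MAPPED = {
    "AF14-1": (32007586, 93.35), "AF14-2": (47012080, 93.84),
    "AF14-3": (54523984, 94.15), "AF20-1": (57787850, 94.17),
    "AF20-2": (62122726, 93.43), "AF20-3": (43596994, 93.75),
    "AF30-1": (66295835, 93.74), "AF30-2": (35618842, 93.20),
    "AF30-3": (33087508, 90.10), "HK14-1": (51264144, 93.39),
    "HK14-2": (43616826, 93.65), "HK14-3": (48035023, 92.74),
    "HK20-1": (30262908, 90.76), "HK20-2": (49528582, 93.95),
    "HK20-3": (41332043, 94.26), "HK30-1": (51537189, 91.44),
    "HK30-2": (40748065, 92.22), "HK30-3": (35632338, 73.62),
}


def genotype_mean(prefix):
    """Mean uniquely mapped reads over samples whose name starts with `prefix`."""
    return mean(
        count for name, (count, _) in UNIQUELY_MAPPED.items()
        if name.startswith(prefix)
    )


# "AF" = Tropical genotype, "HK" = Temperate genotype.
tropical_to_temperate = genotype_mean("AF") / genotype_mean("HK")

# Sample with the lowest percentage of uniquely mapped reads.
lowest_pct_sample = min(UNIQUELY_MAPPED, key=lambda s: UNIQUELY_MAPPED[s][1])

# Is its absolute count inside the range spanned by the other samples?
other_counts = [c for s, (c, _) in UNIQUELY_MAPPED.items() if s != lowest_pct_sample]
within_range = (
    min(other_counts) <= UNIQUELY_MAPPED[lowest_pct_sample][0] <= max(other_counts)
)

print(round(tropical_to_temperate, 3), lowest_pct_sample, within_range)
```

The ratio comes out near 1.10, inside the range shown on the y-axis of Figure 3. HK30-3 has the lowest percentage, but its read count falls within the range of the other samples.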
Figure 4. Percentage of uniquely mapped reads with STAR per biological replicate. Maximum mismatch rate was set at 10. “AF” denotes Tropical genotypes and “HK” denotes Temperate genotypes. Although sample HK30-3 had a lower proportion of uniquely mapped reads, the absolute number of uniquely mapped reads was comparable and so the sample was retained for downstream analysis.
[Plot axes: x, Biological Replicate; y, % uniquely mapped reads, 0% to 100%.]
Figure 5. a) Number of reads counted with htseq-count by biological replicate. “AF” = Tropical genotype, “HK” = Temperate genotype. b) Percentage of reads counted. Although the percentage of reads counted in HK30-3 was low, the sample was kept for downstream analysis because the number of reads counted was comparable.
[Panel a axes: x, Biological Replicates; y, Reads (x 10^6), 0 to 80; legend: Not Counted, Counted. Panel b axes: x, Biological Replicates; y, % Reads, 0% to 100%; legend: Not Counted, Counted.]
Figure 6. Multi-dimensional scaling (MDS) plot of filtered, normalized, and log-transformed count data. Different colours represent experimental groups (e.g. Temperate at 20°C). “AF” = Tropical genotype and “HK” = Temperate genotype. The x-axis represents the dimension that explains the largest proportion of variation and the y-axis represents the dimension that explains the second largest proportion of variation. Samples that lie close together in the plot are more similar to each other, so tight clustering of biological replicates indicates consistency across replicates.
[Plot axes: x, Leading logFC dim 1, -4 to 4; y, Leading logFC dim 2, -2 to 4; points labelled by biological replicate, AF14-1 to HK30-3.]
Figure 7. a) Distribution of p-values for t-tests for effect of strain, b) distribution of p-values for F-tests for effect of temperature. Heavily right-skewed distributions for both indicate that there is a good possibility of identifying significant effects for strain and temperature in many genes.
[Both panels: x-axis, p-values, 0.0 to 1.0; y-axis, No. of genes (1000s), 0 to 7.]
Figure 8. Analysis for differential expression with edgeR (negative binomial distribution) versus limma (log-transformed counts distribution). edgeR = blue, limma = red. Limma is more conservative in all tests.
[Plot axes: x, Differential Expression Group (GxT, G only, T only, G&T, no DE); y, No. genes (1000s), 0 to 8.]
Figure 9. Quantile-quantile plot for normalized, voom-transformed count data shows that data approximate a normal distribution (represented by the red line) and are suitable for analysis with the limma package.
Figure 10. After filtering out genes with very low to no counts, TMM normalization for different library sizes, and log-transforming count data, a dendrogram reveals similarity of samples within strain and within temperature but not replicate. This suggests that the data are heterogeneous (i.e. from different sources) but for reasons of experimental design and not batch effects.
Figure 11. Analysis of soft-thresholding powers revealed 30 to be the power at which the scale-free fit is maximized (R² = 0.75), i.e. the power at which the network most closely approximates a scale-free network.
Figure 12. Analysis of a range of soft-thresholding powers revealed 30 to be the number at which the scale-free fit is maximized and that has a mean connectivity of at least 100 (k = 115).
Table 3. Values for each soft threshold power that was tested. Ideal powers have an
R-squared value close to 1, a slope close to -1, and a mean k (connectivity) over 100.
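To illustrate why the choice of soft-thresholding power trades off against mean connectivity, the sketch below raises pairwise correlations to a power and averages the resulting connection strengths. It assumes the unsigned adjacency form |cor|^power; this section does not state whether a signed or unsigned network was used, and the 3 x 3 correlation matrix is a hypothetical toy example rather than data from this study.

```python
def adjacency(cor, power=30):
    """Soft-thresholded connection strength (assumed unsigned form |cor|**power)."""
    return abs(cor) ** power


def mean_connectivity(cor_matrix, power):
    """Average over genes of the summed adjacency to all other genes."""
    n = len(cor_matrix)
    per_gene = [
        sum(adjacency(cor_matrix[i][j], power) for j in range(n) if j != i)
        for i in range(n)
    ]
    return sum(per_gene) / n


# Hypothetical pairwise correlations among three genes.
toy_cor = [
    [1.0, 0.9, 0.5],
    [0.9, 1.0, -0.8],
    [0.5, -0.8, 1.0],
]

for power in (6, 12, 30):
    print(power, mean_connectivity(toy_cor, power))
```

At a power of 30, a correlation of 0.9 retains a connection strength of about 0.04 while a correlation of 0.5 falls below 1e-9. This is why a connectivity floor such as the mean k of 100 in the caption is needed when selecting high powers.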
Figure 13. Clustering with WGCNA produced 124 modules based on expression similarity of 16 199 genes. One module (Module 0) contained 37 genes whose expression patterns were not sufficiently similar to be placed in any module. The soft thresholding power that determines similarity between genes was set at 30.
Figure 14. Clustering of 16 199 genes with WGCNA was also visualized as a heatmap in which red represents maximum similarity and blue no similarity. Large blocks of red indicate there is also similarity between clusters. Each colour on the x- or y-axis represents a co-expression module.
Figure 15. After merging modules with a <0.25 distance, 23 modules remained, including Module 0. The dendrogram and heatmap echo module similarity patterns across the initial 124 modules.
Figure 16. The heatmap of clustering of 16 199 genes with WGCNA after merging modules with a distance between them of less than 0.25. The clustering pattern of similar modules that was seen in the initial heatmap of 124 modules is retained after merging. Each colour on the x- or y-axis represents a co-expression module.
Figure 17. Differential expression analysis with limma showed that over half (54%) of all genes were significantly differentially expressed. Of these genes, the majority had significant interaction effects (GxT) (FDR= 0.05). “G&T” genes showed significant effects of genotype and temperature independently whereas “Genotype” genes and “Temperature” genes were significantly differentially expressed for those variables alone.
Figure 18. The proportion of differentially expressed genes that respond to cold stress versus heat stress reveals that the majority of genes with an effect of temperature respond to cold stress whereas most of the genes with an interaction respond to heat stress. G&T genes are included in this figure as Temperature genes because their significant effect of temperature is independent of their effect of genotype.
Figure 19. The proportion of genes that increase or decrease expression in response to either cold stress (light blue line) or heat stress (red line) shows that roughly equal proportions increase and decrease expression, especially for Temperature genes. Again, more Temperature genes (G&T and Temperature genes) respond to cold stress whereas more GxT genes respond to heat stress.
Figure 20. Contrasting the magnitude of expression change under chronic cold stress (light blue line) and chronic heat stress (red line) shows that for genes with a significant independent effect of temperature (Temperature genes, G&T genes), the magnitude of change in expression is similar under both types of extreme temperature stress. However, more variation is seen in the magnitude of expression change between cold stress and heat stress for genes with a significant interaction. For example, for GxT genes, the magnitude of expression increase is greater under heat stress than under cold stress.
Figure 21. Co-expression clustering of 16 199 genes by expression similarity with WGCNA resulted in 22 modules ordered by size. Module 0 is the module that contains genes whose expression patterns were not sufficiently similar to other genes to be placed in a module. Cross-referencing module membership with the results of my differential expression analysis revealed that differentially expressed genes are not distributed equally among the modules.
Figure 22. The 23 modules that resulted from co-expression clustering of 16 199 genes have different proportions of differentially expressed genes. Modules with > 1000 genes and/or with > 50% differentially expressed genes were retained for further analysis.
Figure 23. Module eigengene plots of normalized, log2-transformed expression across temperature treatments for Modules 7 and 10, Representative Modules with a large proportion of genes that were differentially expressed due to Genotype, show that genes in Module 7 are expressed more in the Temperate genotype (blue) whereas genes in Module 10 are expressed more in the Tropical genotype (red).
Figure 24. Module eigengene plots of normalized, log2-transformed expression across temperature treatments for Modules 4, 5, 6, 12, and 15, Representative Modules with a large proportion of genes that were differentially expressed due to Temperature. Genes in Modules 6, 12, and 15 increase expression in response to rearing under heat stress. Genes in Modules 4 and 5, modules with a large proportion of G&T genes, increase expression in response to rearing under cold stress. Genes in Module 4 are expressed more in the Temperate genotype (blue) whereas genes in Module 5 are expressed more in the Tropical genotype (red).
Figure 25. Module eigengene plots of normalized, log2-transformed expression across temperature treatments for Modules 1, 2, 3, 9, 14, 16, and 22, Representative Modules with a large proportion of genes that showed significant interactions between genotype and temperature. Genes in Modules 1, 2, 3, and 16 show similar patterns of increase and decrease in expression at each temperature, but the Temperate (blue) genotype changes expression more drastically when reared under chronic heat stress. A minority of genes from Modules 14 and 22 show opposite patterns of expression between Temperate and Tropical (red) genotypes.
Figure 26. Module eigengene plots of normalized, log2-transformed expression across temperature treatments for Modules with fewer than 1000 genes and less than 50% differentially expressed genes. Expression patterns in these non-Representative Modules are less distinct between Temperate (blue) and Tropical (red) genotypes and across temperatures. Crossing over of expression patterns between genotypes where the difference in slopes is not pronounced indicates low power to detect differential expression for genes in these modules.
Table 4. G-test p-values for tests of whether the proportion of differentially expressed genes in a module differed significantly from genome-wide proportions (p = 0.05, Bonferroni adjusted). All modules are significantly different except for Module 0 (membership in Module 0 is not based on expression similarity).
Module G-test p-value adj. p-value
0 0.006774373 0.155810568
1 2.69E-36 6.19E-35
2 3.67E-143 8.45E-142
3 2.06E-48 4.73E-47
4 1.66E-72 3.81E-71
5 1.34E-138 3.08E-137
6 8.79E-132 2.02E-130
7 1.39E-124 3.19E-123
8 9.32E-92 2.14E-90
9 3.47E-11 7.99E-10
10 3.52E-108 8.10E-107
11 5.35E-27 1.23E-25
12 1.24E-60 2.86E-59
13 8.84E-58 2.03E-56
14 1.24E-05 0.000284691
15 1.23E-46 2.83E-45
16 6.15E-08 1.41E-06
17 1.60E-08 3.68E-07
18 7.40E-17 1.70E-15
19 1.01E-21 2.33E-20
20 2.53E-08 5.81E-07
21 1.75E-14 4.01E-13
22 2.34E-08 5.39E-07
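The Bonferroni adjustment in this table can be reproduced by multiplying each raw p-value by the number of tests (23 modules, including Module 0) and capping the result at 1. The sketch below applies this to three rows copied from Table 4. The function name is illustrative, and small discrepancies from the table are expected because the tabulated raw p-values are rounded.

```python
def bonferroni(p_values, n_tests=None):
    """Bonferroni-adjusted p-values: p * number of tests, capped at 1."""
    m = len(p_values) if n_tests is None else n_tests
    return [min(1.0, p * m) for p in p_values]


# Raw G-test p-values copied from Table 4 for Modules 0, 1 and 14.
raw = {"0": 0.006774373, "1": 2.69e-36, "14": 1.24e-05}

# 23 tests were performed in total (Modules 0 to 22).
adjusted = dict(zip(raw, bonferroni(list(raw.values()), n_tests=23)))

for module, p_adj in adjusted.items():
    print(module, p_adj, "significant" if p_adj < 0.05 else "not significant")
```

Module 0 has a raw p-value below 0.05 but an adjusted p-value of about 0.156, which is why it is the only module reported as not significantly different.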
Table 5. G-test p-values from a test to determine whether the proportion of genes located in chromosome arms or centres differed from expected proportions for each differential expression group (p = 0.05, Bonferroni adjusted).
DE group G-test p-value adj. p-value
T only 0.144735537 0.723677687
G only 1.13E-07 5.66E-07
G&T 2.35E-05 0.00011734
GxT 0.005203834 0.026019169
noDE 0.132138705 0.660693526
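A G-test of this kind compares observed counts in two categories (arm and centre) against the counts expected from reference proportions. The sketch below is a minimal goodness-of-fit version with one degree of freedom, assuming that is the form used here. The observed counts and expected proportions are hypothetical, not values from this study.

```python
import math


def g_statistic(observed, expected_props):
    """G = 2 * sum(O * ln(O / E)), with E derived from expected proportions."""
    total = sum(observed)
    g = 0.0
    for obs, prop in zip(observed, expected_props):
        if obs > 0:  # a zero count contributes nothing to the sum
            g += 2.0 * obs * math.log(obs / (total * prop))
    return max(g, 0.0)  # guard against tiny negative values from rounding


def p_value_1df(g):
    """Upper-tail chi-square probability with 1 degree of freedom."""
    return math.erfc(math.sqrt(g / 2.0))


# Hypothetical example: 620 genes in arms and 380 in centres, against an
# assumed genome-wide expectation of 55% arms and 45% centres.
observed = [620, 380]
expected_props = [0.55, 0.45]

g = g_statistic(observed, expected_props)
print(g, p_value_1df(g))
```

With two categories the test has one degree of freedom, so the chi-square tail probability can be written with the complementary error function and no external statistics library is needed.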
Table 6. G-test p-values from a test to determine whether the proportion of genes located in autosome arms or centres differed from expected proportions for each of the 22 modules identified by co-expression clustering (FDR = 0.05, Benjamini-Hochberg correction).
Module G-test p-value adj. p-value
1 0.012177946 0.022326235
2 0.544743083 0.630755149
3 0.009617524 0.021158554
4 0.152170221 0.209234054
5 0.768639623 0.805946524
6 6.28E-18 1.38E-16
7 0.995516473 0.995516473
8 1.37E-05 6.03E-05
9 0.02862504 0.044982206
10 1.86E-11 1.37E-10
11 0.116978603 0.171568618
12 8.24E-05 0.000258911
13 8.16E-14 8.98E-13
14 0.011301327 0.022326235
15 0.002961166 0.008143206
16 1.87E-05 6.85E-05
17 0.00545346 0.01333068
18 0.377099659 0.460899583
19 0.769312591 0.805946524
20 0.014375624 0.024327979
21 2.54E-09 1.40E-08
22 0.265232617 0.34324221
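The Benjamini-Hochberg adjustment used for this table can be sketched in a few lines: each p-value is multiplied by the number of tests divided by its rank, and a running minimum from the largest p-value downwards keeps the adjusted values monotonic. The raw p-values below are copied from Table 6 in module order (Modules 1 to 22), and the function name is illustrative.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value is no larger than the adjusted value of the next-ranked test.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Raw G-test p-values for Modules 1 to 22, copied from Table 6.
raw = [
    0.012177946, 0.544743083, 0.009617524, 0.152170221, 0.768639623,
    6.28e-18, 0.995516473, 1.37e-05, 0.02862504, 1.86e-11, 0.116978603,
    8.24e-05, 8.16e-14, 0.011301327, 0.002961166, 1.87e-05, 0.00545346,
    0.377099659, 0.769312591, 0.014375624, 2.54e-09, 0.265232617,
]

adjusted = benjamini_hochberg(raw)
significant = [module for module, p in enumerate(adjusted, start=1) if p < 0.05]
print(significant)
```

The running minimum explains why Modules 5 and 19 share the adjusted value 0.805946524 and Modules 1 and 14 share 0.022326235 in the table.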
Figure 27. The proportion of genes in each differential expression category that are located in either arm or centre chromosomal domains on autosomes (chromosomes I – V). * indicates significant enrichment (p = 0.05, Bonferroni adjustment).
Figure 28. The proportion of genes in each differential expression category that are located in either arm or centre chromosomal domains on the X chromosome. No groups were significantly enriched in arms or the centre (p = 0.05, Bonferroni adjustment).
Figure 29. The proportion of genes in each of the Representative Modules that are located in either arm or centre chromosomal domains on the autosomes. Blue = Temperature Modules, red = Genotype Modules, purple = G&T modules, orange = GxT modules. * indicates significant enrichment (FDR = 0.05).
Figure 30. The proportion of genes in each of the Representative Modules that are located in either arm or centre chromosomal domains on the X-chromosome. Blue = Temperature Modules, red = Genotype Modules, purple = G&T modules, orange = GxT modules. * indicates significant enrichment (FDR = 0.05).
Table 7. Table of G-test p-values for tests of whether the proportions of genes on autosomes and the X-chromosome were significantly different from expectations for each differential expression group (p = 0.05, Bonferroni adjusted).
DE group G-test p-value adj. p-value
T only 0.766174012 1
G only 0.002842788 0.014213941
G&T 0.001210832 0.006054162
GxT 0.000236965 0.001184823
no DE 0.002134383 0.010671915
Table 8. Table of G-test p-values for tests of whether the proportions of genes on autosomes and the X-chromosome were significantly different from expectations for each co-expression module (FDR = 0.05, BH correction).
Module G-test p-value adj. p-value
1 1.11E-36 8.17E-36
2 0.058403179 0.071381663
3 4.32E-17 1.58E-16
4 1.85E-38 2.04E-37
5 2.31E-30 1.27E-29
6 8.17E-39 1.80E-37
7 0.010162859 0.013973931
8 0.039370389 0.050949915
9 4.70E-28 2.07E-27
10 8.20E-05 0.000180384
11 0.002065337 0.003029161
12 1.05E-13 3.29E-13
13 0.000638395 0.001170391
14 0.001346515 0.002115951
15 0.285415181 0.313956699
16 0.121207929 0.140346023
17 1.96E-05 4.79E-05
18 0.000107935 0.000215869
19 0.415392993 0.435173612
20 0.000718097 0.001215242
21 1.09E-06 3.00E-06
22 0.758338434 0.758338434
Figure 31. The proportion of genes on autosomes and the X-chromosome for each differential expression group. The dotted red line indicates the expected proportion. * indicates significant enrichment on either the autosomes or the X-chromosome (p = 0.05, Bonferroni adjusted).
Figure 32. The proportion of genes on autosomes and the X-chromosome for each co-expression module. The dotted red line indicates the expected proportion. * indicates significant enrichment on either the autosomes or the X-chromosome (p = 0.05, Bonferroni adjusted). – under the module name indicates a Representative Module.
Supplementary Figure 1. Treemap generated with REVIGO from significantly overrepresented molecular function GO terms for Module 7 (Genotype module).
[Treemap m7_mf, GO terms shown: acetylcholine receptor activity; GABA receptor activity; ligand-gated ion channel activity; nucleic acid binding.]
Supplementary Figure 2. Treemap generated with REVIGO from significantly overrepresented biological process GO terms for Module 10 (Genotype module).
[Treemap m10_bp, GO terms shown: metabolic process; homeostatic process; regulation of liquid surface tension; cell communication; metabolism.]
Supplementary Figure 3. Treemap generated with REVIGO from significantly overrepresented biological process GO terms for Module 6 (Temperature module).
[Treemap m6_bp, GO terms shown: mitochondrial transport; protein localization; protein targeting; cellular component biogenesis; mitochondrion organization; nitrogen compound metabolic process; primary metabolic process; nucleobase-containing compound metabolic process; oxidative phosphorylation; protein metabolic process; RNA metabolic process; RNA splicing; rRNA metabolic process; translation; neurological system process; sensory perception of smell; single-multicellular organism process; system process; anatomical structure morphogenesis; behavior; biological regulation; cell communication; cellular component organization or biogenesis; developmental process; immune system process; metabolism; multicellular organismal process; nitrogen compound metabolism; response to stimulus; sulfur compound metabolism.]
Supplementary Figure 4. Treemap generated with REVIGO from significantly overrepresented biological process GO terms for Module 4 (G&T module).
[Treemap m4_bp, GO terms shown: cellular amino acid metabolic process; cholesterol metabolic process; DNA metabolic process; lipid metabolic process; proteolysis; cation transport; exocytosis; ion transport; lipid transport; protein transport; receptor-mediated endocytosis; transport; vesicle-mediated transport; vitamin transport; anatomical structure morphogenesis; cellular component morphogenesis; ectoderm development; mesoderm development; organelle organization; primary metabolic process; homeostatic process; regulation of catalytic activity; regulation of liquid surface tension; regulation of molecular function; behavior; cell communication; cell-cell signaling; synaptic transmission; sensory perception of smell; single-multicellular organism process; system process; visual perception; biological adhesion; biological regulation; cell-matrix adhesion; cellular process; cholesterol metabolism; developmental process; immune system process; localization; metabolism; multicellular organismal process; primary metabolism; response to external stimulus; response to stimulus.]
Supplementary Figure 5. Treemap generated with REVIGO from significantly overrepresented biological process GO terms for Module 5 (G&T module).
[Treemap m5_bp, GO terms shown: carbohydrate metabolic process; anatomical structure morphogenesis; cellular component morphogenesis; cellular component organization; mitotic nuclear division; response to stress; DNA metabolic process; mRNA processing; nucleobase-containing compound metabolic process; phosphate-containing compound metabolic process; protein metabolic process; protein phosphorylation; regulation of transcription from RNA polymerase II promoter; RNA metabolic process; biosynthetic process; primary metabolic process; glycogen metabolic process; polysaccharide metabolic process; regulation of carbohydrate metabolic process; neurological system process; single-multicellular organism process; system process; carbohydrate metabolism; cell proliferation; developmental process; immune response; immune system process; multicellular organismal process; primary metabolism; regulation of carbohydrate metabolism; response to stimulus.]
Supplementary Figure 6. Treemap generated with REVIGO from significantly overrepresented biological process GO terms for Module 1 (GxT module).
[Treemap m1_bp, GO terms shown: cell communication; cell cycle; chromatin organization; mitotic nuclear division; organelle organization; cellular protein modification process; DNA metabolic process; DNA repair; nucleobase-containing compound metabolic process; protein metabolic process; regulation of nucleobase-containing compound metabolic process; regulation of transcription from RNA polymerase II promoter; RNA metabolic process; RNA splicing; transcription from RNA polymerase II promoter; translation; nitrogen compound metabolic process; primary metabolic process; anion transport; cation transport; ion transport; nuclear transport; RNA localization; muscle contraction; single-multicellular organism process; system process; visual perception; biological adhesion; biosynthesis; cell adhesion; cellular component organization or biogenesis; chromosome segregation; ectoderm development; immune system process; metabolism; multicellular organismal process; nitrogen compound metabolism; synaptic transmission.]
Supplementary Figure 7. Treemap generated with REVIGO from significantly overrepresented biological process GO terms for Module 2 (GxT module).
[Treemap m2_bp, GO terms shown: anatomical structure morphogenesis; cellular component morphogenesis; cellular component organization; mitotic nuclear division; cation transport; ion transport; muscle contraction; single-multicellular organism process; system process; lipid metabolic process; steroid metabolic process; cellular protein modification process; DNA metabolic process; DNA repair; nucleobase-containing compound metabolic process; protein metabolic process; RNA splicing; rRNA metabolic process; translation; cell cycle; cellular component organization or biogenesis; cellular process; developmental process; multicellular organismal process; primary metabolism; steroid metabolism.]
Supplementary Figure 8. Treemap generated with REVIGO from significantly overrepresented biological process GO terms for Module 3 (GxT module).
[Treemap m3_bp, GO terms shown: anion transport; carbohydrate transport; cation transport; endocytosis; extracellular transport; ion transport; transport; vesicle-mediated transport; vitamin transport; cyclic nucleotide metabolic process; DNA metabolic process; regulation of catalytic activity; regulation of phosphate metabolic process; RNA metabolic process; mesoderm development; neurological system process; single-multicellular organism process; system process; visual perception; cell-cell signaling; neuron-neuron synaptic transmission; mRNA processing; translation; biological adhesion; biological regulation; cell communication; cellular process; cyclic nucleotide metabolism; developmental process; ectoderm development; immune system process; localization; multicellular organismal process; regulation of molecular function; response to stimulus; single organismal cell-cell adhesion.]
Supplementary Figure 9. Treemap generated with REVIGO from significantly overrepresented biological process GO terms for Module 9 (GxT module).
[Treemap m9_bp, GO terms shown: cell communication; cell cycle; ectoderm development; mesoderm development; cellular protein modification process; nucleobase-containing compound metabolic process; regulation of biological process; regulation of catalytic activity; regulation of molecular function; regulation of nucleobase-containing compound metabolic process; regulation of transcription from RNA polymerase II promoter; transcription from RNA polymerase II promoter; transcription, DNA-templated; biological adhesion; biological regulation; cell-matrix adhesion; cellular component movement; cellular process; developmental process; metabolism; primary metabolism; sensory perception of sound.]
Supplementary Figure 10. Treemap generated with REVIGO from significantly overrepresented biological process GO terms for Module 14 (GxT module).
[Treemap m14_bp, GO terms shown: generation of precursor metabolites and energy; protein metabolic process; regulation of translation; translation; cellular component biogenesis; metabolism; oxidative phosphorylation; primary metabolism.]
Supplementary Figure 11. Treemap generated with REVIGO from significantly overrepresented biological process GO terms for Module 22 (GxT module).
[Treemap m22_bp, GO terms shown: protein metabolic process; translation; metabolism; primary metabolism.]
References
Abbott, R., Albach, D., Ansell, S., Arntzen, J. W., Baird, S. J. E., Bierne, N., ... & Butlin, R. K.
(2013). Hybridization and speciation. Journal of Evolutionary Biology, 26(2), 229-246.
Anders, S., Pyl, P. T., & Huber, W. (2015). HTSeq—a Python framework to work with high-
throughput sequencing data. Bioinformatics, 31(2), 166.
Andersen, E. C., Gerke, J. P., Shapiro, J. A., Crissman, J. R., Ghosh, R., Bloom, J. S., ... &
Kruglyak, L. (2012). Chromosome-scale selective sweeps shape Caenorhabditis elegans genomic
diversity. Nature genetics, 44(3), 285-290.
Anderson, J. T., Lee, C. R., Rushworth, C. A., Colautti, R. I., & Mitchell-Olds, T. (2013).
Genetic trade-offs and conditional neutrality contribute to local adaptation. Molecular ecology,
22(3), 699-708.
Angilletta, M. J. (2009). Thermal adaptation: a theoretical and empirical synthesis. Oxford
University Press.
Baird, S. E., & Stonesifer, R. (2012). Reproductive Isolation in Caenorhabditis briggsae:
Dysgenic Interactions Between Maternal-and Zygotic-effect Loci Result in a Delayed
Development Phenotype. Worm, 1(4), 189-195.
Batista, P. J., Ruby, J. G., Claycomb, J. M., Chiang, R., Fahlgren, N., Kasschau, K. D., ... &
Conte, D. (2008). PRG-1 and 21U-RNAs interact to form the piRNA complex required for
fertility in C. elegans. Molecular cell, 31(1), 67-78.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and
powerful approach to multiple testing. Journal of the royal statistical society. Series B
(Methodological), 289-300.
Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina
sequence data. Bioinformatics, 30(15), 2114.
Britten, R. J., & Davidson, E. H. (1969). Gene regulation for higher cells: a theory. Science,
165(3891), 349-357.
Britten, R. J., & Davidson, E. H. (1971). Repetitive and non-repetitive DNA sequences and a
speculation on the origins of evolutionary novelty. Quarterly Review of Biology, 111-138.
Brown, J. H., Gillooly, J. F., Allen, A. P., Savage, V. M., & West, G. B. (2004). Toward a
metabolic theory of ecology. Ecology, 85(7), 1771-1789.
Carroll, S. B. (2005). Endless forms most beautiful: The new science of evo devo and the making
of the animal kingdom (No. 54). WW Norton & Company.
Chang, C. C., Rodriguez, J., & Ross, J. (2016). Mitochondrial–nuclear epistasis impacts fitness
and mitochondrial physiology of interpopulation Caenorhabditis briggsae hybrids. G3:
Genes|Genomes|Genetics, 6(1), 209-219.
Chen, J., Nolte, V., & Schlötterer, C. (2015). Temperature-Related Reaction Norms of Gene
Expression: Regulatory Architecture and Functional Implications. Molecular Biology and
Evolution, 32(9), 2393.
Claycomb, J. M. (2012). Caenorhabditis elegans small RNA pathways make their mark on
chromatin. DNA and cell biology, 31(S1), S-17.
Conine, C. C., Batista, P. J., Gu, W., Claycomb, J. M., Chaves, D. A., Shirayama, M., & Mello,
C. C. (2010). Argonautes ALG-3 and ALG-4 are required for spermatogenesis-specific 26G-
RNAs and thermotolerant sperm in Caenorhabditis elegans. Proceedings of the National
Academy of Sciences, 107(8), 3588-3593.
Coolon, J. D., McManus, C. J., Stevenson, K. R., Graveley, B. R., & Wittkopp, P. J. (2014).
Tempo and mode of regulatory evolution in Drosophila. Genome research, 24(5), 797-808.
Cutter, A. D. (2015). Caenorhabditis evolution in the wild. BioEssays, 37(9), 983-995.
Cutter, A. D., & Choi, J. Y. (2010). Natural selection shapes nucleotide polymorphism across the
genome of the nematode Caenorhabditis briggsae. Genome research, 20(8), 1103-1111.
Cutter, A. D., Félix, M. A., Barrière, A., & Charlesworth, D. (2006). Patterns of nucleotide
polymorphism distinguish temperate and tropical wild isolates of Caenorhabditis briggsae.
Genetics, 173(4), 2021-2031.
Des Marais, D. L., McKay, J. K., Richards, J. H., Sen, S., Wayne, T., & Juenger, T. E. (2012).
Physiological genomics of response to soil drying in diverse Arabidopsis accessions. The Plant
Cell, 24(3), 893-914.
Des Marais, D. L., Hernandez, K. M., & Juenger, T. E. (2013). Genotype-by-environment
interaction and plasticity: exploring genomic responses of plants to the abiotic environment.
Annual Review of Ecology, Evolution, and Systematics, 44, 5-29.
Deutsch, C. A., Tewksbury, J. J., Huey, R. B., Sheldon, K. S., Ghalambor, C. K., Haak, D. C., &
Martin, P. R. (2008). Impacts of climate warming on terrestrial ectotherms across latitude.
Proceedings of the National Academy of Sciences, 105(18), 6668-6672.
Dobin, A., Davis, C. A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., ... & Gingeras, T. R.
(2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics, 29(1), 15-21.
Fang, Z., & Cui, X. (2011). Design and validation issues in RNA-seq experiments. Briefings in
bioinformatics, 12(3), 280.
Félix, M. A., Jovelin, R., Ferrari, C., Han, S., Cho, Y. R., Andersen, E. C., ... & Braendle, C.
(2013). Species richness, distribution and genetic diversity of Caenorhabditis nematodes in a
remote tropical rainforest. BMC evolutionary biology, 13(1), 1.
Fournier-Level, A., Korte, A., Cooper, M. D., Nordborg, M., Schmitt, J., & Wilczek, A. M.
(2011). A map of local adaptation in Arabidopsis thaliana. Science, 334(6052), 86-89.
Gibson, G. (2008). The environmental contribution to gene expression profiles. Nature Reviews
Genetics, 9(8), 575-581.
Gilchrist, G. W., & Huey, R. B. (2004). Plastic and genetic variation in wing loading as a
function of temperature within and among parallel clines in Drosophila subobscura. Integrative
and Comparative Biology, 44(6), 461-470.
Grillo, M. A., Li, C., Hammond, M., Wang, L., & Schemske, D. W. (2013). Genetic architecture
of flowering time differentiation between locally adapted populations of Arabidopsis thaliana.
New Phytologist, 197(4), 1321-1331.
Hillier, L. W., Miller, R. D., Baird, S. E., Chinwalla, A., Fulton, L. A., Koboldt, D. C., &
Waterston, R. H. (2007). Comparison of C. elegans and C. briggsae Genome Sequences Reveals
Extensive Conservation of Chromosome Organization and Synteny. PLoS Biology, 5(7), 1603.
Hoekstra, H. E., & Coyne, J. A. (2007). The locus of evolution: evo devo and the genetics of
adaptation. Evolution, 61(5), 995-1016.
Hung, H. J., O'Neill, R. T., Bauer, P., & Kohne, K. (1997). The behavior of the p-value when the
alternative hypothesis is true. Biometrics, 11-22.
Hurme, P., Repo, T., Savolainen, O., & Pääkkönen, T. (1997). Climatic adaptation of bud set and
frost hardiness in Scots pine (Pinus sylvestris). Canadian Journal of Forest Research, 27(5),
716-723.
Jacob, F., & Monod, J. (1961). Genetic regulatory mechanisms in the synthesis of proteins.
Journal of molecular biology, 3(3), 318-356.
Karl, I., Stoks, R., De Block, M., Janowitz, S. A., & Fischer, K. (2011). Temperature extremes
and butterfly fitness: conflicting evidence from life history and immune function. Global Change
Biology, 17(2), 676-687.
Kawecki, T. J., & Ebert, D. (2004). Conceptual issues in local adaptation. Ecology letters, 7(12),
1225-1241.
King, M. C., & Wilson, A. C. (1975). Evolution at two levels in humans and chimpanzees.
Science, 188(4184), 107-116.
Langfelder, P., & Horvath, S. (2007). Eigengene networks for studying the relationships between
co-expression modules. BMC systems biology, 1(1), 54.
Langfelder, P., & Horvath, S. (2008). WGCNA: an R package for weighted correlation network
analysis. BMC bioinformatics, 9(1), 1.
Lasky, J. R., Des Marais, D. L., Lowry, D. B., Povolotskaya, I., McKay, J. K., Richards, J. H., ...
& Juenger, T. E. (2014). Natural variation in abiotic stress responsive gene expression and local
adaptation to climate in Arabidopsis thaliana. Molecular biology and evolution, 31(9), 2283-
2296.
Law, C. W., Chen, Y., Shi, W., & Smyth, G. K. (2014). Voom: precision weights unlock linear
model analysis tools for RNA-seq read counts. Genome biology, 15(2), 1.
Lee, H. C., Gu, W., Shirayama, M., Youngman, E., Conte, D., & Mello, C. C. (2012). C. elegans
piRNAs mediate the genome-wide surveillance of germline transcripts. Cell, 150(1), 78-87.
Levine, M. T., Eckert, M. L., & Begun, D. J. (2011). Whole-genome expression plasticity across
tropical and temperate Drosophila melanogaster populations from Eastern Australia. Molecular
biology and evolution, 28(1), 249-256.
Li, Y., Álvarez, O. A., Gutteling, E. W., Tijsterman, M., Fu, J., Riksen, J. A., ... & Breitling, R.
(2006). Mapping determinants of gene expression plasticity by genetical genomics in C. elegans.
PLoS Genet, 2(12), e222.
Lindquist, S. (1986). The heat-shock response. Annual review of biochemistry, 55(1), 1151-1191.
MacMillan, H. A., & Sinclair, B. J. (2011). Mechanisms underlying insect chill-coma. Journal of
Insect Physiology, 57(1), 12-20.
Marioni, J. C., Mason, C. E., Mane, S. M., Stephens, M., & Gilad, Y. (2008). RNA-seq: an
assessment of technical reproducibility and comparison with gene expression arrays. Genome
research, 18(9), 1509-1517.
Martin, T. L., & Huey, R. B. (2008). Why “suboptimal” is optimal: Jensen’s inequality and
ectotherm thermal preferences. The American Naturalist, 171(3), E102-E118.
Massey, J. H., & Wittkopp, P. J. (2016). Chapter Two-The Genetic Basis of Pigmentation
Differences Within and Between Drosophila Species. Current Topics in Developmental Biology,
119, 27-61.
Matsuba, C., Ostrow, D. G., Salomon, M. P., Tolani, A., & Baer, C. F. (2013). Temperature,
stress and spontaneous mutation in Caenorhabditis briggsae and Caenorhabditis elegans. Biology
letters, 9(1), 20120334.
Mi, H., Dong, Q., Muruganujan, A., Gaudet, P., Lewis, S., & Thomas, P. D. (2009). PANTHER
version 7: improved phylogenetic trees, orthologs and collaboration with the Gene Ontology
Consortium. Nucleic acids research, 33(1), D284-D288.
Moss, E. G., Lee, R. C., & Ambros, V. (1997). The cold shock domain protein LIN-28 controls
developmental timing in C. elegans and is regulated by the lin-4 RNA. Cell, 88(5), 637-646.
Nikolayeva, O., & Robinson, M. D. (2014). edgeR for differential RNA-seq and ChIP-seq
analysis: an application to stem cell biology. Stem Cell Transcriptional Networks: Methods and
Protocols, 45-79.
Pigliucci, M., Murren, C. J., & Schlichting, C. D. (2006). Phenotypic plasticity and evolution by
genetic assimilation. Journal of Experimental Biology, 209(12), 2362-2367.
Pörtner, H. O. (2010). Oxygen- and capacity-limitation of thermal tolerance: a matrix for
integrating climate-related stressor effects in marine ecosystems. Journal of Experimental
Biology, 213(6), 881-893.
Poullet, N., Vielle, A., Gimond, C., Ferrari, C., & Braendle, C. (2015). Evolutionarily divergent
thermal sensitivity of germline development and fertility in hermaphroditic Caenorhabditis
nematodes. Evolution & development, 17(6), 380-397.
Prasad, A., Croydon-Sugarman, M. J., Murray, R. L., & Cutter, A. D. (2011).
Temperature-dependent fecundity associates with latitude in Caenorhabditis briggsae. Evolution,
65(1), 52-63.
Pruitt, K. D., Brown, G. R., Hiatt, S. M., Thibaud-Nissen, F., Astashyn, A., Ermolaeva, O., ... &
Murphy, M. R. (2014). RefSeq: an update on mammalian reference sequences. Nucleic acids
research, 42(D1), D756-D763.
Reinke, V., Smith, H. E., Nance, J., Wang, J., Van Doren, C., Begley, R., ... & Kim, S. K.
(2000). A global profile of germline gene expression in C. elegans. Molecular cell, 6(3), 605-
616.
Robertson, R. M., & Money, T. G. (2012). Temperature and neuronal circuit function:
compensation, tuning and tolerance. Current opinion in neurobiology, 22(4), 724-734.
Robinson, M. D., McCarthy, D. J., & Smyth, G. K. (2010). edgeR: a Bioconductor package for
differential expression analysis of digital gene expression data. Bioinformatics, 26(1), 139-140.
Rockman, M. V., & Kruglyak, L. (2009). Recombinational landscape and population genomics
of Caenorhabditis elegans. PLoS Genet, 5(3), e1000419.
Roitberg, B. D., & Mangel, M. (2016). Cold snaps, heatwaves, and arthropod growth. Ecological
Entomology, 41(6), 653-659.
Ross, J. A., Koboldt, D. C., Staisch, J. E., Chamberlin, H. M., Gupta, B. P., Miller, R. D., ... &
Haag, E. S. (2011). Caenorhabditis briggsae recombinant inbred line genotypes reveal inter-
strain incompatibility and the evolution of recombination. PLoS Genet, 7(7), e1002174.
Sarge, K. D., Park-Sarge, O. K., Kirby, J. D., Mayo, K. E., & Morimoto, R. I. (1994). Expression
of heat shock factor 2 in mouse testis: potential role as a regulator of heat-shock protein gene
expression during spermatogenesis. Biology of Reproduction, 50(6), 1334-1343.
Shi, Z., Montgomery, T. A., Qi, Y., & Ruvkun, G. (2013). High-throughput sequencing reveals
extraordinary fluidity of miRNA, piRNA, and siRNA pathways in nematodes. Genome research,
23(3), 497-508.
Smith, E. N., & Kruglyak, L. (2008). Gene–environment interaction in yeast gene expression.
PLoS Biol, 6(4), e83.
Smyth, G. K. (2005). Limma: linear models for microarray data. In Bioinformatics and
computational biology solutions using R and Bioconductor (pp. 397-420). Springer New York.
Stein, L. D., Bao, Z., Blasiar, D., Blumenthal, T., Brent, M. R., Chen, N., ... & Coulson, A.
(2003). The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics.
PLoS Biol, 1(2), 166-192.
Stegeman, G. W., de Mesquita, M. B., Ryu, W. S., & Cutter, A. D. (2013). Temperature-
dependent behaviours are genetically variable in the nematode Caenorhabditis briggsae. Journal
of Experimental Biology, 216(5), 850-858.
Stinchcombe, J. R., Weinig, C., Ungerer, M., Olsen, K. M., Mays, C., Halldorsdottir, S. S., ... &
Schmitt, J. (2004). A latitudinal cline in flowering time in Arabidopsis thaliana modulated by the
flowering time gene FRIGIDA. Proceedings of the National Academy of Sciences of the United
States of America, 101(13), 4712-4717.
Supek, F., Bošnjak, M., Škunca, N., & Šmuc, T. (2011). REVIGO summarizes and visualizes
long lists of gene ontology terms. PloS one, 6(7), e21800.
Thomas, C. G., Wang, W., Jovelin, R., Ghosh, R., Lomasko, T., Trinh, Q., ... & Cutter, A. D.
(2015). Full-genome evolutionary histories of selfing, splitting, and selection in Caenorhabditis.
Genome research, 25(5), 667-678.
Thomas, D. (2010). Gene–environment-wide association studies: emerging approaches. Nature
Reviews Genetics, 11(4), 259-272.
Tirosh, I., Reikhav, S., Levy, A. A., & Barkai, N. (2009). A yeast hybrid provides insight into the
evolution of gene expression regulation. Science, 324(5927), 659-662.
Tu, S., Wu, M. Z., Wang, J., Cutter, A. D., Weng, Z., & Claycomb, J. M. (2015). Comparative
functional characterization of the CSR-1 22G-RNA pathway in Caenorhabditis nematodes.
Nucleic acids research, 43(1), 208-224.
Walsh, N. P., Alba, B. M., Bose, B., Gross, C. A., & Sauer, R. T. (2003). OMP peptide signals
initiate the envelope-stress response by activating DegS protease via relief of inhibition mediated
by its PDZ domain. Cell, 113(1), 61-71.
Wang, Z., Gerstein, M., & Snyder, M. (2009). RNA-Seq: a revolutionary tool for
transcriptomics. Nature reviews genetics, 10(1), 57-63.
Wittkopp, P. J., Haerum, B. K., & Clark, A. G. (2008). Regulatory changes underlying
expression differences within and between Drosophila species. Nature genetics, 40(3), 346-350.
Wray, G. A., Hahn, M. W., Abouheif, E., Balhoff, J. P., Pizer, M., Rockman, M. V., & Romano,
L. A. (2003). The evolution of transcriptional regulation in eukaryotes. Molecular biology and
evolution, 20(9), 1377-1419.
Wray, G. A. (2007). The evolutionary significance of cis-regulatory mutations. Nature Reviews
Genetics, 8(3), 206-216.
Zhang, B., & Horvath, S. (2005). A general framework for weighted gene co-expression network
analysis. Statistical applications in genetics and molecular biology, 4(1), 1128.