
Taft RJ et al. – tiRNAs

Supplementary Information

Tiny RNAs associated with transcription start sites

in animals

Ryan J. Taft1, Evgeny A. Glazov2, Nicole Cloonan1, Cas Simons1, Stuart Stephen1, Geoff Faulkner1, Timo Lassmann3, Alistair R.R. Forrest3,4, Sean M. Grimmond1, Kate Schroder1, Katharine Irvine1, Takahiro Arakawa3, Mari Nakamura3, Atsutaka Kubosaki3, Kengo Hayashida3, Chika Kawazu3, Mitsuyoshi Murata3, Hiromi Nishiyori3, Shiro Fukuda3, Jun Kawai3, Carsten O. Daub3, David A. Hume1,5, Harukazu Suzuki3, Valerio Orlando6, Piero Carninci3, Yoshihide Hayashizaki3 and John S. Mattick1

1Institute for Molecular Bioscience, The University of Queensland, St. Lucia, QLD 4072, Australia.

2Diamantina Institute for Cancer, Immunology and Metabolic Medicine, The University of Queensland, Princess Alexandra Hospital, Ipswich Road, Woolloongabba, Qld, 4102, Australia.

3RIKEN Omics Science Center, RIKEN Yokohama Institute, 1-7-22 Suehiro-cho Tsurumi-ku Yokohama, Kanagawa, 230-0045 Japan.

4The Eskitis Institute for Cell and Molecular Therapies, Griffith University, QLD 4111, Australia.

5The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Roslin, EH259PS, UK.

6Dulbecco Telethon Institute, IGB CNR, Epigenetics and Genome Reprograming lab, Via Pietro Castellino 111, Napoli, 80131, and Dulbecco Telethon Institute, IRCCS Santa Lucia at EBRI, Via del Fosso di Fiorano 64, Rome 00146, Italy.

Bioinformatics correspondence should be addressed to R.J.T ([email protected]). Experimental correspondence should be addressed to V.O ([email protected]) or P.C ([email protected]). General correspondence should be addressed to Y.H. ([email protected]) or J.S.M. ([email protected]).

Nature Genetics: doi:10.1038/ng.312

Supplementary Figures

Supplementary Figure 1

Example tiRNA loci. (a) Human (red) and chicken (light brown) tiRNAs are conserved at the EIF4G2 transcription start site in human. Regions of RNA PolII binding are depicted in dark brown, Sp1 binding regions in yellow, and CpG islands in green. The strand orientation of tiRNAs and deepCAGE clusters (dark blue) are indicated as white chevrons. (b) Drosophila tiRNAs (dark red) downstream of the Adh TSS. TiRNA strand orientation is indicated as white chevrons.

Supplementary Figure 2

Chicken small RNA size distributions by embryonic stage. The size distribution of all uniquely mapping small RNA tags from chicken embryos at day 5 (brown), day 7 (CE7, orange), and day 9 (CE9, yellow). Day 5 displayed the weakest enrichment (16-fold) at Refgene TSSs, while both CE7 and CE9 showed ~60-fold enrichment.

Supplementary Figure 3

Drosophila tiRNA size and position characteristics. Small RNAs were obtained from Ruby et al. [1]. (a) The black line indicates the transcription start site, and the black arrow depicts the direction of transcription. Gray bars represent windows of 10 nt, and those above the x axis depict small RNAs with the same strand orientation as the TSS. Bars below the x axis (negative values) indicate small RNAs antisense to the TSS. Small RNAs are dominantly upstream and in the same orientation as the TSS. (b) Small RNAs that map to the same strand and are found in the region -60 to +120 relative to the TSS, or on the opposite strand within 400 nt upstream of the TSS, are dominantly 18 nt.

Supplementary Figure 4

Small RNA density and abundance with respect to TSSs in human. (a) The genome-wide distribution of THP-1 small RNA 5’ ends (red) and deepCAGE abundance (gray line) relative to transcription start sites (black bar and arrow, indicating the direction of transcription) shows an ~20 nt offset between peak densities, indicating that tiRNAs are not truncated 5’ capped transcripts. (b) The distribution of THP-1 small RNAs at 1 nt resolution with respect to the most highly expressed deepCAGE tag from active promoters identified as either broad with peak (PB) or single peak (SP). These promoter types have single dominant transcription start sites (see text). The black bar and arrow indicate transcription start and the direction of transcription, respectively.

Supplementary Figure 5

3’ end small RNAs. (a) The size distribution of unannotated human THP-1 small RNAs from the 3’ end of annotated Refgenes. 3’ end associated small RNAs and tiRNAs are significantly different in size (P < 10^-4; one-tailed T-test). (b) The size distribution of chicken small RNAs from the 3’ end of Refgenes. Chicken 3’ end small RNAs and tiRNAs are also significantly different in size (P < 10^-4; one-tailed T-test).

Supplementary Figure 6

The relationship between deepCAGE and tiRNA abundance. Human THP-1 tiRNA and deepCAGE abundance do not exhibit a linear relationship, suggesting that although tiRNAs are generally associated with highly expressed genes, they can also be associated with weakly expressed transcripts.

Supplementary Figure 7

Gene and tiRNA expression in 0-1h Drosophila embryo. Gene expression values from Arbeitman et al. [2] 0-1h embryos were compared with tiRNA abundance in either 0-1h replicate from Chung et al. (top two panels, GEO datasets GSM286604 and GSM286613), with the abundance of tiRNAs observed in both replicates (bottom left panel), or with the abundance of tiRNAs observed in either replicate (bottom right panel).

Supplementary Figure 8

Gene and tiRNA expression in 2-6h Drosophila embryo. Gene expression values from Arbeitman et al. [2] 2-6h embryos were compared with tiRNA abundance in either 2-6h replicate from Chung et al. (top two panels, GEO datasets GSM286605 and GSM286606), with the abundance of tiRNAs observed in both replicates (bottom left panel), or with the abundance of tiRNAs observed in either replicate (bottom right panel).

Supplementary Figure 9

Gene and tiRNA expression in 6-10h Drosophila embryo. Gene expression values from Arbeitman et al. [2] 6-10h embryos were compared with tiRNA abundance in either 6-10h replicate from Chung et al. (top two panels, GEO datasets GSM286607 and GSM286611), with the abundance of tiRNAs observed in both replicates (bottom left panel), or with the abundance of tiRNAs observed in either replicate (bottom right panel).

Supplementary Figure 10

Gene and tiRNA expression relationships. The relative expression of Drosophila genes and their corresponding tiRNAs at three embryonic time points. Heatmap data are sorted by gene expression values from 0-1h embryo, from high (red) to low (green). TiRNA abundance values range from highly expressed (red) to absent (black). Log2 median ratio gene expression values range from +3.3 to -6.0. TiRNA abundance was normalized as a proportion of the total abundance of the unannotated small RNAs in each library (see Table 1). TiRNA abundance is generally low, ranging from 1 to 1,000 normalized counts. We observed no consistent relationship between relative gene expression and tiRNA abundance. Gene expression data were obtained from Arbeitman et al. [2] and tiRNA abundance heatmaps were generated using Mayday [3] (see Supplementary Methods).

Supplementary Figure 11

TiRNAs in mutant Drosophila and Argonaute immunoprecipitations. (a) The size distribution of tiRNAs in Dcr-2-/-, loqs-/-, and wild type Drosophila ovaries from Czech et al. [4]. TiRNAs are not affected by loss of Dcr-2 but may be affected by the loss of loquacious, although the absence of the characteristic 18 nt tiRNA peak in the loqs-/- library is likely due to alterations in library preparation. (b) The relative proportion and size of small RNAs which map to -60 to +120 nt relative to Refgene TSSs derived from AGO1 and AGO2 IPs. TiRNAs are not observed in these libraries.

Supplementary Figure 12

Properties of tiRNAs from undifferentiated human THP-1 cells. The density distribution of small RNA 5’ ends in undifferentiated THP-1 cells at (a) 10 nt and (b) 1 nt resolution relative to TSSs. The black bar and arrow indicate the transcription start site and the direction of transcription, respectively. (c) The size distribution of tiRNAs in undifferentiated THP-1 cells. (d) Genes with tiRNAs in undifferentiated THP-1 cells (red) are more highly expressed than those without tiRNAs (gray). (e) The proportion of deepCAGE tag defined promoters (black), deepCAGE promoters with tiRNAs that are not associated with Refgenes (blue), and deepCAGE promoters with tiRNAs and associated with Refgenes (red) that are associated with regions of the genome showing H3K9 acetylation or PU.1, RNA PolII, or Sp1 binding in undifferentiated THP-1 cells.

Supplementary Methods

THP-1 small RNA deep sequencing

Cell culture and RNA extraction

THP-1 cells were cultured in RPMI with 10% FBS, penicillin/streptomycin, 10 mM HEPES, 1 mM sodium pyruvate, and 50 µM 2-mercaptoethanol, and were treated with 30 ng/ml PMA (Sigma) to differentiate them into macrophage-like cells. We prepared 5 short RNA libraries from specific size fractions of undifferentiated THP-1 cells (11-22 nt, 22-32 nt, 32-42 nt, 42-52 nt, and 52-82 nt) and an additional 6 small RNA libraries (~15 nt to ~40 nt) from THP-1 cells over a time course of PMA differentiation (0, 2, 4, 12, 24, and 96 h).

Total RNA was extracted using the AGPC (acid-guanidinium-phenol-chloroform) method, and all precipitations were done with ethanol instead of isopropyl alcohol to ensure the recovery of short oligonucleotides. CTAB selective precipitation of long RNA [5] was performed to separate long and short RNAs. Short RNAs (<75 bp) were isolated from the CTAB precipitation supernatant by precipitation with 2 volumes of ethanol. The RNA pellet was resuspended in 7 M GuCl and ethanol precipitated a second time.

Mixed short RNA library construction

Short RNAs derived from each time point were tagged with a 4 nt tissue ID tag during the adaptor ligation step. RNA-DNA hybrid oligonucleotide adaptor ligation was carried out using 10 µg total short RNA, 100 µM of a 5’ adaptor containing an EcoRI recognition site (Supplementary Table 2) and 100 µM of a specific 3’ adaptor containing an EcoRI recognition site and a 4 nt Tissue ID tag (Supplementary Table 2), with T4 RNA Ligase (TaKaRa) for 16 hrs at 15°C. The sample:adaptor:adaptor mixture ratio was 1 µg short RNA : 0.7 µl of 100 µM 5’ adaptor : 0.7 µl of 100 µM 3’ adaptor. At the end of the reaction, samples for each mixed library were pooled, treated with 20 mg/ml Proteinase K (15 mins, 37°C), purified by phenol/chloroform extraction, and ethanol precipitated.

Purified short RNAs were separated from adaptor dimers on an 8% denaturing PAGE gel. Short RNAs were excised and eluted from the gel in TEN elution buffer (10 mM Tris·HCl pH 7.5, 1 mM EDTA pH 7.5, 250 mM NaCl) for ~16 hrs at 4°C. Gel extracted short RNA tags were filtered through MicroSpin Empty Columns (Amersham Biosciences) in TEN buffer three times to remove any polyacrylamide contaminant and then purified by ethanol precipitation.

cDNA synthesis was carried out on purified short RNAs by RT-PCR (Supplementary Table 2) with M-MLV Reverse Transcriptase RNase H Minus, Point Mutant (Promega). RT products were calibrated to determine the ratio of products derived from individual time points in the libraries.

cDNAs derived from short RNA tags were amplified by PCR using adaptor-specific primers (Supplementary Table 2). PCR was performed on 5 µl of template RT mixture, with 1x buffer, 3 µl of DMSO, 12 µl of 2.5 mM dNTPs, 1.5 µl of 100 µM Primer 1 (Supplementary Table 2), 1.5 µl of 100 µM Primer 2 (Supplementary Table 2), and 0.5 µl of EX Taq polymerase (5 units/µl, TaKaRa) in a total volume of 50 µl. After incubating at 94°C for 1 min, ~12-14 cycles were performed for 30 sec at 94°C, 30 sec at 57°C, and 1 min at 70°C, followed by a 5 min incubation at 70°C. PCR products were pooled, purified, ethanol precipitated and resuspended in 40 µl of TE buffer. The PCR products were purified on a 12% polyacrylamide gel. The 60-80 bp fraction was cut out of the gel and eluted in 500 µl of SAGE elution buffer (2.5 mM Tris·HCl pH 7.5, 1.25 mM ammonium acetate, 0.17 mM EDTA pH 7.5) for 16 hrs at room temperature. Gel extracted short RNA tags were filtered twice through MicroSpin Empty Columns by centrifugation at 3000 rpm for 2 min in SAGE buffer, purified by ethanol precipitation, resuspended in 25 µl of 0.1x TE buffer, and quantified with Picogreen.

PCR-amplified, gel-purified short RNA tags were re-amplified in a total volume of 100 µl containing 2 ng of short RNA tags, 6 µl of DMSO, 12 µl of 2.5 mM dNTPs, 2 µl of 100 µM Primer 1 (Supplementary Table 2), 2 µl of 100 µM Primer 2 (Supplementary Table 2), and 0.8 µl of EX Taq polymerase (5 units/µl, TaKaRa). After incubating at 94°C for 1 min, ~8-9 cycles were performed for 30 sec at 94°C, 30 sec at 57°C, and 1 min at 70°C, followed by 5 min at 70°C. The PCR products were pooled, purified, ethanol precipitated and redissolved in 50 µl of TE buffer.

PCR products were further purified with G-50 micro-columns (GE Healthcare), ethanol precipitated and resuspended in 100 µl of TE buffer. The concentration was measured with Picogreen. PCR products were digested with EcoRI (Fermentas, 3 µg/reaction), followed by Proteinase K treatment (20 mg/ml, 45°C, 15 minutes).

DNA tags derived from short RNAs were separated from the free DNA ends derived from the ligated adaptors (cut off during restriction) by incubation with streptavidin-coated magnetic beads. The cleaved tags were mixed with the beads (700 µl) and incubated at room temperature for 15 mins with mild agitation. The beads were rinsed with 50 µl of 1x BW buffer (1 M NaCl, 0.5 mM EDTA, 5 mM Tris-HCl (pH 7.5)), extracted by phenol/chloroform followed by ethanol precipitation and resuspension in 40 µl of TE buffer, or purified through Microcon YM10 columns with buffer exchange into 0.1x TE. Short RNA tags were further purified on a 12% polyacrylamide gel. The desired fraction was cut out of the gel, crushed, and eluted in SAGE buffer for 16 hrs at room temperature, followed by purification, concentration with YM10 columns, and ethanol precipitation. The DNA was finally resuspended in 6 µl of 0.1x TE buffer and quantified with Picogreen.

The short RNA tags (total yield) and adaptors (1/20 quantity of short RNA tags) were concatenated in a 10 µl reaction with T4 DNA ligase (NEB) for 16 hrs at 15°C. Proteinase K digestion was carried out by adding 70 µl of TE buffer and 20 mg/ml Proteinase K and digesting at 45°C for 15 minutes. Concatenated tags were purified with GFX columns (Amersham) to eliminate short concatamers (<100 bp). The eluted sample (50 µl) was transferred for sequencing, and concatamerized tags derived from short RNAs were sequenced using the Roche FLX Genome Sequencer [6].

Preparation of the capped small RNA library with a modified oligo-capping protocol

Capped short RNAs were identified using an oligo-capping protocol similar to that previously described for the capture of capped 5’ ends of full-length mRNAs [7-9]. Briefly, total RNA was dephosphorylated, then decapped with Tobacco Acid Pyrophosphatase and subsequently ligated to RNA/DNA linkers (as described above). To dephosphorylate small RNAs we began with 3 µg of short RNAs in 30 µl total volume, which we heated for 5 minutes at 65°C and then chilled on ice. We added 6 µl of 10x Antarctic phosphatase reaction buffer, 2 µl Cloned RNase Inhibitor (40 U/µl, TaKaRa), 6 µl Antarctic phosphatase (5 U/µl, NEB), and 16 µl of water and incubated the solution at 37°C for 2 hours. The sample was then phenol-chloroform extracted, ethanol precipitated, and dissolved in 42.9 µl of water. To decap we heated the sample at 65°C for 5 min, chilled it on ice, and then added 5 µl 10x TAP buffer, 2 µl Cloned RNase Inhibitor (40 U/µl, TaKaRa), and 0.1 µl Tobacco Acid Pyrophosphatase (150 U/µl, Nippon Gene). The solution was incubated at 37°C for 1 hour. Next, cDNA was prepared by cleavage of the linkers and purification of the insert by electrophoresis and concatenation, as described in Mixed short RNA library construction, above. Finally, the sample was subjected to Roche FLX Genome Sequencing.

Short RNA library sequencing and tag extraction

We used in-house algorithms for linker masking and the extraction of short RNA tags. Short RNA tags were extracted with the following parameters: EcoRI ligated doublet linker (12-16 bp) masking, maximum of 2 bp mismatch allowed; short RNA tag length, no limits.
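The in-house extraction algorithm is not described further, so the following is only a rough Python sketch of what linker masking with a mismatch tolerance could look like. The function names and the 12 bp linker sequence are hypothetical placeholders (the real adaptors are those listed in Supplementary Table 2), and only substitutions, not insertions or deletions, are tolerated.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def find_linker(read, linker, max_mismatch=2, start=0):
    """Leftmost position >= start where linker matches with <= max_mismatch."""
    for i in range(start, len(read) - len(linker) + 1):
        if hamming(read[i:i + len(linker)], linker) <= max_mismatch:
            return i
    return -1

def extract_tags(read, linker, max_mismatch=2):
    """Return the sequences lying between consecutive linker copies."""
    tags, pos, prev_end = [], 0, None
    while True:
        hit = find_linker(read, linker, max_mismatch, pos)
        if hit == -1:
            break
        if prev_end is not None and hit > prev_end:
            tags.append(read[prev_end:hit])  # no length limit on tags
        prev_end = hit + len(linker)
        pos = prev_end
    return tags

# Hypothetical doublet linker (two EcoRI sites); the middle copy in the
# read carries one substitution and is still recognised.
linker = "GAATTCGAATTC"
read = (linker + "ACGTACGTACGTACGTAC" + "GAATTCGTATTC"
        + "TTTTGGGGCCCCAAAATT" + linker)
print(extract_tags(read, linker))
```

A fixed linker length is used here for simplicity; handling the 12-16 bp range mentioned above would need a variable-length match.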

General bioinformatics

Bioinformatic analysis was done on a high performance computing station which houses a local mirror of the UCSC Genome Browser [10]. All small RNA datasets were mapped using Vmatch (http://www.vmatch.de/). We required small RNAs to map uniquely to the genome of interest without any mismatches. The resulting set was further filtered to remove any small RNAs that intersected with RepeatMasker annotations, random chromosomes, the mitochondrial genome, miRNA and snoRNA loci, unannotated genomic sequences with high homology to tRNAs or rRNAs, and assembly gaps. Filtered small RNA datasets are hereafter referred to as “unannotated small RNAs”. Genomic features and annotations, unless otherwise noted, were obtained through the local UCSC mirror. Transfer RNA and rRNA sequences not annotated by RepeatMasker were identified by BLAST homology searches (requiring 95% sequence identity) against rRNAs and tRNAs identified in GenBank from the species of interest. Intersections between features (e.g. small RNAs and repeats) required a minimum of 1 base of overlap, and were accomplished using a modified version of UCSC’s back end tool, bedIntersect.
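The 1-base overlap rule used for these intersections can be illustrated with a minimal Python sketch. This is not the modified bedIntersect tool; the function names, the half-open BED-style coordinates and the toy intervals are assumptions made for illustration.

```python
from collections import defaultdict

def overlaps(a_start, a_end, b_start, b_end, min_overlap=1):
    """True if two half-open intervals share at least min_overlap bases."""
    return min(a_end, b_end) - max(a_start, b_start) >= min_overlap

def filter_unannotated(small_rnas, annotations):
    """Drop small RNAs overlapping any annotation on the same chromosome.

    Both inputs are lists of (chrom, start, end) tuples in zero-based,
    half-open coordinates (an assumption, BED-like).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in annotations:
        by_chrom[chrom].append((start, end))
    kept = []
    for chrom, start, end in small_rnas:
        hit = any(overlaps(start, end, s, e) for s, e in by_chrom[chrom])
        if not hit:
            kept.append((chrom, start, end))
    return kept

repeats = [("chr1", 100, 200)]
tags = [("chr1", 82, 100),   # abuts the repeat, 0 bases shared: kept
        ("chr1", 83, 101),   # shares 1 base: removed
        ("chr2", 150, 168)]  # different chromosome: kept
print(filter_unannotated(tags, repeats))
```

The linear scan per chromosome is adequate for a toy example; a genome-scale filter would normally use sorted intervals or an interval tree.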

Bootstrap analysis

A Perl script executing a bootstrap analysis was used to estimate the likelihood of small RNAs overlapping deepCAGE promoters (for THP-1 small RNAs) or Refgene TSSs (for human, chicken and Drosophila small RNAs). For these analyses small RNAs and promoters were collapsed down to individual loci using UCSC’s featureBits tool, eliminating the possibility that multiple small RNAs and promoters mapping to the same region could artificially inflate the results. Small RNAs were randomly assigned new chromosomal locations, and the number intersecting with promoters or Refgene TSSs was tabulated. This process was repeated for 10^5 iterations. Fold enrichment was determined by dividing the number of observed overlaps by the average number of overlaps in all iterations.
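The randomization procedure described above can be sketched in Python as follows. This is an illustrative reimplementation, not the Perl script used in the study: it assumes a single linear chromosome, fixed-length small RNAs and far fewer iterations than reported above, and all function names and toy coordinates are hypothetical.

```python
import random

def count_overlaps(starts, length, promoters):
    """Count intervals [s, s + length) sharing >= 1 base with any promoter."""
    return sum(
        any(s < p_end and p_start < s + length for p_start, p_end in promoters)
        for s in starts
    )

def fold_enrichment(small_rna_starts, length, promoters, genome_size,
                    iterations=1000, seed=0):
    """Observed overlaps divided by the mean overlap count after shuffling."""
    rng = random.Random(seed)
    observed = count_overlaps(small_rna_starts, length, promoters)
    total = 0
    for _ in range(iterations):
        # Assign every small RNA a new, uniformly random location.
        shuffled = [rng.randrange(genome_size - length + 1)
                    for _ in small_rna_starts]
        total += count_overlaps(shuffled, length, promoters)
    mean_random = total / iterations
    return observed / mean_random if mean_random else float("inf")

# Toy data: two promoter regions and four 18 nt small RNAs, three of
# which fall inside a promoter.
promoters = [(1000, 1180), (50000, 50180)]
starts = [1010, 1050, 50020, 70000]
print(fold_enrichment(starts, 18, promoters, genome_size=100000))
```

A real analysis would shuffle within the mappable, unmasked portion of each chromosome so that the null distribution reflects where small RNAs could have mapped.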

THP-1 small RNA mapping and analysis

THP-1 small RNA tags were mapped to the human genome (UCSC hg18, NCBI Build 36.1) and pooled across time points to increase the effective depth of the analysis, consistent with the analysis of promoters identified by deepCAGE (see below). Intersections with genomic features (e.g. known small RNA loci, repeats) were performed as described above. We obtained a total of ~10 million human small RNA reads, which collapsed down to 46,076 uniquely mapping sequences with a total abundance of ~2 million reads. We found a total of 23,628 unannotated small RNAs with a total abundance of 345,753 reads (~7 counts per unannotated small RNA per million mapped tags). We found 2312 tiRNAs with a total abundance of 3702 (~0.8 counts per tiRNA per million mapped tags). To estimate the number of tiRNAs per cell we calculated the ratio of mir-15a abundance (~20,000 counts) to average tiRNA abundance (~0.8 counts) and compared this ratio with the number of mir-15a copies per cell (~4000) [11]. We generated 7,518 control and 8,374 cap-trapped THP-1 small RNA sequencing reads, and found a total abundance of 6 and 8 tiRNAs, respectively. Small RNA distributions with respect to the TSS (both sense and antisense) were calculated by tabulating the number of small RNA 5’ ends in 1 nt or 10 nt windows, e.g. the number of small RNA 5’ ends that map to bases 0 to +10 relative to the transcription start (either a Refgene annotated TSS or the most abundant deepCAGE tag in a clustered promoter). Because some TSSs map close to one another, a small RNA can be counted in more than one bin. However, we found that this occurred for less than 15% of small RNAs and did not substantially affect the results.
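The windowed tabulation of 5’ ends can be sketched in Python as below. The function names, the flank size and the single-chromosome input are assumptions for illustration; the sketch also shows how a small RNA lying near two closely spaced TSSs is counted once per TSS, as noted above.

```python
from collections import Counter

def relative_position(five_prime, tss, strand):
    """Signed offset of a 5' end from a TSS; positive means downstream."""
    return five_prime - tss if strand == "+" else tss - five_prime

def bin_five_prime_ends(small_rnas, tss_list, window=10, flank=200):
    """Tabulate small RNA 5' ends per window relative to every nearby TSS.

    small_rnas: (five_prime_position, strand) tuples on one chromosome.
    tss_list:   (tss_position, strand) tuples on the same chromosome.
    Returns a Counter keyed by (orientation, window_start).
    """
    counts = Counter()
    for pos, rna_strand in small_rnas:
        for tss, tss_strand in tss_list:
            rel = relative_position(pos, tss, tss_strand)
            if -flank <= rel < flank:
                orientation = "sense" if rna_strand == tss_strand else "antisense"
                # Floor division keeps negative offsets in the correct window.
                counts[(orientation, (rel // window) * window)] += 1
    return counts

# Two TSSs 100 nt apart: each small RNA falls within the flank of both,
# so it contributes to two bins.
tss_list = [(1000, "+"), (1100, "+")]
small_rnas = [(1015, "+"), (1105, "+"), (990, "-")]
counts = bin_five_prime_ends(small_rnas, tss_list)
print(sorted(counts.items()))
```

Setting `window=1` gives the 1 nt resolution profile with the same code.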

To ensure that sequence composition biases at promoters were not affecting small RNA mapping we examined all promoter regions (-60 to +120 nt relative to the most highly expressed CAGE tag) with evidence of tiRNAs and created an index of all unique Nmers (14-23 nt) in the human genome. We found that unique 18mers are not overrepresented at these promoters. We then analyzed the number of unique small RNA mappings at these regions and compared them with the expected number of mappings, based on the unique Nmer index. We found fewer small RNAs of every size class (except 14mers, which are the most weakly represented), with respect to 18mers, than we would expect by chance. We also examined multi-mapping tags to assess if restricting our analyses to uniquely mapping tags was biasing our results. We found ~5x more 14mer and ~1.5x more 15mer multi-mapping than uniquely mapping tags. However, we found fewer multi-mapping than uniquely mapping tags in all other size classes. The inclusion of multi-mapping tags increases the number of 14mer and 15mer tags in the tiRNA dataset (as expected by random chance), but these are still more than two fold less abundant than 18mer tags.
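The idea of a unique Nmer index can be illustrated on a toy sequence with the Python sketch below. It is an assumption-laden simplification: only the forward strand is indexed, the whole genome is held in memory as a string, and the function names are hypothetical.

```python
from collections import Counter

def kmer_counts(genome, k):
    """Count every k-mer on the forward strand of a genome string."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))

def unique_kmer_starts(genome, k):
    """Start positions whose k-mer occurs exactly once in the genome."""
    counts = kmer_counts(genome, k)
    return [i for i in range(len(genome) - k + 1)
            if counts[genome[i:i + k]] == 1]

def unique_fraction_in_region(genome, k, start, end):
    """Fraction of k-mer starts inside [start, end) that are unique."""
    unique = set(unique_kmer_starts(genome, k))
    candidates = range(start, min(end, len(genome) - k + 1))
    if len(candidates) == 0:
        return 0.0
    return sum(i in unique for i in candidates) / len(candidates)

# Shorter k-mers are less often unique, which is why the expected number
# of uniquely mapping tags depends on tag length.
toy_genome = "ACGTACGTTTGA"
print(unique_kmer_starts(toy_genome, 2))
print(unique_kmer_starts(toy_genome, 4))
print(unique_fraction_in_region(toy_genome, 4, 0, 5))
```

Comparing the unique fraction inside promoter windows with the genome-wide fraction, per tag length, gives the kind of expectation against which observed small RNA mappings can be judged.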

Evofold, phastCons, and CpG island loci were obtained from the local mirror of the UCSC Genome Browser. Intersections between tiRNAs and these genomic features were performed using a modified version of UCSC’s bedIntersect. Sequence analysis was performed using Python scripts and Unix tools. Refgene 3’ end associated small RNAs were identified by dividing all Refgenes into deciles (to normalize for size), and extracting small RNAs which mapped to the same strand and in the 3’ 10% of any Refgene. A one-tailed T-test was used to test if size distributions were different between tiRNAs and 3’ end small RNAs.
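The strand-aware selection of the 3’ decile can be sketched in Python as follows. The function names and coordinates are hypothetical, and the sketch assumes that a small RNA must lie entirely within the 3’ 10% of the gene; the text does not say whether partial overlap was also accepted.

```python
def three_prime_decile(tx_start, tx_end, strand):
    """Half-open coordinates of the 3'-most 10% of a gene."""
    tenth = (tx_end - tx_start) / 10
    if strand == "+":
        return tx_end - tenth, tx_end
    # On the minus strand the 3' end is at the lowest coordinate.
    return tx_start, tx_start + tenth

def in_three_prime_decile(rna_start, rna_end, rna_strand, gene):
    """True if a same-strand small RNA lies within the gene's 3' 10%."""
    tx_start, tx_end, gene_strand = gene
    if rna_strand != gene_strand:
        return False
    lo, hi = three_prime_decile(tx_start, tx_end, gene_strand)
    return lo <= rna_start and rna_end <= hi

plus_gene = (1000, 2000, "+")
minus_gene = (1000, 2000, "-")
print(in_three_prime_decile(1950, 1968, "+", plus_gene))   # inside 3' decile
print(in_three_prime_decile(1950, 1968, "-", minus_gene))  # 5' end of minus-strand gene
```

Working in fractions of gene length, rather than fixed distances, is what normalizes for gene size.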

Analysis of THP-1 promoters

DeepCAGE [8,9,12,13] was performed in triplicate at five time points (1, 4, 12, 24, and 96 hours) during THP-1 differentiation in response to phorbol 12-myristate 13-acetate (PMA) stimulation. DeepCAGE tags were mapped to the human genome (UCSC hg18, NCBI Build 36.1) by aligning perfectly matching tags first, then those tags that map with a single base pair substitution and finally tags which contain a single insertion or deletion. A filter was applied to remove rRNA-derived tags. For tags that map to multiple locations a probabilistic model, previously described by Faulkner et al. [14], was used to assign weights to each of the possible genomic mappings. To identify promoters we first normalized the CAGE data from each sample by scaling CAGE tag counts such that the distribution of the number of tags per position matches a common reference (power-law) distribution. We used technical replicates to estimate experimental noise. We found that the noise distribution is well described by a convolution of multiplicative noise and Poisson sampling noise. Using this noise model, a Bayesian procedure was used to calculate, for each consecutive pair of TSSs, the probability that both TSSs were expressed in a fixed relative proportion across all samples. Neighbouring TSSs with a high probability of expression in a constant proportion were then hierarchically joined into clusters. Promoters were defined as significantly expressed clusters, i.e. those that have at least 1 tag in at least 2 samples and whose maximum expression across all samples is at least 10 tags per million. All other TSS clusters were discarded.
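The final expression filter on TSS clusters is simple enough to sketch in Python. Only the two thresholds come from the text; the function name, the data layout and the toy clusters are assumptions, and the upstream normalization and Bayesian clustering steps are not reproduced here.

```python
def is_promoter(tpm_by_sample, raw_tags_by_sample,
                min_samples=2, min_tags=1, min_max_tpm=10.0):
    """Apply the two expression criteria to one TSS cluster."""
    samples_with_tags = sum(t >= min_tags for t in raw_tags_by_sample)
    return (samples_with_tags >= min_samples
            and max(tpm_by_sample) >= min_max_tpm)

# Each cluster maps to (tags per million per sample, raw tags per sample).
clusters = {
    "cluster_A": ([12.0, 0.0, 3.5], [30, 0, 9]),  # passes both criteria
    "cluster_B": ([25.0, 0.0, 0.0], [60, 0, 0]),  # tags in only 1 sample
    "cluster_C": ([2.0, 4.0, 1.0], [5, 9, 2]),    # never reaches 10 tpm
}
promoters = [name for name, (tpm, raw) in clusters.items()
             if is_promoter(tpm, raw)]
print(promoters)
```

Requiring tags in at least two samples guards against clusters supported by a single library, while the tags-per-million threshold removes consistently weak clusters.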

DeepCAGE tags were clustered into a total of ~18,000 high confidence active promoters. These promoters contain ~20% (~250,000) of all mapped deepCAGE tags. On average these active promoters spanned 33 nt and were composed of 16 tags, with a mean tag abundance of 2 counts per million (cpm) sequenced tags. Promoters that mapped to RepeatMasker annotations, random chromosomes, assembly gaps, the mitochondrial genome, or annotated small RNAs were removed from the analysis. The remaining 14,818 promoters were used for all subsequent analysis. Less than 0.07% of promoters overlap any annotated small RNA loci (including miRNAs and snoRNAs), indicating that the CAGE libraries are not contaminated with small RNAs. Promoter architecture was assessed using a Python script incorporating previously published criteria [8]. Promoters with fewer than 10 total tags were excluded from promoter architecture analysis. Using previously reported promoter architecture definitions we found that the promoters used in all tiRNA analyses were predominantly broad with peak (PB, 46.1%), followed by generally broad (BR, 34.4%), single peak (SP, 14.4%), and multimodal (MU, 5.1%) [8].

THP-1 gene expression analysis

Refgene annotations were obtained from the local mirror of the UCSC Genome Browser. A

deepCAGE promoter cluster mapping within -300 to +100 nt relative to an annotated

Refgene TSS was defined as Refgene-associated. Correspondingly, these genes were

identified as 'present' by deepCAGE. The most highly expressed deepCAGE tags from

promoters mapping within Refgene promoter regions are tightly associated with annotated

TSSs. Nearly one third map to the first nucleotide of an annotated Refgene TSS, and nearly

two thirds map within 50 nt of the annotated Refgene TSS. A two-tailed t-test was used to

test whether deepCAGE expression levels differed between promoters with and without

tiRNAs.
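
The Refgene association rule can be illustrated with a strand-aware window check. This is a sketch under stated assumptions: the function name is hypothetical, coordinates are taken to be inclusive, and "mapping within" is interpreted as any overlap with the -300 to +100 nt window, which the text does not specify.

```python
def is_refgene_associated(cluster_start, cluster_end, tss, strand,
                          upstream=300, downstream=100):
    """True if a promoter cluster overlaps the -300..+100 nt window around a TSS.

    Upstream and downstream are defined relative to the direction of
    transcription, so the window is mirrored for minus-strand genes.
    """
    if strand == "+":
        window_start, window_end = tss - upstream, tss + downstream
    elif strand == "-":
        window_start, window_end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Assumption: overlap of the two inclusive intervals counts as "within".
    return cluster_start <= window_end and cluster_end >= window_start


plus_hit = is_refgene_associated(690, 720, tss=1000, strand="+")
minus_hit = is_refgene_associated(1250, 1280, tss=1000, strand="-")
```

Requiring the whole cluster to lie inside the window would be a stricter alternative; only the comparison on the last line would change.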

To determine relative expression levels by microarray we queried THP-1 RNA

samples identical to those used for deepCAGE libraries (derived from undifferentiated THP-1

cells, and at 1, 4, 12, 24, and 96 hours after macrophage differentiation in response to PMA).

RNA was purified for expression analysis by Qiagen RNeasy columns, Takara FastPure RNA

Kit or by TRIzol. RNA quality was analyzed by Nanodrop and Bioanalyser. RNA (500 ng)

was amplified using the Illumina TotalPrep RNA Amplification Kit, according to

manufacturer’s instructions. cRNA was hybridized to the Illumina Human Sentrix-6 bead

chips Ver.2, according to standard Illumina protocols (http://www.illumina.com). Chip scans

were processed using Illumina BeadScan and BeadStudio software packages and summarized

data was generated in BeadStudio (version 3.1). Quantile normalization of Illumina data and

B-statistic calculations were carried out using the lumi and limma packages of Bioconductor

in the R statistical language 15-17. Refgenes associated with tiRNA promoters were identified,

and RefSeq mRNA accession numbers were retrieved and mapped to the Human Illumina V2


probe centric "genome" in Genespring v7.3.1. Quantile normalized data generated from PMA

treated THP-1 biological replicates were used to examine expression levels. A chi-squared

test was used to determine statistical significance.
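
Quantile normalization, used above for the Illumina data, can be sketched generically as follows. This is not the lumi or limma implementation: it is a plain-Python illustration of the idea, the function name is hypothetical, and ties are broken by input order rather than averaged.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length columns (one per array).

    Each value is replaced by the mean, across arrays, of the values that
    share its rank, so that all arrays end up with the same distribution.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all columns must have the same length")
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value of every array.
    reference = [sum(col[k] for col in sorted_cols) / len(columns)
                 for k in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # indices by rank
        out = [0.0] * n
        for rank, index in enumerate(order):
            out[index] = reference[rank]
        normalized.append(out)
    return normalized


# Two hypothetical arrays measured on very different intensity scales.
arrays = [[5, 2, 3, 4], [40, 10, 30, 20]]
normalized = quantile_normalize(arrays)
```

After normalization both arrays contain the same set of values, and only the assignment of those values to probes differs.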

THP-1 promoter ChIP-chip analysis

THP-1 cells were cross-linked with 1% formaldehyde for 10 min, and 125mM glycine in

PBS was added. Cross-linked cells were collected by centrifugation and washed twice in cold

1 x PBS. The cells were sonicated for 5-7 min with a Branson 450 Sonicator to shear the

chromatin. Complexes containing DNA bound to histone H3 acetylated at lysine 9 (H3K9Ac)

were immunoprecipitated with an antibody against H3K9Ac (07-352, Upstate) by overnight

rotation at 4°C. The immunoprecipitated sample was incubated with magnetic beads/Protein

G (Dynal) for 1 hr at 4°C followed by one wash with each of (1) Low salt wash buffer (0.1%

SDS, 1% Triton X-100, 2mM EDTA, 20mM Tris.HCl (pH8.1), 150mM NaCl), (2) High salt

wash buffer (0.1% SDS, 1% Triton X-100, 2mM EDTA, 20mM Tris.HCl (pH 8.1), 500mM

NaCl) and (3) LiCl wash buffer (10mM Tris.HCl (pH8.1), 0.25M LiCl, 0.5% NP-40, 0.5%

sodium deoxycholate, 1mM EDTA), and two washes with TE buffer. The antibody-

H3K9Ac-DNA complexes were eluted from the magnetic beads by addition of 1% SDS and

100 mM NaHCO3. Beads were vortexed for 60 min at RT. The supernatants were incubated

for 3.5 hr at 65°C to reverse the cross-links, and incubated for a further 30 min at 65°C in the

presence of 20mg/ml RNaseA. To purify the DNA, proteinase K solution was added at a final

concentration of 100mg/ml, and the samples were incubated overnight at 45°C, followed by a

phenol:chloroform:isoamyl alcohol extraction and ethanol precipitation to recover the DNA.

PU.1, Sp1 and RNA Polymerase II (PolII) DNA complexes were likewise

immunoprecipitated using antibodies T-21 (Santa-cruz), 07-645 (Upstate), and 8WG16

(Abcam), for PU.1, Sp1 and PolII, respectively.

Immunoprecipitated DNA was blunted using 0.25U/µl T4 DNA polymerase (Nippon

Gene). Linker oligonucleotides (5’-accgcgcgtaatacgactcactataggg-3’ and Phosphate-5’-

ccctatagtgagtcgtattaca-3’) were annealed to the DNA while the temperature was decreased

gradually from 99°C to 15°C over 90 min. The blunted immunoprecipitated DNA sample

was ligated to the annealed oligonucleotides with 500U of T4 DNA ligase (Nippon Gene).

The cassette DNA fragments (45ug/reaction) were amplified with Blend Taq Plus (Toyobo)

using the linker-specific oligonucleotide 5’-accgcgcgtaatacgactcactataggg-3’. PCR cycling

conditions were as follows: denaturation at 95°C for 1 min; 25 cycles of 95°C for 30 s, 55°C

for 30 s, 72°C for 2 min; and a final extension at 72°C for 7 min. Amplified DNA was


purified, fragmented with DNase I (Epicentre), and end-labeled with biotin-ddATP using

terminal deoxytransferase (Roche). Amplified DNA was hybridized to Affymetrix whole

genome tiling or promoter arrays for 18 h at 45°C, washed, and scanned using the Affymetrix

GeneChip System. Each sample was hybridized in triplicate. Affymetrix Human Tiling

Arrays (1.0) were used to measure H3K9Ac enrichment. PU.1 and Sp1 enrichment were

measured using Affymetrix Human Promoter arrays (1.0R). Three technical replicates were

performed for ChIP-chip experiments of H3K9Ac, Sp1 and PU.1, and two technical replicates

for those of PolII.

RNA Polymerase II-immunoprecipitated DNA was treated with CIP and poly-dT

tailed using terminal transferase. The T7 poly-A primer (5’-CATTAGCGGCCGCGAAATT

AATACGACTCACTATAGGGAGAAAAAAAAAAAAAAAAAA [C or T or G] -3’) was

annealed and the DNA sample was subjected to second strand synthesis using DNA

polymerase I (Invitrogen) as follows; 94°C for 2min, ramp down to 35°C (1°C/sec), hold at

35°C for 2 min, ramp down to 25°C (0.5°C/sec), hold and add DNA polymerase I at 37°C for

90 min. After second strand synthesis, the reaction was terminated by EDTA addition and the

DNA was column-purified. DNA was amplified by in vitro transcription (IVT) using CUGA

T7-RNA polymerase (Nippon Gene). RNA obtained from poly-dT-tailed DNA was purified

using the RNeasy Mini kit (Qiagen) and used to synthesize cDNA with SuperScriptII

(Invitrogen) and random primers. The DNA T7-polyA primer was annealed to the first strand

DNA to synthesize second strand DNA. The second strand DNA was amplified in a second

round of IVT, performed as described above. The amplified RNA (cRNA) was also purified

in the IVT amplification. The collected cRNA was used to synthesize double-strand cDNA.

The double-stranded cDNA, fragmented with DNase I (Epicentre), was end-labelled with

biotin-ddATP by using terminal deoxytransferase (Roche). After hybridizing the end-labelled

DNA fragments to the tiling arrays (Affymetrix Human Tiling Array 2.0R) for 18 h at 45°C,

the arrays were washed and scanned using the Affymetrix GeneChip System. Each of the

treatment and control samples was hybridized twice, to provide technical replicates.

The enrichment of DNA fragments immunoprecipitated with H3K9Ac compared to

the human genome was determined using the Affymetrix whole-genome tiling array (1.0R).

This array tiles the non-repetitive portion of the human genome at 35-bp intervals with more

than 41 M pairs of 25-mer probe sequences. The hybridization intensities (background-

subtracted intensity; PM – MM, where PM and MM indicate intensities detected by a 25-mer

perfectly matching and another one-base-mismatching the genome, respectively) of the

probes were measured in three technical replicates and quantile-normalized for each of the


treatment and control samples. A shift of the intensities in the treatment relative to control

data in a 400-bp window centered at each probe was evaluated by a Wilcoxon Rank Sum test,

which assigned a P-value to the probe position. We used the Affymetrix software, GTAS

(http://www.affymetrix.com/support/developer/downloads/TilingArrayTools) for the P-value

calculation. Enrichment of DNA fragments precipitated with RNA PolII compared to the

human genome was measured by using Affymetrix Human tiling array (2.0R). This array

tiles the same portion of human genome as 1.0R with only PM probes. Two technical

replicates were performed for both treatment and control samples in measurement of the PolII

enrichment, and the enrichment measure (P-value) was calculated by using GTAS as

described for H3K9Ac.
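
The windowed rank-sum test described above can be sketched as follows. This is not GTAS itself: the function names and probe layout are hypothetical, intensities are assumed to be already background-subtracted and quantile-normalized, the test is assumed to be one-sided (treatment greater than control), and the p-value uses a normal approximation without a tie correction.

```python
import math


def rank_sum_p_value(treatment, control):
    """One-sided Wilcoxon rank-sum p-value that treatment exceeds control."""
    n1, n2 = len(treatment), len(control)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one value")
    pooled = sorted([(v, 0) for v in treatment] + [(v, 1) for v in control])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # tied values share their mean rank
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w = sum(rank for rank, (_, group) in zip(ranks, pooled) if group == 0)
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - expected) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability


def window_p_value(center, probes, half_width=200):
    """Test all probes within a 400-bp window centered on `center`.

    probes: list of (position, treatment_intensity, control_intensity).
    """
    in_window = [p for p in probes if abs(p[0] - center) <= half_width]
    treatment = [t for _, t, _ in in_window]
    control = [c for _, _, c in in_window]
    return rank_sum_p_value(treatment, control)


# Hypothetical probes at 35-bp spacing with a clear treatment enrichment.
probes = [(1000 + 35 * i, 50.0 + i, 5.0 + i) for i in range(10)]
p_enriched = window_p_value(1140, probes)
```

In practice each probe would contribute one value per technical replicate, which only changes how the two lists are assembled.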

Enrichment of PU.1 and Sp1-precipitated DNA was measured using the Affymetrix

Human Promoter arrays that tile promoter regions (7.5 kb upstream and 2.45 kb downstream

of transcription start sites) of annotated genes at 35-bp intervals with 25-mer probes.

Hybridization intensities were measured in three technical replicates for each of the treatment

and control samples. The enrichment measure expressed as a P-value was calculated by using

GTAS as described above.

The genome coordinates of the 25-mer probes, originally based on the version hg16

of human genome, were converted to hg18. The positions of the probes on hg18 were

determined by aligning the probe sequences to the human genome (hg18) using Vmatch

(http://www.vmatch.de).

ChIP-chip data were analysed such that a base was called 'present' only if it was bound by the

protein or marker of interest at statistically significant levels in both replicates of both

undifferentiated cells and cells exposed to PMA for 96h. Undifferentiated and 96h ChIP-chip data were pooled

and clustered such that any 'present' base must have at least one other 'present' base within 35

nt. Intersections between ChIP-chip features, deepCAGE, and tiRNAs were completed using

a modified version of UCSC’s bedIntersect.
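
The clustering rule for 'present' bases can be sketched as below. The function name and example coordinates are hypothetical; the only element taken from the text is that a 'present' base is kept when at least one other 'present' base lies within 35 nt. Reporting clusters as (start, end) intervals is an assumption made for illustration.

```python
def cluster_present_bases(positions, max_gap=35):
    """Group 'present' positions into clusters and drop isolated bases.

    Consecutive positions no more than `max_gap` nt apart join the same
    cluster; a position with no neighbour within `max_gap` is discarded.
    """
    clusters = []
    current = []
    for pos in sorted(set(positions)):
        if current and pos - current[-1] > max_gap:
            if len(current) > 1:  # keep only bases that have a neighbour
                clusters.append((current[0], current[-1]))
            current = []
        current.append(pos)
    if len(current) > 1:
        clusters.append((current[0], current[-1]))
    return clusters


# Hypothetical 'present' positions on one chromosome.
present = [100, 120, 150, 400, 1000, 1030]
clusters = cluster_present_bases(present)
```

The resulting intervals are the kind of feature that could then be intersected with deepCAGE promoters and tiRNAs.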

Undifferentiated THP-1 small RNA analysis

To ensure that pooling the deepCAGE and small RNA deep sequencing data across time

points was not distorting our results we restricted our analysis to small RNAs from

undifferentiated THP-1 cells (i.e. 0h). Using deepCAGE tags detected in at least two

replicates at 0h, we found that all trends observed for the pooled dataset are recapitulated at

0h, although overall less robustly. We found 156 small RNAs >200 fold enriched at 240

active promoters, which map to regions -60 to +120 nt relative to the TSS, and exhibit high


density 10 nt or further downstream (Supplementary Fig. 12a,b,c online). The vast majority

of these tiRNAs and their associated promoters map to Refgene TSSs (79% and 83%

respectively), which are highly expressed (Supplementary Fig. 12d online) and are enriched

for Sp1 and RNA PolII binding (Supplementary Fig. 12e online). 0h tiRNAs are dominantly

18nt and have no intersection with Evofold predictions. Only one third intersect with a

phastCons element. Consistent with tiRNAs from the pooled dataset we found that 0h

tiRNAs were ~72% GC.

Chicken small RNA analysis

Approximately 3 million sequences from embryonic chicken small RNA libraries made from

embryos collected at day 5, day 7 and day 9 of incubation (hereafter referred to as CE5, CE7

and CE9) were obtained from Glazov et al., GEO Series ID GSE10686 (ref. 18). Tags were mapped

to UCSC genome build galGal3 (v2.1 draft assembly, Genome Sequencing Center,

Washington University School of Medicine). We obtained a total of 130,588 uniquely

mapping sequences (69,011, 39,964, 21,613 from CE5, CE7, and CE9 respectively) with a

total abundance of 3,559,917 reads (1,192,303, 1,193,318, and 1,174,296 reads from CE5,

CE7, and CE9 respectively). We found 115,271 unannotated small RNAs (53,694, 39,964,

and 21,613 from CE5, CE7, and CE9 respectively) with a total abundance of 484,124 reads

(210,811, 185,168, and 88,145 reads from CE5, CE7, and CE9 respectively), or ~1.2 counts

per unannotated small RNA per million mapped tags. We identified a total of 1628 tiRNAs

(485, 822, 321 from CE5, CE7, and CE9 respectively) with a total abundance of 1769 counts

(512, 917, 340 reads from CE5, CE7, and CE9 respectively), or ~0.3 counts per tiRNA per

million mapped tags. Refgene, phastCons, and CpG island coordinates were obtained directly

through the UCSC Genome Browser mirror. Known small RNA loci were compiled from

miRBase (v 10.0), and sequence homology searches with known mammalian snoRNAs19.

Refgene TSS coordinates were extracted from the UCSC Genome Browser.
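
The "counts per small RNA per million mapped tags" figures quoted above can be reproduced with the arithmetic below. The function name is hypothetical, and the choice of denominator (the total abundance of uniquely mapping reads, 3,559,917) is an inference that happens to reproduce the reported values; the text does not state it explicitly.

```python
def counts_per_feature_per_million(total_reads, n_features, total_mapped_reads):
    """Mean read count per feature, scaled per million mapped reads."""
    mean_count = total_reads / n_features
    return mean_count / (total_mapped_reads / 1_000_000)


# Chicken embryo libraries (CE5, CE7 and CE9 pooled), numbers from the text.
total_mapped = 3_559_917
unannotated = counts_per_feature_per_million(484_124, 115_271, total_mapped)
tirnas = counts_per_feature_per_million(1_769, 1_628, total_mapped)
print(round(unannotated, 1), round(tirnas, 1))
```

The same calculation applied to the Drosophila library totals gives values that round to the figures reported for those libraries.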

Bootstrap enrichment was performed as described above. Small RNA distributions with

respect to the TSS (both sense and antisense) were calculated as described above. Due to the

paucity of Refgene annotations in the Gallus gallus genome, and therefore the limited

number of TSSs used in this analysis, small RNAs mapping to more than one window were

observed in less than 2% of cases. A one-tailed t-test was used to assess the significance of

the difference between tiRNA and 3’ end small RNA sizes.


Drosophila small RNA and gene expression analysis

Drosophila melanogaster deep sequencing libraries were obtained through NCBI GEO.

Libraries GSE7448 (ref. 1), GSE11624 (ref. 20), and GSE11086 (ref. 4) were mapped to the Drosophila genome

(UCSC dm3, BDGP Release 5) as described above. Acquisition of genomic features and

removal of small tags that mapped to small RNAs, repeats, etc. was accomplished as

described above. GSE7448 showed 111,017 uniquely mapping tags with a total abundance of

358,893, and 78,276 unannotated small RNAs with a total abundance of 123,183 or ~4

counts per unannotated small RNA per million mapped tags. We found 1972 tiRNAs in

GSE7448, with a total abundance of 3060, or ~4 counts per tiRNA per million mapped tags.

GSE11624 showed 1,055,295 uniquely mapping tags with a total abundance of 5,650,248,

and 664,962 unannotated small RNAs with a total abundance of 1,644,447 or ~0.4 counts per

unannotated small RNA per million mapped tags. We found 29,722 tiRNAs with a total

abundance of 52,941, or ~0.3 counts per tiRNA per million mapped tags. Bootstrap

enrichment was performed as described above. Small RNA distributions with respect to the

TSS (both sense and antisense) were calculated as described above. Small RNAs mapping to

multiple windows were observed in less than 10% of cases.

To examine the relationship between tiRNA abundance and gene expression we

obtained median normalized gene expression values for all genes detected at all life cycle

time points (3,318 Flybase genes) in the Arbeitman et al.2 dataset from our in-house UCSC

mirror (UCSC Genome Browser table hgFixed.arbFlyLifeMedianRatio). We created a

MySQL relational database of gene expression values and tiRNA abundance per gene. The

significance of differences in gene expression levels of genes with and without tiRNAs at

different embryonic time points was assessed by a Welch Two Sample t-test. Gene

expression and tiRNA abundance heatmaps were generated using Mayday3. tiRNA

abundance was normalized as a proportion of the total abundance of the unannotated small

RNAs in each library (see Table 1).
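
The Welch two-sample t-test named above compares two groups without assuming equal variances. The sketch below computes only the t statistic and the Welch-Satterthwaite degrees of freedom; the function name and example values are hypothetical, and the p-value, which requires the t distribution, is left to a statistics package.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for Welch's t-test."""
    # Squared standard errors of the two group means (sample variances).
    se_a = variance(group_a) / len(group_a)
    se_b = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(group_a) - 1) + se_b ** 2 / (len(group_b) - 1)
    )
    return t, df


# Hypothetical expression ratios for genes with and without tiRNAs.
with_tirnas = [2.0, 4.0, 6.0]
without_tirnas = [1.0, 2.0, 3.0]
t_stat, dof = welch_t(with_tirnas, without_tirnas)
```

Unlike Student's t-test, the degrees of freedom here are generally non-integer and shrink when the two groups have very different variances.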

Gene Ontology analysis

Gene Ontology enrichment for human and Drosophila genes was assessed using a local

installation of GeneMerge21, which utilizes a hypergeometric test and a Bonferroni correction

to assess statistical significance. Gene Ontology enrichment for genes with tiRNAs and

present in the Arbeitman et al. gene expression data2 was done against a background of all

Arbeitman et al. Flybase genes. All other enrichments were done against all human or

Drosophila Refgenes.
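
A hypergeometric enrichment test with a Bonferroni correction can be sketched as follows. This is a generic illustration rather than GeneMerge's own code: the function names, argument names and example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) when drawing `sample` genes from `population` genes,
    of which `annotated` carry the GO term of interest."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


def bonferroni(p_value, n_tests):
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Hypothetical example: 10 background genes, 4 with the term;
# 3 genes with tiRNAs, 2 of which carry the term; 5 terms tested.
p_raw = hypergeom_enrichment_p(population=10, annotated=4, sample=3, hits=2)
p_corrected = bonferroni(p_raw, n_tests=5)
```

The background population corresponds to the gene set against which enrichment is assessed, for example all Arbeitman et al. Flybase genes or all Refgenes.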


References

1. Ruby, J.G. et al. Evolution, biogenesis, expression, and target predictions of a

substantially expanded set of Drosophila microRNAs. Genome Res. 17, 1850-64

(2007).

2. Arbeitman, M.N. et al. Gene expression during the life cycle of Drosophila

melanogaster. Science 297, 2270-5 (2002).

3. Dietzsch, J., Gehlenborg, N. & Nieselt, K. Mayday--a microarray data analysis

workbench. Bioinformatics 22, 1010-2 (2006).

4. Czech, B. et al. An endogenous small interfering RNA pathway in Drosophila. Nature

453, 798-802 (2008).

5. Lagonigro, M.S. et al. CTAB-urea method purifies RNA from melanin for cDNA

microarray analysis. Pigment Cell Res. 17, 312-5 (2004).

6. Margulies, M. et al. Genome sequencing in microfabricated high-density picolitre

reactors. Nature 437, 376-80 (2005).

7. Carninci, P. et al. High-efficiency full-length cDNA cloning by biotinylated CAP

trapper. Genomics 37, 327-36 (1996).

8. Carninci, P. et al. Genome-wide analysis of mammalian promoter architecture and

evolution. Nat. Genet. 38, 626-35 (2006).

9. Fromont-Racine, M., Bertrand, E., Pictet, R. & Grange, T. A highly sensitive method

for mapping the 5' termini of mRNAs. Nucl. Acids Res. 21, 1683-4 (1993).

10. Karolchik, D. et al. The UCSC Genome Browser Database: 2008 update. Nucl. Acids

Res. 36, D773-779 (2008).

11. Eis, P.S. et al. Accumulation of miR-155 and BIC RNA in human B cell lymphomas.

Proc. Natl. Acad. Sci. U S A 102, 3627-32 (2005).

12. Shiraki, T. et al. Cap analysis gene expression for high-throughput analysis of

transcriptional starting point and identification of promoter usage. Proc. Natl. Acad.

Sci. U S A 100, 15776-81 (2003).

13. de Hoon, M. & Hayashizaki, Y. Deep cap analysis gene expression (CAGE): genome-

wide identification of promoters, quantification of their expression, and network

inference. Biotechniques 44, 627-8, 630, 632 (2008).

14. Faulkner, G.J. et al. A rescue strategy for multimapping short sequence tags refines

surveys of transcriptional activity by CAGE. Genomics 91, 281-8 (2008).


15. Lin, S.M., Du, P., Huber, W. & Kibbe, W.A. Model-based variance-stabilizing

transformation for Illumina microarray data. Nucl. Acids Res. 36, e11 (2008).

16. Smyth, G.K. Linear models and empirical bayes methods for assessing differential

expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. 3, Article3

(2004).

17. Smyth, G.K., Yang, Y.H. & Speed, T. Statistical issues in cDNA microarray data

analysis. Methods Mol. Biol. 224, 111-36 (2003).

18. Glazov, E.A. et al. A microRNA catalog of the developing chicken embryo identified

by a deep sequencing approach. Genome Res. 18, 957-64 (2008).

19. Griffiths-Jones, S., Saini, H.K., van Dongen, S. & Enright, A.J. miRBase: tools for

microRNA genomics. Nucl. Acids Res. 36, D154-8 (2008).

20. Chung, W.J., Okamura, K., Martin, R. & Lai, E.C. Endogenous RNA interference

provides a somatic defense against Drosophila transposons. Curr. Biol. 18, 795-802

(2008).

21. Castillo-Davis, C.I. & Hartl, D.L. GeneMerge--post-genomic analysis, data mining,

and hypothesis testing. Bioinformatics 19, 891-2 (2003).


Column groups: Biological Process, Molecular Function, Cellular Compartment
Columns within each group: GO Term, Fold, pValue, Description

Human refGenes
  Biological Process
    GO:0006412  3.23  6.87E-10  translation
  Molecular Function
    GO:0003735  3.40  7.77E-09  structural constituent of ribosome
    GO:0004842  2.83  3.56E-03  ubiquitin-protein ligase activity
    GO:0005515  1.33  1.49E-05  protein binding
  Cellular Compartment
    GO:0005842  5.31  2.59E-05  cytosolic large ribosomal subunit
    GO:0005843  4.50  9.05E-03  cytosolic small ribosomal subunit
    GO:0005840  3.76  3.76E-08  ribosome
    GO:0005737  1.57  4.70E-06  cytoplasm
    GO:0005634  1.36  1.89E-07  nucleus

(GSM286604 & GSM286613) 0 - 1h Embryo: expressed > 0.5 median ratio
  Biological Process
    GO:0006413  4.21  1.463E-03  translational initiation
    GO:0006412  3.38  2.754E-07  translation
  Molecular Function
    GO:0003743  4.21  7.049E-04  translation initiation factor activity
    GO:0003735  3.23  5.638E-05  structural constituent of ribosome
  Cellular Compartment
    GO:0005842  4.66  8.558E-05  cytosolic large ribosomal subunit
    GO:0005829  3.42  2.959E-03  cytosol

0 - 1h Embryo: expressed < 0.5 median ratio
  Biological Process: NA
  Molecular Function: NA
  Cellular Compartment
    GO:0016459  9.29  6.720E-04  myosin complex

(GSM286605 & GSM286606) 2 - 6h Embryo: expressed > 0.5 median ratio
  Biological Process
    GO:0006412  3.70  1.863E-05  translation
  Molecular Function
    GO:0003735  4.03  1.545E-05  structural constituent of ribosome
    GO:0003676  2.31  9.321E-06  nucleic acid binding
  Cellular Compartment
    GO:0005843  7.52  5.393E-04  cytosolic small ribosomal subunit
    GO:0005842  5.65  1.855E-04  cytosolic large ribosomal subunit

2 - 6h Embryo: expressed < 0.5 median ratio
  Biological Process
    GO:0035152  15.61  8.538E-03  regulation of tracheal tube architecture
  Molecular Function: NA
  Cellular Compartment: NA

(GSM286607 & GSM286611) 6 - 10h Embryo: expressed > 0.5 median ratio
  Biological Process
    GO:0016360  9.76  5.457E-03  sensory organ precursor cell fate determination
    GO:0007424  4.09  2.021E-03  tracheal system development (sensu Insecta)
    GO:0006412  3.68  4.142E-06  translation
  Molecular Function
    GO:0003735  4.47  3.735E-08  structural constituent of ribosome
    GO:0003700  2.54  7.826E-04  transcription factor activity
    GO:0003676  2.29  2.628E-06  nucleic acid binding
  Cellular Compartment
    GO:0005842  6.68  2.003E-07  cytosolic large ribosomal subunit
    GO:0005634  2.09  2.355E-08  nucleus

6 - 10h Embryo: expressed < 0.5 median ratio
  Biological Process: NA
  Molecular Function: NA
  Cellular Compartment: NA

Genes with tiRNAs in either replicate at all embryonic time points
  Biological Process
    GO:0015992  13.66  2.821E-05  proton transport
    GO:0015986  9.76  2.101E-05  ATP synthesis coupled proton transport
    GO:0006412  5.44  1.278E-06  translation
  Molecular Function
    GO:0008553  14.64  5.149E-08  hydrogen-exporting ATPase activity, phosphorylative mechanism
    GO:0046961  9.76  8.702E-06  hydrogen ion transporting ATPase activity, rotational mechanism
    GO:0046933  9.76  8.702E-06  hydrogen ion transporting ATP synthase activity, rotational mechanism
    GO:0003735  6.91  7.509E-09  structural constituent of ribosome
    GO:0003676  2.63  2.509E-04  nucleic acid binding
  Cellular Compartment
    GO:0000276  19.52  5.064E-04  proton-transporting ATP synthase complex, coupling factor F(o)
    GO:0005843  11.71  2.231E-04  cytosolic small ribosomal subunit
    GO:0005842  9.25  8.901E-06  cytosolic large ribosomal subunit

GSM286604: 0 - 1h Embryo
  Biological Process
    GO:0015992  5.45  1.711E-09  proton transport
    GO:0006099  3.44  2.545E-03  tricarboxylic acid cycle
    GO:0015986  3.40  5.935E-05  ATP synthesis coupled proton transport
    GO:0006412  1.94  1.758E-06  translation
    GO:0006118  1.88  7.402E-03  electron transport
  Molecular Function
    GO:0004129  5.21  2.188E-03  cytochrome-c oxidase activity
    GO:0008553  5.05  1.110E-10  hydrogen-exporting ATPase activity, phosphorylative mechanism
    GO:0003735  3.78  7.326E-23  structural constituent of ribosome
    GO:0046961  3.30  1.268E-04  hydrogen ion transporting ATPase activity, rotational mechanism
    GO:0046933  3.30  1.268E-04  hydrogen ion transporting ATP synthase activity, rotational mechanism
    GO:0003779  2.32  4.374E-03  actin binding
    GO:0003676  1.81  5.218E-08  nucleic acid binding
  Cellular Compartment
    GO:0005842  6.73  1.253E-21  cytosolic large ribosomal subunit
    GO:0005843  6.44  1.339E-14  cytosolic small ribosomal subunit
    GO:0005747  5.28  1.094E-10  respiratory chain complex I
    GO:0005751  5.24  2.713E-03  respiratory chain complex IV
    GO:0005840  4.42  1.544E-10  ribosome
    GO:0005759  3.67  2.632E-05  mitochondrial matrix
    GO:0005739  1.95  7.426E-03  mitochondrion
    GO:0005737  1.64  3.013E-03  cytoplasm

GSM286613: 0 - 1h Embryo (replicate)
  Biological Process
    GO:0016477  6.31  8.006E-03  cell migration
    GO:0001708  5.92  1.262E-03  cell fate specification
    GO:0007431  5.15  6.173E-03  salivary gland development
    GO:0007409  4.31  6.990E-03  axonogenesis
    GO:0006413  4.10  9.464E-05  translational initiation
    GO:0007411  3.29  1.084E-03  axon guidance
    GO:0007422  3.00  8.728E-03  peripheral nervous system development
    GO:0009993  2.55  3.437E-04  oogenesis (sensu Insecta)
    GO:0006412  2.01  5.656E-06  translation
  Molecular Function
    GO:0016538  6.66  6.152E-04  cyclin-dependent protein kinase regulator activity
    GO:0003743  3.95  4.368E-05  translation initiation factor activity
    GO:0003735  3.91  6.685E-20  structural constituent of ribosome
    GO:0003676  2.07  2.062E-11  nucleic acid binding
    GO:0005515  1.84  8.727E-06  protein binding
  Cellular Compartment
    GO:0005830  11.84  8.873E-04  cytosolic ribosome
    GO:0005842  6.26  1.568E-14  cytosolic large ribosomal subunit
    GO:0005843  5.92  1.633E-09  cytosolic small ribosomal subunit
    GO:0005840  4.19  2.492E-07  ribosome
    GO:0005737  1.77  5.726E-04  cytoplasm
    GO:0005634  1.56  6.082E-06  nucleus

GSM286605: 2 - 6h Embryo
  Biological Process
    GO:0007400  4.91  8.542E-04  neuroblast fate determination
    GO:0007419  4.37  1.647E-03  ventral cord development
    GO:0007456  4.16  1.771E-08  eye development (sensu Endopterygota)
    GO:0007219  4.03  9.349E-04  Notch signaling pathway
    GO:0015992  3.76  1.250E-03  proton transport
    GO:0007411  3.53  1.644E-06  axon guidance
    GO:0007391  3.34  6.516E-06  dorsal closure
    GO:0006917  3.08  4.490E-03  induction of apoptosis
  Molecular Function
    GO:0016538  5.88  3.905E-04  cyclin-dependent protein kinase regulator activity
    GO:0016564  4.15  3.038E-04  transcriptional repressor activity
    GO:0008553  3.13  5.578E-03  hydrogen-exporting ATPase activity, phosphorylative mechanism
    GO:0003735  3.05  5.258E-14  structural constituent of ribosome
    GO:0003779  2.35  1.447E-03  actin binding
    GO:0003700  1.88  9.340E-06  transcription factor activity
    GO:0003676  1.86  7.547E-10  nucleic acid binding
    GO:0005515  1.84  9.274E-08  protein binding
  Cellular Compartment
    GO:0005843  5.17  1.367E-09  cytosolic small ribosomal subunit
    GO:0005751  4.98  4.444E-03  respiratory chain complex IV
    GO:0005842  4.26  3.215E-08  cytosolic large ribosomal subunit
    GO:0005840  3.18  1.054E-04  ribosome
    GO:0005886  1.91  3.269E-04  plasma membrane
    GO:0005737  1.83  3.656E-06  cytoplasm
    GO:0005634  1.82  1.708E-16  nucleus

Drosophila library GSE11624 enrichments: Drosophila Refgenes with tiRNAs from Chung et al. were queried for Gene Ontology enrichments based on their library/tissue/developmental time point of origin. The results are generally consistent with tiRNA association with highly expressed genes and/or genes expected to be highly expressed in a particular library. For example, tiRNAs are heavily associated with developmental and CNS genes in 6-10 h Drosophila embryos.

HUMAN: Human Refgenes with tiRNAs derived from THP-1 cells were queried for Gene Ontology enrichments. Results show enrichment for ribosomal components and translation, consistent with tiRNA association with highly expressed transcripts.

Drosophila Embryo Expression: We analyzed highly and weakly expressed genes (from Arbeitman et al.) with tiRNAs (from Chung et al.) for Gene Ontology enrichments across three developmental time points. Highly expressed genes with tiRNAs are consistently associated with the ribosome and translational machinery. We observed no consistent enrichment for weakly expressed genes with tiRNAs.

Supplementary Table 1 Taft RJ et al - tiRNAs


GSM286605: 2 - 6h Embryo (continued)
  Biological Process
    GO:0007417  3.04  1.848E-03  central nervous system development
    GO:0007422  3.01  4.236E-04  peripheral nervous system development
    GO:0007424  2.83  8.847E-04  tracheal system development (sensu Insecta)
    GO:0009993  2.34  4.108E-04  oogenesis (sensu Insecta)
    GO:0007010  2.17  5.125E-04  cytoskeleton organization and biogenesis
    GO:0045449  2.14  4.089E-03  regulation of transcription
    GO:0007398  2.04  9.985E-04  ectoderm development
  Molecular Function
    GO:0005524  1.57  1.439E-04  ATP binding
    GO:0008270  1.57  7.721E-05  zinc ion binding

GSM286606: 2 - 6h Embryo (replicate)
  Biological Process: NA
  Molecular Function
    GO:0016566  18.86  6.144E-05  specific transcriptional repressor activity
    GO:0016538  12.77  5.998E-03  cyclin-dependent
    GO:0003735  4.42  2.429E-06  structural constituent of ribosome
    GO:0003700  2.90  9.637E-05  transcription factor activity
    GO:0003676  2.61  3.507E-06  nucleic acid binding
  Cellular Compartment
    GO:0005843  9.20  3.903E-05  cytosolic small ribosomal subunit
    GO:0005842  7.71  5.090E-05  cytosolic large ribosomal subunit
    GO:0005840  5.66  2.662E-03  ribosome
    GO:0005634  2.28  1.209E-07  nucleus

GSM286607: 6 - 10h Embryo
  Biological Process
    GO:0016318  6.80  7.371E-03  ommatidial rotation
    GO:0007219  6.80  3.099E-06  Notch signaling pathway
    GO:0001708  6.80  7.371E-03  cell fate specification
    GO:0007422  6.12  3.947E-12  peripheral nervous system development
    GO:0007423  5.00  1.069E-04  sensory organ development
    GO:0007456  4.74  3.805E-05  eye development (sensu Endopterygota)
    GO:0007411  4.49  1.661E-05  axon guidance
    GO:0007507  4.25  1.083E-03  heart development
    GO:0007417  4.18  6.064E-04  central nervous system development
    GO:0007391  3.58  5.509E-03  dorsal closure
    GO:0007424  3.48  4.249E-03  tracheal system development (sensu Insecta)
    GO:0007399  2.73  3.846E-04  nervous system development
    GO:0007398  2.54  5.689E-04  ectoderm development
    GO:0006412  1.96  3.545E-03  translation
  Molecular Function
    GO:0016566  9.15  1.184E-03  specific transcriptional repressor activity
    GO:0003735  3.94  3.707E-13  structural constituent of ribosome
    GO:0003676  2.17  6.370E-09  nucleic acid binding
    GO:0005515  1.99  3.562E-05  protein binding
  Cellular Compartment
    GO:0005842  7.06  1.030E-11  cytosolic large ribosomal subunit
    GO:0005843  5.10  3.715E-04  cytosolic small ribosomal subunit
    GO:0005840  4.18  1.474E-04  ribosome
    GO:0005886  2.46  2.134E-05  plasma membrane
    GO:0005737  2.01  1.715E-04  cytoplasm
    GO:0005634  2.01  1.918E-12  nucleus

GSM286611: 6 - 10h Embryo (replicate)
  Biological Process
    GO:0046667  11.52  4.656E-03  retinal cell programmed cell death (sensu Endopterygota)
    GO:0030723  9.87  2.611E-03  ovarian fusome organization and biogenesis
    GO:0008356  7.04  3.460E-05  asymmetric cell division
    GO:0007219  6.25  6.008E-09  Notch signaling pathway
    GO:0006414  5.76  6.117E-03  translational elongation
    GO:0007400  5.51  9.765E-04  neuroblast fate determination
    GO:0007419  5.35  1.545E-04  ventral cord development
    GO:0016481  5.12  8.392E-04  negative regulation of transcription
    GO:0007431  5.01  8.914E-03  salivary gland development
    GO:0001736  5.01  8.914E-03  establishment of planar polarity
    GO:0007409  4.89  2.071E-04  axonogenesis
    GO:0030707  4.55  3.683E-05  ovarian follicle cell development (sensu Insecta)
    GO:0007411  4.48  2.423E-09  axon guidance
    GO:0007422  4.45  1.170E-09  peripheral nervous system development
    GO:0015992  4.32  5.267E-04  proton transport
    GO:0008340  4.06  1.154E-04  determination of adult life span
    GO:0007456  3.96  1.716E-05  eye development (sensu Endopterygota)
    GO:0008355  3.93  4.592E-03  olfactory learning
    GO:0007611  3.93  4.592E-03  learning and
    GO:0007417  3.72  6.102E-05  central nervous system development
    GO:0007424  3.61  3.543E-06  tracheal system development (sensu Insecta)
    GO:0015986  3.27  2.405E-03  ATP synthesis coupled proton transport
    GO:0007476  3.22  3.068E-03  wing morphogenesis
    GO:0016337  2.91  3.284E-03  cell-cell adhesion
    GO:0009993  2.56  2.352E-04  oogenesis (sensu Insecta)
    GO:0007399  2.53  2.105E-05  nervous system development
    GO:0007155  2.44  2.026E-05  cell adhesion
    GO:0007398  2.40  1.648E-05  ectoderm development
    GO:0007498  2.19  9.839E-03  mesoderm development
    GO:0006412  1.82  9.594E-04  translation
  Molecular Function
    GO:0016566  7.97  6.412E-05  specific transcriptional repressor activity
    GO:0016564  4.74  1.545E-04  transcriptional repressor activity
    GO:0008553  4.06  5.502E-05  hydrogen-exporting ATPase activity, phosphorylative mechanism
    GO:0003735  3.62  4.440E-17  structural constituent of ribosome
    GO:0046961  3.14  3.838E-03  hydrogen ion transporting ATPase activity, rotational mechanism
    GO:0046933  3.14  3.838E-03  hydrogen ion transporting ATP synthase activity, rotational mechanism
    GO:0003779  2.43  5.277E-03  actin binding
    GO:0005515  2.32  8.235E-15  protein binding
    GO:0003676  2.10  1.606E-12  nucleic acid binding
    GO:0003700  2.10  4.277E-07  transcription factor activity
    GO:0005524  1.65  1.015E-04  ATP binding
    GO:0008270  1.51  9.295E-03  zinc ion binding
  Cellular Compartment
    GO:0005830  11.52  1.049E-03  cytosolic ribosome
    GO:0005842  5.87  3.859E-13  cytosolic large ribosomal subunit
    GO:0005843  5.47  2.867E-08  cytosolic small ribosomal subunit
    GO:0005840  4.61  1.539E-09  ribosome
    GO:0005886  2.59  3.927E-10  plasma membrane
    GO:0005737  2.08  2.654E-08  cytoplasm
    GO:0005634  1.87  8.912E-15  nucleus

GSM240749: Female heads
GO:0015992  7.89  3.944E-05  proton transport
GO:0008553  6.19  3.032E-04  hydrogen-exporting ATPase activity, phosphorylative mechanism
GO:0005859  28.70  5.719E-03  muscle myosin complex
GO:0015986  4.71  9.360E-03  ATP synthesis coupled proton transport
GO:0046961  4.78  4.263E-03  hydrogen ion transporting ATPase activity, rotational mechanism
GO:0000275  16.40  6.376E-03  proton-transporting ATP synthase complex, catalytic core F(1)
GO:0006952  2.37  5.802E-04  defense response
GO:0046933  4.78  4.263E-03  hydrogen ion transporting ATP synthase activity, rotational mechanism
GO:0005840  8.39  5.431E-11  ribosome
GO:0003735  4.29  8.673E-09  structural constituent of ribosome
GO:0005842  7.04  3.059E-06  cytosolic large ribosomal subunit
GO:0020037  3.78  2.622E-03  heme binding
GO:0005843  6.46  9.996E-04  cytosolic small ribosomal subunit

GSM286601: Male heads
NA
GO:0004022  23.80  2.805E-03  alcohol dehydrogenase activity
GO:0005576  2.74  5.931E-03  extracellular region

GSM272651: S2 and KC cells
GO:0045034  6.76  3.676E-04  neuroblast division
GO:0003735  3.07  6.037E-09  structural constituent of ribosome
GO:0005842  7.65  1.983E-18  cytosolic large ribosomal subunit
GO:0001736  5.88  1.853E-03  establishment of planar polarity
GO:0003779  2.64  2.908E-03  actin binding
GO:0005840  4.37  7.972E-07  ribosome
GO:0007422  3.97  1.040E-05  peripheral nervous system development
GO:0005515  2.01  4.549E-07  protein binding
GO:0005783  2.88  2.905E-03  endoplasmic reticulum
GO:0007391  3.56  3.404E-04  dorsal closure
GO:0005737  2.11  3.196E-07  cytoplasm
GO:0007476  3.38  5.340E-03  wing morphogenesis
GO:0005634  1.80  2.803E-10  nucleus
GO:0008360  3.22  1.898E-03  regulation of cell shape
GO:0000910  3.12  9.633E-03  cytokinesis
GO:0009993  2.83  5.305E-05  oogenesis (sensu Insecta)
GO:0007010  2.27  8.715E-03  cytoskeleton organization and biogenesis

GSM272652: S2 cells
GO:0007430  5.50  4.057E-03  terminal branching of trachea, cytoplasmic projection extension (sensu Insecta)
GO:0003735  2.90  4.226E-21  structural constituent of ribosome
GO:0005842  4.67  4.393E-19  cytosolic large ribosomal subunit
GO:0046843  4.02  1.430E-03  dorsal appendage formation
GO:0003779  2.56  5.162E-09  actin binding
GO:0005843  4.33  1.269E-11  cytosolic small ribosomal subunit
GO:0007411  3.61  5.010E-13  axon guidance
GO:0005525  2.19  9.133E-07  GTP binding
GO:0005840  3.43  1.370E-10  ribosome
GO:0007298  3.56  8.010E-05  border follicle cell migration (sensu Insecta)
GO:0003924  2.17  4.339E-05  GTPase activity
GO:0005700  3.00  2.235E-03  polytene chromosome
GO:0007409  3.37  5.399E-04  axonogenesis
GO:0000166  2.01  1.507E-04  nucleotide binding
GO:0005783  2.50  4.307E-06  endoplasmic reticulum
GO:0007422  3.22  9.004E-10  peripheral nervous system development
GO:0004702  1.92  2.910E-03  receptor signaling protein serine
GO:0005737  2.08  4.462E-18  cytoplasm
GO:0008340  3.15  1.026E-05  determination of adult life span
GO:0004674  1.87  8.431E-03  protein serine
GO:0005622  1.79  1.309E-03  intracellular
GO:0007391  3.01  4.413E-08  dorsal closure
GO:0003676  1.78  1.488E-13  nucleic acid binding
GO:0005886  1.79  1.378E-05  plasma membrane
GO:0007015  2.79  1.229E-03  actin filament organization
GO:0005515  1.73  4.859E-10  protein binding
GO:0005634  1.65  1.255E-17  nucleus
GO:0006897  2.75  9.997E-04  endocytosis
GO:0005524  1.62  1.323E-09  ATP binding
GO:0008360  2.65  6.541E-06  regulation of cell shape
GO:0008270  1.39  1.488E-03  zinc ion binding
GO:0007476  2.64  2.219E-04  wing morphogenesis
GO:0009993  2.63  8.831E-12  oogenesis (sensu Insecta)
GO:0007264  2.42  2.615E-04  small GTPase mediated signal transduction
GO:0000910  2.38  1.965E-03  cytokinesis
GO:0006457  2.13  6.587E-04  protein folding
GO:0007010  1.98  4.989E-05  cytoskeleton organization and biogenesis
GO:0006468  1.84  4.525E-06  protein amino acid phosphorylation
GO:0006886  1.82  4.118E-06  intracellular protein transport
GO:0006412  1.55  1.649E-03  translation

GSM272653: KC cells
GO:0045034  5.03  4.180E-04  neuroblast division
GO:0003735  2.77  4.026E-12  structural constituent of ribosome
GO:0005853  8.38  6.131E-03  eukaryotic translation elongation factor 1 complex
GO:0007298  4.06  2.603E-04  border follicle cell migration (sensu Insecta)
GO:0003779  2.42  9.892E-05  actin binding
GO:0005842  4.74  1.586E-12  cytosolic large ribosomal subunit
GO:0008355  3.27  9.022E-03  olfactory learning
GO:0000166  2.31  3.541E-05  nucleotide binding
GO:0005843  3.77  4.744E-05  cytosolic small ribosomal subunit
GO:0007611  3.27  9.022E-03  learning and
GO:0005515  1.86  2.767E-09  protein binding
GO:0005840  3.48  3.771E-07  ribosome
GO:0007411  3.26  4.580E-06  axon guidance
GO:0005524  1.75  1.216E-09  ATP binding
GO:0005783  2.50  6.997E-04  endoplasmic reticulum
GO:0007015  3.12  2.984E-03  actin filament organization
GO:0005200  1.75  4.910E-03  structural constituent of cytoskeleton
GO:0005737  2.10  1.675E-12  cytoplasm
GO:0007422  3.02  5.892E-05  peripheral nervous system development
GO:0003676  1.63  6.660E-06  nucleic acid binding
GO:0005886  2.00  4.395E-06  plasma membrane
GO:0007391  2.98  8.109E-05  dorsal closure
GO:0005634  1.47  4.740E-06  nucleus
GO:0008360  2.79  2.202E-04  regulation of cell shape
GO:0007424  2.63  2.267E-03  tracheal system development (sensu Insecta)
GO:0009993  2.52  1.454E-06  oogenesis (sensu Insecta)
GO:0007010  2.41  2.257E-07  cytoskeleton organization and biogenesis
GO:0000074  2.21  1.156E-03  regulation of progression through cell cycle
GO:0007155  1.97  1.928E-03  cell adhesion
GO:0006468  1.80  2.690E-03  protein amino acid phosphorylation

GSM275691: Imaginal disc
GO:0042049  8.35  5.998E-03  cell acyl-CoA homeostasis
GO:0050809  9.74  5.649E-04  diazepam binding
GO:0000221  5.56  4.401E-03  hydrogen ion transporting ATPase V1 domain
GO:0015992  4.14  1.027E-04  proton transport
GO:0004263  5.08  2.939E-04  chymotrypsin activity
GO:0005842  5.14  2.493E-12  cytosolic large ribosomal subunit
GO:0015986  3.78  5.431E-07  ATP synthesis coupled proton transport
GO:0008553  4.96  1.534E-10  hydrogen-exporting ATPase activity, phosphorylative mechanism
GO:0005843  5.11  6.247E-09  cytosolic small ribosomal subunit
GO:0046961  3.84  2.214E-07  hydrogen ion transporting ATPase activity, rotational mechanism
GO:0005840  3.75  3.796E-07  ribosome
GO:0046933  3.84  2.214E-07  hydrogen ion transporting ATP synthase activity, rotational mechanism
GO:0003735  3.06  1.687E-13  structural constituent of ribosome
GO:0003779  2.66  1.980E-05  actin binding
GO:0005200  1.79  6.974E-03  structural constituent of cytoskeleton

GSM286602: Male body
GO:0005977  15.22  2.021E-04  glycogen metabolic process
GO:0004129  8.54  5.235E-04  cytochrome-c oxidase activity
GO:0000275  14.50  8.035E-04  proton-transporting ATP synthase complex, catalytic core F(1)
GO:0006123  10.15  9.725E-04  mitochondrial electron transport, cytochrome c to oxygen
GO:0008553  7.96  3.394E-11  hydrogen-exporting ATPase activity, phosphorylative mechanism
GO:0000276  13.53  1.504E-04  proton-transporting ATP synthase complex, coupling factor F(o)
GO:0015992  9.13  5.463E-11  proton transport
GO:0046961  5.23  4.321E-06  hydrogen ion transporting ATPase activity, rotational mechanism
GO:0005747  10.82  5.964E-18  respiratory chain complex I
GO:0015986  5.45  1.285E-06  ATP synthesis coupled proton transport
GO:0046933  5.23  4.321E-06  hydrogen ion transporting ATP synthase activity, rotational mechanism
GO:0005751  9.55  7.936E-05  respiratory chain complex IV
GO:0006099  4.97  1.797E-03  tricarboxylic acid cycle
GO:0003735  4.08  1.247E-11  structural constituent of ribosome
GO:0005842  7.66  3.481E-11  cytosolic large ribosomal subunit
GO:0006936  4.40  4.878E-05  muscle contraction
GO:0005843  7.10  4.482E-07  cytosolic small ribosomal subunit
GO:0006412  2.01  7.285E-03  translation
GO:0005840  5.31  1.450E-06  ribosome
GO:0005743  3.69  2.976E-03  mitochondrial inner membrane

GSM286603: Female body
GO:0005977  10.47  2.310E-03  glycogen metabolic process
GO:0045735  11.63  4.678E-03  nutrient reservoir activity
GO:0000275  9.97  6.258E-03  proton-transporting ATP synthase complex, catalytic core F(1)
GO:0006123  7.98  9.644E-04  mitochondrial electron transport, cytochrome c to oxygen
GO:0042708  8.37  9.656E-03  elastase activity
GO:0000276  9.30  1.671E-03  proton-transporting ATP synthase complex, coupling factor F(o)
GO:0015992  7.68  1.209E-12  proton transport
GO:0004263  6.68  6.514E-05  chymotrypsin activity
GO:0000221  7.98  2.476E-04  hydrogen ion transporting ATPase V1 domain
GO:0006119  5.71  7.140E-03  oxidative phosphorylation
GO:0004129  6.61  1.020E-03  cytochrome-c oxidase activity
GO:0005842  7.64  1.156E-17  cytosolic large ribosomal subunit
GO:0006099  5.41  2.531E-07  tricarboxylic acid cycle
GO:0008553  6.57  4.066E-12  hydrogen-exporting ATPase activity, phosphorylative mechanism
GO:0005751  7.39  1.244E-04  respiratory chain complex IV
GO:0015986  4.58  4.442E-07  ATP synthesis coupled proton transport
GO:0003735  4.60  7.960E-24  structural constituent of ribosome
GO:0005843  7.33  4.775E-12  cytosolic small ribosomal subunit
GO:0007010  2.34  3.792E-03  cytoskeleton organization and biogenesis
GO:0046961  4.44  1.317E-06  hydrogen ion transporting ATPase activity, rotational mechanism
GO:0005747  7.13  5.986E-13  respiratory chain complex I
GO:0006412  2.27  5.962E-08  translation
GO:0046933  4.44  1.317E-06  hydrogen ion transporting ATP synthase activity, rotational mechanism
GO:0005840  5.37  1.328E-10  ribosome
GO:0006508  1.70  5.976E-04  proteolysis
GO:0003676  1.96  1.719E-07  nucleic acid binding
GO:0005737  1.87  3.211E-04  cytoplasm


Supplementary Table 2

Primer Name                    Primer sequence (RNA bases are upper case)
5' adaptor                     5'-acgctcacagaattcAAA-3'
3' adaptor                     5'-phosphate-UXXxxgaattctcacgaggccagcgt-biotin-3'
3' RT-PCR primer (Primer 1)    5'-biotin-gcacgctggcctcgtgagaattc-3'
5' PCR primer (Primer 2)       5'-biotin-cagccgacgctcacagaattcaaa-3'
