Article

Evolution of the U2 Spliceosome for Processing Numerous and Highly Diverse Non-canonical Introns in the Chordate Fritillaria borealis

Highlights

• F. borealis has lost most of its old introns and gained new ones by transposition
• New introns do not conform to the GT/AG rule and display various splice sites
• The U2 spliceosome is conserved and responsible for removing the new introns
• Larvacean tunicates have evolved specific mechanisms to remove non-GT/AG introns

Authors

Simon Henriet, Berta Colom Sanmartí, Sara Sumic, Daniel Chourrout

Correspondence

[email protected]

In Brief

The origin of introns in eukaryotes is a mystery, and the majority have nothing in common except GT/AG ends that play a crucial role during splicing. Henriet et al. show that transposition creates a large number of new introns with non-GT/AG splice sites. A spliceosome with conserved components, but with a new selectivity, removes these introns.

Henriet et al., 2019, Current Biology 29, 3193–3199
October 7, 2019 © 2019 Elsevier Ltd.
https://doi.org/10.1016/j.cub.2019.07.092



Simon Henriet,1 Berta Colom Sanmartí,1 Sara Sumic,1 and Daniel Chourrout1,2,3,*

1 Sars International Centre for Marine Molecular Biology, University of Bergen, 5006 Bergen, Norway
2 Key Laboratory of Marine Genetics and Breeding, Ocean University of China, Ministry of Education, Qingdao 266003, China
3 Lead Contact
* Correspondence: [email protected]

SUMMARY

An overwhelming majority of eukaryotic introns have GT/AG ends, whose identities play a critical role in their recognition and removal by the U2 spliceosome, a well-conserved complex of proteins and RNAs. Introns with other splice sites exist at very low frequencies in various genomes, and some of them are processed by the U12 spliceosome. Here, we show that, in the chordate Fritillaria borealis, the majority of old introns have been lost and replaced by introns with highly divergent splice sites. The new introns of F. borealis are exceptionally diverse, though most frequently AG/AC or AG/AT, and features of thousands of them support an origin from transposons. They cannot be processed in human cells, but their splicing is rescued by mutating terminal dinucleotides to GT/AG. With lariat sequencing and splicing inhibitor assays, we show that F. borealis introns are spliced by the U2 spliceosome, which thus evolved a different selectivity, with neither novel U1 small nuclear RNA (snRNA) types nor major remodeling of its protein and snRNA complements. This genome-wide recolonization by non-canonical introns emphasizes the importance of transposons as a source of novel introns in a context of massive intron loss. Evolution of the spliceosome may also make it possible to neutralize harmful transposons by converting them into introns.

INTRODUCTION

Spliceosomal introns are generally stable over long evolutionary times [1], but in some rapidly evolving lineages, many old introns have been lost and new introns with canonical splice sites have been gained. In the tunicate larvacean Oikopleura dioica, an important turnover of introns was revealed and possibly explained by at least two distinct mechanisms of intron gain [2]. Notably, a significant minority of introns had slightly modified G(non-T)/AG splice sites, together with characteristic size distributions and an unusual A-rich tail. More drastic changes of intron-exon organization were observed outside metazoans, such as in two species of microalgae, in which thousands of introns show features of transposable elements, such as repeated and palindromic sequences [3]. How such new introns arose from intragenic insertions was not addressed, presumably because their splice sites were either canonical (GT/AG) or nearly canonical (GC/AG). In a limited sample of introns from the unicellular alga Euglena gracilis, introns with more divergent splice sites were found, and they also seem to originate from transposons [4, 5]. They were named "nonconventional" as opposed to spliceosomal GT/AG introns, even though no experiment has shown, for either type, whether or not the spliceosome is involved [5]. This issue is central to the case that we report here, as most introns of fully sequenced tunicate genomes, also acquired from transposable elements, have highly divergent splice sites. Our results indicate that evolution of the splicing machinery was required to transform transposons into introns on a genome scale.

RESULTS

Most Introns of Fritillaria borealis Are Non-Canonical and Were Acquired from Transposons

When studying the newly sequenced genomes of eight larvacean species [6], we discovered that the vast majority of F. borealis introns have non-canonical splice sites (Figures 1A, 1B, and S1). For their precise delimitation, we developed and implemented an ad hoc approach, which also detects canonical introns and the rare non-canonical introns of other genomes (see STAR Methods and Figure 7A). In F. borealis, the most abundant intron types are AG/AC and AG/AT, but a broad diversity of other splice site combinations is also found. Overall, F. borealis genes have fewer introns than those of other animal species, and most of them have one or more non-canonical introns (Figure S2). Half of the minority GT/AG introns have positions conserved beyond tunicates, supporting a relatively ancient origin (Figure S2). Non-canonical introns have species-specific positions in the coding sequences, and their origin from transposable elements is supported by the observations that they have (1) several to many copies in the genome (Figures 1A, 1B, S3, and S4), (2) a distinctive and narrow size distribution (Figure 1A), (3) terminal inverted repeats (TIRs) that can be identical (Figures 1B, S3, and S4), and (4) short flanking direct repeats evoking target site duplications (Figures 1A, 1B, S1, S4B, and S4C). Most non-canonical introns are indeed preceded by an exonic triplet TAC (or less frequently TAT; Figure S4D), compatible with the duplication of a TA target site, which generated the intron 3′ end. Together, these features support an origin from MITEs (miniature inverted-repeat transposable elements) integrating after a TA site (Figure S4E), as Tc1/Mariner transposons do [7]. An unbiased survey retrieved 10,295 non-redundant copies of MITEs in the genome of F. borealis [6]. Homology searches showed sequence similarities between 6,015 of them and 3,609 annotated introns. Among these intron-related MITEs, 939 are part of a collection of 19,214 introns identified by genome-to-transcriptome alignment. By comparing the protein-coding potential of their flanking regions with those of annotated introns (BLASTX on UniProtKB/SwissProt), we estimate that another 15% of the MITEs not validated by alignments with transcripts may also be introns. The mechanisms of intron or MITE mobilization in F. borealis are still unclear, as DNA transposons present in the genome did not show conspicuous similarity with the TIRs present at intron and MITE ends.

Figure 1. Main Features of F. borealis Introns

(A) Left side: classification of 5,645 introns based on transcriptome and genome alignment and annotation after longest open reading frame (ORF) orientation (large pie). Small pies represent, for each intron type, the relative proportions of TAC or TAT exonic triplets preceding the intron, which are in the majority for all non-canonical types and in a small minority for canonical introns. Right side, top: distribution of intron sizes for each main category of introns is shown. The very large majority of AG/AN introns are 100–250 bp long, an important part of GT/AG introns are longer, and GC/AN introns have an intermediate size distribution. Right side, bottom: relative repetition level in each intron category based on pairwise alignments of all introns is shown. AG/AN introns are by far those having the most homologs, and GT/AG introns have virtually none.

(B) Logos of intron sequences, most of which are non-canonical, with the characteristic TAC or TAT pre-intron triplet. Canonical introns have a T-rich tail, in contrast to non-canonical ones. The third, highly conserved logo for 107 repeated introns of one subgroup displays palindromic ends.

See also Figures S1, S2, S3, and S4 and Table S2.
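The intron categories used in this section (canonical GT/AG, nearly canonical G(non-T)/AG, and other non-canonical types such as AG/AC or AG/AT) and the TAC/TAT pre-intron triplet can be illustrated with a minimal Python sketch. It is not the ad hoc delimitation approach described in STAR Methods; the function names and the toy sequences in `RECORDS` are hypothetical and serve only to show how terminal dinucleotides define the categories.

```python
from collections import Counter


def splice_sites(intron: str) -> str:
    """Return the terminal dinucleotides of an intron, e.g. 'AG/AC'."""
    seq = intron.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to carry two splice sites")
    return f"{seq[:2]}/{seq[-2:]}"


def intron_category(intron: str) -> str:
    """Group a splice-site pair into one of three broad categories."""
    five, three = splice_sites(intron).split("/")
    if five == "GT" and three == "AG":
        return "canonical"
    if five[0] == "G" and three == "AG":
        return "nearly canonical"  # G(non-T)/AG, e.g. GC/AG
    return "non-canonical"


def has_ta_triplet(upstream_exon: str) -> bool:
    """True if the last three exonic bases before the intron are TAC or TAT."""
    return upstream_exon.upper()[-3:] in {"TAC", "TAT"}


# Hypothetical toy records (upstream exon, intron); not real F. borealis data.
RECORDS = [
    ("ccgtac", "AGTTTGACCAAAAC"),  # AG/AC
    ("ggatat", "AGCCTTGAAGGAAT"),  # AG/AT
    ("atgcag", "GTAAGTTTTTTCAG"),  # GT/AG
    ("atgaag", "GCAAGTTTTTTCAG"),  # GC/AG
]

category_counts = Counter(intron_category(intron) for _, intron in RECORDS)
triplet_flags = [has_ta_triplet(exon) for exon, _ in RECORDS]
```

Tallying `category_counts` per genome gives the kind of frequency breakdown summarized by the pie charts, under the assumption that intron boundaries have already been delimited accurately.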

F. borealis Non-Canonical Introns Are Spliceosomal

Because a presence/absence polymorphism was revealed for recently gained introns [3], care should be taken not to mistake ordinary genomic insertions for introns. The compelling argument that non-canonical insertions into F. borealis genes represent real introns is the identification of splicing products (Figure 2A). The first step of splicing ligates the branch point to the 5′ end of the intron, thus producing an intron lariat that is released after the second step of splicing. We were able to detect lariats by sequencing RNA resistant to RNase R [8] and aligning the reads with our collection of putative introns. Lariat production was detected for 111 introns in O. dioica and for 44 introns in F. borealis, including 40 non-canonical ones. Lariat sequences in every case confirmed the proposed 5′ splice site and pinpointed a branch point very near the 3′ splice site (<13 nt; Figure 2B). Lariats may also be produced by self-splicing group II and group III introns, which were shown to exist in some animal genomes [9] and whose activity relies on conserved RNA motifs and an intron-encoded protein (IEP). These features were found neither in F. borealis introns nor in its genome sequence. In a complementary experiment (Figure 3), the splicing of both canonical and non-canonical introns was partially inhibited in F. borealis specimens treated with Pladienolide B (PlaB), a drug known to interfere with the recognition of the branchpoint by the SF3B complex during pre-mRNA splicing [10, 11]. The global analysis of all RNA-seq reads encompassing exon-intron borders (381,722 for DMSO control and 266,948 for PlaB treated) showed an increase of intron retention rate in treated specimens, up to levels that were similar for canonical (from 2% up to 11.4%) and non-canonical introns (from 1.4% up to 12.9%). The analysis of 1,067 introns, for which at least 50 RNA-seq reads were exploitable in both control and treated groups, showed that intron retention rates could reach much higher levels (Figures 3B and 3C), as confirmed with RT-PCR assays (Figure 3A). Together, these results support the conclusion that the non-canonical (and canonical) introns of F. borealis are spliceosomal.

Figure 2. Identification of Splicing Intermediates with Lariat-Seq

(A) Outline of the procedure. RNase R digests the lariat 3′ tail, leaving circular RNA with a 2′-5′ bond at the branchpoint (BP) (black dot). During cDNA synthesis (dotted lines), the BP is an error-prone position (pie charts show error rate for adenine BPs). Reads are mapped back to the genome, and BP position is inferred from split alignment.

(B) Bar graphs represent the BP identity for lariats identified in F. borealis and O. dioica.

(C) BP distance from the 3′ end of the intron.

Figure 3. Splicing Inhibitor Assay

(A) RT-PCR was used to visualize splicing efficiency for seven F. borealis introns in specimens treated with 10 μM Pladienolide B (PlaB), as well as non-treated animals (DMSO). Primers matched the flanking exons. Red arrows show products containing an intron; black arrows show splicing products (lanes 1 and 2, cDNA; lanes 3 and 4, control without RT; lane 5, control with water; lane 6, control with genomic DNA).

(B) Genome browser view showing RNA-seq coverage over two genes, in PlaB-treated and non-treated specimens (log scale). Exon coverage is comparable in the two samples, whereas intron coverage (indicating intron retention) is increased by PlaB treatment.

(C) Genome-wide intron retention rate measured on RNA-seq reads with or without PlaB treatment (see details in STAR Methods). Only introns with flanking exons covered by at least 50 sequencing reads in control and in treated samples are considered here. High retention rates are induced by the PlaB treatment.

Figure 4. Splicing Assays in HEK293T Cells

(A) Twelve introns from six F. borealis genes were cloned in a mammalian expression vector. After cell transfection, splicing of canonical and non-canonical introns was tested with RT-PCR. Stars indicate the three introns modified with in vitro mutagenesis.

(B) Effect of splice site mutation on intron removal. For three distinct introns, splice site mutants (mut1–5) were produced and splicing was tested using RT-PCR. Arrows show primers, red bars indicate at which position mutations were introduced, and black bars indicate cryptic splice sites. The connecting lines show which splice sites were mapped, and the fractions indicate their frequency (lowercase, exon sequence; uppercase, intron sequence).

(C) Gel showing a representative RT-PCR experiment for detecting the splicing of either the wild-type intron Fb6nc (lane WT) or mutants whose splice sites have been modified (lanes mut1–mut5).

See also Table S1.
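The intron retention rates compared between DMSO control and PlaB-treated specimens can be sketched as a simple ratio of read counts. The exact definition used by the authors is given in STAR Methods and is not reproduced here; this sketch assumes that retention is the fraction of reads crossing an exon-intron border among all informative reads (unspliced plus spliced), and it applies the 50-read minimum to that total. The class name, function names, and example counts are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class IntronCounts:
    retained: int  # reads crossing an exon-intron border (unspliced)
    spliced: int   # reads joining the two flanking exons (spliced)

    @property
    def total(self) -> int:
        return self.retained + self.spliced


def retention_rate(counts: IntronCounts) -> float:
    """Fraction of informative reads that still contain the intron (assumed definition)."""
    if counts.total == 0:
        raise ValueError("no informative reads for this intron")
    return counts.retained / counts.total


def compare_retention(
    control: IntronCounts, treated: IntronCounts, min_reads: int = 50
) -> Optional[Tuple[float, float]]:
    """Return (control rate, treated rate), or None if either sample is under-covered."""
    if control.total < min_reads or treated.total < min_reads:
        return None
    return retention_rate(control), retention_rate(treated)


# Hypothetical counts for one intron; not data from the paper.
rates = compare_retention(IntronCounts(2, 98), IntronCounts(30, 70))
low_coverage = compare_retention(IntronCounts(1, 9), IntronCounts(30, 70))
```

Returning `None` for under-covered introns keeps low-count introns, whose rates would be dominated by sampling noise, out of the comparison.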

Figure 5. Characterization of the Spliceosome Components in F. borealis

(A) Two U1 variants were found (arrows show SNPs), with conserved 5′ end and stem-loop motifs. In stem Ic of snRNA U5, the base pairs proximal to loop I are strictly conserved in chordates but have diverged in fritillarids. In snRNA U6, residues predicted to interact with U2 and U4 are highlighted. Residues colored in red are conserved within chordates.

(B) Cap-dependent RNA-seq. The list shows the most abundant RNAs present in the input, in the 100- to 300-nt range. Note that it does not include the splice leader RNA (SL RNA), whose size is under 100 nt. Cap enrichment is determined as the increase in read amount caused by decapping prior to 5′ adaptor ligation.

(C) RIP-seq experiments with anti-TMG and anti-Sm immunoglobulin Gs (IgGs). We compared each immunoprecipitation (IPP) to a control (Ctl) performed with pre-immune serum. Dots show read abundance for exons (gray), introns (yellow), snRNA (red), U3 (pink), and rRNA (blue). Dark dots represent background.

(D) Orthologs of spliceosomal proteins found with genome mining, grouped either by snRNP or by function during pre-mRNA processing.

(E) The maximum-likelihood phylogeny shows the relationships between four groups of SR proteins (SRSF2/8, SRSF3/7, SRSF1/9, and SRSF4/5/6). Nodes with bootstrap values over 0.7 are shown with red circles; the scale bar indicates the number of amino acid changes per position.

See also Figures S5 and S6.

F. borealis Has Evolved a Specific Ability to Process Non-Canonical Introns

Whether or not F. borealis introns can be processed by a conserved spliceosome was addressed in splicing assays. Gene fragments from F. borealis containing canonical or non-canonical introns, native or mutated in vitro, were introduced into human HEK293T cells, and their transcripts were characterized (Figure 4). Although canonical introns of F. borealis were accurately removed, its non-canonical introns remained unspliced (Figure 4A). For one non-canonical intron construct, splicing did in fact occur but at a cryptic canonical splice site. We then tested whether changing the identity of the 5′ and 3′ splice sites could rescue the splicing of three F. borealis AG/AT introns (Figure 4B). For each type of mutation, the results showed variations among introns, but the overall pattern was as follows: (1) sites left non-canonical were not used, (2) splicing was most often detected if one of the sites was rendered canonical, and (3) splicing happened exactly at the intron borders only when both sites were canonical, though other alternative canonical sites could also be selected. Overall, the splicing of F. borealis introns in human cells is dependent on the presence of terminal dinucleotides GT/AG. These results indicate that the spliceosome of F. borealis evolved new properties for accurately processing non-canonical introns.
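The logic of the rescue experiment, in which one or both terminal dinucleotides of an AG/AT intron are changed to the canonical GT and AG, can be sketched in Python. This only illustrates the idea of the constructs; it does not reproduce the actual mut1–mut5 designs of Figure 4B, and the function name and toy intron sequence are hypothetical.

```python
from typing import Dict


def splice_site_mutants(intron: str) -> Dict[str, str]:
    """Build variants of an intron with one or both ends made canonical (GT ... AG)."""
    seq = intron.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to carry two splice sites")
    five, body, three = seq[:2], seq[2:-2], seq[-2:]
    return {
        "wild type": seq,
        "5' site canonical": "GT" + body + three,
        "3' site canonical": five + body + "AG",
        "both sites canonical": "GT" + body + "AG",
    }


# Hypothetical AG/AT intron; not a sequence from the paper.
variants = splice_site_mutants("AGCCTTGAAGGAAT")
```

In the assay, only a variant of the last kind was spliced exactly at the intron borders in human cells, which is what ties correct processing there to the GT/AG dinucleotides.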

F. borealis Possesses a Single Spliceosome of the U2 Type

Most GT/AG introns are spliced out by the U2 spliceosome, which comprises five small nuclear ribonucleoprotein particles (snRNPs), corresponding to the snRNAs U1, U2, U4, U5, and U6, and several other splicing factors [12]. All snRNAs detected in the F. borealis genome were of the U2 type (Figure S5A) and not of the U12 type, as observed earlier for the tunicate larvacean O. dioica [13]. In the classical model, the 5′ splice site is recognized by the complementary U1 snRNA 5′ end [14]. In F. borealis, we found two U1 snRNA genes (Figure 5A) with well-conserved 5′ ends that match canonical but not non-canonical splice sites. The only change found in snRNAs is a weakly conserved stem Ic in U5 (Figures 5A and S5B), which might affect the helical conformation and the contacts between loop I and the pre-mRNA [15]. However, the co-transfection of F. borealis U5 snRNA with gene constructs containing non-canonical introns into HEK293T cells did not result in their splicing. Because standard genome mining may not reveal highly divergent snRNAs, we experimentally explored the ncRNA complement for candidates presenting known features of snRNAs, i.e., a short size, high expression levels, the presence of a 5′ tri-methylguanosine (TMG) cap, and a predicted ability to bind Sm proteins (Figures 5B and 5C). We sequenced 100- to 300-base-long RNAs whose 5′ end is resistant to exonuclease but efficiently ligated after decapping. We also performed RIP-seq using antibodies against TMG or the Sm antigen. Both approaches efficiently identified all snRNAs of the U2 type but none that might recognize non-canonical introns through sequence complementarity. We also identified F. borealis candidate orthologs of 71 proteins known to participate in spliceosome function [12] (Figure 5D). The predicted sequences of proteins involved in splice site recognition showed strong conservation between F. borealis and other species. Among proteins of the U1 and U2 snRNPs, sequence similarity with human orthologous proteins ranged from 63.7% to 85.5% in well-conserved domains. Higher divergence was observed in low-complexity regions, such as the C termini of U1 and U2 proteins, which are also less conserved in other genomes. Intriguing exceptions were deviations in highly conserved motifs of U1C and Prp8 (Figures S6A and S6B) and an expansion of SRSF2/8 genes with conserved RRMs but a variable distribution of R(X) repeats (Figures 5E and S6C). Because SRSF2 plays an important role in splice site selection, this expansion could be investigated further in relation to the processing of new intron types [16–18].

Figure 6. Frequency of Main Intron Types and Intron Densities across Deuterostomes

Emphasis is on newly sequenced genomes of larvacean tunicates, all other species showing a large majority of canonical introns. Scale bar, number of substitutions per site. Pies indicate the relative frequencies of intron types: yellow area for canonical introns; blue area for nearly canonical introns G(non-T)/AG; and green area for non-canonical introns. Nearly canonical introns have relatively high frequencies in oikopleurids, and other non-canonical introns dominate in fritillarids. Very low frequencies of non-canonical introns are confirmed in human (H. sapiens), amphioxus (B. floridae), ascidian (C. intestinalis), and sea urchin (S. purpuratus). All counts result from alignments of transcriptomes with genome sequences, except for O. longicauda, A. sicula, and F. pellucida, for which introns were localized and annotated after visual inspection of highly conserved gene sequences. Intron densities were calculated for conserved coding regions of 109 genes from six species, including F. borealis. For A. sicula, whose genome sequencing coverage is low (approx. 4×) and whose transcriptome is not available, a subset of 40 of these 109 genes was used, and introns were localized using TBLASTN and query sequences from human and F. borealis (star).

See also Figure S7.
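The statement that a conserved U1 snRNA 5′ end matches canonical but not non-canonical splice sites can be illustrated by counting base pairs between a 5′ splice site and U1. The sketch below is a simplification under stated assumptions: it uses nucleotides 3–11 of the human U1 snRNA (ACUUACCUG, an assumption standing in for the conserved F. borealis sequence, which is not reproduced here), a 9-nt splice site window (3 exonic plus 6 intronic nucleotides), and Watson-Crick pairs only. The example splice sites are hypothetical.

```python
# Nucleotides 3-11 of human U1 snRNA, written 5'->3' (assumption: used here
# as a stand-in for the conserved U1 5' end discussed in the text).
U1_5P_END = "ACUUACCUG"

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def u1_base_pairs(site_9mer: str) -> int:
    """Count Watson-Crick pairs between a 9-nt 5' splice site and the U1 5' end.

    The site is given 5'->3' as 3 exonic + 6 intronic nucleotides (DNA or RNA).
    Pairing is antiparallel, so the U1 segment is read in reverse.
    """
    site = site_9mer.upper().replace("T", "U")
    if len(site) != len(U1_5P_END):
        raise ValueError("expected a 9-nt splice site window")
    u1_antiparallel = U1_5P_END[::-1]
    return sum((s, u) in WATSON_CRICK for s, u in zip(site, u1_antiparallel))


consensus_pairs = u1_base_pairs("CAGGTAAGT")      # consensus GT-type site
noncanonical_pairs = u1_base_pairs("TACAGTTTG")   # hypothetical TAC|AG... site
```

A fuller model would also allow G-U wobble pairs and alternative base-pairing registers; this sketch only shows why a TAC-preceded AG-type 5′ end pairs poorly with U1 in the standard register.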

The Burst of Introns with Highly Divergent Splice Sites Dates from the Radiation of Tunicate Larvaceans

An interesting question is when the ability to process numerous non-canonical introns arose. First, we calculated the frequencies of intron types in a broad panel of species from sea urchin to human, based on genome-to-transcriptome alignments. We could confirm for all of them a very small percentage of non-canonical introns (Figure 6), including in the tunicate ascidian Ciona intestinalis. The frequency of divergent intron types becomes notable in tunicate larvaceans, for which we have sequenced three new genomes of oikopleurids (O. albicans, O. vanhoeffeni, and O. longicauda) and two additional genomes of fritillarids (F. pellucida and Appendicularia sicula). High frequencies and again a broad diversity of non-canonical introns are observed for fritillarids (Figure 6). Among the three fritillarid species, there are differences in the types and sizes of non-canonical introns but also key similarities, like the prevalence of TAT/TAC pre-intron exonic triplets, which suggests an origin from transposons (Figure S7). Interestingly, all four genomes of oikopleurids show an important minority of G(non-T)/AG introns, which were previously revealed for O. dioica but not investigated further (Figure 6). After transfection of HEK293T cells with O. dioica intron constructs (Table S1), canonical introns were spliced out and nearly canonical ones were not. This suggests that the splicing machinery of O. dioica has also adapted. We conclude that, within tunicates, larvaceans at large have developed specific abilities to process new intron types.

DISCUSSION

Our results reveal that invasions of host genes by transposons produced numerous non-canonical introns, whose removal necessitates an adapted splicing machinery. Larvacean tunicates have experienced a particularly rapid evolution, reflected by long branches in phylogenetic trees and important genomic changes [2, 3, 6]. Among those, highly frequent changes of intron positions were revealed when analyzing the genome of O. dioica. Massive losses of old introns were interpreted as the outcome of mRNA-mediated mechanisms (reverse transcription followed by homologous or illegitimate genome integration) [2]. For the gain of new introns, the features of forty introns were found compatible with an origin via at least two mechanisms, reverse splicing and transposon insertion [2]. The respective contributions of these mechanisms cannot be assessed, because intron sequence evolution is too rapid to preserve similarities with intron donor elements.

Large effective population size was supported for O. dioica by genome data [2], making genetic changes more likely to be adaptive. Tuning down the selectivity of the spliceosome for processing non-canonical introns is in principle not without risk, as it could also increase the frequency of incorrect splicing. Evolution of spliceosome selectivity must have conferred selective advantages able to compensate for such risks. A spliceosome that is less dependent on splice site identity could have favored the "neutralization" of a broad variety of intragenic insertions via their removal from pre-mRNAs. Because the spliceosome selectivity here appears to depend little on the intron ends, accurate splicing may rely on other intron-associated features, such as other sequence motifs, epigenetic marks, or RNA structure. Our analysis has not recognized new splicing signals in introns or in the flanking exons, and we found a surprisingly well-conserved U1 snRNA, including in its region that matches the 5′ splice site in other organisms. This conservation is probably required for splicing the few GT/AG introns left in the F. borealis genome. It does not exclude a role of U1 snRNA in the splicing of non-canonical introns, perhaps using an alternative base pair register [19]. In human cells that possess well-conserved U1 snRNA, non-canonical introns of F. borealis were not spliced. We therefore assume that, if U1 snRNA recognizes non-canonical introns, this occurs with the assistance of other factors that are specific to F. borealis.

Figure 7. Forces That May Drive the Replacement of Canonical by Non-Canonical Introns

A genome segment containing four genes is followed along the process, with exons in blue. Transposable element insertions are represented with pairs of opposite arrows, in gray for intergenic insertions, in black for disruptive insertions into exons, and in red for insertions that have been intronized.

The homology of MITEs and introns is probably eroded with time, as shown by the degeneration of their palindromic structures, but this does not mask the strong relationship existing between the F. borealis complements of MITEs and of introns. As indicated in Figure 6, the intron density is relatively low in larvaceans and very low in fritillarids, whose genes received a small number of non-canonical introns. There, only a very tiny fraction (around 1%) of ancient introns was retained. In this context of massive intron loss, MITEs thus became the essential supplier of introns, perhaps the only possible one. Whatever the reasons for the settlement of non-canonical introns, the historical dynamics of this process are an exciting question. Before non-canonical introns could be spliced out, MITE insertions into genes must have been limited in number. The risk of an excessive mutation load may have fostered the evolution of new intron recognition mechanisms, which, once in place, allowed a secondary colonization of coding sequences (Figure 7).
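The palindromic structures mentioned above, i.e., terminal inverted repeats (TIRs) and their degeneration, can be made concrete with a short Python sketch that measures how far the two ends of an element remain reverse-complementary. This is an illustrative measure only, not the method used for the MITE survey; the function names, the mismatch tolerance, and the toy sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def tir_length(element: str, max_mismatches: int = 0) -> int:
    """Length of the terminal inverted repeat, scanning inward from both ends.

    Position i of the element is compared with the complement of position i
    counted from the other end. Scanning stops once more than max_mismatches
    mismatches have been seen; the result is the last matching position reached.
    """
    seq = element.upper()
    rc = reverse_complement(seq)
    best = 0
    mismatches = 0
    for i in range(len(seq) // 2):
        if seq[i] == rc[i]:
            best = i + 1
        else:
            mismatches += 1
            if mismatches > max_mismatches:
                break
    return best


# Hypothetical toy elements: an 8-nt perfect TIR, and a copy with one
# substitution in the left arm to mimic degeneration over time.
perfect = "AGTACCGT" + "TTTTTT" + reverse_complement("AGTACCGT")
degenerated = "AGAACCGT" + "TTTTTT" + reverse_complement("AGTACCGT")

perfect_tir = tir_length(perfect)
strict_tir = tir_length(degenerated)
tolerant_tir = tir_length(degenerated, max_mismatches=1)
```

Allowing a small number of mismatches is one simple way to still recognize eroded TIRs, at the cost of more chance matches in short sequences.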

STAR METHODS

Detailed methods are provided in the online version of this paper and include the following:

• KEY RESOURCES TABLE
• LEAD CONTACT AND MATERIALS AVAILABILITY
• EXPERIMENTAL MODEL AND SUBJECT DETAILS
• METHOD DETAILS
  ◦ Genome assembly
  ◦ Transcriptome assembly and intron annotation
  ◦ Lariat-seq
  ◦ Splicing inhibitor assay
  ◦ Cap-dependent RNA-seq
  ◦ RNA-immunoprecipitation
  ◦ Splicing assays in mammalian cells
  ◦ Homology searches
• QUANTIFICATION AND STATISTICAL ANALYSIS
  ◦ RNA-immunoprecipitation
  ◦ Splicing inhibitor assay
• DATA AND CODE AVAILABILITY

SUPPLEMENTAL INFORMATION

Supplemental Information can be found online at https://doi.org/10.1016/j.cub.2019.07.092.

ACKNOWLEDGMENTS

We thank Anne Aasjord and Magnus Reeve for excellent technical assistance in the Sars Centre Oikopleura facility, as well as Don Deibel (Memorial University of Newfoundland), Fabien Lombard and Gaby Gorsky (Observatoire Océanologique de Villefranche sur Mer), and Linda Holland (Scripps Institution of Oceanography) for organizing the collection of species. We thank Martin Chourrout for help in statistical analysis and anonymous reviewers for their helpful suggestions. We thank the Genecore facility of EMBL (Heidelberg) for most Illumina sequencing. This project has been funded by two major grants of the Research Council of Norway, of which D.C. is the PI: 250005 (accelerated evolution in chordates and the origin of larvaceans) and 234817 (Sars International Centre for Marine Molecular Biology Research, 2013–2022).

AUTHOR CONTRIBUTIONS

D.C. conceived the study, D.C. and S.H. designed the experiments, S.H. and B.C.S. performed the experiments, S.S. assembled the genomes and the transcriptomes, D.C. performed most genomic analysis, S.H. annotated spliceosomal RNA and protein genes, and D.C. and S.H. wrote the manuscript.

DECLARATION OF INTERESTS

The authors declare no competing interests.

Received: March 20, 2019

Revised: June 27, 2019

Accepted: July 31, 2019

Published: September 19, 2019

REFERENCES

1. Raible, F., Tessmar-Raible, K., Osoegawa, K., Wincker, P., Jubin, C.,

Balavoine, G., Ferrier, D., Benes, V., de Jong, P., Weissenbach, J., et al.

(2005). Vertebrate-type intron-rich genes in the marine annelid

Platynereis dumerilii. Science 310, 1325–1326.

Page 8: Evolution of the U2 Spliceosome for Processing …...the non-canonical (and canonical) introns of F. borealis are spliceosomal. Figure 3. Splicing Inhibitor Assay (A) RT-PCR was used

2. Denoeud, F., Henriet, S., Mungpakdee, S., Aury, J.M., Da Silva, C.,

Brinkmann, H., Mikhaleva, J., Olsen, L.C., Jubin, C., Canestro, C., et al.

(2010). Plasticity of animal genome architecture unmasked by rapid evolu-

tion of a pelagic tunicate. Science 330, 1381–1385.

3. Huff, J.T., Zilberman, D., and Roy, S.W. (2016). Mechanism for DNA trans-

posons to generate introns on genomic scales. Nature 538, 533–536.

4. Canaday, J., Tessier, L.H., Imbault, P., and Paulus, F. (2001). Analysis of

Euglena gracilis Alpha-, Beta- and Gamma-Tubulin Genes: Introns and

Pre-mRNA Maturation. Mol. Genet. Genomics 265, 153–160.

5. Gumińska, N., Płecha, M., Zakryś, B., and Milanovski, R. (2018). Order of

Removal of Conventional and Nonconventional Introns from Nuclear

Transcripts of Euglena gracilis. PLoS Genet. 14, e1007761.

6. Naville, M., Henriet, S., Warren, I., Sumic, S., Reeve, M., Volff, J.N., and Chourrout, D. (2019). Massive changes of genome size driven by expansions of non-autonomous transposable elements. Curr. Biol. 29, 1161–1168.e6.

7. Tellier, M., Bouuaert, C.C., and Chalmers, R. (2015). Mariner and the ITm

superfamily of transposons. Microbiol. Spectr. 3, MDNA3-0033-2014.

8. Suzuki, H., Zuo, Y., Wang, J., Zhang, M.Q., Malhotra, A., and Mayeda, A.

(2006). Characterization of RNase R-digested cellular RNA source that

consists of lariat and circular RNAs from pre-mRNA splicing. Nucleic

Acids Res. 34, e63.

9. Valles, Y., Halanych, K.M., and Boore, J.L. (2008). Group II introns break

new boundaries: presence in a bilaterian’s genome. PLoS ONE 3, e1488.

10. Cretu, C., Agrawal, A.A., Cook, A., Will, C.L., Fekkes, P., Smith, P.G.,

Lührmann, R., Larsen, N., Buonamici, S., and Pena, V. (2018). Structural

basis of splicing modulation by antitumor macrolide compounds. Mol.

Cell 70, 265–273.e8.

11. Kotake, Y., Sagane, K., Owa, T., Mimori-Kiyosue, Y., Shimizu, H., Uesugi,

M., Ishihama, Y., Iwata, M., and Mizui, Y. (2007). Splicing factor SF3b as a

target of the antitumor natural product pladienolide. Nat. Chem. Biol. 3,

570–575.

12. Wahl, M.C., Will, C.L., and Lührmann, R. (2009). The spliceosome: design

principles of a dynamic RNP machine. Cell 136, 701–718.

13. Marz, M., Kirsten, T., and Stadler, P.F. (2008). Evolution of spliceosomal

snRNA genes in metazoan animals. J. Mol. Evol. 67, 594–607.

14. Zhuang, Y., and Weiner, A.M. (1986). A compensatory base change in U1

snRNA suppresses a 5′ splice site mutation. Cell 46, 827–835.

15. McGrail, J.C., and O’Keefe, R.T. (2008). The U1, U2 and U5 snRNAs crosslink to the 5′ exon during yeast pre-mRNA splicing. Nucleic Acids Res. 36, 814–825.

16. Pandit, S., Zhou, Y., Shiue, L., Coutinho-Mansfield, G., Li, H., Qiu, J.,

Huang, J., Yeo, G.W., Ares, M., Jr., and Fu, X.D. (2013). Genome-wide

analysis reveals SR protein cooperation and competition in regulated

splicing. Mol. Cell 50, 223–235.

17. Shepard, P.J., and Hertel, K.J. (2009). The SR protein family. Genome Biol.

10, 242.

18. Tarn, W.Y., and Steitz, J.A. (1994). SR proteins can compensate for the

loss of U1 snRNP functions in vitro. Genes Dev. 8, 2704–2717.

19. Roca, X., Krainer, A.R., and Eperon, I.C. (2013). Pick one, but be quick: 5′

splice sites and the problems of too many choices. Genes Dev. 27,

129–144.

20. Brozovic, M., Dantec, C., Dardaillon, J., Dauga, D., Faure, E., Gineste, M.,

Louis, A., Naville, M., Nitta, K.R., Piette, J., et al. (2018). ANISEED 2017:

extending the integrated ascidian database to the exploration and evolu-

tionary comparison of genome-scale datasets. Nucleic Acids Res. 46 (D1),

D718–D725.

21. Finn, R.D., Coggill, P., Eberhardt, R.Y., Eddy, S.R., Mistry, J., Mitchell,

A.L., Potter, S.C., Punta, M., Qureshi, M., Sangrador-Vegas, A., et al.

(2016). The Pfam protein families database: towards a more sustainable

future. Nucleic Acids Res. 44 (D1), D279–D285.

22. Nawrocki, E.P., Burge, S.W., Bateman, A., Daub, J., Eberhardt, R.Y.,

Eddy, S.R., Floden, E.W., Gardner, P.P., Jones, T.A., Tate, J., and Finn,

R.D. (2015). Rfam 12.0: updates to the RNA families database. Nucleic

Acids Res. 43, D130–D137.

23. Kumar, S., Jones, M., Koutsovoulos, G., Clarke, M., and Blaxter, M. (2013).

Blobology: exploring raw genome data for contaminants, symbionts and

parasites using taxon-annotated GC-coverage plots. Front. Genet. 4, 237.

24. Bolger, A.M., Lohse, M., and Usadel, B. (2014). Trimmomatic: a flexible

trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120.

25. Bankevich, A., Nurk, S., Antipov, D., Gurevich, A.A., Dvorkin, M., Kulikov,

A.S., Lesin, V.M., Nikolenko, S.I., Pham, S., Prjibelski, A.D., et al. (2012).

SPAdes: a new genome assembly algorithm and its applications to sin-

gle-cell sequencing. J. Comput. Biol. 19, 455–477.

26. Rice, P., Longden, I., and Bleasby, A. (2000). EMBOSS: the European mo-

lecular biology open software suite. Trends Genet. 16, 276–277.

27. Lorenz, R., Bernhart, S.H., Honer Zu Siederdissen, C., Tafer, H., Flamm,

C., Stadler, P.F., and Hofacker, I.L. (2011). ViennaRNA package 2.0.

Algorithms Mol. Biol. 6, 26.

28. Dereeper, A., Guignon, V., Blanc, G., Audic, S., Buffet, S., Chevenet, F.,

Dufayard, J.F., Guindon, S., Lefort, V., Lescot, M., et al. (2008).

Phylogeny.fr: robust phylogenetic analysis for the non-specialist.

Nucleic Acids Res. 36, W465–W469.

29. Dobin, A., Davis, C.A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S.,

Batut, P., Chaisson, M., and Gingeras, T.R. (2013). STAR: ultrafast univer-

sal RNA-seq aligner. Bioinformatics 29, 15–21.

30. Haas, B.J., Papanicolaou, A., Yassour, M., Grabherr, M., Blood, P.D.,

Bowden, J., Couger, M.B., Eccles, D., Li, B., Lieber, M., et al. (2013). De

novo transcript sequence reconstruction from RNA-seq using the Trinity

platform for reference generation and analysis. Nat. Protoc. 8, 1494–1512.

31. Kim, D., Langmead, B., and Salzberg, S.L. (2015). HISAT: a fast spliced

aligner with low memory requirements. Nat. Methods 12, 357–360.

32. Nawrocki, E.P., and Eddy, S.R. (2013). Infernal 1.1: 100-fold faster RNA

homology searches. Bioinformatics 29, 2933–2935.

33. Bouquet, J.M., Spriet, E., Troedsson, C., Ottera, H., Chourrout, D., and

Thompson, E.M. (2009). Culture optimization for the emergent zooplank-

tonic model organism Oikopleura dioica. J. Plankton Res. 31, 359–370.

34. Lamble, S., Batty, E., Attar, M., Buck, D., Bowden, R., Lunter, G., Crook,

D., El-Fahmawi, B., and Piazza, P. (2013). Improved workflows for high

throughput library preparation using the transposome-based Nextera sys-

tem. BMC Biotechnol. 13, 104.

35. Brena, C., Cima, F., and Burighel, P. (2003). Alimentary tract of

Kowalevskiidae (Appendicularia, Tunicata) and evolutionary implications.

J. Morphol. 258, 225–238.

36. Henriet, S., Sumic, S., Doufoundou-Guilengui, C., Jensen, M.F.,

Grandmougin, C., Fal, K., Thompson, E., Volff, J.N., and Chourrout, D.

(2015). Embryonic expression of endogenous retroviral RNAs in somatic

tissues adjacent to the Oikopleura germline. Nucleic Acids Res. 43,

3701–3711.

37. Lu, Z., Guan, X., Schmidt, C.A., and Matera, A.G. (2014). RIP-seq analysis

of eukaryotic Sm proteins identifies three major categories of Sm-contain-

ing ribonucleoproteins. Genome Biol. 15, R7.

Current Biology 29, 3193–3199, October 7, 2019 3199


STAR+METHODS

KEY RESOURCES TABLE

REAGENT or RESOURCE SOURCE IDENTIFIER

Antibodies

Anti-2,2,7-Trimethylguanosine Antibody, clone K121 Merck MABE302; RRID:AB_213109

Anti-Smith Antigen antibody [Y12] AbCam ab3138; RRID:AB_303543

Biological Samples

Fritillaria borealis Marine Biological Station, Espegrend and Rosslandspollen, Rossland N/A

Oikopleura dioica Sars Centre, University of Bergen N/A

Fritillaria pellucida Linda Holland, La Jolla N/A

Appendicularia sicula Marine Biological Station, Espegrend N/A

Chemicals, Peptides, and Recombinant Proteins

Trizol reagent Invitrogen Cat#15596026

RNase R Epicenter Cat#RNR07250

Terminator 5′-Phosphate-Dependent Exonuclease Epicenter Cat#TER51020

Tobacco Acid Pyrophosphatase Epicenter Cat#T81050

Pladienolide B Santa Cruz Biotechnology Cat# sc-391691

Isoginkgetin Sigma-Aldrich Cat#416154

Critical Commercial Assays

REPLI-g Single Cell kit QIAGEN Cat#150343

Nextera DNA Library Prep Kit Illumina Cat#FC-121-1030

MiSeq Reagent Kit v3 (600-cycle) Illumina Cat#MS-102-3003

TruSeq Stranded Total RNA library prep Illumina Cat#RS-122-2201

Nucleospin RNA XS Macherey-Nagel Cat#740902.10

SMART-seq v4 Ultra low input RNA kit Takara Cat#634888

Deposited Data

Raw and analyzed data This paper

Fritillaria borealis genome [6] GenBank: SDII00000000

Oikopleura dioica genome [2] http://www.genoscope.cns.fr/externe/Download/Projets/Projet_HG/data/assembly/

Other larvacean genomes [6] GenBank: SCLD01000000 to SCLH01000000

Ascidian genomes [20] https://www.aniseed.cnrs.fr/

RFAM [22] http://rfam.xfam.org/

PFAM [21] https://pfam.xfam.org/

Experimental Models: Cell Lines

HEK293T ATCC Cat#CRL-3216

Software and Algorithms

Blobology [23] https://github.com/blaxterlab/blobology

Trimmomatic [24] http://www.usadellab.org/cms/index.php?page=trimmomatic

SPAdes [25] https://github.com/ablab/spades

EMBOSS package [26] http://emboss.sourceforge.net/

BLAST package https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download

The viennaRNA package [27] https://www.tbi.univie.ac.at/RNA/#download

MUSCLE [28] http://www.phylogeny.fr/simple_phylogeny.cgi

Gblocks [28] http://www.phylogeny.fr/simple_phylogeny.cgi


PhyML [28] http://www.phylogeny.fr/simple_phylogeny.cgi

STAR [29] https://github.com/alexdobin/STAR

Trinity [30] https://github.com/trinityrnaseq/trinityrnaseq/releases

HISAT2 [31] https://ccb.jhu.edu/software/hisat2/index.shtml

Infernal [32] http://eddylab.org/infernal/

LEAD CONTACT AND MATERIALS AVAILABILITY

Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Daniel

Chourrout ([email protected]). This study did not generate new unique reagents.

EXPERIMENTAL MODEL AND SUBJECT DETAILS

Specimens of Oikopleura dioica, Fritillaria borealis and Appendicularia sicula were collected in the fjords around Bergen. Fritillaria
pellucida specimens were collected near La Jolla (CA, USA). O. dioica was collected and kept in culture in the lab according to
described procedures [33]. F. borealis specimens were collected between February and April by scooping seawater 2 to 15 m below
the surface, then grown in the lab at 14°C. Animals become clearly visible when their gonads start to develop. At this stage, 150-300
individuals are transferred into an 18 L beaker containing UV-treated, filtered seawater supplemented with an algal diet [33]. Spawning
usually takes place after two days. Three days after spawning, the culture beaker is diluted two-fold and animals become visible again
after one to two days.

METHOD DETAILS

Genome assembly

Whole-genome sequencing and assembly for six larvacean species, including F. borealis, were recently reported [6]. For sequencing
the F. pellucida genome, DNA from a pool of six animals was prepared with a modified tagmentation procedure [34] based on the
Nextera kit (Illumina), and sequenced on MiSeq. This approach did not succeed with A. sicula DNA, possibly due to contaminants
from the blind gut [35]. To address this issue, we amplified the genome of a single animal with the REPLI-g Single Cell Kit (QIAGEN)
prior to tagmentation and sequencing. The 300-nt PE reads were trimmed with Trimmomatic [24]. All reads at least 36 bp long
were subsequently assembled with the SPAdes genome assembler [25]. The assemblies were then checked for contamination
using Blobology [23]. The sizes of the resulting assemblies were 174 Mb for F. pellucida and 172 Mb for A. sicula, with scaffold
N50 values of 855 bp and 2438 bp, respectively.
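The scaffold N50 quoted above is the length of the scaffold at which the cumulative length of scaffolds, sorted from longest to shortest, first reaches half of the assembly size. The following is a minimal sketch of that standard definition, not the script used in this study; the function name and the example lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that brings the cumulative length to at
        # least half of the assembly size defines the N50.
        if running >= half_total:
            return length


# Hypothetical scaffold lengths, for illustration only.
example_lengths = [100, 200, 300, 400, 500]
print(n50(example_lengths))  # 400
```

Short N50 values such as those reported here indicate highly fragmented assemblies, which is why intron annotation in these species was restricted to well-conserved genes.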

Transcriptome assembly and intron annotation

We used the Trizol reagent (Thermo) to extract RNA from pools of F. borealis at different developmental stages. Transcriptomes from
juveniles and adults were prepared and sequenced on MiSeq (150 nts PE) at Eurofins Genomics (Ebersberg, Germany), resulting in
6267518 raw reads. Transcriptomes from embryos and larvae were prepared with SMART-seq v3 Ultra low input RNA kit (Takara) and
sequenced on MiSeq (300 nts PE), resulting in 25787032 raw reads. Reads were trimmed using Trimmomatic and checked with
FASTQC. Reads were then assembled into transcripts with Trinity software [30], using the default parameters.

The mass annotation of F. borealis introns is based on alignments with the two transcriptomes. Most transcripts are present in the
adult transcriptome, which was therefore used preferentially. One obstacle to accurate determination of intron limits is the frequent
presence of one or a few identical nucleotides at both ends of the alignment gap (in most cases due to the frequent exonic TAC/TAT triplet
preceding the intron). It was resolved by the observation that, for 99.2% of unambiguous gaps (no repeated nucleotides), the second last
nucleotide of the intron is an adenosine. Using this information, introns could be precisely annotated. Prior to determining intron
limits, intron orientation was determined using GETORF (http://www.bioinformatics.nl/cgi-bin/emboss/getorf) [26] applied to
transcript sequences. The sizes of the longest ORF for each orientation were measured and compared. Transcript orientation was
considered reliable when the longest ORF in one orientation was at least twice as long as the longest ORF in the other. When this
rule was not satisfied, the intron was not considered further. The number of orientation errors was shown to be minimal by checking
the GETORF-based orientation for highly conserved genes using BLASTX on mouse protein RefSeq (NCBI). The majority of introns
oriented in this way had an adenosine as the second last nucleotide. In a few cases, there were two or more adenosines as
candidates for the second last nucleotide, preventing a reliable annotation, and those introns were not included further. A collection of
5645 introns was considered annotated with a sufficient level of confidence from an initial set of 19214 gaps in the
transcriptome-to-genome alignment. Further experiments detecting lariats of non-canonical introns confirmed that the annotation was correct.

Due to the absence of a transcriptome, introns in F. pellucida, A. sicula and O. longicauda were restricted to a few hundred elements
from genes highly conserved with vertebrates (BLASTX). Gaps were determined based on the best BLASTX alignment (ambiguous
cases were not considered), while the sequence orientation was not problematic. The second last intron base rule was also applied.
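The two annotation rules described above (the two-fold longest-ORF criterion for orientation and the second last adenosine criterion for intron limits) can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names, the representation of a gap by its leftmost placement and length, and the toy sequences are hypothetical, and the ORF lengths are assumed to have been measured beforehand (for example with GETORF).

```python
def call_orientation(longest_fwd, longest_rev, min_ratio=2.0):
    """Return '+', '-' or None from the longest ORF length on each strand.

    The orientation is accepted only when one strand carries an ORF at
    least `min_ratio` times longer than the other.
    """
    if longest_fwd > 0 and longest_fwd >= min_ratio * longest_rev:
        return "+"
    if longest_rev > 0 and longest_rev >= min_ratio * longest_fwd:
        return "-"
    return None


def resolve_intron(genome, leftmost_start, length):
    """Pick the intron placement whose second last nucleotide is an A.

    `leftmost_start` is the 0-based start of the leftmost placement of
    the alignment gap and `length` is the gap length (both hypothetical
    inputs). The gap can slide one base to the right whenever the first
    base of the intron equals the first base after it. Returns
    (start, end) if exactly one placement satisfies the rule, else None.
    """
    if length < 2:
        raise ValueError("intron length must be at least 2")
    candidates = []
    start = leftmost_start
    while True:
        intron = genome[start:start + length]
        if len(intron) == length and intron[-2] == "A":
            candidates.append((start, start + length))
        nxt = start + length
        if nxt < len(genome) and genome[start] == genome[nxt]:
            start += 1  # equivalent placement, shifted by one base
        else:
            break
    return candidates[0] if len(candidates) == 1 else None


# Toy example: exon "CCCTA", intron "CAGGTTGTAC", exon "CGGG".
print(resolve_intron("CCCTACAGGTTGTACCGGG", 5, 10))  # (5, 15)
```

Gaps for which no placement, or more than one placement, satisfies the rule return None, mirroring the exclusion of ambiguous introns described in the text.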

For counting the incidence of the main intron types in all other genomes (human, amphioxus B. floridae, ascidian Ciona intestinalis,
sea urchin S. purpuratus, O. dioica, O. albicans and O. vanhoeffeni), the same annotation method was used after aligning the reference
transcriptome to the reference genome sequence, with GETORF for sequence orientation and the second last base rule. GT-AG and
nearly canonical G(nonT)-AG introns were identified first, and the remaining introns were considered non-canonical (Figure 5A).
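The three-way classification described above can be written as a simple test on the first and last two nucleotides of an oriented intron. The sketch below is illustrative; the function name, the category labels used as return values and the example sequences are hypothetical.

```python
from collections import Counter


def classify_intron(intron):
    """Classify an oriented intron sequence by its terminal dinucleotides."""
    seq = intron.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to carry both splice sites")
    donor, acceptor = seq[:2], seq[-2:]
    if acceptor == "AG" and donor == "GT":
        return "GT-AG"
    if acceptor == "AG" and donor[0] == "G":
        return "G(nonT)-AG"  # nearly canonical, e.g. GC-AG
    return "non-canonical"


# Hypothetical intron sequences, for illustration only.
introns = ["GTAAGTTTTCAG", "GCAAGTTTTCAG", "AGCTTTTTTTAC"]
print(Counter(classify_intron(i) for i in introns))
```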

Intron densities were compared among distinct genomes using a sample of 109 coding sequences from highly conserved genes
present in the genome sequence and represented in transcriptomes. Altogether, approximately 126 kb (42 kb of AA sequences) of
these coding sequences could be aligned among species, and BLASTN alignments of genomes and transcripts allowed us to count the
number of introns for each of them. An incomplete dataset (40 of the 109 genes) from A. sicula was added to check whether the
trend toward massive loss of introns may be a feature of fritillarids. For this species, the genome sequencing coverage is low (approx. 4X)
and we have no transcriptome data. Therefore, introns were counted based on interruptions of TBLASTN alignments between protein
sequences of human and F. borealis and the A. sicula genome data.

Lariat-seq

Trizol-extracted RNA from F. borealis or O. dioica was treated with DNase (TURBO DNA-free kit, Ambion), dissolved in ddH2O (final
concentration 0.45 mg.ml-1 in 5 µl), denatured 5 min at 65°C and placed on ice. Linear RNA was degraded 4h at 37°C with 10U RNase
R (Epicenter) in the manufacturer’s buffer, then 2.5 mM EDTA was added. We checked the efficiency of RNase R treatment by
comparing the electrophoretic profile of the sample against an untreated control. After buffer exchange, RNA was annealed to random
10mers (Ambion) and prepared for sequencing with the TruSeq Stranded Total RNA kit (Illumina), omitting the rRNA depletion step.
During library preparation, we used size-selection on AMPure beads (Beckman Coulter) to remove free adapters. This step may have
depleted short lariat cDNA and could account for some over-representation of longer, non-GTAG lariats (> 100 nts) in the O. dioica
dataset. Lariat cDNA libraries were run together on MiSeq (250 nts PE), producing 11104078 and 9969578 filtered paired reads for the
F. borealis and the O. dioica sample, respectively. Reads were aligned with BLAST against a database of genes with annotated introns.
We selected reads that yielded two sub-alignments and we mapped the branch points by examining the transition between the alignments.

Splicing inhibitor assay

Juvenile F. borealis were placed in 1 mL plastic dishes coated with 1% agarose, in filtered artificial sea water (Red Sea,
30.1-30.5 g.L-1 salinity) supplemented with 10 µM Pladienolide B in DMSO (PlaB, Santa Cruz Biotech), or 0.6% DMSO. After 3h
incubation at 10°C, animals were transferred to a collection tube and total RNA was extracted with Nucleospin RNA XS
(Macherey-Nagel). We prepared cDNA for RT-PCR assays as previously described [36], and we used 100 pg of total RNA to prepare Illumina
libraries (SMART-seq v4 Ultra Low Input RNA kit, Clontech). Libraries were run together on MiSeq (300 nts PE), producing
10463635 and 11843234 filtered paired reads for the PlaB sample and the DMSO control, respectively. Reads were aligned to
the F. borealis genome assembly with HISAT2 [31], using a collection of intron positions and disabling the penalty against
non-canonical splice sites, resulting in an alignment rate of 81.83% for the PlaB sample and 82.01% for the control. Intron retention
rates were scored by examining a subset of reads encompassing a collection of exon-intron junctions (see “Quantification and
statistical analysis”).

Cap-dependent RNA-seq

RNA (final concentration 5 mg.ml-1 in 100 µl) was either digested for 1h at 30°C with 5U of Terminator 5′-Phosphate-Dependent
Exonuclease (Epicenter) or incubated without enzyme for the input control. Reactions were stopped with 5 mM EDTA, and RNA
was purified with Phenol/Chloroform extraction. We treated one sample with Tobacco Acid Pyrophosphatase (TAP, Epicenter) in
order to increase the ligation efficiency of RNA-seq adapters to the 5′ end, and compared it to the non-treated control. RNA fragments
in the 100-300 nts range were gel-purified and prepared for stranded RNA-seq at Ocean Ridge Biosciences (Deerfield Beach, FL,
USA). Trinity software was used to assemble the reads into transcripts. Subsequently, Infernal cmscan was used to scan the
RFAM database and annotate the transcripts accordingly [22, 32]. To search for RNA involved in splicing, we first ranked transcripts
by their abundance in the input sample, in the TAP-treated sample and in the non-treated sample. Canonical U2-type snRNAs
remained present among the 15 most abundant RNAs in all samples, and TAP treatment generally increased their read numbers
(Figure 4B). U1 snRNAs identified with RFAM were checked for the conservation of the prominent motif ATACTTACCTG, located in the
first 11 nucleotides of the sequence. For other RNAs, neither snRNA-like secondary structure motifs nor complementarity to
non-canonical 5′ss, such as TAC(T)AG (Figure 1B), were found.
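The conservation check on the U1 snRNA 5′ end amounts to comparing the first 11 nucleotides of each candidate with the motif ATACTTACCTG. A minimal sketch of such a comparison is given below; the function name and the example sequences are hypothetical, and counting mismatches (rather than requiring an exact match) is an assumption made for illustration.

```python
U1_MOTIF = "ATACTTACCTG"  # motif quoted in the text (DNA alphabet)


def u1_motif_mismatches(sequence):
    """Count mismatches between the first 11 nt of a candidate and the motif."""
    # Accept RNA or DNA input by converting U to T.
    head = sequence.upper().replace("U", "T")[:len(U1_MOTIF)]
    if len(head) < len(U1_MOTIF):
        raise ValueError("sequence shorter than the motif")
    return sum(1 for a, b in zip(head, U1_MOTIF) if a != b)


# Hypothetical candidate 5' ends, for illustration only.
print(u1_motif_mismatches("ATACTTACCTGGCAGG"))  # 0
print(u1_motif_mismatches("ATACTTACCTAGCAGG"))  # 1
```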

RNA-immunoprecipitation

Between 150 and 200 F. borealis individuals were collected at the onset of gonad maturation, washed in artificial seawater and pelleted
in Eppendorf tubes. Seawater was removed and animals were homogenized in RIP lysis buffer (DTT 2 mM, Ribonucleoside Vanadyl
Complex 5 mM, NaCl 0.1 M, MgCl2 5 mM, glycerol 10%, HEPES 50 mM pH7.5, 0.1% Triton X-100, PMSF 1 mM, Complete-EDTA free
(Roche)). Sample droplets were snap-frozen in liquid N2 and ground with mortar and pestle, and the material was further homogenized
with ten passages through a 25G needle then diluted with IPP buffer (NaCl 0.15 M, HEPES 20 mM pH7.5, Triton X-100 0.05%, MgCl2
1.5 mM). An amount corresponding to 5 µg of RNA was incubated 2h at 4°C in the presence of 10 µg yeast tRNA and 7.5 µg of either
rabbit serum, anti-TMG IgG (K121 MABE302, Merck) or anti-Sm IgG (ab3138, AbCam), in a final volume of 0.75 ml. After binding, 75 µl
of Dynabeads-protein G (Thermo) pre-washed with IPP buffer and yeast tRNA were added and the samples were further incubated
1h at 4°C. Beads were washed three times with 0.5 mL ice-cold IPP buffer, transferred to a new tube and washed once more. Proteins
were digested for 45 min at 37°C in 0.4 mL of PK buffer (Tris-Cl 10 mM pH7.5, EDTA 10 mM, NaCl 0.1 M, SDS 0.5%, 4 mg proteinase
K), and RNA was recovered with Phenol/Chloroform extraction and treated with DNase. We prepared Illumina libraries using the
TruSeq Stranded Total RNA kit as previously mentioned, and sequenced the cDNA on MiSeq (300 nts PE). The reads were mapped
onto the genome assembly using STAR software [29] with default parameters, collecting all alignments with 20 or more matched
bases (--outFilterMatchNmin 20). The numbers of uniquely mapped reads were 1693034, 3279711 and 2452203 for the
control, anti-Sm and anti-TMG experiments, respectively. Library normalization, read count and background analysis were performed as
described by Lu et al. [37] (see “Quantification and statistical analysis”).

Splicing assays in mammalian cells

Gene fragments consisting of full-length introns flanked on both sides by 50 bp of exon sequence were PCR-amplified from
O. dioica and F. borealis DNA and cloned between the restriction sites SacI and XmaI in the mammalian expression vector
pEGFP-N1 (Clontech). To test sequence requirements for splicing, we changed the wild-type splice sites to mutant sequences using
PCR mutagenesis. We delivered the constructs with Polyethylene Imine to adherent HEK293T cells at 40% confluency. Cells were
harvested after 3 days of growth at 37°C in DMEM. RNA was extracted with Trizol, treated with DNase, and amplified with RT-PCR as
previously described [36]. After gel electrophoresis, PCR products corresponding to spliced RNA were cloned and sequenced.

Homology searches

We looked for U2- and U12-type snRNA in the fritillarid genome assemblies with BLAST using parameters for short queries. To build
our set of query sequences, we retrieved U2- and U12-type snRNA from the RFAM library (http://rfam.xfam.org/), and we
supplemented them with snRNA candidates recovered with BLAST searches on tunicate genomes. Those include newly sequenced
larvacean genomes [6] and ascidian genomes available online [20]. We confirmed snRNA identity using RNA structure prediction, RNA multiple
alignment [27], and RFAM-based annotation.

Based on proteomic studies of the snRNPs [12], we checked whether the major protein components of the spliceosome were present in
F. borealis. We performed TBLASTN searches against the F. borealis transcriptome, using protein queries from human,
D. melanogaster, C. elegans and S. cerevisiae. In most cases, the results revealed a single homolog with significantly higher scores
than other hits; in a few exceptions, a duplicate was also found. Some ambiguous cases, corresponding to either proteins with
higher divergence in F. borealis or gene families, were resolved with multiple sequence alignment and by examining protein domains
with PFAM [21]. For each positive hit found in F. borealis, we performed reciprocal BLAST against NCBI RefSeq and UniProt to
confirm the annotation. Similarity scores were calculated over local alignments that exclude low-complexity regions, using
BLOSUM50 matrices. Protein phylogenies were established with the “one click” pipeline on the Phylogeny.fr server [28].

QUANTIFICATION AND STATISTICAL ANALYSIS

RNA-immunoprecipitation

For RNA-immunoprecipitation (RNA-IP) experiments, we measured the read coverage on targets corresponding to either snRNA
(U1, U2, U3, U4, U5, U6 and splice leader), rRNA (LSU, SSU and 5S), exons or introns. We excluded targets with fewer than ten reads
and we normalized the coverage by library size. For each target, the read enrichment (E) in IPs performed with anti-Sm and
anti-TMG IgGs was calculated as $E = \log_2\left(\frac{\mathrm{IgG} + 2}{\mathrm{control} + 2}\right)$, where IgG and control are the respective coverage values for the
IPs and for the control with serum only. Assuming that the majority of reads would correspond to non-specific binding events, we
used a two-component Gaussian mixture model to calculate the distribution of E in each sample, and to determine a threshold value
representing the background.
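The enrichment score and the mixture-model step can be sketched as follows. This is a minimal illustration and not the analysis code of the study or of Lu et al. [37]: the function names are hypothetical, the mixture is fitted with a plain expectation-maximization loop whose initialization and iteration count are arbitrary choices, and the rule used to turn the fitted background component into a threshold is not specified in the text and is therefore left out.

```python
import math


def enrichment(ip_cov, control_cov, pseudocount=2.0):
    """E = log2((IgG + 2) / (control + 2)) for one target."""
    return math.log2((ip_cov + pseudocount) / (control_cov + pseudocount))


def fit_two_gaussians(values, n_iter=200):
    """Fit a 1-D two-component Gaussian mixture by EM.

    Returns (weights, means, sds); component 0 starts at the lowest
    value and component 1 at the highest value.
    """
    xs = list(values)
    mu = [min(xs), max(xs)]
    spread = (max(xs) - min(xs)) / 4 or 1.0
    sd = [spread, spread]
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in xs:
            p = [
                w[k] * math.exp(-0.5 * ((x - mu[k]) / sd[k]) ** 2)
                / (sd[k] * math.sqrt(2 * math.pi))
                for k in range(2)
            ]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: update weights, means and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            w[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sd[k] = max(math.sqrt(var), 1e-3)
    return w, mu, sd


print(enrichment(6, 2))  # 1.0, since (6 + 2) / (2 + 2) = 2
```

In this sketch the component with the lower mean would play the role of the non-specific background, and targets falling well above it would be candidates for specific binding.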

Splicing inhibitor assay

The intron retention rates were estimated genome-wide based on RNA-seq reads. For statistical assessment of splicing inhibitor
effects, 1067 distinct intron-exon regions (1002 for non-canonical introns and 65 for canonical introns) were selected because they
produced at least 50 exploitable sequencing reads in each of the DMSO control and the PlaB-treated sample. Of the 1067 pairwise
comparisons of retention rates between control and treated groups, 238 fulfilled the conditions for application of the z-test. Of these
comparisons, 227 showed significantly different retention rates (p < 0.05), with 223 corresponding to an increase and only 4 to a
decrease in retention rate.
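A comparison of retention rates of this kind can be illustrated with a two-proportion z-test on read counts. The sketch below assumes a pooled standard error and a two-sided p value; the exact variant of the z-test and the applicability conditions used in the study are not stated, so the function name, the example counts and these choices are assumptions.

```python
import math


def two_proportion_z(retained_ctrl, total_ctrl, retained_trt, total_trt):
    """Two-sided z-test comparing intron retention rates in two samples.

    Returns (z, p_value); z is positive when retention is higher in the
    treated sample than in the control.
    """
    p_ctrl = retained_ctrl / total_ctrl
    p_trt = retained_trt / total_trt
    pooled = (retained_ctrl + retained_trt) / (total_ctrl + total_trt)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_ctrl + 1 / total_trt))
    if se == 0:
        raise ValueError("test undefined when the pooled rate is 0 or 1")
    z = (p_trt - p_ctrl) / se
    # Two-sided p value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 10 of 100 junction reads retained in the control,
# 40 of 100 retained after treatment.
z, p = two_proportion_z(10, 100, 40, 100)
print(round(z, 2), p < 0.05)  # 4.9 True
```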

DATA AND CODE AVAILABILITY

The experimental datasets supporting the current study have not been deposited in a public repository but are available from the

Lead Contact on request.


Current Biology, Volume 29

Supplemental Information

Evolution of the U2 Spliceosome for Processing

Numerous and Highly Diverse Non-canonical Introns

in the Chordate Fritillaria borealis

Simon Henriet, Berta Colom Sanmartí, Sara Sumic, and Daniel Chourrout


SUPPLEMENTAL INFORMATION :

Figure S1: Logos of the main categories of non-canonical introns in F. borealis. Related to

Figure 1. In all cases, the intron tail is not T-rich, unlike in canonical introns. In all cases,

conservation extends to the last triplet of the upstream exon.


Figure S2: Genome-wide prevalence of non-canonical introns in F. borealis. Related to
Figure 1. (A) Annotated introns in two representative segments of the genome sequence, based
on alignment with the transcriptome. RNA-seq coverage is represented over the predicted genes.
Terminal Inverted Repeats (TIRs) were scored with einverted [S1]. (B) Number of canonical and
non-canonical introns counted in 406 distinct genes having highly conserved sequences.
BLASTX alignments of these F. borealis genes with their vertebrate orthologues (RefSeq or
Swiss-Prot) indicate that they are most likely full length. Intron annotation results from
alignments with transcripts or from visual inspection of alignments of the predicted protein product with
vertebrate proteins. Overall, almost all genes contain one or more non-canonical introns. (C)
Conservation of intron positions in other deuterostomes. F. borealis GT/AG introns were
considered only if they could be precisely localized in alignments with their putative
orthologues of four other species, including an ascidian and the larvacean O. dioica. No more
than six of the 76 introns have the same position in all five species, but 42 of them have a
conserved position in at least one other species. In contrast, none of the non-canonical introns of
F. borealis have a conserved position (bottom row). (D) Gene ontology analysis of F. borealis
conserved genes with or without canonical introns (same selection of putative full-length
genes as for B). For the GO analysis with the DAVID package (BP1 GO terms), mouse
orthologous proteins were used to measure the level of enrichment vs the mouse proteome. We
observed overall enrichment for genes involved in biological regulation and development among
genes having canonical introns, but not among genes without canonical introns.


Figure S3: Groups of repeated introns in the F. borealis genome. Related to Figure 1.
Image of the ClustalW matrix for 2330 sequences from 1165 introns (forward and reverse
orientation) repeated at least ten times in the genome (selection based on BLASTN in a
collection of 19214 introns). Each sequence is obtained by assembling the 50 bp long ends of
each intron with a 10N spacer in between. Darker dots correspond to higher sequence
similarity. Overall, three main groups of repeated introns are formed (dashed squares), each
containing one subgroup of highly conserved introns. The sequence logos for each group and
subgroup are provided, showing distinct consensus sequences but a conserved palindromic
arrangement between the 5’ and 3’ ends.


Figure S4: MITE transposition is the source of new introns in F. borealis. Related to

Figure 1. A Multiple sequence alignment of nine non-canonical introns (uppercase) and their

flanking sequences (lowercase), showing the conservation of Terminal Inverted Repeats (TIRs)

and Target Site Duplications (TSDs). The middle part of the alignment is not shown (NNN).

TIRs, TSDs, high similarity between copies and the absence of an internal orf are hallmarks of

MITEs (Miniature Inverted repeats Transposable Element) [S2]. Note that the beginning of

inverted repeats (IR) are shifted relative to the intron borders (-1 nt in 5’, -3 nts in 3’), in such

a way that the 5’ ss is included in the 5’ IR and the 3’ ss is included in the 3’ TSD. On the

bottom, arrows indicate which intervals that have been considered for identifying TSDs

genome-wide. B Mapping of TSDs in a collection of 3657 introns, by taking into account

successive intervals away from the intron borders. Results show an overrepresentation of the

dinucleotide TA, suggesting a Tc1/Mariner transposase could be involved in MITE integration

[S3]. (C) Breakdown of TSD identity for different intron classes. A majority of canonical introns

have no flanking TSDs, even when considering different combinations of intervals away from

intron borders. (D) The highly prevalent TAC or TAT triplet preceding non-canonical introns is

presumed to predate the insertion of a MITE at a TA site and its eventual intronization. That the

TAC or TAT triplet indeed predated the insertion could be assessed by measuring the level of

conservation of this site across chordates. For phase 0 introns interrupting a protein-coding

sequence, both TAC and TAT codons encode a tyrosine (Y). Alignments of genes highly

conserved between amphioxus (B. floridae) and F. borealis show that tyrosine residues are

equally well preserved, irrespective of whether or not their codon precedes an intron. This result

supports the view that the triplet did not undergo specific evolution after intron acquisition. A similar

analysis for phase 1 introns indicates that the AC or AT nucleotide pairs adjacent to introns

did not evolve differently from those located far from introns (data not shown). (E) Model for

MITE transposition, based on integration site preferences: 1) a MITE and its IRs are recognized

by a transposase and excised, forming an active transposome; 2) the transposase cuts the target

DNA after a TA dinucleotide preceding a C or T; exonic sites may be preferred

due to compact gene arrangement or better chromatin accessibility [S4]; 3) after MITE

integration, the repair of flanking sequences generates the TSDs.
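
The TSD search described for panels A and B can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' method: the function name, the coordinate convention (0-based, half-open intron coordinates within a genomic string), the default TSD length of 2 nt and the maximum shift of 3 nt are all hypothetical choices. A shift is the number of nucleotides by which the element boundary lies upstream of the corresponding intron border, so the arrangement shown in panel A corresponds to shifts of 1 (5’) and 3 (3’).

```python
from typing import List, Tuple


def find_tsds(seq: str, start: int, end: int,
              k: int = 2, max_shift: int = 3) -> List[Tuple[int, int, str]]:
    """List (shift5, shift3, tsd) combinations where identical k-mers flank the element.

    seq        genomic sequence containing the intron (hypothetical input)
    start/end  0-based, half-open intron coordinates within seq
    """
    seq = seq.upper()
    hits = []
    for shift5 in range(max_shift + 1):
        left_end = start - shift5          # assumed first position of the element
        if left_end - k < 0:
            continue
        left = seq[left_end - k:left_end]  # k-mer just upstream of the element
        for shift3 in range(max_shift + 1):
            right_start = end - shift3     # assumed position just past the element
            if right_start + k > len(seq):
                continue
            right = seq[right_start:right_start + k]  # k-mer just downstream
            if left == right:
                hits.append((shift5, shift3, left))
    return hits


# Toy sequence: exon ending in TA, element starting with C, second TA copy inside the intron.
toy = "ATGGCTTA" + "CAGTTGGCCAACTG" + "TAG" + "GCCGGA"
print(find_tsds(toy, start=9, end=25))
```

Tallying the third element of each hit over a collection of introns, for example with collections.Counter, would give the kind of distribution in which TA is reported to be overrepresented.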

Figure S5: The F. borealis snRNA complement. Related to Figure 5. (A) Secondary structure

models of full-length snRNAs identified in F. borealis, based on genome and transcriptome

mining. For U1, arrows show substitutions found in the variant U1b. For U5, blue arrows

indicate the position of the multiple alignment; gray arrows show substitutions found in A.

sicula or F. pellucida. Residues predicted to form a U2/U6 hybrid are highlighted in pink. (B)

Multiple alignment of the first stem-loop in U5, showing the fritillarid-specific sequence

change.

Figure S6: Species-specific changes in spliceosomal proteins. Related to Figure 5. (A) The

drawing represents Prp8 domain organization. The alignment shows sequence variation at two

motifs interacting with the 5’ and 3’ ss [S5], where fritillarid-specific substitutions are present. (B)

Conservation of the N-terminal domain of U1C. The drawing represents the secondary structure

based on the published structure of U1 snRNP [S6] and the position of non-conservative

substitutions in fritillarids. (C) The F. borealis SR proteins. Left, domain organization of SR

proteins from human (H. sapiens), an ascidian (C. intestinalis) and three larvaceans (O. dioica,

O. albicans and F. borealis). Right, distribution of RX repeats in the protein. Darker shades

correspond to a higher number of repeats. In F. borealis, the repertoire of SRSF2 proteins has

expanded and most of them have acquired an N-terminal, RS-rich extension.

Figure S7: Prevalence of non-canonical introns in two other fritillarid species. Related to

Figure 6. In the absence of a transcriptome for these two species, introns were localized by

inspecting highly conserved gene sequences based on BLASTX alignments. In total, 252 and 175

introns could be annotated in F. pellucida and A. sicula, respectively. The trend is similar to that observed

in F. borealis, with a prevalence of TAC/TAT triplets before non-canonical but not canonical

introns. This prevalence is less obvious in nearly canonical introns.
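
Counting the triplet that precedes each intron can be sketched as follows. The record layout and function names are hypothetical, and the intron class is assumed to have been assigned beforehand; the snippet only illustrates the tally, not how the introns were annotated.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record: (intron class, genomic sequence, 0-based intron start).
Record = Tuple[str, str, int]


def preceding_triplets(records: Iterable[Record]) -> Dict[str, Counter]:
    """Count, per intron class, the three nucleotides just upstream of each intron."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for intron_class, seq, start in records:
        if start >= 3:  # skip introns too close to the sequence start
            counts[intron_class][seq[start - 3:start].upper()] += 1
    return counts


def tac_tat_fraction(counter: Counter) -> float:
    """Fraction of introns in one class preceded by TAC or TAT."""
    total = sum(counter.values())
    return (counter["TAC"] + counter["TAT"]) / total if total else 0.0


# Toy records, for illustration only.
toy_records = [
    ("non-canonical", "GGTACAGTTTAG", 5),
    ("non-canonical", "CCTATAGAAAAG", 5),
    ("canonical", "AAGCGGTAAGAG", 5),
]
toy_counts = preceding_triplets(toy_records)
```

Comparing tac_tat_fraction between classes is one simple way to express the contrast described in the legend.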

Gene            Intron   Borders   Splicing?
Unknown         Od1c     GTAG      yes
Unknown         Od2nc    GAAG      no
Unknown         Od3c     GTAG      yes
Tektin domain   Od6c     GTAG      yes
Tektin domain   Od4nc    GAAG      no
Tektin domain   Od5nc    GAAG      no
DEAD helicase   Od7c     GTAG      yes
DEAD helicase   Od8nc    GAAG      no

Table S1: In vitro splicing assays with O. dioica introns. Related to Figure 4. Gene

fragments containing canonical and non-canonical introns were expressed in HEK293T cells

and splicing was monitored using RT-PCR with primers flanking the introns.

SUPPLEMENTAL REFERENCES

S1. Rice, P., Longden, I., and Bleasby, A. (2000). EMBOSS: the European Molecular

Biology Open Software Suite. Trends Genet 16, 276-277.

S2. Yang, G., Nagel, D.H., Feschotte, C., Hancock, C.N., and Wessler, S.R. (2009). Tuned

for transposition: molecular determinants underlying the hyperactivity of a Stowaway

MITE. Science 325, 1391-1394.

S3. Tellier, M., Bouuaert, C.C., and Chalmers, R. (2015). Mariner and the ITm

Superfamily of Transposons. Microbiol Spectr 3, MDNA3-0033-2014.

S4. Huff, J.T., Zilberman, D., and Roy, S.W. (2016). Mechanism for DNA transposons to

generate introns on genomic scales. Nature 538, 533-536.

S5. Shi, Y. (2017). The Spliceosome: A Protein-Directed Metalloribozyme. J Mol Biol

429, 2640-2653.

S6. Kondo, Y., Oubridge, C., van Roon, A.M., and Nagai, K. (2015). Crystal structure of

human U1 snRNP, a small nuclear ribonucleoprotein particle, reveals the mechanism

of 5' splice site recognition. eLife 4.