
De novo genome sequencing of Skeletonema marinoi and Surirella brebissonii

Mats Töpel¹*, Magnus Alm Rosenblad¹, Ulrika Lind¹, Susanna Gross², Sandra Karlsten¹, Jens Persson¹, Mattias Backman¹, Anna Godhe², Anders Blomberg¹

1. Department of Chemistry and Molecular Biology, University of Gothenburg
2. Department of Biology and Environmental Sciences, University of Gothenburg

*[email protected]

Introduction

De novo whole-genome sequencing of the two diatom species Surirella brebissonii (CCMP2919) and Skeletonema marinoi (GUMACC St54) is currently being conducted as part of the Linnaeus Centre for Marine Evolutionary Biology (CeMEB) initiative at the University of Gothenburg. This work belongs to the Infrastructure for Marine Genetic model Organisms (IMAGO) project. IMAGO aims to develop new marine model systems and to provide genomic and genetic tools for studying vital phenomena and components of coastal marine ecosystems.

Protein translocation in diatoms

A chloroplast’s genome encodes only ~100 proteins, but the organelle requires many more proteins to perform its functions in the cell. In plants and in red and green algae, chloroplasts import these essential nuclear-encoded proteins through two multiprotein complexes: the Translocons at the Outer and Inner Chloroplast envelope membranes (TOC and TIC, respectively).

Diatom plastids, by contrast, are surrounded by four membranes, the outermost of which is continuous with the endoplasmic reticulum (ER) [8]. The second membrane, known as the periplastid membrane (PPM), is the remnant of the secondary endosymbiont’s plasma membrane; this endosymbiont is proposed to be of red algal origin [9]. The two innermost membranes are homologous to the outer and inner envelope membranes of plant plastids. They are derived from the membranes surrounding the cyanobiont of primary plastids [10].

The identity of the TOC and TIC translocons is largely unknown in diatoms and in most other chromalveolate groups (e.g., brown algae, dinoflagellates and apicomplexan parasites). However, bioinformatic analyses of diatom whole-genome sequences have shown that these systems are also present in diatoms. Bullmann et al. [11] reported an Omp85 protein in the diatom Phaeodactylum tricornutum. It is localized in the third plastid membrane counted from the outside, which is homologous to the outer envelope membrane in plants.

To date, this Omp85 protein is the only reported putative member of the TOC complex in diatoms. Our phylogenetic analyses included bacterial, plant and diatom sequences. They reveal that the diatom sequences are of red algal origin and, more specifically, belong to the Toc75 gene family. Unexpectedly long branches in the diatom part of the tree indicate a rapid but even evolutionary rate. This phenomenon has been reported previously, but its significance has not yet been thoroughly investigated.

Assembly statistics

For Surirella brebissonii, two DNA libraries (insert sizes 150 and 3000 bp) and one RNA library (300 bp) have been sequenced. One of the DNA libraries was generated from an axenic culture. For Skeletonema marinoi, one axenic 300 bp library has been generated. Both genomes have been assembled using the CLC de novo assembler software package. Sequence reads were preprocessed using cutadapt [1] and the fastx toolkit [2] (for details see http://matstopel.se/notebook).
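The exact cutadapt and fastx toolkit settings are documented in the authors' online notebook and are not reproduced here. As a hypothetical illustration of one common preprocessing step, the sketch below implements 3' quality trimming in the BWA style that the cutadapt documentation describes for its quality cutoff. The cutoff is subtracted from each base quality, and partial sums are accumulated from the 3' end. The read is then cut where that sum is lowest. The Phred+33 encoding, the cutoff of 20 and all function names are assumptions for the example, not parameters used in this project.

```python
def phred33_to_scores(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def quality_trim_3prime(scores, cutoff=20):
    """Return how many 5' bases to keep after BWA-style 3' quality trimming.

    Walks from the 3' end, summing (score - cutoff). The read is cut where
    this running sum reaches its minimum; the scan stops as soon as the sum
    turns positive, i.e. once good-quality bases dominate.
    """
    running = 0
    lowest = 0
    keep = len(scores)
    for i in range(len(scores) - 1, -1, -1):
        running += scores[i] - cutoff
        if running > 0:
            break
        if running < lowest:
            lowest = running
            keep = i
    return keep


def trim_read(sequence, quality_string, cutoff=20):
    """Apply 3' quality trimming to a read and its quality string."""
    keep = quality_trim_3prime(phred33_to_scores(quality_string), cutoff)
    return sequence[:keep], quality_string[:keep]


if __name__ == "__main__":
    # 'I' = Q40, '+' = Q10, '#' = Q2 in Phred+33 (hypothetical toy read).
    print(trim_read("ACGTACGT", "IIII++##"))  # → ('ACGT', 'IIII')
```

Using the minimum of the running sum, rather than stopping at the first low-quality base, lets isolated poor bases survive when good-quality sequence follows them.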

Assembly statistic                  Skeletonema   Surirella
Total nt sequenced (Gb)             33            46
Total input to assembly (Gb)        26            33
Assembly size (Mb)                  49            136
Number of contigs (thousands)       53            244
Average coverage (x per contig)     443           207
N50 (bp)                            1673          694
Average contig length (bp)          929           557
Longest contig (kb)                 506*          76
*Putative bacterial symbiont.
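The N50 in the table is the contig length L such that contigs of length L or longer together contain at least half of the assembled bases. The average contig length is simply the assembly size divided by the number of contigs. For Surirella, for example, 136 Mb / 244,000 contigs ≈ 557 bp, as in the table. The sketch below computes these summary statistics from a list of contig lengths. It is a generic illustration, not the CLC assembler's own reporting code, and the toy lengths in the example are made up.

```python
def assembly_stats(contig_lengths):
    """Summarise an assembly from its contig lengths (in bp)."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)

    # N50: walk from the longest contig down until half the bases are covered.
    covered = 0
    n50 = lengths[-1]
    for length in lengths:
        covered += length
        if 2 * covered >= total:
            n50 = length
            break

    return {
        "assembly_size_bp": total,
        "n_contigs": len(lengths),
        "n50_bp": n50,
        "mean_length_bp": total / len(lengths),
        "longest_bp": lengths[0],
    }


if __name__ == "__main__":
    stats = assembly_stats([100, 200, 300, 400, 500])  # hypothetical contigs
    print(stats["n50_bp"], stats["mean_length_bp"])  # → 400 300.0
```

Because N50 weights contigs by the bases they contain, it is usually larger than the mean length when an assembly has many short contigs. This pattern is visible for both genomes in the table.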

Preliminary findings

Organelles: The plastid of Surirella brebissonii contains a group II intron with an ORF, the first group II intron to be identified in a diatom plastid genome. Interestingly, the mitochondrial genome of S. brebissonii has lost the group II intron that is present in the mtDNA of both Thalassiosira pseudonana and Phaeodactylum tricornutum. Three putative components of the Translocon at the Outer Chloroplast envelope membrane (TOC) have been identified in S. brebissonii and one in S. marinoi.

Cell wall: Six silicon transporter genes (SITs) have been predicted in the S. brebissonii genome, and two in S. marinoi. Bioinformatic analyses have also identified centric- and pennate-specific motifs in these sequences. Frustulin and Silaffin/Cingulin proteins, which are also involved in diatom cell wall biogenesis, have been identified in both genomes. Preliminary analyses have found novel motifs in these sequences.

Phylogenetic analysis of the Omp85 superfamily, which includes the Toc75 gene family of channel proteins. Preliminary analyses used data from Töpel et al. [4] as query sequences. They have identified at least one protein from the Omp85 superfamily in each of the genomes of Surirella brebissonii and Skeletonema marinoi. The identified contigs were translated in all six reading frames using the program getorf [5] and aligned to the query dataset using MAFFT [6]. The correct reading frames identified in this way were then used in BLAST searches of the publicly available diatom gene predictions. The resulting sequences were analysed together using MrBayes 3.2 [7].
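Translating a contig in all six reading frames means reading codons from offsets 0, 1 and 2 on the given strand and on its reverse complement. The sketch below does this with the standard genetic code. It then keeps stretches between stop codons above a minimum length, as a simplified stand-in for the open reading frames that getorf reports. It is not getorf itself. The function names, the hypothetical default minimum length and the translation of ambiguous codons as X are assumptions for illustration.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Return translations keyed by frame: +1..+3 forward, -1..-3 reverse."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def stop_to_stop_orfs(protein, min_length=30):
    """Split a translation at stops ('*'); keep pieces of >= min_length aa.

    The default of 30 amino acids is a hypothetical choice for this sketch.
    """
    return [piece for piece in protein.split("*") if len(piece) >= min_length]


if __name__ == "__main__":
    frames = six_frame_translations("ATGGCCTAA")
    print(frames["+1"], frames["-1"])  # → MA* LGH
```

In the workflow described above, the frame that aligns well to the Omp85 query sequences is the one carried forward. Scanning all six frames avoids having to know the contig's orientation in advance.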

Surirella brebissonii is an asymmetric pennate benthic diatom, approximately 45 µm long, that is mostly found in brackish water. It was selected for sequencing because of its rather large size and asymmetric form. It has long been used in studies of chromosome separation.
Photo: Per Johander

Skeletonema marinoi is a major primary producer during spring blooms in the North Atlantic and a valuable food source for zooplankton. Its generation time is 24 hours, which makes it ideal for studies of phenotypic response. Benthic cells act as resting stages and reach densities of up to 50,000 cells per gram of sediment. They can survive for at least a hundred years and thereby provide short-term evolutionary archives in sediments.
Photo: Anna Godhe.

Evolutionary relationships between genera for which whole-genome sequence (WGS) data are available. The sample is sparse, since the number of diatom species has been estimated at ~200,000 [3]. Even so, these seven species constitute a broad phylogenetic sample from the diatom tree of life, covering many large morphological groups. WGS data from either Coscinodiscophycidae or Rhizosoleniophycidae would, however, significantly improve our understanding of diatom evolution by including the crown node of the group in the analyses. Tree modified from [12].

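Figure: phylogenetic tree of diatom genera with whole-genome data (Fragilariopsis, Phaeodactylum, Pseudo-nitzschia, Surirella, Thalassiosira, Skeletonema) together with Coscinodiscophycidae and Rhizosoleniophycidae, grouped into Radial Centrics, Bi(multi)polar Centrics and Raphid Pennates.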

The chloroplast protein translocation machinery in plants. The preprotein (black line) is first recognised by one of the TOC receptors (green). It is then transported through the Toc75 channel and the TIC complex to the chloroplast stroma. The identity of the TOC and TIC translocons in diatoms and most other chromalveolates is largely unknown. Numbers indicate the names of the proteins. Graphics: Paula Töpel.

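Figure: schematic of the plant TOC complex in the outer envelope membrane (OEM) and the TIC complex in the inner envelope membrane (IEM), spanning the cytosol, the intermembrane space (IMS) and the stroma. Labelled components: Hsp70 (two), Hsp60, Hsp93, SPP and proteins numbered 12, 20, 21, 22, 32, 34, 40, 55, 62, 64, 75, 110 and 159 (two).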

References
1. https://code.google.com/p/cutadapt/
2. http://hannonlab.cshl.edu/fastx_toolkit/
3. Bowler C., Vardi A., Allen A.E. (2010) Oceanographic and biogeochemical insights from diatom genomes. Annu. Rev. Mar. Sci. 2, 333–365.
4. Töpel M., Ling Q., Jarvis P. (2012) Neofunctionalization within the Omp85 protein superfamily during chloroplast evolution. Plant Signaling & Behavior 7(2).
5. http://emboss.sourceforge.net/apps/cvs/emboss/apps/getorf.html
6. Katoh K., Standley D.M. (2013) MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780.
7. Huelsenbeck J.P., Ronquist F. (2001) MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17(8), 754–755.
8. Gibbs S.P. (1981) The chloroplast endoplasmic reticulum: structure, function, and evolutionary significance. Int. Rev. Cytol. 72, 49–99.
9. Cavalier-Smith T. (2003) Genomic reduction and evolution of novel genetic membranes and protein-targeting machinery in eukaryote-eukaryote chimaeras (meta-algae). Philos. Trans. R. Soc. Lond. B Biol. Sci. 358, 109–134.
10. Palmer J.D. (2003) The symbiotic birth and spread of plastids: how many times and whodunit? J. Phycol. 39, 1–9.
11. Bullmann L., Haarmann R., Mirus O., Bredemeier R., Hempel F., Maier U.G., Schleiff E. (2010) Filling the gap, evolutionarily conserved Omp85 in plastids of chromalveolates. J. Biol. Chem. 285, 6848–6856.
12. Sorhannus U. (2004) Diatom phylogenetics inferred based on direct optimization of nuclear-encoded SSU rRNA sequences. Cladistics 20, 487–497.

Figure: Omp85 superfamily tree (scale bar 0.4) with clades labelled Cyanobacteria, Plant OEP80, Diatom Toc75 and Plant Toc75. Sequences come from:
- cyanobacteria (e.g., Gloeobacter violaceus, Thermosynechococcus elongatus);
- land plants and green algae (including Arabidopsis thaliana atToc75-I, -III, -IV and -V);
- red algae (Cyanidioschyzon merolae, Galdieria sulphuraria);
- diatoms (Surirella, Skeletonema, Thalassiosira pseudonana, T. oceanica, Phaeodactylum, Pseudo-nitzschia and Fragilariopsis).

Symbols mark primary endosymbiosis, secondary endosymbiosis and gene duplication.