
Comprehensive genome analyses of Sellimonas intestinalis, a potential biomarker of homeostasis gut recovery

Marina Muñoz a,b, Enzo Guerrero-Araya a,b, Catalina Cortés-Tapia a,b, Ángela Plaza-Garrido a,b, Trevor D. Lawley c and Daniel Paredes-Sabja a,b *

a Microbiota-Host Interactions and Clostridia Research Group, Departamento de Ciencias Biológicas, Facultad de Ciencias de la Vida, Universidad Andrés Bello, Santiago, Chile

b Millennium Nucleus in the Biology of Intestinal Microbiota, Santiago, Chile

c Host–Microbiota Interactions Laboratory, Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom

* Corresponding author: Dr. Daniel Paredes-Sabja, Microbiota-Host Interactions and Clostridia Research Group, Facultad de Ciencias de la Vida, Universidad Andrés Bello, República 330, Santiago, Chile. Tel: 02-770-3955; e-mail: [email protected].

Running title: Sellimonas intestinalis comparative genomics

bioRxiv preprint doi: https://doi.org/10.1101/2020.04.14.041921; this version posted April 16, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.


Sellimonas intestinalis is a Gram-positive, anaerobic bacterial species previously considered uncultivable. Although little is known about this member of the Lachnospiraceae family, its increased abundance has been reported in patients who recovered intestinal homeostasis after dysbiosis events. In this context, this work took advantage of a culturomics protocol that allows the recovery of extremely oxygen-sensitive species from faecal samples, which led to the establishment of an S. intestinalis isolate. Whole-genome sequencing and confirmation of its taxonomic allocation were the basis for comparative analyses that included 11 closely related public genomes. Phylogeographic analysis revealed the existence of three lineages (lineage-I, including isolates from Chile and France; lineage-II, from South Korea and Finland; and lineage-III, from China plus one isolate from the USA). Pangenome analysis of the established dataset revealed that, although S. intestinalis seems to have a highly conserved genome (with 50.1% of its coding potential being part of the core genome), some recombination signals were evident. The identification of clusters of orthologous groups revealed a high number of genes involved in metabolism, including amino acid and carbohydrate transport as well as energy production and conversion, which matches the metabolic profile previously reported for healthy microbiota. Additionally, virulence factors and antimicrobial resistance genes were found (mainly in lineage-III), which could favour survival during antibiotic-induced dysbiosis. These findings provide a knowledge base for this species, which has potential as a bioindicator of intestinal homeostasis recovery, and contribute to advancing the characterization of gut microbiota members with beneficial potential.

Keywords: Sellimonas intestinalis; phylogenomic; gut homeostasis; extremely oxygen-sensitive species


Introduction

The gut microbiota plays important roles in humans and other mammalian species, including: i) maintenance of the structural integrity of the intestinal epithelial barrier;1 ii) protection against the proliferation and colonization of enteropathogens;2 iii) production of metabolites or conversion of substances for the host;3 and iv) stimulation of normal immune system function.4 All these functions are determined by the diversity and abundance of microbial taxa, which have been associated with host status (e.g. health/disease, age and geographical origin, among other comparisons). For this reason, the scientific community has focused its efforts on deciphering the composition of the microbial communities that inhabit this ecosystem.

Classical techniques to detect and study microorganisms involve in vitro culture; however, it is well known that most species inhabiting the human gut cannot be cultured under standard conditions.5 To overcome this limitation, culture-independent DNA-based techniques, mainly based on next-generation sequencing (NGS), have been widely used to identify almost all species at the intestinal level. This is the case of targeted NGS (tNGS), which has become the most popular scheme for depicting microbiota composition thanks to the use of high-resolution markers to identify taxonomic units (bacteria as well as eukaryotes and viruses), to assess their variation among individuals or populations, and to infer phylogenetic relationships among the dominant taxa.6 This approach has been complemented with shotgun metagenomics, which also describes microbiota composition but in addition allows the assembly of whole genomes of the dominant taxa and reveals the total content of nucleic acids present in the studied environment, which, in the case of the gut, could provide informative markers of specific health- or disease-promoting factors.7

Studies based on culture-independent NGS have shown that Ruminococcaceae and Lachnospiraceae are the most abundant clostridial families in the gastrointestinal tract of humans and other mammals.8, 9 Although the species diversity and the role of these two families are still being studied, changes in their relative abundance have been observed in dysbiosis, with these families being positively associated with healthy groups.8 In particular, the Lachnospiraceae family has gained interest in recent years owing to the ecological adaptations exhibited by some of its species, associated with their capability to produce short-chain fatty acids (SCFAs) during glucose fermentation.10 This capability, attributed to commensal gut bacteria in healthy individuals,11 has led some Lachnospiraceae species to be proposed as potentially beneficial gut microbiota members; however, few species of this family have been comprehensively studied.

One of the recently identified and poorly studied Lachnospiraceae species is Sellimonas intestinalis, a Gram-positive and obligately anaerobic bacterium12 initially considered part of the gut microbiota fraction that remains uncultivated because of its extremely oxygen-sensitive (EOS) nature.13 This limitation could explain the small number of studies in which S. intestinalis has been detected, almost all of which aimed to decipher microbiome composition using a shotgun metagenomics approach.13, 14 In these studies, an increased relative abundance of S. intestinalis was detected in patients who recovered intestinal homeostasis after dysbiosis caused by chemotherapy treatment for colorectal cancer15 or by therapeutic splenectomy in patients with liver cirrhosis.16 These findings suggest the potential of S. intestinalis as a candidate biomarker of gut homeostasis recovery. Conversely, isolated cross-sectional studies have detected an increased relative abundance of S. intestinalis in individuals with altered gut microbiota associated with chronic kidney disease17 and systemic-onset juvenile idiopathic arthritis.18 However, no studies have aimed to clarify the role of S. intestinalis within the intestinal microbiome.


A pivotal step in clarifying the involvement of S. intestinalis in host gut homeostasis is to determine its genomic organization, which would allow identification of the genetic basis of its ecological role. However, owing to the limitations of in vitro culture, only eight draft genomes were available as of November 2019 (https://www.ncbi.nlm.nih.gov/genome/genomes/41970), all assembled from shotgun metagenomics data. These genomes have been reported mostly from Eastern countries (China and South Korea), with a single genome reported from America (USA).19

For this reason, in this study we employed a culturomics approach aimed at isolating oxygen-sensitive intestinal microbiota species, which allowed the recovery of S. intestinalis. Subsequently, a comprehensive whole-genome analysis of this species was carried out to identify its genomic architecture, intra-taxon diversity, genetic population structure, the potential metabolic profiles encoded in its genome and the presence of clinically important loci, such as virulence factor markers (VFm) and antimicrobial resistance genes (AMRg), which could play a detrimental role in the colonization and relative abundance of this species in the complex intestinal environment. This approach represents an initial step towards defining the genomic basis that could support the role of this species in the intestinal microbiome and its potential as a biomarker of gut homeostasis recovery.

Results

Isolate establishment and biological source

A Gram-positive bacterial isolate with coccoid morphology (Supplementary Figure 1) was established under the conditions standardized by our research group for the recovery of extremely oxygen-sensitive microorganisms from the gastrointestinal tract. The biological source of this isolate was a stool sample from a 23-year-old woman who, despite being healthy at the time of sample collection, has a diagnosis of idiopathic rheumatoid arthritis. For this reason she was under treatment with prednisone, a synthetic corticosteroid with glucocorticoid activity, which underlies its anti-inflammatory effect and has proven effective and safe for the treatment of patients with this pathology.20 The individual also consumes Chlorella (a microalga containing omega-3 fatty acids and carotenoids with antioxidant effects that has been proposed as a potential source of renewable nutrition),21 vitamin E with selenium and Korean ginseng. The individual had not used any antimicrobial treatment during the six months prior to sample collection.

Genome assembly and taxonomic placement

The genome assembly obtained had a length of 3,096,198 base pairs (bp) and consisted of 32 contigs, with an N50 of 439,526 bp (L50 of 3 contigs). Extraction and comparison of the 16S rRNA sequence revealed that the analyzed genome potentially belongs to one of the following genera: Ruminococcus, Drancourtella or Sellimonas (Supplementary Table 1). A search for reads of this species in the ENA database found a report for one isolate, which was assembled under the same conditions as the genome analyzed in this study. Analysis of the 2,902 Ruminococcaceae and Lachnospiraceae genomes used in a study revising the order Clostridiales showed that the analyzed genome and the ENA report form part of a well-supported node that included nine other Sellimonas intestinalis genomes (Supplementary Figure 2). These 11 genomes were then considered the S. intestinalis node. Interestingly, two incongruences in the taxonomic allocation of publicly available genomes were detected: genomes previously deposited as Ruminococcus sp. DSM-100440 and Drancourtella massiliensis GD1 consistently clustered with the genome set under study (16S rRNA phylogenetic reconstruction (Fig. 1A) and ANI analysis (Fig. 1B)) and will hereafter be treated as part of the S. intestinalis node. This well-supported node was pruned together with the closest node (with 7 genomes), which included mostly Drancourtella genomes and was therefore identified as the Drancourtella node. Incongruences in taxonomic allocation were also found within this node, which included two Ruminococcus genomes and one Pseudoflavonifractor genome (Supplementary Figure 2). Three additional representative genomes clustering in related nodes were included as outgroups (Lachnoclostridium sp. An181, Eubacterium sp. P3177 and Lachnoclostridium sp. An118). Under these parameters, a set of 21 assemblies was included in the dataset for subsequent analyses.
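
The contiguity statistics quoted above can be read as follows: N50 is the length of the shortest contig in the smallest set of largest contigs that covers at least half of the assembly, and L50 is the number of contigs in that set. Under that reading, the reported values would correspond to an L50 of 3 and an N50 of 439,526 bp; this is an interpretation, not something the text states explicitly. The sketch below is a minimal illustration in Python; the function name `n50_l50` and the toy contig lengths are assumptions made for the example and are not the contigs of the assembly described here.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths in bp."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    # Walk through the contigs from longest to shortest.
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first contig that brings the running sum to half of the
        # assembly defines N50 (its length) and L50 (its rank).
        if running >= half_total:
            return length, count


# Hypothetical toy assembly (not data from this study).
toy_contigs = [10, 8, 6, 4, 2]
n50, l50 = n50_l50(toy_contigs)
print(f"N50 = {n50} bp, L50 = {l50} contigs")
```

For the toy list, the two longest contigs (10 + 8 = 18) already cover half of the 30 bp total, so the function returns an N50 of 8 and an L50 of 2.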

The phylogenetic reconstruction based on the 16S rRNA alignment of the 21 selected genomes showed that the 11 genomes previously assigned to the S. intestinalis node remained clustered together (Fig. 1A). These findings were compared with the average nucleotide identity (ANI), which was higher than 95% among all 11 S. intestinalis genomes (Fig. 1B). This verified that, under the traditional criteria used to identify microbial species from whole-genome data (16S rRNA and ANI), all 11 assemblies correspond to S. intestinalis (Fig. 1A and 1B). Information on the genomes included in the S. intestinalis node is given in Supplementary Table 2.
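
The ANI criterion used above amounts to checking that every pair of genomes in a candidate group exceeds the 95% cut-off. The following Python sketch shows one way such a check could be written, assuming pairwise ANI values have already been computed by an external tool. The function name `same_species`, the genome labels and the ANI values are all hypothetical and serve only to illustrate the logic; they are not results from this study.

```python
from itertools import combinations


def same_species(genomes, ani, threshold=95.0):
    """Return True if every pair of genomes has an ANI (%) at or above the threshold.

    `ani` maps a pair of genome labels (in either order) to an ANI percentage.
    A missing pair is treated conservatively as a failure.
    """
    for a, b in combinations(genomes, 2):
        value = ani.get((a, b), ani.get((b, a)))
        if value is None or value < threshold:
            return False
    return True


# Hypothetical pairwise ANI values (illustrative only).
toy_ani = {
    ("genome_A", "genome_B"): 98.7,
    ("genome_A", "genome_C"): 96.2,
    ("genome_B", "genome_C"): 96.5,
    ("genome_A", "outgroup_X"): 81.3,
    ("genome_B", "outgroup_X"): 80.9,
    ("genome_C", "outgroup_X"): 81.0,
}

candidates = ["genome_A", "genome_B", "genome_C"]
print(same_species(candidates, toy_ani))
print(same_species(candidates + ["outgroup_X"], toy_ani))
```

Requiring all pairs to pass, rather than only comparisons against a single reference genome, is a deliberately strict design choice for this sketch.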

Intra-species diversity and genetic population structure

A preliminary BLAST comparison of the 11 selected S. intestinalis assemblies revealed a high level of genome conservation; however, some genome regions were differentially present in groups of isolates. A map comparing the complete genomes delimited by these populations is shown in Supplementary Figure 3. As a next step, pangenome analysis of the S. intestinalis dataset showed a coding potential of 4,627 genes (Supplementary Table 3), which are almost equally distributed between core genes (n=2,318; 50.1%) and accessory genes (n=2,309; 49.9%).
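
The core/accessory split reported above can be illustrated with a short Python sketch. It assumes a strict definition of the core genome (a gene cluster present in every genome); pangenome tools often apply a softer cut-off, and the definition used in this work is not stated in this section. The function names and the toy presence/absence table are hypothetical. The last lines simply recompute the percentages from the gene counts given in the text.

```python
def partition_pangenome(presence, n_genomes):
    """Split gene clusters into core (present in all genomes) and accessory."""
    core = {gene for gene, genomes in presence.items() if len(genomes) == n_genomes}
    accessory = set(presence) - core
    return core, accessory


def percent(part, total):
    """Share of `part` in `total`, as a percentage rounded to one decimal."""
    return round(100 * part / total, 1)


# Hypothetical presence/absence table: gene cluster -> genomes carrying it.
toy_presence = {
    "geneA": {"g1", "g2", "g3"},
    "geneB": {"g1", "g2", "g3"},
    "geneC": {"g1"},
    "geneD": {"g2", "g3"},
}
core, accessory = partition_pangenome(toy_presence, n_genomes=3)
print(sorted(core), sorted(accessory))

# Gene counts as reported in the text: 2,318 core + 2,309 accessory = 4,627.
print(percent(2318, 4627), percent(2309, 4627))
```

The recomputed shares agree with the 50.1% and 49.9% quoted in the text.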

A phylogeographic analysis, based on a Bayesian evolutionary approach, was conducted on the core genome alignment (22,453 positions in length) of the 11 selected assemblies. The phylogenetic tree topology revealed that S. intestinalis could have diversified into at least three major lineages with a possible relationship to geographical origin (Fig. 2A). The first lineage (lineage-I) included isolates from Chile and France, the second (lineage-II) isolates from South Korea and Finland, and the third (lineage-III) mostly isolates from China and only one from the USA. This genetic population structure was confirmed by the phylogenetic network topology, which showed that, although there are recombination signatures (indicated by the reticulation events observed), the three lineages detected by phylogeographic analysis are divergent from one another, which supports the hypothesis that three main populations exist within this species (Fig. 2B).

Clusters of orthologous groups

To explore the coding potential of the genome set under analysis, a clusters of orthologous groups (COGs) analysis was first performed for both the global dataset (Fig. 3A) and the individual isolates according to the lineages to which they belong (Fig. 3B). The results showed that this species directs much of its coding potential to essential biological processes such as transcription, translation and replication. However, an important part of its genes could be involved in metabolism, including amino acid and carbohydrate transport as well as energy production and conversion (Fig. 3A). Differential profiles were detected among the identified populations: lineage-I and lineage-II have more genes involved in metabolic processes, while lineage-III isolates showed profiles with more genes involved in the cell cycle and in intracellular trafficking, secretion and vesicular transport (Fig. 3B).

Virulence factors and antimicrobial resistance genes

Considering that about half of the genes encoded by this species are part of the accessory genome, we inspected the genes differentially carried by the detected lineages (Fig. 4A). This analysis showed that the clustering into three populations is maintained in the phylogenetic reconstruction based on the accessory genome, as was found in the core genome phylogenetic analysis (Fig. 2A).

VFm and AMRg are important loci for the survival of bacterial species because they could modulate changes in abundance under different biological contexts and the subsequent transmission dynamics between hosts (Fig. 4B). The exhaustive search of both assemblies and reads revealed that the isolate from Chile (lineage-I) does not carry known VFm or AMRg. However, the extended search for these loci in the assemblies of the comparative dataset revealed that the other lineage-I genome, from France, carries the rpoB2 marker, associated with rifampin resistance. rpoB2 was also found in all other 9 evaluated genomes. The tet(M) marker (associated with tetracycline resistance) was the only additional marker found in lineage-II, being carried by the isolate from Finland. Interestingly, lineage-III exhibited the greatest number of AMRg, with 2 to 5 genes per genome (5 in the case of AF14-9AC from China). The most frequent genes were the tet elements tet32 and tet(O), present in five and four genomes, respectively, and cfr(C)_2 (conferring linezolid resistance), present in two genomes. In addition, (AGly)Aac6-Aph2, associated with aminoglycoside resistance, ermB, conferring macrolide-lincosamide-streptogramin resistance, and lnuA, associated with lincosamide resistance, were each present in a single genome.


Discussion

Recent studies based on amplicon sequencing and shotgun metagenomics have contributed to the description of the diversity and abundance of gut microbial communities,22 and it has even been possible to propose associations with host states23 and make inferences about the possible functions of specific members of this complex ecological network.24 However, the genomic characterization of gut microbiota members, needed to decipher the genetic basis of the biological function of microbial species inhabiting the gut, remains a challenge: an essential initial step is their recovery by in vitro culture, which is more complex for EOS species.25 For this reason, this work describes the isolation and genomic features of S. intestinalis, an understudied Lachnospiraceae species recovered using a culturomics approach aimed at recovering EOS species from the microbiome environment.

During a genomic characterization study, a precise taxonomic allocation of the target genomes and of those included in the comparative dataset is essential to avoid mistakes in biological inferences. In this study, inconsistencies in taxonomic classification were detected; such inconsistencies occur at different levels: i) in the allocation of species to families with little phylogenetic relationship, as in the case of Clostridium difficile, which had been included within the Clostridiaceae family but, after detailed analysis of its phylogenetic relationships, was reclassified within the Peptostreptococcaceae family;26 or ii) in the taxonomic assignment of individuals, as revealed even before this work for S. intestinalis, which had previously been detected as Ruminococcus in other works but was correctly assigned after the sequencing of its complete genome.13 These types of findings reveal limitations in the traditional analysis schemes for complete genome data and make clear the need for further studies to clarify the classification of under-studied anaerobic families.


The study of genetic population structure is an important tool for determining population sizes, dispersal potential and evolutionary rates over geographical scales during the characterization of a microbial species.27 In the case of S. intestinalis, the small number of isolates analyzed is a limitation. For example, some remarkable profiles were identified, as in the case of the Sin-II and Sin-III lineages, in which the constituent isolates show a slight degree of divergence (Fig. 2A and 2B), so that analyzing a larger number of individuals could lead to the identification of independent populations. These findings were supported by the pangenome results: the limited number of genes in the core genome (n=2,318; 50.1%) could be a first indicator of the high intra-taxon diversity of this species. This type of finding has been reported in species such as Pseudomonas aeruginosa,28 a species of health interest that exhibits a high frequency of gene loss and gain. The pangenome data also allowed the evaluation of phylogenetic relationships from the core genome (Fig. 2A), which led to the detection of three potential lineages. These were subsequently corroborated by the construction of phylogenetic networks (Fig. 2B), which showed that, despite potential recombination events (supported by reticulation in the networks), these populations could be highly divergent from one another. Interestingly, possible common geographical origins were identified among the observed populations, with the exception of the USA isolate, which clustered with isolates from China (Sin-III lineage); this could be attributed to the migration of human populations, as has been identified for some pathogens.29

The effect of specific members on gut microbiome composition has been attributed mainly to their metabolic profile, in which some by-products can stimulate specific processes in the complex gut environment.30 For that reason, the metabolic profiling of S. intestinalis from whole-genome data using COGs analysis was decisive in showing that, besides the genes required for the survival of the bacterium (Fig. 3A), genes associated with amino acid and carbohydrate transport and metabolism were highly frequent in this species. The individual analyses grouped by detected population showed that the isolates of Eastern origin (lineage-III) encode a greater number of AMRg than those of other geographical origins. These results are important because these types of genes are determinants of the metabolic processes that lead to the production of short-chain fatty acids (SCFAs) during glucose fermentation,10 mainly butyric acid,31 which has been proposed as one of the characteristics of candidate members of a healthy microbiota.11 In the specific case of the patient from whom the isolate studied in this work was obtained, despite the immunosuppression possibly caused by the anti-inflammatory effect of prednisone (the corticosteroid taken as treatment for rheumatoid arthritis),20 healthy lifestyle habits and the consumption of substances with a potential restorative effect on the gut microbiota, such as Chlorella (the microalga consumed by the donor because it has shown potential as an antioxidant and as a treatment for different health conditions),21 could have contributed to the restoration of the microbiota. The presence of S. intestinalis would then be consistent with its hypothesized role as a biomarker of the recovery of intestinal homeostasis.

The isolation and characterization of microbiota members contribute to deciphering the genomic basis of their effect on gut microbial ecology,32 as well as to detecting members that potentially act as reservoirs of antibiotic resistance.33 In the particular case of S. intestinalis, several genes associated with antibiotic resistance were found, such as rpoB2 in most of the analyzed isolates, as well as mobile genetic elements, such as the tet family of AMRg (Fig. 4). These findings could represent the basis for the survival of this species at the intestinal level, despite the adverse conditions that this niche naturally represents. Additionally, they could explain its role as a biomarker of the restoration of homeostasis after dysbiosis generated by different causes. The relevance of these findings should subsequently be confirmed through phenotypic tests that identify the minimum inhibitory concentrations at which the proliferation of this species is inhibited in vitro.

Considering the differential presence of biologically and clinically relevant genes among the three S. intestinalis lineages found in this work, future studies are needed to develop a molecular typing method that quickly identifies isolates and contributes to clarifying the phylogenetic relationships and evolutionary history of this species. Additionally, because this study was aimed at analyzing data from complete genomes of S. intestinalis and no phenotypic tests were performed, further studies are needed to identify the impact of the expression of these VFm and AMRg and their potential role in modulating the relative abundance of this species under different biotic contexts. Despite this limitation, the identification of these markers could support the hypothesis that some members of the human gut microbiota act as a resistance reservoir from which bacterial pathogens can acquire resistance,¹³ which is of interest from a health perspective. This approach represents the first step toward identifying the genomic bases that support S. intestinalis survival under conditions of dysbiosis and its subsequent proliferation after homeostasis is reestablished, which could play an important role in maintaining optimal conditions for host development.


Materials and Methods

Sample collection

A culturomics approach was applied to stool samples from adult Chilean individuals, within the framework of the project Millennium Nucleus in the Biology of Intestinal Microbiota, developed by our research team and aimed at detecting and characterizing the microorganisms that make up the intestinal microbiota of healthy individuals in Latin America. This project was approved by the Comité de Bioética de la Facultad de Ciencias de la Vida, Universidad Andrés Bello, through act 013-2017. All patients enrolled in this study agreed to participate and signed an informed consent form.

Bacterial isolate recovery

This approach involved the optimization of a protocol for the isolation of Extremely Oxygen-Sensitive (EOS) intestinal bacteria, as follows: stool samples (collected in sterile containers without preservation media) were processed within the first 72 hours. Next, the samples were mechanically homogenized and divided into two fractions that were treated independently. The first fraction (approximately 50%) was washed with 100% ethanol, added to reach a final concentration of 70% (w/v), and incubated for 4 hours in anaerobiosis. The biological material was then pelleted by centrifugation to discard the ethanol and washed twice with sterile molecular-grade water. The second fraction of the sample was processed without washing. The two fractions were weighed and independently resuspended in sterile 1X PBS (1 mL per 100 mg of feces), and then serially diluted (from 10⁻¹ to 10⁻⁵ for the sample washed with ethanol and from 10⁻¹ to 10⁻⁸ for the sample processed directly). Each dilution of the two treatments was seeded in duplicate on the complex, broad-range YCFA medium,³⁴ in two formats: traditional or supplemented with taurocholate (Winckler) (0.1% v/v). Finally, the plates were incubated for 72-96 hours at 37 °C under anaerobic conditions. The manipulation and incubation of samples were conducted in a Bactron EZ2 anaerobic chamber (ShellLab).
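
The resuspension and dilution scheme described above can be sketched in a few lines of Python. This is only an illustrative calculation under the stated ratios (1 mL of PBS per 100 mg of feces; dilutions down to 10⁻⁵ or 10⁻⁸); the function names and the example mass are hypothetical and are not part of the protocol.

```python
from typing import List

PBS_ML_PER_100_MG = 1.0  # ratio stated in the protocol


def pbs_volume_ml(feces_mg: float) -> float:
    """Volume of 1X PBS (mL) needed to resuspend a fraction of the given mass."""
    return feces_mg / 100.0 * PBS_ML_PER_100_MG


def dilution_labels(last_exponent: int) -> List[str]:
    """Labels of a ten-fold dilution series from 10^-1 down to 10^-last_exponent."""
    return [f"10^-{k}" for k in range(1, last_exponent + 1)]


# Hypothetical example: a 250 mg fraction (the mass is an assumption).
print(pbs_volume_ml(250))      # PBS volume in mL
print(dilution_labels(5))      # series for the ethanol-washed fraction
print(dilution_labels(8))      # series for the directly processed fraction
```

Keeping the series as labels rather than floating-point factors avoids rounding noise when the labels are later used to annotate plates.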

The colony forming units (CFU) obtained were streaked on YCFA plates and, after 24-48 hours of incubation under the conditions described, their quality and morphology were evaluated by classical microbiological techniques (macroscopic and microscopic observation). The verified colonies were propagated in liquid YCFA medium, using the same incubation conditions, to increase their biomass and establish the isolates. The isolate studied here was named 6K002.

DNA extraction and whole genome sequencing (WGS)

The biomass recovered from isolate incubation in broth medium was subjected to DNA extraction using the commercial Wizard® Genomic DNA Purification Kit (Promega Corporation, Madison, WI, USA), following the manufacturer's recommendations. DNA sequencing was carried out by the Wellcome Trust Sanger Institute on an Illumina HiSeq 2000 platform, with a read length of 100 bp, following the conditions of “Developing and implementing an institute-wide data sharing policy”.³⁵

Genome assembly and quality control verification

The reads obtained from WGS were assembled de novo using Unicycler v0.4.8, an assembly pipeline for bacterial genomes that acts as a SPAdes optimiser (SPAdes v3.13.1) aimed at generating the best possible assembly,³⁶ with default parameters. The quality of the genome assembly was evaluated using the GenomeQC_Filter_v1-5 script,³⁷ which takes as parameters a maximum number of contigs per genome (fixed at 400) and a maximum genome size (set to 8 Mbp), and then extracts the small subunit 16S rRNA gene sequences (16S-rRNA).
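
The two assembly thresholds described above (at most 400 contigs and at most 8 Mbp) can be illustrated with a minimal Python sketch. This is not the GenomeQC_Filter script itself, whose internals are not described here; the function names and the toy FASTA records are assumptions made for illustration only.

```python
from typing import Dict, Iterable

MAX_CONTIGS = 400              # maximum contigs per genome (from the text)
MAX_GENOME_SIZE = 8_000_000    # maximum genome size, 8 Mbp (from the text)


def read_fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {contig_name: length} from FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the contig name.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def passes_qc(lengths: Dict[str, int]) -> bool:
    """Apply the contig-count and total-size thresholds to one assembly."""
    n_contigs = len(lengths)
    total_size = sum(lengths.values())
    return 0 < n_contigs <= MAX_CONTIGS and total_size <= MAX_GENOME_SIZE


# Toy assembly (hypothetical records, not real data).
toy = [">contig_1 demo", "ACGTACGT", "ACGT", ">contig_2", "GGCC"]
toy_lengths = read_fasta_lengths(toy)
print(toy_lengths, passes_qc(toy_lengths))
```

For a real assembly the same functions could be fed the lines of an open FASTA file instead of the toy list.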


Taxonomic placement and data retrieval

Initially, the previously extracted 16S-rRNA sequence was used to search for sequence similarity against the data available in public datasets using the BLASTn algorithm [39]; the results were subsequently verified by 16S-rRNA sequence alignment using the SILVA Incremental Aligner (SINA) service.³⁸

Next, a dataset of 2,902 Ruminococcaceae and Lachnospiraceae genome assemblies, publicly available in the PATRIC,³⁹,⁴⁰ ENA⁴¹ and NCBI⁴² databases and which passed the assembly quality test previously described, was analyzed to identify the genomes most closely related to the analyzed assembly. This dataset is part of a parallel work of our research team aimed at evaluating the phylogenetic relationships of the order Clostridiales.

In parallel, a search of reads for the genus ‘Sellimonas’ was conducted in the European Nucleotide Archive (https://www.ebi.ac.uk/ena/data/search?query=Sellimonas), with the aim of recovering the greatest number of genomes for analysis. The reads obtained were subjected to the genome assembly and quality control verification methodology described in the previous section.

The complete genome dataset was the basis for selecting the node most closely related to the analyzed genome, through phylogenetic reconstruction based on 16S-rRNA sequences under the parameters described in the corresponding section. The set of selected assemblies was subjected to a species delimitation step using average nucleotide identity (ANI),⁴³ calculated with pyANI 0.2.10, a Python3 module and script that provides support for calculating ANI and related measures for whole genome comparisons, and for rendering relevant graphical summary output (https://github.com/widdowquinn/pyani).⁴⁴ The pyANI analysis was run using BLAST, with other settings left at their defaults. ANI scores higher than 95.0% were used to verify that the genomes belong to the same species.
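
As an illustration of the 95.0% criterion, the following Python sketch groups genomes whose pairwise ANI exceeds the cutoff. It does not call pyANI; the genome names and ANI values are invented for the example, identities are assumed to be expressed as fractions, and the single-linkage grouping is an assumption rather than a procedure stated in this work.

```python
from typing import Dict, List, Sequence, Tuple

ANI_THRESHOLD = 0.95  # species cutoff from the text (95.0%), as a fraction


def species_clusters(
    names: Sequence[str], ani: Dict[Tuple[str, str], float]
) -> List[List[str]]:
    """Group genomes linked by pairwise ANI above the threshold (single linkage)."""
    parent = {name: name for name in names}

    def find(x: str) -> str:
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), value in ani.items():
        if value > ANI_THRESHOLD:
            parent[find(a)] = find(b)

    clusters: Dict[str, List[str]] = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical genomes and ANI values (not results from this study).
genomes = ["isolate_A", "isolate_B", "outgroup_C"]
pairwise = {
    ("isolate_A", "isolate_B"): 0.987,
    ("isolate_A", "outgroup_C"): 0.820,
    ("isolate_B", "outgroup_C"): 0.810,
}
print(species_clusters(genomes, pairwise))
```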


A graphical map of the genome assemblies identified as belonging to the same species as the studied genome was built in the CGView Server,⁴⁵ where pairwise comparisons were made to identify the differences between the genomes, using the BLAST-based tool included in the server.

Annotation and pangenome analysis

An automated annotation pipeline was applied to the complete set of evaluated genomes. This pipeline is based on Prokka v1.13,⁴⁶ as follows: Infernal v1.1.2⁴⁷ was run to predict RNA structures, followed by an analysis in Prodigal v2.6.3⁴⁸ to predict proteins. Aragorn v1.2.38⁴⁹ was used to predict tRNAs and tmRNAs, and RNAmmer⁵⁰ was used to predict ribosomal RNAs. All predicted genes were then annotated through database searches in the following order: first, genus-specific databases generated by retrieving the annotation from RefSeq,⁵¹ whose protein sequences were merged using CD-HIT version 4.8.1⁵² to produce a non-redundant BLAST protein database; next, UniProtKB/SwissProt,⁵³ considering kingdom-specific databases for Bacteria.

As a next step, the pangenome was determined using Roary version 3.11.2,⁵⁴ defining the core genome as the genes with a percentage identity of 95% (using Protein-Protein BLAST 2.9.0+) that are present in 99% of the analyzed genomes.
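
The presence criterion for the core genome can be illustrated with a short Python sketch. It is not Roary's implementation; it only applies the 99% presence rule to a hypothetical gene presence table, and the gene and genome names are invented for the example.

```python
from typing import Dict, Set

CORE_FRACTION = 0.99  # presence threshold from the text (99% of genomes)


def core_genes(presence: Dict[str, Set[str]], n_genomes: int) -> Set[str]:
    """Return genes found in at least CORE_FRACTION of the genomes."""
    return {
        gene
        for gene, genomes in presence.items()
        if len(genomes) / n_genomes >= CORE_FRACTION
    }


# Hypothetical presence table for 11 genomes (names are placeholders).
all_genomes = {f"genome_{i}" for i in range(1, 12)}
presence = {
    "gene_in_all": set(all_genomes),
    "gene_missing_once": all_genomes - {"genome_11"},
}
print(core_genes(presence, n_genomes=len(all_genomes)))
```

With a small dataset such as 11 genomes, a gene missing from a single genome is present in only about 91% of them, so the 99% threshold is met only by genes found in every genome.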

Phylogeographic analyses

The phylogenetic relationships among Ruminococcaceae and Lachnospiraceae assemblies were evaluated to identify the genomes most closely related to the studied genome. For that, the 16S-rRNA sequences extracted during the quality control verification step were aligned using MAFFT v7.407⁵⁵ with default parameters, and an approximately-maximum-likelihood phylogenetic tree was then built in FastTree double precision version 2.1.10⁵⁶ with default settings. The robustness of the nodes was evaluated using the bootstrap method (BT, with 1,000 replicates).

After the dataset to analyze was defined, the phylogenetic relationships among isolates were evaluated using a Bayesian evolutionary approach based on Markov Chain Monte Carlo (MCMC) implemented in BEAST v1.10.4,⁵⁷ from the pangenome alignment (with a length of 22,453 nucleotides) of the 11 selected assemblies. The GTR substitution model, chosen as the best model in jModelTest v0.1.1,⁵⁸ an uncorrelated relaxed clock model and the skyline population model were used as initial parameters. Twenty independent MCMC runs were carried out, each with a chain length of 100,000,000 states and sampling every 10,000 states. Log files were summarized with Tree Annotator v2.4.8 [44] using 10% burn-in. The effective sample size (ESS) was >200 for all parameters; convergence and mixing were assessed using trace plots in Tracer v1.7.1.⁵⁹ The tree files generated were summarized with Tree Annotator v2.4.8⁶⁰ using 10% burn-in, with maximum clade credibility and node heights set to common ancestor heights. A node dating step was conducted using isolate metadata (date of isolation and geographic origin). The graphic visualization of all phylogenetic trees was obtained in the web tool Interactive Tree of Life V3 (http://itol.embl.de).⁶¹ Additionally, phylogenetic networks were built with the aim of detecting recombination signatures in the analyzed population. These analyses were carried out in SplitsTree5,⁶² using the neighbor-net method.
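
The sampling scheme above implies a fixed number of posterior samples per run, which the following Python sketch works through. It is plain arithmetic on the values given in the text; it assumes one sample per 10,000 states, ignores any initial state that a sampler may also log, and the function name is illustrative. Whether the runs were combined is not stated, so the combined total is shown only as a hypothetical.

```python
CHAIN_LENGTH = 100_000_000   # states per run (from the text)
SAMPLE_EVERY = 10_000        # sampling interval in states (from the text)
BURN_IN_PERCENT = 10         # burn-in percentage (from the text)
N_RUNS = 20                  # independent MCMC runs (from the text)


def retained_samples(chain_length: int, sample_every: int, burn_in_percent: int) -> int:
    """Samples kept per run after discarding the burn-in portion."""
    total = chain_length // sample_every
    discarded = total * burn_in_percent // 100
    return total - discarded


per_run = retained_samples(CHAIN_LENGTH, SAMPLE_EVERY, BURN_IN_PERCENT)
print(per_run)            # samples retained in one run
print(per_run * N_RUNS)   # hypothetical total if all runs were combined
```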

Codifying potential of S. intestinalis genome

The annotation outputs were additionally used to identify Clusters of Orthologous Groups (COG) using eggNOG-mapper v2, a tool for fast functional annotation of sequence collections,⁶³ under default settings. The COG categories were subsequently represented in a histogram.
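
The tally behind such a histogram can be sketched in Python as follows. The input is a hypothetical list of COG category strings (one per annotated gene, where a gene may carry several one-letter categories); the parsing of real eggNOG-mapper output is not shown, and the function name is an assumption.

```python
from collections import Counter
from typing import Iterable


def cog_histogram(categories: Iterable[str]) -> Counter:
    """Count one-letter COG categories; multi-letter entries count once per letter."""
    counts: Counter = Counter()
    for entry in categories:
        for letter in entry.strip():
            if letter.isalpha():  # skip placeholders such as "-"
                counts[letter] += 1
    return counts


# Hypothetical annotations: E = amino acid, G = carbohydrate transport and metabolism.
example = ["E", "G", "EG", "-", ""]
histogram = cog_histogram(example)
for letter, count in sorted(histogram.items()):
    print(f"{letter} {'#' * count} ({count})")
```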

Virulence factor markers (VFm) and antimicrobial resistance genes (AMRg) were identified from whole genome assemblies using Abricate 0.8.4 (https://github.com/tseemann/abricate), by running BLAST against the sequences previously reported in the following databases: CARD (1,749 sequences, last update: Jul 8, 2017),⁶⁴ Resfinder (1,749 sequences, last update: Jul 8, 2017),⁶⁵ NCBI (1,749 sequences, last update: Jul 8, 2017),⁶⁶ ARG-ANNOT (1,749 sequences, last update: Jul 8, 2017),⁶⁷ VFDB (1,749 sequences, last update: Jul 8, 2017)⁶⁸ and PlasmidFinder (1,749 sequences, last update: Jul 8, 2017).⁶⁹ A minimum DNA identity of 75% was used as the detection threshold. To confirm the presence of VFm and AMRg, ARIBA (Antimicrobial Resistance Identification By Assembly) version 2.0⁷⁰ was run on the reads of the studied isolate.
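
The 75% identity threshold can be illustrated with a minimal Python filter over a list of hits. The records below are hypothetical and simplified; they do not reproduce Abricate's actual output format, and the gene names are placeholders rather than results from this study.

```python
from typing import Dict, List, Union

MIN_IDENTITY = 75.0  # minimum DNA identity (%) from the text

Hit = Dict[str, Union[str, float]]


def filter_hits(hits: List[Hit], min_identity: float = MIN_IDENTITY) -> List[Hit]:
    """Keep hits whose percent identity meets the detection threshold."""
    return [hit for hit in hits if float(hit["identity"]) >= min_identity]


# Hypothetical hits (placeholder names and values).
hits = [
    {"gene": "hypothetical_gene_1", "database": "card", "identity": 99.2},
    {"gene": "hypothetical_gene_2", "database": "vfdb", "identity": 74.9},
    {"gene": "hypothetical_gene_3", "database": "resfinder", "identity": 75.0},
]
kept = filter_hits(hits)
print([hit["gene"] for hit in kept])
```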

Acknowledgements

The authors thank the Wellcome Trust Sanger Institute, in particular the core library and sequencing teams for whole genome sequencing of Sellimonas intestinalis GK002 and the pathogen informatics team for the use of several automated pipelines during the processing and analysis of the whole genome sequence data.

Financial support

This work was supported by: i) EULac project ‘Genomic Epidemiology of Clostridium difficile in Latin America’ (T020076); ii) Fondo Nacional de Ciencia y Tecnología de Chile (FONDECYT Grant 1191601); iii) Fondo de Fomento al Desarrollo Científico y Tecnológico (FONDEF) ID18|10230 to M.P-G and D.P-S and iv) Millennium Science Initiative of the Ministry of Economy, Development and Tourism to D.P-S.


References

1. Barko PC, McMichael MA, Swanson KS, Williams DA. The Gastrointestinal Microbiome: A Review. J Vet Intern Med 2018; 32:9-25.
2. Rinninella E, Raoul P, Cintoni M, Franceschi F, Miggiano GAD, Gasbarrini A, et al. What is the Healthy Gut Microbiota Composition? A Changing Ecosystem across Age, Environment, Diet, and Diseases. Microorganisms 2019; 7.
3. Blaut M. Ecology and physiology of the intestinal tract. Curr Top Microbiol Immunol 2013; 358:247-72.
4. Eberl G. The microbiota, a necessary element of immunity. C R Biol 2018; 341:281-3.
5. Carabeo-Perez A, Guerra-Rivera G, Ramos-Leal M, Jimenez-Hernandez J. Metagenomic approaches: effective tools for monitoring the structure and functionality of microbiomes in anaerobic digestion systems. Appl Microbiol Biotechnol 2019; 103:9379-90.
6. Fraher MH, O'Toole PW, Quigley EM. Techniques used to characterize the gut microbiota: a guide for the clinician. Nat Rev Gastroenterol Hepatol 2012; 9:312-22.
7. Mitchell SL, Simner PJ. Next-Generation Sequencing in Clinical Microbiology: Are We There Yet? Clin Lab Med 2019; 39:405-18.
8. Lozupone CA, Stombaugh J, Gonzalez A, Ackermann G, Wendel D, Vazquez-Baeza Y, et al. Meta-analyses of studies of the human microbiota. Genome Res 2013; 23:1704-14.
9. Suchodolski JS. Intestinal microbiota of dogs and cats: a bigger world than we thought. Vet Clin North Am Small Anim Pract 2011; 41:261-72.
10. Meehan CJ, Beiko RG. A phylogenomic view of ecological specialization in the Lachnospiraceae, a family of digestive tract-associated bacteria. Genome Biol Evol 2014; 6:703-13.
11. Brestoff JR, Artis D. Commensal bacteria at the interface of host metabolism and the immune system. Nat Immunol 2013; 14:676-84.
12. Seo B, Yoo JE, Lee YM, Ko G. Sellimonas intestinalis gen. nov., sp. nov., isolated from human faeces. Int J Syst Evol Microbiol 2016; 66:951-6.
13. Versluis D, de JBGT, Zoetendal EG, Passel M, Smidt H. High throughput cultivation-based screening on porous aluminum oxide chips allows targeted isolation of antibiotic resistant human gut bacteria. PLoS One 2019; 14:e0210970.
14. Sun Y, Chen Q, Lin P, Xu R, He D, Ji W, et al. Characteristics of Gut Microbiota in Patients With Rheumatoid Arthritis in Shanghai, China. Front Cell Infect Microbiol 2019; 9:369.
15. Kong C, Gao R, Yan X, Huang L, He J, Li H, et al. Alterations in intestinal microbiota of colorectal cancer patients receiving radical surgery combined with adjuvant CapeOx therapy. Sci China Life Sci 2019; 62:1178-93.
16. Liu Y, Li J, Jin Y, Zhao L, Zhao F, Feng J, et al. Splenectomy Leads to Amelioration of Altered Gut Microbiota and Metabolome in Liver Cirrhosis Patients. Front Microbiol 2018; 9:963.
17. Lun H, Yang W, Zhao S, Jiang M, Xu M, Liu F, et al. Altered gut microbiota and microbial biomarkers associated with chronic kidney disease. Microbiologyopen 2019; 8:e00678.
18. Dong YQ, Wang W, Li J, Ma MS, Zhong LQ, Wei QJ, et al. Characterization of microbiota in systemic-onset juvenile idiopathic arthritis with different disease severities. World J Clin Cases 2019; 7:2734-45.


19. Poyet M, Groussin M, Gibbons SM, Avila-Pacheco J, Jiang X, Kearney SM, et al. A library of human gut bacterial isolates paired with longitudinal multiomics data enables mechanistic microbiome research. Nat Med 2019; 25:1442-52.
20. Krasselt M, Baerwald C. Efficacy and safety of modified-release prednisone in patients with rheumatoid arthritis. Drug Des Devel Ther 2016; 10:1047-58.
21. Barkia I, Saari N, Manning SR. Microalgae for High-Value Products Towards Human Health and Nutrition. Mar Drugs 2019; 17.
22. Laudadio I, Fulci V, Palone F, Stronati L, Cucchiara S, Carissimi C. Quantitative Assessment of Shotgun Metagenomics and 16S rDNA Amplicon Sequencing in the Study of Human Gut Microbiome. OMICS 2018; 22:248-54.
23. Lynch SV, Pedersen O. The Human Intestinal Microbiome in Health and Disease. N Engl J Med 2016; 375:2369-79.
24. Gillings MR, Paulsen IT, Tetu SG. Ecology and Evolution of the Human Microbiota: Fire, Farming and Antibiotics. Genes (Basel) 2015; 6:841-57.
25. Lagier JC, Dubourg G, Million M, Cadoret F, Bilen M, Fenollar F, et al. Culturing the human microbiota and culturomics. Nat Rev Microbiol 2018; 16:540-50.
26. Galperin MY, Brover V, Tolstoy I, Yutin N. Phylogenomic analysis of the family Peptostreptococcaceae (Clostridium cluster XI) and proposal for reclassification of Clostridium litorale (Fendrich et al. 1991) and Eubacterium acidaminophilum (Zindel et al. 1989) as Peptoclostridium litorale gen. nov. comb. nov. and Peptoclostridium acidaminophilum comb. nov. Int J Syst Evol Microbiol 2016; 66:5506-13.
27. Gerstein AC, Jean-Sebastien M. Small is the new big: assessing the population structure of microorganisms. Mol Ecol 2011; 20:4385-7.
28. Kung VL, Ozer EA, Hauser AR. The accessory genome of Pseudomonas aeruginosa. Microbiol Mol Biol Rev 2010; 74:621-41.
29. Motayo BO, Oluwasemowo OO, Olusola BA, Opayele AV, Faneye AO. Phylogeography and evolutionary analysis of African Rotavirus a genotype G12 reveals district genetic diversification within lineage III. Heliyon 2019; 5:e02680.
30. Lin L, Zhang J. Role of intestinal microbiota and metabolites on gut homeostasis and human diseases. BMC Immunol 2017; 18:2.
31. Fu X, Liu Z, Zhu C, Mou H, Kong Q. Nondigestible carbohydrates, butyrate, and butyrate-producing bacteria. Crit Rev Food Sci Nutr 2019; 59:S130-S52.
32. Suzuki TA, Worobey M. Geographical variation of human gut microbial composition. Biol Lett 2014; 10:20131037.
33. van Schaik W. The human gut resistome. Philos Trans R Soc Lond B Biol Sci 2015; 370:20140087.
34. Browne HP, Forster SC, Anonye BO, Kumar N, Neville BA, Stares MD, et al. Culturing of 'unculturable' human microbiota reveals novel taxa and extensive sporulation. Nature 2016; 533:543-6.
35. Dyke SO, Hubbard TJ. Developing and implementing an institute-wide data sharing policy. Genome Med 2011; 3:60.
36. Wick RR, Judd LM, Gorrie CL, Holt KE. Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads. PLoS Comput Biol 2017; 13:e1005595.
37. Gualtero SM, Abril LA, Camelo N, Sanchez SD, Davila FA, Arias G, et al. [Characteristics of Clostridium difficile infection in a high complexity hospital and report of the circulation of the NAP1/027 hypervirulent strain in Colombia]. Biomedica: revista del Instituto Nacional de Salud 2017; 37:466-72.


38. Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic acids research 2013; 41:D590-6.
39. Wattam AR, Abraham D, Dalay O, Disz TL, Driscoll T, Gabbard JL, et al. PATRIC, the bacterial bioinformatics database and analysis resource. Nucleic acids research 2014; 42:D581-91.
40. Wattam AR, Abraham D, Dalay O, Disz TL, Driscoll T, Gabbard JL, et al. Improvements to PATRIC, the all-bacterial Bioinformatics Database and Analysis Resource Center. PATRIC 3.5.4. Search criteria: Genomes/Clostridium 2017.
41. Silvester N, Alako B, Amid C, Cerdeno-Tarraga A, Clarke L, Cleland I, et al. The European Nucleotide Archive in 2017. Nucleic acids research 2018; 46:D36-D40.
42. Wheeler DL, Chappey C, Lash AE, Leipe DD, Madden TL, Schuler GD, et al. Database resources of the National Center for Biotechnology Information. Nucleic acids research 2000; 28:10-4.
43. Figueras MJ, Beaz-Hidalgo R, Hossain MJ, Liles MR. Taxonomic affiliation of new genomes should be verified using average nucleotide identity and multilocus phylogenetic analysis. Genome announcements 2014; 2.
44. Richter M, Rossello-Mora R. Shifting the genomic gold standard for the prokaryotic species definition. Proc Natl Acad Sci U S A 2009; 106:19126-31.
45. Grant JR, Stothard P. The CGView Server: a comparative genomics tool for circular genomes. Nucleic acids research 2008; 36:W181-4.
46. Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 2014; 30:2068-9.
47. Nawrocki EP, Eddy SR. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 2013; 29:2933-5.
48. Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC bioinformatics 2010; 11:119.
49. Laslett D, Canback B. ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences. Nucleic acids research 2004; 32:11-6.
50. Lagesen K, Hallin P, Rodland EA, Staerfeldt HH, Rognes T, Ussery DW. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic acids research 2007; 35:3100-8.
51. Pruitt KD, Tatusova T, Brown GR, Maglott DR. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic acids research 2012; 40:D130-5.
52. Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 2012; 28:3150-2.
53. UniProt C. The universal protein resource (UniProt). Nucleic acids research 2008; 36:D190-5.
54. Page AJ, Cummins CA, Hunt M, Wong VK, Reuter S, Holden MT, et al. Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics 2015; 31:3691-3.
55. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Molecular biology and evolution 2013; 30:772-80.
56. Price MN, Dehal PS, Arkin AP. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Molecular biology and evolution 2009; 26:1641-50.


57. Suchard MA, Lemey P, Baele G, Ayres DL, Drummond AJ, Rambaut A. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol 2018; 4:vey016.
58. Darriba D, Taboada GL, Doallo R, Posada D. jModelTest 2: more models, new heuristics and parallel computing. Nat Methods 2012; 9:772.
59. Rambaut A, Drummond AJ, Xie D, Baele G, Suchard MA. Posterior Summarization in Bayesian Phylogenetics Using Tracer 1.7. Syst Biol 2018; 67:901-4.
60. Bouckaert R, Heled J, Kuhnert D, Vaughan T, Wu CH, Xie D, et al. BEAST 2: a software platform for Bayesian evolutionary analysis. PLoS Comput Biol 2014; 10:e1003537.
61. Letunic I, Bork P. Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic acids research 2016; 44:W242-5.
62. Huson DH, Bryant D. Application of phylogenetic networks in evolutionary studies. Molecular biology and evolution 2006; 23:254-67.
63. Huerta-Cepas J, Szklarczyk D, Forslund K, Cook H, Heller D, Walter MC, et al. eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic acids research 2016; 44:D286-93.
64. Jia B, Raphenya AR, Alcock B, Waglechner N, Guo P, Tsang KK, et al. CARD 2017: expansion and model-centric curation of the comprehensive antibiotic resistance database. Nucleic acids research 2017; 45:D566-D73.
65. Zankari E, Hasman H, Cosentino S, Vestergaard M, Rasmussen S, Lund O, et al. Identification of acquired antimicrobial resistance genes. J Antimicrob Chemother 2012; 67:2640-4.
66. Feldgarden M, Brover V, Haft DH, Prasad AB, Slotta DJ, Tolstoy I, et al. Using the NCBI AMRFinder Tool to Determine Antimicrobial Resistance Genotype-Phenotype Correlations Within a Collection of NARMS Isolates. bioRxiv 2019:550707.
67. Gupta SK, Padmanabhan BR, Diene SM, Lopez-Rojas R, Kempf M, Landraud L, et al. ARG-ANNOT, a new bioinformatic tool to discover antibiotic resistance genes in bacterial genomes. Antimicrob Agents Chemother 2014; 58:212-20.
68. Chen L, Zheng D, Liu B, Yang J, Jin Q. VFDB 2016: hierarchical and refined dataset for big data analysis--10 years on. Nucleic acids research 2016; 44:D694-7.
69. Carattoli A, Zankari E, Garcia-Fernandez A, Voldby Larsen M, Lund O, Villa L, et al. In silico detection and typing of plasmids using PlasmidFinder and plasmid multilocus sequence typing. Antimicrob Agents Chemother 2014; 58:3895-903.
70. Hunt M, Mather AE, Sanchez-Buso L, Page AJ, Parkhill J, Keane JA, et al. ARIBA: rapid antimicrobial resistance genotyping directly from sequencing reads. Microb Genom 2017; 3:e000131.


Figure legends

Figure 1. Taxonomic allocation analyses of the studied genome using a phylogenomic approach. A) Phylogenetic reconstruction based on the 16S rRNA alignment for the 21 selected genomes. Sequences were aligned using MAFFT [55], and an approximately-maximum-likelihood phylogenetic tree was then built in FastTree double precision version 2.1.10 [56]. Interactive Tree of Life V3 (http://itol.embl.de) was used for the graphic visualization [61]. Red dots represent bootstrap support ≥ 90.0. B) Average Nucleotide Identity (ANI) analysis for the selected dataset. Two genomes with ANI results higher than 95% belong to the same microbial species. The analysis was carried out using pyANI (https://github.com/widdowquinn/pyani).
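The 95% ANI rule in panel B can be illustrated with a minimal sketch. It assumes pairwise ANI values are already available as percentages and groups genomes by single linkage, which is a choice made here for illustration rather than a step described in the legend. The function name `species_groups` and the example values are hypothetical and are not taken from the figure.

```python
from itertools import combinations

ANI_SPECIES_THRESHOLD = 95.0  # percent identity; cutoff stated in the legend


def species_groups(genomes, ani, threshold=ANI_SPECIES_THRESHOLD):
    """Group genomes so that any pair with ANI above the threshold shares a group.

    `ani` maps (genome_a, genome_b) tuples to percent identity. Grouping is
    single linkage (an assumption of this sketch), implemented with union-find.
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the group representative, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genomes, 2):
        value = ani.get((a, b), ani.get((b, a)))
        if value is not None and value > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for g in genomes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical ANI values (percent), for illustration only.
example_ani = {
    ("genome_A", "genome_B"): 98.7,
    ("genome_A", "genome_C"): 82.1,
    ("genome_B", "genome_C"): 81.9,
}
example_groups = species_groups(["genome_A", "genome_B", "genome_C"], example_ani)
print(example_groups)
```

With these made-up values, `genome_A` and `genome_B` fall in one group and `genome_C` stays on its own.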

[Figure 1 image: panel A, 16S rRNA phylogenetic tree of the selected genomes (tree scale: 0.01); panel B, ANI percentage identity matrix for the same genomes (scale from 0 to 100).]


Figure 2. Phylogeographic analysis and phylogenetic networks used to predict the genetic population structure of Sellimonas intestinalis. A) Bayesian evolutionary analysis based on Markov Chain Monte Carlo (MCMC), implemented in BEAST 2 [60] and carried out on the core genome alignment (22,453 positions in length) of the 11 selected assemblies. The GTR substitution model was chosen as the best model in jModelTest v0.1.1 [58]. B) Phylogenetic network built with the neighbor-net method in SplitsTree5 [62].

[Figure 2 image: phylogenetic tree and neighbor-net network of the 11 S. intestinalis isolates, with tips labeled by isolate and country of origin and three groups marked Lineage-I, Lineage-II and Lineage-III.]


Figure 3. Clusters of orthologous groups (COGs) for A) the global data set and B) individual isolates. eggNOG-mapper v2 was used for fast functional annotation of the sequence collections [63].

Panel A, global data set (count per COG category):

J: Translation, ribosomal structure and biogenesis (1845)
K: Transcription (3060)
L: Replication, recombination and repair (1939)
D: Cell cycle control, cell division, chromosome partitioning (495)
O: Posttranslational modification, protein turnover, chaperones (706)
M: Cell wall/membrane/envelope biogenesis (1758)
N: Cell motility (164)
P: Inorganic ion transport and metabolism (1421)
T: Signal transduction mechanisms (854)
C: Energy production and conversion (1758)
G: Carbohydrate transport and metabolism (2256)
E: Amino acid transport and metabolism (2786)
F: Nucleotide transport and metabolism (950)
H: Coenzyme transport and metabolism (971)
I: Lipid transport and metabolism (592)
U: Intracellular trafficking, secretion, and vesicular transport (318)
Q: Secondary metabolites biosynthesis, transport and catabolism (296)
V: Defense mechanisms (1086)

Panel B, count per COG category for each isolate (columns follow the isolate order given in the figure):

COG  GD1  6K002  BR72  DSM-100440  AM14-42  AF07-16  BIOML-A1  AM38-2BH  AF14-9AC  AF37-1  AF37-2AT
J    164  168    167   168         166      168      168       168       167       172     169
K    258  273    281   282         280      289      288       266       272       290     281
L    151  165    186   189         178      183      181       160       173       191     182
D    50   43     44    44          45       45       45        43        45        46      45
O    59   64     68    69          63       64       64        63        64        64      64
M    155  158    158   156         161      164      164       154       155       167     166
N    13   10     15    15          16       16       16        16        16        16      15
P    126  129    134   129         128      132      132       130       127       127     127
T    75   73     77    77          78       82       82        78        77        78      77
C    159  171    161   160         159      159      159       157       159       157     157
G    211  203    203   202         206      207      207       206       203       204     204
E    257  250    262   263         252      253      252       253       238       253     253
F    88   83     86    87          87       88       88        85        86        86      86
H    87   89     87    91          90       87       87        88        87        89      89
I    53   56     50    52          53       55       55        55        55        54      54
U    23   25     30    27          29       31       31        24        30        35      33
Q    21   26     29    29          28       27       27        28        24        29      28
V    97   93     97    96          100      105      105       97        95        101     100
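As a consistency check on Figure 3, the per-isolate counts in panel B should add up to the global counts in panel A. The sketch below does this for four of the 18 categories, using values copied from the figure; the helper name `check_totals` is hypothetical.

```python
# Per-isolate COG counts copied from panel B, in the order they appear in the figure.
per_isolate = {
    "J": [164, 168, 167, 168, 166, 168, 168, 168, 167, 172, 169],
    "K": [258, 273, 281, 282, 280, 289, 288, 266, 272, 290, 281],
    "L": [151, 165, 186, 189, 178, 183, 181, 160, 173, 191, 182],
    "N": [13, 10, 15, 15, 16, 16, 16, 16, 16, 16, 15],
}

# Global counts copied from panel A for the same categories.
global_totals = {"J": 1845, "K": 3060, "L": 1939, "N": 164}


def check_totals(per_isolate, global_totals):
    """Compare the sum of per-isolate counts with the global count per category.

    Returns {category: (row_sum, global_total, matches)}.
    """
    report = {}
    for category, counts in per_isolate.items():
        row_sum = sum(counts)
        total = global_totals[category]
        report[category] = (row_sum, total, row_sum == total)
    return report


report = check_totals(per_isolate, global_totals)
for category, (row_sum, total, matches) in report.items():
    print(f"{category}: sum over isolates = {row_sum}, global = {total}, match = {matches}")
```

For these four categories the sums over the 11 isolates agree with the panel A values.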


Figure 4. Virulence factors and antimicrobial resistance genes detected in Sellimonas intestinalis genomes. A) Phylogenetic reconstruction from the accessory genome alignment. B) Frequency of markers found in each assembly. C) Presence-absence matrix describing the markers detected in each genome. Abricate 0.8.4 (https://github.com/tseemann/abricate) was used to run BLAST searches against the sequences previously reported in the following databases: CARD [64], ResFinder [65], NCBI [66], ARG-ANNOT [67], VFDB [68] and PlasmidFinder [69].
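Panels B and C summarize the same screening results in two ways: a marker count per assembly and a presence-absence matrix. A minimal sketch of that bookkeeping follows, assuming the hits have already been reduced to (assembly, marker) pairs. The function names, assembly names and hit assignments are hypothetical; only the marker names are borrowed from the figure labels, and this is not the Abricate output format.

```python
from collections import defaultdict


def presence_absence(hits, assemblies):
    """Build a presence-absence matrix from (assembly, marker) pairs.

    Returns the sorted marker list and {assembly: [0/1 per marker]}.
    Assemblies without any hit are kept as all-zero rows.
    """
    markers = sorted({marker for _, marker in hits})
    found = defaultdict(set)
    for assembly, marker in hits:
        found[assembly].add(marker)  # repeated hits count once
    matrix = {
        assembly: [1 if marker in found[assembly] else 0 for marker in markers]
        for assembly in assemblies
    }
    return markers, matrix


def marker_counts(matrix):
    """Number of distinct markers per assembly (row sums of the matrix)."""
    return {assembly: sum(row) for assembly, row in matrix.items()}


# Hypothetical hits, for illustration only; not the results shown in Figure 4.
example_hits = [
    ("assembly_1", "tet(O)"),
    ("assembly_1", "lnuA"),
    ("assembly_2", "tet(O)"),
    ("assembly_2", "tet(O)"),
    ("assembly_3", "ErmB"),
]
example_assemblies = ["assembly_1", "assembly_2", "assembly_3", "assembly_4"]

markers, matrix = presence_absence(example_hits, example_assemblies)
counts = marker_counts(matrix)
print(markers)
for assembly in example_assemblies:
    print(assembly, matrix[assembly], counts[assembly])
```

Passing the full assembly list explicitly keeps genomes with no detected marker in the matrix as rows of zeros instead of dropping them.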

[Figure 4 image: panel A, phylogenetic tree from the accessory genome alignment (tree scale: 0.1); panel B, number of markers per assembly (axis from 1 to 5); panel C, presence-absence matrix for the 11 assemblies and 8 markers: lnuA, tet(32), tet(M), tet(O), rpoB2, cfr(C)_2, (AGly)Aac6-Aph2 and ErmB.]

Isolate      Country
GD1          France
6K002        Chile
BR72         South Korea
DSM-100440   Finland
AM14-42      China
AF07-16      China
BIOML-A1     United States
AM38-2BH     China
AF14-9AC     China
AF37-1       China
AF37-2AT     China


Supplementary material legends

Supplementary Figure 1. Microscopic morphology of the Sellimonas intestinalis isolate. The results indicate a Gram-positive bacterium with coccoid morphology.

Supplementary Figure 2. Phylogenetic reconstruction of 2,902 Lachnospiraceae and Ruminococcaceae genomes based on the 16S rRNA alignment, which defined a well-supported node that included the studied assembly and 9 other genomes.

Supplementary Figure 3. Graphical map of the 11 genome assemblies identified as belonging to the same species as the studied genome, built in the CGView server [45].

Supplementary Table 1. Comparison of 16S rRNA sequences using BLAST, which revealed that the analyzed genome belongs to one of the following genera: Ruminococcus, Drancourtella or Sellimonas.

Supplementary Table 2. Information on the S. intestinalis genomes included in the comparative analyses.

Supplementary Table 3. Pangenome analysis of the S. intestinalis dataset, which showed a coding potential of 4,627 genes.
