exploring the remarkable diversity of escherichia coli · 1/19/2020  · 74 coli (mg1655, k-12),...

39
Exploring the remarkable diversity of Escherichia coli 1 Phages in the Danish Wastewater Environment, Including 2 91 Novel Phage Species 3 Nikoline S. Olsen 1 , Witold Kot 1,2 * Laura M. F. Junco 2 and Lars H. Hansen 1,2, * 4 1 Department of Environmental Science, Aarhus University, Frederiksborgvej 399, Roskilde, Denmark; 5 [email protected] 6 2 Department of Plant and Environmental Sciences, University of Copenhagen, Thorvaldsensvej 40, 1871 7 Frederiksberg C, Denmark; [email protected], [email protected] 8 * Correspondence: [email protected] Phone: +45 28 75 20 53, [email protected] Phone: +45 35 33 38 77 9 10 Funding: This research was funded by Villum Experiment Grant 17595, Aarhus University Research Foundation 11 AUFF Grant E-2015-FLS-7-28 to Witold Kot and Human Frontier Science Program RGP0024/2018. 12 Competing interests: The authors declare no competing interests. 13 . CC-BY-NC-ND 4.0 International license perpetuity. It is made available under a preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in The copyright holder for this this version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818 doi: bioRxiv preprint

Upload: others

Post on 07-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

Exploring the remarkable diversity of Escherichia coli 1

Phages in the Danish Wastewater Environment, Including 2

91 Novel Phage Species 3

Nikoline S. Olsen 1, Witold Kot 1,2* Laura M. F. Junco2 and Lars H. Hansen 1,2,* 4

1 Department of Environmental Science, Aarhus University, Frederiksborgvej 399, Roskilde, Denmark; 5

[email protected] 6

2 Department of Plant and Environmental Sciences, University of Copenhagen, Thorvaldsensvej 40, 1871 7

Frederiksberg C, Denmark; [email protected], [email protected] 8

* Correspondence: [email protected] Phone: +45 28 75 20 53, [email protected] Phone: +45 35 33 38 77 9

10 Funding: This research was funded by Villum Experiment Grant 17595, Aarhus University Research Foundation 11 AUFF Grant E-2015-FLS-7-28 to Witold Kot and Human Frontier Science Program RGP0024/2018. 12 Competing interests: The authors declare no competing interests. 13

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 2: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

2

Abstract: Phages drive bacterial diversity - profoundly influencing diverse microbial communities, from 14

microbiomes to the drivers of global biogeochemical cycling. The vast genomic diversity of phages is 15

gradually being uncovered as >8000 phage genomes have now been sequenced. Aiming to broaden our 16

understanding of Escherichia coli (MG1655, K-12) phages, we screened 188 Danish wastewater samples (0.5 17

ml) and identified 136 phages of which 104 are unique phage species and 91 represent novel species, 18

including several novel lineages. These phages are estimated to represent roughly a third of the true diversity 19

of Escherichia phages in Danish wastewater. The novel phages are remarkably diverse and represent four 20

different families Myoviridae, Siphoviridae, Podoviridae and Microviridae. They group into 14 distinct clusters 21

and nine singletons without any substantial similarity to other phages in the dataset. Their genomes vary 22

drastically in length from merely 5 342 bp to 170 817 kb, with an impressive span of GC contents ranging 23

from 35.3% to 60.0%. Hence, even for a model host bacterium, in the go-to source for phages, substantial 24

diversity remains to be uncovered. These results expand and underlines the range of Escherichia phage 25

diversity and demonstrate how far we are from fully disclosing phage diversity and ecology. 26

Keywords: bacteriophage; wastewater; Escherichia coli; diversity; genomics; taxonomy; coliphage 27

28

1. Introduction 29

Phages are important ecological contributors, they renew organic matter supplies in nutrient cycles and 30

drive bacterial diversity by enabling co-existence of competing bacteria by “Killing the winner” and by serving 31

as genomic reservoirs and transport units [1,2]. Phage genomes are known to entail auxiliary metabolism 32

genes (AMGs), toxins, virulence factors and even antibiotic resistance genes [3–7] and through lysogeny and 33

transduction they can transfer metabolic traits to their hosts and even confer immunity against homologous 34

phages [1]. Still, in spite of their ecological role, potential as antimicrobials and the fact that they carry a 35

multitude of unknown genes with great potential for biotechnological applications, phages are vastly 36

understudied. Less than 9000 phage genomes have now been published, and though the number increases 37

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 3: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

3

rapidly, we may have merely scratched the surface of the expected diversity [8]. It has been estimated that at 38

least a billion bacterial species exist [9], hence only phages targeting a tiny fraction of potential hosts have been 39

reported. Efforts to disclose the range and diversity of phages targeting a single host, have revealed a stunning 40

display of diversity. The most scrutinized phage host is the Mycobacterium smegmatis, for which the Science 41

Education Alliance Phage Hunters program has isolated more than 4700 phages and fully sequenced 680 42

phages which represent 30 distinct phage clusters [10,11]. This endeavour has provided a unique insight into 43

viral and host diversity, evolution and genetics [12–15]. No other phage host has been equally targeted, but 44

Escherichia coli phages have been isolated in fairly high numbers. The International Committee on Taxonomy 45

of Viruses (ICTV) has currently recognised 158 phage species originally isolated on E. coli [16], while 2732 46

Escherichia phage genomes have been deposited in GenBank [17]. As phages are expected to have an 47

evolutionary potential to migrate across microbial populations, host species may not be an ideal indicator of 48

relatedness, but it serves as an excellent starting point to explore phage diversity. Hierarchical classification 49

of phages is complicated by the high degree of horizontal gene transfer, consequently several classification 50

systems have been proposed [18–20], and we may not yet have reached a point where it is reasonable to 51

establish the criteria for a universal system [20]. Nonetheless, a system enabling a mutual understanding and 52

exchange of knowledge is needed. Accordingly, we have in this study chosen to classify our novel phages 53

according to the ICTV guidelines [21]. 54

Here we aim to expand our understanding of Escherichia phage diversity. Earlier studies are few, have 55

relied on more laborious and time-consuming methods, or are based on in silico analyses of already published 56

phage genomes [8]. Accordingly, Korf et al., (2019) isolated 50 phages on 29 individual E. coli strains, while 57

Jurczak-Kurek et al., (2016) isolated 60 phages on a single E. coli strain, both finding a broad diversity of 58

Caudovirales and also representatives of novel phage lineages [22,23]. We hypothesized, that a high-throughput 59

screening of nearly 200 samples in time-series, using a single host, would expand the number of known phages 60

by facilitating the interception of the prevailing phage(s) of the given day. 61

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 4: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

4

2. Materials and Methods 62

The screening for Escherichia phages was performed with the microplate based High-throughput screening 63

method as described in (not published). With the exception that lysates of wells giving rise to plaques were 64

sequenced without further purification. In short, an overnight enrichment was performed in microplates with 65

host culture, media and wastewater (0.5 ml per well), followed by filtration (0.45 µm), a purification step by 66

re-inoculation, a second overnight incubation and filtration (0.45µm), and then a spot-test (soft-agar overlay) 67

to indicate positive wells. All procedures were performed under sterile conditions. 68

2.1.1 Samples Bacteria and media 69

Inlet Wastewater samples (188) were collected (40-50 ml) in time-series (2-4 days spanning 1-3 weeks), 70

from 48 Danish wastewater treatment facilities (rural and urban) geographically distributed on Zealand, 71

Funen and in Jutland, during July and August 2017. The samples were centrifuged (9000 x g, 4 °C, 10 min) and 72

the supernatant filtered (0.45 µm) before storage in aliquots (-20°C) until screening. The host bacterium is E. 73

coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration 74

50 mM). Collected phages were stored in SM-buffer [24] at 4°C. 75

2.1.2 Sequencing and genomic characterisation 76

DNA extractions, clean-up (ZR-96 Clean and Concentrator kit, Zymo Research, Irvine, CA USA) and 77

sequencing libraries (Nextera® XT DNA kit, Illumina, San Diego, CA USA) were performed according to 78

manufacturer’s protocol with minor modifications as described in Kot et al., (2014) [25]. The libraries were 79

sequenced as paired-end reads on an Illumina NextSeq platform with the Mid Output Kit v2 (300 cycles). The 80

obtained reads were trimmed and assembled in CLC Genomics Workbench version 10.1.1. (CLC BIO, Aarhus, 81

Denmark). Overlapping reads were merged with the following settings: mismatch cost: 2, minimum score: 15, 82

gap cost: 3 and maximum unaligned end mismatches: 0, and then assembled de novo. Additional control 83

assemblies were constructed using SPAdes version 3.12.0 [26]. Phage genomes were defined as contigs with 84

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 5: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

5

an average coverage > x 20 and a sequence length ≥ 90% of closest relative. Gene prediction and annotation 85

were performed using a customized RASTtk version 2.0 [27] workflow with GeneMark [28], with manual 86

curation and verification using BLASTP [29], HHpred [30] and Pfam version 32.0 [31], or de novo annotated 87

using VIGA version 0.11.0 [32] based on DIAMOND searches (RefSeq Viral protein database) and HMMer 88

searches (pVOG HMM database). All genomes were assessed for antibiotic resistance genes, bacterial 89

virulence genes, type I, II, III and IV restriction modification (RM) genes and auxiliary metabolism genes 90

(AMGs) using ResFinder 3.1 [33,34], VirulenceFinder 2.0 [35], Restriction-ModificationFinder 1.1 (REBASE) 91

[36] and VIBRANT version 1.0.1 [37], respectively. The 104 unique phage genomes were aligned to viromes of 92

BioProject PRJNA545408 in CLC Genomics Workbench and deposited in Genbank [17]. 93

2.1.3 Genetic analyses 94

Nucleotide (NT) and amino acid (AA) similarities were calculated using tools recommended by the ICTV 95

[38], i.e. BLAST [29] for identification of closest relative (BLASTn when possible, discontinuous megaBLAST 96

(word size 16) for larger genomes) and Gegenees version 2.2.1 [39] for assessing phylogenetic distances of 97

multiple genomes, for both NTs (BLASTn algorithm) and AAs (tBLASTx algorithm) a fragment size of 200 bp 98

and step size 100 bp was applied. NT similarity was determined as percentage query cover multiplied by 99

percentage NT identity. Novel phages are categorised according to ICTV taxonomy. The criterion of 95% DNA 100

sequence similarity for demarcation of species was applied to identify novel species representatives and to 101

determine uniqueness within the dataset. Evolutionary analyses for phylogenomic trees were conducted in 102

MEGA7 version 2.1 (default settings) [40]. These were based on the large terminase subunit (Caudovirales), a 103

gene commonly applied for phylogenetic analysis [41,42] and on the DNA replication gene (gpA) 104

(Microviridae). The NT sequences were aligned by MUSCLE [43] and the evolutionary history inferred by the 105

Maximum Likelihood method based on the Tamura-Nei model [44]. The trees with the highest log likelihood are 106

shown. Pairwise whole genome comparisons were performed with Easyfig 2.2.2 [45] (BLASTn algorithm), these 107

were curated by adding color-codes and identifiers in Inkscape version 0.92.2. The R package iNEXT [46,47] in 108

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 6: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

6

R studio version 1.1.456 [48] was used for rarefaction, species diversity (q = 0, datatype: incidence_raw), 109

extrapolation hereof (estimadeD) and estimation of sample coverage. The visualisation of genome sizes and 110

GC contents was prepared in Excel version 16.31. 111

3. Results and Discussion 112

3.2. Phylogenetics, taxonomy and species richness 113

The sequenced phages were analysed strictly in silico, focusing on their relatedness to known phages, 114

their taxonomy and any distinctive characteristics. Based on the confirmed morphology of closely related 115

phages, at least four different families are represented, Myoviridae (58%), Siphoviridae (25%), Podoviridae (7.4%) 116

and even the single stranded DNA (ssDNA) Microviridae (5.9%) (Table 1). Five (3.7%) of the phages are so 117

distinct that they could not with certainty be assigned to any family. A similar distribution was found by Korf 118

et al., (2019), i.e. 70% Myoviridae, 22% Siphoviridae and 8% Podoviridae, just as an analysis of the genomes of 119

Caudovirales infecting Enterobacteriaceae, by Grose & Casjens (2014), also identified more clusters belonging to 120

the Myoviridae, than the Siphoviridae and fewest of the Podoviridae [8,22]. Jurczak-Kurek et al., (2016) found more 121

siphoviruses than myovirus, but also found the Podoviridae to be the least abundant [23]. Nonetheless, these 122

distributions are just as likely to be caused by culture or isolation methods, as they are to reflect true 123

abundances. Based on DNA homology and phylogenetic analysis, the 104 unique phages identified in this 124

study group into 14 distinct clusters and nine singletons, having a <1% Gegenees score between clusters and 125

singletons, with the exceptions of cluster IV, V and Jahat which have inter-Gegenees scores of up to 20%, cluster 126

IX, X and XI which have inter-Gegenees scores of 1-8% and cluster XIII, Sortsne and Sortkaff which have inter-127

Gegenees scores of 6-14% (Figure 1, Table 2). Excluding one phage in cluster I and two phages in cluster XIV, 128

the intra-Gegenees score of all clusters is above 39% (Figure 1). 129

130

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 7: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

131

Figure 1 Phylogenetic tree (Maximum log Likelihood: -1325.01, large terminase subunit (Caudovirales) or DNA replication protein gpA (Microviridae)), scalebar: substitutions 132 per site, morphology is indicated by symbols (Myoviridae: ☐, Siphoviridae: ▽, Podoviridae: ○, Microviridae: ◇) and phylogenomic nucleotide distances (Gegenees, BLASTn: 133 fragment size: 200, step size: 100, threshold: 0%) 134

I

II

III

IV

V

VI

VII

VIIIIX

X

XI

XII

XIII

XIV

Organism 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104

1: MG_194_C2_Esch_VpaE1 100 82 79 78 82 82 81 80 82 80 83 80 77 74 78 79 76 79 83 21 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

2: MG_196_C1_Esch_AYO145A 76 100 75 77 83 80 77 78 78 78 78 76 78 73 75 78 77 76 80 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

3: MG_191_C1_Esch_AYO145A 75 76 100 77 78 77 84 75 80 78 77 83 76 72 77 75 79 77 77 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

4: MG_189_C2_Esch_Alf5 73 77 76 100 78 76 76 75 77 77 77 77 78 71 75 76 76 75 77 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

5: MG_160_C2_Esch_AYO145A 78 83 77 79 100 83 80 84 79 82 81 81 80 73 76 83 79 78 83 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

6: MG_106_C1_Esch_VpaE1 80 84 79 79 85 100 81 83 83 82 82 81 81 76 79 82 81 82 84 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

7: MG_079_C1_Esch_VpaE1 76 78 84 76 80 78 100 80 80 79 80 81 78 75 78 79 79 78 78 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

8: MG_053_C1_Esch_Alf5 77 80 76 77 85 81 81 100 78 79 83 82 77 73 76 84 77 76 82 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

9: MG_198_C3_Esch_VpaE1 77 79 80 78 79 80 80 78 100 79 81 80 78 73 79 80 80 80 81 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

10: MG_049_C2_Esch_AYO145A 76 80 78 79 83 80 80 78 80 100 78 79 77 74 77 79 79 76 80 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

11: MG_087_C3_Esch_Alf5 79 79 78 78 82 80 81 82 82 78 100 83 79 73 78 84 79 79 83 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

12: MG_156_C2_Esch_O157 76 78 84 79 82 78 82 81 81 79 82 100 78 75 78 82 82 79 81 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

13: MG_172_C1_Esch_Alf5 74 80 77 79 81 80 79 77 78 77 78 79 100 75 81 78 79 79 79 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

14: MG_047_C1_Sal_VSe11 74 78 76 75 78 77 80 76 78 77 77 79 79 100 80 76 80 76 77 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

15: MG_131_C5_Esch_AYO145A 74 77 78 76 76 77 78 75 80 77 78 78 80 76 100 76 80 76 78 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

16: MG_137_C3_Esch_Alf5 77 81 77 79 85 81 81 86 82 80 85 84 79 74 78 100 84 78 84 21 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

17: MG_081_C1_Sal_Mushroom 74 79 80 79 81 81 81 79 82 80 80 83 80 77 81 83 100 80 81 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

18: MG_084_C1_Esch_VpaE1 76 78 78 77 79 81 80 76 81 76 79 80 80 73 77 77 80 100 79 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

19: MG_008_C1_Sal_Mushroom 78 81 76 77 83 81 78 81 81 79 82 81 78 73 78 81 79 78 100 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

20: MG_017_C2_Esch_SUSP2 21 20 20 19 20 20 20 19 19 21 20 21 19 19 20 20 20 20 21 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

21: MG_179_C1_Esch_SECphi17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 92 88 92 86 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

22: MG_187_C21_Esch_SECphi17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 92 100 91 93 88 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

23: MG_044_C1_Esch_SECphi17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 88 90 100 90 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

24: MG_107_C2_Esch_SECphi17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 92 93 91 100 87 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

25: MG_138_C36_Esch_SECphi17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 88 89 92 89 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

26: MG_185_C4_Esch_Murica 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 90 84 84 88 83 86 90 91 87 90 89 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

27: MG_197_C2_Esch_slur12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 92 100 84 85 87 85 86 88 90 86 89 88 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

28: MG_132_C2_Esch_O157 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 84 83 100 83 86 85 87 85 85 88 85 84 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

29: MG_086_C4_Esch_0157 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 83 83 82 100 82 80 81 79 80 84 80 80 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

30: MG_097_C2_Esch_FFH2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 89 85 85 83 100 86 87 86 87 86 88 87 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

31: MG_120_C1_esch_2JES-2013 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 84 85 86 82 87 100 87 85 86 89 85 86 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

32: MG_130_C1_Ent_FV3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 86 85 87 82 86 86 100 89 88 88 87 87 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

33: MG_150_C5_Esch_APCEc02 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 91 87 85 80 87 84 89 100 91 86 91 89 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

34: MG_160_C1_Esch_APCEc02 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 91 89 85 81 87 85 88 91 100 88 92 91 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

35: MG_183_C17_Esch_APCEc02 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 87 85 87 84 86 87 87 86 87 100 87 86 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

36: MG_003_R1_C1_Esch_FFH2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 91 88 85 82 89 85 88 91 92 88 100 93 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

37: MG_039_C5_Esch_FFH2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 88 84 81 88 85 88 90 92 87 93 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

38: MG_184_C35_Esch_ST32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

39: MG_099_C3_Esch_SLUR25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

40: MG_001_C1_Sal_wksl3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

41: MG_106_C6_Shi_pSf-1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 71 80 79 71 77 75 76 9 8 10 18 8 10 10 7 9 8 9 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

42: MG_182_C1_Shi_pSf-1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 74 100 76 76 74 80 79 78 10 9 10 18 9 10 10 11 11 11 10 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

43: MG_146_C2_Shi_pSf-1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 77 68 100 82 70 73 72 74 8 7 8 16 7 9 8 7 9 8 8 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

44: MG_010_C12_Shi_pSf-1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 76 69 83 100 73 74 76 76 6 7 7 15 7 8 9 8 9 9 9 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

45: MG_187_C4_Shi_pSf-1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 73 70 74 77 100 73 75 76 11 10 10 18 9 11 11 10 11 9 10 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

46: MG_086_C1_Sal_pSf-1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 75 74 74 74 70 100 78 84 7 8 9 17 8 9 9 7 10 9 9 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

47: MG_142_C100_Shi_pSf-1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 74 73 74 77 72 78 100 80 9 8 9 17 8 9 9 8 10 8 9 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

48: MG_193_C1_Shi_pSf-1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 73 70 74 75 72 83 79 100 9 10 10 17 9 10 10 7 10 9 12 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

49: MG_145_C1_Esch_Swan01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 9 8 6 10 7 9 9 100 20 20 19 17 19 20 14 16 14 14 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

50: MG_081_C2_Esch_SECphi17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 9 7 7 10 8 8 9 19 100 85 79 84 87 88 76 76 77 78 87 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

51: MG_164_C9_Esch_SECphi27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 9 8 7 10 9 9 10 20 85 100 88 89 89 90 74 76 75 76 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

52: MG_187_C12_Esch_Swan01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 17 17 16 15 17 17 17 18 18 78 87 100 80 84 82 67 69 69 70 82 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

53: MG_126_C1_Esch_SECphi27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 9 7 7 9 8 8 9 17 86 90 82 100 91 93 76 77 76 78 92 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

54: MG_049_C5_Esch_Swan01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 9 9 8 10 9 9 10 19 87 89 84 89 100 91 73 76 76 77 91 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

55: MG_190_C1_Esch_SECphi27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 10 8 9 11 9 9 11 20 89 90 83 91 91 100 74 78 78 78 91 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

56: MG_022_C3_Esch_phAPEEC8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 10 7 8 10 7 8 7 14 77 74 68 75 73 74 100 84 79 83 74 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

57: MG_086_C6_Esch_SECphi27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 10 9 9 11 10 9 11 15 76 76 70 75 76 77 84 100 80 83 77 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

58: MG_135_C1_Esch_SECphi27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 10 8 9 9 9 9 9 14 76 74 69 73 75 76 78 79 100 81 76 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

59: MG_197_C3_Esch_SECphi27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 10 8 9 10 9 9 12 15 78 76 70 76 76 78 82 82 81 100 78 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

60: MG_192_C21_Esch_Swan 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 8 7 7 9 8 8 9 20 88 90 83 91 92 91 74 77 77 79 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

61: MG_022_C1_Esch_SECphi27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 84 82 85 78 4 5 4 5 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

62: MG_091_C1_Esch_phAPEC8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 84 100 86 77 81 5 5 4 5 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

63: MG_136_C1_Esch_phAPEC8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 81 85 100 80 82 5 4 5 5 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

64: MG_187_C1_Esch_phAPEC8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 89 80 84 100 80 3 3 3 3 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

65: MG_145_C3_Esch_ESCO13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 78 80 82 77 100 4 3 4 3 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

66: MG_170_C16_Esch_PHB05 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 5 5 3 4 100 92 91 91 91 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

67: MG_188_C1_Ent_ECGD1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 5 4 3 3 91 100 90 91 92 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

68: MG_146_C3_Esch_PHB05 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 4 5 3 4 90 91 100 88 89 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

69: MG_048_C2_Ent_ECGD1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 5 5 3 4 90 91 88 100 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

70: MG_089_C1_Esch_PHB05 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 5 5 3 4 91 93 90 91 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

71: MG_199_C3_Esch_ECD7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 89 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

72: MG_148_C1_Ent_RB49 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 89 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

73: MG_002_C1_Ent_JS10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 93 2 2 1 2 2 2 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

74: MG_193_C3_Ent_JS10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 100 2 3 2 3 2 2 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

75: MG_002_C2_Esch_O157 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 100 88 88 89 8 8 8 8 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

76: MG_109_C1_Esch_APCEc01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 85 100 87 87 8 8 8 8 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

77: MG_107_C1_Esch_PhAPEC2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 85 87 100 87 8 7 8 8 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

78: MG_135_C2_Ent_RB69 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 3 86 87 86 100 8 8 8 8 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

79: MG_110_C1_Esch_UFV13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 8 9 8 8 100 87 84 86 86 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

80: MG_116_C2_Esch_slur13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 7 8 7 8 87 100 83 86 88 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

81: MG_036_C1_Yer_phiD1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 8 8 8 8 86 85 100 86 86 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

82: MG_044_C3_Esch_UFV13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 8 8 8 8 86 86 85 100 86 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

83: MG_178_C1_Yer_phiD1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 7 8 7 8 86 89 85 86 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

84: MG_038_C11_Ent_IME_EC2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

85: MG_099_C1_Esch_Gluttony 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 72 49 66 64 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

86: MG_180_C4_Esch_Pride 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 71 100 48 66 67 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

87: MG_124_C1_Esch_Sloth 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 48 48 100 46 48 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

88: MG_043_C4_Esch_SECphi18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 65 66 47 100 78 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

89: MG_046_C1_Esch_slur05 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 64 67 49 79 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

90: MG_144_C1_Halfdan 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0

91: MG_038_Ent_St-1C6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 39 0 13 6 0 0 0 0 0 0 0 0 0

92: MG_180_C3_Ent_IME_EC2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 42 100 0 12 7 0 0 0 0 0 0 0 0 0

93: MG_053_C26_Skarpretter 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 0 0 0 0 0 0 0 0 0 0 0

94: MG_165_C4_Kleb_IME279 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13 11 0 100 14 0 0 0 0 0 0 0 0 0

95: MG_162_C4_Kleb_IME279 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 6 0 13 100 0 0 0 0 0 0 0 0 0

96: MG_126_C2_Ent_J8-65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 22 13 13 15 15 14 15 17

97: MG_039_C2_Ent_J8-65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 22 100 27 28 25 25 23 23 25

98: MG_130_C3_Ent_J8-65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 26 100 90 47 48 65 67 47

99: MG_144_C2_Ent_J8-65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 28 93 100 48 48 69 68 47

100: MG_200_C27_Ent_J8-65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 16 24 46 46 100 85 41 42 88

101: MG_116_C10_Ent_J8-65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 16 24 48 47 86 100 43 43 84

102: MG_182_C43_Ent_J8+65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 23 66 67 43 43 100 93 44

103: MG_140_C1_Ent_J8-65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 15 23 67 66 43 44 93 100 44

104: MG_168_C20_Ent_J8-65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 17 24 46 46 88 83 43 44 100

Heat plot for: Comparison[ALL_090419], Alignment[analysis], Analysis[*default*] file:///Users/au574427/Desktop/All_coliphages/All_110419.html

1 of 1 11/04/2019, 11.49

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 8: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

2

Table 1 List of 104 unique Escherichia phages identified in 94 Danish wastewater samples. Novelty (*) is based on the 95% demarcation of species. Similarity is sequence 135 identity (%) times sequence coverage (%) to closest related (Blastn). Taxonomy is based on similarity (BLASTn) to closest related and the ICTV Master Species list. 136

Escherichia

Phage

Genome

(bp)

ORFs

(n)

tRNAs

(n)

GC

(%)

Novel

Sample

Location

Cluster

Taxonomy

Similarity

(%)

Accession

number

Tootiki 88257 128 22 39 * 3D Avedøre I Felixounavirus 90,2 MN850647

Mio 83431 121 18 39,1 * 13C Hadsten I Felixounavirus 89,7 MN850631

Allfine 86963 125 20 39 * 14A Hammel I Felixounavirus 91,2 MN850633

Bumzen 87360 126 20 39,1 * 14B, 14D, 15A Hammel, Hinnerup I Felixounavirus 92,5 MN850635

Dune 88511 129 20 39 * 19C Tørring I Felixounavirus 91,5 MN850636

Warpig 86106 127 17 39 * 20A Helsingør I Felixounavirus 93 MN850637

Radambza 86702 127 19 38,9 * 20D Helsingør I Felixounavirus 91,6 MN850639

Ekra 87282 128 20 38,9 * 21C Herning I Felixounavirus 92,9 MN850644

Humlepung 85311 119 19 39,1 * 12B. 25B, 37A Drøsbro, Gram, Bogense I Felixounavirus 92,1 MN850564

Finno 87554 129 20 38,9 * 31C Skovby I Felixounavirus 89,7 MN850619

Garuso 85798 130 20 38,9 * 29A, 33A Nustrup, Sommersted I Felixounavirus 90,9 MN850566

Momo 88168 130 20 39 * 38D Hofmansgave I Felixounavirus 90,7 MN850580

Heid 87590 126 20 39 * 8C, 24B, 24C, 34D, 37D, 38B Esbjerg Ø, Bevtoft, Vojens, Bogense,

Hofmansgave I Felixounavirus 91,2 MN850577

Skuden 87263 131 20 38,9 * 41D Odense NV I Felixounavirus 91,1 MN850585

Pinkbiff 88814 129 20 39 * 46A Marselisborg I Felixounavirus 93,9 MN850603

Fjerdesal 87715 128 21 39 * 3A, 39A, 46C Avedøre, Hårslev, Marselisborg I Felixounavirus 90,6 MN850605

Andreotti 83391 117 20 39,2 * 47B Egå I Felixounavirus 91,9 MN850610

Nataliec 89137 134 20 39 * 47D, 48C Egå, Viby I Felixounavirus 90,3 MN850611

Adrianh 88226 128 19 38,9 * 42C, 48B Otterup, Viby I Felixounavirus 91,1 MN850614

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 9: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

3

Escherichia

Phage

Genome

(bp)

ORFs

(n)

tRNAs

(n)

GC

(%)

Novel

Sample

Location

Cluster

Taxonomy

Similarity

(%)

Accession

number

Mistaenkt 86664 128 22 47,2 * 6A Kolding I Suspvirus 91,1 MN850587

Nimi 137039 213 5 43,7 * 11C Varde III Vequintavirus 93,3 MN850626

Navn 141707 224 4 43,6 * 21B Herning III Vequintavirus 91,1 MN850642

Nomine 137991 220 5 43,6 * 23A Lemvig III Vequintavirus 91,5 MN850649

Naswa 138583 222 5 43,6 * 45A Ålborg Ø III Vequintavirus 93,1 MN850595

naam 137129 215 5 43,7

31D Skovby III Vequintavirus 94,5 MN850630

Ime 137114 217 5 43,6 * 12C, 36B Drøsbo III Vequintavirus 93,1 MN850576

Magaca 135826 217 5 43,6

48A Viby III Vequintavirus 96 MN850612

Nom 136114 213 5 43,6 * 28D Jegerup III Vequintavirus 92,6 MN850646

Isim 138289 219 5 43,6 * 31B Skovby III Vequintavirus 93,8 MN850597

Nomo 137702 218 5 43,7 * 38D Hofmansgave III Vequintavirus 93,3 MN850578

Inoa 138710 220 5 43,6 * 44C Ålborg V III Vequintavirus 92 MN850593

Pangalan 136944 215 5 43,7

2A, 45D, 48D Grindsted, Egå, Viby III Vequintavirus 94,8 MN850627

Tuntematon 150473 279 11 39,1 * 7B, 7C Esbjerg V VI Myoviridae 89,6 MN850618

Anhysbys 149335 271 11 39,1 * 22C Hillerød VI Myoviridae 91,5 MN850648

Ukendt 150947 266 11 39 * 32D Skrydstrup VI Myoviridae 88,7 MN850565

Nepoznato 151514 265 10 38,9 * 19D, 35A, 48A, 48D Tørring, Årøsund, Viby VI Myoviridae 85,6 MN850571

Nieznany 144998 254 11 39,1 * 45C Ålborg Ø VI Myoviridae 88,9 MN850598

Muut 146307 243 13 37,4 * 35B Årøsund VII Myoviridae 92 MN850573

Alia 147009 246 13 37,5 * 13D Hadsten VII Myoviridae 93,1 MN850632

Outra 145482 246 13 37,4 * 22A Hillerød VII Myoviridae 93,8 MN850645

Inny 147483 247 13 37,4 * 45D Ålborg Ø VII Myoviridae 92,4 MN850601

Arall 145715 242 13 37,4

41B Odense NØ VII Myoviridae 94,6 MN850584

Kvi 163673 266 - 40,5 * 48C Viby VIII Krischvirus 94,2 MN850615

Kaaroe 163719 267 - 40,5

35D Årøsund VIII Krischvirus 94,7 MN850574

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 10: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

4

Escherichia

Phage

Genome

(bp)

ORFs

(n)

tRNAs

(n)

GC

(%)

Novel

Sample

Location

Cluster

Taxonomy

Similarity

(%)

Accession

number

Dhabil 165644 266 3 39,5 * 1B Billund IX Dhakavirus 87,5 MN850621

Dhaeg 170817 278 3 39,4 * 47A, 47C Egå IX Dhakavirus 87,4 MN850609

Mogra 168724 263 2 37,7 * 25C Gram X Mosigvirus 91,1 MN850579

Mobillu 163063 255 2 37,7

1B Billund X Mosigvirus 94,5 MN850622

Moha 168676 267 2 37,6

26A Haderslev X Mosigvirus 94,8 MN850590

Moskry 169410 269 2 37,6 * 32C Skrydstrup X Mosigvirus 93,2 MN850651

Teqskov 165017 257 6 35,4 * 10D Skovlund XI Tequatrovirus 91,7 MN895437

Teqdroes 166833 269 10 35,4 * 12D Drøsbro XI Tequatrovirus 88,6 MN895438

Teqhad 167892 270 10 35,3 * 26B Haderslev XI Tequatrovirus 90,1 MN895434

Teqhal 168070 266 11 35,4 * 27A, 27D Halk XI Tequatrovirus 93,9 MN895435

Teqsoen 166468 268 10 35,5 * 43B Søndersø XI Tequatrovirus 91,7 MN895436

Flopper 52092 78 1 44,2 * 44D Ålborg V - Myoviridae 87 MN850594

Damhaus 51154 89 - 44,1 * 4B Damhusåen IV Hanrivervirus 85,8 MN850602

Herni 50971 89 - 44,1 * 21B, 30D Herning, Over Jerstal IV Hanrivervirus 87,6 MN850640

Grams 49530 83 - 44,1 * 25B Gram IV Hanrivervirus 87,1 MN850567

Aaroes 51662 92 - 44,1 * 35B Årøsund IV Hanrivervirus 83 MN850572

Aalborv 46660 79 - 43,9 * 44B Ålborg V IV Hanrivervirus 86,9 MN850591

Haarsle 48613 85 - 44 * 39B, 45C Hårslev, Ålborg Ø IV Hanrivervirus 87,1 MN850600

Egaa 51643 87 - 44,1 * 47A Egå IV Hanrivervirus 89,7 MN850607

Vojen 50709 86 - 44,1 * 34B Vojens IV Hanrivervirus 89,7 MN850569

Tiwna 51014 85 - 44,6 * 21B Herning V Tunavirinae 87,2 MN850643

Tonijn 51627 86 - 44,6 * 32B, 32C Skrydstrup V Tunavirinae 88,4 MN850641

Tonnikala 51277 86 - 44,8 * 48A Viby V Tunavirinae 86,4 MN850613

Atuna 50732 88 - 44,6 * 7B Esbjerg V V Tunavirinae 84,9 MN850620

Tunus 51111 87 - 44,8 * 20A Helsingør V Tunavirinae 93,7 MN850638

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 11: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

5

Escherichia

Phage

Genome

(bp)

ORFs

(n)

tRNAs

(n)

GC

(%)

Novel

Sample

Location

Cluster

Taxonomy

Similarity

(%)

Accession

number

Orkinos 49798 81 - 44,6 * 30B, 30D Over Jerstal V Tunavirinae 91,3 MN850586

Ityhuna 50768 86 - 44,7 * 39D Hårslev V Tunavirinae 93,3 MN850582

Tonn 51012 87 - 44,5 * 8C, 45C Esbjerg Ø, Ålborg Ø V Tunavirinae 94 MN850596

Tinuso 50856 86 - 44,8

14A Hammel V Tunavirinae 97,3 MN850634

Tunzivis 50596 84 - 44,6

46B Marselisborg V Tunavirinae 94,5 MN850604

Tuinn 50505 86 - 44,7

46D Marselisborg V Tunavirinae 94,8 MN850606

Jahat 51101 87 - 45,7 * 35A Vojens - Tunavirinae 68,5 MK552105

Bob 45252 63 - 54,5 * 12C Drøsbro XII Dhillonvirus 88,6 MN850628

Mckay 44443 63 - 54,5 * 13B Hadsten XII Dhillonvirus 83,8 MN850629

Jat 44417 63 - 54,5 * 23C Lemvig XII Dhillonvirus 89,4 MN850650

Rolling 46017 64 - 54,2 * 29D Nustrup XII Dhillonvirus 80,2 MN850575

Welsh 45207 62 - 54,6 * 23A, 43D Lemvig, Søndersø XII Dhillonvirus 83,8 MN850589

Buks 40308 62 - 49,7 * 1A Billund - Jerseyvirus 91,3 MN850616

Skure 59474 92 - 44,6 * 23C Lemvig - Seuratvirus 90,4 MK672798

Halfdan 42858 57 - 53,7 * 34D Vojens - Siphoviridae 28,8 MH362766

Lilleen 5342 6 - 46,9 * 33A Sommersted II Gequatrovirus 93,8 MK629526

Lilleput 5490 6 - 47 * 12D Drøsbro II Gequatrovirus 93,4 MK629525

Lilleto 5492 6 - 46,8 * 43C, 43D, 44B Søndersø, Ålborg V II Gequatrovirus 92,7 MK629529

Lilledu 5483 6 - 47,2 * 25C Gram II Gequatrovirus 92,6 MK791318

Lillemer 5492 6 - 47,1

45C Ålborg Ø II Gequatrovirus 94,5

Lilleven 6090 9 - 44,4 * 11B Varde - Alphatrevirus 93,9 MK629527

Sortsyn 42116 61 - 59 * 11B Varde XIII Unclassified 92,3 MN850623

Sortregn 38200 53 - 59,3

43D Søndrsø XIII Unclassified 97,3 MN850588

Skarpretter 42042 63 - 55,8 * 15A Hinnerup - Unclassified 37,9 MK105855

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 12: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

6

Escherichia

Phage

Genome

(bp)

ORFs

(n)

tRNAs

(n)

GC

(%)

Novel

Sample

Location

Cluster

Taxonomy

Similarity

(%)

Accession

number

Sortkaff 42538 61 - 59,5 * 39B Hårslev - Unclassified 89,8 MN850581

Sortsne 41912 62 - 60 * 40A Odense NV - Unclassified 67,6 MK651787

Aldrigsur 42379 55 - 55,7 * 44B Ålborg V XIV Autographvirinae 71,9 MN850592

Altidsur 42197 53 - 55,7 * 33D Sommersted XIV Autographvirinae 71,8 MN850568

Forsur 42476 56 - 55,4 * 48D Viby XIV Autographvirinae 72 MN850617

Glasur 42507 56 - 55,4 * 40D Odense NV XIV Autographvirinae 72,3 MN850583

Lidtsur 42291 56 - 54,6 * 30B Over Jerstal XIV Autographvirinae 69 MK629528

Megetsur 42132 54 - 55,8 * 31B Skovby XIV Autographvirinae 73,1 MN850608

Mellemsur 40770 50 - 55,8 * 34D Vojens XIV Autographvirinae 76,4 MN850570

Smaasur 41110 50 - 55,4 * 11C Varde XIV Autographvirinae 93,3 MN850625

Usur 41906 51 - 55,4 * 27A, 27D Halk XIV Autographvirinae 73,3 MN850624

137 138

The high-throughput screening method favours easily culturable plaque-forming lytic phages. Still, we identified 104 unique Escherichia phages of 139

which only 16% were ≥95% similar (BLAST) to already published phages (Table 2). Phages were identified in wastewater samples from 43 of the 48 140

investigated treatment facilities. From the majority of positive samples (58) a single phage was sequenced, however in some samples the lysate held 141

more than one phage. Twenty-five of the lysates held two phages, eight lysates held three phages and one had as many as four phages. Of the 104 142

unique phages, 91 represent novel species (Table 1, 2). Of these, 51 differed by ≥10% from published phage genomes and some have NT similarities as 143

low as 29% (Table 2).144

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 13: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

These newly sequenced phages represent a substantial quota of divergent lytic Escherichia phages in Danish 145

wastewater, but are still far from disclosing the true diversity hereof (Figure 2). An extrapolation of species 146

richness (q = 0) predicts a total of 292 distinct species (requiring a sample size of ∼900 phages). The relatively 147

small sample-size in this study (n = 136), may subject the estimation to a large prediction bias. The sampling-148

method also introduces a bias by selecting for abundance and burst size, thereby potentially underestimating 149

diversity. Nonetheless, the results provide an indication of the minimal diversity of lytic Escherichia (MG1655, 150

K-12) phages in Danish wastewater, estimated to be as a minimum be in the range of 160 to 420 unique phage 151

species (Figure 2b). The novelty and diversity of these wastewater phages is truly remarkable and verifies our 152

hypothesis, as well as the efficiency of the High-throughput Screening Method for exploring diverse phages of a 153

single host (not published). 154

(a) (b)

Figure 2 (a) Sample-size-based rarefaction and extrapolation curve with confidence intervals (0.95). (b) Sample 155 completeness curve with confidence intervals (0.95). 156

157

0

100

200

300

400

0 250 500 750Number of sampling units

Spec

ies

dive

rsity

interpolated extrapolated

0.00

0.25

0.50

0.75

1.00

0 250 500 750Number of sampling units

Sam

ple

cove

rage

interpolated extrapolated

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 14: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

2

Table 2 Taxonomic distribution of phages identified in 94 Danish wastewater samples, 158 based on similarity to closest related and the ICTV Master Species list. 1. ≤95% similarity 159 to other phages in the dataset. 2. ≤95% similarity to other phages in the dataset and to 160 published phage genomes. 161

Taxonomy* total unique1 novel2

Caudovirales; Myoviridae; Ounavirinae; Felixounavirus 33 19 19

Caudovirales; Myoviridae; Ounavirinae; Suspvirus 1 1 1

Caudovirales; Myoviridae; Tevenvirinae; Krischvirus 2 2 1

Caudovirales; Myoviridae; Tevenvirinae; Dhakavirus 3 2 2

Caudovirales; Myoviridae; Tevenvirinae; Mosigvirus 4 4 2

Caudovirales; Myoviridae; Tevenvirinae; Tequatrovirus 6 5 5

Caudovirales; Myoviridae; Vequintavirinae; Vequintavirus 15 12 9

Caudovirales; Myoviridae 15 11 10

Caudovirales; Podoviridae; Autographivirinae 10 9 9

Caudovirales; Siphoviridae; Dhillonvirus 6 5 5

Caudovirales; Siphoviridae; Guernseyvirinae; Jerseyvirus 1 1 1

Caudovirales; Siphoviridae; Seuratvirus 1 1 1

Caudovirales; Siphoviridae; Tunavirinae; Hanrivervirus 10 8 8

Caudovirales; Siphoviridae; Tunavirinae 15 12 8

Caudovirales; Siphoviridae 1 1 1

Microviridae; Bullavirinae; Alphatrevirus 1 1 1

Microviridae; Bullavirinae; Gequatrovirus 7 5 4

Unclassified bacterial viruses 5 5 4

Total 136 104 91

162

3.2. Phage genome characteristics 163

The sequenced phage genomes range in sizes from 5 342 bp of the Microviridae lilleen to a span in the 164

Caudovirales from 38 200 bp of the unclassified sortregn to 170 817 bp of the Dhakavirus dhaeg (Figure 3, Table 165

2). GC contents also vary greatly, from only 35.3% (Tequatrovirus teqhad) and up to 60.0% (the unclassified 166

sortsne) (Figure 3,Table 2), heavily diverging from the host GC content of 50.79%. Phages often have a lower 167

GC contents than their host [49], as observed for the majority (81%) of the wastewater phages (Table 2). A 168

lower GC content of phage genomes tends to correlate with an increase in genome size [50], as is also the case 169

for the Caudovirales in this study (Table 2). Rocha & Danchin (2002) hypothesized that phages, along with 170

plasmids and insertion sequences, can be considered intracellular pathogens, and like host dependent bacterial 171

pathogens and symbionts, they experience competition for the energetically expensive and less abundant GTP 172

and CTP and as a consequence develop genomes with comparatively higher AT contents [49]. However, the 173

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 15: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

3

differences in AT richness reported by Rocha & Danchin (2002) for dsDNA and isometric ssDNA phages were 174

merely 4.2% and 5% [49], while the dsDNA and isometric ssDNA phages in this study deviate from their 175

isolation host by having up to 31% higher and up to 19% lower AT content (Table 2). Accordingly, the 176

considerable differences in GC/AT contents may instead primarily be a reflection of past host relations [14]. 177

The sequencing of lysates and not only plaquing phages, may have contributed in enabling the capture of such 178

a broad GC-content and overall diversity. 179

180 Figure 3 Bubble-diagram of the 104 unique Escherichia phages displaying genome size- and GC-content distribution. Area 181 of bubbles indicates number of phages. The yellow bubble is Podoviridae phages, green ones are Siphoviridae phage clusters, 182 dark-blue ones are Myoviridae phage clusters with tRNAs, the light-blue bubble is Myovridae phages without tRNAs, the 183 red bubble is Microviridae phages and the grey one is unclassified phages. 184

185

The genome screening algorithms identified no sequences coding for homologs of known virulence or 186

antibiotic resistance genes. Though not a definitive exclusion, this interprets as a reduced risk of presence, a 187

preferable trait for phage therapy application. Currently available tools for AMG screening of viromes did not 188

provide a comprehensive and exclusive assessment of the AMG pool in the dataset. The majority of genes 189

identified are not AMGs, but code for phage DNA modification pathways (Table S1). The function of some of 190

30

35

40

45

50

55

60

65

-5 5 15 25 35 45 55 65 75 85 95 105 115 125 135 145 155 165 175 185

GC

con

tent

(%)

Genome size (kb)

Podoviridae SiphoviridaeMyoviridae w. tRNAs Myoviridae no tRNAsMicroviridae Unclassifed

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 16: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

4

the suggested AMGs is unknown, these include a nicotinamide phosphoribosyltransferase (NAMPT) present 191

in cluster I (Felixounavirus and Suspvirus) and VII (unclassified Myoviridae), a 3-deoxy-7-phosphoheptulonate 192

(DAHP) synthase present in cluster X (Mosigvirus) and a complete dTDP-rhamnose biosynthesis pathway 193

present in cluster VI and VII (unclassified Myoviridae). The presence of a dTDP-rhamnose biosynthesis pathway 194

in the DNA metabolism region of phage genomes is peculiar, one possible explanation is, that these phages 195

utilize rhamnose for glycosylation of hydroxy-methylated nucleotides in the same manner as the T4 generated 196

glucosyl-hmC [51]. Dihydrofolate reductase, identified in six of the Myoviridae clusters, is involved in thymine 197

nucleotide biosynthesis, but is also a structural component in the tail baseplate of T-even phages [52]. Multiple 198

verified DNA modification genes were identified. Methyltransferases, some putative, were detected in all 199

Myoviridae of cluster III (vequintavirus), VI (unclassified), VII (unclassified), VIII (Krischvirus), in all Siphoviridae 200

of cluster IV (Hanrivervirus) and in the unclassified phages of cluster XIII. Indeed, cluster XIII and the singletons 201

skarpretter, sortsne, and sortkaff code for both DNA N-6-adenine-methyltransferases (dam) and DNA cytosine 202

methyltransferases (dcm). Finally, two novel epigenetic DNA hypermodification pathways were identified. 203

The mosigviruses of cluster X code for arabinosylation of hmC [51], while the novel seuratvirus Skure codes 204

for the recently verified complex 7-deazaguanine DNA hypermodification system [53,54]. 205

3.3. Forty-eight novel Myoviridae phages species 206

The Myoviridae genomes (56 unique, 49 novel) represents the greatest span in genome sizes in this study, 207

from the unclassified flopper of 52092 bp to the dhakavirus dhaeg of 170817 bp), all, except the krischviruses, 208

code for tRNAs (Figure 3, Table 2). The Myoviridae group into nine distinct clusters and four singletons, 209

representing at least three subfamilies; Tevenvirinae, Vequintavirinae and Ounavirinae in addition to 11 210

unclassified Myoviridae (cluster VI & VII) (Figure 1, Table 2). The eleven novel Tevenvirinae distributes into four 211

distinct genera, two of the Krischvirus (cluster VIII, 164kb, 40.5% GC, no tRNAs), two of the Dhakavirus (cluster 212

IX, 166-170 kb, 39.4-39.5% GC, 3 tRNAs), two of the Mosigvirus (cluster X, 169 kb, 37.6-37.7% GC, 2 tRNAs) and 213

five of the Tequatrovirus (cluster XI, 165-168 kb, 35.3-35.5% GC, 6-11 tRNAs), while the nine novel 214

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 17: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

5

Vequintavirinae are all vequintaviruses (cluster III, 136-142 kb, 43.6-43.7% GC, 5 tRNAs) closely related (91.1-215

93.8%, BLAST) to classified species. The vequintaviruses were identified in samples from 12 treatment 216

facilities. Only reads from two samples of a dataset of human gut viromes based on timeseries of faecal 217

samples from 10 healthy persons [55] mapped to the wastewater phages, and only to vequintaviruses. These 218

reads covered 43-54% and 13-26% of all the Vequintavirus genomes except pangalan and navn, respectively. 219

This suggests that the vequintaviruses are related to entero-phages. All but one of the Ounavirinae (cluster I, 220

82-89 kb, 38.9-39.2% GC, 17-22 tRNAs) are felixounaviruses (89.7-93.9%, BLAST) with an intra-Gegenees score 221

of 71-86% (Figure 1). The Felixounavirus is a relatively large genus with 17 recognized species. In this study, 222

felixounaviruses were identified in samples from 23 of the 48 facilities, indicating that they are both ubiquitous 223

in Danish wastewater, numerous and easily cultivated. The last ounavirus, mistaenkt (86.7 kb, 47.2% GC, 22 224

tRNAs) is a Suspvirus. The five cluster VI phages (144.9-151.5 kb, 39±0.1% GC, 10-11 tRNAs) belong to a 225

phylogenetic distinct clade, the recently proposed ´Phapecoctavirus´, and have substantial similarity (86-90%, 226

BLAST) with the anticipated type species Escherichia phage phAPEC8 (JX561091) [22,56]. Cluster VII have 227

significantly (p = 0.038) larger genomes (145.8-147.5 kb) with marginally, though not significantly (p = 0.786), 228

lower GC contents of 37.4-37.5%, all have 13 tRNAs (Table 2) and code for NAMPT not present in cluster VI 229

(Table S1). As a group, cluster VII are even more homogeneous than cluster VI and all are closely related (92-230

95%, BLAST) to the same five unclassified Enterobacteriaceae phages vB_Ecom_PHB05 (MF805809), 231

vB_vPM_PD06 (MH816848), ECGD1 (KU522583), phi92 (NC_023693) and vB_vPM_PD114 (MH675927) 232

[57,58], with whom they represent a novel unclassified genera. The distinctive and novel singleton flopper 233

only shares NT similarity (36-87%, BLAST) with six other phages, the Escherichia phages ST32 (MF04458) [59] 234

and phiEcoM_GJ1 (EF460875) [60], the Erwinia phages Faunus (MH191398) [61] and vB_EamM-Y2 235

(NC_019504) [62], and the Pectobacterium phages PM1 (NC_023865) [63] and PP101 (KY087898). This group of 236

unclassified phages all have genomes of 52.1-56.6 kb with 43.6-44.9% GC and while only flopper, phiEcoM_GJ1 237

and PM1 code for a single tRNA, all of them have exclusively unidirectional coding sequences (CDSs) and 238

code for RNA polymerases, characteristics of the Autographivirinae of the Podoviridae [64]. However, the 239

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 18: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

6

verified morphology of ST32, phiEcoM_GJ1, PM1 and PP101, icosahedral head, neck and a contractile tail with 240

tail fibres, classifies them as myoviruses [59,60,62,63]. Based on NT similarity and genome synteny, these seven 241

phages belong to the same, not yet classified, peculiar lineage first described by Jamalludeen et al., (2008) [60]. 242

3.4. Five novel Microviridae phages species 243

The singleton lilleven (6.1 kb, 44.4% GC, no tRNA) and the five (four novel) cluster II phages (5.3-5.5 kb, 244

46.9-47.2% GC, no tRNAs) are all Microviridae, a family of small ssDNA non-enveloped icosahedral phages 245

(Table 2). Lilleven is closely related to (93.9%, BLAST), have pronounced gene pattern synteny and high AA 246

similarities (89-90%), with the Alphatremicrovirus Enterobacteria phage St1 (Figure S2), and is a novel species 247

of the genus Alphatremicrovirus, subfamily Bullavirinae. The cluster II phages comprise four distinct species, 248

only differing by single nucleotide polymorphisms and in non-coding regions (Figure S2). They have high 249

intra-Gegenees score (88-92%) and share genomic organisation and extensive NT similarity (92.6-94.5-%, 250

BLAST) with the unclassified microvirus Escherichia phage SECphi17 (LT960607), primarily differing by 251

single nucleotide polymorphisms and in noncoding regions (Figure S2, Table 2). Cluster II are only related to 252

one recognised phage species Escherichia virus ID52 (63%, BLAST), genus Gequatrovirus, subfamily Bullavirinae, 253

with whom they predominantly differ in the region in and around the major spike protein (gpG), a distinctive 254

marker of the subfamily Bullavirinae involved in host attachment. The phylogenetic analysis separates cluster 255

II from the gequatroviruses, and both the Gegenees scores (<22%) and NT similarities (<65% BLAST) are 256

moderately low (Figure S2). Nonetheless, cluster II are still, based on pronounced genome organisation synteny 257

and a conserved AA similarity (62-64%, Gegenees) considered to be gequatroviruses (Figure S2). 258

The sequencing of the microviruses is peculiar, as library preparation with the Nextera® XT DNA kit 259

applies transposons targeting dsDNA. However, during microvirus infection the host polymerase converts 260

the viral ssDNA into an intermediate state of covalently closed dsDNA, which is then replicated in rolling 261

circle by viral replication proteins transcribed by the host RNA polymerase [65]. This intermediate state may 262

have enabled the library preparation. The presence of host DNA in the sequence results indicates an 263

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 19: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

7

insufficient initial DNase1 treatment, which can be attributed to chemical inhibition or inactivation of the 264

enzyme by adhesion to the sides of wells. Hence, it is reasonable to assume that the extracted microvirus DNA 265

was captured as free dsDNA inside host cells during ongoing infections. 266

3.5. Twenty-five novel Siphoviridae phage species 267

In spite of similar genome sizes, the large group of Siphoviridae, is the most diverse (28 unique, 24 novel) 268

in this study, with GC contents ranging from 43.9-54-6% (Figure 3, Table 2). The majority, clusters IV-V and 269

singleton jahat, are of the subfamily Tunavirinae, while the remaining are allocated into at least three divergent 270

genera, Dhillonvirus, Jerseyvirus and seuratvirus. 271

Cluster IV (46.7-51.6 kb, 43.9-44.1% GC, no tRNAs), V (49.8-51.6 kb, 44.6-44.8% GC, no tRNAs) and jahat 272

(51.1 kb, 45.7% GC, no tRNAs) have low inter-Gegenees scores (7-20%) (Figure 1), yet they form a 273

monophyletic clade and are all closely related (85-94%, BLAST) to published Tunavirinae. The Cluster IV phages 274

are notable, as they represent a significant increase to the small genus Hanrivervirus comprising only the type 275

species Shigella virus pSf-1 (51.8 kb, 44% GC, no tRNAs) isolated from the Han River in Korea [66]. The common 276

ancestry of Cluster IV and pSf-1 is evident by comparable genome sizes, GC contents, genomic organization 277

and substantial NT (86-90%, BLAST) and AA (77-85%, Gegenees) similarities (Table 2, Figure S3). During their 278

differentiation, many deletions and insertions of small hypothetical genes have occurred, most notable is a 279

unique version of a putative tail-spike protein in seven of the cluster IV hanriverviruses, indicating divergent 280

host ranges (Figure S3). All the hanriverviruses code for (putative) dam and Psf-1 is resistant towards at least 281

six restriction endonucleases [66], suggesting they employ DNA methylation as a defence strategy. 282

The five novel dhillonviruses of Cluster XII (44.4-46.0 kb, 54.2-54.6% GC, no tRNAs) have substantial NT 283

similarities with a group of unclassified dhillonviruses (80-89%, BLAST) (Table 2) and the type species 284

Escherichia virus HK578 (77-80%, BLAST). As with the hanriverviruses, and as observed by Korf et al., (2019) 285

[22], their genomes mainly differ in minor hypothetical genes and in putative tail-tip proteins, indicating 286

divergent host ranges (Figure S4). Based on NT similarity and the presence of the canonical 7-deazaguanine 287

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 20: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

8

operon the singleton skure (59.4 kb, 44.6% GC, no tRNAs) is a seuratvirus, while the singleton Buks (40.3 kb, 288

49.7% GC, no tRNAs) is assigned to the genus Jerseyvirus, subfamily Guernseyvirinae. 289

3.5.1 Escherichia phage halfdan, a novel lineage within the Siphoviridae 290

Interestingly, the singleton halfdan (42.8 kb, 53.7% GC, no tRNAs) has only miniscule similarity with 291

published phages (12-29%, BLAST). These entail two Pseudomonas phages vB_PaeS_SCUT-S3 (MK165657) and 292

Ab26 (HG962376) [67] both Septimatreviruses, two Acinetobacter phages of the Lokivirus IMEAB3 (KF811200) 293

and type species Acinetobacter virus Loki [68], and to a lesser degree the unclassified Achromobacter phage 294

phiAxp-1 (KP313532) [69]. They have a common genomic organization, yet their intra-Gegenees score is ≤1% 295

and NT similarity is negligent in roughly a third of halfdan’s 57 CDSs (Figure 4b, d). The phylogeny and AA 296

similarities (Gegenees) also indicate a distant relation, although grouping halfdan closer with the lokiviruses 297

(40-43%) than the septimatreviruses (33-34%) (Figure 4a, c). The genome of halfdan is mosaic, resembling the 298

lokiviruses in the structural region and the septimatreviruses in the replication region (Figure 4d). 299

Remarkably, halfdan has no NT similarity with known Escherichia phages, which could be an indication of E. 300

coli not being the natural host, and that halfdan may have to transcended species barriers. However, host 301

range, morphology and other physical characteristics are beyond the scope of this study, and will be 302

determined in future lab-based studies. Notably, both halfdan, Loki and Ab26 code for a (putative) MazG, 303

although there is negligible sequence similarity (Figure 4d). Phage encoded MazG is hypothesized to be 304

involved in restoring protein synthesis in a starved cell, in order to keep the cell alive, and ensure optimal 305

replication conditions [70]. The phylogeny, whole genome alignments and low NT and AA similarities 306

suggests that halfdan is distinct from other known phages (Figure 4). Accordingly, as per the ICTV guidelines, 307

halfdan is the first phage sequenced of a novel Siphoviridae genus. Yet, halfdan is also, by its mosaicism, an 308

indicator of the genetic continuum of phages questioning the validity of taxonomic interpretations. 309

310

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 21: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

9

(a) (b) (c)

(d)

Figure 4 (a) Phylogenetic tree (Maximum log Likelihood: -7678.71, large terminase subunit), scalebar: substitutions per 311 site. (b) Phylogenomic nucleotide distances (Gegenees, BLASTn: fragment size: 200, step size: 100, threshold: 0%). (c) 312 Phylogenomic amino acid distances (Gegenees, tBLASTx: fragment size: 200, and step size: 100, threshold: 0%). (d) 313 Pairwise alignment of phage genomes (Easyfig, BlASTn), the colour bars between genomes indicate percent pairwise 314 similarity as illustrated by the colour bar in the upper right corner (Easyfig, BlASTn). Gene annotations are provided for 315 phages identified in this study and deposited in GenBank. 316

3.6. Nine novel Podoviridae phage species 317

The nine novel Podoviridae (cluster XIV, 40.7-42.5 kb, 55.4-55.7% GC, no tRNAs), all with the hallmarks of 318

the Autographvirinae i.e. unidirectionally encoded genes and RNA polymerases [64], have conserved genome 319

organisation with the type species of the Phikmvvirus, Pseudomonas virus phiKMV (42.5 kb, 62.3% GC, no 320

tRNAs) [71], but almost no NT similarity (<1%, BLAST). Besides the unclassified Autographvirinae 321

Enterobacteria phage J8-65 (NC_025445) (40.9 kb, 55.7% GC, no tRNAs), with which Cluster XIV has 322

considerable NT similarity (69-93%, BLAST) and similar GC content, Cluster XIV only share >5% nucleotide 323

similarity (40-42%, BLAST) with the Phikmvvirus Pantoea virus Limezero (43.0 kb, 55.4% GC, no tRNAs) [72] 324

Acinetobacter phage IME AB3

Acinetobacter phage vB AbaS Loki

Escherichia phage Halfdan

Pseudomonas phage vB PaeS SCH Ab26

Pseudomonas phage 73

Pseudomonas phage vB paeS SCUT-S3

Acinetobacter phage phiAxp0.050

Organism 1 2 3 4 5 6 7

1: Acinetobacter phage IME_AB3 100 2 0 0 0 0 0

2: Acinetobacter phage vB_AbaS_Loki 2 100 0 0 0 0 0

3: Escherichia phage halfdan 0 0 100 1 0 0 0

4: Pseudomonas phage vB_PaeS-SCH_Ab26 0 0 0 100 83 69 0

5: Pseudomonas phage 73 0 0 0 84 100 68 0

6: Pseudomonas phage vB_PaeS_SCUT-S3 0 0 0 69 69 100 0

7: Achromobacter phage phiAxp-1 0 0 0 0 0 0 100

Heat plot for: Comparison[Halfdan_and_friends_210319], Alig... file:///Users/au574427/Desktop/Coli_unclass/Halfdan/halfdan_n...

1 of 1 29/03/2019, 22.19

Organism 1 2 3 4 5 6 7

1: Acinetobacter phage IME_AB3 100 62 40 26 26 26 23

2: Acinetobacter phage vB_AbaS_Loki 64 100 43 28 27 27 24

3: Escherichia phage halfdan 40 42 100 33 33 33 27

4: Pseudomonas phage vB_PaeS-SCH_Ab26 26 27 33 100 89 77 25

5: Pseudomonas phage 73 26 27 34 89 100 76 24

6: Pseudomonas phage vB_PaeS_SCUT-S3 26 27 33 77 77 100 24

7: Achromobacter phage phiAxp-1 23 23 26 24 24 24 100

Heat plot for: Comparison[hlafdan_tblastx_210319], Alignment[... file:///Users/au574427/Desktop/Coli_unclass/Halfdan/halfdan_2...

1 of 1 29/03/2019, 22.19

100%

64%

Term. Port. Tape m. CoatTube Polym.MazG

Prim.

Lysis

IME_AB3

Loki

Halfdan

vB_PaeS_SCH_Ab26

73

vB_PaeS_SCUT-S3

phiAxp-1

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 22: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

10

(Figure 5, Table 2). The Phikmvvirus consists of four species, phiKMV, Limezero, Pantoea virus Limelight (44.5 kb, 325

54% GC, no tRNAs) [72] and Pseudomonas virus LKA1 (41.6 kb, 60.9% GC, no tRNAs) [73]. 326

(a) (b) (c)

(d)

Figure 5 (a) Phylogenetic tree (Maximum log Likelihood: -11728.26, large terminase subunit), scalebar: substitutions per 327 site. (b) Phylogenomic nucleotide distances (Gegenees, BLASTn: fragment size: 200, step size: 100, threshold: 0%). (c) 328 Phylogenomic amino acid distances (Gegenees, tBLASTx: fragment size: 200, and step size: 100, threshold: 0%). (d) 329 Pairwise alignment of phage genomes (Easyfig, BLASTn), the colour bars between genomes indicate percent pairwise 330 similarity as illustrated by the colour bar in the upper right corner (Easyfig, BlASTn). Gene annotations are provided for 331 phages identified in this study and deposited in GenBank. 332

Escherichia phage lidtsurEscherichia phage smaasurEnterobacteria phage J8-65

Escerichia phage glasurEscherichia phage forsur

Escerichia phage usurEscherichia phage megetsurEschericia phage mellemsur

Escherichia phage altidsurEscherichia phage aldrigsur

Pantoea phage LIMEzeroPantoea phage LIMElight

Pseudomonas phage phiKMVPseudomonas phage LKA1-

0.10

Organism 1 2 3 4 5 6 7 8 9 10 11 12 13 14

1: MG_126_C2_Ent_J8-65 100 22 22 17 15 15 13 13 15 14 1 0 0 0

2: MG_039_C2_Ent_J8-65 22 100 84 25 25 25 27 28 23 23 0 0 0 0

3: Enterobacteria_phage_J8-65 23 85 100 27 25 25 27 28 24 24 1 0 0 0

4: MG_168_C20_Ent_J8-65 17 24 26 100 88 83 46 46 44 43 1 0 0 0

5: MG_200_C27_Ent_J8-65 16 24 25 88 100 85 46 46 42 41 1 0 0 0

6: MG_116_C10_Ent_J8-65 16 24 25 84 86 100 48 47 43 43 1 0 0 0

7: MG_130_C3_Ent_J8-65 14 26 26 47 47 48 100 90 67 65 0 0 0 0

8: MG_144_C2_Ent_J8-65 14 28 28 47 48 48 93 100 68 69 1 0 0 0

9: MG_140_C1_Ent_J8-65 15 23 23 44 43 44 67 66 100 93 0 0 0 0

10: MG_182_C43_Ent_J8+65 14 23 23 44 43 43 66 67 93 100 1 0 0 0

11: Pantoea LIMEzero 1 0 1 1 1 1 0 0 0 0 100 0 0 0

12: LIMElight 0 0 0 0 0 0 0 0 0 0 0 100 0 0

13: phiKMV 0 0 0 0 0 0 0 0 0 0 0 0 100 0

14: LKA1 0 0 0 0 0 0 0 0 0 0 0 0 0 100

Heat plot for: Comparison[podos_genuse_290319], Alignment[a... file:///Users/au574427/Desktop/Coli_Podos/podos_genus_29031...

1 of 1 29/03/2019, 18.17

Organism 1 2 3 4 5 6 7 8 9 10 11 12 13 14

1: MG_126_C2_Ent_J8-65 100 70 69 68 68 68 66 66 66 66 47 19 19 20

2: MG_039_C2_Ent_J8-65 72 100 91 72 71 71 72 72 71 71 47 19 20 20

3: Enterobacteria_phage_J8-65 70 91 100 72 72 72 73 72 72 72 47 19 20 20

4: MG_168_C20_Ent_J8-65 67 69 70 100 92 89 77 75 77 77 46 19 19 20

5: MG_200_C27_Ent_J8-65 67 69 70 92 100 90 77 77 77 77 46 19 19 20

6: MG_116_C10_Ent_J8-65 68 70 70 90 91 100 77 77 77 77 47 19 19 20

7: MG_130_C3_Ent_J8-65 66 71 71 78 78 77 100 92 84 84 47 19 19 20

8: MG_144_C2_Ent_J8-65 68 73 73 78 79 79 95 100 86 86 48 19 19 20

9: MG_140_C1_Ent_J8-65 65 70 71 77 77 77 84 83 100 95 46 19 19 20

10: MG_182_C43_Ent_J8+65 65 69 70 77 77 76 84 83 95 100 47 19 19 20

11: Pantoea LIMEzero 45 45 45 46 46 46 46 46 46 46 100 19 19 20

12: LIMElight 19 19 19 19 19 19 19 19 19 19 19 100 19 20

13: phiKMV 19 19 19 19 19 19 19 19 19 19 19 19 100 27

14: LKA1 20 20 20 20 20 20 20 20 20 20 20 20 27 100

Heat plot for: Comparison[podos_tblastx_290319], Alignment[a... file:///Users/au574427/Desktop/Coli_Podos/podos_tblastx_2903...

1 of 1 29/03/2019, 18.23

100%

64%

Term.TailspikePort.RNA pol.DNA pol. Tail Virion VirionLidtsur

Smaasur

J8-65

Glasur

Forsur

Usur

Megetsur

Mellemsur

Altidsur

Aldrigsur

LIMEzero

LIMElight

phiKMV

LKA1

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 23: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

11

Cluster XIV and J8-65 form a diverse monophyletic clade, with a substantial amount of deletions and 333

insertions between them, subdividing into three sub-clusters with intra-Gegenees scores ≤28% (Figure 5a, d). 334

Phage lidtsur is singled-out and also codes for a unique version of tailspike colanidase, smaasur resembles J8-335

65 phage and the rest group together (Figure 5). Still, the three sub-clusters have for a large part conserved AA 336

sequences (66-72%, Gegenees) (Figure 5b, c, d). Interestingly, this also applies to Limezero, with whom they 337

have a Gegenees NT score ≤1%, but an AA similarity of 45-48%, supporting the phylogenetic grouping of 338

Limezero and cluster XIV (Figure 5a, b, c). Based on phylogeny, limited NT and low AA similarities it is evident 339

that there is a very distant relation between cluster XIV (and J8-65) and the phikmvviruses (Figure 5). Hence, 340

cluster XIV (and J8-65) are not immediately, according to the ICTV guidelines, considered phikmvviruses. 341

However, the Phikmvvirus already includes phages with minuscule DNA homology infecting divergent hosts 342

[72,73]. The genus delimitation is based on overall genome architecture with the location of a single-subunit 343

RNA polymerase gene adjacent to the structural genes, and not in the early region as in T7-like phages [16]. 344

Another characteristic feature of the phikmvvirus is the presence of direct terminal repeats, though not yet 345

verified in all members [72]. Read abundance in non-coding regions in genomic ends of cluster XIV genomes 346

suggests they also have direct terminal repeats of a few hundred bp. In conclusion, we consider the cluster XIV 347

phages (and J8-65) to be of the same lineage as the phikmvviruses, but contemplate that this genus may at 348

some point be divided into at least two independent genera, as we learn more of the nature of the genes which 349

distinguish cluster XIV from the other phikmvviruses on both NT and AA level and account for more 50% of 350

their genomes. 351

3.7. Unclassified bacterial viruses represent two novel lineages 352

The four novel phages sortsyn of cluster XIII and the three singletons sortsne, sortkaff (41.9-42.5 kb, 59-353

60% GC, no tRNAs) and skarpretter (42.0 kb, 55.8% GC, no tRNAs) all have small genomes and high GC 354

contents (Figure 3) and are suggested to represent two distinct novel lineages. They only share considerable 355

(≥6%, BLAST) NT similarity with four phages, Enterobacteria phage IME_EC2 (KF591601)(41.5 kb, 59.2% GC, 356

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 24: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

12

no tRNAs) isolated from hospital sewage [74], Salmonella phage lumpael (MK125141)(41.4 kb, 59.5% GC, no 357

tRNAs) isolated from wastewater of the same sample-set in a previous study, Klebsiella phage 358

vB_KpnS_IME279 (MF614100) (42.5 kb, 59.3% GC, no tRNAs) and Escherichia phage C130_2 (MH363708)(41.7 359

kb, 55.4% GC, no tRNAs) isolated from cheese [75]. 360

(a) (b) (c)

(d)

Figure 6 (a) Phylogenetic tree (Maximum log Likelihood: -8023.43, large terminase subunit), scalebar: substitutions per 361 site. (b) Phylogenomic nucleotide distances (Gegenees, BLASTn: fragment size: 200, step size: 100, threshold: 0%). (c) 362 Phylogenomic amino acid distances (Gegenees, tBLASTx: fragment size: 200, and step size: 100, threshold: 0%). (d) 363 Pairwise alignment of phage genomes (Easyfig, BlASTn), the colour bars between genomes indicate percent pairwise 364 similarity as illustrated by the colour bar in the upper right corner (Easyfig, BlASTn). Gene annotations are provided for 365 phages identified in this study and deposited in GenBank. 366

367

Curiously, IME_EC2 is a confirmed (TEM) Podoviridae, with a short non-contractile tail [74], while C130_2 368

is a confirmed (TEM) Myoviridae with a long contractile tail [75]. Lumpael and vB_KpnS_IME279 are 369

uncharacterised, yet both are assigned as Podoviridae in GenBank [17]. Although sortsne, sortkaff, sortsyn, 370

Escherichia phage Sortkaff

Klebsiella phage vB KpnS IME279

Escherichia phage Sortsne

Escherichia phage sortsyn

Enterobacteria phage IME EC2

Salmonella phage Lumpael

Escherichia phage Skarpretter

Escherichia phage C130 2

0.050

Organism 1 2 3 4 5 6 7 8

1: Sortkaff_ny start 100 86 13 6 6 6 0 0

2: MF614100_IME279 86 100 14 5 5 6 0 0

3: Escherichia phage Sortsne 13 14 100 13 15 11 0 0

4: Sortsyn_new start 7 5 13 100 87 42 0 0

5: KF591601_IME_EC2 6 5 15 88 100 40 0 0

6: MK125141_Lumpael 7 7 12 42 40 100 0 0

7: MK105855_Skarpretter_new start 0 0 0 0 0 0 100 1

8: MH363708_C130_2 0 0 0 0 0 0 1 100

Heat plot for: Comparison[Sortsne_and_the_gang_200319], Ali... file:///Users/au574427/Desktop/Coli_unclass/Skarpretter/skarp_...

1 of 1 30/03/2019, 07.37

Organism 1 2 3 4 5 6 7 8

1: Sortkaff_ny start 100 91 60 50 50 50 33 32

2: MF614100_IME279 91 100 60 49 50 49 33 32

3: Escherichia phage Sortsne 61 61 100 54 55 54 33 31

4: Sortsyn_new start 51 50 54 100 91 67 33 33

5: KF591601_IME_EC2 51 50 56 92 100 67 33 33

6: MK125141_Lumpael 51 51 55 68 68 100 33 30

7: MK105855_Skarpretter_new start 33 33 32 33 33 32 100 43

8: MH363708_C130_2 32 32 31 33 33 30 43 100

Heat plot for: Comparison[skarp_and_freinds_tblastx], Alignmen... file:///Users/au574427/Desktop/Coli_unclass/Skarpretter/skarp_t...

1 of 1 30/03/2019, 07.38

100%

64%

Holin

Lysis

Methylationhflc Term.PortalTailspikeHelica. Coat

Sortkaff

vB_KonS_IME279

Sortsne

Sortsyn

IME_EC2

Lumpael

Skarpretter

C130_2

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 25: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

13

vB_KpnS_IME279, IME_EC2 and lumpael form a monophyletic clade, the Gegenees score (5-15%) between 371

sub-clusters is surprisingly low (Figure 6a, b). Still, these six phages have comparable genome sizes (41.5-42.5 372

kb) and organisation, including similar relatively small-sized structural proteins, dam and dcm genes, equally 373

high GC contents (59.5±0.5%) and relatively high AA similarities (49-91%) (Figure 6a, c, Table 2). Hence, they 374

are clearly of the same lineage and likely to resemble IME_EC2 in having Podoviridae morphology. This group 375

is also clearly distinct from all other known phages (<5%, BLAST) and as such constitute a novel genus, with 376

a delimitation to be determined by future physical characterisation. Skarpretter and C130_2 form a 377

monophyletic clade and both have slightly, though not significantly (P = 0.38), lower GC contents (55.6±0.2%), 378

indicating a common ancestry and host. The conserved genome organisation and clustering of skarpretter 379

with the other Podoviridae (Figure 1, 6d), suggests skarpretter also has Podoviridae morphology. Still, its closest 380

relative C130_2 is described as a Myoviridae [75], leaving the true morphology of skarpretter a conundrum 381

only resolved by TEM imaging. According to ICTV guidelines both skarpretter and C130_2 are representatives 382

of novel linages, as they are both clearly distinct from all other known phages. Skarpretter and C130_2 have 383

very low NT (1% Gegenees, 38% BLAST) and AA similarities (43%) (Figure 6b, c). Indeed, skarpretter has ≤20% 384

NT similarity (BLAST) with all other published phages. In addition, skarpretter codes for a putative hflc gene 385

possibly inhibiting proteolysis of essential phage proteins, located in the replication region (Figure 6d). This 386

gene, has to our knowledge never before been observed in phages, but occurs in almost all proteobacteria, 387

including E. coli MG1655 [76]. 388

4. Conclusion 389

By screening Danish wastewater, we identified no less than 104 unique Escherichia MG1655 phages, but 390

predict the species richness to be at least in the range of 160-420, and even higher if including phages infecting 391

other Escherichia species, though it is expected to fluctuate drastically over time and both within and between 392

treatment facilities. Among the unique phages 91 represent novel phage species of at least four different 393

families Myoviridae, Siphoviridae, Podoviridae and Microviridae. The diversity of these phages is striking, they 394

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 26: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

14

vary greatly in genome size and have a very broad GC-content range - possibly prompted by former or current 395

alternate hosts. These findings add to our growing understanding of phage ecology and diversity, and through 396

classification of these many phages we come yet another step closer to a more refined taxonomic 397

understanding of phages. Furthermore, the numerous and diverse phages isolated in this study, all lytic to the 398

same single strain, serve as an excellent opportunity to learn important phage-host interactions in future 399

studies. These include, but are not limited to lysogen induced phage immunity, host-range and anti-RE 400

systems. Finally, apart from substantial contributions to known genera, we came upon several unclassified 401

lineages, some completely novel, which in our opinion constitute novel phage genera. We consider sortregn, 402

sortkaff, sortsyn and sortsne together with lumpael, vB_kpnS_IME278 and IME_EC2 to constitute a novel 403

genus within the Podoviridae. The Myoviridae flopper joins an interesting group of not yet classified phages 404

with Myoviridae morphology and Autographvirinae characteristics, just as cluster VII together with five 405

unclassified phages represent a novel unclassified genus within the Myoviridae. Lastly, we consider the 406

Siphoviridae halfdan and the unclassified bacterial virus skarpretter to be the first sequenced representatives of 407

each their novel genus. In conclusion, this study shows that uncharted territory still remains for even well-408

studied phage-host couples. 409

Supplementary Materials: Figure S1: Microviridae, Figure S2: Hanriverviruses, Figure S3: Dhillonviruses, Table 1 410

Auxiliary metabolism genes. 411

Funding: This research was funded by Villum Experiment Grant 17595, Aarhus University Research Foundation AUFF 412

Grant E-2015-FLS-7-28 to Witold Kot and Human Frontier Science Program RGP0024/2018 413

Acknowledgments: This study had not been possible without the much-appreciated contribution by the many members 414

of the Danish Water and Wastewater Association (DANVA), who kindly supplied us with time-series of wastewater 415

samples from their treatment facilities. A special thanks to the Billund and Grindsted treatment facilities of Billund Vand 416

& Energi, Lynetten, Avedøre and Damhusåen treatment facilities of BIOFOS, Kolding treatment facility of BlueKolding, 417

Esbjerg Øst, Esbjerg Vest, Ribe, Varde og Skovlund treatment facilities of DIN Forsyning, Drøsbro, Hadsten, Hammel, 418

Hinnerup and Voldum treatment facilities of Favrskov Forsyning, Haerning treatment facility of Herning Vand, Hillerød 419

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 27: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

15

treatment facility of Hillerød Forsyning,Lemvig treatment facility of Lemvig Vand og Spildevand, Bevtoft, Gram, 420

Haderslev, Halk, Jegerup, Nustrup, Over Jerstal, Skovby, Skrydstrup, Sommersted, Vojens and Årøsund treatment 421

facilities of Provas, Marselisborg, Egå and Viby treatment facilities of Aarhus vand, Hedensted, Juelsminde and Tørring 422

treatment facilities of Hedensted Spildevand, Ejby Mølle, Bogense, Hofmansgave, Hårslev, Nordvest, Nordøst, Otterup 423

and Søndersø treatment facilities of VandCenterSyd, Øst and Vest treatment facilities of Aalborg Forsyning and finally 424

Helsingør treatment facility of Forsyning Helsingør. 425

Competing interests: The authors declare no competing interests. 426

References 427

1. Weinbauer, M.G.; Rassoulzadegan, F. Are viruses driving microbial diversification and diversity? Environ. 428

Microbiol. 2003, 6, 1–11. 429

2. Weitz, J.S.; Wilhelm, S.W. Ocean viruses and their effects on microbial communities and biogeochemical cycles. 430

F1000 Biol. Rep. 2012, 4, 17. 431

3. Crummett, L.T.; Puxty, R.J.; Weihe, C.; Marston, M.F.; Martiny, J.B.H. The genomic content and context of 432

auxiliary metabolic genes in marine cyanomyoviruses. Virology 2016, 499, 219–229. 433

4. Boyd, E.F. Bacteriophage-Encoded Bacterial Virulence Factors and Phage–Pathogenicity Island Interactions. Adv. 434

Virus Res. 2012, 82, 91–118. 435

5. Fortier, L.-C.; Sekulovic, O. Importance of prophages to evolution and virulence of bacterial pathogens. 436

https://doi.org/10.4161/viru.24498 2013. 437

6. Volkova, V. V; Lu, Z.; Besser, T.; Gröhn, Y.T. Modeling the infection dynamics of bacteriophages in enteric 438

Escherichia coli: estimating the contribution of transduction to antimicrobial gene spread. Appl. Environ. 439

Microbiol. 2014, 80, 4350–62. 440

7. Bearson, B.L.; Allen, H.K.; Brunelle, B.W.; Lee, I.S.; Casjens, S.R.; Stanton, T.B. The agricultural antibiotic 441

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 28: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

16

carbadox induces phage-mediated gene transfer in Salmonella. Front. Microbiol. 2014, 5, 52. 442

8. Grose, J.H.; Casjens, S.R. Understanding the enormous diversity of bacteriophages: the tailed phages that infect 443

the bacterial family Enterobacteriaceae. Virology 2014, 468–470, 421–443. 444

9. Dykhuizen, D. Species Numbers in Bacteria. Proc. Calif. Acad. Sci. 2005, 56, 62. 445

10. Hatfull, G.F.; Pedulla, M.L.; Jacobs-Sera, D.; Cichon, P.M.; Foley, A.; Ford, M.E.; Gonda, R.M.; Houtz, J.M.; 446

Hryckowian, A.J.; Kelchner, V.A.; et al. Exploring the Mycobacteriophage Metaproteome: Phage Genomics as an 447

Educational Platform. PLoS Genet. 2006, 2, e92. 448

11. Hatfull, G.F. Mycobacteriophages: Windows into Tuberculosis. PLoS Pathog. 2014, 10, e1003953. 449

12. Hatfull, G.F.; Jacobs-Sera, D.; Lawrence, J.G.; Pope, W.H.; Russell, D.A.; Ko, C.C.; Weber, R.J.; Patel, M.C.; 450

Germane, K.L.; Edgar, R.H.; et al. Comparative Genomic Analysis of 60 Mycobacteriophage Genomes: Genome 451

Clustering, Gene Acquisition, and Gene Size. J. Mol. Biol. 2010, 397, 119–143. 452

13. Dedrick, R.M.; Jacobs-Sera, D.; Bustamante, C.A.G.; Garlena, R.A.; Mavrich, T.N.; Pope, W.H.; Reyes, J.C.C.; 453

Russell, D.A.; Adair, T.; Alvey, R.; et al. Prophage-mediated defence against viral attack and viral counter-454

defence. Nat. Microbiol. 2017, 2, 16251. 455

14. Jacobs-Sera, D.; Marinelli, L.J.; Bowman, C.; Broussard, G.W.; Guerrero Bustamante, C.; Boyle, M.M.; Petrova, 456

Z.O.; Dedrick, R.M.; Pope, W.H.; Science Education Alliance Phage Hunters Advancing Genomics And 457

Evolutionary Science Sea-Phages Program, S.E.A.P.H.A.G. and E.S. (SEA-P.; et al. On the nature of 458

mycobacteriophage diversity and host preference. Virology 2012, 434, 187–201. 459

15. Hatfull, G.F. The Secret Lives of Mycobacteriophages. Adv. Virus Res. 2012, 82, 179–288. 460

16. Lefkowitz, E.J.; Dempsey, D.M.; Hendrickson, R.C.; Orton, R.J.; Siddell, S.G.; Smith, D.B. Virus taxonomy: the 461

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 29: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

17

database of the International Committee on Taxonomy of Viruses (ICTV). Nucleic Acids Res. 2018, 46, D708–D717. 462

17. Benson, D.A.; Cavanaugh, M.; Clark, K.; Karsch-Mizrachi, I.; Lipman, D.J.; Ostell, J.; Sayers, E.W. GenBank. 463

Nucleic Acids Res. 2017, 45, D37–D42. 464

18. Lawrence, J.G.; Hatfull, G.F.; Hendrix, R.W. Imbroglios of viral taxonomy: genetic exchange and failings of 465

phenetic approaches. J. Bacteriol. 2002, 184, 4891–905. 466

19. Aiewsakun, P.; Adriaenssens, E.M.; Lavigne, R.; Kropinski, A.M.; Simmonds, P. Evaluation of the genomic 467

diversity of viruses infecting bacteria, archaea and eukaryotes using a common bioinformatic platform: steps 468

towards a unified taxonomy. J. Gen. Virol. 2018, 99, 1331–1343. 469

20. Nelson, D. Phage taxonomy: we agree to disagree. J. Bacteriol. 2004, 186, 7029–31. 470

21. Adriaenssens, E.; Brister, J.R. How to Name and Classify Your Phage: An Informal Guide. Viruses 2017, 9. 471

22. Korf; Meier-Kolthoff; Adriaenssens; Kropinski; Nimtz; Rohde; van Raaij; Wittmann Still Something to Discover: 472

Novel Insights into&#13; Escherichia coli Phage Diversity and Taxonomy. Viruses 2019, 11, 454. 473

23. Jurczak-Kurek, A.; Gąsior, T.; Nejman-Faleńczyk, B.; Bloch, S.; Dydecka, A.; Topka, G.; Necel, A.; Jakubowska-474

Deredas, M.; Narajczyk, M.; Richert, M.; et al. Biodiversity of bacteriophages: morphological and biological 475

properties of a large group of phages isolated from urban sewage. Sci. Rep. 2016, 6, 34338. 476

24. Sambrook, J.F.; Russell, D.W.; Editors Molecular cloning: A laboratory manual, third edition.; 2nd ed.; Cold Spring 477

Harbor Laboratory, 2000; ISBN 0 87969 309 6. 478

25. Kot, W.; Vogensen, F.K.; Sørensen, S.J.; Hansen, L.H. DPS – A rapid method for genome sequencing of DNA-479

containing bacteriophages directly from a single plaque. J. Virol. Methods 2014, 196, 152–156. 480

26. Bankevich, A.; Nurk, S.; Antipov, D.; Gurevich, A.A.; Dvorkin, M.; Kulikov, A.S.; Lesin, V.M.; Nikolenko, S.I.; 481

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 30: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

18

Pham, S.; Prjibelski, A.D.; et al. SPAdes: a new genome assembly algorithm and its applications to single-cell 482

sequencing. J. Comput. Biol. 2012, 19, 455–77. 483

27. Brettin, T.; Davis, J.J.; Disz, T.; Edwards, R.A.; Gerdes, S.; Olsen, G.J.; Olson, R.; Overbeek, R.; Parrello, B.; Pusch, 484

G.D.; et al. RASTtk: A modular and extensible implementation of the RAST algorithm for building custom 485

annotation pipelines and annotating batches of genomes. Sci. Rep. 2015, 5, 8365. 486

28. Besemer, J.; Borodovsky, M. GeneMark: web software for gene finding in prokaryotes, eukaryotes and viruses. 487

Nucleic Acids Res. 2005, 33, W451–W454. 488

29. Altschul, S.F.; Gish, W.; Miller, W.; Myers, E.W.; Lipman, D.J. Basic local alignment search tool. J. Mol. Biol. 1990, 489

215, 403–410. 490

30. Soding, J.; Biegert, A.; Lupas, A.N. The HHpred interactive server for protein homology detection and structure 491

prediction. Nucleic Acids Res. 2005, 33, W244–W248. 492

31. Finn, R.D.; Coggill, P.; Eberhardt, R.Y.; Eddy, S.R.; Mistry, J.; Mitchell, A.L.; Potter, S.C.; Punta, M.; Qureshi, M.; 493

Sangrador-Vegas, A.; et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids 494

Res. 2016, 44, D279–D285. 495

32. Enrique González-Tortuero1, Thomas David Sean Sutton1, Vimalkumar Velayudhan1, 4 Andrey Nikolaevich 496

Shkoporov1, Lorraine Anne Draper1, Stephen Robert Stockdale1, 2, Reynolds 5 Paul Ross1, 3, Colin Hill1, 3, 6 497

VIGA: a sensitive, precise and automatic de novo VIral Genome Annotator. bioRxiv 2018, 277509. 498

33. Zankari, E.; Hasman, H.; Cosentino, S.; Vestergaard, M.; Rasmussen, S.; Lund, O.; Aarestrup, F.M.; Larsen, M.V. 499

Identification of acquired antimicrobial resistance genes. J. Antimicrob. Chemother. 2012, 67, 2640–4. 500

34. Kleinheinz, K.A.; Joensen, K.G.; Larsen, M.V. Applying the ResFinder and VirulenceFinder web-services for easy 501

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 31: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

19

identification of acquired antibiotic resistance and E. coli virulence genes in bacteriophage and prophage 502

nucleotide sequences. Bacteriophage 2014, 4, e27943. 503

35. Joensen, K.G.; Scheutz, F.; Lund, O.; Hasman, H.; Kaas, R.S.; Nielsen, E.M.; Aarestrup, F.M. Real-Time Whole-504

Genome Sequencing for Routine Typing, Surveillance, and Outbreak Detection of Verotoxigenic Escherichia coli. 505

J. Clin. Microbiol. 2014, 52, 1501–1510. 506

36. Roberts, R.J.; Vincze, T.; Posfai, J.; Macelis, D. REBASE—a database for DNA restriction and modification: 507

enzymes, genes and genomes. Nucleic Acids Res. 2015, 43, D298–D299. 508

37. Kieft, K.; Zhou, Z.; Anantharaman, K. VIBRANT: Automated recovery, annotation and curation of microbial 509

viruses, and evaluation of virome function from genomic sequences. bioRxiv 2019, 855387. 510

38. Adriaenssens, E.; Brister, J.R. How to Name and Classify Your Phage: An Informal Guide. Viruses 2017, 9. 511

39. Ågren, J.; Sundström, A.; Håfström, T.; Segerman, B. Gegenees: Fragmented Alignment of Multiple Genomes for 512

Determining Phylogenomic Distances and Genetic Signatures Unique for Specified Target Groups. PLoS One 513

2012, 7, e39107. 514

40. Kumar, S.; Stecher, G.; Tamura, K. MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger 515

Datasets. Mol. Biol. Evol. 2016, 33, 1870–1874. 516

41. Lopes, A.; Tavares, P.; Petit, M.-A.; Guérois, R.; Zinn-Justin, S. Automated classification of tailed bacteriophages 517

according to their neck organization. BMC Genomics 2014, 15, 1027. 518

42. Mercanti, D.J.; Rousseau, G.M.; Capra, M.L.; Quiberoni, A.; Tremblay, D.M.; Labrie, S.J.; Moineau, S. Genomic 519

Diversity of Phages Infecting Probiotic Strains of Lactobacillus paracasei. Appl. Environ. Microbiol. 2016, 82, 95–520

105. 521

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 32: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

20

43. Edgar, R.C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 522

2004, 32, 1792–7. 523

44. Tamura, K.; Nei, M. Estimation of the number of nucleotide substitutions in the control region of mitochondrial 524

DNA in humans and chimpanzees. Mol. Biol. Evol. 1993, 10, 512–26. 525

45. Sullivan, M.J.; Petty, N.K.; Beatson, S.A. Easyfig: a genome comparison visualizer. Bioinformatics 2011, 27, 1009. 526

46. Hsieh, T.C.; Ma, K.H.; Chao, A. iNEXT: an R package for rarefaction and extrapolation of species diversity (Hill 527

numbers). Methods Ecol. Evol. 2016, 7, 1451–1456. 528

47. Chao, A.; Gotelli, N.J.; Hsieh, T.C.; Sander, E.L.; Ma, K.H.; Colwell, R.K.; Ellison, A.M. Rarefaction and 529

extrapolation with Hill numbers: A framework for sampling and estimation in species diversity studies. Ecol. 530

Monogr. 2014, 84, 45–67. 531

48. RStudio Team RStudio: Integrated Development for R. 2016. 532

49. Rocha, E.P.C.; Danchin, A. Base composition bias might result from competition for metabolic resources. Trends 533

Genet. 2002, 18, 291–294. 534

50. Motlagh, A.M.; Bhattacharjee, A.S.; Coutinho, F.H.; Dutilh, B.E.; Casjens, S.R.; Goel, R.K. Insights of Phage-Host 535

Interaction in Hypersaline Ecosystem through Metagenomics Analyses. Front. Microbiol. 2017, 8, 352. 536

51. Thomas, J.A.; Orwenyo, J.; Wang, L.-X.; Black, L.W. The Odd &quot;RB&quot; Phage-Identification of 537

Arabinosylation as a New Epigenetic Modification of DNA in T4-Like Phage RB69. Viruses 2018, 10. 538

52. Purohit, S.; Bestwick, K.; Lasser, G.W.; Rogers, C.M.; Mathews, C.K. T4 Phage-coded Dihydrofolate Reductase. 539

1981, 256, 9121–9125. 540

53. Hutinet, G.; Kot, W.; Cui, L.; Hillebrand, R.; Balamkundu, S.; Gnanakalai, S.; Neelakandan, R.; Carstens, A.B.; 541

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 33: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

21

Lui, C.F.; Tremblay, D.; et al. 7-Deazaguanine modifications protect phage DNA from host restriction systems. 542

Nature 2019, 1–12. 543

54. Carstens, A.B.; Kot, W.; Lametsch, R.; Neve, H.; Hansen, L.H. Characterisation of a novel enterobacteria phage, 544

CAjan, isolated from rat faeces. Arch. Virol. 2016, 161, 2219–2226. 545

55. Shkoporov, A.N.; Clooney, A.G.; Sutton, T.D.S.; Ryan, F.J.; Daly, K.M.; Nolan, J.A.; McDonnell, S.A.; Khokhlova, 546

E. V.; Draper, L.A.; Forde, A.; et al. The Human Gut Virome Is Highly Diverse, Stable, and Individual Specific. 547

Cell Host Microbe 2019, 26, 527-541.e5. 548

56. Tsonos, J.; Adriaenssens, E.M.; Klumpp, J.; Hernalsteens, J.-P.; Lavigne, R.; De Greve, H. Complete genome 549

sequence of the novel Escherichia coli phage phAPEC8. J. Virol. 2012, 86, 13117–8. 550

57. Schwarzer, D.; Buettner, F.F.R.; Browning, C.; Nazarov, S.; Rabsch, W.; Bethe, A.; Oberbeck, A.; Bowman, V.D.; 551

Stummeyer, K.; Mühlenhoff, M.; et al. A multivalent adsorption apparatus explains the broad host range of 552

phage phi92: a comprehensive genomic and structural analysis. J. Virol. 2012, 86, 10384–98. 553

58. Schwarzer, D.; Browning, C.; Stummeyer, K.; Oberbeck, A.; Mühlenhoff, M.; Gerardy-Schahn, R.; Leiman, P.G. 554

Structure and biochemical characterization of bacteriophage phi92 endosialidase. Virology 2015, 477, 133–143. 555

59. Liu, H.; Geagea, H.; Rousseau, G.M.; Labrie, S.J.; Tremblay, D.M.; Liu, X.; Moineau, S. Characterization of the 556

Escherichia coli Virulent Myophage ST32. Viruses 2018, 10. 557

60. Jamalludeen, N.; Kropinski, A.M.; Johnson, R.P.; Lingohr, E.; Harel, J.; Gyles, C.L. Complete genomic sequence 558

of bacteriophage phiEcoM-GJ1, a novel phage that has myovirus morphology and a podovirus-like RNA 559

polymerase. Appl. Environ. Microbiol. 2008, 74, 516–25. 560

61. Thompson, D.W.; Casjens, S.R.; Sharma, R.; Grose, J.H. Genomic comparison of 60 completely sequenced 561

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 34: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

22

bacteriophages that infect Erwinia and/or Pantoea bacteria. Virology 2019, 535, 59–73. 562

62. Born, Y.; Fieseler, L.; Marazzi, J.; Lurz, R.; Duffy, B.; Loessner, M.J. Novel Virulent and Broad-Host-Range 563

Erwinia amylovora Bacteriophages Reveal a High Degree of Mosaicism and a Relationship to Enterobacteriaceae 564

Phages. Appl. Environ. Microbiol. 2011, 77, 5945–5954. 565

63. Lim, J.-A.; Shin, H.; Lee, D.H.; Han, S.-W.; Lee, J.-H.; Ryu, S.; Heu, S. Complete genome sequence of the 566

Pectobacterium carotovorum subsp. carotovorum virulent bacteriophage PM1. Arch. Virol. 2014, 159, 2185–2187. 567

64. Lavigne, R.; Seto, D.; Mahadevan, P.; Ackermann, H.-W.; Kropinski, A.M. Unifying classical and molecular 568

taxonomic classification: analysis of the Podoviridae using BLASTP-based tools. Res. Microbiol. 2008, 159, 406–569

414. 570

65. Doore, S.M.; Fane, B.A. The microviridae: Diversity, assembly, and experimental evolution. Virology 2016, 491, 571

45–55. 572

66. Jun, J.W.; Kim, J.H.; Shin, S.P.; Han, J.E.; Chai, J.Y.; Park, S.C. Characterization and complete genome sequence of 573

the Shigella bacteriophage pSf-1. Res. Microbiol. 2013, 164, 979–986. 574

67. Essoh, C.; Latino, L.; Midoux, C.; Blouin, Y.; Loukou, G.; Nguetta, S.-P.A.; Lathro, S.; Cablanmian, A.; Kouassi, 575

A.K.; Vergnaud, G.; et al. Investigation of a Large Collection of Pseudomonas aeruginosa Bacteriophages 576

Collected from a Single Environmental Source in Abidjan, Côte d’Ivoire. PLoS One 2015, 10, e0130548. 577

68. Turner, D.; Wand, M.E.; Briers, Y.; Lavigne, R.; Sutton, J.M.; Reynolds, D.M. Characterisation and genome 578

sequence of the lytic Acinetobacter baumannii bacteriophage vB_AbaS_Loki. PLoS One 2017, 12, e0172303. 579

69. Ma, Y.; Li, E.; Qi, Z.; Li, H.; Wei, X.; Lin, W.; Zhao, R.; Jiang, A.; Yang, H.; Yin, Z.; et al. Isolation and molecular 580

characterisation of Achromobacter phage phiAxp-3, an N4-like bacteriophage. Sci. Rep. 2016, 6, 24776. 581

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 35: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

23

70. Bryan, M.J.; Burroughs, N.J.; Spence, E.M.; Clokie, M.R.J.; Mann, N.H.; Bryan, S.J. Evidence for the intense 582

exchange of MazG in marine cyanophages by horizontal gene transfer. PLoS One 2008, 3, 1–12. 583

71. Lavigne, R.; Burkal’tseva, M. V; Robben, J.; Sykilinda, N.N.; Kurochkina, L.P.; Grymonprez, B.; Jonckx, B.; 584

Krylov, V.N.; Mesyanzhinov, V. V; Volckaert, G. The genome of bacteriophage φKMV, a T7-like virus infecting 585

Pseudomonas aeruginosa. Virology 2003, 312, 49–59. 586

72. Adriaenssens, E.M.; Ceyssens, P.-J.; Dunon, V.; Ackermann, H.-W.; Van Vaerenbergh, J.; Maes, M.; De Proft, M.; 587

Lavigne, R. Bacteriophages LIMElight and LIMEzero of Pantoea agglomerans, belonging to the &quot;phiKMV-588

like viruses&quot;. Appl. Environ. Microbiol. 2011, 77, 3443–50. 589

73. Ceyssens, P.-J.; Lavigne, R.; Mattheus, W.; Chibeu, A.; Hertveldt, K.; Mast, J.; Robben, J.; Volckaert, G. Genomic 590

analysis of Pseudomonas aeruginosa phages LKD16 and LKA1: establishment of the phiKMV subgroup within 591

the T7 supergroup. J. Bacteriol. 2006, 188, 6924–31. 592

74. Hua, Y.; An, X.; Pei, G.; Li, S.; Wang, W.; Xu, X.; Fan, H.; Huang, Y.; Zhang, Z.; Mi, Z.; et al. Characterization of 593

the morphology and genome of an Escherichia coli podovirus. Arch. Virol. 2014, 159, 3249–3256. 594

75. Sváb, D.; Falgenhauer, L.; Rohde, M.; Chakraborty, T.; Tóth, I. Complete genome sequence of C130_2, a novel 595

myovirus infecting pathogenic Escherichia coli and Shigella strains. Arch. Virol. 2019, 164, 321–324. 596

76. Bandyopadhyay, K.; Parua, P.K.; Datta, A.B.; Parrack, P. Escherichia coli HflK and HflC can individually inhibit 597

the HflB (FtsH)-mediated proteolysis of λCII in vitro. Arch. Biochem. Biophys. 2010, 501, 239–243. 598

599

600

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 36: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

24

SUPPLEMENTARY FIGURES AND TABLES 601

Figure S1 Microviridae 602

(a) (b) (c)

(d)

Figure S4 (a) Phylogenetic tree (Maximum log Likelihood: -6065.3, DNA replication protein gpA), scalebar: substitutions 603 per site (b) Phylogenomic nucleotide distances (Gegenees, BLASTn: fragment size: 200, step size: 100, threshold: 0%). (c) 604 Phylogenomic amino acid distances (Gegenees, tBLASTx: fragment size: 200, and step size: 100, threshold: 0%). (d) 605 Pairwise alignment of phage genomes, the colour bars between genomes indicate percent pairwise similarity as illustrated 606 by the colour bar in the upper right corner (Easyfig, BlASTn). Gene annotations are provided for phages identified in this 607 study and deposited in GenBank. 608 609

Coliphage G4Coliphage ID52

Escherichia phage SECphi17Escherichia phage Lillemer

Escherichia phage LilletoEscherichia phage Lilleduescherichia phage Lilleput

Escherichia phage LilleenEnterobacteria phage alpha3Escherichia phage Lilleven

Enterobacteria phage St-10.050

Organism 1 2 3 4 5 6 7 8 9 10 11

1: NC_001420_G4 100 62 20 19 19 21 17 21 1 1 1

2: DQ079877_ID52 63 100 15 20 19 20 17 19 2 2 2

3: Escherichia_phage_SECphi17 22 17 100 74 72 73 76 72 1 2 2

4: MG_187_C21_Esch_SECphi17 21 20 76 100 92 93 91 88 0 0 0

5: MG_179_C1_Esch_SECphi17 21 19 74 92 100 92 88 86 1 2 1

6: MG_107_C2_Esch_SECphi17 24 20 76 93 92 100 91 87 1 1 1

7: MG_044_C1_Esch_SECphi17 18 17 76 90 88 90 100 90 1 1 1

8: MG_138_C36_Esch_SECphi17 22 20 74 89 88 89 92 100 1 1 1

9: NC_001330_alpha3 2 2 1 0 1 1 1 1 100 63 66

10: MG_038_C11_Ent_St-1 2 2 1 0 1 1 1 1 67 100 83

11: Enterobacteria_phage_St-1 1 2 1 0 1 1 1 1 66 84 100

Heat plot for: Comparison[blastn_micro_190619], Alignment[analysis], Analysis[*default*] file:///Users/au574427/Desktop/blastn_micro_190619.html

1 of 1 20/06/2019, 14.04

Organism 1 2 3 4 5 6 7 8 9 10 11

1: NC_001420_G4 100 88 62 63 62 62 62 63 42 42 42

2: DQ079877_ID52 88 100 62 62 62 63 62 63 42 41 41

3: Escherichia_phage_SECphi17 62 62 100 89 88 90 90 86 42 42 42

4: MG_187_C21_Esch_SECphi17 62 62 89 100 95 95 94 90 41 41 41

5: MG_179_C1_Esch_SECphi17 62 62 89 95 100 94 93 90 42 42 41

6: MG_107_C2_Esch_SECphi17 62 63 90 96 94 100 93 90 42 42 42

7: MG_044_C1_Esch_SECphi17 62 62 90 94 93 93 100 92 42 42 41

8: MG_138_C36_Esch_SECphi17 64 64 88 92 92 91 94 100 43 43 42

9: NC_001330_alpha3 39 39 39 39 38 39 39 39 100 86 87

10: MG_038_C11_Ent_St-1 39 39 39 39 39 39 39 39 87 100 90

11: GQ149088_St1 39 39 39 39 38 39 39 39 87 89 100

Heat plot for: Comparison[tblastx_micor_190619], Alignment[analysis], Analysis[*default*] file:///Users/au574427/Desktop/tblastx_micro-190619.html

1 of 1 20/06/2019, 14.04

67%

100%

Minor spike (gpH)

Minor spike (gpH)

Minor spike (gpH)

Minor spike (gpH)

Minor spike (gpH)

Minor spike (gpH)

gpG

gpG

gpG

gpG

gpG

gpG

Major coat (gpF)

Major coat (gpF)

Major coat (gpF)

Major coat (gpF)

Major coat (gpF)

Major coat (gpF)

gpD

gpD

gpD

gpD

gpD

gpD

gpC

gpC

gpC

gpC

gpC

gpC

DNA replication (gpA)

DNA replication (gpA)

DNA replication (gpA)

DNA replication (gpA)

DNA replication (gpA)

DNA replication (gpA)

G4

ID52

SECphi17

Lilleto

Lillemer

Lilledu

Lilleput

Lilleen

Alpha3

Lilleven

St-1

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 37: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

25

Figure S2 Hanrivervirus 610

1 Damhaus

2 Herni

3 Grams

4 Vojen

5 Aaroes

6 Aalborv

7 Haarsle

8 Egaa

9 Shigella

phage pSf-1 611

612

Figure S2 Phylogenomic amino acid distances (Gegenees, tBLASTx: fragment size: 200, and step size: 100, threshold: 0%). 613 and Pairwise alignment of phage genomes (Easyfig, BlASTn), the colour bars between genomes indicate percent pairwise 614 similarity as illustrated by the colour bar in the lower right corner (Easyfig, BlASTn). Genome order is the same in both 615 analyses. 616

617

Organism 1 2 3 4 5 6 7 8 9

1: MG_010_C12_Shi_pSf-1 100 84 85 85 88 79 81 85 83

2: MG_086_C1_Sal_pSf-1 84 100 86 86 85 82 80 90 85

3: MG_106_C6_Shi_pSf-1 87 88 100 86 88 81 82 87 84

4: MG_142_C100_Shi_pSf-1 85 86 84 100 84 81 82 89 85

5: MG_146_C2_Shi_pSf-1 88 84 85 83 100 79 80 84 81

6: MG_182_C1_Shi_pSf-1 85 88 85 87 86 100 83 86 84

7: MG_187_C4_Shi_pSf-1 85 84 83 84 84 80 100 86 84

8: MG_193_C1_Shi_pSf-1 85 89 84 87 84 79 82 100 85

9: Shigella phage pSf-1 82 84 81 84 81 77 80 85 100

Heat plot for: Comparison[hanriver_tblastx-psf1_020419], Alig... file:///Users/au574427/Desktop/Coli_siphos/Hanriver/hanriver_p...

1 of 1 02/04/2019, 13.43

68%

100%

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 38: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

26

Figure S3 Dhillonvirus 618

619

Figure S3 Pairwise alignment of phage genomes (Easyfig, BlASTn), the colour bars between genomes indicate percent 620 pairwise similarity as illustrated by the colour bar in the lower right corner (Easyfig, BlASTn). 621

622

70%

100%

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint

Page 39: Exploring the remarkable diversity of Escherichia coli · 1/19/2020  · 74 coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration

Table S1 Auxiliary metabolism genes 623

624

Gene Protein KO number Function in bacteriophages Count phages with this gene

rmlC dTDP-4-dehydrorhamnose 3,5-epimerase K01790 dtdp rhamnose biosynthesis 10 Cluster VI and VII

rmlA glucose-1-phosphate thymidylyltransferase K00973 dtdp rhamnose biosynthesis 8 Cluster VII, ukendt and tuntematon

rmlB dTDP-glucose 4,6-dehydratase K01710 dtdp rhamnose biosynthesis 8 Cluster VII, ukendt and tuntematon

rmlD dTDP-4-dehydrorhamnose reductase K00067 dtdp rhamnose biosynthesis 10 Cluster VI and VII

dcm DNA (cytosine-5)-methyltransferase K17398 DNA modification 5 Cluster VII

NAMPT nicotinamide phosphoribosyltransferase K03462 unknown 25 Felixounavirus, Suspvirus and Cluster VII

mec CysO sulfur-carrier protein]-S-L-cysteine

hydrolase K21140 tail fiber protein 20 Hanrivervirus and unclassifed Tunavirinae

folA dihydrofolate reductase K00287 de novo synthesis of pyrimidines,

thymidylic acid, and certain amino acids 33

Felixounavirus, Suspvirus, Krischvirus, Dhakavirus, Mosigvirus

and Tequatrovirus

aroAG 3-deoxy-7-phosphoheptulonate synthase K01626 unknown 3 Three of the four Mosigvirus

AUP1 UDP-N-acetylglucosamine diphosphorylase K23144 DNA modification 4 Mosigvirus

KdsD arabinose-5-phosphate isomerase K06041 DNA modification 4 Mosigvirus

dcm DNA (cytosine-5)-methyltransferase K00558 DNA modification 1 Pangalan

queC 7-cyano-7-deazaguanine synthase K06920 DNA modification 1 Skure

queE 7-carboxy-7-deazaguanine synthase K10026 DNA modification 1 Skure

folE GTP cyclohydrolase K01495 DNA modification 1 Skure

queD 6-carboxy-5,6,7,8-tetrahydropterin synthase K01737 DNA modification 1 Skure

625

.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in

The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint