

ORIGINAL ARTICLE

Genome characteristics of a generalist marine bacterial lineage

Ryan J Newton¹, Laura E Griffin¹, Kathy M Bowles¹, Christof Meile¹, Scott Gifford¹, Carrie E Givens¹, Erinn C Howard¹, Eric King¹, Clinton A Oakley², Chris R Reisch³, Johanna M Rinta-Kanto¹, Shalabh Sharma¹, Shulei Sun¹, Vanessa Varaljay³, Maria Vila-Costa¹,⁴, Jason R Westrich⁵ and Mary Ann Moran¹

¹Department of Marine Sciences, University of Georgia, Athens, GA, USA; ²Department of Plant Biology, University of Georgia, Athens, GA, USA; ³Department of Microbiology, University of Georgia, Athens, GA, USA; ⁴Group of Limnology-Department of Continental Ecology, Centre d’Estudis Avançats de Blanes-CSIS, Catalunya, Spain and ⁵Odum School of Ecology, University of Georgia, Athens, GA, USA

Members of the marine Roseobacter lineage have been characterized as ecological generalists, suggesting that there will be challenges in assigning well-delineated ecological roles and biogeochemical functions to the taxon. To address this issue, genome sequences of 32 Roseobacter isolates were analyzed for patterns in genome characteristics, gene inventory, and individual gene/pathway distribution using three predictive frameworks: phylogenetic relatedness, lifestyle strategy and environmental origin of the isolate. For the first framework, a phylogeny containing five deeply branching clades was obtained from a concatenation of 70 conserved single-copy genes. Somewhat surprisingly, phylogenetic tree topology was not the best model for organizing genome characteristics or distribution patterns of individual genes/pathways, although it provided some predictive power. The lifestyle framework, established by grouping isolates according to evidence for heterotrophy, photoheterotrophy or autotrophy, explained more of the gene repertoire in this lineage. The environment framework had weak predictive power for the overall genome content of each strain, but explained the distribution of several individual genes/pathways, including those related to phosphorus acquisition, chemotaxis and aromatic compound degradation. Unassembled sequences in the Global Ocean Sampling metagenomic data independently verified this global-scale geographical signal in some Roseobacter genes. The primary findings emerging from this comparative genome analysis are that members of the lineage cannot be easily collapsed into just a few ecologically differentiated clusters (that is, there are almost as many clusters as isolates); the strongest framework for predicting genome content is trophic strategy, but no single framework gives robust predictions; and previously unknown homologs to genes for H2 oxidation, proteorhodopsin-based phototrophy, xanthorhodopsin-based phototrophy, and CO2 fixation by Form IC ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) expand the possible mechanisms for energy and carbon acquisition in this remarkably versatile bacterial lineage.
The ISME Journal (2010) 4, 784–798; doi:10.1038/ismej.2009.150; published online 14 January 2010
Subject Category: integrated genomics and post-genomics approaches in microbial ecology
Keywords: genome; marine; phylogeny; Roseobacter

Introduction

Since the discovery of their abundance in marine bacterioplankton communities two decades ago (González and Moran, 1997), members of the marine Roseobacter lineage have emerged as important model organisms for marine microbial ecology. The group spans multiple described genera (at least 45), encompasses a comparatively large sequence variation among 16S rRNA genes (up to 11%) and has a poorly resolved within-taxon phylogeny (Buchan et al., 2005; Wagner-Döbler and Biebl, 2006; Brinkhoff et al., 2008). The recent availability of genome sequences (currently, 5 closed and 27 draft) from cultured members of the Roseobacter lineage provides a detailed inventory of the metabolic and ecological capabilities of each strain (albeit limited by the accuracy of annotation), a basis for comparative analyses among strains, and a means to examine predictive frameworks for the lineage.

Received 5 October 2009; revised 30 November 2009; accepted 2 December 2009; published online 14 January 2010

Correspondence: MA Moran, Department of Marine Sciences, University of Georgia, Marine Science Building, Athens, GA 30602-3636, USA. E-mail: [email protected]

The ISME Journal (2010) 4, 784–798
© 2010 International Society for Microbial Ecology All rights reserved 1751-7362/10 $32.00

www.nature.com/ismej

Genome sequences of other ocean microbes have been used to explore niches and resource partitioning within taxa. Multiple genome sequences and robust phylogenies for Prochlorococcus have revealed that the distribution of ecologically important gene systems (for example, light harvesting (Garczarek et al., 2000; Hess et al., 2001; Bibby et al., 2003) and DNA repair mechanisms (Scanlan et al., 2009)) strongly correlate with the phylogenetic structure of this genus (Rocap et al., 2003; Coleman and Chisholm, 2007). In addition, genome content differences among both Prochlorococcus and Synechococcus strains have been linked to variations in the environments from which the strains were isolated (West and Scanlan, 1999; Johnson et al., 2006; Martiny et al., 2006; Palenik et al., 2006; Dufresne et al., 2008). For members of the Vibrionaceae, small-scale differences in environmental conditions based on microenvironment and season have been shown to drive lineage adaptation (Hunt et al., 2008) and presumably genome content. The phylogenetic- and environment-based frameworks used to interpret data from studies such as these have facilitated the development of predictive community structure models for marine microbes (Follows et al., 2007; Rabouille et al., 2007).

Members of the Roseobacter lineage have been characterized as ecological generalists (Moran et al., 2004, 2007; Polz et al., 2006). Although the first cultured roseobacters were aerobic anoxygenic phototrophs (AAnPs) (Shiba et al., 1979), numerous heterotrophic strains have since been found (Blankenship et al., 1995; Shimada, 1995; Buchan et al., 2005). Cultured roseobacters have a surprisingly flexible suite of mechanisms for energy and carbon acquisition, including carbon monoxide and hydrogen sulfide oxidation (King, 2003; Moran et al., 2004), and anaplerotic CO2 fixation (Sorokin et al., 2003; Moran et al., 2004; Swingley et al., 2007). Together with the observation that this lineage’s genomes are large and variable, a picture of considerable trophic versatility among roseobacters has emerged (Buchan et al., 2005; Wagner-Döbler and Biebl, 2006; Brinkhoff et al., 2008).

The 32 Roseobacter genomes provide an unprecedented opportunity to examine the scope of extant gene systems and to explore various ecological and evolutionary perspectives that might distinguish functionally differentiated clusters within this lineage. In this study, we consider three theoretical ecological/evolutionary frameworks as possible predictors of the gene repertoires of the 32 Roseobacter strains. As a robust Roseobacter phylogeny has yet to emerge from rRNA gene analysis (Buchan et al., 2005; Brinkhoff et al., 2008), we first develop a well-supported phylogeny for the lineage from a concatenation of conserved, single-copy genes, and within this phylogenetic structure we examine the evolutionary relationships as possible constraints on genome content and predictors of the genetic capabilities of each strain. Next, we explore lifestyle strategy (heterotroph, photoheterotroph or autotroph) as a possible driver of genome attributes (that is, imposing or releasing bacteria from constraints on genome content). Finally, we ask whether environmental conditions (defined here by the geographical location of isolation) might best explain the observed differences in genetic traits and the retention or acquisition of specific gene systems.

Materials and methods

Genome sequencing, annotation and completeness index

The 32 Roseobacter genomes publicly available as of 15 August 2008 were used in analyses (see Table 1 for genome details). Sequencing and annotation methods for Ruegeria pomeroyi DSS-3 (Moran et al., 2004; formerly Silicibacter), Ruegeria sp. TM1040, Jannaschia sp. strain CCS1 (Moran et al., 2007), Roseobacter denitrificans OCh 114 (Swingley et al., 2007) and Dinoroseobacter shibae DFL12 (Wagner-Döbler et al., 2009) are described elsewhere. The remaining genomes (Table 1) were sequenced and auto-annotated by the J Craig Venter Institute as part of the Moore Foundation Microbial Genome Sequencing Project (see http://moore.jcvi.org/moore/ for details).

In all, 5 of the 32 Roseobacter genomes have been assigned closed genome status, and we used these genomes as the basis for our genome completeness index. The protein sequences of 143 universal single-copy bacterial genes (Santos and Ochman, 2004; Santos, personal communication) were used in a BLASTp query against the five closed Roseobacter genomes; manual gene-calling based on alignment score, E-value, and contextual analysis was used to determine the presence or absence of genes in the five closed genomes. Of these 143 genes, 111 were determined to be unambiguously present in all five genomes (see Supplementary Table S1). The protein sequences of these 111 genes were then used in BLASTp analysis against the remaining 27 genomes. The percent presence of these 111 genes in a single genome constituted that genome’s completeness index (Table 1).
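The completeness index described above is a simple proportion, and a minimal sketch may make the arithmetic concrete. The function name and the placeholder gene identifiers below are assumptions made for illustration; the actual marker list is in Supplementary Table S1 and is not reproduced here.

```python
def completeness_index(marker_genes, genes_found):
    """Percent of marker genes detected in one genome."""
    markers = set(marker_genes)
    if not markers:
        raise ValueError("marker gene set is empty")
    present = markers & set(genes_found)
    return 100.0 * len(present) / len(markers)


# Hypothetical identifiers standing in for the 111 universal single-copy genes.
markers = [f"gene_{i:03d}" for i in range(111)]

# A draft genome in which 107 of the 111 markers were detected.
draft_hits = markers[:107]

print(round(completeness_index(markers, draft_hits)))  # 107/111 rounds to 96
```

With 107 of 111 markers present the index rounds to 96%, which is the lowest value reported in the text for this data set.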

Phylogenetic tree inference

Of the 111 universal single-copy genes identified in the 32 Roseobacter genomes, only genes that were completely sequenced in all genomes and had no ambiguous start/stop sites were used in phylogenetic analyses. These 70 genes (Supplementary Table S1) were concatenated and aligned with ClustalW in Geneious 4.0 (available from http://www.geneious.com) using Escherichia coli K12 substrain MG1655 as the outgroup. The alignment was imported into ARB (Ludwig et al., 2004), where it was heuristically adjusted, and a filter was created to remove all positions containing gaps in the alignment. The resultant alignment of 25 316 positions was used in subsequent phylogenetic reconstruction analyses in ARB (neighbor joining with point accepted mutation substitution matrix and 100 bootstrap runs) and in RAxML (Stamatakis et al., 2008) at CIPRES (http://www.phylo.org; maximum likelihood analysis with 200 bootstrap

Comparison of 32 Roseobacter genomes
RJ Newton et al


Table 1 Roseobacter genome characteristics

| Organism | Cladeᵃ | Isolation source (category) | Phototrophy genesᵇ | Genome size (Mb) | rRNA operons (16S/23S)ᶜ | No. of contigs | Genome completeness (%)ᵈ | G+C content (%) |
|---|---|---|---|---|---|---|---|---|
| Closed genome | | | | | | | | |
| Dinoroseobacter shibae DFL12 | 5 | Prorocentrum lima, Bay of Tokyo (E) | AAnP | 4.35 | 2/2 | 6 | 100 | 65 |
| Jannaschia sp. CCS1 | 5 | Bodega Head, USA surface water (P) | AAnP | 4.40 | 1/1 | 2 | 100 | 62 |
| Roseobacter denitrificans OCh 114 | 2 | Enteromorpha linza, Australia (E) | AAnP | 4.13 | 1/1 | 5 | 100 | 58 |
| Ruegeria pomeroyi DSS-3 | 1 | Coastal Georgia, USA surface water (A) | None | 4.60 | 3/3 | 2 | 100 | 64 |
| Ruegeria sp. TM1040 | 1 | Pfiesteria piscicida, Chesapeake Bay (E) | None | 4.15 | 5/5 | 3 | 100 | 60 |
| Draft genome | | | | | | | | |
| Loktanella vestfoldensis SKA53 | 4 | North Atlantic surface water (A) | AAnP | 3.06 | 1-1p/1 | 14 | 99 | 65 |
| Maritimibacter alkaliphilus HTCC2654 | None | Sargasso Sea 10 m water (A) | None | 4.53 | 1/1 | 46 | 99 | 64 |
| Pelagibaca bermudensis HTCC2601 | 3 | Sargasso Sea 10 m water (A) | RuBisCO | 5.43 | 4/4 | 103 | 98 | 66 |
| Oceanibulbus indolifex HEL-45 | 2 | North Sea 10 m water (A) | None | 4.11 | 3/3 | 105 | 100 | 59 |
| Oceanicola batsensis HTCC2597 | 3 | Sargasso Sea 10 m water (A) | None | 4.44 | 1/1 | 23 | 99 | 66 |
| Oceanicola granulosus HTCC2516 | 4 | Sargasso Sea 10 m water (A) | None | 4.04 | 4/4 | 85 | 100 | 70 |
| Octadecabacter antarcticus 307 | 4 | McMurdo Sound (H) | Xanthorhodopsin | 4.89 | 2/2 | 58 | 100 | 54 |
| Octadecabacter arcticus 238 | 4 | Offshore Deadhorse, Alaska (H) | Xanthorhodopsin | 5.39 | 2/2 | 80 | 96 | 55 |
| Phaeobacter gallaeciensis 2.10 | 1 | Ulva lactuca, Australian waters (E) | None | 4.16 | 4/4 | 33 | 100 | 59 |
| Phaeobacter gallaeciensis BS107 | 1 | Pecten maximus, Spanish waters (E) | None | 4.23 | 4/5 | 24 | 100 | 59 |
| Rhodobacterales bacterium HTCC2083 | 2 | Coastal Oregon 10 m water (P) | AAnP | 4.02 | 2/2 | 20 | 99 | 53 |
| Rhodobacterales bacterium HTCC2150 | None | Coastal Oregon, surface water (P) | None | 3.58 | 2/2 | 25 | 98 | 49 |
| Rhodobacterales bacterium HTCC2255 | None | Coastal Oregon, 10 m water (P) | Proteorhodopsin | 4.81 | 0-1p/0 | 70 | 96 | 38 |
| Rhodobacterales bacterium Y4I | 1 | Coastal Georgia, USA water (A) | None | 4.33 | 4/4 | 63 | 99 | 64 |
| Roseobacter litoralis Och 149 | 2 | Seaweed (E) | AAnP | 4.68 | 1/1 | 27 | 99 | 57 |
| Roseobacter sp. AzwK-3b | 3 | Estuary Monterey Bay water (P) | AAnP | 4.18 | 2/2 | 31 | 100 | 61 |
| Roseobacter sp. CCS2 | 4 | Bodega Head, USA surface water (P) | AAnP | 3.50 | 1/1 | 11 | 99 | 55 |
| Roseobacter sp. GAI101 | 2 | Coastal Georgia, USA water (A) | None | 4.25 | 4/4 | 67 | 99 | 58 |
| Roseobacter sp. MED193 | 1 | NW Mediterranean 1 m water (A) | None | 4.65 | 1-1p/1-4p | 19 | 100 | 57 |
| Roseobacter sp. SK209-2-6 | 1 | Arabian Sea O2-min., 267 m water (I) | None | 4.56 | 5/5 | 29 | 100 | 57 |
| Roseovarius nubinhibens ISM | 3 | Caribbean Sea surface water (A) | None | 3.67 | 2/2 | 10 | 100 | 63 |
| Roseovarius sp. 217 | 3 | Coastal England surface water (A) | AAnP | 4.76 | 1-1p/1-1p | 37 | 100 | 60 |
| Roseovarius sp. TM1035 | 3 | Pfiesteria piscicida, Chesapeake Bay (E) | AAnP | 4.21 | 3/3 | 15 | 100 | 60 |
| Ruegeria sp. R11 | 1 | Delisea pulchra pathogen, Australia (E) | None | 3.82 | 4/4 | 17 | 98 | 60 |
| Sagittula stellata E-37 | 3 | Coastal Georgia, USA water (A) | None | 5.26 | 2/2 | 39 | 98 | 65 |
| Sulfitobacter NAS-14.1 | 2 | Coastal Georgia, USA surface water (A) | None | 4.00 | 4/4 | 27 | 100 | 60 |
| Sulfitobacter sp. EE-36 | 2 | North Atlantic surface water (A) | None | 3.54 | 4/4 | 15 | 100 | 60 |

Abbreviations: A, Atlantic Ocean; E, Eukaryote associated; H, polar ocean (high latitude; Arctic or Antarctic); I, Indian Ocean; P, Pacific Ocean.
ᵃ Defined by the phylogenetic tree shown in Figure 1.
ᵇ Genomes were defined as aerobic anoxygenic phototroph (AAnP) if the genome contained the puf operon and bacteriochlorophyll a; as none if the genome contained no light-harvesting genes; as xantho/proteorhodopsin if the genome contained the specific rhodopsin gene; and as RuBisCO if the genome contained a RuBisCO homolog and homologs to the Calvin–Benson–Bassham cycle.
ᶜ For example, 1-1p indicates that this organism has one fully sequenced rRNA gene and one partially sequenced rRNA gene.
ᵈ Genome completeness was defined by examining 111 universal genes found in all closed Roseobacter genomes (see Materials and methods).


runs and Jones–Taylor–Thornton (JTT) substitution model). The best-fit maximum likelihood tree is reported along with bootstrap values from each phylogenetic inference method.
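The gap filter mentioned in the tree-inference methods (removal of every alignment position that contains a gap in any sequence) can be illustrated with a short sketch. This is not ARB's implementation; the function name, the gap characters and the toy sequences are assumptions made for the example.

```python
def drop_gapped_columns(alignment, gap_chars="-."):
    """Keep only alignment columns in which no sequence has a gap.

    alignment: dict mapping sequence name -> aligned sequence string.
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if len({len(seq) for seq in seqs}) > 1:
        raise ValueError("sequences must be aligned to equal length")
    # zip(*seqs) iterates over columns; a column survives only if gap-free.
    kept = [col for col in zip(*seqs) if not any(c in gap_chars for c in col)]
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


# Toy protein alignment (hypothetical sequences, not from the study).
toy = {"strain_A": "MK-LV", "strain_B": "MKTLV", "outgroup": "MRT-V"}
filtered = drop_gapped_columns(toy)
print(filtered)  # columns 3 and 4 are dropped because each contains a gap
```

Filtering this strictly discards any column with even one gap, which trades alignment length for confidence that every retained position is homologous across all taxa.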

Identification of orthologs and ecologically relevant genes

Orthologs among the 32 genomes were identified by sequential two-way reciprocal best-hit (RBH) analysis, beginning with the R. pomeroyi and Rhodobacterales HTCC2255 genome comparison and continuing by adding each of the remaining 30 genomes one at a time. The RBH Basic Local Alignment Search Tool (BLAST) thresholds were set at E-value < 10⁻⁵ and amino acid identity > 30%. The RBH results were subsequently compiled into a single matrix containing the distribution of all shared genes and used for the genome content comparisons described below (see Supplementary Table S2 for the matrix). This relaxed ortholog definition was used because an all-way RBH requirement was unworkable for the large number of genomes, each containing gene families represented by multiple members. We tested whether the order of the sequential best-hit analysis resulted in substantial changes in the ortholog matrix or the outcome of the analyses (that is, by using a different order of adding genomes in the pair-wise RBH), and found it did not.
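A two-way reciprocal best-hit comparison of the kind described above can be sketched as follows. The thresholds (E-value below 10⁻⁵, identity above 30%) come from the text; ranking hits by bit score, the tuple layout and all gene names are assumptions for illustration, since the source does not state how the best hit was ranked.

```python
def best_hits(hits, max_evalue=1e-5, min_identity=30.0):
    """Return {query: best subject} among hits passing both thresholds.

    hits: iterable of (query, subject, evalue, percent_identity, bitscore).
    The best hit is taken to be the one with the highest bit score (assumption).
    """
    best = {}
    for query, subject, evalue, identity, bitscore in hits:
        if evalue >= max_evalue or identity <= min_identity:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a, **thresholds):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b, **thresholds)
    reverse = best_hits(b_vs_a, **thresholds)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical BLAST results between two genomes.
a_vs_b = [("a1", "b1", 1e-50, 80.0, 200.0), ("a1", "b2", 1e-20, 45.0, 90.0),
          ("a2", "b2", 1e-30, 55.0, 120.0), ("a3", "b3", 1e-3, 60.0, 40.0)]
b_vs_a = [("b1", "a1", 1e-48, 80.0, 198.0), ("b2", "a1", 1e-20, 45.0, 90.0),
          ("b2", "a2", 1e-29, 55.0, 118.0), ("b3", "a3", 1e-3, 60.0, 40.0)]

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)  # a3/b3 fails the E-value threshold and is excluded
```

In the sequential scheme described in the text, a routine like this would be applied once per added genome, with the resulting pairs merged into the growing ortholog matrix.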

In addition to whole-genome ortholog identification, a select group of ecologically relevant genes/gene pathways was also identified using representative protein sequences of the target genes from a Roseobacter for which the gene functions had been experimentally verified. If no Roseobacter met this criterion, then a protein sequence was obtained from the closest Roseobacter relative containing the desired experimentally verified gene. All query protein sequences were used in BLASTp analysis against the Roseobacter genome database (http://www.roseobase.org). BLAST E-values, gene neighborhoods and clusters of orthologous group assignments (Tatusov et al., 2003) were manually examined and used to determine the presence or absence of these genes and pathways in each of the 32 genomes.

Classification schemes

The 32 isolate genomes were sorted into groups within each of the three frameworks. First, five deeply branching nodes in our phylogenetic inference best-fit tree were chosen to distinguish isolate groups based on shared ancestry and were designated Clades 1–5. Next, we categorized isolates into lifestyles based on their trophic status: heterotrophic, photoheterotrophic (that is, heterotrophic but likely subsidized by aerobic anoxygenic phototrophy or rhodopsin-based phototrophy) or autotrophic. Organisms were considered AAnPs based on the presence of the puf operon and genes for the synthesis of bacteriochlorophyll a; they were considered rhodopsin-supplemented photoheterotrophs based on the presence of gene orthologs for proteorhodopsin or xanthorhodopsin; and they were considered autotrophs based on the presence of RuBisCO and the Calvin–Benson–Bassham pathway. Although these designations were made from draft genome sequences for many strains, the high genome completeness index suggests they are largely correct. Finally, isolates were classified into one of five broad environmental categories based on the source of isolation: Pacific Ocean, Atlantic Ocean, Indian Ocean, polar oceans or eukaryote-associated (Table 1).
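The lifestyle rules above amount to a small decision procedure over gene presence. The sketch below encodes them directly; the feature labels are hypothetical names chosen for this example, and the order in which the rules are checked is an assumption, because the text does not say how a genome matching more than one rule would be handled.

```python
def classify_lifestyle(features):
    """Assign a trophic category from a set of detected gene systems.

    features: iterable of hypothetical labels such as "puf_operon",
    "bchl_a_synthesis", "proteorhodopsin", "xanthorhodopsin",
    "rubisco" and "cbb_pathway".
    """
    found = set(features)
    # Rule order is an assumption; the source does not specify precedence.
    if {"rubisco", "cbb_pathway"} <= found:
        return "autotroph"
    if {"puf_operon", "bchl_a_synthesis"} <= found:
        return "photoheterotroph (AAnP)"
    if found & {"proteorhodopsin", "xanthorhodopsin"}:
        return "photoheterotroph (rhodopsin)"
    return "heterotroph"


print(classify_lifestyle({"puf_operon", "bchl_a_synthesis"}))
print(classify_lifestyle({"xanthorhodopsin"}))
print(classify_lifestyle({"rubisco", "cbb_pathway"}))
print(classify_lifestyle(set()))
```

Requiring both members of a pair (for example, the puf operon and bacteriochlorophyll a synthesis genes) mirrors the text's joint criteria, so a genome carrying only one of the two falls through to the next rule.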

Identification of genes in the Global Ocean Sampling

Global Ocean Sampling (GOS) sample sites, sampling procedures and sequencing methods are described elsewhere (Rusch et al., 2007; Yooseph et al., 2007). A subset of Roseobacter protein sequences representing each of the major biogeochemical pathways and processes that we examined (Supplementary Table S3) was used in a BLASTp query against the unassembled GOS data set at the Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis (CAMERA) (Seshadri et al., 2007). Gene hits from the GOS samples were retained for further analysis as potential orthologs at E-value cutoffs ranging between 10⁻⁸⁰ and 10⁻²⁰, depending on the gene. Paired reads were then removed before the resultant matches were used in BLAST analysis against the All Prokaryotic Proteins (P) database. Only gene matches that had a best hit to a gene in a Roseobacter genome were retained for further analysis. Finally, protein sequences from the Roseobacter-like GOS matches underwent a BLASTp query at GenBank, and were eliminated if their top alignment scores were to proteins with a different annotated function than the original query protein.

To compare gene counts between oceans, the Roseobacter-like metagenomic sequences obtained from the GOS data underwent several normalizations. Counts for functional genes retrieved at each sample location were size-normalized to the length of the recA gene from E. coli K12 substrain MG1655 to account for effects of size on the probability of sampling (Howard et al., 2008). The number of Roseobacter genome equivalents for each sample location was then calculated by averaging size-normalized Roseobacter-like gene counts of the universal single-copy genes recA and rpoB. To estimate per-cell frequency for each examined Roseobacter gene (listed in Supplementary Table S3), the sample gene counts were summed by ocean basin (Atlantic, Pacific and Indian) and divided by the number of Roseobacter genome equivalents for that basin. Only coastal and open ocean GOS sample sites were considered, with estuaries, embayments, lagoon reefs, fringing reefs, freshwater, mangroves, coral reefs, hypersaline lagoons, warm seeps and harbors excluded from the analyses (Supplementary Table S4).
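The normalization chain described above (size-normalize each count to recA length, average recA and rpoB to get genome equivalents, then divide summed gene counts by summed genome equivalents per basin) can be sketched as below. The gene lengths and counts are round placeholder numbers rather than real values, "geneX" is a hypothetical target gene, and summing genome equivalents across the sites of a basin is an assumed reading of the text.

```python
def size_normalize(count, gene_length, ref_length):
    """Scale a raw hit count to what a gene of the reference length would yield."""
    return count * ref_length / gene_length


def per_cell_frequency(samples, gene, lengths, ref_gene="recA",
                       markers=("recA", "rpoB")):
    """Estimated copies of `gene` per Roseobacter genome equivalent in one basin.

    samples: list of {gene name: raw hit count}, one dict per sample site.
    lengths: {gene name: gene length}, all in the same units.
    """
    ref_len = lengths[ref_gene]
    gene_total = 0.0
    genome_equivalents = 0.0
    for counts in samples:
        gene_total += size_normalize(counts.get(gene, 0), lengths[gene], ref_len)
        marker_norm = [size_normalize(counts.get(m, 0), lengths[m], ref_len)
                       for m in markers]
        genome_equivalents += sum(marker_norm) / len(marker_norm)
    if genome_equivalents == 0:
        raise ValueError("no marker gene hits; genome equivalents undefined")
    return gene_total / genome_equivalents


# Placeholder lengths and counts chosen only to make the arithmetic easy.
lengths = {"recA": 1000, "rpoB": 4000, "geneX": 2000}
basin_samples = [{"recA": 10, "rpoB": 40, "geneX": 8},
                 {"recA": 6, "rpoB": 24, "geneX": 4}]

freq = per_cell_frequency(basin_samples, "geneX", lengths)
print(freq)  # (4 + 2) / (10 + 6) = 0.375
```

Normalizing to a single reference length puts long and short genes on the same footing, since a longer gene is proportionally more likely to be hit by a random shotgun read.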

The clade distribution among ocean basins was determined using the recA gene sequence. After retrieving Roseobacter-like RecA sequences by BLASTp analysis against the GOS database at CAMERA (as described above), each individual protein sequence was used as a query sequence in a subsequent BLASTp analysis against all 32 genomes at Roseobase (http://www.roseobase.org). The Roseobacter-like RecA sequences from GOS were then assigned to a clade according to their best match among the 32 genomes, and the occurrences of each clade were summed across samples in each ocean basin.
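The best-match clade assignment described above reduces to a lookup followed by a tally per basin. In the sketch below the three genome-to-clade entries follow Table 1 (with "singleton" used here as a label for genomes outside the five clades), while the read identifiers, their best matches and their basins are hypothetical.

```python
from collections import Counter, defaultdict


def tally_clades_by_basin(best_match, genome_clade, read_basin):
    """Count RecA reads per clade within each ocean basin.

    best_match: {read id: best-matching genome name}
    genome_clade: {genome name: clade label}
    read_basin: {read id: ocean basin of the sample the read came from}
    """
    tallies = defaultdict(Counter)
    for read_id, genome in best_match.items():
        clade = genome_clade.get(genome, "unassigned")
        tallies[read_basin[read_id]][clade] += 1
    return {basin: dict(counts) for basin, counts in tallies.items()}


genome_clade = {
    "Ruegeria pomeroyi DSS-3": "Clade 1",
    "Jannaschia sp. CCS1": "Clade 5",
    "Rhodobacterales bacterium HTCC2255": "singleton",
}
# Hypothetical reads and sample basins.
best_match = {"read_1": "Ruegeria pomeroyi DSS-3",
              "read_2": "Jannaschia sp. CCS1",
              "read_3": "Rhodobacterales bacterium HTCC2255",
              "read_4": "Ruegeria pomeroyi DSS-3"}
read_basin = {"read_1": "Atlantic", "read_2": "Pacific",
              "read_3": "Pacific", "read_4": "Atlantic"}

clade_counts = tally_clades_by_basin(best_match, genome_clade, read_basin)
print(clade_counts)
```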

Statistical analyses

Patterns of ortholog distribution among the 32 genomes were evaluated using the Bray–Curtis Index of Similarity (Legendre and Legendre, 1998). The 32 genomes contained a total of 31 874 orthologs. Similarities between genomes include all orthologs in this matrix, so that both the shared presence and shared absence of a gene are taken into account in the similarity calculation. This similarity matrix was used to create a hierarchical clustering dendrogram based on complete linkage grouping (that is, furthest neighbor analysis). An analysis of similarity (ANOSIM) was used to test for significant differences among a priori assigned genome groups based on phylogenetic clade, trophic strategy or geographical isolation location. The multivariate analyses were performed using the statistical package PRIMER 5 for Windows v. 5.2.7.
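For orientation, the textbook Bray–Curtis similarity between two presence/absence profiles can be computed as in the sketch below. This is the standard formula only, not a reproduction of the PRIMER analysis: in this form positions absent from both genomes contribute nothing to the score, so any treatment of shared absences in the study must depend on an encoding the text does not specify. The two profiles are hypothetical.

```python
def bray_curtis_similarity(x, y):
    """Bray-Curtis similarity (0-100) between two equal-length count vectors."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    total = sum(x) + sum(y)
    if total == 0:
        raise ValueError("similarity is undefined for two empty profiles")
    difference = sum(abs(a - b) for a, b in zip(x, y))
    return 100.0 * (1.0 - difference / total)


# Hypothetical ortholog presence (1) / absence (0) profiles for two genomes.
genome_a = [1, 1, 1, 0, 0]
genome_b = [1, 1, 0, 1, 0]

print(bray_curtis_similarity(genome_a, genome_b))  # 100 * (1 - 2/6)
```

For binary data this quantity equals the Sørensen coefficient, 2a / (2a + b + c), where a is the number of shared orthologs and b and c are the orthologs unique to each genome.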

The average nucleotide identity was obtained for two Roseobacter strain comparisons, Phaeobacter gallaeciensis 2.10 with P. gallaeciensis BS107 and D. shibae DFL12 with P. gallaeciensis 2.10, according to the method described by Goris et al. (2007). These comparisons were chosen to bracket the amount of sequence heterogeneity observed and to provide context for our ortholog similarity comparisons.

Significance of gene distributions between any two assigned groups (for example, between two clades, between two trophic strategies or between two ocean basins) was assessed with a binomial distribution d-score test (Markowitz et al., 2008).

Results and discussion

Because many of the 32 Roseobacter genomes are in draft status, we developed a completeness index based on the presence or absence of 111 universal single-copy genes. The lowest genome completeness index obtained was 96% (for Octadecabacter arcticus 238 and Rhodobacterales bacterium HTCC2255; 107 out of 111 presumed universal genes were represented), and 18 of 32 genomes had a completeness index of 100% (Table 1). We therefore considered all draft genomes to be good representations of these organisms’ gene content.

Phylogenetic inference and clade distribution

Previous phylogenetic reconstructions of the Roseobacter lineage using 16S rRNA gene relationships have led to the identification of subgroups within the lineage (Buchan et al., 2005; Brinkhoff et al., 2008). However, many of the nodes, especially those distinguishing deep branching points in these phylogenies, do not have statistical support, and therefore do not provide clear phylogenetic relationships for the members of this lineage. We took advantage of the genome sequence data to construct an alignment from the concatenation of 70 conserved single-copy genes; this alignment was subsequently used in phylogenetic tree inference (see Supplementary Table S1 for gene list and Supplementary Figure S1 for concatenated gene, 16S rRNA gene and 23S rRNA gene tree comparisons). The resultant tree topology suggested there are five deeply branching clades within the Roseobacter lineage (Figure 1). Three of the presumed roseobacters, Maritimibacter alkaliphilus HTCC2654, Rhodobacterales HTCC2150 and Rhodobacterales HTCC2255, fell outside these clades. Most members within a single genus clustered together on the tree, although the placement of two members of the genus Oceanicola into different clades suggests that a taxonomic reclassification may be needed for some isolates.

Buchan et al. (2005) identified 13 major sequence clusters based on 16S rRNA gene sequences within the Roseobacter lineage. Twelve of the 16S rRNA-based clusters can be mapped onto our 70-gene phylogeny (data not shown). Clade 1 contains the 16S rRNA gene sequence clusters RGALL, RATL and TM1040. Clade 2 contains sequence clusters ANT9093, OBULB, SPON and AS-21. Clade 3 contains sequence cluster CHAB-I-5. Clade 4 contains sequence clusters AS-26, DG1128, DC5-80-3 (RCA cluster) and OCT. Clade 5 contains no previously identified sequence clusters, and cluster NAC11-7 is not covered by any of the clades in our study.

Two of the most abundant Roseobacter 16S rRNA gene sequence clusters recovered from marine habitats do not have closely associated sequenced genomes (Buchan et al., 2005), and thus are not included in the 70-gene phylogenetic tree. The first, the DC5-80-3 or RCA cluster, has often been observed as the most abundant Roseobacter group in polar and temperate oceans (Brinkhoff et al., 2008). 16S rRNA genes from RCA distantly group with those from genomes in Clade 4 (Figure 1), a clade that harbors all the sequenced polar Roseobacter isolates thus far. A second abundant marine sequence cluster, NAC11-7, is frequently the dominant Roseobacter taxon found during phytoplankton blooms (Buchan et al., 2005; West et al., 2008). 16S rRNA genes from the NAC11-7 group did not cluster with those of any clade, and were most related to that of Rhodobacterales HTCC2255 (data not shown), which also fell outside the five clades established by the 70-gene phylogeny (Figure 1).

Roseobacter-like recA genes, a robust marker for bacterial phylogeny (Eisen, 1995), were obtained from the GOS data set by BLASTp analysis (see Materials and methods) to ascertain which isolate genomes are most representative of wild roseobacters in surface ocean water. When the set of Roseobacter-like RecA GOS sequences was used in a best-match BLASTp query against the 32 genomes, hits to all five clades were found throughout the major ocean habitats surveyed (Figure 2). In general, the distribution of clades is not remarkably different among the Atlantic, Pacific and Indian oceans (Figure 2). A large percentage of the Roseobacter RecA sequences from the GOS appear most closely related to one of the three singleton genomes (that is, not belonging to one of the five defined clades). This finding, along with the lack of genomic data for the RCA and NAC11-7 sequence clusters, suggests that representation of oceanic Roseobacter genomes could be improved with additional genome sequences.
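The best-match tally behind a clade-by-ocean summary of this kind can be sketched as follows. This is only an illustration under stated assumptions: the BLASTp results are taken to be already parsed into (query, subject genome, bit score) tuples, and every identifier and value in the toy data is hypothetical.

```python
from collections import Counter, defaultdict


def best_hits(blast_rows):
    """Keep the highest-bit-score subject genome for each query.

    blast_rows: iterable of (query_id, subject_genome, bit_score) tuples.
    """
    best = {}
    for query, subject, score in blast_rows:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def clade_percentages(hits, genome_to_clade, query_to_ocean):
    """Percent of best hits per clade (or singleton genome) within each ocean."""
    counts = defaultdict(Counter)
    for query, genome in hits.items():
        # Genomes outside the defined clades are reported under their own name.
        group = genome_to_clade.get(genome, genome)
        counts[query_to_ocean[query]][group] += 1
    return {
        ocean: {g: 100.0 * n / sum(c.values()) for g, n in c.items()}
        for ocean, c in counts.items()
    }


# Hypothetical parsed BLASTp rows and lookup tables.
rows = [
    ("q1", "genomeA", 210.0),
    ("q1", "genomeB", 180.0),
    ("q2", "genomeB", 95.0),
    ("q3", "genomeC", 300.0),
    ("q4", "genomeA", 150.0),
]
genome_to_clade = {"genomeA": "Clade 1", "genomeB": "Clade 2"}
query_to_ocean = {"q1": "Atlantic", "q2": "Atlantic", "q3": "Pacific", "q4": "Atlantic"}

hits = best_hits(rows)
percentages = clade_percentages(hits, genome_to_clade, query_to_ocean)
print(percentages)
```

Reporting singleton genomes under their own names mirrors the way Figure 2 lists the three unassigned genomes alongside the five clades.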

Figure 1 A consensus maximum likelihood tree of the 32 sequenced Roseobacter genomes. The alignment for tree inference was created from a concatenation of 70 universal single-copy genes contained in each of the Roseobacter genomes and in E. coli K12, which was used as an outgroup. Bootstrap values of >50% for the maximum likelihood best-fit tree (200 iterations) and neighbor-joining tree (100 iterations) are listed at each node. The neighbor-joining bootstrap values are listed in parentheses. (*) demarcates nodes where the neighbor-joining tree did not agree with the maximum likelihood tree. Designated Clades 1–5 are listed to the right of the tree. The scale bar represents 10% sequence divergence.

Genome content related to phylogeny

We examined genome characteristics (for example, G+C content, rRNA copy number, genome size; Table 1) and gene content of the 32 roseobacters within the context of the five-clade phylogenetic framework. Whole genomic content comparisons (based on distribution patterns of 31 874 orthologs; see Materials and methods; Supplementary Table S2) indicate weak but significant genome clustering by clade (ANOSIM R = 0.410, P ≤ 0.001; the three roseobacters not assigned to clades were excluded from the statistical test), with the within-clade similarity in gene repertoire for Clades 1, 2 and 3 driving this pattern (Figure 3). Generally, neither the examined genome characteristics nor the examined gene distributions segregate strongly based on phylogenetic relatedness (Table 1 and Figure 4). Some exceptions include a greater mean rRNA operon copy number for Clade 1 than for other clades (t-test, P ≤ 0.01); a strictly heterotrophic composition of Clade 1; a genetic potential for biotin synthesis in Clade 1 (vitamin synthesis in bacteria has been identified as important in bacterial–phytoplankton relationships; Croft et al., 2005; Wagner-Döbler et al., 2009); a lack of Lux-type quorum sensing genes in Clades 4 and 5; a genetic potential for H2 oxidation unique to Clade 3; the absence of sulfur oxidation genes in Clade 4 genomes; and absence of the ppk1 gene for polyphosphate biosynthesis in Clade 1 (whereas all isolates outside Clade 1 have this gene).
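For readers unfamiliar with ANOSIM, the statistic can be sketched in a few lines. The version below is a generic textbook formulation, not the software used in the study: it computes Bray–Curtis dissimilarities from presence/absence profiles using the standard formula, ranks all pairwise dissimilarities, and reports R as the difference between the mean between-group and mean within-group ranks divided by M/2, where M is the number of pairs. The P-value comes from shuffling group labels. The toy profiles and group names are hypothetical.

```python
import itertools
import random


def bray_curtis_dissimilarity(a, b):
    """Bray-Curtis dissimilarity between two non-negative profiles (0 = identical)."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 1.0 - 2.0 * shared / total if total else 0.0


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def anosim_r(dist, groups):
    """R = (mean between-group rank - mean within-group rank) / (M / 2)."""
    pairs = list(itertools.combinations(range(len(groups)), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (len(pairs) / 2)


def anosim(dist, groups, permutations=999, seed=0):
    """Return (R, P), with P estimated by randomly shuffling the group labels."""
    rng = random.Random(seed)
    observed = anosim_r(dist, groups)
    labels = list(groups)
    at_least = 0
    for _ in range(permutations):
        rng.shuffle(labels)
        if anosim_r(dist, labels) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (permutations + 1)


# Toy presence/absence profiles (rows = hypothetical genomes, columns = orthologs).
profiles = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0, 1, 1, 1],
]
groups = ["group A"] * 3 + ["group B"] * 3
dist = [[bray_curtis_dissimilarity(p, q) for q in profiles] for p in profiles]
r_stat, p_value = anosim(dist, groups)
print(r_stat, p_value)
```

In this toy case every within-group pair is more similar than every between-group pair, so R reaches its maximum of 1. Values such as the R = 0.410 reported above indicate that groups overlap considerably even though the separation is statistically significant.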

The lack of a strong segregation by phylogenetic assignment for genome content (Figure 3) or ecologically relevant gene systems (Figure 4) suggests the importance of gene acquisition by horizontal transfer originating either within or outside the lineage. Other evolutionary processes known to shape genome content (selective gene loss, gene duplication, gene genesis; Snel et al., 2002) are no doubt important in this lineage, but are mechanisms less likely to produce the observed patchy distribution of ecologically relevant genes in the Roseobacter isolates relative to their phylogenetic reconstruction. Although the rates of gene transfer within the Roseobacter lineage are not known, the occurrence in 30 of 32 genomes of gene transfer agent operons (Figure 4), an unusual system for moving chromosomal fragments to close relatives (Biers et al., 2008; Zhao et al., 2009), suggests a mechanism for shaping Roseobacter gene content through frequent within-lineage gene transfers.

Between-genome similarities were generally higher for Roseobacter isolates in the same genus (for example, P. gallaeciensis BS107 and P. gallaeciensis 2.10; R. denitrificans Och114 and Roseobacter litoralis Och149; Figure 3) than for roseobacters belonging to different genera. Nonetheless, blurred gene content boundaries among deeply branching clades would impose a requirement of dozens of taxonomically shallow groups (for example, species level) to accurately represent Roseobacter contributions to ecosystem functions, thus making the phylogenetic framework a cumbersome approach for defining ecological subgroups.

Figure 2 Clade distribution among ocean basins calculated from BLASTp best match of Roseobacter recA genes from the Global Ocean Sampling (GOS). Atlantic n = 71, Pacific n = 48 and Indian n = 51.

Genome content related to trophic strategy

Owing to the significant versatility in mechanisms for obtaining carbon and energy previously observed for this group (Buchan et al., 2005; Moran et al., 2007), we hypothesized that an organism’s trophic strategy could impose or remove constraints on genome content. For example, the ability to use sunlight for energy generation, which is widely distributed within the Roseobacter lineage, might mitigate an organism’s energy limitations in the oligotrophic marine environment, while imposing requirements for metals and cofactors specific to phototrophy. Similarly, the ability to fix inorganic carbon might reduce an organism’s requirements for substrate transporters. If such interplay between trophic strategy and functional gene repertoire exists, then significant and predictable differences in genome content should be evident between lifestyle categories.

Thirteen of the 32 roseobacters have genes for photoheterotrophy (10 AAnPs, 3 rhodopsin-containing), whereas one has RuBisCO. The remaining 18 are considered heterotrophs here (although some may obtain energy from inorganic compounds such as CO and H2S; Moran et al., 2007) (Table 1).

Genome ortholog comparisons suggest moderate and significant differences in genome content among these groups (ANOSIM R = 0.545, P ≤ 0.001). The strength of these differences does not stem solely from the highly distinctive rhodopsin-containing genomes (Figure 3). The differences also are not solely due to the presence of light-harvesting-related genes shared by the AAnP genomes or rhodopsin-containing genomes, as removal of the rhodopsin genes and 29 genes specific for AAnP light harvesting resulted in a similar level of clustering by trophic strategy (ANOSIM R = 0.522, P ≤ 0.001). The lifestyle framework accurately predicts the gene repertoire groupings at similarity levels ≥58% (Figure 3; red dashed line), which represents the gene content relationships for 19 of the 32 genomes, making it the best predictor of the three frameworks analyzed.

Figure 3 Complete linkage (that is, furthest neighbor) cluster analysis illustrating the gene content similarities among the genomes. Bray–Curtis similarities between all genome pairs were calculated from a matrix containing all 31 874 genes identified in the 32 genomes. In this manner, the similarity calculation was based on both the shared presence and shared absence of genes. For context, the P. gallaeciensis 2.10 to P. gallaeciensis BS107 comparison is 87.4% similar in this analysis compared with an average nucleotide identity (ANI) (Goris et al., 2007) of 97.0%. The D. shibae DFL12 to P. gallaeciensis 2.10 comparison is 46.7% similar here compared with an ANI of 70.4%. The three framework groups of each isolate are illustrated by the shape and color pattern depicted at the tips of the cluster diagram. The phylogenetic clade framework is represented by squares; the lifestyle framework is represented by triangles; and the environment framework is represented by circles. Unique combinations of these three frameworks are illustrated with colored boxes around the names of isolates. Breaks in the three framework groupings are noted by an asterisk next to the strain name. Nodes below the dashed red line indicate groups with ≥58% similarity.

Figure 4 A matrix depicting the presence of select genes or gene pathways in the 32 Roseobacter genomes arranged and color-coded by clade. A colored box containing a dot indicates the presence of the gene/pathway. An ultrametric tree has been placed above the gene matrix for reference. Isolation source indicates the region where the Roseobacter strain was isolated and is coded as: A = Atlantic Ocean, E = Eukaryote Associated, H = polar oceans (high latitude), I = Indian Ocean and P = Pacific Ocean. Gene/pathway abbreviations are as follows: NRPS/PKS, non-ribosomal peptide synthetase/polyketide synthase; Glut/leu/phenyl/val dehydrogenase, glutamate/leucine/phenylalanine/valine dehydrogenase. Row groups, top to bottom: related to defined trophic strategy; gene transfer agents; H2 oxidation; aromatics degradation; carbon monoxide utilization; C1 compound utilization; C2 compound utilization; motility, sensing and attachment; secondary metabolite production; vitamin related; iron related; nitrogen related; phosphorus related; sulfur related.

The majority of non-light-harvesting gene or pathway-related differences among strains can be traced to hypothetical proteins unique to the AAnPs or heterotrophs, as well as to a number of genes encoding transcriptional regulators and amino acid uptake and synthesis systems (Figure 5). Although trophic strategy was a good predictor of an isolate’s gene repertoire, only a few of the ecologically relevant gene systems we examined were strictly differentiated according to this framework. The five C1 utilization pathways identified in Roseobacter genomes (serine cycle, methanol oxidation, trimethylamine oxidation, formaldehyde oxidation and formate oxidation) had a 70% occurrence rate in AAnP genomes (that is, 35 out of the 50 possible occurrences if all 10 AAnP genomes had all five pathways), but only a 42% occurrence rate in the 19 heterotrophs (only 40 out of 95 possible occurrences; d-score, P ≤ 0.01). The heterotrophs tended to have more genes for six identified aromatic degradation pathways (β-ketoadipate, gentisate, benzoate, phenylacetic acid, homoprotocatechuate and homogentisate) with a 60% occurrence rate (68 out of 114 possible occurrences) compared with 33% in AAnPs (20 of 60 possible occurrences; d-score, P ≤ 0.01). Compared with the genomes of the other groups, the rhodopsin-containing genomes shared few orthologs that distinguished them as a coherent group (data not shown).
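The occurrence rates quoted above follow from a simple ratio: observed pathway occurrences divided by the number of genome-by-pathway slots. The short sketch below reproduces the four percentages from the counts given in the text; the function name is ours, and the group sizes are used exactly as stated in the paragraph.

```python
def occurrence_rate(observed, genomes, pathways):
    """Percent of the genome-by-pathway slots that are filled."""
    possible = genomes * pathways
    return 100.0 * observed / possible


# Counts as reported in the text: (observed occurrences, genomes, pathways).
reported = {
    "C1 pathways, AAnP": occurrence_rate(35, 10, 5),
    "C1 pathways, heterotroph": occurrence_rate(40, 19, 5),
    "aromatic pathways, heterotroph": occurrence_rate(68, 19, 6),
    "aromatic pathways, AAnP": occurrence_rate(20, 10, 6),
}
for label, rate in reported.items():
    print(f"{label}: {rate:.0f}%")
# C1 pathways, AAnP: 70%
# C1 pathways, heterotroph: 42%
# aromatic pathways, heterotroph: 60%
# aromatic pathways, AAnP: 33%
```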

As noted previously (Buchan et al., 2005; Moran et al., 2007) and strongly reinforced in this analysis, Roseobacter genomes exhibit a remarkably versatile suite of mechanisms for energy and carbon acquisition. Along with the presence of genes for oxidizing carbon monoxide and hydrogen sulfide (King, 2003; Moran et al., 2004), we found evidence for energy generation by H2 oxidation in Roseovarius sp. TM1035, Pelagibaca bermudensis HTCC2601 and Sagittula stellata E-37; for proteorhodopsin-based (Rhodobacterales sp. HTCC2255) and xanthorhodopsin-based (O. antarcticus 307 and O. arcticus 238) phototrophy; and for CO2 fixation, based on the presence of a Form IC RuBisCO and homologs to Calvin–Benson–Bassham cycle genes in P. bermudensis HTCC2601 (Figure 4). This emerging picture of high trophic versatility among cultured roseobacters (Buchan et al., 2005; Wagner-Döbler and Biebl, 2006; Brinkhoff et al., 2008) is in accord with recent shifts away from a perception of marine bacterioplankton communities consisting largely of canonical photosynthetic and heterotrophic cells (Karl, 2002).

The rhodopsin-containing genomes

The three rhodopsin-containing genomes harbored the most distinctive genome content of any of the isolates (Figure 3). The gene content of the proteorhodopsin-containing Rhodobacterales bacterium HTCC2255 was unique because it consisted of only 2197 genes, far fewer than any other genome. The two xanthorhodopsin-containing isolates, O. arcticus and O. antarcticus (which are also the only two polar ocean isolates), are clearly part of the Roseobacter lineage (Figure 1) but possess the greatest number of unique genes among all isolates (2230 genes for O. arcticus and 1822 genes for O. antarcticus; 32 isolate mean = 617 and s.d. = 437). The majority of these unique genes were annotated as phage or transposase genes, with 65 and 52 phage gene annotations and 935 and 574 transposase gene annotations for O. arcticus 238 and O. antarcticus 307, respectively; these numbers are extremely high compared with those in the other Roseobacter genomes (phage mean = 21 and s.d. = 10; transposase mean = 52 and s.d. = 38).
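One way to put the counts above on a common scale is to express each as the number of standard deviations above the reported mean. The sketch below does this with the figures quoted in the paragraph. It is only a rough illustration: the unique-gene mean and s.d. cover all 32 isolates (including the two Octadecabacter genomes themselves), and no claim is made that the authors used this calculation.

```python
def z_score(value, mean, sd):
    """Number of standard deviations by which value exceeds the mean."""
    return (value - mean) / sd


# Counts and reference statistics as quoted in the text.
counts = {
    "O. arcticus 238": {"unique": 2230, "phage": 65, "transposase": 935},
    "O. antarcticus 307": {"unique": 1822, "phage": 52, "transposase": 574},
}
reference = {"unique": (617, 437), "phage": (21, 10), "transposase": (52, 38)}

z_scores = {
    isolate: {
        category: z_score(value, *reference[category])
        for category, value in by_category.items()
    }
    for isolate, by_category in counts.items()
}
for isolate, by_category in z_scores.items():
    print(isolate, {category: round(z, 1) for category, z in by_category.items()})
```

On this scale the transposase counts stand out most strongly, at more than ten standard deviations above the mean of the other genomes for both isolates.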

Figure 5 Relative abundance of ortholog groups that are overrepresented in the isolate genomes from a particular lifestyle strategy. An ortholog was considered overrepresented when it was ≥50% more prevalent in the genomes from one lifestyle strategy than the other. The overrepresented orthologs were grouped into functional categories whose relative percent abundance is depicted. The rhodopsin-containing and RuBisCO-containing lifestyle groupings were not considered because of the low number of genomes in these categories.

Genome content related to ocean environment

Environmental properties are potential drivers of marine bacterial genome evolution by selecting for niche-specific genetic capabilities. For example, the marine cyanobacteria Prochlorococcus and Synechococcus exhibit gene content patterns that correlate well with the geographical locations from which they were isolated (Rocap et al., 2003; Palenik et al., 2006; Zwirglmaier et al., 2008; Martiny et al., 2009). For the third framework, we divided the Roseobacter strains into five broad environmental categories that resulted in the following distribution: 15 isolates from the Atlantic Ocean, 1 isolate from the Indian Ocean, 6 isolates from the Pacific Ocean, 2 isolates from the polar oceans and 8 isolates that were cultured in association with eukaryotic organisms (Table 1). We hypothesized that characteristics distinguishing major environments (for example, access to nutrients, differences in temperature) would exert a detectable influence on gene patterns in the Roseobacter genomes.

There was a significant but very weak relationship between genome content and environmental origin of the isolate (ANOSIM R = 0.296, P ≤ 0.002), although if only the Atlantic Ocean and Pacific Ocean isolates were compared, this relationship was stronger (ANOSIM R = 0.398, P ≤ 0.002). Examination of ortholog patterns suggests that the relationship is based on the genomic distribution of motility genes, with 33% occurrence in Pacific Ocean isolates compared with 73% in Atlantic Ocean isolates; chemotaxis genes, with 17% occurrence in Pacific Ocean isolates compared with 47% in Atlantic Ocean isolates; denitrification systems, with 5% in Pacific Ocean isolates compared with 29% in Atlantic Ocean isolates; phosphorus uptake systems known to function at low phosphate concentrations (alkaline phosphatases, high-affinity phosphate uptake and phosphonate uptake), with 50% occurrence (12 out of 24 possible) in Pacific Ocean isolates compared with 85% (51 out of 60 possible) for Atlantic Ocean isolates; and aromatic carbon degradation pathways (mixed ocean basin patterns depending on the specific pathway) (Figure 4; all comparisons d-score, P ≤ 0.01). There were also a number of unique carbon and ion transporters, amino-acid metabolism genes, and transcription regulators restricted to each ocean basin, and a large suite of unique genes shared by the two polar isolates (data not shown). We tested whether the higher frequency of coastal strains among the Pacific isolates compared with the Atlantic (Table 1) was the basis for the apparent ocean basin pattern, but found it not to be the case whether comparing whole-genome ortholog patterns (ANOSIM R = 0.031, P = 0.23 for coastal vs open ocean isolate comparison) or individual gene systems (Supplementary Figure S2).

Geographical patterns in the GOS data set

Despite the many factors that might obscure large-scale environmental imprints (including varied isolation methods, isolation dates spanning several decades and sparse spatial coverage), the ocean basin of isolation seemingly had predictive power for the distribution of select genes/pathways among the Roseobacter genomes. To determine whether this apparent grouping of genome content by geographical origin applies broadly to populations of roseobacters in the world oceans, we probed the GOS data set for similar environmental patterns. As the other two frameworks (phylogeny and lifestyle strategy) require assembled genomes, it is not possible to test for these among the GOS Roseobacter populations.

Homologs to genes listed in Supplementary Table S3 were identified in the GOS peptide sequence database (which currently does not include polar ocean metagenomic data). They were designated as Roseobacter homologs if they had greatest similarity to a gene in a Roseobacter genome in subsequent BLASTp query analysis against all available bacterial genome sequences (the CAMERA ‘All Prokaryotic Proteins (P) database’). Many of the same patterns in gene distribution found for cultured roseobacters were evident in the metagenomic analysis (Figure 6a). Most notable was that all phosphorus acquisition systems known to function at low phosphate concentrations (alkaline phosphatases, high-affinity phosphate uptake and phosphonate uptake) were much more abundant in wild roseobacters from the Atlantic Ocean, where the mean phosphate concentration is lower, than in those from either the Indian or Pacific Ocean (mean phosphate concentration is 0.06 μM for the Atlantic vs 0.15 μM for the Indian vs 0.53 μM for the Pacific; see Martiny et al., 2009 for details). The phosphate uptake system (pitA), which operates at high phosphate concentrations, had the opposite pattern, being more abundant in the Indian and Pacific Oceans (Figure 6a). Recently, other studies have noted similar trends for phosphorus gene distribution in the Prochlorococcus and SAR11 lineages (Rusch et al., 2007; Martiny et al., 2009), indicating that phosphorus concentration may impart a strong selective force on marine bacterial genomes. Of particular note were Roseobacter genes encoding phosphonate uptake and assimilation, which exhibited a very large bias in distribution toward the Atlantic Ocean (Figure 6a).

Most representative genes we examined were more prevalent in the isolate genomes than in our per-genome-equivalent calculations for the GOS samples (Figure 6b), an observation that cannot be attributed to sampling disparities as 158 Roseobacter genome equivalents were sampled in the GOS (see Materials and methods). Compared with the roseobacters represented in culture, natural Roseobacter populations in the ocean are more likely to have genes for processing DMSP and utilizing C1 carbon compounds, but less likely to have genes involved in motility, adhesion, quorum sensing, gene transfer and iron uptake (Figure 6b). The higher prevalence of selected genes in the isolate genomes compared with GOS samples may indicate that there are fewer genes per genome in wild cells, could be indicative of the differences in sampling locations between the GOS samples and the isolates, or might reflect a bias during our analysis in selecting genes previously noted in cultured Roseobacter genomes. As the GOS samples were passed through a 0.8-μm filter before sequencing (Rusch et al., 2007), there is also poor representation of sequences from particle-associated cells.
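The per-genome-equivalent comparison can be illustrated with a short sketch. The genome-equivalent totals (61, 52 and 45, summing to 158) and the 100% cap are taken from the Figure 6 caption; everything else is an assumption made for illustration, including the function names, the way a gene's frequency is shared among the three basins, and the hit counts, which are hypothetical rather than values from the study.

```python
# Genome equivalents per basin, as given in the Figure 6 caption.
GENOME_EQUIVALENTS = {"Atlantic": 61, "Indian": 52, "Pacific": 45}


def occurrence_percent(hits, genome_equivalents, cap=100.0):
    """Gene copies per 100 genome equivalents, capped at `cap`."""
    return min(cap, 100.0 * hits / genome_equivalents)


def basin_shares(hits_by_basin):
    """Share of a gene's normalized frequency in each basin; shares sum to 100."""
    freq = {
        basin: hits_by_basin[basin] / equivalents
        for basin, equivalents in GENOME_EQUIVALENTS.items()
    }
    total = sum(freq.values())
    return {basin: 100.0 * f / total for basin, f in freq.items()}


# Hypothetical hit counts for one gene; these are not values from the study.
toy_hits = {"Atlantic": 61, "Indian": 26, "Pacific": 9}
pooled = occurrence_percent(sum(toy_hits.values()), sum(GENOME_EQUIVALENTS.values()))
shares = basin_shares(toy_hits)
print(round(pooled, 1), {basin: round(s, 1) for basin, s in shares.items()})
```

Dividing by genome equivalents before comparing basins keeps a gene from appearing enriched in one ocean simply because more Roseobacter sequence was recovered there.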

Conclusions

Comparative genomic analysis of a bacterial lineage is a powerful approach for revealing ecological and evolutionary forces that influence genome content, and might form the basis for delineating ecologically differentiated clusters in nature. The substantial 16S rRNA sequence divergence within the roseobacters (11%; Buchan et al., 2005), currently spanning a minimum of 45 described genera, makes this the broadest marine bacterial lineage for which a comparative genomic analysis has yet been undertaken. This taxonomic level is consistent, however, with current methodological resolution in microbial ecology, including target groups for 16S rRNA probes and primers (Alonso-Sáez et al., 2007; Lami et al., 2009), and efforts to assign taxon-specific biogeochemical roles (Alonso and Pernthaler, 2006; Mou et al., 2008; Poretsky et al., 2010).

The three predictive frameworks examined here for the Roseobacter genomes have previously been shown to correlate with the genome content in bacterial taxa, including phylogenetic relatedness in Prochlorococcus (Garczarek et al., 2000; Bibby et al., 2003; Rocap et al., 2003), environmental resource partitioning in Vibrionaceae (a lineage with 16S rRNA divergence similar to that of the roseobacters; Hunt et al., 2008), and trophic strategies in bacterial endosymbionts and aquatic bacterioplankton (Moran and Baumann, 2000; Lauro et al., 2009). For the Roseobacter lineage, whole-genome content analysis of the 32 genomes produced 23 genome clusters (Figure 3) representing 20 unique combinations of clade, trophic strategy and environmental source. New sequences of Roseobacter strains may well increase the number of known genome clusters, particularly because two environmentally abundant 16S rRNA clades do not yet have reference genome sequences. While all three frameworks had statistically significant predictive power, none emerged as the potential overriding force imprinting Roseobacter genome content. Although other possible explanatory frameworks might have been considered here, all but two of the 23 genome clusters have unique clade–trophy–environment assignments (Figure 3), suggesting that these three frameworks together acceptably classify most of the variability in genome content.

The finding that trophic strategy correlates better than phylogeny or environment with Roseobacter gene inventories (ANOSIM, R = 0.545 vs 0.410 vs 0.296) was not anticipated at the outset of our analysis, at least in part because it is not a correlate that has been widely examined for marine

Figure 6 (a) Ocean basin (Atlantic, Indian and Pacific) three-way comparison of Roseobacter genes/gene pathways (see Figure 4). The position of each dot indicates the relative abundance of a gene in the Global Ocean Sampling (GOS) data set, based on 61, 52 and 45 Roseobacter genome equivalents in the Atlantic, Indian and Pacific Ocean data sets, respectively. Only genes present at a frequency ≥10% of genome equivalents in any one ocean basin are depicted. Triangle vertices indicate 100% relative abundance of that particular gene in the representative ocean. Three lines creating an inverted triangle have been drawn to aid in visual interpretation and indicate a relative abundance = 50% in a single ocean basin. (b) Gene occurrence percentage of Roseobacter genome equivalents in the GOS data set (n = 158) vs isolate genomes (n = 32). Gene occurrence percentages >100% (that is, more than one copy per genome equivalent) in the GOS samples are represented as 100%. Gene descriptions are listed in Supplementary Table S2. GOS samples included in the comparison are listed in Supplementary Table S3.


bacterial genomes. Nevertheless, the past decade has uncovered remarkable flexibility in the trophic strategies of marine bacterioplankton, suggesting that acquisition of alternate mechanisms for obtaining carbon and energy may be a strong evolutionary force in the ocean. The occurrence of several distinct trophic schemes within the taxonomically broad Roseobacter lineage provided an ideal opportunity to explore whether a bacterium’s strategy for obtaining carbon and energy predicts other aspects of genome content. Differences in gene content among trophic groups were unfortunately dominated by hypothetical proteins, which provide little biological insight, although C1 and aromatic carbon oxidation genes and amino-acid transport and metabolism genes contributed to the signal. This concept of lifestyle imprinting of genome content, which has been explored in great detail for bacterial endosymbionts (for example, Moran and Baumann, 2000), may therefore also be important for understanding gene inventories of ocean microbes.
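For readers unfamiliar with the ANOSIM R statistic used to compare the three frameworks, the following is a minimal sketch of how R is conventionally defined: all pairwise dissimilarities are ranked, and R is the difference between the mean between-group rank and the mean within-group rank, scaled by n(n-1)/4. This is an illustration under stated assumptions, not the authors' pipeline. The function names, the toy dissimilarity matrix and the group labels are all hypothetical, and the permutation test that normally accompanies R is omitted.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..len(values); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def anosim_r(dist, groups):
    """ANOSIM R for a square dissimilarity matrix and one group label per sample."""
    n = len(groups)
    pairs = list(combinations(range(n), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (n * (n - 1) / 4)


# Hypothetical toy data: four genomes in two trophic groups.
toy_groups = ["heterotroph", "heterotroph", "photoheterotroph", "photoheterotroph"]
toy_dist = [
    [0.0, 0.1, 0.7, 0.8],
    [0.1, 0.0, 0.9, 0.6],
    [0.7, 0.9, 0.0, 0.2],
    [0.8, 0.6, 0.2, 0.0],
]
toy_r = anosim_r(toy_dist, toy_groups)
```

In this toy case every within-group dissimilarity is smaller than every between-group one, so R reaches its maximum of 1. Values near 0 indicate that the grouping explains little of the structure, which is how the reported values of 0.545, 0.410 and 0.296 can be read relative to one another.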

Roseobacter-like genes in the GOS data set showed significant variation in frequency across ocean basins, although only a fraction of all possible genes and gene systems appear to be shaped at this grand scale (Figure 6a). The GOS data set was also valuable for determining how well the genomes from the cultured roseobacters represent the repertoire and stoichiometry of genes in ocean-dwelling ‘wild’ roseobacters, an important perspective for assessing the relevance of this isolate-based genome analysis. Although the mismatch in frequency of some examined genes between isolates and the GOS data set suggests that the currently cultured strains may not yet provide a faithful representation of the prevalent natural Roseobacter populations, many genes and gene systems were indeed present at comparable frequencies (Figure 6b).
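The gene occurrence percentages compared in Figure 6 can be illustrated with a short sketch. It assumes only what the caption states: occurrence is expressed per Roseobacter genome equivalent, and values above 100% are shown as 100%. The genome-equivalent counts are taken from the caption; the function name and the per-basin hit counts are hypothetical and serve only as an example.

```python
def occurrence_percent(gene_hits, genome_equivalents):
    """Percentage of genome equivalents carrying a gene, capped at 100."""
    if genome_equivalents <= 0:
        raise ValueError("genome_equivalents must be positive")
    return min(100.0, 100.0 * gene_hits / genome_equivalents)


# Roseobacter genome equivalents per ocean basin, from the Figure 6 caption.
GENOME_EQUIVALENTS = {"Atlantic": 61, "Indian": 52, "Pacific": 45}
TOTAL_GOS = sum(GENOME_EQUIVALENTS.values())  # matches the n quoted for panel (b)

# Hypothetical hit counts for a single gene, for illustration only.
hits = {"Atlantic": 30, "Indian": 13, "Pacific": 36}

per_basin = {
    basin: occurrence_percent(hits[basin], GENOME_EQUIVALENTS[basin])
    for basin in hits
}
pooled = occurrence_percent(sum(hits.values()), TOTAL_GOS)
```

Normalizing by genome equivalents rather than by raw read counts is what makes the basin-level and pooled values comparable with the isolate-genome frequencies on the other axis of panel (b).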

Overall, our analysis has firmly established roseobacters as ecological generalists, harboring large gene inventories and a remarkable suite of mechanisms by which to obtain carbon and energy. Further, this comparative analysis has illustrated that members of the lineage cannot be easily condensed into a few ecologically differentiated clusters; rather, each genome is largely unique in its assortment of genes for acquisition and transformation of carbon and nutrients. The fact that the best framework for predicting genome content is lifestyle strategy, not phylogeny, indicates that horizontal gene transfer and homologous recombination may be particularly dominant evolutionary forces in this marine bacterial lineage (possibly facilitated by an unusual gene transfer agent system that is prevalent; Biers et al., 2008; Zhao et al., 2009). Further insights into correlates of genome content, coupled with continued efforts to identify Roseobacter genes that are common in the world oceans, will better elucidate the functional roles of roseobacters in marine ecosystems.

Acknowledgements

This project was supported by grants from the Gordon and Betty Moore Foundation and the National Science Foundation (OCE0724017 and MCB0702125). We thank Dr S Santos for providing a list of universal genes.

References

Alonso C, Pernthaler J. (2006). Concentration-dependent patterns of leucine incorporation by coastal picoplankton. Appl Environ Microbiol 72: 2141–2147.

Alonso-Saez L, Balague V, Sa EL, Sanchez O, Gonzalez JM, Pinhassi J et al. (2007). Seasonality in bacterial diversity in north-west Mediterranean coastal waters: assessment through clone libraries, fingerprinting and FISH. FEMS Microbiol Ecol 60: 98–112.

Bibby TS, Mary I, Nield J, Partensky F, Barber J. (2003). Low-light-adapted Prochlorococcus species possess specific antennae for each photosystem. Nature 424: 1051–1054.

Biers EJ, Wang K, Pennington C, Belas R, Chen F, Moran MA. (2008). Occurrence and expression of gene transfer agent genes in marine bacterioplankton. Appl Environ Microbiol 74: 2933–2939.

Blankenship RE, Madigan MT, Bauer CE. (1995). Anoxygenic Photosynthetic Bacteria. Kluwer Academic Publishers: Dordrecht, Boston.

Brinkhoff T, Giebel HA, Simon M. (2008). Diversity, ecology, and genomics of the Roseobacter clade: a short overview. Arch Microbiol 189: 531–539.

Buchan A, Gonzalez JM, Moran MA. (2005). Overview of the marine Roseobacter lineage. Appl Environ Microbiol 71: 5665–5677.

Coleman ML, Chisholm SW. (2007). Code and context: Prochlorococcus as a model for cross-scale biology. Trends Microbiol 15: 398–407.

Croft MT, Lawrence AD, Raux-Deery E, Warren MJ, Smith AG. (2005). Algae acquire vitamin B12 through a symbiotic relationship with bacteria. Nature 438: 90–93.

Dufresne A, Ostrowski M, Scanlan DJ, Garczarek L, Mazard S, Palenik BP et al. (2008). Unraveling the genomic mosaic of a ubiquitous genus of marine cyanobacteria. Genome Biol 9: R90.1–R90.16.

Eisen JA. (1995). The RecA protein as a model molecule for molecular systematic studies of bacteria: Comparison of trees of RecAs and 16S rRNAs from the same species. J Mol Evol 41: 1105–1123.

Follows MJ, Dutkiewicz S, Grant S, Chisholm SW. (2007). Emergent biogeography of microbial communities in a model ocean. Science 315: 1843–1846.

Garczarek L, Hess WR, Holtzendorff J, van der Staay GWM, Partensky F. (2000). Multiplication of antenna genes as a major adaptation to low light in a marine prokaryote. Proc Natl Acad Sci USA 97: 4098–4101.

Gonzalez JM, Moran MA. (1997). Numerical dominance of a group of marine bacteria in the alpha-subclass of the class Proteobacteria in coastal seawater. Appl Environ Microbiol 63: 4237–4242.

Goris J, Konstantinidis KT, Klappenbach JA, Coenye T, Vandamme P, Tiedje JM. (2007). DNA-DNA hybridization values and their relationship to whole-genome sequence similarities. Int J Syst Evol Microbiol 57: 81–91.


Hess WR, Rocap G, Ting CS, Larimer F, Stilwagen S, Lamerdin J et al. (2001). The photosynthetic apparatus of Prochlorococcus: insights through comparative genomics. Photosynth Res 70: 53–71.

Howard EC, Sun SL, Biers EJ, Moran MA. (2008). Abundant and diverse bacteria involved in DMSP degradation in marine surface waters. Environ Microbiol 10: 2397–2410.

Hunt DE, David LA, Gevers D, Preheim SP, Alm EJ, Polz MF. (2008). Resource partitioning and sympatric differentiation among closely related bacterioplankton. Science 320: 1081–1085.

Johnson ZI, Zinser ER, Coe A, McNulty NP, Woodward EMS, Chisholm SW. (2006). Niche partitioning among Prochlorococcus ecotypes along ocean-scale environmental gradients. Science 311: 1737–1740.

Karl DM. (2002). Nutrient dynamics in the deep blue sea. Trends Microbiol 10: 410–418.

King GA. (2003). Molecular and culture-based analyses of aerobic carbon monoxide oxidizer diversity. Appl Environ Microbiol 69: 7257–7265.

Lami R, Ghiglione J-F, Desdevises Y, West NJ, Lebaron P. (2009). Annual patterns of presence and activity of marine bacteria monitored by 16S rDNA–16S rRNA fingerprints in the coastal NW Mediterranean Sea. Aquat Microb Ecol 54: 199–210.

Lauro FM, McDougald D, Thomas T, Williams TJ, Egan S, Rice S et al. (2009). The genomic basis of trophic strategy in marine bacteria. Proc Natl Acad Sci USA 106: 15527–15533.

Legendre P, Legendre L. (1998). Numerical Ecology. Second English edn. Elsevier Science, BV, Elsevier: Amsterdam, Netherlands.

Ludwig W, Strunk O, Westram R, Richter L, Meier H, Kumar Y et al. (2004). ARB: a software environment for sequence data. Nucleic Acids Res 32: 1363–1371.

Markowitz VM, Ivanova NN, Szeto E, Palaniappan K, Chu K, Dalevi D et al. (2008). IMG/M: a data management and analysis system for metagenomes. Nucleic Acids Res 36: D534–D538.

Martiny AC, Coleman ML, Chisholm SW. (2006). Phosphate acquisition genes in Prochlorococcus ecotypes: evidence for genome-wide adaptation. Proc Natl Acad Sci USA 103: 12552–12557.

Martiny AC, Huang Y, Li WZ. (2009). Occurrence of phosphate acquisition genes in Prochlorococcus cells from different ocean regions. Environ Microbiol 11: 1340–1347.

Moran MA, Belas R, Schell MA, Gonzalez JM, Sun F, Sun S et al. (2007). Ecological genomics of marine roseobacters. Appl Environ Microbiol 73: 4559–4569.

Moran MA, Buchan A, Gonzalez JM, Heidelberg JF, Whitman WB, Kiene RP et al. (2004). Genome sequence of Silicibacter pomeroyi reveals adaptations to the marine environment. Nature 432: 910–913.

Moran NA, Baumann P. (2000). Bacterial endosymbionts in animals. Curr Opin Microbiol 3: 270–275.

Mou X, Sun S, Edwards RA, Hodson RE, Moran MA. (2008). Bacterial carbon processing by generalist species in the coastal ocean. Nature 451: 708–711.

Palenik B, Ren QH, Dupont CL, Myers GS, Heidelberg JF, Badger JH et al. (2006). Genome sequence of Synechococcus CC9311: insights into adaptation to a coastal environment. Proc Natl Acad Sci USA 103: 13555–13559.

Polz MF, Hunt DE, Preheim SP, Weinreich DM. (2006). Patterns and mechanisms of genetic and phenotypic differentiation in marine microbes. Phil Trans R Soc B 361: 2009–2021.

Poretsky RS, Sun S, Mou X, Moran MA. (2010). Transporter genes expressed by coastal bacterioplankton in response to dissolved organic carbon. Environ Microbiol. Advance online publication doi:10.1111/j.1462-2920.2009.02102.x.

Rabouille S, Edwards CA, Zehr JP. (2007). Modeling the vertical distribution of Prochlorococcus and Synechococcus in the North Pacific Subtropical Ocean. Environ Microbiol 9: 2588–2602.

Rocap G, Larimer FW, Lamerdin J, Malfatti S, Chain P, Ahlgren NA et al. (2003). Genome divergence in two Prochlorococcus ecotypes reflects oceanic niche differentiation. Nature 424: 1042–1047.

Rusch DB, Halpern A, Sutton G, Heidelberg KB, Williamson S, Yooseph S et al. (2007). The Sorcerer II Global Ocean Sampling expedition: Northwest Atlantic through Eastern Tropical Pacific. PLoS Biol 5: 398–431.

Santos SR, Ochman H. (2004). Identification and phylogenetic sorting of bacterial lineages with universally conserved genes and proteins. Environ Microbiol 6: 754–759.

Scanlan DJ, Ostrowski M, Mazard S, Dufresne A, Garczarek L, Hess WR et al. (2009). Ecological genomics of marine picocyanobacteria. Microbiol Mol Biol Rev 73: 249–299.

Seshadri R, Kravitz SA, Smarr L, Gilna P, Frazier M. (2007). CAMERA: a community resource for metagenomics. PLoS Biol 5: 394–397.

Shiba T, Simidu U, Taga N. (1979). Distribution of aerobic bacteria which contain bacteriochlorophyll-a. Appl Environ Microbiol 38: 43–45.

Shimada K. (1995). Aerobic anoxygenic phototrophs. In: Blankenship RE, Madigan MT, Bauer CE (eds). Anoxygenic Photosynthetic Bacteria. Vol. 2. Springer Netherlands, Dordrecht; Boston: Kluwer Academic Publishers, pp 105–122.

Snel B, Bork P, Huynen MA. (2002). Genomes in flux: the evolution of archaeal and proteobacterial gene content. Genome Res 12: 17–25.

Sorokin DY, Banciu H, van Loosdrecht M, Kuenen JG. (2003). Growth physiology and competitive interaction of obligately chemolithoautotrophic, haloalkaliphilic, sulfur-oxidizing bacteria from soda lakes. Extremophiles 7: 195–203.

Stamatakis A, Hoover P, Rougemont J. (2008). A rapid bootstrap algorithm for the RAxML web servers. Syst Biol 57: 758–771.

Swingley WD, Sadekar S, Mastrian SD, Matthies HJ, Hao J, Ramos H et al. (2007). The complete genome sequence of Roseobacter denitrificans reveals a mixotrophic rather than photosynthetic metabolism. J Bacteriol 189: 683–690.

Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV et al. (2003). The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4: 1–14.

Wagner-Dobler I, Biebl H. (2006). Environmental biology of the marine Roseobacter lineage. Annu Rev Microbiol 60: 255–280.

Wagner-Dobler I, Ballhausen B, Berger M, Brinkhoff T, Buchholz I, Bunk B et al. (2009). The complete genome sequence of the algal symbiont Dinoroseobacter shibae: a hitchhiker’s guide to life in the sea. ISME J 4: 61–77.


West NJ, Obernosterer I, Zemb O, Lebaron P. (2008). Major differences of bacterial diversity and activity inside and outside of a natural iron-fertilized phytoplankton bloom in the Southern Ocean. Environ Microbiol 10: 738–756.

West NJ, Scanlan DJ. (1999). Niche-partitioning of Prochlorococcus populations in a stratified water column in the eastern North Atlantic Ocean. Appl Environ Microbiol 65: 2585–2591.

Yooseph S, Sutton G, Rusch DB, Halpern AL, Williamson SJ, Remington K et al. (2007). The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. PLoS Biol 5: 432–466.

Zhao YL, Wang K, Budinoff C, Buchan A, Lang A, Jiao NZ et al. (2009). Gene transfer agent (GTA) genes reveal diverse and dynamic Roseobacter and Rhodobacter populations in the Chesapeake Bay. ISME J 3: 364–373.

Zwirglmaier K, Jardillier L, Ostrowski M, Mazard S, Garczarek L, Vaulot D et al. (2008). Global phylogeography of marine Synechococcus and Prochlorococcus reveals a distinct partitioning of lineages among oceanic biomes. Environ Microbiol 10: 147–161.

Supplementary Information accompanies the paper on The ISME Journal website (http://www.nature.com/ismej)
