
nature | methods

Visualization of multiple alignments, phylogenies and gene family evolution

James B Procter, Julie Thompson, Ivica Letunic, Chris Creevey, Fabrice Jossinet & Geoffrey J Barton

Supplementary figures and text:

Supplementary Figure 1 | BLAST results for the human aryl sulfatase sequence as viewed in VectorNTI and Geneious

Supplementary Figure 2 | Annotated visualization of the Pfam alignment for the sulfatase family and linked view of PDB structure 1fsu

Supplementary Figure 3 | Visualizing the NCBI taxonomy

Supplementary Figure 4 | Tree-based alignment analysis

Supplementary Figure 5 | Annotating phylogenetic trees with complex data

Nature Methods: doi: 10.1038/nmeth.1434


Supplementary Figure 1 | BLAST results for the human aryl sulfatase sequence as viewed in VectorNTI (a) and Geneious (b). See main text for details.

[Figure: panels (a) VectorNTI and (b) Geneious. Callouts: hierarchical hit list for further details; hit distribution and consensus profile on query positions; bird's-eye view of hits on the query sequence; alignment of query and hit; top hit; distance tree for hits in clade; alignment trace showing context of aligned segments; slider sets threshold to grey out clades in hit list.]


Supplementary Figure 2 | Annotated visualization of the Pfam alignment for the sulfatase family and linked view of PDB structure 1fsu.

Sulfatases are a highly conserved enzyme family. They hydrolyze sulfate ester bonds in a variety of structurally diverse compounds but have similar overall folds, mechanisms of action and bivalent metal ion-binding sites¹,². (a) Pfam family alignment rendered with Jalview. Knowledge of the substrate specificity of each sequence allows the alignment to be divided into six functional sub-families (names on the left were added manually). Conserved sequence regions calculated by MACSIMS are indicated by colored shapes. Regions 1-4 (above the alignment) are shared by all the sub-families. Within these, highlighted single residues correspond to known functionally active sites. Disulphide bond annotation above the alignment was obtained from PDB structure 1n2l (ARSA_HUMAN). Secondary structure annotation from PDB structure 1fsu (ARSB_HUMAN) is shown below the alignment, above the Livingstone and Barton conservation score. (b) Image taken from Jalview's linked Jmol view of 1fsu, showing the structural context of Regions 1-4 in (a). (c) Close-up of the region underscored in red in (a), with annotated regions of sequences colored according to type and origin, locating PROSITE motifs and known mutations in the alignment. The inset box shows a close-up of a Jalview tooltip conveying additional annotation information. Sequence annotations obtained from public databases (UniProt², PDB³ or InterPro⁴) are colored using a dark shade. Features shown in a lighter shade were inferred by MACSIMS, which propagates these known properties to the uncharacterized sequences.
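Propagating a feature from a characterized sequence to an uncharacterized one, as described for the lighter-shaded features above, depends on mapping residue numbers through shared alignment columns. The Python sketch below shows only that coordinate mapping; it is a simplified illustration, not the MACSIMS implementation, and the function names and toy sequences are hypothetical.

```python
from typing import Dict, List, Optional

GAP_CHARS = set("-.")


def residue_to_column(aligned: str) -> Dict[int, int]:
    """Map 1-based residue numbers of a gapped, aligned sequence to 0-based columns."""
    mapping: Dict[int, int] = {}
    residue = 0
    for col, ch in enumerate(aligned):
        if ch not in GAP_CHARS:
            residue += 1
            mapping[residue] = col
    return mapping


def column_to_residue(aligned: str) -> Dict[int, int]:
    """Inverse mapping: alignment column -> 1-based residue number (gap columns absent)."""
    return {col: res for res, col in residue_to_column(aligned).items()}


def transfer_positions(source: str, target: str,
                       positions: List[int]) -> List[Optional[int]]:
    """Transfer annotated residue positions from `source` to `target` via shared columns.

    Hypothetical helper: returns None where the target has a gap in that column,
    i.e. where the feature cannot be propagated.
    """
    src_map = residue_to_column(source)
    tgt_map = column_to_residue(target)
    return [tgt_map.get(src_map[p]) for p in positions]


if __name__ == "__main__":
    # Toy aligned sequences (hypothetical): a feature at residues 2 and 4 of the first.
    print(transfer_positions("MK-LCS", "M-QLCA", [2, 4]))
```

Here a feature at residue 4 of the first sequence maps to residue 4 of the second, while residue 2 falls on a gap and is not transferred.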


[Figure: (a) Pfam sulfatase alignment with sub-families ARSA, ARSG, ARS, STS/ARSE/F, ARSB and ARSI/J labeled on the left and conserved Regions 1-4 marked above; (b) Jmol view of 1fsu; (c) close-up of Regions 1 and 2 with disulphide bond annotation for the same sub-families. Feature legend: PROSITE motifs sulfatase 1 and sulfatase 2, metal binding, active site, glycosylation, NAG binding, mutation, hydrophobic.]
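The Livingstone and Barton score shown beneath the alignment in panel (a) rates each column by how well amino-acid physico-chemical properties are conserved. The Python sketch below works in that spirit: it counts how many of ten properties are uniform in a column (every residue has the property, or none does), reporting '*' for identical columns and '+' when all ten are conserved. The property table is an assumption modeled on Taylor's amino-acid classification, and gaps are simply ignored, so the scores should not be expected to match Jalview's exactly; the function names are hypothetical.

```python
from typing import Dict, FrozenSet, List

# ASSUMED property table (ten classes in the style of Taylor's amino-acid Venn diagram);
# not guaranteed to match the table used by any particular tool.
PROPERTIES: Dict[str, FrozenSet[str]] = {
    "hydrophobic": frozenset("ILVCAGMFYWHKT"),
    "polar": frozenset("YWHKREQDNST"),
    "small": frozenset("AGCSVTDNP"),
    "proline": frozenset("P"),
    "tiny": frozenset("AGS"),
    "aliphatic": frozenset("ILV"),
    "aromatic": frozenset("FYWH"),
    "positive": frozenset("HKR"),
    "negative": frozenset("DE"),
    "charged": frozenset("HKRDE"),
}


def column_conservation(column: str) -> str:
    """Score one alignment column: '*' identical, '+' all properties conserved, else 0-9."""
    residues = [aa for aa in column.upper() if aa.isalpha()]  # gaps ignored (simplification)
    if not residues:
        return "0"
    if len(set(residues)) == 1:
        return "*"
    conserved = 0
    for members in PROPERTIES.values():
        has = [aa in members for aa in residues]
        if all(has) or not any(has):  # property uniformly present or uniformly absent
            conserved += 1
    return "+" if conserved == len(PROPERTIES) else str(conserved)


def conservation_track(alignment: List[str]) -> str:
    """One score character per column of equal-length aligned sequences."""
    return "".join(column_conservation("".join(col)) for col in zip(*alignment))
```

For instance, conservation_track(["ADIA", "AELW"]) gives one symbol per column, with '*' for the identical first column and '+' for the I/L column, whose residues share all ten assumed properties.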


Supplementary Figure 3 | Visualizing the NCBI taxonomy. This is perhaps the closest we can get to visualizing the entire tree of life. In this figure, taken from Hughes et al.⁵, Walrus and Phylo3D were used to visualize the complete NCBI taxonomy, containing close to 200 000 species, in a hyperbolic 3D space. Bacteria are in focus and shown in orange. Eukaryotes are shown in yellow, on the left-hand side. Archaea are represented by the red nodes at the top and in the background of the image.


Supplementary Figure 4 | Tree-based alignment analysis. (a) Tree-based alignment analysis of the sulfatase family using JevTrace, applied to the Pfam family alignment and tree. The panel on the left shows the algorithm's automatically generated tree partition. The alignment, shown adjacent to the leaves, is annotated with regions found to exhibit sub-family conservation. A view of the associated PDB structure (1fsu) is shown on the right, colored according to sub-family-specific mutations. (b) Snapshot from Jalview showing the same region of the sulfatase alignment as in Supplementary Figure 2c, with Clustal conservation-based shading and coloring applied to each subgroup. This rendering style reveals sub-family-specific conservation patterns that generally contain the residues known to be involved in substrate binding (cf. annotation in Supplementary Figure 2c). (c) Neighbor-joining tree for the alignment in Supplementary Figure 2a, calculated with Jalview, with the sub-trees corresponding to each sub-family highlighted in different colors.
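Both the JevTrace partition in (a) and the per-subgroup shading in (b) rest on the idea that some columns are conserved within each sub-family yet differ between sub-families. A minimal Python sketch of that criterion is shown below, assuming the sub-family partition is already known (for example, read from the tree). It is a simplified evolutionary-trace-style test, not JevTrace's actual scoring; the function name, the `min_identity` parameter and the toy sequences are hypothetical.

```python
from collections import Counter
from typing import Dict, List

GAPS = "-."


def subfamily_specific_columns(groups: Dict[str, List[str]],
                               min_identity: float = 1.0) -> List[int]:
    """Return 0-based alignment columns conserved within every sub-family
    but carrying different consensus residues in at least two sub-families.

    Hypothetical helper: `groups` maps a sub-family label to its aligned,
    equal-length sequences. Gaps count against within-group identity.
    """
    length = len(next(iter(groups.values()))[0])
    hits: List[int] = []
    for col in range(length):
        consensus = []
        for seqs in groups.values():
            counts = Counter(s[col] for s in seqs if s[col] not in GAPS)
            if not counts:
                break  # sub-family is all gaps here: not conserved
            residue, n = counts.most_common(1)[0]
            if n / len(seqs) < min_identity:
                break  # sub-family too variable at this column
            consensus.append(residue)
        else:
            # Every sub-family is conserved; keep the column only if they differ.
            if len(set(consensus)) > 1:
                hits.append(col)
    return hits


if __name__ == "__main__":
    # Toy sub-families (hypothetical sequences, labels borrowed from the figure).
    toy = {"ARSA": ["CKSG", "CKTG"], "ARSB": ["CRSG", "CRSA"]}
    print(subfamily_specific_columns(toy))
```

In the toy example only column 1 (K in one sub-family, R in the other) qualifies; column 0 is conserved everywhere and columns 2 and 3 vary within a sub-family. Lowering `min_identity` below 1.0 tolerates some within-group variation, loosely resembling threshold-based shading.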


[Figure: (a) JevTrace view of the sulfatase alignment with sub-families ARSA, ARSG, ARS, STS/ARSE/F, ARSB and ARSI/J; (b) Jalview per-subgroup shading; (c) neighbor-joining tree.]
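Panel (c) of Supplementary Figure 4 is a neighbor-joining tree computed in Jalview. The Python sketch below is a compact textbook implementation of neighbor joining (Saitou and Nei) over a precomputed distance matrix, emitting a Newick string. The distance used here (1 minus fractional identity over columns gapped in neither sequence) is an assumption and not necessarily the measure Jalview applies; the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Dist = Dict[Tuple[str, str], float]


def p_distance(a: str, b: str) -> float:
    """ASSUMED distance: 1 - fractional identity over columns gapped in neither sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x not in "-." and y not in "-."]
    if not pairs:
        return 1.0
    return 1.0 - sum(x == y for x, y in pairs) / len(pairs)


def distance_matrix(seqs: Dict[str, str]) -> Dist:
    """Symmetric pairwise distances keyed by (name_a, name_b) in both orders."""
    d: Dist = {}
    for a, b in combinations(seqs, 2):
        d[(a, b)] = d[(b, a)] = p_distance(seqs[a], seqs[b])
    return d


def neighbor_joining(names: List[str], dist: Dist) -> str:
    """Unrooted neighbor-joining tree as a Newick string (needs at least 3 taxa)."""
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    d = dict(dist)
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence of each node from all other current nodes.
        r = {i: sum(d[(i, k)] for k in nodes if k != i) for i in nodes}
        # Join the pair that minimizes the Q criterion.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        li = 0.5 * d[(i, j)] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[(i, j)] - li
        new = f"({i}:{li:.3f},{j}:{lj:.3f})"  # subtree Newick text doubles as its node label
        for k in nodes:
            if k not in (i, j):
                d[(new, k)] = d[(k, new)] = 0.5 * (d[(i, k)] + d[(j, k)] - d[(i, j)])
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    # Resolve the final three nodes around a single internal node.
    a, b, c = nodes
    la = 0.5 * (d[(a, b)] + d[(a, c)] - d[(b, c)])
    return f"({a}:{la:.3f},{b}:{d[(a, b)] - la:.3f},{c}:{d[(a, c)] - la:.3f});"
```

For a dictionary of aligned sequences `seqs`, a call such as neighbor_joining(list(seqs), distance_matrix(seqs)) returns the tree; with additive distances the method recovers the generating tree and its branch lengths exactly.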


Supplementary Figure 5 | Annotating phylogenetic trees with complex data. iTOL⁶ was used to annotate an automatically generated tree of life⁷. Blue bar charts represent genome sizes, defined as the number of predicted protein-coding genes. Pie charts show the distribution of preferred habitats for various taxa identified in several metagenomic sequencing projects⁸.

[Figure: iTOL rendering of the tree of life with leaf labels for eukaryotic, archaeal and bacterial genomes (from Giardia lamblia ATCC 50803 to Wigglesworthia glossinidia endosymbiont of Glossina brevipalpis), annotated with genome-size bar charts and habitat pie charts.]


References

1. Ghosh, D. Human sulfatases: a structural perspective to catalysis. Cell Mol Life Sci 64, 2013-22 (2007).

2. The UniProt Consortium. The Universal Protein Resource (UniProt) 2009. Nucleic Acids Res 37, D169-74 (2009).

3. Berman, H.M. et al. The Protein Data Bank. Nucleic Acids Res 28, 235-42 (2000).

4. Hunter, S. et al. InterPro: the integrative protein signature database. Nucleic Acids Res 37, D211-5 (2009).

5. Hughes, T., Hyun, Y. & Liberles, D.A. Visualising very large phylogenetic trees in three dimensional hyperbolic space. BMC Bioinformatics 5, 48 (2004).

6. Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics 23, 127-8 (2007).

7. Ciccarelli, F.D. et al. Toward automatic reconstruction of a highly resolved tree of life. Science 311, 1283-7 (2006).

8. von Mering, C. et al. Quantitative phylogenetic assessment of microbial communities in diverse environments. Science 315, 1126-30 (2007).
