nature | methods
Visualization of multiple alignments, phylogenies and gene family evolution
James B Procter, Julie Thompson, Ivica Letunic, Chris Creevey, Fabrice Jossinet & Geoffrey J Barton
Supplementary figures and text:
Supplementary Figure 1 BLAST results for the human aryl sulfatase sequence as viewed in VectorNTI and Geneious
Supplementary Figure 2 Annotated visualization of the Pfam alignment for the sulfatase family and linked view of PDB structure 1fsu
Supplementary Figure 3 Visualizing the NCBI taxonomy
Supplementary Figure 4 Tree-based alignment analysis
Supplementary Figure 5 Annotating phylogenetic trees with complex data
Nature Methods: doi: 10.1038/nmeth.1434
Supplementary Figure 1 | BLAST results for the human aryl sulfatase sequence
as viewed in VectorNTI (a) and Geneious (b). See main text for details.
[Figure image, panels a and b. Callouts: hierarchical hit list for further details; hit distribution and consensus profile on query positions; bird's-eye view of hits on query sequence; alignment of query and hit; distance tree for hits in clade; top hit; alignment trace showing context of aligned segments; slider sets threshold to grey out clades in hit list.]
Supplementary Figure 2 | Annotated visualization of the Pfam alignment for the
sulfatase family and linked view of PDB structure 1fsu.
Sulfatases are a highly conserved enzyme family. They hydrolyze sulfate ester
bonds in a variety of structurally diverse compounds but have similar overall
folds, mechanisms of action and bivalent metal ion-binding sites (refs. 1, 2).
(a) Pfam family alignment rendered with Jalview. Knowledge of the substrate
specificity of each sequence allows the alignment to be divided into six
functional sub-families (names on the left were added manually). Conserved
sequence regions calculated by MACSIMS are indicated by colored shapes.
Regions 1-4 (above the alignment) are shared by all the sub-families. Within
these, highlighted single residues correspond to known functionally active
sites. Disulphide bond annotation above the alignment was obtained from PDB
entry 1n2l (ARSA_HUMAN). Secondary structure annotation from PDB structure
1fsu (ARSB_HUMAN) is shown below the alignment, above the Livingstone and
Barton conservation score. (b) Image taken from Jalview's linked Jmol view of
1fsu, showing the structural context of regions 1-4 in a. (c) Close-up of the
region underscored in red in a, with annotated regions of sequences colored
according to type and origin, locating PROSITE motifs and known mutations in
the alignment. The inset box shows a close-up of the Jalview tooltip conveying
additional annotation information. Sequence annotations obtained from public
databases (UniProt (ref. 2), PDB (ref. 3) or InterPro (ref. 4)) are colored in
a dark shade. Features shown in a lighter shade were inferred by MACSIMS,
which propagates these known properties to the uncharacterized sequences.
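The idea of carrying a known annotation across to an uncharacterized sequence can be illustrated with a minimal sketch. This is not the MACSIMS algorithm; it only shows the coordinate mapping that any alignment-based transfer needs, where a feature on one sequence is projected through shared alignment columns onto another. The function names and the toy sequences are hypothetical and chosen purely for illustration.

```python
def column_map(aligned_seq, gap="-"):
    """Map 1-based residue positions of an aligned sequence to 0-based alignment columns."""
    mapping = {}
    pos = 0
    for col, ch in enumerate(aligned_seq):
        if ch != gap:
            pos += 1
            mapping[pos] = col
    return mapping


def propagate_feature(source_aln, target_aln, start, end, gap="-"):
    """Project a feature spanning residues start..end (1-based, inclusive) of the
    source sequence onto the target sequence via shared alignment columns.

    Returns (start, end) in target residue numbering, or None if the target
    has only gaps in the corresponding columns.
    """
    src_cols = column_map(source_aln, gap)
    col_lo, col_hi = src_cols[start], src_cols[end]
    # Invert the target mapping: alignment column -> residue position.
    tgt_pos = {col: pos for pos, col in column_map(target_aln, gap).items()}
    hits = [tgt_pos[c] for c in range(col_lo, col_hi + 1) if c in tgt_pos]
    return (hits[0], hits[-1]) if hits else None


# Toy aligned sequences (hypothetical, not taken from the sulfatase alignment).
annotated = "MK-CTPSRAAL"
uncharacterized = "MKACT--RGAL"
print(propagate_feature(annotated, uncharacterized, 3, 7))
```

A real tool would also check that the target residues are compatible with the feature (for example, that a metal-binding residue is conserved) before accepting the transfer; this sketch transfers coordinates only.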
[Supplementary Figure 2 image, panels a, b and c. Sub-family row labels: ARSA, ARSG, ARS, STS/ARSE/F, ARSB, ARSI/J. Region labels in a: Region 1, Region 2, Region 3, Region 4. Further labels: Region 1, Region 2, disulphide bond. Feature legend: PROSITE, sulfatase1, sulfatase2, metal binding, active site, glycosylation, NAG binding, mutation, hydrophobic.]
Supplementary Figure 3 | Visualizing the NCBI taxonomy. Perhaps the closest
we can get to a visualization of the entire tree of life. In this figure, taken
from Hughes et al. (ref. 5), Walrus and Phylo3D were used to visualize the
complete NCBI taxonomy, containing close to 200,000 species, in a hyperbolic
3D space. Bacteria are in focus in the image and are shown in orange.
Eukaryotes are shown in yellow on the left-hand side. Archaea are represented
by the red nodes at the top and in the background of the image.
Supplementary Figure 4 | Tree-based alignment analysis. (a) Tree-based
alignment analysis of the sulfatase family using JevTrace, applied to the Pfam
family alignment and tree. The panel on the left shows the algorithm's
automatically generated tree partition. The alignment, shown adjacent to the
leaves, is annotated with regions found to exhibit sub-family conservation. A
view of the associated PDB structure (1fsu) is shown on the right, colored
according to sub-family-specific mutations. (b) Snapshot from Jalview showing
the same region of the sulfatase alignment as in Supplementary Figure 2c, with
Clustal conservation-based shading and coloring applied to each subgroup.
This rendering style reveals sub-family-specific conservation patterns that
generally contain the residues known to be involved in substrate binding (cf.
annotation in Supplementary Figure 2c). (c) Neighbor-joining tree for the
alignment in Supplementary Figure 2a calculated with Jalview, with sub-trees
corresponding to each sub-family highlighted in different colors.
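As a rough illustration of what "sub-family-specific conservation" means, the sketch below finds alignment columns that are fully conserved within each subgroup but carry different residues in different subgroups. It is a deliberately simple stand-in, not the method used by JevTrace or by Jalview's Clustal-style shading, and the group names and toy sequences are hypothetical.

```python
def conserved_columns(seqs, gap="-"):
    """Return {column: residue} for columns where all sequences share one non-gap residue."""
    conserved = {}
    for col in range(len(seqs[0])):
        residues = {s[col] for s in seqs}
        if len(residues) == 1 and gap not in residues:
            conserved[col] = next(iter(residues))
    return conserved


def subfamily_specific(groups, gap="-"):
    """Find columns conserved inside every group but differing between groups.

    groups: {group name: list of aligned sequences of equal length}.
    Returns {column: {group name: conserved residue}}.
    """
    per_group = {name: conserved_columns(seqs, gap) for name, seqs in groups.items()}
    shared_cols = set.intersection(*(set(cols) for cols in per_group.values()))
    result = {}
    for col in sorted(shared_cols):
        residues = {name: per_group[name][col] for name in per_group}
        if len(set(residues.values())) > 1:
            result[col] = residues
    return result


# Toy alignment split into two hypothetical subgroups.
groups = {
    "groupA": ["MKCTD", "MKCSD"],
    "groupB": ["MRCTE", "MRCAE"],
}
print(subfamily_specific(groups))
```

Columns 1 and 4 (0-based) are reported here because each group is internally invariant at those positions yet the groups disagree; column 2 is conserved across the whole alignment and column 3 varies within groups, so neither is reported.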
[Supplementary Figure 4 image, panels a, b and c. Sub-family labels: ARSA, ARSG, ARS, STS/ARSE/F, ARSB, ARSI/J.]
Supplementary Figure 5 | Annotating phylogenetic trees with complex data.
iTOL (ref. 6) was used to annotate an automatically generated tree of life
(ref. 7). Blue bar charts represent genome sizes, defined as the number of
predicted protein-coding genes. Pie charts show the distribution of preferred
habitats for various taxa identified in several metagenomics sequencing
projects (ref. 8).
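Bar-chart annotations of this kind are typically supplied to a tree viewer as a small text file that pairs each leaf identifier with a value. The sketch below writes such a file in the comma-separated `DATASET_SIMPLEBAR` template layout used by current iTOL versions; this is an assumption about present-day iTOL and may differ from the format used to produce the published figure. The function name, the leaf identifiers and the values are hypothetical placeholders, not real gene counts.

```python
def simplebar_dataset(label, color, values):
    """Build the text of a comma-separated bar-chart dataset.

    label:  legend label for the dataset
    color:  hex color for the bars, e.g. "#0000ff"
    values: {leaf identifier: numeric value}; identifiers must match the
            leaf labels of the tree and must not contain commas.
    """
    lines = [
        "DATASET_SIMPLEBAR",
        "SEPARATOR COMMA",
        f"DATASET_LABEL,{label}",
        f"COLOR,{color}",
        "DATA",
    ]
    lines += [f"{leaf},{value}" for leaf, value in values.items()]
    return "\n".join(lines) + "\n"


# Placeholder leaves and values, for illustration only.
text = simplebar_dataset("Genome size", "#0000ff", {"leaf_A": 100, "leaf_B": 250})
print(text)
```

The resulting text can be saved to a file and loaded alongside a tree whose leaf labels match the identifiers in the `DATA` section.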
[Supplementary Figure 5 image: tree of life with one leaf label per sequenced species or strain, covering eukaryotes (for example Homo sapiens, Arabidopsis thaliana, Saccharomyces cerevisiae), archaea (for example Sulfolobus solfataricus, Methanocaldococcus jannaschii) and bacteria (for example Escherichia coli, Bacillus subtilis, Helicobacter pylori).]
References
1. Ghosh, D. Human sulfatases: a structural perspective to catalysis. Cell Mol Life Sci 64, 2013-22 (2007).
2. The UniProt Consortium. The Universal Protein Resource (UniProt) 2009. Nucleic Acids Res 37, D169-74 (2009).
3. Berman, H.M. et al. The Protein Data Bank. Nucleic Acids Res 28, 235-42 (2000).
4. Hunter, S. et al. InterPro: the integrative protein signature database. Nucleic Acids Res 37, D211-5 (2009).
5. Hughes, T., Hyun, Y. & Liberles, D.A. Visualising very large phylogenetic trees in three dimensional hyperbolic space. BMC Bioinformatics 5, 48 (2004).
6. Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics 23, 127-8 (2007).
7. Ciccarelli, F.D. et al. Toward automatic reconstruction of a highly resolved tree of life. Science 311, 1283-7 (2006).
8. von Mering, C. et al. Quantitative phylogenetic assessment of microbial communities in diverse environments. Science 315, 1126-30 (2007).