protease phylogeny

Exploiting new Genome Data and Web Resources for the Phylogenetic Analysis of Proteases, Substrates and Inhibitors

Christopher Southan, Global Compound Sciences, AstraZeneca R&D, Mölndal, Sweden

Upload: chris-southan

Post on 24-Jun-2015

DESCRIPTION

Review of sources for protease phylogeny. Presented at the Biochemical Society Protease meeting, Cirencester, UK, January 2007

TRANSCRIPT

Page 1: Protease Phylogeny

Exploiting new Genome Data and Web Resources for the Phylogenetic Analysis of Proteases, Substrates and Inhibitors

Christopher Southan

Global Compound Sciences, AstraZeneca R&D, Mölndal, Sweden

Page 2: Protease Phylogeny

Introduction

• Multiple alignments and phylogenetic trees are established techniques for examining the evolutionary history of proteases

• New complete genome data gives a broader and deeper phylogenetic range

• We have improved and expanded annotation of not only proteases but also their substrates, interaction partners in complexes and inhibitor proteins

• This facilitates the mapping of conserved positions to structure and to catalytic and/or binding functions

• Structural coverage across protease classes is increasing

Page 3: Protease Phylogeny

Evolutionary Context of Mapping Proteases to Functions

From: Searls DB. Pharmacophylogenomics: genes, evolution and drug targets. Nat Rev Drug Discov. 2003

Page 4: Protease Phylogeny

Once upon a time: all we had was …

Page 5: Protease Phylogeny

So what we used to do was …

[Figure: multiple sequence alignment of P12821|ACE, P47820|ACE, P09470|ACE and Q18581|ACN; position counters shown, in the same order: 808, 814, 813 and 341]

Page 6: Protease Phylogeny

But now we have …

http://www.ensembl.org/index.html

Page 7: Protease Phylogeny

[Figure: species tree of sequenced genomes drawn against a time axis in million years (ticks at 100, 200, 300, 400, 500 and 1000). Nodes carry numeric labels, several of them marked "?"; the node-to-label mapping is not recoverable from the transcript.]

Species shown: Human, Chimpanzee, Rhesus macaque, Mouse, Rat, Rabbit *, Dog, Cow, Armadillo *, Elephant *, Tenrec *, Opossum, Chicken, Xenopus, Fugu, Tetraodon, Zebrafish, Medaka, Stickleback, C. intestinalis, C. savignyi *, Fruitfly, Malaria mosquito, Fever mosquito *, Honey bee, C. elegans, Yeast

Clade labels: Chordata, Vertebrata, Tetrapoda, Amniota, Mammalia, Eutheria, Metatheria, Aves, Amphibia, Teleostei, Urochordata, Arthropoda, Nematoda, Fungi

Much more data …

Page 8: Protease Phylogeny

So what can we do with all these sequences …

• Filter conserved features by comparisons at all levels from 5 million to 1 billion years

• Perceive more gradual changes

• Root and/or outgroup trees at deeper levels

• Follow lineage-specific duplications and losses more clearly

• Resolve complex orthology questions

• Assess the consequences of SNPs or alternative splicing

• Spot horizontal gene transfer events

• Look at the evolution of substrates and inhibitors

• Spot discordant evolution where protease and substrate diverge

• Design new experiments
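As an illustration of the first bullet, filtering conserved features can be sketched as a per-column score over a multiple alignment. This is a minimal sketch, not a method from the talk: the function names, the scoring rule (fraction of sequences sharing the most common residue, with gaps counted as mismatches) and the toy sequences are all assumptions made for the example.

```python
from collections import Counter


def column_conservation(aligned_seqs):
    """Per alignment column, the fraction of sequences sharing the most common residue.

    Gap characters ('-') count as non-matching, so gappy columns score low.
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*aligned_seqs):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


def conserved_positions(aligned_seqs, threshold=1.0):
    """0-based column indices whose conservation score reaches the threshold."""
    scores = column_conservation(aligned_seqs)
    return [i for i, score in enumerate(scores) if score >= threshold]


# Toy, made-up aligned fragments (hypothetical, not real protease sequences).
toy = ["HEMGH-A", "HEVGHLA", "HELGH-S"]
print(conserved_positions(toy))  # [0, 1, 3, 4]
```

Running the same filter on alignments that span progressively deeper species ranges shows which positions stay invariant as the evolutionary distance grows.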

Page 9: Protease Phylogeny

You can use Ensembl for orthologue prediction - but it's getting complicated …

Page 10: Protease Phylogeny

So you can use http://www.treefam.org/

• TreeFam is a database of phylogenetic trees of animal genes

• Provides curated resources for ortholog and paralog assignments, and evolutionary history of various gene families

• Includes yeast and plant outgroup genes to reveal these distant members

• Infers orthologs by fitting a gene tree into the universal species tree and finding historical duplication, speciation and loss events
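The idea of fitting a gene tree into a species tree can be sketched with the classic last-common-ancestor (LCA) mapping rule: a gene-tree node is called a duplication when it maps to the same species-tree node as one of its children, and a speciation otherwise. This is a simplified illustration and not TreeFam's actual pipeline; the nested-tuple tree encoding, the function names and the gene names are hypothetical, and losses are not counted.

```python
def leaves(tree):
    """Set of leaf labels under a nested-tuple tree (a leaf is a plain string)."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def lca(species_tree, species_set):
    """Smallest subtree of species_tree that contains every species in species_set."""
    if not isinstance(species_tree, str):
        for child in species_tree:
            if species_set <= leaves(child):
                return lca(child, species_set)
    return species_tree


def reconcile(gene_tree, species_tree, gene_to_species):
    """Label each internal gene-tree node as 'duplication' or 'speciation'.

    Returns a post-order list of (sorted gene names under the node, label).
    """
    events = []

    def visit(node):
        if isinstance(node, str):
            return {gene_to_species[node]}, [node]
        child_info = [visit(child) for child in node]
        species_here = set().union(*(species for species, _ in child_info))
        genes_here = [gene for _, genes in child_info for gene in genes]
        mapped = lca(species_tree, species_here)
        # Duplication: some child maps to the same species-tree node as its parent.
        is_dup = any(lca(species_tree, species) == mapped for species, _ in child_info)
        events.append((sorted(genes_here), "duplication" if is_dup else "speciation"))
        return species_here, genes_here

    visit(gene_tree)
    return events


# Hypothetical example: paralogues A and B arose before the human/mouse split.
species_tree = (("human", "mouse"), "zebrafish")
gene_tree = ((("A_human", "A_mouse"), ("B_human", "B_mouse")), "A_zebrafish")
gene_to_species = {name: name.split("_")[1] for name in leaves(gene_tree)}

events = reconcile(gene_tree, species_tree, gene_to_species)
print([label for _, label in events])
# ['speciation', 'speciation', 'duplication', 'speciation']
```

Under this rule, genes separated by a speciation node are orthologues and genes separated by a duplication node are paralogues, which is the assignment the bullet above describes.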

Page 11: Protease Phylogeny

To get pre-cooked protease trees

Page 12: Protease Phylogeny

And sub-trees with schematic Pfam matches

Page 13: Protease Phylogeny

You can then look at substrate evolution

Page 14: Protease Phylogeny

….and compare the two

Page 15: Protease Phylogeny

Trees can give clear ancestry

And structural inferences from Pfam

Page 16: Protease Phylogeny

They can also show divergence between protease and substrate
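One simple way to put a number on such divergence, offered here as an assumption rather than anything the talk specifies, is to correlate the pairwise distances between protease orthologues with those between the matching substrate orthologues across the same species. A correlation near 1 suggests the two families track each other; a low value flags discordance worth inspecting. The distances below are invented for illustration, and the helper names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def distance_correlation(dist_a, dist_b, species):
    """Correlate two distance tables over every unordered species pair."""
    pairs = [frozenset(pair) for pair in combinations(species, 2)]
    return pearson([dist_a[p] for p in pairs], [dist_b[p] for p in pairs])


species = ["human", "mouse", "chicken", "zebrafish"]


def as_table(values):
    """Build a distance table keyed by species pair, in combinations() order."""
    return dict(zip((frozenset(p) for p in combinations(species, 2)), values))


# Invented distances (hypothetical, for illustration only).
protease = as_table([0.10, 0.30, 0.50, 0.32, 0.52, 0.48])
substrate_tracking = as_table([0.05, 0.15, 0.25, 0.16, 0.26, 0.24])
substrate_diverged = as_table([0.40, 0.15, 0.25, 0.45, 0.50, 0.24])

r_tracking = distance_correlation(protease, substrate_tracking, species)
r_diverged = distance_correlation(protease, substrate_diverged, species)
print(round(r_tracking, 3), round(r_diverged, 3))
```

A low correlation is only a prompt to look at the two trees side by side; it does not by itself show that the protease and substrate have stopped interacting.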

Page 17: Protease Phylogeny

You can also track protease complex members

Page 18: Protease Phylogeny

… here’s a new one (see poster 13)

Page 19: Protease Phylogeny

And tight conservation of some substrates

Page 20: Protease Phylogeny

You can resolve complex non-orthology

López-Otín et al. TreeView

Page 21: Protease Phylogeny

You may find cases of pre-cooked mapping of conservation onto structure

http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/

Page 22: Protease Phylogeny

Advantages of Pre-cooked Alignments and Trees

• Instant results

• Automatically synchronised with genome pipeline updates

• Exploits massive global CPU horsepower resources

• Standardised

• Includes some curation

• Good reliability for 1:1 orthologues

• Operative bioinformatic skills not necessary

• Connectivity with other Web resources including structure

• Rapid exploration of large protein families

• Can identify where to focus the DIY

Page 23: Protease Phylogeny

Advantages of DIY

• Inspecting alignments and building your own trees helps you understand their interpretation

• Good Web toolbox available

• Automated default gapping parameters not always best

• Global alignments not always suitable for paralogues

• Can be informative to de-signal, de-gap, make domain-specific or non-contiguous comparisons

• Can select sub-family branches out of big trees

• Need to look out for “dead” proteases

• Can explore twilight-zone relationships

• cDNA/genome assembly/gene prediction errors can be identified and fixed

• Can include new pre-assembly genomic or EST data

• Can extend to DNA alignments and Ka/Ks comparison

• Direct projection onto protein structures or models

• Different pre-cooked resources don’t always agree

• Can explore comparative expression data
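Two of the DIY steps above, de-gapping an alignment and then comparing sequences pairwise, can be sketched in a few lines. This is a minimal sketch under stated assumptions: "de-gap" is taken to mean dropping every column that contains a gap, the distance is the plain fraction of differing positions (p-distance), and the sequences and function names are made up for the example.

```python
def degap_columns(aligned_seqs, gap="-"):
    """Drop every alignment column that contains a gap in any sequence."""
    kept = [column for column in zip(*aligned_seqs) if gap not in column]
    if not kept:
        return ["" for _ in aligned_seqs]
    return ["".join(residues) for residues in zip(*kept)]


def p_distance(seq_a, seq_b):
    """Fraction of differing positions between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        return 0.0
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


# Toy aligned fragments (hypothetical, not real protease sequences).
names = ["seq1", "seq2", "seq3"]
aligned = ["MK-LVA", "MKTLIA", "MR-LIA"]

degapped = degap_columns(aligned)
print(degapped)  # ['MKLVA', 'MKLIA', 'MRLIA']

for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(names[i], names[j], p_distance(degapped[i], degapped[j]))
```

The resulting distances can be fed to any tree-building tool; removing gapped columns first avoids letting alignment artefacts dominate the comparison, at the cost of discarding some data.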

Page 24: Protease Phylogeny

Conclusions

• Don’t drown in the genome deluge – mine it!

• Pre-cooked Web bioinformatic sources offer increasing coverage for your favourite proteases

• Don’t forget substrates and inhibitors

• There are many tools to make DIY phylogenies easier

• Not only can you plausibly interpret evolutionary histories but you can also perform experiments to test your hypotheses

• We owe a big debt of gratitude to those who work to provide genome annotation pipelines and bioinformatic Web resources