Download - Protease Phylogeny

Exploiting New Genome Data and Web Resources for the Phylogenetic Analysis of Proteases, Substrates and Inhibitors

Christopher Southan

Global Compound Sciences, AstraZeneca R&D, Mölndal, Sweden

Introduction

• Multiple alignments and phylogenetic trees are established techniques for examining the evolutionary history of proteases

• New complete genome data give a broader and deeper phylogenetic range

• We have improved and expanded annotation of not only proteases but also their substrates, interaction partners in complexes and inhibitor proteins

• This facilitates the mapping of conserved positions to structure and to catalytic and/or binding functions

• Structural coverage across protease classes is increasing

Evolutionary Context of Mapping Proteases to Functions

From: Searls DB. Pharmacophylogenomics: genes, evolution and drug targets. Nat Rev Drug Discov. 2003

Once upon a time: all we had was …

So what we used to do was …

[Figure: multiple alignment of P12821|ACE, P47820|ACE, P09470|ACE and Q18581|ACN; the row-end counts shown are 808, 814, 813 and 341]

But now we have …

http://www.ensembl.org/index.html

[Figure: Ensembl species tree with node ages on a time axis of 100 to 1000 million years. Species shown: Human, Chimpanzee, Rhesus macaque, Mouse, Rat, Rabbit *, Dog, Cow, Elephant *, Tenrec *, Armadillo *, Opossum, Chicken, Xenopus, Fugu, Tetraodon, Zebrafish, Medaka, Stickleback, C. intestinalis, C. savignyi *, Fruitfly, Malaria mosquito, Fever mosquito *, Honey bee, C. elegans, Yeast. Clades labelled: Chordata, Vertebrata, Tetrapoda, Amniota, Mammalia, Eutheria, Metatheria, Aves, Amphibia, Teleostei, Urochordata, Arthropoda, Nematoda, Fungi. Several node ages are marked "?".]

Much more data …

So what can we do with all these sequences …

• Filter conserved features by comparisons at all levels from 5 million to 1 billion years

• Perceive more gradual changes

• Root and/or outgroup trees at deeper levels

• Follow lineage-specific duplications and losses more clearly

• Resolve complex orthology questions

• Assess the consequences of SNPs or alternative splicing

• Spot horizontal gene transfer events

• Look at the evolution of substrates and inhibitors

• Spot discordant evolution where protease and substrate diverge

• Design new experiments
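As an illustration of filtering conserved features, here is a minimal sketch that scores each column of a multiple alignment by the fraction of sequences sharing the most common residue. The function names and the toy sequences are made up for this example; a real analysis would read an alignment exported from one of the Web resources and would probably use a substitution-aware score.

```python
from collections import Counter


def column_conservation(alignment):
    """Per column: fraction of all sequences sharing the most common residue.

    Gaps ('-') are never counted as a residue, so gapped columns score lower.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(alignment)
    scores = []
    for column in zip(*alignment.values()):
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / n_seqs)
    return scores


def conserved_positions(alignment, threshold=1.0):
    """1-based alignment columns whose conservation meets the threshold."""
    scores = column_conservation(alignment)
    return [i + 1 for i, score in enumerate(scores) if score >= threshold]


# Toy, made-up aligned fragments (hypothetical, not real protease sequences).
toy_alignment = {
    "species_a": "HEMGH-QY",
    "species_b": "HEMGHIQF",
    "species_c": "HELGHVQY",
    "species_d": "HEMAH-QW",
}

print(conserved_positions(toy_alignment))        # fully conserved columns
print(conserved_positions(toy_alignment, 0.75))  # allow one mismatch in four
```

Running the same filter on alignments that span shallow and deep species sets shows which positions stay fixed as the evolutionary distance grows.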

You can use Ensembl for orthologue prediction, but it’s getting complicated …

So you can use http://www.treefam.org/

• TreeFam is a database of phylogenetic trees of animal genes

• Provides curated resources for ortholog and paralog assignments, and evolutionary history of various gene families

• Includes yeast and plant outgroup genes to reveal these distant members

• Infers orthologs by fitting a gene tree into the universal species tree and finds historical duplication, speciation and loss events
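The idea of fitting a gene tree into a species tree can be sketched with the classic last-common-ancestor rule: each gene-tree node is mapped to the smallest species clade containing all its species, and a node is called a duplication when it maps to the same clade as one of its children. This is a simplified illustration, not TreeFam's actual pipeline; it does not infer losses, and the trees, gene names and function names are hypothetical.

```python
def clades(tree):
    """Return (species under this node, list of every clade set in the subtree)."""
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, [leaf]
    members, found = frozenset(), []
    for child in tree:
        child_members, child_found = clades(child)
        members |= child_members
        found.extend(child_found)
    found.append(members)
    return members, found


def lca_clade(species, all_clades):
    """Smallest species-tree clade that contains every species in the set."""
    return min((c for c in all_clades if species <= c), key=len)


def label_events(node, species_of, all_clades, events):
    """Post-order walk of the gene tree; returns (species set, gene names)."""
    if isinstance(node, str):
        return frozenset([species_of[node]]), [node]
    species, genes, child_maps = frozenset(), [], []
    for child in node:
        child_species, child_genes = label_events(child, species_of, all_clades, events)
        child_maps.append(lca_clade(child_species, all_clades))
        species |= child_species
        genes += child_genes
    mapped = lca_clade(species, all_clades)
    # Same clade as a child means the split happened within one lineage.
    kind = "duplication" if mapped in child_maps else "speciation"
    events.append((kind, sorted(genes)))
    return species, genes


# Hypothetical toy data: nested tuples are internal nodes, strings are leaves.
species_tree = (("human", "mouse"), "zebrafish")
gene_tree = ((("A_human", "A_mouse"), ("B_human", "B_mouse")), "A_zebrafish")
species_of = {gene: gene.split("_")[1] for gene in
              ["A_human", "A_mouse", "B_human", "B_mouse", "A_zebrafish"]}

_, all_clades = clades(species_tree)
events = []
label_events(gene_tree, species_of, all_clades, events)
for kind, genes in events:
    print(kind, genes)
```

In the toy tree the A and B copies arise from a duplication in the human/mouse ancestor, so A_human and A_mouse are orthologues while A_human and B_mouse are paralogues.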

To get pre-cooked protease trees

And sub-trees with schematic Pfam matches

You can then look at substrate evolution

….and compare the two

Trees can give clear ancestry

And structural inferences from Pfam

They can also show divergence between protease and substrate

You can also track protease complex members

… here’s a new one (see poster 13)

And tight conservation of some substrates

You can resolve complex non-orthology

López-Otín et al. TreeView

You may find cases of pre-cooked mapping of conservation onto structure

http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/

Advantages of Pre-cooked Alignments and Trees

• Instant results

• Automatically synchronised with genome pipeline updates

• Exploits massive global CPU horsepower resources

• Standardised

• Includes some curation

• Good reliability for 1:1 orthologues

• Operative bioinformatic skills not necessary

• Connectivity with other Web resources including structure

• Rapid exploration of large protein families

• Can identify where to focus the DIY

Advantages of DIY

• Inspecting alignments and building your own trees helps you understand their interpretation

• Good Web toolbox available

• Automated default gapping parameters not always best

• Global alignments not always suitable for paralogues

• Can be informative to de-signal, de-gap, make domain-specific or non-contiguous comparisons

• Can select sub-family branches out of big trees

• Need to look out for “dead” proteases

• Can explore twilight-zone relationships

• cDNA/genome assembly/gene prediction errors can be identified and fixed

• Can include new pre-assembly genomic or EST data

• Can extend to DNA alignments and Ka/Ks comparison

• Direct projection onto protein structures or models

• Different pre-cooked resources don’t always agree

• Can explore comparative expression data
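De-gapping is one of the simpler DIY steps to script. The sketch below drops alignment columns whose gap fraction exceeds a chosen limit; the function name, the `max_gap_fraction` parameter and the toy sequences are assumptions made for illustration only.

```python
def degap(alignment, max_gap_fraction=0.0):
    """Drop columns whose fraction of '-' characters exceeds max_gap_fraction."""
    names = list(alignment)
    if len({len(alignment[name]) for name in names}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    columns = list(zip(*(alignment[name] for name in names)))
    kept = [col for col in columns if col.count("-") / len(col) <= max_gap_fraction]
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


# Toy, made-up alignment (hypothetical sequences).
gapped = {
    "seq1": "MK-LHE",
    "seq2": "MKALH-",
    "seq3": "M-ALHE",
}

print(degap(gapped))        # strict: remove every column containing a gap
print(degap(gapped, 0.34))  # tolerant: keep columns with at most one gap in three
```

The strict setting suits distance calculations, while a tolerant limit keeps more columns for visual inspection.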

Conclusions

• Don’t drown in the genome deluge – mine it!

• Pre-cooked Web bioinformatic sources offer increasing coverage for your favourite proteases

• Don’t forget substrates and inhibitors

• There are many tools to make DIY phylogenies easier

• Not only can you plausibly interpret evolutionary histories but you can also perform experiments to test your hypotheses

• We owe a big debt of gratitude to those who work to provide genome annotation pipelines and bioinformatic Web resources
