Protease Phylogeny
Exploiting New Genome Data and Web Resources for the Phylogenetic Analysis of Proteases, Substrates and Inhibitors
Christopher Southan, Global Compound Sciences, AstraZeneca R&D, Mölndal, Sweden
Introduction
• Multiple alignments and phylogenetic trees are established techniques for examining the evolutionary history of proteases
• New complete genome data give a broader and deeper phylogenetic range
• We have improved and expanded annotation of not only proteases but also their substrates, interaction partners in complexes and inhibitor proteins
• This facilitates the mapping of conserved positions to structure and to catalytic and/or binding functions
• Structural coverage across protease classes is increasing
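As a rough illustration of what "conserved positions" means in practice, the sketch below scores each column of a multiple alignment by Shannon entropy, where 0 bits means the column is invariant. The four aligned fragments are hypothetical toy sequences built around an HExxH-like zinc-binding motif, not real protease data, and ignoring gaps when scoring is an assumption; dedicated conservation tools use more refined scores.

```python
from collections import Counter
from math import log2

def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column; gaps ('-') are ignored."""
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return float("nan")              # all-gap column: nothing to score
    n = len(residues)
    return sum((c / n) * log2(n / c) for c in Counter(residues).values())

def conservation_profile(alignment: list[str]) -> list[float]:
    """Entropy for every column of an alignment given as equal-length strings."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    return [column_entropy("".join(seq[i] for seq in alignment)) for i in range(length)]

# Hypothetical toy fragments around an HExxH-like zinc-binding motif
toy = ["VHEMGH",
       "VHEAGH",
       "IHELGH",
       "LHE-GH"]
profile = conservation_profile(toy)
invariant = [i + 1 for i, h in enumerate(profile) if h == 0.0]
print(invariant)  # [2, 3, 5, 6]
```

The invariant columns are the two histidines, the glutamate and the glycine; the variable "xx" positions and the first column score above zero.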
Evolutionary Context of Mapping Proteases to Functions
From: Searls DB. Pharmacophylogenomics: genes, evolution and drug targets. Nat Rev Drug Discov. 2003
Once upon a time: all we had was …
So what we used to do was …
[Figure: alignment excerpt of P12821|ACE, P47820|ACE, P09470|ACE and Q18581|ACN; numbers shown: 808, 814, 813, 341]
But now we have ……
http://www.ensembl.org/index.html
[Figure: species tree of the genomes available in Ensembl, with divergence times in million years (axis ticks 100, 200, 300, 400, 500, 1000). Species: Human, Mouse, Rat, Rhesus macaque, Chimpanzee, Dog, Cow, Rabbit*, Armadillo*, Elephant*, Tenrec*, Opossum, Chicken, Xenopus, Fugu, Tetraodon, Zebrafish, Medaka, Stickleback, C. intestinalis, C. savignyi*, Fruitfly, Malaria mosquito, Fever mosquito*, Honey bee, C. elegans, Yeast. Clades labelled: Chordata, Vertebrata, Tetrapoda, Amniota, Mammalia, Eutheria, Metatheria, Aves, Amphibia, Teleostei, Urochordata, Arthropoda, Nematoda, Fungi.]
Much more data ….
So what can we do with all these sequences …
• Filter conserved features by comparisons at all levels from 5 million to 1 billion years
• Perceive more gradual changes
• Root and/or outgroup trees at deeper levels
• Follow lineage-specific duplications and losses more clearly
• Resolve complex orthology questions
• Assess the consequences of SNPs or alternative splicing
• Spot horizontal gene transfer events
• Look at the evolution of substrates and inhibitors
• Spot discordant evolution where protease and substrate diverge
• Design new experiments
You can use Ensembl for orthologue prediction, but it's getting complicated …
So you can use http://www.treefam.org/
• TreeFam is a database of phylogenetic trees of animal genes
• Provides curated resources for ortholog and paralog assignments, and evolutionary history of various gene families
• Includes yeast and plant outgroup genes to reveal distant family members
• Infers orthologs by fitting a gene tree into the universal species tree and finds historical duplication, speciation and loss events
To get pre-cooked protease trees
and sub-trees with schematic Pfam matches
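TreeFam's own pipeline is more elaborate, but the core idea of finding duplications by fitting a gene tree into a species tree can be sketched with the classic LCA-mapping reconciliation: each gene-tree node maps to the lowest common ancestor of its genes' species, and a node that maps to the same species node as one of its children is called a duplication. The trees, the GENE_SPECIES leaf labels and all function names below are hypothetical, and gene losses are not counted.

```python
from typing import Optional, Union

# A tree is either a leaf label (str) or a (left, right) tuple of subtrees.
Tree = Union[str, tuple]

def species_of(gene: str) -> str:
    """Hypothetical convention: gene leaves are labelled GENE_SPECIES."""
    return gene.rsplit("_", 1)[1]

def parent_map(tree: Tree, parent: Optional[Tree] = None, parents: Optional[dict] = None) -> dict:
    """Record the parent of every species-tree node (the root maps to None)."""
    if parents is None:
        parents = {}
    parents[tree] = parent
    if isinstance(tree, tuple):
        for child in tree:
            parent_map(child, tree, parents)
    return parents

def lca(a: Tree, b: Tree, parents: dict) -> Tree:
    """Lowest common ancestor of two species-tree nodes."""
    seen = set()
    while a is not None:
        seen.add(a)
        a = parents[a]
    while b not in seen:
        b = parents[b]
    return b

def reconcile(gene_tree: Tree, parents: dict, events: list) -> Tree:
    """LCA-map a gene-tree node to the species tree, logging each internal node's event."""
    if isinstance(gene_tree, str):
        return species_of(gene_tree)
    left, right = gene_tree
    m_left = reconcile(left, parents, events)
    m_right = reconcile(right, parents, events)
    m = lca(m_left, m_right, parents)
    # Mapping to the same species node as a child implies a duplication.
    events.append((gene_tree, "duplication" if m in (m_left, m_right) else "speciation"))
    return m

species_tree = ((("HUMAN", "MOUSE"), "CHICK"), "DROME")
# Hypothetical gene tree: two vertebrate paralogues (P1, P2) plus one fly gene
gene_tree = ((("P1_HUMAN", "P1_MOUSE"), ("P2_HUMAN", "P2_MOUSE")), "P_DROME")

parents = parent_map(species_tree)
events = []
root_mapping = reconcile(gene_tree, parents, events)
for node, event in events:
    print(event, node)
```

Here the P1/P2 split maps to the human/mouse ancestor, so it is scored as a duplication in the mammalian lineage after the split from chicken; a full reconciliation would also infer the absent chicken genes as losses.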
You can then look at substrate evolution
… and compare the two
Trees can give clear ancestry and structural inferences from Pfam
They can also show divergence between protease and substrate
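One simple way to put a number on protease/substrate divergence, in the spirit of the "mirror tree" approach, is to correlate the two families' pairwise distances across the same species: a substrate that tracks its protease gives a high correlation, a diverging one a lower value. The species codes and distance values below are hypothetical, and because entries of a distance matrix are not independent, assessing significance would need something like a Mantel test rather than a standard Pearson p-value.

```python
from itertools import combinations
from math import sqrt

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def mirror_correlation(protease_d: dict, substrate_d: dict) -> float:
    """Correlate two distance matrices over the species pairs they share."""
    shared = sorted(set(protease_d) & set(substrate_d), key=sorted)
    return pearson([protease_d[p] for p in shared], [substrate_d[p] for p in shared])

# Hypothetical pairwise distances, keyed by unordered species pairs
species = ["HUMAN", "MOUSE", "CHICK", "DROME"]
pairs = [frozenset(p) for p in combinations(species, 2)]
protease = dict(zip(pairs, [0.10, 0.30, 0.60, 0.30, 0.60, 0.60]))
tracking = {p: 1.5 * d for p, d in protease.items()}              # substrate evolving in step
diverged = dict(zip(pairs, [0.50, 0.20, 0.40, 0.50, 0.60, 0.40]))  # substrate out of step

print(round(mirror_correlation(protease, tracking), 3))  # 1.0
print(round(mirror_correlation(protease, diverged), 3))
```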
You can also track protease complex members … here’s a new one (see poster 13)
And tight conservation of some substrates
You can resolve complex non-orthology
López-Otín et al. TreeView
You may find cases of pre-cooked mapping of conservation onto structure
http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/
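To try this kind of projection yourself, one common trick is to write per-residue conservation scores into the B-factor column of a PDB file and then colour by B-factor in a molecular viewer. The sketch below assumes the scores are already keyed by the structure's residue numbering (the alignment-to-structure mapping is not shown), ignores insertion codes, and uses hypothetical toy ATOM records; the column positions follow the fixed-width PDB format.

```python
def atom_line(serial: int, name: str, resname: str, chain: str, resnum: int,
              x: float, y: float, z: float, bfactor: float = 0.0) -> str:
    """Build a fixed-column PDB ATOM record (columns 1-66) for the toy example."""
    return (f"ATOM  {serial:5d} {name:4s} {resname:3s} {chain}{resnum:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{bfactor:6.2f}")

def set_bfactors(pdb_lines: list[str], scores: dict[int, float], chain: str = "A") -> list[str]:
    """Write per-residue scores into the B-factor field (columns 61-66) of one chain."""
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM  ", "HETATM")) and line[21] == chain:
            resnum = int(line[22:26])           # residue sequence number, columns 23-26
            if resnum in scores:
                line = line[:60] + f"{scores[resnum]:6.2f}" + line[66:]
        out.append(line)
    return out

# Hypothetical entropy scores keyed by structure residue number
scores = {1: 0.0, 2: 1.58}
pdb = [atom_line(1, " CA ", "HIS", "A", 1, 1.0, 2.0, 3.0),
       atom_line(2, " CA ", "GLU", "A", 2, 4.0, 5.0, 6.0),
       atom_line(3, " CA ", "GLY", "A", 3, 7.0, 8.0, 9.0)]
for line in set_bfactors(pdb, scores):
    print(line)
```

Residue 3 has no score, so its record passes through unchanged; a real file would be read line by line and written back out the same way.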
Advantages of Pre-cooked Alignments and Trees
• Instant results
• Automatically synchronised with genome pipeline updates
• Exploits massive global CPU resources
• Standardised
• Includes some curation
• Good reliability for 1:1 orthologues
• Hands-on bioinformatics skills not necessary
• Connectivity with other Web resources, including structure
• Rapid exploration of large protein families
• Can identify where to focus DIY analysis
Advantages of DIY
• Inspecting alignments and building your own trees helps you understand how to interpret them
• Good Web toolbox available
• Automated default gapping parameters are not always the best
• Global alignments are not always suitable for paralogues
• It can be informative to de-signal (trim signal peptides), de-gap, or make domain-specific or non-contiguous comparisons
• Can select sub-family branches out of big trees
• Need to look out for “dead” proteases
• Can explore twilight-zone relationships
• cDNA, genome assembly and gene prediction errors can be identified and fixed
• Can include new pre-assembly genomic or EST data
• Can extend to DNA alignments and Ka/Ks comparison
• Direct projection onto protein structures or models
• Different pre-cooked resources don’t always agree
• Can explore comparative expression data
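Several of the DIY points above (de-gapping, building your own trees) can be tried in a few lines. This sketch removes gapped columns, computes p-distances (the fraction of differing sites) and clusters them with UPGMA. UPGMA is chosen only for brevity: it assumes a constant evolutionary rate, so neighbour-joining or maximum-likelihood methods are usually preferable for real protease families. The four sequences are hypothetical toy data, and the function names are illustrative.

```python
from itertools import combinations

def degap(alignment: dict[str, str]) -> dict[str, str]:
    """Remove every column that has a gap ('-') in any sequence."""
    seqs = list(alignment.values())
    keep = [i for i in range(len(seqs[0])) if all(s[i] != "-" for s in seqs)]
    return {name: "".join(s[i] for i in keep) for name, s in alignment.items()}

def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two gap-free sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def upgma(alignment: dict[str, str]) -> str:
    """UPGMA clustering on p-distances; returns a Newick topology without branch lengths."""
    size = {name: 1 for name in alignment}          # cluster label -> number of sequences
    dist = {frozenset(pair): p_distance(alignment[pair[0]], alignment[pair[1]])
            for pair in combinations(alignment, 2)}
    while len(size) > 1:
        pair = min(dist, key=dist.get)              # closest pair of clusters
        a, b = sorted(pair)
        merged = f"({a},{b})"                       # cluster label doubles as its Newick string
        sa, sb = size.pop(a), size.pop(b)
        del dist[pair]
        for other in size:
            # size-weighted average distance from the new cluster to each remaining one
            d = (dist.pop(frozenset((a, other))) * sa +
                 dist.pop(frozenset((b, other))) * sb) / (sa + sb)
            dist[frozenset((merged, other))] = d
        size[merged] = sa + sb
    return next(iter(size)) + ";"

# Hypothetical toy alignment; column 7 contains a gap and is dropped by degap()
aln = {"A": "VHEMGHGALTK",
       "B": "VHEMGH-ALSK",
       "C": "IHEAGHNSLSK",
       "D": "LHELGHDSYSR"}
print(upgma(degap(aln)))  # (((A,B),C),D);
```

Because each merged cluster's label is its own Newick string, no separate tree object is needed; a real analysis would also keep branch lengths and assess support, for example by bootstrapping.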
Conclusions
• Don’t drown in the genome deluge: mine it!
• Pre-cooked Web bioinformatics sources offer increasing coverage for your favourite proteases
• Don’t forget substrates and inhibitors
• There are many tools to make DIY phylogenies easier
• Not only can you plausibly interpret evolutionary histories, but you can also perform experiments to test your hypotheses
• We owe a big debt of gratitude to those who work to provide genome annotation pipelines and bioinformatic Web resources