Post on 12-Jan-2016
From Structure to Function
Janet Thornton
European Bioinformatics Institute
From Structure to Functional Annotation
Mid-West Center for Structural Genomics (MCSG)
University of Toronto: Aled Edwards
Argonne National Laboratory: Andrzej Joachimiak
Northwestern University: Wayne Anderson
Washington University in St. Louis: Daved Fremont
UT Southwestern Medical Center: Zbyszek Otwinowski
University of Virginia: Wladek Minor
EBI / University College London: Janet Thornton, Christine Orengo
60 structures solved to date
ylxR hypothetical cytosolic protein
Hypothetical protein (EC4030_F)
Hypothetical protein (MTH1)
ygbM hypothetical protein (EC1530)
Conserved hypothetical protein (MT777)
cutA protein implicated in Cu homeostasis (TM1056)
Some examples …~30% are ‘hypothetical proteins’
TIM barrel enzymes – 18 different homologous families
>60 different E.C. numbers
EC Wheel of TIM barrels
Structure of TIM barrel: Triose phosphate isomerase
Pairwise sequence identity and conservation of enzyme function (Todd et al 2001)
• Single-domain proteins: >81,000 homologous enzyme / enzyme and enzyme / non-enzyme pairs
[Figure: fractional percentage (0% to 100%) of pairs with conserved versus unconserved function, plotted against sequence identity (%) in ten bins from 0-10 to 91-100]
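The binning behind a plot of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of Todd et al.: the pairs are assumed to be already aligned, function is treated as conserved when the first three EC fields agree (an assumed threshold), all function names are hypothetical, and the sequences are made-up toy data.

```python
import math
from collections import defaultdict


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


def ec_conserved(ec1: str, ec2: str, levels: int = 3) -> bool:
    """True when the first `levels` EC fields agree (levels=3 is an assumption)."""
    return ec1.split(".")[:levels] == ec2.split(".")[:levels]


def identity_bin(identity: float) -> str:
    """Map a percent identity to a bin label such as '31-40'."""
    if identity <= 10:
        return "0-10"
    upper = math.ceil(identity / 10) * 10
    return f"{upper - 9}-{upper}"


def conserved_fraction_by_bin(pairs):
    """pairs: iterable of (aligned_seq_a, aligned_seq_b, ec_a, ec_b)."""
    counts = defaultdict(lambda: [0, 0])  # bin label -> [conserved, total]
    for seq_a, seq_b, ec_a, ec_b in pairs:
        label = identity_bin(percent_identity(seq_a, seq_b))
        counts[label][1] += 1
        if ec_conserved(ec_a, ec_b):
            counts[label][0] += 1
    return {label: c / n for label, (c, n) in counts.items()}


# Toy aligned pairs (invented sequences, for illustration only).
toy_pairs = [
    ("ACDEFGHIKL", "ACDEFGHIKL", "2.6.1.1", "2.6.1.1"),
    ("ACDEFGHIKL", "ACDEFGHIKV", "2.6.1.1", "2.6.1.5"),
    ("ACDEFGHIKL", "ACDWWWWWWW", "2.6.1.1", "4.1.1.17"),
    ("ACDEFGHIKL", "AC-WWWWWWW", "2.6.1.1", "2.6.1.1"),
]
fractions = conserved_fraction_by_bin(toy_pairs)
```

With real data, each bin's conserved fraction and its complement would give the two stacked series of the plot.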
From Structure To Biochemical Function
Gene → Protein → 3D Structure → Function
Given a protein structure:
• Where is the functional site?
• What is the multimeric state of the protein?
– PQS – Hannes Ponstingl (this morning)
• Which ligands bind to the protein?
• What is biochemical function?
Automated Structure Comparison
• The most powerful method for assigning function from structure is global or partial 3D structure comparison (e.g. Dali, SSAP; SSM)
• Hidden Markov Models derived from structural domains can often recognise distant relatives from sequence
– Christine Orengo (tomorrow)
Aspartate Amino Transferase Superfamily
Aspartate Aminotransferase
2,2-Dialkylglycine Decarboxylase
Tyrosine Phenol-lyase
Ornithine Decarboxylase
Aspartate Amino Transferase Family
Aspartate Aminotransferase (EC 2.6.1.1)
2,2-Dialkylglycine Decarboxylase (EC 4.1.1.64)
Tyrosine Phenol-lyase (EC 4.1.99.2)
Ornithine Decarboxylase (EC 4.1.1.17)
all bind Pyridoxal 5’ Phosphate (PLP) co-factor
Number of enzyme functions
[Figure: number of enzyme functions (0 to 60) per superfamily, shown for structural data and for structural and sequence data. Labelled superfamilies: α/β hydrolases, type I PLP-dependent enzymes, TIM barrel glycosyl hydrolases]
Convergent and Divergent Evolution
• Unrelated proteins can perform the same function (convergent evolution), sometimes using the same mechanism – sometimes using different mechanisms
• Related proteins can perform different functions – divergent evolution
Active site convergence
Trypsin, Subtilisin, Alpha/beta hydrolase
Brain platelet activating factor acetylhydrolase
CheB methylesterase
Clp protease
Predicting Binding Site
Binding-site analysis: cutA
Most likely binding site
Surface clefts
Residue conservation
Conserved surface patches
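Residue conservation, one of the signals listed above, can be scored per alignment column. The sketch below is only one possible scoring, not necessarily the one used for cutA: it assumes a multiple sequence alignment is already available, scores each column as 1 - H / log2(20) where H is the Shannon entropy of the residues in that column, and ignores gaps. The function names, the threshold and the toy alignment are all hypothetical.

```python
import math
from collections import Counter


def column_conservation(alignment):
    """Return one score per column: 1.0 = fully conserved, lower = more variable.

    Score is 1 - H / log2(20), with H the Shannon entropy of the residues
    in the column (gaps ignored). This scoring choice is an assumption.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    max_entropy = math.log2(20)
    scores = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        if not residues:
            scores.append(0.0)  # column of gaps only: no evidence of conservation
            continue
        counts = Counter(residues)
        total = len(residues)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        scores.append(1.0 - entropy / max_entropy)
    return scores


def conserved_positions(alignment, threshold=0.9):
    """Indices (0-based) of columns whose score reaches the assumed threshold."""
    scores = column_conservation(alignment)
    return [i for i, score in enumerate(scores) if score >= threshold]


# Toy alignment (invented sequences, for illustration only).
toy_alignment = ["GDSGG", "GDSAG", "ADSLG", "GDS-G"]
toy_scores = column_conservation(toy_alignment)
toy_conserved = conserved_positions(toy_alignment)
```

To look for conserved surface patches, the highly conserved positions would then be mapped onto the structure and restricted to solvent-exposed residues, ideally those lining a surface cleft.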
Identifying Binding Site Function Using Motifs
- 3D enzyme active site structural motifs (Craig Porter)
- Catalytic Site Atlas - Identification of catalytic residues (Gail Bartlett, Alex Gutteridge)
- Metal binding sites (Malcolm MacArthur)
- Binding site features (Gareth Stockwell)
- Automatically generated templates of ligand-binding and
- DNA binding motifs (Sue Jones, Hugh Shanahan)
- “Reverse” templates (Roman Laskowski)
JESS – fast template search algorithm (Jonathan Barker)
PINTS - Searches for similar clusters (Aloy, Russell … – EMBL Heidelberg)
Catalytic Site Atlas
Enzyme reports from primary literature information
β-lactamase Class A
– EC: 3.5.2.6
– PDB: 1btl
– Reaction: β-lactam + H2O → β-amino acid
– Active site residues: S70, K73, S130, E166
– Plausible mechanism:
[Figure: reaction scheme showing the substrate with the Ser, Lys, Ser and Glu side chains at each step]
3-D templates
•Use 3D templates to describe the active site of the enzyme
–analogous to 1-D sequence motifs such as PROSITE, but in 3-D
•Sequence position independent
•Captures essence of functional site in protein
TEmplate Search and Superposition TESS
• defines a functional site as a sequence-independent set of atoms in 3-D space
• search a new structure for a functional site
• search a database of structures for similar clusters
Wallace et al., 1997
e.g. serine proteinase, catalytic triad
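The idea of a sequence-independent template can be illustrated with a small brute-force search. This sketch is a stand-in and not the geometric hashing algorithm of TESS: it compares inter-atom distances (a distance RMSD) instead of superposing coordinates, it assumes one atom per residue in the template, and the data layout, function names, cutoff and coordinates are all hypothetical toy values.

```python
import itertools
import math


def distance_rmsd(coords_a, coords_b):
    """RMS difference between corresponding inter-atom distances of two atom sets."""
    index_pairs = list(itertools.combinations(range(len(coords_a)), 2))
    squared = [
        (math.dist(coords_a[i], coords_a[j]) - math.dist(coords_b[i], coords_b[j])) ** 2
        for i, j in index_pairs
    ]
    return math.sqrt(sum(squared) / len(squared))


def search_template(template, structure, cutoff=1.0):
    """Return (residue_ids, drmsd) hits in `structure`, best first.

    template:  list of (residue_name, atom_name, (x, y, z))
    structure: list of (residue_name, residue_id, atom_name, (x, y, z))
    Residue order along the sequence is never used, only types and geometry.
    """
    # Candidate atoms for each template atom: same residue type and atom name.
    candidates = [
        [atom for atom in structure if atom[0] == res and atom[2] == name]
        for res, name, _ in template
    ]
    template_coords = [xyz for _, _, xyz in template]
    hits = []
    for combo in itertools.product(*candidates):
        ids = [atom[1] for atom in combo]
        if len(set(ids)) != len(ids):  # each template atom must map to a different residue
            continue
        score = distance_rmsd(template_coords, [atom[3] for atom in combo])
        if score <= cutoff:
            hits.append((tuple(ids), score))
    return sorted(hits, key=lambda hit: hit[1])


# Toy Ser-His-Asp template and structure (invented coordinates, in Å).
toy_template = [
    ("SER", "OG", (0.0, 0.0, 0.0)),
    ("HIS", "NE2", (3.0, 0.0, 0.0)),
    ("ASP", "OD1", (3.0, 4.0, 0.0)),
]
toy_structure = [
    ("ASP", 5, "OD1", (50.0, 0.0, 0.0)),
    ("SER", 20, "OG", (30.0, 30.0, 30.0)),
    ("HIS", 57, "NE2", (13.0, 10.0, 10.0)),
    ("ASP", 102, "OD1", (13.0, 14.0, 10.0)),
    ("SER", 195, "OG", (10.0, 10.0, 10.0)),
]
toy_hits = search_template(toy_template, toy_structure)
```

Distances alone cannot tell a site from its mirror image, and trying every combination scales poorly, so a practical search would add a coordinate superposition step and an indexing scheme.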
Pepsin
Eukaryotic & Fungal Aspartic Proteinases: all-atom DTG-DTG Template
Aspartic Proteinase - Active Site residues - [DTG]x2
A template of 8 atoms is sufficient to identify all Aspartic Proteinases
[Figure labels: Asp CO2, Gly C, Gly C, Asp O, Thr/Ser O, Thr O]
Aspartic Proteases: Active Site Template
green = true, red = false
Aspartic Protease Template Search
against all PDB
3D Templates to Characterise Functional Sites
Template searches
(189 enzyme active site templates)
(~600 Metal binding site templates)
GARTfase
Cholesterol oxidase
IIAglc histidine kinase
Carbamoylsarcosine amidohydrolase
Dihydrofolate reductase
Ser-His-Asp catalytic triad
…
Database of enzyme active site templates: 189 templates
MCSG structure
BioH – unknown function, involved in biotin synthesis in E. coli
An example
Structure: Rossmann fold, hence many structural homologues
Expected to be an enzyme
Sequence contains two Gly-X-Ser-X-Gly motifs typical of acyltransferases and thioesterases
CSA template search: one very strong hit
Ser-His-Asp catalytic triad of the lipases with rmsd = 0.28 Å
(template cut-off is 1.2 Å)
Experimentally confirmed by hydrolase assays
Novel carboxylesterase acting on short acyl chain substrates
Templates of Active Sites
• Catalytic cluster conserved – Simple template
– e.g. Aspartic Proteinase (DTG)x2
• Order and geometry of catalytic residues varies
– Multiple templates e.g. Polymerases
• Same catalytic cluster used in many different enzyme functions – one template identifies multiple active sites in unrelated structures
– e.g. Asp/His/Ser catalytic triad is well conserved in structure
Instances of convergence
• Ser-His-Asp triads
• Cys-His-Asp triads
• Ribonuclease T1s
• Malic enzyme and isocitrate dehydrogenase
• Haloperoxidases
• Creatinase and carboxypeptidase G2
• Glycosidases
• Class II extradiol-type dioxygenase and class III extradiol-type dioxygenase
• Receptor tyrosine phosphatase and low-molecular weight tyrosine phosphatase
• Pyridoxal 5' phosphate enzymes
James Torrance
Template databases
• HAND CURATED
– Enzyme active sites (PROCAT) – 189 templates
• Currently being extended
– Metal-binding sites – 600 templates
• AUTOMATED
– Ligand-binding sites – 10,000 templates
– DNA-binding sites – 800 templates
Another example of convergent evolution: The DNA HTH Binding Motif
[Figure: HTH motifs in PDB entries 1jhg, 1hcr, 1b9m, 1eto, 1lmb, 1ais, 1orc]
Sue Jones
ProFunc – function from 3D structure
Homologous sequences of known function
Binding site identification and analysis
Homologous structures of known function
Functional sequence motifs: Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]
Enzyme active site 3D-templates
HTH-motifs
Electrostatics
Surface comparison
… etc
DNA-, ligand- binding and “reverse” templates
Residue conservation analysis
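The functional sequence motif quoted above, Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC], is written in PROSITE-style pattern syntax, which maps directly onto a regular expression. The sketch below is a hypothetical helper, not part of ProFunc: it covers only the common pattern elements, and the test sequence is invented.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Handles the common elements only: literal residues, x (any residue),
    [..] (allowed set), {..} (forbidden set), repeats such as x(3) or x(2,4),
    and the < / > sequence-end anchors.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        repeat = ""
        match = re.fullmatch(r"(.+?)\((\d+)(?:,(\d+))?\)", element)
        if match:
            element = match.group(1)
            low, high = match.group(2), match.group(3)
            repeat = "{%s}" % low if high is None else "{%s,%s}" % (low, high)
        if element == "x":
            core = "."
        elif element.startswith("{") and element.endswith("}"):
            core = "[^" + element[1:-1] + "]"
        else:
            core = element  # a literal residue or a [..] set, valid regex as-is
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


motif_regex = prosite_to_regex("Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]")

# Invented test sequence, for illustration only.
motif_hit = re.search(motif_regex, "MKQAAAGPCYLLSV")
```

A hit from `re.search` reports where the motif starts in the sequence, which can then be checked against the binding site or template results for the same residues.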
Three MCSG Examples (James Watson)
Three examples show the varying levels of information that can be retrieved from structures:
1. Almost full functional information. GOOD
•APC 1040
2. General information. NOT SO GOOD
•APC 012
3. Little or no information obtained. UGLY
•APC 078
Acknowledgements
• Roman Laskowski, James Watson, Richard Morris, Rafael Najmanovich, Fabian Glaser - EBI
• Christine Orengo, Annabel Todd, James Bray, Russell Marsden – University College, London
• MCSG members – Andrzej Joachimiak, Al Edwards etc
• Funding: NIH - PSI; EU - SPINE; DoE – DNA Motifs; UK BBSRC LINK