Other structural databases
KFC/STBI Structural Bioinformatics
04_databases
Karel Berka
http://www.rcsb.org/pdb/
Databases: there are quite a few of them…
Primary structural databases
• PDBe: Protein Data Bank in Europe – complements the PDB with data from BMRB (NMR) and EMDB (EM)
• PDBsum – collects additional information about each structure
• PDBwiki: A community annotated knowledge base of biological molecular structures
– a Wikipedia for PDB structures
• NDB: Nucleic Acid Structure Database – database of nucleic acid structures
• CSD: Cambridge Structural Database – database of small-molecule crystal structures; commercial (paid)
• MODBASE: Database of Comparative Protein Structure Models – database of protein models
Secondary databases
• SCOP: Structural Classification of Proteins – search for structural families of proteins
• CATH – search for structural families of proteins
• GENE3D – structural genomics
• 3Dee – Database of Protein Domain Definitions
• FSSP – Fold Classification based on Structure-Structure Assignments
• DALI – based on exhaustive all-against-all 3D structure comparison of protein structures currently in the Protein Data Bank (PDB)
PDBe
http://www.ebi.ac.uk/pdbe/
• Comprehensive relational database of macromolecular structures
Example of an Atlas page, in this case for PDB entry 1E9F.
Velankar S et al. Nucl. Acids Res. 2010;38:D308-D317. © The Author(s) 2009. Published by Oxford University Press.
PDBe
Navigation menu
Sequence annotated from other databases:
Uniprot
CATH
Pfam
SCOP
Schematic overview of the process by which SIFTS files are generated (see text for details).
Velankar S et al. Nucl. Acids Res. 2010;38:D308-D317. © The Author(s) 2009. Published by Oxford University Press.
SIFTS format
Structure Integration with Function, Taxonomy and Sequence
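SIFTS links each PDB residue to a position in the corresponding UniProt sequence. A minimal sketch of applying such a mapping is shown here; the `Segment` record, its field names and the example ranges are hypothetical simplifications rather than the actual SIFTS file layout, and PDB insertion codes are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """Hypothetical, simplified SIFTS-like segment: a continuous PDB residue
    range aligned to a continuous UniProt range of the same length."""
    pdb_start: int
    pdb_end: int
    unp_start: int
    unp_end: int


def pdb_to_uniprot(resnum: int, segments: list[Segment]) -> Optional[int]:
    """Map a PDB residue number to a UniProt position; None if the residue
    falls outside every mapped segment (e.g. an expression tag or a gap)."""
    for seg in segments:
        if seg.pdb_start <= resnum <= seg.pdb_end:
            return seg.unp_start + (resnum - seg.pdb_start)
    return None


# Made-up example: PDB residues 1-50 correspond to UniProt 21-70,
# residues 55-100 to UniProt 80-125 (residues 51-54 are unmapped).
example_segments = [Segment(1, 50, 21, 70), Segment(55, 100, 80, 125)]

print(pdb_to_uniprot(10, example_segments))   # 30
print(pdb_to_uniprot(52, example_segments))   # None
```

Storing whole segments instead of one entry per residue keeps the mapping compact, since most chains map onto UniProt in a few continuous stretches.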
PDBe – services
• PDBeMine – Supports ad-hoc queries and data analysis based on the relational PDBe database
  http://www.ebi.ac.uk/pdbe-srv/msdmine
• OLDERADO – Clustering information for NMR entries in the PDB
  http://www.ebi.ac.uk/pdbe/olderado/
• PDBeAnalysis – Validation and analysis of PDBe data
  http://www.ebi.ac.uk/pdbe-as/PDBeValidate
• PDBeTemplate – Search of local residue interactions in the PDB
  http://www.ebi.ac.uk/pdbe-as/PDBeTemplate/
• PDBeFold – Secondary Structure Matching (SSM) service for comparing protein structures in 3D
  http://www.ebi.ac.uk/msd-srv/ssm/
• PDBePISA – Search and analysis of Protein Interfaces, Surfaces and Assemblies
  http://www.ebi.ac.uk/msd-srv/prot_int/pistart.html
• PDBeMotif – Query and analysis of structure, sequence motifs and interactions
  http://www.ebi.ac.uk/pdbe-site/PDBeMotif/
• PDBeChem – Ligand search using the PDB reference dictionary
  http://www.ebi.ac.uk/msd-srv/chempdb
• EMsearch – Search system for the EM Database
  http://www.ebi.ac.uk/pdbe-srv/emsearch
• PDBeLite – Search system based on the relational PDBe database
  http://www.ebi.ac.uk/pdbe-srv/pdbelite
• PDBeView – Text-based and advanced PDB search tool
  http://www.ebi.ac.uk/pdbe-srv/view
• PDBeMapQuick – Quick access to cross-reference information to external databases based on PDB ID
  http://www.ebi.ac.uk/pdbe-as/PDBeMapQuick/
• PDBeStatus – Search system to query the status of PDB entries
  http://www.ebi.ac.uk/pdbe-as/pdbStatus
• BIObar – Search system implemented as a toolbar application for Mozilla browsers
  http://www.ebi.ac.uk/pdbe/docs/biobar.html
Biobar
A toolbar search application for Mozilla/Netscape or Firefox browsers
http://biobar.mozdev.org/
Simple and quick retrieval of data from PDBe and 45 other databases
PDBeChem
• "Ligands" in the PDB
• Bound molecules (e.g. sugars, lipids, inhibitors, coenzymes and cofactors)
• Unique three-letter code
– atoms, element types, connectivity, bond orders, stereochemical configuration
• Search by
– ligand code
– ligand name
– formula
– non-stereo SMILES
– stereo SMILES
– exact stereo structure
– fingerprint similarity
– fragment expression
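Fingerprint-similarity search ranks ligands by comparing binary structural fingerprints. A common measure for this is the Tanimoto coefficient, |A ∩ B| / |A ∪ B|; the sketch assumes fingerprints are already available as sets of on-bit indices and does not claim that PDBeChem uses this exact measure. The codes LG1 to LG3, their bits and the 0.7 default cutoff are invented for illustration.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints stored as sets of
    on-bit indices: |A & B| / |A | B|."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # two empty fingerprints: treat as no evidence of similarity
    return len(fp_a & fp_b) / union


def rank_by_similarity(query: set[int], library: dict[str, set[int]],
                       threshold: float = 0.7) -> list[tuple[str, float]]:
    """Return (code, score) pairs with score >= threshold, best first."""
    hits = [(code, tanimoto(query, fp)) for code, fp in library.items()]
    return sorted((h for h in hits if h[1] >= threshold),
                  key=lambda h: h[1], reverse=True)


# Hypothetical ligand codes and fingerprint bits, for illustration only.
library = {"LG1": {1, 2, 3, 4}, "LG2": {1, 2, 5}, "LG3": {7, 8}}
query = {1, 2, 3, 5}
print(rank_by_similarity(query, library, threshold=0.5))
# [('LG2', 0.75), ('LG1', 0.6)]
```

Real fingerprints are fixed-length bit vectors derived from the chemical structure; a set of on-bit indices is just a convenient stand-in that makes intersection and union explicit.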
Example of a graphically defined query that can be submitted to PDBeMotif.
Velankar S et al. Nucl. Acids Res. 2010;38:D308-D317. © The Author(s) 2009. Published by Oxford University Press.
PDBeMotif
• Search by
a) ligands and their 3D environment
b) protein families (SCOP, CATH, UniProt, EC number)
c) protein secondary structures and different 3D motifs (PROSITE, beta turns, catalytic sites etc.)
d) protein Φ/Ψ angle sequences
• Results:
a) multiple sequence alignment
b) 3D multiple alignment of fragments, motifs and protein chains
c) interaction statistics
d) motif characteristics and property distribution charts
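Searching by Φ/Ψ angle sequences presupposes that the backbone torsions are computed from atomic coordinates. A standard-library sketch: `dihedral` uses a common four-point torsion formula, and `phi_psi` applies it to the backbone atoms (C of the previous residue, then N, CA, C, and N of the next residue). The dict-based residue representation is an assumption made for brevity.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)        # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(residues):
    """residues: list of dicts with 'N', 'CA', 'C' coordinates, in chain order.
    phi(i) = C(i-1)-N(i)-CA(i)-C(i); psi(i) = N(i)-CA(i)-C(i)-N(i+1).
    The first phi and the last psi are undefined and returned as None."""
    out = []
    for i, r in enumerate(residues):
        phi = (dihedral(residues[i - 1]["C"], r["N"], r["CA"], r["C"])
               if i > 0 else None)
        psi = (dihedral(r["N"], r["CA"], r["C"], residues[i + 1]["N"])
               if i + 1 < len(residues) else None)
        out.append((phi, psi))
    return out
```

For example, `dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))` gives approximately 90 degrees, and a sequence of (Φ, Ψ) pairs from `phi_psi` is the kind of input an angle-sequence search would compare.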
• Define search by ligand
• Define search by sequence motif (pattern)
• Define search by metal site geometry
• Define search by environment
• has same environment
• has similar environment
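A search by sequence motif (pattern) can be illustrated with PROSITE-style patterns, which translate naturally into regular expressions. This simplified converter handles x, single residues, [allowed] and {forbidden} sets, repeat counts such as x(4) or x(2,4), and the < and > terminal anchors; other PROSITE features (for example > inside brackets) are deliberately left out, and the pattern in the example is only illustrative.

```python
import re

# One PROSITE element: x, a single residue, [allowed set] or {forbidden set},
# optionally followed by a repeat count (n) or a range (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE pattern into a Python regular expression."""
    body = pattern.strip().rstrip(".")
    prefix = suffix = ""
    if body.startswith("<"):          # pattern anchored at the N-terminus
        prefix, body = "^", body[1:]
    if body.endswith(">"):            # pattern anchored at the C-terminus
        suffix, body = "$", body[:-1]
    parts = []
    for element in body.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            rx = "."                                  # any residue
        elif core.startswith("{"):
            rx = "[^" + core[1:-1] + "]"              # any residue except these
        else:
            rx = core                                 # residue or [..] set as-is
        if low is not None:
            rx += "{" + low + ("," + high if high else "") + "}"
        parts.append(rx)
    return prefix + "".join(parts) + suffix


rx = prosite_to_regex("[AC]-x-V-x(4)-{ED}.")
print(rx)                                    # [AC].V.{4}[^ED]
print(re.search(rx, "MKAGVLLLLK").start())  # 2
print(re.search(rx, "MKAGVLLLLD"))          # None
```

Once a motif is a regular expression, scanning every sequence in a database for it is a single `re.search` per sequence.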
PDBe-site page
• Compare ligand environments.
• Analyze interactions between ligand and protein.
• Compare binding environment.
• Look for ligands within a certain environment.
• Superpose binding sites and ligands.
• Predict what could bind that empty pocket in your structure
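Looking for ligands within a certain environment starts with defining that environment. One simple, assumed definition is every residue with an atom within a distance cutoff of the ligand; the 4.0 Å default, the `Atom` record and the residue labels are illustrative choices, not PDBe-site's actual criteria.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    residue: str  # hypothetical residue label, e.g. "SER3"
    x: float
    y: float
    z: float


def _xyz(atom: Atom) -> tuple[float, float, float]:
    return (atom.x, atom.y, atom.z)


def binding_site(ligand: list[Atom], protein: list[Atom],
                 cutoff: float = 4.0) -> list[str]:
    """Sorted labels of residues having at least one atom within `cutoff` Å
    of any ligand atom (brute force over all atom pairs)."""
    site = set()
    for atom in protein:
        if any(math.dist(_xyz(atom), _xyz(lig)) <= cutoff for lig in ligand):
            site.add(atom.residue)
    return sorted(site)


# Made-up coordinates: one ligand atom at the origin.
ligand = [Atom("LIG", 0.0, 0.0, 0.0)]
protein = [Atom("ALA1", 3.0, 0.0, 0.0), Atom("GLY2", 0.0, 5.0, 0.0),
           Atom("SER3", 0.0, 0.0, 3.9), Atom("ALA1", 10.0, 0.0, 0.0)]
print(binding_site(ligand, protein))               # ['ALA1', 'SER3']
print(binding_site(ligand, protein, cutoff=3.0))   # ['ALA1']
```

The double loop is fine for one pocket; database-wide environment searches would typically rely on spatial indexing (grids or k-d trees) instead.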
What assembly can my structure have?
PDBePISA
• PQS – Protein Quaternary Structure
• very difficult to obtain by prediction; determined experimentally by crystallography and EM
The new EMViewer 3D visualization Java applet is available on the EMDB Atlas pages and allows interactive generation of isosurface representations.
Velankar S et al. Nucl. Acids Res. 2010;38:D308-D317. © The Author(s) 2009. Published by Oxford University Press.
EMviewer
PDBsum
Schematic diagrams from the PDBsum 'Protein page' for entry 1a5z: lactate dehydrogenase from Thermotoga maritima (16).
Laskowski R A Nucl. Acids Res. 2009;37:D355-D359© 2008 The Author(s)
PDBsum
• Aims to gather all information in one place
• Additional analyses
– secondary structure diagrams
– Ligplot
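PDBsum's secondary-structure diagrams draw helices and strands as segments along the sequence. Turning a per-residue assignment into such segments is a run-length encoding; this sketch assumes a DSSP-style input string and one common three-state reduction of its codes, and the input string is made up.

```python
from itertools import groupby

# Assumed three-state reduction of DSSP codes (one common convention):
# H, G, I -> helix "H"; E, B -> strand "E"; everything else -> coil "C".
_REDUCE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def ss_segments(dssp: str) -> list[tuple[str, int, int]]:
    """Run-length encode a per-residue DSSP string into
    (state, first_residue, last_residue) tuples, numbered from 1."""
    segments, pos = [], 1
    for state, run in groupby(_REDUCE.get(code, "C") for code in dssp):
        length = sum(1 for _ in run)
        segments.append((state, pos, pos + length - 1))
        pos += length
    return segments


# Made-up DSSP string for illustration.
segs = ss_segments("--HHHHGG--EEEE-")
print([s for s in segs if s[0] != "C"])   # [('H', 3, 8), ('E', 11, 14)]
```

Keeping only the helix and strand segments gives exactly the boxes and arrows such a diagram needs; the coil segments fill the gaps between them.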
Extracts from the protein–protein interaction diagrams in PDBsum for PDB entry 1mmo, a non-haem iron hydroxylase from Methylococcus capsulatus (17).
Laskowski R A Nucl. Acids Res. 2009;37:D355-D359© 2008 The Author(s)
PDBsum interfaces
NDB
• DNA
• RNA
NDB: 3D structure and 2D structure
RNAview
CSD
• The Cambridge Structural Database
• www.ccdc.cam.ac.uk
• small molecules
• commercial, plus an open set of 500 compounds for teaching purposes
DB        Content                       Total (2009)   Per year
PDB       Proteins                      50,730         6,000
NDB       Nucleic acids                 3,555          500
CSD       Organics, metal-organics      488,057        40,000
ICSD      Inorganics & minerals         100,200        9,000
CRYSTMET  Metals, alloys, inorganics    119,600        9,000
CSD – components
WebCSD
Mercury
• Mercury visualiser
– crystal structure visualisation program by CCDC
• Free
• Teaching subset embedded
And back to proteins...
Classification of protein structures
Class: similar secondary-structure content
Fold (CATH: Topology): structural similarity
Superfamily (CATH: Homologous Superfamily): probably the same ancestor
• SCOP, CATH, FSSP, 3Dee
SCOP
• Structural Classification of Proteins
• manual classification of protein structural domains based on similarities of their amino acid sequences and three-dimensional structures.
• SCOP utilizes four levels of hierarchic structural classification:
– class: general "structural architecture" of the domain
– fold: similar arrangement of regular secondary structures but without evidence of evolutionary relatedness
– superfamily: sufficient structural and functional similarity to infer a divergent evolutionary relationship, but not necessarily detectable sequence homology
– family: some sequence similarity can be detected.
Murzin A. G., Brenner S. E., Hubbard T., Chothia C. (1995). SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536-540.
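SCOP places every domain at all four levels, conventionally written as a dotted code of the form class.fold.superfamily.family (for instance a.1.1.1). Comparing two such codes level by level shows how the hierarchy is read: sharing only a fold does not imply relatedness, while sharing a superfamily suggests a common ancestor. The codes in the example are illustrative and are not tied to specific domains.

```python
LEVELS = ("class", "fold", "superfamily", "family")

# What sharing each level implies, paraphrasing the SCOP definitions above.
MEANING = {
    "class": "similar overall secondary-structure content",
    "fold": "same arrangement of secondary structures, no evidence of relatedness",
    "superfamily": "probable divergent evolutionary relationship",
    "family": "detectable sequence similarity",
}


def deepest_shared_level(sccs_a: str, sccs_b: str):
    """Deepest SCOP level shared by two domains given dotted codes of the form
    'class.fold.superfamily.family'; None if even the class differs."""
    shared = None
    for level, a, b in zip(LEVELS, sccs_a.split("."), sccs_b.split(".")):
        if a != b:
            break  # lower-level numbers only mean something within the same parent
        shared = level
    return shared


for other in ("a.1.1.2", "a.1.2.1", "b.1.1.1"):
    level = deepest_shared_level("a.1.1.1", other)
    print(other, level, MEANING.get(level, "unrelated at every level"))
```

The loop stops at the first mismatch because, for example, fold number 1 in one class has nothing to do with fold number 1 in another.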
CATH
• Manually curated hierarchical classification of protein domain structures
• More automated than SCOP
• Class
– secondary structure content (mainly alpha, mainly beta, mixed alpha/beta or 'few secondary structures');
• Architecture
– general arrangement of the secondary structures irrespective of the connectivity between them (e.g. alpha/beta sandwich);
• Topology (Fold)
– connectivity of secondary structures in the chain;
• Homologous Superfamily
– domains that are believed to be related by a common ancestor;
• S-levels
– automated clustering based on sequence identity.
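The S-levels cluster domains by sequence identity. A greatly simplified sketch: identity is computed over already-aligned sequences, and clusters are formed greedily around representatives at a chosen threshold. The thresholds 35, 60 and 95 % are assumed here to mirror CATH's S-level names; CATH's own alignment and clustering procedure is not reproduced, and the domain IDs and sequences are toys.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences,
    counted over columns where neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    cols = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not cols:
        return 0.0
    return 100.0 * sum(x == y for x, y in cols) / len(cols)


def greedy_clusters(seqs: dict[str, str], threshold: float) -> list[list[str]]:
    """Greedy representative clustering: each sequence joins the first cluster
    whose representative (first member) it matches at >= threshold % identity,
    otherwise it founds a new cluster."""
    clusters: list[list[str]] = []
    for sid, seq in seqs.items():
        for cluster in clusters:
            if percent_identity(seqs[cluster[0]], seq) >= threshold:
                cluster.append(sid)
                break
        else:
            clusters.append([sid])
    return clusters


# Toy, already-aligned domain sequences (hypothetical IDs).
domains = {"d1": "ACDEFGHIKL", "d2": "ACDEFGHIKV",
           "d3": "WWWWWGHIKL", "d4": "ACDEFWWWWW"}
for level in (35, 60, 95):
    print(level, greedy_clusters(domains, level))
```

Raising the threshold splits clusters: at 35 % all four toy domains group together, at 60 % only d1 and d2 stay together, and at 95 % every domain is its own cluster.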
CATH
GENE3D
• Gene3D
– large collection of CATH protein domain assignments for ENSEMBL genomes and UniProt sequences
– functional information, as well as taxonomic distributions, multi-domain architectures and protein-protein interaction (PPI) data.
FSSP – fold classification
www2.embl-ebi.ac.uk/dali/fssp/
Structurally superimposed proteins, by DALI ("Distance-matrix ALIgnment")
3Dee – domains
http://www.compbio.dundee.ac.uk/3Dee/
Hierarchy of individual domains
Clustering by structural similarity
Dengler, U., Siddiqui, A. S. & Barton, G. J. (2001). Protein structural domains: Analysis of the 3Dee domains database. Proteins 42, 332-344.
Siddiqui, A. S., Dengler, U. & Barton, G. J. (2001). 3Dee: A database of protein structural domains. Bioinformatics 17, 200-201.
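DALI, the method behind the FSSP superpositions, compares proteins through their intramolecular Cα distance matrices rather than by superposing coordinates directly, so no rotation has to be found first. The sketch below only scores a residue correspondence that is already given, using an elastic-style score in the spirit of DALI; the real method also searches for the correspondence that maximises the score, and the constants θ = 0.2 and α = 20 Å are assumed values here.

```python
import math


def distance_matrix(coords):
    """Intramolecular distance matrix of C-alpha coordinates (in Å)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def elastic_score(da, db, theta=0.2, alpha=20.0):
    """Elastic-style similarity of two equally sized distance matrices whose
    rows/columns are already matched residue-to-residue.
    Each off-diagonal pair contributes (theta - |dA - dB| / mean) * exp(-(mean/alpha)^2),
    each diagonal element contributes theta. Constants are assumed values."""
    n = len(da)
    score = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                score += theta
                continue
            mean = (da[i][j] + db[i][j]) / 2
            deviation = abs(da[i][j] - db[i][j]) / mean   # relative distance change
            score += (theta - deviation) * math.exp(-(mean / alpha) ** 2)
    return score


# Toy four-residue C-alpha traces (made up).
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
moved = [(x + 1.0, y + 2.0, z + 3.0) for x, y, z in a]   # rigid translation
bent = a[:3] + [(9.0, 4.0, 0.0)]                         # last residue displaced
da = distance_matrix(a)
print(elastic_score(da, distance_matrix(moved)))  # equal to elastic_score(da, da) up to rounding
print(elastic_score(da, distance_matrix(bent)))   # lower
```

The exponential weight down-weights long distances, so the score is dominated by local geometry; a rigidly moved copy scores the same as the original because distance matrices ignore translation and rotation.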
Databases we did not get to...
• Relibase – protein-ligand interactions
• Modbase, SWISS-MODEL Repository, MMDB – databases of models
• MolMovdb – Macromolecular Motions database
• And many others, mostly specific to a particular problem
– e.g. just for cytochromes P450:
• CYPED, SuperCyp, Cytochrome P450 Homepage, Fungal CYP database, CYPallelles, Arabidopsis Cytochrome P450s, Cytochrome P450 Drug Interactions Table, and others.
• Otherwise, there is nothing left but to use Google. :o)