11/07/05 databases, visualization - iowa state...
TRANSCRIPT
Protein Structure: Classification,Databases, Visualization
11/07/05
D Dobbs ISU - BCB 444/544X 1
11/07/05 D Dobbs ISU - BCB 444/544X: Protein Structure: Classification,Databases, Visualization
1
11/7/05
Protein Structure: Classification, Databases, Visualization
Announcements
BCB 544 Projects - Important Dates:
Nov 2 Wed noon - Project proposals due to David/Drena
Nov 4 Fri PM - Approvals/responses & tentative presentation schedule to students
Dec 2 Fri noon - Written project reports due
Dec 5,7,8,9 class/lab - Oral Presentations (20')
(Dec 15 Thurs = Final Exam)
Bioinformatics Seminars
Nov 7 Mon 12:10 IG Faculty Seminar in 101 Ind Ed II
Inborn Errors of Metabolism in Humans & Animal Models
Matt Ellinwood, Animal Science, ISU
Nov 10 Thurs 3:40 Com S Seminar in 223 Atanasoff
Computational Epidemiology
Armin R. Mikler, Univ. North Texas
http://www.cs.iastate.edu/~colloq/#t3
Bioinformatics Seminars
CORRECTION:
Next week - Baker Center/BCB Seminars: (seminar abstracts available at above link)
Nov 14 Mon 1:10 PM Doug Brutlag, Stanford
Discovering transcription factor binding sites
Nov 15 Tues 1:10 PM Ilya Vakser, Univ Kansas
Modeling protein-protein interactions
Both seminars will be in Howe Hall Auditorium
Protein Structure & Function: Analysis & Prediction
Mon Protein structure: classification, databases, visualization
Wed Protein structure: prediction & modeling
Thurs Lab Protein structure prediction
Fri Protein-nucleic acid interactions; protein-ligand docking
Reading Assignment (for Mon-Fri)
Mount Bioinformatics
• Chp 10 Protein classification & structure prediction
http://www.bioinformaticsonline.org/ch/ch10/index.html
• pp. 409-491
• Check errata: http://www.bioinformaticsonline.org/help/errata2.html
Other? Additional reading assignments for BCB 544
Review last lecture:
RNA Structure Prediction Algorithms
RNA structure prediction strategies
Secondary structure prediction
1) Energy minimization (thermodynamics)
2) Comparative sequence analysis (co-variation)
3) Combined experimental & computational
1) Energy minimization method
What are the assumptions?
Native tertiary structure or "fold" of an RNA molecule is (one of) its lowest free energy configuration(s)
Gibbs free energy = ΔG in kcal/mol at 37°C = equilibrium stability of structure
Lower (more negative) values are more favorable
Is this assumption valid?
in vivo? - this may not hold, but we don't really know
Gibbs free energy: ΔG
Gibbs free energy (G) is formally defined in terms of the state functions enthalpy & entropy, & the state variable temperature
G = H - TS
ΔG = ΔH - TΔS (for constant temp)
Enthalpy (H): ΔH = amount of heat absorbed by a system at constant pressure
Entropy (S) = measure of the amount of disorder or randomness in a system
Note: this is not the same as "entropy" in information theory, but is related, see: http://en.wikipedia.org/wiki/Information_theory
Gibbs free energy: ΔG
Gibbs free energy for formation of an RNA or protein structure = ΔG° = equilibrium stability of that structure at a specific temperature (kcal/mol at 37°C)
ΔG° = -RT ln Keq
R = gas constant, T = absolute temperature, Keq = equilibrium constant
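A minimal sketch of the relation ΔG° = -RT ln Keq, converting in both directions. The numeric value of R in kcal/(mol·K) and the kelvin value for 37°C are supplied here as assumptions (they are not on the slide), and the function names are illustrative only.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal/(mol*K); assumed value, not from the slide
T_37C = 310.15      # 37 degrees C expressed in kelvin

def delta_g_from_keq(keq, temp_k=T_37C):
    """Standard free energy change (kcal/mol) for a given equilibrium constant."""
    return -R_KCAL * temp_k * math.log(keq)

def keq_from_delta_g(delta_g, temp_k=T_37C):
    """Equilibrium constant implied by a standard free energy change (kcal/mol)."""
    return math.exp(-delta_g / (R_KCAL * temp_k))

if __name__ == "__main__":
    for keq in (0.1, 1.0, 10.0, 1000.0):
        print(f"Keq = {keq:>7}: dG = {delta_g_from_keq(keq):+.2f} kcal/mol")
```

With these constants, Keq > 1 gives a negative (favorable) ΔG°, and each tenfold increase in Keq lowers ΔG° by roughly 1.4 kcal/mol at 37°C.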
Nearest-neighbor parameters
Most methods for free energy minimization use nearest-neighbor parameters (derived from experiment) for predicting stability of an RNA secondary structure (in terms of ΔG at 37°C)
& most available software packages use the same set of parameters: Mathews, Sabina, Zuker & Turner, 1999
Energy minimization - calculations:
Total free energy of a specific conformation for a specific RNA molecule = sum of incremental energy terms for:
• helical stacking (sequence dependent)
• loop initiation
• unpaired stacking
(favorable "increments" are < 0)
Fig 6.3, Baxevanis & Ouellette 2005
But how many possible conformations for a single RNA molecule?
Huge number: Zuker estimates (1.8)^N possible secondary structures for a sequence of N nucleotides
for 100 nts (small RNA…) = 3 × 10^25 structures!
Solution? Not exhaustive enumeration… Dynamic programming
O(N^3) in time
O(N^2) in space/storage
iff pseudoknots excluded, otherwise:
O(N^6) in time
O(N^4) in space
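To make the dynamic programming idea concrete, here is a small sketch. Note the swap: it is NOT the energy-minimization recursion used in Mfold. It is the simpler Nussinov-style recursion that maximizes the number of nested base pairs (no pseudoknots), chosen because it shows the same O(N^3) time / O(N^2) space table filling. The pairing set (Watson-Crick plus G-U), the minimum loop size of 3, and all function names are assumptions made for illustration.

```python
def estimated_structure_count(n):
    """Rough estimate quoted on the slide: about 1.8**N secondary structures."""
    return 1.8 ** n

# Allowed pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq (Nussinov-style DP).

    best[i][j] holds the best pair count for the subsequence i..j.
    The table has N*N cells and each cell scans up to N split points,
    which gives O(N^3) time and O(N^2) space.
    """
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):          # distance between i and j
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]               # case 1: j is unpaired
            for k in range(i, j - min_loop):     # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]

if __name__ == "__main__":
    print(f"{estimated_structure_count(100):.1e} structures estimated for N = 100")
    print(max_base_pairs("GGGAAAUCC"), "pairs in a toy hairpin")
```

The table replaces exhaustive enumeration: every subsequence is solved once and reused, so the work grows polynomially even though the number of candidate structures grows exponentially.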
Algorithms based on energy minimization
For an outline of the algorithm used in Mfold, including a description of the dynamic programming recursion, please visit Michael Zuker's lecture:
http://www.bioinfo.rpi.edu/~zukerm/lectures/RNAfold-html
From this site, you may also download Zuker's lecture as either a PDF or PS file.
2) Comparative sequence analysis (co-variation)
Two basic approaches:
• Algorithms constrained by initial alignment
Much faster, but not as robust as unconstrained
Base-pairing probabilities determined by a partition function
• Algorithms not constrained by initial alignment
Genetic algorithms often used for finding an alignment & set of structures
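One common way to quantify co-variation between two columns of a fixed alignment is mutual information; the slide does not name a specific measure, so treat this as an illustrative choice rather than the method used by any particular program. The toy alignment and the function names below are made up for the example.

```python
import math
from collections import Counter

def column(alignment, idx):
    """Return the symbols found at one alignment column."""
    return [seq[idx] for seq in alignment]

def entropy(symbols):
    """Shannon entropy (bits) of the observed symbol frequencies."""
    total = len(symbols)
    counts = Counter(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def mutual_information(alignment, i, j):
    """MI(i, j) = H(i) + H(j) - H(i, j) for two alignment columns."""
    col_i = column(alignment, i)
    col_j = column(alignment, j)
    joint = list(zip(col_i, col_j))
    return entropy(col_i) + entropy(col_j) - entropy(joint)

if __name__ == "__main__":
    # Hypothetical alignment: columns 0 and 4 change together (G-C, C-G, A-U, U-A),
    # while the middle columns are fully conserved.
    toy = ["GAAAC", "CAAAG", "AAAAU", "UAAAA"]
    print("MI(0, 4) =", mutual_information(toy, 0, 4))
    print("MI(1, 2) =", mutual_information(toy, 1, 2))
```

Columns that vary together while keeping pairing complementarity score high, which is the signal comparative methods use to propose base pairs; fully conserved columns carry no co-variation signal.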
RNA structure prediction strategies
Tertiary structure prediction
Requires "craft" & significant user input & insight
1) Extensive comparative sequence analysis to predict tertiary contacts (co-variation)
e.g., MANIP - Westhof
2) Use experimental data to constrain model building
e.g., MC-SYM - Major
3) Homology modeling using sequence alignment & reference tertiary structure (not many of these!)
4) Low resolution molecular mechanics
e.g., yammp - Harvey
New Last Time:
Protein Structure & Function
Protein Structure & Function
Protein structure - primarily determined by sequence
Protein function - primarily determined by structure
• Globular proteins: compact hydrophobic core & hydrophilic surface
• Membrane proteins: special hydrophobic surfaces
• Folded proteins are only marginally stable
• Some proteins do not assume a stable "fold" until they bind to something = Intrinsically disordered
Predicting protein structure and function can be very hard -- & fun!
4 Basic Levels of Protein Structure
Primary & Secondary Structure
Primary
• Linear sequence of amino acids
• Description of covalent bonds linking aa's
Secondary
• Local spatial arrangement of amino acids
• Description of short-range non-covalent interactions
• Periodic structural patterns: α-helix, β-sheet
Tertiary & Quaternary Structure
Tertiary
• Overall 3-D "fold" of a single polypeptide chain
• Spatial arrangement of 2' structural elements; packing of these into compact "domains"
• Description of long-range non-covalent interactions (plus disulfide bonds)
Quaternary
• In proteins with > 1 polypeptide chain, spatial arrangement of subunits
"Additional" Structural Levels
• Super-secondary elements
• Motifs
• Domains
• Foldons
New Today:
Protein Structure & Function
• Amino acid characteristics
• Structural classes & motifs
• Protein functions & functional families (not much - more on this later)
Classification
Databases
Visualization
Amino Acids
Each of 20 different amino acids has a different "R-group" (side chain) attached to Cα
Peptide bond is rigid and planar
Hydrophobic Amino Acids
Charged Amino Acids
Polar Amino Acids
Certain side-chain configurations are energetically favored (rotamers)
Ramachandran plot: "Allowable" psi & phi angles
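Both rotamers and Ramachandran plots are described with dihedral (torsion) angles. The sketch below computes a signed dihedral from four 3-D points; for backbone angles the standard atom choices (not stated on the slide) are C(i-1), N(i), Cα(i), C(i) for phi and N(i), Cα(i), C(i), N(i+1) for psi. The coordinates used in the demo are made-up test points, not real protein atoms.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b1 = sub(p1, p0)
    b2 = sub(p2, p1)
    b3 = sub(p3, p2)
    n1 = cross(b1, b2)            # normal to the plane of p0, p1, p2
    n2 = cross(b2, b3)            # normal to the plane of p1, p2, p3
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))

if __name__ == "__main__":
    # Made-up points: eclipsed (cis), perpendicular, and trans arrangements.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```

Applying this to every residue of a structure and plotting psi against phi would give the points of a Ramachandran plot.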
Glycine is the smallest amino acid
R group = H atom
• Glycine residues increase backbone flexibility because their R group is only an H atom (no bulky side chain)
Proline is cyclic
• Proline residues reduce flexibility of polypeptide chain
• Proline cis-trans isomerization is often a rate-limiting step in protein folding
• Recent work suggests it may also regulate ligand binding in native proteins - Andreotti
Cysteines can form disulfide bonds
• Disulfide bonds (covalent) stabilize 3-D structures
• In eukaryotes, disulfide bonds are found only in secreted proteins or extracellular domains
Globular proteins have a compact hydrophobic core
Packing of hydrophobic side chains into interior is main driving force for folding
Problem? Polypeptide backbone is highly polar (hydrophilic) due to polar -NH and C=O in each peptide unit; these polar groups must be neutralized
Solution? Form regular secondary structures, e.g., α-helix, β-sheet, stabilized by H-bonds
Exterior surface of globular proteins is generally hydrophilic
Hydrophobic core formed by packed secondary structural elements provides compact, stable core
"Functional groups" of protein are attached to this framework; exterior has more flexible regions (loops) and polar/charged residues
Hydrophobic "patches" on protein surface are often involved in protein-protein interactions
Protein Secondary Structures
• α-Helix
• β-Sheets
• Loops
• Coils
α - Helix
Most abundant 2' structure in proteins
Average length = 10 aa's (~15 Angstroms, at 1.5 Angstroms rise per residue)
• Length varies from 5-40 aa's
• Alignment of H-bonds creates dipole moment (partial positive charge at NH end)
• Often at surface of core, with hydrophobic residues on inner-facing side, hydrophilic on other side
α-helix is stabilized by H-bonds between ~ every 4th residue
C = black, O = red, N = blue
R-groups are on outside of α−helix
Types of α-helices
"Standard" α-helix: 3.6 residues per turn
H-bonds between C=O of residue n and NH of residue n + 4
Helix ends are polar; almost always on surface of protein
Other types of helices?
n + 5 = π helix
n + 3 = 3₁₀ helix
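The helix numbers above can be turned into a few quick calculations. This is a sketch only: the 3.6 residues per turn and the n to n + 4 (or n + 3, n + 5) H-bond offsets come from the slide, while the 1.5 Angstrom rise per residue is a standard textbook value added here as an assumption. Function names are illustrative.

```python
RESIDUES_PER_TURN = 3.6   # standard alpha-helix, from the slide
RISE_PER_RESIDUE = 1.5    # Angstroms along the helix axis; assumed, not on the slide

def helix_length(n_residues):
    """Approximate end-to-end length (Angstroms) of an alpha-helix."""
    return n_residues * RISE_PER_RESIDUE

def helix_pitch():
    """Axial distance (Angstroms) covered by one full turn."""
    return RESIDUES_PER_TURN * RISE_PER_RESIDUE

def wheel_angle(index):
    """Angular position (degrees) of residue `index` on a helical wheel."""
    return (index * 360.0 / RESIDUES_PER_TURN) % 360.0

def backbone_hbonds(n_residues, offset=4):
    """Pairs (n, n + offset): C=O of residue n bonded to NH of residue n + offset.

    offset=4 is the standard alpha-helix; 3 and 5 correspond to the
    3-10 and pi helices listed on the slide. Residues are numbered from 1.
    """
    return [(n, n + offset) for n in range(1, n_residues - offset + 1)]

if __name__ == "__main__":
    print("pitch (A):", helix_pitch())
    print("10-residue helix length (A):", helix_length(10))
    print("H-bond pairs in a 6-residue helix:", backbone_hbonds(6))
```

The 100 degree step per residue from `wheel_angle` is why residues spaced 3 to 4 positions apart land on the same face of the helix, giving the hydrophobic-face / hydrophilic-face pattern.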
Certain amino acids are "preferred" & others are rare in α-helices
• Ala, Glu, Leu, Met = good helix formers
• Pro, Gly, Tyr, Ser = very poor
• Amino acid composition & distribution varies, depending on location of helix in 3-D structure
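As a rough illustration of how these preferences could be used, the sketch below tallies good and poor helix formers in a sliding window. It uses only the two residue groups named on the slide; the scoring (good minus poor), the window size, and the example sequence are all assumptions, and this crude count is not a real secondary structure prediction method.

```python
GOOD_FORMERS = set("AELM")   # Ala, Glu, Leu, Met (one-letter codes)
POOR_FORMERS = set("PGYS")   # Pro, Gly, Tyr, Ser

def helix_former_balance(seq):
    """Return (good, poor) counts of helix formers in a sequence."""
    seq = seq.upper()
    good = sum(1 for aa in seq if aa in GOOD_FORMERS)
    poor = sum(1 for aa in seq if aa in POOR_FORMERS)
    return good, poor

def window_scores(seq, window=6):
    """Good-minus-poor tally for each window along the sequence."""
    scores = []
    for start in range(len(seq) - window + 1):
        good, poor = helix_former_balance(seq[start:start + window])
        scores.append(good - poor)
    return scores

if __name__ == "__main__":
    example = "AELLMAAEKLPGSPGY"   # hypothetical sequence
    print(window_scores(example))
```

Windows rich in Ala/Glu/Leu/Met score high and windows rich in Pro/Gly score low, which mirrors the qualitative statement on the slide and nothing more.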
β-Strands & Sheets
H-bonds formed between 5-10 consecutive residues in one portion of chain with another set of 5-10 residues farther down chain
Interacting regions may be adjacent (with short loop between) or far apart
β-sheets usually have all strands either parallel or antiparallel
Antiparallel β-sheet
Antiparallel β-sheet
Parallel β-sheet
Mixed β-Sheets also occur
Loops
• Connect helices and sheets
• Vary in length and 3-D configurations
• Are located on surface of structure
• Are more "tolerant" of mutations
• Are more flexible and can adopt multiple conformations
• Tend to have charged and polar amino acids
• Are frequently components of active sites
• Some fall into distinct structural families (e.g., hairpin loops, reverse turns)
Coils
• Regions of 2' structure that are not helices, sheets, or recognizable turns
• Intrinsically disordered regions appear to play important functional roles
Globular proteins are built from recurring structural patterns
Motifs or supersecondary structures = combinations of 2' structural elements
Domains = combinations of motifs
• Independently folding unit (foldon)
• Functional unit
A few common structural motifs
Helix-turn-helix: e.g., DNA binding
Helix-loop-helix: e.g., Calcium binding
β-hairpin: 2 adjacent antiparallel strands connected by short loop
Greek key: 4 adjacent antiparallel strands
β-α-β: 2 parallel strands connected by helix
H-T-H H-L-H
β-hairpin
Greek key
Beta-alpha-beta
Simple motifs combine to form domains
Large polypeptide chains fold into several domains
6 main classes of protein structure
1) α Domains
• Bundles of helices connected by loops
2) β Domains
• Mainly antiparallel sheets, usually with 2 sheets forming sandwich
3) α/β Domains
• Mainly parallel sheets with intervening helices, also mixed sheets
4) α+β Domains
• Mainly segregated helices and sheets
5) Multidomain (α & β)
• Containing domains from more than one class
6) Membrane & cell-surface proteins
α-domain structures: coiled-coils
α-domain structures: 4-helix bundles
All-α proteins: Globins
β-domain structures
Anti-parallel β structures
Functionally most diverse
Includes:
• Up-and-down sheets or barrels
• Propeller-like structures
• Jelly roll barrels (from Greek key motifs)
Up-and-down sheets and barrel
Up-and-down sheets can form propeller-like structures
Greek key motifs can form jelly roll barrels
α/β-domain structures
3 main classes
• TIM barrel = core of twisted parallel strands close together
• Rossmann fold = open twisted sheet surrounded by helices on both sides
• Leucine-rich motif = specific pattern of Leu residues, strands form a curved sheet with helices on outside
TIM barrel Rossmann fold
Leucine rich motifs can form α/β horseshoes
Protein structure databases, structural classification & visualization
PDB = Protein Data Bank
http://www.rcsb.org/pdb/
(RCSB) - several different structure viewers
MMDB = Molecular Modeling Database
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure
(NCBI Entrez) - Cn3D viewer
SCOP = Structural Classification of Proteins
Levels reflect both evolutionary and structural relationships
CATH = Classification by Class, Architecture, Topology and Homology
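Structure viewers work from the coordinate records stored in PDB entries. As a sketch of what such a record looks like to a program, the code below reads the fixed-column fields of one ATOM line in the legacy PDB text format (column positions as generally documented for that format). The helper that builds a demo line, the type and function names, and the coordinates are all made up for illustration and do not come from a real entry.

```python
from typing import NamedTuple

class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float

def parse_atom_line(line: str) -> Atom:
    """Parse one fixed-column ATOM/HETATM record (legacy PDB text format)."""
    if line[0:6].strip() not in ("ATOM", "HETATM"):
        raise ValueError("not an ATOM/HETATM record")
    return Atom(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain_id=line[21],             # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
    )

def make_atom_line(serial, name, res_name, chain_id, res_seq, x, y, z):
    """Build a demo record with the same column layout (hypothetical data)."""
    return ("ATOM  {:>5} {:<4} {:>3} {:1}{:>4}    {:8.3f}{:8.3f}{:8.3f}"
            .format(serial, name, res_name, chain_id, res_seq, x, y, z))

if __name__ == "__main__":
    demo = make_atom_line(1, " CA", "MET", "A", 1, 27.340, 24.430, 2.614)
    print(demo)
    print(parse_atom_line(demo))
```

Slicing by column rather than splitting on whitespace matters because adjacent fields in this format can touch with no space between them.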