

Protein Structure: Classification, Databases, Visualization

11/07/05

D Dobbs ISU - BCB 444/544X


Announcements

BCB 544 Projects - Important Dates:

Nov 2 Wed noon - Project proposals due to David/Drena

Nov 4 Fri PM - Approvals/responses & tentative presentation schedule to students

Dec 2 Fri noon - Written project reports due

Dec 5,7,8,9 class/lab - Oral Presentations (20')

(Dec 15 Thurs = Final Exam)


Bioinformatics Seminars

Nov 7 Mon 12:10 IG Faculty Seminar in 101 Ind Ed II
Inborn Errors of Metabolism in Humans & Animal Models
Matt Ellinwood, Animal Science, ISU

Nov 10 Thurs 3:40 Com S Seminar in 223 Atanasoff
Computational Epidemiology
Armin R. Mikler, Univ. North Texas
http://www.cs.iastate.edu/~colloq/#t3


Bioinformatics Seminars

CORRECTION:

Next week - Baker Center/BCB Seminars: (seminar abstracts available at above link)

Nov 14 Mon 1:10 PM Doug Brutlag, Stanford
Discovering transcription factor binding sites

Nov 15 Tues 1:10 PM Ilya Vakser, Univ Kansas
Modeling protein-protein interactions

Both seminars will be in Howe Hall Auditorium


Protein Structure & Function: Analysis & Prediction

Mon Protein structure: classification, databases, visualization

Wed Protein structure: prediction & modeling

Thurs Lab Protein structure prediction

Fri Protein-nucleic acid interactions
Protein-ligand docking


Reading Assignment (for Mon-Fri)

Mount Bioinformatics
• Chp 10 Protein classification & structure prediction
http://www.bioinformaticsonline.org/ch/ch10/index.html
• pp. 409-491
• Ck Errata: http://www.bioinformaticsonline.org/help/errata2.html

Other? Additional reading assignments for BCB 544


Review last lecture:

RNA Structure Prediction Algorithms


RNA structure prediction strategies

Secondary structure prediction

1) Energy minimization (thermodynamics)

2) Comparative sequence analysis (co-variation)

3) Combined experimental & computational


1) Energy minimization method

What are the assumptions?

Native tertiary structure or "fold" of an RNA molecule is (one of) its lowest free energy configuration(s)

Gibbs free energy = ΔG in kcal/mol at 37°C = equilibrium stability of structure

lower values (negative) are more favorable

Is this assumption valid?

in vivo? - this may not hold, but we don't really know


Gibbs free energy: ΔG

Gibbs free energy (G) is formally defined in terms of the state functions enthalpy & entropy, & the state variable temperature

G = H - TS
ΔG = ΔH - TΔS (for constant temp)

Enthalpy (H): the change ΔH = amount of heat absorbed by a system at constant pressure

Entropy (S) = measure of the amount of disorder or randomness in a system

Note = this is not the same as "entropy" in information theory, but is related, see: http://en.wikipedia.org/wiki/Information_theory


Gibbs free energy: ΔG

Gibbs free energy for formation of an RNA or protein structure = ΔG° = equilibrium stability of that structure at a specific temperature (kcal/mol at 37°C)

ΔG° = -RT ln Keq

R = gas constant, T = absolute temperature (K), Keq = equilibrium constant
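The two relations above, ΔG = ΔH - TΔS and ΔG° = -RT ln Keq, can be tried out numerically. The following is a minimal sketch, not part of the original slides. It assumes R = 1.987 × 10^-3 kcal/(mol·K) and T = 310.15 K (37°C), so RT is about 0.616 kcal/mol. The function names and the ΔG values in the loop are illustrative only.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)
T_37C = 310.15     # 37 degrees C expressed in kelvin


def delta_g(delta_h, delta_s, temp=T_37C):
    """Return dG = dH - T*dS in kcal/mol (dS given in kcal/(mol*K))."""
    return delta_h - temp * delta_s


def keq_from_delta_g(dg, temp=T_37C):
    """Invert dG = -RT ln(Keq) to obtain the equilibrium constant."""
    return math.exp(-dg / (R_KCAL * temp))


def delta_g_from_keq(keq, temp=T_37C):
    """Standard free energy change (kcal/mol) from an equilibrium constant."""
    return -R_KCAL * temp * math.log(keq)


if __name__ == "__main__":
    # Illustrative dG values only; they are not measurements.
    for dg in (-1.0, -5.0, -10.0):
        print(f"dG = {dg:6.1f} kcal/mol -> Keq = {keq_from_delta_g(dg):.3g}")
```

Because Keq depends exponentially on ΔG°, a difference of a few kcal/mol shifts the folded/unfolded equilibrium by orders of magnitude.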


Nearest-neighbor parameters

Most methods for free energy minimization use nearest-neighbor parameters (derived from experiment) for predicting stability of an RNA secondary structure (in terms of ΔG at 37°C)

& most available software packages use the same set of parameters: Mathews, Sabina, Zuker & Turner, 1999


Energy minimization - calculations:

Total free energy of a specific conformation for a specific RNA molecule = sum of incremental energy terms for:

• helical stacking (sequence dependent)
• loop initiation
• unpaired stacking

(favorable "increments" are < 0)
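The additive bookkeeping described above can be sketched in a few lines. This example is not from the slides. The numbers in `TOY_HAIRPIN_TERMS` are placeholders chosen only to make the arithmetic visible; they are NOT the Mathews/Turner nearest-neighbor parameters. The function names are hypothetical.

```python
def total_free_energy(terms):
    """Sum incremental free energy terms (kcal/mol) for one conformation.

    terms is a list of (label, increment) pairs; favorable increments are < 0.
    """
    return sum(increment for _label, increment in terms)


def is_favorable(terms):
    """A conformation is predicted to be stable when its total dG is negative."""
    return total_free_energy(terms) < 0


# Placeholder increments for illustration only (not experimental values).
TOY_HAIRPIN_TERMS = [
    ("helical stacking", -6.0),
    ("loop initiation", +4.5),
    ("unpaired stacking", -1.0),
]

if __name__ == "__main__":
    print("total dG =", total_free_energy(TOY_HAIRPIN_TERMS), "kcal/mol")
    print("favorable:", is_favorable(TOY_HAIRPIN_TERMS))
```

The point of the sketch is that stacking terms must outweigh the unfavorable loop initiation term for a hairpin to be predicted as stable.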

Fig 6.3, Baxevanis & Ouellette 2005

But how many possible conformations for a single RNA molecule?

Huge number: Zuker estimates (1.8)^N possible secondary structures for a sequence of N nucleotides

for 100 nts (small RNA…) = 3 × 10^25 structures!
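The estimate is easy to check. The short sketch below (function names are illustrative) evaluates 1.8^N directly and via logarithms, which stays usable when N is large enough that the number itself is unwieldy.

```python
import math


def estimated_structure_count(n, base=1.8):
    """Zuker's rough estimate of possible secondary structures: base**n."""
    return base ** n


def log10_structure_count(n, base=1.8):
    """Same estimate as a power of ten, i.e. log10(base**n)."""
    return n * math.log10(base)


if __name__ == "__main__":
    for n in (20, 50, 100):
        print(f"N = {n:3d}: about 10^{log10_structure_count(n):.1f} structures")
```

For N = 100 this gives roughly 3.4 × 10^25, consistent with the rounded figure quoted above.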

Solution? Not exhaustive enumeration… Dynamic programming

O(N^3) in time
O(N^2) in space/storage

iff pseudoknots excluded, otherwise:
O(N^6) in time
O(N^4) in space
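To make the dynamic programming idea concrete, here is a minimal sketch of a Nussinov-style recursion. It is a deliberate simplification: it maximizes the number of base pairs rather than minimizing free energy, so it is not the Mfold algorithm, and it excludes pseudoknots. The pairing rules (Watson-Crick plus G-U) and the minimum hairpin loop of 3 unpaired bases are assumptions of this example. The three nested loops give O(N^3) time, and the table uses O(N^2) space.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested (pseudoknot-free) base pairs in seq.

    best[i][j] holds the optimum for the subsequence seq[i..j] inclusive.
    """
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    # Fill the table by increasing subsequence length.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j is unpaired
            # case 2: j pairs with some k, leaving at least min_loop bases between
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAAUCC"))
```

Energy-based programs follow the same table-filling pattern but score each subproblem with nearest-neighbor free energy terms instead of a pair count.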


Algorithms based on energy minimization

For outline of algorithm used in Mfold, including description of dynamic programming recursion, please visit Michael Zuker's lecture:
http://www.bioinfo.rpi.edu/~zukerm/lectures/RNAfold-html

From this site, you may also download Zuker's lecture as either PDF or PS file.


2) Comparative sequence analysis (co-variation)

Two basic approaches:

• Algorithms constrained by initial alignment
Much faster, but not as robust as unconstrained
Base-pairing probabilities determined by a partition function

• Algorithms not constrained by initial alignment
Genetic algorithms often used for finding an alignment & set of structures
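One common way to quantify co-variation between two alignment columns is mutual information. The slides do not name a specific measure, so treat this choice as an assumption of the example. The sketch below computes it in bits for a toy alignment whose sequences are made up for illustration (`TOY_ALIGNMENT` is not real data).

```python
import math
from collections import Counter

# Hypothetical alignment: columns 0 and 6 co-vary, the others are invariant.
TOY_ALIGNMENT = [
    "GAAAAAC",
    "CAAAAAG",
    "AAAAAAU",
    "UAAAAAA",
]


def column(alignment, idx):
    """Return the residues found at position idx in every aligned sequence."""
    return [seq[idx] for seq in alignment]


def mutual_information(col_a, col_b):
    """Mutual information (bits) between two equally long alignment columns."""
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        p_a = freq_a[a] / n
        p_b = freq_b[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


if __name__ == "__main__":
    print(mutual_information(column(TOY_ALIGNMENT, 0), column(TOY_ALIGNMENT, 6)))
    print(mutual_information(column(TOY_ALIGNMENT, 0), column(TOY_ALIGNMENT, 1)))
```

Column pairs with high mutual information are candidates for base pairs, since a change at one position is compensated by a change at the other.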


RNA structure prediction strategies

Tertiary structure prediction

Requires "craft" & significant user input & insight

1) Extensive comparative sequence analysis to predict tertiary contacts (co-variation)
e.g., MANIP - Westhof

2) Use experimental data to constrain model building
e.g., MC-SYM - Major

3) Homology modeling using sequence alignment & reference tertiary structure (not many of these!)

4) Low resolution molecular mechanics
e.g., yammp - Harvey


New Last Time:

Protein Structure & Function


Protein Structure & Function

Protein structure - primarily determined by sequence

Protein function - primarily determined by structure

• Globular proteins: compact hydrophobic core & hydrophilic surface
• Membrane proteins: special hydrophobic surfaces
• Folded proteins are only marginally stable
• Some proteins do not assume a stable "fold" until they bind to something = Intrinsically disordered

Predicting protein structure and function can be very hard -- & fun!


4 Basic Levels of Protein Structure


Primary & Secondary Structure

Primary
• Linear sequence of amino acids
• Description of covalent bonds linking aa's

Secondary
• Local spatial arrangement of amino acids
• Description of short-range non-covalent interactions
• Periodic structural patterns: α-helix, β-sheet


Tertiary & Quaternary Structure

Tertiary
• Overall 3-D "fold" of a single polypeptide chain
• Spatial arrangement of 2' structural elements; packing of these into compact "domains"
• Description of long-range non-covalent interactions (plus disulfide bonds)

Quaternary
• In proteins with > 1 polypeptide chain, spatial arrangement of subunits


"Additional" Structural Levels

• Super-secondary elements
• Motifs
• Domains
• Foldons


New Today:

Protein Structure & Function
Amino acid characteristics
Structural classes & motifs
Protein functions & functional families (not much - more on this later)

Classification
Databases
Visualization


Amino Acids

Each of the 20 different amino acids has a different "R-group" (side chain) attached to Cα


Peptide bond is rigid and planar


Hydrophobic Amino Acids


Charged Amino Acids


Polar Amino Acids


Certain side-chain configurations are energetically favored (rotamers)

Ramachandran plot: "Allowable" psi & phi angles
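Phi and psi are backbone dihedral (torsion) angles, so each point on a Ramachandran plot comes from four atom positions. The sketch below, which is not part of the slides, computes a signed dihedral angle in degrees from four 3-D points using plain Python. For residue i, phi would be computed from C(i-1), N(i), Cα(i), C(i), and psi from N(i), Cα(i), C(i), N(i+1). The coordinates in the demo are made up to give simple angles.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points, about the p1-p2 bond."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    b1_len = math.sqrt(_dot(b1, b1))
    b1_unit = tuple(c / b1_len for c in b1)
    x = _dot(n1, n2)
    y = _dot(b1_unit, _cross(n1, n2))
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Made-up points: cis (0 degrees), +90 degrees, and trans (180 degrees).
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```

Plotting phi against psi for every residue of a structure gives its Ramachandran plot; most residues fall in a few "allowable" regions because other combinations cause steric clashes.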


Glycine is smallest amino acid
R group = H atom

• Glycine residues increase backbone flexibility because their R group is only an H atom


Proline is cyclic

• Proline residues reduce flexibility of polypeptide chain
• Proline cis-trans isomerization is often a rate-limiting step in protein folding
• Recent work suggests it may also regulate ligand binding in native proteins - Andreotti


Cysteines can form disulfide bonds

• Disulfide bonds (covalent) stabilize 3-D structures
• In eukaryotes, disulfide bonds are generally found only in secreted proteins or extracellular domains


Globular proteins have a compact hydrophobic core

Packing of hydrophobic side chains into interior is main driving force for folding

Problem? Polypeptide backbone is highly polar (hydrophilic) due to polar -NH and C=O in each peptide unit; these polar groups must be neutralized

Solution? Form regular secondary structures, e.g., α-helix, β-sheet, stabilized by H-bonds


Exterior surface of globular proteins is generally hydrophilic

Hydrophobic core formed by packed secondary structural elements provides compact, stable core

"Functional groups" of protein are attached to this framework; exterior has more flexible regions (loops) and polar/charged residues

Hydrophobic "patches" on protein surface are often involved in protein-protein interactions
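The hydrophobic-core versus hydrophilic-surface pattern can be explored from sequence alone with a sliding-window hydropathy profile. The sketch below uses the Kyte-Doolittle hydropathy scale, which the slides do not mention; it is one widely used choice and an assumption of this example, as are the window size and function name.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Mean hydropathy of each window of consecutive residues in seq.

    Returns an empty list when the sequence is shorter than the window.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    # Made-up peptide: a hydrophobic stretch flanked by charged residues.
    peptide = "KKDEELLIVVAALLKKDE"
    for start, value in enumerate(hydropathy_profile(peptide, window=7)):
        print(f"{start:2d} {value:6.2f}")
```

Stretches with a high average are candidates for buried segments (or, when long enough, membrane-spanning regions), while strongly negative windows tend to lie on the surface.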


Protein Secondary Structures

α-Helix
β-Sheets
Loops
Coils


α-Helix

Most abundant 2' structure in proteins
Average length = 10 aa's (~15 Angstroms, at 1.5 Angstroms rise per residue)

• Length varies from 5-40 aa's
• Alignment of H-bonds creates dipole moment (positive charge at NH end)
• Often at surface of core, with hydrophobic residues on inner-facing side, hydrophilic on other side


α-helix is stabilized by H-bonds between ~ every 4th residue

C = black
O = red
N = blue


R-groups are on outside of α-helix


Types of α-helices

"Standard" α-helix: 3.6 residues per turn

H-bonds between C=O of residue n and NH of residue n + 4

Helix ends are polar; almost always on surface of protein

Other types of helices?
n + 5 = π helix
n + 3 = 3₁₀ helix
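The helix geometry above translates directly into a few small calculations. This sketch is not from the slides; it assumes the standard textbook rise of 1.5 Angstroms per residue for an α-helix, and the function names are illustrative. The `offset` argument selects the H-bond pattern: 4 for the α-helix, 3 for the 3₁₀ helix, 5 for the π helix.

```python
RESIDUES_PER_TURN = 3.6
RISE_PER_RESIDUE = 1.5  # Angstroms; standard alpha-helix value (assumption)
DEGREES_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN  # about 100 degrees


def helix_length(n_residues):
    """Approximate length in Angstroms of an alpha-helix with n_residues."""
    return n_residues * RISE_PER_RESIDUE


def wheel_angle(index):
    """Angular position (degrees) of residue `index` on a helical wheel."""
    return (index * DEGREES_PER_RESIDUE) % 360.0


def backbone_hbonds(n_residues, offset=4):
    """(n, n + offset) residue index pairs: C=O of n bonds to NH of n + offset."""
    return [(n, n + offset) for n in range(n_residues - offset)]


if __name__ == "__main__":
    print(helix_length(10))    # length of an average 10-residue helix
    print(backbone_hbonds(8))  # alpha-helix pattern for 8 residues
    print([round(wheel_angle(i)) for i in range(5)])
```

Because consecutive residues are about 100 degrees apart, residues n, n + 3, n + 4 and n + 7 sit on roughly the same face, which is why amphipathic helices show hydrophobic residues at that spacing.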


Certain amino acids are "preferred" & others are rare in α-helices

• Ala, Glu, Leu, Met = good helix formers
• Pro, Gly, Tyr, Ser = very poor
• Amino acid composition & distribution varies, depending on location of helix in 3-D structure
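As a rough illustration of how these preferences can be applied to a sequence, the sketch below simply counts residues from the two groups listed above. It is a toy tally, not a secondary structure predictor; the one-letter groupings are taken from the list above, while the function name and example peptide are made up.

```python
GOOD_HELIX_FORMERS = set("AELM")  # Ala, Glu, Leu, Met
POOR_HELIX_FORMERS = set("PGYS")  # Pro, Gly, Tyr, Ser


def helix_former_counts(seq):
    """Return (number of good formers, number of poor formers) in seq."""
    seq = seq.upper()
    good = sum(1 for aa in seq if aa in GOOD_HELIX_FORMERS)
    poor = sum(1 for aa in seq if aa in POOR_HELIX_FORMERS)
    return good, poor


if __name__ == "__main__":
    # Made-up peptide for illustration.
    good, poor = helix_former_counts("MAELLKEALGSPY")
    print(f"good formers: {good}, poor formers: {poor}")
```

Real prediction methods weight every amino acid and consider neighboring residues, which the following lectures on structure prediction cover.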


β-Strands & Sheets

H-bonds formed between 5-10 consecutive residues in one portion of chain with another set of 5-10 residues farther down chain

Interacting regions may be adjacent (with short loop between) or far apart

β-sheets usually have all strands either parallel or antiparallel


Antiparallel β-sheet


Antiparallel β-sheet


Parallel β-sheet


Mixed β-Sheets also occur


Loops

• Connect helices and sheets
• Vary in length and 3-D configurations
• Are located on surface of structure
• Are more "tolerant" of mutations
• Are more flexible and can adopt multiple conformations
• Tend to have charged and polar amino acids
• Are frequently components of active sites
• Some fall into distinct structural families (e.g., hairpin loops, reverse turns)


Coils

• Regions of 2' structure that are not helices, sheets, or recognizable turns
• Intrinsically disordered regions appear to play important functional roles


Globular proteins are built from recurring structural patterns

Motifs or supersecondary structures = combinations of 2' structural elements

Domains = combinations of motifs
• Independently folding unit (foldon)
• Functional unit


A few common structural motifs

Helix-turn-helix: e.g., DNA binding

Helix-loop-helix: e.g., calcium binding

β-hairpin: 2 adjacent antiparallel strands connected by short loop

Greek key: 4 adjacent antiparallel strands

β-α-β: 2 parallel strands connected by helix


H-T-H H-L-H


β-hairpin


Greek key


Beta-alpha-beta


Simple motifs combine to form domains


Large polypeptide chains fold into several domains


6 main classes of protein structure

1) α Domains
• Bundles of helices connected by loops

2) β Domains
• Mainly antiparallel sheets, usually with 2 sheets forming sandwich

3) α/β Domains
• Mainly parallel sheets with intervening helices, also mixed sheets

4) α+β Domains
• Mainly segregated helices and sheets

5) Multidomain (α & β)
• Containing domains from more than one class

6) Membrane & cell-surface proteins


α-domain structures: coiled-coils


α-domain structures: 4-helix bundles


All-α proteins: Globins


β-domain structures

Anti-parallel β structures
Functionally most diverse
Includes:

• Up-and-down sheets or barrels
• Propeller-like structures
• Jelly roll barrels (from Greek key motifs)


Up-and-down sheets and barrel


Up-and-down sheets can form propeller-like structures


Greek key motifs can form jelly roll barrels


α/β-domain structures

3 main classes

TIM barrel = Core of twisted parallel strands close together

Rossmann fold = open twisted sheet surrounded by helices on both sides

Leucine-rich motif = specific pattern of Leu residues, strands form a curved sheet with helices on outside


TIM barrel, Rossmann fold


Leucine-rich motifs can form α/β horseshoes


Protein structure databases, structural classification & visualization

PDB = Protein Data Bank
http://www.rcsb.org/pdb/
(RCSB) - several different structure viewers

MMDB = Molecular Modeling Database
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure
(NCBI Entrez) - Cn3D viewer

SCOP = Structural Classification of Proteins
Levels reflect both evolutionary and structural relationships

CATH = Classification by Class, Architecture, Topology and Homology
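Structure files from the PDB store one atom per fixed-width ATOM record, and structure viewers read exactly those columns. The sketch below is an illustration, not part of the slides. It writes two records in the fixed-column PDB layout and parses them back by column position. The coordinates are made up, the occupancy and B-factor are dummy values, and the function names are hypothetical.

```python
import math


def format_atom(serial, name, res_name, chain, res_seq, x, y, z, element):
    """Build a fixed-width PDB ATOM record (78 columns, dummy occupancy/B-factor)."""
    return (
        "ATOM  {:5d} {:<4s} {:>3s} {:1s}{:4d}    "
        "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}          {:>2s}"
    ).format(serial, name, res_name, chain, res_seq, x, y, z, 1.0, 0.0, element)


def parse_atom(line):
    """Extract fields from an ATOM record by column position (0-based slices)."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "element": line[76:78].strip(),
    }


def atom_distance(atom_a, atom_b):
    """Euclidean distance (Angstroms) between two parsed atoms."""
    return math.dist(
        (atom_a["x"], atom_a["y"], atom_a["z"]),
        (atom_b["x"], atom_b["y"], atom_b["z"]),
    )


if __name__ == "__main__":
    # Made-up coordinates for two consecutive C-alpha atoms.
    line_1 = format_atom(1, " CA ", "GLY", "A", 1, 0.0, 0.0, 0.0, "C")
    line_2 = format_atom(2, " CA ", "ALA", "A", 2, 3.8, 0.0, 0.0, "C")
    print(line_1)
    print(line_2)
    print(atom_distance(parse_atom(line_1), parse_atom(line_2)))
```

For real work a maintained parser is a safer choice than hand-written slicing, since actual PDB files contain alternate locations, insertion codes, HETATM records and multiple models.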