Post on 19-Dec-2015

Structural Bioinformatics Workshop

•Max Shatsky

•Email: maxshats@post.tau.ac.il

Workshop home page: http://bioinfo3d.cs.tau.ac.il/Education/Workshop/

Schedule

•Introduction to protein structure.

•Introduction to pattern matching.

•Protein structure alignment (comparison).

•Protein Docking

•GAMB++ library.

Grade Ingredients

•Presentation and Design Review

•Final Project

–Software Engineering

–Efficiency of Solution

–Working Examples and Test Cases

–Documentation

–Knowledge of all project aspects

Bioinformatics - Computational Genomics

DNA mapping.

Protein or DNA sequence comparisons.

Exploration of huge textual databases.

In essence, one-dimensional methods and intuition.

Structural Bioinformatics - Structural Genomics

Elucidation of the 3D structures of biomolecules.

Analysis and comparison of biomolecular structures.

Prediction of biomolecular recognition.

Handles three-dimensional (3-D) structures.

Geometric Computing (a methodology shared by Computational Geometry, Computer Vision, Computer Graphics, Pattern Recognition, etc.).

Protein Structural Comparison

ApoAmicyanin - 1aaj

Pseudoazurin - 1pmy

Algorithmic Solution

About 1 sec. Fischer, Nussinov, Wolfson ~ 1990.

Introduction to Protein Structure

The central dogma

DNA ---> mRNA ---> Protein

DNA: 4-letter alphabet {A,C,G,T}, a sequence of nucleotides.

mRNA: 4-letter alphabet {A,C,G,U} (T->U), a sequence of nucleotides.

Protein: 20-letter alphabet {A,C,D,...,Y}, a sequence of amino acids.

Base pairs: Guanine-Cytosine, Thymine-Adenine.

When genes are expressed, the genetic information (base sequence) on DNA is first transcribed (copied) to a molecule of messenger RNA (mRNA) in a process similar to DNA replication.

The mRNA molecules then leave the cell nucleus and enter the cytoplasm, where triplets of bases (codons) forming the genetic code specify the particular amino acids that make up an individual protein. This process, called translation, is accomplished by ribosomes (cellular components composed of proteins and another class of RNA) that read the genetic code from the mRNA, and transfer RNAs (tRNAs) that transport amino acids to the ribosomes for attachment to the growing protein. (From www.ornl.gov/hgmis/publicat/primer/ )
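The two steps described here, transcription (T->U) and translation (codon by codon), can be illustrated with a minimal Python sketch. It assumes the standard genetic code and an input that is the coding strand starting at the first codon. The function names `transcribe` and `translate` are illustrative and not part of the workshop material.

```python
# Standard genetic code, with codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(dna):
    """Copy the coding strand to mRNA: same letters, with T replaced by U."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """Read codons (base triplets) until a stop codon ('*') or the end."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon: the protein is complete
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTGGTAA")
print(mrna, translate(mrna))  # AUGGCCUGGUAA MAW
```

The 64-entry table maps the 4-letter mRNA alphabet onto the 20-letter amino acid alphabet plus the stop signal, which is why several codons share one amino acid.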

Amino acids and the peptide bond

Cβ – first side chain carbon (except for glycine).

Cα atoms

Wire-frame or ribbons display

Geometric Representation

3-D curve {v_i}, i=1…n

Secondary structure

Hydrogen bonds.

β-strands and β-sheets

The Holy Grail - Protein Folding

From Sequence to Structure.

Relatively primitive computational folding models have proved to be NP-hard even in the 2-D case.

Determination of protein structures

X-ray Crystallography

NMR (Nuclear Magnetic Resonance)

EM (Electron microscopy)

An NMR result is an ensemble of models

Cystatin (1a67)

The Protein Data Bank (PDB)

International repository of 3D molecular data.

Contains x-y-z coordinates of all atoms of the molecule and additional data.

http://pdb.tau.ac.il

http://www.rcsb.org/pdb/
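As an illustration of how the x-y-z coordinates are stored, the sketch below extracts the Cα atoms from fixed-column ATOM records of the PDB file format, giving the 3-D curve representation of the backbone. The helper `format_atom_line` and the coordinates are made up for the demo; a real file would be read line by line instead.

```python
def format_atom_line(serial, name, res_name, chain, res_seq, x, y, z):
    """Hypothetical helper: build one fixed-column ATOM record for the demo."""
    return (f"ATOM  {serial:5d} {name:<4s} {res_name:>3s} {chain}{res_seq:4d}"
            f"    {x:8.3f}{y:8.3f}{z:8.3f}")


def read_ca_coordinates(pdb_lines):
    """Return the (x, y, z) of every C-alpha atom, in file order."""
    coords = []
    for line in pdb_lines:
        # Columns 1-6: record type; 13-16: atom name; 31-54: x, y, z.
        if line.startswith("ATOM  ") and line[12:16].strip() == "CA":
            coords.append((float(line[30:38]),
                           float(line[38:46]),
                           float(line[46:54])))
    return coords


# Made-up coordinates, for illustration only.
sample = [
    format_atom_line(1, " N  ", "ALA", "A", 1, 11.104, 6.134, -6.504),
    format_atom_line(2, " CA ", "ALA", "A", 1, 11.639, 6.071, -5.147),
    format_atom_line(3, " C  ", "ALA", "A", 1, 11.000, 7.000, -4.000),
    format_atom_line(4, " CA ", "GLY", "A", 2, 13.559, 8.607, -3.181),
]
print(read_ca_coordinates(sample))
```

Only ATOM records are inspected, so a calcium ion stored as a HETATM record with the name CA is not mistaken for a Cα atom.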

Why bother with structures when we have sequences?

In evolutionarily related proteins, structure is much better preserved than sequence.

Structural motifs may predict similar biological function.

Getting insight into protein folding. Recovering the limited (?) number of protein folds.

Applications

Classification of protein databases by structure.

Search of partial and disconnected structural patterns in large databases.

Extracting structure information is difficult; we want to extract “new” folds.

Applications (continued)

Speed up of drug discovery.

Detection of structural pharmacophores in an ensemble of drugs (similar substructures in drugs acting on a given receptor – pharmacophore).

Comparison and detection of drug receptor active sites (structurally similar receptor cavities could bind similar drugs).

Object Recognition

Model Database

Scene

Recognition

Lamdan, Schwartz, Wolfson, “Geometric Hashing”, 1988.

Protein Alignment = Geometric Pattern Discovery

Protein Alignment

• The superimposition pattern is not known a priori – pattern discovery.

• The matching recovered can be inexact.

• We are not necessarily looking for the largest superimposition, since other matchings may have biological meaning.

Geometric Task:

Given two configurations of points in three-dimensional space, find those rotations and translations of one of the point sets which produce “large” superimpositions of corresponding 3-D points.

Geometric Task (continued)

Aspects:

•Object representation (points, vectors, segments)

•Object resemblance (distance function)

•Transformation (translations, rotations, scaling)

-> Optimization technique

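Transformations

Translation: $x \mapsto x + t$

Translation and Rotation, i.e. Rigid Motion (Euclidean transformation): $x \mapsto R(x) = Ux + t$

Translation, Rotation + Scaling: $x \mapsto T(x) = s(Ux + t)$

Here $U$ is a rotation matrix, $t$ a translation vector and $s$ a scale factor.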
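A minimal NumPy sketch of applying these transformations to a point set, assuming the rigid motion has the form Ux + t with U a rotation matrix and t a translation vector, and the scaled form reads s(Ux + t). The function names are illustrative only.

```python
import numpy as np


def rotation_about_z(angle_rad):
    """Rotation matrix U for a counter-clockwise turn about the z axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])


def apply_transform(points, U, t, s=1.0):
    """Map every row x of `points` to s * (U @ x + t)."""
    return s * (points @ U.T + t)


points = np.array([[1.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0]])
U = rotation_about_z(np.pi / 2)
t = np.array([1.0, 2.0, 3.0])

moved = apply_transform(points, U, t)
print(moved)  # rows are approximately (1, 3, 3) and (-1, 2, 3)

# A rigid motion (s = 1) preserves the distance between any two points.
print(np.linalg.norm(points[0] - points[1]),
      np.linalg.norm(moved[0] - moved[1]))
```

With s = 1 the map is a rigid motion, which is the class of transformations used when superimposing two protein structures.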

Inexact Alignment.

Simple case – two closely related proteins with the same number of amino acids.

Question: how to measure alignment error?

Superposition - best least squares (RMSD – Root Mean Square Deviation)

Given two sets of 3-D points: $P=\{p_i\}$, $Q=\{q_i\}$, $i=1,\dots,n$;

$\mathrm{rmsd}(P,Q) = \sqrt{\sum_i |p_i - q_i|^2 / n}$

Find a 3-D rigid transformation $T^*$ such that:

$\mathrm{rmsd}(T^*(P), Q) = \min_T \sqrt{\sum_i |T p_i - q_i|^2 / n}$

A closed form solution exists for this task. It can be computed in O(n) time.
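The slide does not say which closed form solution is meant. One well-known option is the Kabsch algorithm, sketched below with NumPy under that assumption. The cost is linear in n because the only work that depends on n is building a 3x3 covariance matrix; the SVD is then taken of that fixed-size matrix. The function names are illustrative.

```python
import numpy as np


def rmsd(P, Q):
    """Root mean square deviation between paired rows of P and Q."""
    return float(np.sqrt(((P - Q) ** 2).sum() / len(P)))


def best_rigid_transform(P, Q):
    """Kabsch algorithm: return (U, t) minimising rmsd(P @ U.T + t, Q)."""
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    # 3x3 covariance of the centred point sets: the only O(n) step.
    H = (P - p_mean).T @ (Q - q_mean)
    V, _, Wt = np.linalg.svd(H)
    # Flip one axis if needed so that U is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Wt.T @ V.T) >= 0 else -1.0
    U = Wt.T @ np.diag([1.0, 1.0, d]) @ V.T
    t = q_mean - U @ p_mean
    return U, t


# Demo: move a random point set by a known rigid motion, then recover it.
rng = np.random.default_rng(0)
P = rng.normal(size=(10, 3))
angle = 0.7
U_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([1.0, -2.0, 0.5])
Q = P @ U_true.T + t_true

U, t = best_rigid_transform(P, Q)
print(rmsd(P, Q), rmsd(P @ U.T + t, Q))  # large before, close to 0 after
```

The determinant check matters for proteins: without it the least squares optimum could be a reflection, which would superimpose a structure onto its mirror image.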

Problem statement with RMSD metric.

Given two configurations of points in three-dimensional space and a threshold ε, find the largest alignment (a set of matched elements and a transformation) with RMSD less than ε.

(The problem belongs to NP; is it NP-complete?)

Distance Functions

Two point sets: $A=\{a_i\}$, $i=1,\dots,n$; $B=\{b_j\}$, $j=1,\dots,m$

• Pairwise Correspondence: $(a_{k_1}, b_{t_1}), (a_{k_2}, b_{t_2}), \dots, (a_{k_N}, b_{t_N})$

(1) Exact Matching: $\|a_{k_i} - b_{t_i}\| = 0$

(2) RMSD (Root Mean Square Deviation): $\sqrt{\sum_i \|a_{k_i} - b_{t_i}\|^2 / N} < \varepsilon$

(3) Bottleneck: $\max_i \|a_{k_i} - b_{t_i}\|$

• Hausdorff distance: $h(A,B) = \max_{a \in A} \min_{b \in B} \|a - b\|$, $H(A,B) = \max(h(A,B), h(B,A))$
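The distance functions listed here can be compared on a small example. The NumPy sketch below assumes that, for RMSD and bottleneck, the correspondence is already given as paired rows of two arrays; the Hausdorff distance needs no correspondence and accepts sets of different sizes. The function names are illustrative.

```python
import numpy as np


def rmsd(A, B):
    """Root mean square deviation over N paired points (row i with row i)."""
    return float(np.sqrt(np.mean(np.sum((A - B) ** 2, axis=1))))


def bottleneck(A, B):
    """Largest distance within any matched pair."""
    return float(np.max(np.linalg.norm(A - B, axis=1)))


def directed_hausdorff(A, B):
    """h(A, B): the point of A farthest from its nearest neighbour in B."""
    # Pairwise distance matrix of shape (len(A), len(B)).
    dist = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return float(dist.min(axis=1).max())


def hausdorff(A, B):
    """H(A, B) = max(h(A, B), h(B, A))."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))


A = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
B = np.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0]])

print(rmsd(A, B))                # sqrt(4.5), about 2.121
print(bottleneck(A, B))          # 3.0
print(directed_hausdorff(A, B))  # 1.0
print(hausdorff(A, B))           # 3.0
```

The example shows that h is not symmetric: every point of A lies within 1 of B, but the point (4, 0, 0) of B is 3 away from A, which is why H takes the maximum of both directions.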