3D Visualization of Drugs-Protein Complexes (source: xablab.ucsd.edu/15/15_ochem_icm_training.pdf)


Post on 31-May-2020


Page 1

3D Visualization of Drugs-Protein Complexes

•  Goal: Develop a better understanding of the Protein Data Bank (PDB) and its entries

•  Plan
–  Introductory information about the protein structure database
–  Learn Molsoft-browser for molecular visualization (and/or iMolView for portable devices)
–  Learn how to find a drug-protein complex in the database
–  Molecular visualization exercises

•  Contact: Ruben Abagyan : email ruben @ ucsd.edu Website: http://xablab.ucsd.edu

Page 2

Molecular Crystals and X-ray diffraction

An X-ray picture (radiograph), taken by Wilhelm Röntgen in 1896, of Albert von Kölliker's hand.

Wilhelm Röntgen

Wavelength of X-rays is around 1 Angstrom

Page 3

From X-ray diffraction spots to 3D structure

Phasing methods: direct methods, anomalous diffraction (MAD), molecular replacement

•  Each spot needs: 3 coordinates (h, k, l), plus intensity and phase (Fhkl, Ahkl)

•  To get the electron density map, the Fourier transformation is used

•  We need spot intensities and phases, but we only measure intensities. This is yet another limitation on quality
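The phase problem can be illustrated with a toy one-dimensional "crystal". The sketch below is only an illustration under simplifying assumptions: the density values and the function names are invented here, and a real electron density map is a 3D synthesis over h, k, l. It shows that the density is recovered when both amplitudes and phases are used, and that a different map comes out when the phases are discarded.

```python
import cmath
import math

def structure_factors(density):
    """Forward transform: one complex structure factor F_h per index h."""
    n = len(density)
    return [
        sum(rho * cmath.exp(2j * math.pi * h * x / n) for x, rho in enumerate(density))
        for h in range(n)
    ]

def fourier_synthesis(factors):
    """Inverse transform: rebuild the density from the structure factors."""
    n = len(factors)
    return [
        (sum(f * cmath.exp(-2j * math.pi * h * x / n) for h, f in enumerate(factors)) / n).real
        for x in range(n)
    ]

# Hypothetical 1D density sampled on 8 grid points (two "atoms").
density = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 2.0, 0.0]
factors = structure_factors(density)

# Amplitudes and phases together reproduce the density.
with_phases = fourier_synthesis(factors)

# Amplitudes alone (all phases set to zero) give a different map.
amplitudes_only = fourier_synthesis([abs(f) for f in factors])

print([round(v, 3) for v in with_phases])
print([round(v, 3) for v in amplitudes_only])
```

In an experiment only |Fhkl| is measured, which is why the phasing methods listed on this slide are needed.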

Page 4

Drug Targets in PDB

•  UniProt contains ~20,250 human proteins, with a mean length of 550 and a median of 400 amino acids

Page 5

PDB FILE FORMAT

HEADER EXTRACELLULAR MATRIX 22-JAN-98 1A3I
TITLE X-RAY CRYSTALLOGRAPHIC DETERMINATION OF A COLLAGEN-LIKE
TITLE 2 PEPTIDE WITH THE REPEATING SEQUENCE (PRO-PRO-GLY)
...
EXPDTA X-RAY DIFFRACTION
AUTHOR R.Z.KRAMER,L.VITAGLIANO,J.BELLA,R.BERISIO,L.MAZZARELLA,
AUTHOR 2 B.BRODSKY,A.ZAGARI,H.M.BERMAN
...
REMARK 350 BIOMOLECULE: 1
REMARK 350 APPLY THE FOLLOWING TO CHAINS: A, B, C
REMARK 350 BIOMT1 1 1.000000 0.000000 0.000000 0.00000
REMARK 350 BIOMT2 1 0.000000 1.000000 0.000000 0.00000
...
SEQRES 1 A 9 PRO PRO GLY PRO PRO GLY PRO PRO GLY
SEQRES 1 B 6 PRO PRO GLY PRO PRO GLY
SEQRES 1 C 6 PRO PRO GLY PRO PRO GLY
...
ATOM      1  N   PRO A   1       8.316  21.206  21.530  1.00 17.44           N
ATOM      2  CA  PRO A   1       7.608  20.729  20.336  1.00 17.44           C
ATOM      3  C   PRO A   1       8.487  20.707  19.092  1.00 17.44           C
ATOM      4  O   PRO A   1       9.466  21.457  19.005  1.00 17.44           O
ATOM      5  CB  PRO A   1       6.460  21.723  20.211  1.00 22.26           C
...
HETATM  130  C   ACY   401       3.682  22.541  11.236  1.00 21.19           C
HETATM  131  O   ACY   401       2.807  23.097  10.553  1.00 21.19           O
HETATM  132  OXT ACY   401       4.306  23.101  12.291  1.00 21.19           O
…

•  Each atom has X, Y, Z, O, B: the coordinates X, Y, Z, the occupancy O, and the B-factor B (how “smeared” the electron density is)

PDB Lingo: •  ATOM •  HETATM (het) •  SEQRES
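The X, Y, Z, O, B fields can be read from an ATOM record by slicing fixed character columns. The sketch below is a minimal illustration, assuming the standard fixed-width layout of the PDB format; the function name `parse_atom_line` and the dictionary keys are chosen here for illustration, and the sample line is the first ATOM record of the 1A3I listing above, re-spaced into its columns.

```python
def parse_atom_line(line):
    """Slice one ATOM/HETATM record into named fields (0-based Python slices)."""
    return {
        "record": line[0:6].strip(),      # columns 1-6: ATOM or HETATM
        "serial": int(line[6:11]),        # columns 7-11: atom serial number
        "name": line[12:16].strip(),      # columns 13-16: atom name
        "res_name": line[17:20].strip(),  # columns 18-20: residue name
        "chain": line[21].strip(),        # column 22: chain identifier
        "res_seq": int(line[22:26]),      # columns 23-26: residue number
        "x": float(line[30:38]),          # columns 31-38
        "y": float(line[38:46]),          # columns 39-46
        "z": float(line[46:54]),          # columns 47-54
        "occupancy": float(line[54:60]),  # columns 55-60: O
        "b_factor": float(line[60:66]),   # columns 61-66: B
        "element": line[76:78].strip(),   # columns 77-78: element symbol
    }

SAMPLE_LINE = (
    "ATOM      1  N   PRO A   1    "  # columns 1-30
    "   8.316  21.206  21.530"        # columns 31-54: X, Y, Z
    "  1.00 17.44"                    # columns 55-66: occupancy, B-factor
    "           N"                    # columns 67-78: element at the end
)

atom = parse_atom_line(SAMPLE_LINE)
print(atom["name"], atom["x"], atom["y"], atom["z"], atom["occupancy"], atom["b_factor"])
```

Column slicing is used instead of splitting on whitespace because neighbouring fields in real PDB files can touch each other when the numbers are large.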

Page 6

Difficult cases for crystallography

•  Membrane proteins: only ten GPCRs out of a thousand human ones

•  Fibrils (tubulin, myosin, actin filaments, amyloid)

•  Large particles: ribosomes

•  Flexible multi-domain proteins

Page 7

Unit Cells and Translations

•  Crystals are regular periodic arrays

•  Unit cell is the smallest volume from which the entire crystal can be constructed by translation only

•  Each unit cell contains one or several Asymmetric Units related by crystallographic symmetry (e.g. a mirror plane or a 2-fold rotation axis)

[Figure: unit cell with edges a and b]
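Building a crystal "by translation only" can be sketched in two dimensions: every copy of an atom is the original position shifted by a whole number of cell edges a and b. The cell vectors, the atom position and the function name below are invented for illustration.

```python
def lattice_copies(point, a, b, n_a, n_b):
    """Translate a 2D point by i*a + j*b for 0 <= i < n_a and 0 <= j < n_b."""
    return [
        (point[0] + i * a[0] + j * b[0], point[1] + i * a[1] + j * b[1])
        for i in range(n_a)
        for j in range(n_b)
    ]

# Hypothetical oblique cell: edge a along x, edge b tilted.
a = (10.0, 0.0)
b = (2.0, 8.0)
atom = (1.0, 1.5)

# A 3 x 2 block of unit cells gives six copies of the atom.
copies = lattice_copies(atom, a, b, 3, 2)
for position in copies:
    print(position)
```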

Page 8

Unit Cell and Asymmetric Unit

•  One unit cell may contain elements of crystallographic symmetry

Page 9

Asymmetric Unit

Asymmetric Unit is the smallest volume from which the unit cell can be constructed by application of the crystallographic symmetry.

Page 10

Asymmetric unit is not unique

•  Two steps:
– AU is multiplied by the symmetry elements of the cell (unique for each of 230 space groups)
– Result is translated in 3D
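The two steps can be sketched for the simplest 2D case, the plane group p2, whose only symmetry element besides the identity is a 2-fold rotation, (x, y) to (-x, -y) in fractional coordinates. This is a toy illustration; the point and the function names are assumptions made here, and a real 3D space group has more operators.

```python
def fill_cell(au_points, symmetry_ops):
    """Step 1: apply every symmetry operator to the asymmetric unit, wrapped into the cell."""
    return [tuple(c % 1.0 for c in op(p)) for op in symmetry_ops for p in au_points]

def translate_cells(cell_points, n_a, n_b):
    """Step 2: repeat the filled cell by whole-cell translations (fractional units)."""
    return [(x + i, y + j) for i in range(n_a) for j in range(n_b) for (x, y) in cell_points]

# Plane group p2: identity and a 2-fold rotation about the origin.
P2_OPS = [
    lambda p: (p[0], p[1]),
    lambda p: (-p[0], -p[1]),
]

asymmetric_unit = [(0.25, 0.125)]            # one hypothetical atom
cell = fill_cell(asymmetric_unit, P2_OPS)    # two atoms per unit cell
crystal = translate_cells(cell, 2, 2)        # 2 x 2 block of cells

print(cell)
print(len(crystal))
```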

Page 11

Unit Cell and Asymmetric Unit

•  The Unit cell here is one card, since the space can be filled by parallel TRANSLATION only

•  The Asymmetric Unit is half a card, related to another half by ROTATION.

•  There are many ways to define that half, however. In protein crystals the AU does not break proteins

[Figure panels: better choice of AU; legal, but poor choice of AU]

Page 12

Non-crystallographic symmetry (NCS)

•  An asymmetric unit may still contain several identical molecules or groups of molecules related by LOCAL symmetry

•  Example: in a crystal of viral particles, each virus consists of 60 groups of envelope proteins related by local NCS

Page 13

Space Groups

•  17 groups in 2D
•  2D group example: p2

•  230 groups in 3D
•  The most frequent:
–  P21 21 21
–  P 1 21 1
–  C 1 2 1
–  P 21 21 2
–  C 2 2 21
–  P 32 2 1
–  P1
–  P21

Page 14

Symmetry Group: P212121

•  Many biomolecules crystallize in P21 21 21 group

•  It has 4 asymmetric units in a unit cell
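The four asymmetric units follow from the four general-position operators of P21 21 21. The sketch below lists them in fractional coordinates as tabulated for space group number 19 and applies them to one invented atom position; the function name is an assumption made for illustration.

```python
# General positions of P 21 21 21 in fractional coordinates (x, y, z).
P212121_OPS = [
    lambda x, y, z: (x, y, z),
    lambda x, y, z: (0.5 - x, -y, z + 0.5),
    lambda x, y, z: (-x, y + 0.5, 0.5 - z),
    lambda x, y, z: (x + 0.5, 0.5 - y, -z),
]

def equivalent_positions(frac_xyz):
    """Apply each operator and wrap the result back into the unit cell [0, 1)."""
    return [tuple(c % 1.0 for c in op(*frac_xyz)) for op in P212121_OPS]

# Hypothetical atom at a general position.
positions = equivalent_positions((0.125, 0.25, 0.375))
for position in positions:
    print(position)
```

An atom at a general position therefore appears four times in each unit cell, once per asymmetric unit.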

Page 15

Reconstructing the biological structure from the asymmetric unit

•  A biological protein is part of a biological assembly (homo- or hetero-oligomer, a complex, ...)

•  A structural domain of interest

•  1, 2, ... domains may form an asymmetric unit via NCS

•  Several asymmetric units form a unit cell

•  Unit cells fill space via translation to form a crystal

If you are lucky, your space group will be P1 (no internal symmetry), and the asymmetric unit will consist of only your protein
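In a PDB file the operators that rebuild the biological assembly from the deposited coordinates are given in the REMARK 350 BIOMT records, each row holding three rotation terms and one translation term. The sketch below is a hedged illustration: `parse_biomt` and `apply_operator` are names invented here, whitespace splitting is a simplification, and the 2-fold operator is a made-up example, not taken from any entry.

```python
def parse_biomt(line):
    """Return (row number, (r1, r2, r3, t)) from one REMARK 350 BIOMTn record."""
    fields = line.split()
    row = int(fields[2][-1])                       # BIOMT1 -> 1, BIOMT2 -> 2, BIOMT3 -> 3
    values = tuple(float(v) for v in fields[4:8])  # three rotation terms + translation
    return row, values

def apply_operator(rows, xyz):
    """Apply a 3-row operator (rotation plus translation) to one coordinate triple."""
    return tuple(r[0] * xyz[0] + r[1] * xyz[1] + r[2] * xyz[2] + r[3] for r in rows)

# First BIOMT row as it appears in the 1A3I listing (identity operator).
row, values = parse_biomt("REMARK 350 BIOMT1 1 1.000000 0.000000 0.000000 0.00000")
print(row, values)

# Hypothetical 2-fold rotation about the z axis, for illustration only.
TWO_FOLD_Z = [
    (-1.0, 0.0, 0.0, 0.0),
    (0.0, -1.0, 0.0, 0.0),
    (0.0, 0.0, 1.0, 0.0),
]
print(apply_operator(TWO_FOLD_Z, (1.0, 2.0, 3.0)))
```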

Page 16

Unit Cell Example

•  Transthyretin (TTR): binds drugs, transports thyroxine (T4). PDB entry 2flm (1.65 Å)

•  TTR Amyloid: Familial amyloid polyneuropathy (or cardiomyopathy)

•  Space group P21212

Page 17

In P21212 there are four dimers in the unit cell

•  The biological unit is a HOMOTETRAMER

Page 18

From electron density of a protein drug complex to a full atom model

Ambiguities and gaps
•  Missing/unclear ligand density
•  Missing/ambiguous loops or side chains
•  Rotations of Asn, Gln
•  Crystallographic and non-crystallographic symmetry, bio-molecule, water, UNLs

Fantasy Heavy Atoms
•  Wrong atoms with full occupancy and low B-factors

Protonation and Tautomerization
•  ε and δ Histidines, His rotations, and ligand tautomers
•  Protons in His, Asp, Glu, Arg, Lys, Cys
•  Protonation and tautomerization of the ligand

[Figure examples: 2gb3, 2o5r; UNLs (unrecognized ligands)]

Page 19

iMolView on Mobiles

Page 20

iMolView on Google/Android

Page 21

ICM-Graphics: Mouse Controls

•  Center of the window:
–  left mouse button = rotate
–  middle button = translate
–  right button = menu

•  Left margin: left mouse button = zoom in/out
•  Top margin: left mouse button = Z-rotation
•  Bottom margin: translate
–  With slides: two triangular arrows = switch to next/previous

•  Right Margin:
–  Bottom: front clipping plane
–  Top: back clipping plane

Page 22

Exercise: Finding a PDB

•  By 4-symbol code
– Every PDB file has a 4-symbol code starting with a digit, e.g. 1crn
– Try to find the code of interest online
– Type the code in the ICM Search panel

•  By text (e.g. drug name):
– Type ‘Aspirin’ into the PDB/Search field
– Double click on a result row
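Before typing a code into a search box, it can be checked against the pattern described in this exercise: four symbols, the first of which is a digit. The sketch below is a minimal illustration; the function name is invented, and treating the remaining three symbols as letters or digits is an assumption made here.

```python
import re

# One digit followed by three letters or digits, e.g. 1crn.
PDB_CODE = re.compile(r"[0-9][A-Za-z0-9]{3}")

def is_pdb_code(text):
    """Return True if the text looks like a 4-symbol PDB code."""
    return PDB_CODE.fullmatch(text.strip()) is not None

for candidate in ["1crn", "1PTH", "crn1", "1crn2", "Aspirin"]:
    print(candidate, is_pdb_code(candidate))
```

Anything that fails this check, such as a drug name, belongs in the text search instead.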

Page 23

Exercise: Playing with Graphics

•  Search ‘Aspirin’ via ICM-PDB-search box
•  Choose 1pth
–  Find Aspirin moiety in Workspace
–  Understand all molecules in the structure
–  Select chain ‘B’ and delete it
–  Center on the Salicylic Acid
–  Find ‘Sites’ under chain ‘a’
–  Select Ser 530 by double clicking on the site and display modified Serine in CPK
•  Make a picture

Page 24

Exercise: Slides and Prepared .icb

•  Go to the http://xablab.ucsd.edu site
•  Download prepared Aspirin.icb file
•  Flip through ‘slides’ using arrows at the bottom margin of the graphics-window
•  Make a picture

Page 25

MolSoft ICM Reference

ICM Selections
When a 3D structure of a protein, DNA or small molecule is loaded from PDB or another file format into the ICM shell, it exists in a data structure that we term a molecular 'object'. Multiple objects may co-exist in ICM. Each object consists of one or more molecules, each molecule of one or more residues, and each residue of one or more atoms. A complete atom-level selection constant contains four fields: a_objects.molecules/residues/atoms

Example:
  read pdb "1stp"
  as_graph = a_1stp.a/15/ca
The above selects atom ca in residue number 15 of chain 'a' of object '1stp' and puts it into a variable called as_graph. This is a graphical selection: in the graphics window you will see green crosses on the selected atoms.

Selection examples:
  a_1stp.a/15          # residue number 15
  a_1stp.a             # molecule chain a
  a_1stp.              # object
  a_ICM.               # ICM-type objects
  a_.A                 # polypeptide chains in all objects
  a_1.W                # water molecules of the first object
  a_1.!W               # excludes water molecules from 1st object
  a_1.*                # everything in first object

Current-object selections are useful because most operations are performed on the last loaded object:
  a_                   # current object, use: set object a_<n>. to reset
  a_H                  # heteroatom molecules (ligands, ions etc.) in current object
  a_/20:30             # residues from 20 to 30
  a_/"VTA"             # consecutive residues matching sequence 'VTA', i.e. val-thr-ala
  a_/lys,arg,glu,asp   # all lysines, arginines, glutamic and aspartic acids
  a_/SH                # residues in Sec.structure Helices
  a_//c,ca,n           # backbone (C,Ca,N atoms)
  a_//c*               # all carbons
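The four-field layout a_objects.molecules/residues/atoms can be made concrete by splitting a selection string into its fields. The Python sketch below is not ICM's own parser. It is a simplified illustration inferred only from the examples above, and it assumes that a first field written without a dot (as in a_H) refers to molecules of the current object. The function name is invented.

```python
def split_selection(selection):
    """Split an ICM-style selection constant into its four fields (simplified)."""
    if not selection.startswith("a_"):
        raise ValueError("selection constants start with 'a_'")
    body = selection[2:]
    head, _, rest = body.partition("/")        # objects.molecules / residues / atoms
    residues, _, atoms = rest.partition("/")
    if "." in head:
        obj, _, molecules = head.partition(".")
    else:
        obj, molecules = "", head              # assumption: no dot means current object
    return {"object": obj, "molecules": molecules, "residues": residues, "atoms": atoms}

for text in ["a_1stp.a/15/ca", "a_1.W", "a_/20:30", "a_//c,ca,n", "a_H"]:
    print(text, split_selection(text))
```

An empty field means "not restricted at this level", which is how a_//ca can address CA atoms without naming an object, molecule or residue.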

Basic ICM Commands
A command is an instruction one can execute in the ICM-shell interactively or from an ICM script file. Typically a command consists of a verb (like read or delete) and a list of arguments. The word order in the argument list is not important if the arguments have different types. If two or more arguments of the same type are present, the order becomes important.
Example verbs: display, undisplay, color, center, delete, connect
Example nouns: wire, cpk, ball, stick, xstick, Surface, skin, ribbon, label, residue, atom

Example:
  read pdb "1crn"
  display ribbon
  color ribbon a_/4:8 blue
  display xstick a_1.1/10 green
  center
  display residue label a_/6:12
  display string "Crambin" 36 red

Write Image
  write image window=2*View(window)

Display Hydrogen Bonds
  display hbond a_1.*   # display all hbonds in object one

Superimpose Protein Structures
  superimpose a_1.1 a_2.1 align minimize

ICM Functions
ICM functions have the following general format: FunctionName(arg1, arg2 ...). They can return a variety of types. The order of the function arguments is flexible.

Example:
  avB=Min( Bfactor(a_//ca) )
  show Bfactor(a_//!h*)
  color a_//* Bfactor(a_//*)
  color ribbon a_/A Bfactor(a_/A)

ICM Macros
A macro is a group of ICM commands in a separate named function with arguments.

Convert PDB to ICM Object (convertObject)
To display hydrogen bonds, or to display electrostatic or property surfaces, you need to convert a PDB file into an ICM object using the convert command.
  read pdb "1abe"
  convertObject a_1,2 yes yes yes no
  show r_residualRmsd

_macro file. A collection of ICM macros.
This file contains a set of ICM macros. You can use them, modify them, or browse them to develop your own macros. _macro is downloaded by the call _macro command.

ICM-shell Objects
The ICM-shell can handle many different types, such as:
  integer (e.g. a=10, b= -3 )
  real (e.g. c = -3.14 )
  string, text in single or double quotes (e.g. d = "Hi, guys" )
  logical (e.g. e = (2 > 43); f = yes )
  preference (i.e. fixed multiple choices, try show wireStyle )
  iarray (i.e. integer arrays, g={-2,3,-1} )
  rarray (i.e. real arrays, h={ -2.3, 3.12, -1.} )
  sarray (i.e. string arrays, i={"mek","yerku","erek"} )
  parray, including array of 0D, 2D or 3D chemicals, e.g. chm = Chemical({"CC","CC(=O)O","C1CC1"})
  matrix (read from a disk file, e.g. read matrix "def.mat" )
  sequence (i.e. amino acid or nucleotide sequences, e.g. a=Sequence("ASDQWE") )
  alignment (i.e. pairwise or multiple sequence alignments, read from a file)
  grob (mesh, surface of a shape)
  table, or spreadsheet. Several arrays of the same size are grouped in a table. A table can also have a header with some additional data fields. Tables are essentially simple databases which can be manipulated, sorted and searched with ICM commands.

Page 26

iPhone & iPad