![Page 1: 10/19/2015BCHB524 - 2015 Protein Structure Informatics using Bio.PDB BCHB524 2015 Lecture 12 By Edwards & Li](https://reader035.vdocuments.net/reader035/viewer/2022062309/5697c0131a28abf838ccccc9/html5/thumbnails/1.jpg)
10/19/2015 BCHB524 - 2015
Protein StructureInformatics
using Bio.PDB
BCHB5242015
Lecture 12
By Edwards & Li
![Page 2: 10/19/2015BCHB524 - 2015 Protein Structure Informatics using Bio.PDB BCHB524 2015 Lecture 12 By Edwards & Li](https://reader035.vdocuments.net/reader035/viewer/2022062309/5697c0131a28abf838ccccc9/html5/thumbnails/2.jpg)
10/19/2015 BCHB524 - 2015
Outline
Review Python modules Biopython Sequence modules
Biopython’s Bio.PDB Protein structure primer / PyMOL PDB file parsing PDB data navigation: SMCRA Examples
![Page 3: 10/19/2015BCHB524 - 2015 Protein Structure Informatics using Bio.PDB BCHB524 2015 Lecture 12 By Edwards & Li](https://reader035.vdocuments.net/reader035/viewer/2022062309/5697c0131a28abf838ccccc9/html5/thumbnails/3.jpg)
10/19/2015 BCHB524 - 2015
Python Modules Review
Access the program environment sys, os, os.path
Specialized functions math, random
Access file-like resources as files: zipfile, gzip, urllib
Make specialized formats into “lists” and “dictionaries” csv (, XML, …)
![Page 4: 10/19/2015BCHB524 - 2015 Protein Structure Informatics using Bio.PDB BCHB524 2015 Lecture 12 By Edwards & Li](https://reader035.vdocuments.net/reader035/viewer/2022062309/5697c0131a28abf838ccccc9/html5/thumbnails/4.jpg)
10/19/2015 BCHB524 - 2015
BioPython Sequence Modules
Provide “sequence” abstraction More powerful than a python string
Knows its alphabet! Basic tasks already available
Easy parsing of (many) downloadable sequence database formats FASTA, Genbank, SwissProt/UniProt, etc… Simplify access to large collections of sequence Access by iteration, get sequence and accession Other content available as lists and dictionaries. Little semantic extraction or interpretation
![Page 5: 10/19/2015BCHB524 - 2015 Protein Structure Informatics using Bio.PDB BCHB524 2015 Lecture 12 By Edwards & Li](https://reader035.vdocuments.net/reader035/viewer/2022062309/5697c0131a28abf838ccccc9/html5/thumbnails/5.jpg)
Biopython Bio.SeqIO
Access to additional information annotations dictionary features list
Information, keys, and keywords vary with database! Semantic content extraction (still) up to you!
10/19/2015 BCHB524 - 2015
import Bio.SeqIOimport sysseqfile = open(sys.argv[1])for seq_record in Bio.SeqIO.parse(seqfile, "uniprot-xml"): print "\n------NEW SEQRECORD------\n" print "seq_record.annotations\n\t",seq_record.annotations print "seq_record.features\n\t",seq_record.features print "seq_record.dbxrefs\n\t",seq_record.dbxrefs print "seq_record.format('fasta')\n",seq_record.format('fasta')seqfile.close()
![Page 6: 10/19/2015BCHB524 - 2015 Protein Structure Informatics using Bio.PDB BCHB524 2015 Lecture 12 By Edwards & Li](https://reader035.vdocuments.net/reader035/viewer/2022062309/5697c0131a28abf838ccccc9/html5/thumbnails/6.jpg)
10/19/2015 BCHB524 - 2015
Proteins are…
…a linear sequence of amino-acids, after transcription from DNA, and translation from mRNA.
![Page 7: 10/19/2015BCHB524 - 2015 Protein Structure Informatics using Bio.PDB BCHB524 2015 Lecture 12 By Edwards & Li](https://reader035.vdocuments.net/reader035/viewer/2022062309/5697c0131a28abf838ccccc9/html5/thumbnails/7.jpg)
10/19/2015 BCHB524 - 2015
Proteins are…
…3-D molecules that interact with other (biological) molecules to carry out biological functions…
DNA Polymerase Hemoglobin
![Page 8: 10/19/2015BCHB524 - 2015 Protein Structure Informatics using Bio.PDB BCHB524 2015 Lecture 12 By Edwards & Li](https://reader035.vdocuments.net/reader035/viewer/2022062309/5697c0131a28abf838ccccc9/html5/thumbnails/8.jpg)
Proteins are…
www.uniprot.org
Primary Secondary Tertiary Quaternary
Introduction to Protein Structure, by Carl Branden and John Tooze
![Page 9: 10/19/2015BCHB524 - 2015 Protein Structure Informatics using Bio.PDB BCHB524 2015 Lecture 12 By Edwards & Li](https://reader035.vdocuments.net/reader035/viewer/2022062309/5697c0131a28abf838ccccc9/html5/thumbnails/9.jpg)
Proteins are Composed of 20 Amino Acids
![Page 10: 10/19/2015BCHB524 - 2015 Protein Structure Informatics using Bio.PDB BCHB524 2015 Lecture 12 By Edwards & Li](https://reader035.vdocuments.net/reader035/viewer/2022062309/5697c0131a28abf838ccccc9/html5/thumbnails/10.jpg)
Representations of Protein 3D Structures
Ball-Stick Ribbon Surface
Ball-Stick: Atom-Bond
Helix, Sheet and Random Coil
Solvent Accessible Surface
![Page 11: 10/19/2015BCHB524 - 2015 Protein Structure Informatics using Bio.PDB BCHB524 2015 Lecture 12 By Edwards & Li](https://reader035.vdocuments.net/reader035/viewer/2022062309/5697c0131a28abf838ccccc9/html5/thumbnails/11.jpg)
10/19/2015 BCHB524 - 2015
Protein Data Bank (PDB)
Repository of the 3-D conformation(s) / structure of proteins.
The result of laborious and expensive experiments using X-ray crystallography and/or nuclear magnetic resonance (NMR). (x,y,z) position of every atom of every amino-acid
Some entries contain multi-protein complexes, small-molecule ligands, docked epitopes and antibody-antigen complexes…
![Page 12: 10/19/2015BCHB524 - 2015 Protein Structure Informatics using Bio.PDB BCHB524 2015 Lecture 12 By Edwards & Li](https://reader035.vdocuments.net/reader035/viewer/2022062309/5697c0131a28abf838ccccc9/html5/thumbnails/12.jpg)
10/19/2015 BCHB524 - 2015
Visualization (PyMOL)
![Page 13: 10/19/2015BCHB524 - 2015 Protein Structure Informatics using Bio.PDB BCHB524 2015 Lecture 12 By Edwards & Li](https://reader035.vdocuments.net/reader035/viewer/2022062309/5697c0131a28abf838ccccc9/html5/thumbnails/13.jpg)
Molecular Visualization Tools PyMol: http://pymol.sourceforge.net/ Chimera: http://www.cgl.ucsf.edu/chimera/ PMV: http://www.scripps.edu/~sanner/python/ Coot: http://www.ysbl.york.ac.uk/~emsley/coot/ CCP4mg: http://www.ysbl.york.ac.uk/~lizp/molgraphics.html mmLib: http://pymmlib.sourceforge.net/ VMD: http://www.ks.uiuc.edu/Research/vmd/ MMTK: http://starship.python.net/crew/hinsen/MMTK/
http://biopython.org/
![Page 14: 10/19/2015BCHB524 - 2015 Protein Structure Informatics using Bio.PDB BCHB524 2015 Lecture 12 By Edwards & Li](https://reader035.vdocuments.net/reader035/viewer/2022062309/5697c0131a28abf838ccccc9/html5/thumbnails/14.jpg)
Overall Layout of a Structure Object
A structure consists of models
A model consists of chains
A chain consists of residues
A residue consists of atoms
http://biopython.org/
![Page 15: 10/19/2015BCHB524 - 2015 Protein Structure Informatics using Bio.PDB BCHB524 2015 Lecture 12 By Edwards & Li](https://reader035.vdocuments.net/reader035/viewer/2022062309/5697c0131a28abf838ccccc9/html5/thumbnails/15.jpg)
10/19/2015 BCHB524 - 2015
Biopython Bio.PDB
Parser for PDB format files Navigate structure and answer atom-atom
distance/angle questions.
Structure (PDB File) >> Model >> Chain >> Residue >> Atom >> (x,y,z) coordinates SMCRA representation mirrors PDB format
![Page 16: 10/19/2015BCHB524 - 2015 Protein Structure Informatics using Bio.PDB BCHB524 2015 Lecture 12 By Edwards & Li](https://reader035.vdocuments.net/reader035/viewer/2022062309/5697c0131a28abf838ccccc9/html5/thumbnails/16.jpg)
10/19/2015 BCHB524 - 2015
SMCRA Data-Model
Each PDB file represents one “structure” Each structure may contain many models
In most cases there is only one model, index 0. Each polypeptide (amino-acid sequence) is a
“chain”. A single-protein structure has one chain, “A” 1HPV is a dimer and has chains “A” and “B”.
![Page 17: 10/19/2015BCHB524 - 2015 Protein Structure Informatics using Bio.PDB BCHB524 2015 Lecture 12 By Edwards & Li](https://reader035.vdocuments.net/reader035/viewer/2022062309/5697c0131a28abf838ccccc9/html5/thumbnails/17.jpg)
10/19/2015 BCHB524 - 2015
SMCRA Data-Model
#eg1.pyimport Bio.PDB.PDBParserimport sys
# Use QUIET=True to avoid lots of warnings...parser = Bio.PDB.PDBParser(QUIET=True)structure = parser.get_structure("1HPV", "1HPV.pdb")model = structure[0]
# This structure is a dimer with two chainsachain = model['A']bchain = model['B']
![Page 18: 10/19/2015BCHB524 - 2015 Protein Structure Informatics using Bio.PDB BCHB524 2015 Lecture 12 By Edwards & Li](https://reader035.vdocuments.net/reader035/viewer/2022062309/5697c0131a28abf838ccccc9/html5/thumbnails/18.jpg)
10/19/2015 BCHB524 - 2015
SMCRA
Chains are composed of amino-acid residues Access by iteration, or by index Residue “index” may not be sequence position
Residues are composed of atoms: Access by iteration or by atom name …except for H!
Water molecules are also represented as atoms – HOH residue name, het=“W”
![Page 19: 10/19/2015BCHB524 - 2015 Protein Structure Informatics using Bio.PDB BCHB524 2015 Lecture 12 By Edwards & Li](https://reader035.vdocuments.net/reader035/viewer/2022062309/5697c0131a28abf838ccccc9/html5/thumbnails/19.jpg)
10/19/2015 BCHB524 - 2015
SMCRA Data-Model
#eg2.pyimport Bio.PDB.PDBParserimport sys
# Use QUIET=True to avoid lots of warnings...parser = Bio.PDB.PDBParser(QUIET=True)structure = parser.get_structure("1HPV", "1HPV.pdb")model = structure[0]
for chain in model: for residue in chain: for atom in residue: print chain, residue, atom, atom.get_coord()
![Page 20: 10/19/2015BCHB524 - 2015 Protein Structure Informatics using Bio.PDB BCHB524 2015 Lecture 12 By Edwards & Li](https://reader035.vdocuments.net/reader035/viewer/2022062309/5697c0131a28abf838ccccc9/html5/thumbnails/20.jpg)
10/19/2015 BCHB524 - 2015
Polypeptide molecules
S-G-Y-A-L
![Page 21: 10/19/2015BCHB524 - 2015 Protein Structure Informatics using Bio.PDB BCHB524 2015 Lecture 12 By Edwards & Li](https://reader035.vdocuments.net/reader035/viewer/2022062309/5697c0131a28abf838ccccc9/html5/thumbnails/21.jpg)
10/19/2015 BCHB524 - 2015
SMCRA Atom names
![Page 22: 10/19/2015BCHB524 - 2015 Protein Structure Informatics using Bio.PDB BCHB524 2015 Lecture 12 By Edwards & Li](https://reader035.vdocuments.net/reader035/viewer/2022062309/5697c0131a28abf838ccccc9/html5/thumbnails/22.jpg)
10/19/2015 BCHB524 - 2015
Check polypeptide backbone#eg3.pyimport Bio.PDB.PDBParserimport sys# Use QUIET=True to avoid lots of warnings...parser = Bio.PDB.PDBParser(QUIET=True)structure = parser.get_structure("1HPV", "1HPV.pdb")model = structure[0]achain = model['A']
for residue in achain: index = residue.get_id()[1] calpha = residue['CA'] carbon = residue['C'] nitrogen = residue['N'] oxygen = residue['O']
print "Residue:",residue.get_resname(),index print "N - Ca",(nitrogen - calpha) print "Ca - C ",(calpha - carbon) print "C - O ",(carbon - oxygen) print break
![Page 23: 10/19/2015BCHB524 - 2015 Protein Structure Informatics using Bio.PDB BCHB524 2015 Lecture 12 By Edwards & Li](https://reader035.vdocuments.net/reader035/viewer/2022062309/5697c0131a28abf838ccccc9/html5/thumbnails/23.jpg)
10/19/2015 BCHB524 - 2015
Check polypeptide backbone#eg4.py# As before...
for residue in achain: index = residue.get_id()[1] calpha = residue['CA'] carbon = residue['C'] nitrogen = residue['N'] oxygen = residue['O']
print "Residue:",residue.get_resname(),index print "N - Ca",(nitrogen - calpha) print "Ca - C ",(calpha - carbon) print "C - O ",(carbon - oxygen)
if achain.has_id(index+1): nextresidue = achain[index+1] nextnitrogen = nextresidue['N'] print "C - N ",(carbon - nextnitrogen)
![Page 24: 10/19/2015BCHB524 - 2015 Protein Structure Informatics using Bio.PDB BCHB524 2015 Lecture 12 By Edwards & Li](https://reader035.vdocuments.net/reader035/viewer/2022062309/5697c0131a28abf838ccccc9/html5/thumbnails/24.jpg)
10/19/2015 BCHB524 - 2015
Find potential disulfide bonds
The sulfur atoms of Cys amino-acids often form “di-sulfide” bonds if they are close enough – less than 8 Å.
Compare with PDB file contents: SSBOND
Bio.PDB does not provide an easy way to access the SSBOND annotations
![Page 25: 10/19/2015BCHB524 - 2015 Protein Structure Informatics using Bio.PDB BCHB524 2015 Lecture 12 By Edwards & Li](https://reader035.vdocuments.net/reader035/viewer/2022062309/5697c0131a28abf838ccccc9/html5/thumbnails/25.jpg)
10/19/2015 BCHB524 - 2015
Find potential disulfide bonds#eg5.pyimport Bio.PDB.PDBParserimport sys
# Use QUIET=True to avoid lots of warnings...parser = Bio.PDB.PDBParser(QUIET=True)structure = parser.get_structure("1KCW", "1KCW.pdb")model = structure[0]achain = model['A']
cysresidues = []for residue in achain: if residue.get_resname() == 'CYS': cysresidues.append(residue)
for c1 in cysresidues: c1index = c1.get_id()[1] for c2 in cysresidues: c2index = c2.get_id()[1] if (c1['SG'] - c2['SG']) < 8.0: print "possible di-sulfide bond:", print "Cys",c1index,"-", print "Cys",c2index, print round(c1['SG'] - c2['SG'],2)
![Page 26: 10/19/2015BCHB524 - 2015 Protein Structure Informatics using Bio.PDB BCHB524 2015 Lecture 12 By Edwards & Li](https://reader035.vdocuments.net/reader035/viewer/2022062309/5697c0131a28abf838ccccc9/html5/thumbnails/26.jpg)
10/19/2015 BCHB524 - 2015
Find contact residues in a dimer#eg6.pyimport Bio.PDB.PDBParserimport sys
# Use QUIET=True to avoid lots of warnings...parser = Bio.PDB.PDBParser(QUIET=True)structure = parser.get_structure("1HPV","1HPV.pdb")
achain = structure[0]['A']bchain = structure[0]['B']
for res1 in achain: if res1.get_id()[0][0] is 'H' or res1.get_id()[0][0] is 'W': continue r1ca = res1['CA'] r1ind = res1.get_id()[1] r1sym = res1.get_resname() for res2 in bchain: if res2.get_id()[0][0] is 'H' or res2.get_id()[0][0] is 'W': continue r2ca = res2['CA'] r2ind = res2.get_id()[1] r2sym = res2.get_resname() if (r1ca - r2ca) < 6.0: print "Residues",r1sym,r1ind,"in chain A", print "and",r2sym,r2ind,"in chain B", print "are close to each other:",round(r1ca-r2ca,2)
![Page 27: 10/19/2015BCHB524 - 2015 Protein Structure Informatics using Bio.PDB BCHB524 2015 Lecture 12 By Edwards & Li](https://reader035.vdocuments.net/reader035/viewer/2022062309/5697c0131a28abf838ccccc9/html5/thumbnails/27.jpg)
10/19/2015 BCHB524 - 2015
Find contact residues in a dimer – better version#eg7.pyimport Bio.PDB.PDBParserimport sys
# Use QUIET=True to avoid lots of warnings...parser = Bio.PDB.PDBParser(QUIET=True)structure = parser.get_structure("1HPV","1HPV.pdb")
achain = structure[0]['A']bchain = structure[0]['B']
bchainca = [ r['CA'] for r in bchain ]
neighbors = Bio.PDB.NeighborSearch(bchainca)
for res1 in achain: r1ca = res1['CA'] r1ind = res1.get_id()[1] r1sym = res1.get_resname() for r2ca in neighbors.search(r1ca.get_coord(), 6.0): res2 = r2ca.get_parent() r2ind = res2.get_id()[1] r2sym = res2.get_resname() print "Residues",r1sym,r1ind,"in chain A", print "and",r2sym,r2ind,"in chain B", print "are close to each other:",round(r1ca-r2ca,2)
![Page 28: 10/19/2015BCHB524 - 2015 Protein Structure Informatics using Bio.PDB BCHB524 2015 Lecture 12 By Edwards & Li](https://reader035.vdocuments.net/reader035/viewer/2022062309/5697c0131a28abf838ccccc9/html5/thumbnails/28.jpg)
10/19/2015 BCHB524 - 2015
Superimpose two structuresimport Bio.PDBimport Bio.PDB.PDBParserimport sys
# Use QUIET=True to avoid lots of warnings...parser = Bio.PDB.PDBParser(QUIET=True)structure1 = parser.get_structure("2WFJ","2WFJ.pdb")structure2 = parser.get_structure("2GW2","2GW2a.pdb")
ppb=Bio.PDB.PPBuilder()
# Manually figure out how the query and subject peptides correspond...# query has an extra residue at the front# subject has two extra residues at the backquery = ppb.build_peptides(structure1)[0][1:]target = ppb.build_peptides(structure2)[0][:-2]
query_atoms = [ r['CA'] for r in query ]target_atoms = [ r['CA'] for r in target ]
superimposer = Bio.PDB.Superimposer()superimposer.set_atoms(query_atoms, target_atoms)
print "Query and subject superimposed, RMS:", superimposer.rmssuperimposer.apply(structure2.get_atoms())
# Write modified structures to one fileoutfile=open("2GW2-modified.pdb", "w") io=Bio.PDB.PDBIO() io.set_structure(structure2) io.save(outfile) outfile.close()
![Page 29: 10/19/2015BCHB524 - 2015 Protein Structure Informatics using Bio.PDB BCHB524 2015 Lecture 12 By Edwards & Li](https://reader035.vdocuments.net/reader035/viewer/2022062309/5697c0131a28abf838ccccc9/html5/thumbnails/29.jpg)
10/19/2015 BCHB524 - 2015
Superimpose two chainsimport Bio.PDB
parser = Bio.PDB.PDBParser(QUIET=1)structure = parser.get_structure("1HPV","1HPV.pdb")
model = structure[0]
ppb=Bio.PDB.PPBuilder()
# Get the polypeptide chainsachain,bchain = ppb.build_peptides(model)
aatoms = [ r['CA'] for r in achain ]batoms = [ r['CA'] for r in bchain ]
superimposer = Bio.PDB.Superimposer()superimposer.set_atoms(aatoms, batoms)
print "Query and subject superimposed, RMS:", superimposer.rmssuperimposer.apply(model['B'].get_atoms())
# Write structure to fileoutfile=open("1HPV-modified.pdb", "w") io=Bio.PDB.PDBIO() io.set_structure(structure) io.save(outfile) outfile.close()
![Page 30: 10/19/2015BCHB524 - 2015 Protein Structure Informatics using Bio.PDB BCHB524 2015 Lecture 12 By Edwards & Li](https://reader035.vdocuments.net/reader035/viewer/2022062309/5697c0131a28abf838ccccc9/html5/thumbnails/30.jpg)
10/19/2015 BCHB524 - 2015
Exercises
Read through and try the examples from Chapter 10 of the Biopython Tutorial and the Bio.PDB FAQ.
Write a program that analyzes a PDB file (filename provided on the command-line!) to find pairs of lysine residues that might be linked if the BS3 cross-linker is used. The rigid BS3 cross-linker is approximately 11 Å long. Write two versions, one that computes the distance between
all pairs of lysine residues, and one that uses the NeighborSearch technique.