PROTEINS FROM SEQUENCES TO INTERACTIONS IPM-NUS Workshop on Computational Biology Mehdi Sadeghi

Post on 28-Dec-2015


Amino Acids and Proteins

Proteins are polymers composed of combinations of 20 different amino acids.

They range in size from about 50 to over 20,000 amino acids.

A single cell may have 10,000 or more different proteins.

About half of the non-water component of a typical cell is protein.


Four levels of protein structure

STRUCTURE → PROCESS

Primary → Assembly

Secondary → Folding

Tertiary → Packing

Quaternary → Interaction

• Occurs at the ribosome

• Involves dehydration synthesis and polymerization of amino acids attached to tRNA

• Yields primary structure

Secondary Structure

• Non-linear

• 3-dimensional

• Localized to regions of an amino acid chain

• Formed and stabilized by hydrogen bonding, electrostatic and van der Waals interactions

Importance and Determinants of Secondary Structure

• Folded proteins have segments of regular conformation

• The arrangement of secondary structure elements provides a convenient way of classifying types of folds

• Steric constraints dictate the possible types of secondary structure


Folding

• The folded structure of a protein is directly determined by its primary structure

Computational prediction of folding is not yet reliable


Tertiary Structure

• The condensing of multiple secondary structural elements leads to tertiary structure

• Tertiary structure is stabilized by efficient packing of atoms in the protein interior

The Protein Domain

• A domain is a compact unit of protein structure that is usually capable of folding stably as an independent entity in solution. Domains do not need to comprise a contiguous segment of peptide chain, although this is often the case.

• Proteins whose molecular weights are less than about 20,000 often have a simple globular shape, with an average molecular diameter of 20 to 30 Å, but larger proteins usually fold into two or more independent globules, or structural domains.

The Protein Domain

• Multidomain proteins probably evolved by the fusion of genes that once coded for separate proteins

[Figure: multidomain proteins built from identical domains and from structurally unrelated domains]

Protein Domains – an alphabet of functional modules

WD40, WW, SH2, SH3, 14-3-3, ANK3, ARM, BH1, C1, C2, CARD, EH, EVH, FYVE, PDZ, Death, DED, EFH, PH, PTB, SAM


The Universe of Protein Structures

• The number of protein folds is large but limited

• Protein structures are modular and proteins can be grouped into families on the basis of the domains they contain

• The modular nature of protein structure allows for sequence insertions and deletions

Schematic diagram of the domain arrangement of a number of signal transduction proteins. The different modules have different functions.

Protein Structure Classification

• Why classify proteins:

– The number of solved structures grows rapidly

– Generate an overview of structure types

– Detect similarities (evolutionary relationships)

– Set up prediction benchmarks

Classification schemes

• SCOP – manual classification

• CATH – semi-manual classification

• FSSP – automatic classification


SCOP


CATH

Singh

Class: SSE composition & packing

Architecture: overall shape of domain, ignore SSE connectivity

Topology (Fold): consider connectivity

Homologous superfamily: a common ancestor

CATH - Class

Secondary structure content (automatic)

Class 1: Mainly Alpha

Class 2: Mainly Beta

Class 3: Mixed Alpha/Beta

Class 4: Few Secondary Structures

CATH - Architecture

Orientation of secondary structures (manual)

Examples: Roll, Super Roll, Barrel, 2-Layer Sandwich

CATH - Topology

Topological connection and number of secondary structures

Examples: L-fucose Isomerase, Serine Protease, Aconitase (domain 4), TIM Barrel

CATH - Homology

Superfamily clusters of similar structures & functions

Examples: Alanine racemase, Dihydropteroate (DHP) synthetase, FMN dependent fluorescent proteins, 7-stranded glycosidases

Protein Motifs

• Protein motifs may be defined by their primary sequence or by the arrangement of secondary structure elements

Examples: Helix-turn-helix, Four-helix bundle, TIM-barrel, Zinc finger

Protein Motifs

• Identifying motifs from sequence is not straightforward

Example sequence pattern: R-Y-x-[DT]-W-x-[LIVM]-[ST]-T-P-[LIVM](3)
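A sequence pattern such as the one on this slide can be scanned for with a regular expression. The sketch below assumes the usual PROSITE-style conventions (`x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` for a repeat count); the function name and the test sequence are made up for illustration.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex."""
    parts = []
    for element in pattern.split("-"):
        # Split off an optional repeat count such as "(3)" or "(2,4)".
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, count = match.group(1), match.group(2)
        if core == "x":
            core = "."                       # x = any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # {P} = any residue except P
        parts.append(core + ("{" + count + "}" if count else ""))
    return "".join(parts)

PATTERN = "R-Y-x-[DT]-W-x-[LIVM]-[ST]-T-P-[LIVM](3)"
REGEX = prosite_to_regex(PATTERN)

# Hypothetical sequence, constructed so that it contains one match.
sequence = "MKARYADWGLSTPLIVQQ"
hit = re.search(REGEX, sequence)
print(REGEX)
print(hit.start(), hit.group())
```

A regex hit only says that the pattern occurs; short patterns also match by chance, which is one reason motif identification from sequence alone is not straightforward.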


Quaternary Structure

• Many proteins are composed of more than one polypeptide chain

Quaternary Structure

• All specific intermolecular interactions depend on complementarity

• All types of protein-stabilizing interactions contribute to the formation of intermolecular interfaces

• Inappropriate quaternary interactions can have dramatic functional consequences (example: sickle-cell hemoglobin)

Quaternary Structure

• Protein assemblies built of identical subunits are usually symmetric

Proteins are the most versatile macromolecules of the cell

“Protein function” may mean the biochemical function of the molecule in isolation, or the cellular function it performs as part of an assemblage or complex with other molecules.

Functions of Protein

Enzymes
• Globular proteins that facilitate chemical reactions

Defense Proteins
• Antibodies
• Protein toxins

Transport Proteins
• Plasma membrane proteins carry substances through membranes or form channels or pumps for passage of substances
• Oxygen carrier in circulation (hemoglobin)
• Mineral protein carriers (iron, zinc)

Structural/Support Proteins (fibrous proteins)
• Connective tissue in animals (collagen – the most abundant vertebrate protein)
• Webs, cocoons and other arthropod structures
• Hair, nails, horns, etc. (keratin)
• Fibrins used in blood clotting

Functions of Protein

Contractile Proteins – locomotion and movement
• Muscle
• Cilia and flagella
• Microtubules, microfilaments and intermediate filaments

Regulatory Proteins
• Hormones
• Gene regulators – transcription factors
• Osmotic regulation

Receptor Proteins
• Membrane surface receptor proteins
• Signal transduction proteins

Recognition Proteins
• Glycoproteins (carbohydrate-protein hybrids) for identification of "self"

Storage Proteins (specialized)
• Examples are casein in milk, ferritin for iron storage, calmodulin for calcium and albumin in eggs

Energy transfer molecules
• Cytochromes

Protein Structure-Function Relationships

[Figure: the 3D structure at the center, linked to: fold and evolutionary relationships; biological multimeric states; disease states and mutations; active sites and enzyme clefts; antigenic sites and surface properties; protein-ligand interactions]

Overview: Protein Function and Architecture

Binding

Specific recognition of other molecules is central to protein function. The molecule that is bound (the ligand) can be as small as the oxygen molecule that coordinates to the heme group of myoglobin, or as large as a specific DNA sequence.

Overview: Protein Function and Architecture

Catalysis

Essentially every chemical reaction in the living cell is catalyzed, and most of the catalysts are protein enzymes.

Overview: Protein Function and Architecture

Switching

Proteins are flexible molecules and their conformation can change in response to changes in pH or ligand binding. Such changes can be used as molecular switches to control cellular processes.

Overview: Protein Function and Architecture

Structural Proteins

Protein molecules serve as some of the major structural elements of living systems. This function depends on specific association of protein subunits with themselves as well as with other proteins, carbohydrates, and so on.


Protein-Protein Interaction Network

Why Study Networks?

• It is increasingly recognized that complex systems cannot be described from a reductionist viewpoint.

• The actual output may not be predictable by looking only at individual components: the whole is greater than the sum of its parts.

• Understanding the behavior of such systems starts with understanding the topology of the corresponding network.

Why Study Networks?

• Topological information is fundamental in constructing realistic models for the function of the network

• Create models of networks that can help us to understand the meaning of these properties

• Find statistical properties that characterize the structure and behavior of networked systems

• Predict what the behavior of networked systems will be


Basic notions of networks

Network (graph) – a set of nodes connected via edges.

The degree of a node (connectivity) = total number of connections of a node.
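As a minimal sketch of these two notions, an undirected network can be stored as a dictionary mapping each node to the set of its neighbors, and the degree is then the size of that set. The edge list and node names below are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical undirected interaction list (node names are placeholders).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]

def build_graph(edge_list):
    """Return an adjacency map: node -> set of neighboring nodes."""
    graph = defaultdict(set)
    for u, v in edge_list:
        graph[u].add(v)
        graph[v].add(u)
    return dict(graph)

graph = build_graph(edges)

# Degree (connectivity) = number of connections of each node.
degree = {node: len(neighbors) for node, neighbors in graph.items()}
print(degree)
```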

Characteristics of networks

k – degree of a node

P(k) – degree distribution

Diameter – maximum distance between nodes, taken over all node pairs

Clustering coefficient

[Figure: example graph with node degrees k = 1, k = 2, k = 2 and k = 3]
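The quantities listed above can be computed directly on a small graph. The sketch below assumes an undirected, connected network stored as an adjacency map, measures distance as the number of edges on a shortest path, and uses the common local definition of the clustering coefficient (the fraction of pairs of a node's neighbors that are themselves linked); the slide itself does not spell out a definition, and the toy graph is hypothetical.

```python
from collections import Counter, deque
from itertools import combinations

# Hypothetical connected, undirected network (node -> set of neighbors).
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A", "E"},
    "E": {"D"},
}

def degree_distribution(g):
    """P(k): fraction of nodes that have degree k."""
    counts = Counter(len(neighbors) for neighbors in g.values())
    return {k: c / len(g) for k, c in sorted(counts.items())}

def bfs_distances(g, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(g):
    """Largest shortest-path distance over all node pairs (connected graph)."""
    return max(max(bfs_distances(g, s).values()) for s in g)

def clustering_coefficient(g, node):
    """Fraction of pairs of neighbors of `node` that are linked to each other."""
    neighbors = g[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in g[u])
    return 2 * links / (k * (k - 1))

print(degree_distribution(graph))
print(diameter(graph))
print(clustering_coefficient(graph, "A"))
```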

Types of Networks

• Social network: individuals or organizations connected by one or more specific types of interdependency (friendship, common interest, beliefs, etc.)

• Data network: such as articles and citations, the World Wide Web, ...

• Technological network: designed networks such as the internet, transport, electrical, ...

• Biological networks

Network models

A number of network models have been suggested to characterize networks.

The most widely accepted models are the scale-free and the small-world network models.

An alternative is the modular network model.

Different network models: Barabási-Albert

Barabási & Albert, Science, 1999

Model of preferential attachment:

• At each step, a new node is added to the graph.

• The new node is attached to one of the old nodes with probability proportional to that node's degree.

Degree distribution – power law distribution: $P(k) \sim k^{-\gamma}$

[Figure: plot of ln(P(k)) against ln(k)]
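The two steps of the preferential attachment model can be simulated in a few lines. This is a minimal sketch, assuming each new node brings exactly one edge and the graph starts from a single edge; the function name, the seed and the network size are arbitrary choices for illustration.

```python
import random
from collections import Counter

def preferential_attachment(n_nodes, seed=0):
    """Grow a graph by attaching each new node to one existing node,
    chosen with probability proportional to that node's degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]          # start from a single edge between nodes 0 and 1
    # Every node appears in this list once per edge it has, so a uniform
    # draw from the list is a degree-proportional draw over nodes.
    endpoints = [0, 1]
    for new in range(2, n_nodes):
        target = rng.choice(endpoints)
        edges.append((new, target))
        endpoints.extend([new, target])
    return edges

edges = preferential_attachment(2000)

degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# Number of nodes with each degree k; most nodes have k = 1, a few are hubs.
histogram = Counter(degree.values())
for k in sorted(histogram)[:5]:
    print(k, histogram[k])
print("largest degree:", max(degree.values()))
```

Plotting ln(count) against ln(k) for this histogram should give a roughly straight, downward-sloping line, which is the signature of a power law.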

Difference between scale-free and random graph models.

Random networks are homogeneous: most nodes have roughly the same number of links.

Scale-free networks have a small number of highly connected vertices.

Scale free model

• Interaction networks with scale-free model
  – Most proteins interact with a small number of partners
  – A few proteins (“hubs”) interact with many partners
  – Resistant to random node removal
  – Sensitive to targeted hub removal

• Types of hubs
  – Party hubs
    • Interact with most of their partners simultaneously
    • Perform specific functions inside a module
  – Date hubs
    • Interact with different partners at different times or locations
    • Connect modules (biological processes) together
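The contrast between random node removal and targeted hub removal can be illustrated on a toy network. The sketch below uses a made-up hub-and-spoke graph (hub `H`, partners `P1` to `P6`) and measures the size of the largest connected component after deleting a node; it is an illustration of the idea, not an analysis of real interaction data.

```python
def largest_component(graph, removed=frozenset()):
    """Size of the largest connected component, ignoring `removed` nodes."""
    seen, best = set(removed), 0
    for start in graph:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:                      # depth-first traversal
            u = stack.pop()
            size += 1
            for v in graph[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best

# Hypothetical network: one hub "H" with six partners, plus one extra link.
graph = {"H": set()}
for i in range(1, 7):
    name = f"P{i}"
    graph[name] = {"H"}
    graph["H"].add(name)
graph["P1"].add("P2")
graph["P2"].add("P1")

print(largest_component(graph))                  # intact network
print(largest_component(graph, {"P3"}))          # a low-degree node removed
print(largest_component(graph, {"H"}))           # the hub removed
```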


Scale free model

Small-world and modular model

In a small-world network, the shortest path between any pair of proteins tends to be small, and the network is full of densely connected neighborhoods.

Small-world and scale-free models are not in conflict; rather, they complement each other.

The modular network model suggests that protein interaction networks consist of several densely interconnected functional modules. Most nodes have roughly equal degree, which is at odds with the scale-free nature.

Types of biomolecular networks

• Gene regulatory networks
  Vertices: genes
  Edges: regulatory influences
  Data: ChIP-chip, gene expression data, sequence

• Metabolic networks
  Vertices: metabolites, reactions (catalyzed by enzymes)
  Edges: consumption, production (enzymes, metabolites)
  Data: sequence, classical biochemistry, mass spectrometry, isotope labeling

• Protein-protein interaction networks
  Vertices: proteins
  Edges: physical interactions
  Data: yeast two-hybrid, mass spectrometry

• Signaling networks
  Vertices: proteins with state information
  Edges: interactions modifying states
  Data: measurements of post-translational modifications

• Networks of functional links
  Vertices: genes
  Edges: functional relationships
  Data: sequence (of several organisms), expression data, any data type allowing definition of a similarity measure…

Protein Interactions

• Proteins perform a function as a complex rather than as a single protein.
  – Protein-protein interactions are of central importance for virtually every process in a living cell (cell growth, cell cycle, metabolic pathways, signal transduction)

• Knowing whether two proteins interact can help us discover unknown proteins’ functions:
  – If the function of one protein is known, the function of its binding partners is likely to be related (“guilt by association”).
  – Thus, having a good method for detecting interactions can allow us to use a small number of proteins with known function to characterize new proteins.

Importance of protein interaction

• Studying protein interaction network architecture allows us to:
  – Assess the role of individual proteins in the overall pathway
  – Identify candidate genes involved in genetic diseases (gene mutation → disrupted protein interaction → disease)
  – Set up the framework for mathematical models

Biological networks are very rich networks with very limited, noisy, and incomplete information. Discovering underlying principles is very challenging.

Importance of protein interaction

Human genome: 30,000 genes

Transcriptome (transcripts): 40,000 - 100,000 mRNAs

Human proteome: 100,000 - 400,000 proteins

Interactome (protein interactions): >1,000,000 interactions

[Figure scale labels: 10^5, 10^6]

Importance of protein interaction

S. cerevisiae (yeast)
• 4389 proteins
• 14319 interactions

C. elegans (worm)
• 2718 proteins
• 3926 interactions

D. melanogaster (fly)
• 7038 proteins
• 20720 interactions


Yeast Protein Interaction Network

Nodes: proteins

Links: physical interactions

PPIs are more often represented graphically as two-dimensional networks

The two representations (interaction list and network layout) differ in localization (a protein occurs multiple times in the list but exactly once in the layout), in neighborhood (in the layout, the neighbors of a protein are easily identified and studied), and in mental image (the network layout allows proteins to be memorized by position).

In positioning the nodes, secondary information can be employed to guide the layout; for example, proteins can be spatially grouped by localization or function. In this way, a particular arrangement of the proteins can even increase the information content.


Types of PPI Network

Methods to investigate protein-protein interactions

• There is a multitude of methods to detect protein-protein interactions.

• Each of the approaches has its own strengths and weaknesses, especially with regard to the sensitivity and specificity of the method.

• A high sensitivity means that many of the interactions that occur in reality are detected by the screen. A high specificity indicates that most of the interactions detected by the screen also occur in reality (in statistical terms, this second quantity is the precision, or positive predictive value, of the screen).
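Both quantities described above can be computed once a screen is compared against a trusted reference set. The sketch below is a minimal illustration with made-up interaction pairs; the function name and the notion of a complete "true" reference set are assumptions, since in practice the full set of real interactions is not known.

```python
def screen_quality(detected, true_interactions):
    """Return (sensitivity, confirmed_fraction) for an interaction screen.

    sensitivity        = detected true interactions / all true interactions
    confirmed_fraction = detected true interactions / all detected interactions
                         (what the slide calls specificity)
    """
    # Interactions are unordered pairs, so (A, B) and (B, A) are the same.
    detected = {frozenset(pair) for pair in detected}
    true_interactions = {frozenset(pair) for pair in true_interactions}
    true_positives = detected & true_interactions
    sensitivity = len(true_positives) / len(true_interactions)
    confirmed_fraction = len(true_positives) / len(detected)
    return sensitivity, confirmed_fraction

# Hypothetical reference set and screen result.
reference = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
screen = [("B", "A"), ("C", "D"), ("A", "E")]

sensitivity, confirmed_fraction = screen_quality(screen, reference)
print(sensitivity, confirmed_fraction)
```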


Methods to investigate protein-protein interactions

• Experimental methods

• Computational methods


Experimental methods

Experimental methods

Bait – Prey model: does X (bait) bind with a protein Y (prey)?

Physical interaction between protein binding domains

Methods: Co-immunoprecipitation, GST-pull down assays, Protein arrays, Far-western analysis, TAP-MS

In vitro

In vivo

• Yeast two-hybrid system

• Phage display

Co-immunoprecipitation

• Co-immunoprecipitation is considered to be the gold standard assay for protein-protein interactions

• Immunoprecipitation (IP) experiment: immune response & precipitation

• Affinity purify a bait protein antigen together with its binding partner using a specific antibody

• Capturing of the immune complex by a solid support

• Elution from the support, analysis by SDS-PAGE and detection by western blot

• It is not a screening approach


Co-immunoprecipitation


GST-pull down assays

• Affinity chromatography method

• Using a tagged or labeled bait by binding a specific affinity matrix

• Purification of a prey protein from a lysate sample or other protein-containing mixture

• GTH(glutathione)-GST(glutathione S-transferase) binding

GST-pull down assays

[Figure: workflow. Express the GST-fusion protein (GST-gene X in pGEX) in E. coli; prepare protein extract from tissue; mix and incubate.]

GST-pull down assays

[Figure: lanes labeled "GST-fusion protein" and "GST alone"; Sepharose bead-GTH (glutathione)]

Protein arrays

• Antibody-based or bait-based arrays

• High-throughput assays: screening and detection of specific interactions of proteins from complex mixtures

• Protein expression profiling, protein-protein interaction and enzyme activity

• Binding between the capture proteins immobilized on a surface and the target proteins in the sample solution.


Protein arrays

Mass spectrometry

• Ionization (e.g. electrospray ionization) produces peptide ions in the gas phase;

• Detection and recording of sample ions: mass-to-charge ratios are assigned to the different peaks of the spectra;

• Analysis of MS spectra and protein identification: search a sequence database with the mass fingerprint and find correlations between theoretical and experimental spectra.

Protein Identification by MS

Experimental: spot removed from gel → fragmented using trypsin → spectrum of fragments generated

Library: database of sequences (e.g. SwissProt) → artificially trypsinated → artificial spectra built

MATCH: the experimental spectrum is compared with the library of artificial spectra
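The "artificially trypsinated" branch of this workflow can be sketched as an in silico digest followed by a mass comparison. The code below assumes the commonly used trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages, standard monoisotopic residue masses, and neutral peptide masses; the sequence, the observed masses and the 0.5 Da tolerance are made up for illustration.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

def tryptic_digest(sequence):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides

def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def match_count(observed, theoretical, tolerance=0.5):
    """Number of observed masses that lie within tolerance of a theoretical one."""
    return sum(any(abs(o - t) <= tolerance for t in theoretical) for o in observed)

# Hypothetical database entry and hypothetical observed peak masses.
peptides = tryptic_digest("AGKRPLMRSTK")
theoretical = [peptide_mass(p) for p in peptides]
observed = [274.2, 671.4, 500.0]
print(peptides)
print(match_count(observed, theoretical))
```

In a real search, every database sequence would be scored this way and the best-scoring entries reported; actual search engines use more elaborate scoring than a simple match count.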

Tandem affinity purification method (TAP)

• The target protein ORF is fused with the DNA sequence encoding the TAP tag;

• Tagged ORFs are expressed in yeast cells and form native complexes;

• The complexes are purified by the TAP method;

• Components of each complex are identified by gel electrophoresis, MS and bioinformatics methods.


TAP-MS (Tandem Affinity Purification-Mass Spectrometry)

Yeast two-hybrid experiments

• Many transcription factors have two domains: one that binds to a promoter DNA sequence (BD) and another that activates transcription (AD).

• A transcription factor cannot activate transcription unless its DNA-binding domain is physically associated with an activating domain.

Yeast two-hybrid system

• Detects protein-protein interactions in yeast

• Transcriptional regulator system

• "Prey"-"bait" model: fusion proteins with a transcriptional activating domain (AD, prey) and a DNA-binding domain (BD, bait)

• The term "two-hybrid" derives from these two chimeric proteins.

• Most commonly used method for large-scale, high-throughput identification of potential protein-protein interactions

[Figure: gene construction in yeast expression vectors; the two hybrid proteins (X and Y) bind, forming a functional transcription activator; expression of the reporter gene indicates that the proteins bind]

Gal4/LacZ Y2H system

• Target proteins are fused with the BD and AD of the GAL4 protein, which activates the LacZ gene.

• If there is no galactose, GAL80 binds to GAL4 and blocks transcription.

• If galactose is present, GAL4 can activate transcription of beta-galactosidase.

High-throughput Y2H screening

Principle of two-hybrid library and array screens (Peter Uetz et al., 2001)


Genome-wide analysis by Y2H

• Matrix approach: a matrix of prey clones is added to the matrix of bait clones. Diploids where X and Y interact are selected based on the expression of a reporter gene.

• Library approach: one bait X is screened against an entire library. Positives are selected based on their ability to grow on specific substrates.

Drawbacks of Y2H

• The interactions cannot be tested if a target protein can itself initiate transcription.

• Fusion of a protein into chimeras can change the structure of a target.

• Protein interactions can be different in yeast and other organisms.

• Proteins that interact in two-hybrid experiments may never interact in a cell.

Advantages of Y2H

• An in vivo technique; a good approximation of processes which occur in higher eukaryotes.

• Transient interactions can be detected; the affinity of an interaction can be predicted.

• Fast and efficient.

Differences and similarities between Y2H and TAP-MS

• Both methods generate a lot of false positives; only ~50% of interactions are biologically significant.

• Y2H produces binary interactions and lacks information about protein complexes, but can detect transient interactions.

• Y2H is an in vivo technique.

• MS can detect large stable complexes and networks of interactions.


Phage display

(William G.T. Willats. 2002)

Comparison of methods

Co-immunoprecipitation
  Advantages: independent of cloning and ectopic gene expression; rapid procedures
  Disadvantages: cross-reactivity of antibody; antibody bleeding from column

TAP-MS
  Advantages: generically applicable approach; ability to purify low abundant proteins/protein complexes; high-throughput identification
  Disadvantages: protein tag might influence protein function; requires two successive steps of protein purification and cannot readily detect transient interactions

GST-pull down assays
  Advantages: applicable to very weak protein interactions; complex formation in vitro
  Disadvantages: competition with in vivo pre-assembled complex

Protein arrays
  Advantages: high-throughput assay; disease diagnosis
  Disadvantages: difficulty of protein chip production

Yeast two-hybrid system
  Advantages: highly sensitive detection; applicable to a wide range of protein interactions; no biochemical purification
  Disadvantages: stability of folding and activity in yeast; lacks post-translational modification

Phage display
  Advantages: random library screening of many cDNAs through panning cycles
  Disadvantages: size limitation of protein sequence; incorrect folding or modification

Other methods

• Fluorescence resonance energy transfer (FRET) is a common technique when observing the interactions of only two different proteins.

• Label transfer can be used for screening or confirmation of protein interactions and can provide information about the interface where the interaction takes place.

• Chemical crosslinking is often used to "fix" protein interactions in place before trying to isolate/identify interacting proteins.

• Protein-protein docking, the prediction of protein-protein interactions based only on three-dimensional protein structures, is not yet satisfactory.


Computational methods for prediction of functional association and protein interactions

• Phylogenetic profile method.

• Rosetta Stone approach.

• Gene neighborhood method.

• Gene cluster method.

• Co-evolution methods.

• Classification methods.

Computational methods (Genomic context-based methods)

Genomic context-based methods are based on the assumption that functionally related proteins are encoded by genes that are co-regulated or co-evolved.

These methods seek to predict protein functional associations. Such functional associations may or may not result from physical binding.

− Phylogenetic profile

− Gene neighbors method

− Rosetta stone method (gene fusion)

Phylogenetic profile method

Idea: pairs of proteins that are always both present or both absent in a genome are likely to be functionally dependent and may interact.

Profile of a protein: a vector of 0/1 values where each position corresponds to one genome (1 = protein present, 0 = protein absent).


Phylogenetic profile method


Phylogenetic profile method

• Proteins with identical (or similar) profiles are boxed to indicate that they are likely to be functionally linked. Boxes connected by lines have phylogenetic profiles that differ by one bit and are termed neighbors.
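The grouping described here reduces to comparing 0/1 vectors. The sketch below uses made-up profiles over four genomes, treats identical profiles (Hamming distance 0) as functionally linked and profiles that differ by one bit (Hamming distance 1) as neighbors; protein names and data are hypothetical.

```python
from itertools import combinations

# Hypothetical presence (1) / absence (0) of four proteins in four genomes.
profiles = {
    "P1": (1, 1, 0, 1),
    "P2": (1, 1, 0, 1),
    "P3": (0, 1, 1, 0),
    "P4": (1, 1, 0, 0),
}

def hamming(a, b):
    """Number of genomes in which the two profiles disagree."""
    return sum(x != y for x, y in zip(a, b))

# Identical profiles: candidates for a functional link.
linked = [(p, q) for p, q in combinations(profiles, 2)
          if hamming(profiles[p], profiles[q]) == 0]

# Profiles that differ by exactly one bit: "neighbors".
neighbors = [(p, q) for p, q in combinations(profiles, 2)
             if hamming(profiles[p], profiles[q]) == 1]

print(linked)
print(neighbors)
```

With only a handful of genomes many unrelated proteins share a profile by chance, so the method becomes more informative as the number of genomes grows.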


Gene neighbors method

The basic assumption is that genes which interact or are functionally associated tend to be located in physical proximity to each other on the genome.

Despite the effect of neutral evolution, which tends to shuffle gene order between distantly related organisms, gene clusters or operons encoding co-regulated genes are usually conserved.


Gene cluster method

• Bacterial genes of related function are often transcribed together as a single unit, an operon.

• Identification of operons is based on intergenic distances.
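A minimal sketch of distance-based operon identification follows. It assumes that adjacent genes on the same strand belong to one transcription unit when the gap between them is small; the 50 bp cutoff, the gene names, and the coordinates are hypothetical values chosen for illustration, not parameters from the slides.

```python
def predict_operons(genes, max_gap=50):
    """Group adjacent same-strand genes separated by at most max_gap bp.

    genes: list of (name, start, end, strand) tuples for one replicon.
    Returns a list of gene-name lists; a single-gene list is a gene
    that was not grouped with any neighbor.
    """
    operons = []
    current = []
    for gene in sorted(genes, key=lambda g: g[1]):
        if current:
            _, _, prev_end, prev_strand = current[-1]
            gap = gene[1] - prev_end - 1  # intergenic distance in bp
            if gene[3] == prev_strand and gap <= max_gap:
                current.append(gene)
                continue
            operons.append([g[0] for g in current])
        current = [gene]
    if current:
        operons.append([g[0] for g in current])
    return operons

# Hypothetical gene coordinates.
genes = [
    ("geneA", 100, 1000, "+"),
    ("geneB", 1010, 1900, "+"),
    ("geneC", 1920, 2500, "+"),
    ("geneD", 3200, 4000, "+"),
    ("geneE", 4020, 4800, "-"),
]
operons = predict_operons(genes)
# [["geneA", "geneB", "geneC"], ["geneD"], ["geneE"]]
```

Here geneD is split off because of the large gap, and geneE because it lies on the opposite strand even though the gap is short.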


Rosetta stone method (gene fusion)

Assumption (based on observation):

Gene fusion examines pairs of genes that exist individually in some organisms, but as a fused gene in other organisms

Proteins that are fused in one genome are likely to interact, physically or at least functionally, in other genomes.
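The search for fusion events can be sketched as follows, assuming each protein has already been annotated with the set of domain families it contains. The protein names and domain labels are hypothetical, and real analyses work from sequence alignments rather than pre-assigned domain sets.

```python
from itertools import combinations

def rosetta_stone_links(query_proteins, fused_proteins):
    """Predict links between query proteins whose domains co-occur in a fused protein.

    Both arguments map a protein name to the set of domain families it contains.
    Returns sorted (protein_a, protein_b, fused_protein) triples.
    """
    links = set()
    for (a, doms_a), (b, doms_b) in combinations(sorted(query_proteins.items()), 2):
        if doms_a & doms_b:
            continue  # shared domains suggest homologs, not a fusion signal
        for fused, doms_f in fused_proteins.items():
            if doms_a & doms_f and doms_b & doms_f:
                links.add((a, b, fused))
    return sorted(links)

# Hypothetical annotations: protA and protB are separate in the query genome,
# but their domains appear together in fusedAB in another genome.
query = {"protA": {"D1"}, "protB": {"D2"}, "protC": {"D3"}}
other_genome = {"fusedAB": {"D1", "D2"}, "single": {"D3"}}

links = rosetta_stone_links(query, other_genome)
# [("protA", "protB", "fusedAB")]
```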


Rosetta stone method (gene fusion)

Figure: Five examples of pairs of E. coli proteins predicted to interact by the domain fusion analysis. Each protein is shown schematically, with boxes representing domains.


Correlation between gene expression and protein interactions

• If the gene expression levels of the subunits in a complex are related, then protein-protein interactions can be verified from coexpression data.

• Methods are tested on protein complexes: ribosome, proteasome, RNA Polymerase II Holoenzyme and replication complexes.


Correlation between gene expression and protein interactions

• Expression profiles were taken from cell cycle experiments and from expression ratios across the whole yeast genome for 300 cell states.

• Difference between absolute expression levels can be calculated as

$$D = \frac{|E_i - E_j|}{E_i + E_j}$$

where $E_i$ and $E_j$ are the absolute expression levels of genes $i$ and $j$.
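Both comparisons can be sketched in Python. The first function assumes the normalized difference takes the form D = |E_i - E_j| / (E_i + E_j) for absolute expression levels; the second uses the Pearson correlation for expression-ratio profiles, which is a common choice assumed here rather than one named on the slide. All values are made up.

```python
from math import sqrt

def normalized_difference(e_i, e_j):
    """Normalized difference between two absolute expression levels."""
    return abs(e_i - e_j) / (e_i + e_j)

def pearson(x, y):
    """Pearson correlation coefficient between two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical absolute expression levels of two subunits.
d = normalized_difference(30.0, 10.0)   # 0.5

# Hypothetical expression-ratio profiles over four conditions.
subunit_a = [1.0, 2.0, 3.0, 4.0]
subunit_b = [2.0, 4.0, 6.0, 8.0]
r = pearson(subunit_a, subunit_b)       # close to 1.0 for proportional profiles
```

A small D or a high correlation would both count as support for coexpression under these assumptions.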


Results of gene coexpression analysis.

• Subunits from the same complex show coexpression.

• Expression correlation is strong for permanent complexes.

• Transient complexes have weaker correlation.


Coevolution of interacting proteins – “mirrortree” methods

• Interacting proteins may co-evolve, so their phylogenetic trees show similarity.

• Similarity between phylogenetic trees can be quantified by the correlation coefficient between the distance matrices used to construct the trees.
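As a rough sketch of this scoring step, the code below correlates the upper triangles of two distance matrices defined over the same ordered set of species. The matrices are hypothetical, and the Pearson coefficient is assumed as the correlation measure.

```python
from math import sqrt

def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def mirrortree_score(dist_a, dist_b):
    """Pearson correlation between two species-by-species distance matrices."""
    x, y = upper_triangle(dist_a), upper_triangle(dist_b)
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical distances for protein family A over four species.
dist_a = [
    [0.0, 0.1, 0.4, 0.5],
    [0.1, 0.0, 0.3, 0.6],
    [0.4, 0.3, 0.0, 0.2],
    [0.5, 0.6, 0.2, 0.0],
]
# Family B evolves twice as fast but with the same tree shape.
dist_b = [[2 * v for v in row] for row in dist_a]

score = mirrortree_score(dist_a, dist_b)  # close to 1.0
```

Because the correlation ignores overall scale, two families with the same tree shape but different evolutionary rates still score highly.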


Tree of life (TOL) assists in prediction of protein interactions

• There is “background” similarity between the trees of any two proteins, whether or not they interact.

• The “background” tree is constructed from 16S rRNA sequences.

• rRNA-based distances are subtracted from the distances of the original phylogenetic tree.
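The subtraction step might look like the sketch below. It assumes that the protein and rRNA distance matrices cover the same species in the same order and are already on comparable scales; any rescaling of the rRNA distances is left out, and the numbers are invented.

```python
def subtract_background(protein_dist, rrna_dist):
    """Remove the species-tree signal by subtracting rRNA-based distances."""
    n = len(protein_dist)
    return [
        [protein_dist[i][j] - rrna_dist[i][j] for j in range(n)]
        for i in range(n)
    ]

# Hypothetical distances over three species.
protein_dist = [
    [0.0, 0.5, 0.9],
    [0.5, 0.0, 0.7],
    [0.9, 0.7, 0.0],
]
rrna_dist = [
    [0.0, 0.25, 0.5],
    [0.25, 0.0, 0.5],
    [0.5, 0.5, 0.0],
]

residual = subtract_background(protein_dist, rrna_dist)
# residual[0][1] is 0.25; the other off-diagonal entries are about 0.4 and 0.2
```

The residual matrices of two protein families can then be compared by a correlation coefficient, so that the score reflects similarity beyond what the species tree alone explains.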


Verification of experimental protein-protein interactions

• Protein localization method.

• Expression profile reliability method.

• Paralogous verification method.


Protein localization method

True positives:

- Proteins which are localized in the same cellular compartment


Expression profile reliability method.


Paralogous verification method.

The paralogous verification method (PVM) is based on the observation that if two proteins interact, their paralogs are likely to interact as well. It counts the number of interactions between the two families of paralogous proteins.
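The counting step can be sketched as below, assuming the paralog families and the list of observed interactions are already available. All protein names are hypothetical, and the candidate pair itself is excluded from its own support count, which is a design choice made for this sketch.

```python
def pvm_support(pair, paralogs, interactions):
    """Count observed interactions between the paralog families of a candidate pair.

    pair: (a, b) candidate interaction to verify.
    paralogs: maps a protein to the set of its paralogs.
    interactions: iterable of observed (x, y) interactions, undirected.
    """
    a, b = pair
    family_a = {a} | paralogs.get(a, set())
    family_b = {b} | paralogs.get(b, set())
    observed = {frozenset(p) for p in interactions}
    candidate = frozenset(pair)
    supporting = {
        frozenset((x, y))
        for x in family_a
        for y in family_b
        if x != y
        and frozenset((x, y)) != candidate
        and frozenset((x, y)) in observed
    }
    return len(supporting)

# Hypothetical paralog families and interaction list.
paralogs = {"A": {"A2", "A3"}, "B": {"B2"}}
interactions = [("A", "B"), ("A2", "B2"), ("A3", "B"), ("A2", "C")]

support = pvm_support(("A", "B"), paralogs, interactions)  # 2
```

In this example the candidate A-B is supported by A2-B2 and A3-B; the interaction A2-C does not count because C is outside both families.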


Aligning protein interaction networks.

• The method searches for high-scoring pathway alignments between two networks, where proteins are paired based on their sequence similarity.

[Figure: two aligned networks, one with nodes A, B, C, D, E and the other with nodes a, b, d, e.]


Aligning protein interaction networks.

• The network alignment between worm, yeast and fly detected 71 network regions that were conserved between all three species.


Interaction databases

• Experiment (E)

• Structure detail (S)

• Predicted
– Physical (P)
– Functional (F)

• Curated (C)

• Homology modeling (H)


Protein interaction databases

• Protein-protein interaction databases

• Domain-domain interaction databases


DIP database

• Documents protein-protein interactions from experiment
– Y2H, protein microarrays, TAP/MS, PDB

• 55,733 interactions between 19,053 proteins from 110 organisms.

Organism      # proteins    # interactions
Fruit fly     7,052         20,988
H. pylori     710           1,425
Human         916           1,407
E. coli       1,831         7,408
C. elegans    2,638         4,030
Yeast         4,921         18,225
Others        985           401


DIP/Prolinks database

• Records functional association using prediction methods:
– Gene neighbors
– Rosetta Stone
– Phylogenetic profiles
– Gene clusters


Other functional association databases

• Phydbac2 (Claverie)

• Predictome (DeLisi)

• ArrayProspector (Bork)


BIND database

• Records experimental interaction data

• 83,517 protein-protein interactions

• 204,468 total interactions, including those with small molecules, nucleic acids (NAs), and complexes


MPact/MIPS database

• Records yeast protein-protein interactions

• Curates interactions:
– 4,300 PPI
– 1,500 proteins


STRING database

• Records experimental and predicted protein-protein interactions using methods:
– Genomic context
– High-throughput
– Coexpression
– Database/literature mining


More interaction databases

• IntAct (Valencia)
– Open source interaction database and analysis
– 68,165 interactions from literature or user submissions

• MINT (Cesareni)
– 71,854 experimental interactions mined from literature by curators
– Uses IntAct data model

• BioGRID (Tyers)
– 116,000 protein and genetic interactions


Protein interaction databases

• Protein-protein interaction databases

• Domain-domain interaction databases


InterDom database

• Predicts domain interactions (~30000) from PPIs

• Data sources:
– Domain fusions
– PPI from DIP
– Protein complexes
– Literature


PIBASE

• Query by PDB, domain, interface

• 1,946 interacting SCOP domains

• 2,387 unique interaction types


PIBASE/ModBase

• Protein structure models

• Predict interfaces with PIBASE


3did database

• Defines domains using Pfam

• Data source: Protein structure data

• 3,304 unique interaction types

• 2,247 interacting domains

• Display linkages and chain locations


iPfam database

• View Pfam interactions on PDB structures

• View individual structures and sequence plots


DIMA database

• Phylogenetic profiles of Pfam domain pairs

• Uses structural info from iPfam

• Works well for moderate information content


Perspective

To further expand our knowledge about protein interaction networks, we need to improve our data-gathering capabilities.

Highly sensitive and accurate methods need to be developed to allow data collection under various cellular functional and temporal states.

Novel computational approaches need to be developed to transfer as many interactions as possible from model organisms to human.