rna secondary structure prediction spring 2010. objectives can we predict the structure of an rna? ...

32
RNA Secondary Structure Prediction Spring 2010

Upload: blanche-carter

Post on 30-Dec-2015

218 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: RNA Secondary Structure Prediction Spring 2010. Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?

RNA Secondary Structure PredictionSpring 2010

Page 2: RNA Secondary Structure Prediction Spring 2010. Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?

Objectives

Can we predict the structure of an RNA?

Can we predict the structure of a protein?

Page 3: RNA Secondary Structure Prediction Spring 2010. Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?

RNA: Hypothesis and exclusions Single stranded chain

A RNA molecule is a string of n characters R=r1r2…rn such that ri {A,C,G,U}

Predicting methods are based on the computation of minimun free-energy configurations

We exclude the knots. A knot exists when (ri, rj) S and (rk, rl) S and i<k<j<l.

Page 4: RNA Secondary Structure Prediction Spring 2010. Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?

Combinatorial Solution

Enumerate all the possible structures Compute the one with the lowest free-

energy

Impossible to solve:Exponential in the number of bases!

Page 5: RNA Secondary Structure Prediction Spring 2010. Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?

Independent Base Pair

Assumption: the energy of a base pair is independent of all the others. Let (ri,rj) be the free energy of base pair (ri, rj)

We assume that (ri,rj)=0 if i=j

• We consider secondary structures and we use a dynamic programming approach

Page 6: RNA Secondary Structure Prediction Spring 2010. Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?

Dynamic programming Consider the string Ri,j = R=riri+1…rj we

want to compute the secondary structure Si,j of minimum energy

Possible Cases:1. rj is base-paired with ri

then E(Si,j) = (ri,rj) + E(Si,j-1)

2. rj does not pair with any base then E(Si,j) = E(Si,j-1)

rj is base-paired with rk and i k j – then split the string in Ri,k-1 and Rk,j and

– E(Si,j) = min{ E(Si,k-1) + E(Sk,j)}

Page 7: RNA Secondary Structure Prediction Spring 2010. Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?

Dynamic Programming

The algorithm computes the matrix n x n using this minimization function

Complexity of the algorithm: O(n3) O(n2) to compute the matrix times O(n) to compute each element of the matrix

note that each element has a variable number of elements to consider that can be n in the worst case

Page 8: RNA Secondary Structure Prediction Spring 2010. Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?

Structures with loops

Assumption: the free energy of a base pair depends on adjacent base pairs

A loop is a set of all bases accessible from a base pair (ri,rj)

Consider (ri,rj) in S and positions u, v, and w such that i u v w j . We say that rv

is accessible from (ri,rj), if (ru,rw) is not a base pair in S for any u and w

Page 9: RNA Secondary Structure Prediction Spring 2010. Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?

Loops

hairpin loopbulge on i

interior loop helical region

Page 10: RNA Secondary Structure Prediction Spring 2010. Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?

Dynamic programming Determine Si,j for Ri,j = R=riri+1…rj

Possible Cases: ri is not base-paired

then E(Si,j) = E(Si+1,j) rj is not base-paired

then E(Si,j) = E(Si,j-1) rj forms a pair with rk and i k j

– split the string in Ri,k-1 and Rk,j and – then E(Si,j) = min { E(Si,k) + E(Sk+1,j) }

rj forms a pair with ri that means that there might be one or more loops

between i and j E(Si,j) = E(Li,j)

Page 11: RNA Secondary Structure Prediction Spring 2010. Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?

The energy of a loophairpin: E(Li,j) = (ri,rj) + (j-i-1) (k)=destabilizing free energy of hairpin loop of size khelical region: E(Li,j) = (ri,rj) + + E(Si+1,j-1) =stabilizing free energy of adjacent base pairbulge on i: E(Li,j) = mink1{(ri,rj) + (k) + E(Si+k+1,j-1)} (k) =destabilizing free energy of a bulge loop of size kbulge on j: E(Li,j) = mink1{ (ri,rj) + (k) + E(Si+1,j-k-1)}

interior loop: E(Li,j) = mink1,k21{(ri,rj) + (k1+ k2) + E(Si+1,j-1)} (k) =destabilizing free energy of a bulge loop of

size K

Page 12: RNA Secondary Structure Prediction Spring 2010. Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?

Dynamic ProgrammingE(Si+1,j)

E(Si,j-1)

E(Si,j) = min min{ E(Si,k) + E(Sk+1,j) } i<k<j

E(Li,j)

Complexity of the algorithm: O(n4) the complexity is worse because of the loops it takes

constant time to compute the hairpin and helical region loop O(n2)

it takes linear time to compute the bulge since k ranges between 1 and j-i. O(n3)

it takes quadratic time to compute the interior loop since we are dealing with two parameters k1 and k2 therefore we have to look at a submatrix having O(n2) elements.

O(n4)

Page 13: RNA Secondary Structure Prediction Spring 2010. Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?
Page 14: RNA Secondary Structure Prediction Spring 2010. Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?

Protein Folding:Example Amino Acid

CH3

H2N

H

C COOH

Alpha Carbon

Amino GroupCarboxy Group

Side Chain

Page 15: RNA Secondary Structure Prediction Spring 2010. Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?

Common Secondary Structures

Page 16: RNA Secondary Structure Prediction Spring 2010. Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?

Protein Folding ProblemAssumption for all protein folding

prediction methods: amino acid sequence completely and uniquely determines the folding.

Problem: Given the amino acid sequence of a protein, we would like to process it and determine where exactly the -helices, -sheets and loops are, and how they arrange themselves in motifs and domains

Page 17: RNA Secondary Structure Prediction Spring 2010. Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?

Combinatorial approach

1. Enumerate all the possible foldings, 2. compute the free energy of each3. choose the one with minimum free

energy

Page 18: RNA Secondary Structure Prediction Spring 2010. Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?

From what we know….

1. if we assume that the angles and between the alpha carbon and the neighboring atoms assume only 3 possible values, we have that a protein with 100 residue has (32)100 possible configurations!

2. How do we compute the energy of a configuration?

• factors: shape, size, polarity of the molecules, relative strenght of the interaction at molecular level, ect…too many factors and not a defined agreement

• this problem applies to secondary structures as well.

3. What do we know? 1. hydrophobic amino acid stay "inside" the protein,

hydrophillic amino acid stay "outside" the protein;2. not enough information to make a prediction

Page 19: RNA Secondary Structure Prediction Spring 2010. Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?

Conclusions

No dynamic programming approach is know for protein secondary prediction

Programs based on neural nets to pattern-recognition based on statistical properties of residue in proteins are available not as good as we desire.

new techniques must be developed we discuss a branch and bound solution

Page 20: RNA Secondary Structure Prediction Spring 2010. Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?

Protein Threading Problem

1. Similar sequences should have similar structures if A with a known protein structure, is similar

to B at a sequence level, B structure should be nearly the same as A structure

2. Certain proteins are different at a sequence level but are structurally related, ie. they have different kinds of loops but

similar cores

3. Approach used for the solution: Branch and Bound

Page 21: RNA Secondary Structure Prediction Spring 2010. Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?

Core threading

Loops are structures that are neither helices nor -sheets

Protein Core are either helices and sheets

Motifs are simple combination of a few secondary structures (ex. helix-loop-helix)

Page 22: RNA Secondary Structure Prediction Spring 2010. Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?

Protein threading problem definition Input:   Given protein sequence A;

            Core structural model M;             Score functions g1, g2.

Output:  A threading T.             In short:  Align A to model T.

Given: A: Protein sequence of length n: a1, a2, a3, … , an; M: m core segments C1, C2, C3, … , Cm; c1, c2, c3, … , cm; length of core segments; l1, l2, l3, …, lm-1; loop regions connecting core segments; l1max, l2max, l3max, …,  lm-1max; maximum lengths of loop

regions; l1min, l2min, l3min, …,  lm-1min; minimum lengths of loop

regions; Properties of each amino acid; f, g1, g2: score functions to evaluate threading;

Page 23: RNA Secondary Structure Prediction Spring 2010. Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?

where, g1 and g2  are based on the given model M.  g1 shows how each segment corresponds to core segment i in the model, and g2 deals with the interactions between segments.  So to solve the threading problem, we have to decide on  t1, t2, t3, …, tm, so that the overall score is maximum.  Thus the threading problem, or alignment problem, is converted to an optimization problem.

Output:  T:  t1, t2, t3, …, tm;  start locations for core segments;

score function

Page 24: RNA Secondary Structure Prediction Spring 2010. Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?

Threading constraints

spacing constraint

order of the core constraint

Page 25: RNA Secondary Structure Prediction Spring 2010. Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?

Branch and Bound Assume we are minimizing f(s) and we aleady

know the value f(s) for some candidate solutions in a set.

Branch divide the solution space according to some constraints.

For example, partition X in X1(all solutions having a certain property) and X2 (all solutions that do not)

Partition should be implicit, i.e. you do not explicitly enumerate the solutions

Bound for every partition X, obtain a lower bound lb on the

value of f(s) for every solution sX. If f(s) < lb then we can discard all the candidate

solutions in X because we have a solution, s, that scores better than all solutions in X, otherwise we explore the set X

Page 26: RNA Secondary Structure Prediction Spring 2010. Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?

Branch and Bound-Issues Constructing score function Calculating lower bound Choosing split segment Choosing split point

Page 27: RNA Secondary Structure Prediction Spring 2010. Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?

Protein Threading: Branch and Bound

1. Set of all possible threadings defined by initial position bounds

• This implicitely defined by the number of aminoacids in A

2. Divide possible threadings into smaller sets, and compute new position bounds for each set

3. Compute a quick score lower bound for each set of threadings

4. Keep re-dividing the set with smallest lower bound, until set size if 1.

Page 28: RNA Secondary Structure Prediction Spring 2010. Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?

2. Division in smaller sets: Branch and Bound

Page 29: RNA Secondary Structure Prediction Spring 2010. Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?

3. Compute the lower bound: Branch and Bound

Given a set of threadings defined by position bounds, one possible score lower bound is

this is an approximation

Page 30: RNA Secondary Structure Prediction Spring 2010. Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?

0-100

1 101

5

0-100

6 206

250

10-20

266

10

476

Page 31: RNA Secondary Structure Prediction Spring 2010. Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?

set of possible threadings: amino acids

(1…100)

(1…49) (50) (51…100)

3060 70

(1…23) (24) (24...49)

8035

32

(25…29) (30) (34…49)

5034 80

Page 32: RNA Secondary Structure Prediction Spring 2010. Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?