
Constraint Programming and Biology: RNA secondary structure

Agostino Dovier

Dept. Math and Computer Science, Univ. of Udine, Italy

ACP Summer School in Constraint Programming, Wrocław, September 2012

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 1 / 15

RNA secondary structure prediction: DNA and RNA

The central dogma

RNA is a sequence of nucleotides (A, C, G, U) that is often just an intermediary between DNA and proteins.

The 3D structure of RNA depends largely on interactions between pairs of nucleotides (base pairing).

The secondary structure is the set of its base pairings.

RNA secondary structure prediction: Definitions

Mathematically

An RNA sequence $\vec{s} = s_1 s_2 \cdots s_n$ is a string in $\{A,C,G,U\}^*$.

An RNA secondary structure is a (partial) injective function $P \subseteq \{1,\dots,n\}^2$ such that $(i,j) \in P \rightarrow i < j$ (or, alternatively, such that $(i,j) \in P \leftrightarrow (j,i) \in P$); that is, each position is paired with at most one other position.

One might also require from the beginning that $(i,j) \in P$ only if $(s_i,s_j) \in \{(A,U),(U,A),(C,G),(G,C),(U,G),(G,U)\}$.

We are interested in a pairing that maximizes the number of pairs (and/or minimizes a more complex energy function).
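The definitions above can be turned into a small checker. This is a minimal Python sketch, assuming 1-based positions as on the slide, pairs given as $(i,j)$ with $i < j$, and the reading that each position takes part in at most one pair; the names `is_secondary_structure`, `ALLOWED_PAIRS`, and the flag `require_complementary` are hypothetical, introduced only for illustration.

```python
# Watson-Crick pairs plus the G-U wobble pairs listed in the definition.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("C", "G"),
                 ("G", "C"), ("U", "G"), ("G", "U")}


def is_secondary_structure(s: str, pairs: set[tuple[int, int]],
                           require_complementary: bool = True) -> bool:
    """Check that `pairs` (1-based (i, j) tuples) is a secondary structure for s."""
    n = len(s)
    # s must be a string over {A, C, G, U}.
    if any(c not in "ACGU" for c in s):
        return False
    used: set[int] = set()
    for i, j in pairs:
        # Positions must lie in 1..n and satisfy i < j.
        if not (1 <= i < j <= n):
            return False
        # Each position may take part in at most one pair.
        if i in used or j in used:
            return False
        used.update((i, j))
        # Optional constraint: only complementary (or wobble) pairs are allowed.
        if require_complementary and (s[i - 1], s[j - 1]) not in ALLOWED_PAIRS:
            return False
    return True
```

For example, on `GCAUGC` the pairs {(1, 6), (2, 5)} pass, whereas {(1, 3), (3, 5)} fails because position 3 appears in two pairs.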

Spatial constraints (pseudoknots)

If $i < \ell < j$, $(i,j) \in P$, and ($(\ell,k) \in P$ or $(k,\ell) \in P$), then $i < k < j$.

In other words, base pairs must be properly nested: crossing pairs, which form a pseudoknot, are forbidden.
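The nesting constraint above can be checked directly by looping over all pairs, which mirrors the formula almost literally. This is a minimal Python sketch, assuming pairs are given as 1-based $(i,j)$ tuples; the name `is_pseudoknot_free` is hypothetical.

```python
def is_pseudoknot_free(pairs: set[tuple[int, int]]) -> bool:
    """Check the nesting (no-pseudoknot) constraint on a set of (i, j) pairs."""
    for i, j in pairs:
        for a, b in pairs:
            # Try both orientations, covering "(ell, k) in P or (k, ell) in P".
            for ell, k in ((a, b), (b, a)):
                # If ell lies strictly inside (i, j), its partner k must too.
                if i < ell < j and not (i < k < j):
                    return False
    return True
```

For example, {(1, 6), (2, 5)} is accepted, while {(1, 4), (2, 6)} is rejected: position 2 lies between 1 and 4, but its partner 6 does not. The double loop costs $O(|P|^2)$, which is enough for illustration.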

Results

The pseudoknot constraint is sensible. With it, there are polynomial-time algorithms (e.g., mfold, based on dynamic programming: http://mfold.rna.albany.edu). Without the pseudoknot constraint, the problem is NP-complete.
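mfold itself minimizes a thermodynamic free-energy model. As a much simpler illustration of why nesting makes the problem tractable, the sketch below maximizes the number of base pairs with a Nussinov-style dynamic program (a swapped-in, simpler technique, not mfold's algorithm). It assumes the six allowed pairs from the definitions and, unlike realistic models, imposes no minimum hairpin-loop length; `max_pairs` is a hypothetical name.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble, as in the definitions.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("C", "G"),
                 ("G", "C"), ("U", "G"), ("G", "U")}


def max_pairs(s: str) -> int:
    """Maximum number of nested (pseudoknot-free) allowed pairs in s."""
    n = len(s)
    if n == 0:
        return 0
    # best[i][j]: max number of nested pairs within s[i..j] (0-based, inclusive).
    best = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            # Case 1: position j is left unpaired.
            value = best[i][j - 1]
            # Case 2: j pairs with some k in [i, j - 1]; nesting means the
            # pair splits the interval into s[i..k-1] and s[k+1..j-1].
            for k in range(i, j):
                if (s[k], s[j]) in ALLOWED_PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1] if k + 1 <= j - 1 else 0
                    value = max(value, left + 1 + inner)
            best[i][j] = value
    return best[0][n - 1]
```

For `GCAUGC`, `max_pairs` returns 3 (the nested pairs G1-C6, C2-G5, A3-U4 in 1-based positions). The recurrence only works because pairs cannot cross, so each chosen pair splits the problem into independent subintervals; the three nested loops give $O(n^3)$ time and $O(n^2)$ space.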

Actually, what problem?

NP-completeness (Lyngsø and Pedersen, 2000)

Let $\vec{s} = s_1 \cdots s_n$ be an RNA sequence and $P$ a secondary structure. Then

$$E(\vec{s},P) = \sum_{(i,j) \in P,\ i<j} E(\vec{s},i,j,P)$$

where $E(\vec{s},i,j,P)$ depends on $s_i$ and $s_j$ and, moreover, on the $s_z$ such that $(i+1,z) \in P$ or $(j-1,z) \in P$.
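The additive shape of $E(\vec{s},P)$ can be sketched as follows: sum one term per pair, where each term may look at the partners of positions $i+1$ and $j-1$. This minimal Python version assumes 1-based positions, reads $P$ under the symmetric convention $(i,j) \in P \leftrightarrow (j,i) \in P$, and leaves the per-pair term as a caller-supplied function. The names `total_energy`, `PairEnergy`, and the toy model `toy_energy` are hypothetical; `toy_energy` is not the energy function used by Lyngsø and Pedersen.

```python
from typing import Callable, Optional

# Hypothetical per-pair term: (s_i, s_j, s_z for i+1's partner, s_z for j-1's partner).
# A neighbour argument is None when that position is unpaired.
PairEnergy = Callable[[str, str, Optional[str], Optional[str]], float]


def total_energy(s: str, pairs: set[tuple[int, int]], pair_energy: PairEnergy) -> float:
    """E(s, P) as a sum of per-pair terms; positions are 1-based."""
    # Symmetric partner map, i.e. the reading (i, j) in P <-> (j, i) in P.
    partner: dict[int, int] = {}
    for i, j in pairs:
        partner[i] = j
        partner[j] = i
    total = 0.0
    for i, j in partner.items():
        if i > j:
            continue  # count each pair once, as in the sum's i < j condition
        z_left = partner.get(i + 1)   # z with (i + 1, z) in P, if any
        z_right = partner.get(j - 1)  # z with (j - 1, z) in P, if any
        total += pair_energy(
            s[i - 1],
            s[j - 1],
            s[z_left - 1] if z_left is not None else None,
            s[z_right - 1] if z_right is not None else None,
        )
    return total


def toy_energy(si: str, sj: str, sz_left: Optional[str], sz_right: Optional[str]) -> float:
    """Hypothetical toy model: -1 per pair, and -1 more if position i+1 is paired."""
    return -1.0 - (1.0 if sz_left is not None else 0.0)
```

With the toy model, `total_energy("GCAUGC", {(1, 6), (2, 5)}, toy_energy)` gives -2 for (1, 6), since position 2 is paired, plus -1 for (2, 5), since position 3 is not, for a total of -3.0. Passing the per-pair term as a function keeps the summation independent of any particular energy model.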

In the NP-completeness proof, they first assume an infinite set of complementary bases (e.g., $(A_1,U_1), (A_2,U_2), (A_3,U_3), \dots$) and define $E$ as follows:

$E(\vec{s},i,j,P) =$

$-1$ if $s_i$ and $s_j$ are complementary symbols and