Constraint Programming and Biology: RNA secondary structure

Agostino Dovier

Dept. Math and Computer Science, Univ. of Udine, Italy

ACP Summer School in Constraint Programming
Wrocław, September 2012

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 1 / 15

RNA secondary structure prediction DNA and RNA

The central dogma

RNA is a sequence of nucleotides (A, C, G, U) that (often) is just an intermediary between DNA and proteins.

The 3D structure of RNA depends largely on interactions between pairs of nucleotides (base pairing).

The secondary structure is the set of its base pairings.


RNA secondary structure prediction Definitions

Mathematically

An RNA sequence $\vec{s} = s_1 s_2 \cdots s_n$ is a string in $\{A, C, G, U\}^*$.

An RNA secondary structure is a (partial) injective function $P \subseteq \{1, \dots, n\}^2$ such that $(i, j) \in P \rightarrow i < j$ (or, alternatively, such that $(i, j) \in P \leftrightarrow (j, i) \in P$). The intent is that each position takes part in at most one pair.

One might also require from the beginning that $(i, j) \in P$ only if $(s_i, s_j) \in \{(A,U), (U,A), (C,G), (G,C), (U,G), (G,U)\}$.

We are interested in a structure maximizing the number of pairings (and/or minimizing a more elaborate energy function).


Spatial constraints (pseudo knot)

If $i < \ell < j$ and $(i, j) \in P$, and ($(\ell, k) \in P$ or $(k, \ell) \in P$), then $i < k < j$.

[Figure, not recovered, labelled "NO!"]
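The constraint above can be checked directly on a candidate structure. The following is a minimal sketch in Python, assuming the structure is given as a collection of 1-based position pairs; the function name `violates_nesting` is illustrative and does not come from the slides.

```python
def violates_nesting(pairs):
    """Return True if some pair breaks the spatial (pseudo-knot) constraint.

    pairs: iterable of (i, j) tuples of 1-based positions.
    The test mirrors the rule literally: if i < l < j, (i, j) is a pair
    and l is paired with k, then k must satisfy i < k < j.
    """
    normalized = [tuple(sorted(p)) for p in pairs]
    for (i, j) in normalized:
        for (a, b) in normalized:
            # Try both endpoints of the second pair in the role of l.
            for l, k in ((a, b), (b, a)):
                if i < l < j and not (i < k < j):
                    return True
    return False


if __name__ == "__main__":
    print(violates_nesting([(1, 10), (2, 9), (3, 8)]))  # nested pairs
    print(violates_nesting([(1, 5), (3, 8)]))           # crossing pairs
```

The double loop is quadratic in the number of pairs, which is enough for illustration; a stack-based scan over the positions would be the usual choice for long sequences.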


Results

The pseudo-knot constraint is sensible. With it, there are polynomial-time algorithms (e.g., mfold, based on dynamic programming: http://mfold.rna.albany.edu).

Without the pseudo-knot constraint the problem is NP-complete.

Actually, what problem?
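To make the polynomial-time claim concrete for the simplest objective (maximizing the number of nested pairs), here is a minimal sketch of a Nussinov-style dynamic program. This is not mfold's thermodynamic algorithm; the function name `max_nested_pairs` and the `min_loop` parameter are assumptions introduced for illustration. The set of allowed pairs is the one given in the definitions.

```python
# Allowed base pairs, as listed in the definitions (Watson-Crick plus G-U).
ALLOWED = {("A", "U"), ("U", "A"), ("C", "G"),
           ("G", "C"), ("G", "U"), ("U", "G")}


def max_nested_pairs(seq, min_loop=0):
    """Maximum number of pseudo-knot-free pairs, in O(n^3) time.

    min_loop (hypothetical parameter): minimum number of unpaired
    positions that must lie between the two ends of a pair.
    """
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = optimum for the substring seq[i..j] (0-based, inclusive)
    best = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: position j stays unpaired.
            value = best[i][j - 1]
            # Case 2: position j pairs with some k, i <= k < j - min_loop.
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in ALLOWED:
                    left = best[i][k - 1] if k > i else 0
                    inside = best[k + 1][j - 1] if k + 1 <= j - 1 else 0
                    value = max(value, left + 1 + inside)
            best[i][j] = value
    return best[0][n - 1]


if __name__ == "__main__":
    print(max_nested_pairs("GGGAAACCC", min_loop=3))
```

Because the recurrence splits the sequence at the partner of `j`, every structure it counts is nested, which is exactly why the pseudo-knot constraint makes this decomposition possible.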


NP Completeness (Lyngsø and Pedersen, 2000)

Let $\vec{s} = s_1 \cdots s_n$ be an RNA sequence, and $P$ a secondary structure. Then

$$E(\vec{s}, P) = \sum_{(i,j) \in P,\ i < j} E(\vec{s}, i, j, P)$$

where $E(\vec{s}, i, j, P)$ depends on $s_i$ and $s_j$ and, moreover, on the $s_z$ such that $(i + 1, z) \in P$ or $(j - 1, z) \in P$.


NP Completeness (Lyngsø and Pedersen, 2000)

In the NP-completeness proof they first assume an infinite set of complementary bases (e.g., $(A_1, U_1), (A_2, U_2), (A_3, U_3), \dots$) and define $E$ as follows:

$$E(\vec{s}, i, j, P) = \begin{cases} -1 & \text{if } s_i \text{ and } s_j \text{ are complementary symbols and } (\forall z \in \{1, \dots, i-1, j+1, \dots, n\})\,(\{(i+1, z), (z, i+1), (j-1, z), (z, j-1)\} \cap P = \emptyset) \\ 0 & \text{otherwise} \end{cases}$$

[Figure: example pairs labelled with their values of $E(\vec{s}, i, j, P)$: 0, -1, 0, -1]
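The definition of $E$ translates almost literally into code. The sketch below is a minimal illustration, assuming 1-based positions, a structure `P` given as a set of pairs, and a caller-supplied `complementary` predicate; the symbol encoding `"A1"`, `"U1"`, ... and all function names are hypothetical choices made for this example.

```python
def pair_energy(s, i, j, P, complementary):
    """E(s, i, j, P) as defined above; positions are 1-based.

    s: list of symbols, P: set of (i, j) pairs,
    complementary: predicate on two symbols (assumed, supplied by the caller).
    """
    n = len(s)
    if not complementary(s[i - 1], s[j - 1]):
        return 0
    # z ranges over the positions outside the interval [i, j].
    for z in list(range(1, i)) + list(range(j + 1, n + 1)):
        if {(i + 1, z), (z, i + 1), (j - 1, z), (z, j - 1)} & P:
            return 0
    return -1


def total_energy(s, P, complementary):
    """E(s, P): sum of the pair energies over all (i, j) in P with i < j."""
    P = set(P)
    return sum(pair_energy(s, i, j, P, complementary)
               for (i, j) in P if i < j)


def indexed_au(x, y):
    """Hypothetical encoding of the infinite alphabet: 'A3' pairs with 'U3'."""
    return x[1:] == y[1:] and {x[0], y[0]} == {"A", "U"}


if __name__ == "__main__":
    nested = total_energy(["A1", "A2", "U2", "U1"], {(1, 4), (2, 3)}, indexed_au)
    crossing = total_energy(["A1", "A2", "U1", "U2"], {(1, 3), (2, 4)}, indexed_au)
    print(nested, crossing)
```

In the crossing example each pair has a neighbour bonded to a position outside its own interval, so neither pair contributes, while the two nested pairs each contribute -1.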


NP Completeness (Lyngsø and Pedersen, 2000)

To prove NP-hardness they start from 3SAT with the further requirement that each literal occurs at most twice (for a variable $X$, you can have $X$ zero, one, or two times and $\neg X$ zero, one, or two times). Prove that this restricted problem is NP-complete (exercise).

For a clause $c_i = \ell_1 \vee \ell_2 \vee \ell_3$ they introduce a gadget:

$$C_i = c_{i,1}\,(\ell_1)_{1/2}\,c_{i,1}\,c_{i,2}\,(\ell_2)_{1/2}\,c_{i,1}\,c_{i,2}\,(\ell_3)_{1/2}\,c_{i,2}$$

(the index 1/2 is chosen according to the leftmost/rightmost occurrence of that literal).

For a variable $X_i$ that occurs twice positively and twice negatively, introduce a gadget (a substring of it in case of fewer occurrences):

$$V_i = v_i\,(X_i)_2\,(X_i)_1\,v_i\,v_i\,(\neg X_i)_2\,(\neg X_i)_1\,v_i$$

The encoding of a formula $c_1 \wedge \cdots \wedge c_m$ on variables $X_1, \dots, X_n$ is

$$C_1 \cdots C_m V_1 \cdots V_n$$
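The starting point of the reduction, 3SAT in which each literal occurs at most twice, together with the 1/2 labelling of literal occurrences, can be sketched as follows. This is only an illustration of the restriction and the labelling, not of the gadgets themselves. It assumes clauses of exactly three literals written as nonzero integers (`+v` for a variable, `-v` for its negation, a DIMACS-like convention chosen here); the function names are hypothetical.

```python
from collections import Counter


def is_restricted_3sat(clauses):
    """True if every clause has three literals and no literal occurs more than twice."""
    if any(len(clause) != 3 for clause in clauses):
        return False
    counts = Counter(lit for clause in clauses for lit in clause)
    return all(count <= 2 for count in counts.values())


def label_occurrences(clauses):
    """Attach to each literal its occurrence index, scanning left to right.

    Index 1 marks the leftmost occurrence of a literal, index 2 the rightmost.
    """
    seen = Counter()
    labelled = []
    for clause in clauses:
        row = []
        for lit in clause:
            seen[lit] += 1
            row.append((lit, seen[lit]))
        labelled.append(row)
    return labelled


if __name__ == "__main__":
    formula = [(1, 2, 3), (1, -2, 3), (-1, -2, -3)]
    print(is_restricted_3sat(formula))
    print(label_occurrences(formula))
```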


NP Completeness: Main idea

[Figure not recovered]


NP Completeness (Lyngsø and Pedersen, 2000)

The encoding of a formula $\varphi = c_1 \wedge \cdots \wedge c_m$ on variables $X_1, \dots, X_n$ is

$$C_1 \cdots C_m V_1 \cdots V_n$$

They prove that $\varphi$ is satisfiable iff there is a secondary structure with energy $-(3m + n)$ [nice exercise].

Then they complete the proof without the hypothesis of an infinite alphabet (this is a nice reading, but too long to explain in this course).


A simple CLP encoding

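Input: $s_1, \dots, s_n$

Variables: Pairs $= [P_1, \dots, P_n]$, where $P_i = 0$ means that position $i$ is unpaired.

Let $S_x = \{ i \in \{1, \dots, n\} \mid s_i = x \}$.

If $s_i = A$, then $dom(P_i) = \{0\} \cup S_U$.
If $s_i = C$, then $dom(P_i) = \{0\} \cup S_G$.
If $s_i = G$, then $dom(P_i) = \{0\} \cup S_C \cup S_U$.
If $s_i = U$, then $dom(P_i) = \{0\} \cup S_A \cup S_G$.

For $i = 1, \dots, n$, if $P_i > 0$ then $P_{P_i} = i$. If $P_i = 0$ there is no constraint. With I standing for $i$ and P for $P_i$, this can be stated compactly as:

element(P + 1, [I|Pairs], I)

Pseudo-knots: if $P_i > 0$ then $(P_{i+1} \in [i + 3 \,..\, P_{P_i} - 1]) \vee (P_{i+1} = 0)$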
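The domains and the element constraint of the encoding can be mirrored in a few lines of Python, outside any constraint solver. The following is a minimal sketch for checking a fixed assignment, not the CLP program itself; `PARTNERS`, `domains`, and `is_symmetric` are names introduced here for illustration.

```python
# Compatible partner bases for each base, as in the domain definitions.
PARTNERS = {"A": "U", "C": "G", "G": "CU", "U": "AG"}


def domains(seq):
    """dom(P_i) for i = 1..n: 0 (unpaired) plus every position with a compatible base."""
    positions = {x: {i for i, base in enumerate(seq, start=1) if base == x}
                 for x in "ACGU"}
    result = []
    for base in seq:
        dom = {0}
        for partner in PARTNERS[base]:
            dom |= positions[partner]
        result.append(dom)
    return result


def is_symmetric(pairs):
    """Check element(P_i + 1, [i | Pairs], i) for every i.

    Looking up the (P_i + 1)-th element of [i | Pairs] returns i itself
    when P_i = 0, and P_{P_i} otherwise, so one test covers both cases.
    """
    for i, p in enumerate(pairs, start=1):
        extended = [i] + list(pairs)   # the list [I | Pairs]
        if extended[p] != i:           # 0-based index p is the (p+1)-th element
            return False
    return True


if __name__ == "__main__":
    print(domains("GACU"))
    print(is_symmetric([3, 0, 1, 0]), is_symmetric([3, 0, 0, 0]))
```

Prepending `i` to the list is what lets a single element constraint express the conditional "if $P_i > 0$" without reification.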


A simple CLP encoding

As cost function we want either to maximize contacts or (as done by Dahl-Bavarian, WCB05) to obtain a solution close to the statistics, namely 35% for AU, 53% for CG, 12% for GU.

Let $NC = n - \#contacts$.

We therefore minimize a weighted sum of the form

$$c_1 \frac{NC}{n} + c_2 \frac{\#(AU) - 0.35\,(n - NC)}{n} + c_3 \frac{\#(CG) - 0.53\,(n - NC)}{n}$$

($c_1, c_2, c_3$ are constants that can be changed. The denominator $n$ can be omitted for minimization.)

Let us see some executions of RNA_alignment.pl
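The weighted sum can be evaluated for a given structure as in the sketch below. It follows the formula term by term as written above; two points are assumptions made for this example and are not stated in the slides: `#contacts` is taken to be the number of base pairs, and the deviations are used with their sign, exactly as the formula reads. The function name `cost` and its default weights are illustrative.

```python
def cost(seq, pairs, c1=1.0, c2=1.0, c3=1.0):
    """Weighted cost of a structure.

    seq: RNA string; pairs: list of (i, j) with 1-based positions.
    Assumption: #contacts is the number of base pairs in `pairs`.
    """
    n = len(seq)
    contacts = len(pairs)
    nc = n - contacts
    # Classify each pair by the unordered set of its two bases.
    kinds = [frozenset((seq[i - 1], seq[j - 1])) for (i, j) in pairs]
    au = kinds.count(frozenset("AU"))
    cg = kinds.count(frozenset("CG"))
    return (c1 * nc
            + c2 * (au - 0.35 * (n - nc))
            + c3 * (cg - 0.53 * (n - nc))) / n


if __name__ == "__main__":
    # One G-C pair (1, 8) and one A-U pair (2, 7) on a sequence of length 8.
    print(cost("GACCCCUC", [(1, 8), (2, 7)]))
```

With unit weights the example evaluates to (6 + 0.30 - 0.06) / 8 = 0.78, up to floating-point rounding.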


(Some) References

M. Zuker and P. Stiegler. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Research, 9(1):133–148, 1981.

R.B. Lyngsø and C.N.S. Pedersen. RNA Pseudoknot Prediction in Energy-Based Models. J. of Computational Biology, 7(3/4), 2000.

G. Blin, G. Fertin, I. Rusu, and C. Sinoquet. Extending the hardness of RNA secondary structure comparison. LNCS 4614, pp. 140–151, 2007.

M. Bauer, G.W. Klau, and K. Reinert. Accurate multiple sequence-structure alignment of RNA sequences using combinatorial optimization. BMC Bioinformatics, 8, 2007.

M. Bavarian and V. Dahl. Constraint Based Methods for Biological Sequence Analysis. J. Universal Computer Science, 12(11):1500–1520, 2006 (also in WCB 05).

A. Dal Palù, M. Möhl, and S. Will. A Propagator for Maximum Weight String Alignment with Arbitrary Pairwise Dependencies. CP 2010, pp. 167–175 (also in WCB 10).
