Synthesis for Systems Biology Ras Bodík, Ali Sinan Köksal, Evan Pu, Saurabh Srivastava UC Berkeley Jasmin Fisher Microsoft Research Nir Piterman University of Leicester


Synthesis for Systems Biology

Ras Bodík, Ali Sinan Köksal, Evan Pu, Saurabh Srivastava (UC Berkeley); Jasmin Fisher (Microsoft Research); Nir Piterman (University of Leicester)


Executable biology pushes our boundaries

Maximally non-deterministic systems: cells exhibit races; the model must preserve all observed non-determinism.

Needs new synthesis algorithms: from 2QBF to 3QBF.

Incomplete specs: sparse wet-lab experiments → unknown behavior.

Needs analysis of ambiguity: are there alternative explanations of observed phenomena?


Other lessons and results

Design your own tools: To enable synthesis, design a domain language. Then build a lightweight synthesizer.

Synthesized a C. elegans VPC model: We failed to write this model manually; others took months.

Beyond synthesis: Showed that the available experiments are non-ambiguous. Synthesized a new alternative model that differs internally.


Systems biology


Understanding Diseases

“Cancer is fundamentally a disease of failure of regulation of tissue growth. In order for a normal cell to transform into a cancer cell, the genes which regulate cell growth and differentiation must be altered.” – Wikipedia

To understand cancer, investigate cell differentiation


How Are Cells Differentiated?

Two ways of differentiation:
– A single cell divides into cells of different types.
– Multiple identical cells differentiate by communicating.

To understand cell differentiation, investigate cell communication.


Studying Differentiation on Worms

Cell differentiation in worms: similar to human but much simpler.

[Figure labels: identical precursor cells; differentiated vulval cells]


The Research Goal

What is the cell’s “algorithm” for robustly deciding cell fates through communication?


Mutation experiments are visually observable

Biologists mutate cell genes and observe the outcome of differentiation.

sqv mutants of Caenorhabditis elegans are defective in vulval epithelial invagination [Herman et al. 1999]


The results from wet-lab experiments


Mutation experiments give partial knowledge

From gene mutation experiments, biologists infer a protein interaction.

“In this assay, depletion of lst-2, lst-3, lst-4, or dpy-23, as well as ark-1, caused ectopic vulval induction, suggesting that they function as negative regulators of the EGFR-MAPK pathway.”

[Yoo et al. 2004]


Making Sense of Experiments


Executable Systems biology


Executable Biology

Computational models are needed to tackle the combinatorial complexity of cell communication.

Verification of models can show their inconsistency with experimental data.

New interactions can be discovered. [Fisher et al. 2007]


Semantics of models

Time and protein concentrations are discrete: discrete is sufficient to show interesting behavior.

Cells are concurrent communicating automata: bounded asynchrony (cells progress at roughly the same rate).

Note: timing is modeled with state progression


Cells as a Reactive Modules (RM) program

    atom Vul
      controls Vul
      reads go, Vul, IS, Muv_state, v_Vul
      awaits go, v_Vul, lst_state
    init
      [] (true) & v_Vul' = ko  -> Vul' := off0;
      [] (true) & v_Vul' ~= ko -> Vul' := Evaluate0;
    update
      [] (~go & go') & Vul = Evaluate0 & Muv_state = ON & IS ~= high  -> Vul' := off1;
      [] (~go & go') & Vul = Evaluate0 & IS = high                    -> Vul' := let23;
      [] (~go & go') & Vul = Evaluate0 & Muv_state = OFF & IS ~= high -> Vul' := Evaluate1;
      [] (~go & go') & Vul = off1 & IS = med                          -> Vul' := Before_Partial_On;
      [] (~go & go') & Vul = off1 & IS = high                         -> Vul' := let23;
      [] (~go & go') & Vul = off1 & IS ~= high & IS ~= med            -> Vul' := off2;
      [] (~go & go') & Vul = Evaluate1                                -> Vul' := let23;
      [] (~go & go') & Vul = Before_Partial_On                        -> Vul' := let23;
      [] (~go & go') & Vul = let23 & lst_state' = OFF                 -> Vul' := sem5;
      [] (~go & go') & Vul = sem5 & lst_state' = OFF                  -> Vul' := let60;
      [] (~go & go') & Vul = let60 & lst_state' = OFF                 -> Vul' := mpk1;
      [] (~go & go') & Vul = let23 & lst_state' = ON                  -> Vul' := Vul_counteracted;
      [] (~go & go') & Vul = sem5 & lst_state' = ON                   -> Vul' := Vul_counteracted;
      [] (~go & go') & Vul = let60 & lst_state' = ON                  -> Vul' := Vul_counteracted


RM models: laborious to develop and update

Months of tweaking to get the timing right: hard to understand, hard to debug.

RM is too expressive (e.g., it has clairvoyance): it’s tempting to encode constructs that have no clear biological explanation (strange abstractions).

Summary: modeling in executable biology is laborious

if only we could automate model development


Synthesis and Analysis of Biology Models


Our contribution

Automatically infer cell models (synthesis)
– obtain executable models faster

Enumerate alternative models (“distinct” synthesis)
– find alternative explanations of observed phenomena

Ask for more specifications (disambiguation)
– suggest experiments to disambiguate between models


Lessons: Build your tools!

Executable biology selects methods based on availability of tools, eg model checkers.

We did the same for synthesis of models. It failed.

We argue here to build our own lightweight tools, including the modeling language and its synthesizer.

We show how to DIY.


The language


Motivation for a high-level language (HLL)

HLL → smaller programs → smaller search space → faster synthesis

HLL programs are biological diagrams → easier for biologists to read


Four levels of the language

[Figure labels: schedule; concentration update function]


Top-level semantics

The program: P(m, s) = f

Inputs:
mutation (m): changes behavior of proteins
schedule (s): bounded length, controls cell interleaving

Output:
fates of cells (f): resulting fates of cells


Correctness

Top-level program: P(m, s) = f

Specification (experiments): E, a set of observed (mutation, fate) pairs (m, f)

Correctness:
i. the demonic scheduler cannot produce an unobserved fate
ii. the angelic scheduler can produce each observed fate
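The two correctness conditions can be checked by brute force on a toy model. The following is a minimal sketch, not the authors' tool: the program `P`, the schedule set `SCHEDULES`, the mutation name `ko0`, and the helper `is_correct` are all hypothetical, and the schedule and mutation domains are assumed small enough to enumerate.

```python
# Hypothetical toy setting: two cells; a schedule is the order in which they act.
SCHEDULES = [(0, 1), (1, 0)]

def P(m, s):
    """Toy program (an assumption for illustration): in the wild type the cell
    that acts first takes fate 'A' and the other takes 'B'; under the made-up
    mutation 'ko0' both cells end with fate 'B' regardless of the schedule."""
    if m == "ko0":
        return ("B", "B")
    fates = ["B", "B"]
    fates[s[0]] = "A"
    return tuple(fates)

def is_correct(program, schedules, experiments):
    """experiments is a set of observed (mutation, fate) pairs."""
    mutations = {m for m, _ in experiments}
    # (i) demonic: no schedule may produce a fate that was not observed
    demonic_ok = all(
        (m, program(m, s)) in experiments for m in mutations for s in schedules
    )
    # (ii) angelic: every observed fate is reproduced by at least one schedule
    angelic_ok = all(
        any(program(m, s) == f for s in schedules) for m, f in experiments
    )
    return demonic_ok and angelic_ok

E_FULL = {("wt", ("A", "B")), ("wt", ("B", "A")), ("ko0", ("B", "B"))}
E_MISSING = {("wt", ("A", "B")), ("ko0", ("B", "B"))}   # toy model races into an unobserved fate
E_EXTRA = E_FULL | {("wt", ("A", "A"))}                 # an observed fate the toy model cannot replay
```

With these sets, `is_correct(P, SCHEDULES, E_FULL)` holds, `E_MISSING` violates condition (i), and `E_EXTRA` violates condition (ii).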


Level 2: Program is composed from cells

Cells advance according to the schedule.
Cells communicate by reading each other’s state.

state: the set of concentrations of the cell’s proteins

Schedule: The first step executes cells 2, 3, and 6.

Bounded asynchrony [Fisher et al.]: the schedule can be partitioned into macrosteps; in each macrostep, each cell makes one step.

Our schedules contain exactly macrosteps
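To make the macrostep condition concrete, here is a small sketch that checks whether a schedule respects bounded asynchrony. It assumes a schedule is a list of steps, each step being the set of cells executed together (as in "cells 2, 3, and 6"); the function name `split_into_macrosteps` is illustrative and not taken from the talk.

```python
def split_into_macrosteps(schedule, cells):
    """Partition a schedule into macrosteps in which every cell steps exactly once.

    Returns the list of macrosteps, or None if the schedule is not
    bounded-asynchronous (a cell runs twice before the others catch up,
    an unknown cell appears, or the last macrostep is incomplete).
    """
    cells = frozenset(cells)
    macrosteps, current, seen = [], [], set()
    for step in schedule:
        step = set(step)
        if not step or not step <= cells or step & seen:
            return None
        current.append(step)
        seen |= step
        if seen == cells:            # every cell has moved: the macrostep is closed
            macrosteps.append(current)
            current, seen = [], set()
    return macrosteps if not current else None

CELLS = {1, 2, 3, 4, 5, 6}
good = [{2, 3, 6}, {1, 4, 5}, {1, 2, 3, 4, 5, 6}]
bad = [{2, 3, 6}, {2}, {1, 4, 5}]    # cell 2 steps twice inside one macrostep
```

Because cells may not repeat inside a macrostep, the partition is unique when it exists, so a single left-to-right pass suffices.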


Level 3: In cells are proteins

Each cell is composed of proteins.
– protein state: discretized protein concentration
– proteins read the states of other proteins (potentially in other cells)
– they update their own concentration in the next step

Synchronous execution:
– when a cell is scheduled, all of its proteins take one step
– i.e., they update their concentration level

[similar to Synchronous/Reactive (SR) model, Edwards and Lee, 2002]
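The synchronous step can be sketched as follows. This is an illustration under assumptions, not the authors' interpreter: the dictionary layout of `world`, the protein names, and `step_cell` are hypothetical. The point it shows is that every protein of the scheduled cell reads the state from before the step, and all new levels are committed together.

```python
def step_cell(world, cell, update_fns):
    """Advance one scheduled cell by one synchronous step.

    world:      {cell name: {protein name: discrete level}}
    update_fns: {cell name: {protein name: function(world) -> new level}}
    Every update function sees the pre-step world, so no protein observes a
    value written by a sibling protein during the same step.
    """
    new_cell = {protein: fn(world) for protein, fn in update_fns[cell].items()}
    return {**world, cell: new_cell}

# Hypothetical two-cell example: c2's receptor copies c1's signal level,
# and c2's signal copies c2's own receptor level from the previous step.
world = {"c1": {"signal": 2, "receptor": 0}, "c2": {"signal": 0, "receptor": 0}}
update_fns = {
    "c2": {
        "receptor": lambda w: w["c1"]["signal"],
        "signal": lambda w: w["c2"]["receptor"],
    }
}

after_one = step_cell(world, "c2", update_fns)
after_two = step_cell(after_one, "c2", update_fns)
```

After the first step the receptor of `c2` is at level 2 while its signal is still 0; the signal only rises on the second step, which is how timing ends up being modeled by state progression.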


Level 4: In proteins are update functions

Protein state: discretized concentrations

Protein update function: reads the concentrations of attached proteins and updates its own

Note: these update functions are what we synthesize

i.e., in our partial models we leave some update functions unspecified


The output fate

The fate of the program is computed by a fate function over the states of the individual cells, one state per cell.


Example

Assume a network of police cameras. When a gunshot happens, we want at least one nearby camera to take a picture. Synthesize a protocol for deciding which camera takes a picture. OK if multiple cameras do.

Two types of communication:
- sound from the gunshot (“base station”) to cameras
- radio transmission between camera nodes announcing “I took a picture, you don’t have to, save your battery”

Nodes should decide who is closest on the basis of sound signal strength. No triangulation.


Example


Incomplete specification

signal from BS | take picture? | signal from BS | take picture? | cameras managed to communicate?

H Y H N Y

N Y

Y Y

H Y L N Y

H Y H Y N


Synthesized update functions for base receiver, delay node


Synthesis


Synthesis

Input to synthesizer:
specification
partial program (sketch)
“biological” invariants (see next slide)

Output:
completion: completes the sketch into a correct program

The synthesis problem: a 3QBF problem (unlike ordinary 2QBF synthesis)


Enforcing Biological Invariants

Synthesized models must satisfy biological invariants.

Biologist’s invariants specify whether one protein activates or inhibits another.

Asserted as monotonicity constraints on state transitions
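One plausible reading of such a monotonicity constraint can be sketched as a check on a candidate update function. The talk does not give the exact encoding, so everything here is an assumption: the update function takes the protein's own level and one input level, an activating input must never lower the next level as it increases, and an inhibiting input must never raise it. The name `respects_invariant` is hypothetical.

```python
def respects_invariant(update, own_levels, input_levels, kind):
    """Check a candidate update function against an activates/inhibits invariant.

    update(own, inp) -> next level of the protein
    input_levels must be sorted in increasing order.
    kind is "activating" or "inhibiting".
    Comparing adjacent input levels is enough: monotonicity is transitive.
    """
    if kind not in ("activating", "inhibiting"):
        raise ValueError(f"unknown invariant kind: {kind}")
    for own in own_levels:
        for low, high in zip(input_levels, input_levels[1:]):
            before, after = update(own, low), update(own, high)
            if kind == "activating" and after < before:
                return False
            if kind == "inhibiting" and after > before:
                return False
    return True

LEVELS = [0, 1, 2]

def pulled_up(own, inp):
    # toy update: the protein rises to the level of its input
    return max(own, inp)

def pushed_down(own, inp):
    # toy update: the input subtracts from the protein's level
    return max(0, own - inp)
```

In a synthesizer the same condition would be asserted as a constraint over the unknown update table rather than tested after the fact; the check above only illustrates what the constraint says.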


The synthesizer


Architecture of synthesizer (3.5 KLOC)

DSL embedded in Scala: just defining classes for Cells and Proteins gives nice syntax

Evaluate the Scala program: the result is an abstract syntax graph (ASG)

Interpreter for the ASG in Scala: given the ASG and (m, s), run the program to get the fate

Compiler from the ASG to a Z3 formula: used by the algorithms for verification, synthesis, ambiguity


Example of the embedded DSL

    class BaseReceiver extends Node("BaseReceiver") {
      val base = input("off", "low", "high")
      val lateralReceiver = input("off", "on")
      val out = output("off", "on")

      // update functions implemented as a (more general) FSM
      val stateful = logic(new StatefulLogic {
        val off = state("off")  // two observable states
        val on = state("on")
        output(out)             // link these states to output port
        init(off)               // "off" is the start state

        nbStates(5)             // this state machine will have five hidden states

        activating(base)        // biological invariants on inputs
        inhibiting(lateralReceiver)
      })
      register(stateful)        // necessitated by the DSL
    }


How to deal with 3QBF synthesis problem

Domain sizes:
holes: large, treated symbolically
schedules: large, treated symbolically
mutations: small, by-demand enumeration


Algorithms


Synthesis Approach: CEGIS

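assume we care only about the classical demonic correctness

[Diagram: CEGIS loop. "synthesize" starts from an initial input set of (schedule, experiment) pairs. On SAT it passes a candidate model to "verify"; on UNSAT no model exists. If "verify" is SAT, the counterexample (schedule, experiment) is added to the input set and synthesis repeats; if "verify" is UNSAT, the candidate model is correct.]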
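The CEGIS loop can be sketched on a toy problem. This is a minimal illustration, not the authors' synthesizer: it replaces the SAT/SMT queries with brute-force enumeration, and the names `cegis`, `candidates`, `inputs`, and `ok` are hypothetical. In the talk's setting an input would be a (schedule, experiment) pair and a candidate would be a completion of the sketch.

```python
def cegis(candidates, inputs, ok):
    """Counterexample-guided inductive synthesis over finite domains.

    ok(h, x) says whether candidate h behaves correctly on input x.
    Returns (model, counterexamples); model is None if no candidate works.
    """
    examples = []  # counterexamples collected so far
    while True:
        # synthesize: find a candidate consistent with the collected inputs only
        candidate = next(
            (h for h in candidates if all(ok(h, x) for x in examples)), None
        )
        if candidate is None:
            return None, examples          # "UNSAT": no model exists
        # verify: look for an input on which the candidate misbehaves
        counterexample = next((x for x in inputs if not ok(candidate, x)), None)
        if counterexample is None:
            return candidate, examples     # verifier "UNSAT": candidate is correct
        examples.append(counterexample)

# Toy problem: the unknown hole h is a threshold; the program outputs "on"
# iff x >= h, and the (made-up) specification says "on" iff x >= 2.
def ok(h, x):
    return (x >= h) == (x >= 2)

model, counterexamples = cegis(range(4), range(4), ok)
```

Each counterexample rules out at least the candidate that produced it, so the loop terminates on finite domains. Here it settles on threshold 2 after looking at only two of the four inputs.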


Synthesis algorithm

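\[
\exists h.\ \Big( \forall m \in \pi_m(E).\ \neg \exists s.\ (m, P(m, s)) \notin E \Big) \wedge \Big( \forall (m, f) \in E.\ \exists s.\ P(m, s) = f \Big)
\]

\[
\exists h.\ (m_1, P(m_1, s_1)) \in E \wedge \dots \wedge (m_l, P(m_l, s_l)) \in E \wedge \big( \exists s.\ P(m_1, s) = f_1 \big) \wedge \dots \wedge \big( \exists s.\ P(m_k, s) = f_k \big)
\]

[Diagram labels: verifier of demonic schedules (returns a counterexample); verifier of angelic schedules (returns a counterexample)]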


Three communicating solvers

[Diagram: three communicating solvers labeled 3QBF, 2QBF, and SAT; annotation on the 2QBF solver: blasts (m, f), turns to SAT]


Supporting tools


Supporting tools

Work would not be productive without these tools

– execution visualizer
– causal tracer
– automaton minimizer

We still need ideas on how to construct those quickly


Visualizing the Synthesized Model

[Screenshot labels: activated connections are colored; step through execution]


Results


Results (1): Automatic model inference

Synthesized a model of VPC in C. elegans
– the model is expressed in our bio-inspired language
– we believe it’s more readable than in RM

Prior to synthesis
– we failed to manually fix a bug in an equivalent model
– collaborators took several months to make this model


Results (2): Are experiments complete?

We concluded that the set of experiments is complete
– this means there exists no alternative model that behaves differently on experiments not yet performed
– this holds under the assumptions described in the sketch provided by biologists, which encodes their knowledge about C. elegans

Working on identifying a minimal set of experiments
– if we want to validate these experiments, do we need to repeat all of them?
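The completeness question can be illustrated on a toy model space. This sketch is a strong simplification and not the authors' algorithm: it ignores schedules (each model yields a single fate per mutation), enumerates models explicitly instead of querying a solver, and uses hypothetical names (`distinguishing_experiment`, `run`, threshold models).

```python
def distinguishing_experiment(models, performed, all_mutations, run):
    """Return an unperformed mutation on which two consistent models disagree,
    or None if the performed experiments already pin down the behavior.

    performed: {mutation: observed fate}
    run(h, m): fate predicted by model h under mutation m
    """
    consistent = [
        h for h in models if all(run(h, m) == f for m, f in performed.items())
    ]
    for m in all_mutations:
        if m in performed:
            continue
        if len({run(h, m) for h in consistent}) > 1:
            return m      # performing this experiment would rule out some model
    return None

# Toy model space: a model is a threshold h, and the "mutation" is a signal level.
def run(h, m):
    return "on" if m >= h else "off"

MODELS = range(4)
MUTATIONS = range(4)
sparse = {0: "off", 3: "on"}     # leaves thresholds 1, 2 and 3 all plausible
complete = {1: "off", 2: "on"}   # only threshold 2 survives
```

With `sparse`, the function suggests performing the experiment at level 1; with `complete`, it returns `None`, mirroring the claim that no alternative model behaves differently on the remaining experiments.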


Results (3)

No behaviorally distinct models. But we synthesized a model that differs internally.

cell behavior due to a different protein interaction

These models can’t be distinguished via mutation and fate observation (models have same fates, after all).

Hence one must “instrument” the cell by tagging proteins with fluorescent genes.

Here, our synthesis identifies which genes to instrument (the fewer the better).


Summary: Executable biology’s challenges

Infer models that can replay all observed behavior
… or else they don’t faithfully model cell phenomena. This semantics leads to a 3QBF synthesis problem.

Analyze the space of plausible models
Are specs ambiguous, minimal? Which experiments to perform to rule out a model?
