
Page 1:

Machine learning methods in Statistical Model Checking and System Design

Guido Sanguinetti1,2

1 Adaptive and Neural Computation, School of Informatics, University of Edinburgh

2SynthSys, Synthetic and Systems Biology, University of Edinburgh

joint work with D. Milios (Edinburgh) and Luca Bortolussi (Saarbruecken/Trieste)

RV Tutorial, 22/09/15

Page 2:

Outline

Motivation and background
- Properties and qualitative data
- CTMCs and model checking

Smoothed model checking
- Uncertainty and smoothness
- Gaussian processes
- Smoothed model checking results

Robust system design from formal specification
- Signal Temporal Logic
- GP-UCB optimisation
- Experimental Results

U-check

Page 3:

Broad picture

Machine learning and formal methods
Formal methods give us tools to run and reason about models. Machine learning gives us methods to refine models in the light of observations.

Ultra-condensed summary of this talk
Some formal modelling questions can be viewed as function estimation problems, for which machine learning methods yield efficient solutions.

Page 4:

Example 1

[Diagram: stochastic model of LacZ gene expression — transcription (PLac, RNAP, TrLacZ1, TrLacZ2) and translation (RbsLacZ, Ribosome, TrRbsLacZ, LacZ) reactions with rates r_1–r_11, including degradation steps dgrRbsLacZ and dgrLacZ]

We build a model of a real system to investigate properties of the trajectories of the system.

Page 5:

Example 2

Page 6:

Sales pitch

- Building models of real-world systems involves uncertainty (typically in the parameters, but also in the structure)

- Verifying/designing properties on uncertain models is unfeasible in a brute-force way

- Rest of the talk: use a smoothness result and treat the problem as a prediction/optimisation problem in machine learning


Page 8:

The problems tackled

Satisfaction under Uncertainty Problem
Can we compute satisfaction probabilities of formal properties for uncertain models?

Parameter Estimation problem
Can we learn model parameters from qualitative data, i.e. from observations of truth values of formal (temporal logic) properties?

Synthesis problem
Can we learn model parameters satisfying (as robustly as possible) a set of formal (temporal logic) specifications?


Page 11:

Population CTMC

We consider population CTMC models, i.e. CTMC models specified in the style of biochemical reactions.

State space
The state space is described by a set of n variables X = (X1, . . . , Xn) ∈ N^n, each counting the number of agents (jobs, molecules, ...) of a given kind.

Transitions
The dynamics is given by a set of chemical reactions, of the form

s1 X1 + . . . + sn Xn → r1 X1 + . . . + rn Xn,

with a rate given by a function f(X, θ), depending on the system variables and on a set of parameters θ.
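Trajectories of such a population CTMC can be sampled with Gillespie's direct method. The sketch below is a minimal, assumption-laden illustration (the birth-death network at the bottom is a hypothetical example, not a model from the talk):

```python
import random

def gillespie_ssa(x0, reactions, rates, theta, t_max, rng=random.Random(0)):
    """Simulate one trajectory of a population CTMC (Gillespie's direct method).

    reactions: list of state-change vectors; rates: list of functions f(X, theta).
    Returns the list of (time, state) pairs visited."""
    t, x = 0.0, list(x0)
    traj = [(t, tuple(x))]
    while t < t_max:
        props = [f(x, theta) for f in rates]   # reaction propensities at state x
        total = sum(props)
        if total <= 0:                          # absorbing state: nothing can fire
            break
        t += rng.expovariate(total)             # exponential waiting time
        if t > t_max:
            break
        # choose a reaction with probability proportional to its propensity
        u, acc = rng.random() * total, 0.0
        for change, p in zip(reactions, props):
            acc += p
            if u <= acc:
                x = [xi + ci for xi, ci in zip(x, change)]
                break
        traj.append((t, tuple(x)))
    return traj

# Hypothetical birth-death model: ∅ →(θ1) X, X →(θ2·X) ∅
traj = gillespie_ssa(
    x0=[0],
    reactions=[[+1], [-1]],
    rates=[lambda x, th: th[0], lambda x, th: th[1] * x[0]],
    theta=(10.0, 0.5),
    t_max=50.0,
)
```

Statistical model checking later in the talk works on exactly such sampled trajectories.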


Page 14:

Metric Interval Temporal Logic

- We express qualitative properties in Metric (interval) Temporal Logic.

- Linear time: in experiments we observe single realisations.

- Metric bounds: we can observe a system only for a finite amount of time.

Syntax

ϕ ::= tt | µ | ¬ϕ | ϕ1 ∧ ϕ2 | ϕ1 U[T1,T2] ϕ2

As customary: F[T1,T2]ϕ ≡ tt U[T1,T2] ϕ, G[T1,T2]ϕ ≡ ¬F[T1,T2]¬ϕ.


Page 17:

MITL semantics

Semantics
µ are (non-linear) inequalities on vectors of n variables
- x, t |= µ if and only if µ(x(t)) = tt;
- x, t |= ϕ1 U[T1,T2] ϕ2 if and only if ∃t1 ∈ [t + T1, t + T2] such that x, t1 |= ϕ2 and ∀t0 ∈ [t, t1], x, t0 |= ϕ1.

As customary: F[T1,T2]ϕ ≡ tt U[T1,T2] ϕ, G[T1,T2]ϕ ≡ ¬F[T1,T2]¬ϕ.

The logic can be given both a qualitative (boolean) and a quantitative (robustness) semantics.

Example
G[0,100](XI < 40) ∧ F[0,60]G[0,40](5 ≤ XI ≤ 20) ∧ G[30,50](XI > 30) encodes transient behaviour.
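The bounded F and G operators can be monitored directly on a finitely sampled trajectory. The sketch below is a deliberately simplified boolean monitor: it checks atomic predicates only at the sample times (an assumption, not the full dense-time semantics) and omits the until operator for brevity:

```python
def eventually(signal, t1, t2, pred):
    """F[t1,t2] pred: pred holds at some sample time in [t1, t2].

    signal: list of (time, value) samples of a trajectory."""
    return any(t1 <= t <= t2 and pred(v) for t, v in signal)

def globally(signal, t1, t2, pred):
    """G[t1,t2] pred: pred holds at every sample time in [t1, t2]."""
    return all(pred(v) for t, v in signal if t1 <= t <= t2)

# Toy monotone trajectory: value 10 + t sampled every 5 time units
sig = [(t, 10 + t) for t in range(0, 101, 5)]
print(eventually(sig, 0, 60, lambda x: x >= 40))   # True: value reaches 40 at t = 30
print(globally(sig, 0, 100, lambda x: x < 40))     # False: value exceeds 40 after t = 30
```

Statistical model checking then reduces to evaluating such a monitor on many sampled trajectories.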


Page 20:

Model Checking

- Model checking: given a stochastic process and a formula, compute the probability q of the formula being true

- Problem: almost always analytically impossible, and often computationally unfeasible

- Statistical model checking: q can be estimated as the average of the truth values Ti of the formula over many sampled trajectories

- Can be given a Bayesian flavour by specifying a prior p(q) and estimating the posterior p(q | Ti) via Bayes' theorem, using the fact that p(Ti | q) is Bernoulli

- This is equivalent to introducing pseudocounts → useful when rare events could be present
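The Bayesian variant above has a closed form: with a Beta prior and Bernoulli observations, the posterior is again Beta, and the prior parameters act exactly as pseudocounts. A minimal sketch (the function name and the 3-out-of-100 data are illustrative):

```python
def bayesian_smc(truth_values, a=1.0, b=1.0):
    """Posterior over the satisfaction probability given Bernoulli samples.

    With a Beta(a, b) prior, the posterior after k successes out of n samples
    is Beta(a + k, b + n - k); a and b act as pseudocounts."""
    n, k = len(truth_values), sum(truth_values)
    post_a, post_b = a + k, b + (n - k)
    mean = post_a / (post_a + post_b)
    var = post_a * post_b / ((post_a + post_b) ** 2 * (post_a + post_b + 1))
    return mean, var

# 3 satisfactions out of 100 runs; the pseudocounts keep the estimate away from 0,
# which matters when the property describes a rare event
mean, var = bayesian_smc([1] * 3 + [0] * 97)
print(round(mean, 4))   # 0.0392, i.e. (1 + 3) / (2 + 100)
```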


Page 23:

Outline

Motivation and background
- Properties and qualitative data
- CTMCs and model checking

Smoothed model checking
- Uncertainty and smoothness
- Gaussian processes
- Smoothed model checking results

Robust system design from formal specification
- Signal Temporal Logic
- GP-UCB optimisation
- Experimental Results

U-check

Page 24:

Uncertainty in models

- Quantitative models require parameters. Is domain expertise sufficient to select a particular value of the parameters (out of an uncountable set)?

- Certainly not in many applications like systems biology

- What are the implications for model checking?

- Satisfaction probabilities of formulae are defined for fully specified models.

All model checking algorithms rely on a fully specified model

Page 25:

Some further definitions

Uncertain CTMC
Consider a Population CTMC model M depending on a set of d parameters θ. We only know θ ∈ D, but not the precise value. We call M an Uncertain CTMC (notation: Mθ for fixed θ).

Satisfaction function
Suppose we have a UCTMC M, θ ∈ D, and a Metric Temporal Logic formula ϕ. Then the satisfaction probability of ϕ w.r.t. M is a function of θ:

p(ϕ | θ) = Prob{Mθ, x, 0 |= ϕ}.

We call p(ϕ | θ) the satisfaction function.

Goal: can we find statistical methods to approximate p(ϕ | θ) analytically?


Page 28:

Key property of satisfaction function

Theorem (Bortolussi, Milios and G.S. 2014)
Consider a Population CTMC model M whose transition rates depend polynomially on a set of d parameters θ ∈ D. Let ϕ be a MITL formula. The satisfaction function p(ϕ | θ) : D → [0, 1] is a smooth function of the parameters θ.

This means that we can transfer information across neighbouring points. Can we define a statistical model checking technique which simultaneously computes the whole satisfaction function?


Page 30:

Smoothness proof (sketch)

- Use the fact that the space of trajectories of a CTMC is a countable union of finite-dimensional spaces (indexed by the finite sequence of transitions)

- Write explicitly the measure on each finite-dimensional space

- Show that the series of derivatives converges

Page 31:

Random functions

- Bayesian SMC uses the fact that the satisfaction probability of a formula given a model is a number in [0, 1], and prior distributions on [0, 1] exist (the Beta distribution)

- The object we are interested in here is a smooth function. How do we construct probability distributions over random functions?

- Simplest idea: fix a set of functions ξi(θ), i = 1, . . . , N (basis functions)

- Take a linear combination f of the basis functions with random coefficients wi: f(θ) = ∑i wi ξi(θ)

- If the coefficients are (multivariate) Gaussian distributed, the value of f at any point θ̂ is Gaussian distributed
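The construction above is easy to implement: draw Gaussian weights once and you obtain one sample function. A minimal sketch, where the Fourier-style basis is a hypothetical choice for illustration:

```python
import math, random

rng = random.Random(1)

def sample_random_function(basis, n_coeff):
    """Draw one random function f(x) = Σ_i w_i ξ_i(x) with w ~ N(0, I)."""
    w = [rng.gauss(0.0, 1.0) for _ in range(n_coeff)]
    return lambda x: sum(wi * basis(i, x) for i, wi in enumerate(w))

# Hypothetical basis functions on [0, 1]: alternating cosines and sines
def fourier(i, x):
    return math.cos(math.pi * i * x) if i % 2 == 0 else math.sin(math.pi * (i + 1) * x)

f = sample_random_function(fourier, 6)
print([round(f(x), 3) for x in (0.0, 0.5, 1.0)])  # three evaluations of one sample path
```

Evaluating `f` at any finite set of points gives a jointly Gaussian vector, which is exactly the property developed in the next slides.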


Page 33:

Important exercise

Let ϕ1(x), . . . , ϕN(x) be a fixed set of functions, and let f(x) = ∑ wi ϕi(x). If w ∼ N(0, I), compute:

1. The single-point marginal distribution of f(x)

2. The two-point marginal distribution of (f(x1), f(x2))

- Obviously both distributions are zero-mean Gaussians

- To compute the (co)variance, take products and expectations and remember that 〈wi wj〉 = δij

- Defining ϕ(x) = (ϕ1(x), . . . , ϕN(x)), we get that

〈f(xi) f(xj)〉 = ϕ(xi)ᵀ ϕ(xj)
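The exercise can be checked numerically: simulate many weight draws and compare the empirical covariance with ϕ(xi)ᵀϕ(xj). A small Monte Carlo sketch with the hypothetical basis ϕ(x) = (1, x):

```python
import random

rng = random.Random(0)

def phi(x):
    """Two fixed basis functions: ϕ(x) = (1, x)."""
    return (1.0, x)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x1, x2, n = 0.3, 0.8, 200_000
acc = 0.0
for _ in range(n):
    w = (rng.gauss(0, 1), rng.gauss(0, 1))       # w ~ N(0, I)
    acc += dot(w, phi(x1)) * dot(w, phi(x2))     # f(x1) * f(x2)

empirical = acc / n
analytic = dot(phi(x1), phi(x2))                 # ϕ(x1)ᵀϕ(x2) = 1 + 0.3·0.8
print(empirical, analytic)                       # the two should agree closely
```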


Page 35:

The Gram matrix

- Generalising the exercise to more than two points, we get that any finite-dimensional marginal of this process is multivariate Gaussian

- The covariance matrix is given by evaluating a function of two variables at all possible pairs of points

- This function is defined by the set of basis functions:

k(xi, xj) = ϕ(xi)ᵀ ϕ(xj)

- The covariance matrix is often called the Gram matrix and is (necessarily) symmetric and positive definite

- Bayesian prediction in regression is then essentially the same as computing conditionals of Gaussians (more later)
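The symmetry and positive definiteness of the Gram matrix can be verified constructively: a Cholesky factorisation succeeds exactly when the matrix is positive definite. A dependency-free sketch (the polynomial feature map is a hypothetical choice):

```python
def gram(points, phi):
    """Gram matrix K with K[i][j] = ϕ(x_i)·ϕ(x_j)."""
    feats = [phi(x) for x in points]
    return [[sum(a * b for a, b in zip(u, v)) for v in feats] for u in feats]

def cholesky(K, jitter=1e-9):
    """Cholesky factorisation; succeeds iff K (plus a tiny jitter) is positive definite."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = (s + jitter) ** 0.5
            else:
                L[i][j] = s / L[j][j]
    return L

# Hypothetical polynomial features ϕ(x) = (1, x, x²) at three points
K = gram([0.0, 0.5, 1.0], lambda x: (1.0, x, x * x))
L_factor = cholesky(K)   # no exception: K is symmetric positive definite
```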

Page 36:

Main limitation of basis function regression

- The choice of basis functions inevitably impacts what can be predicted

- Suppose the basis functions tend to zero as x → ∞. What would we predict for very large input values?

- Very large input values will have predicted outputs near zero, with high confidence!

- Ideally, one would want a prior over functions which has the same uncertainty everywhere


Page 38:

Stationary variance

- We have seen that the variance of a random combination of functions depends on space as ∑i ϕi²(x)

- Given any compact set (e.g. a hypercube centred at the origin), we can find a finite set of basis functions s.t. ∑i ϕi²(x) = const (a partition of unity, e.g. triangulations or smoother alternatives)

- We can construct a sequence of such sets which covers the whole of R^D in the limit

- Therefore, we can construct a sequence of priors which all have constant prior variance across all of space

- Covariances would still be computed by evaluating a Gram matrix (and need not be constant)
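A one-dimensional toy instance of the constant-variance idea: the pair (sin, cos) satisfies sin²(x) + cos²(x) = 1 everywhere, so a random combination with N(0, I) weights has prior variance 1 at every input. A Monte Carlo sketch confirming this (the sin/cos basis is an illustrative choice, not from the slides):

```python
import math, random

rng = random.Random(0)

# Two basis functions with ϕ1²(x) + ϕ2²(x) = 1 at every x
basis = (math.sin, math.cos)

def prior_variance(x, n_samples=50_000):
    """Monte Carlo estimate of Var[f(x)] for f = w1·sin + w2·cos, w ~ N(0, I)."""
    acc = 0.0
    for _ in range(n_samples):
        w1, w2 = rng.gauss(0, 1), rng.gauss(0, 1)
        acc += (w1 * basis[0](x) + w2 * basis[1](x)) ** 2
    return acc / n_samples

# The estimate is ≈ 1 at every input, although each basis function varies with x
print([prior_variance(x) for x in (0.0, 1.0, 5.0)])
```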

Page 39:

Function space view

- The argument before shows that we can put a prior over infinite-dimensional spaces of functions s.t. all finite-dimensional marginals are multivariate Gaussian

- This constructive argument, often referred to as the weight-space view, is useful for intuition but impractical

- It does demonstrate the existence of truly infinite-dimensional Gaussian processes

- Once we accept that Gaussian processes exist, we are better off proceeding along a more abstract line

Page 40:

Gaussian Processes

Gaussian Processes are a natural extension of the multivariate normal distribution to infinite-dimensional spaces of functions.

A GP is a probability measure over the space of continuous functions (over a suitable input space) such that the random vector obtained by evaluating a sample function at any finite set of points follows a multivariate normal distribution.

A GP is uniquely defined by its mean and covariance functions, denoted by µ(x) and k(x, x′):

f ∼ GP(µ, k) ↔ f = (f(x1), . . . , f(xN)) ∼ N(µ, K),

µ = (µ(x1), . . . , µ(xN)),  K = (k(xi, xj))i,j


Page 43:

The Radial Basis Function Kernel

The kernel function is the most important ingredient of a GP (the mean function is usually taken to be zero).

Radial Basis Function kernel

k(x, x′) = γ exp[ −‖x − x′‖² / λ² ]

It depends on two hyper-parameters, the amplitude γ and the lengthscale λ. Sample functions from a GP with RBF covariance are, with probability 1, infinitely differentiable.

Fact
Any smooth function can be approximated to arbitrary precision by a sample from a GP with RBF covariance
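Sampling from a GP with RBF covariance on a finite grid is a direct application of the definition: build the Gram matrix and draw from the corresponding multivariate normal. A minimal sketch (grid size, γ, λ, and the jitter value are illustrative choices):

```python
import numpy as np

def rbf(x1, x2, gamma=1.0, lam=0.5):
    """RBF kernel k(x, x') = γ exp(-‖x - x'‖² / λ²)."""
    return gamma * np.exp(-((x1 - x2) ** 2) / lam**2)

rng = np.random.default_rng(0)
xs = np.linspace(0, 1, 50)
K = rbf(xs[:, None], xs[None, :])                    # Gram matrix on the grid
L = np.linalg.cholesky(K + 1e-9 * np.eye(len(xs)))   # small jitter for stability
sample = L @ rng.standard_normal(len(xs))            # one draw f ~ N(0, K)
print(sample.shape)   # (50,)
```

Plotting `sample` against `xs` would show one smooth random function; re-drawing the Gaussian vector gives another.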


Page 45:

Bayesian prediction with GPs

- The joint probability of function values at a set of points is multivariate Gaussian

- If I have observations yi, i = 1, . . . , N, of the function at inputs xi, what can I say about the function value at a new point x∗?

- By Bayes' theorem, we have

p(f∗|y) ∝ ∫ df(x) p(f∗, f(x)) p(y|f(x))   (1)

where f(x) is the vector of function values at the input points

- If p(y|f(x)) is Gaussian, then we have a regression task and the integral in (1) can be computed analytically

Page 46:

Regression calculations

- By the definition of GPs we know that the joint p(f∗, f(x)) in equation (1) is a zero-mean multivariate Gaussian with covariance

Σ = ( k∗∗  k∗ᵀ
      k∗   K )   (2)

where k∗∗ = k(x∗, x∗), k∗j = k(x∗, xj) and Kij = k(xi, xj)

- Supposing the observations are i.i.d. Gaussian around the function values, p(y|f(x)) = N(f(x), σ²I), we can obtain the joint distribution of the observations and the new value

p(f∗, y) = N(0, Σy),   Σy = ( k∗∗  k∗ᵀ
                              k∗   K + σ²I )   (3)


Page 48:

Regression calcs cont’d

- Conditioning on observations means fixing the values of the variables y to the actual measurements in equation (3)

- The posterior distribution is then obtained by completing the square and using the partitioned-inverse formula (see e.g. Rasmussen and Williams, Gaussian Processes for Machine Learning, MIT Press 2006)

- The final result is that p(f∗|y) = N(m, v) with

m = k∗ᵀ (K + σ²I)⁻¹ y,   v = k∗∗ − k∗ᵀ (K + σ²I)⁻¹ k∗   (4)
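The posterior mean and variance formulas above translate into a few lines of linear algebra. A minimal GP-regression sketch (the sine training data, noise level, and kernel hyper-parameters are illustrative choices):

```python
import numpy as np

def rbf(a, b, gamma=1.0, lam=1.0):
    """RBF kernel evaluated at all pairs of points in the 1-D arrays a, b."""
    return gamma * np.exp(-((a[:, None] - b[None, :]) ** 2) / lam**2)

def gp_posterior(x_train, y_train, x_test, noise=0.1):
    """GP regression: posterior mean and variance at the test points."""
    K = rbf(x_train, x_train) + noise**2 * np.eye(len(x_train))
    k_star = rbf(x_train, x_test)               # cross-covariances, (n_train, n_test)
    k_ss = rbf(x_test, x_test)
    alpha = np.linalg.solve(K, y_train)         # (K + σ²I)⁻¹ y, via a solve not an inverse
    mean = k_star.T @ alpha
    var = np.diag(k_ss - k_star.T @ np.linalg.solve(K, k_star))
    return mean, var

x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.sin(x)                                    # noiseless toy observations of sin
mean, var = gp_posterior(x, y, np.array([0.75]))
print(mean, var)   # mean close to sin(0.75) ≈ 0.68, with small posterior variance
```

Using `np.linalg.solve` instead of forming the inverse is the standard numerically stable choice; it is still the cubic-cost operation flagged on the next slide.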

Page 49:

Observations

- Equations (4) provide the predictive distribution analytically at all points in space

- The expectation is always a linear combination of the observations; far-away points contribute less to the prediction (being multiplied by the fast-decaying term k∗j)

- Adding a new observation always reduces uncertainty at all points → vulnerable to outliers

- MAIN PROBLEM: GP prediction relies on a matrix inversion which scales cubically with the number of points!

- Sparsification methods have been proposed, but in high dimensions GP regression is likely to remain tricky nevertheless


GP regression - example

[Figure: GP regression on a 1-d example — the true function, the GP prediction with 95% confidence bands, and the observations.]

Smoothed model checking: An intuitive view

[Figure, shown in three builds: the satisfaction probability of a formula as a function of a model parameter — a smooth curve taking values between 0 and 1.]

GP prediction from binary observations

- Equation (1) holds whatever p(y | f(x)).

- In the model checking case, the truth of a formula at a specified parameter value is a boolean variable following a Bernoulli distribution.

- We can encode this in a GP prediction mechanism and work directly with (few) truth observations at a set of parameter values.

- The integral in (1) is no longer analytically computable; we use Expectation Propagation (EP), an established fast approximate inference method.


Smoothed model checking

We call the following statistical model checking algorithm Smoothed model checking:

1. Input a set of parameter values and a number of observations per parameter value.

2. Perform (approximate) GP prediction to obtain estimates of the satisfaction function and its uncertainty.

3. If the uncertainty is too high, increase the resolution of the parameter grid / the number of observations and repeat.

4. Return the estimated satisfaction function and uncertainties.
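The loop above can be sketched as follows. This is a simplified stand-in, not the method of the talk: the talk uses a Bernoulli observation model with EP inference, whereas this sketch smooths the empirical satisfaction frequencies with plain GP regression, using a binomial-variance noise term; the kernel, lengthscale and toy satisfaction function are assumptions.

```python
import numpy as np

def rbf(a, b, lengthscale=0.3):
    """Squared-exponential kernel on scalar parameter values."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale ** 2)

def smoothed_mc(params, sat_counts, n_runs, grid):
    """GP-smoothed estimate of the satisfaction function.
    Stand-in for the EP/Bernoulli scheme: smooth the empirical
    frequencies y = sat/n with GP regression, using the binomial
    variance y(1-y)/n as a per-point noise level."""
    y = sat_counts / n_runs
    noise = np.maximum(y * (1 - y) / n_runs, 1e-4)
    K = rbf(params, params) + np.diag(noise)
    ks = rbf(params, grid)
    mean = ks.T @ np.linalg.solve(K, y - y.mean()) + y.mean()
    unc = 1.0 - np.sum(ks * np.linalg.solve(K, ks), axis=0)
    return np.clip(mean, 0.0, 1.0), unc

# Toy satisfaction function: a sigmoid in the parameter
rng = np.random.default_rng(1)
params = np.linspace(0, 1, 10)
p_true = 1.0 / (1.0 + np.exp(-10 * (params - 0.5)))
counts = rng.binomial(20, p_true)          # 20 simulation runs per value
est, unc = smoothed_mc(params, counts, 20, np.linspace(0, 1, 50))
```

If `unc` is too large anywhere on the grid, step 3 of the algorithm adds parameter values or simulation runs there and re-fits.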


Network Epidemics

[Figure: transition diagram of the network model — susceptible (S), infected (I) and patched (R) states, with external infection, internal infection, patching and immunity-loss transitions.]

We investigate a SIR-like network epidemics model, with a network of 100 nodes having three states: susceptible (XS), infected (XI), and patched (XR). The dynamics are described by the following reactions:

External infection: S → I, with rate constant ke and rate function ke·XS;

Internal infection: S + I → I + I, with rate constant ki and rate function ki·XS·XI;

Patching: I → R, with rate constant kr and rate function kr·XI;

Immunity loss: R → S, with rate constant ks and rate function ks·XR.
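A minimal Gillespie (SSA) simulator for these four reactions might look as follows; the rate constants, initial state and time horizon in the example are illustrative assumptions, not values used in the talk.

```python
import numpy as np

def ssa_sir(ke, ki, kr, ks, x0=(95, 5, 0), t_end=120.0, rng=None):
    """Gillespie (SSA) simulation of the SIR-like network model.
    State: (X_S, X_I, X_R); the four reactions and rate functions
    follow the slide, while x0 and t_end here are illustrative."""
    rng = rng or np.random.default_rng()
    xs, xi, xr = x0
    t = 0.0
    traj = [(t, (xs, xi, xr))]
    # state updates: external infection, internal infection, patching, immunity loss
    updates = [(-1, 1, 0), (-1, 1, 0), (0, -1, 1), (1, 0, -1)]
    while t < t_end:
        rates = np.array([ke * xs, ki * xs * xi, kr * xi, ks * xr], dtype=float)
        total = rates.sum()
        if total == 0.0:
            break                       # no reaction can fire any more
        t += rng.exponential(1.0 / total)          # time to next reaction
        dxs, dxi, dxr = updates[rng.choice(4, p=rates / total)]
        xs, xi, xr = xs + dxs, xi + dxi, xr + dxr
        traj.append((t, (xs, xi, xr)))
    return traj

traj = ssa_sir(ke=0.01, ki=0.005, kr=0.05, ks=0.01,
               rng=np.random.default_rng(0))
```

Each reaction conserves the node count, so the three populations always sum to 100 along a trajectory.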


Example: Epidemics

SIR model
We investigate the dependence of the truth probability of the property

ϕ = (XI > 0) U[100,120] (XI = 0)

on the infection rate kI and the recovery rate kR. We use ten simulations at each of ten parameter values.

[Figure: estimated satisfaction probability as a function of kI (left) and of kR (right). Blue dots: SMC with 5000 runs per parameter value.]



Example: Epidemics

SIR model - two free parameters
We estimate p(ϕ | kI, kR) from 2560 traces for the same property

ϕ = (XI > 0) U[100,120] (XI = 0)

[Figure: estimated probability as a function of ki and kr. Left: smoothed MC; right: SMC on a 16×16 grid with 5000 runs per point. SMC takes 580× the time of smoothed MC.]



Example: LacZ operon

[Figure: reaction network of the LacZ model — transcription steps (PLac, PLacRNAP, TrLacZ1, TrLacZ2, RbsLacZ) and translation steps (Rbs, Ribosome, TrRbsLacZ, LacZ), with degradation reactions dgrRbsLacZ and dgrLacZ, labelled r_1 to r_11.]

Stochastic model of a celebrated regulatory circuit in E. coli anabolism (Kierzek, 2002).


Example: LacZ operon

Model checking the probability of bursting expression

ϕ = F[16000,21000]( ∆XLacZ > 0 ∧ G[10,2000](∆XLacZ ≤ 0) )

as we vary k2 and k7.

[Figure: estimated probability as a function of k2 and k7. Left: smoothed MC; right: SMC with 100 draws per parameter value.]


Quantitative evaluation

Table: RMSE for the expression burst formula (LacZ model); the k2 and k7 parameters have been explored simultaneously. A grid of 256 parameter values and a varying number of observations per value has been used. The true values were approximated via SMC using 2000 simulation runs. The RMSE values presented are the average of 5 independent experiment iterations.

Obs. per value   Bayesian SMC      Smoothed MC (100 points)   Smoothed MC (256 points)
5                0.1819 ± 0.017    0.0513 ± 0.033             0.0390 ± 0.011
10               0.1246 ± 0.008    0.0412 ± 0.006             0.0322 ± 0.003
20               0.0954 ± 0.010    0.0375 ± 0.011             0.0248 ± 0.003
50               0.0588 ± 0.005    0.0266 ± 0.005             0.0200 ± 0.004
100              0.0427 ± 0.006    0.0200 ± 0.002             0.0159 ± 0.001
200              0.0309 ± 0.003    0.0171 ± 0.002             0.0141 ± 0.002


Outline

Motivation and background
  Properties and qualitative data
  CTMCs and model checking

Smoothed model checking
  Uncertainty and smoothness
  Gaussian processes
  Smoothed model checking results

Robust system design from formal specification
  Signal Temporal Logic
  GP-UCB optimisation
  Experimental Results

U-check


Robust Design with Temporal Logic

Qualitative data
We consider a set of system requirements specified as (linear) temporal logic formulae.

Problem: Robust Synthesis/Design
Find the set of model parameters such that the model satisfies the requirements as robustly as possible.

Notion of robustness
This requires a proper notion of robustness of satisfaction for a temporal logic formula.



Signal Temporal Logic

STL is a metric, linear-time logic for real-valued signals.

STL Syntax
Given a (primary) real-valued signal x[t] = (x1[t], ..., xn[t]), t ∈ R≥0, xi ∈ R, the STL syntax is given by

ϕ := µ | ¬ϕ | ϕ1 ∧ ϕ2 | ϕ1 U[a,b] ϕ2

- µ : Rⁿ → B is an atomic predicate s.t. µ(x) := (y(x) > 0),
- y : Rⁿ → R is a real-valued function, the secondary signal.

As usual, F[a,b]ϕ := tt U[a,b] ϕ and G[a,b]ϕ := ¬F[a,b]¬ϕ.



Signal Temporal Logic

STL Quantitative Semantics
The quantitative satisfaction function ρ is defined by

ρ(µ, x, t) = y(x[t])   where µ ≡ (y(x[t]) > 0)

ρ(¬ϕ, x, t) = −ρ(ϕ, x, t)

ρ(ϕ1 ∧ ϕ2, x, t) = min( ρ(ϕ1, x, t), ρ(ϕ2, x, t) )

ρ(ϕ1 U[a,b] ϕ2, x, t) = max_{t′ ∈ t+[a,b]} min( ρ(ϕ2, x, t′), min_{t″ ∈ [t,t′]} ρ(ϕ1, x, t″) )

This satisfaction score can be computed efficiently for piecewise-linear signals; see the Breach MATLAB toolbox.
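The semantics above can be sketched for uniformly sampled, piecewise-constant signals (Breach handles the piecewise-linear case properly); the discretisation and the helper names below are assumptions of this sketch, not part of the talk.

```python
import numpy as np

# Robustness signals over a uniformly sampled trace x[t] (time step dt);
# an atomic predicate mu is y(x) > 0 for a secondary signal y.

def atomic(y):                         # rho(mu, x, t) = y(x[t])
    return np.asarray(y, dtype=float)

def neg(rho):                          # rho(not phi) = -rho(phi)
    return -np.asarray(rho, dtype=float)

def conj(rho1, rho2):                  # rho(phi1 and phi2) = pointwise min
    return np.minimum(rho1, rho2)

def until(rho1, rho2, a, b, dt=1.0):
    """rho(phi1 U_[a,b] phi2, t): max over t' in [t+a, t+b] of
    min(rho2(t'), min over t'' in [t, t'] of rho1(t''))."""
    ia, ib = int(round(a / dt)), int(round(b / dt))
    n = len(rho1)
    out = np.full(n, -np.inf)
    for t in range(n):
        for tp in range(min(t + ia, n - 1), min(t + ib, n - 1) + 1):
            out[t] = max(out[t], min(rho2[tp], rho1[t:tp + 1].min()))
    return out

def eventually(rho, a, b, dt=1.0):     # F_[a,b] phi = tt U_[a,b] phi
    return until(np.full(len(rho), np.inf), rho, a, b, dt)

def globally(rho, a, b, dt=1.0):       # G_[a,b] phi = not F_[a,b] not phi
    return neg(eventually(neg(rho), a, b, dt))

# Example: robustness of F_[0,3] G_[0,1] (x >= 3) on a short trace
x = np.array([0, 1, 2, 4, 5, 5, 1], dtype=float)
score = eventually(globally(atomic(x - 3), 0, 1), 0, 3)[0]
```

On this trace the score is positive: within 3 time steps the signal stays at least 1 above the threshold over a full unit interval (x = 4 then 5), so the property is satisfied with margin 1.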


Satisfiability of Stochastic Models
The boolean semantics of an STL formula ϕ can be easily extended to stochastic models, as customary, by measuring the probability of the set of trajectories of the CTMC that satisfy the formula:

P(ϕ) = P{x | x |= ϕ}.

- The stochastic process X(t) is interpreted as a random variable X over the space of trajectories D.

- The set of trajectories that satisfy an STL formula ϕ can be characterised by a measurable function

Iϕ : D → {0,1}, s.t. Iϕ(x) = 1 ⇔ x |= ϕ.

Hence,

P(Iϕ(X) = 1) = P({x ∈ D | Iϕ(x) = 1}) = P(Iϕ⁻¹(1)) = P{x | x |= ϕ} = P(ϕ).



Robustness of Stochastic Models

The quantitative satisfaction function

ρ(ϕ, ·) : D → R

is measurable with respect to the σ-algebra induced by the Skorokhod topology on D, and hence induces a real-valued random variable Rϕ(X) with probability distribution

P(Rϕ(X) ∈ [a,b]) = P(X ∈ {x ∈ D | ρ(ϕ, x, 0) ∈ [a,b]}).

Indicators:
- E(Rϕ) (the average robustness degree)
- E(Rϕ | Rϕ > 0) and E(Rϕ | Rϕ < 0) (the conditional averages)

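Given robustness degrees sampled by simulation, the indicators above can be estimated by simple Monte Carlo averages; the function below is an illustrative sketch, with `robustness_indicators` a name introduced here.

```python
import numpy as np

def robustness_indicators(scores):
    """Monte Carlo estimates of the indicators from sampled robustness
    degrees rho(phi, x_i) of simulated trajectories x_i."""
    s = np.asarray(scores, dtype=float)
    pos, neg = s[s > 0], s[s < 0]
    return {
        "sat_prob": float(np.mean(s > 0)),                   # P(phi)
        "avg_robustness": float(s.mean()),                   # E(R_phi)
        "avg_given_sat":                                     # E(R_phi | R_phi > 0)
            float(pos.mean()) if len(pos) else float("nan"),
        "avg_given_viol":                                    # E(R_phi | R_phi < 0)
            float(neg.mean()) if len(neg) else float("nan"),
    }

indicators = robustness_indicators([2.0, 4.0, -1.0, -3.0, 6.0, -2.0])
```

The conditional averages separate "how robustly we satisfy" from "how badly we fail", which the plain average E(Rϕ) mixes together.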


Schlögl system
CTMC model of a Schlögl system. Biochemical reactions of the Schlögl model:

Reaction        Rate constant    Initial population
A + 2X → 3X     k1 = 3·10⁻⁷      X(0) = 247
3X → A + 2X     k2 = 1·10⁻⁴      A(0) = 10⁵
B → X           k3 = 1·10⁻³      B(0) = 2·10⁵
X → B           k4 = 3.5

Simulation of the Schlögl model (100 runs): starting close to the boundary of the basin of attraction, the bistable behaviour is evident.

[Figure: 100 simulated trajectories of X against time, showing the bistable behaviour.]



Schlögl system

STL formula: ϕ : F[0,T1] G[0,T2] (X ≥ kt), with kt = 300.

[Figure: histogram of the robustness degree over simulated trajectories.]

Statistical estimation of ϕ: p = 0.4583 (10000 runs, error ±0.02 at 95% confidence level).


Schlögl system

Satisfaction probability versus average robustness degree

[Figure, left: average robustness degree against satisfaction probability when varying the threshold kt. Correlation around 0.8386; the dependency seems to follow a sigmoid-shaped curve.]

[Figure, right: the same when varying k3. Correlation around 0.9718, with an evident linear trend.]


The System Design Problem

“Given a stochastic model depending on a set of parameters θ ∈ K, and a specification ϕ (an STL formula), find the parameter combination θ∗ such that the system satisfies ϕ as robustly as possible.”

Solution strategy:
- rephrase it as an unconstrained optimisation problem: we maximise the expected robustness degree;
- evaluate the function to optimise using statistical model checking with a fixed number of runs;
- solve the optimisation problem using Gaussian Process Upper Confidence Bound (GP-UCB) optimisation.



The optimisation problem

- We need to maximise the expected robustness.

- Each evaluation of this function is costly: we obtain it by SMC, which requires many SSA runs.

- Each evaluation of this function is noisy: we estimate the value by SMC, and the noise is approximately Gaussian.

We need to maximise an unknown function, which we can only observe with noise, using a minimal number of evaluations.

We use the GP-UCB algorithm (Srinivas et al., 2012).



The GP-UCB algorithm

Basic Idea
Use GP regression to emulate the unknown function, and explore the region near the maximum of the posterior mean. Doing this naively ⇒ trapped in local optima.

Balance Exploration and Exploitation
Maximise an upper quantile of the distribution, obtained as the mean value plus a constant times the standard deviation:

x_{t+1} = argmax_x [ µ_t(x) + β_t σ_t(x) ]

Then x_{t+1} is added to the observation points.

The algorithm has a convergence guarantee in terms of regret bounds (for slowly increasing β_t).
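A minimal 1-d sketch of the GP-UCB loop, with a fixed β and a grid search for the acquisition maximum; the kernel, hyperparameters and toy target are illustrative assumptions (the actual algorithm lets β_t grow slowly to obtain the regret bounds).

```python
import numpy as np

def rbf(a, b, lengthscale=0.3):
    """Squared-exponential kernel on scalar inputs."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale ** 2)

def gp_ucb(f_noisy, grid, n_iter=25, beta=2.0, noise_var=0.01):
    """GP-UCB on a 1-d grid: fit a GP to the noisy evaluations collected
    so far, then query the maximiser of mu(x) + beta * sigma(x)."""
    X = [grid[len(grid) // 2]]           # start from the middle of the grid
    Y = [f_noisy(X[0])]
    for _ in range(n_iter):
        xs = np.array(X)
        K = rbf(xs, xs) + noise_var * np.eye(len(xs))
        ks = rbf(xs, grid)
        mu = ks.T @ np.linalg.solve(K, np.array(Y))
        var = np.maximum(1.0 - np.sum(ks * np.linalg.solve(K, ks), axis=0), 1e-12)
        x_next = grid[np.argmax(mu + beta * np.sqrt(var))]   # upper confidence bound
        X.append(x_next)
        Y.append(f_noisy(x_next))
    return X[int(np.argmax(Y))]          # best observed point

# Toy target: a smooth function observed with small Gaussian noise
rng = np.random.default_rng(0)
f = lambda x: -(x - 0.7) ** 2 + 0.02 * rng.normal()
x_best = gp_ucb(f, np.linspace(0, 1, 101))
```

Early on, the variance term dominates and the loop explores the whole grid; as the posterior tightens, it exploits the region around the emerging maximum.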


The GP-UCB algorithm - example

[Figure, shown in six builds: successive GP-UCB iterations on a one-dimensional test function, with the GP posterior refined as evaluation points are added.]


Experimental Results (Schlögl System)

Statistics of the results of ten experiments to optimise the parameter k3, for ϕ : F[0,T1] G[0,T2] (X ≥ kt), in the range [50, 1000]:

Parameter mean   Parameter range     Mean probability
k3 = 997.78      [979.31, 999.99]    1

Average robustness   Function evaluations   Simulation runs
348.97               34.4                   3440

[Figure: the emulated robustness function in the optimisation of k3.]


Experimental Results (Schlögl System)

The distribution of the robustness score for ϕ : F[0,T1] G[0,T2] (X ≥ kt) with k3 = 999.99, T1 = 10, T2 = 15 and kt = 300.

[Figure: histogram of the robustness degree.]


U-check

- Open-source Java implementation of smoothed model checking and learning/designing from qualitative data

- Takes as input models specified in PRISM, Bio-PEPA or SimHyA

- Outputs results readable/analysable with MATLAB or GNUplot

- Java source code available at https://github.com/dmilios/U-check


Other related things

I Learning parametrisations from formulae truth observations (QEST 2013)

I Learning parametric formulae from time series data (with E. Bartocci, FORMATS 2014)

I Spatio-temporal generalisations of robust system design (with E. Bartocci and L. Nenzi, HSB 2015)

I Statistical abstractions of stiff dynamical systems for efficient simulation (CMSB 2015)


Conclusions

I Uncertainty has implications also for formal analysis techniques

I When there is uncertainty, machine learning is likely to be of use

I We proposed methods relying on Gaussian processes to efficiently compute the probability of a property as a function of the model parameters, and to train models from qualitative observations

I Similar ideas can be applied to other performance metrics, such as expected rewards

I We have just released a tool: U-check



The End

and thanks to the ERC!


Bibliography

I L. Bortolussi, D. Milios and G. S. Smoothed model checking for uncertain continuous time Markov chains. http://arxiv.org/pdf/1402.1450.pdf, to appear in Information and Computation.

I E. Bartocci, L. Bortolussi, L. Nenzi, and G. S. On the robustness of temporal properties for stochastic models. Electronic Proceedings in Theoretical Computer Science, 125:3–19, Aug. 2013.

I E. Bartocci, L. Bortolussi, L. Nenzi and G. S. System design of stochastic models using robustness of temporal properties. Theoretical Computer Science, 587:3–25.

I L. Bortolussi and G. Sanguinetti. Learning and designing stochastic processes from logical constraints. In Proceedings of QEST 2013; extended version in LMCS 2015.

I L. Bortolussi, D. Milios and G. S. U-check: model checking and parameter synthesis under uncertainty. In Proceedings of QEST 2015.

I C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, Mass., 2006.

I N. Srinivas, A. Krause, S. M. Kakade, and M. W. Seeger. Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE Transactions on Information Theory, 58(5):3250–3265, May 2012.

I A. Donzé and O. Maler. Robust satisfaction of temporal logic over real-valued signals. In Proceedings of FORMATS, 2010.