Part 5 - Prediction Error Methods

System Identification, Control Engineering B.Sc., 3rd year, Technical University of Cluj-Napoca, Romania. Lecturer: Lucian Buşoniu


Page 1: Part 5 - Prediction Error Methods

System Identification
Control Engineering B.Sc., 3rd year
Technical University of Cluj-Napoca, Romania

Lecturer: Lucian Buşoniu

Page 2: Part 5 - Prediction Error Methods

ARX models Model structures General PEM Implementation & Extensions

Part V

Prediction error methods

Page 3: Part 5 - Prediction Error Methods


Table of contents

1 Identification of ARX models

2 Model structures

3 General prediction error methods

4 Implementation and extensions

We stay in the single-input, single-output case for all but the last section.

Page 4: Part 5 - Prediction Error Methods


Classification

Recall Types of models from Part I:

1 Mental or verbal models
2 Graphs and tables (nonparametric)
3 Mathematical models, with two subtypes:

First-principles, analytical models
Models from system identification

Prediction error methods produce parametric, polynomial models.

Page 5: Part 5 - Prediction Error Methods


Table of contents

1 Identification of ARX models

Analytical development

Theoretical guarantee

Matlab example

2 Model structures

3 General prediction error methods

4 Implementation and extensions

Page 6: Part 5 - Prediction Error Methods


Recall: discrete-time model

Page 7: Part 5 - Prediction Error Methods


ARX model structure

We consider first the ARX model structure, where the output y(k) at the current discrete time step is computed based on previous input and output values:

y(k) + a1y(k − 1) + a2y(k − 2) + . . . + anay(k − na)

= b1u(k − 1) + b2u(k − 2) + . . . + bnbu(k − nb) + e(k)

equivalent to

y(k) = −a1y(k − 1) − a2y(k − 2) − . . . − anay(k − na)

+ b1u(k − 1) + b2u(k − 2) + . . . + bnbu(k − nb) + e(k)

e(k) is the noise at step k.

Model parameters: a1, a2, . . . , ana and b1, b2, . . . , bnb.

Name: AutoRegressive (y(k) a function of previous y values) with eXogenous input (dependence on u)
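As a quick illustration, the ARX difference equation can be simulated directly. A Python sketch (the lectures use Matlab; the coefficients and input below are made up, and e(k) is set to 0):

```python
# Simulate an ARX model: y(k) = -a1*y(k-1) - ... - ana*y(k-na)
#                               + b1*u(k-1) + ... + bnb*u(k-nb) + e(k).
# Coefficients and input are illustrative; e defaults to the zero signal.
def simulate_arx(a, b, u, e=None):
    """a, b: coefficient lists [a1..ana], [b1..bnb]; u: input sequence."""
    N = len(u)
    e = e or [0.0] * N
    y = []
    for k in range(N):
        yk = e[k]
        for i, ai in enumerate(a):          # -a1*y(k-1) - ... - ana*y(k-na)
            if k - 1 - i >= 0:
                yk -= ai * y[k - 1 - i]
        for j, bj in enumerate(b):          # b1*u(k-1) + ... + bnb*u(k-nb)
            if k - 1 - j >= 0:
                yk += bj * u[k - 1 - j]
        y.append(yk)
    return y

# First-order example, impulse input:
y = simulate_arx([0.5], [1.0], u=[1.0, 0.0, 0.0, 0.0])
# y = [0.0, 1.0, -0.5, 0.25]
```

Each output sample is just the weighted sum of past outputs and inputs, which is all the ARX recursion says.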

Page 8: Part 5 - Prediction Error Methods


Polynomial representation

Backward shift operator q−1:

q−1z(k) = z(k − 1)

where z(k) is any discrete-time signal.

Then:

y(k) + a1y(k − 1) + a2y(k − 2) + . . . + anay(k − na)
= (1 + a1q−1 + a2q−2 + . . . + anaq−na) y(k) =: A(q−1)y(k)

and:

b1u(k − 1) + b2u(k − 2) + . . . + bnbu(k − nb)
= (b1q−1 + b2q−2 + . . . + bnbq−nb) u(k) =: B(q−1)u(k)

Page 9: Part 5 - Prediction Error Methods


ARX model in polynomial form

Therefore, the ARX model is written compactly:

A(q−1)y(k) = B(q−1)u(k) + e(k)

The symbolic representation in the figure holds because:

y(k) = (1/A(q−1)) [B(q−1)u(k) + e(k)]

Remark: The ARX model is quite general: it can describe arbitrary linear relationships between inputs and outputs. However, the noise enters the model in a restricted way, and later we introduce models that generalize this.

Page 10: Part 5 - Prediction Error Methods


Linear regression model

Returning to the explicit recursive representation:

y(k) = −a1y(k − 1) − a2y(k − 2) − . . . − anay(k − na)
       + b1u(k − 1) + b2u(k − 2) + . . . + bnbu(k − nb) + e(k)

     = [−y(k − 1) · · · −y(k − na) u(k − 1) · · · u(k − nb)] · [a1 · · · ana b1 · · · bnb]> + e(k)

     =: ϕ>(k)θ + e(k)

So in fact ARX obeys the standard model structure in linear regression!

Regressor vector: ϕ(k) ∈ Rna+nb, containing previous output and input values.
Parameter vector: θ ∈ Rna+nb, containing the polynomial coefficients.

Page 11: Part 5 - Prediction Error Methods


Identification problem

Consider now that we are given a vector of data u(k), y(k),k = 1, . . . , N, and we have to find the model parameters θ.

Then for any k:

y(k) = ϕ>(k)θ + ε(k)

where ε(k) is now interpreted as an equation error.

Objective: minimize the mean squared error:

V(θ) = (1/N) ∑_{k=1}^{N} ε(k)²

Remark: When k ≤ na or k ≤ nb, zero- and negative-time values of u and y are needed to construct ϕ(k). They can be taken as 0 (assuming the system starts from zero initial conditions).
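The remark above can be sketched in code: a small Python helper (hypothetical, not from any toolbox) that builds ϕ(k) and substitutes 0 for zero- and negative-time samples:

```python
# Build the ARX regressor phi(k) = [-y(k-1) ... -y(k-na), u(k-1) ... u(k-nb)],
# taking zero- and negative-time samples as 0 (zero initial conditions).
def regressor(y, u, k, na, nb):
    """y, u: 1-indexed signals stored with index 0 unused; k >= 1."""
    def at(z, i):                      # z(i), or 0 for i <= 0
        return z[i] if i >= 1 else 0.0
    return [-at(y, k - i) for i in range(1, na + 1)] + \
           [at(u, k - j) for j in range(1, nb + 1)]

y = [None, 1.0, 2.0, 3.0]   # y(1), y(2), y(3)
u = [None, 0.5, 0.5, 0.5]
phi = regressor(y, u, 2, na=2, nb=1)
# phi(2) = [-y(1), -y(0)=0, u(1)] = [-1.0, 0.0, 0.5]
```

For k = 2 and na = 2 the regressor needs y(0), which is simply replaced by 0.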

Page 12: Part 5 - Prediction Error Methods


Linear system

Writing y(k) = ϕ>(k)θ for each data point (the errors ε(k) are left implicit):

y(1) = [−y(0) · · · −y(1 − na) u(0) · · · u(1 − nb)] · θ
y(2) = [−y(1) · · · −y(2 − na) u(1) · · · u(2 − nb)] · θ
· · ·
y(N) = [−y(N − 1) · · · −y(N − na) u(N − 1) · · · u(N − nb)] · θ

Matrix form:

Y = Φθ

with Y = [y(1), y(2), . . . , y(N)]> ∈ RN, and Φ ∈ RN×(na+nb) the matrix whose k-th row is the regressor ϕ>(k).

Page 13: Part 5 - Prediction Error Methods


ARX solution

From linear regression, to minimize (1/2) ∑_{k=1}^{N} ε(k)² the parameters are:

θ = (Φ>Φ)−1Φ>Y

Since the new V(θ) = (1/N) ∑_{k=1}^{N} ε(k)² is proportional to the criterion above, the same solution also minimizes V(θ).

However, the form above is impractical in system identification, since the number of data points N can be very large (forming and storing Φ explicitly is expensive). Better form:

Φ>Φ = ∑_{k=1}^{N} ϕ(k)ϕ>(k),   Φ>Y = ∑_{k=1}^{N} ϕ(k)y(k)

⇒ θ = [∑_{k=1}^{N} ϕ(k)ϕ>(k)]−1 [∑_{k=1}^{N} ϕ(k)y(k)]

Page 14: Part 5 - Prediction Error Methods


ARX solution (continued)

Remaining issue: the sum of N terms can grow very large, leading to numerical problems: (matrix of very large numbers)−1 · (vector of very large numbers).

Solution: Normalize the element values by dividing them by N. In the equations, N cancels, so it has no effect on the analytical development, but in practice it keeps the numbers reasonable.

θ = [(1/N) ∑_{k=1}^{N} ϕ(k)ϕ>(k)]−1 [(1/N) ∑_{k=1}^{N} ϕ(k)y(k)]
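Putting the pieces together, a minimal Python sketch of ARX identification with the normalized sums, on noise-free first-order data (all values illustrative); pure stdlib, with the 2 × 2 system solved by Cramer's rule:

```python
import random

# Estimate theta = [a1, b1] of a first-order ARX model from data, using the
# normalized sums (1/N) sum phi*phi^T and (1/N) sum phi*y. Data is noise-free
# here, so the true parameters are recovered exactly up to rounding.
random.seed(0)
a_true, b_true = 0.5, 2.0
N = 200
u = [random.uniform(-1, 1) for _ in range(N)]
y = [0.0] * N
for k in range(1, N):
    y[k] = -a_true * y[k - 1] + b_true * u[k - 1]

# Accumulate the normalized sums over k = 1 .. N-1.
R = [[0.0, 0.0], [0.0, 0.0]]   # (1/N) sum phi(k) phi(k)^T
r = [0.0, 0.0]                 # (1/N) sum phi(k) y(k)
for k in range(1, N):
    phi = [-y[k - 1], u[k - 1]]
    for i in range(2):
        r[i] += phi[i] * y[k] / N
        for j in range(2):
            R[i][j] += phi[i] * phi[j] / N

# Solve the 2x2 system R * theta = r by Cramer's rule.
det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
theta = [(r[0] * R[1][1] - r[1] * R[0][1]) / det,
         (R[0][0] * r[1] - R[1][0] * r[0]) / det]
# theta ~ [0.5, 2.0]
```

Dividing each accumulated term by N, as the slide suggests, keeps the entries of R and r at the scale of the signals themselves regardless of N.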

Page 15: Part 5 - Prediction Error Methods


Table of contents

1 Identification of ARX models

Analytical development

Theoretical guarantee

Matlab example

2 Model structures

3 General prediction error methods

4 Implementation and extensions

Page 16: Part 5 - Prediction Error Methods


Main result

Assumptions
1 There exists a true parameter vector θ0 so that:

y(k) = ϕ>(k)θ0 + v(k)

with v(k) a stationary stochastic process independent from u(k).
2 E{ϕ(k)ϕ>(k)} is a nonsingular matrix.
3 E{ϕ(k)v(k)} = 0.

Theorem

ARX identification is consistent: the estimated parameters θ tend to the true parameters θ0 as N → ∞.

Page 17: Part 5 - Prediction Error Methods


Discussion of assumptions

1 Assumption 1 is equivalent to the existence of true polynomials A0(q−1), B0(q−1) so that:

A0(q−1)y(k) = B0(q−1)u(k) + v(k)

2 To motivate Assumption 2, recall

θ = [(1/N) ∑_{k=1}^{N} ϕ(k)ϕ>(k)]−1 [(1/N) ∑_{k=1}^{N} ϕ(k)y(k)]

As N → ∞, (1/N) ∑_{k=1}^{N} ϕ(k)ϕ>(k) → E{ϕ(k)ϕ>(k)}. This expectation is nonsingular if the data is "sufficiently informative" (e.g., u(k) should not be a simple feedback from y(k); see Söderström & Stoica for more discussion).

3 E{ϕ(k)v(k)} = 0 if v(k) is white noise. For more details on Assumption 3 and the role of E{ϕ(k)v(k)} = 0, see Part VI.

Page 18: Part 5 - Prediction Error Methods


Table of contents

1 Identification of ARX models

Analytical development

Theoretical guarantee

Matlab example

2 Model structures

3 General prediction error methods

4 Implementation and extensions

Page 19: Part 5 - Prediction Error Methods


Experimental data

Consider we are given the following separate identification and validation data sets.

plot(id); and plot(val);

Remarks: Identification input: a so-called pseudo-random binary signal, an approximation of (non-zero-mean) white noise. Validation input: a sequence of steps.

Page 20: Part 5 - Prediction Error Methods


Identifying an ARX model

model = arx(id, [na, nb, nk]);

Arguments:

1 Identification data.
2 Array containing the orders of A and B, and the delay nk.

Structure slightly different from theory: includes the explicit minimum delay nk between inputs and outputs, useful to model systems with time delays.

y(k) + a1y(k − 1)+a2y(k − 2) + . . . + anay(k − na)

= b1u(k − nk)+b2u(k − nk − 1) + . . . + bnbu(k − nk − nb + 1) + e(k)

A(q−1)y(k) = B(q−1)u(k − nk) + e(k), where:

A(q−1) = (1 + a1q−1 + a2q−2 + . . . + anaq−na)

B(q−1) = (b1 + b2q−1 + . . . + bnbq−nb+1)

Note: can transform this into the theoretical structure by rewriting the right-hand side using a B(q−1) of order nk + nb − 1, with nk − 1 leading zeros.
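The reindexing in the note amounts to prepending zeros to the coefficient list; a one-line Python sketch (hypothetical helper, illustrative coefficients):

```python
# Rewrite a Matlab-style B polynomial (coefficients b1..bnb acting at delays
# nk, nk+1, ...) into the theoretical form where coefficients act at delays
# 1, 2, ...: simply prepend nk - 1 leading zeros.
def to_theoretical_b(b, nk):
    return [0.0] * (nk - 1) + list(b)

# b1=1.0, b2=0.5 at delay nk=3 become a 4-coefficient B with 2 leading zeros:
print(to_theoretical_b([1.0, 0.5], nk=3))  # [0.0, 0.0, 1.0, 0.5]
```

With nk = 1 the list is unchanged, matching the theoretical structure directly.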

Page 21: Part 5 - Prediction Error Methods


Model validation

Assuming the system is second-order with no time delay, we take na = 2, nb = 1, nk = 1. Validation: compare(model, val);

Results are quite bad.

Page 22: Part 5 - Prediction Error Methods


Structure selection

Better idea: try many different structures and choose the best one.

Na = 1:15; Nb = 1:15; Nk = 1:5;
NN = struc(Na, Nb, Nk);
V = arxstruc(id, val, NN);

struc generates all combinations of orders in Na, Nb, Nk. arxstruc identifies an ARX model for each combination (on the data in the 1st argument), simulates it (on the data in the 2nd argument), and returns all the MSEs on the first row of V (see help arxstruc for the format of V).

Page 23: Part 5 - Prediction Error Methods


Structure selection (continued)

To choose the structure with the smallest MSE:

N = selstruc(V, 0);

For our data, N = [8, 7, 1].

Alternatively, graphical selection: N = selstruc(V, 'plot');

(Later we learn other structure selection criteria than smallest MSE.)

Page 24: Part 5 - Prediction Error Methods


Validation of best ARX model

model = arx(id, N); compare(model, val);

A better fit is obtained.

Page 25: Part 5 - Prediction Error Methods


Table of contents

1 Identification of ARX models

2 Model structures

3 General prediction error methods

4 Implementation and extensions

Page 26: Part 5 - Prediction Error Methods


Motivation

Before deriving the general PEM, we will introduce the general class of models to which it can be applied.

Page 27: Part 5 - Prediction Error Methods


General model structure

y(k) = G(q−1)u(k) + H(q−1)e(k)

where G and H are discrete-time transfer functions – ratios of polynomials. Signal e(k) is zero-mean white noise.

By collecting the factors common to the denominators of G and H into A(q−1), we get the more detailed form:

A(q−1)y(k) = [B(q−1)/F(q−1)] u(k) + [C(q−1)/D(q−1)] e(k)

where A, B, C, D, F are all polynomials, of orders na, nb, nc, nd, nf.

Page 28: Part 5 - Prediction Error Methods


General model structure (continued)

A(q−1)y(k) = [B(q−1)/F(q−1)] u(k) + [C(q−1)/D(q−1)] e(k)

Very general form: all other linear polynomial forms are special cases of it. It is not meant for practical use, but to describe algorithms in a generic way.

Page 29: Part 5 - Prediction Error Methods


ARMAX model structure

Setting F = D = 1 (i.e. orders nf = nd = 0), we get:

A(q−1)y(k) = B(q−1)u(k) + C(q−1)e(k)

Name: AutoRegressive, Moving Average (referring to the noise model), with eXogenous input (dependence on u)

Page 30: Part 5 - Prediction Error Methods


ARMAX: explicit form

A(q−1)y(k) = B(q−1)u(k) + C(q−1)e(k)

A(q−1) = 1 + a1q−1 + . . . + anaq−na

B(q−1) = b1q−1 + . . . + bnbq−nb

C(q−1) = 1 + c1q−1 + . . . + cncq−nc

y(k) + a1y(k − 1) + . . . + anay(k − na)

= b1u(k − 1) + . . . + bnbu(k − nb)

+ e(k) + c1e(k − 1) + . . . + cnce(k − nc)

with parameter vector:

θ = [a1, . . . , ana, b1, . . . , bnb, c1, . . . , cnc ]> ∈ Rna+nb+nc

Compared to ARX, ARMAX can model more intricate relationships between the disturbance and the inputs and outputs.

Page 31: Part 5 - Prediction Error Methods


Special case of ARMAX: ARX

Setting C = 1 in ARMAX (nc = 0), we get:

A(q−1)y(k) = B(q−1)u(k) + e(k)

precisely the ARX model we worked with before.

Page 32: Part 5 - Prediction Error Methods


Special case of ARX: FIR

Further setting A = 1 (na = 0) in ARX, we get:

y(k) = B(q−1)u(k) + e(k) = ∑_{j=1}^{nb} bj u(k − j) + e(k)

     = ∑_{j=0}^{M−1} h(j)u(k − j) + e(k)

the FIR model from correlation analysis!

(where h(0), the impulse response at time 0, is assumed to be 0 – i.e., the system does not respond instantaneously to changes in the input)

Page 33: Part 5 - Prediction Error Methods


Difference between ARX and FIR

ARX: A(q−1)y(k) = B(q−1)u(k) + e(k)

FIR: y(k) = B(q−1)u(k) + e(k)

Since ARX includes recursive relationships between current and previous outputs, it will be sufficient to take orders na and nb (at most) equal to the order of the dynamical system.

FIR needs a sufficiently large order nb (or length M) to model the entire transient regime of the impulse response (in principle, we only recover the correct model as M → ∞).

⇒ more parameters ⇒ more data needed to identify them.
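A small Python sketch of this difference (illustrative coefficients): a single ARX pole produces the infinite impulse response h(j) = b(−a)^(j−1), which a length-M FIR model can only truncate:

```python
# Impulse response of the first-order ARX system y(k) = -a*y(k-1) + b*u(k-1):
# closed form h(j) = b*(-a)^(j-1) for j >= 1. It never becomes exactly zero,
# so a FIR model of length M captures only the first M terms.
a, b = 0.5, 1.0
M = 20
u = [1.0] + [0.0] * (M - 1)          # unit impulse
y = [0.0] * (M + 1)
for k in range(1, M + 1):
    y[k] = -a * y[k - 1] + b * u[k - 1]

h = [b * (-a) ** (j - 1) for j in range(1, M + 1)]   # closed form
# y(j) matches h(j) for j = 1..M; the tail beyond any finite M is nonzero.
```

Two ARX parameters (a, b) describe what a FIR model needs M coefficients to approximate.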

Page 34: Part 5 - Prediction Error Methods


Overall relationship

General Form ⊃ ARMAX ⊃ ARX ⊃ FIR

ARMAX to ARX: less freedom in modeling the disturbance.
ARX to FIR: more parameters required.

Page 35: Part 5 - Prediction Error Methods


Output error

Other model forms are possible that are not special cases of ARMAX, e.g. Output Error (OE):

y(k) = [B(q−1)/F(q−1)] u(k) + e(k)

obtained for na = nc = nd = 0, i.e. A = C = D = 1.

This corresponds to simple, additive measurement noise on the output (the "output error").

Page 36: Part 5 - Prediction Error Methods


Table of contents

1 Identification of ARX models

2 Model structures

3 General prediction error methods

Stepping stones: ARX and ARMAX

General case

Matlab example

Theoretical guarantees

4 Implementation and extensions

Page 37: Part 5 - Prediction Error Methods


ARX reinterpreted as a PEM

1 Compute predictions at each step, ŷ(k) = ϕ>(k)θ, given parameters θ.
2 Compute prediction errors at each step, ε(k) = y(k) − ŷ(k).
3 Find a parameter vector θ minimizing the criterion V(θ) = (1/N) ∑_{k=1}^{N} ε²(k).

Prediction error methods are obtained by extending this procedure to general model structures.

Page 38: Part 5 - Prediction Error Methods


ARX reinterpreted as a PEM (continued)

Remarks:

The ARX predictor ŷ(k) is chosen to achieve the error ε(k) = e(k). We will aim to achieve the same error in the general PEM, intuitively because we cannot do better. (For the mathematically inclined: such a predictor is optimal in the sense of minimizing the variance of ε(k).)

The prediction error is, for ARX, just a rearrangement of the equation y(k) = ϕ>(k)θ + ε(k) = ŷ(k) + ε(k).

The criterion can be more general than the MSE, but we will always use the MSE in this section. For ARX, we already know how to minimize the MSE (from linear regression); for more general models, new methods will be introduced.

Page 39: Part 5 - Prediction Error Methods


Intermediate step: PEM for 1st order ARMAX

To make things easier to follow, we first derive PEM for a 1st order ARMAX.

Looking at slide "ARMAX: explicit form", we write the 1st order ARMAX in a slightly different way:

y(k) = −ay(k − 1) + bu(k − 1) + ce(k − 1) + e(k)

Page 40: Part 5 - Prediction Error Methods


1st order ARMAX: predictor

To achieve error e(k), the predictor must be:

ŷ(k) = −ay(k − 1) + bu(k − 1) + ce(k − 1) (5.1)

This depends on the unknown noise e(k − 1). However, we derive a recursive predictor formula that does not.

ŷ(k − 1) = −ay(k − 2) + bu(k − 2) + ce(k − 2) (5.2)

From Eqn. (5.1) + c · Eqn. (5.2):

ŷ(k) + cŷ(k − 1)
= −ay(k − 1) + bu(k − 1) + ce(k − 1) + c(−ay(k − 2) + bu(k − 2) + ce(k − 2))
= −ay(k − 1) + bu(k − 1) + ce(k − 1) + c(−ay(k − 2) + bu(k − 2) + ce(k − 2) + e(k − 1) − e(k − 1))
= −ay(k − 1) + bu(k − 1) + ce(k − 1) + cy(k − 1) − ce(k − 1)
= (c − a)y(k − 1) + bu(k − 1)

where in the third line we recognized y(k − 1) = −ay(k − 2) + bu(k − 2) + ce(k − 2) + e(k − 1) inside the parenthesis.

Page 41: Part 5 - Prediction Error Methods


1st order ARMAX: predictor (continued)

Final recursion:

ŷ(k) = −cŷ(k − 1) + (c − a)y(k − 1) + bu(k − 1)

Requires initialization at ŷ(0); this initial value is usually taken as 0.

This initialization needs some technical conditions to work properly; we do not go into them here.

Page 42: Part 5 - Prediction Error Methods


1st order ARMAX: prediction error

Similar recursion:

ε(k) = y(k) − ŷ(k)
ε(k − 1) = y(k − 1) − ŷ(k − 1)

⇒ ε(k) + cε(k − 1) = y(k) + cy(k − 1) − (ŷ(k) + cŷ(k − 1))
= y(k) + cy(k − 1) − ((c − a)y(k − 1) + bu(k − 1))
= y(k) + ay(k − 1) − bu(k − 1)

⇒ ε(k) = −cε(k − 1) + y(k) + ay(k − 1) − bu(k − 1)

Requires initialization of ε(0), usually taken as 0.
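Both recursions can be checked numerically: with the true parameters, zero initial conditions, and e(0) = 0, the recursive predictor reproduces ε(k) = e(k) exactly. A Python sketch (parameter values illustrative):

```python
import random

# Check that the 1st order ARMAX predictor/error recursions give eps(k) = e(k).
# Data: y(k) = -a*y(k-1) + b*u(k-1) + c*e(k-1) + e(k), zero initial conditions.
random.seed(1)
a, b, c = 0.7, 1.0, 0.5
N = 50
u = [random.uniform(-1, 1) for _ in range(N)]
e = [0.0] + [random.gauss(0, 0.1) for _ in range(N - 1)]  # e(0) = 0
y = [0.0] * N
for k in range(1, N):
    y[k] = -a * y[k - 1] + b * u[k - 1] + c * e[k - 1] + e[k]

# Predictor recursion: yhat(k) = -c*yhat(k-1) + (c - a)*y(k-1) + b*u(k-1)
yhat = [0.0] * N
eps = [0.0] * N
for k in range(1, N):
    yhat[k] = -c * yhat[k - 1] + (c - a) * y[k - 1] + b * u[k - 1]
    eps[k] = y[k] - yhat[k]

# With the true parameters, the prediction error equals the noise sequence.
max_dev = max(abs(eps[k] - e[k]) for k in range(N))
```

If e(0) were nonzero, the initialization error would instead decay geometrically at rate |c| < 1, which is why the zero initialization is harmless in practice.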

Page 43: Part 5 - Prediction Error Methods


Finding the parameters

Once the errors are available, the parameter vector θ is found by minimizing the criterion V(θ) = (1/N) ∑_{k=1}^{N} ε²(k).

Page 44: Part 5 - Prediction Error Methods


Table of contents

1 Identification of ARX models

2 Model structures

3 General prediction error methods

Stepping stones: ARX and ARMAX

General case

Matlab example

Theoretical guarantees

4 Implementation and extensions

Page 45: Part 5 - Prediction Error Methods


Recall: General model structure

y(k) = G(q−1)u(k) + H(q−1)e(k)

where G and H are discrete-time transfer functions.

Page 46: Part 5 - Prediction Error Methods


Prediction error

We start by deriving e(k).

y(k) = G(q−1)u(k) + H(q−1)e(k)

⇒ e(k) = H−1(q−1)(y(k)−G(q−1)u(k))

where H−1 is the inverse of the fraction of polynomials H.

Recall that the predictor will be derived so that the prediction error ε(k) = y(k) − ŷ(k) = e(k), so the formula above can also be used to compute ε(k).

Page 47: Part 5 - Prediction Error Methods


Predictor

To achieve error e(k), the predictor must be:

ŷ(k) = y(k) − e(k) = Gu(k) + (H − 1)e(k)
     = Gu(k) + (H − 1)H−1(y(k) − Gu(k))
     = Gu(k) + (1 − H−1)(y(k) − Gu(k))
     = Gu(k) + (1 − H−1)y(k) − Gu(k) + H−1Gu(k)
     = (1 − H−1)y(k) + H−1Gu(k)
     =: L1y(k) + L2u(k)

where we dropped the argument q−1 to keep the equations readable.

Remarks:

New notations: L1(q−1) = 1 − H−1(q−1), L2(q−1) = H−1(q−1)G(q−1).
In order to have a causal predictor (which only depends on past values of the output and input), we need G(0) = 0 and H(0) = 1, leading to L1(0) = L2(0) = 0.

Page 48: Part 5 - Prediction Error Methods


1st order ARMAX: finding the parameters

Once the predictors and errors are available, the parameter vector θ is found by minimizing the criterion V(θ) = (1/N) ∑_{k=1}^{N} ε²(k).

Again, linear regression will not work in general. Computational methods to solve the minimization problem will be introduced in the next section.

Page 49: Part 5 - Prediction Error Methods


Specializing the framework to ARX

It is instructive to see how the formulas simplify in the ARX case. Rewriting ARX in the general model template:

y(k) = Gu(k) + He(k) = (B/A)u(k) + (1/A)e(k)

We have:

L1 = 1 − H−1 = 1 − A,  L2 = H−1G = B
ŷ(k) = L1y(k) + L2u(k) = (1 − A)y(k) + Bu(k) = ϕ>(k)θ
ε(k) = H−1(y(k) − Gu(k)) = Ay(k) − Bu(k)

which can easily be seen to be equivalent to the ARX formulation.

Page 50: Part 5 - Prediction Error Methods


Specializing the framework to 1st order ARMAX

Homework! Plug in the polynomials for 1st order ARMAX and verify that you obtain the same result as before.

Page 51: Part 5 - Prediction Error Methods


Table of contents

1 Identification of ARX models

2 Model structures

3 General prediction error methods

Stepping stones: ARX and ARMAX

General case

Matlab example

Theoretical guarantees

4 Implementation and extensions

Page 52: Part 5 - Prediction Error Methods


Experimental data

Consider again the experimental data on which ARX was applied. plot(id); and plot(val);

Recall theoretical ARMAX:

A(q−1)y(k) = B(q−1)u(k) + C(q−1)e(k)

Page 53: Part 5 - Prediction Error Methods


Identifying an ARMAX model

mARMAX = armax(id, [na, nb, nc, nk]);

Arguments:

1 Identification data.
2 Array containing the orders of A, B, C, and the delay nk.

Like for ARX, the structure includes the explicit minimum delay nk between inputs and outputs.

y(k) + a1y(k − 1) + a2y(k − 2) + . . . + anay(k − na)

= b1u(k − nk) + b2u(k − nk − 1) + . . . + bnbu(k − nk − nb + 1)

+ e(k) + c1e(k − 1) + c2e(k − 2) + . . . + cnce(k − nc)

A(q−1)y(k) = B(q−1)u(k − nk) + C(q−1)e(k), where:

A(q−1) = (1 + a1q−1 + a2q−2 + . . . + anaq−na)

B(q−1) = (b1 + b2q−1 + . . . + bnbq−nb+1)

C(q−1) = (1 + c1q−1 + c2q−2 + . . . + cncq−nc)

Remark: As for ARX, can transform into the theoretical structure by using a new B(q−1) of order nk + nb − 1, with nk − 1 leading zeros.

Page 54: Part 5 - Prediction Error Methods


ARMAX model validation

Considering the system is 2nd order with no time delay, take na = 2, nb = 2, nc = 2, nk = 1. Validation: compare(val, mARMAX);

In contrast to ARX, results are good. Flexible noise model pays off.

Page 55: Part 5 - Prediction Error Methods


Identifying an OE model

Recall OE model structure:

y(k) = [B(q−1)/F(q−1)] u(k) + e(k)

Page 56: Part 5 - Prediction Error Methods


Identifying an OE model (continued)

mOE = oe(id, [nb, nf, nk]);

Arguments:

1 Identification data.
2 Array containing the orders of B, F, and the delay nk.

y(k) = [B(q−1)/F(q−1)] u(k − nk) + e(k), where:

B(q−1) = (b1 + b2q−1 + . . . + bnbq−nb+1)
F(q−1) = (1 + f1q−1 + f2q−2 + . . . + fnf q−nf)

Remark: Like before, can transform into the theoretical structure by changing B.

Page 57: Part 5 - Prediction Error Methods


OE model validation

Considering the system is second-order with no time delay, we take nb = 2, nf = 2, nk = 1. Validation: compare(val, mOE);

Results as good as ARMAX. The system seems to obey both model structures.

Page 58: Part 5 - Prediction Error Methods


Table of contents

1 Identification of ARX models

2 Model structures

3 General prediction error methods

Stepping stones: ARX and ARMAX

General case

Matlab example

Theoretical guarantees

4 Implementation and extensions

Page 59: Part 5 - Prediction Error Methods


Preliminaries: Vector derivative and Hessian

Consider any function V(θ), V : Rn → R. Then:

dV/dθ = [∂V/∂θ1, ∂V/∂θ2, . . . , ∂V/∂θn]>

and the Hessian d²V/dθ² is the n × n matrix of second derivatives, with entry (i, j) equal to ∂²V/∂θi∂θj.

Page 60: Part 5 - Prediction Error Methods


Assumptions

Assumptions (simplified)
1 Signals u(k) and y(k) are stationary stochastic processes.
2 The input signal u(k) is "sufficiently informative".
3 Transfer functions G(q−1) and H(q−1) depend smoothly on θ.
4 The Hessian d²V/dθ² is nonsingular.

Page 61: Part 5 - Prediction Error Methods


Discussion of assumptions

Assumption 2 is informal; the formal name is "persistently exciting" input (it comes later).

Recall that G and H have the parameters as coefficients; we denote them by G(q−1; θ) and H(q−1; θ) to make the dependence explicit. Assumption 3 ensures we can differentiate G and H w.r.t. θ. Note that dG/dθ and dH/dθ are vectors of transfer functions!

Recall V(θ) = (1/N) ∑_{k=1}^{N} ε²(k), the MSE. Assumption 4 ensures V is not "flat" around minima, so the minima are uniquely identifiable.

Page 62: Part 5 - Prediction Error Methods


Guarantee

Theorem 1

Define the limit V∞(θ) = limN→∞ V(θ). Given Assumptions 1–4, the identification solution θ = arg minθ V(θ) converges to a minimum point θ∗ of V∞(θ) as N → ∞.

Remark: This is a type of consistency guarantee, in the limit of infinitely many data points.

Page 63: Part 5 - Prediction Error Methods


Further assumptions

Assumptions (simplified)
5 The true system satisfies the chosen model structure. This means there exists at least one θ0 so that, for any input u(k) and the corresponding output y(k) of the true system:

y(k) = G(q−1; θ0)u(k) + H(q−1; θ0)e(k)

with e(k) white noise.
6 The input u(k) is independent from the noise e(k) (the experiment is performed in open loop).

Page 64: Part 5 - Prediction Error Methods


Additional guarantee

Theorem 2

Under Assumptions 1–6, θ converges to a true parameter vector θ0 as N → ∞.

Remark: Also a consistency guarantee. Theorem 1 guaranteed a minimum-error solution, whereas Theorem 2 additionally says this solution corresponds to the true system, if the system satisfies the model structure.

Page 65: Part 5 - Prediction Error Methods


Table of contents

1 Identification of ARX models

2 Model structures

3 General prediction error methods

4 Implementation and extensions

Solving the optimization problem

Multiple inputs and outputs

MIMO ARX: Matlab example

Page 66: Part 5 - Prediction Error Methods


Optimization problem

Objective of the identification procedure: minimize the mean squared error

V(θ) = (1/N) ∑_{k=1}^{N} ε(k)²

where ε(k) are the prediction errors. In the general case:

ε(k) = H−1(q−1)(y(k) − G(q−1)u(k))

Solution: θ = arg minθ V(θ)

So far we took this solution for granted and investigated its properties. While in ARX linear regression could be applied to find θ, in general this does not work. Main implementation question:

How to solve the optimization problem?

Page 67: Part 5 - Prediction Error Methods


Minimization via derivative root

Consider first the scalar case, θ ∈ R.

Idea: at any minimum, the derivative f(θ) = dV/dθ is zero. So, find a root of f(θ).

Remarks:

Care must be taken to find a minimum and not a maximum or inflection point. This can be checked with the second derivative: d²V/dθ² = df/dθ > 0.

We may also find a local minimum which is larger (worse) than the global one.

Page 68: Part 5 - Prediction Error Methods


Newton’s method for root finding

Start from some initial point θ0. At iteration ℓ, the next point θℓ+1 is the intersection between the abscissa and the tangent to f at the current point θℓ. By geometric arguments:

θℓ+1 = θℓ − f(θℓ) / [df(θℓ)/dθ]

Remarks:

Notation df(θℓ)/dθ means the value of the derivative df/dθ at the point θℓ; it is the slope of the tangent.

θℓ+1 is the "best guess" for the root given the current point θℓ.
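A minimal Python sketch of the iteration, on an illustrative scalar example f(θ) = θ² − 2 (not the identification criterion):

```python
# Newton's method for root finding: theta <- theta - f(theta)/f'(theta).
# Illustrative example: f(theta) = theta^2 - 2, whose positive root is sqrt(2).
def newton_root(f, df, theta0, iters=20):
    theta = theta0
    for _ in range(iters):
        theta = theta - f(theta) / df(theta)
    return theta

root = newton_root(lambda t: t * t - 2.0, lambda t: 2.0 * t, theta0=1.0)
# root converges quadratically to sqrt(2) = 1.41421356...
```

Starting near the root, each iteration roughly doubles the number of correct digits.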

Page 69: Part 5 - Prediction Error Methods


Newton’s method for the identification problem

Replace f(θ) by dV/dθ to get back to the optimization problem:

θℓ+1 = θℓ − [dV(θℓ)/dθ] / [d²V(θℓ)/dθ²]

Extend to θ ∈ Rn. Recall that dV/dθ is now an n-vector, and d²V/dθ² an n × n matrix (the Hessian):

θℓ+1 = θℓ − [d²V(θℓ)/dθ²]−1 dV(θℓ)/dθ

Add a step size αℓ > 0:

θℓ+1 = θℓ − αℓ [d²V(θℓ)/dθ²]−1 dV(θℓ)/dθ

Remark: The step size helps with the convergence of the method, e.g. because in identification V is noisy.
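The damped update can be sketched on an illustrative convex scalar criterion V(θ) = (θ − 2)² + 0.5(θ − 2)⁴, minimized at θ = 2 (not the identification MSE):

```python
# Damped Newton minimization: theta <- theta - alpha * V'(theta)/V''(theta).
# Illustrative criterion V(theta) = (theta-2)^2 + 0.5*(theta-2)^4, minimum at
# theta = 2; alpha = 1 recovers the pure Newton step.
def newton_minimize(dV, d2V, theta0, alpha=1.0, iters=30):
    theta = theta0
    for _ in range(iters):
        theta = theta - alpha * dV(theta) / d2V(theta)
    return theta

theta_star = newton_minimize(
    dV=lambda t: 2 * (t - 2) + 2 * (t - 2) ** 3,    # V'(theta)
    d2V=lambda t: 2 + 6 * (t - 2) ** 2,             # V''(theta) > 0 everywhere
    theta0=0.0)
# theta_star converges to 2
```

Because V''(θ) > 0 everywhere here, every step moves toward the minimum; with a noisy V, a smaller α trades speed for robustness.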

Page 70: Part 5 - Prediction Error Methods


Stopping criterion

The algorithm can be stopped:

when the difference between consecutive parameter vectors is small, e.g. max_{i=1..n} |θi,ℓ+1 − θi,ℓ| smaller than some preset threshold,
or
when the number of iterations ℓ exceeds a preset maximum.

Page 71: Part 5 - Prediction Error Methods


Computing the derivatives

V(θ) = (1/N) ∑_{k=1}^{N} ε(k)²

Keeping in mind that ε(k) depends on θ, from matrix calculus:

dV/dθ = (2/N) ∑_{k=1}^{N} ε(k) dε(k)/dθ

d²V/dθ² = (2/N) ∑_{k=1}^{N} [dε(k)/dθ][dε(k)/dθ]> + (2/N) ∑_{k=1}^{N} ε(k) d²ε(k)/dθ²

where dε(k)/dθ is the vector derivative and d²ε(k)/dθ² the Hessian of ε(k); note that [dε(k)/dθ][dε(k)/dθ]> is an n × n matrix.

Page 72: Part 5 - Prediction Error Methods


Gauss-Newton

Ignore the second term in the Hessian of V and just use the first term:

H = (2/N) ∑_{k=1}^{N} [dε(k)/dθ][dε(k)/dθ]>

leading to the Gauss-Newton algorithm:

θℓ+1 = θℓ − αℓ H−1 dV(θℓ)/dθ

Motivation:

The quadratic form of H gives better algorithm behavior (H is positive semidefinite).
Simpler computation (no second derivatives of ε are needed).

Page 73: Part 5 - Prediction Error Methods


Example: 1st order ARMAX

Recall the model and prediction error for 1st order ARMAX:

y(k) = −ay(k − 1) + bu(k − 1) + ce(k − 1) + e(k)
ε(k) = −cε(k − 1) + y(k) + ay(k − 1) − bu(k − 1)

We need dε(k)/dθ = [∂ε(k)/∂a, ∂ε(k)/∂b, ∂ε(k)/∂c]>. Differentiating the second equation:

∂ε(k)/∂a = −c ∂ε(k − 1)/∂a + y(k − 1)
∂ε(k)/∂b = −c ∂ε(k − 1)/∂b − u(k − 1)
∂ε(k)/∂c = −c ∂ε(k − 1)/∂c − ε(k − 1)

So ∂ε(k)/∂a, ∂ε(k)/∂b, ∂ε(k)/∂c are dynamical signals that can be computed using the recursions above, starting e.g. from 0 initial values.
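The gradient recursions can be sanity-checked against finite differences of ε(k); a Python sketch with illustrative data and parameter values:

```python
import random

# Validate the 1st order ARMAX gradient recursions by comparing dEps/dTheta,
# computed recursively, against central finite differences of eps(k).
random.seed(2)
N = 40
u = [random.uniform(-1, 1) for _ in range(N)]
yd = [random.uniform(-1, 1) for _ in range(N)]  # arbitrary "measured" output

def eps_sequence(a, b, c):
    """eps(k) = -c*eps(k-1) + y(k) + a*y(k-1) - b*u(k-1), eps(0) = 0."""
    eps = [0.0] * N
    for k in range(1, N):
        eps[k] = -c * eps[k - 1] + yd[k] + a * yd[k - 1] - b * u[k - 1]
    return eps

a, b, c = 0.4, 1.2, 0.3
eps = eps_sequence(a, b, c)

# Gradient recursions, zero initial values.
da = [0.0] * N
db = [0.0] * N
dc = [0.0] * N
for k in range(1, N):
    da[k] = -c * da[k - 1] + yd[k - 1]
    db[k] = -c * db[k - 1] - u[k - 1]
    dc[k] = -c * dc[k - 1] - eps[k - 1]

# Central finite differences at the last step.
h = 1e-6
fd_a = (eps_sequence(a + h, b, c)[-1] - eps_sequence(a - h, b, c)[-1]) / (2 * h)
fd_b = (eps_sequence(a, b + h, c)[-1] - eps_sequence(a, b - h, c)[-1]) / (2 * h)
fd_c = (eps_sequence(a, b, c + h)[-1] - eps_sequence(a, b, c - h)[-1]) / (2 * h)
dev = max(abs(da[-1] - fd_a), abs(db[-1] - fd_b), abs(dc[-1] - fd_c))
```

The recursive derivatives match the finite differences to numerical precision, confirming the formulas before plugging them into the (Gauss-)Newton update.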

Page 74: Part 5 - Prediction Error Methods


Example: 1st order ARMAX (continued)

Finally, the algorithm is implemented as follows:

Given current value of parameter vector θ`, apply the recursionsabove to find signals dε(k)

dθ , k = 1, . . . , n.

Plug dε(k)dθ into equations for dV

dθ , d2Vdθ2

Apply Newton (or Gauss-Newton) update formula to find θ`+1.
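Putting the steps together, a self-contained Gauss-Newton loop for the 1st-order ARMAX model might look as follows (a sketch only: the function name, step size, and iteration count are illustrative assumptions, and NumPy stands in for Matlab):

```python
import numpy as np

def fit_armax1(u, y, theta0, alpha=1.0, n_iter=20):
    """Gauss-Newton fit of a 1st-order ARMAX model; theta = [a, b, c]."""
    theta = np.asarray(theta0, dtype=float)
    N = len(y)
    for _ in range(n_iter):
        a, b, c = theta
        eps = np.zeros(N)
        J = np.zeros((N, 3))
        # Recursions for eps(k) and deps(k)/dtheta, zero initial values.
        for k in range(1, N):
            eps[k] = -c * eps[k - 1] + y[k] + a * y[k - 1] - b * u[k - 1]
            J[k] = [-c * J[k - 1, 0] + y[k - 1],
                    -c * J[k - 1, 1] - u[k - 1],
                    -c * J[k - 1, 2] - eps[k - 1]]
        grad = (2.0 / N) * (J.T @ eps)
        H = (2.0 / N) * (J.T @ J)
        theta = theta - alpha * np.linalg.solve(H, grad)
    return theta
```

Starting reasonably close to the true parameters helps; in general, PEM optimization can get stuck in local minima.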

Page 75: Part 5 - Prediction Error Methods


Table of contents

1 Identification of ARX models

2 Model structures

3 General prediction error methods

4 Implementation and extensions

Solving the optimization problem

Multiple inputs and outputs

MIMO ARX: Matlab example

Page 76: Part 5 - Prediction Error Methods

MIMO system

So far we considered y(k) ∈ R, u(k) ∈ R, i.e. Single-Input, Single-Output (SISO) systems:

y(k) = G(q−1)u(k) + H(q−1)e(k)

Many systems are Multiple-Input, Multiple-Output (MIMO). E.g., aircraft. Inputs: throttle, aileron, elevator, rudder. Outputs: airspeed, roll, pitch, yaw.

Page 77: Part 5 - Prediction Error Methods

General MIMO model

Consider y(k) ∈ R^ny, u(k) ∈ R^nu. Model:

y(k) = G(q−1)u(k) + H(q−1)e(k)

[y1(k); y2(k); . . . ; yny(k)] = G(q−1) [u1(k); u2(k); . . . ; unu(k)] + H(q−1) [e1(k); e2(k); . . . ; eny(k)]

Different from SISO:

the noise e(k) ∈ R^ny is a random vector of the same size as y, white noise (independent, zero-mean).
G(q−1) is a matrix of size ny × nu.
H(q−1) is a matrix of size ny × ny.

The matrix elements are fractions of polynomials in q−1.

Page 78: Part 5 - Prediction Error Methods

MIMO ARX

A(q−1)y(k) = B(q−1)u(k) + e(k)
A(q−1) = I + A1 q−1 + . . . + Ana q−na
B(q−1) = B1 q−1 + . . . + Bnb q−nb

where I is the ny × ny identity matrix, A1, . . . , Ana ∈ R^{ny×ny}, and B1, . . . , Bnb ∈ R^{ny×nu}.

Put in the general template:

y(k) = A−1(q−1)B(q−1)u(k) + A−1(q−1)e(k)
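To make the time-domain meaning concrete, one simulation step of this model can be sketched as follows (hypothetical helper, NumPy instead of Matlab), using y(k) = −A1 y(k−1) − . . . − Ana y(k−na) + B1 u(k−1) + . . . + Bnb u(k−nb) + e(k):

```python
import numpy as np

def mimo_arx_step(A_mats, B_mats, y_past, u_past, e_k):
    """One step of A(q^-1) y(k) = B(q^-1) u(k) + e(k).

    A_mats = [A1, ..., Ana], B_mats = [B1, ..., Bnb];
    y_past[i] = y(k-1-i), u_past[j] = u(k-1-j), most recent first.
    """
    y_k = e_k.copy()
    for i, Ai in enumerate(A_mats):      # -A1 y(k-1) - ... - Ana y(k-na)
        y_k -= Ai @ y_past[i]
    for j, Bj in enumerate(B_mats):      # +B1 u(k-1) + ... + Bnb u(k-nb)
        y_k += Bj @ u_past[j]
    return y_k
```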

Page 79: Part 5 - Prediction Error Methods

MIMO ARX: concrete example

Take na = 1, nb = 2, ny = 2, nu = 3. Then:

A(q−1)y(k) = B(q−1)u(k) + e(k)

A(q−1) = I + A1 q−1 = I + [a11_1 a12_1; a21_1 a22_1] q−1

B(q−1) = B1 q−1 + B2 q−2 = [b11_1 b12_1 b13_1; b21_1 b22_1 b23_1] q−1 + [b11_2 b12_2 b13_2; b21_2 b22_2 b23_2] q−2

(Here aij_1 denotes entry (i, j) of A1, and bij_m entry (i, j) of Bm; matrix rows are separated by semicolons.)

Page 80: Part 5 - Prediction Error Methods

MIMO ARX: concrete example (continued)

([1 0; 0 1] + [a11_1 a12_1; a21_1 a22_1] q−1) [y1(k); y2(k)]
= ([b11_1 b12_1 b13_1; b21_1 b22_1 b23_1] q−1 + [b11_2 b12_2 b13_2; b21_2 b22_2 b23_2] q−2) [u1(k); u2(k); u3(k)] + [e1(k); e2(k)]

Explicit relationship:

y1(k) + a11_1 y1(k−1) + a12_1 y2(k−1)
= b11_1 u1(k−1) + b12_1 u2(k−1) + b13_1 u3(k−1)
+ b11_2 u1(k−2) + b12_2 u2(k−2) + b13_2 u3(k−2) + e1(k)

y2(k) + a21_1 y1(k−1) + a22_1 y2(k−1)
= b21_1 u1(k−1) + b22_1 u2(k−1) + b23_1 u3(k−1)
+ b21_2 u1(k−2) + b22_2 u2(k−2) + b23_2 u3(k−2) + e2(k)

Page 81: Part 5 - Prediction Error Methods

General error and predictor

The derivation mirrors that in the SISO case:

y(k) = Gu(k) + He(k)
⇒ e(k) = ε(k) = H−1(y(k) − Gu(k))

ŷ(k) = y(k) − ε(k) = Gu(k) + (H − I)e(k)
= Gu(k) + (H − I)H−1(y(k) − Gu(k))
= Gu(k) + (I − H−1)(y(k) − Gu(k))
= Gu(k) + (I − H−1)y(k) − Gu(k) + H−1Gu(k)
= (I − H−1)y(k) + H−1Gu(k)
=: L1 y(k) + L2 u(k)

where technical conditions must be satisfied, e.g. to ensure that H−1 is meaningful.

Page 82: Part 5 - Prediction Error Methods

PEM criterion

Important difference between SISO and MIMO: the identification criterion. The SISO criterion, the MSE:

V(θ) = (1/N) Σ_{k=1}^{N} ε(k)²

is no longer appropriate, since ε(k) is an ny-vector.

Define R(θ) = (1/N) Σ_{k=1}^{N} ε(k)ε⊤(k), an ny × ny matrix. Two options for the new criterion:

V(θ) = trace(R(θ))
V(θ) = det(R(θ))

Assuming ε(k) is zero-mean, R(θ) is an estimate of its covariance.
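Both criteria are straightforward to compute from the stacked error samples; a NumPy sketch (hypothetical helper):

```python
import numpy as np

def mimo_criteria(eps):
    """Compute R(theta) = (1/N) sum_k eps(k) eps(k)^T and the two scalar criteria.

    eps: prediction errors, shape (N, ny), one row per time step.
    """
    N = eps.shape[0]
    R = eps.T @ eps / N            # ny x ny estimated error covariance
    return np.trace(R), np.linalg.det(R)
```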

Page 83: Part 5 - Prediction Error Methods


Table of contents

1 Identification of ARX models

2 Model structures

3 General prediction error methods

4 Implementation and extensions

Solving the optimization problem

Multiple inputs and outputs

MIMO ARX: Matlab example

Page 84: Part 5 - Prediction Error Methods


Experimental data

Consider a continuous stirred-tank reactor:

Image credit: mathworks.com

Input: coolant flow Q

Outputs:

Concentration CA of substance A in the mix
Temperature T of the mix

Page 85: Part 5 - Prediction Error Methods


Experimental data

Left: identification, Right: validation

Page 86: Part 5 - Prediction Error Methods

MIMO ARX in Matlab

A(q−1)y(k) = B(q−1)u(k) + e(k)

A(q−1) = [a11(q−1) a12(q−1) . . . a1ny(q−1); a21(q−1) a22(q−1) . . . a2ny(q−1); . . . ; any1(q−1) any2(q−1) . . . anyny(q−1)]

aij(q−1) = (1 if i = j, 0 otherwise) + aij_1 q−1 + . . . + aij_{naij} q−naij

B(q−1) = [b11(q−1) b12(q−1) . . . b1nu(q−1); b21(q−1) b22(q−1) . . . b2nu(q−1); . . . ; bny1(q−1) bny2(q−1) . . . bnynu(q−1)]

bij(q−1) = bij_1 q−nkij + . . . + bij_{nbij} q−(nkij+nbij−1)

Page 87: Part 5 - Prediction Error Methods

Identifying a MIMO ARX model

m = arx(id, [Na, Nb, Nk]);

Arguments:

1 Identification data.
2 Matrices with the orders of the polynomials in A and B, and the delays nk:

Na = [na11 . . . na1ny; . . . ; nany1 . . . nanyny]
Nb = [nb11 . . . nb1nu; . . . ; nbny1 . . . nbnynu]
Nk = [nk11 . . . nk1nu; . . . ; nkny1 . . . nknynu]

Page 88: Part 5 - Prediction Error Methods

MIMO ARX results

Take na = 2, nb = 2, and nk = 1 everywhere in the matrix elements:

Na = [2 2; 2 2]; Nb = [2; 2]; Nk = [1; 1];
m = arx(id, [Na Nb Nk]);
compare(m, val);

Page 89: Part 5 - Prediction Error Methods

Appendix: Nonlinear ARX

Appendix: Nonlinear ARX (for project)

Page 90: Part 5 - Prediction Error Methods

Nonlinear ARX structure

Recall standard ARX:

y(k) = −a1 y(k−1) − a2 y(k−2) − . . . − ana y(k−na)
+ b1 u(k−1) + b2 u(k−2) + . . . + bnb u(k−nb) + e(k)

Linear dependence on the delayed outputs y(k−1), . . . , y(k−na) and inputs u(k−1), . . . , u(k−nb).

Nonlinear ARX (NARX) generalizes this to any nonlinear dependence:

y(k) = g(y(k−1), y(k−2), . . . , y(k−na), u(k−1), u(k−2), . . . , u(k−nb); θ) + e(k)

Function g is parameterized by θ ∈ R^n, and these parameters can be tuned to fit the identification data and thereby model a particular system.

In our case, g will be a neural network.

Page 91: Part 5 - Prediction Error Methods

Neural-net NARX: training data

y(k) = g(y(k−1), y(k−2), . . . , y(k−na), u(k−1), u(k−2), . . . , u(k−nb); θ) + e(k)

y(k) = g(x(k); θ) + e(k)

with new notation for the network input:
x(k) = [y(k−1), . . . , y(k−na), u(k−1), . . . , u(k−nb)]⊤

Then, the training dataset consists of input-output pairs:

(x(1), y(1)), (x(2), y(2)), . . . , (x(N), y(N))

where N = the number of time steps in the data.

Note: y and u at negative and zero time can be taken 0, assuming the system starts from zero initial conditions.
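A sketch of building this dataset (hypothetical helper; past values before time 0 are zero-padded as noted above):

```python
import numpy as np

def narx_dataset(u, y, na, nb):
    """Build regressor rows x(k) = [y(k-1)..y(k-na), u(k-1)..u(k-nb)] for k = 0..N-1."""
    N = len(y)
    X = np.zeros((N, na + nb))
    for k in range(N):
        for i in range(1, na + 1):                     # delayed outputs
            X[k, i - 1] = y[k - i] if k - i >= 0 else 0.0
        for j in range(1, nb + 1):                     # delayed inputs
            X[k, na + j - 1] = u[k - j] if k - j >= 0 else 0.0
    return X, y
```

The rows of X together with the targets y are exactly the pairs (x(k), y(k)) fed to the network trainer.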

Page 92: Part 5 - Prediction Error Methods

Neural-net NARX: training criterion

ŷ(k) = g(x(k); θ)
ε(k) = y(k) − ŷ(k) = y(k) − g(x(k); θ)

The criterion is V(θ) = (1/N) Σ_{k=1}^{N} ε²(k), the MSE. Training should return a parameter vector θ minimizing V(θ).

So NARX identification is in fact:

A nonlinear prediction error method.
And also a nonlinear regression problem.

Page 93: Part 5 - Prediction Error Methods

Using the NARX model

One-step-ahead prediction: If the true output sequence is known, the network input x(k) is fully available:

x(k) = [y(k−1), . . . , y(k−na), u(k−1), . . . , u(k−nb)]⊤
ŷ(k) = g(x(k); θ)

Example: On day k − 1, predict the weather for day k.

Simulation: If the true outputs are unknown, use the previously predicted outputs to construct an approximation of x(k):

x̂(k) = [ŷ(k−1), . . . , ŷ(k−na), u(k−1), . . . , u(k−nb)]⊤
ŷ(k) = g(x̂(k); θ)

(Predicted outputs at negative and zero time can also be taken 0.)

Example: Simulation of an aircraft’s response to emergency pilot inputs, which may be dangerous to apply to the real system.
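The difference between the two modes is only where the past outputs in x(k) come from. A free-run simulation sketch (hypothetical helper; g is any fitted model, here just a callable):

```python
import numpy as np

def narx_simulate(g, u, na, nb):
    """Free-run simulation: feed previously *predicted* outputs back into x(k).

    Past predictions and inputs before time 0 are taken as 0.
    """
    N = len(u)
    y_hat = np.zeros(N)
    for k in range(N):
        x = np.array(
            [y_hat[k - i] if k - i >= 0 else 0.0 for i in range(1, na + 1)]
            + [u[k - j] if k - j >= 0 else 0.0 for j in range(1, nb + 1)]
        )
        y_hat[k] = g(x)
    return y_hat
```

For one-step-ahead prediction, the measured y(k−i) would simply replace the fed-back ŷ(k−i).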