
APPROXIMATE SOLUTION OF A SYSTEM OF LINEAR EQUATIONS WITH RANDOM PERTURBATIONS

P. Date* ([email protected])
Center for Analysis of Risk and Optimisation Modelling Applications,
Department of Mathematical Sciences, Brunel University, U.K.

*Corresponding author.

Abstract

This work suggests a way of finding an approximate solution to a system of linear equations of the form $AX = b$, $A = A_0 + \Delta$, with a known square matrix $A_0$, a known vector $b$ and a small structured random perturbation $\Delta$. Under certain realistic assumptions about the structure and the smallness of the perturbations, it is shown that a uniformly convergent sequence of approximations to the random variable $X = A^{-1}b$ can be obtained in terms of polynomials of the random perturbations. Possible applications of this method in control theory are suggested. Numerical examples demonstrate the applicability of this method.

1 Introduction

Many estimation and control problems rely on the solution of linear least-squares problems or uncertain matrix inversion problems of the form $AX = b$. When the nominal data $\{A, b\}$ are subject to uncertainty and/or disturbances, the performance of the optimal estimator may degrade appreciably. Similar problems arise while computing the frequency response of a system with uncertain poles. Numerous approaches to alleviate this problem have been suggested. The min-max approach to the least-squares problem under bounded data uncertainties has been studied in [1] and [2]. This approach formulates and solves problems of the form

\[
\min_{X} \; \max_{A \in \mathcal{A},\, b \in \mathcal{B}} \|AX - b\|_2
\]

where $\mathcal{A}$ and $\mathcal{B}$ are appropriate bounded uncertainty sets. Data-fitting under bounded uncertainties has been considered in a more general setting in [3]. Applications of min-max based least-squares solutions have been reported in state estimation in general dynamical systems [4] and in water networks [5].

If statistical information is available about the uncertainty, the min-max based approach described above may be too conservative for use in certain applications. If the joint probability distribution function of all the uncertain parameters is known, the joint probability distribution of $A^{-1}b$ may be completely characterised using results in [6]. However, this level of information is rarely available. In this paper, the focus is on obtaining computationally simpler estimates with a good average-case accuracy, possibly at the expense of worst-case accuracy, using substantially less information than the full probability density. To this end, this paper considers a sequence of uniform approximations to the true solution (which is a random variable) of the system of equations. The statistics of the approximants is much simpler to compute than that of the true solution.

The rest of the paper is organised as follows. The next section gives the exact assumptions and the problem formulation for the case when $A$ is a square matrix and is invertible almost surely. Section 3 discusses the main results of this paper. Section 4 describes the case when the random perturbations have a special affine structure. Section 5 describes applications to the analysis of uncertainty in the frequency response of linear systems. Section 6 demonstrates these results with numerical examples.

2 Problem formulation

Notation in the paper is standard. Let $\mathbb{R}^{m \times n}$ (respectively, $\mathbb{C}^{m \times n}$) denote the space of $m \times n$ real (respectively, complex) matrices. Let $\mathbb{R}^n$ denote the space of $n$-vectors. $I$ denotes the identity matrix; its size is determined by the context. $A(i,j)$ denotes the $(i,j)$th element of matrix $A$. Vectors will be denoted by boldface characters. In the problems considered in this paper, a system of linear equations of the following form is assumed to be given:

\[
AX = b, \quad A = A_0 + \Delta, \quad A_0 \in Q^{m \times m},\ b \in \mathbb{R}^m \tag{1}
\]

where $Q = \mathbb{R}$ or $\mathbb{C}$ depending on the context and $\Delta$ is a real matrix-valued random variable satisfying

\[
P\left( \|A_0^{-1}\Delta\|_2 < 1 \right) = 1. \tag{2}
\]

Here $\|\cdot\|_2$ denotes the maximum singular value and $P(\Phi)$ denotes the probability of occurrence of the event $\Phi$. It is seen that the random perturbation is assumed to be small relative to the nominal value in a well-defined sense. Specific instances of the uncertainty set will be considered in section 4.

3 Main result

Theorem 1 Suppose the given system of equations (1) satisfies (2) and suppose that the inverse of $A$ exists with probability 1. Define

\[
\tilde{X} = A^{-1}b, \qquad X_N = \sum_{i=0}^{N} (-A_0^{-1}\Delta)^i A_0^{-1} b \tag{3}
\]
with $(-A_0^{-1}\Delta)^0 = I$. Then $X_N \to \tilde{X}$ with probability 1.

Proof: The proof rests on the following standard result in probability (see, e.g., [7], theorem 7.4): if $\sum_N P(|X_N - \tilde{X}| > \epsilon) < \infty$ for all $\epsilon > 0$, then $X_N \to \tilde{X}$ with probability 1. The proof is based on deriving an upper bound on $P(\|X_N - \tilde{X}\| > \epsilon)$ and then showing that the summation of these upper bounds over $N$ is finite.¹

¹From the proof of [7], theorem 7.4, it is easy to see that an analogous result holds for vector-valued random variables.

Since $P\left( \|A_0^{-1}\Delta\|_2 < 1 \right) = 1$, the following power series expansion holds with probability 1 (see, e.g., [8], chapter 5):
\[
(A_0 + \Delta)^{-1} = (I + A_0^{-1}\Delta)^{-1} A_0^{-1} = \sum_{i=0}^{\infty} (-A_0^{-1}\Delta)^i A_0^{-1},
\]

so that $P\big( \tilde{X} = \sum_{i=0}^{\infty} (-A_0^{-1}\Delta)^i A_0^{-1} b \big) = 1$. Suppose that $P\left( \|A_0^{-1}\Delta\|_2 \le \alpha \right) = 1$ for some $\alpha < 1$. Also, let $\|A_0^{-1}b\|_2 = \beta$. For a given $\epsilon > 0$, let $N_\epsilon$ be the smallest integer such that $\frac{\alpha^{N_\epsilon + 1}\beta}{1 - \alpha} \le \epsilon$. Now, using the definition of $X_N$,

\[
P\left( \left\| X_{N_\epsilon - 1} - \tilde{X} \right\|_2 > \epsilon \right)
= P\left( \left\| \left\{ \sum_{i=N_\epsilon}^{\infty} (-A_0^{-1}\Delta)^i \right\} A_0^{-1} b \right\|_2 > \epsilon \right) = 0, \tag{4}
\]
since, with probability 1,
\[
\left\| \left\{ \sum_{i=N_\epsilon}^{\infty} (-A_0^{-1}\Delta)^i \right\} A_0^{-1} b \right\|_2
\le \frac{\alpha^{N_\epsilon + 1}\beta}{1 - \alpha} \le \epsilon. \tag{5}
\]

Next, for any $\epsilon > 0$,
\[
\sum_{N=0}^{\infty} P\left( \left\| X_N - \tilde{X} \right\|_2 > \epsilon \right)
\le \sum_{N=0}^{N_\epsilon} P\left( \left\| X_N - \tilde{X} \right\|_2 > \epsilon \right) < \infty, \tag{6}
\]
since $N_\epsilon \le \frac{\log\left( \frac{\epsilon(1-\alpha)}{\alpha\beta} \right)}{\log(\alpha)}$ is finite for any $\epsilon > 0$. This completes the proof.
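As an informal numerical check of Theorem 1 (a minimal sketch, not part of the original development; the matrix, right-hand side and perturbation below are arbitrary illustrative choices), the truncated series $X_N$ can be compared against the exact solution for one sampled perturbation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data (arbitrary): nominal matrix, right-hand side and one
# realisation of a small random perturbation Delta satisfying condition (2).
A0 = np.array([[1.0, 2.0], [3.0, 3.0]])
b = np.array([1.0, 5.0])
Delta = 0.1 * rng.standard_normal((2, 2))

A0_inv = np.linalg.inv(A0)
assert np.linalg.norm(A0_inv @ Delta, 2) < 1   # spectral norm, condition (2)

x_true = np.linalg.solve(A0 + Delta, b)        # X tilde = A^{-1} b

def x_approx(N):
    """Truncated series X_N = sum_{i=0}^{N} (-A0^{-1} Delta)^i A0^{-1} b."""
    M = -A0_inv @ Delta
    term = A0_inv @ b
    total = term.copy()
    for _ in range(N):
        term = M @ term
        total = total + term
    return total

for N in range(6):
    err = np.linalg.norm(x_approx(N) - x_true)
    print(f"N = {N}: ||X_N - X_tilde||_2 = {err:.2e}")   # error shrinks geometrically
```

In this sketch the error decreases at a geometric rate governed by $\|A_0^{-1}\Delta\|_2$, in line with the proof above.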

The above result shows a way of obtaining a series of random vectors which converges with probability 1 to the true solution of the system of equations. The expected value of $X_N$ is given by
\[
E(X_N) = E\left( \sum_{i=0}^{N} (-A_0^{-1}\Delta)^i A_0^{-1} b \right).
\]

This is a function of the first $N$ moments of the random variable $\Delta$ and may thus be found using limited information about the uncertainty.

In the next section, $\Delta$ with a special structure is considered, which yields particularly simple low order approximations.

4 Affinely parameterised perturbations

Consider the system (1) with $\Delta$ which satisfies (2) and has the following structure:
\[
\Delta = \sum_{i=1}^{k} a_i A_i, \qquad A_i \in \mathbb{R}^{m \times m}, \tag{7}
\]
and $a_i$ are scalar random variables with
\[
E(a_i) = 0, \qquad i = 1, 2, \ldots, k.
\]

The matrices $A_i$ are, in general, sparse matrices, with 1's in places where the random parameter $a_i$ has an impact on the nominal entry in the matrix $A_0$ and zeros at all other elements (the numerical examples will make this point clear). The assumption that the $a_i$ are zero-mean random variables is perfectly reasonable and implies that the mean is "absorbed" in $A_0$. Useful low order approximations may be obtained with a very small number of terms for this structure, as explained below.

1. The first order approximation
\[
X_1 = \left( I - \sum_{i=1}^{k} a_i A_0^{-1} A_i \right) A_0^{-1} b
\]
is a linear combination of the zero mean random variables $a_i$ and is of independent interest. If the variances of the $a_i$ are known and if $k$ is sufficiently large, this expression may be used (along with the central limit theorem) to build approximate confidence intervals around each element of the nominal (0th order) solution $A_0^{-1}b$. If $k$ is small but the upper and the lower bounds on the $a_i$ are known, Hoeffding's inequality [9] may be used to obtain potentially conservative approximate error bounds. See [10] for a recent application of Hoeffding's inequalities in engineering.

Alternatively, $X_1$ may simply be used to test the sensitivity of the solution of a system of linear equations to small perturbations in certain entries of the matrix (without having to solve the perturbed system).

2. If the $a_i$ are uncorrelated, it is easy to show that
\[
E(X_2) = \left( I + \sum_{i=1}^{k} E(a_i^2) (-A_0^{-1} A_i)^2 \right) A_0^{-1} b.
\]

Note that the computation of $E(X_2)$ does not require sampling the distribution of the $a_i$. For a large number of uncertain parameters, the cost saving in building an accurate estimate of the average value of $A^{-1}b$ may be significant. Simulation experiments indicate that higher order approximations indeed tend to be more accurate than the nominal (0th order) solution $A_0^{-1}b$. This is demonstrated in section 6 through an example.
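A minimal sketch of this computation (illustrative only; the helper name and the example data below are not from the paper) shows that only the matrices $A_i$ and the second moments $E(a_i^2)$ are needed:

```python
import numpy as np

def expected_x2(A0, b, A_list, second_moments):
    """E(X_2) = (I + sum_i E(a_i^2) (-A0^{-1} A_i)^2) A0^{-1} b
    for uncorrelated, zero-mean a_i; no sampling of the a_i is required."""
    A0_inv = np.linalg.inv(A0)
    x0 = A0_inv @ b                                   # nominal (0th order) solution
    corr = sum(m2 * np.linalg.matrix_power(-A0_inv @ Ai, 2)
               for Ai, m2 in zip(A_list, second_moments))
    return x0 + corr @ x0

# Example usage with arbitrary illustrative data:
A0 = np.array([[2.0, 1.0], [0.0, 3.0]])
b = np.array([1.0, 1.0])
A1 = np.array([[0.0, 1.0], [0.0, 0.0]])               # a_1 perturbs the (1,2) entry
print(expected_x2(A0, b, [A1], [0.05]))               # with E(a_1^2) = 0.05, say
```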

If the first four moments of the $a_i$ are known, it is possible to obtain the covariance matrix of $X_2$. The expression for the covariance is straightforward (if tedious) and is omitted.

It is necessary here to comment on the relationship between the order of approximation and the size of the perturbations. The bound on the relative size of the perturbation, $\|A_0^{-1}\Delta\|_2$, will not be known in general. However, this bound does not appear in $X_N$ itself. If $X_N$ for some $N \ge 1$ differs substantially (by more than 100%, say) from $X_0 = A_0^{-1}b$, it may indicate that the small-perturbation condition is violated and the data is not reliable enough to build an estimate. In general, it may be seen from the proof of theorem 1 that
\[
\frac{\|X_N - \tilde{X}\|_2}{\|X_0\|_2} \le \frac{\|A_0^{-1}\Delta\|_2^{N+1}}{1 - \alpha}
\]
holds with probability 1. The choice of $N$ then depends on the trade-off between tractability and a priori knowledge of the size of the uncertainty.
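If an a priori bound $\alpha$ on $\|A_0^{-1}\Delta\|_2$ is assumed, the relative-error bound above can be used to pick the truncation order; a minimal sketch (the function and the example values are purely illustrative):

```python
def order_for_tolerance(alpha, tol):
    """Smallest N such that alpha**(N + 1) / (1 - alpha) <= tol, i.e. the
    truncation order suggested by the relative-error bound above."""
    assert 0 < alpha < 1 and tol > 0
    N = 0
    while alpha ** (N + 1) / (1 - alpha) > tol:
        N += 1
    return N

print(order_for_tolerance(0.2, 1e-3))   # e.g. alpha = 0.2 and 0.1% relative error
```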

Finally, the results presented here may be trivially extended to the case when $b = b_0 + \delta b$ for a random perturbation $\delta b$ which is uncorrelated with $\Delta$.

5 Applications

The technique presented in this paper is generic and may have a wide variety of applications. Two applications are described in some detail here. The focus of the first application is to map the available information about uncertainty in the parameters of a linear system into the corresponding uncertainty in its frequency response. The second application relates to perturbation of the transfer function of a controller and its effect on the closed-loop transfer functions.

5.1 Uncertainty in frequency response of a linear system due to parametric perturbations

If the parametric uncertainty in a model of a linear system is in the form of a covariance matrix of parameter estimates, there are numerous other ways of mapping it into the frequency response; see [11] and references therein. However, the parameters may be obtained from physical knowledge of the underlying dynamics, e.g. by knowing the values of the (nominal) motor resistance and inductance in a DC drive. The motivation of this application is the latter situation, when the parameters may be known within certain tolerances around their nominal values.


Consider a continuous time, linear, shift-invariant system (in a standard notation),
\[
\frac{dx}{dt} = Ax(t) + Bu(t), \qquad y(t) = Cx(t) + Du(t)
\]
where $A = A_0 + \Delta$, $A_0$ is a constant real matrix and $\Delta$ is a matrix-valued real random variable satisfying
\[
P\left( \|(j\omega I - A_0)^{-1}\Delta\|_2 < 1 \right) = 1, \quad \forall \omega. \tag{8}
\]

The perturbation $\Delta$ may represent the uncertainty in physical parameters which yield the poles of the system. Following the steps of theorem 1, the first order approximation of the uncertain frequency response
\[
P(j\omega) = C(j\omega I - A)^{-1} B + D
\]
at a frequency $\omega$ is given by
\[
P_1 = C(j\omega I - A_0)^{-1} B + D - C(j\omega I - A_0)^{-1} \Delta (j\omega I - A_0)^{-1} B. \tag{9}
\]

Unlike $P$, $P_1$ is affine in the uncertainty $\Delta$. If the distribution of this uncertainty is known, approximate confidence intervals for the pointwise frequency response of the system may be easily obtained from (9).
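A minimal sketch of this idea for a single-input, single-output system (the state-space data, the perturbation structure $\Delta = a_1 E_1$ and the variance value below are arbitrary illustrative assumptions): since $P_1$ is affine in $a_1$, the pointwise standard deviation of the first order term is cheap to evaluate.

```python
import numpy as np

# Arbitrary illustrative SISO state-space data (not from the paper):
A0 = np.array([[0.0, 1.0], [-2.0, -0.5]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])

# Assumed perturbation structure: Delta = a1 * E1, with a1 zero mean.
E1 = np.array([[0.0, 0.0], [0.0, 1.0]])    # uncertainty enters the damping entry
var_a1 = 0.01                               # E(a1^2)

for w in (0.5, 1.0, 1.5, 2.0):
    R = np.linalg.inv(1j * w * np.eye(2) - A0)     # (jwI - A0)^{-1}
    P0 = (C @ R @ B + D).item()                    # nominal frequency response
    g = (-C @ R @ E1 @ R @ B).item()               # sensitivity of P1 to a1, from (9)
    std = np.sqrt(var_a1) * abs(g)                 # std. dev. of the first order term
    print(f"w = {w:.1f}: |P0| = {abs(P0):.3f}, first-order std = {std:.3f}")
```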

5.2 Uncertainty in closed-loop frequency response due to controller perturbations

Consider a closed loop described by the following equations:
\[
y = Pu + w, \qquad u = Cy. \tag{10}
\]

Here, $P$ is the plant, $C$ is the controller, $w$ is a disturbance acting on the output, and $u$ and $y$ are the plant input and plant output respectively. The plant and the controller are assumed to be matrix-valued linear operators while the signals are assumed to be vector-valued. It may be easily shown that the transfer function from $w$ to $[y^T\ u^T]^T$ is given by
\[
\begin{bmatrix} y \\ u \end{bmatrix}
= \begin{bmatrix} (I - PC)^{-1} \\ C(I - PC)^{-1} \end{bmatrix} w. \tag{11}
\]

Suppose the controller is given by $C = C_0 + \Delta$, where $C_0$ is a known transfer function and $\Delta$ is a zero mean random perturbation (e.g. accounting for tolerances on the nominal values of electrical or mechanical components used in a hardware implementation). It is of interest to find the effect of this perturbation on the frequency response of the transfer function matrix. For notational simplicity, let $S_0 = (I - PC_0)^{-1}$ and let
\[
T = \begin{bmatrix} (I - PC)^{-1} \\ C(I - PC)^{-1} \end{bmatrix},
\qquad
T_0 = \begin{bmatrix} (I - PC_0)^{-1} \\ C_0(I - PC_0)^{-1} \end{bmatrix}.
\]

It is assumed that $P\left( \|(S_0 P \Delta)(j\omega)\|_2 < 1 \right) = 1$ holds at each frequency $\omega$ of interest. Using the first order approximation for $(I - PC)^{-1}$ and then retaining only the first order terms in $\Delta$ in the resulting expression yields (after some elementary manipulation)
\[
T \approx T_0 + \begin{bmatrix} S_0 P \Delta S_0 \\ (I - C_0 P)^{-1} \Delta S_0 \end{bmatrix}. \tag{12}
\]

This expression may be used to find pointwise confidence intervals for the frequency response of the closed-loop transfer matrix under perturbations in the controller transfer function.
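A minimal sketch of (12) for a scalar loop at a single frequency (all numerical values below are arbitrary and purely illustrative); it compares the first order correction with the exact perturbed response for one realisation of $\Delta$:

```python
import numpy as np

w = 1.0                                     # frequency of interest (rad/s)
P = 2.0 / (1j * w + 1.0)                    # illustrative plant frequency response
C0 = 0.5                                    # nominal controller gain
d = 0.05                                    # one realisation of the controller perturbation

S0 = 1.0 / (1.0 - P * C0)                   # (I - P C0)^{-1} in the scalar case
T0 = np.array([S0, C0 * S0])                # nominal [y; u] response to w

# First order correction from (12): [S0 P d S0 ; (I - C0 P)^{-1} d S0]
T1 = T0 + np.array([S0 * P * d * S0, (1.0 / (1.0 - C0 * P)) * d * S0])

# Exact response with the perturbed controller, for comparison:
C = C0 + d
T_exact = np.array([1.0 / (1.0 - P * C), C / (1.0 - P * C)])
print(abs(T1 - T_exact))                    # first order residual, O(d^2)
print(abs(T0 - T_exact))                    # nominal residual, O(d)
```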

6 Numerical Examples

Results of some numerical experiments related to the better average case accuracy of higher order approximations are presented here. Specific applications, such as the ones described in the last section, are not discussed due to space constraints.

Consider a simple linear system of equations $(A_0 + \Delta)X = b$ with
\[
A_0 = \begin{bmatrix} 1 & 2 \\ 3 & 3 \end{bmatrix},
\quad
b = \begin{bmatrix} 1 \\ 5 \end{bmatrix},
\quad
\Delta = a_1 A_1 + a_2 A_2,
\]
\[
A_1 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix},
\quad
A_2 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix},
\]
and $a_i \in P$. $a_1$ is assumed to be uniformly distributed in $[-0.4, +0.4]$ and $a_2$ is assumed to be uniformly distributed in $[-0.6, +0.6]$. This gives a variation of $\pm 20\%$ around the nominal values of the corresponding two entries in $A_0$.

100 samples of each of $a_1$, $a_2$ are generated and the expected value of the solution, $E(\tilde{X}) = [2.3972\ \ -0.7209]^T$, is computed. The nominal solution in this case is $A_0^{-1}b = [2.3333\ \ -0.6667]^T$. Using the result described earlier, the expected value of the second order approximation is computed as
\[
E(X_2) = A_0^{-1}b + \left( E(a_1^2)(-A_0^{-1}A_1)^2 + E(a_2^2)(-A_0^{-1}A_2)^2 \right) A_0^{-1}b
= [2.3883\ \ -0.7125]^T.
\]


Note that the first order moments are zero and the corresponding term need not be computed. It is seen that using a simple second order approximation, with a modest extra computation, yields a clear improvement in the accuracy of the solution in this case.
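This example can be checked with a short script (a sketch only: the Monte Carlo average depends on the particular random draws, and the second moments below are the theoretical values for the stated uniform distributions, so the figures will be close to, but not exactly, those quoted above):

```python
import numpy as np

rng = np.random.default_rng(1)

A0 = np.array([[1.0, 2.0], [3.0, 3.0]])
b = np.array([1.0, 5.0])
A1 = np.array([[0.0, 1.0], [0.0, 0.0]])
A2 = np.array([[0.0, 0.0], [0.0, 1.0]])

A0_inv = np.linalg.inv(A0)
x0 = A0_inv @ b                                        # nominal solution

# Monte Carlo estimate of E(X tilde) from 100 samples, as in the text.
samples = [np.linalg.solve(A0 + rng.uniform(-0.4, 0.4) * A1
                               + rng.uniform(-0.6, 0.6) * A2, b)
           for _ in range(100)]
print("E(X~) (Monte Carlo):", np.mean(samples, axis=0))

# Second order approximation, using E(a^2) = c^2 / 3 for a ~ U[-c, c].
m1, m2 = 0.4 ** 2 / 3, 0.6 ** 2 / 3
corr = (m1 * np.linalg.matrix_power(-A0_inv @ A1, 2)
        + m2 * np.linalg.matrix_power(-A0_inv @ A2, 2))
print("X0   :", x0)
print("E(X2):", x0 + corr @ x0)
```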

As another example, consider an iterative numerical experiment. The $i$th iteration proceeds as follows.

• A random matrix $A_0$ of size $2 \times 2$ and a random vector $b$ of size $2 \times 1$ are generated.

• Two random variables are considered: $a_1$ is normally distributed with $E(a_1) = 0$ and $E(a_1^2) = 0.2 \times A_0(1,2)$, and $a_2$ is uniformly distributed with $E(a_2) = 0$ and $E(a_2^2) = 0.1 \times A_0(2,2)$.

• For 100 realisations of $a_1$ and $a_2$,
\[
A = A_0 + a_1 \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
        + a_2 \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}
\]
is computed and the mean solution $E(\tilde{X}) = E(A^{-1}b)$ is computed. Care was taken to ensure that the matrix $A$ does not become too ill-conditioned.

• The solutions $E(X_2)$ and $X_0 = A_0^{-1}b$ are computed. The following quantities are taken as measures of error:
\[
L_1 = \|E(X_2) - E(\tilde{X})\|_2, \qquad L_2 = \|X_0 - E(\tilde{X})\|_2.
\]

• This entire experiment is repeated 100 times.

It was found that $L_1 < L_2$ held for 94 out of 100 trials. In the remaining instances, the assumption $\|A_0^{-1}\Delta\|_2 < 1$ was violated more than once. It is thought that this presents a convincing case for using a higher order approximation when better average case accuracy is required.
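A sketch of this repeated experiment is given below. The distributions used to draw $A_0$ and $b$, and the conditioning safeguard, are not fully specified in the text, so the choices here are assumptions; the exact win count will therefore vary.

```python
import numpy as np

rng = np.random.default_rng(2)
A1 = np.array([[0.0, 1.0], [0.0, 0.0]])
A2 = np.array([[0.0, 0.0], [0.0, 1.0]])
wins, trials = 0, 100

for _ in range(trials):
    # Assumed distribution for the random nominal data, kept positive and
    # reasonably well conditioned (a crude safeguard, as mentioned in the text).
    while True:
        A0 = rng.uniform(0.5, 2.0, size=(2, 2))
        if np.linalg.cond(A0) < 50:
            break
    b = rng.uniform(0.5, 2.0, size=2)
    A0_inv = np.linalg.inv(A0)
    x0 = A0_inv @ b

    v1 = 0.2 * A0[0, 1]                      # E(a1^2), as specified in the text
    v2 = 0.1 * A0[1, 1]                      # E(a2^2), as specified in the text

    # Monte Carlo mean of the exact solution over 100 realisations.
    sols = [np.linalg.solve(A0 + rng.normal(0.0, np.sqrt(v1)) * A1
                                + rng.uniform(-np.sqrt(3 * v2), np.sqrt(3 * v2)) * A2, b)
            for _ in range(100)]
    mean_sol = np.mean(sols, axis=0)

    ex2 = x0 + (v1 * np.linalg.matrix_power(-A0_inv @ A1, 2)
                + v2 * np.linalg.matrix_power(-A0_inv @ A2, 2)) @ x0
    wins += int(np.linalg.norm(ex2 - mean_sol) < np.linalg.norm(x0 - mean_sol))

print(f"E(X2) closer than X0 to E(X~) in {wins} of {trials} trials")
```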

7 Conclusion

A method to build a sequence of random vectors which converges almost surely to the solution of a system of linear equations under random perturbation has been suggested. A probabilistic description of the $N$th vector in this sequence may serve as an approximate description of the actual solution. When information about higher order moments of the random perturbations is available, it may be used to build an accurate approximation to the expected value of the solution, as well as to the confidence intervals around this expected value. The extension of these results to least squares problems with non-square matrices and examples of the suggested applications will be reported elsewhere.

References

[1] S. Chandrasekaran, G. Golub, M. Gu, and A. Sayed, "Parameter estimation in the presence of bounded data uncertainties," SIAM J. Matrix Anal. Appl., vol. 19, pp. 235–252, 1998.

[2] L. El Ghaoui and H. Lebret, "Robust solutions to least-squares problems with uncertain data," SIAM J. Matrix Anal. Appl., vol. 18, pp. 1035–1064, 1997.

[3] G. Watson, "Data fitting problem with bounded uncertainties in the data," SIAM J. Matrix Anal. Appl., vol. 22, pp. 1274–1293, 2001.

[4] A. Sayed, "A framework for state space estimation with uncertain models," IEEE Trans. Automat. Contr., vol. 46, pp. 998–1013, 2001.

[5] A. Nagar and R. Powell, "LFT/SDP approach to the uncertainty analysis for state estimation of water distribution systems," IEE Proceedings, vol. 149, pp. 137–142, 2002.

[6] J. Feinberg, "On the universality of the probability distribution of the product $B^{-1}X$ of random matrices." http://arxiv.org/abs/math/PR0204312.

[7] G. Grimmett and D. Stirzaker, Probability and Random Processes. Oxford University Press, 2001.

[8] R. Horn and C. Johnson, Matrix Analysis. Cambridge University Press, 1999.

[9] W. Hoeffding, "Probability inequalities for sums of bounded random variables," Amer. Statistical Asso. Journal, no. 3, pp. 13–30, 1963.

[10] F. Paganini, "A set-based approach for white noise modeling," IEEE Trans. Automat. Contr., vol. 41, pp. 1453–1465, 1996.

[11] X. Bombois, M. Gevers, G. Scorletti, and B. Anderson, "Robustness analysis tools for an uncertainty set obtained by prediction error identification," Automatica, vol. 37, pp. 1629–1636, 2001.

Control 2004, University of Bath, UK, September 2004 ID-043