
Artificial Neural Networks (Spring 2007)

Neural Networks for Solving Systems of Linear Equations

Seyed Jalal Kazemitabar, Reza Sadraei

Instructor: Dr. Saeed Bagheri
Artificial Neural Networks Course (Spring 2007)

Outline

Historical Introduction
Problem Formulation
Standard Least Squares Solution
General ANN Solution
Minimax Solution
Least Absolute Value Solution
Conclusion


History

1970s: Kohonen solved optimization problems using neural networks.

1980s: Hopfield used the Lyapunov function (energy function) to prove the convergence of iterative methods in optimization problems.

Mapping: differential equations → neural networks

History

Many problems in science and engineering involve solving a large system of linear equations:

Machine learning, physics, image processing, statistics, …

In many applications an on-line solution of a set of linear equations is desired.

History

1940s: Kaczmarz introduced a method to solve linear equations.

1950s–1980s: Different methods based on Kaczmarz's have been proposed in different fields, e.g. the conjugate gradient method.

Still, there was no good method for the on-line solution of large systems.

1990: Andrzej Cichocki, a mathematician who received his PhD in electrical engineering, proposed a neural network for solving systems of linear equations in real time.


Problem Formulation

Linear parameter estimation model:

Linear equation: $Ax = b + r = b_{\mathrm{true}}$

$A = [a_{ij}] \in \mathbb{R}^{m \times n}$ : model matrix
$x = [x_1, x_2, \ldots, x_n]^T \in \mathbb{R}^n$ : unknown vector of the system parameters to be estimated
$b \in \mathbb{R}^m$ : vector of observations
$r \in \mathbb{R}^m$ : unknown measurement errors
$b_{\mathrm{true}} \in \mathbb{R}^m$ : vector of true values (usually unknown)

$$
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
=
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}
+
\begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_m \end{bmatrix}
=
\begin{bmatrix} b_1^{\mathrm{true}} \\ b_2^{\mathrm{true}} \\ \vdots \\ b_m^{\mathrm{true}} \end{bmatrix}
$$

Types of Equations

A set of linear equations is said to be overdetermined if m > n.

Usually inconsistent due to noise and errors, e.g. linear parameter estimation problems arising in signal processing, biology, medicine and automatic control.

A set of linear equations is said to be underdetermined if m < n (due to the lack of information).

Inverse and extrapolation problems. Involves far fewer problems than the overdetermined case.

$A = [a_{ij}] \in \mathbb{R}^{m \times n}, \qquad Ax = b + r = b_{\mathrm{true}}$

Mathematical Solutions

Why not use $x = A^{-1}b$? It is not applicable, since m ≠ n most of the time, which means A is not invertible.

What if we use the least-squares error method?

$$y = (Ax - b)^T(Ax - b), \qquad y' = A^T(Ax - b) = 0, \qquad A^T A\, x = A^T b, \qquad x = (A^T A)^{-1} A^T b$$

Inverting $A^T A$ is considered to be time consuming for large A in real-time systems.
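As a point of reference for the discussion above, here is a minimal NumPy sketch of the standard least-squares computation via the normal equations; the matrix, vector, and function name are illustrative and not part of the original slides.

```python
import numpy as np

def least_squares_normal_equations(A, b):
    """Solve min ||Ax - b||^2 via the normal equations A^T A x = A^T b.

    Forming and factoring A^T A is the step that becomes expensive for large A,
    which is what motivates the on-line (neural) approaches in these slides.
    """
    AtA = A.T @ A                     # n x n Gram matrix
    Atb = A.T @ b                     # n-vector
    return np.linalg.solve(AtA, Atb)  # avoids forming an explicit inverse

# Small overdetermined example (m > n); values are illustrative only.
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([0.1, 1.1, 1.9, 3.2])
x = least_squares_normal_equations(A, b)
print(x)          # estimated parameters
print(A @ x - b)  # residual vector r(x)
```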


Least Squares Error Function

Find the vector $x^* \in \mathbb{R}^n$ that minimizes the least-squares function

$$E(x) = \frac{1}{2}(Ax - b)^T(Ax - b) = \frac{1}{2}\sum_{i=1}^{m} r_i^2(x),$$

where

$$r_i(x) = A_i x - b_i = \sum_{j=1}^{n} a_{ij} x_j - b_i$$

represents the components of the residual vector

$$r(x) = [r_1(x), r_2(x), \ldots, r_m(x)]^T = Ax - b.$$

Gradient Descent Approach

Basic idea: compute a trajectory $x(t)$, starting at the initial point $x(0)$, that has the solution x* as a limit point (for $t \to \infty$).

General gradient approach for minimization of a function:

$$\frac{dx}{dt} = -\mu\,\nabla E(x)$$

$\mu$ is chosen in a way that ensures the stability of the differential equations and an appropriate convergence speed.

$$
\begin{bmatrix} \dfrac{dx_1}{dt} \\ \dfrac{dx_2}{dt} \\ \vdots \\ \dfrac{dx_n}{dt} \end{bmatrix}
= -\begin{bmatrix}
\mu_{11} & \mu_{12} & \cdots & \mu_{1n} \\
\mu_{21} & \mu_{22} & \cdots & \mu_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\mu_{n1} & \mu_{n2} & \cdots & \mu_{nn}
\end{bmatrix}
\begin{bmatrix} \dfrac{\partial E}{\partial x_1} \\ \dfrac{\partial E}{\partial x_2} \\ \vdots \\ \dfrac{\partial E}{\partial x_n} \end{bmatrix}
$$

Solving LE Using Least Squares Criterion

Gradient of the energy function:

$$\nabla E = \left[\frac{\partial E}{\partial x_1}, \frac{\partial E}{\partial x_2}, \ldots, \frac{\partial E}{\partial x_n}\right]^T = A^T(Ax - b)$$

So

$$\frac{dx}{dt} = -\mu\, A^T(Ax - b)$$

Scalar representation:

$$\frac{dx_j}{dt} = -\sum_{p=1}^{n}\mu_{jp}\sum_{i=1}^{m} a_{ip}\left(\sum_{k=1}^{n} a_{ik} x_k - b_i\right), \qquad x_j(0) = x_j^{(0)}, \quad j = 1, 2, \ldots, n$$

ANN With Identity Activation Function

[Figure: analog network realizing $\frac{dx_j}{dt} = -\sum_{p=1}^{n}\mu_{jp}\sum_{i=1}^{m} a_{ip}\left(\sum_{k=1}^{n} a_{ik} x_k - b_i\right)$ with identity (linear) activation functions.]
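To make the dynamics above concrete, the following is a minimal sketch (not the original analog circuit) that integrates dx/dt = −μ Aᵀ(Ax − b) with explicit Euler steps; the data, step size, and learning-rate matrix are assumptions made for illustration.

```python
import numpy as np

def simulate_ls_network(A, b, x0, mu, dt=1e-3, steps=20000):
    """Explicit-Euler simulation of dx/dt = -mu * A^T (A x - b).

    With identity activation functions this gradient flow converges to the
    least-squares solution for a suitable positive-definite mu and small dt.
    """
    x = x0.astype(float).copy()
    for _ in range(steps):
        r = A @ x - b                    # residual vector r(x)
        x += dt * (-(mu @ (A.T @ r)))    # one Euler step of the ODE
    return x

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([0.1, 1.1, 1.9, 3.2])
mu = np.eye(2)   # simple choice: identity learning-rate matrix
x_star = simulate_ls_network(A, b, x0=np.zeros(2), mu=mu)
print(x_star)    # should approach the least-squares solution
```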


General ANN Solution

The key step in designing an algorithm for neural networks:

Construct an appropriate computational energy function (Lyapunov function) $E(x)$.

The lowest energy state will correspond to the desired solution x*.

By differentiation, the energy function minimization problem is transformed into a set of ordinary differential equations.

General ANN Solution

In general, the optimization problem can be formulated as:

Find the vector $x^* \in \mathbb{R}^n$ that minimizes the energy function

$$E(x) = \sum_{i=1}^{m}\sigma(A_i x - b_i) = \sum_{i=1}^{m}\sigma(r_i(x)).$$

$\sigma(r_i(x))$ is called the weighting function. The derivative of the weighting function is called the activation function:

$$g(r_i) = \frac{\partial \sigma(r_i)}{\partial r_i} = \frac{\partial E}{\partial r_i}$$

General ANN Solution

Gradient descent approach:

$$\frac{dx}{dt} = -\mu\,\nabla E(x), \qquad
\begin{bmatrix} \dfrac{dx_1}{dt} \\ \dfrac{dx_2}{dt} \\ \vdots \\ \dfrac{dx_n}{dt} \end{bmatrix}
= -\begin{bmatrix}
\mu_{11} & \mu_{12} & \cdots & \mu_{1n} \\
\mu_{21} & \mu_{22} & \cdots & \mu_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\mu_{n1} & \mu_{n2} & \cdots & \mu_{nn}
\end{bmatrix}
\begin{bmatrix} \dfrac{\partial E}{\partial x_1} \\ \dfrac{\partial E}{\partial x_2} \\ \vdots \\ \dfrac{\partial E}{\partial x_n} \end{bmatrix}
$$

The minimization of the energy function leads to the set of differential equations

$$\frac{dx_j}{dt} = -\sum_{p=1}^{n}\mu_{jp}\frac{\partial E}{\partial x_p}
= -\sum_{p=1}^{n}\mu_{jp}\sum_{i=1}^{m}\frac{\partial E}{\partial r_i}\frac{\partial r_i}{\partial x_p}
= -\sum_{p=1}^{n}\mu_{jp}\sum_{i=1}^{m} a_{ip}\, g\!\left(\sum_{k=1}^{n} a_{ik} x_k - b_i\right).$$

General ANN Architecture

$$\frac{dx_j}{dt} = -\sum_{p=1}^{n}\mu_{jp}\sum_{i=1}^{m} a_{ip}\, g\!\left(\sum_{k=1}^{n} a_{ik} x_k - b_i\right)$$

Remember that $g$ is the activation function.

[Figure: network architecture with activation blocks $g_1, g_2, \ldots, g_m$ in the residual feedback path.]
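A small sketch of the general architecture, assuming a simple scalar learning rate and a user-supplied activation g; the example data and the clipping activation are illustrative choices, not taken from the slides.

```python
import numpy as np

def simulate_general_network(A, b, x0, g, mu=1.0, dt=1e-3, steps=50000):
    """Explicit-Euler simulation of
        dx_j/dt = -mu * sum_i a_ij * g(r_i(x)),   r_i(x) = A_i x - b_i,
    i.e. the general gradient network with activation function g
    (a scalar mu is used here instead of the full matrix [mu_jp])."""
    x = x0.astype(float).copy()
    for _ in range(steps):
        r = A @ x - b
        x += dt * (-mu * (A.T @ g(r)))
    return x

# Identity activation recovers the least-squares network; a saturating
# (clipping) activation tends to give a more outlier-robust estimate.
identity = lambda r: r
clipped  = lambda r, beta=1.0: np.clip(r, -beta, beta)

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
b = np.array([0.1, 1.1, 1.9, 3.2, 40.0])   # last observation is a gross outlier
print(simulate_general_network(A, b, np.zeros(2), identity))
print(simulate_general_network(A, b, np.zeros(2), clipped))
```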

Drawbacks of Least Square Error Criterion

Why not always use the least-squares energy function?

It is not so good in the presence of large outliers, and it is only optimal for a Gaussian distribution of errors.

The proper choice of the criterion depends on:
the specific application;
the distribution of the errors in the measurement vector b.

Gaussian distribution* → least-squares criterion
Uniform distribution → Chebyshev (minimax) norm criterion

*However, the assumption that the set of measurements or observations has a Gaussian error distribution is frequently unrealistic due to different sources of errors such as instrument errors, modeling errors, sampling errors, and human errors.

Special Energy Functions

Huber's function:

$$\rho_H(e) = \begin{cases} \dfrac{e^2}{2}, & |e| \le \beta \\[4pt] \beta\,|e| - \dfrac{\beta^2}{2}, & |e| > \beta \end{cases}$$

[Figure: the corresponding weighting function and activation function.]

Special Energy Functions

Talvar's function:

$$\rho_T(e) = \begin{cases} \dfrac{e^2}{2}, & |e| \le \beta \\[4pt] \dfrac{\beta^2}{2}, & |e| > \beta \end{cases}$$

This function has a direct implementation.

[Figure: the corresponding weighting function and activation function.]

Special Energy Functions

Logistic function:

$$\rho_L(e) = \beta^2 \ln\!\left(\cosh\!\left(\frac{e}{\beta}\right)\right)$$

The iteratively reweighted method uses this activation function.

[Figure: the corresponding weighting function and activation function.]
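The three weighting functions above and their derivatives (the activation functions) can be written down directly; the following is a small sketch, with β and the helper names chosen only for illustration.

```python
import numpy as np

beta = 1.0  # threshold parameter (illustrative)

# Huber: quadratic near zero, linear growth beyond beta.
def rho_huber(e):
    return np.where(np.abs(e) <= beta, e**2 / 2, beta * np.abs(e) - beta**2 / 2)

def g_huber(e):      # derivative of rho_huber
    return np.where(np.abs(e) <= beta, e, beta * np.sign(e))

# Talvar: quadratic near zero, constant beyond beta (gradient is cut off).
def rho_talvar(e):
    return np.where(np.abs(e) <= beta, e**2 / 2, beta**2 / 2)

def g_talvar(e):     # derivative of rho_talvar
    return np.where(np.abs(e) <= beta, e, 0.0)

# Logistic: smooth approximation, used by iteratively reweighted methods.
def rho_logistic(e):
    return beta**2 * np.log(np.cosh(e / beta))

def g_logistic(e):   # derivative of rho_logistic
    return beta * np.tanh(e / beta)
```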

Special Energy Functions

Lp-norm function:

$$E_p(x) = \frac{1}{p}\sum_{i=1}^{m} |r_i|^{p}$$

[Figure: the corresponding activation function.]

Lp-Norm Energy Functions

A well-known criterion is the $L_1$-norm energy function

$$E_1(x) = \sum_{i=1}^{m} |r_i(x)|.$$

[Figure: the corresponding weighting function and activation function.]

Special Energy Functions

Another well-known criterion is the $L_\infty$-norm (Chebyshev) criterion, which can be formulated as the minimax problem

$$\min_{x \in \mathbb{R}^n}\left\{\max_{1 \le i \le m} |r_i(x)|\right\}.$$

This criterion is optimal for a uniform distribution of errors.


Minimax (L∞-Norm) Criterion

For the case p = ∞ of the $L_p$-norm problem, the activation function $g[r_i(x)]$ cannot be explicitly expressed by $|r_i(x)|^{p-1}$.

The error function can be defined as

$$E_\infty(x) = \max_{1 \le i \le m}\{|r_i(x)|\},$$

resulting in the following activation function:

$$g[r_i(x)] = \begin{cases} \operatorname{sign}[r_i(x)], & \text{if } |r_i(x)| = \max_{1 \le k \le m}\{|r_k(x)|\} \\ 0, & \text{otherwise} \end{cases}$$

Minimax (L∞-Norm) Criterion

Although straightforward, some problems arise in practical implementations of the system of differential equations:

Exact realization of the signum functions is rather difficult (electrically).

$E_\infty$ has a derivative discontinuity at x if $|r_i(x)| = |r_k(x)| = E_\infty(x)$ for some i ≠ k.*

*This is often responsible for various anomalous results (e.g. hysteresis phenomena).

Transforming the problem to an equivalent one

Rather than directly implementing the proposed system, we transform the minimax problem

$$\min_{x \in \mathbb{R}^n}\left\{\max_{1 \le i \le m} |r_i(x)|\right\}$$

into an equivalent one:

Minimize $\varepsilon$ subject to the constraints $|r_i(x)| \le \varepsilon$ and $\varepsilon \ge 0$.

Thus the problem can be viewed as finding the smallest non-negative value

$$\varepsilon^* = E_\infty(x^*) \ge 0,$$

where x* is the vector of the optimal values of the parameters.
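As a cross-check of this equivalent formulation (not the neural implementation), the minimax problem can be handed to an ordinary LP solver; the sketch below, with illustrative data, assumes SciPy's linprog.

```python
import numpy as np
from scipy.optimize import linprog

def minimax_fit(A, b):
    """Solve min_x max_i |A_i x - b_i| as a linear program in (x, eps)."""
    m, n = A.shape
    c = np.zeros(n + 1)
    c[-1] = 1.0                                 # minimize eps
    ones = np.ones((m, 1))
    A_ub = np.vstack([np.hstack([A, -ones]),    #  A x - eps <= b
                      np.hstack([-A, -ones])])  # -A x - eps <= -b
    b_ub = np.concatenate([b, -b])
    bounds = [(None, None)] * n + [(0, None)]   # x free, eps >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:n], res.x[-1]                 # estimate and optimal eps

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([0.0, 1.2, 1.8, 3.1])
x_inf, eps = minimax_fit(A, b)
print(x_inf, eps)   # eps equals the largest absolute residual at the optimum
```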

New Energy Function

Applying the standard quadratic function, we can consider the cost function

$$E(x,\varepsilon) = \nu\varepsilon + \frac{\kappa}{2}\sum_{i=1}^{m}\left\{\big([\varepsilon + r_i(x)]_-\big)^2 + \big([\varepsilon - r_i(x)]_-\big)^2\right\},$$

where $\nu > 0$, $\kappa > 0$ are coefficients and $[y]_- = \min\{0, y\}$.

New Energy Function

Applying now the gradient strategy, we obtain the associated system of differential equations

$$\frac{d\varepsilon}{dt} = -\mu_0\left(\nu + \kappa\sum_{i=1}^{m}\big[(r_i(x)+\varepsilon)S_{i1} - (r_i(x)-\varepsilon)S_{i2}\big]\right)$$

$$\frac{dx_j}{dt} = -\mu_j\sum_{i=1}^{m} a_{ij}\big[(r_i(x)+\varepsilon)S_{i1} + (r_i(x)-\varepsilon)S_{i2}\big] \qquad (j = 1, 2, \ldots, n)$$

where

$$S_{i1} = \begin{cases} 0, & \text{if } \varepsilon + r_i(x) \ge 0 \\ 1, & \text{otherwise} \end{cases} \qquad\qquad S_{i2} = \begin{cases} 0, & \text{if } \varepsilon - r_i(x) \ge 0 \\ 1, & \text{otherwise} \end{cases}$$

Simplifying Architecture

It is interesting to note that the system of differential equations can be simplified by introducing the nonlinear function

$$\varphi(r_i(x),\varepsilon) = \begin{cases} r_i(x) + \varepsilon, & \text{if } r_i(x) < -\varepsilon \\ 0, & \text{if } -\varepsilon \le r_i(x) \le \varepsilon \\ r_i(x) - \varepsilon, & \text{if } r_i(x) > \varepsilon \end{cases}$$

This nonlinear function represents a typical dead-zone function.

Simplifying Architecture

It is easy to check:

$$(r_i(x)+\varepsilon)S_{i1} - (r_i(x)-\varepsilon)S_{i2} = -\big|\varphi(r_i(x),\varepsilon)\big|$$

$$(r_i(x)+\varepsilon)S_{i1} + (r_i(x)-\varepsilon)S_{i2} = \varphi(r_i(x),\varepsilon)$$

Thus the system of differential equations can be simplified to the form:

$$\frac{d\varepsilon}{dt} = -\mu_0\left(\nu - \kappa\sum_{i=1}^{m}\big|\varphi(r_i(x),\varepsilon)\big|\right), \qquad \varepsilon(0) = \varepsilon^{(0)}$$

$$\frac{dx_j}{dt} = -\mu_j\sum_{i=1}^{m} a_{ij}\,\varphi(r_i(x),\varepsilon), \qquad x_j(0) = x_j^{(0)} \quad (j = 1, 2, \ldots, n)$$

[Figure: ANN architectures realizing the simplified differential equations for x and ε above.]
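A minimal numerical sketch of the simplified minimax system, assuming explicit Euler integration and illustrative values for μ0, μ, ν, κ, the step size, and the data; the clamp keeping ε non-negative is an added safeguard, not part of the slides.

```python
import numpy as np

def dead_zone(r, eps):
    """phi(r, eps): 0 inside [-eps, eps], shifted linear outside (dead zone)."""
    return np.where(r > eps, r - eps, np.where(r < -eps, r + eps, 0.0))

def simulate_minimax_network(A, b, x0, eps0, mu=1.0, mu0=1.0, nu=1.0,
                             kappa=10.0, dt=1e-4, steps=200000):
    """Explicit-Euler simulation of the simplified system:
        d(eps)/dt = -mu0 * (nu - kappa * sum_i |phi(r_i, eps)|)
        d(x_j)/dt = -mu  * sum_i a_ij * phi(r_i, eps)
    """
    x, eps = x0.astype(float).copy(), float(eps0)
    for _ in range(steps):
        r = A @ x - b
        phi = dead_zone(r, eps)
        eps += dt * (-mu0 * (nu - kappa * np.sum(np.abs(phi))))
        eps = max(eps, 0.0)            # safeguard: keep eps non-negative
        x += dt * (-mu * (A.T @ phi))
    return x, eps

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([0.0, 1.2, 1.8, 3.1])
x_star, eps_star = simulate_minimax_network(A, b, np.zeros(2),
                                            eps0=np.max(np.abs(b)))
print(x_star, eps_star)   # eps_star approximates the minimax error E_inf(x*)
```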


Least Absolute Values (L1-Norm) Energy Function

Find the design vector $x^* \in \mathbb{R}^n$ that minimizes the error function

$$E_1(x) = \sum_{i=1}^{m} |r_i(x)|,$$

where

$$r_i(x) = \sum_{j=1}^{n} a_{ij} x_j - b_i.$$

Why should one choose this function, knowing that it has differentiation problems?

Important L1-Norm Properties

1. Least absolute value problems are equivalent to linear programming problems, and vice versa (see the sketch after this list).

2. Although the energy function E1(x) is not differentiable, the terms $|r_i(x)|$ can be approximated very closely by smoothly differentiable functions.

3. For a full-rank* matrix A, there always exists a minimum L1-norm solution which passes through at least n of the m data points. The L2-norm solution does not in general interpolate any of the points.

These properties are not shared by the L2-norm.

* Matrix A is said to be of full rank if all its rows or columns are linearly independent.
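Property 1 can be illustrated directly: the L1 fit can be rewritten as a linear program with slack variables, as in the following sketch (illustrative data, SciPy's linprog assumed).

```python
import numpy as np
from scipy.optimize import linprog

def l1_fit(A, b):
    """Solve min_x sum_i |A_i x - b_i| as a linear program (property 1 above)."""
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(m)])  # minimize sum of slacks t_i
    I = np.eye(m)
    A_ub = np.vstack([np.hstack([A, -I]),          #   A x - b <= t
                      np.hstack([-A, -I])])        # -(A x - b) <= t
    b_ub = np.concatenate([b, -b])
    bounds = [(None, None)] * n + [(0, None)] * m
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:n]

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
b = np.array([0.1, 1.0, 2.1, 2.9, 20.0])           # one gross outlier
print(l1_fit(A, b))   # the L1 fit typically interpolates n of the data points
```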

Important L1-Norm Properties

Theorem: There is a minimizer $x^* \in \mathbb{R}^n$ of the energy function $E_1(x) = \sum_{i=1}^{m}|r_i(x)|$ for which the residuals $r_i(x^*) = 0$ for at least n values of i, say i1, i2, …, in, where n denotes the rank of the matrix A.

We can say that the L1-norm solution is the median solution, while the L2-norm solution is the mean solution.

Least Absolute Error Implementation

The algorithm is as follows (a NumPy sketch of the two phases is given after this slide):

1. First phase: solve the problem using the ordinary least-squares technique and compute all m residuals; then select from them the n residuals which are smallest in absolute value.

2. Second phase: discarding the rest of the equations, the n equations related to the selected residuals are solved by driving their residuals to zero.

The ANN implementation is done in three layers using an inhibition control circuit.
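A minimal NumPy sketch of this two-phase procedure, assuming no ties among the selected residuals (the worked example later shows that ties force an extra re-run); all names and data are illustrative, and this is not the analog circuit itself.

```python
import numpy as np

def two_phase_l1(A, b):
    """Sketch of the two-phase least-absolute-values procedure.

    Phase 1: ordinary least squares, then keep the n equations whose
             residuals are smallest in absolute value.
    Phase 2: discard the rest and solve the selected n equations exactly
             (their residuals are driven to zero).
    """
    m, n = A.shape
    x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)  # phase 1: least-squares fit
    r = A @ x_ls - b                              # all m residuals
    keep = np.argsort(np.abs(r))[:n]              # n smallest residuals in magnitude
    x = np.linalg.solve(A[keep], b[keep])         # phase 2: zero the kept residuals
    return x, keep

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
b = np.array([0.1, 1.0, 2.1, 2.9, 20.0])
x_hat, kept = two_phase_l1(A, b)
print(x_hat, kept)
```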

[Figure: ANN architecture for solving the L1-norm estimation problem, shown for Phase #1 and Phase #2.]

Example

Consider matrix A and observation b as below. Find the solution to Ax=b using the least absolute error energy function.

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \\ 1 & 4 & 16 \end{bmatrix}, \qquad b = \begin{bmatrix} 1 \\ 2 \\ 1 \\ -1 \\ 10 \end{bmatrix}, \qquad Ax - b = 0$$

In the first phase all the switches (S1–S5) were closed and the network was able to find the following standard least-squares solution:

$$x_I^* = \begin{bmatrix} 0.6 \\ -3.5 \\ 1.5 \end{bmatrix}, \qquad r(x_I^*) = \begin{bmatrix} 0.4 \\ -0.6 \\ -0.6 \\ 1.4 \\ -0.6 \end{bmatrix}$$

In this case it is impossible to select the two largest residuals in absolute value, because $|r_2| = |r_3| = |r_5| = 0.6$. Phase one was therefore rerun with switch S4 opened, and the network then found

$$x_{II}^* = \begin{bmatrix} 0.9182 \\ -2.6404 \\ 1.3409 \end{bmatrix}, \qquad r(x_{II}^*) = \begin{bmatrix} -0.0818 \\ 0.2182 \\ 0.1636 \\ -3.2273 \\ 0.0273 \end{bmatrix}$$

Cichocki’s Circuit Simulation Results

In the second phase (and third run of the network) the inhibitive control network opened switch S2. So in the third run only switches S1, S3, S5 were closed, and the network found the equilibrium point:

$$x^* = \begin{bmatrix} 1 \\ -2.750 \\ 1.375 \end{bmatrix}, \qquad r(x^*) = \begin{bmatrix} 0 \\ -0.375 \\ 0 \\ 2.125 \\ 0 \end{bmatrix}$$

Cichocki’s Circuit Simulation Results

Residuals for n = 3 of the m = 5 equations converge to zero in 50 nanoseconds.

Using MATLAB, we observed that zeroing r1, r3, and r5 results in the minimum value of $E_1(x) = \sum_{i=1}^{m} |r_i(x)|$.


Conclusion

There is a great need for real-time solution of linear equations.

Cichocki's proposed ANN is different from classical ANNs.

The idea is to construct a proper energy function whose minimization yields the optimal solution to Ax = b.

'Proper function' may have different meanings in different applications.

The standard least-squares error function gives the optimal answer for a Gaussian distribution of errors.

Conclusion (Cont.)

The least-squares function does not behave well when the observations contain large outliers.

Various energy functions have been proposed to solve the outlier problem (e.g. the logistic function).

The minimax criterion gives the optimal answer for a uniform distribution of errors. It also has some implementation and mathematical problems that lead to an indirect approach to solving the problem.

The least absolute error function has some properties that distinguish it from other error functions.
