Neural Networks: Model Building Through Linear Regression
CHAPTER 2
MODEL BUILDING THROUGH REGRESSION
CSC445: Neural Networks
Prof. Dr. Mostafa Gadal-Haqq M. Mostafa
Computer Science Department
Faculty of Computer & Information Sciences
AIN SHAMS UNIVERSITY
(Most of the figures in this presentation are copyright Pearson Education, Inc.)
Introduction
Supervised Learning vs. Regression
Linear Regression Model
Maximum a Posteriori Estimation (MAP)
Computer Experiment
The Minimum-Description-Length Principle
Finite Sample Size Consideration
Introduction
Regression is a special type of function approximation.
There are two types of regression models:
Linear regression: the dependence of the output on the input is defined by a linear function.
Nonlinear regression: the dependence of the output on the input is defined by a nonlinear function.
[Figure: two plots of y versus x, one fitted with a straight line (linear regression) and one with a curve (nonlinear regression).]
Supervised Learning vs. Regression
Supervised Learning (Classification):
Learn the “right answer” for each data sample.
Regression Problem:
Predict the real-valued output using the data samples.
Introduction
In regression we do the following:
One of the random variables is considered to be of particular interest and is referred to as the dependent variable, or response (the output).
The remaining random variables are called independent variables, or regressors (the input).
The dependence of the response on the regressors includes an additive error term.
Linear Regression Model
Linear Regression (One Variable)
The parameter vector $\mathbf{w} = [w_0 \;\; w_1]$ is fixed but unknown; stationary environment.
The model is a straight line:
$y = ax + b$, where $a$ = slope and $b$ = intercept;
in parameter-vector form, $y = w_1 x + w_0$.
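As a quick illustration (an addition, not from the slides), the one-variable model $y = w_1 x + w_0$ can be fitted by least squares in a few lines of Python; the true parameters and noise level below are arbitrary choices:

```python
import numpy as np

# Minimal sketch: fit y = w1*x + w0 by least squares on synthetic data.
# w0_true, w1_true, and the noise level are illustrative, not from the slides.
rng = np.random.default_rng(0)
w0_true, w1_true, noise = 1.0, 2.0, 0.5
x = rng.uniform(-1.0, 1.0, size=100)
y = w1_true * x + w0_true + noise * rng.standard_normal(100)

X = np.column_stack([np.ones_like(x), x])      # design matrix [1, x]
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # solves min ||X w - y||^2
print("estimated [w0, w1]:", w_hat)
```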
Linear Regression Model
Linear Regression (Multiple Variables)
The parameter vector w is fixed but unknown; stationary environment.
$d = \sum_{j=1}^{M} w_j x_j + \varepsilon = \mathbf{w}^T\mathbf{x} + \varepsilon$
where $\mathbf{x} = [x_1, x_2, \ldots, x_M]^T$ is the regressor vector and $\varepsilon$ is the additive (expectational) error.
Figure 2.1 (a) Unknown stationary stochastic environment. (b) Linear regression model of the environment.
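The stochastic environment of Figure 2.1(a) is easy to simulate. Below is a minimal sketch, assuming illustrative values for M, N, the noise level, and the (normally unknown) parameter vector w:

```python
import numpy as np

# Minimal sketch of the environment d = w^T x + eps (Fig. 2.1a).
# M, N, the noise level, and w_true are illustrative assumptions.
rng = np.random.default_rng(1)
M, N, noise = 5, 200, 0.3
w_true = rng.standard_normal(M)        # fixed but unknown parameter vector
X = rng.standard_normal((N, M))        # rows are regressor samples x_i^T
eps = noise * rng.standard_normal(N)   # additive expectational error
d = X @ w_true + eps                   # observed responses
```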
Linear Regression Model
Preliminary Considerations:
With the environment being stochastic, it follows that the regressor x, the response d, and the expectational error are sample values of the random variables X, D, and E.
Then, we can state the problem as follows: given the joint statistics of the regressor X and the corresponding response D, estimate the unknown parameter vector w.
By joint statistics we mean that we have:
The correlation matrix of the regressor X;
The variance of the desired response D;
The cross-correlation vector of X and D.
It is assumed that the means of both X and D are zero.
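For intuition (an addition, not from the slides), these joint statistics can be estimated from N zero-mean samples as follows; all sizes and data are illustrative:

```python
import numpy as np

# Sample estimates of the joint statistics listed above,
# assuming zero-mean X and D (synthetic, illustrative data).
rng = np.random.default_rng(2)
N, M = 200, 5
X = rng.standard_normal((N, M))    # N regressor samples, one per row
d = X @ rng.standard_normal(M) + 0.3 * rng.standard_normal(N)

R_xx = (X.T @ X) / N      # M-by-M correlation matrix of the regressor
var_d = np.mean(d**2)     # variance of the (zero-mean) desired response
r_dx = (X.T @ d) / N      # M-by-1 cross-correlation vector of X and D
```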
Linear Regression Model
How do we estimate the parameter vector w?
Maximum A Posteriori (MAP)
Least Squares Estimation (LS)
Regularized Least Squares Estimation (RLS)
Maximum A Posteriori (MAP) Estimation
Estimation of the parameter vector w:
The regressor X bears no relation to the parameter vector w.
Information about w is contained in the desired response D.
Then we focus on the joint probability density of w and d, conditional on x:
$p(\mathbf{w}, d \mid \mathbf{x}) = p(d \mid \mathbf{w}, \mathbf{x})\,p(\mathbf{w}) = p(\mathbf{w} \mid d, \mathbf{x})\,p(d)$
which gives a special form of Bayes' theorem:
$p(\mathbf{w} \mid d, \mathbf{x}) = \frac{p(d \mid \mathbf{w}, \mathbf{x})\,p(\mathbf{w})}{p(d)}$
Maximum A Posteriori (MAP) Estimation
Observation density: p(d|w,x), referring to the observation of the environmental response d due to the regressor x, given w; it is also called the likelihood l(d|w,x).
Prior: p(w), referring to information about the parameter vector w, prior to any observations.
Posterior density: p(w|d,x), referring to the parameter vector w after observations have been completed.
Evidence: p(d), referring to the information contained in the environmental response.
Maximum A Posteriori (MAP) Estimation
Since p(d) is a normalization constant, we can write:
$p(\mathbf{w} \mid d, \mathbf{x}) \propto l(d \mid \mathbf{w}, \mathbf{x})\,p(\mathbf{w})$
The maximum-likelihood (ML) estimate of the vector w is:
$\hat{\mathbf{w}}_{ML} = \arg\max_{\mathbf{w}}\, l(d \mid \mathbf{w}, \mathbf{x})$
The maximum a posteriori (MAP) estimate of the vector w is:
$\hat{\mathbf{w}}_{MAP} = \arg\max_{\mathbf{w}}\, p(\mathbf{w} \mid d, \mathbf{x})$
The MAP estimate is more profound than the ML estimate: the ML estimator relies solely on the observation model (d, x), which may lead to a non-unique solution, whereas the MAP estimator enforces uniqueness and stability of the solution by including the prior p(w).
Maximum A Posteriori (MAP) Estimation
Parameter Estimation in a Gaussian Environment:
Suppose we have a total of N samples of training-data pairs $(\mathbf{x}_i, d_i)$, $i = 1, 2, \ldots, N$. We make the following three assumptions:
1. IID: The N samples are statistically independent and identically distributed (iid).
2. Gaussianity: The environment generating the training samples is Gaussian-distributed:
$p(\varepsilon_i) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{\varepsilon_i^2}{2\sigma^2}\right)$, where $\varepsilon_i = d_i - \mathbf{w}^T\mathbf{x}_i$,
so the likelihood of each observation is
$p(d_i \mid \mathbf{w}, \mathbf{x}_i) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{1}{2\sigma^2}\left(d_i - \mathbf{w}^T\mathbf{x}_i\right)^2\right)$
Maximum A Posteriori (MAP) Estimation
Parameter Estimation in a Gaussian Environment:
3. Stationarity: The environment is stationary, which means that the parameter vector w is fixed but unknown.
Each element of w is likewise assumed to be zero-mean Gaussian:
$p(w_k) = \frac{1}{\sqrt{2\pi}\,\sigma_w}\exp\!\left(-\frac{w_k^2}{2\sigma_w^2}\right)$
Substitution in Bayes' rule leads to the MAP estimate of the parameter vector:
$\hat{\mathbf{w}}_{MAP} = \arg\max_{\mathbf{w}}\left[-\frac{1}{2}\sum_{i=1}^{N}\left(d_i - \mathbf{w}^T\mathbf{x}_i\right)^2 - \frac{\lambda}{2}\lVert\mathbf{w}\rVert^2\right]$
where $\lambda = \sigma^2/\sigma_w^2$.
Maximum A Posteriori (MAP) Estimation
Parameter Estimation in a Gaussian Environment:
Maximizing the bracketed expression in the previous equation is equivalent to minimizing the quadratic function:
$\mathcal{E}(\mathbf{w}) = \frac{1}{2}\sum_{i=1}^{N}\left(d_i - \mathbf{w}^T\mathbf{x}_i\right)^2 + \frac{\lambda}{2}\lVert\mathbf{w}\rVert^2$
Differentiating with respect to w and equating the result to zero gives the MAP estimate of w:
$\hat{\mathbf{w}}_{MAP}(N) = \left[\mathbf{R}_{xx}(N) + \lambda\mathbf{I}\right]^{-1}\mathbf{r}_{dx}(N)$
where the M-by-M correlation matrix $\mathbf{R}_{xx}$ and the M-by-1 cross-correlation vector $\mathbf{r}_{dx}$ are given by
$\mathbf{R}_{xx}(N) = \sum_{i=1}^{N}\mathbf{x}_i\mathbf{x}_i^T \quad\text{and}\quad \mathbf{r}_{dx}(N) = \sum_{i=1}^{N}\mathbf{x}_i d_i$
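The closed-form estimate is straightforward to check numerically. A minimal sketch on synthetic data, with N, M, σ, and σ_w as illustrative assumptions:

```python
import numpy as np

# Minimal sketch of w_MAP = (R_xx + lambda*I)^(-1) r_dx on synthetic data.
# N, M, sigma, and sigma_w are illustrative assumptions.
rng = np.random.default_rng(3)
N, M, sigma, sigma_w = 200, 5, 0.3, 1.0
w_true = rng.standard_normal(M)
X = rng.standard_normal((N, M))                  # rows are x_i^T
d = X @ w_true + sigma * rng.standard_normal(N)

lam = sigma**2 / sigma_w**2                      # lambda = sigma^2 / sigma_w^2
R_xx = X.T @ X                                   # sum_i x_i x_i^T
r_dx = X.T @ d                                   # sum_i x_i d_i
w_map = np.linalg.solve(R_xx + lam * np.eye(M), r_dx)
print("w_true:", np.round(w_true, 3))
print("w_map :", np.round(w_map, 3))
```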
Least-Squares (LS) Estimation
The estimator is obtained by minimizing the least-squares error in the parameter vector:
$\mathcal{E}(\mathbf{w}) = \frac{1}{2}\sum_{i=1}^{N}\left(d_i - \mathbf{w}^T\mathbf{x}_i\right)^2$
$\hat{\mathbf{w}}(N) = \mathbf{R}_{xx}^{-1}(N)\,\mathbf{r}_{dx}(N)$
This is identical to the maximum-likelihood (ML) estimator, but the solution lacks uniqueness and stability.
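A minimal sketch of the LS estimate (an illustration, not from the slides): it is exactly the λ = 0 case of the MAP formula, and np.linalg.lstsq is the numerically safer route when R_xx is ill-conditioned:

```python
import numpy as np

# LS estimate: the lambda = 0 special case of the MAP/RLS formula.
rng = np.random.default_rng(4)
N, M = 200, 5
X = rng.standard_normal((N, M))
d = X @ rng.standard_normal(M) + 0.3 * rng.standard_normal(N)

w_ls = np.linalg.solve(X.T @ X, X.T @ d)       # R_xx^{-1} r_dx
w_ls2, *_ = np.linalg.lstsq(X, d, rcond=None)  # equivalent, more stable
assert np.allclose(w_ls, w_ls2)
```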
Regularized Least-Squares (RLS) Estimation
To overcome this, we add a structural regularization term, $\lVert\mathbf{w}\rVert^2$, to obtain the regularized least-squares estimator:
$\mathcal{E}(\mathbf{w}) = \frac{1}{2}\sum_{i=1}^{N}\left(d_i - \mathbf{w}^T\mathbf{x}_i\right)^2 + \frac{\lambda}{2}\lVert\mathbf{w}\rVert^2$
$\hat{\mathbf{w}}(N) = \left[\mathbf{R}_{xx}(N) + \lambda\mathbf{I}\right]^{-1}\mathbf{r}_{dx}(N)$
which is identical to the MAP estimator. $\lambda$ is called the regularization parameter: if $\lambda \approx 0$, we have complete confidence in the data; if $\lambda \to \infty$, we have no confidence in the data.
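The two confidence regimes can be seen numerically: as λ grows, the RLS estimate shrinks toward zero. A minimal sketch with illustrative data:

```python
import numpy as np

# Effect of the regularization parameter lambda on the RLS solution:
# lambda ~ 0 recovers LS; large lambda shrinks w_hat toward zero.
rng = np.random.default_rng(5)
N, M = 200, 5
X = rng.standard_normal((N, M))
d = X @ rng.standard_normal(M) + 0.3 * rng.standard_normal(N)
R_xx, r_dx = X.T @ X, X.T @ d

for lam in (0.0, 1.0, 100.0, 1e6):
    w = np.linalg.solve(R_xx + lam * np.eye(M), r_dx)
    print(f"lambda = {lam:>9}: ||w_hat|| = {np.linalg.norm(w):.4f}")
```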
Computer Experiment
Figure 2.2 Least-squares classification of the double-moon of Fig. 1.8 with distance d = 1.
Computer Experiment
Figure 2.3 Least-squares classification of the double-moon of Fig. 1.8 with distance d = –4.
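A rough sketch of the experiment follows. It assumes the usual double-moon geometry of Haykin's Fig. 1.8 (radius 10, moon width 6); every parameter here is an illustrative assumption, not taken from the slides. With d = 1 the moons are linearly separable and the LS boundary classifies well; with d = –4 they overlap, so any linear boundary must misclassify some points.

```python
import numpy as np

# Rough sketch: least-squares classification of a double-moon data set.
# Moon geometry (radius 10, width 6) follows the common setup of
# Haykin's Fig. 1.8; treat every parameter here as an assumption.
def double_moon(n, dist, radius=10.0, width=6.0, seed=0):
    rng = np.random.default_rng(seed)
    r = radius + width * (rng.random(n) - 0.5)     # radial scatter
    theta = np.pi * rng.random(n)                  # angles in [0, pi]
    upper = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
    lower = np.column_stack([r * np.cos(theta) + radius,
                             -r * np.sin(theta) - dist])
    X = np.vstack([upper, lower])
    y = np.concatenate([np.ones(n), -np.ones(n)])  # class labels +/-1
    return X, y

for dist in (1.0, -4.0):                           # Figs. 2.2 and 2.3
    X, y = double_moon(1000, dist)
    A = np.column_stack([np.ones(len(X)), X])      # bias + coordinates
    w, *_ = np.linalg.lstsq(A, y, rcond=None)      # LS linear classifier
    acc = np.mean(np.sign(A @ w) == y)
    print(f"d = {dist:+.0f}: training accuracy = {acc:.3f}")
```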
Homework 2
Problems: 2.1, 2.2
Computer Experiment: 2.8, 2.10
Next Time: The Least-Mean-Square Algorithm