
Page 1: Neural network

Artificial Neural Network

➲ 1 Brief Introduction
➲ 2 Backpropagation Algorithm
➲ 3 A Simple Illustration

Page 2: Neural network

Chapter 1 Brief Introduction

➲ 1.2 Review of Decision Tree Learning

The learning process serves to reduce the error, which can be understood as the difference between the target values and the output values produced by the learned structure.

The ID3 algorithm can be applied only to discrete values.

An Artificial Neural Network (ANN), by contrast, can approximate arbitrary functions.

➲ History

Page 3: Neural network

➲ 1.3 Basic Structure

This example of ANN learning is provided by Pomerleau's (1993) system ALVINN, which uses a learned ANN to steer an autonomous vehicle driving at normal speeds. The input to the ANN is a 30x32 grid of pixel intensities obtained from a forward-facing camera mounted on the vehicle. The output is the direction in which the vehicle is steered.

As can be seen, four units receive input directly from all of the 30x32 pixels of the camera in the vehicle. These are called "hidden" units because their outputs are available only to the units that follow them in the network, not as part of the global network output.

Page 4: Neural network
Page 5: Neural network

➲ 1.4 Ability

Instances are represented by many attribute-value pairs. The target function to be learned is defined over instances that can be described by a vector of predefined features, such as the pixel values in the ALVINN example.

The training examples may contain errors. As the following sections show, ANN learning methods are quite robust to noise in the training data.

Long training times are acceptable. Compared to decision tree learning, network training algorithms require longer training times, depending on factors such as the number of weights in the network.

Page 6: Neural network

Chapter 2 Backpropagation Algorithm

➲ 2.1 Sigmoid

Like the perceptron, the sigmoid unit first computes a linear combination of its inputs. The sigmoid unit then computes its output with the following function.
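With inputs $x_1, \dots, x_n$, weights $w_1, \dots, w_n$, and a bias weight $w_0$ applied to the constant input $x_0 = 1$, these two steps are:

$$net = \sum_{i=0}^{n} w_i x_i \qquad (1)$$

$$o = \sigma(net) = \frac{1}{1 + e^{-net}} \qquad (2)$$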

Page 7: Neural network

Equation 2 is often referred to as the squashing function, since it maps a very large input domain onto a small range of outputs.

The sigmoid function has the useful property that its derivative is easily expressed in terms of its output. As the following description of backpropagation shows, the algorithm makes use of this derivative.
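Concretely, the derivative can be written using only the unit's output $o$:

$$\frac{d\,\sigma(net)}{d\,net} = \sigma(net)\,\big(1 - \sigma(net)\big) = o\,(1 - o)$$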

Page 8: Neural network

➲ 2.2 Function

The sigmoid is only one unit in the network; now we take a look at the whole function that the neural network computes. Figure 2.2 shows the structure. If we consider an example (x, t), where x is called the input attribute and t is called the target attribute, then:
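Assuming, as the later notation suggests, that figure 2.2 shows a single hidden layer of sigmoid units, the k-th output of the network for input $\vec{x}$ is:

$$o_k = \sigma\Big( \sum_{h \in hidden} w_{kh}\, \sigma\Big( \sum_{i \in inputs} w_{hi}\, x_i \Big) \Big)$$

where $w_{hi}$ is the weight from input unit $i$ to hidden unit $h$, and $w_{kh}$ is the weight from hidden unit $h$ to output unit $k$.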

Page 9: Neural network
Page 10: Neural network

➲ 2.3 Squared Error

As mentioned above, the whole learning process serves to reduce the error; but how can the error be described? Generally the squared-error function is used.
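Summed over the set of training examples $D$ and all output units, the squared error of a weight vector $\vec{w}$ is:

$$E(\vec{w}) = \frac{1}{2} \sum_{d \in D} \sum_{k \in outputs} (t_{kd} - o_{kd})^2 \qquad (3)$$

where $t_{kd}$ and $o_{kd}$ are the target and output values of output unit $k$ for training example $d$.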

Notice: function 3 sums the error over all of the network's output units after the whole set of training examples has been processed.

Page 11: Neural network
Page 12: Neural network

Then the weight vector can be updated by:

$$\vec{w} \leftarrow \vec{w} + \Delta\vec{w}, \qquad \Delta\vec{w} = -\eta\, \nabla E(\vec{w})$$

where $\nabla E(\vec{w})$ is the gradient of $E$:

$$\nabla E(\vec{w}) = \left[ \frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \dots, \frac{\partial E}{\partial w_n} \right]$$

so each weight $w_i$ can be updated by:

$$w_i \leftarrow w_i - \eta\, \frac{\partial E}{\partial w_i}$$

where $\eta$ is the learning rate.

Page 13: Neural network

But in practice, because function 3 sums the error over the whole set of training data, an algorithm based on it needs more time to compute and can easily be trapped in a local minimum. One therefore constructs a new function, the stochastic squared error:
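For a single training example $d$:

$$E_d(\vec{w}) = \frac{1}{2} \sum_{k \in outputs} (t_k - o_k)^2$$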

As can be seen, this function computes the error for only a single example. The gradient of $E_d(\vec{w})$ is easily derived:
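Differentiating $E_d$ instead of $E$ gives the per-example (stochastic gradient descent) update, with learning rate $\eta$:

$$\Delta w_i = -\eta\, \frac{\partial E_d}{\partial w_i}$$

For a sigmoid output unit this works out to $\Delta w_i = \eta\, \delta\, x_i$ with error term $\delta = o\,(1 - o)\,(t - o)$.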

Page 14: Neural network

➲ 2.4 Backpropagation Algorithm

The learning problem faced by backpropagation is to search a large hypothesis space defined by all possible weight values for all the units in the network. The algorithm proceeds as follows:
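A minimal sketch of this procedure, assuming one hidden layer of sigmoid units and stochastic gradient descent; the Python function train_backprop and its hyperparameter values are illustrative, not from the original slides:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_backprop(X, T, n_hidden, eta=0.05, epochs=5000, seed=0):
    """Stochastic gradient descent backpropagation, one hidden layer of sigmoid units."""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], T.shape[1]
    # Initialize all weights to small random values; the extra column is the bias.
    W_hid = rng.uniform(-0.05, 0.05, size=(n_hidden, n_in + 1))
    W_out = rng.uniform(-0.05, 0.05, size=(n_out, n_hidden + 1))
    for _ in range(epochs):
        for x, t in zip(X, T):
            # Forward pass: propagate the input through the network.
            x1 = np.append(x, 1.0)        # input plus constant bias input
            h = sigmoid(W_hid @ x1)       # hidden-unit outputs
            h1 = np.append(h, 1.0)        # hidden outputs plus bias
            o = sigmoid(W_out @ h1)       # output-unit outputs
            # Backward pass: delta_k = o(1-o)(t-o) for each output unit,
            # delta_h = h(1-h) * sum_k w_kh * delta_k for each hidden unit.
            delta_o = o * (1.0 - o) * (t - o)
            delta_h = h * (1.0 - h) * (W_out[:, :-1].T @ delta_o)
            # Gradient-descent weight updates: delta_w = eta * delta * input.
            W_out += eta * np.outer(delta_o, h1)
            W_hid += eta * np.outer(delta_h, x1)
    return W_hid, W_out
```

Each pass of the inner loop implements exactly the update derived above: a forward pass, the error terms for output and hidden units, and then the weight updates.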

Page 15: Neural network
Page 16: Neural network

Notice: the error term for hidden unit h is calculated by summing the error terms $\delta_k$ of each output unit k influenced by unit h, weighting each $\delta_k$ by $w_{kh}$, the weight from hidden unit h to output unit k:

$$\delta_h = o_h\,(1 - o_h) \sum_{k \in outputs} w_{kh}\, \delta_k$$

This weight characterizes the degree to which hidden unit h is "responsible for" the error in output unit k.

Page 17: Neural network

Chapter 3 A Simple Illustration

Now we work through an example to build more intuition. How does an ANN learn the simplest function, the identity function? We construct the network shown in the figure: eight network input units are connected to three hidden units, which are in turn connected to eight output units. Because of this structure, the three hidden units are forced to represent the eight input values in some way that captures their relevant features, so that this hidden-layer representation can be used by the output units to compute the correct target values.
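As a hypothetical usage of the train_backprop sketch from chapter 2, the 8 x 3 x 8 identity task can be set up with the eight one-hot patterns as both inputs and targets (the learning rate here is illustrative):

```python
import numpy as np  # assumes sigmoid and train_backprop from the sketch above

X = np.eye(8)  # the eight one-hot input patterns, used as their own targets
W_hid, W_out = train_backprop(X, X, n_hidden=3, eta=0.3, epochs=5000)

# Inspect the learned hidden encodings; after training they are roughly binary.
for x in X:
    print(np.round(sigmoid(W_hid @ np.append(x, 1.0)), 2))
```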

Page 18: Neural network

This 8 x 3 x 8 network was trained to learn the identity function. After 5000 training iterations, the three hidden-unit values encode the eight distinct inputs using the encoding shown in the table. Notice that if the encoded values are rounded to zero or one, the result is the standard binary encoding for eight distinct values.