Image Processing for Driver Vigilance System Using ANN



CONTENTS

LIST OF ABBREVIATIONS
LIST OF SYMBOLS
LIST OF FIGURES
ABSTRACT
1. INTRODUCTION
1.1 General
1.2 Scope and Motivation
1.3 Literature Review
1.3.1 Detection Techniques
1.3.2 Tracking Techniques
1.4 Problem Statement
1.5 Objectives of Dissertation
1.6 Organization of Dissertation
1.7 List of Publications
1.8 References
2. THEORETICAL BACKGROUND
2.1 Fatigue
2.1.1 Weariness, Tiredness, or Lack of Energy is Fatigue
2.1.2 Drowsiness is Common in Driving
2.1.3 The Danger of Driving While Feeling Drowsy
2.1.4 The Risk of Drowsy Driving Crashes
2.1.5 The Symptoms of a Drowsy Driver
2.2 Causes for Driver's Vigilance
2.2.1 Sleep
2.2.2 Sleepiness and Impairment
2.2.3 Human Eye and Its Behavior
2.3 Digital Image Processing
2.3.1 Image Acquisition
2.3.2 Pre-Processing
2.3.3 Image Size Normalization
2.3.4 Histogram Equalization
2.3.5 Median Filtering
2.3.6 Background Removal
2.3.7 Illumination Normalization
2.4 Image Features
2.4.1 Global Features
2.4.2 Local Features
2.4.3 Relational Features
2.5 Introduction to Neural Networks
2.5.1 Artificial Neural Network
2.5.2 Applications of Neural Networks
2.5.3 Other Advantages
2.5.4 Neural Networks versus Conventional Computers
2.5.5 Training Algorithms for Neural Networks
2.5.6 Supervised Training
2.6 Pre- and Post-Processing Steps in Training
2.6.1 Collect and Prepare the Data
2.6.2 Preprocessing and Postprocessing
2.6.3 Representing Unknown or Don't-Care Targets
2.6.4 Dividing the Data
2.7 How to Use Back Propagation
2.7.1 Problems with Back Propagation
2.7.2 Network Size
2.7.3 Strengths and Weaknesses
2.8 References
3. IMPLEMENTATION OF IMAGE ENHANCEMENT TECHNIQUES FOR FEATURE EXTRACTION
3.1 Preprocessing
3.1.1 Image Noise
3.1.2 Salt-and-Pepper Noise
3.2 Median Filtering
Results of Median Filtering
3.3 Histogram
Results for Histogram
3.4 Histogram Equalization
3.5 Histogram Equalization (Results)
3.6 Edge Detection
3.7 Iris Detection
3.7.1 Segmentation
3.7.2 Daugman's Integro-Differential Operator
3.7.3 Image Normalization
3.7.4 Daugman's Cartesian-to-Polar Transform
3.7.5 Daugman's Rubber Sheet Model
4. IMPLEMENTATION OF NEURAL NETWORK FOR EYE BLINK DETECTION
4.1 ANN Training
4.1.1 Input to Neural Network
4.1.2 Target Given to Neural Network
4.1.3 Database Viewer
4.2 Neural Network Results
4.3 Confusion Matrix for NN Training
4.4 Result of Confusion Matrix
4.5 Testing Results for Offline Images
4.6 Testing Results for Real-Time Images
5. SUMMARY, CONCLUSIONS AND FUTURE SCOPE
5.1 Summary
5.2 Conclusion
5.3 Future Work

LIST OF ABBREVIATIONS

PERCLOS: Percentage of eye closure
ANN: Artificial Neural Network
CCD: Charge-Coupled Device
NHTSA: National Highway Traffic Safety Administration
LCD: Liquid Crystal Display
LW: Layer Weight
IW: Input Weight
N/W MODEL: Network Model
DOT: Department of Transportation
MSE: Mean Square Error
Webcam: Web Camera
GUI: Graphical User Interface
LED: Light-Emitting Diode
PIN: Personal Identification Number
TV: Television
CGB: Conjugate Gradient Backpropagation
BFGS: Broyden-Fletcher-Goldfarb-Shanno (memoryless quasi-Newton algorithm)

LIST OF SYMBOLS

W: Weight matrix
P: Input vector
B: Bias vector
A: Output vector
F: Transfer function
R: Number of elements in input vector
S: Number of neurons in a layer
T: Target matrix
r: Radius of iris
u: Horizontal component of the flow at image point p(x, y)
v: Vertical component of the flow at image point p(x, y)
NaN: Don't-care value

LIST OF FIGURES

Figure 1.1: Detailed template used by Moriyama et al., 2004 and Cohn et al., 2004
Figure 2.1: Human eye (http://www.medicinenet.com)
Figure 2.2: Human eye undergoing blinking
Figure 2.3: Fundamental steps in digital image processing
Figure 2.4: Performance graph
Figure 2.5: Multilayer neural network
Figure 2.6: Neural network processing
Figure 2.7: Inputs & target
Figure 2.8: Input & target applied to network
Figure 2.9: Flowchart for training
Figure 2.10: Network error
Figure 2.11: Local minima
Figure 3.1: Color image to gray image
Figure 3.2: Salt & pepper noise
Figure 3.3: Results for median filtering
Figure 3.4: Gray image & its histogram
Figure 3.5: Histogram equalization results
Figure 3.6: Result for edge detection
Figure 3.7: Normalization process
Figure 3.8: Unwrapping the iris
Figure 3.9: Results for iris localization
Figure 4.1: Neural network training database images
Figure 4.2: Close image features
Figure 4.3: Close image features
Figure 4.4: Close image features
Figure 4.5: NN Tool screenshot
Figure 4.6: NN performance class
Figure 4.7: Training state
Figure 4.8: Error histogram
Figure 4.9: NN confusion matrix
Figure 4.10: Testing for open eye
Figure 4.11: Testing for close eye
Figure 4.12: Open eye for testing
Figure 4.13: Open eye for testing
Figure 4.14: Open eye for testing
Figure 4.15: Close eye for testing
Figure 4.16: Close eye for testing
Figure 4.17: Close eye image for test

LIST OF TABLES

Table 1: Parameters describing the movement in the eye region
Table 2: Pixel values
Table 3: Extracted features
Table 4: Target given to neural network
Table 5: Test T
Table 6: Output Y
Table 7: Confusion table

ABSTRACT

International statistics show that a large number of road accidents are caused by driver fatigue. The main objective of our work is to implement an eye blink detection algorithm that works in real time with standard cameras, in a real context such as people driving a car. A system that can detect oncoming driver fatigue and issue a timely warning could therefore help prevent many accidents.

In this system, a camera pointed directly at the driver's eye is used to detect fatigue. The captured eye images are used to train a neural network with the back-propagation algorithm. A neural classifier has been trained to recognize the two classes, eyes open and eyes closed, using a number of eye images of different people. The decision whether the driver is fatigued is taken depending on whether the eyes are open or closed. If the eyes are found to be closed, the driver is subsequently alerted with an alarm. Higher accuracy of eye blink detection has been attained because of the higher image resolution made possible and because we used reliable facial features such as the iris and pupil contour.

Chapter 1

1. INTRODUCTION
__________________________________________________________

1.1 General

The ever-increasing number of traffic accidents due to a diminished driver vigilance level has become a problem of serious concern to society. Drivers with a diminished vigilance level suffer a marked decline in their perception, recognition, and vehicle-control abilities and therefore pose a serious danger to their own lives and the lives of other people. Statistics show that a leading cause of fatal or injury-causing traffic accidents is drivers with a diminished vigilance level. In the trucking industry, 57% of fatal truck accidents are due to driver fatigue; it is the number one cause of heavy truck crashes. Seventy percent of drivers report driving fatigued. The National Highway Traffic Safety Administration (NHTSA) estimates that 100,000 crashes each year are caused by drowsy drivers, resulting in more than 1,500 fatalities and 71,000 injuries. The increasing number of cars has also raised the issue of traffic and accidents on the road.

With ever-growing traffic, this problem will further increase. For this reason, developing systems that actively monitor a driver's level of vigilance and alert the driver to any insecure driving conditions is essential for accident prevention. Many efforts at developing active safety systems that reduce the number of automobile accidents due to reduced vigilance have been reported in the literature. With the ongoing reduction in size and cost of computing and optical monitoring equipment, many governmental and commercial groups are attempting to develop "intelligent automobiles." According to a review of the field, most major automakers have been pursuing some sort of drowsy driver detection, using methods ranging from monitoring the weaving of the car with yaw rate sensors to monitoring the driver's eye with an in-car camera.

Particularly notable is the U.S. Department of Transportation's Intelligent Vehicle Initiative. Intelligent cars are cars that can respond to the state of the driver and increase both safety and convenience. As 90% of accidents occur due to driver error, devices that support effective accident avoidance could realize great savings in human life and financial loss.

Driver drowsiness is one specific form of human error that has been well studied. Studies have shown that immediately prior to fatigue-induced accidents, the driver's eye exhibits a change in blinking behavior. Specifically, the frequency of blinking increases and the percentage of the eye covered by the lid increases.

As the eye closure occurrences dramatically increase during the 10-second period preceding an accident, monitoring such closures could allow the car to take some form of automated response to wake up the driver, e.g. a loud noise, a bright light, possibly even the activation of an "autopilot" if that capability is developed. It is also known that the duration of the eye closures one minute before an accident is much higher than at earlier times.

Consequently, we attempted to devise a camera & image processor system to take images of a driver's eye and then attempted to process those images to determine whether the eyes were open or closed.

1.2 Scope and Motivation

If the computer could see with human eyes and understand with a human brain at all levels, we could convey complex messages to it faster and more easily. Image analysis is a science which teaches the computer to see and understand what is captured in an image, as humans do. Besides enabling a more natural way of human-computer interaction, the computer itself becomes a more useful tool when it can collect and process visual information about the environment without needing our help; it can warn us in emergent situations.

This thesis is a step towards the idea of improving the use of the computer for preventing dangerous situations, namely not allowing a car or truck driver to fall asleep. The face recognition problem has driven research efforts for more than 20 years. A reason for this trend is the strong demand for user-friendly identification systems in a wide range of commercial and police applications. Nowadays, one needs a PIN to get cash from cash-point machines, a password for the computer, a dozen others to access the Internet, secure codes to enter the office, and so on. A personal identification system which analyses facial images is non-intrusive and user-friendly. Having faster computers and a strong background in the field, scientists have turned to real-time systems.

The main motivation for this dissertation comes from the observation that none of the existing techniques fulfills all of the following requirements of a driver vigilance monitoring system:

1. A non-intrusive monitoring system that will not distract the driver or compromise privacy.

2. A real-time monitoring system, to ensure accuracy in detecting lowered levels of driver alertness.

3. The system performance should not be significantly influenced by environmental conditions (traffic, landscape, weather, and darkness).

4. The system must have low unit and operation (including data processing) cost, since automotive buyers may not be willing to pay high prices for an alertness monitoring system.

As an alternative to costly and computationally intensive existing image-processing-based techniques, we focus on computer-based driver fatigue detection.

1.3 Literature Review

Many techniques for eye detection and tracking have been developed in the past. They can be classified into several subcategories: feature-based schemes, template matching, active contours, neural networks, and so on. Local geometrical feature-based schemes rely on extracting the relative positions of geometrical features and tracking a set of geometrically distinctive features in the picture. The parameters calculated for these features are further processed to fulfill the aim of the current task. Edges and colors are the image features most often analyzed in image processing. Template matching is another possible approach: the simplest way is to compare two-dimensional arrays for correspondence, though variations that compare the images at different resolutions or compare only patches of the whole image are also possible.

This literature overview aims to summarize the methods used to solve the three main problems at hand (eye feature detection, eye feature tracking, and eye blink detection in similar application domains). Running time, accuracy, and the conditions under which a method is tested are the criteria used to evaluate and compare the techniques.

1.3.1 Detection Techniques

Sirohey et al., 2002 [10] propose a feature-based method for detecting the irises and the eyelids. The input on every frame is the eye regions of both eyes and the eye corners, extracted from the color images. Edges are detected, cleaned, and labeled on the converted grayscale image sequences. They first detect the iris, then find the eye corners, and finally localize the eyelids. To locate the iris, a semicircular template with a certain radius is used; anthropometric measures define the radius, as the iris diameter should be 1/3 of the length of the eye. The last constraint for iris detection is that the directions of the edge gradient and the normal to the annulus should not differ by more than π/6. For finding the eye corners, the authors refer to their previous publication, titled "Eye detection in a face image using linear and nonlinear filters," which discusses face segmentation as well as eye corner localization. It was found that applying the nonlinear filter method to the RGB images gives no false alarms, and the corners are detected in 90% of the cases. A third-degree polynomial is used to model the eyelids. Edge segments which can be fitted to such polynomials and are longer than 1/3 of the horizontal length of the eye are considered candidates. Combinations of two or three segments are also possible, but they must not overlap along the horizontal axis and must have the same sign of curvature. To pick the edge or combination of edges which best represents the upper eyelid, they consider all edge segments close above the iris center; these segments must curve downward and must not have large slopes. Finally, the segment with the greatest number of pixels is chosen.

The authors apply this scheme as a tracking technique, calling it frame-to-frame, i.e., the same method is used to detect the features on each frame. They use two sequences of 120 frames of one subject, with glasses (first sequence) and without glasses (second). Frames where the eye is closed are excluded. Although used as a tracking technique, this gives an idea of the accuracy of the method: the eyelids are successfully located in over 95% of cases, but the eye corners are found in only 80% of the cases. As a disadvantage of the presented method, one can mention the absence of information on how the lower eyelid is modeled or extracted.

Detecting the color of skin in the red channel is a widely used technique. The approach by Vezhnevets and Degtiareva, 2003 [15] for extracting the eyelid contour is based on the same idea. In their paper, they detect the iris as well. It is done in two steps. First, the approximate location of the iris center is found as follows:

1) If there are strong highlights, the illuminated pixels and the dark pixels around them are marked. The center of the whole set of pixels is the approximation of the iris center.

2) If there are no strong highlights, a circular filter is applied. The mean position of the 5% darkest pixels gives the approximate center of the iris.
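A minimal MATLAB sketch of this darkest-pixels step (the file name and the exact handling of the 5% fraction are illustrative, not taken from the paper):

% Approximate the iris center as the mean position of the darkest 5% of
% pixels in a grayscale eye-region image (sketch of the fallback step).
eyeReg = im2double(rgb2gray(imread('eye_region.png')));  % hypothetical image
vals   = sort(eyeReg(:));
thr    = vals(ceil(0.05 * numel(vals)));   % intensity bound of darkest 5%
[r, c] = find(eyeReg <= thr);              % coordinates of the dark pixels
irisCenter = [mean(c), mean(r)];           % (x, y) estimate of the center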

The next step of iris detection is its accurate extraction.

The eyelids are detected in several steps as well. A vertical integral projection is calculated for the rows which cross the iris. The area of low values in this histogram corresponds to the visible area of the iris. For these rows, moving outward from the iris in both directions, the method looks for points of:

1) Sharp luminance increase, which means that the skin is reached, or

2) Local minimum, with intensity values less than the line's brightest sclera points minus a certain threshold.

These point sets are approximated by nearly vertical lines, one for the left-side set and one for the right-side set, to remove artifacts. The leftmost and rightmost points are then chosen as the eye corners. The remaining points are divided between the upper eyelid (all above the eye corners) and the lower eyelid (all below the eye corners). The eyelids are approximated by a cubic polynomial (upper eyelid) and a quadratic polynomial (lower eyelid) using a least-squares fitting procedure.
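The least-squares fit itself is a one-liner per eyelid in MATLAB; a sketch, assuming the point sets (xu, yu) and (xl, yl) have already been extracted:

% Fit the eyelid contours by least squares (polyfit minimizes the
% squared residuals): cubic for the upper eyelid, quadratic for the lower.
pUpper = polyfit(xu, yu, 3);               % upper-eyelid polynomial
pLower = polyfit(xl, yl, 2);               % lower-eyelid polynomial
xs = linspace(min(xu), max(xu), 100);
plot(xs, polyval(pUpper, xs));             % reconstructed upper contour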

The paper presents a fully automatic approach, which is said to be robust and accurate, without giving any quantitative evidence for this. It is a simple method for detecting the eye features and seems not to be time-consuming. Its disadvantage is that it works for an open eye but is not helpful when the eye is closed.

Pardas, 2000, proposes two different methods for extraction and tracking of the eyelids [22]. The first is based on finding the minimal path in a graph to detect the eyelids when the eye corners are known. The model for the eye consists of two curves: one for the lower eyelid, with one minimum, and one for the upper eyelid, with one maximum. The eye regions are extracted based on the well-known spatial configuration of the facial features in the face. Image processing starts with filtering of the images. Then all candidates for the eye corners are found using deformable line templates. The specific pattern consists of three ordered segments, where the segments themselves cannot deform, but their configuration can, in order to find the corresponding eye shape. For each pair of candidate corner points, a minimal-path algorithm is applied, looking for the two minimal paths between the corners. The knowledge about the shape of the eyelids is incorporated in the following way: it is known that the lower eyelid descends to a horizontal position around the middle point between the two corners, and vice versa for the upper eyelid. The pair of corner points whose paths cost the least is chosen as the eye corners, and the two paths are the extracted eyelids.

The paper does not report under what conditions the detection method was tested. It also does not mention the running time of the algorithm, but looking for the minimal path between all corner candidates appears to be an exhaustive search.

Feng et al., 1998 [18] explore variance projection functions for locating landmarks of the human eye. The eye model consists of three components: iris, upper eyelid, and lower eyelid. They assume that the upper and lower boundaries of the iris coincide with the apexes of the eyelids. Thus, the model can be recovered from six landmarks on the eye region, namely the eye corners and the upper, lower, left, and right boundaries of the iris. Changes in the variance projection functions along the horizontal direction are used to detect the vertical positions of the eye corners and the left and right borders of the iris. The exact position of each corner and iris boundary is then detected.

Among the pixels at that x coordinate, the one with the maximum gradient is found. The upper and lower iris and eyelid boundaries are detected similarly. The iris is restored from its four boundary points. The eyelids are parabolas, each reconstructed from three points: the left eye corner, the right eye corner, and the apex.
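As an illustration of the idea, the variance projection functions of a grayscale eye image can be computed per column and per row; abrupt changes in them mark candidate landmark positions (a sketch, not the authors' code):

% Variance projection functions over a grayscale eye image I (sketch).
I = im2double(imread('eye.png'));     % hypothetical grayscale eye image
vpfVert  = var(I, 0, 1);              % variance of each column
vpfHoriz = var(I, 0, 2);              % variance of each row
% Strong changes in vpfVert suggest x positions of the eye corners and
% of the left/right iris borders.
dV = abs(diff(vpfVert));
[~, candidateX] = maxk(dV, 4);        % four strongest changes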

The algorithm is tested on 48x30 images of human eyes, and the authors describe the results as encouraging without citing a success rate. One frame is processed in less than 0.4 seconds on a 166 MHz Pentium processor.

Gu et al. [16] apply a SUSAN edge and corner detector for automatic detection of facial feature points. The paper proposes an algorithm for extracting both eyeballs, the eye corners, the midpoints of the nostrils, and the mouth corners; here only the algorithm for the eye regions is summarized. The eye regions are extracted with directional (vertical) integral projection functions over the face images and are binarized using auto-adapted thresholds. A SUSAN edge detector is applied over these images, and the iris and the upper eyelid are segmented out of the edge map. The eye corners are the corners of the boundary curve which encloses the eyelid and the iris. The method was tested against frontal faces, faces rotated by small and larger angles to the left and right, and a variety of lighting conditions and expressions. The evaluation was made by manually marking the facial features on the images; an extracted feature location is accepted if it differs by at most five pixels. The overall accuracy over all facial feature points is 95%. The inner corners are detected in 98.52% (left eye) / 95.93% (right eye) of the cases, and the outer corners in 97.41% of the cases for both eyes. The smaller percentage for the left eye might be because the left eyeball is found in 99.63% of the cases. These results are for frontal images; results are also reported for faces turned at small angles.

1.3.2 Tracking Techniques

Translation: > 0.1 downwards
Divergence (a1 + a5): > 0.02 expansion; < -0.02 contraction
Deformation (a1 - a5): > 0.005 horizontal deformation; < -0.005 vertical deformation

Table 1: Parameters describing the movement in the eye region

The curves of all three parameters are plotted against time and examined for local maxima and minima. The changes in the functions have to appear at nearly the same time: an eye blink is detected when translation has a maximum, divergence a minimum, and deformation a maximum. The reported accuracy of 88% for artificial sequences and 73% for TV movies is measured over all facial expressions. Unfortunately, the achieved processing time is 2 min/frame, which is not applicable for real-time applications.

Cohn et al., 2004 [13] and Moriyama et al., 2004 [12] present different aspects of the same system, in which a carefully detailed generative eye model (see Figure 1.1) is used. A template is built using two types of parameters: structure and motion. The structure parameters describe the appearance of the eye region, capturing all its racial, individual, and age variations. This includes the size and color of the iris, the sclera, the dark regions near the left and right corners, the eyelids, the width and boldness of the double-fold eyelid, the width of the bulge below the eye, and the width of the illumination reflection on the bulge and the furrow below the bulge. Motion parameters describe the changes over time. Traditionally, movement of the iris is described by the 2D position of its center. Closing and opening of the eye are shown by the height of the eyelids. The skew of the upper eyelid is also a motion parameter, to capture the change of the upper eyelid when the eyeball is moving.

Figure 1.1: Detailed template used by Moriyama et al., 2004 and Cohn et al., 2004

Unfortunately, the structure parameters are not set automatically: the model is individualized by manually adjusting them. The structural parameters derived from this initialization then remain fixed for the entire sequence. Further, the features are tracked by iterative minimization of the mean square error between the input image and the template obtained from the current motion parameters.

In the first paper, by Cohn et al. [13], the model for tracking the eye features and detecting blinks is part of a system for automatic recognition of embarrassing smiles. They tested the hypothesis that there is a correlation between head movement, eye gaze, and lip displacement during embarrassing smiles, which is probably why the accuracy of the tracking method is not measured or reported.

The second paper reports failure in only two image sequences out of 576, due to the head tracker. The database includes a variety of subjects of different ethnic groups, ages, and genders. In-plane and limited out-of-plane motion is included.

Pardas, 2000, applies an active contour technique [22] to track the eyelids. The model for the eye consists of two curves: one for the lower eyelid, with one minimum, and one for the upper eyelid, with one maximum. Tracking of the eyelids is done with an active contour technique, where the motion is embedded in the energy minimization process of the snakes. A closed snake which tracks the eyelids is built by selecting a small percentage of the pixels along the contours obtained during initialization or tracked in the previous frame; among these points are the eye corners. Motion compensation errors are computed for each snaxel (x0, y0) within a given range of allowed displacements. The displacements which produce the smallest compensation error are selected as possible candidates for the snaxel (x0, y0) in the current frame, and a two-step dynamic programming algorithm is run over these candidates. The paper does not report the running time of the algorithm; the author only mentions that it is stable against blinking, head translation, and rotation, up to the extent where the eyes are visible.

Blink Detection Techniques

Very briefly, I would like to mention the ways in which the authors of the reviewed papers detect blinking.

In Tian et al., 2000 [8], blinking is detected when the iris is not visible. This is not an appropriate approach: even if the iris detection method is assumed never to fail and thus to give no false alarms, misclassification might still occur because the eye or iris can be occluded by head rotation.

Sirohey et al., 2002 [10] detect a blink occurrence from the height of the apex of the upper eyelid above the iris center, which might be a consequence of not tracking the lower eyelid.

An extension of Tian's approach (Tian et al., 2000 [6]) is the paper by Cohn et al., 2002 [13]. It focuses on blink detection, not on locating the eye features. The eye region is defined on the first frame by manually picking four points: the eye corners, the center point of the upper eyelid, and a point straight under it. It stays the same throughout the whole image sequence, because the face region is stabilized. The eye region is divided into two portions, upper and lower, by the line connecting the eye corners. Blink detection relies on the fact that the intensity distributions of the upper and lower parts change when the eye opens and closes. The upper part consists of sclera, pupil, eyelash, iris, and skin; of these, only the first and the last (sclera, skin) contribute to increasing the average intensity values. When the upper eyelid is closing, the eyelashes move into the lower region and the pupil and iris are replaced by brighter skin, which increases the average intensity of the upper portion and simultaneously decreases the average intensity of the lower. The average grayscale intensities of both portions are plotted against time.

The eye is closed when the curve of the upper portion has a maximum. Blinks are also detected by counting the number of crossings and the number of peaks, in order to distinguish between blinking and eyelid flutter: if a blink occurs between two neighboring crossings there is only one peak, otherwise there is more than one. The approach works accurately. If three different eye actions are distinguished (open eye, closed eye, and fluttering) the accuracy is 98%, and if fluttering and blinking are not distinguished, the accuracy is 100%. Unfortunately, nothing is mentioned about testing against changing illumination conditions or make-up. The other disadvantage is that it is not precise in finding eye features.
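A sketch of this intensity cue (frames is an assumed cell array of stabilized grayscale eye regions; splitting at the image midline stands in for the corner-connecting line):

% Track average intensities of the upper and lower eye portions (sketch).
nF = numel(frames);
upperMean = zeros(1, nF);
lowerMean = zeros(1, nF);
for k = 1:nF
    I = im2double(frames{k});
    mid = round(size(I, 1) / 2);            % illustrative split row
    upperMean(k) = mean2(I(1:mid, :));
    lowerMean(k) = mean2(I(mid+1:end, :));
end
% Peaks of upperMean indicate closed-eye frames; counting peaks between
% crossings separates blinking from eyelid flutter.
[~, closedAt] = findpeaks(upperMean);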

Grauman, 2001, uses correlation with a template of the person's eye [9] to classify the state of the eye. The difference image during the first several blinks is used to detect the eye regions. Candidates are discarded based on anthropomorphic measures: the distances between the blobs and their widths and heights should keep certain ratios, among other constraints. The remaining candidate pairs are classified by the Mahalanobis distance between their parameter vector and the mean vector of blink-pair properties. The bounding box of the detected eye region determines the template. Blinking is then decided by calculating the correlation between this template and the image in the current frame: as the eye closes, it looks less and less like the template eye, and as it reopens, more and more similar. A correlation score between 0.85 and 1 classifies the eye as open, a score between 0.55 and 0.8 as closed, and below 0.4 the tracker is considered lost.
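A sketch of this correlation rule with the reported thresholds (normxcorr2 stands in for whatever correlation measure the paper uses; scores falling in the unreported gaps are left undecided here):

% Classify the eye state from the peak normalized cross-correlation
% between an open-eye template and the current frame (sketch).
scores = normxcorr2(openEyeTemplate, currentFrame);
score  = max(scores(:));
if score >= 0.85
    state = 'open';
elseif score >= 0.55 && score <= 0.8
    state = 'closed';
elseif score < 0.4
    state = 'lost';                  % tracker is considered lost
else
    state = 'undecided';             % gaps not covered by the paper
end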

Again, the technique is appropriate only for blink detection, not for precise eye feature extraction and tracking. The reported overall detection accuracy is 95.6% at an average of 28 frames per second. This result might change for longer image sequences, as used for driver drowsiness detection, because a template for a single person expires over time as the person gets tired.

An active deformable model is the technique used by Ramadan et al. [14] to track the iris. A statistical pressure snake, in which the internal forces are eliminated, tracks the pupil. The snake expands and encloses the pupil. When the upper eyelid occludes the pupil, i.e., a blink is under way, the snake collapses. The duration of the snake's collapse is the measurement of the blink. After the eye reopens, the snake can expand again, provided the iris position has not changed during the blink.

Otherwise, its position has to be re-initialized manually. Although very high accuracy of the tracking method is reported, the system suffers from several disadvantages. The main problem is the manual initialization and re-initialization. Further, the way of measuring blinks does not seem very reliable: the snake might also collapse in the case of saccades, which would be mistaken for blinking. The third problem is the position of the camera; it is attached to the head, which restricts head movement and makes the equipment inapplicable for drivers.

1.4 Problem statement

Drowsiness of a driver is a major problem for road safety. The National Transportation Safety Board has stated that sleepiness while driving is one of the most important contributing factors in road crashes, and it has been indicated that 20-30 percent of all heavy vehicle crashes can be directly or indirectly attributed to fatigue-related impairment of the driver.

It has also been demonstrated in post-crash interviews that the following are major predictors of road crashes:

i) The sleepiness level before the crash

ii) Less than 5 hours of prior sleep, and

iii) Night driving

Long before a fatigued driver falls asleep, he or she suffers from degraded decision-making capability, decreased coordination, reduced reaction time and perception, and decreased memory and mental functioning. For commercial operators such as truck drivers, it is possible to reduce the number of sleepy drivers with regulations, education, and fatigue management strategies. In addition to these preventive measures, there is a need to automatically detect when the driver becomes sleepy.

With this background, developing systems that monitor the driver's level of vigilance and alert the driver when he is not paying adequate attention to the road is essential in order to prevent accidents.

The objective of the project is to detect the driver's fatigue state and alert him. To detect it, we capture an image using a camera, localize the eyes, and extract the features of the eyes. The extracted features are input to a neural network whose output is one of two classes, open eye or closed eye. The neural network is trained using the supervised back-propagation learning algorithm. Even when the input image changes, the neural network should still classify the image accurately.

1.5 Objectives of Dissertation

The main objective of our work is to propose an algorithm to detect driver drowsiness that is applicable in real time with standard cameras, in a real context such as people driving a car.

The work is focused on the following objectives:

Create the database of eye images.

Preprocessing of images:

The image is converted into a gray image, and noise is removed using median filtering. When working in real time, salt-and-pepper noise may be introduced; for accurate results this noise should be removed.
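In MATLAB, this preprocessing step might look as follows (the file name and the 3x3 neighborhood are illustrative choices):

% Convert a captured frame to grayscale and remove salt-and-pepper
% noise with a median filter (sketch).
rgb  = imread('driver_frame.jpg');   % hypothetical captured frame
gray = rgb2gray(rgb);
den  = medfilt2(gray, [3 3]);        % 3x3 median filtering
imshow(den);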

Feature extraction:

A histogram is a graph showing the number of pixels in an image at each intensity value found in that image. For an 8-bit grayscale image, there are 256 possible intensities.

Edge detection assembles the edge pixels into meaningful edges. We have used the Canny edge detection method. The Canny method differs from the other edge-detection methods in that it uses two different thresholds (to detect strong and weak edges) and includes the weak edges in the output only if they are connected to strong edges.
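MATLAB's edge function exposes the two Canny thresholds directly; a sketch using the denoised image from the preprocessing step above (the threshold values are illustrative):

% Canny edge detection with explicit weak/strong thresholds (sketch).
edges = edge(den, 'canny', [0.05 0.15]);   % [low high] thresholds
imshow(edges);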

Iris detection:

In segmentation, the goal is to distinguish the iris texture from the rest of the image. An iris is normally segmented by detecting its inner (pupil) and outer (limbus) boundaries. In our project, we have used Daugman's integro-differential method for detecting the iris.
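For reference, Daugman's integro-differential operator searches over the circle parameters (r, x0, y0) for the maximum blurred radial derivative of the normalized contour integral of the image I(x, y):

\max_{(r, x_0, y_0)} \left| G_\sigma(r) * \frac{\partial}{\partial r} \oint_{r, x_0, y_0} \frac{I(x, y)}{2\pi r} \, ds \right|

where G_sigma(r) is a Gaussian smoothing function of scale sigma and the integral runs over the circle of radius r centered at (x0, y0). The maximizing circles approximate the pupil and limbus boundaries.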

The extracted features are the input to the neural network. The images are used to train an artificial neural network; the back-propagation algorithm is used for training.

Select an image for testing from the database or capture a new image, preprocess it, extract the features, simulate the image with the same network model, and display the result.
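Putting these objectives together, the test path might be sketched as follows (net is the trained network from the previous step; getFrame and extractEyeFeatures are hypothetical helpers, and the frame threshold is illustrative):

% End-to-end test sketch: preprocess, extract features, classify, alarm.
closedCount = 0;                     % consecutive closed-eye frames
for k = 1:numFrames
    gray  = rgb2gray(getFrame(k));   % hypothetical frame-grabbing helper
    den   = medfilt2(gray, [3 3]);
    feats = extractEyeFeatures(den); % hypothetical feature extractor
    y     = net(feats);              % two outputs: open / closed scores
    [~, cls] = max(y);               % assume 1 = open, 2 = closed
    if cls == 2
        closedCount = closedCount + 1;
    else
        closedCount = 0;
    end
    if closedCount >= 5              % illustrative alarm threshold
        sound(sin(1:3000));          % simple audible alarm
    end
end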

If the eye is found to be closed for a certain number of consecutive frames, as in the sketch above, the driver is alerted with an alarm.

1.6 Organization of Dissertation

The dissertation is structured as:

Chapter 1 details the necessity of driver fatigue detection; different methods used earlier are discussed in the literature review. The problem statement and the objectives of the dissertation work are also given.

Chapter 2 gives an overview of scientific work relevant to the problem at hand. Very briefly, some background knowledge about eye appearance and the underlying mathematical principles is given as well.

Feature extraction results are discussed in Chapter 3. Preprocessing results such as color-to-grayscale conversion, median filtering for noise removal, histograms, and edge detection are discussed.

The implementation of the back-propagation algorithm for offline images is discussed in Chapter 4, including the results of neural network training and the testing results which classify an image as open or closed. Real-time results are also discussed in this chapter.

The fifth chapter summarizes the work, gives directions for future work, and concludes the project.

1.7 List of Publications

1. Prof. Vidya A. Sisale, Dr. M. S. Chavan, Eye blink detection using neural network in a driver's vigilance system, Proceedings of the International Conference on Instrumentation (ICI-2009), Vol. 40, No. 4, December 2010, pp. 264-266.

1.8 References

1. Driving Related Facts and Figures, http://driveandsurvive.co.uk/cont5.html

2. A. Bromme, Real-Time Processing of Eye Movements, http://www.isg.cs.uni-magdeburg.de/bv/forschung/rtv.html

3. D. Gorodnichy, Second Order Change Detection and Its Application to Blink-Controlled Perceptual Interfaces, Proc. of the International Association of Science and Technology for Development (IASTED) Conference on Visualization, Imaging and Image Processing (VIIP 2003), Benalmadena, Spain, September 8-10, 2003, pp. 140-145.

4. B. Thorslund, Electrooculogram Analysis and Development of a System for Defining Stages of Drowsiness, Ph.D. dissertation, Linkoping University, Linkoping, 2003.

5. Y. Tian, T. Kanade, and J. Cohn, Dual-state parametric eye tracking, Proc. of the 4th IEEE International Conference on Automatic Face and Gesture Recognition (FG'00), March 2000, pp. 110-115.

6. J. F. Cohn, J. Xiao, T. Moriyama, Z. Ambadar, T. Kanade, Automatic recognition of eye blinking in spontaneously occurring behavior, Behavior Research Methods, Instruments, & Computers, 1 August 2003, Vol. 35, No. 3, pp. 420-428.

7. T. Moriyama, T. Kanade, J. F. Cohn, J. Xiao, Z. Ambadar, J. Gao, H. Imamura, Automatic Recognition of Eye Blinking in Spontaneously Occurring Behavior, Proceedings of the 16th International Conference on Pattern Recognition (ICPR 2002), Vol. 4, August 2002, pp. 78-81.

8. Y. Tian, T. Kanade, and J. F. Cohn, Eye-state Action Unit Detection by Gabor Wavelets, Proceedings of the International Conference on Multi-modal Interfaces (ICMI 2000), October 2000.

9. K. Grauman, M. Betke, J. Gips, G. R. Bradski, Communication via Eye Blinks: Detection and Duration Analysis in Real Time, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

10. S. Sirohey, A. Rosenfeld, Z. Duric, A method of detecting and tracking irises and eyelids in video, Pattern Recognition, Vol. 35 (2002), pp. 1389-1401.

11. M. J. Black, Y. Yacoob, Recognizing Facial Expressions in Image Sequences Using Local Parameterized Models of Image Motion, International Journal of Computer Vision, Vol. 25(1), October 1997, pp. 23-48.

12. T. Moriyama, J. Xiao, J. F. Cohn, T. Kanade, Meticulously Detailed Eye Model and its Application to Analysis of Facial Image, Proceedings of the IEEE Conference on Systems, Man, and Cybernetics, 2004, pp. 629-634.

13. J. F. Cohn, L. Ian Reed, T. Moriyama, J. Xiao, K. Schmidt, Z. Ambadar, Multimodal coordination of facial action, head rotation, and eye motion during spontaneous smiles, Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition (FG'04), Seoul, Korea, 2004, pp. 129-138.

14. S. Ramadan, W. Abd-Almageed, C. E. Smith, Eye tracking using active deformable models, Proc. of the Third Indian Conference on Computer Vision, Graphics and Image Processing, 2002.

15. V. Vezhnevets, A. Degtiareva, Robust and Accurate Eye Contour Extraction, Proc. Graphicon-2003, pp. 81-84.

16. H. Gu, G. Su, C. Du, Feature Points Extraction from Faces, Image and Vision Computing, 26-28 November 2003, New Zealand.

17. D. H. Ballard, Generalizing the Hough transform to detect arbitrary shapes, Pattern Recognition, Vol. 13, No. 2, 1981, pp. 111-122.

18. G. C. Feng, P. C. Yuen, Variance projection function and its application to eye detection for human face recognition, Pattern Recognition Letters, Vol. 19, Issue 9 (July 1998), pp. 899-906.

19. P. Bourke, Bezier Curves, http://astronomy.swin.edu.au/~pbourke/curves/bezier/

20. MathWorld, Bezier Curves, http://mathworld.wolfram.com/BezierCurve.html

21. J. Zhu, J. Yang, Subpixel Eye Gaze Tracking, Proceedings of FG'02, May 2002.

22. M. Pardas, Extraction and Tracking of the Eyelids, International Conference on Acoustics, Speech and Signal Processing (ICASSP 2000), 4: 2357-2360, Istanbul, Turkey, June 2000.

23. S. Sirohey, A. Rosenfeld, Eye detection in a face image using linear and nonlinear filters, Pattern Recognition, Vol. 34 (2001), pp. 1367-1391.

24. A. Bromme, Type-oriented Data Interface Description of an Image Based System, Internal Report, 27 June 2005.

25. T. Huang, A Rapid Tracking Algorithm of Circular Iris Features Based on Hough Transform, Master Thesis, University of Magdeburg, 2005, pp. 122.

26. D. Denney, C. Denney, The eye blink electro-oculogram, British Journal of Ophthalmology, 1984, Vol. 68, pp. 225-228.

27. J. Canny, A Computational Approach to Edge Detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, Nov. 1986, Vol. 8, No. 6, pp. 679-698.

28. Basler A500k Series User's Manual, Basler Vision Technologies, August 12, 2003.

29. MATLAB help, http://www.mathworks.com/

30. R. Wildes, Iris Recognition: an Emerging Biometric Technology, Proceedings of the IEEE, Vol. 85, No. 9, September 1997, pp. 1347-1347.

Chapter 2

2. THEORETICAL BACKGROUND

______________________________________________

2.1 Fatigue

2.1.1 Weariness, Tiredness, or lack of energy is Fatigue

Fatigue is different from drowsiness. In general, drowsiness is feeling the need to sleep, while fatigue is a lack of energy and motivation. Drowsiness and apathy (a feeling of not caring about what happens) can be symptoms that accompany fatigue.

Fatigue can be a normal and important response to physical exertion, emotional stress, boredom, or lack of sleep. However, it can also be a sign of a more serious mental or physical condition. Fatigue is a common symptom, and it is usually not due to a serious disease [25].

2.1.2 Drowsiness is Common in Driving

Government estimates indicate that sleepiness contributes to just above 1% of road fatalities. However, fatigue is likely underestimated for the following reasons:

1) Police officers are often untrained to recognize the signs of drowsy driving in a crash.

2) Sleepiness is often not counted as a potential cause of an accident.

3) There is no breath test or blood test for the level of sleepiness, unlike alcohol intoxication.

In fact, some researchers have estimated that sleepiness plays a major role in as many as 25% of highway accidents each year. Drowsy driving crashes are more likely to occur at night or in the mid-afternoon (periods during which greater sleepiness occurs), to occur on roads with higher speed limits, to involve a single vehicle running off the road, and to result in serious injuries [25]. The driver is often alone and makes no effort to hit the brakes or to take other evasive action.

2.1.3 The Danger of Driving While Feeling Drowsy

Drowsy drivers have slower reaction times, putting themselves and others in danger when they encounter unusual, unexpected, or emergency situations. Drowsy drivers also have reduced vigilance, so they perform worse on attention-based tasks when sleep-deprived. Drowsiness reduces both the ability to process information and the accuracy of short-term memory. Thus, a drowsy driver may not remember the previous few minutes of driving and will be slower in evaluating oncoming situations.

2.1.4 The Risk of Drowsy Driving Crashes

No driver is exempt from the risk of a drowsiness-induced car crash; however, certain factors are known to increase the probability of such an accident. These include:

1) Being chronically sleepy - Failure to get enough sleep on a regular basis to feel refreshed during the day is perhaps the greatest risk factor for having a drowsy driving crash.

2) Having a job that involves rotating shifts - Shift workers are likely at greater risk because they have high rates of daytime sleepiness.

3) Doing a lot of highway driving - This is likely one of the main reasons why commercial truck drivers are more likely to have sleepy driving accidents than other drivers.

4) Sleep disorder - Having an undiagnosed or untreated sleep disorder, such as insomnia or narcolepsy.

5) Drinking and driving - Drinking even small amounts of alcohol when already tired, or taking drugs that cause sleepiness, will significantly increase the chance of a drowsy driving crash.

2.1.5 The Symptoms of a Drowsy Driver

Sleepiness is a warning given by the body as a result of insufficient rest. In healthy individuals, sleepiness can be the result of:

1) Obtaining inadequate or poor-quality sleep,

2) Sleeping when the body wants to be awake (e.g., night shift workers).

3) Time of day: people are likely to feel sleepy in the mid-afternoon and again in the early morning hours (i.e., 1:00-5:00 am), regardless of the quantity and quality of previous sleep [23].

2.2 Causes for Driver's Vigilance

2.2.1 Sleep

The human brain needs sleep. Sleep is essential and inevitable, not a matter of choice. The longer someone remains awake, the greater the need to sleep and the more difficult it is to resist falling asleep; sleep will eventually overpower the strongest intentions and efforts to stay awake. Seven to nine hours of sleep are required to optimize performance. Sleep patterns are governed by the circadian rhythm (the body clock), which completes a full cycle approximately once every 24 hours. Humans are usually awake during daylight and asleep during darkness. There are two peaks of sleepiness: the early hours of the morning and the middle of the afternoon.

The loss or disruption of sleep results in sleepiness during periods when the person would usually be fully awake. The loss of even one night's sleep can lead to extreme short-term sleepiness, and continually disrupted sleep can lead to chronic sleepiness. The only effective way to reduce sleepiness is to sleep. Sleeping less than four hours per night impairs performance. The effects of sleep loss are cumulative, and regularly losing one or two hours of sleep a night can lead to chronic sleepiness over time. A wide range of factors, some beyond the individual's control and some personal choices, can cause sleep loss and sleep disruption [25]:

Hours of work, including long hours and shift work

Family responsibilities

Social activities

Illness, including sleep disorders

Medication

Stress

Today's 24-hour society seems to pressure many people into sacrificing sleep in favor of other activities, without their realizing the negative effects this has on their health and on their ability to perform a wide range of tasks, including driving.

2.2.2 Sleepiness and Impairment

Sleepiness reduces reaction time (a critical element of safe driving). It also reduces vigilance, alertness, and concentration, so that the ability to perform attention-based activities (such as driving) is impaired. Sleepiness also reduces the speed at which information is processed, and the quality of decision-making may be affected as well.

2.2.3 Human Eye and Its Behavior

The most significant feature in the eye is the iris. It has a ring structure with a large variety of colors. The ring might not be completely visible even when the eye is in its normal (non-closed or partly closed) state; visibility depends on individual variation. Most often, it is partly occluded above by the upper eyelid, though being completely visible or occluded by both eyelids is possible too. The iris changes its position as well, from centered to rolled to one side, upwards, or downwards. Depending on the speed at which the iris moves from side to side, the motion is called smooth pursuit or a saccade. A saccade is a rapid iris movement, which happens when fixation jumps from one point to another.

Inside the iris is the pupil, a smaller dark circle whose size varies depending on the light conditions. The sclera is the white visible portion of the eyeball; at a glance with the unaided eye, it is the brightest part of the eye region and directly surrounds the iris.

Figure 2.1: Human Eye (http://www.medicinenet.com)

Blinking is a natural act consisting of a closing of the eye followed by an opening, where the upper eyelid performs most of the movement. Similar to blinking is eyelid fluttering, a quick wavering or flapping motion of the upper eyelid. Here blinking and eyelid fluttering are not distinguished, but blinking and eye closing are not synonyms, especially in the context of a safe-driving system. To distinguish eye closings from blinking, the time taken has to be taken into account: blinking is defined as a temporary hiding of the iris because both eyelids touch within one second, whereas closing takes more time.

Blinking frequency is affected by different factors such as mood and task demand. In a stress-free state, the blink rate is 15-20 times per minute. It drops to 3 times per minute during reading, and it increases under stress, time pressure, or when close attention is required.

In the awake state, the eyelids are far apart before they close, they stay closed for only a short interval, and closing the eye (a single blink) is repeated rarely. As the person gets tired, the eyelids stay closer to each other, the time the eye stays closed increases, and the blink frequency increases as well. Drowsiness is characterized by long, flat blinks [26].
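Under the one-second convention above, blinks and closures can be separated by the closed duration; a sketch assuming the camera frame rate is known:

% Distinguish a blink from an eye closure by closed duration (sketch).
fps = 25;                                % assumed camera frame rate
closedSeconds = closedFrameCount / fps;  % counted by the eye classifier
if closedSeconds <= 1
    event = 'blink';
else
    event = 'closure';                   % candidate drowsiness indicator
end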

Figure 2.2: Human eye undergoing blinking

2.3 Digital Image Processing

In this section, the fundamental steps required in digital image processing are discussed.

The basic steps involved in digital image processing are depicted in Figure 2.3.

Figure 2.3: Fundamental steps in digital image Processing

2.3.1 Image Acquisition

Image acquisition is the first step in digital image processing; this is the stage where a digital image is acquired. The image data may be any image file coming from one of several sources, such as a monochrome or color TV camera, a frame grabber, or a scanner.

2.3.2 Pre-Processing

The purpose of image pre-processing is to improve image quality, ensuring that the image can be processed more accurately at the later steps. Several important pre-processing tasks can be implemented in a computer vision system: image size normalization, histogram equalization, median filtering, background removal, and illumination normalization. Each task is discussed in detail as follows.

2.3.3 Image Size Normalization

The acquired image is usually altered and edited into a default format. Each image must have the same size, file type, mode (grayscale or color), and resolution as the images on which the computer vision system runs.

2.3.4 Histogram Equalization

To enhance the quality of the image and improve face recognition performance, it is necessary to modify the dynamic range (contrast range) of the image. Consequently, the desired image features become more apparent.
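In MATLAB, this contrast adjustment is a single call on a grayscale image (the input gray is assumed to come from the preprocessing stage):

% Stretch the dynamic range with histogram equalization (sketch).
eq = histeq(gray);                    % equalized image
imshowpair(gray, eq, 'montage');      % compare before and after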

2.3.5 Median Filtering

Median filtering is one of the methods for reducing the noise present in an image. This method is effective against strong noise patterns and preserves edge sharpness.

2.3.6 Background Removal

Background removal eliminates the unnecessary background in the image, ensuring that only the facial component is used for classification training.

2.3.7 Illumination Normalization

Images captured under different lighting conditions (illumination) can affect classification performance, especially for face recognition systems that use the whole-face information for recognition [25].

2.4 Image Features

Many types of features are used for object recognition. Most features are based on either regions or boundaries in an image; it is assumed that a region or a closed boundary corresponds to an entity that is either an object or part of an object. Some of the commonly used features are as follows.

2.4.1 Global Features

Global features are usually characteristics of regions in images such as area (size), perimeter, Fourier descriptors, and moments. Global features can be obtained either for a region by considering all points within it, or only for the points on its boundary. In each case, the intent is to find descriptors obtained by considering all points, their locations, intensity characteristics, and spatial relations.

2.4.2 Local Features

Local features usually lie on the boundary of an object or represent a distinguishable small area of a region. Curvature and related properties are commonly used as local features. The curvature may be the curvature of a boundary or be computed on a surface; the surface may be an intensity surface or a surface in 2.5-dimensional space. High-curvature points are commonly called corners and play an important role in object recognition. Local features can describe a specific shape of a small boundary segment or a surface patch.

2.4.3 Relational Features

Relational features are based on the relative positions of different entities: regions, closed contours, or local features. These features usually include distances between features and relative orientation measurements. They are very useful for defining composite objects from many regions or local features in images. In most cases, the relative positions of entities are what define objects: the exact same features in slightly different relationships may represent entirely different objects [19].

2.5 Introduction to Neural Networks

2.5.1 Artificial Neural Network

An Artificial Neural Network (ANN) is an information-processing paradigm inspired by the way biological nervous systems, such as the brain, process information. The key element of this paradigm is the novel structure of the information-processing system: a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems. ANNs, like people, learn by example. An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process. Learning in biological systems involves adjustments to the synaptic connections between the neurons; this is true of ANNs as well.
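Using the notation from the List of Symbols, a single layer of such a network computes A = F(WP + B); a minimal sketch with illustrative dimensions:

% One neural layer: a = f(W*p + b), with W the weight matrix, p the
% input vector, b the bias vector, and f the transfer function.
W = rand(3, 4);            % S = 3 neurons, R = 4 input elements
p = rand(4, 1);            % input vector
b = rand(3, 1);            % bias vector
a = tansig(W * p + b);     % tansig: hyperbolic tangent sigmoid transfer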

2.5.2 Applications of Neural Networks

Neural networks, with their remarkable ability to derive meaning from complicated or imprecise data, can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. A trained neural network can be thought of as an "expert" in the category of information it has been given to analyze. This expert can then be used to provide projections in new situations of interest and to answer "what if" questions.

2.5.3 Other advantages include

1. Adaptive learning: an ability to learn how to do tasks based on the data given for training or initial experience.

2. Self-organization: an ANN can create its own organization or representation of the information it receives during learning.

3. Real-time operation: ANN computations can be carried out in parallel, and special hardware devices are being designed and manufactured which take advantage of this capability.

4. Fault tolerance via redundant information coding: partial destruction of a network leads to a corresponding degradation of performance; however, some network capabilities may be retained even with major network damage [23].

2.5.4 Neural Networks versus Conventional Computers

Neural networks take a different approach to problem solving than conventional computers. Conventional computers use an algorithmic approach: the computer follows a set of instructions in order to solve a problem. Unless the specific steps that the computer needs to follow are known, the computer cannot solve the problem. That restricts the problem-solving capability of conventional computers to problems that we already understand and know how to solve. However, computers would be much more useful if they could do things that we do not exactly know how to do.

Neural networks process information in a way similar to the human brain. The network is composed of a large number of highly interconnected processing elements (neurons) working in parallel to solve a specific problem. Neural networks learn by example; they cannot be programmed to perform a specific task. The examples must be selected carefully, otherwise useful time is wasted or, even worse, the network may function incorrectly. The disadvantage is that, because the network finds out how to solve the problem by itself, its operation can be unpredictable.

On the other hand, conventional computers use a cognitive approach to problem solving: the way the problem is solved must be known and stated in small, unambiguous instructions. These instructions are then converted to a high-level language program and then into machine code that the computer can understand. Such machines are very predictable; if anything goes wrong, it is due to a software or hardware fault.

Neural networks and conventional algorithmic computers are not in competition but complement each other. There are tasks that are more suited to an algorithmic approach, such as arithmetic operations, and tasks that are more suited to neural networks. Moreover, a large number of tasks require systems that use a combination of the two approaches (normally a conventional computer is used to supervise the neural network) in order to perform at maximum efficiency. These features of neural networks motivate their use for the recognition task in this work [23].

2.5.5 Training Algorithms for Neural Network

Once a network has been structured for a particular application, it is ready to be trained. To start this process, the initial weights are chosen randomly. Then the training, or learning, begins.

There are two approaches to training: supervised and unsupervised. Supervised training involves a mechanism for providing the network with the desired output, either by manually "grading" the network's performance or by providing the desired outputs together with the inputs. Unsupervised training is training in which the network has to make sense of the inputs without outside help. The vast bulk of networks use supervised training. Unsupervised training is used to perform some initial characterization of the inputs; however, in the full-blown sense of a truly self-learning system, it remains a shining promise: not fully understood, not completely working, and thus relegated to the laboratory.

2.5.6 Supervised Training

In supervised training, both the inputs and the outputs are provided. The network processes the inputs and compares its resulting outputs against the desired outputs. Errors are then propagated back through the system, causing the system to adjust the weights that control the network. This process occurs repeatedly as the weights are continually tweaked. The set of data that enables the training is called the "training set." During the training of a network, the same set of data is processed many times as the connection weights are refined.
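A minimal sketch of this loop, assuming a single linear neuron trained with the Widrow-Hoff (delta) rule; the learning rate and the toy training set are illustrative choices, not values from this dissertation.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(20, 3))      # training set: 20 samples, 3 inputs each
    t = X @ np.array([0.5, -0.3, 0.8])        # desired outputs (a known linear map)

    w = rng.normal(scale=0.1, size=3)         # initial weights chosen randomly
    lr = 0.05                                 # learning rate
    for epoch in range(200):                  # the same data set is processed many times
        for x, target in zip(X, t):
            y = w @ x                         # process the input
            error = target - y                # compare with the desired output
            w += lr * error * x               # Widrow-Hoff (delta rule) weight update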

The current commercial network development packages provide tools to monitor how well an artificial neural network is converging on the ability to predict the right answer. These tools allow the training process to go on for days, stopping only when the system reaches some statistically desired point, or accuracy. However, some networks never learn. This could be because the input data does not contain the specific information from which the desired output is derived. Networks also fail to converge if there is not enough data to enable complete learning. Ideally, there should be enough data that part of it can be held back as a test set. Many-layered networks with multiple nodes are capable of memorizing data [13].

To determine whether the system is simply memorizing its data in some insignificant way, supervised training needs to hold back a set of data with which to test the system after it has undergone training. If a network simply cannot solve the problem, the designer then has to review the inputs and outputs, the number of layers, the number of elements per layer, the connections between the layers, the summation, transfer, and training functions, and even the initial weights themselves. The changes required to create a successful network constitute a process wherein the "art" of neural networking occurs.
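As a hedged sketch of this hold-out idea (the split ratio, function name, and toy arrays are assumptions):

    import numpy as np

    def train_test_split(X, t, test_fraction=0.2, seed=0):
        """Hold back a fraction of the data for testing after training."""
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(X))         # shuffle the sample indices
        n_test = int(len(X) * test_fraction)
        test, train = idx[:n_test], idx[n_test:]
        return X[train], t[train], X[test], t[test]

    # Train on X_tr/t_tr; measure error on X_te/t_te, which the network never saw
    X = np.random.uniform(-1, 1, (100, 3)); t = X.sum(axis=1)
    X_tr, t_tr, X_te, t_te = train_test_split(X, t)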

Another part of the designer's creativity governs the rules of training. There are many laws (algorithms) used to implement the adaptive feedback required to adjust the weights during training. The most common technique is backward-error propagation, more commonly known as back-propagation. These various learning techniques are explored in greater depth later in this report.

Yet training is not just a technique. It involves a "feel" and conscious analysis to ensure that the network is not overtrained. Initially, an artificial neural network configures itself to the general statistical trends of the data. Later, it continues to "learn" about other aspects of the data, which may be spurious from a general viewpoint.

When the system is finally trained correctly and no further learning is needed, the weights can, if desired, be "frozen." In some systems this finalized network is then turned into hardware so that it can run fast. Other systems do not lock themselves in but continue to learn while in production use.

2.5.6.1 Back propagation

Back propagation is the generalization of the Widrow-Hoff learning rule to multiple-layer networks and nonlinear differentiable transfer functions. Input vectors and the corresponding target vectors are used to train a network until it can approximate a function, associate input vectors with specific output vectors, or classify input vectors in an appropriate way as defined by the user. Networks with biases, a sigmoid layer, and a linear output layer are capable of approximating any function with a finite number of discontinuities.

Standard back propagation is a gradient descent algorithm, as is the Widrow-Hoff learning rule, in which the network weights are moved along the negative of the gradient of the performance function. The term back propagation refers to the manner in which the gradient is computed for nonlinear multilayer networks. There are a number of variations on the basic algorithm that are based on other standard optimization techniques, such as conjugate gradient and Newton methods. Neural Network Toolbox implements a number of these variations; this chapter explains how to use each of these routines and discusses the advantages and disadvantages of each.

Properly trained back propagation networks tend to give reasonable answers when presented with inputs they have never seen. Typically, a new input leads to an output similar to the correct output for training inputs that resemble the new input. This generalization property makes it possible to train a network on a representative set of input/target pairs and get good results without training the network on all possible input/output pairs. Two features of Neural Network Toolbox are designed to improve network generalization: regularization and early stopping.
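A minimal NumPy sketch of such a network (one sigmoid hidden layer, one linear output layer) trained by plain gradient descent; the layer sizes, learning rate, and toy data are assumptions, not the configuration used in this dissertation.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.uniform(-1, 1, (200, 2))                  # inputs
    t = np.sin(X[:, :1]) + X[:, 1:] ** 2              # targets, shape (200, 1)

    W1, b1 = rng.normal(0, 0.5, (2, 8)), np.zeros(8)  # hidden layer (8 sigmoid units)
    W2, b2 = rng.normal(0, 0.5, (8, 1)), np.zeros(1)  # linear output layer
    lr = 0.1

    for epoch in range(2000):
        # Forward pass
        a1 = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))     # sigmoid hidden activations
        y = a1 @ W2 + b2                              # linear output
        # Backward pass: propagate the error (gradient of the MSE) through the layers
        e = y - t                                     # output error
        d1 = (e @ W2.T) * a1 * (1.0 - a1)             # error at the hidden layer
        # Move the weights along the negative gradient of the performance function
        W2 -= lr * a1.T @ e / len(X); b2 -= lr * e.mean(axis=0)
        W1 -= lr * X.T @ d1 / len(X); b1 -= lr * d1.mean(axis=0)

With early stopping, this loop would additionally track the error on a held-out validation set and stop once that error starts rising.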

The primary objective of this chapter is to explain how to use the back propagation training functions to train feedforward neural networks to solve specific problems. There are generally four steps in the training process (a brief sketch following these steps appears after the list):

1. Assemble the training data.

2. Create the network object.

3. Train the network.

4. Simulate the network response to new inputs.
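As a hedged illustration of the four steps, using scikit-learn's MLPRegressor in place of Neural Network Toolbox (the toolbox itself is MATLAB software; every name below comes from scikit-learn, and the data are toy values):

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    # 1. Assemble the training data
    X = np.random.uniform(-1, 1, (300, 2))
    t = np.sin(X[:, 0]) + X[:, 1] ** 2

    # 2. Create the network object (one hidden layer of 10 units)
    net = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000)

    # 3. Train the network
    net.fit(X, t)

    # 4. Simulate the network response to new inputs
    y_new = net.predict(np.array([[0.2, -0.5]]))

Note that MLPRegressor's defaults (ReLU hidden units, the Adam optimizer) differ from the sigmoid/back propagation setup described above; they are used here only to keep the sketch short.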

Training produces a performance plot of mean square error (MSE) versus the number of training epochs, resembling the figure below.

Figure 2.4: Performance graph (MSE versus epochs)

2.5.6.2 Multiple Layers of Neurons

A network can have several layers. Each layer has a weight matrix W, a bias vector b, and an output vector a. To distinguish between the weight matrices, output vectors, etc. of the different layers, the number of the layer is appended as a superscript to the variable of interest. This layer notation is used in the three-layer network shown below and in the equations at the bottom of the figure.

Figure 2.5: Multilayer neural network

The network shown above has R1 inputs, S1 neurons in the first layer, S2 neurons in the second layer, etc. It is common for different layers to have different numbers of neurons. A constant input 1 is fed to the bias for each neuron.

Note that the outputs of each intermediate layer are the inputs to the following layer. Thus layer 2 can be analyzed as a one-layer network with S1 inputs, S2 neurons, and an S2xS1 weight matrix W2. The input to layer 2 is a1 and the output is a2. Now that all the vectors and matrices of layer 2 have been identified, it can be treated as a single-layer network on its own. This approach can be taken with any layer of the network.
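A small NumPy sketch of this notation (the layer sizes are arbitrary assumptions): each layer computes ak = f(Wk a(k-1) + bk), and layer 2 is indeed just a one-layer network applied to a1.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(2)
    R1, S1, S2, S3 = 4, 5, 3, 1                        # input size and layer widths
    W1, b1 = rng.normal(size=(S1, R1)), np.zeros(S1)
    W2, b2 = rng.normal(size=(S2, S1)), np.zeros(S2)   # the S2 x S1 weight matrix W2
    W3, b3 = rng.normal(size=(S3, S2)), np.zeros(S3)

    p = rng.normal(size=R1)                            # network input
    a1 = sigmoid(W1 @ p + b1)                          # first-layer output
    a2 = sigmoid(W2 @ a1 + b2)                         # layer 2: input a1, output a2
    a3 = W3 @ a2 + b3                                  # linear output layer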

The layers of a multilayer network play different roles. A layer that produces the network output is called an output layer; all other layers are called hidden layers. The three-layer network shown earlier has one output layer (layer 3) and two hidden layers (layers 1 and 2).

2.5.6.3 Scaled conjugate gradient algorithm

A supervised learning algorithm (Scaled Conjugate Gradient, SCG) with superlinear convergence rate is used. The algorithm is based upon a class of optimization techniques well known in numerical analysis as the Conjugate Gradient Methods. SCG uses second-order information from the neural network but requires only O(N) memory, where N is the number of weights in the network. The performance of SCG is benchmarked against the standard back propagation algorithm (BP), the conjugate gradient back propagation algorithm (CGB), and the Broyden-Fletcher-Goldfarb-Shanno memoryless quasi-Newton algorithm (BFGS).

SCG yields a speed-up of at least an order of magnitude relative to BP. The speed-up depends on the convergence criterion: the greater the demanded reduction in error, the bigger the speed-up. SCG is fully automated, includes no user-dependent parameters, and avoids the time-consuming line search that CGB and BFGS use in each iteration to determine an appropriate step size.

Incorporating problem-dependent structural information in the architecture of a neural network often lowers the overall complexity. The smaller the complexity of the neural network relative to the problem domain, the greater the possibility that the weight space contains long ravines characterized by sharp curvature. While BP is inefficient on these ravine phenomena, SCG has been shown to handle them effectively.

From an optimization point of view, learning in a neural network is equivalent to minimizing a global error function, which is a multivariate function that depends on the weights in the network. This perspective gives some advantages in the development of effective learning algorithms, because the problem of minimizing a function is well known in other fields of science, such as conventional numerical analysis.

Since learning in realistic neural network applications often involves the adjustment of several thousand weights, only optimization methods applicable to large-scale problems are relevant as alternative learning algorithms. The general opinion in the numerical analysis community is that only one class of optimization methods is able to handle large-scale problems effectively: the Conjugate Gradient Methods. Several conjugate gradient algorithms have recently been introduced as learning algorithms in neural networks.

Both CGB and BFGS raise the calculation complexity per learning iteration considerably, because they have to perform a line search in order to determine an appropriate step size. A line search involves several evaluations of either the global error function or its derivative, both of which raise the complexity. SCG is a variation of a conjugate gradient method that avoids the line search per learning iteration by using a Levenberg-Marquardt approach to scale the step size.
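Møller's SCG itself is not shipped in SciPy, but as a hedged stand-in, the same "minimize the global error function over the weight vector" view can be illustrated with SciPy's nonlinear conjugate gradient routine (method="CG", a Polak-Ribiere variant that does use a line search); the flattened-weight layout and toy network are assumptions of this sketch.

    import numpy as np
    from scipy.optimize import minimize

    X = np.random.uniform(-1, 1, (100, 2))
    t = np.sin(X[:, 0]) + X[:, 1] ** 2

    def error(wvec):
        """Global error E(w) of a tiny 2-8-1 network, weights flattened into one vector."""
        W1 = wvec[:16].reshape(2, 8); b1 = wvec[16:24]
        W2 = wvec[24:32].reshape(8, 1); b2 = wvec[32:]
        a1 = np.tanh(X @ W1 + b1)
        y = (a1 @ W2 + b2).ravel()
        return np.mean((y - t) ** 2)

    w0 = np.random.normal(0, 0.5, 33)          # initial weight vector (16+8+8+1 weights)
    res = minimize(error, w0, method="CG")     # conjugate gradient minimization
    print(res.fun)                             # final mean squared error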

2.5.6.4 Optimization strategy

Most of the optimization methods used to minimize functions are based on the same strategy: the minimization is a local iterative process in which an approximation to the function in a neighborhood of the current point in weight space is minimized. The approximation is often given by a first- or second-order Taylor expansion of the function.

The idea of the strategy is illustrated in the pseudo-algorithm presented below, which minimizes the error function E(w).

1. Choose initial weight vector w1 and set k = 1.

2. Determine a search direction pk and a step size αk so that E(wk + αk pk) < E(wk).

3. Update the weight vector: wk+1 = wk + αk pk.

4. If E'(wk) ≠ 0, then set k = k + 1 and go to step 2; else return wk+1 as the desired minimum.

Determining the next current point in this iterative process involves two independent steps. First, a search direction has to be determined: in what direction in weight space do we want to go in the search for a new current point? Once the search direction has been found, we have to decide how far to go in that direction: a step size has to be determined.
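A minimal code sketch of this generic loop, using the negative gradient as the search direction and a fixed step size (the gradient descent special case discussed in the next paragraph); the error function and its gradient here are toy assumptions.

    import numpy as np

    def minimize_iteratively(E_grad, w, alpha=0.1, tol=1e-6, max_iter=10000):
        """Generic strategy: pick a search direction, pick a step size, update, repeat."""
        for k in range(max_iter):
            g = E_grad(w)
            if np.linalg.norm(g) < tol:      # step 4: E'(wk) is (nearly) zero, stop
                break
            p = -g                           # step 2a: search direction (here: -gradient)
            w = w + alpha * p                # steps 2b-3: fixed step size, weight update
        return w

    # Toy quadratic error E(w) = ||w||^2 with gradient 2w; the minimum is the origin
    w_min = minimize_iteratively(lambda w: 2 * w, np.array([1.0, -2.0]))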

If the search direction pk is set to the negative gradient -E'(wk) and the step size αk to a constant α, the algorithm becomes the gradient descent algorithm. In the context of neural networks, this is the BP algorithm without a momentum term. Minimization by gradient descent is based on the linear approximation E(w + y) ≈ E(w) + E'(w)^T y, which is the main reason why the algorithm often shows poor convergence. Another reason is that the algorithm uses a constant step size, which in many cases is inefficient. Adding a momentum term is an ad hoc attempt to force the algorithm to use second-order information from the network. Unfortunately, the momentum term is not able to speed up the algorithm considerably, and it makes the algorithm even less robust, because it introduces another user-dependent parameter, the momentum constant.

2.5.6.5 The SCG algorithm

This method is only available for batch training. To begin, initialize the weight vector to w0, and let N be the total number of weights.

1. k = 0. Choose scalars 0