Limitations of sensitivity analysis for neural networks in cases with dependent inputs

Maciej A. Mazurowski, Department of Electrical and Computer Engineering, University of Louisville, Louisville, KY 40292, USA, [email protected]
Przemyslaw M. Szecowka, Faculty of Microsystem Electronics and Photonics, Wroclaw University of Technology, Wroclaw, Poland, [email protected]

Abstract - In this paper the limitations of the sensitivity analysis method for feedforward neural networks in cases of dependent input variables are discussed. First, it is explained that in such cases there can be many functions implemented by neural networks that accurately approximate the training patterns. Then it is pointed out that many of these functions do not allow for proper estimation of the inputs' importance using the sensitivity analysis method for neural networks. These two facts are demonstrated to be the reason why one cannot completely rely upon the results of this method when evaluating the real importance of inputs. Examples with graphs visualizing the discussed phenomena are presented. Finally, general conclusions about the overall usefulness of the method are introduced.

I. INTRODUCTION

The sensitivity analysis method for feedforward neural networks and its applications have been extensively discussed in the scientific literature [1], [2], [6], [7], [8], [9], [10] over the last decade. Its effectiveness and some of its limitations [5] have been shown.

The sensitivity analysis method is used to evaluate the importance of neural network inputs. The sensitivities of each output to each input are calculated as shown below:

    s_ij^k = dF_i(x_1, ..., x_n) / dx_j, evaluated at x^k    (1)

where s_ij^k is the sensitivity of output i to input j at the point x^k (an n-tuple of input variables) and F_i(x_1, ..., x_n) is the function implemented by the neural network for output number i. Values of s_ij^k can be calculated from the parameters of the network (weights and transfer functions of neurons). A recursive algorithm for this calculation, for a network with any finite number of layers, has been described in [7]. It is presented below:

    s_ij^k(L) = dF_i(x_1, ..., x_n)/dx_j = f'(net_i^(L)) * SUM_{m=1..M} w_im^(L) * s_mj^k(L-1)    (2)

    s_ij^k(1) = f'(net_i^(1)(x^k)) * w_ij^(1)    (3)

Formula (2) is valid for L > 1, where L is the number of layers in the neural network.

After the partial derivatives are calculated for certain points in the input variable space, a generalization has to be undertaken to find the actual sensitivity of an output to an input. Three types of generalization are presented here, with slight modification as compared to [8]. The generalization norm described in (4) is called the maximum norm. The two following norms are called the Euclidean norm and the absolute norm ((5) and (6), respectively):

    S_ij^max = max_k |s_ij^k|    (4)

    S_ij^eucl = sqrt( SUM_{k=1..K} (s_ij^k)^2 / K )    (5)

    S_ij^abs = SUM_{k=1..K} |s_ij^k| / K    (6)

In [8] the significance measure of the i-th input is defined as

    Phi_i,avg = max_{k=1,...,K} { s_i^k,avg }    (7)

where s_i^k,avg is the sensitivity at point x^k averaged over the outputs (the difference between these two definitions of norms has a multiplicative character and does not affect the generality of the considerations) and is normalized. Phi_i,avg represents the importance of input i for the evaluation of the outputs. Further in [8], this significance measure is used to prune a neural network by removing inputs.

II. LIMITATIONS ON VALIDITY OF THE SENSITIVITY ANALYSIS METHOD

The authors show in [6] and [7] that the sensitivity analysis method for neural networks can be ineffective when used to evaluate an input's importance, and thus ineffective for neural network pruning. Specifically, this can happen when the inputs are dependent.

More formally, the authors claim that neural network pruning heuristics based on the significance measure Phi_i,avg defined in (7) can produce highly non-optimal results (i.e., networks with important inputs removed) in cases with input dependencies.

Moreover, an explanation of this phenomenon and more general conclusions are presented here, with a focus on the unwanted impact of input dependency on the results of the sensitivity method.

Assume that the output is related to the inputs by

    y = f(x_1, ..., x_n)    (8)

Assume also that one input can be expressed as the value of a function g of the remaining inputs.
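The pointwise sensitivities (1) and the three generalization norms (4)-(6) can be sketched numerically. This is an illustrative sketch, not the authors' code: the recursive weight-based algorithm of [7] is replaced by a finite-difference approximation, and a fixed analytic function stands in for a trained network; all names and the sample function are assumptions for illustration.

```python
import numpy as np

def sensitivities(F, X, eps=1e-6):
    """Finite-difference estimate of s_ij^k = dF_i/dx_j at each pattern x^k.

    F : callable mapping an input vector to an output vector
    X : (K, n) array of K input patterns
    Returns an array of shape (K, n_outputs, n_inputs).
    """
    K, n = X.shape
    out = []
    for x in X:
        base = np.atleast_1d(F(x))
        rows = np.empty((base.size, n))
        for j in range(n):
            xp = x.copy()
            xp[j] += eps
            rows[:, j] = (np.atleast_1d(F(xp)) - base) / eps
        out.append(rows)
    return np.array(out)

def max_norm(s):        # (4): S_ij = max_k |s_ij^k|
    return np.abs(s).max(axis=0)

def euclidean_norm(s):  # (5): S_ij = sqrt(sum_k (s_ij^k)^2 / K)
    return np.sqrt((s ** 2).sum(axis=0) / s.shape[0])

def absolute_norm(s):   # (6): S_ij = sum_k |s_ij^k| / K
    return np.abs(s).sum(axis=0) / s.shape[0]

# Toy stand-in for a trained network function (single output, two inputs):
F = lambda x: np.array([np.sin(6 * x[0]) + np.sin(6 * x[1])])
X = np.random.default_rng(0).random((200, 2))
s = sensitivities(F, X)
print(max_norm(s), euclidean_norm(s), absolute_norm(s))
```

For a symmetric function such as this one, all three norms report nearly equal sensitivity for both inputs, matching the independent-input case discussed later.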



    x_i = g(x_1, ..., x_{i-1}, x_{i+1}, ..., x_n)    (9)

In such a case, there is more than one function h, defined over the same space as f(x_1, ..., x_n), that matches f on the subset of the input space limited by (9). All these functions overlap in the narrow region covered by the given set of patterns but can be very different beyond it. One of them is obviously f itself.

A simple example can be given. Assume that f(x_1, x_2) = (x_1 + x_2)/2 and x_2 = g(x_1) = x_1. The argument range is [0,1] x [0,1]. As said before, one of the functions h can be h(x_1, x_2) = f(x_1, x_1) = (x_1 + x_2)/2 = x_1. However, h can be any function of the form h(x_1, x_2) = a*x_1 + b*x_2 where a + b = 1. Figure 1 shows three of these functions.

[Fig. 1. Graphs of three functions h_i(x_1, x_2) for f(x_1, x_2) = (x_1 + x_2)/2.]

The consequences of this fact for neural network training are significant. Patterns produced by (8) and (9) do not densely cover all regions of the input space. When such patterns are used to train a neural network, the values of the function implemented by the network are highly unpredictable for inputs belonging to these empty regions. This means that the neural network can implement any function, as long as that function has proper (or close) values for the inputs covered by the patterns.

For example, in the two-input cases analyzed in this article, the patterns create a three-dimensional curve, as seen in Figs. 3-6. Every surface containing this curve represents a potential neural network function for these patterns. Obviously, different functions yield different slopes (different partial derivatives), potentially in the whole domain. This fact is very important considering that these partial derivatives are the only basis for the sensitivity analysis method to evaluate an input's significance.

The outcome of these considerations is that one cannot completely rely on sensitivity analysis results when evaluating real input significance in cases where dependencies between inputs are observed. These results are highly dependent on unwanted factors, such as the training method and generalization, and they can differ widely for given patterns.

III. EXPERIMENTAL RESULTS

To illustrate the phenomena described in the previous section, experimental results are provided. In the four following subsections, the different functions implemented by the neural networks as a result of different functions g used to introduce input dependency are shown. The case without input dependency is also analyzed. In all these cases, the original function f relating output and inputs was the same. The examples clearly show that the consequences of dependency between inputs on the implemented function, and thus on the results of sensitivity analysis, are considerable. The last subsection presents a more complicated case of a system with 7 inputs and more complex dependencies between the inputs in the training set. The results of the sensitivity analysis method are compared with a real importance evaluation.

The function relating output and inputs in examples A-D was

    f(x_1, x_2) = sin(6*x_1) + sin(6*x_2)    (10)

After the procedure of creating patterns, the output values were normalized to the range [0.15, 0.85] for network training. In all examples except the first, an additional condition x_2 = g(x_1) was introduced to the training patterns each time. This condition establishes a functional dependency between the inputs. Input values of patterns were normalized to the range [0, 1], and a multilayer feedforward neural network with two inputs and one output was trained using the standard backpropagation algorithm with momentum.

All the figures below show the function realized by the neural network (grid surface) in the space [0,1] x [0,1] and the patterns, which have their representation (black points) above the surface (the rest of them are hidden below the surface).

A. Inputs independent

In the first example, the case without dependencies between inputs is analyzed. Fig. 2 shows the graph of the neural network function and the patterns for this case. Both x_1 and x_2 values were generated randomly in the interval [0,1]. Since the patterns densely cover the whole input variable space, the function implemented by the neural network is very similar to the original f(x_1, x_2) in this space.

Table I shows the results of the sensitivity analysis method for the three methods of generalization. For every norm, the sensitivities for both inputs are almost the same. This can be explained by the fact that the function (10) is symmetrical with respect to the plane described by the equation x = y. Thus, in this case without dependency between inputs, the results of sensitivity analysis are correct.

B. Functional dependency between inputs: x_2 = x_1^3

Fig. 3 shows the graph of the function implemented by a network trained using patterns with the second variable dependent on the first. This dependency was x_2 = x_1^3. The limitation on the training set is apparent on the graph (the black points create a three-dimensional curve).
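The ambiguity described above can be checked directly for the (x_1 + x_2)/2 example: every member of the family h(x_1, x_2) = a*x_1 + b*x_2 with a + b = 1 reproduces the patterns generated under x_2 = x_1 exactly, while reporting arbitrary sensitivities. A minimal sketch (illustrative code, not from the paper):

```python
import numpy as np

# Patterns generated with the dependency x2 = g(x1) = x1,
# target f(x1, x2) = (x1 + x2) / 2  (the Fig. 1 example).
x1 = np.linspace(0.0, 1.0, 50)
X = np.column_stack([x1, x1])        # every pattern lies on the line x2 = x1
y = (X[:, 0] + X[:, 1]) / 2

# Any h(x1, x2) = a*x1 + b*x2 with a + b = 1 fits these patterns exactly,
# yet each choice of (a, b) reports sensitivities dh/dx1 = a, dh/dx2 = b.
for a in (1.0, 0.5, 0.0):
    b = 1.0 - a
    h = a * X[:, 0] + b * X[:, 1]
    print(f"a={a:.1f} b={b:.1f}  max |h - y| = {np.abs(h - y).max():.2e}  "
          f"sensitivities: ({a:.1f}, {b:.1f})")
```

Every printed fit error is zero, yet the sensitivity pairs range from (1, 0) to (0, 1): the training data cannot distinguish between these surfaces.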

[Fig. 2. Patterns (black points) and neural network function (grid) for the case of independent inputs.]

TABLE I
SENSITIVITIES FOR THE CASE OF INDEPENDENT INPUTS

method      input 1    input 2
max         1.32478    1.310584
euclidean   0.728889   0.710269
absolute    0.628355   0.616506

It is clear that, even though both pattern sets were created using the same formula (10), the neural network function from Fig. 3 appears quite different from the one in Fig. 2. This is explained here as a consequence of the special configuration of the patterns, which, as stated earlier, is caused by the input dependency. In addition, the function implemented by the neural network in this case, as the solution of the approximation problem, is much less complicated than the original function (10) (which obviously is still a proper approximation in this case) and is probably easier to learn using the backpropagation algorithm.

[Fig. 3. Patterns (black points) and neural network function (grid) for the case of dependent inputs (x_2 = x_1^3).]

The results of the sensitivity analysis (Table II) reflect the difference between the functions implemented by the neural network in this and the previous case. For each type of generalization, the sensitivity to the second input is much greater than for the first input. At the same time, it can be said that both inputs are equally insignificant, because the value of the output can be uniquely determined when only one of these inputs is available (as we can always uniquely determine the other). Sensitivity analysis in this case does not reflect this fact, thus we can consider its results invalid.

TABLE II
SENSITIVITIES FOR THE CASE OF DEPENDENT INPUTS (x_2 = x_1^3)

method      input 1    input 2
max         2.176415   4.670336
euclidean   0.981789   2.196103
absolute    0.854413   1.993096

C. Functional dependency between inputs: x_2 = sin(2*x_1)

In the third analyzed case (Fig. 4), the inputs were dependent such that

    x_2 = g(x_1) = sin(2*x_1)    (11)

Note that for x_1 in [0,1], g is not a one-to-one function, which means that the value of x_2 can be uniquely determined from the value of x_1, but it is not always possible to determine a unique value of x_1 from the value of x_2. This property of g(x_1) distinguishes this case from the previous one and will help to evaluate the real input significance.

[Fig. 4. Patterns (black points) and neural network function (grid) for the case of dependent inputs (x_2 = sin(2*x_1)).]

Now, having both f(x_1, x_2) (10) and g (11) (in the real world this is usually not the case), the real significance of the inputs can be determined. It is obvious that the output can be uniquely determined on the basis of both inputs. It is also known that the second input can be uniquely determined on the basis of the first input, and that the first input cannot always be uniquely determined on the basis of the second one. Thus, it can be said that the input of minimal significance in this case is the second one, since the value of the output can be equally well determined when the value of the second input is initially unknown (there does not exist an input with a lower significance).

Table III shows the values of the sensitivities for the neural network trained using the described patterns.
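The argument for case C can be checked numerically: the surface f of (10) and a surface obtained by substituting g into f (which therefore ignores x_2 entirely) agree exactly on the training curve, yet one of them has zero sensitivity to x_2 everywhere. Which of these surfaces (or any intermediate one) a trained network realizes is decided by the unwanted factors named in Section II. A sketch under these assumptions, not the paper's experiment:

```python
import numpy as np

# Dependency of case C: x2 = g(x1) = sin(2*x1); target (10): f = sin(6x1) + sin(6x2).
x1 = np.linspace(0.0, 1.0, 200)
x2 = np.sin(2 * x1)

f = lambda a, b: np.sin(6 * a) + np.sin(6 * b)                # the original surface
h = lambda a, b: np.sin(6 * a) + np.sin(6 * np.sin(2 * a))    # ignores x2 entirely

# Both agree exactly on the one-dimensional training curve ...
assert np.allclose(f(x1, x2), h(x1, x2))

# ... but their sensitivities to x2 differ everywhere:
# df/dx2 = 6*cos(6*x2), while dh/dx2 = 0 identically.
print("mean |df/dx2| on the curve:", np.abs(6 * np.cos(6 * x2)).mean())
print("dh/dx2 everywhere:", 0.0)
```

A gradient-based significance measure applied to h would declare x_2 useless, while applied to f it would not, even though both surfaces fit the same patterns perfectly.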

The sensitivity analysis method shows that the first input has smaller significance than the second. This obviously contradicts the fact that the significance of the first input is minimal, which shows again that the sensitivity analysis method gives incorrect results. The next section will return to this case.

TABLE III
SENSITIVITIES FOR THE CASE OF DEPENDENT INPUTS (x_2 = sin(2*x_1))

method      input 1    input 2
max         0.611417   1.641876
euclidean   0.198803   0.88746
absolute    0.116956   0.801896

An additional experiment has been performed, showing the performance of the neural network after removing each of the inputs. Specifically, two neural networks have been trained, one with the first input removed and one with the second input removed. Mean errors over the testing set are shown in Table V (the average over three trials was taken).

TABLE V
MEAN ERRORS AFTER INPUT REMOVAL (x_2 = sin(2*x_1))

input removed   mean error
1               0.1034
2               0.0086

These results clearly indicate that removing the first input causes a significantly greater decrease in network performance, which shows the failure of the sensitivity analysis approach to neural network pruning in this case.

D. Functional dependency between inputs: x_2 = sin(4.7*x_1) + x_1*(x_1 + 1)

The last two-input case analyzed is one where the function g(x_1) is relatively complex. Here x_2 = sin(4.7*x_1) + x_1*(x_1 + 1). Function g is again not a one-to-one function for x_1 in [0, 1]. The graph of the function implemented by the neural network can be seen in Fig. 5.

[Fig. 5. Patterns (black points) and neural network function (grid) for the case of dependent inputs (x_2 = sin(4.7*x_1) + x_1*(x_1 + 1)).]

The sensitivity analysis results are presented in Table IV. It can be seen that all norms show approximately two times greater sensitivity for the second input. If a pruning heuristic based on these values were applied to the neural network, the first input would be removed.

TABLE IV
SENSITIVITIES FOR THE CASE OF DEPENDENT INPUTS (x_2 = sin(4.7*x_1) + x_1*(x_1 + 1))

method      input 1   input 2
max         1.36      2.408
euclidean   0.582     0.995
absolute    0.488     0.87

E. 7-input system

[6] and [7] show the results of the application of sensitivity analysis to systems with 4 and 6 (dependent) inputs. Here an additional experiment for a 7-input system with input dependencies is shown. The data was generated using the following equations:

    y = f(x_1, x_2, x_3, x_4, x_5, x_6, x_7) = sin(0.5*(x_1 + 2*x_2 + 3*x_3 + 6*x_4*x_5 + sin(6*x_6*x_7)))    (12)

    x_5 = g(x_4) = x_4    (13)

    x_7 = g(x_1, x_5) = x_1*x_5 + random([0, 2))    (14)

The data was also normalized to improve neural network training and sensitivity analysis results. Table VI presents the sensitivities of the output to the inputs as well as the average errors (over three trials) of the networks with one input removed.

TABLE VI
SENSITIVITIES AND AVERAGE ERRORS

input number   sensitivity   mean error
1              0.128         0.0298
2              0.237         0.0284
3              0.166         0.0177
4              0.198         0.0786
5              0.197         0.0128
6              0.125         0.0374
7              0.14          0.0337

Figure 6 shows a comparison of the normalized sensitivity analysis results and a real importance evaluation (based on Table VI). It can easily be seen that the sensitivities for the inputs differ significantly from their real importance. A more detailed analysis shows that the input with the smallest sensitivity, and thus the first candidate to be removed, is input 1. However, removing this input yields a relatively big drop in network performance.
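The protocol of this subsection (comparing gradient-based sensitivities against the error incurred by removing an input) can be sketched without training a network by using the generating function (12) itself as a stand-in for the trained model. Clamping an input to its mean is only a crude proxy for the paper's retrain-without-input procedure, the noise scale in the (14)-like term follows the extracted range, and the numbers produced are not a reproduction of Table VI:

```python
import numpy as np

rng = np.random.default_rng(1)
K = 5000

# Data in the spirit of (12)-(14): x5 duplicates x4, x7 mixes x1*x5 with noise.
x = rng.random((K, 7))
x[:, 4] = x[:, 3]                                  # (13): x5 = g(x4) = x4
x[:, 6] = x[:, 0] * x[:, 4] + rng.random(K) * 2.0  # (14)-like term

def f(x):
    # (12): y = sin(0.5*(x1 + 2*x2 + 3*x3 + 6*x4*x5 + sin(6*x6*x7)))
    return np.sin(0.5 * (x[:, 0] + 2 * x[:, 1] + 3 * x[:, 2]
                         + 6 * x[:, 3] * x[:, 4]
                         + np.sin(6 * x[:, 5] * x[:, 6])))

y = f(x)

# (a) gradient-based sensitivity of the generating function (model stand-in)
eps = 1e-5
sens = []
for j in range(7):
    xp = x.copy(); xp[:, j] += eps
    sens.append(np.abs((f(xp) - y) / eps).mean())

# (b) "removal" importance: clamp input j to its mean, measure resulting error
imp = []
for j in range(7):
    xc = x.copy(); xc[:, j] = x[:, j].mean()
    imp.append(np.abs(f(xc) - y).mean())

print("sensitivity ranking (most to least):", np.argsort(sens)[::-1] + 1)
print("removal ranking     (most to least):", np.argsort(imp)[::-1] + 1)
```

Because x_5 merely duplicates x_4, a retrained network could drop either of them at no cost, yet the pointwise gradient of the model assigns both a large sensitivity; this is the mismatch between the two rankings that the section describes.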

The input with the smallest importance (i.e., input 5), on the other hand, is characterized by almost the greatest sensitivity. This can be explained by the fact that sensitivity analysis is unable to reflect precisely the input dependency described by (13). This experiment clearly shows an inefficiency of sensitivity analysis.

[Fig. 6. Comparison of sensitivities and real importance estimations: a - sensitivities, b - mean errors as an input importance estimation.]

IV. IS THE SENSITIVITY ANALYSIS METHOD STILL WORTH USING?

The considerations in this article show that sensitivity analysis results can be invalid in certain cases. The authors claim, however, that this does not make the method completely useless.

The example from the previous section, where the x_2 = sin(2*x_1) dependency was used, shows relatively small values of sensitivity for the first input. On the other hand, it is known that removing the first input from the network would not necessarily be the best choice, since the second input is characterized by the minimal possible significance. However, even if not optimal, it can still be considered a reasonable choice. Low sensitivity for the first input means that the value of the output does not change significantly with changes in the first input for this particular neural network function. That in turn means that some constant value of this variable can be used every time to calculate the output value without significant error. This implies that the sensitivity analysis method can be useful (but may not be sufficient) in the process of evaluating the least significant inputs. This problem, however, needs more precise consideration.

V. CONCLUSIONS

This paper has presented the limitations of the sensitivity analysis method and postulated some ideas concerning the general usefulness of this method. As a result, the authors state that the sensitivity analysis method, as a method providing precise evaluation of input significance, is limited to the very narrow set of cases in which it is known that the inputs are independent. In many real-world problems, information about input dependency is not available. Note that in these cases the sensitivity analysis method cannot be safely used.

All the considerations covered by this paper, however, can have a more general influence. Some conclusions about variable dependencies in general and in real-life problems can be drawn. Also, some information can be inferred about the backpropagation learning algorithm which was used in the experiments. These problems are indicated as possible further research topics.

ACKNOWLEDGMENT

The authors would like to thank Jacek M. Zurada, Katie Todd and Matt Turner for their help in the preparation of this article.

REFERENCES

[1] A. P. Engelbrecht, "A new pruning heuristic based on variance analysis of sensitivity information", IEEE Transactions on Neural Networks 12 (6), pp. 1386-1399, 2001.
[2] A. P. Engelbrecht, "Selective Learning for Multilayer Feedforward Neural Networks", Fundamenta Informaticae 45 (4), pp. 295-328, 2001.
[3] A. P. Engelbrecht, I. Cloete, "Incremental Learning using Sensitivity Analysis", International Joint Conference on Neural Networks, Volume 2, pp. 1350-1355, 1999.
[4] A. P. Engelbrecht, L. Fletcher, I. Cloete, "Variance Analysis of Sensitivity Information for Pruning Multilayer Feedforward Neural Networks", International Joint Conference on Neural Networks, Volume 3, pp. 1829-1833, 1999.
[5] J. J. Montano, A. Palmer, "Numeric sensitivity analysis applied to feedforward neural networks", Neural Computing & Applications 12, pp. 119-125, 2003.
[6] P. M. Szecówka, A. Szczurek, M. Mazurowski, B. W. Licznerski, "Neural network sensitivity analysis approach for gas sensor array optimisation", Proceedings of the Eleventh International Symposium on Olfaction and Electronic Nose, ISOEN, Barcelona, 2005.
[7] P. M. Szecówka, A. Szczurek, M. A. Mazurowski, B. W. Licznerski, and F. Pichler, "Neural Network Sensitivity Analysis Applied for the Reduction of the Sensor Matrix", Lecture Notes in Computer Science 3643, pp. 27-32, 2005.
[8] J. M. Zurada, A. Malinowski, S. Usui, "Perturbation method for deleting redundant inputs of perceptron networks", Neurocomputing 14, pp. 177-193, 1997.
[9] J. M. Zurada, A. Malinowski, I. Cloete, "Sensitivity analysis for pruning of training data in feedforward neural networks", Proc. of First Australian and New Zealand Conference on Intelligent Information Systems, Perth, Western Australia, December 1-3, pp. 288-292, 1993.
[10] J. M. Zurada, A. Malinowski, I. Cloete, "Sensitivity analysis for minimization of input data dimension for feedforward neural network", Proc. of IEEE International Symposium on Circuits and Systems, London, May 28-June 2, pp. 447-450, 1994.