Principal Sensitivity Analysis (transcript)
Sotetsu Koyamada (Presenter), Masanori Koyama, Ken Nakae, Shin Ishii
Graduate School of Informatics, Kyoto University
@PAKDD2015
May 20, 2015
Ho Chi Minh City, Vietnam
Table of contents
1. Motivation
2. Sensitivity analysis and PSA
3. Results
4. Conclusion
Machine learning is awesome
Prediction and recognition tasks at high accuracy:
- Deep learning matches humans in the accuracy of face recognition tasks (Taigman et al., 2014).
- Predicting dream contents from brain activity (Horikawa et al., 2014): you cannot do this unless you are psychic!
Machines can carry out tasks beyond human capability.
How can machines carry out tasks beyond our capability?
In the process of training, machines must have learned knowledge outside our natural scope.
How can we learn the machine's "secret" knowledge?
The machine is a black box (neural networks, nonlinear-kernel SVMs, ...):
Input → Classifier (?) → Classification result
Visualizing the knowledge of a linear model
The knowledge of a linear classifier such as logistic regression is expressible in terms of its weight parameters w = (w1, ..., wd):

f(x) = σ(w1 x1 + ... + wd xd + b)

Input: x = (x1, ..., xd); classification labels: {0, 1}; wi: weight parameter; b: bias parameter; σ: sigmoid activation function.
Meaning of wi: the importance of the i-th input dimension within the machine's knowledge.
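The claim above can be checked on toy data. The following sketch (my illustration, not code from the talk; the data and learning rate are made up) fits logistic regression by plain gradient descent on data whose label depends only on the first input dimension, and reads feature importance off the weight magnitudes.

```python
# Sketch: feature importance from a linear classifier's weights.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] > 0).astype(float)   # label depends only on dimension 0

w, b = np.zeros(4), 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigma(w.x + b)
    grad_w = X.T @ (p - y) / len(y)          # logistic-loss gradient
    grad_b = np.mean(p - y)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

# |w_i| ranks the importance of each input dimension
print(np.argmax(np.abs(w)))   # dimension 0 dominates
```
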
Visualizing the knowledge of a nonlinear model
It is extremely difficult to make sense of the weight parameters in a neural network (a nonlinear composition of logistic regressions, with nonlinear activation function h).
Meaning of wij(k) = ???????
Our proposal: we shall directly analyze the behavior of f in the input space!
Table of contents
1. Motivation
2. Sensitivity analysis and PSA
3. Results
4. Conclusion
Sensitivity analysis
Sensitivity analysis computes the sensitivity of f with respect to the i-th input dimension (Zurada et al., 1994, 1997; Kjems et al., 2002).

Def. Sensitivity with respect to the i-th input dimension:
si := Eq[(∂f(x)/∂xi)²], where q is the true distribution of x.

Def. Sensitivity map: s := (s1, ..., sd).

Note: in the case of a linear model (e.g. logistic regression), ∂f(x)/∂xi is proportional to wi, so the sensitivity si recovers the weight-based importance of the i-th input dimension.
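The definition above can be estimated by Monte Carlo: average the squared partial derivatives over samples drawn from q. A minimal sketch (my example function, not the talk's), using the analytic gradient of f(x) = sin(3·x1) + 0.1·x2:

```python
# Sketch: sensitivity map s_i = E_q[(df/dx_i)^2] via Monte Carlo.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10000, 2))   # samples from q (here: standard normal)

# Analytic gradient of f(x) = sin(3*x1) + 0.1*x2
G = np.stack([3.0 * np.cos(3.0 * X[:, 0]),
              np.full(len(X), 0.1)], axis=1)

s = np.mean(G**2, axis=0)         # sensitivity map
print(s[0] > s[1])                # True: the first dimension is far more sensitive
```
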
PSM: Principal Sensitivity Map
Define the directional sensitivity in an arbitrary direction, and seek the direction to which the machine is most sensitive.

Def. Directional sensitivity in the direction v:
s(v) := Eq[(∂f(x)/∂v)²]

Def. (1st) Principal Sensitivity Map:
v* := argmax{ s(v) : ||v|| = 1 }

Recall: si = s(ei), where ei is the i-th standard basis vector of R^d.
PSA: Principal Sensitivity Analysis
Define the kernel metric K as:

Kij := Eq[(∂f(x)/∂xi)(∂f(x)/∂xj)], so that s(v) = vᵀKv.

Def. (1st) Principal Sensitivity Map (PSM): the 1st PSM is the dominant eigenvector of K!
Def. (k-th) Principal Sensitivity Map (PSM): the k-th PSM is the k-th dominant eigenvector of K.

PSA vs PCA: when K is a covariance matrix, the 1st PSM coincides with the 1st principal component (PC); the k-th PSM is the analogue of the k-th PC.
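The whole procedure fits in a few lines: estimate K from gradients, then eigendecompose. The sketch below (my toy function, not the paper's experiment) uses f(x) = sin(x1 + x2) on 3-dimensional inputs, so the machine is sensitive only along the joint direction (1, 1, 0):

```python
# Sketch: PSA = eigendecomposition of K = E[grad f grad f^T].
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20000, 3))

# Gradient of f(x) = sin(x1 + x2): sensitive only along (1, 1, 0)
c = np.cos(X[:, 0] + X[:, 1])
G = np.stack([c, c, np.zeros_like(c)], axis=1)

K = G.T @ G / len(X)                  # K_ij = E[(df/dx_i)(df/dx_j)]
eigvals, eigvecs = np.linalg.eigh(K)  # eigenvalues in ascending order
psm1 = eigvecs[:, -1]                 # 1st PSM: dominant eigenvector of K

print(np.round(np.abs(psm1), 2))      # ~ [0.71 0.71 0.  ]
```

Note the contrast with the conventional sensitivity map, which is diag(K): here it assigns equal sensitivity to the first two dimensions and cannot reveal that only their joint direction matters, while the 1st PSM recovers that direction directly.
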
Table of contents
1. Motivation
2. Sensitivity analysis and PSA
3. Numerical Experiments
4. Conclusion
Digit classification
- Artificial data: each pixel has the same meaning across all samples.
- Classifier: a neural network with one hidden layer; error rate 0.36%.
- We applied PSA to the log of each output of the NN.
[Figure: (a) templates and (b) noisy samples, for classes c = 0, ..., 9]
Strength of PSA (relatively signed map)
[Figure: (a) (conventional) sensitivity maps and (b) 1st PSMs (proposed), visualizing the per-pixel values for each class]
The conventional sensitivity map cannot distinguish the set of edges whose presence characterizes class 1 from the set of edges whose absence characterizes class 1. The 1st PSM (proposed) can, because its entries carry signs.
Strength of the PSA (sub-PSMs)
[Figure: PSMs of f9 (c = 9), from the 1st PSM downward]
What is the meaning of the sub-PSMs? By definition, the 1st PSM is globally important knowledge; the sub-PSMs are perhaps locally important knowledge.
Local Sensitivity
Def. Local sensitivity in the region A:
sA(v) := EA[(∂fc(x)/∂v)²]
where EA denotes the expectation over the region A.
Def. Local sensitivity in the direction of the k-th PSM:
sA^k := sA(vk), where vk is the k-th PSM.
This is a measure of the contribution of the k-th PSM to the classification of class c within the subset A.

Example: c = 9, k = 1, A = A(9,4) := the set of all samples of classes 9 and 4.
Then sA^1 is the contribution of the 1st PSM to the classification of 9 within the data containing classes 9 and 4, i.e. the contribution of the 1st PSM to distinguishing 9 from 4.
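Given gradients of fc evaluated at the samples in a region A, the local sensitivity in a PSM's direction is just a mean of squared directional derivatives. A minimal sketch (hypothetical toy gradients and stand-in PSM directions, for illustration only):

```python
# Sketch: local sensitivity s_A(v) = E_A[(df_c/dv)^2] for a direction v.
import numpy as np

def local_sensitivity(grads_A, v):
    # grads_A: gradients of f_c at the samples in region A; v: unit direction
    return np.mean((grads_A @ v) ** 2)

rng = np.random.default_rng(0)
# Toy gradients: large variation along dimension 0, little elsewhere
grads_A = rng.normal(size=(1000, 3)) * np.array([2.0, 0.1, 0.1])

v1 = np.array([1.0, 0.0, 0.0])   # stand-in for a 1st PSM
v3 = np.array([0.0, 0.0, 1.0])   # stand-in for a 3rd PSM
print(local_sensitivity(grads_A, v1) > local_sensitivity(grads_A, v3))  # True
```
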
Strength of the PSA (sub-PSMs)
Let's look at what the knowledge of f9 does in distinguishing pairs of classes (class 9 vs. each other class c').
[Figure: local sensitivity sA^k of the k-th PSM of f9 on the sub-data A(9, c') containing class 9 and class c']
Example: c = 9, c' = 4, k = 1. Recall that sA^1 indicates the contribution of the 1st PSM to distinguishing 9 from 4.
The 3rd PSM contributes MUCH more than the 1st PSM to the classification of 9 against 4!
In fact...!
[Figure: the 3rd PSM (c = 9, k = 3), next to the templates of 9 and 4]
We can visually confirm that the 3rd PSM of f9 is indeed the machine's knowledge that helps (greatly!) in distinguishing 9 from 4.
When PSMs are difficult to interpret
- PSMs of an NN trained on MNIST data to classify the 10 digits.
- Each pixel has a different meaning across samples.
- In order to apply PSA, the data should be registered (aligned).
Table of contents
1. Motivation
2. Sensitivity analysis and PSA
3. Numerical Experiments
4. Conclusion
Conclusion
1. PSA differs from the original sensitivity analysis in that it identifies the weighted combinations of input dimensions that are essential in the machine's knowledge.
   It can identify sets of input dimensions that act oppositely in characterizing the classes, made possible by defining PSMs so that they allow negative elements.
2. Sub-PSMs provide additional (possibly local) information about the machine.
Thank you
Sotetsu Koyamada, [email protected]