VISUALIZATION TECHNIQUES UTILIZING THE SENSITIVITY ANALYSIS OF MODELS

Ivo Kondapaneni, Pavel Kordík, Pavel Slavík
Department of Computer Science and Engineering, Faculty of Electrical Engineering, Czech Technical University in Prague, Czech Republic
Presenting author: Pavel Kordík ([email protected])

Post on 15-Jul-2020


Page 1: VISUALIZATION TECHNIQUES UTILIZING THE SENSITIVITY ...fakegame.sourceforge.net/lib/exe/fetch.php?media=... · Output variable Median value of owner-occupied homes in $1000's. 7 Housing

Overview

• Motivation
• Data mining models
• Visualization based on sensitivity analysis
• Regression problems
• Classification problems
• Definition of interesting plots
• Genetic search for 2D and 3D plots

Motivation

• Data mining: extracting new, potentially useful information from data
• DM models are automatically generated
• Are models always credible?
• Are models comprehensible?
• How to extract information from models? Visualization

Data mining models

• Often black-box models generated from data
• E.g. neural networks
• What is inside?

[Figure: input variables → data mining black-box model → output variable(s)]

Inductive model

• Estimates output from inputs
• Generated automatically
• Evolved by niching GA
• Grows from minimal form
• Contains hybrid units
• Several training methods
• Ensemble of models

[Figure: layered network from input variables through the first, second and third layers to the output layer and the output variable, with interlayer connections; units are labelled P, C, G and L; annotations "3 inputs max" and "4 inputs max"]

Example: Housing data

Input variables: CRIM ZN INDUS NOX RM AGE DIS RAD TAX PTRATIO B LSTA

• CRIM: per capita crime rate by town
• AGE: proportion of owner-occupied units built prior to 1940
• DIS: weighted distances to five Boston employment centres

Output variable: MEDV

• MEDV: median value of owner-occupied homes in $1000's

Housing data – records

Input variables (CRIM to LSTA) and output variable (MEDV):

CRIM     ZN  INDUS  NOX   RM     AGE   DIS     RAD  TAX  PTRATIO  B      LSTA  | MEDV
0.00632  18  2.31   53.8  6.575  65.2  4.09    1    296  15.3     396.9  4.98  | 24
0.02731  0   7.07   46.9  6.421  78.9  4.9671  2    242  17.8     396.9  9.14  | 21.6
…        …   …

Housing data – inductive model

Input variables: CRIM ZN INDUS NOX RM AGE DIS RAD TAX PTRATIO B LSTA
Output variable: MEDV

Niching genetic algorithm evolves units in first layer

sigmoid: MEDV=1/(1-exp(-5.724*CRIM+1.126)), Error: 0.13
sigmoid: MEDV=1/(1-exp(-5.861*AGE+2.111)), Error: 0.21

Housing data – inductive model

Input variables: CRIM ZN INDUS NOX RM AGE DIS RAD TAX PTRATIO B LSTA
Output variable: MEDV

Niching genetic algorithm evolves units in second layer

First layer: sigmoid (Error: 0.13), sigmoid (Error: 0.21), sigmoid (Error: 0.26), linear (Error: 0.24)

polynomial: MEDV=0.747*(1/(1-exp(-5.724*CRIM+1.126)))+0.582*(1/(1-exp(-5.861*AGE+2.111)))^2+0.016

Error: 0.10

Housing data – inductive model

Input variables: CRIM ZN INDUS NOX RM AGE DIS RAD TAX PTRATIO B LSTA
Output variable: MEDV

[Figure: the constructed network; units from the inputs toward the output: sigmoid, sigmoid, sigmoid, linear; polynomial; polynomial, linear; exponential]

Error: 0.08

Constructed model has very low validation error!

Housing data – inductive model

Input variables: CRIM ZN INDUS NOX RM AGE DIS RAD TAX PTRATIO B LSTA
Output variable: MEDV

[Figure: the same network with units abbreviated: S S S L; P; P L; E]

Error: 0.08

MEDV=(exp((0.038* 3.451*(1/(1-exp(-5.724*CRIM+ 1.126)))*(1/(1-exp(2.413*DIS-2.581)))*(1/(1-exp(2.413*DIS-2.581)))+0.429*(1/(1-exp(-5.861*AGE+ 2.111)))+0.024*(1/(1-exp(2.413*DIS-2.581)))+0.036+ 0.038*0.350*(1/(1-exp(-3.613*RAD-0.088)))+ 0.999*( 0.747*(1/(1-exp(-5.724*CRIM+ 1.126)))+0.582*(1/(1-exp(-5.861*AGE+ 2.111)))*(1/(1-exp(-5.861*AGE+ 2.111)))+0.016)-0.046*(1/(1-exp(-5.724*CRIM+ 1.126)))-0.079+ 0.002*INDUS-0.001*LSTA+ 0.150)*0.860)*13.072)-14.874

The math equation is no longer comprehensible; we have to treat it as a black-box model!

Visualization based on sensitivity analysis

[Figure: a GAME model (also labelled ModGMDH) with inputs x1, x2, x3 and output y. x1 and x3 are held constant while x2 moves from min to max; the response y is plotted against x2.]
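The sweep pictured on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GAME implementation: `sensitivity_curve` and `toy_model` are hypothetical names, and the toy function merely stands in for a trained black-box model.

```python
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def sensitivity_curve(model: Model, base_point: Sequence[float],
                      moving_index: int, lo: float, hi: float,
                      steps: int = 50) -> List[Tuple[float, float]]:
    """Move one input from lo to hi while all other inputs stay constant."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    curve = []
    for k in range(steps):
        x = lo + (hi - lo) * k / (steps - 1)
        point = list(base_point)      # copy, so the constant inputs are untouched
        point[moving_index] = x       # only the selected input is moving
        curve.append((x, model(point)))
    return curve


def toy_model(x: Sequence[float]) -> float:
    """Hypothetical stand-in for a black-box model with inputs x1, x2, x3."""
    return 2.0 * x[0] + x[1] ** 2 - 0.5 * x[2]


# x1 = 1.0 and x3 = 4.0 are constant, x2 moves from -1 to 1.
curve = sensitivity_curve(toy_model, [1.0, 0.0, 4.0], 1, -1.0, 1.0, steps=5)
for x2, y in curve:
    print(f"x2={x2:+.2f}  y={y:.2f}")
```

Plotting the returned pairs gives the kind of 2D sensitivity plot the slide shows; any callable that maps an input vector to a number can be passed as `model`.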

Sensitivity analysis of inductive model of MEDV

[Figure: two panels, House no. 189 and House no. 164]

What happens to the value of a house when crime in the area decreases or increases?

Credible output?

Ensemble of inductive models

• Random initialization
• Developing on the same training set
• Training affects just well-defined areas of input space
• Each model: unique architecture, similar complexity, similar transfer functions
• Similar behavior for well-defined areas
• Different behavior for under-defined areas

[Figure: ensemble of GAME models (also labelled ModGMDH), each with inputs x1, x2, x3, giving outputs y(k-1), y(k), y(k+1); the responses are plotted for x2 moving from min to max]

Credibility of models: artificial data set

Credibility: the criterion is the dispersion of the models' responses.

Advantages:

• No need for the training data set,
• Modeling method success considered,
• Input importance considered.
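The dispersion criterion can be illustrated with a short Python sketch. It assumes that dispersion is measured as the width of the envelope (max minus min) of the ensemble responses at each point of a sweep; that measure, the function name `ensemble_envelope` and the three toy models are assumptions made for illustration only.

```python
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def ensemble_envelope(models: Sequence[Model], base_point: Sequence[float],
                      moving_index: int, lo: float, hi: float,
                      steps: int = 50) -> List[Tuple[float, float, float, float]]:
    """Return (x, lowest response, highest response, envelope width) per step."""
    rows = []
    for k in range(steps):
        x = lo + (hi - lo) * k / (steps - 1)
        point = list(base_point)
        point[moving_index] = x
        responses = [m(point) for m in models]
        low, high = min(responses), max(responses)
        rows.append((x, low, high, high - low))   # narrow envelope = credible
    return rows


# Hypothetical ensemble: the models agree for x <= 1 (well-defined area)
# and drift apart for x > 1 (under-defined area).
toy_models = [
    lambda p: p[0],
    lambda p: p[0] + max(0.0, p[0] - 1.0),
    lambda p: p[0] - max(0.0, p[0] - 1.0),
]

rows = ensemble_envelope(toy_models, [0.0], 0, 0.0, 2.0, steps=5)
widths = [width for _, _, _, width in rows]
print(widths)
```

Note that the sweep needs only the models, not the training data, which matches the first advantage listed above.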

Example: Models of hot water consumption

Cold water consumption, increasing humidity

Models on Housing data

• Single model
• Ensemble of 10 models

[Figure: two panels, Before and After]

Classification problems

• Data: Setosa class, Virginica class, Versicolor class

[Figure: scatter plot of the data; axes: Petal length, Petal width]

• Blue GAME model (Iris Setosa class)

[Figure: model response with the regions output = 1 and output = 0 separated by the decision boundary]

Credibility of classifiers

[Figure: for each class (Iris Setosa, Iris Virginica, Iris Versicolor), the outputs of GAME model 1, GAME model 2 and GAME model 3 are multiplied: model 1 * model 2 * model 3 = GAME models (1*2*3)]
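The multiplication of model outputs (1*2*3) can be sketched as follows. The three threshold classifiers are hypothetical stand-ins for trained models of the Iris Setosa class, and `combined_output` is an illustrative name; the only thing taken from the slide is that the outputs are multiplied.

```python
from typing import Callable, Sequence

Classifier = Callable[[Sequence[float]], float]


def combined_output(models: Sequence[Classifier], point: Sequence[float]) -> float:
    """Multiply the outputs (each in [0, 1]) of all models for one input point."""
    product = 1.0
    for model in models:
        product *= model(point)
    return product


# Hypothetical models; a point is (petal length, petal width).
def model_1(p: Sequence[float]) -> float:
    return 1.0 if p[0] < 2.5 else 0.0


def model_2(p: Sequence[float]) -> float:
    return 1.0 if p[0] < 3.0 else 0.0


def model_3(p: Sequence[float]) -> float:
    return 1.0 if p[1] < 0.8 else 0.0


models = [model_1, model_2, model_3]
agreed = combined_output(models, (1.4, 0.2))    # all three models output 1
disputed = combined_output(models, (2.7, 0.5))  # model_1 outputs 0 here
print(agreed, disputed)
```

The product stays high only where every model outputs a high value, so regions where the models disagree drop out of the combined response.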

Overlapping models X ensemble

Random behavior filtered out

[Figure: two panels, Before and After]

Problem – how to find information in n-dim space?

• Multidimensional space of input variables
• What are we looking for?
  – Interesting relationship of IO variables
  – Regions of high sensitivity
  – Credible models (compromise response)
• Can we automate the search?

When is a plot interesting for us?

[Figure: plot over the input xi with the interval that starts at xistart and has width xisize marked]

Definition of interesting plot

• Minimal volume of the envelope: p → min
• Maximal sensitivity of the output to the change of the xi input variable: ysize → max
• Maximal size of the area: xisize → max

Multiobjective optimization

• Interestingness:
• Unknown variables: x1, x2, ..., xi-1, xi+1, ..., xn, xistart, xisize
• We will use a "niching" genetic algorithm

Chromosome: x1 x2 ... xi-1 xi+1 … xn xistart xisize
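One way to read the chromosome layout is sketched below in Python. The function name `decode_chromosome` and the zero-based index `i` are assumptions for illustration; the slide only fixes the order of the genes (the constant inputs first, then xistart and xisize).

```python
from typing import List, Sequence, Tuple


def decode_chromosome(chromosome: Sequence[float],
                      i: int) -> Tuple[List[float], Tuple[float, float]]:
    """Split a chromosome into a full input vector and the interval swept on xi.

    chromosome = [x1, ..., x(i-1), x(i+1), ..., xn, xi_start, xi_size]
    i is the zero-based position of the moving input xi.
    """
    if len(chromosome) < 2:
        raise ValueError("chromosome needs at least xi_start and xi_size")
    n_inputs = len(chromosome) - 1            # (n - 1) constants + 2 genes
    if not 0 <= i < n_inputs:
        raise ValueError("index of the moving input is out of range")
    fixed = list(chromosome[:-2])             # values of the constant inputs
    xi_start, xi_size = chromosome[-2], chromosome[-1]
    # Re-insert the moving input at position i, starting at xi_start.
    base_point = fixed[:i] + [xi_start] + fixed[i:]
    return base_point, (xi_start, xi_start + xi_size)


# Three inputs, x2 is the moving one: x1 = 0.1, x3 = 0.3, sweep x2 over [0.5, 0.75].
base_point, interval = decode_chromosome([0.1, 0.3, 0.5, 0.25], i=1)
print(base_point, interval)
```

The decoded pair is what a fitness function would need: the constant inputs define where in the n-dimensional space the plot is taken, and the interval defines which part of xi it covers.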

Niching GA on simple data

[Figure: GAME ensemble and training data]

Chromosome: xstart xsize

fitness = xsize * 1/p * ysize

[Figure: fitness plotted over xstart and xsize]

Very simple problem

Search space is 2D, can be visualized
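A minimal Python sketch of the fitness above, for a one-input ensemble, follows. The slide gives only the formula fitness = xsize * 1/p * ysize; how p and ysize are measured here is an assumption: p is the area of the envelope of ensemble responses over the interval (trapezoid rule), ysize is the range of the mean response, and a small `eps` is added to p to avoid division by zero. All names are hypothetical.

```python
from typing import Callable, Sequence

Model1D = Callable[[float], float]


def interestingness(models: Sequence[Model1D], x_start: float, x_size: float,
                    steps: int = 21, eps: float = 1e-9) -> float:
    """fitness = xsize * 1/p * ysize over the interval [x_start, x_start + x_size]."""
    xs = [x_start + x_size * k / (steps - 1) for k in range(steps)]
    responses = [[m(x) for m in models] for x in xs]
    widths = [max(r) - min(r) for r in responses]    # envelope width per step
    means = [sum(r) / len(r) for r in responses]     # compromise response
    dx = x_size / (steps - 1)
    # Assumed measure of p: envelope area by the trapezoid rule.
    p = sum((widths[k] + widths[k + 1]) / 2.0 * dx for k in range(steps - 1))
    y_size = max(means) - min(means)                 # assumed measure of ysize
    return x_size * y_size / (p + eps)


# Hypothetical ensemble: models agree on [0, 1] and diverge on [1, 2].
toy_models = [
    lambda x: x,
    lambda x: x + max(0.0, x - 1.0),
    lambda x: x - max(0.0, x - 1.0),
]

credible = interestingness(toy_models, x_start=0.0, x_size=1.0)
uncertain = interestingness(toy_models, x_start=1.0, x_size=1.0)
print(credible, uncertain)
```

Both intervals have the same width and the same output range, so the difference in fitness comes only from the envelope: the interval where the models agree scores far higher.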

Niching GA also locates local optima

• Three subpopulations (niches) of individuals survived
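The slides do not say which niching method is used. As an illustration only, the sketch below uses deterministic crowding, one common niching scheme, on a hypothetical one-dimensional fitness with two peaks; the function names, the fitness landscape and all parameter values are assumptions.

```python
import math
import random
from typing import Callable, List


def bimodal(x: float) -> float:
    """Hypothetical fitness with a global peak near 0.25 and a local peak near 0.75."""
    return (math.exp(-((x - 0.25) / 0.1) ** 2)
            + 0.8 * math.exp(-((x - 0.75) / 0.1) ** 2))


def clamp(x: float) -> float:
    return min(1.0, max(0.0, x))


def deterministic_crowding(fitness: Callable[[float], float], pop_size: int = 40,
                           generations: int = 200, sigma: float = 0.02,
                           seed: int = 0) -> List[float]:
    """Each child competes only with the parent it is closest to."""
    rng = random.Random(seed)
    pop = [rng.random() for _ in range(pop_size)]
    for _ in range(generations):
        rng.shuffle(pop)                              # random pairing of parents
        for j in range(0, pop_size - 1, 2):
            p1, p2 = pop[j], pop[j + 1]
            a = rng.random()                          # blend crossover weight
            c1 = clamp(a * p1 + (1 - a) * p2 + rng.gauss(0.0, sigma))
            c2 = clamp((1 - a) * p1 + a * p2 + rng.gauss(0.0, sigma))
            # Pair children with parents so that the total distance is minimal.
            if abs(p1 - c1) + abs(p2 - c2) > abs(p1 - c2) + abs(p2 - c1):
                c1, c2 = c2, c1
            # A child replaces its paired parent only if it is at least as fit.
            if fitness(c1) >= fitness(p1):
                pop[j] = c1
            if fitness(c2) >= fitness(p2):
                pop[j + 1] = c2
    return pop


final_population = deterministic_crowding(bimodal)
print(sorted(round(x, 2) for x in final_population))
```

Because a child can displace only a nearby parent, individuals sitting on a lower peak are not forced out by fitter individuals from a distant peak, which is the property that lets subpopulations around local optima survive.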

Automated retrieval of plots showing interesting behavior

Genetic Algorithm

A genetic algorithm with a special fitness function is used to adjust all other inputs (dimensions)

Best-so-far individual found (generations 0–17)

Housing data – interesting plot retrieved

[Figure: two panels, Before (low fitness) and After (high fitness)]

Conclusion

• Credible regression

• Credible classification

• Automated retrieval

Future work I

• Automated knowledge extraction from data

[Figure: block diagram with the blocks FAKE INTERFACE, GAME ENGINE, INPUT DATA, AUTOMATED DATA PREPROCESSING, AUTOMATED DATA MINING, KNOWLEDGE EXTRACTION and INFORMATION VISUALIZATION, KNOWLEDGE]

Future work II

• FAKE GAME framework

[Figure: block diagram of the framework. Blocks: PROBLEM IDENTIFICATION, DATA COLLECTION, DATA INSPECTION, DATA CLEANING, DATA INTEGRATION, DATA WAREHOUSING, INPUT DATA, AUTOMATED DATA PREPROCESSING, GAME ENGINE (several MODEL blocks), FAKE INTERFACE. Outputs: math equations; feature ranking; interesting behaviour; credibility estimation; class boundaries, relationship of variables; classification, prediction, identification and regression.]

Future work III

• Just released as open source project
  – Automated data preprocessing
  – Automated model building, validation
  – Optimization methods
  – Visualization

See and join us: http://www.sourceforge.net/projects/fakegame