Interactive Evolution in Automated Knowledge Discovery Tomáš Řehořek March 2011


Page 1

Interactive Evolution in Automated Knowledge Discovery

Tomáš Řehořek
March 2011

Page 2

Knowledge Discovery Automation

• Our goal:
  – Given an input dataset, automatically construct a Knowledge Flow (KF) and offer output knowledge that the user is satisfied with
  – Creating such a system is a big deal!

[Diagram: Automated Knowledge Discovery]

Page 3

Knowledge Discovery Automation

• What is Knowledge Discovery?
  – Transformation of input data into human-interpretable knowledge
  – An oriented (directed) graph of actions, a Knowledge Flow, is a suitable approach
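A Knowledge Flow viewed as an oriented graph can be sketched in a few lines of Python. The action names below are hypothetical and only illustrate the idea; the slides do not prescribe any particular actions. Each action lists the actions it depends on, and a topological sort yields a valid execution order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical Knowledge Flow: each action maps to the set of actions
# whose output it consumes (its predecessors in the oriented graph).
knowledge_flow = {
    "load_dataset": set(),
    "normalize": {"load_dataset"},
    "train_model": {"normalize"},
    "test_model": {"train_model", "normalize"},
    "confusion_matrix": {"test_model"},
}


def execution_order(flow):
    """Return the actions in an order that respects all dependencies."""
    return list(TopologicalSorter(flow).static_order())


if __name__ == "__main__":
    print(execution_order(knowledge_flow))
```

A cycle in the graph would raise `graphlib.CycleError`, which is one simple way to reject a malformed flow.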

Page 4

Knowledge Discovery Ontology

• Ontology (definition)
  – Formal representation of a domain
  – Specification of entities, their properties and relations
  – Provides a vocabulary which can be used to model a domain
    • E.g.: dataset, model, testing sample, scatter plot, confusion matrix, association rule…

Page 5

Knowledge Discovery Ontology

• Ontology design problems in KD:
  – Which KFs are reasonable?
  – What should the output report look like?
  – May the metadata be helpful?
  – Are there some categories of users with similar interests?
• Two ideas concerning Ontology:
  – Deductive approach
  – Inductive approach

Page 6

Knowledge Discovery Ontology

• Deductive approach:
  – Ontology is given
  – Based on the Ontology and the given dataset, try to construct an appropriate KF

Page 7

Knowledge Discovery Ontology

• Deductive approach:

Taken from: M. Žáková, P. Křemen, F. Železný, Nada Lavrač: Automating Knowledge Discovery Workflow Composition Through Ontology-Based Planning (2010)

Page 8

Knowledge Discovery Ontology

• Inductive approach:
  – No prior assumptions about the Ontology
  – Learn the Ontology based on a database of KFs designed by experts

[Diagram labels: Meta-Knowledge Discovery; Discovered KD Ontology]

Page 9

Our Approach: Revolutionary Reporting

• There may be thousands of useful KFs
  – Different datasets may require different actions
  – Different users may require different knowledge
• Maybe users form clusters:
  – "DM Scientist": may experiment with different algorithms on a given dataset
  – "Business Manager": may appreciate the beer-and-diapers rule

Page 10

Our Approach: Revolutionary Reporting

• Let's design a system capable of learning what users like!
  – Adopt Interactive Evolutionary Computation
  – Collect feedback to evaluate fitness
    • of a given KF,
    • for a given user,
    • on a given dataset
  – Store the feedback, along with the metadata, in a database
  – As the DB grows, offer intelligent KF mutation based on the experience

Page 11

Our Approach: Revolutionary Reporting

• Interactive Evolutionary Computation (IEC)
  – Also known as "Aesthetic Selection"
  – Evolutionary Computation using human evaluation as the fitness function
• Inspiration: http://picbreeder.org

Page 12

PicBreeder

Interactive Evolution by Jimmy Secretan and Kenneth Stanley

Page 13

Next generation … and so on

Page 14

And after 75 generations ...

... you eventually get something interesting

Page 15

The technology hidden behind

Neural net draws the image

[Figure: neural net with labels x, z and grayscale]

Page 16

Neuroevolution

By clicking, you increase the fitness of nets

Next generations inherit fit building patterns

[Figure: neural net with labels x, z and grayscale]

Page 17

Gallery of discovered images

Page 18

Collaborative evolution

You start your evolution where others finished …

… and when you discover something interesting …

… you store it in the database.

Page 19

Our Approach: Revolutionary Reporting

[Diagram labels: System core; Experience Database; Feedback; User]

Page 20

First Experiments: Data Projection

• Transform the input dataset to 2D: $f : \mathbb{R}^n \to \mathbb{R}^2$

• Similar to PCA, Sammon projection, etc.

[Diagram: examples in n-dimensional space mapped to 2D]

Page 21

Experiment Setup

[Architecture diagram labels: User; Web Client; AJAX; Google API; Feedback Collection GUI; jabsorb JSON-RPC (via HTTP); Tomcat Server; RapidMiner 5; Genetic Algorithm; MySQL; Current Population; Feedback]

Page 22

Data Projection Experiments

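• Linear transformation
  – Evolve the coefficient matrix

    $$\begin{pmatrix} a_1, & a_2, & \ldots, & a_n \\ b_1, & b_2, & \ldots, & b_n \end{pmatrix}$$

  – Do the transformation using the formula

    $$f(\mathbf{x}) = \left( \sum_{i=1}^{n} a_i x_i,\; \sum_{i=1}^{n} b_i x_i \right)$$

    … resulting in a point in 2D space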
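The linear projection can be sketched as below, assuming the formula maps an n-dimensional example x to the point (Σ aᵢxᵢ, Σ bᵢxᵢ). The `mutate` helper is a hypothetical Gaussian mutation added only to illustrate how the coefficient rows might be evolved; the slides do not specify the genetic operators.

```python
import random


def linear_projection(a, b, x):
    """Project x (length n) to 2D using coefficient rows a and b."""
    if not (len(a) == len(b) == len(x)):
        raise ValueError("a, b and x must have the same length")
    first = sum(ai * xi for ai, xi in zip(a, x))
    second = sum(bi * xi for bi, xi in zip(b, x))
    return (first, second)


def mutate(coeffs, sigma=0.1, rng=random):
    """Hypothetical mutation: add Gaussian noise to every coefficient."""
    return [c + rng.gauss(0.0, sigma) for c in coeffs]


if __name__ == "__main__":
    print(linear_projection([1, 0, 2], [0, 1, -1], [3, 4, 5]))
```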

Page 23

[ Demonstration ]

Page 24

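Data Projection Experiments

• Sigmoidal transformation
  – Evolve the coefficient matrix

    $$\begin{pmatrix} a_{1,1}, a_{1,2}, \ldots, a_{1,n}, & b_{1,1}, b_{1,2}, \ldots, b_{1,n}, & c_{1,1}, c_{1,2}, \ldots, c_{1,n} \\ a_{2,1}, a_{2,2}, \ldots, a_{2,n}, & b_{2,1}, b_{2,2}, \ldots, b_{2,n}, & c_{2,1}, c_{2,2}, \ldots, c_{2,n} \end{pmatrix}$$

  – Do the transformation using the formula

    $$f(\mathbf{x}) = \left( \sum_{i=1}^{n} \frac{a_{1,i}}{1 + e^{b_{1,i} x_i + c_{1,i}}},\; \sum_{i=1}^{n} \frac{a_{2,i}}{1 + e^{b_{2,i} x_i + c_{2,i}}} \right)$$

[Figure with labels a, b, c]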
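A minimal sketch of the sigmoidal projection, assuming each output coordinate k is the sum over i of a[k][i] / (1 + e^(b[k][i]·x[i] + c[k][i])). The sign of the exponent is an assumption read off the slide as extracted; a conventional logistic function negates the exponent, which evolved coefficients b and c could absorb by changing sign.

```python
import math


def sigmoid_projection(a, b, c, x):
    """Project x (length n) to 2D; a, b, c each hold 2 rows of n coefficients."""
    def coordinate(k):
        # Sum of n scaled sigmoid terms for output coordinate k.
        # math.exp raises OverflowError for very large exponents; this
        # sketch does not guard against that.
        return sum(
            a[k][i] / (1.0 + math.exp(b[k][i] * x[i] + c[k][i]))
            for i in range(len(x))
        )
    return (coordinate(0), coordinate(1))


if __name__ == "__main__":
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    print(sigmoid_projection([[2.0, 4.0], [6.0, 8.0]], zeros, zeros, [1.0, 1.0]))
```

With b and c set to zero every sigmoid term equals a/2, which makes a quick sanity check.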

Page 25

Interactive Evolution: Issues

• Fitness function is too costly:
  – GA requires a lot of evaluations
  – User may get annoyed, bored, tired…
• Heuristic approach needed to speed up the evolution!
  – "Hard-wired" estimation of projection quality
    • E.g. clustering homogeneity, separability, intra-cluster variability…
    • Puts a limitation on what "quality" means!
  – Modeling user's preferences…?

Page 26

Surrogate Model

• Optimization approach in areas where evaluation is too expensive

• Builds an approximation model of the fitness function

• Given a training dataset of the candidate solutions known so far and their fitness…

• …predicts fitness of newly generated candidates

Page 27

Surrogate Model

1. Collect fitness of an initial sample

2. Construct Surrogate Model

3. Search the Surrogate Model
   • Surrogate Model is cheap to evaluate
   • Genetic Algorithm may be employed

4. Collect fitness at new locations found in step 3.

5. If solution is not good enough, go to 2.
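The five steps can be sketched as a loop. Everything concrete here is an assumption made for illustration: candidates are single numbers, the surrogate is a 3-nearest-neighbour average, the search of the surrogate is plain random sampling standing in for a Genetic Algorithm, and `true_fitness` stands in for the expensive evaluation (in this talk, the user).

```python
import random


def build_surrogate(known, k=3):
    """Step 2: cheap model predicting fitness as the mean of the k nearest known points."""
    def predict(x):
        nearest = sorted(known, key=lambda s: abs(s[0] - x))[:k]
        return sum(f for _, f in nearest) / len(nearest)
    return predict


def surrogate_search(true_fitness, lo, hi, target, initial=5, rounds=20, seed=0):
    rng = random.Random(seed)
    # Step 1: collect the (expensive) fitness of an initial sample.
    known = [(x, true_fitness(x)) for x in (rng.uniform(lo, hi) for _ in range(initial))]
    for _ in range(rounds):
        model = build_surrogate(known)                       # step 2
        candidates = [rng.uniform(lo, hi) for _ in range(200)]
        x_new = max(candidates, key=model)                   # step 3 (random search stand-in)
        known.append((x_new, true_fitness(x_new)))           # step 4
        if max(f for _, f in known) >= target:               # step 5
            break
    best = max(known, key=lambda s: s[1])
    return best, len(known)


if __name__ == "__main__":
    print(surrogate_search(lambda x: -(x - 2.0) ** 2, -5.0, 5.0, target=-0.01))
```

The expensive function is called once per round, so the number of expensive evaluations is bounded by `initial + rounds` regardless of how many candidates the surrogate scores.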

Page 28

Evaluating Fitness

• In order to construct fitness-prediction models, a training dataset must be delivered

• Information about fitness provided by the user is indirect
  – Within a single population, a good projection is surely better than a bad one
  – However, "better" is a relative term
  – Is a good projection in generation #2 better than a bad projection in generation #10…?

Page 29

Interconnecting generations

• In each generation, the population may be divided into up to 3 categories:
  – bad, neutral, good
• Let's copy the best projection to the next-epoch population
  – So-called elitism in Evolutionary Computation
  – Within the new population, the elite will again fall into one of these 3 categories
  – This gives us information about cross-generation progress!
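One possible way to turn the elite's carry-over into a cross-generation scale is sketched below. The numeric scheme is an assumption, not the method from the talk: categories are mapped to -1, 0, +1, and each generation is shifted so that the elite keeps the absolute level it had in the previous generation.

```python
RANK = {"bad": -1, "neutral": 0, "good": 1}


def absolutize(generations):
    """Chain per-generation labels into absolute levels via the elite.

    generations: list of dicts with
      'labels': {individual_id: 'bad' | 'neutral' | 'good'}
      'elite':  id copied into the next generation (omit for the last one)
    Returns {(generation_index, individual_id): absolute_level}.
    """
    absolute = {}
    offset = 0
    elite, elite_level = None, None
    for g, gen in enumerate(generations):
        labels = gen["labels"]
        if elite is not None:
            # The elite reappears here; align this generation's scale so
            # that its absolute level is unchanged.
            offset = elite_level - RANK[labels[elite]]
        for ind, label in labels.items():
            absolute[(g, ind)] = offset + RANK[label]
        elite = gen.get("elite")
        if elite is not None:
            elite_level = absolute[(g, elite)]
    return absolute
```

For example, if the elite was "good" in generation 0 and is rated "bad" in generation 1, then everything rated "neutral" or "good" in generation 1 lands above every level seen in generation 0.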

Page 30

Absolutizing Fitness

[Figure: Generation #1]

Page 31

Absolutizing Fitness

[Figure: Generation #2; labels: Equivalence relation, Partial order relation, Equivalence classes]

Page 32

Generation #3

Page 33

Fitness Prediction KF in RM

[Knowledge Flow diagram labels: Training dataset; Current population; Normalization; Learning (3NN); Fitness prediction]
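The flow named on this slide (normalization, then 3NN learning, then fitness prediction for the current population) can be approximated outside RapidMiner as follows. This is a sketch under assumptions: min-max normalization is chosen here for simplicity, the predicted fitness is the mean fitness of the 3 nearest training examples, and feature vectors stand for whatever describes a candidate (for instance its coefficient matrix flattened to a list).

```python
import math


def fit_min_max(rows):
    """Per-column (min, max) computed on the training features."""
    return [(min(col), max(col)) for col in zip(*rows)]


def normalize(row, ranges):
    """Scale each value to [0, 1]; constant columns map to 0."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0
            for v, (lo, hi) in zip(row, ranges)]


def predict_fitness(training, population, k=3):
    """training: list of (features, fitness); population: list of features."""
    ranges = fit_min_max([features for features, _ in training])
    normed = [(normalize(features, ranges), fit) for features, fit in training]
    predictions = []
    for candidate in population:
        q = normalize(candidate, ranges)
        nearest = sorted(normed, key=lambda t: math.dist(t[0], q))[:k]
        predictions.append(sum(fit for _, fit in nearest) / len(nearest))
    return predictions
```

Normalizing before the distance computation keeps a feature with a large numeric range from dominating the nearest-neighbour search.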