Exposé: An ontology for machine learning experimentation

Joaquin Vanschoren, K.U.Leuven (Belgium), U. Leiden (The Netherlands)
Larisa Soldatova, University of Aberystwyth (UK)

DM Ontology Jamboree 2010


Page 1: Exposé: An ontology for machine learning experimentationkt.ijs.si/janez_kranjc/dmo_jamboree/Expose.pdf · Exposé: An ontology for machine learning experimentation 1 ... train test


Page 3: Overview

• Ontology lessons

• Exposé ontology

• Use cases

Page 4: Ontology lessons

What did we learn from other ontologies?

Page 5: Ontology design

• Start from accepted classes & properties (top-level ontologies, e.g. OBI, RO)

• If possible, reuse prior ontologies to build on their knowledge/consensus

• Use ontology design patterns: reusable patterns for recurrent problems

• http://ontologydesignpatterns.org

• Check clarity, consistency, extensibility, minimal commitment

Page 6: Ontology recap: OntoDM (Panov et al., ’09, ’10)

• Aim: unified framework for DM research, builds on BFO

[Diagram. Node labels: DM algorithm; task (classification, pattern mining, ...); component (kernel, distance function, ...); dataset (data types, feature types, ...); generalization (model, pattern, clustering, ...); constraints; algo impl; algo appl; plan; process; plan specification. Relations: achieves, has part, has input, has output, realizes.]


Page 8: Ontology recap: DMOP (Hilario et al., ’09)

• Model internal structure of learning algorithms

[Diagram. Node labels: classification algorithm, dataset, model, DM object, optimization problem, optimization strategy, model complexity control strategy, algorithm assumption, constraint, induction cost function, loss function, regularization parameter, operator, model structure, model parameter, decision boundary, hyperparameter. Relations: specifies input type, specifies output type, has opt. problem, has optimization strategy, has model complexity control strategy, assumes, has constraint, has obj. funct., implements, has hyperparameter. The labels "p=?" and "+-" appear to be icon text.]



Page 12: Ontology recap: DMWF (Kietz et al., ’09)

• Reason about KD operators: in/outputs, conditions/effects (SWRL rules)

[Diagram. Node labels: Thing, data, operator, model evaluation, modeling, data table processing, model processing, writer, reader, model, report, IO object, meta-data, attribute_value_table, categorical, ... Relations: uses, produces.]

"RapidMiner.ID3":

Superclass:
  ClassificationLearning
  and (uses exactly 1 AttributeValueDataTable)
  and (produces exactly 1 Model)
  and (simpleParameter1(name="minimal size for split") exactly 1 integer)
  and (simpleParameter2(name="minimal leaf size") exactly 1 integer) ...

Condition:
  (AttributeValueDataTable and MissingValueFreeData
    and (inputAttribute only (hasAttributeType only Categorial))
    and (targetAttribute exactly 1 (hasAttributeType only Categorial)))(?D),
  noOfRecords(?D,?Size), ?P1 is ?Size / 100
  → uses(this,?D), simpleParameter2(this,?P1)

Effect:
  uses(this,?D), hasFormat(?D,?F), inputAttribute(?D,?IA), targetAttribute(?D,?TA),
  → new(?M,?D), DecisionTree(?M), produces(this,?M), hasFormat(?M,?F),
    inputAttribute(?M,?IA), predictedAttribute(?M,?TA),
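The condition and effect of the ID3 operator can be read as a precondition check followed by a description of the produced model. The following is a minimal Python sketch of that reading only. `DataTable`, `id3_condition` and `id3_effect` are hypothetical names invented for illustration; DMWF itself expresses this in OWL and SWRL-style rules, not in code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataTable:
    # Hypothetical stand-in for the description of an AttributeValueDataTable.
    input_types: List[str]      # attribute type of each input attribute
    target_type: str            # attribute type of the single target attribute
    has_missing_values: bool
    no_of_records: int
    fmt: str = "table"


def id3_condition(d: DataTable) -> Optional[float]:
    """Return the value bound to simpleParameter2 (?Size / 100), or None if the condition fails."""
    applicable = (
        not d.has_missing_values
        and all(t == "Categorial" for t in d.input_types)
        and d.target_type == "Categorial"
    )
    return d.no_of_records / 100 if applicable else None


def id3_effect(d: DataTable) -> dict:
    """Describe the produced model: a decision tree that keeps the format and attributes of its input."""
    return {
        "type": "DecisionTree",
        "format": d.fmt,
        "input_types": list(d.input_types),
        "predicted_type": d.target_type,
    }


table = DataTable(["Categorial", "Categorial"], "Categorial", False, 1500)
min_leaf_size = id3_condition(table)
model = id3_effect(table) if min_leaf_size is not None else None
```

A planner could use such checks to decide which operators are applicable to a dataset before anything is executed, which is the kind of reasoning the slide attributes to DMWF.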


Page 15: Ontology recap: EXPO (Soldatova and King, ’06)

• Make goal and structure of scientific experiments more explicit

[Diagram: "has part" links connecting experiment, experiment goal (compare, confirm hypothesis, explain, ...), experiment design, experiment model, experiment design strategy (factorial, orthogonal, ...), experimental variable ((un)controlled, (in)dependent, ...), admin info (author, biblio_reference, ...), experiment hypothesis, experimental result.]


Page 17: Exposé

An ontology for data mining experimentation

Page 18: Context

• Giant, public database(s) of data mining experiments

• We need:

• Common language to share experiments (through DM tools)

• Intuitive ways to store and query experimental results

• We want:

• Interoperable ontology: OntoDM for top-level, DMOP for detailed properties of learning algorithms

• Driven by actual experiments submitted to database

• New algorithms -> ideally, described by author

• Instances automatically extracted from database

Page 19: Problem 1: Experiments

What is a machine learning experiment?

What do we need to know about it?


Page 22: Exposé: Experiments

Workflow: has inputs, outputs, operators (participants)

[Diagram. Node labels: KD workflow, experiment workflow, composite experiment, singular experiment, experimental design, experimental variable, machine, learner evaluation, model evaluation result, evaluation, model evaluation function, learning algorithm, model, prediction result, dataset, performance estimation. Relations: has participant (hp), has description (hd), has part, has input, has output, is exec. on. One region of the diagram is labelled EXPO.]

Page 23: Problem 2: Algorithms

When talking about an algorithm, what is meant?

General algorithm?

Specific implementation? Which version?

When run, which parameters, components?

Page 24: Exposé: Algorithms

Specification, implementation, application

[Diagram. Node labels: operator, algo appl, algo impl, algorithm specif., algo quality, param impl, param setting, function appl. Relations: has participant (hp), has description (hd): name, version, url, ...; is concretization of (ico); has part; has quality. The label "p=?" appears to be icon text.]

Similar to OntoDM

Page 25: Exposé: Algorithms

[Diagram. Node labels: operator, algorithm specif., algo impl, algo appl, parameter, param impl, param setting, function, function impl., function appl. Relations: hp = has participant, ico = is concretization of, has part.]

Same for functions and parameters
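The split between specification, implementation and application can be sketched as three linked records. This is an illustrative Python sketch, not part of Exposé: the class and field names are hypothetical, and the direction of the links (an application has an implementation as participant, an implementation is a concretization of a specification) is one reading of the hp and ico labels on the slides.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AlgorithmSpecification:
    # The general algorithm, independent of any code.
    name: str


@dataclass(frozen=True)
class AlgorithmImplementation:
    # A specific, versioned piece of software.
    concretizes: AlgorithmSpecification   # "is concretization of" (ico)
    name: str
    version: str


@dataclass
class AlgorithmApplication:
    # One concrete run of an implementation with fixed parameter settings.
    implementation: AlgorithmImplementation            # "has participant" (hp)
    parameter_settings: dict = field(default_factory=dict)

    def specification(self) -> AlgorithmSpecification:
        return self.implementation.concretizes


spec = AlgorithmSpecification("decision tree learner")
impl = AlgorithmImplementation(spec, name="ExampleTree", version="1.0")  # hypothetical implementation
run_a = AlgorithmApplication(impl, {"min_leaf_size": 2})
run_b = AlgorithmApplication(impl, {"min_leaf_size": 10})
```

Keeping the three levels apart lets two runs share an implementation while differing in parameter settings, which addresses the questions raised under Problem 2.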

Page 26: Problem 3: Algorithm composition

Plug-in functions, kernels, other algorithms

Such components play different roles -> Agent-role pattern


Page 29: Exposé: Algorithms

[Diagram. Node labels: operator, algo appl, algo impl, algorithm specif., learning algorithm, kernelized algorithm, algo quality, param impl, param setting, function appl, role, algorithm component role, algorithm role, function role, base learner, search, data preprocessor, kernel, distance functions. Relations: has participant (hp), has description (hd): name, version, url, ...; is concretization of (ico); has part; has quality; realizes. The label "p=?" appears to be icon text.]
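The agent-role idea, where a plugged-in component is recorded together with the role it plays, might be sketched as follows. This is a hedged Python illustration: `Role`, `Component`, `plug_in` and the example component names are all hypothetical and do not come from the ontology.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    # A few of the role labels shown on the slide.
    BASE_LEARNER = "base learner"
    KERNEL = "kernel"
    DISTANCE_FUNCTION = "distance function"


@dataclass
class Component:
    name: str
    parts: list = field(default_factory=list)   # (component, role) pairs

    def plug_in(self, component: "Component", role: Role) -> "Component":
        """Attach a component together with the role it plays here."""
        self.parts.append((component, role))
        return self

    def parts_in_role(self, role: Role) -> list:
        return [c.name for c, r in self.parts if r is role]


tree = Component("tree learner")
bagging = Component("bagging").plug_in(tree, Role.BASE_LEARNER)
knn = Component("nearest-neighbour learner").plug_in(
    Component("euclidean distance"), Role.DISTANCE_FUNCTION
)
```

The role is stored on the link rather than on the component, so the same tree learner can be used on its own or as a base learner inside another algorithm.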

Page 30: Problem 4: Workflows

Inputs, outputs, operators

Hierarchical: workflows within workflows

Reuse, parameterize common workflows, e.g. k-fold CV

Page 31: Exposé: workflows

[Diagram. Generic pattern labels: workflow, op1, op2, d1, d2. Example labels: data processing workflow, data processing application (three times), dataset (four times), learner application, performance estimation application, model evaluation function application, model, learner evaluation, model evaluation result, train, test, evaluation. Relations: has input, has output, has participant.]

Similar to RapidMiner?
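A reusable, parameterized workflow such as k-fold cross-validation can be sketched as an outer function that runs an inner train/test workflow once per fold. The Python below is only an illustration under simple assumptions: contiguous folds without shuffling, and a toy majority-class learner. All function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def k_fold_splits(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k (train, test) pairs using contiguous folds."""
    folds = [list(range(i * n // k, (i + 1) * n // k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train, test))
    return splits


def cross_validate(data: Sequence, k: int, learn: Callable, evaluate: Callable) -> float:
    """Outer workflow: run the inner train/test workflow once per fold and average the scores."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(data), k):
        model = learn([data[i] for i in train_idx])                   # learner application
        scores.append(evaluate(model, [data[i] for i in test_idx]))   # model evaluation
    return sum(scores) / k


def majority_learner(rows):
    """Toy learner: predict the most frequent label (ties go to the smallest label)."""
    labels = [y for _, y in rows]
    return max(sorted(set(labels)), key=labels.count)


def accuracy(model, rows):
    return sum(1 for _, y in rows if y == model) / len(rows)


data = [(x, x % 2) for x in range(10)]
mean_accuracy = cross_validate(data, 5, majority_learner, accuracy)
```

The learner and the evaluation function are passed in as parameters, so the same cross-validation workflow can be reused with any pair of operators.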

Page 32: Problem 5: Reuse

How can we make maximal use of existing ontologies?

OBI: top-level

OntoDM: top-level DM concepts

DMO: operators, learning mechanisms

Page 33: BFO: accepted top-level classes

[Diagram, layer labelled BFO. Node labels: thing, quality, realizable entity, material entity, planned process, digital entity, information content entity, role, dataset, operator, implementation, data quality, algorithm quality, machine, workflow, model. Relation: hp.]

Page 35: OntoDM: top-level DM concepts

[Diagram, layers labelled BFO and OntoDM. Node labels in addition to those on the BFO slide: objective, algorithm specif., algo impl, algo appl, dataset spec, model spec, function, function impl., function appl. Relations: hp = has participant, ico = is concretization of, has descr.]

Page 36: DMO: operators, learning mechanisms

[Diagram, layers labelled BFO, OntoDM and DMOP. Node labels: thing, quality, realizable entity, material entity, planned process, digital entity, information content entity, role, objective, dataset, dataset spec, model, model spec, operator, implementation, algorithm specif., algo impl, algo appl, function, function impl., function appl., data quality, algorithm quality, machine, KD workflow. Relations: hp = has participant, ico = is concretization of, has descr. The labels "p=?" and "+-" appear to be icon text.]

Page 37: Exposé: top level classes

[Diagram, layers labelled BFO, OntoDM and DMOP. Node labels: thing, quality, realizable entity, material entity, planned process, digital entity, information content entity, role, objective, dataset, dataset spec, model, model spec, operator, implementation, algorithm specif., algo impl, algo appl, parameter, param impl, param setting, function, function impl., function appl., data quality, algorithm quality, machine, KD workflow. Relations: hp = has participant, ico = is concretization of, has part, has descr., executed on. The labels "p=?" and "+-" appear to be icon text.]

Page 38: Other aspects


Page 40: Datasets

[Diagram. Node labels, in the order they appear:

• KD workflow, data processing workflow, data processing appl, dataset (relations: hp, has input, has output)

• attribute-value table, time series, item sequence, relational database, graph, sequence, set of instances, set of tuples, ...

• role, data role, data mining data role, bootstrap bag, test set, training set, optimization set (relation: realizes)

• data specification (relations: is concretization of, has quality)

• data feature, data instance, target feature, numeric target feature, class feature, labeling, labeled, unlabeled (relation: has part)

• quality, data property, dataset property, feature property, instance property, qualitative feature property, quantitative feature property, feature datatype, numeric datatype, nominal value set, feature entropy, feature kurtosis

• quantitative dataset property, statistical dataset property, information-theoretic dataset property, landmarker, simple dataset property, # features, # instances, # missing values, target skewness, frac1

• identifier, data item, name, url, version, data repository (relations: part of, has description)]
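Some of the dataset and feature properties named on this slide are simple to compute. The sketch below shows "# instances", "# features", "# missing values" and "feature entropy" for a small table. It assumes rows are tuples and that `None` marks a missing value; the function names are hypothetical, and whether the target column counts as a feature is left to the caller.

```python
import math
from collections import Counter
from typing import Sequence


def simple_dataset_properties(rows: Sequence[tuple]) -> dict:
    """Compute a few simple dataset properties for a table given as a list of tuples."""
    return {
        "# instances": len(rows),
        "# features": len(rows[0]) if rows else 0,
        "# missing values": sum(1 for row in rows for value in row if value is None),
    }


def feature_entropy(values: Sequence) -> float:
    """Shannon entropy (in bits) of the observed values of one nominal feature."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


rows = [("a", 1.0), ("b", None), ("a", 3.0), ("b", 4.0)]
props = simple_dataset_properties(rows)
entropy_of_first_feature = feature_entropy([row[0] for row in rows])
```

Properties like these are what a meta-level query over stored experiments could filter or group on.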


Page 42: Evaluation

[Diagram. Node labels: learner evaluation, evaluation function appl., evaluation function impl., evaluation function. Relations: hp, ico, has part, has participant, has description (name, version).

Evaluation measures shown: association evaluation measure, binary prediction evaluation function, confusion matrix, multi-class prediction evaluation measure, computational evaluation measure, build cpu time, build memory consumption, clustering evaluation measure, probabilistic distribution evaluation measure, predictive model evaluation measure, class prediction evaluation measure, graphical evaluation measure, numeric prediction evaluation measure, probabilistic model distance measure, AUROC, derived measure, f-measure, precision, recall, specificity, support, confidence, lift, leverage, conviction, frequency, density-based clustering measure, distance-based clustering measure, integrated squared error, inter-cluster similarity, intra-cluster variance, integrated average squared error, probability distribution scoring function, distribution likelihood, distribution log-likelihood, class RMSE, predictive accuracy, averaged binary prediction measure, kappa statistic, cost curve, lift chart, precision-recall curve, ROC_curve, correlation coefficient, probability error-based measure, error-based evaluation measure, RMSE, MAD, MAPE, RRSE, RSS, information criterion, AIC, BIC, Kullback-Leibler divergence, likelihood ratio, single point AUROC, AUPRC, PR graph point.]
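Several of the binary prediction measures listed here are derived from the four cells of a confusion matrix. The sketch below uses the standard textbook definitions. The function name and the example counts are made up for illustration, and it assumes that no denominator is zero.

```python
def binary_measures(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Derive common binary prediction measures from confusion matrix counts."""
    precision = tp / (tp + fp)        # fraction of predicted positives that are correct
    recall = tp / (tp + fn)           # fraction of actual positives that are found
    specificity = tn / (tn + fp)      # fraction of actual negatives that are found
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f-measure": f_measure,
        "predictive accuracy": accuracy,
    }


# Illustrative counts only, not results from any experiment.
measures = binary_measures(tp=8, fp=2, fn=2, tn=8)
```

Because all of these follow from the same four counts, storing the confusion matrix with an experiment is enough to recompute them later.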


Page 44: Experiment context

[Diagram. Node labels: composite experiment, singular experiment, experimental design, experimental variable, experiment conclusion, experimental goal, experimental hypothesis, experiment context, experiment admin info, author name, bibliographic reference, experiment id, experiment property, quality, experiment execution status, experiment error, experiment priority, task. Relations: has participant (hp), has part, has description.

Experimental designs shown: active learning, exploration design, factorial design, one factor at a time, orthogonal design, latin hypercube, monte carlo design, random sampling design, latin square design, full factorial design, fractional factorial design, Taguchi matrix, Plackett-Burman design.

Experimental variables shown: controlled experimental variable, uncontrolled experimental variable, dependent experimental variable.]
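Two of the designs named on this slide, full factorial and one factor at a time, differ in how many runs they generate from the same experimental variables. The Python sketch below illustrates that difference. The factor names and levels are invented for the example, and the functions are hypothetical helpers rather than part of Exposé.

```python
from itertools import product
from typing import Dict, List


def full_factorial(factors: Dict[str, list]) -> List[dict]:
    """Every combination of every level of every factor."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*(factors[n] for n in names))]


def one_factor_at_a_time(factors: Dict[str, list], baseline: dict) -> List[dict]:
    """The baseline run, plus runs that change exactly one factor away from the baseline."""
    runs = [dict(baseline)]
    for name, levels in factors.items():
        for level in levels:
            if level != baseline[name]:
                runs.append({**baseline, name: level})
    return runs


# Hypothetical experimental variables and levels.
factors = {"learner": ["tree", "knn"], "n_features": [5, 10, 20]}
ff_runs = full_factorial(factors)
ofat_runs = one_factor_at_a_time(factors, {"learner": "tree", "n_features": 5})
```

Recording which design produced a set of singular experiments makes it clear how much of the variable space a composite experiment actually covers.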

Page 45: Exposé: final notes

• In total 860 classes, 32 properties (from RO + DMOP)

• Individuals: all algorithms, preprocessors, evaluation from WEKA

• actually stored in experiment database

• should be programmatically added (and updated)

• Written in OWL-DL, using Protégé 4.0

• Can be browsed at:

• http://expdb.cs.kuleuven.be/expdb/expose.owl

• http://www.e-lico.eu/OWLBrowser2/manage/

Page 46: Use Cases

Page 47: Goal: Collaborative experimentation

Now: small-scale, not repeatable, not reusable

[Figure, built up over several slides. Labels: new algorithm, datasets, preprocessing workflows, evaluation procedures.]

• A lot of work, limits depth

• Results cannot be reused by others (have to be repeated)

• Hard to repeat experiments from descriptions in papers!

"The journal system is perhaps the most open system for the transmission of knowledge that could be built ... with 17th century media." Nielsen (APS Physics 2008)

Page 58: Data mining as an e-science

Ontologies: experiments shared, run automatically

• Share experiments

• Internet = large, collaborative workspace

• Store them in experiment databases

• Ensure reproducibility

• Reuse millions of prior experiments

• Use all info on algorithms, datasets

• Results universally accessible and useful

Page 61: e-Sciences

Astrophysics: Virtual Observatories

Page 62: e-Sciences

Bio-informatics: Micro-array Databases

Page 64: Collaborative Experimentation

Why?

• Reproducibility: Good science

• Quick, easy analysis: Querying: answer questions, test hypotheses

• Reuse: Save time & energy (e.g. benchmarking)

• Generalizability: Plug into prior results: larger studies

• Integration: Data mining tools import/export

Page 70: Exposé: An ontology for machine learning experimentationkt.ijs.si/janez_kranjc/dmo_jamboree/Expose.pdf · Exposé: An ontology for machine learning experimentation 1 ... train test

Collaborative Experimentation

Why?

• Reproducibility: good science

• Quick, easy analysis: querying to answer questions, test hypotheses

• Reuse: save time & energy (e.g. benchmarking)

• Generalizability: plug into prior results for larger studies

• Integration: data mining tools import/export

• Reference: ‘map’ of known approaches, compare to state-of-the-art, includes negative results

Use Case 1

Describe experiments in a common language

-> sharing experiments or running them on a grid

Use Exposé to define common language: ExpML

[Figure: a learner evaluation with an input (in) and an output (out). Its has-participant links point to a learner application (appl) of an implementation (impl) with parameter settings (p=?) and operator components, to a performance estimation, and to an evaluation function.]

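ExpML: a markup language for DM experiments

• Share DM experiments, XML-based

[Figure: dataset -> learner evaluation -> model evaluation. The learner evaluation involves a learner application (appl) of an implementation (impl) with parameter settings (p=?) and operators, a performance estimation application, and an evaluation function application.]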

    <expml>
      <dataset id='d1'>
      <learner_evaluation id='e1' input_data='d1'>
        <learner_appl>
          <learner_impl name=... version=...>
          <parameter_setting name='P' value='100'/>
          <learner_appl role='base-learner'>
          ...
        </learner_appl>
        <performance_estimation_appl>...
        <model_evaluation_function_appl>...
      </learner_evaluation>
      <model_evaluation_result output_of='e1'>
        <evaluation name='accuracy' value='0.99'>
        ...

Ontology relation            XML construct
has-part, has-participant    XML subelement (with role attribute)
has-description              (required) attribute
has-quality                  ‘property’ subelement
is-concretization-of         implementation_of attribute
part-of                      attributes
has-specific-input           input_data attribute
has-specified-output         output_of attribute
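
The mapping above can be sketched in code. The following is a minimal illustration, not the ExpML tooling itself: it uses Python's standard xml.etree.ElementTree to emit a fragment with the element and attribute names shown on the slide. The function name build_expml, the nesting of parameter_setting directly under learner_appl, and all values are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def build_expml(dataset_id, eval_id, impl_name, impl_version, params, results):
    """Serialize one learner evaluation as an ExpML-like XML string.

    Hypothetical helper: element names follow the slide, the exact
    nesting and the argument list are assumptions.
    """
    root = ET.Element("expml")
    ET.SubElement(root, "dataset", id=dataset_id)

    # has-specific-input -> input_data attribute
    evaluation = ET.SubElement(
        root, "learner_evaluation", id=eval_id, input_data=dataset_id
    )

    # has-participant -> XML subelement
    appl = ET.SubElement(evaluation, "learner_appl")
    ET.SubElement(appl, "learner_impl", name=impl_name, version=impl_version)
    for name, value in params.items():
        ET.SubElement(appl, "parameter_setting", name=name, value=str(value))

    # has-specified-output -> output_of attribute
    result = ET.SubElement(root, "model_evaluation_result", output_of=eval_id)
    for name, value in results.items():
        ET.SubElement(result, "evaluation", name=name, value=str(value))

    return ET.tostring(root, encoding="unicode")


# Made-up values, for illustration only.
doc = build_expml("d1", "e1", "SomeLearner", "1.0", {"P": 100}, {"accuracy": 0.99})
parsed = ET.fromstring(doc)
print(doc)
```

Because the output is ordinary XML, any XML parser can read it back, which is what makes a shared format of this kind usable for import/export between tools.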

Use Case 2

Collect experiments in a database to query all empirical results

ExpDB: a database to share experiments

[Figure: the learner evaluation diagram (in, out, has-participant links to the learner application, its implementation, parameter settings and operator components), extended with parameter implementations (p=?).]

Experiment Database

>650,000 experiments, 54 algorithms, >87 datasets, 45 evaluation measures, 2 data processors, bias-variance analysis

[Figure: learner evaluation (in, out) with appl, impl and spec levels, parameter settings and parameter implementations (p=?), component settings, and algorithm properties.]

[Figure: database schema of the Experiment Database. Tables and columns recoverable from the diagram:]

experiment: eid, laid, evaid, did, sid
learner: lid, name, function, description
learner_implementation: liid, lid, name, version, url, path, library
learner_application: laid, liid, is_default
learner_parameter: lpid, liid, name, alias, data_type, description, default, min, max, suggested_range
learner_parameter_setting: laid, lpid, value
learner_property: lpid, name, description, formula, min, max, unit
learner_property_value: lpid, liid, value
learner_component: caid, laid, role
kernel: kid, name, function, description
kernel_implementation: kiid, kid, name, version, url, path, library
kernel_application: caid, kiid, is_default
function: fid, name, function, description
function_implementation: fiid, fid, name, version, url, path, library
function_application: caid, fiid, is_default
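
To make the schema more concrete, here is a small sketch of how a few of these tables could be queried. It is an illustration under stated assumptions, not the actual ExpDB: table and column names are taken from the diagram, while the column types, the key relationships, the use of SQLite, and all inserted rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Subset of the schema; types and foreign keys are assumed, not from the slide.
conn.executescript(
    """
    CREATE TABLE learner (lid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE learner_implementation (
        liid INTEGER PRIMARY KEY,
        lid INTEGER REFERENCES learner(lid),
        name TEXT, version TEXT);
    CREATE TABLE learner_application (
        laid INTEGER PRIMARY KEY,
        liid INTEGER REFERENCES learner_implementation(liid),
        is_default INTEGER);
    CREATE TABLE learner_parameter (
        lpid INTEGER PRIMARY KEY,
        liid INTEGER REFERENCES learner_implementation(liid),
        name TEXT);
    CREATE TABLE learner_parameter_setting (
        laid INTEGER REFERENCES learner_application(laid),
        lpid INTEGER REFERENCES learner_parameter(lpid),
        value TEXT);
    """
)

# Made-up rows, for illustration only.
conn.execute("INSERT INTO learner VALUES (1, 'example learner')")
conn.execute("INSERT INTO learner_implementation VALUES (1, 1, 'ExampleImpl', '1.0')")
conn.execute("INSERT INTO learner_application VALUES (1, 1, 0)")
conn.execute("INSERT INTO learner_parameter VALUES (1, 1, 'P')")
conn.execute("INSERT INTO learner_parameter_setting VALUES (1, 1, '100')")

# Which parameter values was each implementation applied with?
rows = conn.execute(
    """
    SELECT li.name, lp.name, s.value
    FROM learner_parameter_setting AS s
    JOIN learner_application AS la ON la.laid = s.laid
    JOIN learner_implementation AS li ON li.liid = la.liid
    JOIN learner_parameter AS lp ON lp.lpid = s.lpid
    """
).fetchall()
print(rows)
```

Separating applications (laid) from implementations (liid) is what lets one implementation appear in many experiments with different parameter settings, each resolvable with a join like the one above.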

Use Case 3

Intuitive querying

Query Interface (YouTube “experiment database”)

http://expdb.cs.kuleuven.be

The way ahead

• A 3rd generation of tools could make data mining an e-science

• Experiments shared, reused, run worldwide

• Repeatable, generalizable, reusable

• Cooperation on a standardized ontology for data mining?

• Automatic ontology extraction: DM paper -> ontology extension

• RDF experiment databases?

• Open problems:

• Queryable models, auto-population (active meta-learning), quality control

http://expdb.cs.kuleuven.be

Thanks

Gracias

Xie Xie

Danke

Dank U

Merci

Efharisto

Dhanyavaad

Grazie

Spasiba

Obrigado

Tesekkurler

Diolch

Köszönöm

Arigato

Hvala

Toda