
ECML 2001

A Framework for Learning Rules from Multi-Instance Data

Yann Chevaleyre and Jean-Daniel Zucker, University of Paris VI – LIP6 – CNRS

Motivations

[Figure: a spectrum of representations. The attribute/value representation (atomic description) has low expressivity but is tractable; the relational representation (global description) has high expressivity but is intractable unless strong biases are used. The multi-instance (MI) representation lies in between.]

Most available MI learners use numerical data and generate hypotheses that are not easily interpretable.

Our goal: design efficient MI learners handling numeric and symbolic data, and generating interpretable hypotheses, such as decision trees or rule sets.

The choice of a good representation is a central issue in ML tasks.

Outline

• 1) Multiple-instance learning

– The multiple-instance representation, where MI data can be found, and the MI learning problem

• 2) Extending a propositional algorithm to handle MI data

– Method, extending the Ripper rule learner

• 3) Analysis of the multiple-instance extension of Ripper

– Misleading literals, irrelevant literals, the literal selection problem

• 4) Experiments & applications

• Conclusion and future work

The multiple-instance representation: definition

Standard A/V representation: example i is represented by a single A/V vector x_i with a {0,1}-valued label l_i.

Multiple-instance representation: example i is represented by a bag of instances, i.e. A/V vectors x_i,1, x_i,2, ..., x_i,r, with a single {0,1}-valued label l_i.
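To make the definition concrete, here is a minimal Python sketch of the representation; the `Bag` class and the attribute names are illustrative, not part of the original framework.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

Instance = Dict[str, Union[float, str]]  # one A/V vector x_i,j

@dataclass
class Bag:
    """A multi-instance example: several A/V vectors, one {0,1} label."""
    instances: List[Instance]
    label: int

# A positive bag with two instances (numeric and symbolic attributes):
b1 = Bag(instances=[{"att1": 1.2, "att2": "c"},
                    {"att1": -33.0, "att2": "a"}],
         label=1)
```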

Where can we find MI data?

Many complex objects, such as images or molecules, can easily be represented with bags of instances.

Atom  Type  Charge
c     1     1.18
h     3     -1.2
h     2     2.78
…     …     …

Relational databases may also be represented this way.

id  name   age  sex  year  balance
1   Laura  43   f    1997  234
1   Laura  43   f    1998  803
1   Laura  43   f    1999  1200
2   Joe    12   m    1999  932
…   …      …    …    …     …

More complex representations, such as datalog facts, may be MI-propositionalized [Zucker 98], [Alphonse and Rouveirol 99].

id  name   age  sex        id  year  balance
1   Laura  43   f          1   1997  234
2   Joe    12   m          1   1998  803
3   Marry  24   f          1   1999  1200
…   …      …    …          2   1999  932
                           …   …     …

(The two tables are linked by a 1-to-(0,n) relationship on id.)
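A sketch of how the two tables above could be turned into bags, assuming rows are joined on the customer id; the function and variable names are illustrative.

```python
from collections import defaultdict

customers = [{"id": 1, "name": "Laura", "age": 43, "sex": "f"},
             {"id": 2, "name": "Joe", "age": 12, "sex": "m"}]
accounts = [{"id": 1, "year": 1997, "balance": 234},
            {"id": 1, "year": 1998, "balance": 803},
            {"id": 1, "year": 1999, "balance": 1200},
            {"id": 2, "year": 1999, "balance": 932}]

def to_bags(customers, accounts):
    """Join the 1-to-(0,n) relation into one bag per customer: each
    joined row becomes an instance, so Laura (three account rows)
    yields a bag of three instances."""
    by_id = defaultdict(list)
    for acc in accounts:
        by_id[acc["id"]].append(acc)
    return {c["id"]: [{**c, **a} for a in by_id[c["id"]]] for c in customers}
```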


Representing time series as MI data

[Figure: a signal s(t) with two windows starting at t_k and t_j, giving the sub-sequences (s(t_k), ..., s(t_k+n)) and (s(t_j), ..., s(t_j+n)).]

By encoding each sub-sequence (s(t_k), ..., s(t_k+n)) as an instance, the representation becomes invariant by translation.

Windows of various sizes can be chosen to make the representation invariant by rescaling.
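A minimal sketch of this windowing idea, assuming fixed-length windows and, for rescaling, windows of several sizes subsampled to a common dimension; the sizes and names are illustrative.

```python
from typing import List

def series_to_bag(s: List[float], n: int) -> List[List[float]]:
    """Each sub-sequence (s[k], ..., s[k+n-1]) becomes one instance:
    the bag representation is invariant by translation of the pattern."""
    return [s[k:k + n] for k in range(len(s) - n + 1)]

def multiscale_bag(s: List[float], sizes=(8, 16, 32), dim: int = 8) -> List[List[float]]:
    """Windows of several sizes, each subsampled to `dim` points so that
    all instances share one attribute space: approximate rescaling
    invariance."""
    bag = []
    for n in sizes:
        for w in series_to_bag(s, n):
            idx = [round(i * (n - 1) / (dim - 1)) for i in range(dim)]
            bag.append([w[i] for i in idx])
    return bag
```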

The multiple-instance learning problem

From B+ and B-, the sets of positive (resp. negative) bags, find a consistent hypothesis H.

Unbiased multiple-instance learning problem: there exists a function f such that lab(b) = 1 iff ∃x ∈ b, f(x).

Single-tuple bias ⇒ multi-instance learning [Dietterich 97]: find a function h covering at least one instance per positive bag and no instance from any negative bag.

Note: the domain of h is the instance space, instead of the bag space.
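The single-tuple bias translates directly into code; a minimal sketch (type aliases and names are illustrative):

```python
from typing import Callable, Dict, List

Instance = Dict[str, float]
Bag = List[Instance]

def mi_consistent(h: Callable[[Instance], bool],
                  pos_bags: List[Bag], neg_bags: List[Bag]) -> bool:
    """h is defined on the instance space; it is MI-consistent iff it
    covers at least one instance of every positive bag and no instance
    of any negative bag."""
    return (all(any(h(x) for x in b) for b in pos_bags)
            and not any(any(h(x) for x in b) for b in neg_bags))
```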

Extending a propositional learner

We need to represent the bags of instances as a single set of vectors.

Bag b1 (+):          Bag b2 (−):
att1  att2           att1  att2
1.2   c              7.9   a
-33   a

Adding bag-id and label to each instance:

att1  att2  bag-id  lab
1.2   c     1       +
-33   a     1       +
7.9   a     2       -

Measure the degree of multiple-instance consistency of the hypothesis being refined.

Instead of measuring p(r) and n(r), the number of vectors covered by r, compute p*(r) and n*(r), the number of bags for which r covers at least one instance: the single-tuple coverage measure.
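A sketch of the single-tuple coverage measure on the flattened table above; the column names follow the table ('bag-id' written bag_id here), and the example rule is illustrative.

```python
def bag_coverage(rule, flat_data):
    """p*(rule), n*(rule): the number of positive / negative bags for
    which the rule covers at least one instance (not vector counts)."""
    pos, neg = set(), set()
    for row in flat_data:
        if rule(row):
            (pos if row["lab"] == "+" else neg).add(row["bag_id"])
    return len(pos), len(neg)

flat = [{"att1": 1.2, "att2": "c", "bag_id": 1, "lab": "+"},
        {"att1": -33, "att2": "a", "bag_id": 1, "lab": "+"},
        {"att1": 7.9, "att2": "a", "bag_id": 2, "lab": "-"}]

p_star, n_star = bag_coverage(lambda r: r["att2"] == "a", flat)  # (1, 1)
```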

Extending the Ripper algorithm (Cohen 95)

Ripper (Cohen 95) is a fast and efficient top-down rule learner, which compares to C4.5 in terms of accuracy while being much faster. Naive-RipperMI is the MI extension of Ripper.

Naive-RipperMI was tested on the musk tasks (Dietterich 97). On musk1 (avg. 5.2 instances per bag), it achieved good accuracy; on musk2 (avg. 65 instances per bag), only 77% accuracy.

Learner            Accuracy  Induced hypothesis
Iterated Discrim   92.4      APR
Diverse Density    88.9      point in instance space
Ripper-MI          88        rule set (avg. 7 literals)
Tilde              87        1st-order decision tree
All positive APR   80.4      APR
Multi-Inst         76.7      APR

Empirical analysis of Naive-RipperMI

Goal: analyse pathologies linked to the MI problem and to the Naive-RipperMI algorithm.

Analysing the behaviour of Naive-RipperMI on a simple dataset.

[Figure: a 2-D instance space (X from 2 to 12, Y from 2 to 8). 5 positive bags: the white-triangles bag, the white-squares bag, ... 5 negative bags: the black-triangles bag, the black-squares bag, ...]

Three pathologies are examined: misleading literals, irrelevant literals, and the literal selection problem.

Analysing Naive-RipperMI

Learning task: induce a rule covering at least one instance of each positive bag.

Target concept: X > 5 & X < 9 & Y > 3

[Figure: the 2-D instance space with the target concept region.]

Analysing Naive-RipperMI: misleading literals

Target concept: X > 5 & X < 9 & Y > 3

1st step: Naive-RipperMI induces the rule X > 11 & Y < 5.

[Figure: the induced rule lies outside the target concept; X > 11 and Y < 5 are misleading literals.]

2nd step: Naive-RipperMI removes the covered bag(s), and induces another rule...

[Figure: the instance space after removing the bags covered by the first rule.]

Analysing Naive-RipperMI: misleading literals

Misleading literals: literals bringing information gain but contradicting the target concept.

This is a multiple-instance-specific phenomenon.

Unlike other, single-instance pathologies (overfitting, the attribute selection problem), increasing the number of examples won't help.

The "cover-and-differentiate" algorithm reduces the chance of finding the target concept.

If l is a misleading literal, then ¬l is not.

It is thus sufficient, when the literal l has been induced, to examine ¬l at the same time ⇒ partitioning the instance space.

Analysing Naive-RipperMI: misleading literals (continued)

[Figure: the instance space (X from 2 to 12, Y from 2 to 6) partitioned by the induced literals and their negations.]

Build a partition of the instance space.

Extract the best possible rule: X < 11 & Y < 6 & X > 5 & Y > 3.

Analysing Naive-RipperMI: irrelevant literals

In multiple-instance learning, irrelevant literals can occur anywhere in the rule, instead of mainly at the end of the rule as in the single-instance case.

Use global pruning.

[Figure: the rule Y < 6 & Y > 3 & X > 5 & X < 9 drawn in the instance space; the irrelevant literal Y < 6 occurs at the beginning of the rule.]
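A sketch of global pruning: instead of pruning only a suffix of the rule, try deleting any literal and keep deletions that do not hurt a bag-level score (the `score` callback is illustrative).

```python
def global_prune(literals, score):
    """Greedily delete any literal whose removal does not decrease the
    bag-level score; irrelevant literals may sit anywhere in the rule."""
    pruned = list(literals)
    improved = True
    while improved:
        improved = False
        for lit in list(pruned):
            trial = [l for l in pruned if l is not lit]
            if score(trial) >= score(pruned):
                pruned, improved = trial, True
                break
    return pruned
```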

Analysing Naive-RipperMI: the literal selection problem

[Figure: the instance space with many instances per bag.]

When the number of instances per bag increases, any literal covers any bag. Thus, we lack information to select good literals.


Analysing Naive-RipperMI: the literal selection problem (continued)

We must take into account the number of covered instances.

Making an assumption on the distribution of instances can lead to a formal coverage measure.

The single distribution model: a bag is made of r instances drawn i.i.d. from a unique distribution D.
+ widely studied in MI learning [Blum 98, Auer 97, ...]
+ simple coverage measure, and good learnability properties
- very unrealistic

The two distribution model: a positive (resp. negative) bag is made of r instances drawn i.i.d. from D+ (resp. D-), with at least one (resp. none) covered by f.
+ more realistic
- complex formal measure, useful for a small number of instances (log # bags)

Design algorithms or measures which "work well" under these models.

Analysing Naive-RipperMI: the literal selection problem (continued)

Compute, for each positive bag, Pr(at least one of the k covered instances ∈ target concept).

[Figure: the instance space (X from 2 to 12, Y from 2 to 6) with target concept Y > 5; a bag whose instances are partly covered by the rule.]

For a bag of r instances of which k are covered, this probability is 1 − (1 − P_D(f))^k, where P_D(f) is the probability that an instance drawn from D satisfies f.
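A sketch of a coverage measure built on this probability. Both the formula as reconstructed above and the estimation of P_D(f) (here a caller-supplied estimate p_hat, e.g. the overall fraction of instances satisfying the current rule) are assumptions for illustration.

```python
def covered_weight(k, p_hat):
    """Pr(at least one of the k covered instances is in the target
    concept), with instances i.i.d. and each in the target w.p. p_hat."""
    return 1.0 - (1.0 - p_hat) ** k

def refined_p_star(rule, pos_bags, p_hat):
    """Weighted version of p*: each positive bag counts proportionally
    to how likely its covered instances are truly positive."""
    return sum(covered_weight(sum(1 for x in b if rule(x)), p_hat)
               for b in pos_bags)
```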

Analysis of RipperMI: experiments

[Figure: error rate (%) as a function of the number of instances per bag, on the artificial datasets.]

Artificial datasets of 100 bags with a variable number of instances per bag.

Target concept: monomials (hard to learn with 2 instances per bag [Haussler 89]).

On the mutagenesis problem: NaiveRipperMI 78%, RipperMI-refined-cov 82%.

Application: anchoring symbols [with Bredeche]

[Figure: a robot asks "What is all this?", perceives a scene, and after segmentation represents it as a bag of region descriptors (lab = door); learning yields "I see a door" via a rule such as IF Color = blue AND size > 53 THEN DOOR.]

Main color  Size  X/Y ratio  Ypos
Red         12    1.5        152
Green       56    0.34       11
Blue        176   0.2        11

Early experiments with NaiveRipperMI reached 80% accuracy.

Conclusion & future work

Many problems which existed in relational learning appear clearly within the multiple-instance framework.

The algorithms presented here are aimed at solving these problems; they were tested on artificial datasets.

Other realistic models, leading to better heuristics.

Instance selection and attribute selection.

Future work: MI-propositionalization, applying multiple-instance learning to data-mining tasks.

Many ongoing applications...