
Page 1:

Bump Hunting

The objective
PRIM algorithm
Beam search

References:
Feelders, A.J. (2002). Rule induction by bump hunting. In J. Meij (Ed.), Dealing with the data flood (STT, 65) (pp. 697-700). Den Haag, the Netherlands: STT/Beweton.
Friedman, J.H. and Fisher, N.I. (1999). Bump hunting in high-dimensional data. Statistics and Computing, 9:123-143.

Brief Intro to Undirected Graphical models

Overview
Regression-based models

Page 2:

Bump Hunting - The objective

Find regions in the feature space where the outcome variable has a high average value.

In classification, this means a region of the feature space where the majority of the samples belong to one class.

The decision rule looks like an intersection of several conditions, each on one predictor variable:

If condition 1 & condition 2 & ... & condition N, then predict value ...

Ex: if 0 < x1 < 1 & 2 < x2 < 5 & ... & -1 < xn < 0, then class 1.
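
As a concrete illustration of such a rule, here is a minimal Python sketch; the bounds simply mirror the example above, and the helper name predict is invented for this illustration:

```python
import numpy as np

# A box rule is an intersection of per-variable conditions.
# Bounds are illustrative only: 0 < x1 < 1, 2 < x2 < 5, -1 < x3 < 0.
lower = np.array([0.0, 2.0, -1.0])
upper = np.array([1.0, 5.0, 0.0])

def predict(x, lower=lower, upper=upper):
    """Return class 1 if x falls inside the box, else class 0."""
    inside = np.all((x > lower) & (x < upper))
    return 1 if inside else 0

print(predict(np.array([0.5, 3.0, -0.2])))  # inside the box -> 1
print(predict(np.array([0.5, 6.0, -0.2])))  # x2 violates 2 < x2 < 5 -> 0
```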

Page 3:

Bump Hunting - The objective

When the dimension is high and there are many such boxes, the problem is not easy.

Page 4:

Bump Hunting - The objective

Let's formalize the problem:
Predictors $x = (x_1, \dots, x_p)$.
Target variable $y$, either continuous or binary.

Feature space: $S = S_1 \times S_2 \times \cdots \times S_p$, where $S_j$ is the set of possible values of $x_j$.

Find a subspace $R \subseteq S$ such that the mean of $y$ over $\{x \in R\}$ is much larger than the overall mean of $y$.

Note: when $y$ is binary, this is not literally a mean of $y$; rather, it is $\Pr(y = 1 \mid x \in R)$.

Define any box: $B = \{x : x_j \in s_j,\ j = 1, \dots, p\}$ with $s_j \subseteq S_j$.

Page 5:

Box in a continuous feature space: each $s_j$ is an interval, so $B = \{x : t_{j-} \le x_j \le t_{j+},\ j = 1, \dots, p\}$.

Bump Hunting - The objective

Page 6:

Box in a categorical feature space: each $s_j$ is a subset of the possible levels of $x_j$.

Bump Hunting - The objective

Page 7:

Sequentially find boxes in subsets of the data: once a box is found, remove the data it covers and search again on the remainder.

Bump Hunting - PRIM

Support of a box: $\beta_B = \frac{1}{N} \sum_{i=1}^{N} I(x_i \in B)$

Continue searching for boxes until a new box would not have enough support.
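
To make the support formula concrete, here is a small numpy sketch on synthetic data (sample size, dimensions and box bounds are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(1000, 3))   # N = 1000 samples, p = 3 features

# Box B: -0.5 < x_j < 0.5 for every j (illustrative bounds).
lower, upper = -0.5, 0.5
in_box = np.all((X > lower) & (X < upper), axis=1)   # indicator I(x_i in B)

support = in_box.mean()    # beta_B = (1/N) * sum_i I(x_i in B)
print(f"support of B: {support:.3f}")   # about 0.5**3 = 0.125 in expectation
```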

Page 8:

Bump Hunting - PRIM

“Patient Rule Induction Method”

Two steps:

(1) Patient successive top-down refinement (peeling)
(2) Bottom-up recursive expansion (pasting)

These are greedy algorithms.

Page 9:

Bump Hunting - PRIM

Peeling:
Begin with a box B containing all the data (or all remaining data in later rounds).
Remove the sub-box $b^*$ that maximizes the mean of $y$ in what remains, i.e. $b^* = \arg\max_b \ \bar{y}_{B - b}$.
Each candidate sub-box $b$ is defined on a single variable (peeling in only one dimension at a time), and only a small fraction of the data is peeled at each step.

Page 10:

Bump Hunting - PRIM

This is a greedy hill-climbing algorithm.
Stop the iteration when the support drops to a pre-determined threshold.
Why is it called "patient"? Because only a small fraction of the data is removed at each step.
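
A minimal Python sketch of this peeling loop, assuming continuous predictors; the parameter names (alpha, beta_min) and the quantile-based peel are illustrative choices, not taken from the slides:

```python
import numpy as np

def prim_peel(X, y, alpha=0.05, beta_min=0.1):
    """Patient top-down peeling: return axis-aligned box bounds (lower, upper)."""
    n, p = X.shape
    lower = np.full(p, -np.inf)
    upper = np.full(p, np.inf)
    in_box = np.ones(n, dtype=bool)

    while in_box.mean() > beta_min:
        best = None                      # (mean of y after the peel, dim, side, cut)
        for j in range(p):
            xj = X[in_box, j]
            for side, cut in (("lower", np.quantile(xj, alpha)),
                              ("upper", np.quantile(xj, 1 - alpha))):
                keep = in_box & (X[:, j] >= cut if side == "lower" else X[:, j] <= cut)
                if keep.sum() in (0, in_box.sum()):
                    continue             # skip peels that remove nothing or everything
                m = y[keep].mean()
                if best is None or m > best[0]:
                    best = (m, j, side, cut)
        if best is None:                 # no admissible peel left
            break
        _, j, side, cut = best
        if side == "lower":
            lower[j] = max(lower[j], cut)
            in_box &= X[:, j] >= cut
        else:
            upper[j] = min(upper[j], cut)
            in_box &= X[:, j] <= cut
    return lower, upper
```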

Page 11:

Bump Hunting - PRIM

Pasting:

In peeling, box boundaries are determined without knowledge of later peels, so some non-optimal steps can be taken. The final box can therefore be improved by boundary adjustments: boundaries are relaxed as long as doing so increases the box mean.
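
A correspondingly hedged sketch of bottom-up pasting: try relaxing each boundary of the final box by a small amount and keep the expansion only if the box mean increases. This is a simplification of the description above, with an invented expansion-step rule:

```python
import numpy as np

def prim_paste(X, y, lower, upper, alpha=0.05):
    """Bottom-up pasting: relax box boundaries while the box mean keeps improving."""
    def box_mean(lo, up):
        mask = np.all((X >= lo) & (X <= up), axis=1)
        return y[mask].mean() if mask.any() else -np.inf

    current = box_mean(lower, upper)
    improved = True
    while improved:
        improved = False
        for j in range(X.shape[1]):
            step = alpha * (X[:, j].max() - X[:, j].min())   # small expansion per paste
            for side in ("lower", "upper"):
                lo, up = lower.copy(), upper.copy()
                if side == "lower":
                    lo[j] -= step
                else:
                    up[j] += step
                m = box_mean(lo, up)
                if m > current:                              # keep only improving pastes
                    lower, upper, current = lo, up, m
                    improved = True
    return lower, upper
```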

Pages 12-26: Bump Hunting - PRIM example

(A sequence of figures walks through successive peeling steps on a small example; candidate peels are compared by the resulting box mean, with values such as 2/7 and 3/7 appearing among the candidates, the winning peel is kept, and the next peel is shown; the slides note β = 0.4 at that point.)

Page 27:

Bump Hunting - Beam search algorithm

At each step, the w best sub-boxes (each defined on a single variable) are selected and kept.
A minimum support requirement is imposed on each candidate box.
It is more greedy than PRIM: at each step much more data can be peeled, because the split is fully optimized on one of the variables.
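
A rough Python sketch of this beam-search idea: at each level every surviving box is refined on every variable, and only the w best refinements with enough support are kept. The parameter names, the quantile-based cut points, and the fixed depth are illustrative assumptions:

```python
import numpy as np

def beam_search_boxes(X, y, w=2, beta_min=0.1, n_cuts=4, depth=3):
    """Keep the w best single-variable refinements at each level of the search."""
    n, p = X.shape
    beam = [np.ones(n, dtype=bool)]                 # start with the whole data set

    for _ in range(depth):
        candidates = []
        for box in beam:
            for j in range(p):
                cuts = np.quantile(X[box, j], np.linspace(0, 1, n_cuts + 1)[1:-1])
                for c in cuts:
                    for keep in (box & (X[:, j] <= c), box & (X[:, j] > c)):
                        if keep.mean() >= beta_min:  # minimum support requirement
                            candidates.append((y[keep].mean(), keep))
        if not candidates:
            break
        candidates.sort(key=lambda t: t[0], reverse=True)
        beam = [keep for _, keep in candidates[:w]]  # the w best sub-boxes survive
    return beam
```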

Pages 28-30: Bump Hunting - Beam search algorithm

(Figures illustrating the beam search; the example uses beam width w = 2.)

Page 31:

Bump Hunting - About PRIM

It is a greedy search. However, it is "patient", and this is important.

Methods that partition the data much faster, e.g. beam search and CART, can be less successful.

The "patient" approach makes it easier to recover from earlier "unfortunate" steps, since we do not run out of data too fast.

It also does not discard predictors merely because they are highly correlated with one another.

Page 32:

Undirected Graph Models - Introduction

A network/graph is a set of vertices connected by edges.
Undirected edges: an "undirected network"; directed edges: a "directed network".

Vertex-level characteristic: the number of connections at a vertex is its "degree" k.
Incoming edges: "in-degree" k_i
Outgoing edges: "out-degree" k_o
k = k_i + k_o

Evolution of Networks, S.N. Dorogovtsev and J.F.F. Mendes.
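
For illustration, a small networkx sketch (the toy graphs are invented here) that reads these quantities off directly:

```python
import networkx as nx

# Undirected network: degree k counts all incident edges.
G = nx.Graph([("a", "b"), ("a", "c"), ("b", "c")])
print(dict(G.degree()))        # {'a': 2, 'b': 2, 'c': 2}

# Directed network: k = k_i + k_o (in-degree plus out-degree).
D = nx.DiGraph([("a", "b"), ("c", "b"), ("b", "d")])
print(dict(D.in_degree()))     # {'a': 0, 'b': 2, 'c': 0, 'd': 1}
print(dict(D.out_degree()))    # {'a': 1, 'b': 1, 'c': 1, 'd': 0}
```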

Page 33:

Undirected Graph Models - Introduction

Graphical models – a visual expression of the joint distribution of the entire set of random variables.

Undirected graphical model – also known as “Markov random fields” or “Markov networks”.

Lack of connection in such a network – conditional independence given all other variables.

Sparse graphs – small number of edges – easy to interpret.

Edges – encode the strength of conditional dependency.

Page 34:

Undirected Graph Models - Introduction

Page 35:

Undirected Graph Models - Introduction

Y "separates" X and Z.

Pairwise Markov independence: if there is no edge between two vertices, they are conditionally independent given all the other variables.

Global Markov independence: for subgraphs A, B and C, if every path between A and B intersects a node in C, then C separates A and B, and A and B are conditionally independent given C.

Page 36:

Undirected Graph Models - Introduction

The pairwise Markov property can be derived from the global Markov property.

Clique: a complete subgraph (all pairs of its vertices are connected).
Maximal clique: a clique to which no other vertex can be added while it remains a clique.

Ex: {X, Y}, {Y, Z}, {Z, W} in the graph above.
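
These properties can be checked mechanically. A small networkx sketch, assuming the example graph is the chain X - Y - Z - W (consistent with the maximal cliques listed above):

```python
import networkx as nx

# Chain graph X - Y - Z - W (assumed from the maximal cliques {X,Y}, {Y,Z}, {Z,W}).
G = nx.Graph([("X", "Y"), ("Y", "Z"), ("Z", "W")])

# Maximal cliques: in this triangle-free graph, every edge is a maximal clique.
print([sorted(c) for c in nx.find_cliques(G)])   # e.g. [['X','Y'], ['Y','Z'], ['W','Z']] (order may vary)

# Global Markov check: does C = {Y} separate A = {X} from B = {Z, W}?
H = G.copy()
H.remove_nodes_from({"Y"})
print(nx.has_path(H, "X", "Z"))   # False -> Y separates X from {Z, W}
```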

Page 37:

Undirected Graph Models - Introduction

A probability density function f over a Markov graph G can be represented as a product of positive potential functions over the maximal cliques,

(1) $f(x) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(x_C)$,

or, more restrictively, over the edges (pairs of connected vertices),

(2) $f(x) \propto \prod_{(j,k) \in E} \psi_{jk}(x_j, x_k)$.

Either representation can encode the dependence structure; pairwise Markov graphs concern representation (2).

Page 38:

Undirected Graph Models – Gaussian Graphical Model

- Observations have a multivariate Gaussian distribution with mean μ and covariance matrix Σ.
- Because the Gaussian distribution represents at most second-order relationships, it automatically encodes a pairwise Markov graph.
- All conditional distributions are also Gaussian.
- If the ij-th component of $\Theta = \Sigma^{-1}$ is zero, then variables i and j are conditionally independent given the other variables.
- Let Y be one variable and $Z = (X_1, \dots, X_{p-1})$ the rest, and partition $\Sigma$ into blocks $\Sigma_{ZZ}$, $\sigma_{ZY}$, $\sigma_{YY}$. Then the conditional distribution of Y given Z = z is Gaussian with mean $\mu_Y + (z - \mu_Z)^{T}\Sigma_{ZZ}^{-1}\sigma_{ZY}$ and variance $\sigma_{YY} - \sigma_{ZY}^{T}\Sigma_{ZZ}^{-1}\sigma_{ZY}$.
- The conditional mean is the same as the population multiple linear regression of Y on Z.

Page 39:

Partition $\Theta = \Sigma^{-1}$ in the same way. Because $\Sigma\Theta = I$, the (Z, Y) block gives $\Sigma_{ZZ}\,\theta_{ZY} + \sigma_{ZY}\,\theta_{YY} = 0$.

Thus the regression coefficient of Y on Z is $\beta = \Sigma_{ZZ}^{-1}\sigma_{ZY} = -\theta_{ZY}/\theta_{YY}$.

- Zero elements in β, and hence in $\theta_{ZY}$, mean that the corresponding elements of Z are conditionally independent of Y given the rest.
- We can therefore learn the dependence structure through multiple linear regression.
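
A quick numpy check of this identity on a made-up positive-definite Σ, with the last variable playing the role of Y and the others the role of Z:

```python
import numpy as np

# An arbitrary covariance matrix (illustrative); the last variable is Y.
Sigma = np.array([[2.0, 0.6, 0.0],
                  [0.6, 1.5, 0.4],
                  [0.0, 0.4, 1.0]])
Theta = np.linalg.inv(Sigma)

Sigma_ZZ = Sigma[:2, :2]          # covariance of Z = (X1, X2)
sigma_ZY = Sigma[:2, 2]           # cross-covariances of Z with Y
theta_ZY = Theta[:2, 2]
theta_YY = Theta[2, 2]

beta_reg  = np.linalg.solve(Sigma_ZZ, sigma_ZY)   # population regression of Y on Z
beta_prec = -theta_ZY / theta_YY                  # -theta_ZY / theta_YY
print(np.allclose(beta_reg, beta_prec))           # True
```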

Undirected Graph Models – Gaussian Graphical Model

Page 40:

Finding the parameters when the network structure is known:

- Take the empirical mean $\bar{x}$ and the empirical covariance matrix $S$.
- The log-likelihood of the data, up to constants, is $\ell(\Theta) = \log\det\Theta - \mathrm{trace}(S\Theta)$.
- The quantity $-\ell(\Theta)$ is a convex function of $\Theta$.
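
A tiny numpy sketch of this log-likelihood (constants dropped), just to make the formula concrete; the matrix S below is an invented empirical covariance:

```python
import numpy as np

def gauss_loglik(Theta, S):
    """l(Theta) = log det(Theta) - trace(S @ Theta), up to constants."""
    sign, logdet = np.linalg.slogdet(Theta)
    assert sign > 0, "Theta must be positive definite"
    return logdet - np.trace(S @ Theta)

S = np.array([[1.0, 0.3], [0.3, 2.0]])       # empirical covariance (illustrative)
print(gauss_loglik(np.linalg.inv(S), S))     # the unpenalized maximum is at Theta = S^-1
```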

Undirected Graph Models – Gaussian Graphical Model

Page 41:

Undirected Graph Models – Gaussian Graphical Model

Estimating graph structure:

Meinshausen and Bühlmann's regression approach (2006):

- Fit a lasso regression using each variable as the response and all the others as predictors.
- $\theta_{ij}$ is estimated to be nonzero if the estimated coefficient of variable i on j is nonzero, OR (alternatively, AND) the estimated coefficient of variable j on i is nonzero.
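
A hedged sketch of this neighborhood-selection idea using scikit-learn's Lasso; the data are simulated here, the regularization strength alpha is arbitrary, and the OR rule is used for symmetrization:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 500, 5
X = rng.standard_normal((n, p))
X[:, 1] += 0.8 * X[:, 0]          # make variables 0 and 1 dependent
X[:, 3] += 0.8 * X[:, 2]          # ... and variables 2 and 3

edges = np.zeros((p, p), dtype=bool)
for j in range(p):
    others = [k for k in range(p) if k != j]
    coef = Lasso(alpha=0.1).fit(X[:, others], X[:, j]).coef_
    edges[j, others] = coef != 0   # estimated neighborhood of variable j

edges = edges | edges.T            # "OR" rule (use & instead for the "AND" rule)
print(edges.astype(int))           # nonzero off-diagonals = estimated graph edges
```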

Page 42:

Undirected Graph Models – Gaussian Graphical Model

More formally, the graphical lasso maximizes the penalized log-likelihood $\log\det\Theta - \mathrm{trace}(S\Theta) - \lambda\lVert\Theta\rVert_1$, where $\lVert\Theta\rVert_1$ is the sum of the absolute values of the entries of $\Theta$.
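
scikit-learn's GraphicalLasso estimator fits this kind of l1-penalized precision matrix; a brief sketch on simulated data, with an arbitrary penalty value:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
n, p = 500, 5
X = rng.standard_normal((n, p))
X[:, 1] += 0.8 * X[:, 0]                       # induce one dependence

model = GraphicalLasso(alpha=0.2).fit(X)       # alpha = l1 penalty on Theta
Theta_hat = model.precision_                   # estimated Theta = Sigma^-1
print(np.round(Theta_hat, 2))                  # zero off-diagonals = missing edges
```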