introduction to machine learning fall 2013 decision trees koby crammer department of ee technion...

Post on 29-Mar-2015


Introduction to Machine Learning, Fall 2013

Decision Trees

Koby Crammer, Department of EE

Technion

Most figures courtesy of Ben Taskar z”l

Course outline

Diagram labels: Supervised, Unsupervised, Parameter Estimation, Decision Tree, Regression, Bayesian Reasoning, Classification, Boosting, Nearest Neighbor, Theory, Regularization, Linear, Mainly Generative Models, Mainly Discriminative Models

Material

Section 9.5.2 Section 9.2

Outline

• Example and inference (8.1)
• Tree learning (8.2)
• Impurity (8.3)
• Issues (8.4)
• Regression (8.5)

Example and inference (8.1)

example

Example Regression (HTF, 2001)

Building decision trees (8.2)

• Input to algorithm: a labeled sample $\{(x_k, y_k)\}_{k=1}^{n}$, $x_k \in \mathbb{R}^d$
• Output: tree

• Q: can we fit a tree to any sample?

• Goals:
– accuracy
– size (simplicity, generalization)

Approach

• Top-down
– Start from the root

• Greedy / myopic search
– One node at a time

• Main question:
– Given a tree, how to grow it
– In other words, choose a feature and a criterion

example

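Intuition

Class counts at a node before and after splitting on each feature:

Feature a (branches A1, A2): {8,12} → {8,0}, {0,12}

Feature b (branches B1, B2): {8,12} → {0,0}, {8,12}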

Intuition II

Feature c (branches C1, C2): {8,12} → {4,6}, {4,6}

Feature d (branches D1, D2): {8,12} → {2,3}, {6,9}

Feature e (branches E1, E2, E3): {8,12} → {2,3}, {3,5}, {3,4}

mpg   cylinders  displacement  horsepower  weight  acceleration  modelyear  maker
Bad   8          350           150         4699    14.5          74         America
Bad   8          400           170         4746    12            71         America
Bad   8          400           175         4385    12            72         America
Bad   6          250           72          3158    19.5          75         America
Bad   8          304           150         3892    12.5          72         America
Bad   8          350           145         4440    14            75         America
Bad   6          250           105         3897    18.5          75         America
Bad   6          163           133         3410    15.8          78         Asia
Bad   8          260           110         4060    19            77         America
Bad   8          305           130         3840    15.4          79         America
Bad   6          250           110         3520    16.4          77         America
Bad   6          258           95          3193    17.8          76         America
Bad   4          121           112         2933    14.5          72         Asia
Bad   6          225           105         3613    16.5          74         America
Bad   4          121           112         2868    15.5          73         Asia
Bad   6          225           95          3264    16            75         America
Bad   6          200           85          2990    18.2          79         America
OK    4          121           98          2945    14.5          75         Asia
OK    6          232           90          3085    17.6          76         America
OK    4          120           97          2506    14.5          72         Europe
OK    4          151           85          2855    17.6          78         America
OK    4          116           75          2158    15.5          73         Asia
OK    4          119           97          2545    17            75         Europe
OK    6          146           120         2930    13.8          81         Europe
OK    4          116           81          2220    16.9          76         Asia
OK    4          156           92          2620    14.4          81         America
OK    4          140           88          2870    18.1          80         America
OK    4          97            60          1834    19            71         Asia
OK    4          134           95          2560    14.2          78         Europe
OK    4          97            75          2171    16            75         Europe
OK    4          97            78          1940    14.5          77         Asia
OK    4          98            83          2219    16.5          74         Asia
Good  4          79            70          2074    19.5          71         Asia
Good  4          91            68          1970    17.6          82         Europe
Good  4          89            71          1925    14            79         Asia
Good  4          83            61          2003    19            74         Europe
Good  4          112           88          2395    18            82         America
Good  4          81            60          1760    16.1          81         Europe
Good  4          135           84          2370    13            82         America
Good  4          105           63          2125    14.7          82         America
Bad   4          135           84          2370    13            82         America
Bad   4          105           63          2125    14.7          82         America

Stage 1


Stage 2


Impurity (8.3)

• Given a set (training set or a subset of it):
$S = \{(x_k, y_k)\}_{k=1}^{N}$, $y_k \in \{c_1, \dots, c_K\}$

• Denote the empirical distribution of labels:
$\hat{p}_j = \frac{1}{N} \sum_{k=1}^{N} I\{y_k = c_j\}$, $\hat{p} = (\hat{p}_1, \dots, \hat{p}_K)$

• Goal: measure the impurity of the distribution
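The empirical label distribution can be computed by simple counting. The sketch below is only an illustration: the function name `empirical_distribution` and the toy label list are assumptions, not part of the slides.

```python
from collections import Counter

def empirical_distribution(labels, classes):
    """Return [p_1, ..., p_K]: the fraction of labels equal to each class c_j."""
    n = len(labels)
    counts = Counter(labels)
    return [counts[c] / n for c in classes]

# Toy label set (assumed for illustration): 8 labels of class "a", 12 of class "b".
labels = ["a"] * 8 + ["b"] * 12
p_hat = empirical_distribution(labels, classes=["a", "b"])
print(p_hat)
```

The entries of `p_hat` are non-negative and sum to one, so it can be passed directly to any impurity function Q.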

Impurity functions

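• Bayes-optimal error: $Q(\hat{p}) = 1 - \max_{j \in \{1, \dots, K\}} \hat{p}_j$
• Gini index: $Q(\hat{p}) = \sum_{j} \hat{p}_j (1 - \hat{p}_j)$
• Entropy: $Q(\hat{p}) = H(\hat{p}) = \sum_{j} \hat{p}_j \log_2 \frac{1}{\hat{p}_j} = -\sum_{j} \hat{p}_j \log_2(\hat{p}_j)$

• Properties:
– For point-distribution: $Q(\hat{p}) = 0$
– For uniform distribution: $Q(\hat{p})$ is maximal

• Notation: $Q(S) = Q(\hat{p}_S)$, where $\hat{p}_S$ is the empirical label distribution of $S$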
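The three impurity functions are short enough to write out directly. The following is a minimal sketch in which the function names are illustrative assumptions; it also checks the two stated properties on a two-class example.

```python
import math

def misclassification(p):
    """Bayes-optimal (misclassification) error: 1 - max_j p_j."""
    return 1.0 - max(p)

def gini(p):
    """Gini index: sum_j p_j (1 - p_j)."""
    return sum(pj * (1.0 - pj) for pj in p)

def entropy(p):
    """Entropy in bits: sum_j p_j log2(1 / p_j), with 0 log(1/0) taken as 0."""
    return sum(pj * math.log2(1.0 / pj) for pj in p if pj > 0)

point = [1.0, 0.0]     # point distribution: all mass on one class
uniform = [0.5, 0.5]   # uniform distribution over two classes
skewed = [0.3, 0.7]    # an arbitrary in-between distribution (assumed example)

for q in (misclassification, gini, entropy):
    print(q.__name__, q(point), q(skewed), q(uniform))
```

For each function the value is zero at the point distribution and largest at the uniform one.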

illustration

Figure: impurity Q(p) as a function of p1 for two classes (both axes from 0 to 1). Curves: misclassification, Gini, entropy, 0.5*entropy.

Information of a split

• Pick a node, with a set S of size N
• Compute the impurity of the set Q(S)
• Pick a criterion A
• Split the set S into M subsets $\{S_m : m = 1, 2, \dots, M\}$
• The average impurity of these sets is
$Q(S|A) = \sum_{m=1}^{M} \frac{|S_m|}{N} Q(S_m)$

• Reduction of impurity (or increase of purity)
$\Delta Q(S|A) = Q(S) - Q(S|A)$
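As a worked illustration, the reduction of impurity can be evaluated on the class counts from the Intuition slides (features a to e, each splitting a node with counts {8,12}). This is a sketch under assumptions: entropy is used as Q, and the helper names are hypothetical.

```python
import math

def entropy(counts):
    """Entropy (bits) of the empirical distribution given by class counts."""
    n = sum(counts)
    if n == 0:
        return 0.0  # an empty subset contributes nothing
    return sum((c / n) * math.log2(n / c) for c in counts if c > 0)

def impurity_reduction(parent, children, impurity=entropy):
    """Delta Q(S|A) = Q(S) - sum_m |S_m|/N * Q(S_m), all given as class counts."""
    n = sum(parent)
    average = sum(sum(child) / n * impurity(child) for child in children)
    return impurity(parent) - average

parent = [8, 12]
splits = {
    "a": [[8, 0], [0, 12]],
    "b": [[0, 0], [8, 12]],
    "c": [[4, 6], [4, 6]],
    "d": [[2, 3], [6, 9]],
    "e": [[2, 3], [3, 5], [3, 4]],
}
gains = {name: impurity_reduction(parent, children) for name, children in splits.items()}
for name, gain in gains.items():
    print(f"feature {name}: reduction = {gain:.4f}")
```

Feature a separates the classes completely, so its reduction equals Q(S). Features b, c and d leave the class proportions unchanged in every subset and give no reduction, while feature e changes them only slightly.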

Algorithm

• Pick the test A which maximizes $\Delta Q(S|A)$

• Q: how many values to consider?

• Lemma: $0 \le \Delta Q(S|A) \le Q(S)$

• (see code below)
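For a numeric feature, two thresholds that fall between the same pair of consecutive observed values induce the same partition, so at most N - 1 thresholds need to be tried. The sketch below enumerates them as midpoints and checks the lemma numerically. It assumes entropy as Q and uses a handful of (horsepower, mpg) rows taken from the table above; the helper names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (bits) of the empirical label distribution."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def candidate_thresholds(values):
    """Midpoints between consecutive distinct sorted values (at most N - 1)."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

def reduction(values, labels, threshold):
    """Delta Q(S|A) for the binary test 'value <= threshold'."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    average = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - average

# Eight rows from the table: horsepower and the mpg label.
horsepower = [150, 72, 98, 97, 70, 68, 112, 88]
mpg = ["Bad", "Bad", "OK", "OK", "Good", "Good", "Bad", "Good"]

thresholds = candidate_thresholds(horsepower)
gains = {t: reduction(horsepower, mpg, t) for t in thresholds}
best = max(gains, key=gains.get)
print("candidates:", thresholds)
print("best threshold:", best, "reduction:", round(gains[best], 4))
```

Every computed reduction lies between 0 and Q(S), as the lemma states.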

Algorithm

• Initialize: single leaf (what label?)
• Iterate:
– Go over all leaves
– Go over all features (d)
– Go over all splitting values (N)
– Pick the (leaf, feature, splitting value) that reduces impurity the most
– Replace the leaf with:
• a new node
• two new leaves (their label?)
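The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the course's reference implementation: entropy is used as Q, leaves are labeled by majority vote, candidate (leaf, feature, value) triples are ranked by reduction weighted by leaf size, and the names `grow_tree`, `best_split`, `max_splits` and `min_gain` are hypothetical. The data are eight (cylinders, weight, mpg) rows from the table above.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def best_split(X, y, idx):
    """Best (reduction, feature, threshold) over the samples in idx, or None."""
    best = None
    q_s = entropy([y[i] for i in idx])
    for f in range(len(X[0])):                      # go over all features
        vals = sorted({X[i][f] for i in idx})
        for a, b in zip(vals, vals[1:]):            # go over all splitting values
            t = (a + b) / 2
            left = [y[i] for i in idx if X[i][f] <= t]
            right = [y[i] for i in idx if X[i][f] > t]
            avg = (len(left) * entropy(left) + len(right) * entropy(right)) / len(idx)
            if best is None or q_s - avg > best[0]:
                best = (q_s - avg, f, t)
    return best

def grow_tree(X, y, max_splits=10, min_gain=1e-9):
    # Initialize: a single leaf holding all samples, labeled by majority vote.
    root = {"idx": list(range(len(y))), "label": majority(y)}
    leaves = [root]
    for _ in range(max_splits):
        scored = [(best_split(X, y, leaf["idx"]), leaf) for leaf in leaves]
        scored = [(s, leaf) for s, leaf in scored if s is not None and s[0] > min_gain]
        if not scored:
            break  # no split reduces impurity: stop growing
        # Assumption: rank candidates by reduction weighted by leaf size.
        (gain, f, t), leaf = max(scored, key=lambda p: len(p[1]["idx"]) * p[0][0])
        parts = ([i for i in leaf["idx"] if X[i][f] <= t],
                 [i for i in leaf["idx"] if X[i][f] > t])
        children = [{"idx": ix, "label": majority([y[i] for i in ix])} for ix in parts]
        # Replace the leaf with a new node and two new leaves.
        leaf.update(feature=f, threshold=t, left=children[0], right=children[1])
        leaves = [l for l in leaves if l is not leaf] + children
    return root

def predict(node, x):
    while "feature" in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["label"]

# Eight rows from the table: (cylinders, weight) and the mpg label.
X = [(8, 4699), (6, 3158), (4, 2933), (4, 2945), (6, 3085), (4, 2506), (4, 2074), (4, 1970)]
y = ["Bad", "Bad", "Bad", "OK", "OK", "OK", "Good", "Good"]
tree = grow_tree(X, y)
print([predict(tree, x) for x in X])
```

Stopping when no split has positive reduction is one form of early stopping; a full implementation would typically add pruning as well.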

Issues (8.4)

• Number of splits
• Missing features
• Prevent over-fitting
– Early stopping
– Pruning

• Optimality vs greediness (Rivest et al, 76)

Example: xor

• Function: $y = \mathrm{sign}(x_1 x_2)$
• Tree with single node?
• Tree with two nodes

label  input
 1     (1,1)
 1     (-1,-1)
-1     (-1,1)
-1     (1,-1)

Tree (figure):
X1>0
– yes: X2>0
  – yes: +1
  – no: -1
– no: X2>0
  – yes: -1
  – no: +1
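The xor example shows where greedy growth struggles: no single split reduces impurity, yet a two-level tree is perfect. The following sketch checks both facts on the four points of the table. It assumes entropy as Q and the test "x > 0" for each feature; the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

# The four xor points and their labels y = sign(x1 * x2).
X = [(1, 1), (-1, -1), (-1, 1), (1, -1)]
y = [1, 1, -1, -1]

def reduction(feature):
    """Impurity reduction of the single test 'x[feature] > 0' at the root."""
    yes = [label for x, label in zip(X, y) if x[feature] > 0]
    no = [label for x, label in zip(X, y) if x[feature] <= 0]
    average = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(y)
    return entropy(y) - average

def two_level_tree(x):
    """The tree from the figure: test X1>0 at the root, then X2>0 in both branches."""
    if x[0] > 0:
        return 1 if x[1] > 0 else -1
    return -1 if x[1] > 0 else 1

print("reduction per feature:", [reduction(f) for f in (0, 1)])
print("two-level tree correct:", [two_level_tree(x) == label for x, label in zip(X, y)])
```

Because both root-level reductions are zero, a greedy learner that stops when no split helps would never discover the second level.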

Regression (8.5)

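• Value of leaf
– Replace the single majority label with the average of the outputs:
$S = \{(x_i, y_i)\}_{i=1}^{N}$, $y_i \in \mathbb{R}$
$\bar{y}(S) = \frac{1}{N} \sum_{i} y_i$

• Impurity of a leaf
– Replace the discrete functions above with the variance:
$Q(S) = \frac{1}{N} \sum_{i} (y_i - \bar{y}(S))^2$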
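For regression, the same split search applies with the variance as impurity and the mean as leaf value. The sketch below is illustrative only: the one-dimensional toy data and the function names are assumptions, not taken from the slides.

```python
def mean(ys):
    """Leaf value: average of the outputs in the leaf."""
    return sum(ys) / len(ys)

def variance(ys):
    """Leaf impurity: Q(S) = (1/N) * sum_i (y_i - mean)^2."""
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys) / len(ys)

def best_regression_split(xs, ys):
    """Return (reduction, threshold, left value, right value) of the best split."""
    n = len(ys)
    distinct = sorted(set(xs))
    best = None
    for a, b in zip(distinct, distinct[1:]):
        t = (a + b) / 2
        left = [v for x, v in zip(xs, ys) if x <= t]
        right = [v for x, v in zip(xs, ys) if x > t]
        average = (len(left) * variance(left) + len(right) * variance(right)) / n
        gain = variance(ys) - average
        if best is None or gain > best[0]:
            best = (gain, t, mean(left), mean(right))
    return best

# Toy data (assumed): outputs near 1 for small x and near 3 for large x.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.8, 3.0, 3.2, 2.8]
gain, threshold, left_value, right_value = best_regression_split(xs, ys)
print(threshold, left_value, right_value, round(gain, 4))
```

The chosen threshold separates the two groups of outputs, and each new leaf predicts the mean of its own group.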
