Chapter 7 – Classification and Regression Trees
dm.snu.ac.kr/static/docs/dm2015/chap9_...
TRANSCRIPT
-
Decision Tree
-
[Diagram: Plant and FDC producing Data(x, y); the same diagram is built up over several slides]
-
[Diagram: y = ?( ) for New Data, with outcome yes / No; y is coded 0/1]
-
Prediction = Classification
Given a target (y), inputs (x), and a function (f): from data {(x, y)}, learn f such that y = f(x)
-
Prediction: to predict Y (y = ?), which x should be used? X is the predictive variable.
-
Predictive Analytics
A fancy new name for supervised data mining, i.e., regression and classification
Find a mapping/function f such that y = f(x), given a data set D = {(x, y)}
Regression when y is continuous
Classification when y is categorical/binary
-
Predictive Analytics
Regression: Multiple Linear Regression, k-NN, Decision Tree Regression, Neural Networks
Classification: Logistic Regression, Linear Discriminant Analysis, k-NN, Naive Bayes, Decision Tree Classifier, Neural Networks, SVM
-
Decision Tree
-
Trees and Rules
Goal: Classify or predict an outcome based on a set of predictors
The output is a set of rules
Example:
Goal: classify a record as "will accept credit card offer" or "will not accept"
Rule might be IF (Income > 92.5) AND (Education < 1.5) AND (Family
-
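Pick a predictor variable x_i and a value s_i of x_i that divides the records into two parts
Try different values of x_i and s_i to find the best split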
-
Example:
Goal: classify 24 households as owner or non-owner
Predictors = Income, Lot Size
-
Income   Lot_Size   Ownership
60.0     18.4       owner
85.5     16.8       owner
64.8     21.6       owner
61.5     20.8       owner
87.0     23.6       owner
110.1    19.2       owner
108.0    17.6       owner
82.8     22.4       owner
69.0     20.0       owner
93.0     20.8       owner
51.0     22.0       owner
81.0     20.0       owner
75.0     19.6       non-owner
52.8     20.8       non-owner
64.8     17.2       non-owner
43.2     20.4       non-owner
84.0     17.6       non-owner
49.2     17.6       non-owner
59.4     16.0       non-owner
66.0     18.4       non-owner
47.4     16.4       non-owner
33.0     18.8       non-owner
51.0     14.0       non-owner
63.0     14.8       non-owner
-
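Order the records by one variable, e.g., lot size, and take the midpoints between successive values
E.g., the first midpoint is 14.4 (halfway between 14.0 and 14.8)
Divide the records into those with lotsize > 14.4 and those with lotsize < 14.4
After evaluating that split, try the next one, 15.4 (halfway between 14.8 and 16.0)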
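The candidate split points described here can be enumerated in a few lines. The sketch below uses the Lot_Size column from the table; the function name `candidate_splits` is an illustrative choice, not something taken from the slides.

```python
# Lot_Size values copied from the riding-mower table (owners first, then non-owners).
lot_sizes = [18.4, 16.8, 21.6, 20.8, 23.6, 19.2, 17.6, 22.4, 20.0, 20.8, 22.0, 20.0,
             19.6, 20.8, 17.2, 20.4, 17.6, 17.6, 16.0, 18.4, 16.4, 18.8, 14.0, 14.8]


def candidate_splits(values):
    """Return the midpoints between successive distinct values, in ascending order."""
    distinct = sorted(set(values))
    # round() only hides floating-point noise such as 14.399999999999999
    return [round((low + high) / 2, 2) for low, high in zip(distinct, distinct[1:])]


splits = candidate_splits(lot_sizes)
print(splits[:2])  # [14.4, 15.4]
```

Each midpoint is one candidate threshold; a tree-growing procedure would evaluate every one of them for every predictor.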
-
Note: splitting on a categorical variable
E.g., categories A, B, C can be split 3 ways:
{A} and {B, C}
{B} and {A, C}
{C} and {A, B}
, XLMiner
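To make the A, B, C example concrete, the following sketch enumerates every way of dividing a set of categories into two non-empty groups. The helper name `binary_partitions` is hypothetical; it is one simple way to do the enumeration, not the method of any particular tool.

```python
from itertools import combinations


def binary_partitions(categories):
    """List every way to split a set of categories into two non-empty groups."""
    cats = sorted(categories)
    first, rest = cats[0], cats[1:]
    partitions = []
    # Keep the first category on the left side so each split is counted only once.
    for r in range(len(rest) + 1):
        for extra in combinations(rest, r):
            left = {first, *extra}
            right = set(cats) - left
            if right:  # skip the case where everything lands on the left
                partitions.append((left, right))
    return partitions


for left, right in binary_partitions(["A", "B", "C"]):
    print(sorted(left), "and", sorted(right))
```

With k categories there are 2^(k-1) - 1 such splits, so the number of candidates grows quickly as categories are added.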
-
First split: Lot Size = 19,000
-
Second split: Income = $84,000
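As a rough check of what the split on lot size does to class purity, the sketch below compares the Gini impurity of all 24 households with the weighted impurity of the two groups produced by a threshold of 19.0. It assumes the table records lot size in thousands, so that 19,000 corresponds to 19.0; the function names are illustrative.

```python
from collections import Counter

# (Lot_Size, Ownership) pairs from the riding-mower table.
owners = [18.4, 16.8, 21.6, 20.8, 23.6, 19.2, 17.6, 22.4, 20.0, 20.8, 22.0, 20.0]
non_owners = [19.6, 20.8, 17.2, 20.4, 17.6, 17.6, 16.0, 18.4, 16.4, 18.8, 14.0, 14.8]
records = [(lot, "owner") for lot in owners] + [(lot, "non-owner") for lot in non_owners]


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(rows, threshold):
    """Size-weighted average Gini impurity of the two sides of a split."""
    left = [label for lot, label in rows if lot < threshold]
    right = [label for lot, label in rows if lot >= threshold]
    n = len(rows)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


before = gini([label for _, label in records])
after = weighted_gini(records, 19.0)  # assumption: 19,000 is 19.0 in table units
print(before, after)  # 0.5 0.375
```

Each side of the split holds 12 households with a 9-to-3 class mix, which is why both sides, and therefore the weighted average, come out at 0.375.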
-
Impurity of a set X: how impure is each of the following?
X = {B, B, B, B, W, W, W, W} ?
Y = {B, B, W, W, W, W, W, W} ?
Z = {W, W, W, W, W, W, W, W} ?
Z = {B, B, B, B, B, B, B, B} ?
X = {B, W} ?
-
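Gini index for rectangle A with m classes
$I(A) = 1 - \sum_{k=1}^{m} p_k^2$
$p_k$ = proportion of records in rectangle A that belong to class k
I(A) = 0 when all records belong to the same class; it is largest when all classes are equally represented (= 0.50 in the binary case)
Note: XLMiner uses a variant called the delta splitting rule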
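A minimal sketch of the Gini index, I(A) = 1 minus the sum of squared class proportions, applied to the B/W example sets from the earlier slide. The function name `gini` is an illustrative choice.

```python
from collections import Counter


def gini(labels):
    """Gini index of a collection of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


X = list("BBBBWWWW")  # evenly mixed
Y = list("BBWWWWWW")  # mostly W
Z = list("WWWWWWWW")  # pure

print(gini(X), gini(Y), gini(Z))  # 0.5 0.375 0.0
```

The evenly mixed set reaches the binary maximum of 0.5, and the pure set scores 0.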
-
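Entropy for rectangle A with m classes
$\text{entropy}(A) = -\sum_{k=1}^{m} p_k \log_2 p_k$
$p_k$ = proportion of records in rectangle A that belong to class k
Entropy ranges from 0 (most pure) to $\log_2(m)$ (all m classes equally represented)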
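The entropy measure can be sketched the same way. The code below assumes the usual definition, minus the sum of p_k * log2(p_k) over the classes present, and reuses the B/W sets from the earlier slide; the function name is illustrative.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy (in bits) of a collection of class labels."""
    n = len(labels)
    total = sum((count / n) * log2(count / n) for count in Counter(labels).values())
    # A pure set gives total == 0; return 0.0 rather than -0.0 in that case.
    return -total if total else 0.0


X = list("BBBBWWWW")  # evenly mixed, two classes
Z = list("WWWWWWWW")  # pure

print(entropy(X), entropy(Z))  # 1.0 0.0
```

With two equally represented classes the value is log2(2) = 1; with four equally represented classes it would be log2(4) = 2.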
-
E.g., if lot size < 19 and income > 84.75, then class = owner
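A rule of this kind is just a conjunction of threshold tests, so it can be applied directly to the riding-mower table. The sketch below is illustrative only; the function name `rule_owner` is hypothetical.

```python
# (Income, Lot_Size, Ownership) rows from the riding-mower table.
households = [
    (60.0, 18.4, "owner"), (85.5, 16.8, "owner"), (64.8, 21.6, "owner"),
    (61.5, 20.8, "owner"), (87.0, 23.6, "owner"), (110.1, 19.2, "owner"),
    (108.0, 17.6, "owner"), (82.8, 22.4, "owner"), (69.0, 20.0, "owner"),
    (93.0, 20.8, "owner"), (51.0, 22.0, "owner"), (81.0, 20.0, "owner"),
    (75.0, 19.6, "non-owner"), (52.8, 20.8, "non-owner"), (64.8, 17.2, "non-owner"),
    (43.2, 20.4, "non-owner"), (84.0, 17.6, "non-owner"), (49.2, 17.6, "non-owner"),
    (59.4, 16.0, "non-owner"), (66.0, 18.4, "non-owner"), (47.4, 16.4, "non-owner"),
    (33.0, 18.8, "non-owner"), (51.0, 14.0, "non-owner"), (63.0, 14.8, "non-owner"),
]


def rule_owner(income, lot_size):
    """IF lot size < 19 AND income > 84.75 THEN class = owner."""
    return lot_size < 19 and income > 84.75


matched = [row for row in households if rule_owner(row[0], row[1])]
for income, lot_size, ownership in matched:
    print(income, lot_size, ownership)
```

On this table the rule selects two households, both of them owners, so the corresponding leaf is pure.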
-
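Default cutoff = 0.5: the leaf is labeled with the majority class
Cutoff = 0.75: a leaf needs 75% or more "1" records to be labeled "1"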
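The effect of the cutoff on a leaf's label can be sketched as follows. The function name `leaf_label` and the example leaf are hypothetical, and treating a share exactly equal to the cutoff as class 1 is an assumption.

```python
def leaf_label(leaf_records, cutoff=0.5):
    """Label a leaf 1 when the share of class-1 records reaches the cutoff."""
    share_of_ones = sum(leaf_records) / len(leaf_records)
    # Assumption: a share exactly equal to the cutoff counts as class 1.
    return 1 if share_of_ones >= cutoff else 0


leaf = [1, 1, 0]  # hypothetical leaf: two "1" records and one "0" record

print(leaf_label(leaf))               # 1, since 2/3 >= 0.5
print(leaf_label(leaf, cutoff=0.75))  # 0, since 2/3 < 0.75
```

Raising the cutoff makes the tree more reluctant to assign class 1, which matters when the two kinds of misclassification have different costs.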
-
The natural end of tree growth is 100% purity in each leaf.
-
CHAID
Unlike CART, CHAID uses a chi-square statistical test to limit tree growth.
-
CART grows the tree to its full extent, then prunes it back.
-
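$CC(T) = Err(T) + \alpha L(T)$
CC(T) = cost complexity of tree T
Err(T) = proportion of misclassified records
L(T) = size of the tree
$\alpha$ = penalty factor attached to tree size (set by the user)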
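A short sketch of how a cost-complexity score trades error against tree size, written as CC(T) = Err(T) + alpha * L(T) with alpha as the penalty factor. The candidate trees and their error rates below are made-up numbers for illustration, and measuring L(T) as the number of leaves is an assumption.

```python
def cost_complexity(err, n_leaves, alpha):
    """CC(T) = Err(T) + alpha * L(T), with L(T) taken as the number of leaves."""
    return err + alpha * n_leaves


# Hypothetical pruned trees: (number of leaves, misclassification rate).
candidates = [(8, 0.10), (5, 0.12), (3, 0.16), (1, 0.40)]
alpha = 0.01  # hypothetical penalty per leaf

best = min(candidates, key=lambda tree: cost_complexity(tree[1], tree[0], alpha))
print(best)  # (5, 0.12)
```

With alpha = 0 the largest tree always wins on error alone; a positive alpha lets a smaller tree win when the extra leaves buy too little accuracy.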
-
UP Sell
-
30 4,887
295,123 ?
-
Upselling
HOW? IF THEN rule
-
11 & 787 (Platinum 93.1%)
48 & 10 & 151 (Platinum 92.7%)
7 & 24 & 11 & 90 (Platinum 93.3%)
-
Sales bidding
-
- Spec ,
-
2008
( L )
-
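Differences from CT
Prediction is the average of the numerical target variable in the leaf (in CT it is the majority vote)
Impurity is measured by the sum of squared deviations from the leaf mean
Performance is measured by RMSE (root mean squared error)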
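For a regression tree, a leaf's prediction is commonly taken to be the mean of the training targets that fall in it, impurity is the sum of squared deviations from that mean, and performance is summarized by RMSE. The sketch below shows the three quantities on made-up numbers; all names and values are illustrative.

```python
from math import sqrt


def leaf_prediction(y_values):
    """Predicted value for a leaf: the mean of its training targets."""
    return sum(y_values) / len(y_values)


def sum_squared_deviations(y_values):
    """Leaf impurity: squared deviations from the leaf mean, summed."""
    mean = leaf_prediction(y_values)
    return sum((y - mean) ** 2 for y in y_values)


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(actual))


leaf = [2.0, 4.0, 6.0]  # hypothetical target values in one leaf
print(leaf_prediction(leaf))         # 4.0
print(sum_squared_deviations(leaf))  # 8.0
print(rmse([3.0, 5.0], [4.0, 4.0]))  # 1.0
```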