Chapter 7 – Classification and Regression Trees

dm.snu.ac.kr/static/docs/dm2015/Chap9_ 의사결정나무... · 2015-11-10

Upload: lelien

Post on 22-May-2018

TRANSCRIPT

  • Decision Tree

  • [Diagram, built up over four slides: Plant + FDC; Data(x, y); y]

  • [Diagram, built up over several slides: y = ?() applied to New Data, with branches yes / No; y: 0/1]

  • Prediction = Classification

    With outcome (y), inputs (x) and function (f): given data {(x, y)}, find f such that y = f(x)

  • Prediction

    X: predictive variable for Y

  • Predictive Analytics

    A new, fancy name for supervised data mining, i.e. regression and classification

    Find a mapping/function f such that y = f(x), given a data set D = {(x, y)}

    Regression when y is continuous

    Classification when y is categorical/binary

  • Predictive Analytics

    Regression: Multiple Linear Regression, k-NN, Decision Tree Regression, Neural Networks

    Classification: Logistic Linear Regression, Discriminant Analysis, k-NN, Naive Bayes, Decision Tree Classifier, Neural Networks, SVM

  • Decision Tree

  • tree?

  • Trees and Rules

    Goal: Classify or predict an outcome based on a set of predictors

    The output is a set of rules

    Example:

    Goal: classify a record as "will accept credit card offer" or "will not accept"

    Rule might be IF (Income > 92.5) AND (Education < 1.5) AND (Family

  • Pick a predictor xi and a value of xi, say si, that splits the records (by comparing xi with si) into 2 parts

  • Example: 24 records; outcome = owner, non-owner

  • Income   Lot_Size   Ownership
    60.0     18.4       owner
    85.5     16.8       owner
    64.8     21.6       owner
    61.5     20.8       owner
    87.0     23.6       owner
    110.1    19.2       owner
    108.0    17.6       owner
    82.8     22.4       owner
    69.0     20.0       owner
    93.0     20.8       owner
    51.0     22.0       owner
    81.0     20.0       owner
    75.0     19.6       non-owner
    52.8     20.8       non-owner
    64.8     17.2       non-owner
    43.2     20.4       non-owner
    84.0     17.6       non-owner
    49.2     17.6       non-owner
    59.4     16.0       non-owner
    66.0     18.4       non-owner
    47.4     16.4       non-owner
    33.0     18.8       non-owner
    51.0     14.0       non-owner
    63.0     14.8       non-owner

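  • Candidate split points are the midpoints between consecutive sorted values of a variable

    E.g. the first candidate for lotsize is 14.4 (midway between 14.0 and 14.8)

    It divides the records into lotsize > 14.4 and lotsize < 14.4

    The next candidate is 15.4 (midway between 14.8 and 16.0)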
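The candidate values 14.4 and 15.4 quoted above can be reproduced with a short sketch. It assumes that candidates are the midpoints between consecutive distinct sorted values; the function name `candidate_splits` is hypothetical, and the Lot_Size values are copied from the 24-record table.

```python
# Lot_Size values copied from the 24-record table (units as listed there).
lot_sizes = [18.4, 16.8, 21.6, 20.8, 23.6, 19.2, 17.6, 22.4, 20.0, 20.8, 22.0, 20.0,
             19.6, 20.8, 17.2, 20.4, 17.6, 17.6, 16.0, 18.4, 16.4, 18.8, 14.0, 14.8]


def candidate_splits(values):
    """Return the midpoints between consecutive distinct sorted values."""
    distinct = sorted(set(values))
    # round() only tidies floating-point noise in the midpoints.
    return [round((lo + hi) / 2, 2) for lo, hi in zip(distinct, distinct[1:])]


splits = candidate_splits(lot_sizes)
print(splits[:2])   # [14.4, 15.4]
print(len(splits))  # 17 candidates from 18 distinct values
```

Each candidate would then be scored by how pure the two resulting groups are.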

  • Note: categorical variables

    E.g., three categories A, B, C can be split in 3 ways: {A} and {B, C}; {B} and {A, C}; {C} and {A, B}

    , XLMiner
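The three two-way groupings of A, B, C listed above can be enumerated mechanically. The following is a minimal sketch, not taken from the slides; `binary_partitions` is a hypothetical helper name.

```python
from itertools import combinations


def binary_partitions(categories):
    """List every way to split a set of categories into two non-empty groups."""
    cats = sorted(categories)
    first, rest = cats[0], cats[1:]
    partitions = []
    # Keep the first category on the left side so each split is counted once.
    for r in range(len(rest) + 1):
        for extra in combinations(rest, r):
            left = {first, *extra}
            right = set(cats) - left
            if right:  # skip the case where everything lands on the left
                partitions.append((left, right))
    return partitions


for left, right in binary_partitions(["A", "B", "C"]):
    print(sorted(left), "and", sorted(right))
```

For k categories this yields 2^(k-1) - 1 splits, so the count grows quickly as categories are added.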

  • First split: Lot Size = 19,000
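To see what this split does to the 24 records, the sketch below counts the classes on each side. It assumes the table lists Lot_Size in thousands, so that 19,000 corresponds to the value 19; the variable names are illustrative only.

```python
from collections import Counter

# (Income, Lot_Size, Ownership) rows copied from the table.
records = [
    (60.0, 18.4, "owner"), (85.5, 16.8, "owner"), (64.8, 21.6, "owner"),
    (61.5, 20.8, "owner"), (87.0, 23.6, "owner"), (110.1, 19.2, "owner"),
    (108.0, 17.6, "owner"), (82.8, 22.4, "owner"), (69.0, 20.0, "owner"),
    (93.0, 20.8, "owner"), (51.0, 22.0, "owner"), (81.0, 20.0, "owner"),
    (75.0, 19.6, "non-owner"), (52.8, 20.8, "non-owner"), (64.8, 17.2, "non-owner"),
    (43.2, 20.4, "non-owner"), (84.0, 17.6, "non-owner"), (49.2, 17.6, "non-owner"),
    (59.4, 16.0, "non-owner"), (66.0, 18.4, "non-owner"), (47.4, 16.4, "non-owner"),
    (33.0, 18.8, "non-owner"), (51.0, 14.0, "non-owner"), (63.0, 14.8, "non-owner"),
]

THRESHOLD = 19  # assumption: Lot_Size is recorded in thousands

# Count the class labels on each side of the split.
below = Counter(label for _, lot, label in records if lot < THRESHOLD)
above = Counter(label for _, lot, label in records if lot >= THRESHOLD)

print("Lot_Size <  19:", dict(below))  # 3 owners, 9 non-owners
print("Lot_Size >= 19:", dict(above))  # 9 owners, 3 non-owners
```

Both sides are more homogeneous than the full data set, which starts at 12 owners and 12 non-owners.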

  • Second split: Income = $84,000

  • X = {B, B, B, B, W, W, W, W} ?

    Y = {B, B, W, W, W, W, W, W} ?

    Z = {W, W, W, W, W, W, W, W} ?

    Z= {B, B, B, B, B, B, B, B} ?

    X= {B, W} ?

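  • Gini index for a rectangle A with m classes

    $I(A) = 1 - \sum_{k=1}^{m} p_k^2$

    p_k = proportion of records in A that belong to class k

    I(A) = 0 when all records belong to the same class; the maximum is reached when all classes are equally represented (= 0.50 in the binary case)

    Note: XLMiner uses a variant called delta splitting rule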
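A minimal sketch of the impurity measure I(A), assuming it is the Gini index 1 - Σ p_k², which matches the 0.50 maximum quoted for two classes. The sets X, Y, Z are the B/W examples from the earlier slide; the function name `gini` is illustrative.

```python
from collections import Counter


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())


X = list("BBBBWWWW")  # evenly mixed
Y = list("BBWWWWWW")  # mostly W
Z = list("WWWWWWWW")  # pure

print(gini(X))  # 0.5
print(gini(Y))  # 0.375
print(gini(Z))  # 0.0
```

The evenly mixed set X reaches the two-class maximum of 0.5, and the pure set Z scores 0.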

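  • Entropy for a rectangle A with m classes

    $\text{entropy}(A) = -\sum_{k=1}^{m} p_k \log_2(p_k)$

    p_k = proportion of records in A that belong to class k

    Ranges from 0 (most pure) to log2(m) (all classes equally represented)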
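A minimal sketch of a measure with the range given above, from 0 to log2(m). It assumes the measure is the entropy -Σ p_k log2(p_k), and reuses the B/W sets from the earlier slide; the function name `entropy` is illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits: minus the sum of p_k * log2(p_k) over the classes present."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


X = list("BBBBWWWW")  # evenly mixed, m = 2
Y = list("BBWWWWWW")  # mostly W
Z = list("WWWWWWWW")  # pure

print(entropy(X))             # 1.0, i.e. log2(2)
print(round(entropy(Y), 3))   # 0.811
print(abs(entropy(Z)))        # 0.0
print(entropy(list("ABCD")))  # 2.0, i.e. log2(4)
```

Unlike the previous measure, whose two-class maximum is 0.5, this one tops out at log2(m), so it reaches 1 for two classes and 2 for four.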

  • E.g., If lot size < 19, and if income > 84.75, then class = owner
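The rule above can be written as a plain predicate and checked against the 24 records. This is a sketch only: `rule_owner` is a hypothetical name, and the thresholds are used in the same units as the table.

```python
# (Income, Lot_Size, Ownership) rows copied from the table.
records = [
    (60.0, 18.4, "owner"), (85.5, 16.8, "owner"), (64.8, 21.6, "owner"),
    (61.5, 20.8, "owner"), (87.0, 23.6, "owner"), (110.1, 19.2, "owner"),
    (108.0, 17.6, "owner"), (82.8, 22.4, "owner"), (69.0, 20.0, "owner"),
    (93.0, 20.8, "owner"), (51.0, 22.0, "owner"), (81.0, 20.0, "owner"),
    (75.0, 19.6, "non-owner"), (52.8, 20.8, "non-owner"), (64.8, 17.2, "non-owner"),
    (43.2, 20.4, "non-owner"), (84.0, 17.6, "non-owner"), (49.2, 17.6, "non-owner"),
    (59.4, 16.0, "non-owner"), (66.0, 18.4, "non-owner"), (47.4, 16.4, "non-owner"),
    (33.0, 18.8, "non-owner"), (51.0, 14.0, "non-owner"), (63.0, 14.8, "non-owner"),
]


def rule_owner(income, lot_size):
    """IF lot size < 19 AND income > 84.75 THEN class = owner."""
    return lot_size < 19 and income > 84.75


# Collect the records that end up in the leaf described by this rule.
matched = [row for row in records if rule_owner(row[0], row[1])]
for income, lot_size, label in matched:
    print(income, lot_size, label)
```

Two records satisfy the rule and both are owners, so this leaf is pure on the training data.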

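  • Cutoff = 0.5 (default): a leaf is labeled with its majority class

    Cutoff = 0.75: a leaf needs 75% or more class-1 records to be labeled 1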
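The effect of the 0.5 and 0.75 cutoffs can be illustrated with a small sketch. It assumes that a leaf is labeled 1 when its share of class-1 records is at least the cutoff; the `>=` comparison, the function name `leaf_label` and the example leaf are all assumptions, not taken from the slides.

```python
def leaf_label(labels, cutoff=0.5):
    """Label a leaf 1 if the share of class-1 records reaches the cutoff, else 0."""
    share_of_ones = sum(labels) / len(labels)
    return 1 if share_of_ones >= cutoff else 0


leaf = [1, 1, 1, 0, 0]  # hypothetical leaf: 60% of its records are class 1

print(leaf_label(leaf))               # 1 with the 0.5 cutoff
print(leaf_label(leaf, cutoff=0.75))  # 0 with the 0.75 cutoff
```

Raising the cutoff makes the tree more conservative about assigning class 1.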

  • CHAID

    CART, CHAID

  • CART

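  • CC(T) = cost complexity of tree T

    Err(T) = proportion of misclassified records

    α = penalty factor for tree size (set by the user)

    L(T) = number of leaves of T

    $CC(T) = Err(T) + \alpha L(T)$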
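A minimal sketch of how the cost-complexity formula trades error against tree size. It assumes the penalty term is α·L(T), where L(T) is the number of leaves and α is a user-chosen penalty factor; the three candidate trees and their error rates are hypothetical numbers for illustration only.

```python
def cost_complexity(err, n_leaves, alpha):
    """CC(T) = Err(T) + alpha * L(T)."""
    return err + alpha * n_leaves


# Hypothetical candidate trees: (name, misclassification rate, number of leaves).
trees = [("full", 0.00, 12), ("medium", 0.08, 5), ("small", 0.25, 2)]


def best_tree(alpha):
    """Return the name of the candidate with the lowest cost complexity."""
    return min(trees, key=lambda t: cost_complexity(t[1], t[2], alpha))[0]


print(best_tree(0.0))   # full: with no penalty, the lowest error wins
print(best_tree(0.02))  # medium
print(best_tree(0.1))   # small: a heavy penalty favors few leaves
```

With α = 0 the full tree always wins on training error, which is why the penalty is needed to favor smaller trees.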

  • UP Sell

  • 30 4,887

    295,123 ?

  • Upselling

    HOW? IF THEN rule

  • 11 & 787 (Platinum 93.1%)

    48 & 10 & 151 (Platinum 92.7%)

    7 & 24 & 11 & 90 (Platinum 93.3%)

  • Sales bidding

  • - Spec ,

    -

    56

    2008

    ( L )

  • CT

    (CT)

    .

    RMSE ().
