Logistic Regression Notes


  • 7/27/2019 Logistic Regression Notes

    1/50

    A COMPARISON OF MULTIPLE REGRESSION, LOGISTIC REGRESSION AND
    DISCRIMINATION FUNCTION IN CLASSIFICATION OF OBSERVATIONS

    by: Dr. Yap Bee Wah
    UNIVERSITI TEKNOLOGI MARA
    Faculty of Information Technology & Quantitative Sciences

    [Scatter plot of petal width (PETALWID) by iris TYPE: iris virginica,
    iris versicolor, iris setosa; and overlapping normal density curves.]

    Kolokium Statistik, 24 July 2004, Th 5, FTMSK

  • Slide 2/50

    OVERVIEW OF PRESENTATION

    Introduction

    Multiple Regression

    Logistic Regression

    Discriminant Function

    Methodology (Model Building and Evaluation Process)

    Results

    Conclusion

  • Slide 3/50

    Introduction:

    Two (2) pioneer studies:

    Efron (1975) studied the relative efficiency of logistic regression and
    normal discriminant analysis. He found that typically, logistic regression
    is between one half and two thirds as effective as normal discrimination.
    (Efron, B. (1975). The Efficiency of Logistic Regression Compared to
    Normal Discriminant Function Analysis. Journal of the American Statistical
    Association, Vol. 70, No. 352, Theory and Methods Section.)

    Press and Wilson (1978) compared logistic regression and parametric
    discriminant analysis and concluded that logistic regression is preferable
    to parametric discriminant analysis in cases for which the variables do
    not have multivariate normal distributions. However, for normal
    distributions, logistic regression is less efficient than parametric
    discriminant analysis. (Press, S. J. & Wilson, S. (1978). Choosing between
    logistic regression and discriminant analysis. Journal of the American
    Statistical Association, 73, 699-705.)

  • Slide 4/50

    Introduction to Multiple Linear Regression

    Multiple Linear Regression is a useful statistical modeling technique for
    describing the relationship between a response (dependent) variable and
    one or several predictor variables.

    When the response variable is dichotomous (2 categories) or polytomous
    (more than 2 categories), logistic regression or discriminant analysis is
    frequently used to model the relationship.

  • Slide 5/50

    Multiple Regression Model

    Consider k predictor variables; the multiple regression model is stated as
    follows:

    Y_i = β_0 + β_1 X_1i + β_2 X_2i + ... + β_k X_ki + ε_i,   ε_i ~ N(0, σ²)

    where β_0, β_1, ..., β_k are the regression coefficients. The response Y
    must be a quantitative variable.
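    The model above can be fitted by ordinary least squares. A minimal sketch
    on synthetic data (all values below are made up for illustration, not from
    the deck's data set):

    ```python
    import numpy as np

    # Minimal OLS sketch of Y_i = b0 + b1*X_1i + b2*X_2i + e_i, e_i ~ N(0, sigma^2).
    rng = np.random.default_rng(0)
    n = 200
    X = rng.normal(size=(n, 2))                        # two quantitative predictors
    beta_true = np.array([1.0, 2.0, -0.5])             # b0, b1, b2 (made up)
    y = beta_true[0] + X @ beta_true[1:] + rng.normal(scale=0.1, size=n)

    Xd = np.column_stack([np.ones(n), X])               # design matrix with intercept column
    beta_hat, *_ = np.linalg.lstsq(Xd, y, rcond=None)   # least-squares estimates of b0, b1, b2
    ```

    With small noise and n = 200, beta_hat lands close to the coefficients
    that generated the data.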

  • Slide 6/50

    Regression: Research Application Example

    IS Faculty Research Productivity: Influential Factors and Implications
    by: Qing Hu & T. Grandon Gill (Florida Atlantic University)
    (Information Resources Management Journal, Vol 13, No 2, 2000)

    Response: Research productivity (annual rate of publication)

    Predictors:
    - Number of years in IS faculty
    - Percentage of time allocated for teaching
    - Percentage of time allocated for research
    - Percentage of time allocated for academic services
    - Type of degree

  • Slide 7/50

    Introduction to Logistic Regression

    Allows estimating the probability of an event happening.

    Useful for modeling data with a dichotomous dependent variable (Y)
    (e.g.: survive/die; purchase/do not purchase; pass/fail, etc.)

    Allows a mixture of quantitative and qualitative predictor variables (X).

  • Slide 8/50

    Application examples

    Dependent variable                 Independent variables

    Y = 1 if survive,                  X_1: age
        0 otherwise                    X_2: length of illness
                                       X_3: dosage level
                                       X_4: gender

    Y = 1 if settle credit card        X_1: age
        bills, 0 otherwise             X_2: number of credit cards
                                       X_3: income
                                       X_4: number of children
                                       X_5: gender

  • Slide 9/50

    Logit model, otherwise known as the logistic regression model

    For k explanatory variables and i = 1, 2, ..., n the model is

    log[ p_i / (1 - p_i) ] = β_0 + β_1 x_1i + β_2 x_2i + ... + β_k x_ki

    where p_i = P(Y_i = 1). The left-hand side is referred to as the logit or
    log-odds.

  • Slide 10/50

    We can solve the logit equation to obtain:

    Pr(Y = 1) = 1 / (1 + exp[-(β_0 + β_1 X_1 + β_2 X_2 + ... + β_k X_k)])

    In mathematical expression, this formula is called the logistic function
    and can be written as:

    f(z) = 1 / (1 + e^(-z)),   where z = β_0 + β_1 X_1 + ... + β_k X_k
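    The logistic function is a one-liner; a small sketch:

    ```python
    import math

    def logistic(z):
        """Logistic function f(z) = 1 / (1 + e^(-z))."""
        return 1.0 / (1.0 + math.exp(-z))

    # z = b0 + b1*X1 + ... + bk*Xk is the linear predictor (the log-odds);
    # f(0) = 0.5, and f(z) rises toward 1 as the log-odds grow.
    ```

    Note the symmetry f(z) + f(-z) = 1, which is why swapping the two
    categories of Y simply flips the signs of the coefficients.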

  • Slide 11/50

    Simple logit model

    Let Y and X_1 be defined as follows:

    Y   = 1 if develop lung cancer, 0 otherwise
    X_1 = 1 if smoker, 0 otherwise

    log[ P(Y = 1) / P(Y = 0) ] = β_0 + β_1 X_1

    Hence,

    log[ odds(Y = 1 | X_1 = 1) ] = β_0 + β_1
    log[ odds(Y = 1 | X_1 = 0) ] = β_0

    OR (odds ratio): a ratio of 2 odds,

    OR = odds(smoker) / odds(nonsmoker) = e^(β_0 + β_1) / e^(β_0) = e^(β_1)

  • Slide 12/50

    Interpretation of the odds-ratio

    If, for example, β_1 = 1.0986, then e^(1.0986) = 3, i.e. odds ratio = 3.

    This odds ratio (OR) indicates that a smoker has 3 times the odds of
    developing lung cancer compared to a nonsmoker.
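    Checking the arithmetic on the slide's example coefficient:

    ```python
    import math

    # The slide's example: a logit coefficient b1 = 1.0986 for the smoker indicator.
    b1 = 1.0986
    odds_ratio = math.exp(b1)   # e^1.0986, approximately 3

    # Going the other way: an odds ratio of 3 corresponds to b1 = ln(3) ≈ 1.0986.
    ```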

  • Slide 13/50

    Introduction to Discriminant Analysis

    An appropriate technique for classifying or separating individuals into
    different groups (dependent variable) based on a set of quantitative
    independent random variables.

    Involves deriving the linear combination of predictor variables (called
    the discriminant function) that will discriminate best between the given
    groups.

    The main objective of discriminant analysis is to predict group membership
    based on a set of quantitative variables.

    Assumptions: the predictor variables for each group have a multivariate
    normal distribution.


  • Slide 14/50

    Scatter Plot of Income vs Lotsize

    [Scatter plot of INCOME against LOTSIZE, points labelled by GROUP:
    owners vs nonowners.]

    Can we find a discriminant function based on income and lot size of house
    to predict if a house owner will or will not purchase a lawn mower?
    (Johnson and Wichern, Applied Multivariate Statistical Analysis, Wiley,
    2002).

  • Slide 15/50

    We can classify a new observation (x_0) using:

    1) Linear or quadratic discriminant functions
    2) Posterior probabilities:

    P(group k | x) = p_k f_k(x) / Σ_{i=1}^{g} p_i f_i(x)

    i.e. the probability that x comes from group k given that x was observed,
    where p_i is the prior probability of group i.
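    The posterior rule can be sketched with univariate normal densities f_i;
    the means, standard deviations and priors below are made-up illustrative
    values, not from the deck's data:

    ```python
    import math

    def posteriors(x, means, sds, priors):
        """Return [P(group i | x)] = p_i f_i(x) / sum_k p_k f_k(x)."""
        def npdf(x, m, s):  # univariate normal density
            return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        weighted = [p * npdf(x, m, s) for p, m, s in zip(priors, means, sds)]
        total = sum(weighted)
        return [w / total for w in weighted]

    post = posteriors(0.0, means=[-1.0, 1.0], sds=[1.0, 1.0], priors=[0.5, 0.5])
    # x = 0 is equidistant from both group means, so the posteriors are equal
    ```

    The new observation is allocated to the group with the largest posterior.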

  • Slide 16/50

    Classification for two (2) normal populations

    Homoscedastic case (when Σ_1 = Σ_2):

    Allocate an observation x_0 to π_1 if

    (x̄_1 - x̄_2)' S_pooled^{-1} x_0
        - (1/2)(x̄_1 - x̄_2)' S_pooled^{-1} (x̄_1 + x̄_2)
        ≥ ln[ (c(1|2)/c(2|1)) (p_2/p_1) ]

    Otherwise allocate x_0 into π_2.

    The left-hand side is the linear discriminant function; c(1|2) and c(2|1)
    are costs of misclassification, and p_1, p_2 are prior probabilities.

    Note: assume c(1|2) = c(2|1) and p_1 = p_2 if they are unknown; hence
    ln(1) = 0.

    Source: Johnson & Wichern, 2002
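    A sketch of this allocation rule with equal costs and equal priors, so the
    right-hand side is ln(1) = 0 (the sample means and S_pooled below are made
    up for illustration):

    ```python
    import numpy as np

    x1bar = np.array([2.0, 3.0])      # sample mean vector, group 1 (made up)
    x2bar = np.array([1.0, 1.0])      # sample mean vector, group 2 (made up)
    S_pooled = np.array([[1.0, 0.2],
                         [0.2, 1.0]])

    a = np.linalg.solve(S_pooled, x1bar - x2bar)   # S_pooled^{-1} (x1bar - x2bar)
    m = 0.5 * a @ (x1bar + x2bar)                  # midpoint cutoff

    def allocate(x0):
        """Group 1 if the linear discriminant a'x0 - m >= 0, else group 2."""
        return 1 if a @ x0 - m >= 0 else 2
    ```

    Each group mean is, as expected, allocated to its own group, since the
    quadratic form (x̄_1 - x̄_2)' S_pooled^{-1} (x̄_1 - x̄_2) is positive.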

  • Slide 17/50

    Classification for two (2) normal populations

    Heteroscedastic case (when Σ_1 ≠ Σ_2):

    Allocate x_0 to π_1 if

    -(1/2) x_0' (S_1^{-1} - S_2^{-1}) x_0
        + (x̄_1' S_1^{-1} - x̄_2' S_2^{-1}) x_0 - k
        ≥ ln[ (c(1|2)/c(2|1)) (p_2/p_1) ]

    where

    k = (1/2) ln( |S_1| / |S_2| )
        + (1/2)( x̄_1' S_1^{-1} x̄_1 - x̄_2' S_2^{-1} x̄_2 )

    Otherwise allocate x_0 into π_2. The left-hand side is the quadratic
    discriminant function.

    Source: Johnson & Wichern, 2002

  • Slide 18/50

    Example: Admission into graduate programs based on GPA and GMAT

    Response variable (Y): 1 = admit, 2 = do not admit
    Independent variables (X): X_1 = undergraduate GPA, X_2 = GMAT score

    Sample sizes: n_1 = 31 (admit), n_2 = 28 (do not admit)
    Sample means: x̄_1 = (3.40, 561.23)',  x̄_2 = (2.48, 447.07)'

    The sample covariance matrices S_1, S_2 and the pooled inverse
    S_pooled^{-1} are as given in Johnson & Wichern (2002).

  • Slide 19/50

  • Slide 20/50

    Classification with several populations

    Allocate x_0 to π_k if the linear discriminant score d_k(x) is the largest
    of d_1(x), d_2(x), ..., d_g(x), where

    d_i(x) = x̄_i' S_pooled^{-1} x - (1/2) x̄_i' S_pooled^{-1} x̄_i + ln p_i,
             i = 1, 2, ..., g

    This is Fisher's discriminant function given in SPSS/SAS output, and
    assumes (1) equal covariance matrices.

    For (2) unequal covariance matrices, use the quadratic discriminant score

    d_i^Q(x) = -(1/2) ln|S_i| - (1/2)(x - x̄_i)' S_i^{-1} (x - x̄_i) + ln p_i,
               i = 1, 2, ..., g

    Source: Johnson & Wichern, 2002
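    The quadratic score can be sketched directly from its formula; the group
    means, covariance matrices and priors below are made-up illustrative
    inputs:

    ```python
    import numpy as np

    def quad_score(x, xbar, S, prior):
        """d_Q(x) = -0.5*ln|S| - 0.5*(x - xbar)' S^{-1} (x - xbar) + ln(prior)."""
        d = x - xbar
        return (-0.5 * np.log(np.linalg.det(S))
                - 0.5 * d @ np.linalg.solve(S, d)
                + np.log(prior))

    x0 = np.array([0.0, 0.0])
    s1 = quad_score(x0, np.array([0.0, 0.0]), np.eye(2), 0.5)        # group 1 score
    s2 = quad_score(x0, np.array([3.0, 3.0]), 2.0 * np.eye(2), 0.5)  # group 2 score
    # allocate x0 to the group with the larger score (here, group 1)
    ```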

  • Slide 21/50

    Assessing the performance of the classification functions

    Error rate: percentage of observations misclassified.

                           Predicted Membership
    Actual Membership      Owners    Non-owners    Sample size
    Owners                 n_1c      n_1m          n_1
    Non-owners             n_2m      n_2c          n_2

    Error rate = (n_1m + n_2m) / (n_1 + n_2)
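    The error-rate formula above is a one-liner over the four table counts; the
    example counts below (a validation sample of 50 with 15 misses in one
    group and none in the other) are illustrative:

    ```python
    # Error rate from a 2x2 classification table:
    # n1c, n2c are correctly classified counts; n1m, n2m are misclassified.
    def error_rate(n1c, n1m, n2m, n2c):
        """(n1m + n2m) / (n1 + n2): share of misclassified observations."""
        return (n1m + n2m) / (n1c + n1m + n2m + n2c)

    e = error_rate(34, 0, 15, 1)   # (15 + 0) / 50 = 0.30
    ```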

  • Slide 22/50

    Comparing the performance of multiple regression, logistic regression, and
    discrimination functions in classification of observations

    These three statistical methods were applied to a data set to compare
    their predictive ability of classifying a baby as low birth weight or
    normal based on several predictor variables.

  • Slide 23/50

    Dependent variable: Y = Birth weight (g)

    Independent variables:
    X1 = Race (Malay, Chinese and Indian)    X6 = Abortion (yes, no)
    X2 = Gender (male, female)               X7 = Mother's height (cm)
    X3 = Mother's age (years)                X8 = Vitamin (mg)
    X4 = Father's income (RM)                X9 = Weight gain (kg)
    X5 = Parity (children)                   X10 = Antenatal visits (number of times)

    Data set (collected in 1997) courtesy of Hospital Kuala Lumpur

  • Slide 24/50

    Methodology (The Process of Developing and Evaluating the Models)

    1. Split the data into the training data set (n1 = 365) and the
       validation data set (n2 = 50).
    2. Build the model(s) using the training data set.
    3. Check the model adequacy using plots of residuals and other
       diagnostics. Are remedial measures needed? If yes, apply them and
       rebuild; if no, continue.
    4. Evaluate the performance of the models using the validation data set.
    5. Find the probabilities of misclassification: E1, E2 and E3.
    6. Compare the error rates E1, E2 and E3. Select the best model.

  • Slide 25/50

    SPSS Results (Multiple Linear Regression Analysis)

    ANOVA
    Model 1       Sum of Squares    df     Mean Square    F        Sig.
    Regression    16531299          4      4132824.770    15.500   .000
    Residual      95990816          360    266641.157
    Total         1.13E+08          364

    Coefficients
                  Unstandardized          Standardized
    Model 1       B           Std. Error  Beta     t        Sig.
    (Constant)    -1532.707   857.305              -1.788   .075
    PARITY        45.828      17.534      .131     2.614    .009
    MUM_HEIG      23.679      5.506       .210     4.300    .000
    WGHTGAIN      39.234      9.698       .210     4.046    .000
    ANT_VST       51.366      14.606      .178     3.517    .000

    Model Summary
    Model 1: R = .383, R Square = .147, Adjusted R Square = .137,
    Std. Error of the Estimate = 516.37

    All four predictor variables are significant.

  • Slide 26/50

    SPSS Results (Multiple Linear Regression)

    The final estimated regression function is:

    Birth Weight = -1532.707 + 45.828(Parity) + 23.679(Mother's Height)
                   + 39.234(Weight Gain) + 51.366(Antenatal Visits)
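    Plugging illustrative values into the estimated function (the mother below
    is made up, not a case from the data set):

    ```python
    # Predicted birth weight (g) from the estimated regression function.
    def predict_birth_weight(parity, mum_height_cm, weight_gain_kg, antenatal_visits):
        return (-1532.707 + 45.828 * parity + 23.679 * mum_height_cm
                + 39.234 * weight_gain_kg + 51.366 * antenatal_visits)

    # Made-up case: parity 2, mother 155 cm, 10 kg weight gain, 8 antenatal visits.
    w = predict_birth_weight(2, 155, 10, 8)   # ≈ 3032 g
    ```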

  • Slide 27/50

    Multiple Regression Results

    Interpretation of the estimated regression coefficients:

    1. For parity (b1 = 45.828): for every additional child in the family,
       the birth weight of babies will increase by approximately 46 g,
       holding mother's height, weight gain and antenatal visits constant.
    2. For mother's height (b2 = 23.679): the birth weight of babies will
       increase by approximately 24 g for every 1 cm increase in mother's
       height, holding parity, weight gain and antenatal visits constant.
       (Birth weight is higher for taller mothers.)
    3. For weight gain (b3 = 39.234): the birth weight of babies will
       increase by approximately 39 g for every 1 kg increase in weight gain,
       holding parity, mother's height and antenatal visits constant.
    4. For antenatal visits (b4 = 51.366): the birth weight of babies will
       increase by approximately 51 g for every one unit (time) increase in
       the number of antenatal visits, holding parity, mother's height and
       weight gain constant.

  • Slide 28/50

    Checking Model Adequacy Through Diagnostic Plots

    [Q-Q plot of residuals; plot of residuals against regression standardized
    predicted values.]

    Notes: Kolmogorov-Smirnov = 0.045, p-value = 0.077;
    Skewness = -0.153; Kurtosis = 0.048.

    No violation of the regression model assumptions of normal errors with
    constant variance.

  • Slide 29/50

    Evaluating Regression Model Performance Through Error Rate

    The estimated regression function is then used to predict the birth
    weight of the 50 observations in the validation sample.

    Predicted values below 2500 g were classified as low birth weight;
    otherwise, they were classified as normal birth weight.

    The following classification table gives the true and predicted
    categories obtained.

  • Slide 30/50

    Classification Table

                                Predicted
    Observed           Normal weight    Low weight    Total
    Normal weight      34               0             34
    Low weight         15               1             16
    Total              49               1             50

    Error rate for the Multiple Regression Model:
    E1 = (15 + 0) / 50 = 0.30

  • Slide 31/50

    APPLYING LOGISTIC REGRESSION

    Y = 1 if low birth weight (birth weight < 2500 g), 0 otherwise

    Independent variables
    X1 = Race (Malay, Chinese and Indian)    X6 = Abortion (yes, no)
    X2 = Gender (male, female)               X7 = Mother's height (cm)
    X3 = Mother's age (years)                X8 = Vitamin (mg)
    X4 = Father's income (RM)                X9 = Weight gain (kg)
    X5 = Parity (children)                   X10 = Antenatal visits (number of times)

  • Slide 32/50

    SPSS Results for Multiple Logistic Regression

    Step 1      B        S.E.    Wald      df    Sig.    Exp(B)
    WGHTGAIN    -.193    .052    13.666    1     .000    .824
    ANT_VST     -.194    .070    7.624     1     .006    .824
    ABORT(1)    .648     .308    4.428     1     .035    1.912
    MUM_HEIG    -.108    .028    15.136    1     .000    .898
    Constant    18.247   4.380   17.356    1     .000    8.4E+07

    The estimated logistic regression model obtained:

    P(Y_j = 1) = 1 / (1 + e^(-z_j))

    where

    z_j = 18.247 - 0.193(Weight gain) - 0.194(Antenatal visits)
          + 0.648(History of abortion) - 0.108(Mother's height)
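    Evaluating the fitted logit at illustrative predictor values (the case
    below is made up, not from the data set):

    ```python
    import math

    # Predicted probability of low birth weight from the fitted logit.
    def p_low(weight_gain, antenatal_visits, abortion, mum_height_cm):
        z = (18.247 - 0.193 * weight_gain - 0.194 * antenatal_visits
             + 0.648 * abortion - 0.108 * mum_height_cm)
        return 1.0 / (1.0 + math.exp(-z))

    # Made-up case: 10 kg gain, 8 visits, no abortion history, mother 155 cm.
    p = p_low(10, 8, 0, 155)   # z ≈ -1.975, so p ≈ 0.12
    ```

    Since the coefficient of abortion history is positive (+0.648), the same
    case with abortion = 1 gets a higher predicted probability.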

  • Slide 33/50

    Interpretation of the odds-ratios

    1. For weight gain: the odds ratio (0.824) means that for every 1 kg
       increase in weight gain, the odds of low birth weight will decrease.
    2. For antenatal visits: the odds ratio (0.824) indicates that when a
       mother increases antenatal visits by 1, the odds of low birth weight
       will decrease.
    3. For abortion: the odds ratio (1.912) indicates that a mother who has
       had abortion(s) has approximately 2 times the odds of having a baby
       with low birth weight compared to those who have no history of
       abortion(s).
    4. For mother's height: the odds ratio (0.898) indicates that the odds of
       low birth weight are lower for mothers who are taller.

  • Slide 34/50

  • Slide 35/50

    Evaluating the performance of the logistic regression model

    The estimated logistic function is then used to predict the 50
    observations in the validation data set.

    If P(Y_j = 1) ≥ 0.5, j = 1, 2, ..., 50,

    we classify the observation as belonging to π_1 (low birth weight).

  • Slide 36/50

    Error Rate for the Logistic Regression Model

                                Predicted
    Observed           Normal weight    Low weight    Total
    Normal weight      33               1             34
    Low weight         11               5             16
    Total              44               6             50

    E2 = (11 + 1) / 50 = 0.24

  • Slide 37/50

    Discriminant Analysis (Checking the assumption of multivariate normal
    distribution)

    Variables          Normal birth weight     Low birth weight
    Mother's age       Approximately Normal    Approximately Normal
    Father's income    Nonnormal               Nonnormal
    Parity             Approximately Normal    Approximately Normal
    Mother's height    Approximately Normal    Approximately Normal
    Vitamin            Approximately Normal    Approximately Normal
    Weight gain        Approximately Normal    Approximately Normal
    Antenatal visits   Approximately Normal    Approximately Normal

  • Slide 38/50

    Chi-square plots for checking multivariate normality

    [Chi-square plots: Mahalanobis distance against chi-square quantiles for
    the Low Birth Weight Group and the Normal Birth Weight Group.]

    The chi-square plots indicate both groups have approximately multivariate
    normal distributions.

  • Slide 39/50

    Discriminant Analysis Results

    Box's M Test of Equality of Covariance Matrices

    H_0: Σ_1 = Σ_2
    H_1: Σ_1 ≠ Σ_2

    Box's M    12.83
    F          2.11
    df 1       6
    df 2       217363
    Sig.       0.049

    We can assume equal covariance matrices, so use Fisher's linear
    discriminant function.

  • Slide 40/50

    SPSS Output (Discriminant Functions)

    Classification Function Coefficients (Fisher's linear discriminant
    functions)

    WEI_CODE      0 normal weight    1 low weight
    MUM_HEIG      6.699              6.601
    WGHTGAIN      .397               .238
    ANT_VST       2.965              2.797
    (Constant)    -532.689           -515.224

    Normal birth weight category:
    d_1 = -532.689 + 6.699(mother's height) + 0.397(weight gain)
          + 2.965(antenatal visits)

    Low birth weight category:
    d_2 = -515.224 + 6.601(mother's height) + 0.238(weight gain)
          + 2.797(antenatal visits)
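    Scoring a new case with the two classification functions: the case is
    assigned to the category whose function gives the larger score (the two
    example cases below are made up for illustration):

    ```python
    # Fisher's classification functions from the SPSS output.
    def d_normal(height_cm, weight_gain, antenatal_visits):
        return -532.689 + 6.699 * height_cm + 0.397 * weight_gain + 2.965 * antenatal_visits

    def d_low(height_cm, weight_gain, antenatal_visits):
        return -515.224 + 6.601 * height_cm + 0.238 * weight_gain + 2.797 * antenatal_visits

    def classify(height_cm, weight_gain, antenatal_visits):
        """'normal' or 'low' by the larger classification score."""
        args = (height_cm, weight_gain, antenatal_visits)
        return "normal" if d_normal(*args) >= d_low(*args) else "low"

    # Made-up cases: (155 cm, 10 kg, 8 visits) scores as normal weight;
    # (145 cm, 2 kg, 2 visits) scores as low weight.
    ```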

  • Slide 41/50

    Discriminant Analysis Results (Cont'd)

    Classification Results

                                          Predicted Group Membership
                    WEI_CODE              normal weight  low weight  Total
    Original        normal weight  Count  170            96          266
                    low weight     Count  28             71          99
                    normal weight  %      63.9           36.1        100.0
                    low weight     %      28.3           71.7        100.0
    Cross-validated normal weight  Count  169            97          266
                    low weight     Count  28             71          99
                    normal weight  %      63.5           36.5        100.0
                    low weight     %      28.3           71.7        100.0

    Cross-validation error rate of the model = (97 + 28) / 365 = 0.34

  • Slide 42/50

    Evaluating the performance of the discriminant functions

    The estimated discriminant functions are then used to predict the group
    membership of the 50 observations in the validation data set.

    If d_2(x_j) ≥ d_1(x_j), we classify the observation into π_2 (low birth
    weight).

  • Slide 43/50

    Evaluate Discriminant Functions' Performance Through Error Rate

    Classification Table

                                Predicted
    Observed           Normal weight    Low weight    Total
    Normal weight      22               12            34
    Low weight         6                10            16
    Total              28               22            50

    E3 = (12 + 6) / 50 = 0.36

  • Slide 44/50

    Summary of the Models' Performances

    Statistical model                 Significant variables          Error rate
    1. Multiple linear regression     1. Mother's height             0.30
                                      2. Weight gain
                                      3. Antenatal visits
                                      4. Parity
    2. Multiple logistic regression   1. Mother's height             0.24
                                      2. Weight gain
                                      3. Antenatal visits
                                      4. History of abortion(s)
    3. Discriminant analysis          1. Mother's height             0.36
                                      2. Weight gain
                                      3. Antenatal visits

    Note: comparing the models with the same significant predictor variables,
    the error rates for Multiple Regression, Logistic Regression and
    Discriminant Analysis are 0.28, 0.26 and 0.36 respectively.

  • Slide 45/50

    Conclusion of Study

    The significant predictor variables affecting birth weight of babies are
    weight gain, number of antenatal visits, parity, mother's height and
    history of abortions.

    The logistic regression model is found to be the best model in this study
    as it has the lowest error rate.

  • Slide 46/50

    Some interesting research papers

    (1) Logistic Regression for Data Mining and High-Dimensional
        Classification
        Paul Komarek (PhD thesis, Carnegie Mellon University, 2004, 138 pages)
        (www.autonlab.org/autonweb/showPaper.jsp?ID=komarek:Ir_thesis)

    (2) Predicting Housing Value: A Comparison of Multiple Regression and
        Artificial Neural Networks
        Nghiep Nguyen & Al Cripps
        Journal of Real Estate Research, Vol 22, p313-336, 2001.
  • Slide 47/50

    Some interesting research papers

    (3) Application of f-regression to fuzzy classification problem
        Boris Izyumov
        Proceedings of 3rd International Conference on Fuzzy Logic and
        Technology (EUS, 2003), Zittau, Germany (2003), pp781-766

    (4) Assessing and Predicting Information and Communication Technology
        Literacy in Education Undergraduates
        JoAnne Davies (PhD thesis, Department of Educational Psychology,
        Edmonton, Alberta, 2002)
  • Slide 48/50

    Some interesting research papers

    (5) Discriminant Analysis for recognition of human face images
        Kamran Etemad & Rama Chellapa
        J. Optical Soc. of America, Vol 14, No 8, 1997
  • Slide 49/50

    FACULTY OF INFORMATION TECHNOLOGY & QUANTITATIVE SCIENCES, UiTM
  • Slide 50/50

    [Scatter plot of petal width (PETALWID) by iris TYPE: iris virginica,
    iris versicolor, iris setosa; and normal density curves.]

    FACULTY OF INFORMATION TECHNOLOGY & QUANTITATIVE SCIENCES, UiTM