chapter_14 advanced regression models

Upload: mgahabib

Post on 04-Apr-2018

  • 7/30/2019 Chapter_14 Advanced Regression Models

    Chapter 14

    ADVANCED REGRESSION MODELS

    Raghuram Iyengar, University of Pennsylvania

    Sunil Gupta, Columbia University

    Introduction

    The previous chapter covered the basics of the powerful and yet simple technique of

    Ordinary Least Squares (OLS). It was noted that the mathematical relationship between the

    dependent variable for an observation $y_t$ at time $t$ and a vector of independent variables $x_t$ can be written in the following manner.

    $$y_t = x_t'\beta + \varepsilon_t \qquad (1)$$

    Here, $x_t'$ is the transpose of the vector $x_t$ and $\beta$ is a vector of parameters. Also, $y_t$ is continuous from $-\infty$ to $\infty$, and $\varepsilon_t$ is the random error, which is typically assumed to be normally distributed.

    Several scenarios fit the assumption of a continuous dependent variable that ranges from $-\infty$ to $\infty$. In cases when $y_t$ is strictly positive (e.g., sales), we can transform it as $\ln(y_t)$ to make it lie between $-\infty$ and $\infty$ and continue to use OLS. But what happens if the dependent variable is

    discrete (e.g., buy / no buy) or is the choice of a brand (e.g., Brand A, B, C or D) and we want to

    analyze the effect of brand prices on these decisions? The purpose of this chapter is to show

    methods that can be used in such scenarios.

    We begin with Discriminant Analysis. This is followed by a discussion of logistic

    regression and the multinomial logit model. Thereafter, we focus on the multinomial probit

    model. The chapter ends with a discussion on Tobit models.

    Discriminant Analysis

    Consider the following example where a dependent variable is binary: a buy / no buy

    decision. A company that has introduced a product in the market wishes to describe the people

    that are buying its product. Figure 14.1 shows the demographic information that the company has

    together with the purchasers (P) and non-purchasers (N). The figure suggests that purchasers of

    this product are older and richer. Thus, age and income discriminate among the purchasers and

    non-purchasers. However, it is not clear which of the two variables is more important and how

    we can predict a new person to be a purchaser or non-purchaser based on his/her income and age.

    Such questions can be answered by using discriminant analysis.

    [Figure 14.1 about here]

    Discriminant analysis is a method to analyze which independent variables discriminate

    among groups and to classify observations into predetermined groups based on these variables.

    These predetermined groups can number two (e.g., buy or no buy) or more than two. In the

    latter case, the analysis is termed multiple discriminant analysis. For the sake of simplicity, we

    begin with a two-group discriminant analysis.

    In a discriminant analysis, an index is built using the measured characteristics as the

    independent variables. Thus, for an observation at time $t$,

    $$f_t = x_{1t}\beta_1 + x_{2t}\beta_2 + x_{3t}\beta_3 + \cdots + x_{Kt}\beta_K = x_t'\beta \qquad (2)$$

    Here, $f_t$ is the index. It is also called the discriminant function. There are $K$ measured characteristics $(x_{1t}, x_{2t}, \ldots, x_{Kt})$. The vector $x_t'$ is the transpose of the vector $x_t$ that contains these $K$ variables. There are also $K$ parameters $(\beta_1, \beta_2, \ldots, \beta_K)$, which are the weights corresponding to these variables. These weights are also termed the discriminant coefficients.

    The goal of discriminant analysis is to estimate weights such that the index values for the two groups are as far apart as possible. In other words, the weights are derived such that the variation

    in f scores between the two groups is as large as possible, while the variation in the f scores

    within the groups is as small as possible. That is, the weights are derived so that the following

    ratio is maximized.

    $$\frac{\text{Between-Group Variation}}{\text{Within-Group Variation}} \qquad (3)$$

    Maximizing the above ratio makes the two groups as distinct as possible with respect to the

    index values. More mathematically oriented readers can see Chapter 11 of Johnson and Wichern

    (2002) for a description of how the above quantity is maximized.
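To make the ratio in Equation (3) concrete, the following is a minimal sketch for two groups and two variables. It assumes the standard two-group solution in which the weights are proportional to the inverse of the pooled within-group scatter matrix times the difference in group means. The age and income values are hypothetical and are not the data behind Figure 14.1; the function names are ours.

```python
# Hypothetical (age, income in $1000s) data; not from Figure 14.1.
purchasers = [(52, 80), (60, 95), (48, 70), (55, 88), (63, 76)]
non_purchasers = [(30, 40), (35, 52), (28, 45), (41, 38), (33, 60)]


def mean_vector(rows):
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(2)]


def scatter(rows, mean):
    """2x2 sum of squares and cross-products around the group mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s


def variation_ratio(w):
    """Between-group over within-group variation of the scores f = w'x."""
    groups = [purchasers, non_purchasers]
    scores = [[w[0] * r[0] + w[1] * r[1] for r in g] for g in groups]
    all_scores = scores[0] + scores[1]
    grand = sum(all_scores) / len(all_scores)
    between = sum(len(s) * (sum(s) / len(s) - grand) ** 2 for s in scores)
    within = sum((f - sum(s) / len(s)) ** 2 for s in scores for f in s)
    return between / within


m1, m0 = mean_vector(purchasers), mean_vector(non_purchasers)
s1, s0 = scatter(purchasers, m1), scatter(non_purchasers, m0)
# Pooled within-group scatter matrix W and its inverse (2x2 closed form).
W = [[s1[a][b] + s0[a][b] for b in range(2)] for a in range(2)]
det = W[0][0] * W[1][1] - W[0][1] * W[1][0]
W_inv = [[W[1][1] / det, -W[0][1] / det], [-W[1][0] / det, W[0][0] / det]]
diff = [m1[0] - m0[0], m1[1] - m0[1]]
# Two-group discriminant weights: proportional to W^{-1} (m1 - m0).
weights = [W_inv[a][0] * diff[0] + W_inv[a][1] * diff[1] for a in range(2)]

print("weights:", weights)
print("ratio at discriminant weights:", variation_ratio(weights))
print("ratio using age only:         ", variation_ratio([1.0, 0.0]))
print("ratio using income only:      ", variation_ratio([0.0, 1.0]))
```

Any other choice of weights, such as using age alone or income alone, should give a ratio no larger than the one obtained at the discriminant weights.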

    Discriminant analysis is related to and yet is distinct from linear regression. In both

    methods, there is a weighted linear combination of independent variables that is used to predict a

    dependent variable. Also, like linear regression, discriminant analysis suffers from

    multicollinearity of the independent variables. The primary difference between the two methods

    is that in a linear regression, the dependent variable is typically assumed to range from $-\infty$ to $\infty$,

    whereas in a discriminant analysis, the dependent variable is group membership, i.e., it is discrete.

    For an application where the group membership is in two groups, a linear regression can be run

    with a dummy variable representing group membership as the dependent variable. The estimates

    from such a regression will be proportional to the weights that are obtained from a discriminant

    analysis. When the number of groups is, however, greater than two, then a regression will not

    yield the same results.
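The proportionality between the regression estimates and the discriminant weights in the two-group case can be checked numerically. The sketch below uses hypothetical data with two independent variables, computes the discriminant weights from the pooled within-group scatter matrix, and computes the OLS slopes of a 0/1 dummy on the same variables. The helper names are ours, and the data are made up for illustration.

```python
# Hypothetical two-variable data: group 1 (dummy = 1) and group 0 (dummy = 0).
group1 = [(52, 80), (60, 95), (48, 70), (55, 88), (63, 76)]
group0 = [(30, 40), (35, 52), (28, 45), (41, 38), (33, 60)]


def solve_2x2(m, v):
    """Solve the 2x2 linear system m @ x = v by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(v[0] * m[1][1] - m[0][1] * v[1]) / det,
            (m[0][0] * v[1] - m[1][0] * v[0]) / det]


def centered_cross_products(rows, center):
    """Sum over rows of (x - center)(x - center)' as a 2x2 list."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = (r[0] - center[0], r[1] - center[1])
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s


def mean(rows):
    return [sum(r[k] for r in rows) / len(rows) for k in range(2)]


# Discriminant weights: W^{-1} (m1 - m0), with W the pooled within-group scatter.
m1, m0 = mean(group1), mean(group0)
s1 = centered_cross_products(group1, m1)
s0 = centered_cross_products(group0, m0)
W = [[s1[a][b] + s0[a][b] for b in range(2)] for a in range(2)]
lda_weights = solve_2x2(W, [m1[0] - m0[0], m1[1] - m0[1]])

# OLS slopes of the 0/1 dummy on the two variables; centering the data
# absorbs the intercept, so only the slopes are solved for.
rows = group1 + group0
y = [1.0] * len(group1) + [0.0] * len(group0)
x_bar, y_bar = mean(rows), sum(y) / len(y)
T = centered_cross_products(rows, x_bar)
xy = [sum((r[k] - x_bar[k]) * (yi - y_bar) for r, yi in zip(rows, y))
      for k in range(2)]
ols_slopes = solve_2x2(T, xy)

ratios = [ols_slopes[k] / lda_weights[k] for k in range(2)]
print("OLS slope / discriminant weight, per variable:", ratios)
```

The two printed ratios should agree up to floating-point error, which is what "proportional" means here: the regression slopes are the discriminant weights multiplied by a single constant.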

    Discriminant analysis is also different from cluster analysis (see Chapter 18, this book).

    In discriminant analysis, the groups are predetermined and the analysis is focused on which

    variables best discriminate among these groups. In a cluster analysis, the group memberships are

    unknown and the focus of the analysis is to form these groups.

    Consider the following example of a two-group discriminant analysis. Table 14.1

    contains the data on the fifty US states, which are broken down into two groups: 15 states that

    are South and 35 that are Non-South (Lehmann, Gupta, & Steckel, 1997). These groups are

    compared on observable characteristics such as income, population and others. A univariate F-

    test compares the differences in means across the two groups on each of the independent

    variables. The big differences between the two groups appear to be in income, tax per capita and

    mineral production.

    A discriminant analysis was run and Table 14.1 contains the discriminant coefficients.

    These are the weights of the independent variables. Another column shows standardized

    discriminant coefficients. These coefficients are similar to the standardized regression

    coefficients in an OLS regression. They correct for any scale issues associated with the

    independent variables. We can calculate these coefficients by first standardizing the independent

    variables and then running a discriminant analysis or by first running a discriminant analysis and

    then multiplying each discriminant coefficient by the standard deviation of the respective

    independent variable. Both methods yield standardized coefficients and these can be used to

    ascertain how a change of one standard deviation in each independent variable will affect the

    discriminant function.

    [Table 14.1 about here]

    From the estimated unstandardized coefficients, we find that population is the most important

    variable for discrimination followed by average income. Upon standardizing the variables, we

    observe a different set of variables that are important. We find that while population is still the

    most important, college enrollment and manufacturing output are clearly more relevant for

    discrimination among the states than is average income. Thus, a failure to account for differences

    in scale can lead to erroneous conclusions about the relative importance of variables.

    Measures of Fit

    There are several measures of fit that are used to assess how well the model discriminates

    between the groups.

    Chi-Squared Value

    A Chi-Squared value tests whether, overall, the variables help discriminate between the two

    groups. This is very similar to the F-test for overall significance in a regression setting. Here, the

    Chi-Squared value is 42.71. For testing the significance, we look at the critical value for 11

    degrees of freedom (the number of independent variables). This value is 31.3 at the 0.001 level.

    Thus, the variables clearly help in discrimination.

    Canonical Correlation

    The canonical correlation is the correlation resulting from a regression of a dummy dependent variable (group membership) on the independent variables. Its squared value is the $R^2$ from this regression. In this example, the canonical correlation is 0.80. Thus, the $R^2$ is 0.64.

    Wilks' Lambda

    Wilks' Lambda is the ratio of within-group variance to the total variance. Here, it is essentially $1 - R^2$. Thus, Wilks' Lambda is $1 - 0.64 = 0.36$.

    The Hit-Miss Table

    The Hit-Miss table indicates how well the discriminant function classifies observations. Table 14.2 is such a hit-miss table. Here we find that 32 out of the 35 Non-South states and 14 out of the 15 South states are correctly classified. Thus, the overall classification rate is (32 + 14) / 50, i.e., 92%.

    [Table 14.2 about here]
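The classification rate can be read directly off a hit-miss table. The sketch below rebuilds the two-group table from the counts quoted in the text; the off-diagonal counts (3 and 1) are not quoted directly but follow from the group sizes of 35 and 15. The variable names are ours.

```python
# Rows are actual groups, columns are predicted groups, in the order
# (Non-South, South). Diagonal counts come from the text; the off-diagonal
# counts follow from the group sizes (35 and 15).
hit_miss = [
    [32, 3],   # actual Non-South: 32 hits, 3 misses
    [1, 14],   # actual South: 1 miss, 14 hits
]

hits = sum(hit_miss[g][g] for g in range(len(hit_miss)))
total = sum(sum(row) for row in hit_miss)
classification_rate = hits / total
per_group_rate = [row[g] / sum(row) for g, row in enumerate(hit_miss)]

print(f"overall classification rate: {classification_rate:.0%}")
print("per-group rates:", [f"{r:.1%}" for r in per_group_rate])
```

Looking at the per-group rates alongside the overall rate is a useful habit, since a high overall rate can hide poor classification of a small group.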

    Multiple Discriminant Analysis

    A multiple discriminant analysis is carried out when the observations are preclassified

    into more than two groups. The basic idea is first to find a single function that spreads all groups

    as far apart as possible. Then, a second function is found that best explains any remaining differences

    among groups and so on. If there are K groups, then K-1 discriminant functions are found.

    To illustrate multiple discriminant analysis, we consider an example described in

    Lehmann, Gupta, and Steckel (1997). In this example, there are five groups of consumers

    defined by their monthly expenditure on food, in dollars. Table 14.3

    shows the five groups together with the averages of the independent variables. The means appear

    to indicate the larger spenders are more educated, are younger, have higher incomes, have larger

    family sizes and shop more extensively. Table 14.3 also shows F tests for the variables for the

    significance of differences among the five groups. These tests suggest that family size and

    income are the most important (i.e. have the highest F value).

    [Table 14.3 about here]

    A discriminant analysis is run. In the analysis, a few variables are dropped as they do not

    contribute to discrimination among the groups. We then obtain the standardized and

    unstandardized discriminant coefficients. As there are 5 groups, we have 5-1=4 discriminant

    functions. Table 14.4 shows the unstandardized and standardized coefficients. The discriminant

    functions are ranked according to their usefulness for discrimination. In other words, the first

    function is the most important for discriminating amongst the five groups; the second one is the

    second most important and so on.

    [Table 14.4 about here]

    From the results on the standardized coefficients, we find that the most important

    variables in the first function are family size, income and how often they shop. The second

    function is related to age and family size. Table 14.5 gives the group means for the groups based

    on the four discriminant functions. Figure 14.2 plots these means for the first and the second

    functions. We can see that there is a big spread of the means of the five groups along the

    horizontal axis (Function 1) and less so along the vertical axis (Function 2).

    [Table 14.5 about here]

    [Figure 14.2 about here]

    Measure of Fit

    As a measure of fit of the model, we use the hit-miss table. Table 14.6 is such a hit-miss

    table for the five categories. From the results, we see that the overall classification rate is

    (20 + 106 + 90 + 84 + 40) / (34 + 284 + 293 + 181 + 61) × 100 = 39.86%.

    [Table 14.6 about here]

    Discriminant analysis rests on two statistical assumptions. One, the independent variables

    are assumed to be jointly normally distributed and two, the covariances are assumed to be the

    same across all groups. When these assumptions are violated then the statistical interpretation of

    the results becomes very difficult. For instance, while in practice, dummy variables are

    frequently used as independent variables, in theory it is a problem. This is because if a dummy

    independent variable is used then the independent variables are not normally distributed. To

    alleviate such statistical difficulties, the method of logistic regression is used. We motivate this

    method with a managerial problem that all direct marketers face.

    Logistic Regression

    Catalog companies regularly keep track of Recency, Frequency and Monetary (RFM)

    variables. There is an interest in relating these RFM measures to purchase behavior, i.e., a buy / no buy decision. These measures can then be used for predicting purchase and for making strategic intervention decisions to increase retention. Table 14.7 contains the summary statistics of such a dataset, where Recency is measured in months since the last purchase and Monetary is in dollar amount. Choice is a variable that takes a value of 1 if a consumer made a purchase and 0 if she did not.¹

    [Table 14.7 about here]

    One strategy for estimating the relationship between choice and the RFM measures

    would be to use OLS with Choice as the dependent variable ($y_t$) and the RFM measures as the independent variables ($x_t$). Table 14.8 shows the results of the OLS regression. The results suggest that Recency and Monetary are significant whereas Frequency is not. Further, the $R^2$ is about 0.61 and the adjusted $R^2$ is around 0.60. Despite the high $R^2$, OLS is not appropriate for several reasons. Figure 14.3 plots the predictions of Choice against the true value of Choice for the 100 data points. We see that there are instances where the predictions for Choice are either less than zero or greater than one. While this is not surprising given that OLS assumes that the dependent variable is continuous between $-\infty$ and $\infty$, in the current context these predictions are clearly inconsistent with the data. For instance, how do we interpret a prediction of 1.32 and compare it with a prediction of 1.82? Are both indicating a purchase decision, i.e., should we assume both are just 1 (buy)? Similarly, it is not clear how to interpret a prediction of -0.18 when a value of 0 reflects no purchase. This example shows that when an assumption of the OLS technique (in this case, the continuous distribution of the dependent variable) is violated, its results cannot be interpreted.

    [Table 14.8 about here]

    [Figure 14.3 about here]
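The out-of-range predictions are easy to reproduce on a small scale. The following sketch fits OLS with a single covariate to a 0/1 outcome; the data are hypothetical and are not the RFM data behind Table 14.8 or Figure 14.3.

```python
# Hypothetical data: one covariate x and a 0/1 choice y (not the RFM data).
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
# Closed-form OLS estimates for a single regressor plus intercept.
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar

predictions = [intercept + slope * xi for xi in x]
out_of_range = [p for p in predictions if p < 0 or p > 1]
print("predictions:", [round(p, 3) for p in predictions])
print("outside [0, 1]:", [round(p, 3) for p in out_of_range])
```

Because the fitted line is unbounded, observations at the extremes of the covariate receive predictions below zero or above one, which cannot be read as probabilities.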

    The dependent variable in the above example is discrete. Such choice scenarios are

    extremely common. For instance, pharmaceutical companies are interested in predicting whether

    a physician would prescribe their drug or not and the factors that might increase the prescription

    rate. Similarly, managers in industries with an online presence are interested in identifying

    factors that can predict which consumers will purchase online (Bellman, Lohse, & Johnson,

    1999). While these questions can also be addressed by discriminant analysis, there are other scenarios, such as when the dependent variable is market share (i.e., lies between 0 and 1) and we want to quantify the effect of price and promotions on it, which need a different method that can accommodate such responses.

    Model for Logistic Regression

    A logistic regression analysis begins with a dependent variable, which is either discrete

    (e.g., buy / no buy) or lies between 0 and 1 (e.g., market share). If we are modeling a discrete

    decision such as buy / no buy then we specify the probabilities of the two possible events i.e.

    P(Buy) and P(No Buy). As P(Buy) and P(No Buy) are probabilities, they are between 0 and 1

    and they should sum up to 1. Next, we revisit the example with the discrete choice (buy / no buy)

    and RFM measures that we discussed earlier. We then briefly discuss how the same framework

    can be applied for analyzing market shares.

    In the RFM example, the two events are purchase and no purchase. Using the measures

    of P(Buy) and P(No Buy), we can specify the odds of buying as

    $$\text{Odds}(\text{Buy}) = \frac{P(\text{Buy})}{P(\text{No Buy})} = \frac{P(\text{Buy})}{1 - P(\text{Buy})} \qquad (4)$$

    The odds of buying are constrained between 0 and $+\infty$ and take a value of 1 if both outcomes are equally likely, i.e., P(Buy) = 0.5 and P(No Buy) = 0.5. We can make the odds lie between $-\infty$ and $+\infty$ by taking the natural log transform. Thus,

    $$\text{Log}(\text{Odds}(\text{Buy})) = \text{Log}\left(\frac{P(\text{Buy})}{P(\text{No Buy})}\right) \qquad (5)$$

    As the log odds lie between $-\infty$ and $+\infty$, we can relate them to any independent variables and interpret the effects of the variables in a manner similar to that in OLS; only now the effect of the variables would be on the log odds of the dependent variable. Thus, we can write the following equation relating the log odds of purchase for an observation $t$ to the independent variables ($x_t$):

    $$\text{Log}(\text{Odds}(\text{Buy}))_t = x_t'\beta \qquad (6)$$

    This can be rewritten as

    $$P(\text{Buy})_t = \frac{1}{1 + \exp(-x_t'\beta)} \qquad (7)$$

    Recall that P(Buy) is the probability of a purchase and hence should always be between 0

    and 1. The above expression ensures that this will be the case irrespective of the values of the

    covariates.
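The chain from probability to odds to log odds, and back again through Equation (7), can be sketched in a few lines. The function names and the index values below are ours and purely illustrative; `z` stands for the linear index $x_t'\beta$.

```python
import math


def odds(p):
    """Odds of an event with probability p, as in Equation (4)."""
    return p / (1.0 - p)


def log_odds(p):
    """Natural log of the odds, as in Equation (5)."""
    return math.log(odds(p))


def prob_from_index(z):
    """Equation (7): P = 1 / (1 + exp(-z)), where z stands for x_t' beta.

    The two branches are algebraically identical; splitting on the sign of z
    avoids overflow in exp() for large negative z.
    """
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


for z in (-5.0, -1.0, 0.0, 1.0, 5.0):  # hypothetical index values
    p = prob_from_index(z)
    print(f"z={z:+.1f}  P(Buy)={p:.4f}  odds={odds(p):.4f}  "
          f"log-odds={log_odds(p):+.4f}")
```

Whatever value the index takes, the resulting probability stays strictly between 0 and 1, and taking the log odds of that probability recovers the index.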

    We can now use the above model for our example. Table 14.9 shows the results of two logistic regression models estimated using Maximum Likelihood Estimation (MLE): the intercept-only model, where $x_t$ contains only the intercept, and the full model, where $x_t$ contains the intercept and the RFM variables. The results of the full model show that the RFM variables are significant. Further, an increase of 1 month in Recency is associated with an increase of 3.34 in the log odds of buying. We can also calculate the effect on the odds of buying. This would be exp(3.34) or 28.28, i.e., increasing Recency by 1 month multiplies the odds of buying by a factor of 28.28. A similar

    analysis can be done for the other variables. Note that the RFM estimates are close to the true

    values of the sensitivities (see Footnote 1). Also note that the frequency sensitivity is significant

    in this analysis while it was not so using OLS. Thus, OLS can mask the true relationship between

    variables and its results can lead to erroneous interpretations for cases when the dependent

    variable is not continuous.

    [Table 14.9 about here]
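The text does not say which algorithm was used to obtain the maximum likelihood estimates. As one possible illustration, the sketch below fits a logistic regression with an intercept and a single covariate by Newton-Raphson iterations, and compares its log-likelihood with that of the intercept-only model. The data are hypothetical (not the RFM data), and the function names are ours.

```python
import math

# Hypothetical data: one covariate and a 0/1 outcome (not the RFM data).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]


def prob(b0, b1, xi):
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))


def log_likelihood(b0, b1):
    return sum(
        math.log(prob(b0, b1, xi)) if yi == 1
        else math.log(1.0 - prob(b0, b1, xi))
        for xi, yi in zip(x, y)
    )


def gradient(b0, b1):
    """Score vector: sum over observations of (y - p) * (1, x)."""
    g0 = sum(yi - prob(b0, b1, xi) for xi, yi in zip(x, y))
    g1 = sum((yi - prob(b0, b1, xi)) * xi for xi, yi in zip(x, y))
    return g0, g1


def fit(iterations=25):
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0, g1 = gradient(b0, b1)
        # Information matrix X'WX with weights w = p(1 - p).
        w = [prob(b0, b1, xi) * (1.0 - prob(b0, b1, xi)) for xi in x]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + (X'WX)^{-1} * gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


b0_hat, b1_hat = fit()
# Intercept-only MLE: the fitted probability equals the sample share of 1s.
share = sum(y) / len(y)
ll_intercept_only = sum(math.log(share if yi == 1 else 1.0 - share) for yi in y)
ll_full = log_likelihood(b0_hat, b1_hat)
print("intercept, slope:", round(b0_hat, 4), round(b1_hat, 4))
print("LL full:", ll_full, " LL intercept-only:", ll_intercept_only)
```

At the maximum the gradient is numerically zero, and the full model's log-likelihood cannot be lower than that of the intercept-only model, since the latter is a restricted version of the former. In practice one would use a statistical package rather than hand-written iterations.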

    Figure 14.4 plots the predicted probabilities of Buying with the true value of Choice.

    Notice that, in contrast to the predictions of the OLS regression (Figure 14.3), all predictions lie

    between 0 and 1. Also, unlike the case of the OLS regression, a higher predicted value has the

    interpretation of a higher probability of purchase. To see how this probability of purchase varies

    with a change in one of the covariates, see Figure 14.5. In this figure, we plot the predicted

    probability of purchase with change in frequency. For generating this figure, we fixed the

    recency and monetary variables at their average values. The figure shows that the probability of

    purchase follows an S-shaped curve as the frequency increases.

    [Figure 14.4 about here]

    [Figure 14.5 about here]

    In the above example, we modeled the purchase decision and then related it to RFM

    measures. As the purchase variable was discrete, we specified the probabilities of purchase and no purchase, measures that lie between 0 and 1. We then specified the odds of purchase and took a log transform to make it lie between $-\infty$ and $+\infty$. We can apply the above framework to analyze

    market shares (MS) as well. For instance, a brand manager might want to quantify the effect of

    the region-specific prices and promotions on the market share in these regions. In this case, we

    can begin the analysis by directly specifying the odds of market share since it already lies

    between 0 and 1. Thus,

    $$\text{Log}(\text{Odds}(\text{MS}))_t = \text{Log}\left(\frac{MS_t}{1 - MS_t}\right) = x_t'\beta \qquad (8)$$

    Here, for a region $t$, the vector $x_t$ will contain the prices and promotions for that region.

    Measures of Fit

    There are several measures of model fit that are used for testing the suitability of logistic

    regression models. Most of these measures are based around the log-likelihood measure, which

    is as follows.

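    $$LL(\beta) = \sum_t \ln(L_t) \qquad (9)$$

    Here, $L_t$ is the likelihood of observation $t$ and $\beta$ is the entire set of MLE parameters (the intercept and the coefficients of the other explanatory variables).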

    Likelihood Ratio Test

    The most commonly used likelihood ratio test has the following test statistic:

    $$-2\,(LL(C) - LL(\beta)) \qquad (10)$$

    Here, $LL(C)$ refers to the log-likelihood of the data when an intercept-only model is run. Suppose there are $K$ covariates in the model (including the intercept); then the above statistic is distributed $\chi^2$ with $K-1$ degrees of freedom (Theil, 1971). Thus, the test statistic measures whether the increase in the likelihood caused by the inclusion of the explanatory variables (over and above the intercept) is statistically significant.

    In our example, $-2\,LL(\beta)$ is 30.489 while $-2\,LL(C)$ is 137.628. Thus, the test statistic takes a value of 107.139. The degrees of freedom are 4 - 1 = 3. The critical value of a $\chi^2$ with 3 degrees of freedom at the 0.001 level is 16.26. Thus, the likelihood of a model that has the RFM

    measures is significantly better than a model with just the intercept.
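The arithmetic of the likelihood ratio test for this example can be laid out as follows. All numbers are those quoted in the text, including the critical value; the variable names are ours.

```python
# Values reported in the text for the RFM example.
neg2_ll_full = 30.489         # -2 LL(beta), intercept plus RFM variables
neg2_ll_intercept = 137.628   # -2 LL(C), intercept-only model
n_parameters = 4              # intercept + Recency + Frequency + Monetary

# Equation (10): -2 (LL(C) - LL(beta)) = [-2 LL(C)] - [-2 LL(beta)].
lr_statistic = neg2_ll_intercept - neg2_ll_full
degrees_of_freedom = n_parameters - 1

# Critical value of chi-squared with 3 df at the 0.001 level, as quoted above.
critical_value = 16.26
reject_intercept_only = lr_statistic > critical_value

print(f"LR statistic = {lr_statistic:.3f} on {degrees_of_freedom} df")
print("full model significantly better:", reject_intercept_only)
```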

    Akaike Information Criterion (AIC)

    AIC provides a way of adjusting the log-likelihood of a model for the number of

    parameters in the model. This adjustment corrects for overfitting of the data. The expression for this statistic is as follows.

    $$AIC = -2\,LL(\beta) + 2K \qquad (11)$$

    Here, $K$ is the dimension of $\beta$. Lower values of AIC denote a better model. Thus, a model with a very large number of variables might have a high likelihood (i.e., a low $-2\,LL(\beta)$), but it will also be penalized for the number of variables.

    In our example, we can calculate the AIC with the intercept-only model ($AIC_{int}$) and the AIC associated with a model containing the intercept and RFM measures ($AIC_{full}$). These are as follows.

    $$AIC_{int} = 137.628 + 2(1) = 139.628 \qquad (12a)$$

    $$AIC_{full} = 30.489 + 2(4) = 38.489 \qquad (12b)$$

    Thus, the full model has a better (i.e. lower) AIC as compared to the intercept only model.
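The same comparison can be written as a small helper, which makes the penalty term explicit. The function name is ours; the inputs are the values of $-2\,LL$ reported in the text.

```python
def aic(neg2_log_likelihood, n_parameters):
    """Equation (11): AIC = -2 LL(beta) + 2K."""
    return neg2_log_likelihood + 2 * n_parameters


aic_intercept = aic(137.628, 1)  # intercept-only model
aic_full = aic(30.489, 4)        # intercept plus the three RFM variables

preferred = "full" if aic_full < aic_intercept else "intercept-only"
print(f"AIC intercept-only = {aic_intercept:.3f}, AIC full = {aic_full:.3f}")
print("preferred model:", preferred)
```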

    Likelihood Ratio Index ($\rho^2$)

    The likelihood ratio index is similar to the $R^2$ in regular regression models. It is defined as follows.

    $$\rho^2 = 1 - \frac{LL(\beta)}{LL(C)} \qquad (13)$$

    Here, $LL(\beta)$ is -15.24 (= -30.489/2) and $LL(C)$ is -68.81 (= -137.628/2). Thus, the value of $\rho^2$ is 0.78.

    Like the $R^2$, the $\rho^2$ of a model will always increase or at least stay the same when new variables are added. There is another statistic, the adjusted likelihood ratio index ($\bar{\rho}^2$), that penalizes for the increase in the number of parameters. This statistic is similar to the adjusted $R^2$.

    $$\bar{\rho}^2 = 1 - \frac{LL(\beta) - K}{LL(C) - 1} \qquad (14)$$

    In our example, this statistic will be the following.

    $$\bar{\rho}^2 = 1 - \frac{15.24 + 4}{68.81 + 1} = 0.72 \qquad (15)$$
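Both indices can be recomputed from the two log-likelihoods. The sketch below follows Equations (13) and (14) as written in this chapter, using the values of $-2\,LL$ reported earlier; the variable names are ours.

```python
ll_full = -30.489 / 2        # LL(beta), full model
ll_intercept = -137.628 / 2  # LL(C), intercept-only model
k = 4                        # number of parameters in the full model

# Equation (13): likelihood ratio index.
rho_sq = 1 - ll_full / ll_intercept
# Equation (14): adjusted index, in the chapter's form with LL(C) - 1.
rho_sq_adj = 1 - (ll_full - k) / (ll_intercept - 1)

print(f"rho^2 = {rho_sq:.2f}, adjusted rho^2 = {rho_sq_adj:.2f}")
```

Note that both log-likelihoods are negative, so the signs cancel in each ratio.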

    Hit Rate

    Another measure that is typically used to test the fit of a model is the hit rate. For

    computing this measure, we take the predicted probabilities of the events from the logistic

    regression and employ a cutoff value for making discrete predictions for the occurrence of an

    event. We then compare the predicted events with the actual events to determine the percentage

    of times in the dataset the two are the same.

    In our example, the two events are buy / no buy. The results from the logistic regression

    estimation provide the probability of purchase. We set the cutoff at 0.5, i.e., if the predicted probability of purchase for an observation is above 0.5, then we predict a purchase for that observation; otherwise, we predict no purchase. We then compare these predictions with actual events.

    We find that, using the full model, we correctly classify 94 out of the 100 observations. Thus, the

    hit rate is 94 %.

    The measure of hit rate as a statistic for model accuracy has a few limitations. First, the

    cutoff is arbitrary. Here we took a cutoff of 0.5. We could have chosen any other cutoff value as

    well. Second, the hit rate is not very useful when the data is skewed. Suppose we have a dataset

    where there are many observations with no purchase and few observations with purchase. Then a

    model that predicts no purchase for all observations will do well on the hit rate.
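Both limitations can be seen in a short sketch. The probabilities and outcomes below are hypothetical (they are not the RFM data), and the function name is ours.

```python
def hit_rate(predicted_probs, actual, cutoff=0.5):
    """Share of observations whose thresholded prediction matches the outcome."""
    predicted = [1 if p > cutoff else 0 for p in predicted_probs]
    hits = sum(1 for pred, act in zip(predicted, actual) if pred == act)
    return hits / len(actual)


# Hypothetical predicted probabilities and outcomes (not the RFM data).
probs = [0.9, 0.8, 0.65, 0.4, 0.3, 0.1]
outcomes = [1, 1, 0, 1, 0, 0]
print("cutoff 0.5:", hit_rate(probs, outcomes))
print("cutoff 0.7:", hit_rate(probs, outcomes, cutoff=0.7))

# Skewed data: 95 non-purchases and 5 purchases. A "model" that assigns
# everyone a purchase probability of 0 never predicts a purchase.
skewed_outcomes = [0] * 95 + [1] * 5
always_no = [0.0] * 100
print("always 'no purchase':", hit_rate(always_no, skewed_outcomes))
```

The same predicted probabilities give different hit rates under different cutoffs, and the uninformative "always no purchase" rule scores highly on the skewed data even though it never identifies a purchaser.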

    In most applications, the data is also typically split into a calibration sample and a holdout sample. The model is estimated on the calibration sample and is then used to predict the observations in the holdout sample. Almost always, the hit rate within the holdout sample is lower than the hit rate within the calibration sample.

    Thus far, we have considered instances when the dependent variable is binary (or is

    between 0 and 1, e.g., market share) and logistic regression is readily applicable. There are also

    scenarios where the dependent variable can take multiple values. For instance, in the

    antihistamine category, there are 4 major drugs - Claritin, Zyrtec, Allegra and Clarinex. A doctor

    might prescribe one of these drugs to a patient. It is of much interest to pharmaceutical

    companies to quantify the factors which can predict when a doctor is most likely to prescribe

    their drug. Situations that have a multinomial dependent variable cannot be analyzed with a binary

    logistic regression. Next, we describe a method that can analyze such situations.

    Multinomial Logit Model

    Consider the case of a consumer packaged goods manufacturer in the grocery industry.

    The company is interested in predicting which brands their customers will choose on a shopping

    occasion and how prices and promotions might affect this choice. For example, Figure 14.6

    shows the variation in market share of a brand with changes in promotion. In this figure, we find

    that there is an increase in market share (shown in blue) whenever there is a dip in prices (shown

    in red). Further, the presence of various promotional vehicles such as feature, display and

    coupons affects these shares. A quantitative analysis of such a problem can help retailers

    understand the effect of brand promotions (Gupta, 1988), aid in appropriately setting retail prices

    and determine the product portfolio that they should carry (Draganska & Jain, 2005). A

    multinomial logit model is the most popular model to analyze such scenarios. Next, we develop

    this model within a random utility framework.

    [Figure 14.6 about here]

    Random Utility Theory

    Assume that a consumer assigns a level of attractiveness to each discrete alternative in

    her choice set. This attractiveness number for an alternative, a single index, conveys how much

    the consumer likes that alternative. Thus, all the information present in the attributes of the

    alternative is collapsed into this single index. This alternative-specific index is typically called

    utility.

    For an alternative $j$ and time $t$, we will specify the utility ($U_{jt}$) to be composed of two components. One component is called the systematic component (denoted by $V_{jt}$). This is deterministic and contains the effects of covariates on the utility. The second component is called the random component (denoted by $\varepsilon_{jt}$). This contains any other random factors that affect consumers' choice. Thus,

    $$U_{jt} = V_{jt} + \varepsilon_{jt} \qquad (16a)$$

    or,

    $$U_{jt} = x_{jt}'\beta + \varepsilon_{jt} \qquad (16b)$$

    Here, for time $t$, $x_{jt}$ contains covariates associated with alternative $j$ and $\beta$ is a vector of parameters.

    We assume that decision makers choose the alternative that gives them the maximum

    utility. Also, for all alternatives, the random components are independent and identically Gumbel

    distributed. This particular choice of the error distribution leads to the following expression for

    the probability of choice of an alternative j out of the possible J alternatives in a choice set.

    $$P(\text{Choice} = j) = \frac{e^{x_{jt}'\beta}}{\sum_{k=1}^{J} e^{x_{kt}'\beta}} \qquad (17)$$

    The above expression is intuitive to understand. The numerator can be interpreted as the strength of alternative $j$ while the denominator is the sum of the strengths of all alternatives. Thus, the probability expression is essentially the relative strength of alternative $j$. For a detailed description of how this probability expression is derived from the assumptions on the error distributions, see Ben-Akiva and Lerman (1985) or Train (2003).

    The above expression also shows that the logistic regression model is a special case of the multinomial logit model with binary outcomes. Thus, we can also arrive at the expressions for the probabilities of the logistic regression by beginning with a random utility specification for the binary outcomes.

    We can apply the above model to an example from the grocery industry. The data for this example, made available by A.C. Nielsen, were collected from January 1993 to May 1995. We use a sample of 300 people who purchased in the Breakfast Foods category. There are four major brands in this category.² For each brand, we have the price and promotion variation over time, which enter the vector $x_{jt}$. In this application, promotion is a dummy variable created by combining various promotional vehicles such as feature and display. Table 14.10 shows the summary statistics for the Breakfast Foods data.
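Before turning to estimation, the choice probability in Equation (17) can be sketched numerically, along with the claim that logistic regression is its two-alternative special case. The utilities below are hypothetical values standing in for $x_{jt}'\beta$, and the function name is ours.

```python
import math


def choice_probabilities(systematic_utilities):
    """Equation (17): exp(V_j) / sum_k exp(V_k), with V_j standing for x_jt' beta.

    Subtracting the maximum before exponentiating leaves the ratio unchanged
    and guards against overflow.
    """
    v_max = max(systematic_utilities)
    strengths = [math.exp(v - v_max) for v in systematic_utilities]
    total = sum(strengths)
    return [s / total for s in strengths]


# Hypothetical systematic utilities for four brands.
v = [0.4, -0.2, 1.1, 0.0]
probs = choice_probabilities(v)
print("choice probabilities:", [round(p, 4) for p in probs])

# Binary special case: with two alternatives and the second utility fixed
# at 0, the first probability equals the logistic form 1 / (1 + exp(-V)).
v_buy = 0.8
p_buy_mnl = choice_probabilities([v_buy, 0.0])[0]
p_buy_logistic = 1.0 / (1.0 + math.exp(-v_buy))
print("binary logit vs logistic:", p_buy_mnl, p_buy_logistic)
```

The probabilities sum to one, the alternative with the highest systematic utility has the highest probability, and the two-alternative case matches the logistic expression.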

    We estimate the parameters of the multinomial logit model using MLE on this data. Prior to looking at the results, a set of identification conditions has to be discussed. These are restrictions that must be imposed so that the model is identifiable, i.e., only one set of parameters maximizes the likelihood. The restriction corresponds to setting any one of the brand intercepts to zero. This is because only differences in utility matter in specifying which brand a consumer will choose. This can be seen from the following illustration. Suppose the four brands have the following utilities: $U_{1t} = 10$, $U_{2t} = 20$, $U_{3t} = 25$ and $U_{4t} = 30$. Then, a consumer will choose Brand 4 as that has the maximum utility. Now, suppose we add 5 units to each brand-specific utility. Then, the utilities for the four brands will be the following: $U_{1t} = 15$, $U_{2t} = 25$, $U_{3t} = 30$ and $U_{4t} = 35$. This addition of 5 units will not change the chosen brand. A

    consumer will still choose Brand 4. Thus, the absolute values of the utilities do not matter. It is

    only the relative differences in the utilities among the brands that do. We will arbitrarily set the

    intercept of Brand 4 to be zero. Thus, the intercepts of the other three brands will be interpreted

    as being relative to Brand 4. This is similar to the interpretation of a dummy variable in a

    regression.
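    The invariance argument above can be checked numerically. The following is a minimal Python sketch, assuming that the utilities in the illustration are treated as systematic utilities of a multinomial logit model; the function name `logit_probabilities` is introduced here only for illustration.

```python
import math

def logit_probabilities(utilities):
    """Multinomial logit choice probabilities for one choice occasion."""
    # Subtracting the maximum before exponentiating avoids overflow and is
    # itself an instance of the invariance demonstrated below.
    top = max(utilities)
    strengths = [math.exp(u - top) for u in utilities]
    total = sum(strengths)
    return [s / total for s in strengths]

# Utilities of Brands 1 to 4 from the illustration in the text.
base = [10.0, 20.0, 25.0, 30.0]
shifted = [u + 5.0 for u in base]  # add 5 units to every brand

p_base = logit_probabilities(base)
p_shifted = logit_probabilities(shifted)

# Fixing the Brand 4 intercept at zero amounts to measuring every
# utility relative to Brand 4.
normalised = [u - base[3] for u in base]
p_normalised = logit_probabilities(normalised)

print(p_base)
print(p_shifted)
print(p_normalised)
```

    All three sets of probabilities coincide, which is why one intercept has to be fixed before the remaining ones can be estimated.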

    [Table 14.10 about here]

    Table 14.11 contains the MLE estimates of two multinomial logit models. The intercepts-only

    model contains only the brand-specific (i.e., alternative-specific) intercepts.

    The full model contains both the intercepts and the price and

    promotion covariates. The estimates of the full model show that the coefficient of price is

    negative (as it should be) whereas the coefficient for promotion is positive (again as expected).

    While these results are intuitive, a more managerially relevant goal is to estimate the impact of

    changing the price (or promotion) of brand j on the probability of choice of brand j as well as


    on the probability of choosing any other brand k. A variable used for quantifying such effects is

    elasticity.

    [Table 14.11 about here]

    Elasticity from the Logit Model

    The systematic component for a brand j contains price and promotion. Thus,

    $$V_{jt} = \alpha_j + \beta_{price}\,\text{Price}_{jt} + \beta_{prom}\,\text{Prom}_{jt} \qquad (18)$$

    Here, $\alpha_j$ is the intercept for alternative j, $\beta_{price}$ is the price sensitivity and $\beta_{prom}$ is the promotion

    sensitivity. The elasticity of any dependent variable with respect to an independent variable is the

    percent change in the dependent variable following a 1% change in the independent variable. As

    an example, suppose the price of brand j is changed, then the own-price elasticity can be

    ascertained by estimating the percent change in the probability of purchasing brand j after a 1%

    change in its price. Similarly, the cross-price elasticity on a brand k can be evaluated by

    considering the percent change in the probability of purchasing brand k following a 1% change

    in price of brand j.

    For the multinomial logit model, the expressions for the own-price and cross-price

    elasticity are closed-form and are determined by the multinomial logit probabilities. These

    expressions are as follows.

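    $$\eta^{price}_{jj} = \frac{\partial P(j)}{\partial \text{Price}_j}\,\frac{\text{Price}_j}{P(j)} = \beta_{price}\,\text{Price}_j\,(1 - P(j)) \qquad (19a)$$

    $$\eta^{price}_{kj} = \frac{\partial P(k)}{\partial \text{Price}_j}\,\frac{\text{Price}_j}{P(k)} = -\beta_{price}\,\text{Price}_j\,P(j) \qquad (19b)$$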


    Here, $\eta^{price}_{jj}$ denotes the own-price elasticity of brand j and reflects the percentage change in the

    probability of buying brand j with a 1% change in the price of brand j. Similarly, $\eta^{price}_{kj}$ is the cross-price

    elasticity of brand k and reflects the percentage change in the probability of buying brand

    k with a 1% change in the price of brand j. Notice that the cross-price elasticity for

    brand k does not depend on the attributes of brand k. Thus, the cross-price elasticity arising

    from a change in brand j is the same for all other brands. This property, termed uniform

    cross-elasticity, is a consequence of the form of the multinomial logit probabilities.

    Table 14.12 contains the price elasticity measures for the full model. To estimate these

    elasticity measures, we calculate the own and cross price elasticities for each brand and for every

    observation. Then, we average these measures over all observations in the dataset. We can use

    these numbers to interpret the impact of changing prices of a brand on own shares as well as

    shares of other brands. We find that a 1% increase in the price of Brand 1 lowers the probability

    of choosing Brand 1 by about 4.5%. Similarly, a 1% increase in the price of Brand 1 increases

    the probability of choosing each of the other brands by 0.85%. A similar analysis can be conducted for

    the other brands.
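    The averaging procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the parameter values, prices and promotion indicators are hypothetical placeholders (they are not the estimates in Table 14.11 or the Nielsen data), and the function names are introduced here for convenience.

```python
import math

def logit_probabilities(utilities):
    """Multinomial logit probabilities from systematic utilities."""
    top = max(utilities)
    strengths = [math.exp(u - top) for u in utilities]
    total = sum(strengths)
    return [s / total for s in strengths]

def price_elasticities(intercepts, beta_price, beta_prom, prices, promos):
    """Own- and cross-price elasticities for one observation.

    own[j]   = beta_price * prices[j] * (1 - P(j))   (own-price elasticity)
    cross[j] = -beta_price * prices[j] * P(j)        (elasticity of every
               other brand's probability w.r.t. the price of brand j)
    """
    utilities = [a + beta_price * p + beta_prom * d
                 for a, p, d in zip(intercepts, prices, promos)]
    probs = logit_probabilities(utilities)
    own = [beta_price * p * (1.0 - pr) for p, pr in zip(prices, probs)]
    cross = [-beta_price * p * pr for p, pr in zip(prices, probs)]
    return own, cross

# Hypothetical values, chosen only to make the sketch runnable.
intercepts = [0.5, 0.2, 1.0, 0.0]   # Brand 4 intercept fixed at zero
beta_price, beta_prom = -2.0, 0.8
observations = [
    ([2.0, 2.2, 1.9, 2.1], [0, 1, 0, 0]),   # (prices, promotion dummies)
    ([2.1, 2.0, 2.0, 1.8], [1, 0, 0, 1]),
]

# Average the observation-level elasticities over the dataset.
n = len(observations)
avg_own = [0.0] * 4
avg_cross = [0.0] * 4
for prices, promos in observations:
    own, cross = price_elasticities(intercepts, beta_price, beta_prom,
                                    prices, promos)
    for j in range(4):
        avg_own[j] += own[j] / n
        avg_cross[j] += cross[j] / n

print(avg_own)
print(avg_cross)
```

    With a negative price coefficient, the own-price elasticities come out negative and the cross-price elasticities positive, as in the discussion above.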

    [Table 14.12 about here]

    The elasticity measures also show an interesting property. From the summary statistics,

    we know that Brand 3 has the highest share. If we now consider the elasticity measures, we

    notice that Brand 3 has the lowest own-price elasticity and the highest cross-price elasticity. This

    pattern is built into the multinomial logit elasticities: the own-price elasticity is proportional to

    1 - P(j) and the cross-price elasticity is proportional to P(j), so, at comparable prices, high

    market share brands show low own-price elasticity and high cross-price elasticity.

    Note that we showed elasticity measures for the multinomial logit model. Similar

    measures can also be calculated for the logistic regression model.


    Fit Measures

    In this application, we can calculate all the fit measures that we specified in the section

    on logistic regression.

    Likelihood Ratio Test

    A typical likelihood ratio test involves comparing a model with only alternative specific

    intercepts with a model where there are alternative-specific intercepts together with other

    explanatory variables.

    Let LL(C) refer to the log-likelihood of the data when only intercepts are included in the

    model, while LL($\beta$) denotes the log-likelihood when the model contains intercepts together with the

    price and promotion covariates. In our example, from Table 14.11, -2LL($\beta$) is 3033.92 while

    -2LL(C) is 4321.88. Thus, the test statistic takes a value of 4321.88 - 3033.92 = 1287.96. The degrees of freedom are

    5 - 3 = 2. The critical value of a $\chi^2$ with 2 degrees of freedom at the 0.001 level is 13.82. Thus, the

    fit of the model that contains the price and promotion covariates is significantly better than

    that of the model without them.
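    The arithmetic of this test can be reproduced with a short Python sketch that uses the values reported in the text. The closed-form critical value relies on the fact that a chi-square variable with 2 degrees of freedom has survival function exp(-x/2); this shortcut does not apply to other degrees of freedom, and the variable names are assumptions made for this illustration.

```python
import math

# Values reported in the text (Table 14.11).
neg2ll_full = 3033.92        # -2 LL for the full model
neg2ll_intercepts = 4321.88  # -2 LL for the intercepts-only model
params_full = 5              # 3 free intercepts + price + promotion
params_intercepts = 3        # 3 free brand intercepts

lr_statistic = neg2ll_intercepts - neg2ll_full
degrees_of_freedom = params_full - params_intercepts

def chi2_critical_value_2df(alpha):
    """Upper-tail critical value of a chi-square with 2 degrees of freedom.

    For 2 degrees of freedom, P(X > x) = exp(-x / 2), so the critical
    value is -2 * ln(alpha).
    """
    return -2.0 * math.log(alpha)

critical = chi2_critical_value_2df(0.001)
reject_null = lr_statistic > critical

print(lr_statistic, degrees_of_freedom, critical, reject_null)
```

    The statistic far exceeds the critical value, so the intercepts-only model is rejected in favor of the full model.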

    Likelihood Ratio Index ($\rho^2$)

    The likelihood ratio index is defined as follows.

    $$\rho^2 = 1 - \frac{LL(\beta)}{LL(C)} \qquad (20)$$

    In the current application, LL($\beta$) is -1516.96 and LL(C) is -2160.94. Thus, $\rho^2$ is 0.30.

    As explained earlier, the adjusted $\bar{\rho}^2$ has the following expression.

    $$\bar{\rho}^2 = 1 - \frac{LL(\beta) - K}{LL(C) - P} \qquad (21)$$

    Here, K is the total number of parameters including the intercepts and other covariates while P is

    the number of intercepts. Thus, $\bar{\rho}^2$ is 0.29.
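    As a check on these two indices, the following Python sketch plugs the reported log-likelihoods into equations (20) and (21), taking K = 5 and P = 3 as in the likelihood ratio test above. The variable names are assumptions made for this illustration.

```python
ll_full = -1516.96        # log-likelihood of the full model
ll_intercepts = -2160.94  # log-likelihood of the intercepts-only model
k_params = 5              # K: intercepts plus price and promotion
p_intercepts = 3          # P: number of free intercepts

# Equation (20): likelihood ratio index.
rho_sq = 1.0 - ll_full / ll_intercepts

# Equation (21): adjusted index, penalising the number of parameters.
adj_rho_sq = 1.0 - (ll_full - k_params) / (ll_intercepts - p_intercepts)

print(round(rho_sq, 4), round(adj_rho_sq, 4))
```

    The unadjusted index evaluates to roughly 0.298 and the adjusted index to roughly 0.297, which agrees with the 0.29 quoted above if that figure is truncated rather than rounded.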


    Hit Rate

    In a multinomial logit model, the probabilities of choice of each alternative have a closed

    form expression. The predicted probabilities for choosing each alternative can then be easily

    calculated by inserting the MLE estimates in the probability expressions. We can then predict the

    alternative that is most likely to be chosen (brand with the highest probability) and compare it

    with the brand that is actually chosen. If the two are the same, we have a hit (i.e., a correct

    prediction); otherwise, the prediction is a miss. We calculate the hit rate for the intercepts-only model

    and the full model. Table 14.11 reports these results. We find that the hit rate for an intercepts

    only model is around 48.3 % while the hit rate for the full model is considerably higher at 63.2

    %.
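    The hit-rate calculation can be sketched in Python as follows. The predicted probabilities and chosen brands below are hypothetical and serve only to make the example runnable; in practice the probabilities would come from inserting the MLE estimates into the logit expression.

```python
def hit_rate(predicted_probs, chosen):
    """Share of observations where the most probable brand was chosen."""
    hits = 0
    for probs, actual in zip(predicted_probs, chosen):
        # Predicted brand = alternative with the highest probability.
        predicted = max(range(len(probs)), key=lambda j: probs[j])
        if predicted == actual:
            hits += 1
    return hits / len(chosen)

# Hypothetical predicted probabilities for four brands on five occasions.
predicted_probs = [
    [0.10, 0.20, 0.60, 0.10],
    [0.40, 0.30, 0.20, 0.10],
    [0.25, 0.25, 0.30, 0.20],
    [0.05, 0.15, 0.30, 0.50],
    [0.30, 0.35, 0.25, 0.10],
]
chosen = [2, 0, 1, 3, 1]  # zero-based index of the brand actually chosen

rate = hit_rate(predicted_probs, chosen)
print(rate)
```

    In this toy example four of the five predictions match the observed choice.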

    Independence of Irrelevant Alternatives (I.I.A.)

    The multinomial logit model has several properties. One property that we discussed was

    the uniform cross-elasticity. Another property that has been especially emphasized is that of

    I.I.A. The property can be best illustrated by revisiting the expressions for the probabilities from

    the logit model.

    Suppose we consider the probability of choice of two alternatives, i and j, denoted by P(i)

    and P(j) respectively, then,

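    $$\frac{P(i)}{P(j)} = \frac{e^{x_{it}\beta} \big/ \sum_{k=1}^{J} e^{x_{kt}\beta}}{e^{x_{jt}\beta} \big/ \sum_{k=1}^{J} e^{x_{kt}\beta}} = \frac{e^{x_{it}\beta}}{e^{x_{jt}\beta}} \qquad (22)$$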


    Equation (22) shows that the ratio of the probabilities of choosing two alternatives, i and j, is

    independent of the presence of other alternatives and depends only on the systematic utilities

    of the two alternatives. Thus, even if a new alternative very similar to i enters the market,

    it will not make a difference to the relative probabilities of choosing i and j. This result is a

    direct consequence of the independence assumption among the errors of the alternative-specific

    utilities. This assumption can be tenuous in many contexts. The following problem

    illustrates one such context.

    There is a famous problem, called the red bus/ blue bus problem, which illustrates the

    I.I.A. issue. The problem is as follows. Suppose consumers are choosing between a car and a

    blue bus as means of transportation and suppose they equally like both modes of transport. The

    probability of choosing either a car or a blue bus is 0.5. In other words,

    P(choose car) / P(choose blue bus) = 1. (23)

    Recall, the I.I.A. property dictates that this ratio should remain the same irrespective of the

    choice set. Now, suppose a red bus, similar to the blue bus in all respects except the color, is

    introduced as a means of transport. Then, we would expect that consumers will be equally likely

    to choose a red or a blue bus. This equality together with the above equality will imply the

    following.

    P(choose car) = P(choose blue bus) = P(choose red bus) = 1/3. (24)

    This result is not appealing as consumers will most likely consider both bus types as

    one alternative. If this is the case, then the following probabilities are more reasonable.

    P(choose car) = 1/2 ; P(choose red bus) = P(choose blue bus) = 1/4. (25)
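    The logit prediction in the red bus / blue bus problem can be reproduced with a small Python sketch. It assumes equal systematic utilities for all modes (the value zero is arbitrary), and the function name is introduced here only for illustration.

```python
import math

def logit_probabilities(utilities):
    """Multinomial logit probabilities for a dict of systematic utilities."""
    top = max(utilities.values())
    strengths = {name: math.exp(v - top) for name, v in utilities.items()}
    total = sum(strengths.values())
    return {name: s / total for name, s in strengths.items()}

# Before the red bus is introduced: car and blue bus are equally liked.
before = logit_probabilities({"car": 0.0, "blue bus": 0.0})

# After: a red bus identical to the blue bus enters the choice set.
after = logit_probabilities({"car": 0.0, "blue bus": 0.0, "red bus": 0.0})

ratio_before = before["car"] / before["blue bus"]
ratio_after = after["car"] / after["blue bus"]

print(before)
print(after)
print(ratio_before, ratio_after)
```

    The car-to-blue-bus ratio stays at 1, so the logit model assigns each mode a probability of 1/3 rather than the more plausible split of 1/2, 1/4 and 1/4.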

    Thus, the I.I.A. property can constrain the probabilities in such a way that in some

    contexts, we can get results that are unrealistic. There are several ways of correcting this


    problem. One alternative is to allow for a tree structure for consumer choice. We can achieve this

    with a nested logit model (Ben-Akiva, 1973) that allows for correlation among the utilities of

    alternatives only within a nest. A second alternative is to allow for heterogeneity in customers'

    parameters; then, at the aggregate level, the IIA property disappears (see Chapter 19 of this book). A

    third method is to allow for the brand utilities to be correlated, as is done by the multinomial

    probit model. We discuss this last method later.

    Sampling of Alternatives

    In the above analysis, we just had four alternatives. There are many instances, however,

    where the number of alternatives can be much larger. For example, if retail store managers want

    to evaluate the effect of price and promotion at the UPC level rather than focusing at the brand

    level, then the number of alternatives can be in the hundreds. In that case, evaluating the

    denominator (the sum of strengths of all alternatives) in the probability expression of choosing a

    particular alternative can become computationally burdensome. One method for circumventing this problem is to sample

    a set of alternatives from the entire set of possible alternatives and then evaluate the probabilities.

    The following example illustrates this method.

    Suppose we wish to model consumers choosing a mutual fund from all available mutual

    funds. There are many mutual funds that consumers can choose from - the latest figures suggest

    that there are more than 8000 mutual funds in the US alone (Investment Company Institute,

    2005). We definitely cannot use all 8000 or more of these funds while evaluating the probability

    of choosing a specific one. What we can do is to randomly sample a small number of these

    mutual funds, for example 10, to form the set of alternatives. While sampling, for each

    observation we have to ensure that the mutual fund chosen for that observation is in the

    constructed set of alternatives (else how could the consumer have chosen that mutual fund if it


    were not in her set of alternatives?). To ensure this, we include the mutual fund that was chosen

    on that observation and randomly sample 9 others from the rest of alternatives. We can then

    estimate the parameters of the model in exactly the same manner as described above in the

    Breakfast Foods example. Because of the I.I.A. property, the MLE parameter estimates from using a set of alternatives

    constructed from such a random sampling scheme are consistent for the same parameters as those from using

    all the alternatives, although they will generally not be numerically identical in a finite sample. For a more detailed description of sampling, see Ben-Akiva and Lerman

    (1985).
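    The construction of a sampled choice set can be sketched in Python as follows. The fund identifiers, the chosen fund and the function name are hypothetical; the sketch only illustrates the rule of keeping the chosen alternative and drawing the remaining ones uniformly at random.

```python
import random

def sample_choice_set(chosen, all_alternatives, set_size, rng):
    """Chosen alternative plus (set_size - 1) others drawn uniformly,
    without replacement, from the remaining alternatives."""
    others = [a for a in all_alternatives if a != chosen]
    return [chosen] + rng.sample(others, set_size - 1)

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical universe of 8000 mutual funds.
funds = [f"fund_{i}" for i in range(8000)]
chosen_fund = "fund_4242"  # hypothetical fund chosen on this observation

choice_set = sample_choice_set(chosen_fund, funds, 10, rng)
print(choice_set)
```

    A new choice set would be drawn in this way for every observation before the logit likelihood is evaluated over the sampled alternatives only.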

    There are other methods for sampling of alternatives, such as importance sampling. This

    sort of sampling scheme is typically used when there is a need to oversample an alternative. For

    example, in the previous example of mutual funds, suppose we find that many consumers are

    choosing a few mutual funds. A sampling scheme should then take these skewed choices into

    account when selecting samples. An importance sampling scheme does exactly that. Here, while

    estimating the model, a correction factor is included to account for the non-uniform sampling.

    Train, Ben-Akiva, and Atherton (1989) show an application of such a sampling scheme in the

    context of consumers choosing long distance plans and minutes of consumption.

    Multinomial Probit Model

    There are several instances when there is a need to allow the utility errors of the

    alternatives to be correlated. For example, consumers typically choose between different modes

    of transport such as bus, car, train and others. They can also be using a combination of these

    alternatives for commuting e.g., a mix of car and train (Currim, 1982). In such a scenario, the

    errors in the utility of choosing a car, train and the alternative representing a combination of car

    and train can be correlated (i.e., cannot be assumed to be independent). Clearly, we need a model

    that is flexible enough to capture any possible correlation.


    A multinomial probit model allows the utility errors to be correlated and to have

    different variances (i.e., different scales for different alternatives). It also requires several

    identification restrictions (Keane, 1992). We show these restrictions in the simplest setting: a

    choice model with three alternatives. The utilities for the three alternatives are given as follows.

    $$U_{1t} = \alpha_1 + x_{1t}\beta_1 + \varepsilon_{1t}$$

    $$U_{2t} = \alpha_2 + x_{2t}\beta_2 + \varepsilon_{2t}$$

    $$U_{3t} = \alpha_3 + x_{3t}\beta_3 + \varepsilon_{3t} \qquad (26)$$

    Here, we have intentionally separated the intercepts from the other covariates to show the

    identification conditions. Also note, we have assumed a general model where the parameters for

    the covariates are alternative-specific. The errors are assumed to have the following

    distributional specification.

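    $$\begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \\ \varepsilon_{3t} \end{pmatrix} \sim N\!\left( 0,\; \begin{pmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma_{22} & \sigma_{23} \\ \sigma_{31} & \sigma_{32} & \sigma_{33} \end{pmatrix} \right) \qquad (27)$$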

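    As only differences in utilities matter, we can rewrite the above utilities in the following manner.

    $$Y_{1t} = U_{1t} - U_{3t} = (\alpha_1 - \alpha_3) + (x_{1t}\beta_1 - x_{3t}\beta_3) + (\varepsilon_{1t} - \varepsilon_{3t})$$

    $$Y_{2t} = U_{2t} - U_{3t} = (\alpha_2 - \alpha_3) + (x_{2t}\beta_2 - x_{3t}\beta_3) + (\varepsilon_{2t} - \varepsilon_{3t}) \qquad (28)$$

    Let $\nu_{1t}$ be $(\varepsilon_{1t} - \varepsilon_{3t})$ and $\nu_{2t}$ be $(\varepsilon_{2t} - \varepsilon_{3t})$; then the joint distributional specification is as

    follows, where $\omega_{11}$, $\omega_{12}$, $\omega_{21}$ and $\omega_{22}$ denote the elements of the covariance matrix of the differenced errors.

    $$\begin{pmatrix} \nu_{1t} \\ \nu_{2t} \end{pmatrix} \sim N\!\left( 0,\; \begin{pmatrix} \omega_{11} & \omega_{12} \\ \omega_{21} & \omega_{22} \end{pmatrix} \right) \qquad (29)$$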


    We now state the identification conditions. First, note that only the differences in the intercepts, ($\alpha_1 - \alpha_3$)

    and ($\alpha_2 - \alpha_3$), enter $Y_{1t}$ and $Y_{2t}$. Thus, it is only these differences among the intercepts and not

    their absolute values that are estimable. We can, therefore, without loss of generality set $\alpha_3$ to 0.

    Second, unlike a linear regression where the dependent variable is observable, utilities are latent

    (i.e., unobservable) and we have to set their scale. We do this by setting one of the variances of the differenced errors ($\omega_{11}$ or

    $\omega_{22}$) to 1. Let $\omega_{11}$ be set to 1; then only $\omega_{12}$ and $\omega_{22}$ are estimable. Here, the parameter $\omega_{12}$ captures

    any correlation between the differenced utilities and, therefore, the IIA problem is no longer a

    concern. In empirical applications, the estimate of $\omega_{12}$ will suggest whether there is correlation

    present among the utilities. If, in an application, the estimate is significantly different from zero,

    then a multinomial logit model is inappropriate for that application as it does not

    allow for utilities to be correlated.

    Note that not all parameters of the original covariance matrix of the non-differenced

    utilities are identified. In general, if there are J alternatives then the original covariance matrix

    contains J(J+1)/2 parameters. Of these, upon taking the difference of utilities and imposing the

    identification conditions, only J(J-1)/2 - 1 parameters are identified (Train, 2003). In the above

    formulation, we had 3 alternatives; thus the original covariance matrix has (3)(4)/2 = 6

    parameters. Of these, only (3)(2)/2 - 1 = 2 parameters are identified.

    We now consider an application where a trinomial probit choice model is applied. This

    application is from Keane (1992). The application considers the employment choices of men and

    models three choices: manufacturing (M), nonmanufacturing (NM) and unemployment. The

    data for this model are from a national longitudinal survey of men. Table 14.13 contains a

    description of the independent variables. In this application, the intercept for unemployment is

    set to zero and the variance for the utility for M is set to 1. Note that independent variables are


    allowed to have different effects on M and NM. For example, the model allows education to

    have different effects on manufacturing and nonmanufacturing. Finally, there are two sets of

    parameters. One set is estimated when the correlation among the utilities ($\omega_{12}$) is

    set to 0 and the variance of NM is set to 1 (Model-1). The other set is obtained

    when both the correlation and the variance are estimated (Model-2).

    [Table 14.13 about here]

    The results show that Model-2 is marginally better than Model-1 in terms of the log-

    likelihood. Further, the correlation is positive and significantly different from zero. Also note

    that there are differences in the estimated parameters of Model-1 and Model-2. This implies that

    allowing for a correlation among the utilities clearly affects the estimation of other parameters in

    the model. The positive correlation also suggests that a multinomial logit model may not have

    been appropriate for this setting as it would have failed to capture this correlation.

    The multinomial probit model provides a flexible way of capturing the correlations that

    might be present among the utilities. This alleviates the IIA problem that is inherent in the

    multinomial logit model.
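    To illustrate why correlated errors help, the following Python sketch simulates the red bus / blue bus setting with normally distributed errors. It is a stylized illustration under stated assumptions, not an estimated probit model: all systematic utilities are equal, all error variances are fixed at 1, the car error is independent of the bus errors, and the function name and the correlation values are chosen here for illustration only.

```python
import math
import random

def simulate_car_share(rho, n_draws=20000, seed=1):
    """Simulated probability of choosing the car among car, blue bus and
    red bus when the two bus errors have correlation rho."""
    rng = random.Random(seed)
    car_chosen = 0
    for _ in range(n_draws):
        e_car = rng.gauss(0.0, 1.0)
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        e_blue = z1
        # Build the red bus error so that corr(e_blue, e_red) = rho.
        e_red = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
        # Systematic utilities are equal, so only the errors decide.
        if e_car > e_blue and e_car > e_red:
            car_chosen += 1
    return car_chosen / n_draws

share_independent = simulate_car_share(0.0)    # uncorrelated bus errors
share_correlated = simulate_car_share(0.999)   # nearly identical buses

print(share_independent, share_correlated)
```

    With uncorrelated errors the car share is close to 1/3, as under I.I.A.; with almost perfectly correlated bus errors it moves close to 1/2, the intuitively reasonable value.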

    Tobit Analysis

    Tobit models are part of a general class of models for analyzing censored data. These

    types of data are encountered when, for a large number of observations, the dependent variable is

    clustered at a certain value, such as zero (Tobin, 1958). For example, in a large-scale study of the number

    of hours that married women work, it was found that about 66% of respondents reported zero

    hours (Greene & Quester, 1982). We will show that analyzing such censored data without

    accounting for censoring leads to biased estimates.


    There are many other scenarios where such censoring is observed. For example, in

    grocery settings consumers either do not purchase a brand or purchase a positive quantity (Jedidi,

    Ramaswamy, & DeSarbo, 1993; Tellis, 1988). Technically, then, demand modeling should

    employ a tobit modeling framework as quantity is inherently non-negative (i.e., is censored) and

    has a cutoff at zero. There are several ways of modeling such a demand situation. If the focus is

    on modeling the demand of a single alternative then a censored regression is typically the chosen

    method. We will show an example of this methodology. If, however, the focus is on modeling

    both the choice of an alternative and the quantity demanded subsequent to the choice, then a two-stage

    regression is usually adopted (Tellis, 1988). In this framework, the choice of alternative

    and the quantity demanded are assumed to be interconnected, i.e., the errors in the utility of

    alternatives are correlated with the error in the demand model. This correlation captures any

    selectivity bias (Heckman, 1979). For example, consumers may buy more of their preferred

    brand but less of a brand that is chosen on a promotion.

    We now illustrate a censored regression analysis. In general, a censored regression can be

    expressed as follows.

    $$q_t^* = x_t\beta + \varepsilon_t \qquad (30)$$

    Here, for an observation t, the random variable $q_t^*$ is a partially observable (latent) variable. The error $\varepsilon_t$

    is normally distributed, $N(0, \sigma^2)$. The observed value of this variable is $q_t$, the quantity observed

    for observation t, when $q_t^*$ is greater than zero. The observed value is zero if $q_t^*$ is less than or equal to zero.

    In other words,

    $$q_t = \begin{cases} q_t^* & \text{if } q_t^* > 0 \\ 0 & \text{if } q_t^* \le 0 \end{cases} \qquad (31)$$


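    The expected value of $q_t$ is $E(q_t) = E(q_t \mid q_t^* > 0)\,P(q_t^* > 0)$. Thus,

    $$E(q_t \mid q_t^* > 0) = x_t\beta + E(\varepsilon_t \mid \varepsilon_t > -x_t\beta) = x_t\beta + \sigma\,\frac{\phi(x_t\beta/\sigma)}{\Phi(x_t\beta/\sigma)} \qquad (32)$$

    Here, $\phi(\cdot)$ and $\Phi(\cdot)$ are the density and the cumulative distribution function, respectively, of

    the standard normal distribution. The above equation is similar to a standard OLS model with an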

    additional term that corrects for censoring. We can estimate the above model together with the

    correction factor to yield unbiased estimates. Notice, that if we did not include the correction

    factor then a regular OLS estimation will lead to biased estimates due to the omitted variable.
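    The correction term in equation (32) can be evaluated numerically with a few lines of Python. This is a minimal sketch, assuming the index $x_t\beta$ and the error standard deviation $\sigma$ are already known; the function names and the input values are hypothetical and serve only as an illustration.

```python
import math

def std_normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_expectations(xb, sigma):
    """Return (E[q | q* > 0], E[q]) for index xb and error s.d. sigma."""
    z = xb / sigma
    # Ratio of density to distribution function: the censoring correction.
    correction = std_normal_pdf(z) / std_normal_cdf(z)
    conditional = xb + sigma * correction
    # E(q) = E(q | q* > 0) * P(q* > 0), with P(q* > 0) = Phi(xb / sigma).
    unconditional = conditional * std_normal_cdf(z)
    return conditional, unconditional

cond, uncond = tobit_expectations(xb=0.0, sigma=1.0)
cond_far, _ = tobit_expectations(xb=10.0, sigma=1.0)

print(cond, uncond, cond_far)
```

    When the index is far above zero, censoring is rare, the correction term vanishes and the conditional mean is essentially the OLS mean; near zero the correction is substantial.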

    Bomberger (1993) used the above censored regression model to estimate the impact of

    income and wealth on household deposits. There were 4262 households in the dataset out of

    which 290 households had no deposits. Therefore, for these households the dependent variable is

    censored at zero.

    Bomberger estimates the above model and compares the results with an OLS regression.

    Table 14.14 shows these results. We can make several observations from these estimates. First,

    the intercept takes very different values in the two equations. Second, we find that wealth is

    marginally significant in the tobit model while it is non-significant in the OLS regression. This

    implies that a failure to properly model the censoring can alter both the sign and significance of

    the estimates.

    [Table 14.14 about here]


    Conclusions

    In this chapter, we discussed several methods that are applicable in scenarios wherein the

    dependent variable is either discrete (e.g. choice of brand) or constrained in such a manner (e.g.

    market share) that a linear regression with OLS estimation fails to be the best alternative.

    We began the chapter with a discussion on discriminant analysis. We showed that this

    method is applicable for a discrete dependent variable (predetermined groups). In this context,

    we also showed how to determine the independent variables that best discriminate among groups

    and how to calculate their relative importance for discrimination.

    Next, we discussed logistic regression. We showed that this method is suitable for both

    binary discrete dependent variables (e.g. buy/ no buy situation) and dependent variables that are

    between 0-1 (e.g. market share). Thus, this method is applicable for a wider set of situations than

    a two-group discriminant analysis.

    We extended the logistic regression model to multinomial choice models that are suitable

    for scenarios with a dependent variable that can take multiple values. In this context, we showed

    the multinomial logit and probit models. The former model is the most frequently used choice

    model as it provides closed-form probability expressions. It has a few limitations: it suffers

    from the IIA property and the elasticity expressions are constrained to show a particular

    substitution pattern among alternatives (e.g. own price elasticity is smaller for higher share

    brands). The multinomial probit model alleviates the IIA problem but at the expense of closed

    form probability expressions. We noted that for applications where the unobserved factors

    affecting the available alternatives are correlated (e.g., the red bus / blue bus problem), a

    multinomial probit model is more appropriate than a multinomial logit model.


    We ended the chapter with a discussion of censored regression models or tobit models.

    These models are a combination of a binary probit and a multiple regression and are applicable

    in a wide range of scenarios where there is censoring of the data (e.g. demand of a good).


    ENDNOTES

    1. This is a synthetic dataset. For generating this data, we set the sensitivities to Recency,

    Frequency and Monetary at 2.3, 0.3 and 0.1 respectively.

    2. We use brand and alternative interchangeably in this example. Here the four alternatives

    correspond to four different brands.


    REFERENCES

    Bellman, S., Lohse, G. L., & Johnson, E. J. (1999). Predictors of online buying behavior. Communications of the ACM, 42(12), 32-38.

    Ben-Akiva, M. (1973). Structure of passenger travel demand models. Ph.D. dissertation, Department of Civil Engineering, MIT, Cambridge, MA.

    -----, & Lerman, S. (1985). Discrete choice analysis: Theory and application to travel demand. Cambridge, MA: MIT Press.

    Bomberger, W. A. (1993). Income, wealth and household demand for deposits. The American Economic Review, 83(4), 1034-1044.

    Currim, I. S. (1982). Predictive testing of consumer choice models not subject to independence of irrelevant alternatives. Journal of Marketing Research, 19, 208-222.

    Draganska, M., & Jain, D. (2005). Product line length as a competitive tool. Journal of Economics and Management Strategy, 14(1), 1-28.

    Greene, W. H., & Quester, A. (1982). Divorce risk and wives' labor supply behavior. Social Science Quarterly, 63, 16-27.

    Gupta, S. (1988). Impact of sales promotions on when, what, and how much to buy. Journal of Marketing Research, 25, 342-355.

    Heckman, J. (1979). Sample selection bias as a specification error. Econometrica, 47, 153-161.

    Investment Company Institute (2005). ICI Factbook. Retrieved October 16, 2005, from http://www.ici.org/.

    Jedidi, K., Ramaswamy, V., & DeSarbo, W. S. (1993). A maximum likelihood method for latent class regression involving a censored dependent variable. Psychometrika, 58(3), 375-394.


    Johnson, R. A., & Wichern, D. W. (2002). Applied multivariate statistical analysis. Upper Saddle River, NJ: Prentice Hall.

    Keane, M. P. (1992). A note on identification in the multinomial probit model. Journal of Business & Economic Statistics, 10(2), 193-200.

    Lehmann, D. R., Gupta, S., & Steckel, J. H. (1998). Marketing research. New York: Addison-Wesley.

    Tellis, G. J. (1988). Advertising exposure, loyalty and brand purchase: A two-stage model of choice. Journal of Marketing Research, 25, 134-144.

    Theil, H. (1971). Principles of econometrics. New York: Wiley.

    Tobin, J. (1958). Estimation of relationships for limited dependent variables. Econometrica, 26, 24-36.

    Train, K. (2003). Discrete choice methods with simulation. Cambridge, UK: Cambridge University Press.

    -----, Ben-Akiva, M., & Atherton, T. (1989). Consumption patterns and self-selecting tariffs. Review of Economics and Statistics, 71(1), 62-73.


    Table 14.1

    Southern Versus Non-Southern States

                             Variable Means          One-Way    Discriminant Function
    Variable                 South      Non-South    F          Unstandardized   Standardized
    Average Income              4.95       5.91      17.13         -0.43            -0.32
    Population                  4.45       4.19       0.05          0.92             4.15
    Population Change           1.37       1.19       0.30         -0.37            -0.39
    Percent Urban              57.00      58.37       0.03          0.02             0.33
    Tax Per Capita            464.13     618.23      26.97         -0.01            -0.87
    Government Expen.         286.13     281.54       0.00          0.00             0.75
    College Enrollment        165.20     192.45       0.14         -0.01            -2.03
    Mineral Production       2006.27     610.49       6.84          0.00             0.03
    Forest Acres               15.93      14.70       0.05          0.01             0.15
    Manuf. Output               6.77       8.66       0.40         -0.27            -2.65
    Farm Receipts            1801.73    1943.37       0.06         -0.00            -0.40

    Chi-Square               42.71
    Degrees of Freedom       11
    Canonical Correlation    0.80
    Wilks' Lambda            0.36

    Source: Lehmann, Gupta and Steckel, Marketing Research, page 668 (Addison-Wesley Educational Publishers Inc., 1998)

    Table 14.2

    Hit Miss Table

                      Predicted Group
    Actual Group      South    Non-South
    South               14         1
    Non-South            3        32

    Source: Lehmann, Gupta and Steckel, Marketing Research, page 668 (Addison-Wesley Educational Publishers Inc., 1998)


    Table 14.3

    Averages for the Five Food Expenditure Groups

                                    Group
    Variables                       1         2          3          4          5
                                    < $15     $15-$29    $30-$44    $45-$59    > $60
    Education of wife               3.32      4.11       4.29       4.47       4.49
    Education of husband            2.79      3.75       4.08       4.57       4.69
    Age                             4.09      3.46       3.06       2.50       2.72
    Income                          1.62      2.06       2.75       3.47       3.75
    Family size                     2.09      2.52       3.13       4.14       5.11
    How often they shop             1.91      2.18       2.27       2.29       2.62
    Number of brands shopped for    1.82      2.25       2.34       2.25       2.72
    Information sought              1.91      1.91       1.81       1.84       1.87
    Sample size                     34        284        293        181        61

    Source: Lehmann, Gupta and Steckel, Marketing Research, page 670 (Addison-Wesley Educational Publishers Inc., 1998)

    Table 14.4

    Discriminant Functions

                                    Unstandardized Coeff.              Standardized Coeff.
    Variables                       1       2       3       4         1       2       3       4
    Education of wife               0.02    0.01   -0.56    0.42      0.02    0.01   -0.70    0.52
    Age                            -0.01    0.55    0.20   -0.13     -0.01    0.81    0.29   -0.20
    Income                         -0.25   -0.29    0.21   -0.62     -0.41   -0.43    0.30   -0.89
    Family size                    -0.58    0.40    0.21    0.38     -0.77    0.56    0.29    0.53
    How often they shop            -0.29    0.35   -0.80   -0.25     -0.20    0.24   -0.58   -0.18
    Number of brands shopped for   -0.01    0.26   -0.28   -0.30     -0.01    0.37   -0.37   -0.43
    Constant                        3.19   -3.62    2.93    0.36      -       -       -       -

    Source: Lehmann, Gupta and Steckel, Marketing Research, page 687 (Addison-Wesley Educational Publishers Inc., 1998)

    Table 14.5

    Means of Groups

              Functions
    Groups    1        2        3        4
    1         1.03     0.16     0.65     -0.06
    2         0.59     0.06     -0.06    0.05
    3         0.04     -0.06    -0.07    -0.07
    4         -0.71    -0.19    0.09     0.05
    5         -1.44    0.47     -0.01    -0.01

    Source: Lehmann, Gupta and Steckel, Marketing Research. Page 687 (Addison-Wesley Education Publishers Inc., 1998)
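Because a discriminant function is linear, a group's mean score equals the function evaluated at the group's mean values. The sketch below applies the unstandardized coefficients of function 1 (Table 14.4) to the group averages (Table 14.3) and compares the result with the means reported in Table 14.5. The variable and function names are illustrative, and small gaps are expected because the published coefficients are rounded to two decimals.

```python
# Group averages from Table 14.3 for the six variables used in Table 14.4, in
# the order: education of wife, age, income, family size, how often they shop,
# number of brands shopped for.
group_means = {
    1: [3.32, 4.09, 1.62, 2.09, 1.91, 1.82],
    2: [4.11, 3.46, 2.06, 2.52, 2.18, 2.25],
    3: [4.29, 3.06, 2.75, 3.13, 2.27, 2.34],
    4: [4.47, 2.50, 3.47, 4.14, 2.29, 2.25],
    5: [4.49, 2.72, 3.75, 5.11, 2.62, 2.72],
}

# Unstandardized coefficients and constant of function 1 (Table 14.4).
coef_f1 = [0.02, -0.01, -0.25, -0.58, -0.29, -0.01]
const_f1 = 3.19

# Group means on function 1 as reported in Table 14.5.
reported_f1 = {1: 1.03, 2: 0.59, 3: 0.04, 4: -0.71, 5: -1.44}


def discriminant_score(x, coef, const):
    """Linear discriminant score: constant plus the weighted sum of the variables."""
    return const + sum(c * v for c, v in zip(coef, x))


scores_f1 = {g: discriminant_score(x, coef_f1, const_f1)
             for g, x in group_means.items()}

for g in sorted(scores_f1):
    print(f"group {g}: computed {scores_f1[g]:+.2f}, reported {reported_f1[g]:+.2f}")
```

The scores fall steadily from group 1 to group 5, so function 1 orders the groups by food expenditure, driven mainly by income and family size.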

    Table 14.6

    Hit Miss Table (Multiple Discriminant Analysis)

                    Predicted Group
    Actual Group    1      2      3      4      5
    1               20     13     1      0      0
    2               86     106    59     24     9
    3               50     65     90     57     31
    4               7      7      33     84     50
    5               2      1      6      12     40

    Source: Lehmann, Gupta and Steckel, Marketing Research. Page 687 (Addison-Wesley Education Publishers Inc., 1998)
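A hit-miss table is usually summarized by its hit rate, the share of cases on the diagonal. The sketch below computes it for Table 14.6 and checks that the row totals match the sample sizes in Table 14.3. The two chance benchmarks (maximum chance and proportional chance) are common yardsticks added here for comparison; they are not taken from this excerpt.

```python
# Hit-miss counts from Table 14.6: rows are actual groups 1-5,
# columns are predicted groups 1-5.
hit_miss = [
    [20, 13, 1, 0, 0],
    [86, 106, 59, 24, 9],
    [50, 65, 90, 57, 31],
    [7, 7, 33, 84, 50],
    [2, 1, 6, 12, 40],
]

group_sizes = [sum(row) for row in hit_miss]   # actual group sizes (row totals)
n_total = sum(group_sizes)
n_hits = sum(hit_miss[i][i] for i in range(len(hit_miss)))

hit_rate = n_hits / n_total

# Benchmarks (assumed for illustration, not from the chapter):
# always predicting the largest group, and guessing in proportion to group size.
max_chance = max(group_sizes) / n_total
proportional_chance = sum((n / n_total) ** 2 for n in group_sizes)

print(f"group sizes:         {group_sizes}")
print(f"hit rate:            {hit_rate:.3f}")
print(f"maximum chance:      {max_chance:.3f}")
print(f"proportional chance: {proportional_chance:.3f}")
```

Most misclassified cases land in a neighbouring expenditure group, which is what one would expect when the groups are ordered along the first discriminant function.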

    Table 14.7

    Summary Statistics for the RFM Data

    Variable     Mean     Std. Dev.
    Recency      3.87     2.06
    Frequency    7.80     2.74
    Monetary     73.14    28.87
    Choice       0.45

    Table 14.8

    Parameter Estimates for OLS Regression

    Variable       Estimate    Std. Error
    Intercept*     -0.92       0.13
    Recency*       0.15        0.01
    Frequency      0.02        0.01
    Monetary*      0.01        0.001

    R2             0.61
    Adjusted R2    0.60

    *Significant at the 0.05 significance level.
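The OLS estimates above define a linear probability model, and nothing keeps its predictions inside the 0 to 1 range. The sketch below evaluates the fitted line for three customers. The low and high customers are hypothetical values chosen for illustration, not rows of the chapter's data; the middle one uses the sample means from Table 14.7.

```python
# OLS estimates from Table 14.8 (dependent variable: Choice, coded 0/1).
ols = {"intercept": -0.92, "recency": 0.15, "frequency": 0.02, "monetary": 0.01}


def ols_prediction(recency, frequency, monetary):
    """Fitted value of the linear probability model."""
    return (ols["intercept"]
            + ols["recency"] * recency
            + ols["frequency"] * frequency
            + ols["monetary"] * monetary)


# Hypothetical low-activity customer (illustrative values).
low = ols_prediction(recency=1, frequency=2, monetary=20)
# Customer at the sample means of Table 14.7.
average = ols_prediction(recency=3.87, frequency=7.80, monetary=73.14)
# Hypothetical high-activity customer (illustrative values).
high = ols_prediction(recency=8, frequency=12, monetary=120)

for label, value in [("low", low), ("average", average), ("high", high)]:
    in_range = 0.0 <= value <= 1.0
    print(f"{label:8s} prediction = {value:5.2f}  valid probability: {in_range}")
```

Predictions below 0 or above 1 cannot be read as probabilities, which is the pattern visible in Figure 14.3.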

    Table 14.9

    Parameter Estimates for Logistic Regression

                 Intercept Only Model       Full Model
    Variable     Estimate    Std. Error     Estimate    Std. Error
    Intercept    -0.20       0.21           -30.29*     8.55
    Recency      -           -              3.34*       0.93
    Frequency    -           -              0.59*       0.24
    Monetary     -           -              0.17*       0.05

    -2LL         137.628                    30.489

    *Significant at the 0.05 level.
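The intercept-only column can be reproduced by hand: with no predictors, the fitted probability is the sample share of purchases, so the intercept is the log-odds of that share. The sketch below uses the mean of Choice from Table 14.7. The sample size of 100 is an assumption, adopted because it reproduces the reported -2LL; it is not stated in this excerpt.

```python
import math

share = 0.45   # mean of Choice (Table 14.7)
n_obs = 100    # ASSUMPTION: sample size inferred, not stated in this excerpt

# Intercept-only logit: the intercept is the log-odds of the purchase share.
intercept = math.log(share / (1 - share))

# Log-likelihood when every observation gets the same fitted probability.
loglik_null = n_obs * (share * math.log(share)
                       + (1 - share) * math.log(1 - share))
neg2ll_null = -2 * loglik_null

# Likelihood-ratio statistic from the reported -2LL values (Table 14.9).
lr_stat = 137.628 - 30.489
chi2_critical_3df = 7.815   # 5% critical value of chi-square with 3 d.f.

print(f"intercept       = {intercept:.2f}")
print(f"-2LL (null)     = {neg2ll_null:.3f}")
print(f"LR statistic    = {lr_stat:.3f} (3 d.f., critical value {chi2_critical_3df})")
```

The likelihood-ratio statistic has three degrees of freedom, one for each of Recency, Frequency and Monetary, and it far exceeds the 5% critical value.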

    Table 14.10

    Summary Statistics for Breakfast Foods Data

    Brand      Average Price ($)    Promotion    Market Share (%)
    Brand 1    1.75                 0.07         22.19
    Brand 2    1.58                 0.04         17.18
    Brand 3    1.91                 0.09         48.36
    Brand 4    1.94                 0.01         12.28

    Table 14.11

    Parameter Estimates for the Multinomial Logit Model

                        Intercepts Only Model      Full Model
    Variable            Estimate    Std. Error     Estimate    Std. Error
    Intercept_Brand1    0.59        0.08           0.06        0.11
    Intercept_Brand2    0.34        0.09           -0.64       0.11
    Intercept_Brand3    1.37        0.07           1.91        0.10
    Price               -           -              -3.03       0.11
    Promotion           -           -              0.44        0.14

    -2LL                4321.88                    3033.92
    Hit Rate            48.3 %                     63.2 %
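The intercepts-only column follows directly from the market shares in Table 14.10. With Brand 4 as the base alternative (its intercept is the one omitted from the table), each intercept is the log of that brand's share relative to Brand 4's share. The sketch below checks this and also computes the likelihood-ratio statistic from the reported -2LL values; the variable names are illustrative.

```python
import math

# Market shares in percent (Table 14.10).
shares = {1: 22.19, 2: 17.18, 3: 48.36, 4: 12.28}
base = 4   # Brand 4 has no intercept in Table 14.11, so it is treated as the base

# Intercepts-only multinomial logit: log share ratio against the base brand.
intercepts = {b: math.log(shares[b] / shares[base]) for b in shares if b != base}

# With intercepts only, every purchase is predicted to be the largest brand,
# so the hit rate is roughly that brand's share.
naive_hit_rate = max(shares.values()) / sum(shares.values())

# Likelihood-ratio statistic for adding Price and Promotion (2 d.f.).
lr_stat = 4321.88 - 3033.92
chi2_critical_2df = 5.991   # 5% critical value of chi-square with 2 d.f.

for b in sorted(intercepts):
    print(f"Intercept_Brand{b} = {intercepts[b]:.2f}")
print(f"naive hit rate = {naive_hit_rate:.1%}")
print(f"LR statistic   = {lr_stat:.2f} (critical value {chi2_critical_2df})")
```

The computed intercepts match the reported 0.59, 0.34 and 1.37, and the naive hit rate is close to the reported 48.3%.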

    Table 14.12

    Price Elasticities from the Full Multinomial Logit Model

                            Change in probability of k
    Change in price of j    1        2        3        4
    1                       -4.47    0.85     0.85     0.85
    2                       0.67     -4.14    0.67     0.67
    3                       2.64     2.64     -3.17    2.64
    4                       0.53     0.53     0.53     -5.37
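In a multinomial logit model with a common price coefficient, the own-price elasticity of brand j is beta * p_j * (1 - P_j) and the cross elasticity of every other brand with respect to p_j is -beta * p_j * P_j. The cross elasticities are therefore identical across the other brands, and own minus cross equals beta * p_j. The sketch below codes these standard formulas and checks the identity against Table 14.12. How the chapter averaged the elasticities over observations cannot be recovered from this excerpt, so evaluating the formulas at aggregate market shares is only an illustration and is not expected to reproduce the table.

```python
def mnl_price_elasticities(beta_price, prices, probs):
    """Return E[j][k]: % change in P_k for a 1% change in the price of brand j."""
    n = len(prices)
    return [[beta_price * prices[j] * ((1.0 if j == k else 0.0) - probs[j])
             for k in range(n)]
            for j in range(n)]


beta_price = -3.03                    # price coefficient, full model (Table 14.11)
prices = [1.75, 1.58, 1.91, 1.94]     # average prices (Table 14.10)

# Reported own and cross elasticities for a price change of brand j (Table 14.12).
reported_own = [-4.47, -4.14, -3.17, -5.37]
reported_cross = [0.85, 0.67, 2.64, 0.53]

# Identity check: own - cross should equal beta * p_j, up to rounding.
gaps = [own - cross - beta_price * p
        for own, cross, p in zip(reported_own, reported_cross, prices)]

# Illustration only: formulas evaluated at aggregate shares (an assumption).
shares = [0.2219, 0.1718, 0.4836, 0.1228]
E = mnl_price_elasticities(beta_price, prices, shares)

print("identity gaps:", [round(g, 3) for g in gaps])
for row in E:
    print([round(v, 2) for v in row])
```

The equal cross elasticities within each row are a direct consequence of the independence of irrelevant alternatives property of the logit model.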

    Table 14.13

    Parameter Estimates for the Multinomial Probit Model

                          Model 1                          Model 2
    Variable              M               NM               M               NM
    Non labor income      0.01 (0.01)     -0.05 (0.01)     0.00 (0.01)     -0.03 (0.03)
    Unemployment rate     -0.08 (0.01)    -0.05 (0.01)     -0.09 (0.02)    -0.08 (0.02)
    Time trend            -0.02 (0.01)    0.05 (0.01)      -0.01 (0.01)    0.04 (0.03)
    Years of education    0.01 (0.01)     0.11 (0.01)      0.03 (0.01)     0.10 (0.06)
    Labor experience      0.02 (0.01)     -0.03 (0.01)     0.01 (0.01)     -0.02 (0.02)
    Square of Exper.      -0.01 (0.00)    0.00 (0.01)      0.00 (0.01)     0.00 (0.01)
    Dummy for race        0.10 (0.05)     0.09 (0.05)      0.15 (0.06)     0.14 (0.07)
    Dummy for marriage    0.47 (0.04)     0.95 (0.09)      0.51 (0.07)     0.91 (0.39)
    Number of kids        0.12 (0.02)     -0.18 (0.03)     0.09 (0.03)     -0.11 (0.12)
    Intercept             -0.06 (0.14)    -0.13 (0.12)     0.46 (0.18)     0.31 (0.35)

    Correlation           0.00 (fixed)                     0.64 (0.37)
    Variance              1.00 (fixed)                     1.16 (0.58)
    LL                    -10,300.71                       -10,299.65

    Source: Keane, M. P. (1992). A Note on Identification in the Multinomial Probit Model. Journal of Business & Economic Statistics, 10, 2, Page 199
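Model 1 fixes the error correlation at 0 and the variance at 1, while Model 2 estimates both, so the two models can be compared with a likelihood-ratio test on two restrictions. The sketch below is a reading of the reported log-likelihoods under that assumption, not a calculation taken from the source. For two degrees of freedom the chi-square upper-tail probability has the closed form exp(-x/2), so no statistics library is needed.

```python
import math

ll_restricted = -10300.71     # Model 1: correlation and variance fixed
ll_unrestricted = -10299.65   # Model 2: correlation and variance estimated

# Likelihood-ratio statistic for the two restrictions.
lr_stat = 2 * (ll_unrestricted - ll_restricted)

# Chi-square upper-tail probability with 2 d.f. is exp(-x / 2).
p_value = math.exp(-lr_stat / 2)
chi2_critical_2df = 5.991     # 5% critical value of chi-square with 2 d.f.

print(f"LR statistic = {lr_stat:.2f}")
print(f"p-value      = {p_value:.3f}")
print(f"reject restrictions at 5%: {lr_stat > chi2_critical_2df}")
```

Under these assumptions, freeing the two covariance parameters barely improves the fit, and their large standard errors (0.37 and 0.58) point the same way.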

    Table 14.14

    Parameter Estimates for OLS and Tobit Analysis

                 OLS                        Tobit
    Variable     Estimate    Std. Error     Estimate    Std. Error
    Intercept    -698.6      1036.10        -9733       3033
    Income       0.1015      0.0065         0.145       0.002
    Wealth       -0.00001    0.0004         0.0002      0.0001

    Source: Bomberger, W. A. (1993). Income, Wealth and Household Demand for Deposits. The American Economic Review. 84, 4, Page 1038.
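The Tobit estimates differ from OLS because the likelihood treats censored and uncensored observations differently. The sketch below writes out the log-likelihood of a standard Tobit model with left-censoring at zero: uncensored observations contribute a normal density, censored ones contribute the probability that the latent variable falls at or below the limit. The function names and the tiny data set are hypothetical, and the code is not the estimation routine behind Table 14.14.

```python
import math


def norm_logpdf(z):
    """Log density of the standard normal distribution."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * z * z


def norm_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik(beta, sigma, X, y, lower=0.0):
    """Log-likelihood of a Tobit model left-censored at `lower`."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        xb = sum(b * v for b, v in zip(beta, x_i))
        if y_i <= lower:
            # Censored observation: probability that the latent y* <= lower.
            total += math.log(norm_cdf((lower - xb) / sigma))
        else:
            # Uncensored observation: normal density of the residual.
            total += norm_logpdf((y_i - xb) / sigma) - math.log(sigma)
    return total


# Hypothetical data: a column of ones (intercept) and one regressor.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]]
y = [0.0, 0.0, 1.2, 2.9]          # the first two observations are censored at 0

ll_example = tobit_loglik(beta=[-1.5, 1.2], sigma=1.0, X=X, y=y)

# Two one-observation checks with known answers.
ll_censored = tobit_loglik([0.0], 1.0, [[1.0]], [0.0])   # log(0.5)
ll_observed = tobit_loglik([0.0], 1.0, [[1.0]], [1.0])   # log of N(0,1) density at 1

print(f"log-likelihood (hypothetical data) = {ll_example:.4f}")
```

In practice this function would be maximized over beta and sigma with a numerical optimizer. OLS ignores the censoring, which is why its estimates can differ substantially from the Tobit ones.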

    Figure 14.1

    Purchasers and Non-purchasers Versus Age and Income

    [Scatter plot with axes Age and Income, each scaled from 1 to 6; points are marked P (purchasers) and N (non-purchasers).]

    Figure 14.2

    Group Means on First Two Discriminant Functions

    [Plot of the group means for groups 1 to 5; horizontal axis Function 1 (-2 to 2), vertical axis Function 2 (-0.7 to 0.7).]

    Figure 14.3

    OLS Predictions Versus Actual Choice

    [Plot of two series, Predictions and Choice; horizontal axis 0 to 120, vertical axis -0.5 to 2.]

    Figure 14.4

    Logistic Regression Predictions Versus Actual Choice

    [Plot of two series, Predictions and Choice; horizontal axis 0 to 120, vertical axis 0 to 1.2.]

    Figure 14.5

    Probability of Purchase Versus Frequency

    [Plot of Probability of purchase (vertical axis, 0 to 1.2) against Frequency (horizontal axis, 1 to 30).]
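A curve of this kind can be traced from the full logistic regression in Table 14.9 by varying Frequency while holding the other variables fixed. The sketch below holds Recency and Monetary at their sample means from Table 14.7. That choice is an assumption made for illustration, since this excerpt does not say which values were used for the figure.

```python
import math

# Full-model logistic regression estimates (Table 14.9).
logit = {"intercept": -30.29, "recency": 3.34, "frequency": 0.59, "monetary": 0.17}


def purchase_probability(recency, frequency, monetary):
    """Logistic probability of purchase for one customer."""
    z = (logit["intercept"]
         + logit["recency"] * recency
         + logit["frequency"] * frequency
         + logit["monetary"] * monetary)
    return 1.0 / (1.0 + math.exp(-z))


# ASSUMPTION: Recency and Monetary held at their sample means (Table 14.7).
mean_recency, mean_monetary = 3.87, 73.14

curve = [(f, purchase_probability(mean_recency, f, mean_monetary))
         for f in range(1, 31)]

for f, p in curve:
    if f in (1, 5, 10, 15, 20, 30):
        print(f"frequency {f:2d}: probability {p:.3f}")
```

Unlike the OLS fit, every point on the curve lies strictly between 0 and 1, and the probability rises in an S-shape as Frequency increases.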

    Figure 14.6

    Variation in Market Share with changes in Marketing Mix

    [Plot of Market Share (0.2 to 1) and Price (0.25 to 1) by Week (5 to 30); individual weeks are annotated FDC or FD.]

    F = Feature, D = Display, C = Store Coupon