chapter_14 advanced regression models

7/30/2019 Chapter_14 Advanced Regression Models


Chapter 14

ADVANCED REGRESSION MODELS

Raghuram Iyengar, University of Pennsylvania

Sunil Gupta, Columbia University

Introduction

The previous chapter covered the basics of the powerful and yet simple technique of

Ordinary Least Squares (OLS). It was noted that the mathematical relationship between the

dependent variable for an observation yt at time t and a vector of independent variables xt can be

written in the following manner.

$$y_t = x_t'\beta + \varepsilon_t \tag{1}$$

Here, $x_t'$ is the transpose of vector $x_t$ and $\beta$ is a vector of parameters. Also, $y_t$ is continuous from $-\infty$
to $\infty$ and $\varepsilon_t$ is the random error that is typically assumed to be normally distributed.

Several scenarios fit the assumption of a continuous dependent variable that ranges from
$-\infty$ to $\infty$. In cases when $y_t$ is strictly positive (e.g., sales), we can transform it as $\ln(y_t)$ to make it
lie between $-\infty$ and $\infty$ and continue to use OLS. But what happens if the dependent variable is
discrete (e.g., buy / no buy) or is the choice of a brand (e.g., Brand A, B, C or D) and we want to
analyze the effect of brand prices on these decisions? The purpose of this chapter is to show
methods that can be used in such scenarios.

We begin with Discriminant Analysis. This is followed by a discussion of logistic

regression and the multinomial logit model. Thereafter, we focus on the multinomial probit

model. The chapter ends with a discussion on Tobit models.

Discriminant Analysis



Consider the following example where a dependent variable is binary: a buy / no buy
decision. A company that has introduced a product in the market wishes to describe the people

that are buying its product. Figure 14.1 shows the demographic information that the company has

together with the purchasers (P) and non-purchasers (N). The figure suggests that purchasers of

this product are older and richer. Thus, age and income discriminate among the purchasers and

non-purchasers. However, it is not clear which of the two variables is more important and how

we can predict a new person to be a purchaser or non-purchaser based on his/her income and age.

Such questions can be answered by using discriminant analysis.

[Figure 14.1 about here]

Discriminant analysis is a method to analyze which independent variables discriminate

among groups and to classify observations into predetermined groups based on these variables.

These predetermined groups can be either two (e.g., buy or no buy) or more than two. In the
latter case, the analysis is termed multiple discriminant analysis. For the sake of simplicity, we

begin with a two-group discriminant analysis.

In a discriminant analysis, an index is built using the measured characteristics as the

independent variables. Thus for an observation at time t,

$$f_t = x_{1t}\beta_1 + x_{2t}\beta_2 + x_{3t}\beta_3 + \dots + x_{Kt}\beta_K = x_t'\beta \tag{2}$$

Here, $f_t$ is the index. It is also called the discriminant function. There are $K$ measured
characteristics ($x_{1t}, x_{2t}, \dots, x_{Kt}$). The vector $x_t'$ is the transpose of vector $x_t$ that contains these $K$
variables. There are also $K$ parameters ($\beta_1, \beta_2, \dots, \beta_K$), which are the weights corresponding to
these variables. These weights are also termed the discriminant coefficients.

The goal of discriminant analysis is to estimate weights such that the index values for the

two groups are as far apart as possible. In other words, the weights are derived such that the variation



in f scores between the two groups is as large as possible, while the variation in the f scores

within the groups is as small as possible. That is, the weights are derived so that the following

ratio is maximized.

$$\frac{\text{Between-Group Variation}}{\text{Within-Group Variation}} \tag{3}$$

Maximizing the above ratio makes the two groups as distinct as possible with respect to the

index values. More mathematically oriented readers can see Chapter 11 of Johnson and Wichern

(2002) for a description of how the above quantity is maximized.
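For two groups, the ratio in Equation (3) is maximized by Fisher's classical solution, in which the weights are proportional to the inverse of the pooled within-group scatter matrix times the difference in group means. The sketch below illustrates this for two variables (age and income) in plain Python. The data values and the function names (`mean_vec`, `scatter`, `fisher_weights`, `variation_ratio`) are hypothetical and are not taken from Figure 14.1 or from the cited text.

```python
# Hypothetical (age, income in $1000s) observations; not the data behind Figure 14.1.
purchasers = [(52, 85), (47, 78), (58, 90), (44, 70), (50, 95)]
non_purchasers = [(30, 40), (35, 55), (28, 38), (41, 50), (33, 60)]


def mean_vec(rows):
    """Mean of each of the two variables."""
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(2)]


def scatter(rows):
    """2x2 within-group sum of squares and cross-products matrix."""
    m = mean_vec(rows)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_weights(g1, g2):
    """Weights proportional to W^{-1} (mean_1 - mean_2), with W the pooled scatter."""
    s1, s2 = scatter(g1), scatter(g2)
    w = [[s1[i][j] + s2[i][j] for j in range(2)] for i in range(2)]
    det = w[0][0] * w[1][1] - w[0][1] * w[1][0]
    m1, m2 = mean_vec(g1), mean_vec(g2)
    d = [m1[0] - m2[0], m1[1] - m2[1]]
    # Apply the closed-form inverse of the 2x2 matrix W to the mean difference.
    return [(w[1][1] * d[0] - w[0][1] * d[1]) / det,
            (-w[1][0] * d[0] + w[0][0] * d[1]) / det]


def variation_ratio(weights, g1, g2):
    """Between-group over within-group variation of the index f = weights'x."""
    f1 = [weights[0] * a + weights[1] * b for a, b in g1]
    f2 = [weights[0] * a + weights[1] * b for a, b in g2]
    m1, m2 = sum(f1) / len(f1), sum(f2) / len(f2)
    grand = (sum(f1) + sum(f2)) / (len(f1) + len(f2))
    between = len(f1) * (m1 - grand) ** 2 + len(f2) * (m2 - grand) ** 2
    within = sum((f - m1) ** 2 for f in f1) + sum((f - m2) ** 2 for f in f2)
    return between / within


weights = fisher_weights(purchasers, non_purchasers)
print(weights, variation_ratio(weights, purchasers, non_purchasers))
```

Any other weight vector, such as one that uses age alone or income alone, gives a ratio no larger than the one obtained with `weights`. Only the direction of the weights matters, since rescaling them leaves the ratio unchanged.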

Discriminant analysis is related to and yet is distinct from linear regression. In both

methods, there is a weighted linear combination of independent variables that is used to predict a

dependent variable. Also, like linear regression, discriminant analysis suffers from

multicollinearity of the independent variables. The primary difference between the two methods

is that in a linear regression, the dependent variable is typically assumed to range from $-\infty$ to $\infty$,
whereas in a discriminant analysis, the dependent variable is group membership, i.e., it is discrete.

For an application where the group membership is in two groups, a linear regression can be run

with a dummy variable representing group membership as the dependent variable. The estimates

from such a regression will be proportional to the weights that are obtained from a discriminant

analysis. When the number of groups is, however, greater than two, then a regression will not

yield the same results.

Discriminant analysis is also different from cluster analysis (see Chapter 18, this book).

In discriminant analysis, the groups are predetermined and the analysis is focused on which

variables best discriminate among these groups. In a cluster analysis, the group memberships are

unknown and the focus of the analysis is to form these groups.



Consider the following example of a two-group discriminant analysis. Table 14.1
contains data on the fifty US states, which are broken down into two groups: 15 states that
are South and 35 that are Non-South (Lehmann, Gupta, & Steckel, 1997). These groups are

compared on observable characteristics such as income, population and others. A univariate F-

test compares the differences in means across the two groups on each of the independent

variables. The big differences between the two groups appear to be in income, tax per capita and

mineral production.

A discriminant analysis was run and Table 14.1 contains the discriminant coefficients.

These are the weights of the independent variables. Another column shows standardized

discriminant coefficients. These coefficients are similar to the standardized regression

coefficients in an OLS regression. They correct for any scale issues associated with the

independent variables. We can calculate these coefficients by first standardizing the independent

variables and then running a discriminant analysis or by first running a discriminant analysis and

then multiplying each discriminant coefficient by the standard deviation of the respective

independent variable. Both methods yield standardized coefficients and these can be used to

ascertain how a change of one standard deviation in each independent variable will affect the

discriminant function.
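The second route described above (multiplying each discriminant coefficient by the standard deviation of its variable) can be sketched as follows. The coefficients and data are hypothetical and are not the values in Table 14.1, and the use of the sample standard deviation from `statistics.stdev` is an assumption, since the text does not say which standard deviation is meant.

```python
import statistics

# Hypothetical unstandardized coefficients and raw data; not the values in Table 14.1.
coefficients = {"population": 0.002, "income": 0.5}
data = {
    "population": [1200.0, 5400.0, 800.0, 9900.0, 3100.0],  # thousands of people
    "income": [21.0, 25.5, 19.0, 24.0, 22.5],               # thousands of dollars
}


def standardize_coefficients(coefs, columns):
    """Standardized weight = unstandardized weight x standard deviation of the variable."""
    return {name: coefs[name] * statistics.stdev(columns[name]) for name in coefs}


standardized = standardize_coefficients(coefficients, data)
for name in coefficients:
    print(name, coefficients[name], round(standardized[name], 3))
```

In this made-up example the ranking of the two variables reverses after standardization, which is the kind of scale effect the text warns about.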

[Table 14.1 about here]

From the estimated unstandardized coefficients, we find that population is the most important
variable for discrimination, followed by average income. Upon standardizing the variables, we

observe a different set of variables that are important. We find that while population is still the

most important, college enrollment and manufacturing output are clearly more relevant for



discrimination among the states than is average income. Thus, a failure to account for differences

in scale can lead to erroneous conclusions about the relative importance of variables.

Measures of Fit

Several measures of fit are used to assess how well the model discriminates.

Chi-Squared Value

A Chi-Squared value tests whether, overall, the variables help discriminate between the two
groups. This is very similar to the F-test for overall significance in a regression setting. Here, the

Chi-Squared value is 42.71. For testing the significance, we look at the critical value for 11

degrees of freedom (the number of independent variables). This value is 31.3 at the 0.001 level.

Thus, the variables clearly help in discrimination.

Canonical Correlation

The canonical correlation is the correlation resulting from a regression of a dummy dependent
variable (group membership) on the independent variables. Its squared value is the $R^2$ from this
regression. In this example, the canonical correlation is 0.80. Thus, the $R^2$ is 0.64.

Wilks' Lambda

Wilks' Lambda is the ratio of within-group variance to the total variance. Here, it is
essentially $1 - R^2$. Thus, Wilks' Lambda is $1 - 0.64 = 0.36$.

The Hit-Miss Table

The hit-miss table indicates how well the discriminant function classifies
observations. Table 14.2 is such a hit-miss table. Here we find that 32 out of the 35
Non-South states and 14 out of the 15 South states are correctly classified. Thus, the overall
classification rate is (32 + 14)/50, i.e., 92%.




Multiple Discriminant Analysis

A multiple discriminant analysis is carried out when the observations are preclassified

into more than two groups. The basic idea is first to find a single function that spreads all groups

as far apart as possible. Then, a second function is found that best explains any differences

among groups and so on. If there are K groups, then K-1 discriminant functions are found.

To illustrate multiple discriminant analysis, we consider an example described in

Lehmann, Gupta, and Steckel (1997). In this example, there are five groups of consumers,
defined by how many dollars they spend on food each month. Table 14.3

shows the five groups together with the averages of the independent variables. The means appear

to indicate the larger spenders are more educated, are younger, have higher incomes, have larger

family sizes and shop more extensively. Table 14.3 also shows F tests for the variables for the

significance of differences among the five groups. These tests suggest that family size and

income are the most important (i.e. have the highest F value).


A discriminant analysis is run. In the analysis, a few variables are dropped as they do not

contribute to discrimination among the groups. We then obtain the standardized and

unstandardized discriminant coefficients. As there are 5 groups, we have 5-1=4 discriminant

functions. Table 14.4 shows the unstandardized and standardized coefficients. The discriminant

functions are ranked according to their usefulness for discrimination. In other words, the first

function is the most important for discriminating amongst the five groups; the second one is the

second most important and so on.




From the results on the standardized coefficients, we find that the most important

variables in the first function are family size, income and how often they shop. The second

function is related to age and family size. Table 14.5 gives the group means for the groups based

on the four discriminant functions. Figure 14.2 plots these means for the first and the second

functions. We can see that there is a big spread of the means of the five groups along the

horizontal axis (Function 1) and less so along the vertical axis (Function 2).



Measure of Fit

As a measure of fit of the model, we use the hit-miss table. Table 14.6 is such a hit-miss
table for the five categories. From the results, we see that the overall classification rate is
(20 + 106 + 90 + 84 + 40)/(34 + 284 + 293 + 181 + 61) × 100 = 39.86%.
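The overall classification rate is the number of correctly classified observations divided by the total number of observations. The short sketch below reproduces the two rates reported in the text from the counts quoted for Tables 14.2 and 14.6. The function name `classification_rate` is ours, and only the diagonal counts and group sizes quoted in the text are used.

```python
def classification_rate(correct_by_group, size_by_group):
    """Percentage of observations assigned to their true group."""
    return 100.0 * sum(correct_by_group) / sum(size_by_group)


# Counts quoted in the text for the two-group example (Table 14.2).
two_group = classification_rate([32, 14], [35, 15])

# Counts quoted in the text for the five-group example (Table 14.6).
five_group = classification_rate([20, 106, 90, 84, 40], [34, 284, 293, 181, 61])

print(round(two_group, 2), round(five_group, 2))  # 92.0 39.86
```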


Discriminant analysis rests on two statistical assumptions. One, the independent variables

are assumed to be jointly normally distributed and two, the covariances are assumed to be the

same across all groups. When these assumptions are violated then the statistical interpretation of

the results becomes very difficult. For instance, while in practice, dummy variables are

frequently used as independent variables, in theory it is a problem. This is because if a dummy

independent variable is used then the independent variables are not normally distributed. To

alleviate such statistical difficulties, the method of logistic regression is used. We motivate this

method with a managerial problem that all direct marketers face.

Logistic Regression



Catalog companies regularly keep track of Recency, Frequency and Monetary (RFM)
variables. There is an interest in relating these RFM measures to purchase behavior: a buy / no
buy decision. These measures can then be used for predicting purchase and for making any
strategic intervention decisions to increase retention. Table 14.7 contains the summary statistics
of such data, where Recency is measured in months since last purchase and Monetary is in
dollar amount. Choice is a variable which takes a value 1 if a consumer made a purchase and
0 if she did not.$^1$


One strategy for estimating the relationship between choice and the RFM measures

would be to use an OLS with choice as the dependent variable (yt) and RFM measures as the

independent variables (xt). Table 14.8 shows the results for the OLS regression. The results

suggest that Recency and Monetary are significant whereas Frequency is not. Further, the $R^2$ is
about 0.61 and the adjusted $R^2$ is around 0.60. Despite the high $R^2$, OLS is not appropriate for

several reasons. Figure 14.3 plots the predictions of Choice and the true value of Choice for the

100 data points. We see that there are instances where the predictions for choice are either less

than zero or greater than one! While this is not surprising given that OLS assumes that the
dependent variable is continuous between $-\infty$ and $\infty$, in the current context these predictions are

clearly inconsistent with the data. For instance, how do we interpret a prediction of 1.32 and

compare it with a prediction of 1.82? Are both indicating a purchase decision i.e. should we

assume both are just 1 (buy)? Similarly, it is not clear how to interpret a prediction of -0.18 when

a value of 0 reflects no purchase. This example shows that when an assumption of the OLS

technique (in this case, the continuous distribution of the dependent variable) is violated, its

results cannot be interpreted.





The dependent variable in the above example is discrete. Such choice scenarios are

extremely common. For instance, pharmaceutical companies are interested in predicting whether

a physician would prescribe their drug or not and the factors that might increase the prescription

rate. Similarly, managers in industries with an online presence are interested in identifying

factors that can predict which consumers will purchase online (Bellman, Lohse, & Johnson,

1999). While these questions can also be addressed by discriminant analysis, there are other
scenarios, such as when the dependent variable is market share (i.e., lies between 0 and 1) and we
want to quantify the effect of price and promotions on it, which need a different method that can
accommodate such responses.

Model for Logistic Regression

A logistic regression analysis begins with a dependent variable, which is either discrete
(e.g., buy / no buy) or lies between 0 and 1 (e.g., market share). If we are modeling a discrete

decision such as buy / no buy then we specify the probabilities of the two possible events i.e.

P(Buy) and P(No Buy). As P(Buy) and P(No Buy) are probabilities, they are between 0 and 1

and they should sum up to 1. Next, we revisit the example with the discrete choice (buy / no buy)

and RFM measures that we discussed earlier. We then briefly discuss how the same framework

can be applied for analyzing market shares.

In the RFM example, the two events are purchase and no purchase. Using the measures

of P(Buy) and P(No Buy), we can specify the odds of buying as

$$\text{Odds(Buy)} = \frac{P(\text{Buy})}{P(\text{No Buy})} = \frac{P(\text{Buy})}{1 - P(\text{Buy})} \tag{4}$$



The odds of buying are constrained between 0 and $+\infty$ and take a value 1 if both outcomes are
equally likely, i.e., P(Buy) = 0.5 and P(No Buy) = 0.5. We can make the odds lie between $-\infty$ and
$+\infty$ by taking the natural log transform. Thus,

$$\text{Log(Odds(Buy))} = \text{Log}\left(\frac{P(\text{Buy})}{P(\text{No Buy})}\right) \tag{5}$$

As log odds lie between $-\infty$ and $+\infty$, we can relate it to any independent variables and
interpret the effects of the variables in a manner similar to that in OLS; only now the effect of the
variables would be on the log odds of the dependent variable. Thus, we can write the following
equation relating the log odds of purchase for an observation $t$ to the independent variables
($x_t$).

$$\text{Log(Odds(Buy))}_t = x_t'\beta \tag{6}$$

This can be rewritten as

$$P(\text{Buy})_t = \frac{1}{1 + \exp(-x_t'\beta)} \tag{7}$$

Recall that P(Buy) is the probability of a purchase and hence should always be between 0

and 1. The above expression ensures that this will be the case irrespective of the values of the

covariates.
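The mapping from the log odds in Equation (6) to the probability in Equation (7) can be sketched in a few lines of Python. The parameter values and covariate vectors below are hypothetical (an intercept followed by Recency, Frequency and Monetary) and are not the estimates reported in the chapter.

```python
import math


def log_odds(x, beta):
    """Linear index x'beta, i.e., the log odds of buying (Equation 6)."""
    return sum(x_i * b_i for x_i, b_i in zip(x, beta))


def prob_buy(x, beta):
    """P(Buy) = 1 / (1 + exp(-x'beta)) (Equation 7)."""
    return 1.0 / (1.0 + math.exp(-log_odds(x, beta)))


# Hypothetical parameters: intercept, Recency, Frequency, Monetary.
beta = [-2.0, 0.5, 0.2, 0.01]

# Hypothetical observations; the leading 1 multiplies the intercept.
observations = [[1, 1, 2, 30], [1, 6, 10, 400]]

for x in observations:
    p = prob_buy(x, beta)
    # Taking the log of the odds recovers the linear index.
    print(round(log_odds(x, beta), 2), round(p, 4), round(math.log(p / (1 - p)), 2))
```

Whatever values the covariates take, the predicted probability stays strictly between 0 and 1, which is the property that OLS lacks in this setting.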

We can now use the above model for our example. Table 14.9 shows the results of two
logistic regression models estimated using Maximum Likelihood Estimation (MLE): the intercept-only
model, where $x_t$ contains only the intercept, and the full model, where $x_t$ contains the intercept
and the RFM variables. The results of the full model show that the RFM variables are significant.
Further, an increase of 1 month in Recency causes an increase of 3.34 in the log odds of Buying.
We can also calculate the effect on the odds of buying. This would be exp(3.34) or 28.28, i.e.,
increasing Recency by 1 month multiplies the odds of buying by 28.28. A similar



analysis can be done for the other variables. Note that the RFM estimates are close to the true

values of the sensitivities (see Footnote 1). Also note that the frequency sensitivity is significant

in this analysis while it was not so using OLS. Thus, OLS can mask the true relationship between

variables and its results can lead to erroneous interpretations for cases when the dependent

variable is not continuous.
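Because the model is linear in the log odds, a one-unit change in a covariate multiplies the odds by the exponential of its coefficient. The sketch below applies this to the Recency coefficient quoted in the text. The function name `updated_odds` and the starting odds of 1 are illustrative assumptions, and the value computed from the rounded coefficient may differ slightly from the figure quoted in the text, which was presumably obtained from the unrounded estimate.

```python
import math

recency_coef = 3.34  # log-odds effect of one extra month of Recency, as quoted in the text


def updated_odds(odds_before, coef, delta=1.0):
    """Odds after changing a covariate by `delta` units: odds x exp(coef x delta)."""
    return odds_before * math.exp(coef * delta)


odds_multiplier = math.exp(recency_coef)

# Hypothetical starting point: even odds (P(Buy) = 0.5).
print(odds_multiplier, updated_odds(1.0, recency_coef))
```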


Figure 14.4 plots the predicted probabilities of Buying with the true value of Choice.

Notice that, in contrast to the predictions of the OLS regression (Figure 14.3), all predictions lie

between 0 and 1. Also, unlike the case of the OLS regression, a higher predicted value has the

interpretation of a higher probability of purchase. To see how this probability of purchase varies

with a change in one of the covariates, see Figure 14.5. In this figure, we plot the predicted

probability of purchase with change in frequency. For generating this figure, we fixed the

recency and monetary variables at their average values. The figure shows that the probability of
purchase follows an S-shaped curve as the frequency increases.



In the above example, we modeled the purchase decision and then related it to RFM

measures. As the purchase variable was discrete, we specified the probability of purchase and no

purchase, measures that lie between 0 and 1. We then specified the odds of purchase and took a
log transform to make it lie between $-\infty$ and $+\infty$. We can apply the above framework to analyze

market shares (MS) as well. For instance, a brand manager might want to quantify the effect of

the region-specific prices and promotions on the market share in these regions. In this case, we



can begin the analysis by directly specifying the odds of market share since it already lies

between 0 and 1. Thus,

$$\text{Log(Odds(MS))}_t = \text{Log}\left(\frac{MS_t}{1 - MS_t}\right) = x_t'\beta \tag{8}$$

Here, for a region $t$, the vector $x_t$ will contain the prices and promotions for that region.

Measures of Fit

There are several measures of model fit that are used for testing the suitability of logistic

regression models. Most of these measures are based around the log-likelihood measure, which

is as follows.

$$LL(\beta) = \sum_t \ln(L_t) \tag{9}$$

Here, $L_t$ is the likelihood of observation $t$ and $\beta$ is the entire set of MLE parameters (the intercept and the coefficients of the other explanatory variables).
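For the binary buy / no buy model, the log-likelihood can be sketched as below. The sketch assumes the standard Bernoulli form for each observation's likelihood (P(Buy) if a purchase occurred, 1 - P(Buy) otherwise), which the text does not spell out, and it uses a small hypothetical dataset rather than the chapter's RFM data.

```python
import math


def prob_buy(x, beta):
    """Logistic probability of purchase for covariates x and parameters beta."""
    index = sum(x_i * b_i for x_i, b_i in zip(x, beta))
    return 1.0 / (1.0 + math.exp(-index))


def log_likelihood(beta, X, y):
    """Sum over observations of ln(L_t), assuming a Bernoulli likelihood for each L_t."""
    total = 0.0
    for x_t, y_t in zip(X, y):
        p = prob_buy(x_t, beta)
        likelihood_t = p if y_t == 1 else 1.0 - p
        total += math.log(likelihood_t)
    return total


# Hypothetical data: an intercept column plus one covariate, and buy (1) / no buy (0).
X = [[1, 0.5], [1, 2.0], [1, 3.5], [1, 1.0]]
y = [0, 1, 1, 0]

print(log_likelihood([0.0, 0.0], X, y))   # every probability equals 0.5
print(log_likelihood([-4.0, 2.0], X, y))  # parameters that fit these data better
```

MLE searches for the parameter vector that makes this quantity as large as possible. In this toy example the second parameter vector yields the higher (less negative) log-likelihood.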

Likelihood Ratio Test

The most commonly used likelihood ratio test has the following test statistic:

$$-2\,(LL(C) - LL(\beta)) \tag{10}$$

Here, $LL(C)$ refers to the log-likelihood of the data when an intercept-only model is run. Suppose
there are $K$ covariates in the model (including the intercept); then the above statistic is distributed
$\chi^2$ with $K-1$ degrees of freedom (Theil, 1971). Thus, the test statistic measures whether the
likelihood of the model that includes the explanatory variables (over and above
the intercept) is significantly better than the likelihood of a model containing only the
intercept.



In our example, $-2\,LL(\beta)$ is 30.489 while $-2\,LL(C)$ is 137.628. Thus, the test statistic
takes a value of 107.139. The degrees of freedom are $4 - 1 = 3$. The critical value of a $\chi^2$ with 3
degrees of freedom at the 0.001 level is 16.26. Thus, the likelihood of a model that has the RFM

measures is significantly better than a model with just the intercept.
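The arithmetic of this test can be reproduced directly from the figures quoted in the text. In the sketch below the variable names are ours, and the critical value is simply the one quoted in the text rather than one computed from the chi-squared distribution.

```python
neg2_ll_intercept = 137.628  # -2 LL(C), intercept-only model
neg2_ll_full = 30.489        # -2 LL(beta), intercept plus RFM variables
k_full = 4                   # parameters in the full model, including the intercept

# -2 (LL(C) - LL(beta)) equals the difference of the two -2 LL values.
lr_statistic = neg2_ll_intercept - neg2_ll_full
degrees_of_freedom = k_full - 1

critical_value = 16.26  # chi-squared, 3 degrees of freedom, 0.001 level (from the text)
reject_intercept_only = lr_statistic > critical_value

print(round(lr_statistic, 3), degrees_of_freedom, reject_intercept_only)  # 107.139 3 True
```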

Akaike Information Criterion (AIC)

AIC provides a way of adjusting the log-likelihood of a model for the number of

parameters in the model. This adjustment corrects for over fitting of the data. The expression for

this statistic is as follows.

$$AIC = -2\,LL(\beta) + 2K \tag{11}$$

Here, $K$ is the dimension of $\beta$. Lower values of AIC denote a better model. Thus, a model with
a very large number of variables might have a high likelihood (and hence a low $-2\,LL(\beta)$), but it will also be penalized for the
number of variables.

In our example, we can calculate the AIC of the intercept-only model ($AIC_{int}$) and the
AIC of a model containing the intercept and RFM measures ($AIC_{full}$). These are as
follows.

$$AIC_{int} = 137.628 + 2(1) = 139.628 \tag{12a}$$

$$AIC_{full} = 30.489 + 2(4) = 38.489 \tag{12b}$$

Thus, the full model has a better (i.e. lower) AIC as compared to the intercept only model.
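The same comparison can be written as a small helper, using the -2 log-likelihood values and parameter counts quoted in the text. The function name `aic` is ours.

```python
def aic(neg2_log_likelihood, n_parameters):
    """AIC = -2 LL + 2K, taking -2 LL as the first argument."""
    return neg2_log_likelihood + 2 * n_parameters


aic_int = aic(137.628, 1)   # intercept-only model
aic_full = aic(30.489, 4)   # intercept plus the three RFM variables

print(round(aic_int, 3), round(aic_full, 3))  # 139.628 38.489
print("full model preferred:", aic_full < aic_int)
```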

Likelihood Ratio Index ($\rho^2$)

The likelihood ratio index is similar to the $R^2$ in the regular regression models. It is
described as follows.



$$\rho^2 = 1 - \frac{LL(\beta)}{LL(C)} \tag{13}$$

Here, $LL(\beta)$ is $-15.24$ (= $-30.489/2$) and $LL(C)$ is $-68.81$ (= $-137.628/2$). Thus, the value of $\rho^2$ is
0.78.

As with the $R^2$, the $\rho^2$ of a model will always increase or at least stay the same when new
variables are added. There is another statistic, the adjusted likelihood ratio index ($\bar{\rho}^2$), that
penalizes for the increase in the number of parameters. This statistic is similar to the adjusted $R^2$.

$$\bar{\rho}^2 = 1 - \frac{LL(\beta) - K}{LL(C) - 1} \tag{14}$$

In our example, this statistic will be the following.

$$\bar{\rho}^2 = 1 - \frac{15.24 + 4}{68.81 + 1} = 0.72 \tag{15}$$
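Both indices can be checked numerically. The sketch below uses the log-likelihoods implied by the -2 log-likelihood values quoted in the text (both are negative numbers) and follows the adjusted form shown in Equation (14), in which the intercept-only model is charged for its single parameter. The function names are ours.

```python
ll_full = -30.489 / 2        # LL(beta), full model
ll_intercept = -137.628 / 2  # LL(C), intercept-only model
k_full = 4                   # parameters in the full model, including the intercept


def likelihood_ratio_index(ll_model, ll_null):
    """1 - LL(beta) / LL(C), as in Equation (13)."""
    return 1.0 - ll_model / ll_null


def adjusted_likelihood_ratio_index(ll_model, ll_null, k):
    """1 - (LL(beta) - K) / (LL(C) - 1), as in Equation (14)."""
    return 1.0 - (ll_model - k) / (ll_null - 1)


index = likelihood_ratio_index(ll_full, ll_intercept)
adjusted_index = adjusted_likelihood_ratio_index(ll_full, ll_intercept, k_full)

print(round(index, 2), round(adjusted_index, 2))  # 0.78 0.72
```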

Hit Rate

Another measure that is typically used to test the fit of a model is the hit rate. For

computing this measure, we take the predicted probabilities of the events from the logistic

regression and employ a cut off value for making discrete predictions for the occurrence of an

event. We then compare the predicted events with the actual events to determine the percentage

of times in the dataset the two are the same.

In our example, the two events are buy / no buy. The results from the logistic regression
estimation provide the probability of purchase. We put a cutoff at 0.5, i.e., if the predicted
probability of purchase for an observation is above 0.5, then we predict a purchase for that
observation; otherwise we predict no purchase. We then compare these predictions with actual events.

We find that, using the full model, we correctly classify 94 out of the 100 observations. Thus, the

hit rate is 94 %.

The measure of hit rate as a statistic for model accuracy has a few limitations. First, the

cutoff is arbitrary. Here we took a cut off of 0.5. We could have chosen any other cutoff value as



well. Second, the hit rate is not very useful when the data is skewed. Suppose we have a dataset

where there are many observations with no purchase and few observations with purchase. Then a

model that predicts no purchase for all observations will do well on the hit rate.
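The hit rate and its weakness on skewed data can be illustrated with a short sketch. The predicted probabilities and outcomes below are hypothetical, and the function name `hit_rate` is ours.

```python
def hit_rate(probabilities, actual, cutoff=0.5):
    """Percentage of observations whose predicted event matches the actual event."""
    predicted = [1 if p > cutoff else 0 for p in probabilities]
    hits = sum(1 for pred, act in zip(predicted, actual) if pred == act)
    return 100.0 * hits / len(actual)


# A small hypothetical sample: three of the four predictions are correct.
print(hit_rate([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 0]))  # 75.0

# A hypothetical skewed sample: 95 non-purchases and 5 purchases.
skewed_actual = [0] * 95 + [1] * 5
always_no_purchase = [0.0] * 100  # a "model" that never predicts a purchase
print(hit_rate(always_no_purchase, skewed_actual))  # 95.0
```

The second call shows why a high hit rate alone says little when one outcome dominates the data: the uninformative predictor misses every purchase yet scores 95%.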

In most applications, the data is also typically split into a calibration sample and a hold

out sample. The model is estimated on the calibration sample and then is used to predict the

observations in the hold out sample. Almost always, the hit rate within the hold out sample is

lower than the hit rate within the calibration sample.

Thus far, we have considered instances when the dependent variable is binary (or lies
between 0 and 1, e.g., market share) and logistic regression is readily applicable. There are also

scenarios where the dependent variable can take multiple values. For instance, in the

antihistamine category, there are 4 major drugs - Claritin, Zyrtec, Allegra and Clarinex. A doctor

might prescribe one of these drugs to a patient. It is of much interest to pharmaceutical

companies to quantify the factors which can predict when a doctor is most likely to prescribe

their drug. Analysis of situations that have a multinomial dependent variable is not possible with

a logistic regression. Next, we describe a method that can analyze such situations.

Multinomial Logit Model

Consider the case of a consumer packaged goods manufacturer in the grocery industry.

The company is interested in predicting which brands their customers will choose on a shopping

occasion and how prices and promotions might affect this choice. For example, Figure 14.6

shows the variation in market share of a brand with changes in promotion. In this figure, we find

that there is an increase in market share (shown in blue) whenever there is a dip in prices (shown

in red). Further, the presence of various promotional vehicles such as feature, display and

coupons affects these shares. A quantitative analysis of such a problem can help retailers



understand the effect of brand promotions (Gupta, 1988), aid in appropriately setting retail prices

and determine the product portfolio that they should carry (Draganska & Jain, 2005). A

multinomial logit model is the most popular model to analyze such scenarios. Next, we develop

this model within a random utility framework.


Random Utility Theory

Assume that a consumer assigns a level of attractiveness to each discrete alternative in

her choice set. This attractiveness number for an alternative, a single index, conveys how much

the consumer likes that alternative. Thus, all the information present in the attributes of the

alternative is collapsed into this single index. This alternative-specific index is typically called

utility.

For an alternative $j$ and time $t$, we will specify the utility ($U_{jt}$) to be composed of two
components. One component is called the systematic component (denoted by $V_{jt}$). This is
deterministic and contains the effects of covariates on the utility. The second component is called
the random component ($\varepsilon_{jt}$). This contains any other random factors that affect consumers' choice.
Thus,

$$U_{jt} = V_{jt} + \varepsilon_{jt} \tag{16a}$$

or,

$$U_{jt} = x_{jt}'\beta + \varepsilon_{jt} \tag{16b}$$

Here, for time $t$, $x_{jt}$ contains covariates associated with alternative $j$ and $\beta$ is a vector of
parameters.

We assume that decision makers choose the alternative that gives them the maximum

utility. Also, for all alternatives, the random components are independent and identically Gumbel



distributed. This particular choice of the error distribution leads to the following expression for

the probability of choice of an alternative j out of the possible J alternatives in a choice set.

$$P(\text{Choice} = j) = \frac{e^{x_{jt}'\beta}}{\sum_{k=1}^{J} e^{x_{kt}'\beta}} \tag{17}$$

The above expression is intuitive. The numerator can be interpreted as the strength
of alternative $j$ while the denominator is the sum of the strengths of all alternatives. Thus, the
probability expression essentially is the relative strength of alternative $j$. For a detailed
description of how this probability expression is obtained from the assumptions of the error
distributions, see Ben-Akiva and Lerman (1985) or Train (2003).

The above expression also shows that the logistic regression model is a special case of the
multinomial logit model, namely the one with binary outcomes. Thus, we can also arrive at the expressions for
the probabilities of the logistic regression by beginning with a random utility specification for the
binary outcomes.

We can apply the above model to an example from the grocery industry. The data for this
example, made available by A.C. Nielsen, were collected from January 1993 to May 1995. We
use a sample of 300 people that purchased in the Breakfast Foods category. There are four major
brands in this category.$^2$ For each brand, we have the price and promotion variation over time,
which enter the vector $x_{jt}$. In this application, promotion is a dummy variable created by
combining various promotional vehicles such as feature and display. Table 14.10 shows the
summary statistics for the Breakfast Foods data.



We estimate the parameters of the multinomial logit model using MLE on this data. Prior
to looking at the results, a set of identification conditions has to be discussed. These are
restrictions that must be imposed so that the model is identifiable, i.e., so that only one set of
parameters maximizes the likelihood. The restriction corresponds to setting any one of

the brand intercepts to be zero. This is because only differences in utility matter in specifying

which brand a consumer will choose. This can be seen from the following illustration. Suppose

the four brands have the following utilities: U1t=10, U2t=20, U3t=25 and U4t=30. Then, a

consumer will choose Brand 4 as that has the maximum utility. Now, suppose we add 5 units to

each brand-specific utility. Then, the utilities for the four brands will be the following: U1t=15,

U2t=25, U3t=30 and U4t=35. This addition of 5 units will not change the chosen brand. A

consumer will still choose Brand 4. Thus, the absolute values of the utilities do not matter. It is

only the relative differences in the utilities among the brands that do. We will arbitrarily set the

intercept of Brand 4 to be zero. Thus, the intercepts of the other three brands will be interpreted

as being relative to Brand 4. This is similar to the interpretation of a dummy variable in a

regression.
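The identification argument can be checked numerically with the multinomial logit probabilities, in which each alternative's probability is its exponentiated systematic utility divided by the sum over all alternatives. In the sketch below, the utility levels from the illustration are treated as systematic components (an assumption made only for this example), and the function name `mnl_probabilities` is ours.

```python
import math


def mnl_probabilities(systematic_utilities):
    """P(j) = exp(V_j) / sum_k exp(V_k).

    Subtracting the largest utility before exponentiating is only a numerical
    safeguard; it relies on the same fact the text describes, namely that only
    differences in utility matter.
    """
    largest = max(systematic_utilities)
    strengths = [math.exp(v - largest) for v in systematic_utilities]
    total = sum(strengths)
    return [s / total for s in strengths]


base = mnl_probabilities([10, 20, 25, 30])
shifted = mnl_probabilities([15, 25, 30, 35])  # 5 units added to every brand

print([round(p, 4) for p in base])
print([round(p, 4) for p in shifted])

# With two alternatives, one of which is normalized to zero, the expression
# reduces to the logistic regression probability.
v = 1.3  # hypothetical systematic utility of buying
print(mnl_probabilities([v, 0.0])[0], 1.0 / (1.0 + math.exp(-v)))
```

Since adding the same constant to every brand leaves all choice probabilities unchanged, the data cannot pin down the level of all four intercepts, which is why one of them is fixed at zero.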


Table 14.11 contains the MLE estimates of two multinomial logit models. The brand-

specific intercepts only model contains the estimates for a model that contains only the

alternative specific intercepts. The full model contains both the intercepts and the price and

promotion covariates. The estimates of the full model show that the coefficient of price is

negative (as it should be) whereas the coefficient for promotion is positive (again as expected).

While these results are intuitive, a more managerially relevant goal is to estimate the impact of

changing the price (or promotion) of brand j on the probability of choice of brand j as well as



on the probability of choosing any other brand k. A variable used for quantifying such effects is

elasticity.


Elasticity from the Logit Model

The systematic component for a brand j contains price and promotion. Thus,

$$V_{jt} = \alpha_j + \beta_{price}\,\text{Price}_{jt} + \beta_{prom}\,\text{Prom}_{jt} \tag{18}$$

Here, $\alpha_j$ is the intercept for alternative $j$, $\beta_{price}$ is the price sensitivity and $\beta_{prom}$ is the promotion

sensitivity. The elasticity of any dependent variable with respect to an independent variable is the

percent change in the dependent variable following a 1% change in the independent variable. As

an example, suppose the price of brand j is changed, then the own-price elasticity can be

ascertained by estimating the percent change in the probability of purchasing brand j after a 1%

change in its price. Similarly, the cross-price elasticity on a brand k can be evaluated by

considering the percent change in the probability of purchasing brand k following a 1% change

in price of brand j.

For the multinomial logit model, the expressions for the own-price and cross-price

elasticity are closed-form and are determined by the multinomial logit probabilities. These

expressions are as follows.

$$\eta^{price}_{jj} = \frac{\partial P(j)}{\partial \text{Price}_j}\,\frac{\text{Price}_j}{P(j)} = \beta_{price}\,(1 - P(j))\,\text{Price}_j \tag{19a}$$

$$\eta^{price}_{kj} = \frac{\partial P(k)}{\partial \text{Price}_j}\,\frac{\text{Price}_j}{P(k)} = -\beta_{price}\,P(j)\,\text{Price}_j \tag{19b}$$

Here, $\eta^{price}_{jj}$ denotes the own-price elasticity of brand $j$ and reflects the percentage change in
the probability of buying brand $j$ with a 1% change in the price of brand $j$. And, $\eta^{price}_{kj}$ is the
cross-price elasticity of brand $k$ and reflects the percentage change in the probability of buying brand
$k$ with a 1% change in the price of brand $j$. Notice that the cross-price elasticity for

brand k does not depend on the attributes of brand k. Thus, the cross-price elasticity arising

from a change in the price of brand j is the same for all other brands. This property, termed uniform

cross-elasticity, is a consequence of the expression of the multinomial logit probabilities.

Table 14.12 contains the price elasticity measures for the full model. To estimate these

elasticity measures, we calculate the own and cross price elasticities for each brand and for every

observation. Then, we average these measures over all observations in the dataset. We can use

these numbers to interpret the impact of changing prices of a brand on own shares as well as

shares of other brands. We find that a 1% increase in the price of Brand 1 lowers the probability

of choosing Brand 1 by about 4.5%. Similarly, a 1% increase in the price of Brand 1 increases

the probability of choosing each of the other brands by 0.85%. A similar analysis can be conducted for

the other brands.
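
As a rough numerical companion to equations (19a) and (19b), the following Python sketch evaluates the logit probabilities and the price elasticity matrix at the average prices and promotion levels of Table 14.10, using the full-model estimates of Table 14.11. Evaluating at the averages is an assumption made only for illustration: Table 14.12 averages observation-level elasticities, so the numbers printed here will not match it exactly. The function names are hypothetical, and Brand 4 is treated as the base alternative with an intercept of zero.

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities from systematic utilities."""
    shift = max(utilities)  # subtract the maximum for numerical stability
    strengths = [math.exp(v - shift) for v in utilities]
    total = sum(strengths)
    return [s / total for s in strengths]

def price_elasticity_matrix(prices, probs, beta_price):
    """Entry [j][k]: % change in P(k) for a 1% change in the price of brand j."""
    n = len(prices)
    matrix = []
    for j in range(n):
        row = []
        for k in range(n):
            if j == k:
                # own-price elasticity, equation (19a)
                row.append(beta_price * prices[j] * (1.0 - probs[j]))
            else:
                # cross-price elasticity, equation (19b)
                row.append(-beta_price * prices[j] * probs[j])
        matrix.append(row)
    return matrix

# Illustrative inputs: Table 14.11 estimates and Table 14.10 averages
intercepts = [0.06, -0.64, 1.91, 0.0]   # Brand 4 assumed to be the base
prices = [1.75, 1.58, 1.91, 1.94]
promos = [0.07, 0.04, 0.09, 0.01]
beta_price, beta_prom = -3.03, 0.44

utilities = [a + beta_price * p + beta_prom * d
             for a, p, d in zip(intercepts, prices, promos)]
probs = mnl_probabilities(utilities)
elasticities = price_elasticity_matrix(prices, probs, beta_price)

for j, row in enumerate(elasticities, start=1):
    print(f"price change in brand {j}:", [round(e, 2) for e in row])
```

Within each printed row, the off-diagonal entries are identical, which is the uniform cross-elasticity property in numerical form.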


The elasticity measures also show an interesting property. From the summary statistics,

we know that Brand 3 has the highest share. If we now consider the elasticity measures, we

notice that Brand 3 has the lowest own-price elasticity and the highest cross-price elasticity. This

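is a limitation of the elasticity measures resulting from the multinomial logit model: because the own-price

elasticity is proportional to 1 - P(j) and the cross-price elasticity is proportional to P(j), high market share brands tend to show low own-price elasticity and high cross-price elasticity.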

Note that we showed elasticity measures for the multinomial logit model. Similar

measures can also be calculated for the logistic regression model.



Fit Measures

In this application, we can calculate all the fit measures that we specified in the section

on logistic regression.

Likelihood Ratio Test

A typical likelihood ratio test involves comparing a model with only alternative-specific

intercepts with a model where there are alternative-specific intercepts together with other

explanatory variables.

Let $LL(C)$ refer to the log-likelihood of the data when only intercepts are included in a

model while $LL(\beta)$ denotes the log-likelihood when the model contains intercepts together with the

price and promotion covariates. In our example, from Table 14.11, $-2LL(\beta)$ is 3033.92 while

$-2LL(C)$ is 4321.88. Thus, the test statistic takes a value of 4321.88 - 3033.92 = 1287.96. The degrees of freedom are

5-3=2. The critical value of a $\chi^2$ with 2 degrees of freedom at the 0.001 level is 13.82. Thus, the

model that contains the price and promotion covariates fits significantly better than

a model without.
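
The arithmetic of this test can be reproduced with a few lines of Python. This is a minimal sketch, not a general routine: it relies on the fact that a chi-square variable with exactly 2 degrees of freedom has survival function exp(-x/2), so it applies only to the 2-degree-of-freedom case considered here. The function name is hypothetical.

```python
import math

def likelihood_ratio_test_2df(neg2ll_restricted, neg2ll_full, alpha=0.001):
    """Likelihood ratio test for exactly 2 degrees of freedom.

    Uses the closed form P(chi2_2 > x) = exp(-x / 2), which holds only
    when the chi-square distribution has 2 degrees of freedom.
    """
    stat = neg2ll_restricted - neg2ll_full
    critical = -2.0 * math.log(alpha)   # upper-alpha quantile of chi2 with 2 df
    p_value = math.exp(-stat / 2.0)
    return stat, critical, p_value

# -2LL values from Table 14.11 (intercepts only model, full model)
stat, critical, p_value = likelihood_ratio_test_2df(4321.88, 3033.92)
print(round(stat, 2), round(critical, 2), stat > critical)
```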

Likelihood Ratio Index ($\rho^2$)

The likelihood ratio index is described as follows.

$$\rho^2 = 1 - \frac{LL(\beta)}{LL(C)} \qquad (20)$$

In the current application, $LL(\beta)$ is -1516.96 and $LL(C)$ is -2160.94. Thus, $\rho^2$ is approximately 0.298.

As explained earlier, the adjusted $\bar{\rho}^2$ has the following expression.

$$\bar{\rho}^2 = 1 - \frac{LL(\beta) - K}{LL(C) - P} \qquad (21)$$

Here, K is the total number of parameters including the intercepts and other covariates while P is

the number of intercepts. Thus, with K = 5 and P = 3, $\bar{\rho}^2$ is approximately 0.297.
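
The two indices in equations (20) and (21) can be checked directly from the log-likelihood values quoted above. The sketch below assumes K = 5 (three intercepts plus price and promotion) and P = 3, as in the Breakfast Foods example; the function names are illustrative.

```python
def likelihood_ratio_index(ll_full, ll_intercepts):
    """Equation (20): 1 - LL(beta) / LL(C)."""
    return 1.0 - ll_full / ll_intercepts

def adjusted_likelihood_ratio_index(ll_full, ll_intercepts, k, p):
    """Equation (21): 1 - (LL(beta) - K) / (LL(C) - P)."""
    return 1.0 - (ll_full - k) / (ll_intercepts - p)

ll_full, ll_intercepts = -1516.96, -2160.94
rho2 = likelihood_ratio_index(ll_full, ll_intercepts)
rho2_adj = adjusted_likelihood_ratio_index(ll_full, ll_intercepts, k=5, p=3)
print(round(rho2, 3), round(rho2_adj, 3))
```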



Hit Rate

In a multinomial logit model, the probabilities of choice of each alternative have a closed

form expression. The predicted probabilities for choosing each alternative can then be easily

calculated by inserting the MLE estimates in the probability expressions. We can then predict the

alternative that is most likely to be chosen (brand with the highest probability) and compare it

with the brand that is actually chosen. If the two are the same, we have a hit (i.e. a correct

prediction); otherwise the prediction is wrong. We calculate the hit rate for the intercepts-only model

and the full model. Table 14.11 reports these results. We find that the hit rate for the intercepts-only

model is around 48.3% while the hit rate for the full model is considerably higher at 63.2%.
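
The hit-rate calculation described above can be sketched as follows. The predicted probabilities and actual choices are made-up toy values for four observations and four brands (indexed 0 to 3); they are not taken from the Breakfast Foods data.

```python
def hit_rate(predicted_probs, actual_choices):
    """Share of observations whose most probable alternative was actually chosen."""
    hits = 0
    for probs, actual in zip(predicted_probs, actual_choices):
        predicted = max(range(len(probs)), key=lambda k: probs[k])
        if predicted == actual:
            hits += 1
    return hits / len(actual_choices)

# Hypothetical predicted probabilities (one row per observation)
toy_probs = [
    [0.1, 0.2, 0.6, 0.1],
    [0.5, 0.2, 0.2, 0.1],
    [0.2, 0.2, 0.5, 0.1],
    [0.1, 0.1, 0.3, 0.5],
]
toy_choices = [2, 1, 2, 3]   # hypothetical chosen brand per observation

rate = hit_rate(toy_probs, toy_choices)
print(rate)  # 3 of the 4 toy predictions are hits
```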

Independence of Irrelevant Alternatives (I.I.A.)

The multinomial logit model has several properties. One property that we discussed was

the uniform cross-elasticity. Another property that has been especially emphasized is that of

I.I.A. The property can be best illustrated by revisiting the expressions for the probabilities from

the logit model.

Suppose we consider the probability of choice of two alternatives, i and j, denoted by P(i)

and P(j) respectively, then,

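$$\frac{P(i)}{P(j)} = \frac{e^{x_{it}'\beta} \Big/ \sum_{k=1}^{J} e^{x_{kt}'\beta}}{e^{x_{jt}'\beta} \Big/ \sum_{k=1}^{J} e^{x_{kt}'\beta}} = \frac{e^{x_{it}'\beta}}{e^{x_{jt}'\beta}} \qquad (22)$$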



Equation (22) shows that the ratio of the probabilities of choosing two alternatives, i and j, is

independent of the presence of other alternatives and depends only on the systematic utilities

of the two alternatives. Thus, even if a new alternative very similar to i enters the market,

it will make no difference to the relative probabilities of choosing i and j. This result is a

direct consequence of the independence assumption among the errors of the alternative-specific

utilities. This assumption can be tenuous in many contexts. The following problem

illustrates one such context.

There is a famous problem, called the red bus/ blue bus problem, which illustrates the

I.I.A. issue. The problem is as follows. Suppose consumers are choosing between a car and a

blue bus as means of transportation and suppose they equally like both modes of transport. The

probability of choosing either a car or a blue bus is 0.5. In other words,

P(choose car) / P(choose blue bus) = 1. (23)

Recall, the I.I.A. property dictates that this ratio should remain the same irrespective of the

choice set. Now, suppose a red bus, similar to the blue bus in all respects except the color, is

introduced as a means of transport. Then, we would expect that consumers will be equally likely

to choose a red or a blue bus. This equality together with the above equality will imply the

following.

P(choose car) = P(choose blue bus) = P(choose red bus) = 1/3. (24)

This result is not appealing, as consumers will most likely consider both bus types as

one alternative. If this is the case, then the following probabilities are more reasonable.

P(choose car) = 1/2 ; P(choose red bus) = P(choose blue bus) = 1/4. (25)
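
The logit side of this example is easy to verify numerically. In the sketch below, all systematic utilities are set to zero to represent consumers who like the modes equally; this normalization is an assumption made for illustration. Adding the red bus leaves the car to blue bus ratio at 1, which forces the probabilities in (24) rather than the more plausible ones in (25).

```python
import math

def logit_probs(utilities):
    """Multinomial logit probabilities for a dict of {alternative: utility}."""
    total = sum(math.exp(v) for v in utilities.values())
    return {name: math.exp(v) / total for name, v in utilities.items()}

before = logit_probs({"car": 0.0, "blue bus": 0.0})
after = logit_probs({"car": 0.0, "blue bus": 0.0, "red bus": 0.0})

print(before)
print(after)
# The ratio P(car) / P(blue bus) is the same with and without the red bus.
print(before["car"] / before["blue bus"], after["car"] / after["blue bus"])
```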

Thus, the I.I.A. property can constrain the probabilities in such a way that in some

contexts, we can get results that are unrealistic. There are several ways of correcting this



problem. One alternative is to allow for a tree structure for consumer choice. We can achieve this

with a nested-logit model (Ben-Akiva, 1973) that allows for correlation among the utilities of

alternatives only within a nest. A second alternative is to allow for heterogeneity in customers'

parameters; then, at the aggregate level, the IIA property disappears (see Chapter 19 of this book). A

third method is to allow for the brand utilities to be correlated as is done by the multinomial

probit model. We discuss this last method later.

Sampling of Alternatives

In the above analysis, we just had four alternatives. There are many instances, however,

where the number of alternatives can be much larger. For example, if retail store managers want

to evaluate the effect of price and promotion at the UPC level rather than focusing at the brand

level, then the number of alternatives can be in the hundreds. In that case, evaluating the

denominator (the sum of strengths of all alternatives) in the probability expression of choosing a

particular alternative becomes computationally burdensome. One method for circumventing this problem is to sample

a set of alternatives from the entire set of possible alternatives and then evaluate the probabilities.

The following example illustrates this method.

Suppose we wish to model consumers choosing a mutual fund from all available mutual

funds. There are many mutual funds that consumers can choose from - the latest figures suggest

that there are more than 8000 mutual funds in the US alone (Investment Company Institute,

2005). It is impractical to use all 8000 or more of these funds when evaluating the probability

of choosing a specific one. What we can do is to randomly sample a small number of these

mutual funds, for example 10, to form the set of alternatives. While sampling, for each

observation we have to ensure that the mutual fund chosen for that observation is in the

constructed set of alternatives (else how could the consumer have chosen that mutual fund if it



were not in her set of alternatives?). To ensure this, we include the mutual fund that was chosen

on that observation and randomly sample 9 others from the rest of alternatives. We can then

estimate the parameters of the model in exactly the same manner as described above in the

Breakfast Foods example. Because of the IIA property of the logit model, the MLE parameter estimates from a set of alternatives

constructed by such a random sampling scheme are consistent for the same parameters as those obtained from

all the alternatives, although they are not numerically identical in any given sample. For a more detailed description of sampling, see Ben-Akiva and Lerman

(1985).
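
The construction of a sampled choice set can be sketched in a few lines. The fund identifiers, the set size of 10 and the helper name are hypothetical, chosen to mirror the mutual fund example; the only requirement taken from the text is that the chosen alternative is always included and the remaining members are drawn at random from the other alternatives.

```python
import random

def sample_choice_set(chosen, all_alternatives, set_size=10, rng=random):
    """Return a choice set holding the chosen alternative plus random others."""
    others = [a for a in all_alternatives if a != chosen]
    sampled = rng.sample(others, set_size - 1)   # draw without replacement
    choice_set = [chosen] + sampled
    rng.shuffle(choice_set)                      # avoid a fixed position for the choice
    return choice_set

rng = random.Random(0)                           # seeded for reproducibility
funds = [f"fund_{i}" for i in range(8000)]       # hypothetical fund identifiers
choice_set = sample_choice_set("fund_4242", funds, set_size=10, rng=rng)
print(choice_set)
```

One such set would be built for every observation before estimating the model.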

There are other methods for sampling of alternatives, such as importance sampling. This

sort of sampling scheme is typically used when there is a need to oversample an alternative. For

example, in the previous example of mutual funds, suppose we find that many consumers are

choosing a few mutual funds; then a sampling scheme should take these skewed choices into

account when selecting samples. An importance sampling scheme does exactly that. Here, while

estimating the model, a correction factor is included to account for the non-random sampling.

Train, Ben-Akiva, and Atherton (1989) show an application of such a sampling scheme in the

context of consumers choosing long distance plans and minutes of consumption.

Multinomial Probit Model

There are several instances when there is a need to allow the utility errors of the

alternatives to be correlated. For example, consumers typically choose between different modes

of transport such as bus, car, train and others. They can also be using a combination of these

alternatives for commuting e.g., a mix of car and train (Currim, 1982). In such a scenario, the

errors in the utility of choosing a car, train and the alternative representing a combination of car

and train can be correlated (i.e., cannot be assumed to be independent). Clearly, we need a model

that is flexible enough to capture any possible correlation.



A multinomial probit model allows for the utility errors to be correlated and have

different variances (i.e., different scales for different alternatives). It also requires several

identification restrictions (Keane, 1992). We show these restrictions in the simplest setting: a

choice model with three alternatives. The utilities for the three alternatives are given as follows.

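$$U_{1t} = \alpha_1 + x_{1t}'\beta_1 + \varepsilon_{1t}$$

$$U_{2t} = \alpha_2 + x_{2t}'\beta_2 + \varepsilon_{2t}$$

$$U_{3t} = \alpha_3 + x_{3t}'\beta_3 + \varepsilon_{3t} \qquad (26)$$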

Here, we have intentionally separated the intercepts from the other covariates to show the

identification conditions. Also note, we have assumed a general model where the parameters for

the covariates are alternative-specific. The errors are assumed to have the following

distributional specification.

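$$\begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \\ \varepsilon_{3t} \end{pmatrix} \sim N\left(0,\; \begin{pmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma_{22} & \sigma_{23} \\ \sigma_{31} & \sigma_{32} & \sigma_{33} \end{pmatrix}\right) \qquad (27)$$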

As only differences in utilities matter, we can rewrite the above utilities in the following manner.

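$$Y_{1t} = U_{1t} - U_{3t} = (\alpha_1 - \alpha_3) + (x_{1t}'\beta_1 - x_{3t}'\beta_3) + (\varepsilon_{1t} - \varepsilon_{3t})$$

$$Y_{2t} = U_{2t} - U_{3t} = (\alpha_2 - \alpha_3) + (x_{2t}'\beta_2 - x_{3t}'\beta_3) + (\varepsilon_{2t} - \varepsilon_{3t}) \qquad (28)$$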

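Let $\nu_{1t}$ be $(\varepsilon_{1t} - \varepsilon_{3t})$ and $\nu_{2t}$ be $(\varepsilon_{2t} - \varepsilon_{3t})$; then the joint distributional specification is as

follows.

$$\begin{pmatrix} \nu_{1t} \\ \nu_{2t} \end{pmatrix} \sim N\left(0,\; \begin{pmatrix} \omega_{11} & \omega_{12} \\ \omega_{21} & \omega_{22} \end{pmatrix}\right) \qquad (29)$$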



We now state the identification conditions. First, note that only the differences in the intercepts,

$(\alpha_1 - \alpha_3)$ and $(\alpha_2 - \alpha_3)$, enter $Y_{1t}$ and $Y_{2t}$. Thus, it is only these differences among the intercepts and not

their absolute values that are estimable. We can, therefore, without loss of generality set $\alpha_3$ to 0.

Second, unlike a linear regression where the dependent variable is observable, utilities are latent

(i.e., unobservable) and we have to set their scale. We do this by setting one of the variances of the

differenced errors ($\omega_{11}$ or $\omega_{22}$) to 1. Let $\omega_{11}$ be set to 1; then only $\omega_{12}$ and $\omega_{22}$ are estimable. Here, the parameter $\omega_{12}$ captures

any correlation between the differenced utilities and, therefore, the IIA problem is no longer a

concern. In empirical applications, the estimate of $\omega_{12}$ will suggest whether there is correlation

present among the utilities. If, in an application, the estimate is significantly different from zero,

then it implies that a multinomial logit model is inappropriate for that application as it does not

allow for utilities to be correlated.

Note that not all parameters of the original covariance matrix of the non-differenced

utilities are identified. In general, if there are J alternatives then the original covariance matrix

contains J*(J+1)/2 parameters. Of these, upon taking the difference of utilities and putting the

identification conditions, only ((J-1)*J/2)-1 parameters are identified (Train, 2003). In the above

formulation, we had 3 alternatives thus the original covariance matrix has (3)(4)/2 = 6

parameters. Of these, only (3-1)3/2-1=2 parameters are identified.
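
The parameter counts quoted above follow directly from the two formulas and can be tabulated for any number of alternatives. A small sketch with an illustrative function name:

```python
def covariance_parameter_counts(num_alternatives):
    """Return (original, identified) covariance parameter counts for J alternatives."""
    j = num_alternatives
    original = j * (j + 1) // 2          # distinct elements of the J x J covariance matrix
    identified = (j - 1) * j // 2 - 1    # after differencing and fixing the scale
    return original, identified

for j in (3, 4, 5):
    print(j, covariance_parameter_counts(j))
```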

We now consider an application where a trinomial probit choice model is applied. This

application is from Keane (1992). The application considers the employment choices of men and

models three choices: manufacturing (M), nonmanufacturing (NM) and unemployment. The

data for this model is from a national longitudinal survey of men. Table 14.13 contains a

description of the independent variables. In this application, the intercept for unemployment is

set to zero and the variance for the utility for M is set to 1. Note that independent variables are



allowed to have different effects on M and NM. For example, the model allows education to

have different effects on manufacturing and non-manufacturing. Finally, there are two sets of

parameters: one set of parameters is estimated when the correlation among the utilities ($\omega_{12}$) is

set to 0 and the variance of NM is set to 1 (Model-1). The other set of parameters is obtained

when both the correlation and variance are estimated (Model-2).


The results show that Model-2 is only marginally better than Model-1 in terms of the log-likelihood

(-10,299.65 versus -10,300.71). Further, the estimated correlation is positive (0.64), although its reported standard error of 0.37 makes it only marginally significant. Also note

that there are differences in the estimated parameters of Model-1 and Model-2. This implies that

allowing for a correlation among the utilities clearly affects the estimation of other parameters in

the model. The positive correlation also suggests that a multinomial logit model may not have

been appropriate for this setting as it would have failed to capture this correlation.

The multinomial probit model provides a flexible way of capturing the correlations that

might be present among the utilities. This alleviates the IIA problem that is inherent in the

multinomial logit model.

Tobit Analysis

Tobit models are part of a general class of models for analyzing censored data. These

types of data are encountered when, for a large number of observations, the dependent variable is

clustered at a limiting value, often zero (Tobin, 1958). For example, in a large scale study of the number

of hours that married women work, it was found that about 66% of respondents reported zero

hours (Greene & Quester, 1982). We will show that analyzing such censored data without

accounting for censoring generally leads to biased estimates.
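
A small simulation makes the bias concrete. The sketch below is an illustration under assumed values, not an analysis of any dataset in this chapter: it generates a latent variable with a true slope of 1, censors it at zero, and compares the OLS slope on the latent variable with the OLS slope on the censored variable. The sample size, seed and helper name are arbitrary choices.

```python
import random

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var

rng = random.Random(42)        # fixed seed so the run is reproducible
true_slope = 1.0               # assumed value for the illustration
x = [rng.gauss(0.0, 1.0) for _ in range(20000)]
latent = [true_slope * xi + rng.gauss(0.0, 1.0) for xi in x]
observed = [max(0.0, q) for q in latent]   # censoring at zero

slope_latent = ols_slope(x, latent)
slope_censored = ols_slope(x, observed)
print(round(slope_latent, 2), round(slope_censored, 2))
```

In this setup the slope estimated from the censored variable is pulled well toward zero, while the slope from the uncensored latent variable stays close to the assumed value.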



There are many other scenarios where such a censoring is observed. For example, in

grocery settings consumers either do not purchase a brand or purchase a positive quantity (Jedidi,

Ramaswamy, & Desarbo, 1993; Tellis, 1988). Technically then, any demand modeling must

employ a tobit modeling framework as quantity is inherently non-negative (i.e., is censored) and

has a cut off at zero. There are several ways of modeling such a demand situation. If the focus is

on modeling the demand of a single alternative then a censored regression is typically the chosen

method. We will show an example of this methodology. If, however, the focus is on modeling

both the choice of an alternative and quantity demanded subsequent to the choice, then a two

stage regression is usually adopted (Tellis, 1988). In this framework, the choice of alternative

and quantity demanded are assumed to be interconnected i.e. the errors in the utility of

alternatives are correlated with the error in the demand model. This correlation captures any

selectivity bias (Heckman, 1979). For example, consumers may buy more of their preferred

brand but less of a brand that is chosen on a promotion.

We now illustrate a censored regression analysis. In general, a censored regression can be

expressed as follows.

$$q_t^* = x_t'\beta + \varepsilon_t \qquad (30)$$

Here, for an observation t, the random variable $q_t^*$ is a partially observable variable. The error $\varepsilon_t$

is normally distributed, $N(0, \sigma^2)$. The observed value of this variable is $q_t$, the quantity observed

for observation t, which equals $q_t^*$ when it is greater than zero. The observed value is zero if $q_t^*$ is not greater than zero.

In other words,

$$q_t = \begin{cases} q_t^* & \text{if } q_t^* > 0 \\ 0 & \text{if } q_t^* \le 0 \end{cases} \qquad (31)$$



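The expected value of $q_t$ is $E(q_t) = E(q_t \mid q_t^* > 0)\,P(q_t^* > 0)$. Thus,

$$E(q_t \mid q_t^* > 0) = x_t'\beta + E(\varepsilon_t \mid \varepsilon_t > -x_t'\beta) = x_t'\beta + \sigma\,\frac{\phi(x_t'\beta/\sigma)}{\Phi(x_t'\beta/\sigma)} \qquad (32)$$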

Here, $\phi(\cdot)$ and $\Phi(\cdot)$ are the density and the cumulative distribution function respectively of

the standard normal distribution. The above equation is similar to a standard OLS model with an

additional term that corrects for censoring. We can estimate the above model together with the

correction factor to yield unbiased estimates. Notice, that if we did not include the correction

factor then a regular OLS estimation will lead to biased estimates due to the omitted variable.
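
To see the size of the correction term in equation (32), the following sketch evaluates the conditional and unconditional expectations for a given value of the linear index and error standard deviation. It uses only the standard normal density and distribution function; the function names and the example inputs are hypothetical.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_expectations(xb, sigma):
    """Return E(q | q* > 0), E(q) and P(q* > 0) for linear index xb."""
    z = xb / sigma
    prob_positive = normal_cdf(z)
    correction = sigma * normal_pdf(z) / prob_positive   # extra term in (32)
    conditional_mean = xb + correction
    unconditional_mean = conditional_mean * prob_positive
    return conditional_mean, unconditional_mean, prob_positive

# Hypothetical inputs: linear index of 0 and unit error standard deviation
cond_mean, uncond_mean, prob_pos = tobit_expectations(0.0, 1.0)
print(cond_mean, uncond_mean, prob_pos)
```

The correction term shrinks as the linear index grows relative to the error standard deviation, which is why censoring matters most for observations near the cut off.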

Bomberger (1993) used the above censored regression model to estimate the impact of

income and wealth on household deposits. There were 4262 households in the dataset out of

which 290 households had no deposits. Therefore, for these households the dependent variable is

censored at zero.

Bomberger estimates the above model and compares the results with an OLS regression.

Table 14.14 shows these results. We can make several observations from these estimates. First,

the intercept takes very different values in the two equations. Second, we find that wealth is

marginally significant in the tobit model while it is non-significant, with the opposite sign, in the OLS regression. This

implies that a failure to properly model the censoring can alter both the sign and significance of

the estimates.




Conclusions

In this chapter, we discussed several methods that are applicable in scenarios wherein the

dependent variable is either discrete (e.g. choice of brand) or constrained in such a manner (e.g.

market share) that a linear regression with OLS estimation fails to be the best alternative.

We began the chapter with a discussion on discriminant analysis. We showed that this

method is applicable for a discrete dependent variable (predetermined groups). In this context,

we also showed how to determine the independent variables that best discriminate among groups

and how to calculate their relative importance for discrimination.

Next, we discussed logistic regression. We showed that this method is suitable for both

binary discrete dependent variables (e.g. buy/ no buy situation) and dependent variables that are

between 0-1 (e.g. market share). Thus, this method is applicable for a wider set of situations than

a two-group discriminant analysis.

We extended the logistic regression model to multinomial choice models that are suitable

for scenarios with a dependent variable that can take multiple values. In this context, we showed

the multinomial logit and probit models. The former model is the most frequently used choice

model as it provides closed form probability expressions. It has a few limitations: it suffers

from the IIA property and the elasticity expressions are constrained to show a particular

substitution pattern among alternatives (e.g. own price elasticity is smaller for higher share

brands). The multinomial probit model alleviates the IIA problem but at the expense of closed

form probability expressions. We noted that for applications where the unobserved factors

affecting the available alternatives are correlated (e.g. the red bus / blue bus problem), a

multinomial probit model is more appropriate than a multinomial logit model.



We ended the chapter with a discussion of censored regression models or tobit models.

These models are a combination of a binary probit and a multiple regression and are applicable

in a wide range of scenarios where there is censoring of the data (e.g. demand of a good).



ENDNOTES

1This is a synthetic dataset. For generating this data, we set the sensitivities to Recency,

Frequency and Monetary at 2.3, 0.3 and 0.1 respectively.

2We use brand and alternative interchangeably in this example. Here the four alternatives

correspond to four different brands.



REFERENCES

Bellman, S., Lohse, G. L., & Johnson, E. J. (1999). Predictors of online buying behavior. Communications of the ACM, 42(12), 32-38.

Ben-Akiva, M. (1973). Structure of passenger travel demand models. Ph.D. dissertation, Department of Civil Engineering, MIT, Cambridge, MA.

Ben-Akiva, M., & Lerman, S. (1985). Discrete choice analysis: Theory and application to travel demand. Cambridge, MA: MIT Press.

Bomberger, W. A. (1993). Income, wealth and household demand for deposits. The American Economic Review, 83(4), 1034-1044.

Currim, I. S. (1982). Predictive testing of consumer choice models not subject to independence of irrelevant alternatives. Journal of Marketing Research, 19, 208-222.

Draganska, M., & Jain, D. (2005). Product line length as a competitive tool. Journal of Economics and Management Strategy, 14(1), 1-28.

Greene, W. H., & Quester, A. (1982). Divorce risk and wives' labor supply behavior. Social Science Quarterly, 63, 16-27.

Gupta, S. (1988). Impact of sales promotions on when, what, and how much to buy. Journal of Marketing Research, 25, 342-355.

Heckman, J. (1979). Sample selection bias as a specification error. Econometrica, 47(1), 153-161.

Investment Company Institute (2005). ICI Factbook. Retrieved October 16, 2005, from http://www.ici.org/.

Jedidi, K., Ramaswamy, V., & DeSarbo, W. S. (1993). A maximum likelihood method for latent class regression involving a censored dependent variable. Psychometrika, 58(3), 375-394.



Johnson, R. A., & Wichern, D. W. (2002). Applied multivariate statistical analysis. Upper Saddle River, NJ: Prentice Hall.

Keane, M. P. (1992). A note on identification in the multinomial probit model. Journal of Business & Economic Statistics, 10(2), 193-200.

Lehmann, D. R., Gupta, S., & Steckel, J. H. (1998). Marketing Research. New York: Addison-Wesley.

Tellis, G. J. (1988). Advertising exposure, loyalty and brand purchase: A two-stage model of choice. Journal of Marketing Research, 25, 134-144.

Theil, H. (1971). Principles of econometrics. New York: Wiley.

Tobin, J. (1958). Estimation of relationships for limited dependent variables. Econometrica, 26, 24-36.

Train, K. (2003). Discrete choice methods with simulation. Cambridge: Cambridge University Press.

Train, K., Ben-Akiva, M., & Atherton, T. (1989). Consumption patterns and self-selecting tariffs. Review of Economics and Statistics, 71(1), 62-73.



Table 14.1

Southern Versus Non-Southern States

                         Variable Means                    Discriminant Function
Variable               South    Non-South   One-Way F   Unstandardized   Standardized
Average Income          4.95       5.91       17.13         -0.43           -0.32
Population              4.45       4.19        0.05          0.92            4.15
Population Change       1.37       1.19        0.30         -0.37           -0.39
Percent Urban          57.00      58.37        0.03          0.02            0.33
Tax Per Capita        464.13     618.23       26.97         -0.01           -0.87
Government Expen.     286.13     281.54        0.00          0.00            0.75
College Enrollment    165.20     192.45        0.14         -0.01           -2.03
Mineral Production   2006.27     610.49        6.84          0.00            0.03
Forest Acres           15.93      14.70        0.05          0.01            0.15
Manuf. Output           6.77       8.66        0.40         -0.27           -2.65
Farm Receipts        1801.73    1943.37        0.06         -0.00           -0.40

Chi-Square               42.71
Degrees of Freedom       11
Canonical Correlation     0.80
Wilks' Lambda             0.36

Source: Lehmann, Gupta and Steckel, Marketing Research. Page 668 (Addison-Wesley Educational Publishers Inc., 1998)

Table 14.2

Hit Miss Table

                    Predicted Group
Actual Group      South    Non-South
South               14         1
Non-South            3        32

Source: Lehmann, Gupta and Steckel, Marketing Research. Page 668 (Addison-Wesley Educational Publishers Inc., 1998)



Table 14.3

Averages for the Five Food Expenditure Groups

                                                    Group
                                    1        2         3         4        5
Variables                         < $15   $15-$29   $30-$44   $45-$59   > $60
Education of wife                  3.32     4.11      4.29      4.47     4.49
Education of husband               2.79     3.75      4.08      4.57     4.69
Age                                4.09     3.46      3.06      2.50     2.72
Income                             1.62     2.06      2.75      3.47     3.75
Family size                        2.09     2.52      3.13      4.14     5.11
How often they shop                1.91     2.18      2.27      2.29     2.62
Number of brands shopped for       1.82     2.25      2.34      2.25     2.72
Information sought                 1.91     1.91      1.81      1.84     1.87
Sample size                          34      284       293       181       61

Source: Lehmann, Gupta and Steckel, Marketing Research. Page 670 (Addison-Wesley Educational Publishers Inc., 1998)

Table 14.4

Discriminant Functions

                                   Unstandardized Coeff.            Standardized Coeff.
Variables                         1      2      3      4          1      2      3      4
Education of wife               0.02   0.01  -0.56   0.42       0.02   0.01  -0.70   0.52
Age                            -0.01   0.55   0.20  -0.13      -0.01   0.81   0.29  -0.20
Income                         -0.25  -0.29   0.21  -0.62      -0.41  -0.43   0.30  -0.89
Family size                    -0.58   0.40   0.21   0.38      -0.77   0.56   0.29   0.53
How often they shop            -0.29   0.35  -0.80  -0.25      -0.20   0.24  -0.58  -0.18
Number of brands shopped for   -0.01   0.26  -0.28  -0.30      -0.01   0.37  -0.37  -0.43
Constant                        3.19  -3.62   2.93   0.36        -      -      -      -

Source: Lehmann, Gupta and Steckel, Marketing Research. Page 687 (Addison-Wesley Educational Publishers Inc., 1998)



Table 14.5

Means of Groups

                  Functions
Groups       1       2       3       4
1          1.03    0.16    0.65   -0.06
2          0.59    0.06   -0.06    0.05
3          0.04   -0.06   -0.07   -0.07
4         -0.71   -0.19    0.09    0.05
5         -1.44    0.47   -0.01   -0.01

Source: Lehmann, Gupta and Steckel, Marketing Research. Page 687 (Addison-Wesley Educational Publishers Inc., 1998)

Table 14.6

Hit Miss Table (Multiple Discriminant Analysis)

                       Predicted Group
Actual Group      1      2      3      4      5
1                20     13      1      0      0
2                86    106     59     24      9
3                50     65     90     57     31
4                 7      7     33     84     50
5                 2      1      6     12     40

Source: Lehmann, Gupta and Steckel, Marketing Research. Page 687 (Addison-Wesley Educational Publishers Inc., 1998)



Table 14.7

Summary Statistics for the RFM Data

Variable       Mean    Std. Dev.
Recency        3.87      2.06
Frequency      7.80      2.74
Monetary      73.14     28.87
Choice         0.45

Table 14.8

Parameter Estimates for OLS Regression

Variable       Estimate   Std. Error
Intercept*      -0.92       0.13
Recency*         0.15       0.01
Frequency        0.02       0.01
Monetary*        0.01       0.001

R2               0.61
Adjusted R2      0.60

*Significant at the 0.05 significance level.



Table 14.9

Parameter Estimates for Logistic Regression

               Intercept Only Model        Full Model
Variable       Estimate   Std. Error   Estimate   Std. Error
Intercept       -0.20       0.21       -30.29*      8.55
Recency           -          -           3.34*      0.93
Frequency         -          -           0.59*      0.24
Monetary          -          -           0.17*      0.05

-2LL           137.628                  30.489

*Significant at the 0.05 level.

Table 14.10

Summary Statistics for Breakfast Foods Data

Brand      Average Price ($)   Promotion   Market Share (%)
Brand 1          1.75             0.07          22.19
Brand 2          1.58             0.04          17.18
Brand 3          1.91             0.09          48.36
Brand 4          1.94             0.01          12.28



Table 14.11

Parameter Estimates for Multinomial Logit Model

                     Intercepts Only Model        Full Model
Variable             Estimate   Std. Error   Estimate   Std. Error
Intercept_Brand1       0.59       0.08         0.06       0.11
Intercept_Brand2       0.34       0.09        -0.64       0.11
Intercept_Brand3       1.37       0.07         1.91       0.10
Price                   -          -          -3.03       0.11
Promotion               -          -           0.44       0.14

-2LL                 4321.88                 3033.92
Hit Rate               48.3%                   63.2%



Table 14.12

Price Elasticities from the Full Multinomial Logit Model

                              Change in probability of k
Change in price of j     Brand 1   Brand 2   Brand 3   Brand 4
Brand 1                   -4.47      0.85      0.85      0.85
Brand 2                    0.67     -4.14      0.67      0.67
Brand 3                    2.64      2.64     -3.17      2.64
Brand 4                    0.53      0.53      0.53     -5.37



Table 14.13

Parameter Estimates for Multinomial Probit Model

                              Model 1                        Model 2
Variable                  M              NM              M              NM
Non labor income       0.01 (0.01)   -0.05 (0.01)     0.00 (0.01)   -0.03 (0.03)
Unemployment rate     -0.08 (0.01)   -0.05 (0.01)    -0.09 (0.02)   -0.08 (0.02)
Time trend            -0.02 (0.01)    0.05 (0.01)    -0.01 (0.01)    0.04 (0.03)
Years of education     0.01 (0.01)    0.11 (0.01)     0.03 (0.01)    0.10 (0.06)
Labor experience       0.02 (0.01)   -0.03 (0.01)     0.01 (0.01)   -0.02 (0.02)
Square of Exper.      -0.01 (0.00)    0.00 (0.01)     0.00 (0.01)    0.00 (0.01)
Dummy for race         0.10 (0.05)    0.09 (0.05)     0.15 (0.06)    0.14 (0.07)
Dummy for marriage     0.47 (0.04)    0.95 (0.09)     0.51 (0.07)    0.91 (0.39)
Number of kids         0.12 (0.02)   -0.18 (0.03)     0.09 (0.03)   -0.11 (0.12)
Intercept             -0.06 (0.14)   -0.13 (0.12)     0.46 (0.18)    0.31 (0.35)

Correlation            0.00 (fixed)                   0.64 (0.37)
Variance               1.00 (fixed)                   1.16 (0.58)
LL                   -10,300.71                     -10,299.65

Source: Keane, M. P. (1992). A Note on Identification in the Multinomial Probit Model. Journal of Business & Economic Statistics, 10, 2, Page 199



Table 14.14

Parameter Estimates for OLS and Tobit Analysis

                      OLS                        Tobit
Variable      Estimate    Std. Error     Estimate    Std. Error
Intercept      -698.6      1036.10        -9733        3033
Income          0.1015       0.0065        0.145        0.002
Wealth         -0.00001      0.0004        0.0002       0.0001

Source: Bomberger, W. A. (1993). Income, Wealth and Household Demand for Deposits. The American Economic Review, 83, 4, Page 1038.

Figure 14.1

Purchasers and Non-purchasers Versus Age and Income

[Figure: scatter plot of purchasers (P) and non-purchasers (N) against Age and Income, each axis scaled from 1 to 6.]



Figure 14.2

Group Means on First Two Discriminant Functions

[Figure: plot of the five group means; horizontal axis: Function 1 (-2 to 2); vertical axis: Function 2 (-0.7 to 0.7).]



Figure 14.3

OLS Predictions Versus Actual Choice

[Figure: plot of two series, Predictions and Choice; horizontal axis from 0 to 120; vertical axis from -0.5 to 2.]



Figure 14.4

Logistic Regression Predictions Versus Actual Choice

[Figure: plot of two series, Predictions and Choice; horizontal axis from 0 to 120; vertical axis from 0 to 1.2.]



Figure 14.5

Probability of Purchase Versus Frequency

[Figure: horizontal axis: Frequency (1 to 30); vertical axis: Probability of purchase (0 to 1.2).]



Figure 14.6

Variation in Market Share with changes in Marketing Mix

[Figure: Market Share (0.2 to 1) and Price (0.25 to 1) plotted against Week (5 to 30), with weeks annotated by marketing mix activity.]

F = Feature, D = Display, C = Store Coupon
