models of discrete choice - stanford university · 2006-04-16 · logic of the discriminal process...

25
Mathematical Theory Likelihood References Models of Discrete Choice Jonathan Wand Polisci 350C Stanford University April 10, 2006 (revised)

Upload: others

Post on 17-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Models of Discrete Choice - Stanford University · 2006-04-16 · Logic of the discriminal process (less abstract) 1. The utility for item i is defined as ui. 2. The utility can

Mathematical Theory Likelihood References

Models of Discrete Choice

Jonathan Wand

Polisci 350CStanford University

April 10, 2006 (revised)

Page 2: Models of Discrete Choice - Stanford University · 2006-04-16 · Logic of the discriminal process (less abstract) 1. The utility for item i is defined as ui. 2. The utility can

Mathematical Theory Likelihood References

Theoretical foundations of Choice models

Two main approaches to deriving models of discrete choice:

1. Discriminal process (Thurstone, 1927)Most often associated with models of choice based onnormal distributions (probit)

2. Axiomatic derivation (Luce, 1959)Most often associated with models of choice based onlogistic distribution (logit)

Page 3: Models of Discrete Choice - Stanford University · 2006-04-16 · Logic of the discriminal process (less abstract) 1. The utility for item i is defined as ui. 2. The utility can

Mathematical Theory Likelihood References

Thurstone: discriminal process

Logic of the discriminal process (abstract)

1. Assume a metric for making comparisons across items.Thurstone deals with a single attribute, posits apsychological continuum

2. Assume that attributes of items have a random component.Thurstone used Normal distribution, but acknowledgedarbitrariness.

3. Assume that positive difference between two items leadsto the selection of the item with the higher value.A relative comparison, requiring only simple inequalitystatements.

Page 4: Models of Discrete Choice - Stanford University · 2006-04-16 · Logic of the discriminal process (less abstract) 1. The utility for item i is defined as ui. 2. The utility can

Mathematical Theory Likelihood References

Thurstone: discriminal process

Logic of the discriminal process (less abstract)

1. The utility for item i is defined as ui .2. The utility can be decomposed into two parts,

• a systematic (observed) component µi• a random (unobserved) component εi ∼ F (0, σ2)

Assume that the overall utility of an item is the sum of theobserved and unobserved components ui = µi + εi .

3. For a pair of items, choose item j over k if uj > uk .

Page 5: Models of Discrete Choice - Stanford University · 2006-04-16 · Logic of the discriminal process (less abstract) 1. The utility for item i is defined as ui. 2. The utility can

Mathematical Theory Likelihood References

Thurstone: discriminal process

General binary choice

P(j , k) = P(uj > uk )

= P(µj + εj > µk + εk )

= P(µj − µk + εj > εk )

= P(µj − µk > εk − εj)

Page 6: Models of Discrete Choice - Stanford University · 2006-04-16 · Logic of the discriminal process (less abstract) 1. The utility for item i is defined as ui. 2. The utility can

Mathematical Theory Likelihood References

Notational aside

Define the standard Normal(0,1) pdf and cdf as, respectively,

φ(ε) =1√2π

exp{−ε2

2

}

Φ(ε) =

∫ ε

−∞φ(z)∂z

Define the Type I extreme value pdf and cdf as, respectively,

λ(ε) = e−ε exp{−e−ε}

Λ(ε) = exp{−e−ε}

Page 7: Models of Discrete Choice - Stanford University · 2006-04-16 · Logic of the discriminal process (less abstract) 1. The utility for item i is defined as ui. 2. The utility can

Mathematical Theory Likelihood References

Thurstone: discriminal processCase 1. Assume F (0, σ2

i ) is a Normal distribution

Recall, for independent normal distributions

N (µi , σ2i )−N (µj , σ

2j ) = N (µi − µj , σ

2i + σ2

j )

So,

P(j , k) = P(µj − µk > εk − εj)

=

∫ µj−µk

−∞

1√2π(σ2

j + σ2k )

exp

− z2

2√

σ2j + σ2

k

∂z

= Φ

µj − µk√σ2

j + σ2k

Page 8: Models of Discrete Choice - Stanford University · 2006-04-16 · Logic of the discriminal process (less abstract) 1. The utility for item i is defined as ui. 2. The utility can

Mathematical Theory Likelihood References

Thurstone: discriminal process

Given,

P(j , k) = Φ

µj − µk√σ2

j + σ2k

and µi = xiγi , consider what can be identified?

If xj = xk = x , then

P(j , k) = Φ

xγj − γk√σ2

j + σ2k

= Φ(xβ)

where β =γj−γkqσ2

j +σ2k

.

Page 9: Models of Discrete Choice - Stanford University · 2006-04-16 · Logic of the discriminal process (less abstract) 1. The utility for item i is defined as ui. 2. The utility can

Mathematical Theory Likelihood References

Thurstone: discriminal process

Given,

P(j , k) = Φ

µj − µk√σ2

j + σ2k

and µi = xiγi , consider what can be identified?

If γj = γk = γ, then

P(j , k) = Φ

(xj − xk )γ√

σ2j + σ2

k

= Φ((xj − xk )β

)where β = γq

σ2j +σ2

k

.

Page 10: Models of Discrete Choice - Stanford University · 2006-04-16 · Logic of the discriminal process (less abstract) 1. The utility for item i is defined as ui. 2. The utility can

Mathematical Theory Likelihood References

Thurstone: discriminal processCase 2. Assume F is a Type I extreme value distribution

P(j , k) = P(uj > uk )

= P(µj − µk + εj > εk )

=

∫ ∞

−∞λ(εj)

∫ µj−µk+εj

−∞λ(εk )

=

∫ ∞

−∞λ(εj)Λ(µj − µk + εj)

=

∫ ∞

−∞λ(εj)Λ(µj − µk + εj)

=1w

∫ ∞

−∞−e−εj w exp{−e−εj w}

=1w

=1

1 + exp{−(µ1 − µ2)}

Page 11: Models of Discrete Choice - Stanford University · 2006-04-16 · Logic of the discriminal process (less abstract) 1. The utility for item i is defined as ui. 2. The utility can

Mathematical Theory Likelihood References

Thurstone: discriminal processGiven,

P(j , k) =1

1 + exp{−(µ1 − µ2)}

and µi = xiγi , consider what can be identified?

If xj = xk = x , then

P(j , k) =1

1 + exp{−x(γ1 − γ2)}=

11 + exp{−xβ}

where β = γ1 − γ2.

If γj = γk = γ, then

P(j , k) =1

1 + exp{−β(x1 − x2)}

where β = γ.

Page 12: Models of Discrete Choice - Stanford University · 2006-04-16 · Logic of the discriminal process (less abstract) 1. The utility for item i is defined as ui. 2. The utility can

Mathematical Theory Likelihood References

Axiomatic Foundations of Choice Models

Assumptions

D1 Let R ⊂ S ⊂ T ⊂ U.

D2 Let x , y , z ∈ T , arbitrary elements of choice set.

D3 Let P(x , y) be the probability of choosing x instead of y ,0 < P(x , y) < 1.

D4 PS(R) is the probability of choosing R given choice fromamong alternatives in S.

Choice Axiom

(i) PT (R) = PS(R)PT (S)

(ii) If P(x , y) = 0 for some x , y ∈ T , PT (S) = PT−{x}(S − {x})

Page 13: Models of Discrete Choice - Stanford University · 2006-04-16 · Logic of the discriminal process (less abstract) 1. The utility for item i is defined as ui. 2. The utility can

Mathematical Theory Likelihood References

Axiomatic Foundations of Choice Models

Axiom of Choice

• Defines relationship which defines how choices withinsubsets are related in the context of an individual makingprobabilistic choices.

• Can be rewritten as PT (R | S)PT (S) = PT (R)

• Two core implications,Lemma 3: Independence of Irrelevant Alternatives (IIA)Theorem 3: Probability must satisfy a ratio scale

Page 14: Models of Discrete Choice - Stanford University · 2006-04-16 · Logic of the discriminal process (less abstract) 1. The utility for item i is defined as ui. 2. The utility can

Mathematical Theory Likelihood References

Axiomatic Foundations of Choice ModelsLemma 3 (Independence from irrelevant alternatives):For x , y ∈ S,

P(x , y)

P(y , x)=

PS(x)

PS(y)

Proof:By Axiom we have

PS(x) = P(x , y)[PS(x) + PS(y)]

So

PS(x) = P(x , y)[PS(x) + PS(y)]

PS(x) = P(x , y)PS(x) + P(x , y)PS(y)

(1− P(x , y))PS(x) = P(x , y)PS(y)

P(y , x)PS(x) = P(x , y)PS(y)

P(x , y)

P(y , x)=

PS(x)

PS(y)

Page 15: Models of Discrete Choice - Stanford University · 2006-04-16 · Logic of the discriminal process (less abstract) 1. The utility for item i is defined as ui. 2. The utility can

Mathematical Theory Likelihood References

Axiomatic Foundations of Choice Models

Lemma 3 (Independence from irrelevant alternatives):

• Most famous implication: relative probability of choosingtwo alternatives is invariant to the composition of the largerset of alternatives.

• Again, only ratio that is invariant not probabilitiesthemselves

• Might also hear that log-odds of two choices are constant:log(PS(x))− log(PS(y)) = c.

• Can estimate parameters defining utility of choices evenwith only a subset. Not possible if IIA does not hold (e.g.,correlation between utilities of choices)—then to estimateany choice must model all choices.

Page 16: Models of Discrete Choice - Stanford University · 2006-04-16 · Logic of the discriminal process (less abstract) 1. The utility for item i is defined as ui. 2. The utility can

Mathematical Theory Likelihood References

Axiomatic Foundations of Choice ModelsTheorem 3: choice probability is ratio scale∃v : T → <+, unique up to multiplication by k > 0, such that

PS(x) =v(x)∑

y∈S v(y)=

11 +

∑y∈S−{x} v(y)/v(x)

McFadden and Yellot have each pointed out the connection tologit models by setting v(x) = ex .

E.g.,

P(x , y) =v(x)

v(x) + v(y)=

eµx

eµx + eµy=

11 + eµy /eµx

=1

1 + e−(µx−µy )

Yellot (1977) shows that discriminal process based on Type Idiscrete value distribution is uniquely equivalent to ChoiceAxiom.

Page 17: Models of Discrete Choice - Stanford University · 2006-04-16 · Logic of the discriminal process (less abstract) 1. The utility for item i is defined as ui. 2. The utility can

Mathematical Theory Likelihood References

Likelihood for a dichotomous choiceConsider a Bernoulli process for observing y ∈ {0, 1}, whereP(y = 1) = p, and therefore P(y = 0) = 1− p.For an individual i , the likelihood of observing yi is

Li = pyi (1− p)1−yi (1)

The log-likelihood,

Li = yi log(p) + (1− yi) log(1− p) (2)

The score of the likelihood

Li

p=

yi

p+

1− yi

1− p(−1) =

yi − pp(1− p)

=yi − E(yi)

V (yi)(3)

Here, p is a single parameter for all people, but same setup formore general parameterization.

Page 18: Models of Discrete Choice - Stanford University · 2006-04-16 · Logic of the discriminal process (less abstract) 1. The utility for item i is defined as ui. 2. The utility can

Mathematical Theory Likelihood References

Likelihood for a dichotomous choice

Normally, would like p to be a function of covariates xi .

Therefore we seek a mapping: xi → P(yi | xi).

In general, will need to specify two things:1. Index function: a(x , β) that maps k-vector to scalar

1.1 e.g., additive-linear: a(xi , β) = xiβ

1.2 e.g., non-linear a(xi , β) = xβ11 + x2β2

2. Transformation function F() with properties,2.1 F (−∞) = 02.2 F (∞) = 12.3 ∂F (z)/∂z = f (z) > 0

i.e., monotonic function that maps real line to unit interval

Page 19: Models of Discrete Choice - Stanford University · 2006-04-16 · Logic of the discriminal process (less abstract) 1. The utility for item i is defined as ui. 2. The utility can

Mathematical Theory Likelihood References

Likelihood for a dichotomous choice

Standard parameterizations for k-vector of covariates xi ,

1. Probit:1.1 a(x , β) = xβ

1.2 F (xβ) =∫ xβ

−∞1√2π

exp{− z2

2 } =∫ xβ

−∞ φ(z) = Φ(xβ)

1.3 Li = yi log(Φ(xiβ)) + (1− yi) log(1− Φ(xiβ))

2. Logit2.1 a(x , γ) = xγ2.2 F (xγ) = 1

1+exp{−xγ} = Λ(xγ)

2.3 Li = yi log(Λ(xiγ)) + (1− yi) log(1− Λ(xiγ))

Page 20: Models of Discrete Choice - Stanford University · 2006-04-16 · Logic of the discriminal process (less abstract) 1. The utility for item i is defined as ui. 2. The utility can

Mathematical Theory Likelihood References

Logit and Probit, CDF

−10 −5 0 5 10

0.0

0.2

0.4

0.6

0.8

1.0

beta = (0,1)

x

P( y

=1 |

x )

Page 21: Models of Discrete Choice - Stanford University · 2006-04-16 · Logic of the discriminal process (less abstract) 1. The utility for item i is defined as ui. 2. The utility can

Mathematical Theory Likelihood References

Logit and Cauchit (t), CDF

−10 −5 0 5 10

0.0

0.2

0.4

0.6

0.8

1.0

beta = (0,1)

x

P( y

=1 |

x )

Page 22: Models of Discrete Choice - Stanford University · 2006-04-16 · Logic of the discriminal process (less abstract) 1. The utility for item i is defined as ui. 2. The utility can

Mathematical Theory Likelihood References

Likelihood for a dichotomous choice

Let pi = P(yi | xi) = F (a(xi , β)).

Recall the log-likelihood,

Li = yi log(p) + (1− yi) log(1− p)

and re-write the score of the log-likelihood,

Si(β) =∂Li

∂β=

∂Li

∂pi

∂pi

∂β=

yi − pi

pi(1− pi)

∂F (a(xiβ)

∂β

∂a(xiβ)

∂β

Si(β) =∂Li

∂β= (yi − Fi)

fiFi(1− Fi)

xi

FOC to maximize L defines ML β̂, such that∑n

i=1 Si(β̂) = 0

Page 23: Models of Discrete Choice - Stanford University · 2006-04-16 · Logic of the discriminal process (less abstract) 1. The utility for item i is defined as ui. 2. The utility can

Mathematical Theory Likelihood References

A quick asideLet’s remind ourselves of logic of ML using linear model.

Li = −12

log(2π)− 12(yi − xiθ)

2

Score function for an individual

Si(θ) =∂Li

∂θ= (yi − xiθ)xi

FOC, in matrix notation,

n∑i=1

Si(θ̂) = X>(Y − Xθ) = X>Y − X>X θ̂ = 0

so we have

θ̂ = (X>X )−1X>Y

Page 24: Models of Discrete Choice - Stanford University · 2006-04-16 · Logic of the discriminal process (less abstract) 1. The utility for item i is defined as ui. 2. The utility can

Mathematical Theory Likelihood References

Particularly important concepts from this lecture

1. How to derive binary choice from models using normal andextreme value distributions

2. How to interpret the meaning of the coefficients (what isidentified by reduced form parameters?)

3. The (log)likelihood and score of a Bernoulli process

4. How to parametrize a logit and probit (log) likelihood, andderivation of score

5. How the logit and probit differ (i) in motivation (2) in tailbehavior

Page 25: Models of Discrete Choice - Stanford University · 2006-04-16 · Logic of the discriminal process (less abstract) 1. The utility for item i is defined as ui. 2. The utility can

Mathematical Theory Likelihood References

Luce, R. Duncan. 1959. Individual choice behavior: atheoretical analysis. NY: Wiley.

Thurstone, L. L. 1927. A Law of Comparative Judgement.Pychological Review , 34:272–86.URL http://www.psycinfo.com/library/display.cfm?uid=1994-28135-001\&format=pdf

Yellot, John I. Jr. 1977. The Relationship between Luce’sAxiom, Thurstone’s Theory of Comparative Judgment, andthe Double Exponential Distribution. Journal of MathematicalPsychology , 15:109–144.