on sharp identi cation regions for regression under ... · situation situation: standard simple...
TRANSCRIPT
On Sharp Identi�cation Regions for Regression
Under Interval Data
Georg Schollmeyer Thomas Augustin
Department of Statistics LMU Munich
Situation
Situation: Standard simple linear model Y = β0 + β1X + ε with interval-censored
outcomes.
Y unobserved, only Y and Y observed and all we know is Y ∈ [Y,Y].
model partially identi�ed, there are many parameters (β0, β1) leading to the
same distribution of the observable variables.
●
●
●
●
● ●
● ●
●
●●
●
●
●
●●
●
●
●
●
2 4 6 8 10
2040
6080
100
x
y
−−
−
−
−−
− −
−−−
−−
−
− −
−
−
−
−
−
−
−
−
− −
−
−
−
−−
−
−
−
− −
−
−
−
−
2 4 6 8 10
020
4060
8010
012
0
x
y
"What is the entity to estimate?"
Situation
Situation: Standard simple linear model Y = β0 + β1X + ε with interval-censored
outcomes.
Y unobserved, only Y and Y observed and all we know is Y ∈ [Y,Y].
model partially identi�ed, there are many parameters (β0, β1) leading to the
same distribution of the observable variables.
●
●
●
●
● ●
● ●
●
●●
●
●
●
●●
●
●
●
●
2 4 6 8 10
2040
6080
100
x
y
−−
−
−
−−
− −
−−−
−−
−
− −
−
−
−
−
−
−
−
−
− −
−
−
−
−−
−
−
−
− −
−
−
−
−
2 4 6 8 10
020
4060
8010
012
0
x
y
"What is the entity to estimate?"
Identi�cation regions for the simple linear model
a) Sharp Marrow Region:
SMR(Y,Y) = {β | E(Y | X ) ≤ β0 + β1X ≤ E(Y | X )}
−20 −10 0 10 20
−30
0−
200
−10
00
100
x
E(
|x)
Identi�cation regions for the simple linear model
a) Sharp Marrow Region:
SMR(Y,Y) = {β | E(Y | X ) ≤ β0 + β1X ≤ E(Y | X )}
−20 −10 0 10 20
−30
0−
200
−10
00
100
x
E(
|x)
Identi�cation regions for the simple linear model
a) Sharp Marrow Region:
SMR(Y,Y) = {β | E(Y | X ) ≤ β0 + β1X ≤ E(Y | X )}
−20 −10 0 10 20
−30
0−
200
−10
00
100
x
E(
|x)
Identi�cation regions for the simple linear model
b) Sharp Collection Region:
SCR(Y,Y) = {argminE((β0 + β1X − Y )2) | Y ∈ [Y,Y]}
−20 −10 0 10 20
−30
0−
200
−10
00
100
x
E(
|x)
Identi�cation regions for the simple linear model
b) Sharp Collection Region:
SCR(Y,Y) = {argminE((β0 + β1X − Y )2) | Y ∈ [Y,Y]}
−20 −10 0 10 20
−30
0−
200
−10
00
100
x
E(
|x)
Identi�cation regions for the simple linear model
b) Sharp Collection Region:
SCR(Y,Y) = {argminE((β0 + β1X − Y )2) | Y ∈ [Y,Y]}
−20 −10 0 10 20
−30
0−
200
−10
00
100
x
E(
|x)
Identi�cation regions for the simple linear model
b) Sharp Collection Region:
SCR(Y,Y) = {argminE((β0 + β1X − Y )2) | Y ∈ [Y,Y]}
−20 −10 0 10 20
−30
0−
200
−10
00
100
x
E(
|x)
Identi�cation regions for the simple linear model
b) Sharp Collection Region:
SCR(Y,Y) = {argminE((β0 + β1X − Y )2) | Y ∈ [Y,Y]}
−20 −10 0 10 20
−30
0−
200
−10
00
100
x
E(
|x)
Sharp Setloss Region
c) Sharp Setloss Region:
LS(Y,Y, Γ) =
∫ [E(Y | x)− sup
β∈Γ(β0 + β1x)
]2
+
[E(Y | x)− inf
β∈Γ(β0 + β1x)
]2
dP(x)
SSR(Y,Y) =⋃
argminΓ⊆R2
LS(Y,Y, Γ)
−20 −10 0 10 20
−30
0−
200
−10
00
100
x
E(
|x)
Sharp Setloss Region
c) Sharp Setloss Region:
LS(Y,Y, Γ) =
∫ [E(Y | x)− sup
β∈Γ(β0 + β1x)
]2
+
[E(Y | x)− inf
β∈Γ(β0 + β1x)
]2
dP(x)
SSR(Y,Y) =⋃
argminΓ⊆R2
LS(Y,Y, Γ)
−20 −10 0 10 20
−30
0−
200
−10
00
100
x
E(
|x)
Sharp Setloss Region
c) Sharp Setloss Region:
LS(Y,Y, Γ) =
∫ [E(Y | x)− sup
β∈Γ(β0 + β1x)
]2
+
[E(Y | x)− inf
β∈Γ(β0 + β1x)
]2
dP(x)
SSR(Y,Y) =⋃
argminΓ⊆R2
LS(Y,Y, Γ)
−20 −10 0 10 20
−30
0−
200
−10
00
100
x
E(
|x)
Sharp Setloss Region
c) Sharp Setloss Region:
LS(Y,Y, Γ) =
∫ [E(Y | x)− sup
β∈Γ(β0 + β1x)
]2
+
[E(Y | x)− inf
β∈Γ(β0 + β1x)
]2
dP(x)
SSR(Y,Y) =⋃
argminΓ⊆R2
LS(Y,Y, Γ)
−20 −10 0 10 20
−30
0−
200
−10
00
100
x
E(
|x)
Sharp Setloss Region
c) Sharp Setloss Region:
LS(Y,Y, Γ) =
∫ [E(Y | x)− sup
β∈Γ(β0 + β1x)
]2
+
[E(Y | x)− inf
β∈Γ(β0 + β1x)
]2
dP(x)
SSR(Y,Y) =⋃
argminΓ⊆R2
LS(Y,Y, Γ)
−20 −10 0 10 20
−30
0−
200
−10
00
100
x
E(
|x)
Relations between SMR, SCR and SSR
SSR ⊇ SMR ⊆ SCR
SSR ⊆ SCR? "often approximately"
Relations between SMR, SCR and SSR
SSR ⊇ SMR ⊆ SCR
SSR ⊆ SCR?
"often approximately"
Relations between SMR, SCR and SSR
SSR ⊇ SMR ⊆ SCR
SSR ⊆ SCR? "often approximately"
region well-speci�ed case misspeci�ed case
SMR true parameters (β0, β1) ∈ SMRnot "well-estimable" ?
SSR true parameters (β0, β1) ∈ SSR
"well-estimable" (e.g. replace
expectations by means)
good description of the bounds
Y,Y and thus good description
of Y
SCR true parameters (β0, β1) ∈ SCR
"well-estimable" (e.g. replace
expectations by means)
set of all possible good linear
descriptions for every
Y ∈ [Y,Y]
0 1 2 3 4 5
05
1015
2025
x
E(
|x)
0 1 2 3 4 5
−10
−5
05
1015
2025
xE
( |x
)
Figure: The conditional expectations E(Y | x) (black), E(Y | x) (dashed) and E(Y | x)
(dashed) for a well-speci�ed (left) and a misspeci�ed (right) situation.
−2 −1 0 1 2 3 4 5
−2
02
46
810
12
ß1
ß0
●
−3 −2 −1 0 1 2 3 4
−6
−4
−2
02
46
8
ß1
ß0
●
Figure: The identi�cation regions SMR (red), SCR (black) and SSR (blue) for a well-
speci�ed (left) and a misspeci�ed (right) situation. The true Parameter is dotted green
(for the misspeci�ed situation the green point is the best linear predictor for y). In the
misspeci�ed case SMR is empty.
Beresteanu, A., & Molinari, F. (2008): Asymptotic properties for a class of
partially identi�ed models, Econometrica, 76, 763�814.
�erný, M. & Rada M. (2011): On the possibilistic approach to linear
regression with rounded or interval-censored data. Measurement Science
Review, 11, 34�40.
Ponomareva, M., & Tamer, E. (2011): Misspeci�cation in moment inequality
models: back to moment equalities?. Econometrics Journal, 14, 186-203.
Stoye, J. (2007): Bounds on generalized linear predictors with incomplete
outcome data. Reliable Computing, 13, 293�302.
region well-speci�ed case misspeci�ed case
SMR truth ∈ SMRnot "well-estimable"
monotone
idempotent
?
SSR truth ∈ SSR
"well-estimable"
not monotone
idempotent
good description of the bounds
Y,Y and thus good description
of Y
SCR truth ∈ SCR
"well-estimable"
monotone
not idempotent and thus
maybe too rough
set of all possible good linear
descriptions for every
Y ∈ [Y,Y]