University of Castilla-La Mancha, Spain · 2021. 1. 21.



Experimental Designs for different approaches of Simultaneous Equations

Víctor Casero-Alonso and Jesús López Fidalgo
Department of Mathematics
Institute of Mathematics Applied to Science and Engineering
University of Castilla-La Mancha, Spain

0. Abstract

- Models with simultaneous equations (widely used in economics, sociology, medicine, engineering...) are considered.
- A model with two equations is studied.
- One explanatory (exogenous) variable of the first equation is the response (endogenous) variable of the second equation, which contains a controllable variable that is being designed.
- Plugging the second equation into the first one, the designable variable appears in both equations.
- The two formulations are different models: they lead to different maximum likelihood estimators and therefore to different information matrices and optimal designs.
- Optimal designs for both approaches are computed and compared, in both a discrete and a continuous design space.
- The cases of a completely known correlation and of an unknown correlation to be estimated are considered and compared.
- A sensitivity analysis is performed to assess the risk of choosing wrong nominal values of the parameters.

1. Motivating examples

- Conlisk (1979): An oil company wants to study a controlled variation in the prices of gas and repairs, $P_g$ and $P_r$. (Other controlled variables: whether trading stamps are offered with gas and repair sales, respectively.) Two endogenous variables: quantities of gas and repairs sold. Two equations with all variables, exogenous and endogenous, included.

- Aigner and Balestra (1988): Designing electricity pricing experiments, which take place via an intervention in one period or in a succession of periods.

- Hahn, Hirano and Karlan (2011): Designing using the propensity score, i.e. the conditional probability of treatment given some observed characteristics of the individual (covariates).

- Surgery of lung carcinoma (2004): Exercise test to predict morbidity after lung resection: riding a static bicycle during a period of time (controlled variable, exogenous). An uncontrolled variable: % of maximum volume of expired air in the first second. Two endogenous variables: oxygen desaturation during the test ($y_1$, e.g. linear regression) and a binary response, morbidity ($y_2$, logistic model).

2. Different approaches of Simultaneous Equations

Poskitt and Skeels (2008): two formulations that are probabilistically equivalent but conceptually quite different. They lead to different MLEs, and hence to different information matrices and optimal designs.

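SES - Structural equation specification

\[
\begin{cases}
y = Y\beta + u,\\
Y = \Pi_{2a} + Z\,\Pi_{2b} + V,
\end{cases}
\]

- $y$, response variable,
- $Y$, explanatory/response variable,
- $Z$, controllable variable, which is being designed,
- $\beta$, $\Pi_{2a}$ and $\Pi_{2b}$, unknown parameters,
- $u$ and $V$, error terms,

\[
\begin{pmatrix} u \\ V \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right)
\]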

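RFS - Reduced form specification

Plugging the second equation into the first one:

\[
\begin{cases}
y = \Pi_{2a}\beta + Z\,\Pi_{2b}\beta + \nu,\\
Y = \Pi_{2a} + Z\,\Pi_{2b} + V.
\end{cases}
\]

The designable variable $Z$ is now in both equations.

\[
\begin{pmatrix} \nu \\ V \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right)
\]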
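To make the difference between the two specifications concrete, the following is a minimal simulation sketch in Python (NumPy assumed). The function names, the sample size, the weight 0.5 on $z = 1$ and the value $\beta = 1$ are illustrative choices, not taken from the poster.

```python
import numpy as np

def draw_errors(n, rho, rng):
    """Draw n pairs of unit-variance normal errors with correlation rho."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    e = rng.multivariate_normal(np.zeros(2), cov, size=n)
    return e[:, 0], e[:, 1]

def simulate_ses(z, beta, pi2a, pi2b, rho, rng):
    """SES: y = Y*beta + u,  Y = pi2a + z*pi2b + V."""
    u, v = draw_errors(len(z), rho, rng)
    Y = pi2a + z * pi2b + v
    y = Y * beta + u
    return y, Y

def simulate_rfs(z, beta, pi2a, pi2b, rho, rng):
    """RFS: y = pi2a*beta + z*pi2b*beta + nu,  Y = pi2a + z*pi2b + V."""
    nu, v = draw_errors(len(z), rho, rng)
    Y = pi2a + z * pi2b + v
    y = pi2a * beta + z * pi2b * beta + nu
    return y, Y

rng = np.random.default_rng(0)
# A design on Z = {0, 1} with weight 0.5 on z = 1 (arbitrary for this sketch).
z = rng.binomial(1, 0.5, size=20000).astype(float)
y_ses, Y_ses = simulate_ses(z, beta=1.0, pi2a=4.0, pi2b=1.0, rho=0.8, rng=rng)
y_rfs, Y_rfs = simulate_rfs(z, beta=1.0, pi2a=4.0, pi2b=1.0, rho=0.8, rng=rng)
```

In the SES draw the first equation uses the realised $Y$ (including $V$), while in the RFS draw it uses only the mean of $Y$; the error pair has the same unit-variance, correlation-$\rho$ distribution in both, as stated above.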

3. Optimal designs

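- Design space: $Z = \{0,1\}$.
- Information matrices, for $\rho$ known (upper-left $3 \times 3$ block) and for $\rho$ unknown and to be estimated (the whole matrix):

\[
M_{SES}[\xi(z)] = \frac{1}{1-\rho^2}
\begin{pmatrix}
1 + \Pi_{2a}^2 + 2p\,\Pi_{2a}\Pi_{2b} + p\,\Pi_{2b}^2 & -(\Pi_{2a} + p\,\Pi_{2b})\rho & -p(\Pi_{2a} + \Pi_{2b})\rho & 1 \\
-(\Pi_{2a} + p\,\Pi_{2b})\rho & 1 & p & 0 \\
-p(\Pi_{2a} + \Pi_{2b})\rho & p & p & 0 \\
1 & 0 & 0 & \frac{1+\rho^2}{1-\rho^2}
\end{pmatrix}
\]

\[
M_{RFS}[\xi(z)] = \frac{1}{1-\rho^2}
\begin{pmatrix}
\Pi_{2a}^2 + 2p\,\Pi_{2a}\Pi_{2b} + p\,\Pi_{2b}^2 & (\Pi_{2a} + p\,\Pi_{2b})(\beta - \rho) & p(\Pi_{2a} + \Pi_{2b})(\beta - \rho) & 0 \\
(\Pi_{2a} + p\,\Pi_{2b})(\beta - \rho) & 1 + \beta^2 - 2\beta\rho & p\,(1 + \beta^2 - 2\beta\rho) & 0 \\
p(\Pi_{2a} + \Pi_{2b})(\beta - \rho) & p\,(1 + \beta^2 - 2\beta\rho) & p\,(1 + \beta^2 - 2\beta\rho) & 0 \\
0 & 0 & 0 & \frac{1+\rho^2}{1-\rho^2}
\end{pmatrix}
\]

- Optimal design:

\[
\xi^*(z) = \begin{Bmatrix} 0 & 1 \\ 1-p^* & p^* \end{Bmatrix}, \qquad
p^* = \arg\min_{p} \Phi\{M[\xi(z)]\} \in [0,1]; \qquad \Phi:\ \text{D-optimal, c-optimal}
\]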
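The D-optimal weight can be found numerically by coding the two information matrices and searching over $p$. The sketch below assumes NumPy, uses a plain grid search (not necessarily the authors' method), takes the parameter order $(\beta, \Pi_{2a}, \Pi_{2b}, \rho)$ for the rows, and picks $\beta = 1$ for the RFS matrix because the poster does not give a nominal $\beta$; all function names are hypothetical.

```python
import numpy as np

def m_ses(p, pi2a, pi2b, rho, rho_known=True):
    """SES information matrix for the design with weight p on z = 1."""
    s = pi2a**2 + 2 * p * pi2a * pi2b + p * pi2b**2
    b = -(pi2a + p * pi2b) * rho
    c = -p * (pi2a + pi2b) * rho
    m = np.array([
        [1 + s, b, c, 1],
        [b, 1, p, 0],
        [c, p, p, 0],
        [1, 0, 0, (1 + rho**2) / (1 - rho**2)],
    ]) / (1 - rho**2)
    # rho known: only (beta, pi2a, pi2b) are estimated, i.e. the 3x3 block.
    return m[:3, :3] if rho_known else m

def m_rfs(p, beta, pi2a, pi2b, rho, rho_known=True):
    """RFS information matrix for the design with weight p on z = 1."""
    s = pi2a**2 + 2 * p * pi2a * pi2b + p * pi2b**2
    d = 1 + beta**2 - 2 * beta * rho
    b = (pi2a + p * pi2b) * (beta - rho)
    c = p * (pi2a + pi2b) * (beta - rho)
    m = np.array([
        [s, b, c, 0],
        [b, d, p * d, 0],
        [c, p * d, p * d, 0],
        [0, 0, 0, (1 + rho**2) / (1 - rho**2)],
    ]) / (1 - rho**2)
    return m[:3, :3] if rho_known else m

def d_optimal_weight(info, n_grid=9999):
    """Grid search for the weight p maximising det(info(p)) on (0, 1)."""
    grid = np.linspace(0.0001, 0.9999, n_grid)
    dets = [np.linalg.det(info(p)) for p in grid]
    return float(grid[int(np.argmax(dets))])

# Nominal values (pi2a, pi2b, rho) = (4, 1, 0.8); beta = 1 is an assumption.
p_ses = d_optimal_weight(lambda p: m_ses(p, 4.0, 1.0, 0.8))
p_rfs = d_optimal_weight(lambda p: m_rfs(p, 1.0, 4.0, 1.0, 0.8))
p_ses_u = d_optimal_weight(lambda p: m_ses(p, 4.0, 1.0, 0.8, rho_known=False))
print(p_ses, p_rfs, p_ses_u)
```

Maximising the determinant is equivalent to minimising the D-criterion; the three weights printed here can be compared with the D-optimal entries reported for these nominal values in the next section.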

4. Optimal weights

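Similar D- and c-optimal weights $p^*$ for given nominal values. Ex: $(\Pi_{2a},\Pi_{2b},\rho) = (4,1,0.8)$:

            D-opt           c_β-opt       c_Π2a-opt      c_Π2b-opt          c_ρ-opt
            SES     RFS     SES   RFS     SES   RFS      SES      RFS       SES   RFS
ρ known     0.547   0.553   1     1       0     0.528    0.50092  0.5089    -     -

ρ unknown (only four entries survive in the transcript and their column positions are not recoverable): 0.548, 0.50097, 1, 1.

D-, $c_{\Pi_{2a}}$- (only for RFS) and $c_{\Pi_{2b}}$-optimal designs for $Z = \{0,1\}$ are also optimal for $Z = [0,1]$, by the General Equivalence Theorem (GET).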
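The c-optimal weights for the SES model with $\rho$ known can be approximated in the same way, by minimising $c^T M^{-1} c$ over $p$. This is a sketch under assumptions: NumPy, a grid search restricted to the interior of $[0,1]$ (the matrix is singular at $p = 0$ and $p = 1$, so weights of exactly 0 or 1 show up as the grid edge), the parameter order $(\beta, \Pi_{2a}, \Pi_{2b})$, and hypothetical function names.

```python
import numpy as np

def m_ses3(p, pi2a, pi2b, rho):
    """Upper-left 3x3 block of the SES information matrix (rho known)."""
    s = pi2a**2 + 2 * p * pi2a * pi2b + p * pi2b**2
    b = -(pi2a + p * pi2b) * rho
    c = -p * (pi2a + pi2b) * rho
    return np.array([
        [1 + s, b, c],
        [b, 1, p],
        [c, p, p],
    ]) / (1 - rho**2)

def c_optimal_weight(c_vec, pi2a, pi2b, rho, n_grid=9999):
    """Grid search for the weight p minimising c' M(p)^{-1} c on (0, 1)."""
    grid = np.linspace(0.0001, 0.9999, n_grid)
    values = [
        c_vec @ np.linalg.solve(m_ses3(p, pi2a, pi2b, rho), c_vec)
        for p in grid
    ]
    return float(grid[int(np.argmin(values))])

# Unit vectors select one parameter each: beta, pi2a, pi2b.
c_beta, c_pi2a, c_pi2b = np.eye(3)

w_beta = c_optimal_weight(c_beta, 4.0, 1.0, 0.8)   # drifts to the upper edge
w_pi2a = c_optimal_weight(c_pi2a, 4.0, 1.0, 0.8)   # drifts to the lower edge
w_pi2b = c_optimal_weight(c_pi2b, 4.0, 1.0, 0.8)   # interior optimum
print(w_beta, w_pi2a, w_pi2b)
```

The RFS case is left out because its c-criteria depend on the nominal $\beta$, which is not stated for this example.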

Bounds for p∗ of D–optimal designs:

Figure: Values of $p^*$ for $(\Pi_{2a},\Pi_{2b})$ with $\rho = 0.8$ for the D-optimal design in: a) SES model and b) RFS model. (Surface plots; axes $\Pi_{2a}$ and $\Pi_{2b}$ from -20 000 to 20 000, $p^*$ from 0.4 to 0.6.)

Theorem (p∗ bounds)

For SES and RFS models (with $\rho$ known or unknown) the weight of the D-optimal design in $Z = \{0,1\}$ is bounded for all values of $\rho$, $\Pi_{2a}$, $\Pi_{2b}$ and $\beta$:

\[
p^* \in (1/3,\ 2/3).
\]
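The bounds can be made plausible with a short calculation, given here only as a sketch for the SES model with $\rho$ known (it is not the authors' proof; the shorthand $S(p)$, $\alpha$ and $\gamma$ is introduced here). Expanding the determinant of the upper-left $3 \times 3$ block of $M_{SES}$ gives

\[
|M_{SES}| = \frac{p(1-p)\,\bigl[1 + (1-\rho^2)\,S(p)\bigr]}{(1-\rho^2)^3},
\qquad S(p) = \Pi_{2a}^2 + 2p\,\Pi_{2a}\Pi_{2b} + p\,\Pi_{2b}^2 .
\]

The bracket is linear in $p$, so it can be written as $\alpha p + \gamma(1-p)$ with $\gamma = 1 + (1-\rho^2)\Pi_{2a}^2 > 0$ and $\alpha = 1 + (1-\rho^2)(\Pi_{2a}+\Pi_{2b})^2 > 0$. Then

\[
\frac{d}{dp}\Bigl[\alpha\,p^2(1-p) + \gamma\,p(1-p)^2\Bigr] = \alpha\,p\,(2-3p) + \gamma\,(1-p)(1-3p),
\]

which is positive for $0 \le p \le 1/3$ and negative for $2/3 \le p \le 1$, so the maximiser lies strictly inside $(1/3, 2/3)$. For $(\Pi_{2a},\Pi_{2b},\rho) = (4,1,0.8)$ one has $\gamma = 6.76$ and $\alpha = 10$, the stationarity condition becomes $6.76 - 7.04\,p - 9.72\,p^2 = 0$, and its root in $[0,1]$ is $p^* \approx 0.547$.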

5. Robustness of D–optimal designs

Lower bound for D–efficiency:

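\[
\text{D-eff}_{\theta^*}(p_0) = \frac{|M_{\theta^*}(p^*)|^{-1}}{|M_{\theta^*}(p_0)|^{-1}}
= \frac{|M_{(\Pi_{2a}^*,\Pi_{2b}^*,\rho)}(p^*)|^{-1}}{|M_{(\Pi_{2a}^*,\Pi_{2b}^*,\rho)}(p_0)|^{-1}}
\]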

Figure: Values of $\text{D-eff}_{\theta^*}(p_0)$ for the SES model and different values of $(\Pi_{2a},\Pi_{2b},\rho)$ with: a) $\rho$ known and b) $\rho$ unknown. (Axes: $p^*$ from 0.35 to 0.65, efficiency from 0.80 to 0.95.)

Theorem (D–efficiency lower bound)

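The minimum D-efficiencies for SES and RFS models for all values of $\rho$, $\Pi_{2a}$, $\Pi_{2b}$ and $\beta$ are:
- for $\rho$ known: $\min \text{D-eff}_{\theta^*}(p_0) = \dfrac{1}{2^{1/3}}$,
- for $\rho$ unknown: $\min \text{D-eff}_{\theta^*}(p_0) = \dfrac{1}{2^{1/4}}$.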
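A numerical illustration of the lower bound for the SES model with $\rho$ known is sketched below. It assumes NumPy, a grid search for $p^*$, and the usual $k$-th root normalisation of the determinant ratio ($k = 3$ estimated parameters here), which is what the bounds $1/2^{1/3}$ and $1/2^{1/4}$ suggest; the function names and the extreme parameter values are illustrative only.

```python
import numpy as np

GRID = np.linspace(0.0001, 0.9999, 9999)

def det_ses3(p, pi2a, pi2b, rho):
    """Determinant of the 3x3 SES information matrix (rho known)."""
    s = pi2a**2 + 2 * p * pi2a * pi2b + p * pi2b**2
    b = -(pi2a + p * pi2b) * rho
    c = -p * (pi2a + pi2b) * rho
    m = np.array([
        [1 + s, b, c],
        [b, 1, p],
        [c, p, p],
    ]) / (1 - rho**2)
    return np.linalg.det(m)

def d_opt_weight(pi2a, pi2b, rho):
    """Grid-search D-optimal weight for the given parameter values."""
    dets = [det_ses3(p, pi2a, pi2b, rho) for p in GRID]
    return float(GRID[int(np.argmax(dets))])

def d_eff(p0, pi2a, pi2b, rho, k=3):
    """D-efficiency of weight p0 when (pi2a, pi2b, rho) are the true values.

    Assumption: the determinant ratio is raised to the power 1/k.
    """
    p_star = d_opt_weight(pi2a, pi2b, rho)
    ratio = det_ses3(p0, pi2a, pi2b, rho) / det_ses3(p_star, pi2a, pi2b, rho)
    return ratio ** (1.0 / k)

# Nominal values pushing the design weight towards 1/3 ...
p0 = d_opt_weight(1000.0, -1000.0, 0.8)
# ... evaluated under true values whose optimal weight is close to 2/3.
worst = d_eff(p0, 0.0, 1000.0, 0.8)
print(p0, worst, 0.5 ** (1 / 3))
```

Choosing a nominal design near one end of $(1/3, 2/3)$ while the truth calls for the other end drives the efficiency towards the stated bound; milder misspecification gives values closer to 1.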

Particular cases: if the nominal values are $\theta_0$ but the true values are $\theta^*$...

Figure: D-eff (with $\rho$ known) for a neighborhood of $\theta_0$: a) $\theta_0 = (4, 1, 0.8)$ ($p_0 = 0.547$), b) $\theta_0 = (4, -3, 0.8)$ ($p_0 = 0.368$). (Surface plots; axes $\Pi_{2a}$ and $\Pi_{2b}$ from -10 to 10; D-eff from 0.92 to 1.00 in a) and from 0.85 to 1.00 in b).)

- D-eff at the point $\theta^* = \theta_0$ is 1.
- There are 3 more points with D-eff = 1 (black points in the figures).
- D-eff is high for true values $(\Pi_{2a}^*,\Pi_{2b}^*)$: a) both greater than $(\Pi_{2a0},\Pi_{2b0})$, b) both less than $(\Pi_{2a0},\Pi_{2b0})$.
- D-eff decays for true values in the direction: a) $\Pi_{2a} = -\Pi_{2b}$, b) $\Pi_{2a} = 0$.
- The minimum D-eff in the neighborhood is: a) 91.46%, b) 83.60%.

6. Conclusions

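- In both models the optimal designs depend on the nominal values.
- For $Z = \{0,1\}$:
  - Similar SES optimal designs for $\rho$ known and $\rho$ unknown.
  - The same RFS optimal designs for $\rho$ known and $\rho$ unknown.
  - Similar optimal designs for SES and RFS, except for $c_{\Pi_{2a}}$-optimality.
    Ex: $(\Pi_{2a},\Pi_{2b}) = (4,1)$ ($\rho = 0.8$ known):

\[
\xi^*_{SES,\Pi_{2a}} = \begin{Bmatrix} 0 \\ 1 \end{Bmatrix}
\quad\text{and}\quad
\xi^*_{RFS,\Pi_{2a}} = \begin{Bmatrix} 0 & 1 \\ 1-0.528 & 0.528 \end{Bmatrix},
\]

    but the relative efficiencies are quite good:

\[
\text{eff}_{SES}(\xi^*_{RFS}) = \frac{c^T M_{SES}^{-1}(\theta, \xi^*_{SES})\, c}{c^T M_{SES}^{-1}(\theta, \xi^*_{RFS})\, c} = 91.1\%,
\qquad
\text{eff}_{RFS}(\xi^*_{SES}) = \frac{c^T M_{RFS}^{-1}(\theta, \xi^*_{RFS})\, c}{c^T M_{RFS}^{-1}(\theta, \xi^*_{SES})\, c} = 95.2\%
\]

- Bounds for $p^*$ of the D-optimal design: $p^* \in (1/3, 2/3)$.

- Lower bound for D-efficiency: $\min \text{D-eff}_{\theta^*}(p_0) = \dfrac{1}{2^{1/3}}$ ($\rho$ known) or $\dfrac{1}{2^{1/4}}$ ($\rho$ unknown).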

7. References

Aigner, D. J., and Balestra, P. (1988), “Optimal experimental design for error components models,” ECONOMETRICA, 56 (4), 955-971.

Conlisk, J. (1979), “Design for simultaneous equations,” J ECONOMETRICS, 11 (1), 63-76.

Hahn, J., Hirano, K., and Karlan, D. (2011), “Adaptive Experimental Design using the Propensity Score,” J BUS ECON STAT, 29 (1), 96-108.

López-Fidalgo, J., and Garcet-Rodríguez, S. (2004), “Optimal experimental designs when some independent variables are not subject to control,” J AM STAT ASSOC, 99.

Papakyriazis, P. A. (1986), “Adaptive optimal estimation control strategies for systems of simultaneous equations,” MATH MODELLING, 7, 241-257.

Poskitt, D. S., and Skeels, C. L. (2008), “Conceptual frameworks and experimental design in simultaneous equations,” ECON LETT, 100.