A measurement error model approach to survey data integration: combining information from two surveys Jae Kwang Kim 1 Iowa State University 2017 SAE conference, Paris July 11th, 2017 1 Joint work with Seho Park


Page 1: A measurement error model approach to survey data ...sae2017.ensai.fr/wp-content/uploads/2017/07/Survey-Data... · Concepcion Huista Jacaltenango San Antonio Huista ... Quetzaltenango

A measurement error model approach to survey data integration: combining information from two surveys

Jae Kwang Kim¹

Iowa State University

2017 SAE conference, Paris, July 11th, 2017

¹Joint work with Seho Park


Survey data integration

We want to combine information from multiple surveys.

Three situations:

1 Multiple samples for one target population

2 One sample each from multiple populations

3 Multiple samples from multiple populations

Small area estimation is a special case of survey data integration, in that multiple sub-populations represent multiple domains.

Kim (ISU) Survey Data Integration 7/11/2017 2 / 25


Motivation

The USAID Bureau for Food Security (BFS) sponsors the Food and Nutrition Technical Assistance III project (FANTA).

Key technical areas of focus are food security, maternal and child health, agriculture, and livelihoods strengthening.


Motivation

FANTA has two projects: the Feed the Future (FTF) and Food for Peace (FFP) development projects.

The FFP project was conducted by ICF International, and the FTF project was conducted by UNC MEASURE.

Two surveys were conducted in 2013 in selected departments of Guatemala: San Marcos, Totonicapan, Quiche, Quetzaltenango, and Huehuetenango.


Map of Guatemala


FFP and FTF Projects in Guatemala

Figure: Selected Departments in Guatemala


Overlap Area

Figure: FTF Zone of Influence (ZOI) and FFP Project Implementation Area for Guatemala


Overlap Area

Table: Overlap Area: Departments and Municipalities

San Marcos: Sibinal, Tajumulco

Totonicapan: Momostenango, Santa Lucia La Reforma

Huehuetenango: Chiantla, Concepcion Huista, Jacaltenango, San Antonio Huista, Todos Santos

Quetzaltenango: San Juan Ostuncalco

Quiche: Chichicastenango (Santa Maria), Nebaj, Uspantan, Cunen, San Juan Cotzal


Common Indicators

Each survey has its own indicators, and 11 common indicators were chosen for study.

The common items concern women’s nutritional status, children’s well-being, and the prevalence of poverty in households.


Common Indicators

Table: Common Indicators

Daily Per Capita Expenditures (PCE): Average daily per capita consumption, in constant 2010 USD

Prevalence of Poverty (PP): Percentage of people living on less than 1.25 USD per capita per day

Mean Depth of Poverty (MDP): Average of the differences between total daily ...

Prevalence of Households with Hunger (HHS): Prevalence of households with moderate or severe hunger

Prevalence of Underweight Women: Women eligible for BMI measurement (not currently pregnant and not within 2 months of delivery) who have a BMI less than 18.5

Women’s Dietary Diversity Score (WDDS): Mean number of food groups consumed by women of reproductive age (15-49 years)
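Several of these indicators are survey-weighted proportions of units below a threshold. The sketch below shows this for two of them: PP (PCE below 1.25 USD per day) and underweight women (BMI below 18.5). It assumes person-level analysis weights. The helper names `weighted_prevalence` and `bmi` and all example values are hypothetical; they are not the surveys' data or code.

```python
from typing import Sequence


def weighted_prevalence(values: Sequence[float], weights: Sequence[float], threshold: float) -> float:
    """Weighted percentage of units whose value is strictly below `threshold`."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    below = sum(w for v, w in zip(values, weights) if v < threshold)
    return 100.0 * below / total


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by squared height in metres."""
    return weight_kg / height_m ** 2


# Hypothetical daily per capita expenditures (USD) with person-level weights.
pce = [0.9, 1.4, 2.1, 1.1]
w = [10.0, 20.0, 30.0, 40.0]
print(weighted_prevalence(pce, w, 1.25))  # 50.0

# Hypothetical BMI-eligible women (not pregnant, not within 2 months of delivery).
bmis = [bmi(45.0, 1.60), bmi(55.0, 1.50), bmi(40.0, 1.52)]
print(weighted_prevalence(bmis, [1.0, 1.0, 2.0], 18.5))  # 75.0
```

If only household weights are available, a person-level prevalence such as PP would typically need the household weight multiplied by household size. Treat that as an assumption to check against each survey's documentation.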


Common Indicators

Table: Common Indicators (Cont’d)

Prevalence of Stunted Children: Prevalence of stunted children under five years of age (0-59 months)

Prevalence of Wasted Children: Prevalence of wasted children under five years of age (0-59 months)

Prevalence of Underweight Children: Prevalence of underweight children under five years of age (0-59 months)

Prevalence of Children Receiving a Minimum Acceptable Diet (MAD): Prevalence of children 6-23 months receiving a minimum acceptable diet

Prevalence of Exclusive Breastfeeding (EBF): Prevalence of exclusive breastfeeding of children under six months of age


Estimates from two surveys

Table: Daily Per Capita Expenditure

Department        FFP/ICF                  FTF/UNC                  T-statistic
                  N      Mean    S.E.      N      Mean    S.E.
San Marcos        1419   0.558   0.014     981    1.166   0.018     -23.376
Totonicapan       1654   0.388   0.015     181    0.896   0.039      -5.505
Huehuetenango      877   0.456   0.023     1535   1.140   0.018     -30.587
Quetzaltenango     628   0.695   0.022     60     1.325   0.112     -26.179
Quiche            1288   0.382   0.015     1350   1.045   0.015     -12.179


Estimates from two surveys

Table: Prevalence of Households with Hunger (%)

Department        FFP/ICF                  FTF/UNC                  T-statistic
                  N      Mean    S.E.      N      Mean    S.E.
San Marcos        1419   3.76    0.50      981    15.35   1.08      -9.733
Totonicapan       1654   11.79   0.87      181    15.01   2.72      -1.125
Huehuetenango      877   8.91    0.91      1535   15.58   0.87      -5.323
Quetzaltenango     628   6.84    0.91      60     9.94    3.96      -0.765
Quiche            1288   7.13    0.74      1350   9.73    0.77      -2.430
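The T-statistics in the hunger table appear consistent with comparing two independent estimates:

t = (mean_FFP - mean_FTF) / sqrt(SE_FFP^2 + SE_FTF^2)

The slides do not state this formula; it is an assumption. The check below recomputes the statistics from the tabulated means and S.E.s. The recomputed values agree with the reported ones to within a few hundredths, which is consistent with rounding of the published S.E.s.

```python
import math


def two_sample_t(mean_a: float, se_a: float, mean_b: float, se_b: float) -> float:
    """t-statistic for the difference of two independent estimates."""
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# (FFP mean, FFP S.E., FTF mean, FTF S.E.) taken from the hunger table above.
hhs = {
    "San Marcos": (3.76, 0.50, 15.35, 1.08),
    "Totonicapan": (11.79, 0.87, 15.01, 2.72),
    "Huehuetenango": (8.91, 0.91, 15.58, 0.87),
    "Quetzaltenango": (6.84, 0.91, 9.94, 3.96),
    "Quiche": (7.13, 0.74, 9.73, 0.77),
}

for dept, (m_a, s_a, m_b, s_b) in hhs.items():
    print(f"{dept:15s} {two_sample_t(m_a, s_a, m_b, s_b):8.3f}")
```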


Data Structure

Table: Data Structure

            X    Ya   Yb
Sample A    o    o    -
Sample B    o    -    o


Goal: Synthetic data imputation

Table: Data Structure

            X    Ya   Yb
Sample A    o    o    o
Sample B    o    o    o


Methodology

Steps

1 Specify a measurement error model.

2 Derive the prediction model using Bayes' theorem.

3 Parameter estimation: EM algorithm.

4 Generate imputed values from the prediction model.


Step 1: Model specification

Assume that Sample A is the gold standard, that is, $Y_a = Y$.

Structural equation model

$$Y_a \sim f_1(y_a \mid x; \theta_1).$$

From the observations in Sample A, we can perform model diagnostics.

Measurement error model

$$Y_b \sim f_2(y_b \mid y_a; \theta_2).$$

Assume the measurement error is nondifferential:

$$f(y_b \mid x, y_a) = f(y_b \mid y_a).$$

For dichotomous $y$-variables, the measurement error model becomes a misclassification model.
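To make the nondifferential assumption concrete, the sketch below simulates from two models. The structural model is normal, and the measurement error model is linear in $y_a$; both forms mirror the PCE model used later. The parameter values and the function name `simulate` are arbitrary, hypothetical choices. The sketch then regresses $Y_b$ on $(1, Y_a, x)$. Under nondifferential error, $x$ carries no extra information about $Y_b$ once $Y_a$ is known, so its coefficient should be close to zero.

```python
import numpy as np


def simulate(n, beta=(1.0, 0.5), alpha=(0.3, 1.5), sigma_e=1.0, sigma_u=0.5, seed=0):
    """Draw (x, y_a, y_b): y_a | x from the structural model, y_b | y_a from the error model."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y_a = beta[0] + beta[1] * x + rng.normal(scale=sigma_e, size=n)
    # y_b depends on x only through y_a, i.e. f(y_b | x, y_a) = f(y_b | y_a).
    y_b = alpha[0] + alpha[1] * y_a + rng.normal(scale=sigma_u, size=n)
    return x, y_a, y_b


x, y_a, y_b = simulate(20_000)
design = np.column_stack([np.ones_like(x), y_a, x])
coef, *_ = np.linalg.lstsq(design, y_b, rcond=None)
print(np.round(coef, 3))  # intercept near 0.3, y_a slope near 1.5, x slope near 0
```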


Step 2: Prediction model

The prediction model is the model for the counterfactual outcome (the $y$-variable not measured in a given sample), conditional on the observed values.

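Prediction model for $Y_b$ in Sample A:

$$p(y_b \mid x, y_a) = f_2(y_b \mid y_a; \theta_2).$$

Prediction model for $Y_a$ in Sample B: using Bayes' formula together with the nondifferential assumption $f(y_b \mid x, y_a) = f_2(y_b \mid y_a; \theta_2)$, we can derive

$$p(y_a \mid x, y_b) = \frac{f_1(y_a \mid x; \theta_1)\, f_2(y_b \mid y_a; \theta_2)}{\int f_1(y_a \mid x; \theta_1)\, f_2(y_b \mid y_a; \theta_2)\, dy_a}.$$

The prediction model can be used to obtain the best prediction of $Y_{ai}$ for $i \in S_B$.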
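The normalizing integral in Bayes' formula can be approximated numerically. The sketch below assumes the normal structural and measurement error models used later for PCE, and all parameter values are arbitrary. It evaluates $f_1 f_2$ on an evenly spaced grid of $y_a$ values and normalizes the result to get the posterior mean $E(Y_a \mid x, y_b)$. It then compares that grid value with the closed form available in this normal-normal case. The function names are hypothetical.

```python
import math

import numpy as np


def normal_pdf(z, mean, sd):
    """Normal density, vectorized over numpy arrays."""
    return np.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def posterior_mean_grid(mu_x, y_b, alpha0, alpha1, sigma_e, sigma_u, num=20_001):
    """E(Y_a | x, y_b) from Bayes' formula on a grid; mu_x = x @ beta is the prior mean."""
    # The grid covers +-10 prior s.d.; it assumes the posterior mass lies inside it.
    grid = np.linspace(mu_x - 10 * sigma_e, mu_x + 10 * sigma_e, num)
    # Numerator of Bayes' formula: f1(y_a | x) * f2(y_b | y_a).
    joint = normal_pdf(grid, mu_x, sigma_e) * normal_pdf(y_b, alpha0 + alpha1 * grid, sigma_u)
    # The grid spacing cancels between the numerator and denominator integrals.
    return float(np.sum(grid * joint) / np.sum(joint))


def posterior_mean_exact(mu_x, y_b, alpha0, alpha1, sigma_e, sigma_u):
    """Closed-form posterior mean for the normal-normal model."""
    precision = 1.0 / sigma_e ** 2 + alpha1 ** 2 / sigma_u ** 2
    return (mu_x / sigma_e ** 2 + alpha1 * (y_b - alpha0) / sigma_u ** 2) / precision


args = dict(mu_x=0.8, y_b=2.0, alpha0=0.3, alpha1=1.5, sigma_e=1.0, sigma_u=0.5)
print(posterior_mean_grid(**args), posterior_mean_exact(**args))
```

The grid version works for any $f_1$ and $f_2$ with a scalar $y_a$. The closed form is specific to the normal case.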


Step 3: Parameter estimation - EM algorithm

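E-step: compute

$$Q_1(\theta_1 \mid \text{data}; \theta^{(t)}) = \sum_{i \in S_A} w_{i,a} \log f_1(y_{ai} \mid x_i; \theta_1) + \sum_{i \in S_B} w_{i,b}\, E\{\log f_1(Y_a \mid x_i; \theta_1) \mid x_i, y_{bi}; \theta^{(t)}\}$$

$$Q_2(\theta_2 \mid \text{data}; \theta^{(t)}) = \sum_{i \in S_A} w_{i,a}\, E\{\log f_2(Y_b \mid y_{ai}; \theta_2) \mid x_i, y_{ai}; \theta^{(t)}\} + \sum_{i \in S_B} w_{i,b}\, E\{\log f_2(y_{bi} \mid Y_a; \theta_2) \mid x_i, y_{bi}; \theta^{(t)}\},$$

where the conditional expectations are computed from the prediction model in Step 2.

M-step: update the parameters by maximizing $Q_1$ and $Q_2$ with respect to $\theta_1$ and $\theta_2$, respectively.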
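For a normal linear model, every conditional expectation in $Q_1$ and $Q_2$ has a closed form. The sketch below assumes the model $y_a = x\beta + e$, $y_b = \alpha_0 + \alpha_1 y_a + u$ with normal errors. Under that model, each M-step reduces to weighted least squares on expected sufficient statistics.

The sketch also assumes one weight per unit, a fixed number of iterations instead of a convergence check, and simulated data. The function name `em_normal` is hypothetical, and this is not the authors' implementation. As in $Q_2$ above, Sample A contributes to $\theta_2$ only through expectations over $Y_b$ evaluated at the current parameter values.

```python
import numpy as np


def em_normal(x_a, y_a, w_a, x_b, y_b, w_b, n_iter=1000):
    """EM sketch for y_a = x @ beta + e, y_b = a0 + a1 * y_a + u with normal e, u.

    Sample A observes (x, y_a); Sample B observes (x, y_b); x includes an intercept column.
    Returns (beta, sig2_e, a0, a1, sig2_u).
    """
    sw_a = np.sqrt(w_a)
    # Starting values: theta_1 from Sample A alone, theta_2 at an arbitrary guess.
    beta = np.linalg.lstsq(x_a * sw_a[:, None], y_a * sw_a, rcond=None)[0]
    sig2_e = np.average((y_a - x_a @ beta) ** 2, weights=w_a)
    a0, a1, sig2_u = 0.0, 1.0, 1.0
    x_all = np.vstack([x_a, x_b])
    w_all = np.concatenate([w_a, w_b])
    sw_all = np.sqrt(w_all)
    for _ in range(n_iter):
        # E-step, Sample B: Y_a | x, y_b is N(m, v) under the current parameters.
        v = 1.0 / (1.0 / sig2_e + a1 ** 2 / sig2_u)
        m = v * (x_b @ beta / sig2_e + a1 * (y_b - a0) / sig2_u)
        # E-step, Sample A: Y_b | y_a is N(mu_b, sig2_u) under the current parameters.
        mu_b = a0 + a1 * y_a
        e_ya = np.concatenate([y_a, m])                  # E(Y_a)
        e_ya2 = np.concatenate([y_a ** 2, m ** 2 + v])   # E(Y_a^2)
        e_yb = np.concatenate([mu_b, y_b])               # E(Y_b)
        e_yayb = np.concatenate([y_a * mu_b, m * y_b])   # E(Y_a Y_b)

        # M-step for theta_1: weighted least squares of E(Y_a) on x.
        beta = np.linalg.lstsq(x_all * sw_all[:, None], e_ya * sw_all, rcond=None)[0]
        fit = x_all @ beta
        sig2_e = np.average(e_ya2 - 2.0 * e_ya * fit + fit ** 2, weights=w_all)

        # M-step for theta_2: weighted normal equations in expected sufficient statistics.
        lhs = np.array([[w_all.sum(), w_all @ e_ya], [w_all @ e_ya, w_all @ e_ya2]])
        rhs = np.array([w_all @ e_yb, w_all @ e_yayb])
        new_a0, new_a1 = np.linalg.solve(lhs, rhs)
        # E{(Y_b - a0 - a1 Y_a)^2}: Sample A integrates over Y_b, Sample B over Y_a.
        sq_a = (mu_b - new_a0 - new_a1 * y_a) ** 2 + sig2_u
        sq_b = (y_b - new_a0 - new_a1 * m) ** 2 + new_a1 ** 2 * v
        sig2_u = np.average(np.concatenate([sq_a, sq_b]), weights=w_all)
        a0, a1 = new_a0, new_a1
    return beta, sig2_e, a0, a1, sig2_u


rng = np.random.default_rng(1)


def draw(n):
    """Simulate one sample with beta = (1, 2), a0 = 0.3, a1 = 1.5, unit error variances."""
    x = np.column_stack([np.ones(n), rng.normal(size=n)])
    y_a = x @ np.array([1.0, 2.0]) + rng.normal(size=n)
    y_b = 0.3 + 1.5 * y_a + rng.normal(size=n)
    return x, y_a, y_b, rng.uniform(1.0, 3.0, size=n)


x_a, y_a, _, w_a = draw(3000)  # Sample A keeps y_a only
x_b, _, y_b, w_b = draw(3000)  # Sample B keeps y_b only
beta, sig2_e, a0, a1, sig2_u = em_normal(x_a, y_a, w_a, x_b, y_b, w_b)
print(np.round(beta, 2), round(float(a0), 2), round(float(a1), 2))
```

$\alpha_1$ is identified here only because $x$ predicts $y_a$. In Sample B, the slope of $y_b$ on $x$ equals $\alpha_1$ times the slope of $y_a$ on $x$.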


Step 4: Best prediction

Using the measurement error model, we can predict $y_{ai}$ by $\hat{y}_{ai} = E(Y_a \mid x_i, y_{bi})$ for $i \in S_B$.

A prediction estimator of $\mu = E(Y_a)$ can be obtained by

$$\mu^* = \frac{\sum_{i \in S_A} w_{i,a} y_{ai} + \sum_{i \in S_B} w_{i,b} \hat{y}_{ai}}{\sum_{i \in S_A} w_{i,a} + \sum_{i \in S_B} w_{i,b}}.$$

Reference: Kim, Berg, and Park (2016). Statistical matching using fractional imputation. Survey Methodology, 42, 19–40.
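The prediction estimator $\mu^*$ is a weighted mean over the pooled samples. Sample A contributes observed values, and Sample B contributes predictions. The sketch below assumes the predictions $\hat{y}_{ai}$ for Sample B have already been computed, for example as posterior means from the fitted prediction model. The function name and toy numbers are hypothetical.

```python
from typing import Sequence


def prediction_estimator(y_a: Sequence[float], w_a: Sequence[float],
                         y_a_hat: Sequence[float], w_b: Sequence[float]) -> float:
    """mu* = (sum_A w y_a + sum_B w y_a_hat) / (sum_A w + sum_B w)."""
    num = sum(w * y for w, y in zip(w_a, y_a)) + sum(w * y for w, y in zip(w_b, y_a_hat))
    den = sum(w_a) + sum(w_b)
    return num / den


# Toy numbers: two observed units in Sample A, two predicted units in Sample B.
print(prediction_estimator([1.0, 3.0], [2.0, 2.0], [2.0, 4.0], [1.0, 3.0]))  # 2.75
```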


Application to FANTA project

1 Model for PCE

$$y_{ai} = x_i \beta + e_i$$

$$y_{bi} = \alpha_0 + \alpha_1 y_{ai} + u_i$$

where $e_i \sim N(0, \sigma_e^2)$ and $u_i \sim N(0, \sigma_u^2)$.

2 Model for HHS prevalence

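$$y_{ai} \sim \text{Bernoulli}(\pi_i)$$

$$y_{bi} \sim \text{Bernoulli}\{p\, y_{ai} + q(1 - y_{ai})\}$$

where $\text{logit}(\pi_i) = x_i \beta$ and $p, q \in (0, 1)$, so that $p = P(Y_b = 1 \mid Y_a = 1)$ and $q = P(Y_b = 1 \mid Y_a = 0)$.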
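For the binary HHS model, the integral in Bayes' formula becomes a sum over $y_a \in \{0, 1\}$. This gives the best prediction $E(Y_a \mid x_i, y_{bi}) = P(Y_a = 1 \mid x_i, y_{bi})$ in closed form. The sketch below assumes $\beta$, $p$, and $q$ have already been estimated. The function name and example values are hypothetical.

```python
import math


def prob_ya_given_yb(x_beta: float, y_b: int, p: float, q: float) -> float:
    """P(Y_a = 1 | x, y_b) with logit(pi) = x_beta and P(Y_b = 1 | Y_a) = p*Y_a + q*(1 - Y_a)."""
    # Numerically stable logistic function for pi = P(Y_a = 1 | x).
    if x_beta >= 0:
        pi = 1.0 / (1.0 + math.exp(-x_beta))
    else:
        pi = math.exp(x_beta) / (1.0 + math.exp(x_beta))
    like1 = p if y_b == 1 else 1.0 - p  # f2(y_b | Y_a = 1)
    like0 = q if y_b == 1 else 1.0 - q  # f2(y_b | Y_a = 0)
    return pi * like1 / (pi * like1 + (1.0 - pi) * like0)


print(round(prob_ya_given_yb(0.0, 1, 0.9, 0.2), 3))  # 0.818
print(round(prob_ya_given_yb(0.0, 0, 0.9, 0.2), 3))  # 0.111
```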


Model Diagnostics for PCE model

Figure: Fitted Values Vs Residuals (x-axis: fitted values, y-axis: residuals); Normal Q-Q Plot (x-axis: theoretical quantiles, y-axis: sample quantiles)
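These diagnostics can be computed from Sample A alone, because $y_a$ is observed there. The sketch below produces the numbers behind both panels but omits the plotting. It assumes an ordinary least squares fit of the structural model and standardized residuals. It also uses plotting positions $(i - 0.5)/n$ for the theoretical quantiles; that is one common convention, not necessarily the one used for the figure. The function name and simulated data are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def diagnostics(x: np.ndarray, y: np.ndarray):
    """Return fitted values, standardized residuals, and (theoretical, sample) Q-Q pairs."""
    coef, *_ = np.linalg.lstsq(x, y, rcond=None)
    fitted = x @ coef
    resid = y - fitted
    # Scale by the residual standard deviation with n - p degrees of freedom.
    std_resid = resid / resid.std(ddof=x.shape[1])
    n = len(y)
    theoretical = np.array([NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)])
    sample = np.sort(std_resid)
    return fitted, std_resid, theoretical, sample


rng = np.random.default_rng(0)
x = np.column_stack([np.ones(200), rng.normal(size=200)])
y = x @ np.array([0.5, 1.0]) + rng.normal(size=200)
fitted, r, tq, sq = diagnostics(x, y)
print(np.corrcoef(tq, sq)[0, 1])  # close to 1 when the residuals look normal
```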


Results: PCE indicator

Department        FFP              FTF              Combined
San Marcos        0.558 (0.030)    1.165 (0.038)    0.563 (0.026)
Totonicapan       0.388 (0.030)    0.895 (0.085)    0.331 (0.028)
Quiche            0.382 (0.030)    1.045 (0.031)    0.396 (0.026)
Huehuetenango     0.456 (0.044)    1.140 (0.036)    0.479 (0.027)
Quetzaltenango    0.695 (0.044)    1.325 (0.232)    0.795 (0.043)


Results for HHS indicator

Department        FFP              FTF              Combined
San Marcos        3.76 (1.01)      15.35 (2.22)     3.77 (1.00)
Totonicapan       11.79 (1.70)     15.01 (6.00)     12.08 (1.60)
Quiche            7.13 (1.50)      9.73 (1.57)      7.19 (1.42)
Huehuetenango     8.91 (1.90)      15.58 (2.00)     8.75 (1.90)
Quetzaltenango    6.84 (1.80)      9.94 (8.25)      6.85 (1.70)


Concluding remark

Survey data integration using a measurement error model is considered.

Prediction of the counterfactual outcome is obtained by Bayes' theorem.

Parameter estimation uses the EM algorithm.

A Bayesian approach can also be developed (not discussed here).

Extension to a GLMM for the structural equation model is in progress.
