Random Group Variance Adjustments
When Hot Deck Imputation Is Used to Compensate for Nonresponse
Richard A. Moore
Company Statistics Division
US Census Bureau
Presented by
Samson Adeshiyan
2002 Survey of Business Owners (SBO): Primary Goal
• Provide Business Ownership Statistics
  – State
  – Industry
  – Demographic Group
    • Race --- Native American, Asian, Black, Hawaiian/Pacific Islander, White, Public
    • Ethnicity --- Hispanic, Non-Hispanic
    • Gender --- Female, Equal, Male
SBO Primary Publication Level Statistics
• Black-owned Grocery Stores in North Dakota (ND)
  – Number
  – Aggregate Sales
  – Aggregate Payroll
  – Aggregate Employment
What Do We Have? (Econ Census and Tax Returns)
• 5.5 million companies with paid employees
  – Receipts, Payroll, Employment
  – Geographic Codes
  – Industry Codes
• 17.5 million companies without paid employees
  – Receipts
  – Industry and Geography Codes
What Are We Missing for Each Business?
• Race of Ownership
• Ethnicity of Ownership
• Gender of Ownership
• Obtain this from a stratified sample of 2.5 million businesses
Distribution at the US Level: 23 Million Companies
• Women --- 28%
• Hispanic --- 7%
• Black --- 5%
• Asian --- 5%
• Native American --- 1%
• Hawaiian/Pacific Islander --- 0.1%
Problem 1: Need Sufficient Representation in the Sample
Black-Owned Groceries in ND
• 2002 Estimates
  – 78 Black-owned businesses in ND
  – 15 of these in Retail
  – Only 4 are Grocery Stores
• Can’t list groceries in ND in random order and sample systematically
“Modeled Guess” Codes from Admin Info for Each Company
• Response from a Previous SBO
• Population Distribution by ZIP Code
• State/Industry Distribution in 1997 SBO
• Owner's (or Owners') Social Security Number, when Available
  – Race/Hispanic/Gender Codes on SSN Application
  – Surnames (e.g. LOPEZ or WANG)
  – Country of Birth (e.g. KOREA or CUBA)
  – Decennial Responses
Example
• Name …. Michelle Wie’s Pro Shop
• Modeled Guess …. Asian Female
• Likelihood-Race ……. 0.8912
• Likelihood-Hisp ……. 0.0012
• Likelihood-Female …. 0.9500
Warning: Model is not 100% accurate
• Michelle Wie’s Pro Shop
  – Responds As White, Non-Hispanic, Male
  – Tabbed As White, Non-Hispanic, Male
• If a business's response is inconsistent with the modeled likelihoods, tabulate by the response
• If a business does not respond, don’t directly infer responses from likelihoods
Problem 2: Differential Response Rates Between Demographic Groups

| Owner | Likelihood-Hispanic | Response |
|---|---|---|
| Jose Martinez | 0.985 | Hispanic |
| John Martinez | 0.940 | ??? |
| Jose’s Sub Shop | 0.123 | Non-Hispanic |
| Juanita Martin | 0.060 | Non-Hispanic |
| John Martin | 0.040 | Non-Hispanic |
Likelihoods Aid in Non-Response Adjustment

| Case | Likelihood-Hispanic | Response | Weight |
|---|---|---|---|
| 1 | 0.985 | Hispanic | 4.0 |
| 2 | 0.940 | ??? | 4.0 |
| 3 | 0.123 | Non-Hispanic | 4.0 |
| 4 | 0.060 | Non-Hispanic | 4.0 |
| 5 | 0.040 | Non-Hispanic | 4.0 |

Response-Rate-Adjusted Hispanic-owned Est = 5.0 (4.0 * 5/4)
Hot Deck Imputed Hispanic-owned Est = 8.0 (4.0 + 4.0)
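The two estimates above can be reproduced with a minimal Python sketch. The slides give only the imputed result, so the nearest-likelihood donor rule used here is an assumption for illustration, and the names `cases` and `nearest_donor` are hypothetical.

```python
# Five sampled cases from the slide: (likelihood_hispanic, response, weight).
# None marks the nonrespondent (case 2).
cases = [
    (0.985, "Hispanic", 4.0),
    (0.940, None, 4.0),
    (0.123, "Non-Hispanic", 4.0),
    (0.060, "Non-Hispanic", 4.0),
    (0.040, "Non-Hispanic", 4.0),
]

respondents = [c for c in cases if c[1] is not None]

# Response-rate adjustment: inflate every respondent weight by n / r = 5 / 4.
rate_factor = len(cases) / len(respondents)
rate_adjusted = sum(
    w * rate_factor for _, resp, w in respondents if resp == "Hispanic"
)


def nearest_donor(likelihood):
    """Assumed donor rule: the respondent whose likelihood is closest."""
    return min(respondents, key=lambda c: abs(c[0] - likelihood))


# Hot deck: the nonrespondent keeps its own weight and takes the donor's response.
imputed = [
    (lik, resp if resp is not None else nearest_donor(lik)[1], w)
    for lik, resp, w in cases
]
hot_deck = sum(w for _, resp, w in imputed if resp == "Hispanic")

print(rate_adjusted, hot_deck)
```

Under this rule case 2 borrows its response from case 1, whose likelihood (0.985) is closest to 0.940, which is why the hot deck estimate counts two Hispanic-owned cases rather than spreading the nonrespondent's weight across all four respondents.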
For Variance: Random Group Replication (RG)
• Considerable number of cases where the modeled guess disagrees with the actual response
  – Cases tabbed from other stratum
  – Considerable variability in the weights of the tabulated cases
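Likelihoods Aid in Non-Response Adjustment

| Case | Likelihood | Response | Weight | RG | Receipts |
|---|---|---|---|---|---|
| 1 | 0.98 | Hispanic | 4.0 | 1 | 10 |
| 2 | 0.94 | ??? | 4.0 | 2 | 1 |
| 3 | 0.12 | Non-Hispanic | 4.0 | 3 | 5 |
| 4 | 0.06 | Non-Hispanic | 4.0 | 4 | 6 |
| 5 | 0.04 | Non-Hispanic | 4.0 | 5 | 8 |

Imputed Hispanic Firms Est = 8
Imputed Hispanic Receipts = 44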
For Variance Calculation: Weight Adjustment Method
Factors on Responding Firms
• Firms
  – Respondents Estimate = 4
  – Post-Impute Estimate = 8
  – Weight Adjustment Factor = 2.0
• Receipts
  – Respondents Estimate = 40
  – Post-Impute Estimate = 44
  – Weight Adjustment Factor = 1.1
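The factors on this slide are the ratio of the post-impute estimate to the respondents-only estimate. A small Python sketch of that arithmetic follows; the function name `weight_adjustment_factor` is hypothetical and the inputs are the slide's own numbers.

```python
def weight_adjustment_factor(respondent_estimate, post_impute_estimate):
    """Factor applied to responding firms so they reproduce the post-impute total."""
    if respondent_estimate <= 0:
        raise ValueError("respondent estimate must be positive")
    return post_impute_estimate / respondent_estimate


firms_factor = weight_adjustment_factor(4.0, 8.0)
receipts_factor = weight_adjustment_factor(40.0, 44.0)

# Apply the factors to the one responding Hispanic case (weight 4.0, receipts 10).
adjusted_firms = 4.0 * firms_factor
adjusted_receipts = 4.0 * 10 * receipts_factor

print(firms_factor, receipts_factor, adjusted_firms, adjusted_receipts)
```

Note that each tabulated item gets its own factor (2.0 for firm counts, 1.1 for receipts), so a record carries one factor per item under this method.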
Oh-Scheuren Adjustment Factor (1983)
r = # respondents
i = # imputed cases
n = i + r = total number of cases
V1 = variance with impute treated as reported
V2 = V1 * (n/r + i/n)
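As a worked illustration, the factor exactly as written on this slide can be evaluated for the running five-case example (r = 4, i = 1). This is a sketch only; the function name `oh_scheuren_variance` is hypothetical and the formula is taken from the slide as stated.

```python
def oh_scheuren_variance(v1, r, i):
    """Scale V1 (imputes treated as reported) by the slide's factor n/r + i/n."""
    if r <= 0 or i < 0:
        raise ValueError("need r > 0 respondents and i >= 0 imputed cases")
    n = r + i  # total number of cases
    return v1 * (n / r + i / n)


# Running example: 4 respondents, 1 imputed case, V1 normalised to 1.0.
factor = oh_scheuren_variance(1.0, r=4, i=1)  # 5/4 + 1/5
print(factor)
```

With no imputed cases (i = 0) the factor reduces to 1, so V2 equals V1.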
Oh-Scheuren Method: Problems with Comparison
• Research developed for Jackknife, not Random Group
• Calculate response rates for cell
• Best response for our example
  – Not missing at random
  – True response rate is 4 of 5
  – Response rate for Hispanics is 1 of 2
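Donor Imputation Method (RG # Also Donated)

Before imputation:

| Case | Likelihood | Response | Weight | RG | Receipts |
|---|---|---|---|---|---|
| 1 | 0.98 | Hispanic | 4.0 | 1 | 10 |
| 2 | 0.94 | ??? | 4.0 | 2 | 1 |

After imputation:

| Case | Likelihood | Response | Weight | RG | Receipts |
|---|---|---|---|---|---|
| 1 | 0.98 | Hispanic | 4.0 | 1 | 10 |
| 2 | 0.94 | Hispanic | 4.0 | 1 | 1 |

Imputed Hispanic Firms Est = 8
Imputed Hispanic Receipts = 44
Only RG #1 is non-zero.
Same Estimates. Higher Variances.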
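To see why donating the RG number raises the variance, here is a Python sketch on the two Hispanic cases. The slides do not state the variance formula, so this assumes the standard random group estimator with k = 5 groups, where each group total is expanded by k and the variance is the squared spread of the group estimates divided by k(k - 1). The function and variable names are hypothetical.

```python
def random_group_variance(records, k):
    """records: (weight, value, rg) tuples, with rg numbered 1..k.

    Assumed estimator: each group's weighted total is expanded by k, and the
    variance of the mean of the k group estimates is returned.
    """
    totals = [0.0] * k
    for weight, value, rg in records:
        totals[rg - 1] += weight * value
    group_estimates = [k * t for t in totals]
    mean = sum(group_estimates) / k
    return sum((g - mean) ** 2 for g in group_estimates) / (k * (k - 1))


K = 5

# Hispanic firm counts (value = 1 per firm).
counts_own_rg = [(4.0, 1, 1), (4.0, 1, 2)]      # imputed case keeps RG 2
counts_donated_rg = [(4.0, 1, 1), (4.0, 1, 1)]  # imputed case takes donor's RG 1

# Hispanic receipts (value = receipts).
receipts_own_rg = [(4.0, 10, 1), (4.0, 1, 2)]
receipts_donated_rg = [(4.0, 10, 1), (4.0, 1, 1)]

v_counts_own = random_group_variance(counts_own_rg, K)
v_counts_donated = random_group_variance(counts_donated_rg, K)
v_receipts_own = random_group_variance(receipts_own_rg, K)
v_receipts_donated = random_group_variance(receipts_donated_rg, K)

print(v_counts_own, v_counts_donated, v_receipts_own, v_receipts_donated)
```

Both layouts give the same point estimates (8 firms, 44 in receipts), but concentrating the imputed case in the donor's random group makes the group estimates more uneven, so the estimated variance is larger.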
Advantages of Donating RG #
• No need to add multiple factors to record
• No need to calculate factors
• No problems for microdata users
Compare the Ratios of the Variances of the Three Methods
R1 = VAR(Oh-Scheuren) / VAR (Weighted Adjustment)
R2 = VAR(Donor) / VAR (Weighted Adjustment)
Mean for R1 and R2 across publication cells
Std Dev for each of the means of R1 and R2
Null Hypothesis: Ri = 1 (90% confidence)
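The comparison described above can be sketched in Python as follows. The slides do not say which test statistic was used, so this assumes a two-sided normal approximation with a critical value of 1.645 and reads "Std Dev of the mean" as the standard error of the mean. The input ratios are made-up placeholders, not SBO results, and `ratio_summary` is a hypothetical name.

```python
import math
import statistics

Z_90 = 1.645  # assumed two-sided 90% critical value (normal approximation)


def ratio_summary(ratios):
    """Mean ratio across cells, its standard error, and whether it differs from 1."""
    mean = statistics.mean(ratios)
    std_err = statistics.stdev(ratios) / math.sqrt(len(ratios))
    differs_from_one = abs(mean - 1.0) > Z_90 * std_err
    return mean, std_err, differs_from_one


# Placeholder per-cell variance ratios, for illustration only.
example_ratios = [1.15, 1.09, 1.21, 1.12, 1.18]
mean, std_err, differs = ratio_summary(example_ratios)
print(round(mean, 3), round(std_err, 4), differs)
```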
Ratio of Variances --- Firm Counts

| # Imputes | Oh-Sch/Wt | Donor/Wt |
|---|---|---|
| 1 to 3 | 1.148 | 0.984* |
| 4 to 5 | 1.176 | 0.963 |
| 6 to 9 | 1.136 | 0.941 |
| 10 to 19 | 1.087 | 1.069 |
| 20 to 49 | 1.069 | 1.205 |
| 50 or more | 1.053 | 1.367 |

* Not statistically significantly different from 1.00 at the 90% level
Ratio of Variances --- Receipts

| # Imputes | Oh-Sch/Wt | Donor/Wt |
|---|---|---|
| 1 to 3 | 1.230 | 0.958* |
| 4 to 5 | 1.286 | 0.876 |
| 6 to 9 | 1.540 | 0.963* |
| 10 to 19 | 1.541 | 0.914 |
| 20 to 49 | 1.499 | 0.900 |
| 50 or more | 1.512 | 0.951 |

* Not statistically significantly different from 1.00 at the 90% level
Ratio of Variances --- Firm Counts

| Response Rate | Oh-Sch/Wt | Donor/Wt |
|---|---|---|
| 45 to 55% | 0.930 | 1.193 |
| 55 to 65% | 1.076 | 1.182 |
| 65 to 75% | 1.153 | 1.101 |
| 75 to 85% | 1.130 | 1.043 |
| 85 to 95% | 1.153 | 1.032* |

* Not statistically significantly different from 1.00 at the 90% level
Ratio of Variances --- Receipts

| Response Rate | Oh-Sch/Wt | Donor/Wt |
|---|---|---|
| 45 to 55% | 1.790 | 0.902 |
| 55 to 65% | 1.520 | 0.904 |
| 65 to 75% | 1.465 | 0.940 |
| 75 to 85% | 1.218 | 0.945 |
| 85 to 95% | 1.153 | 0.954 |

* Not statistically significantly different from 1.00 at the 90% level
Are the differences acceptable?
Firm Count Variance Ratios Differ by 10%
Receipts Variances Differ up to 70%
=>
Firm Count Relative SEs Differ by about 5%
Receipts Relative SEs Differ by up to 30%
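The step from variance ratios to relative standard errors is a square root, since a standard error is the square root of a variance. A short Python check of the slide's figures (the function name is hypothetical):

```python
import math


def rse_change_percent(variance_ratio):
    """Percent change in the relative SE implied by a ratio of variances."""
    return (math.sqrt(variance_ratio) - 1.0) * 100.0


firm_count_change = rse_change_percent(1.10)  # variances differ by 10%
receipts_change = rse_change_percent(1.70)    # variances differ by up to 70%

print(round(firm_count_change, 1), round(receipts_change, 1))
```

A 10% difference in variance is therefore roughly a 5% difference in relative SE, and a 70% difference is roughly 30%.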
Asian-Owned Retail Operations in New Hampshire in 2002

| | Estimate | Published RSE | Max Change in RSE |
|---|---|---|---|
| Firms | 210 | 23% | +1% |
| Receipts | $70 Mil | 19% | +6% |
Lingering Question
Is the donation of the RG Number sufficient or do we need to augment the resulting variance with a factor (similar to the Oh-Scheuren factor)?