Migration of a large survey onto a micro-economic platform
Val Cox
April 2014
Micro-economic Platform (MEP)
Standardises and automates processes
- Provides more efficient processing, more analysis
Enables Statistics NZ to gain more from available data
- Basic principle: use administrative data wherever possible, with surveys filling the gaps
- Objective: bring core information about every business in the economy into the Longitudinal Business Database to allow Statistics NZ to respond quickly to changing needs for economic statistics
Aim of paper
To discuss the challenges of building a non-response imputation package for a large survey on the MEP
- Rationalises the use of Banff for outlier detection and imputation
- Rationalises the use of SEVANI (System for Estimation of Variance due to Nonresponse and Imputation) to estimate sampling and non-sampling errors
Annual Enterprise Survey (AES)
Provides statistics on the financial performance and position of New Zealand businesses
- Captures about 90% of New Zealand's GDP
Uses four different major data sources
- Three administrative sources (covering 72% of the population)
- One postal survey
AES before MEP
Editing strategy of AES on MEP
Guided by the Methodological Standard for E&I
Key objective of standard
- Editing is fit-for-purpose and enables continuous improvement of processes and data quality
Key principles used
- Automate editing processes where possible
- Use Statistics NZ standard editing tools, wherever possible, to achieve standardisation
Editing system of AES in MEP
Uses Banff to automate and standardise editing and imputation processes
Uses analytical views to assess the quality of the edited data
Challenges and solutions
A. Sheer volume of data
- 28 questionnaires, 113 industries and 180 variables
Solution: Use of a “thin slice” approach
- Restrict dataset to one questionnaire and one industry to show all stages of E&I are working
- Once successful, expand dataset to include more industries until all 28 questionnaires are replicated
- Successful in determining optimal level of automation for correcting failed edits
Challenges and solutions
B. Determining which variable is erroneous when groups of variables must add or subtract to a total
- Banff “errorloc” procedure always recommends changing one variable by a large amount
- The change is then made by the “deterministic” procedure
Solution: Assign weights to variables
- Assign lower weights to more reliable variables so Banff doesn’t change their values
- Examples: totals and gross profit, since respondents use these to determine the tax they pay
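The weighting idea can be illustrated with a minimal Python sketch for a single balance edit. It is not Banff code: the function `localize_error`, the variable names, and the `change_cost` values are all hypothetical. Here `change_cost` is a penalty, so a higher cost makes a variable less likely to be altered; how this maps onto Banff's own weight parameter is an assumption to check against the Banff documentation.

```python
def localize_error(record, signs, change_cost, tol=1e-9):
    """Pick one variable to change so that a single balance edit holds.

    The edit is: sum(signs[v] * record[v]) == 0.
    Returns None if the edit already holds, otherwise
    (variable_to_change, corrected_value).
    """
    residual = sum(signs[v] * record[v] for v in signs)
    if abs(residual) <= tol:
        return None
    # One equality edit can always be repaired by changing one variable,
    # so the cheapest repair is the variable with the lowest change cost.
    target = min(signs, key=lambda v: change_cost[v])
    # Deterministic correction: the only value that makes the edit balance.
    corrected = record[target] - residual / signs[target]
    return target, corrected


# Hypothetical record that fails: sales - cost_of_sales - gross_profit == 0
record = {"sales": 1000, "cost_of_sales": 600, "gross_profit": 450}
signs = {"sales": 1, "cost_of_sales": -1, "gross_profit": -1}
# Assumed penalties: gross profit is treated as the most reliable figure.
change_cost = {"sales": 5, "cost_of_sales": 1, "gross_profit": 10}

result = localize_error(record, signs, change_cost)
```

With these assumed penalties the sketch leaves gross profit untouched and adjusts the cost of sales instead, which is the behaviour the weighting is meant to achieve.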
Challenges and solutions
C. Outlier detection
- The old system detects outliers in 3 key variables but unlinks the whole unit (all variables)
- Banff does univariate outlier detection
Solution: Compared 2 E&I runs of data
- 1st run had only the 3 key variables set as outliers and 2nd had all variables included in outlier steps
- Decision: Choose variables to be set as outliers based on the effect on the totals
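To show what univariate detection means in practice, here is a small Python sketch in which each variable is screened on its own, so a unit can be flagged on one variable without being dropped for all of them. It uses simple quartile fences as a stand-in; this is not Banff's outlier method, and the function names, the multiplier `k`, and the data are hypothetical.

```python
from statistics import quantiles


def flag_outliers(values, k=1.5):
    """Flag values outside quartile fences (a stand-in rule, not Banff's)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v < low or v > high for v in values]


def flag_by_variable(units, variables):
    """Screen each variable separately.

    Returns {variable: set of unit indexes flagged on that variable}.
    """
    flagged = {}
    for var in variables:
        marks = flag_outliers([u[var] for u in units])
        flagged[var] = {i for i, is_out in enumerate(marks) if is_out}
    return flagged


# Hypothetical units: the last one is extreme on sales but ordinary on wages.
units = [
    {"sales": 100, "wages": 20},
    {"sales": 110, "wages": 22},
    {"sales": 95, "wages": 19},
    {"sales": 105, "wages": 21},
    {"sales": 102, "wages": 23},
    {"sales": 98, "wages": 20},
    {"sales": 5000, "wages": 21},
]

flags = flag_by_variable(units, ["sales", "wages"])
```

In this example the last unit is flagged on sales only, so its wages value stays available. This contrasts with the old approach of unlinking every variable for a flagged unit.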
Challenges and solutions
D. Running imputation one variable at a time would have been very time-consuming
Solution: Group variables
- By imputation method (4 methods)
- By industry (some industries have different characteristics)
- By type of variable (e.g. some variables can be negative)
Challenges and solutions
E. Imputation failed for some variables
- Some imputation cells were too small
Solution: Merged small imputation cells
- Each imputation stage was run twice, the first without cell merging and the second with cell merging, resulting in 8 imputation stages
- Use of a “catch-all” stage at the end (9th stage) to carry out mean imputation by industry
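The staged fallback can be sketched in Python as a cascade in which each stage only fills the values that earlier stages could not. This is a simplified illustration, not the MEP configuration: it collapses the four methods into mean imputation alone, giving three stages instead of nine, and the cell definitions, size bands, minimum cell sizes, and the name `impute_cascade` are all assumptions.

```python
from collections import defaultdict
from statistics import mean


def impute_cascade(units, variable, stages):
    """Fill missing (None) values in place, stage by stage.

    `stages` is a list of (cell_of, min_size) pairs, where cell_of maps a
    unit to its imputation cell. Returns {unit index: stage number used}.
    """
    used = {}
    for stage_no, (cell_of, min_size) in enumerate(stages, start=1):
        # Donor pools hold reported values only, never earlier imputations.
        donors = defaultdict(list)
        for i, u in enumerate(units):
            if u[variable] is not None and i not in used:
                donors[cell_of(u)].append(u[variable])
        for i, u in enumerate(units):
            if u[variable] is None:
                pool = donors.get(cell_of(u), [])
                if len(pool) >= max(min_size, 1):
                    u[variable] = mean(pool)
                    used[i] = stage_no
    return used


# Hypothetical merging rule: medium and large bands share one merged cell.
MERGED = {"small": "small", "medium": "medium_large", "large": "medium_large"}

stages = [
    (lambda u: (u["industry"], u["size"]), 2),          # fine cells
    (lambda u: (u["industry"], MERGED[u["size"]]), 2),  # merged cells
    (lambda u: u["industry"], 1),                       # catch-all by industry
]

units = [
    {"industry": "A", "size": "small", "sales": 10},
    {"industry": "A", "size": "small", "sales": 14},
    {"industry": "A", "size": "small", "sales": None},
    {"industry": "A", "size": "medium", "sales": 40},
    {"industry": "A", "size": "large", "sales": 60},
    {"industry": "A", "size": "large", "sales": None},
    {"industry": "B", "size": "small", "sales": 7},
    {"industry": "B", "size": "large", "sales": None},
]

used = impute_cascade(units, "sales", stages)
```

Recording which stage filled each value makes it easy to see how often the merged and catch-all stages are needed, which helps when judging whether the cells are defined sensibly.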
Challenges and solutions
F. Challenges with no solutions
- Analysis of improvements in the E&I was slow as it took several hours to run E&I and write back to the main data storage area to view data in a cube
- Attempt to replicate published results as closely as possible created a dilemma: When to stop trying?
- What was the “right” answer?
SEVANI
Provided a standardised and automated method to report on estimates of variance due to sampling as well as non-response and imputation
Challenges:
- Can produce output for only one variable at a time
- SEVANI required a lot of parameters to set up
- MEP is unit-based so can’t easily output SEVANI results
Solution:
- Use of a macro to identify variable names
- Created SAS code to set up parameters
- Output SEVANI results outside MEP
Next steps
Educate the users of the new system on MEP
Identify potential areas to make improvements in the editing and imputation system
Create a new MEP collection for Charities data to include its own editing and imputation system