Statistical Expertise for Sound Decision Making
Fourth meeting of the TCG - Ljubljana
Quality Assurance for Census Data Processing
Jean-Michel Durr
28/1/2011
Overview
• Data processing cycle
• Quality Assurance for Processing:
  – Objectives
  – QA Framework:
    • Quality management system
    • Setting the minimum standard
    • Continuous quality improvement
Data processing cycle
The data processing cycle
• Sequence of activities between the enumeration phase and the dissemination phase
• Involves many different interdependent activities
• Largely depends on the technology used: for example, coding may take place before or after data capture
Data processing cycle
• Receipt and registration:
  – Forms received at the processing centres are registered to ensure that all enumeration areas (EAs) are accounted for
  – Need to coordinate with managers in field operations to monitor the deliveries
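As an illustration of the registration check, a minimal Python sketch could compare the EA codes expected from the field with those registered at the processing centre. The function name, the EA code format and the sample lists are hypothetical and only serve the example.

```python
def unaccounted_eas(expected_eas, received_eas):
    """Return (missing, unexpected) EA codes, each as a sorted list."""
    expected = set(expected_eas)
    received = set(received_eas)
    missing = sorted(expected - received)      # expected but not yet registered
    unexpected = sorted(received - expected)   # registered but not on the master list
    return missing, unexpected


# Hypothetical master list and registration log
expected = ["EA001", "EA002", "EA003", "EA004"]
received = ["EA001", "EA003", "EA004", "EA999"]

missing, unexpected = unaccounted_eas(expected, received)
print("Missing:", missing)
print("Unexpected:", unexpected)
```

A non-empty "missing" list is what would be followed up with the field operations managers.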
Data processing cycle
• Preliminary checking:
  – Regardless of the technology employed, some type of checking of the forms is necessary
  – Can range from superficial checks that the forms are in adequate condition to be read by scanners, to transcription of damaged forms and manual editing of responses
Data processing cycle
• Coding:
  – Coding assigns classification codes to responses on the census form
  – Coding can be automated, computer-assisted, clerical, or a combination of all three
Data processing cycle
• Data capture:
  – System used to capture information from the census form and create a computer data file
  – Can include:
    • Key entry
    • Optical mark recognition
    • Intelligent character recognition
    • PDA/Internet
Data processing cycle
• Editing:
  – Procedure for detecting errors in and between data records, during and after data collection and capture, and for adjusting individual items
  – Systematic inspection of invalid and inconsistent responses, and subsequent manual or automatic correction, according to predetermined rules
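The idea of predetermined edit rules can be sketched in a few lines of Python. The field names, the valid values and the age limits below are assumptions chosen for the example; real census edit specifications are far more detailed.

```python
def edit_record(record):
    """Return the list of edit failures found in one person record."""
    failures = []

    # Validity rule: sex must be one of the allowed values (hypothetical coding)
    if record.get("sex") not in ("M", "F"):
        failures.append("sex: invalid value")

    # Validity rule: age must be an integer in a plausible range (assumed 0-120)
    age = record.get("age")
    if not isinstance(age, int) or not 0 <= age <= 120:
        failures.append("age: out of range")
    # Consistency rule between items (assumed minimum age of 12 for marriage)
    elif age < 12 and record.get("marital_status") == "married":
        failures.append("age/marital_status: inconsistent")

    return failures


print(edit_record({"sex": "X", "age": 8, "marital_status": "married"}))
```

Records with failures would then go to manual or automatic correction, as described above.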
Data processing cycle
• Validation:
  – Validation is the final check of data to ensure that the quality of the data meets agreed minimum standards
  – Tabulations of the final database:
    • To ensure internal coherence
    • To compare with other sources
    • (see demographic methods)
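One very simple form of internal coherence check on a tabulation is to confirm that components add up to totals. The sketch below assumes a hypothetical table of population by area and sex; the area names and counts are invented for the example.

```python
def incoherent_areas(tabulation):
    """Return the areas where male + female counts do not equal the total."""
    return sorted(
        area
        for area, counts in tabulation.items()
        if counts["male"] + counts["female"] != counts["total"]
    )


# Hypothetical tabulation from the final database
tabulation = {
    "Area A": {"total": 1200, "male": 590, "female": 610},
    "Area B": {"total": 800, "male": 400, "female": 395},
}
print(incoherent_areas(tabulation))
```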
Quality Assurance
Quality Assurance
• Objectives:
  – During the processing of census data, assuming that the criterion of relevance has already been met, the emphasis should be on:
    • Data accuracy
    • Budget
    • Timeliness
  – Necessary trade-off between the three
Quality Assurance Framework
• Quality management system
• Setting the minimum standard
• Continuous quality improvement
Quality Management System
• Units of work selected:
  – Too costly to control all units, hence the use of sampling
  – Automated processes (OCR) should be included, not only clerical work
  – Whether the work is outsourced or not
• Method of operation
• Rejected units of work
Use of sampling
• Basic rules:
  – Sampling rates relatively high at the beginning, gradually decreasing as operators become more proficient
  – All operators should have their first workload (e.g., an EA) sampled
  – More proficient operators are subject to a lower sampling rate
  – All operators should have some of their work sampled over the complete life cycle of the process
Use of sampling
• Basic rules:
  – Sampling rates may be increased towards the end of a process, so that the quality of work does not suffer as staff lose interest when the process comes to an end
  – Complex processes (e.g., coding occupation or industry) should be sampled at a higher rate than simpler processes (e.g., coding birthplace or religion)
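The basic rules on sampling rates can be combined into a single rate function. The sketch below is only one possible reading: every number in it (starting rate, decay, floor, multipliers, the 90% progress threshold) is a hypothetical parameter, not a value from the presentation.

```python
def sampling_rate(workloads_done, proficient, complex_process, progress):
    """Return the share of an operator's next workload to be sampled.

    workloads_done  -- workloads the operator has already completed
    proficient      -- True if the operator's recent quality is good
    complex_process -- True for e.g. occupation or industry coding
    progress        -- share of the whole process completed (0.0 to 1.0)
    """
    if workloads_done == 0:
        return 1.0  # the first workload of every operator is sampled

    # High at the beginning, gradually decreasing, never reaching zero
    rate = max(0.05, 0.50 * 0.8 ** workloads_done)
    if proficient:
        rate *= 0.5   # lower rate for proficient operators
    if complex_process:
        rate *= 2.0   # higher rate for complex processes
    if progress >= 0.9:
        rate *= 1.5   # raise the rate again towards the end of the process
    return min(rate, 1.0)


print(sampling_rate(0, False, False, 0.0))
print(sampling_rate(10, True, True, 0.95))
```

Because of the floor, even a proficient operator late in the process still has some work sampled.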
Use of sampling
• Basic rules:
  – Initial sampling units should be based on operational efficiency:
    • If the basic workload is an EA, the sample should first be based on a percentage of EAs
    • The sample can then be further refined to a percentage of households within those EAs
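A two-stage selection of this kind (EAs first, then households within the selected EAs) could look like the following sketch. The function name, the rates and the fixed seed are assumptions made for the example; simple random sampling is used at both stages for simplicity.

```python
import random


def two_stage_sample(households_by_ea, ea_rate, household_rate, seed=0):
    """Select a share of EAs, then a share of households in each selected EA."""
    rng = random.Random(seed)  # fixed seed so the selection can be reproduced
    eas = sorted(households_by_ea)
    n_eas = min(len(eas), max(1, round(len(eas) * ea_rate)))

    sample = {}
    for ea in rng.sample(eas, n_eas):
        households = households_by_ea[ea]
        n_hh = min(len(households), max(1, round(len(households) * household_rate)))
        sample[ea] = rng.sample(households, n_hh)
    return sample


# Hypothetical workload: 10 EAs with 20 households each
workload = {f"EA{i:03d}": [f"EA{i:03d}-H{j:02d}" for j in range(20)] for i in range(10)}
selected = two_stage_sample(workload, ea_rate=0.2, household_rate=0.25)
for ea, households in selected.items():
    print(ea, households)
```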
Method of operation
• Will depend on the process (data capture, coding…):
  – A sample can be reprocessed by another operator, then compared and inspected by a supervisor to determine the correct code
  – A sample can be directly controlled by a supervisor
  – A sample of forms captured by OCR can be captured by key entry and compared
  – A computer programme for editing and imputation can be controlled using tabulations
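The comparison step of the first method (reprocessing by a second operator) can be sketched as follows. The codes are invented 4-digit examples, and the function only flags the mismatches; deciding which code is correct remains the supervisor's task.

```python
def compare_codes(original_codes, verification_codes):
    """Compare two codings of the same sample.

    Returns (discrepancy_rate, positions) where positions lists the
    sample items on which the two operators disagree.
    """
    if len(original_codes) != len(verification_codes):
        raise ValueError("both codings must cover the same sample items")
    if not original_codes:
        return 0.0, []
    positions = [
        i
        for i, (first, second) in enumerate(zip(original_codes, verification_codes))
        if first != second
    ]
    return len(positions) / len(original_codes), positions


# Hypothetical occupation codes from the production coder and the verifier
rate, to_review = compare_codes(
    ["7212", "5223", "9112", "2341"],
    ["7212", "5222", "9112", "2341"],
)
print(rate, to_review)
```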
Rejected units
• In general, rejected units of work are not reprocessed
• Except in cases where they do not meet the defined minimum standard
• Because the benefit is generally not justified by the cost:
  – With a coding discrepancy rate of 10% and a sample rate of 10%, correcting all the discrepancies found in the sample would only reduce the overall discrepancy rate for that topic to 9%
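The arithmetic behind the 9% figure can be checked with a short sketch. It assumes that discrepancies occur at the same rate inside and outside the sample and that corrections introduce no new errors; the function name is hypothetical.

```python
def rate_after_correcting_sample(discrepancy_rate, sample_rate):
    """Overall discrepancy rate left once all sampled discrepancies are fixed.

    Only the unsampled share (1 - sample_rate) still carries discrepancies.
    """
    return discrepancy_rate * (1 - sample_rate)


remaining = rate_after_correcting_sample(discrepancy_rate=0.10, sample_rate=0.10)
print(f"{remaining:.1%}")
```

The unsampled 90% of the work keeps its 10% discrepancy rate, so 0.10 × 0.90 = 0.09 of all items remain discrepant.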
Setting the standard
• Measurable criteria for each part of the process where the output can be flagged as either “pass” or “fail”
• Based on results of previous censuses, tests, other surveys or international comparisons
• Trade-off between accuracy, timeliness and costs: avoid over-quality
• Important to prioritize:
  – Some variables are more important
  – Some sub-populations are more important
• Define limits: Correct / Acceptable
Setting the standard
• Examples:
  – Receipt: 100% of EAs received
  – Data capture (OCR):
    • 99.9% for sex, date of birth
    • 98% for other variables
  – Coding:
    • 95% for occupation
    • 99% for municipality
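A pass/fail flag against such standards could be computed as in the sketch below. The thresholds are the example values from this slide; the dictionary layout, the variable names and the function are assumptions made for illustration.

```python
# Example minimum standards from the slide (share of items correct)
MINIMUM_STANDARDS = {
    ("data_capture", "sex"): 0.999,
    ("data_capture", "date_of_birth"): 0.999,
    ("coding", "occupation"): 0.95,
    ("coding", "municipality"): 0.99,
}
DATA_CAPTURE_DEFAULT = 0.98  # "other variables" for data capture


def assess(process, variable, n_checked, n_correct):
    """Flag a checked batch as 'pass' or 'fail' against its minimum standard."""
    standard = MINIMUM_STANDARDS.get((process, variable))
    if standard is None:
        if process != "data_capture":
            raise KeyError(f"no standard defined for {process}/{variable}")
        standard = DATA_CAPTURE_DEFAULT
    if n_checked <= 0:
        raise ValueError("n_checked must be positive")
    return "pass" if n_correct / n_checked >= standard else "fail"


print(assess("coding", "occupation", n_checked=1000, n_correct=962))
print(assess("data_capture", "sex", n_checked=1000, n_correct=998))
```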
Continuous Quality Improvement
• Core component of QA, different from QC
• The data processing phase lasts long enough to be improved
• Four steps:
  – 1. Measure quality
  – 2. Identify the most important problems
  – 3. Identify the root causes of these important quality problems
  – 4. Implement corrective action
Step 1: Measure Quality
• Regular reports:
  – At individual or team level
  – Every week or fortnight
  – Time series of indicators to show the trend:
    • Rates of discrepancy
    • % of work units accepted
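The two indicators can be aggregated into a weekly time series per team along these lines. The record layout (week, team, items checked, discrepancies, accepted) and the sample figures are hypothetical.

```python
from collections import defaultdict


def weekly_indicators(work_units):
    """Aggregate checked work units into indicators per (week, team)."""
    totals = defaultdict(
        lambda: {"items": 0, "discrepancies": 0, "units": 0, "accepted": 0}
    )
    for unit in work_units:
        t = totals[(unit["week"], unit["team"])]
        t["items"] += unit["items_checked"]
        t["discrepancies"] += unit["discrepancies"]
        t["units"] += 1
        t["accepted"] += 1 if unit["accepted"] else 0

    report = {}
    for key in sorted(totals):  # chronological order gives the time series
        t = totals[key]
        report[key] = {
            "discrepancy_rate": t["discrepancies"] / t["items"] if t["items"] else 0.0,
            "pct_units_accepted": 100.0 * t["accepted"] / t["units"],
        }
    return report


# Hypothetical quality-control results for one coding team
units = [
    {"week": 1, "team": "A", "items_checked": 100, "discrepancies": 8, "accepted": False},
    {"week": 1, "team": "A", "items_checked": 100, "discrepancies": 2, "accepted": True},
    {"week": 2, "team": "A", "items_checked": 200, "discrepancies": 4, "accepted": True},
]
report = weekly_indicators(units)
for key, indicators in report.items():
    print(key, indicators)
```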
Ex. 1: Clerical coding
Ex. 2: OCR data capture before/after correction
[Chart: Sex. Error rate (in %), vertical axis from 0.000 to 0.260, horizontal axis from 1 to 11. Legend: Company, Limit, Comp. corr, LS]
Ex. 3: OCR data capture
[Chart: Activity Status. Error rate (in %), vertical axis from 0.000 to 0.260, horizontal axis from 1 to 11. Legend: Company, Limit, LS]
Step 2: Identify the most important problems
• Most frequent discrepancies
• Most problematic:
  – An error on the first digit is more problematic than one on the last digit of a 4-digit classification
• Reports should provide this information
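A report of this kind could be built from the pairs of assigned and corrected codes, as in the sketch below. It assumes codes of equal length (4 digits here) and invented example values; the function names are hypothetical.

```python
from collections import Counter


def first_differing_digit(assigned, correct):
    """Return the 1-based position of the first digit that differs, or None."""
    for position, (a, c) in enumerate(zip(assigned, correct), start=1):
        if a != c:
            return position
    return None


def summarise_discrepancies(pairs):
    """Return the most frequent discrepancies and a count by digit position."""
    wrong = [(assigned, correct) for assigned, correct in pairs if assigned != correct]
    frequency = Counter(wrong)
    by_digit = Counter(first_differing_digit(a, c) for a, c in wrong)
    return frequency.most_common(), dict(by_digit)


# Hypothetical (assigned code, correct code) pairs from quality control
pairs = [
    ("7212", "7212"),
    ("5223", "5222"),
    ("5223", "5222"),
    ("2341", "3341"),
]
most_frequent, by_digit = summarise_discrepancies(pairs)
print(most_frequent)
print(by_digit)
```

Discrepancies at position 1 assign the response to the wrong major group, which is why they deserve attention even when they are less frequent.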
Step 3: Identify root causes
• Staff working in a process are in the best position to advise about how that process can be improved
• Case reporting forms to describe problems and provide suggestions
• Quality improvement team / facilitator
Step 4: Implement corrective action
• Possible corrective actions:
  – Changes to procedures
  – Changes to the processing systems
  – Retraining or additional training
  – Reminders about particular procedures sent to staff
  – Changes to coding indexes
Step 4: Implement corrective action
– Before any corrective action is implemented, the implications must be carefully considered
– Decision should be made at a high management level
– Could be done through a quality management steering committee
References
• UN Handbook on Census Management for Population and Housing Censuses
• Eurostat Handbook on Data Quality Assessment Methods and Tools
Thank you!