Selective data editing
Development & implementation
Q 2010 Helsinki
Jörgen Svensson
Process Owner
Statistics Sweden (SCB)
Standardization at SCB
• Decentralized production
• Development of CBMs
• Editing costly, 33% of budgets
• Data collection departments, 2006
• Standardization – the Lotta project, in 2006
Nine case studies
Purpose of the project:
• Try using selective data editing
• What is the potential gain using the method?
• Would it be possible to develop and use a common tool?
Some results from case studies
Survey                                                     Reduction (%)
Short term employment, private sector                      60
Business activity indicators                               50
Price indices in producer & import stages                  50
Short term statistics, wages & salaries, private sector    40
Wage & salary structures in the private sector             25
Foreign trade                                              (5)
Structural business statistics                             ---
SUSPICION
• SUSP(j, k) = Suspicion of variable j for unit k
• SUSP(j, k) = 0 if variable value falls within acceptance interval
• SUSP(j, k) → 1 as the value moves further outside the acceptance limits
• 0 ≤ SUSP(j,k) ≤ 1
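The properties above do not fix a particular functional form. A minimal Python sketch of one function that satisfies them, assuming (hypothetically) that suspicion grows as dist / (tau + dist), where dist is the distance outside the acceptance interval scaled by the interval width and tau is a tuning constant. The form, the function name and the parameters are illustrative assumptions, not taken from SELEKT:

```python
def suspicion(value, lower, upper, tau=1.0):
    """Illustrative suspicion measure in [0, 1).

    Assumed form (not from the source): dist / (tau + dist), where dist is
    the distance outside [lower, upper] divided by the interval width.
    """
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    if lower <= value <= upper:
        return 0.0  # inside the acceptance interval: no suspicion
    width = (upper - lower) or 1.0  # guard against a zero-width interval
    dist = (lower - value if value < lower else value - upper) / width
    return dist / (tau + dist)  # approaches 1 as the value moves further out
```

A smaller tau makes the measure rise more steeply just outside the interval.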
POTENTIAL IMPACT
• POTIMP = Potential impact
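• POTIMP is the weighted absolute difference between the observed and the predicted value:
• POTIMP(j,k,d) = w_k * 1_k(d) * |y(j,k) − ŷ(j,k)|
for variable j and unit k in domain d, where w_k is the sampling weight, 1_k(d) is the domain indicator (1 if unit k belongs to domain d, 0 otherwise), y(j,k) is the observed value and ŷ(j,k) is the predicted value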
• SELEKT supports several ways to establish the predicted value: from time series data and from cross-sectional analysis within homogeneous groups of units
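The weighted absolute difference described above can be sketched in a few lines of Python. The function and parameter names are hypothetical, and taking the group median as the cross-sectional prediction is only one simple assumption for illustration, not necessarily what SELEKT does:

```python
from statistics import median


def group_median_prediction(group_values):
    """Assumed cross-sectional predictor: median of a homogeneous group."""
    return median(group_values)


def potential_impact(observed, predicted, weight, in_domain):
    """Weighted absolute difference; zero for units outside the domain."""
    indicator = 1.0 if in_domain else 0.0  # domain indicator for the unit
    return weight * indicator * abs(observed - predicted)


# Example: a unit reporting 120 in a group whose other reports centre on 100.
predicted = group_median_prediction([90, 100, 110])
impact = potential_impact(observed=120, predicted=predicted,
                          weight=2.5, in_domain=True)
```

A time-series prediction could be plugged in the same way, for instance by passing the unit's previous-period value as `predicted`.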
Flagging suspected errors
[Figure: plot with axes log(Potential impact) and log(Suspicion); the flagged region is labelled "Flagged".]
LOCAL SCORE
Local (item) score LScore(j,k,d):
LScore(j,k,d) = SUSP(j,k) * |POTIMP(j,k,d)| * Cello(j,d)
Cello(j,d) is inversely proportional to the standard error based on previous data
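As a worked illustration of the local score, the Python sketch below multiplies the three factors. It assumes Cello(j,d) = constant / standard error, with the proportionality constant set to 1; the constant and all names are hypothetical choices for the example:

```python
def cello(standard_error, constant=1.0):
    """Assumed scaling factor: inversely proportional to the standard error."""
    if standard_error <= 0:
        raise ValueError("standard_error must be positive")
    return constant / standard_error


def local_score(susp, potimp, cello_value):
    """LScore = SUSP * |POTIMP| * Cello for one variable, unit and domain."""
    return susp * abs(potimp) * cello_value


# Example: suspicion 0.5, potential impact 50, standard error 25.
score = local_score(0.5, 50.0, cello(25.0))
```

Dividing by the standard error puts impacts on different variables and domains on a comparable scale, so that a given impact counts for more where the estimate is more precise.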
GLOBAL SCORE
• Global (unit) score GScore(k) is obtained by aggregation of local scores
• LScore(j,k,d) → LScore(j,k) → GScore(k)
• Each → denotes aggregation by summation, Euclidean summation or maximum
• Only those units with GScore larger than a pre-decided threshold are followed up
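The two aggregation steps and the threshold rule can be sketched in Python as follows. This is an illustration under stated assumptions: "Euclidean summation" is interpreted as the square root of the sum of squares, local scores are held in a dictionary keyed by (j, k, d), and all function names are hypothetical:

```python
import math
from collections import defaultdict


def aggregate(values, method="sum"):
    """Combine scores by sum, Euclidean sum (assumed: root of sum of squares) or max."""
    values = list(values)
    if method == "sum":
        return sum(values)
    if method == "euclid":
        return math.sqrt(sum(v * v for v in values))
    if method == "max":
        return max(values)
    raise ValueError(f"unknown method: {method}")


def global_scores(local_scores, over_domains="sum", over_variables="sum"):
    """Map {(j, k, d): LScore} to {k: GScore}, aggregating domains, then variables."""
    per_item = defaultdict(list)
    for (j, k, _d), score in local_scores.items():
        per_item[(j, k)].append(score)
    per_unit = defaultdict(list)
    for (_j, k), scores in per_item.items():
        per_unit[k].append(aggregate(scores, over_domains))
    return {k: aggregate(s, over_variables) for k, s in per_unit.items()}


def units_to_follow_up(gscores, threshold):
    """Return the units whose global score exceeds the threshold."""
    return sorted(k for k, g in gscores.items() if g > threshold)


local = {
    ("turnover", "A", "d1"): 3.0,
    ("turnover", "A", "d2"): 4.0,
    ("wages", "A", "d1"): 1.0,
    ("turnover", "B", "d1"): 0.5,
}
flagged = units_to_follow_up(global_scores(local), threshold=1.0)
```

With plain summation, unit A gets a global score of 8 and unit B a score of 0.5, so only A is followed up at a threshold of 1.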
SELEKT, EDIT and process data
Implementation of SELEKT
So far three surveys:
• Business activity indicators
• Wage & salary structures in the private sector
• Commodity flow survey
Documentation
A General Methodology for Selective Data Editing