Page 1

Unleash the power of ABS statistics through methodological innovation

By Dr Siu-Ming Tam Chief Methodologist

March, 2017

Views expressed in this talk are those of the author and do not necessarily represent those of the Australian Bureau of Statistics

Page 2

Outline

• ABS transformation

• Methodology transformation

– Big Data challenges

– Data integration and data fusion

– Data access

• Methodological innovation

– Borrowing strength over time

– Measurement error model

Page 3

Drivers of change in ABS statistics

• Need for faster decisions

• Data deluge

• More evidence-based decision making

• Ageing systems and manual processes

• Growing expectations

• New statistical possibilities and opportunities

Page 4

ABS Transformation - Who are we transforming for?

Our partners

• Greater responsiveness

• Improved collaboration

• Quicker to market

• Less red tape

Our community

• Improved data matching

• Informed use of statistics

• Evidence-based policy and programs

• Less burden on households and businesses

Our organisation

• Ongoing sustainability

• Greater influence and reach

• More dynamic – able to respond to future challenges

Our people

• Greater flexibility

• More satisfying work

• New skills and opportunities

• More diverse and engaged culture

Page 5

Six dimensions of transformation to achieve ABS goals

Page 6

Outline

• ABS transformation

• Methodology transformation

– Big Data challenges

– Data integration and data fusion

– Data access

• Methodological innovation

– Borrowing strength over time

– Measurement error model

Page 7

Methodology Architecture (MA) - Tam (2014)

• As part of the Enterprise Architecture (EA), MA is a transformation plan for methodology

• Vision for ABS MA: to provide a set of methods that underpins the products and process vision of the ABS Transformation program

• MA is supported by five key “rules of engagement”:

– Innovate

– Industrialise

– Build capability

– Contemporise

– Build support

Page 8

Methodology Transformation

Page 9

Transformational change in methodologies

• From classical to contemporary statistical methods

– from Designed data to Found data

– from direct measurements to modelled data

– from single source to multiple sources

– from siloed data sets to integrated data sets

• From limited access to unit record files (URFs) to more liberal access

Page 10

Outline

• ABS transformation

• Methodology transformation

– Big Data challenges

– Data integration and data fusion

– Data access

• Methodological innovation

– Borrowing strength over time

– Measurement error model

Page 11

Data Deluge - Big Data opportunities

Page 12

Big Data and Big Challenges - (Tam and Clarke, 2015)

• ABS objective

– Harness Big Data sources to create a richer, more dynamic and focused statistical picture of Australia for better-informed decision-making

• Challenges

– Business benefit

– Privacy and public trust

– Technological feasibility

– Data acquisition

– Data integrity

– Methodological soundness: how to make valid statistical inferences (Tam, 2015)

Page 13

Big Data = Big Sources, but not entirely foreign to official statisticians, e.g. administrative records and scanner data

• Behaviour metrics and online opinion – potentially large inherent statistical biases

• Administrative records: tax records, medical records, bank records

• Commercial transactions: credit card transactions, scanner transactions, online purchases

• Sensor data: satellite imagery, ground sensor data, location data

• Behaviour metrics: search engine queries, web page views and navigation, media subscriptions

• Online opinion: social media comments, Twitter feeds

Page 14

Data Analytics – What problems are they trying to solve?

Page 15

Machine Learning methods

Page 16

Statistical methods

Page 17

Big Inference – One possible approach - (Tam, 1987, 2015)

Using Big Data

• Use a sample to calibrate the Big Data (treated as “covariates”) using ground truths

• Calibrate using a linear model (with time varying coefficients – Dynamic Model)

• Estimate parameters (using Frequentist/Bayesian approaches)

• Predict the non-sampled values using the covariates

• Or use the Generalised Regression Estimation (GREG) framework for estimation (parameters estimated using design-based methods)

The simple case: no missing data and no missing covariates
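The steps above can be sketched in the simple case with a single Big Data covariate. This is a minimal sketch assuming simple random sampling, an ordinary linear model with an intercept and simulated data; the function names `prediction_total` and `greg_total` and all numbers are hypothetical, not ABS code. The first function predicts the non-sampled values from the covariates; the second is the design-based GREG alternative.

```python
import numpy as np


def prediction_total(y_s, X_s, X_r):
    """Model-based (prediction) estimator of a population total.

    y_s: ground-truth values for the n sampled units
    X_s: Big Data covariates for the sampled units, shape (n, p)
    X_r: Big Data covariates for the N - n non-sampled units, shape (N - n, p)
    """
    beta, *_ = np.linalg.lstsq(X_s, y_s, rcond=None)  # OLS calibration on the sample
    return y_s.sum() + (X_r @ beta).sum()  # observed part + predicted non-sampled part


def greg_total(y_s, X_s, w_s, tx):
    """GREG estimator: Horvitz-Thompson total plus a regression adjustment.

    w_s: design weights (inverse inclusion probabilities) of the sampled units
    tx: known population totals of the covariates, shape (p,)
    """
    Xw = X_s * w_s[:, None]
    beta = np.linalg.solve(X_s.T @ Xw, Xw.T @ y_s)  # design-weighted least squares
    return w_s @ y_s + (tx - w_s @ X_s) @ beta


# Hypothetical example: x is a Big Data signal observed for every unit,
# y is the ground truth, observed only for a simple random sample.
rng = np.random.default_rng(2017)
N, n = 10_000, 200
x = rng.uniform(0.0, 1.0, N)
X = np.column_stack([np.ones(N), x])  # intercept + Big Data covariate
y = 2.0 + 5.0 * x + rng.normal(0.0, 0.5, N)

s = rng.choice(N, size=n, replace=False)  # sampled units
r = np.setdiff1d(np.arange(N), s)  # non-sampled units
w = np.full(n, N / n)  # SRS design weights

t_pred = prediction_total(y[s], X[s], X[r])
t_greg = greg_total(y[s], X[s], w, X.sum(axis=0))
print(f"true {y.sum():.0f}  prediction {t_pred:.0f}  GREG {t_greg:.0f}")
```

Under simple random sampling with an intercept in the model the two estimates coincide exactly; they differ under unequal-probability designs, where the GREG weights matter.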

Page 18

An ABS Pilot Study

To determine the feasibility of using Earth Observations data for:

• Distinguishing crop types

• Estimating the area of land under each crop

• Predicting crop yield

Page 19

Barley or Wheat?

Summary of Indicative Results

Task                  Region   Average proportion correctly classified
Crop classification   SE QLD   78.5%
Crop presence         Mallee   83%

Page 20

Survey Process Augmented by Big Data*

*Big Data process augmented by survey data – Paul Biemer (2016)

• Frame, population and sample

• Randomization and observations

• Data integration and processing

• Modelling & adjustment

• Estimation and statistical inference

• Validation

Page 21

Missing covariate and missing data challenges

Page 22

Outline

• ABS transformation

• Methodology transformation

– Big Data challenges

– Data integration and data fusion

– Data access

• Methodological innovation

– Borrowing strength over time

– Measurement error model

Page 23

Data integration vis-à-vis data fusion

• Integration – Fellegi and Sunter (1969)

• Fusion – Kim et al. (2016)

One file – Sample A ∪ B

Note: the ABS uses the EM algorithm to estimate the “m” and “u” probabilities (Samuels, 2012).
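The EM estimation of the Fellegi-Sunter “m” and “u” probabilities mentioned in the note can be sketched as follows. This is a minimal illustration, not the ABS implementation described by Samuels (2012): it assumes binary agree/disagree comparisons on each field, conditional independence of fields given match status, and simulated comparison vectors; `fellegi_sunter_em` and `match_weights` are hypothetical names.

```python
import numpy as np


def fellegi_sunter_em(gamma, p=0.1, m=None, u=None, n_iter=500, tol=1e-8):
    """EM estimates of the Fellegi-Sunter m- and u-probabilities.

    gamma[j, k] = 1 if record pair j agrees on comparison field k, else 0.
    Assumes field agreements are conditionally independent given match status.
    """
    n, K = gamma.shape
    m = np.full(K, 0.9) if m is None else np.asarray(m, dtype=float)
    u = np.full(K, 0.1) if u is None else np.asarray(u, dtype=float)
    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a true match
        lik_m = p * np.prod(m**gamma * (1 - m) ** (1 - gamma), axis=1)
        lik_u = (1 - p) * np.prod(u**gamma * (1 - u) ** (1 - gamma), axis=1)
        g = lik_m / (lik_m + lik_u)
        # M-step: update the match proportion and field-level probabilities
        p_new = g.mean()
        m_new = (g @ gamma) / g.sum()
        u_new = ((1 - g) @ gamma) / (1 - g).sum()
        change = max(abs(p_new - p), np.abs(m_new - m).max(), np.abs(u_new - u).max())
        p, m, u = p_new, m_new, u_new
        if change < tol:
            break
    return p, m, u


def match_weights(gamma, m, u):
    """Fellegi-Sunter log2 likelihood-ratio weight for each record pair."""
    agree = np.log2(m / u)
    disagree = np.log2((1 - m) / (1 - u))
    return gamma @ agree + (1 - gamma) @ disagree


# Hypothetical simulated comparison vectors with known parameters
rng = np.random.default_rng(1969)
true_p = 0.1
true_m = np.array([0.95, 0.90, 0.85, 0.90])
true_u = np.array([0.10, 0.05, 0.20, 0.15])
is_match = rng.random(20_000) < true_p
agree_prob = np.where(is_match[:, None], true_m, true_u)
gamma = (rng.random(agree_prob.shape) < agree_prob).astype(int)

p_hat, m_hat, u_hat = fellegi_sunter_em(gamma)
w = match_weights(gamma, m_hat, u_hat)
```

Starting values with m above u steer EM towards the labelling in which the first latent class is the matches; pairs are then linked by thresholding the weights `w`.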

Page 24

Outline

• ABS transformation

• Methodology transformation

– Big Data challenges

– Data integration and data fusion

– Data access

• Methodological innovation

– Borrowing strength over time

– Measurement error model

Page 25

Data Utility versus Disclosure Risk for Unit Record Files

• Disclosure risk

– Spontaneous recognition

– Matching risk

– Higher risk for unit record data than for aggregated data

• Data utility

– Ability to use the data to draw valid conclusions

• Protections

– Perturbation

– Cell suppression

– Collapsing of categories

– Sampling

– Record masking

– Substitution of values

Page 26

From “4 Safes” to the “Five Safes” Framework - (Ritchie, 2014)

• Safe people: can the person be trusted to use the data appropriately?

• Safe project: is the specific use of the data appropriate?

• Safe setting: how does the mode of access limit the risk of disclosure?

• Safe data: how much protection is to be applied to the data?

• Safe output: how much control is applied to ensure the output is non-disclosive?

A multidimensional approach to disclosure risk assessment. Key equation: Pr(D) = Pr(D|A) Pr(A)
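The slide does not define the events. Reading D as “a disclosure occurs” and A as “a user gains access to the data” (an assumption), the key equation is the law of total probability with the no-access term dropped, because disclosure cannot occur without access:

```latex
\begin{aligned}
\Pr(D) &= \Pr(D \mid A)\,\Pr(A) + \Pr(D \mid \bar{A})\,\Pr(\bar{A}) \\
       &= \Pr(D \mid A)\,\Pr(A) \qquad \text{since } \Pr(D \mid \bar{A}) = 0 .
\end{aligned}
```

With purely hypothetical values Pr(A) = 0.01 and Pr(D|A) = 0.2, Pr(D) = 0.002. Under this reading, one plausible mapping is that safe people and safe project act mainly on Pr(A), while safe setting, safe data and safe output act on Pr(D|A).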

Page 27

Outline

• ABS transformation

• Methodology transformation

– Big Data challenges

– Data integration and data fusion

– Data access

• Methodological innovation

– Borrowing strength over time

– Measurement error model

Page 28

Temporal modelling

• The shape of the curve is not constant over time

• Temporal modelling would seem appropriate

– Dynamic linear model for production data

– Dynamic logistic regression for binary data

• Modelling the “beta” in GREG over time

– State transition equation
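Modelling the GREG “beta” through a state transition equation can be sketched with a Kalman filter for time-varying regression coefficients. This is a minimal sketch assuming a random-walk transition, known variances and simulated data; it is not the specification in Tam (1987, 2015), and `filter_dynamic_beta` and all values are hypothetical.

```python
import numpy as np


def filter_dynamic_beta(ys, Xs, sigma2_e, W, beta0, P0):
    """Kalman filter for time-varying regression coefficients (a dynamic linear model).

    Observation equation:      y_t = X_t beta_t + e_t,     e_t ~ N(0, sigma2_e I)
    State transition equation: beta_t = beta_{t-1} + w_t, w_t ~ N(0, W)
    ys[t]: ground-truth values for the sample at time t, shape (n_t,)
    Xs[t]: Big Data covariates for the same units, shape (n_t, p)
    Returns the filtered estimates beta_{t|t} and their covariances P_{t|t}.
    """
    beta = np.asarray(beta0, dtype=float)
    P = np.asarray(P0, dtype=float)
    betas, covs = [], []
    for y, X in zip(ys, Xs):
        P = P + W  # predict: the random-walk transition leaves the mean unchanged
        S = X @ P @ X.T + sigma2_e * np.eye(len(y))  # innovation covariance
        K = np.linalg.solve(S, X @ P).T  # Kalman gain P X' S^{-1}
        beta = beta + K @ (y - X @ beta)  # update with this period's sample
        P = P - K @ X @ P
        betas.append(beta)
        covs.append(P)
    return np.array(betas), np.array(covs)


# Hypothetical example: the coefficients linking a satellite signal to yield drift over time
rng = np.random.default_rng(1987)
T, n = 20, 30
true_beta = np.cumsum(rng.normal(0.0, 0.1, size=(T, 2)), axis=0) + np.array([1.0, 3.0])
ys, Xs = [], []
for t in range(T):
    X = np.column_stack([np.ones(n), rng.uniform(0.0, 1.0, n)])
    Xs.append(X)
    ys.append(X @ true_beta[t] + rng.normal(0.0, 0.5, n))

betas, covs = filter_dynamic_beta(ys, Xs, sigma2_e=0.25, W=0.01 * np.eye(2),
                                  beta0=np.zeros(2), P0=100.0 * np.eye(2))
```

The filtered `betas[t]` can then stand in for a fixed beta when predicting the non-sampled units at time t, so each period borrows strength from earlier samples.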

Page 29

State Space Modelling for Satellite Imagery data - Tam (1987, 2015)

Page 30

Outline

• ABS transformation

• Methodology transformation

– Big Data challenges

– Data integration and data fusion

– Data access

• Methodological innovation

– Borrowing strength over time

– Measurement error model

Page 31

Page 32

Where do survey sampling errors go if they are not removed?

Page 33

Better signal extraction using a Structural Time Series model

• Explicit modelling of sampling error for survey estimates

– $y_t = \vartheta_t + u_t$, where $u_t$ is the sampling error and $\vartheta_t = T_t + S_t + I_t$ (trend, seasonal and irregular)

– Modelling for trend (e.g. local linear trend) and seasonal effects (e.g. dummy seasonal)

• Option 1: put $\hat{\vartheta}_{t|t}$ through linear filters (e.g. X-13ARIMA-SEATS) to decompose it into trend, seasonal effects and irregular (ABS option)

• Option 2: use $\hat{T}_{t|t}$ and $\hat{S}_{t|t}$ as the trend and seasonal effect estimates

• Benefit for seasonally adjusted estimates for areas with relatively small sample sizes
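Option 2 can be sketched with a Kalman filter that treats the design-based sampling variance as known. This is a minimal sketch, not the ABS model: it assumes a local linear trend, omits the seasonal component for brevity, uses simulated data, and the function name and variance values are hypothetical. The filter returns $\hat{T}_{t|t}$, which would serve directly as the trend estimate.

```python
import numpy as np


def filter_trend_with_sampling_error(y, var_level, var_slope, var_irreg, var_sampling):
    """Kalman filter for y_t = T_t + I_t + u_t with a local linear trend.

    T_t = T_{t-1} + nu_{t-1} + eta_t,  nu_t = nu_{t-1} + zeta_t.
    var_sampling[t]: known sampling-error variance Var(u_t) from the survey design.
    The seasonal component is omitted for brevity (an assumption of this sketch).
    Returns the filtered trend T_{t|t} and slope nu_{t|t}.
    """
    F = np.array([[1.0, 1.0], [0.0, 1.0]])  # transition for the state (T_t, nu_t)
    Q = np.diag([var_level, var_slope])
    Z = np.array([1.0, 0.0])  # the observation loads on the trend only
    a = np.array([y[0], 0.0])
    P = np.diag([1e6, 1e6])  # vague initial state
    trend, slope = np.empty(len(y)), np.empty(len(y))
    for t, obs in enumerate(y):
        if t > 0:
            a = F @ a
            P = F @ P @ F.T + Q
        f = Z @ P @ Z + var_irreg + var_sampling[t]  # irregular plus sampling error
        k = P @ Z / f  # Kalman gain
        a = a + k * (obs - Z @ a)
        P = P - np.outer(k, Z @ P)
        trend[t], slope[t] = a
    return trend, slope


# Hypothetical small-area series: a smooth trend observed with large sampling error
rng = np.random.default_rng(33)
n = 80
true_slope = 0.5 + np.cumsum(rng.normal(0.0, 0.05, n))
true_trend = 100.0 + np.cumsum(true_slope)
var_sampling = np.full(n, 4.0)
y = true_trend + rng.normal(0.0, 0.3, n) + rng.normal(0.0, 2.0, n)

trend_hat, slope_hat = filter_trend_with_sampling_error(
    y, var_level=0.01, var_slope=0.0025, var_irreg=0.09, var_sampling=var_sampling)
mse_raw = np.mean((y[10:] - true_trend[10:]) ** 2)
mse_filtered = np.mean((trend_hat[10:] - true_trend[10:]) ** 2)
```

Because the sampling variance enters the observation variance explicitly, the filter discounts noisy small-area estimates rather than passing sampling error into the trend.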

Page 34

ACT Employment and Unemployment Estimates

Page 35

Small Domain Estimation

Page 36

SDE with repeated surveys

Page 37

Multiple data sources to improve survey estimates

• Borrowing strength from multiple sources (Harvey and Chung, 2000; Zhang and Honchar, 2016)

• Using unemployment benefit claimant counts to improve the Labour Force Survey (LFS) estimates

– Exploit the correlation in the error covariance matrix

– Bivariate State Space Model (aka Seemingly Unrelated Time Series Equations model, SUTSE)

• Quality assurance tool for ABS LFS unemployment estimates
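A bivariate SUTSE model of the LFS and claimant count (CC) series can be sketched as follows. This is a minimal sketch assuming a bivariate local linear trend in which only the slope disturbances are correlated, illustrative variances rather than ABS estimates, and simulated data; `kalman_filter` and `sutse_matrices` are hypothetical names. The latest LFS value is set to missing so the filter predicts it from the known CC value, in the spirit of the case study that follows.

```python
import numpy as np


def kalman_filter(y, F, Q, Z, H, a0, P0):
    """Kalman filter returning filtered states; NaN entries of y are treated as missing."""
    a, P = a0.astype(float), P0.astype(float)
    states = []
    for t in range(len(y)):
        if t > 0:
            a = F @ a
            P = F @ P @ F.T + Q
        obs = ~np.isnan(y[t])
        if obs.any():
            Zt = Z[obs]
            S = Zt @ P @ Zt.T + H[np.ix_(obs, obs)]
            K = np.linalg.solve(S, Zt @ P).T  # gain P Z' S^{-1}
            a = a + K @ (y[t][obs] - Zt @ a)
            P = P - K @ Zt @ P
        states.append(a)
    return np.array(states)


def sutse_matrices(var_slope, rho, var_lfs, var_cc, var_level=(1e-4, 1e-4)):
    """Bivariate local linear trend (SUTSE) with correlated slope disturbances.

    State: (T_lfs, T_cc, nu_lfs, nu_cc); rho couples the two slope disturbances.
    """
    F = np.array([[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
    s1, s2 = np.sqrt(var_slope)
    Q = np.zeros((4, 4))
    Q[0, 0], Q[1, 1] = var_level
    Q[2:, 2:] = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]
    Z = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
    H = np.diag([var_lfs, var_cc])  # LFS carries sampling error; CC is a near-complete count
    return F, Q, Z, H


# Hypothetical simulated series with strongly correlated trend slopes
rng = np.random.default_rng(2016)
T, var_slope, rho = 120, (0.04, 0.04), 0.9
cov = np.array([[0.04, rho * 0.04], [rho * 0.04, 0.04]])
slopes = np.cumsum(rng.multivariate_normal([0.0, 0.0], cov, size=T), axis=0)
trends = np.array([700.0, 650.0]) + np.cumsum(slopes, axis=0)
y = trends + rng.normal(0.0, [10.0, 1.0], size=(T, 2))  # LFS noisier than CC
y[-1, 0] = np.nan  # latest LFS estimate not yet available; CC is known

F, Q, Z, H = sutse_matrices(var_slope, rho, var_lfs=100.0, var_cc=1.0)
states = kalman_filter(y, F, Q, Z, H, a0=np.r_[y[0], 0.0, 0.0], P0=np.eye(4) * 1e6)
lfs_nowcast = states[-1, 0]  # LFS trend predicted with the help of the known CC value
```

Comparing a published LFS estimate against such a prediction interval is one way the model could serve as the quality assurance tool mentioned above.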

Page 38

Seemingly Unrelated Time Series Equations Model (SUTSE)

Page 39

Case study of Unemployment – LFS estimates vs benefit claimant count

Page 40

Figure (two panels, L_lfs_unemp_o-Slope and L_cc_total-Slope, 1996 to 2016): smoothed estimates of the LFS unemployment trend slope ($\nu_{1,t|T}$) vs the claimant count trend slope ($\nu_{2,t|T}$)

Page 41

Case study of Unemployment (cont.)

Figure: prediction of total unemployment using known claimant count (CC) data, March 2016. Values shown: 761.1 (LFS) and 763.3 (SSM predicted), with the 95% low and high bounds.

Page 42

Concluding remarks

• Methodological innovation is fundamental to supporting the ABS transformation

• Methodological transformational change

– More measured use of statistical models

– Reforming data access

• Methodological innovation

– Use of measurement error models for aggregate statistics

– Use of state space models (SSM) for a number of applications involving time

• Change programs

– Need to build support and buy-in from subject matter colleagues

Page 43

Selected References

Biemer, P. (2016). Keynote address to the 2016 International Survey Error Workshop.

Fellegi, I.P. and Sunter, A.B. (1969). A theory of record linkage. Journal of the American Statistical Association, 64, 1183-1210.

Harvey, A. and Chung, C.H. (2000). Estimating the underlying change in unemployment in the UK. Journal of the Royal Statistical Society, Series A, 163, 303-339.

Kim, J.K., Berg, E. and Park, T. (2016). Statistical matching using fractional imputation. Survey Methodology, 42, 19-40.

Ritchie, F. (2014). Access to Sensitive Data: Satisfying Objectives Rather than Constraints. Journal of Official Statistics, 30, 533-545.

Samuels, C. (2012). Using the EM algorithm to estimate the parameters of the Fellegi-Sunter model for data linking.

Tam, S-M. (1987). Analysis of a repeated survey using a dynamic linear model. International Statistical Review, 55, 63-73.

Page 44

Selected references (cont’d)

Tam, S-M. (2014). Methodology architecture – a roadmap for new methodological directions in the Australian Bureau of Statistics. Journal of Official Statistics, 30, 371-375.

Tam, S-M. (2015). A Statistical Framework for Analysing Big Data. Survey Statistician, 72, 36-51.

Tam, S-M. and Clarke, F. (2015). Big Data, Official Statistics and Some Initiatives by the ABS. International Statistical Review, 83, 436-448.

Thomsen, I.B. (1973). A note on the efficiency of weighting subclass means to remove the effects of non-response when analysing survey data. Statistics Norway. Unpublished manuscript.

Zhang, M. and Honchar, O. (2016). Predicting survey estimates by state space models using multiple data sources. Australian Bureau of Statistics. Unpublished manuscript.

Page 45

Questions?
