sas government analytics leadership forum · through record linkage instead of collection - result:...

SAS Government Analytics Leadership Forum April 2018 Anil Arora, Chief Statistician of Canada

Post on 12-May-2020


Page 1: SAS Government Analytics Leadership Forum · through record linkage instead of collection - Result: 24,000. hour reduction of respondent burden. AI: to fill in missing values, Machine

SAS Government Analytics Leadership Forum

April 2018

Anil Arora, Chief Statistician of Canada

Statistics Canada

• Translating data into evidence for 100 years

• Using statistical science and sophisticated methods to produce reliable information about Canadians

• A lot goes on behind the scenes to produce the census…


Census: Behind the Scenes


Statistics Canada is undertaking a significant transformation and leading efforts to be more responsive to the data needs of policy leaders by:

• Moving beyond a survey-first approach with new methods and integrating data from a variety of existing sources

• Making data easier to access and use by adopting new tools to analyze and visualize data

• Enabling Canadians to use data to make evidence-based decisions


Statistical analysis is at the center of every step in the cycle of translating data to evidence

Design and collection

Optimize designs and processes (samples, collection, coding, record linkage)

Processing and inference

Statistical error detection and correction, weighting, weight adjustments, use of statistical models

Analysis

Time series analysis, statistical data validation and confrontation, data interpretation

Dissemination

Measurement of accuracy, statistical disclosure control (privacy)

Consumption

Supporting quality decisions by citizens, their governments and businesses based on evidence

G-SAM, G-CODE, G-LINK, BANFF, CANCEIS, G-SERIES, G-CONFID

All processing systems (G-SAM, etc.) are coded in SAS

Statistical analysis is critical to producing high quality information in the most cost efficient manner
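The "weighting, weight adjustments" item in the Processing and inference step can be illustrated with a minimal sketch of a nonresponse adjustment: within each adjustment class, respondents' design weights are scaled up so that they also represent the nonrespondents. The function name, record layout and sample values below are hypothetical, and the sketch is written in Python for illustration only; the slide says the production systems are coded in SAS and does not describe their internals.

```python
from collections import defaultdict

def adjust_for_nonresponse(units):
    """Scale respondents' design weights so each class keeps its total weight.

    `units` is a list of dicts with hypothetical keys:
    'cls' (adjustment class), 'weight' (design weight), 'responded' (bool).
    Returns a list of (unit, adjusted_weight) pairs for respondents only.
    """
    total = defaultdict(float)       # weight of all sampled units, per class
    responding = defaultdict(float)  # weight of respondents only, per class
    for u in units:
        total[u["cls"]] += u["weight"]
        if u["responded"]:
            responding[u["cls"]] += u["weight"]

    adjusted = []
    for u in units:
        if u["responded"]:
            factor = total[u["cls"]] / responding[u["cls"]]
            adjusted.append((u, u["weight"] * factor))
    return adjusted

# Hypothetical sample: one of the two urban units did not respond.
sample = [
    {"cls": "urban", "weight": 10.0, "responded": True},
    {"cls": "urban", "weight": 10.0, "responded": False},
    {"cls": "rural", "weight": 20.0, "responded": True},
    {"cls": "rural", "weight": 20.0, "responded": True},
]
weights = [w for _, w in adjust_for_nonresponse(sample)]
```

In this toy sample the responding urban unit's weight doubles, the rural weights are unchanged, and the adjusted weights still sum to the original design total.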

Leading-edge methods to integrate new data types:

Model-based crop yield estimates

Responding to rapidly evolving policy needs:

January 11, 2018 print edition


Integrating data to enable the Horizontal Review of Innovation and Clean Tech

Statistics Canada’s linkable file environment

• Administrative data files from departments, agencies and crown corporations

• Existing survey and administrative data files at Statistics Canada

• Basic descriptive statistics

• Before-after analysis

• Cohort analysis

• Linked file for ongoing research

✓ Gathering data efficiently and strategically

✓ Leveraging existing data holdings across government

✓ Creating a new research dataset to allow further analysis
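The before-after analysis named on this slide can be sketched as follows: one file records when each unit entered a program, a second file holds yearly outcomes, and the two are linked on a shared identifier before the means are compared. All names and figures are hypothetical, exact matching on an identifier is a simplifying assumption, and the result is a descriptive comparison rather than a causal estimate.

```python
from statistics import mean

def before_after(program, outcomes):
    """Link two files on a shared id and compare mean outcomes before vs. after.

    program:  {unit_id: year the unit entered the program}
    outcomes: list of (unit_id, year, value) rows from a second file
    """
    before, after = [], []
    for unit_id, year, value in outcomes:
        start = program.get(unit_id)  # exact-match linkage on the identifier
        if start is None:
            continue                  # unit is not in the program file
        if year < start:
            before.append(value)
        else:
            after.append(value)
    return mean(before), mean(after)

# Hypothetical files
program = {"f1": 2015, "f2": 2016}
outcomes = [
    ("f1", 2014, 100.0), ("f1", 2016, 140.0),
    ("f2", 2015, 200.0), ("f2", 2017, 220.0),
    ("f3", 2015, 999.0),  # no match in the program file, so it is ignored
]
mean_before, mean_after = before_after(program, outcomes)
```

Once the linked rows exist they can be kept as a research file, which is the "linked file for ongoing research" idea: later cohort analyses reuse the same link instead of collecting data again.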

Evolving with the times

SAS first introduced at Statistics Canada in the late 1980s

From: Character-based green screens on the mainframe

Primitive Windows user interfaces

Enterprise Guide

Moving to: Visual Analytics, Enterprise Miner and Viya

Canadian Housing Statistics Program

• TransUnion data (43 mil. records) linked to tax information (165 mil.)

• 233 million possible pairs created

• Runs in about 40 hours on the SAS Grid

• Would not be possible on a dedicated Windows Server
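Comparing every one of 43 million records with every one of 165 million would mean about 7.1 × 10^15 pairs, so the 233 million possible pairs reported here imply that candidates were restricted in some way. The slide does not say how. A common approach in record linkage is blocking, where only records that share a key value are paired. The sketch below assumes blocking on a hypothetical postal-code field, with made-up records.

```python
from collections import defaultdict

def candidate_pairs(file_a, file_b, key):
    """Return pairs (a, b) of records that share the same blocking key value."""
    blocks = defaultdict(list)
    for rec in file_b:
        blocks[key(rec)].append(rec)
    # Only records in the same block are paired; everything else is skipped.
    return [(a, b) for a in file_a for b in blocks.get(key(a), [])]

# Hypothetical records: (id, postal code prefix, birth year)
credit = [("c1", "K1A", 1980), ("c2", "M5V", 1975)]
tax = [("t1", "K1A", 1980), ("t2", "K1A", 1991), ("t3", "H2X", 1975)]

pairs = candidate_pairs(credit, tax, key=lambda r: r[1])  # block on postal code
all_pairs = len(credit) * len(tax)  # size of the full cross product
```

Here blocking keeps 2 of the 6 possible pairs. Each surviving pair would still need to be scored (for example on birth year) before it is accepted as a link.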

StatCan SAS Grid

- Started as a research project made up of 4 workstations

- Evolved to be the largest SAS Grid implementation in Canada:

  - 16 Grid nodes each having 16 cores

  - 256 compute cores and 60 Terabytes (TBs) of Shared File System

Allows many processes to run concurrently:

- large record linkages

- complex estimation processes

- multi-dimensional tabulations

Continued improvement: with the StatCan SAS Grid and the new SAS application G-Tab Census, creating a table takes 95% less time than creating the same table with the 2016 Tabulation system

Pure Data Analytic (Netezza)

• Capacity to store, process and analyze Big Data

• Planned use-cases:

  • CPI alternate data source

  • Canadian Housing Statistics Program linkage

  • Admin Data Lake

Old and new: combining traditional and AI methods in the 2016 Census

Immigration-related variables:

Traditional: data was added through record linkage instead of collection

- Result: 24,000-hour reduction of respondent burden

AI: to fill in missing values, Machine Learning identified the best combination of respondent characteristics to make corrections

- Result: complete data; up to 10% more accurate

OUTCOMES

Now: More accurate data for IRCC policymakers

Later: Proof of concept for Census 2021
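The AI step on this slide, finding the combination of respondent characteristics that best fills in missing values, can be sketched as a search over feature combinations. The slide does not name the algorithm used, so the version below is a stand-in: it tries every combination of candidate characteristics, imputes a held-out value with the most common value among training records sharing those characteristics, and keeps the most accurate combination. Field names and records are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def impute_accuracy(train, test, features, target):
    """Share of test records whose target is recovered by the group mode."""
    groups = defaultdict(Counter)
    for r in train:
        groups[tuple(r[f] for f in features)][r[target]] += 1
    # Fallback when a test record's group was never seen in training.
    overall = Counter(r[target] for r in train).most_common(1)[0][0]
    hits = 0
    for r in test:
        counts = groups.get(tuple(r[f] for f in features))
        guess = counts.most_common(1)[0][0] if counts else overall
        if guess == r[target]:
            hits += 1
    return hits / len(test)

def best_features(train, test, candidates, target):
    """Try every non-empty combination of candidates; keep the most accurate."""
    scored = [
        (impute_accuracy(train, test, combo, target), combo)
        for k in range(1, len(candidates) + 1)
        for combo in combinations(candidates, k)
    ]
    return max(scored, key=lambda s: s[0])  # ties go to the smaller combination

# Hypothetical records in which 'category' depends on 'age_group' only.
train = [
    {"age_group": "young", "region": "east", "category": "A"},
    {"age_group": "old", "region": "east", "category": "B"},
    {"age_group": "old", "region": "east", "category": "B"},
    {"age_group": "young", "region": "west", "category": "A"},
    {"age_group": "young", "region": "west", "category": "A"},
    {"age_group": "old", "region": "west", "category": "B"},
]
test = [
    {"age_group": "young", "region": "east", "category": "A"},
    {"age_group": "old", "region": "west", "category": "B"},
]
accuracy, chosen = best_features(train, test, ["age_group", "region"], "category")
```

An exhaustive search like this only scales to a handful of characteristics; with many candidates, a learned model or a greedy search would be the usual substitute.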


SAS and Statistics Canada

THANK YOU!

For more information, please visit www.statcan.gc.ca

#StatCan100