Managing Quality, Risk and Cost in BLS Work with Alternative Data Sources John L. Eltinge U.S. Bureau of Labor Statistics CNSTAT Panel on Multiple Data Sources and State-of-the-Art Estimation Methods December 16, 2015


Page 1: Managing Quality, Risk and Cost in BLS Work with

Managing Quality, Risk and Cost in BLS Work

with Alternative Data Sources

John L. Eltinge

U.S. Bureau of Labor Statistics

CNSTAT Panel on Multiple Data Sources and State-of-the-Art Estimation Methods

December 16, 2015


Acknowledgements and Disclaimer

The author thanks the organizers for the opportunity to present these topics to the CNSTAT panel. The ideas considered here have developed from many productive discussions with colleagues in the BLS, the federal statistical system, academia and the private sector over the past two decades.

The views expressed here are those of the author and do not necessarily reflect the policies of the U.S. Bureau of Labor Statistics.



Overview

Context for “Evaluation of Errors in Alternative Data Sources”

A. Changes in:

1. Stakeholder Expectations on Federal Statistical Data

2. Data Sources (Current and Alternative)

B. Session 1: Legal and Policy Issues

C. Most of Today’s Sessions: Data Quality (Input and Output)


Overview (Continued)

This Presentation: Quality in a Broader Context, Including:

I. Stakeholder Value of Official Statistics as “Public Goods”

II. Data Quality

III. Risk Management

IV. Cost Structures



I. Stakeholder Value of Official Statistics (and Methodology) as Public Goods

A. Standard definition of “public goods”

(conceptually distinct from the broader sense, “for the good of the public”)

1. “Non-exclusive”

- Difficult or impossible to prevent use by anyone

- Methodology: Consistent with norms and agency practice requiring a high degree of transparency



I. Public Goods (Continued)

2. “Non-rivalrous”

- Use by one person/group does not reduce value for others

- Methodology and other forms of knowledge and innovation arguably have positive network effects (Bramoullé & Kranton, 2007; Hess & Ostrom, 2007)

Corollary for official statistics:

Many stakeholders, multiple utility functions



I. Public Goods (Continued)

B. Common Issues with Public Goods

1. “Free rider” problems – someone needs to pay

2. Limitations on “market signals” can lead to overproduction, underproduction, emphasis on the wrong quality characteristics, and other inefficiencies



I. Public Goods (Continued)

C. Historical Response of Official Statistics Agencies and Related Stakeholders

1. Body of norms, standards and practices

2. Largely developed in the context of sample surveys (and some administrative record systems/registers)

3. Principles and Practices for a Federal Statistical Agency; similar documents for other countries and the UN


I. Public Goods (Continued)

4. Data quality:

a. Standard multidimensional characterization:

- Accuracy, relevance, timeliness, comparability, coherence, accessibility (Brackstone, 1999)

b. Evaluate “accuracy” via total survey error models

Open questions: How can we distinguish features inherent to “public good” status and stakeholder needs from artifacts of the sample survey environment?
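Total survey error models commonly summarize “accuracy” as mean squared error: total variance plus squared total bias. The sketch below assumes hypothetical error components that are independent (so variances add) and biases that add linearly; `ErrorComponents` and `total_mse` are illustrative names, not an agency tool, and the figures are invented.

```python
from dataclasses import dataclass

@dataclass
class ErrorComponents:
    """Hypothetical error components for one estimate (illustrative only)."""
    sampling_variance: float
    nonsampling_variance: float  # e.g., measurement or processing variance
    biases: dict                 # named bias components, assumed to add linearly

def total_mse(e: ErrorComponents) -> float:
    """Mean squared error = total variance + (total bias) squared."""
    total_bias = sum(e.biases.values())  # assumes biases are additive
    return e.sampling_variance + e.nonsampling_variance + total_bias ** 2

# Hypothetical components for one estimate.
example = ErrorComponents(
    sampling_variance=0.04,
    nonsampling_variance=0.01,
    biases={"coverage": 0.10, "nonresponse": -0.05, "measurement": 0.05},
)
print(round(total_mse(example), 4))
```

Offsetting biases can cancel in the total, which is one reason component-level assessment matters when sources are combined.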



I. Public Goods (Continued)

D. Need to Explore Other Implications of the “Public Goods” Literature: “Use” vs. “Option” Value

1. “Use value” – value from a specific, well-defined use

Ex: Use of the Consumer Price Index to adjust Social Security payments and many contracts

Ex: Department of Commerce (2015), “Value of the American Community Survey”



I. Public Goods (Continued)

2. “Option value” – value from possible future use (Weisbrod, 1964; Arrow and Fisher, 1974; others)

a. Estimands: Some are “usually not of interest, but occasionally very important”

- Special variables (Brick, 2011)

- Special subpopulations: usually similar to larger aggregates, but large deviations can occasionally be very important (Fuller, 1999)


I. Public Goods (Continued)

b. Estimator robustness against specified types of model failures, outliers, and other conditions

Ex: Modeled economic relationships used in small domain estimation: do they still hold in fall 2008?

Ex: Systemic data quality problems: alternative sources change definitions, data files, (sub)population coverage, incomplete-data patterns, aggregation effects
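One way to make “robustness against outliers” concrete is to compare how far different estimators move when a few records are corrupted. The sketch below uses the median and a 10% trimmed mean as standard robust alternatives to the sample mean; the data and the units error are hypothetical, and the choice of estimators is illustrative rather than a recommendation from this presentation.

```python
import statistics

def trimmed_mean(values, proportion=0.1):
    """Mean after dropping the given proportion of values from each tail."""
    data = sorted(values)
    k = int(len(data) * proportion)
    kept = data[k:len(data) - k] if k > 0 else data
    return statistics.fmean(kept)

def compare_estimators(clean, contaminated):
    """Absolute shift in each estimator when outliers are introduced."""
    estimators = {
        "mean": statistics.fmean,
        "median": statistics.median,
        "trimmed_mean": trimmed_mean,
    }
    return {name: abs(est(contaminated) - est(clean)) for name, est in estimators.items()}

clean = [10.0, 11.0, 9.5, 10.5, 10.2, 9.8, 10.1, 9.9, 10.3, 9.7]
# Hypothetical systemic problem: one record reported in the wrong units.
contaminated = clean[:-1] + [970.0]
print(compare_estimators(clean, contaminated))
```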



I. Public Goods (Continued)

Open Questions:

i. Articulate the linkage between general quality measures and concrete “use value” and “option value” for specified key stakeholders

ii. For a specified alternative source: Does it deliver substantial “use value” or “option value” across a sufficiently wide range of stakeholders, relative to full cost?
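Question (ii) can be framed, very roughly, as a comparison of expected stakeholder value with full cost. This sketch treats option value crudely as the probability-weighted value of occasional future uses, which is much simpler than the treatment in the option-value literature (e.g., Weisbrod, 1964; Arrow and Fisher, 1974); the stakeholders, probabilities, values, and cost are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Stakeholder:
    name: str
    use_value: float  # value from specific, well-defined current uses
    # (annual probability, value if it occurs) for "occasionally very important" uses
    option_uses: list = field(default_factory=list)

    def expected_value(self) -> float:
        # Crude proxy: probability-weighted value of possible future uses
        option_value = sum(p * v for p, v in self.option_uses)
        return self.use_value + option_value

def net_value(stakeholders, full_cost):
    """Expected annual value across stakeholders minus annualized full cost."""
    return sum(s.expected_value() for s in stakeholders) - full_cost

# Hypothetical stakeholders and figures (arbitrary units per year).
stakeholders = [
    Stakeholder("benefit indexation", use_value=5.0),
    Stakeholder("regional analysts", use_value=1.0, option_uses=[(0.05, 40.0)]),
    Stakeholder("researchers", use_value=0.5, option_uses=[(0.10, 10.0)]),
]
print(net_value(stakeholders, full_cost=8.0))
```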



II. Data Quality

A. Complementary Relationship (Per Panel Charge):

1. Input data (multiple sources; TSE extensions)

2. Output data (estimators from integrated data, including small domain estimates)

B. Substantial literature, including several presentations today



II. Data Quality (Continued)

C. Open Questions:

1. To what extent can we link standard data quality measures with “use value” or “option value”?

2. Is it feasible to address some elements of (1) through Bayesian elicitation methods, as in O’Hagan et al. (2006), Garthwaite (2013), and others?

a. Utility functions of key stakeholders?

b. Priors on impact of design changes?
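As one illustration, a prior can be fitted by matching quantiles that a subject-matter expert provides. This sketch assumes a normal prior for the impact of a design change, fitted to an elicited median and 90th percentile, and a hypothetical quadratic stakeholder utility U(x) = -w·x²; quantile matching is only one of several elicitation approaches, and the elicited values are invented.

```python
from statistics import NormalDist

def normal_prior_from_quantiles(median: float, q90: float) -> NormalDist:
    """Fit a normal prior by matching an elicited median and 90th percentile."""
    if q90 <= median:
        raise ValueError("90th percentile must exceed the median")
    z90 = NormalDist().inv_cdf(0.90)  # standard normal 90th percentile
    return NormalDist(mu=median, sigma=(q90 - median) / z90)

def expected_quadratic_utility(prior: NormalDist, weight: float) -> float:
    """Expected utility under the hypothetical utility U(x) = -weight * x**2."""
    # For a normal distribution, E[X^2] = mean^2 + variance.
    return -weight * (prior.mean ** 2 + prior.variance)

# Hypothetical elicitation: impact of a design change on bias (percentage points).
prior = normal_prior_from_quantiles(median=0.2, q90=0.5)
print(round(prior.stdev, 3))        # implied prior standard deviation
print(round(1 - prior.cdf(0.5), 3)) # prior P(impact > 0.5), equal to 0.10 by construction
print(round(expected_quadratic_utility(prior, weight=1.0), 4))
```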


III. Risk Management

A. Concerns About Alternative Data Sources Often Focus on Risk

Can We Apply Lessons from the Risk-Management Literature?

B. Sources and Trajectories of Failure for Alternative Sources

1. Lose access to major third-party data source

2. Quality of source changes – possibly undetected

3. Production system incompatible with a new system of the third-party data provider

4. Failure to meet production schedule or quality standards

5. Increased disclosure risks (third-party information; a generally rich external data environment)



III. Risk Management (Continued)

C. Per the risk literature (Crockford, 1986; Perrow, 1999; Flyvbjerg and Budzier, 2011), we need systematic evaluation of:

1. Prospective causes of failure (system design flaws, single- or multi-point events)

2. Timelines and costs for identification of, and recovery from, failure

3. Impact of failure and recovery on stakeholders

4. Robustness of process against failure

- Especially important for official statistics due to limited control over third-party providers of alternative sources
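A simple risk register can combine the elements listed above (probability of failure, time to detect and recover, and impact) into an expected annual loss for ranking. This is a minimal sketch with hypothetical failure modes and figures; ranking by expected loss understates rare, high-impact failures, which the risk literature emphasizes, so it would complement rather than replace scenario analysis.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    name: str
    annual_probability: float  # assumed probability of occurring in a given year
    months_to_detect: float    # time before the failure is identified
    months_to_recover: float   # time from detection to restored production
    monthly_impact: float      # cost/stakeholder impact per month of degraded output
    recovery_cost: float       # one-time cost of the recovery effort

    def expected_annual_loss(self) -> float:
        # Impact accrues over both the undetected period and the recovery period.
        exposure = (self.months_to_detect + self.months_to_recover) * self.monthly_impact
        return self.annual_probability * (exposure + self.recovery_cost)

def rank_risks(modes):
    """Order failure modes by expected annual loss, largest first."""
    return sorted(modes, key=lambda m: m.expected_annual_loss(), reverse=True)

# Hypothetical register (arbitrary units).
register = [
    FailureMode("lose access to source", 0.05, 0.5, 12.0, 2.0, 30.0),
    FailureMode("undetected quality change", 0.20, 6.0, 3.0, 1.5, 5.0),
    FailureMode("provider system change", 0.30, 0.5, 2.0, 1.0, 4.0),
]
for m in rank_risks(register):
    print(f"{m.name}: {m.expected_annual_loss():.2f}")
```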


III. Risk Management (Continued)

D. Fault-Tolerant Designs

1. Literature from engineering and computer science: Denning (1976), Laprie (1985), Zhang, Gray and Gonzalez (2004, 2005), Monkman and Schagaev (2013)

2. Extend to production of official statistics (especially in the use of alternative data sources)

Ex: Parallel production during transitions

Ex: Ability to use a backup source if the proposed data source fails
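The backup-source idea corresponds to a standard fallback pattern: try sources in priority order and use the first one that both loads and passes quality checks. In this sketch the loaders, the check, and the simple-mean estimator are hypothetical placeholders; a production system would also log the switch and assess comparability between sources.

```python
from typing import Callable, Sequence

def estimate_with_fallback(
    sources: Sequence[tuple],
    passes_checks: Callable[[list], bool],
) -> tuple:
    """Try (name, loader) pairs in priority order; use the first that loads and passes checks."""
    failures = []
    for name, load in sources:
        try:
            data = load()
        except Exception as exc:  # e.g., lost access or a changed file format
            failures.append(f"{name}: {exc}")
            continue
        if data and passes_checks(data):
            return name, sum(data) / len(data)  # placeholder estimator: simple mean
        failures.append(f"{name}: failed quality checks")
    raise RuntimeError("all sources failed: " + "; ".join(failures))

def primary_feed():
    raise ConnectionError("provider API unavailable")  # simulated outage (hypothetical)

def survey_backup():
    return [2.1, 2.3, 2.2]  # hypothetical backup data

name, value = estimate_with_fallback(
    [("third-party feed", primary_feed), ("survey backup", survey_backup)],
    passes_checks=lambda d: len(d) >= 3,
)
print(name, round(value, 2))
```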


III. Risk Management (Continued)

E. Utility and Risk: Disclosure Limitation (Synthetic Data and Remote Access)

1. Validation/verification server process:

- Initial exploratory studies with public-use synthetic data

- Repeat “finalized” analysis with “real data” in a safe environment

Reiter (2014, 2015), Vilhuber (2015), others

2. Operational definition of “fault tolerance” here?

3. Features of (1) and the external data environment that affect the degree of fault tolerance, as well as utility?


IV. Cost Structures

A. Frequent Comment:

Unit-level incremental cost of data capture: Near zero

Issue: Fixed costs are large and not well quantified

Open question: What methods and empirical results would provide sufficient information on cost structures (appropriately amortized) to guide assessment of cost-quality-risk trade-offs for specific design and management decisions?
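The gap between near-zero incremental cost and large fixed costs shows up in average cost per unit, which is fixed cost divided by volume plus marginal cost. The figures below are hypothetical.

```python
def average_cost_per_unit(fixed_cost: float, marginal_cost: float, units: int) -> float:
    """Average cost per captured unit: fixed cost spread over all units, plus marginal cost."""
    if units <= 0:
        raise ValueError("units must be positive")
    return fixed_cost / units + marginal_cost

FIXED = 2_000_000.0  # hypothetical fixed cost of a production-level capture system
MARGINAL = 0.001     # hypothetical near-zero incremental cost per record

for n in (10_000, 1_000_000, 100_000_000):
    avg = average_cost_per_unit(FIXED, MARGINAL, n)
    share_fixed = (FIXED / n) / avg
    print(f"{n:>11,d} units: {avg:.4f} per unit ({share_fixed:.0%} fixed)")
```

Average cost approaches the marginal cost only at very large volumes, so the (poorly quantified) fixed cost dominates the comparison at smaller volumes.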


IV. Cost Structures (Continued)

B. Ex: Production-level systems for data capture, edit/imputation, integration, estimation

1. Cost components?

2. Amortization:

- Across product lines, agencies (impact of standardization)?

- Over time (dynamic sources: uncertain duration)
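Amortization over time and across product lines can be illustrated with the standard capital recovery (annuity) formula, A = P·r / (1 - (1 + r)^(-n)), averaged over an uncertain source lifetime n. Equal cost sharing across product lines, the discount rate, and the lifetime distribution are all hypothetical assumptions in this sketch.

```python
def capital_recovery(principal: float, rate: float, periods: float) -> float:
    """Level per-period charge that repays `principal` over `periods` at interest `rate`."""
    if rate == 0:
        return principal / periods
    return principal * rate / (1 - (1 + rate) ** -periods)

def expected_charge(principal, rate, lifetime_probs, product_lines=1):
    """Expected per-period, per-product-line charge when source lifetime is uncertain.

    lifetime_probs maps a possible lifetime (in periods) to its probability.
    Costs are assumed (hypothetically) to be shared equally across product_lines.
    """
    total = sum(p * capital_recovery(principal, rate, n) for n, p in lifetime_probs.items())
    return total / product_lines

# Hypothetical: a 1,000,000 system build, 3% discount rate, uncertain source lifetime.
lifetimes = {2: 0.3, 5: 0.4, 10: 0.3}
mean_life = sum(n * p for n, p in lifetimes.items())
print(round(expected_charge(1_000_000, 0.03, lifetimes)))
print(round(capital_recovery(1_000_000, 0.03, mean_life)))  # naive: plug in mean lifetime
print(round(expected_charge(1_000_000, 0.03, lifetimes, product_lines=4)))
```

In this example, averaging the charge over the lifetime distribution gives a higher per-period cost than plugging in the mean lifetime, so ignoring duration uncertainty would understate the cost.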


IV. Cost Structures (Continued)

C. Ex: Backup processes for risk management:

Detect, mitigate, adjust production

- Loss of data source?

- Major change in source quality?
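Detecting a major change in source quality can be framed as change-point monitoring. This sketch applies a two-sided tabular CUSUM to a hypothetical monthly quality indicator (for example, the ratio of the alternative source to a benchmark), standardized against a baseline period; k = 0.5 and h = 5 are common textbook settings, not BLS parameters, and the series are invented.

```python
import statistics

def cusum_alarm(baseline, new_values, k=0.5, h=5.0):
    """Two-sided tabular CUSUM on values standardized against a baseline period.

    Returns the index in new_values at which a shift is first signaled, or None.
    k (allowance) and h (decision threshold) are in standard-deviation units.
    """
    mu = statistics.fmean(baseline)
    sigma = statistics.stdev(baseline)
    s_hi = s_lo = 0.0
    for i, x in enumerate(new_values):
        z = (x - mu) / sigma
        s_hi = max(0.0, s_hi + z - k)  # accumulates evidence of an upward shift
        s_lo = max(0.0, s_lo - z - k)  # accumulates evidence of a downward shift
        if s_hi > h or s_lo > h:
            return i
    return None

# Hypothetical indicator: ratio of alternative source to benchmark, by month.
baseline = [1.00, 1.02, 0.98, 1.01, 0.99, 1.00, 1.01, 0.99]
shifted = [1.00, 1.01, 0.99, 1.03, 1.03, 1.03, 1.03]  # sustained upward shift from month 3
print(cusum_alarm(baseline, shifted))
```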



IV. Cost Structures (Continued)

D. Alignment of cost structures with:

1. Revenue streams:

- Approximately constant

- Special appropriations for major transitions

2. Stakeholder expectations on investments

- “Runoff models” (as with landline telecoms)

- Capital intensive (mostly intangible capital), with explicit management of goals, investments, timelines, milestones (more open questions)


IV. Cost Structures (Continued)

E. Technology development and transfer

1. General

a. Current state of development (early – customized special cases vs. well-developed/standardized)

b. Additional steps (resources) required for “production level” implementation

2. Integration of work by agencies, others?


V. Conclusions

A. Prospective Use of Alternative Data Sources for Production of Official Statistics

1. Stakeholder value – public goods

2. Data quality

3. Risk management

4. Cost structures



V. Conclusions (Continued)

B. Prospective guidance on quality and risk standards for products using alternative data sources?

1. Processes for assessment and management of:

a. Input data quality (cf. Biemer et al., 2014)

b. Output data quality

c. Risk components

2. Transparent reporting to stakeholders:

a. Processes from (1)?

b. Numerical measures of quality, risk?


Contact Information

John L. Eltinge

Associate Commissioner

Office of Survey Methods Research

www.bls.gov/ore

[email protected]