UK Data Warehouse Work, 23rd May 2012, Paul Tutton, Sarah Ravenhill

Outline

1. Background

2. Approach

3. Warehouse Concepts

4. Prototyping & Modelling

5. Data Harmonisation

6. Recommendations and Next Steps

1. Background

Diagram labels: Data Sources, Staging, Operational Data Store, Data Repository, Other Services, Data Consumers

2. Approach

What do we want? How do we want to work?

Does that work? Build it and see

What can we put in there? How would we implement one?

What are the costs and benefits?

3. What and How

Define

Interrogate

Store

Data and Metadata

Validate

Derive

Aggregate

Input & Update

Extract

Find Gaps

4. Build It…

Integrate data from multiple sources

Make extracts to support current and new statistics

Define a method for describing extracts

Identify gaps in extracts

Automate choice between or combination of sources
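
A minimal sketch of two of the points above: identifying gaps in an extract and automating the choice between sources. The source names, the sample values, the priority order and the functions `pick_value` and `build_extract` are all illustrative assumptions; the slides do not describe how the prototype did this.

```python
# Hypothetical records keyed by unit id; None marks a missing value.
SOURCES = {
    "survey_a": {
        "u1": {"turnover": 100, "employment": None},
        "u2": {"turnover": None, "employment": 12},
    },
    "admin_b": {
        "u1": {"turnover": 95, "employment": 8},
        "u3": {"turnover": 40, "employment": None},
    },
}

# Assumed preference order: an earlier source wins when both hold a value.
PRIORITY = ["survey_a", "admin_b"]


def pick_value(unit, variable):
    """Return (value, source) from the first source in PRIORITY with a value."""
    for name in PRIORITY:
        value = SOURCES[name].get(unit, {}).get(variable)
        if value is not None:
            return value, name
    return None, None


def build_extract(units, variables):
    """Build an extract and list the (unit, variable) cells no source can fill."""
    extract, gaps = {}, []
    for unit in units:
        extract[unit] = {}
        for variable in variables:
            value, source = pick_value(unit, variable)
            extract[unit][variable] = {"value": value, "source": source}
            if value is None:
                gaps.append((unit, variable))
    return extract, gaps


extract, gaps = build_extract(["u1", "u2", "u3"], ["turnover", "employment"])
```

Recording the chosen source next to each value keeps the extract traceable, and the `gaps` list is what an output manager would review before deciding whether another source or imputation is needed.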

FAKE

Source Level Indicators

Variable Level Indicators

Rate my data – what are we consistently suspicious of?
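
One way to picture source-level and variable-level indicators is sketched below. It uses completeness (the share of non-missing values) as the indicator; that choice, the sample records and the function names are assumptions for illustration only, since the slides do not say which indicators were used.

```python
def variable_completeness(records, variable):
    """Variable-level indicator: share of records with a non-missing value."""
    if not records:
        return 0.0
    filled = sum(1 for record in records if record.get(variable) is not None)
    return filled / len(records)


def source_indicators(records, variables):
    """Source-level indicator: mean of the variable-level completeness scores."""
    per_variable = {v: variable_completeness(records, v) for v in variables}
    overall = sum(per_variable.values()) / len(variables) if variables else 0.0
    return {"variables": per_variable, "source": overall}


# Hypothetical records from a single source.
records = [
    {"turnover": 100, "employment": 8},
    {"turnover": 40, "employment": None},
    {"turnover": None, "employment": 12},
    {"turnover": 75, "employment": None},
]

indicators = source_indicators(records, ["turnover", "employment"])
```

A low score on one variable across several periods is the kind of signal that could answer "what are we consistently suspicious of?".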

4. …and See

• Warehouses work
• Statistical processes must change
• Shared Information Models are important
• Think about the minimum acceptable amount of data

5. Assess Potential

Conceptual Overlap: meaning of the data

Dataset Shape: shape of the population

Statistical Activity: process surrounding the data

Harmonisation Analysis

5. Analysis Steps

List your sources

Describe variables

Pool the list

Find the concepts

Classify variables

Assess results
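
The six steps above can be sketched in a few lines. The source names, variable names and concept assignments below are hypothetical, and the concept for each variable is assigned by hand, on the assumption that "find the concepts" is an analyst's judgement and not something computed.

```python
from collections import defaultdict

# Steps 1-2: list the sources and describe each variable by its concept.
SOURCE_VARIABLES = {
    "survey_a": {"total_turnover": "Turnover", "employees": "Employment"},
    "survey_b": {"turnover_ex_vat": "Turnover", "stock_value": "Stock"},
    "admin_c": {"paye_employees": "Employment"},
}


def pool_variables(source_variables):
    """Step 3: flatten everything into (source, variable, concept) rows."""
    return [
        (source, variable, concept)
        for source, variables in source_variables.items()
        for variable, concept in variables.items()
    ]


def classify_by_concept(pooled):
    """Steps 4-5: group the pooled variables under their concept."""
    by_concept = defaultdict(list)
    for source, variable, concept in pooled:
        by_concept[concept].append((source, variable))
    return dict(by_concept)


def assess_overlap(by_concept):
    """Step 6: keep only concepts that appear in more than one source."""
    overlap = {}
    for concept, members in by_concept.items():
        sources = sorted({source for source, _ in members})
        if len(sources) > 1:
            overlap[concept] = sources
    return overlap


pooled = pool_variables(SOURCE_VARIABLES)
by_concept = classify_by_concept(pooled)
overlap = assess_overlap(by_concept)
```

Concepts that survive the last step are candidates for harmonisation; whether two variables under the same concept are exact replicas, conceptually close or merely derivable still needs a manual review.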

5. Overlap findings

Exact Replication

Conceptually Close

Otherwise Derivable

General Feasibility Combinations

Small numbers found

5. Example Concepts

• Acquisitions/ expenditure
• Business Operation
• Business Structure
• Comments/ Narrative
• Disposals/ Income
• Employee Count
• Employment
• Foreign Investment
• Hours/ Pay
• Pension Schemes
• Profit/ Loss
• Statistical Units
• Stock
• Taxes/ National Insurance
• Turnover

5. Interview Findings

Pooling data:

May assist imputation

Enables consistent stories across outputs

Allows congruence checking at unit level

Is more useful if it exposes timelier sources to output managers

Is of more benefit for some subjects than others (e.g. employment over finance)

6. Recommendations and Next Steps

• Continue development of CIM
• Analyse extent of process change due to movement away from survey silos
• Implement a warehouse in stages:

Integrate storage first

De-duplicate and harmonise once integration is complete

Consider the addition of statistical processing facilities to reap further benefits