Data and Metadata, Session 5, Mark Viney, Australian Bureau of Statistics, 6 June 2007


Page 1

Data and Metadata

Session 5

Mark Viney

Australian Bureau of Statistics

6 June 2007

Page 2

What is Data?

Data is a defined, measured quantity

Types of statistical data
- Raw data
- Microdata
- Macrodata

Owners convert data from one type to another by cleaning, editing, imputing and aggregating during the data processing cycle

Page 3

Raw Data

Data as collected from respondent
- It may be:
  - incomplete
  - inconsistent
- It may still require:
  - cleaning
  - imputation
  - follow up with respondent

Page 4

Microdata

Raw data with initial problems removed

Data coded to standard classifications

May still contain identification of respondent

Page 5

Macrodata

Data resulting from the aggregation of microdata

May include new data items:
- totals
- averages
- percentages
- seasonally adjusted/trend data
- chain volume indices

Page 6

Macrodata

Typically publishable data
- does not contain any respondent identification
- confidentialised
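The progression from raw data to microdata to macrodata described on the preceding slides can be sketched in a few lines. This is only an illustration: the respondent records, the `SEX_CODES` mapping, the function names and the use of mean imputation are all assumptions made for the example, not part of the original presentation.

```python
from statistics import mean

# Hypothetical raw data as collected from respondents: incomplete and inconsistent.
raw_data = [
    {"respondent": "R1", "sex": "male", "age": "34"},
    {"respondent": "R2", "sex": "F", "age": ""},  # age not reported
    {"respondent": "R3", "sex": "Female", "age": "51"},
    {"respondent": "R4", "sex": "M", "age": "29"},
]

# Assumed coding scheme: free-text answers mapped to standard classification codes.
SEX_CODES = {"m": 10, "male": 10, "f": 20, "female": 20}


def to_microdata(records):
    """Clean and code each raw record, then impute missing ages."""
    cleaned = []
    for rec in records:
        age_text = rec["age"].strip()
        cleaned.append({
            "respondent": rec["respondent"],  # microdata may still identify the respondent
            "sex": SEX_CODES[rec["sex"].strip().lower()],
            "age": int(age_text) if age_text else None,
        })
    # Mean imputation, chosen only because it is easy to follow.
    known_ages = [r["age"] for r in cleaned if r["age"] is not None]
    imputed_age = round(mean(known_ages))
    for rec in cleaned:
        if rec["age"] is None:
            rec["age"] = imputed_age
    return cleaned


def to_macrodata(micro):
    """Aggregate microdata by sex code; respondent identifiers are dropped."""
    groups = {}
    for rec in micro:
        groups.setdefault(rec["sex"], []).append(rec["age"])
    return {
        sex: {"count": len(ages), "mean_age": mean(ages)}
        for sex, ages in groups.items()
    }


microdata = to_microdata(raw_data)
macrodata = to_macrodata(microdata)
print(macrodata)
```

Note that the aggregated result carries no respondent identifiers, which is what makes macrodata a candidate for publication once it has also been confidentialised.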

Page 7

Some Macrodata

0.6

1.0

1.7

but what does it mean?

Page 8

What is Metadata?

"Metadata can be defined simply as data about data" (Bo Sundgren, 1973)

Page 9

What is Metadata?

Data that describes:
- statistical data
- processes
- resources and tools used in statistics production

Helps people interpret data

Directs systems to process data

Page 10

Some Macrodata with Metadata

0.6 GDP (Chain Volume Measure), % change Sep qtr 06 to Dec qtr 06, Trend, Australia

1.0 GDP (Chain Volume Measure), % change Sep qtr 06 to Dec qtr 06, Seasonally Adjusted, Australia

1.7 Terms of Trade, % change Sep qtr 06 to Dec qtr 06, Seasonally Adjusted, Australia
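One way to picture the difference between the bare numbers and the annotated ones is to store each value together with its descriptive attributes. The sketch below does this with a small record type; the class name `Observation` and its field names are assumptions chosen for illustration, while the values themselves come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """A macrodata value bundled with the metadata needed to interpret it."""
    value: float
    measure: str
    unit: str
    period: str
    adjustment: str
    region: str

    def describe(self) -> str:
        return (f"{self.value} {self.unit}: {self.measure}, "
                f"{self.period}, {self.adjustment}, {self.region}")


observations = [
    Observation(0.6, "GDP (Chain Volume Measure)", "% change",
                "Sep qtr 06 to Dec qtr 06", "Trend", "Australia"),
    Observation(1.0, "GDP (Chain Volume Measure)", "% change",
                "Sep qtr 06 to Dec qtr 06", "Seasonally Adjusted", "Australia"),
    Observation(1.7, "Terms of Trade", "% change",
                "Sep qtr 06 to Dec qtr 06", "Seasonally Adjusted", "Australia"),
]

for obs in observations:
    print(obs.describe())
```

Without the attributes, 0.6 and 1.0 look like unrelated figures; with them, it is clear they are the trend and seasonally adjusted versions of the same measure.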

Page 11

Some Macrodata with Metadata

Page 12

How is metadata used?

Tool for comprehension and understanding
- provides meaning for numbers

Tool for interpretation; facilitates acquisition of new knowledge

Helps find data and determine its fitness for use

Helps develop new and improved processes

Page 13

Types of Metadata

Passive
- documentation

Active
- used by systems to define the processing rules to produce outputs
- can be re-used by several systems

Page 14

Metadata - applying context to data

Describes attributes of data

Can describe:
- Footnotes
- Units
- Scale/precision
- Publication, products
- Data users / suppliers
- Collection concepts, sources and methods
- Form definitions and question texts
- Data item definitions
- Quality

Page 15

Metadata - applying context to data (cont)

Can describe:
- Classifications
- Processing rules

systems
programs
databases
processes
flows
services
interfaces

Page 16

Dataitem Metadata

Dataitem, described by:
- When collected
- Units
- Who provided?
- Concept / meaning
- Collections
- Allowable values
- Who owns the definition
- Time period

Page 17

Dataitem Metadata (example)

Dataitem: Age
- When collected: Jan 2004
- Units: Years
- Who provided?: Mark Viney
- Concept / meaning: Age (of person)
- Collections: Employment, Health
- Allowable values: 1 - 99
- Who owns the definition: Australian Bureau of Statistics
- Time period: 2003/2004
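The Age example can be written down as a structured record, which also shows how one piece of it (the allowable values) can be put to work by a system rather than just read by a person. The class `DataItemMetadata`, its field names and the `reported` test values are hypothetical; only the attribute values are taken from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataItemMetadata:
    """Metadata describing a single data item (field names are illustrative)."""
    name: str
    concept: str
    units: str
    allowable_min: int
    allowable_max: int
    collections: tuple
    definition_owner: str
    provided_by: str
    when_collected: str
    time_period: str

    def is_allowable(self, value: int) -> bool:
        """Check a reported value against the allowable range."""
        return self.allowable_min <= value <= self.allowable_max


AGE = DataItemMetadata(
    name="Age",
    concept="Age (of person)",
    units="Years",
    allowable_min=1,
    allowable_max=99,
    collections=("Employment", "Health"),
    definition_owner="Australian Bureau of Statistics",
    provided_by="Mark Viney",
    when_collected="Jan 2004",
    time_period="2003/2004",
)

# Hypothetical reported values, used only to exercise the range check.
reported = [34, 0, 120, 99]
rejected = [value for value in reported if not AGE.is_allowable(value)]
print(rejected)
```

Because the range lives in the metadata record rather than in the checking code, the same check works unchanged for any other data item that declares its own allowable values.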

Page 18

Dataset Metadata

- Question Modules
- Topics
- Collection Instruments
- Populations
- Data Item Definitions
- Collections
- Classifications
- Products
- Datasets: Macrodata & Annotations
- Data Items

Page 19

Dataset Metadata (example)

- Approved Building Jobs (from BAPS)
- 8752.0, 8752.1 etc.
- Dwellings, Housing
- Area (SLA+), Type of building, Type of work
- Excludes any existing floor area or any part of building not bounded by walls
- Form (e.g. BACS4)
- Floor area created by the job (Square metres)
- Building Activity Collection
- Floor area commenced during quarter
- 2344, 17, 5, 165, 360, 165, 162.47
- n.a., n.p.
- Building Activity: Number, Value by State by...

Page 20

Metadata Standards

ISO 11179

Dublin Core

SDMX

Page 21

ISO 11179

Standard structure of metadata repository

Makes metadata accessible, visible and searchable

Provides understanding and reuse of data elements and definitions

System interoperability

www.iso.org

www.metadata-standards.org/11179

Page 22

SDMX (Statistical Data and Metadata Exchange)

XML based

Model to facilitate the exchange of statistical data and metadata
- data combined with metadata

Data Cubes / Timeseries

www.sdmx.org

Page 23

Dublin Core

www.dublincore.org

Developing metadata standards for discovery across domains

Defining frameworks for the interoperation of metadata sets

Page 24

XBRL - eXtensible Business Reporting Language

XML based

Used for reporting of business based data

Standard Business Reporting
- possible to produce respondent information direct from business software

Reduced provider burden

More standard and consistent reporting from providers

www.xbrl.org

Page 25

What Metadata helps us achieve

Enforcement of standards to strategic inputs and outputs

Encourage planning and management of statistical activities

Reuse
- single source of concept
- reduced need to reinvent and manage
- reduced costs

Page 26

What Metadata helps us achieve (continued)

Quality
- consistent usage
- common dialogue
- improved understanding

Flexibility and Productivity

Knowledge Management
- consistency
- comparability

Page 27

Combining Data and Metadata

Select CODE,LABEL_SEX from CL_SEX;

CODE    LABEL_SEX
******* **************
10      Males
20      Females
30      Persons

BASE    TOTAL
******* *******
10      30
20      30

Page 28

Combining Data and Metadata

Select CODE,LABEL_STATE from CL_STATE;

CODE    LABEL_STATE
******* **************
1       New South Wales
2       Victoria
3       Queensland
4       South Australia
5       Western Australia
6       Tasmania
7       Northern Territory
8       Australian Capital Territory
0       Australia

BASE    TOTAL
******* *******
1       0
2       0
3       0
4       0
5       0
6       0
7       0
8       0

Page 29

Combining Data and Metadata

Select * from MD_LABOUR;

CODE_SEX        CODE_STATE          EMPLOYMENT_RATE
*************** ******************* ***************************
10              6                   77.3
20              6                   72.1
30              6                   74.0

Page 30

Combining Data and Metadata

SELECT LABEL_SEX,LABEL_STATE,EMPLOYMENT_RATE
FROM CL_SEX,CL_STATE,MD_LABOUR
WHERE MD_LABOUR.CODE_SEX = CL_SEX.CODE
AND MD_LABOUR.CODE_STATE = CL_STATE.CODE;

LABEL_SEX       LABEL_STATE         EMPLOYMENT_RATE
*************** ******************* ***************************
Males           Tasmania            77.3
Females         Tasmania            72.1
Persons         Tasmania            74.0
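The queries on these slides can be tried end to end with an in-memory SQLite database. The sketch below is an assumption-laden reconstruction: the table and column names and the rows come from the slides, but the column types, the use of SQLite, and the explicit JOIN and ORDER BY are choices made here for a reproducible example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Metadata tables (classifications) and one data table, populated from the slides.
# Column types are assumed; the slides show only names and values.
conn.executescript("""
CREATE TABLE CL_SEX (CODE INTEGER PRIMARY KEY, LABEL_SEX TEXT);
CREATE TABLE CL_STATE (CODE INTEGER PRIMARY KEY, LABEL_STATE TEXT);
CREATE TABLE MD_LABOUR (CODE_SEX INTEGER, CODE_STATE INTEGER, EMPLOYMENT_RATE REAL);

INSERT INTO CL_SEX VALUES (10, 'Males'), (20, 'Females'), (30, 'Persons');
INSERT INTO CL_STATE VALUES
    (1, 'New South Wales'), (2, 'Victoria'), (3, 'Queensland'),
    (4, 'South Australia'), (5, 'Western Australia'), (6, 'Tasmania'),
    (7, 'Northern Territory'), (8, 'Australian Capital Territory'),
    (0, 'Australia');
INSERT INTO MD_LABOUR VALUES (10, 6, 77.3), (20, 6, 72.1), (30, 6, 74.0);
""")

# Join the coded data to its classification labels.
rows = conn.execute("""
SELECT LABEL_SEX, LABEL_STATE, EMPLOYMENT_RATE
FROM MD_LABOUR
JOIN CL_SEX ON MD_LABOUR.CODE_SEX = CL_SEX.CODE
JOIN CL_STATE ON MD_LABOUR.CODE_STATE = CL_STATE.CODE
ORDER BY MD_LABOUR.CODE_SEX
""").fetchall()

for label_sex, label_state, rate in rows:
    print(f"{label_sex:<10}{label_state:<12}{rate}")

conn.close()
```

The data table stores only codes; the labels are looked up at query time, so renaming a category means changing one row in a classification table rather than every data record.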

Page 31

Using Metadata

10000 Total income
    2000 Other income
        260 Income from hiring of equipment
        270 Cartage and setup
    1000 Hire services
        140 Other construction equipment
            10 Compaction equipment
            20 Cranes
            30 Earthmoving equipment
        180 Other income from hire services
            60 Event/exhibition goods and equipment
            70 Transport equipment

Page 32

Using Metadata

CODE    LABEL_STATE
******* ******************
10      Compaction equipment
20      Cranes
30      Earthmoving equipment
60      Event/exhibition goods and equipment
70      Transport equipment
140     Other construction equipment
180     Other income from hire services
260     Income from hiring of equipment
270     Cartage and setup
1000    Hire services
2000    Other income
10000   Total income

BASE    DETAILED SUBTOTAL TOTAL
10      140      1000     10000
20      140      1000     10000
30      140      1000     10000
60      180      1000     10000
70      180      1000     10000
260     260      2000     10000
270     270      2000     10000
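The BASE, DETAILED, SUBTOTAL, TOTAL mapping is an example of metadata that a system can act on: given values for the base codes only, every higher level can be derived by following the mapping. A minimal sketch follows, in which the hierarchy rows are copied from the slide but the income figures in `base_values` and the function name `roll_up` are invented for illustration.

```python
from collections import defaultdict

# (BASE, DETAILED, SUBTOTAL, TOTAL) rows as shown on the slide.
HIERARCHY = [
    (10, 140, 1000, 10000),
    (20, 140, 1000, 10000),
    (30, 140, 1000, 10000),
    (60, 180, 1000, 10000),
    (70, 180, 1000, 10000),
    (260, 260, 2000, 10000),
    (270, 270, 2000, 10000),
]

# Hypothetical income figures for the base-level codes only.
base_values = {10: 5, 20: 12, 30: 8, 60: 3, 70: 4, 260: 2, 270: 1}


def roll_up(values, hierarchy):
    """Add each base value to every distinct code in its hierarchy row."""
    totals = defaultdict(int)
    for row in hierarchy:
        base = row[0]
        # set() prevents double counting where a code repeats across
        # levels, as 260 and 270 do in the BASE and DETAILED columns.
        for code in set(row):
            totals[code] += values[base]
    return dict(totals)


totals = roll_up(base_values, HIERARCHY)
for code in sorted(totals):
    print(code, totals[code])
```

Changing how items aggregate then means editing the mapping rows, not the aggregation code.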

Page 33

Metadata Driven Systems

These systems use metadata to direct and assist their functions
- Active Metadata

In general, this gives them a large advantage in flexibility over systems that do not.

The metadata may also be external to the system and used for other purposes and systems.

Page 34

Reuse across systems

Metadata

Page 35

Reuse across systems

Keep one copy of metadata
- reduces confusion and ambiguity
- reduces opportunities to get it wrong
- reduces maintenance
- reduces complexity to end user
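As a rough illustration of keeping one copy, the sketch below has two separate functions, standing in for an editing system and a publishing system, both reading the same classification. The dictionary `CL_SEX` mirrors the classification shown earlier in the deck; the function names and the idea of modelling "systems" as functions are assumptions made to keep the example small.

```python
# The single shared copy of the classification (codes and labels from the deck).
CL_SEX = {10: "Males", 20: "Females", 30: "Persons"}


def is_valid_code(code, classification):
    """Stand-in for an input editing system: accept only known codes."""
    return code in classification


def label_for(code, classification):
    """Stand-in for a publishing system: turn a code into its label."""
    return classification[code]


incoming_codes = [10, 20, 40]  # hypothetical input; 40 is not in the classification
accepted = [c for c in incoming_codes if is_valid_code(c, CL_SEX)]
published = [label_for(c, CL_SEX) for c in accepted]
print(published)
```

Because neither function holds its own list of codes, the two cannot drift apart: anything the first accepts, the second can label.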

Page 36

Key points

Invest in metadata and integrated metadata driven systems rather than point solutions

Costs will be repaid many times over

Avoid duplication as much as possible
- or automate duplication to retain consistency and integrity

Page 37

Questions?