Data and Metadata, Session 5, Mark Viney, Australian Bureau of Statistics, 6 June 2007
TRANSCRIPT
Data and Metadata
Session 5
Mark Viney
Australian Bureau of Statistics
6 June 2007
What is Data?
Data is a defined, measured quantity
Types of statistical data
- Raw data
- Microdata
- Macrodata
Owners convert data from one type to another by cleaning, editing, imputing and aggregating during the data processing cycle
Raw Data
Data as collected from respondent
- It may be:
  - incomplete
  - inconsistent
- It may still require:
  - cleaning
  - imputation
  - follow up with respondent
Microdata
raw data with initial problems removed
data coded to standard classifications
may still contain identification of respondent
Macrodata
Data resulting from the aggregation of microdata
May include new data items:
- totals
- averages
- percentages
- seasonally adjusted/trend data
- chain volume indices
Typically publishable data
- does not contain any respondent identification
- confidentialised
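As a minimal sketch of the microdata-to-macrodata step, the Python below aggregates a handful of unit records into totals, an average and a percentage. The record layout, field names and values are all invented for illustration and are not ABS data.

```python
from statistics import mean

# Hypothetical microdata: one cleaned, coded record per respondent.
# Field names and values are invented for illustration.
microdata = [
    {"respondent_id": 1, "sex_code": 10, "employed": True, "hours": 38},
    {"respondent_id": 2, "sex_code": 20, "employed": True, "hours": 25},
    {"respondent_id": 3, "sex_code": 20, "employed": False, "hours": 0},
    {"respondent_id": 4, "sex_code": 10, "employed": True, "hours": 42},
]


def aggregate(records):
    """Aggregate unit records into macrodata with no respondent identifiers."""
    employed = [r for r in records if r["employed"]]
    return {
        "persons": len(records),                                  # total
        "employed_persons": len(employed),                        # total
        "average_hours_employed": mean(r["hours"] for r in employed),  # average
        "percent_employed": 100 * len(employed) / len(records),   # percentage
    }


macrodata = aggregate(microdata)
print(macrodata)
```

The output keeps only derived items, so the respondent identifier present in the input does not appear in the result.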
Some Macrodata
0.6
1.0
1.7
but what does it mean?
What is Metadata?
Metadata can be defined simply as data about data
- Bo Sundgren, 1973
What is Metadata?
Data that describes:
- statistical data
- processes
- resources and tools used in statistics production
Helps people interpret data
Directs systems to process data
Some Macrodata with Metadata
0.6 GDP (Chain Volume Measure), %Change Sep qtr 06 to Dec qtr 06, Trend, Australia
1.0 GDP (Chain Volume Measure), %Change Sep qtr 06 to Dec qtr 06, Seasonally Adjusted, Australia
1.7 Terms of Trade, %Change Sep qtr 06 to Dec qtr 06, Seasonally Adjusted, Australia
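One way to keep each number and its descriptive metadata together is a small record type. This is only a sketch: the class name `Observation` and its field names are hypothetical, while the values are the three figures listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """A macrodata value together with the metadata needed to interpret it."""
    value: float
    measure: str
    unit: str
    period: str
    adjustment: str
    region: str

    def describe(self) -> str:
        return (f"{self.value} {self.unit}: {self.measure}, "
                f"{self.period}, {self.adjustment}, {self.region}")


PERIOD = "Sep qtr 06 to Dec qtr 06"

observations = [
    Observation(0.6, "GDP (Chain Volume Measure)", "% change",
                PERIOD, "Trend", "Australia"),
    Observation(1.0, "GDP (Chain Volume Measure)", "% change",
                PERIOD, "Seasonally Adjusted", "Australia"),
    Observation(1.7, "Terms of Trade", "% change",
                PERIOD, "Seasonally Adjusted", "Australia"),
]

for obs in observations:
    print(obs.describe())
```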
How is metadata used?
tool for comprehension and understanding
- provides meaning for numbers
tool for interpretation, facilitating acquisition of new knowledge
help find data and determine its fitness for use
help develop new and improved processes
Types of Metadata
Passive
- documentation
Active
- used by systems to define the processing rules to produce outputs
- can be re-used by several systems
Metadata - applying context to data
Describes attributes of data
Can describe:
- footnotes
- units
- scale/precision
- publications, products
- data users / suppliers
- collection concepts, sources and methods
- form definitions and question texts
- data item definitions
- quality
Metadata - applying context to data (cont)
Can describe:
- classifications
- processing rules
- systems
- programs
- databases
- processes
- flows
- services
- interfaces
Dataitem Metadata
[Diagram: metadata describing a data item]
- When collected
- Units
- Who provided?
- Concept / meaning
- Collections
- Allowable values
- Who owns the definition
- Time period

Dataitem Metadata (example)
[Diagram: the same layout filled in for the data item "Age"]
- When collected: Jan 2004
- Units: Years
- Who provided?: Mark Viney
- Concept / meaning: Age (of person)
- Collections: Employment, Health
- Allowable values: 1 - 99
- Who owns the definition: Australian Bureau of Statistics
- Time period: 2003/2004
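The Age example can also illustrate active metadata: a system reads the allowable values (1 - 99) from the definition instead of hard-coding them in an edit rule. The sketch below assumes a hypothetical `DataItemDefinition` class and `edit_check` helper; only the Age attributes come from the example, and the input values are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataItemDefinition:
    """Hypothetical container for data item metadata."""
    name: str
    concept: str
    units: str
    min_value: int
    max_value: int
    owner: str

    def is_allowed(self, value: int) -> bool:
        # The allowable range lives in the metadata, not in the checking code.
        return self.min_value <= value <= self.max_value


AGE = DataItemDefinition(
    name="Age",
    concept="Age (of person)",
    units="Years",
    min_value=1,
    max_value=99,
    owner="Australian Bureau of Statistics",
)


def edit_check(definition, values):
    """Split reported values into accepted ones and ones needing follow-up."""
    accepted = [v for v in values if definition.is_allowed(v)]
    queried = [v for v in values if not definition.is_allowed(v)]
    return accepted, queried


# Invented reported values for illustration.
accepted, queried = edit_check(AGE, [34, 0, 99, 120])
print(accepted, queried)
```

Changing the allowable range then means editing one definition rather than every program that checks ages.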
Dataset Metadata
[Diagram: metadata components linked to a dataset]
- Question Modules
- Topics
- Collection Instruments
- Populations
- Data Item Definitions
- Collections
- Classifications
- Products
- Datasets
- Macrodata & Annotations
- Data Items
Dataset Metadata (example)
[Diagram: the dataset metadata components filled in for a Building Activity dataset]
- Approved Building Jobs (from BAPS)
- 8752.0, 8752.1 etc.
- Dwellings, Housing
- Area (SLA+), Type of building, Type of work
- Excludes any existing floor area or any part of building not bounded by walls
- Form (e.g. BACS4)
- Floor area created by the job (Square metres)
- Building Activity Collection
- Floor area commenced during quarter
- 2344, 17, 5, 165, 360, 165, 162.47
- n.a., n.p.
- Building Activity: Number, Value by State by...
Metadata Standards
ISO 11179
Dublin Core
SDMX
ISO 11179
Standard structure of metadata repository
Makes metadata accessible, visible and searchable
Provides understanding and reuse of data elements and definitions
System interoperability
www.iso.org
www.metadata-standards.org/11179
SDMX (Statistical Data and Metadata Exchange)
XML based
model to facilitate the exchange of statistical data and metadata
- data combined with metadata
Data Cubes / Timeseries
www.sdmx.org
Dublin Core
www.dublincore.org
Developing metadata standards for discovery across domains
Defining frameworks for the interoperation of metadata sets
XBRL - eXtensible Business Reporting Language
XML based
used for reporting of business based data
Standard Business Reporting
- possible to produce respondent information direct from business software
reduced provider burden
more standard and consistent reporting from providers
www.xbrl.org
What Metadata helps us achieve
Enforcement of standards to strategic inputs and outputs
Encourage planning and management of statistical activities
Reuse
- single source of concept
- reduced need to reinvent and manage
- reduced costs
What Metadata helps us achieve (continued)
Quality
- consistent usage
- common dialogue
- improved understanding
Flexibility and Productivity
Knowledge Management
- consistency
- comparability
Combining Data and Metadata
Select CODE,LABEL_SEX from CL_SEX;
CODE    LABEL_SEX
******* **************
10      Males
20      Females
30      Persons

BASE    TOTAL
******* *******
10      30
20      30
Combining Data and Metadata
Select CODE,LABEL_STATE from CL_STATE;
CODE    LABEL_STATE
******* **************
1       New South Wales
2       Victoria
3       Queensland
4       South Australia
5       Western Australia
6       Tasmania
7       Northern Territory
8       Australian Capital Territory
0       Australia

BASE    TOTAL
******* *******
1       0
2       0
3       0
4       0
5       0
6       0
7       0
8       0
Combining Data and Metadata
Select * from MD_LABOUR;
CODE_SEX        CODE_STATE          EMPLOYMENT_RATE
*************** ******************* ***************************
10              6                   77.3
20              6                   72.1
30              6                   74.0
Combining Data and Metadata
SELECT LABEL_SEX, LABEL_STATE, EMPLOYMENT_RATE
FROM CL_SEX, CL_STATE, MD_LABOUR
WHERE MD_LABOUR.CODE_SEX = CL_SEX.CODE
AND MD_LABOUR.CODE_STATE = CL_STATE.CODE;

LABEL_SEX       LABEL_STATE         EMPLOYMENT_RATE
*************** ******************* ***************************
Males           Tasmania            77.3
Females         Tasmania            72.1
Persons         Tasmania            74.0
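To try the join end to end, the sketch below rebuilds the three tables in an in-memory SQLite database using Python's standard `sqlite3` module. The table and column names and the rows come from the listings above; the column types, the explicit `JOIN ... ON` form and the `ORDER BY` (added so the row order is deterministic) are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Column types are assumed; the slides show only names and values.
conn.executescript("""
    CREATE TABLE CL_SEX (CODE INTEGER PRIMARY KEY, LABEL_SEX TEXT);
    CREATE TABLE CL_STATE (CODE INTEGER PRIMARY KEY, LABEL_STATE TEXT);
    CREATE TABLE MD_LABOUR (
        CODE_SEX INTEGER, CODE_STATE INTEGER, EMPLOYMENT_RATE REAL
    );

    INSERT INTO CL_SEX VALUES (10, 'Males'), (20, 'Females'), (30, 'Persons');
    INSERT INTO CL_STATE VALUES
        (1, 'New South Wales'), (2, 'Victoria'), (3, 'Queensland'),
        (4, 'South Australia'), (5, 'Western Australia'), (6, 'Tasmania'),
        (7, 'Northern Territory'), (8, 'Australian Capital Territory'),
        (0, 'Australia');
    INSERT INTO MD_LABOUR VALUES (10, 6, 77.3), (20, 6, 72.1), (30, 6, 74.0);
""")

# Replace the codes stored with the data by labels from the classification tables.
rows = conn.execute("""
    SELECT LABEL_SEX, LABEL_STATE, EMPLOYMENT_RATE
    FROM MD_LABOUR
    JOIN CL_SEX ON MD_LABOUR.CODE_SEX = CL_SEX.CODE
    JOIN CL_STATE ON MD_LABOUR.CODE_STATE = CL_STATE.CODE
    ORDER BY MD_LABOUR.CODE_SEX
""").fetchall()

for row in rows:
    print(row)

conn.close()
```

Because the labels live only in the classification tables, renaming a category means updating one row there rather than every data table that uses the code.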
Using Metadata
10000 Total income
  1000 Hire Services
    140 Other construction equipment
      10 Compaction equipment
      20 Cranes
      30 Earthmoving equipment
    180 Other income from hire services
      60 Event/exhibition goods and equipment
      70 Transport equipment
  2000 Other Income
    260 Income from hiring of equipment
    270 Cartage and setup
Using Metadata
CODE    LABEL_STATE
******* ******************
10      Compaction equipment
20      Cranes
30      Earthmoving equipment
60      Event/exhibition goods and equipment
70      Transport equipment
140     Other construction equipment
180     Other income from hire services
260     Income from hiring of equipment
270     Cartage and setup
1000    Hire services
2000    Other income
10000   Total income

BASE    DETAILED SUBTOTAL TOTAL
10      140      1000     10000
20      140      1000     10000
30      140      1000     10000
60      180      1000     10000
70      180      1000     10000
260     260      2000     10000
270     270      2000     10000
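A hierarchy table like this can drive aggregation directly. In the sketch below, the BASE, DETAILED, SUBTOTAL, TOTAL rows and the labels are taken from the listings above, while the `roll_up` function and the reported amounts are hypothetical and chosen only to show the mechanics.

```python
from collections import defaultdict

# (BASE, DETAILED, SUBTOTAL, TOTAL) rows from the hierarchy table.
HIERARCHY = [
    (10, 140, 1000, 10000),
    (20, 140, 1000, 10000),
    (30, 140, 1000, 10000),
    (60, 180, 1000, 10000),
    (70, 180, 1000, 10000),
    (260, 260, 2000, 10000),
    (270, 270, 2000, 10000),
]

# Labels for the derived levels, from the classification listing.
LABELS = {
    140: "Other construction equipment",
    180: "Other income from hire services",
    260: "Income from hiring of equipment",
    270: "Cartage and setup",
    1000: "Hire services",
    2000: "Other income",
    10000: "Total income",
}


def roll_up(base_values):
    """Add each base amount into every level the hierarchy maps it to."""
    totals = defaultdict(float)
    for base, detailed, subtotal, total in HIERARCHY:
        amount = base_values.get(base, 0.0)
        # A set avoids double counting where a base code is also its own
        # detailed code (260 and 270).
        for code in {detailed, subtotal, total}:
            totals[code] += amount
    return dict(totals)


# Invented base-level amounts for illustration.
base_values = {10: 5.0, 20: 12.0, 30: 8.0, 60: 3.0, 70: 7.0, 260: 4.0, 270: 1.0}
result = roll_up(base_values)

for code in sorted(result):
    print(code, LABELS[code], result[code])
```

Adding a new base item or moving it to a different subtotal then only requires a change to the hierarchy rows, not to the aggregation code.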
Metadata Driven Systems
These systems use metadata to direct and assist their functions
- Active Metadata
In general, this gives them a large advantage in flexibility over systems that do not use metadata this way.
The metadata may also be external to the system and used for other purposes and systems.
Reuse across systems
Keep one copy of metadata
- reduces confusion and ambiguity
- reduces opportunities to get it wrong
- reduces maintenance
- reduces complexity to end user
Key points
invest in metadata and integrated metadata driven systems rather than point solutions
costs will be repaid many times over
avoid duplication as much as possible
- or automate duplication to retain consistency and integrity
Questions?