standards and interoperabilitystandards and interoperability dan gillman bureau of labor statistics...

32
What You Need to Know – Too Standards and Interoperability Dan Gillman Bureau of Labor Statistics FCSM Metadata Workshop Washington, D.C. 14 September 2018 1 U.S. BUREAU OF LABOR STATISTICS bls.gov

Upload: others

Post on 28-Jan-2021

4 views

Category:

Documents


0 download

TRANSCRIPT

  • What You Need to Know – Too

    Standards and Interoperability

    Dan Gillman

    Bureau of Labor Statistics

    FCSM Metadata Workshop

    Washington, D.C.

    14 September 2018

    1 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

  • Outline

    Standards in General

    Interoperability

    Case for Standards

    Data Integration ScenarioDiscovery

    Data dictionary

    Methodology

    Overview of Statistical Metadata Standards

    2 — U.S. BUREAU OF LABOR STATISTICS • bls.gov2 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

  • 3 — U.S. BUREAU OF LABOR STATISTICS • bls.gov3 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

    Standards

    Many standards development organizations (SDO)

    Open standards built by a process that is

    Consensus-driven general agreement w/o sustained dissent

    Open any stakeholder can join

    Transparent process available for inspection

    Fair everyone has same rights

    Balanced stakeholders represent user community

    Includes ISO, W3C, NISO, DDI Alliance, UNECE

  • 4 — U.S. BUREAU OF LABOR STATISTICS • bls.gov4 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

    Standards

    Caveat -

    Many SDOs, many standards

    “Standards are great. There are so many of them!”

    – Karsten Rasmussen

    “Standards are useless; look at the second S!”

    – Adrienne Tannenbaum

  • 5 — U.S. BUREAU OF LABOR STATISTICS • bls.gov5 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

    Interoperability

    Interoperability – ability of one system to work independently with some or all of another system

    Applied often to computerized systems, but also to data

    Data interoperability – ability to use data from another source without help from that source

    Implies extensive metadata are available

  • 6 — U.S. BUREAU OF LABOR STATISTICS • bls.gov6 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

    Interoperability

    But, metadata are data, too

    Data interoperability must include metadata interoperability

    Does this require the metadata have metadata?

    Shared metadata model needed

    Standard

    Technical specification

    Minus that, data problem is just repeated

  • 7 — U.S. BUREAU OF LABOR STATISTICS • bls.gov7 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

    Standards – Why?

    Reduces or eliminates design steps

    Increases chances for interoperability

    Standards neither necessary nor sufficient

    Building systems – claims of conformity

    Conformance – Satisfaction of all requirements

    Systems can be built independently

    Allows system builders to achieve interoperability

  • 8 — U.S. BUREAU OF LABOR STATISTICS • bls.gov8 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

    Standards – Why? If your metadata system conforms to a specification

    I can build a system to read your metadata automatically

    I can write metadata in a format you can understand immediately

    But, if I use a different specification, then

    I have to translate your metadata into my specification and vice-versa

    May not be easy

    With 13 principal statistical agencies (minus OMB),

    – Possible translations: (13 choose 2) = 78

    – This is too complex; Need cooperation

  • 9 — U.S. BUREAU OF LABOR STATISTICS • bls.gov9 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

    Standards – Why?

    Adopting standards greatly reduces this problem

    There’s still the problem of the second S

    There may be many standards to choose among

    Let’s try to make sense of this problem

    Standards developed to solve certain problems – Scope

    Don’t use them beyond their scope

  • 10 — U.S. BUREAU OF LABOR STATISTICS • bls.gov10 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

    Standards Illustrated

    Through a data integration scenario

    Illustrate metadata “content” standards

    Focus on what can be described

    Not on how to build a system

    Overview, not detailed descriptions

    Include some about the groups developing the standards

  • 11 — U.S. BUREAU OF LABOR STATISTICS • bls.gov11 — U.S. BUREAU OF LABOR STATISTICS • bls.gov 2010-07-07

    Oaf Europe 11

    Scenario

    “America’s Safest Cities”

    by Zack O’Malley Greenburg

    26 October 2009 Forbes Magazine

    Rank cities by “livability”

    Workplace fatalities

    Traffic fatalities

    Violent crimes

    Natural disaster risk

  • 12 — U.S. BUREAU OF LABOR STATISTICS • bls.gov12 — U.S. BUREAU OF LABOR STATISTICS • bls.gov 2010-07-07

    Oaf Europe 12

    Scenario

    Rank MSAs based on

    Numerical ranking for each measure

    Sum of rankings

    Questions

    Can we find and understand relevant data?

    If so, where? how?

  • 13 — U.S. BUREAU OF LABOR STATISTICS • bls.gov13 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

    Scenario – Discovery

    Natural to ask if data can easily be found through search

    Quick answer – No

    Google searches not entirely successful

    – URLs provided for relevant web sites

    – Relevant data sets, no

    – Still had to search web sites to find data

    Discovery is a very hard problem

    Guarantee to find all resources on a particular subject??

  • 14 — U.S. BUREAU OF LABOR STATISTICS • bls.gov14 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

    Scenario – Discovery

    Another solution – data set registry or catalog

    Think – library card catalog

    But – on line

    Look at Data.Gov

    Many other catalogs in existence

    Museums – Smithsonian Museum of Natural History

    Libraries – Library of Congress

  • 15 — U.S. BUREAU OF LABOR STATISTICS • bls.gov15 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

    Discovery (Catalog) Standards

    Relevant standards

    Project Open Data Metadata Schema Data.Gov

    Dublin Core Metadata Initiative NISO, ISO

    MARC – MAchine Readable Catalog NISO, ISO

    ISO/IEC 11179 – Metadata registries ISO

    DCAT (Data Catalog Vocabulary) W3C

    DDI (Data Documentation Initiative) DDI Alliance

  • 16 — U.S. BUREAU OF LABOR STATISTICS • bls.gov 2010-07-07

    af Europe 16

    Scenario – Discovery

    Finding data – Discovery

    Workplace fatalities

    – Bureau of Labor Statistics

    Traffic fatalities

    – National Highway Traffic Safety Administration

    O

  • 17 — U.S. BUREAU OF LABOR STATISTICS • bls.gov17 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

    Problem

    How do we know to select particular data sets?

    Are there others?

    Need data dictionaries to be sure

  • 18 — U.S. BUREAU OF LABOR STATISTICS • bls.gov 2010-07-07

    af Europe 18

    Scenario – Data Dictionary

    Finding data – Discovery

    Workplace fatalities

    – Bureau of Labor Statistics

    – Data based on MSA

    – Data given as number, not rate

    Traffic fatalities

    – National Highway Traffic Safety Administration

    – Data based on city, not MSA

    – Based on rates

    O

  • 19 — U.S. BUREAU OF LABOR STATISTICS • bls.gov19 — U.S. BUREAU OF LABOR STATISTICS • bls.gov 2010-07-07

    af Europe 19

    Scenario – Data Dictionary

    Data Dictionary – for statistical data

    Contains

    Variables

    – or Measures

    – Code lists or Classifications

    Questions

    Maybe some methodology as well

    Description of variables needed at a minimum

    O

  • 20 — U.S. BUREAU OF LABOR STATISTICS • bls.gov20 — U.S. BUREAU OF LABOR STATISTICS • bls.gov 2010-07-07

    af Europe 20

    Scenario – Data Dictionary

    Variables, Measures, Classifications – needed for

    Selecting specific data sets

    Using selected data sets

    Level beyond discovery

    Most discovery models don’t account for this

    O

  • 21 — U.S. BUREAU OF LABOR STATISTICS • bls.gov21 — U.S. BUREAU OF LABOR STATISTICS • bls.gov 2010-07-07

    af Europe 21

    Data Dictionary Standards

    ISO/IEC 11179

    DDI

    Codebook

    Lifecycle

    UNECE

    GSIM (Generic Statistical Information Model)

    Inter-agency SCOPE/Metadata

    Data dictionary specification

    O

  • 22 — U.S. BUREAU OF LABOR STATISTICS • bls.gov22 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

    Scenario – Methodology

    Methodological issues

    Questions

    Sampling

    Post-collection processing

    Post-collection estimation

    These can affect analyses

    And there are standards to document these

  • 23 — U.S. BUREAU OF LABOR STATISTICS • bls.gov23 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

    Standards for Methodology

    DDI (Data Documentation Initiative)

    Codebook

    Lifecycle

    GSIM (Generic Statistical Information Model)

    GSBPM (Generic Statistical Business Process Model)

  • 24 — U.S. BUREAU OF LABOR STATISTICS • bls.gov24 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

    SCOPE/Metadata

    SCOPE - Statistical Community of Practice and EngagementGroup to leverage common practice among agenciesReduce costs, Increase sharingFormed inter-agency group on metadata

    – Produced first data.gov specification– Geared towards statistical data sets– Produced data dictionary specification

    • Variables, Measures, Code Lists, and Classifications SCOPE/Metadata

    Meets bi-weeklyNeeds more participants

  • 25 — U.S. BUREAU OF LABOR STATISTICS • bls.gov25 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

    ISO/IEC 11179

    http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html

    First standard on metadata, model based, reusable metadata

    Operational needs for a registry or catalog

    Standard built in 6 parts

    Used as input to DDI, GSIM, SDMX, and SCOPE/Metadata

    SDMX – Statistical Data and Metadata eXchange

    Freely available from ISO

    http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html

  • 26 — U.S. BUREAU OF LABOR STATISTICS • bls.gov26 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

    GSIM and GSBPM

    Developed under UNECEUN Economic Commission for Europe

    Comprises Europe, Canada, and US

    Statistical cooperative program is world-wide

    Statistical metadata standards under Modernization efforts

    Many countries involved, especiallyAustralia, Canada, New Zealand, US

    France, Italy, Netherlands, Portugal, Scandinavia, Slovenia

  • 27 — U.S. BUREAU OF LABOR STATISTICS • bls.gov27 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

    GSIM

    https://statswiki.unece.org/display/gsim/Generic+Statistical+Information+Model

    Model of statistical information objects4 main sections

    – Conceptual, Structural, Business, Exchange

    High level, conceptual model

    No bindings – not directly implementable

    Some effort to build implementable system (LIM)

    https://statswiki.unece.org/display/gsim/Generic+Statistical+Information+Model

  • 28 — U.S. BUREAU OF LABOR STATISTICS • bls.gov28 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

    GSBPM

    https://statswiki.unece.org/display/GSBPM/Generic+Statistical+Business+Process+Model

    Outline of statistical life-cycle processes

    Eight main phases

    Each phase has subparts

    Adopted by agencies to classify IT efforts and systems

    https://statswiki.unece.org/display/GSBPM/Generic+Statistical+Business+Process+Model

  • 29 — U.S. BUREAU OF LABOR STATISTICS • bls.gov29 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

    DDI

    DDI Alliance - https://www.ddialliance.org/ Consortium of data libraries, archives, producers, researchers Two threads

    Codebook – data dictionary, not reusable metadataLifecycle – GSBPM-based

    – reusable, extensive methodology, includes Codebook– GSIM profile

    Both bound to XML, so immediately implementable University and commercial software available Yearly user conferences: NADDI, EDDI

    https://www.ddialliance.org/

  • 30 — U.S. BUREAU OF LABOR STATISTICS • bls.gov30 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

    SDMX

    https://sdmx.org/

    Managed by BIS, ECB, Eurostat, IMF, OECD, UNSD, WB

    For exchange of dimensional dataN-cubes, time series, other

    Based on XML, so implementable

    Complex learning curve

    Extensive installed base

    Yearly user conferences

    https://sdmx.org/

  • 31 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

    Questions

  • Contact Information

    32 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

    Dan Gillman(202) 691-7523

    [email protected]

    What You Need to Know –TooStandards and InteroperabilityOutlineStandardsStandards continuedInteroperabilityInteroperability continuedStandards –Why?Standards –Why? continuedStandards –Why? continuedStandards IllustratedScenarioScenario 2Scenario –DiscoveryScenario –Discovery continuedDiscovery (Catalog) StandardsScenario –Discovery continuedProblemScenario –Data DictionaryScenario –Data Dictionary continuedScenario –Data Dictionary continuedData Dictionary StandardsScenario –MethodologyStandards for MethodologySCOPE/MetadataISO/IEC 11179GSIM and GSBPMGSIMGSBPMDDISDMXQuestionsContact Information