Metadata Common Vocabulary: a journey from a glossary to an ontology of statistical metadata, and back
Sérgio Bacelar (sergio.bacelar@ine.pt), Statistics Portugal
Joint UNECE/Eurostat/OECD Work Session on Statistical Metadata (METIS)
Lisbon, 11 – 13 March, 2009




Definitions

• SDMX and SDMX Content-Oriented Guidelines (COG)
• Metadata Common Vocabulary (MCV)
  – Concepts and related definitions used in structural and reference metadata of international organizations and national data-producing agencies.
  – Content-Oriented Guidelines = MCV + Cross-Domain Concepts (subset of MCV) + Statistical Subject-Matter Domains
  – Latest version (2009): 397 terms.
  – Goal: uniform understanding of standard metadata concepts.


ESSnet on SDMX

• Objective
  – Further development of SDMX
    • Further development and improvement of the SDMX Content-Oriented Guidelines
    • Metadata Task Force on SDMX (Statistics Portugal)
    • WP proposal: MCV Ontology
• Metadata Common Vocabulary (MCV)
  – Semantic univocity: design of a conceptual model of the domain
  – Detecting possible inconsistencies, redundancies or incompleteness in the glossary
  – Lack of structure: flat list, non-hierarchical relations between terms
  – No semantic relations between terms


Conceptual system

Building a glossary usually implies the prior design of a conceptual model of the domain.

• Proposal for a revision of the MCV (bottom-up or middle-out strategy):
  – Start with the existing terms and definitions
  – Create semantic relations between terms based on the definitions of the MCV terms
  – Goal: reveal the latent conceptual system, detecting possible structural incongruities or redundancies.


Conceptual system and Concept Map

• Main goals
  – Find redundancies, inconsistencies, omissions, and terms belonging to domains other than statistical metadata (their presence is explained by the complex and interdisciplinary nature of metadata).
  – To find omitted terms (important and relevant ones), it is necessary to analyze the definitions of the concepts.

• With this in mind, we built a “Concept Map” representing about 20% of the terms in the MCV (draft version).

• A concept map is a diagram showing the relationships among terms/concepts. Concepts are connected with labeled arrows, in a downward-branching hierarchical structure.

• Graphical visualization is difficult because of the large number of terms and relations.


Concept Map (partial view)


Concept Map (partial view)


Terms and relations between MCV terms/concepts

Concept_1                       relation           Concept_2
Accessibility                   characteristic_of  Quality
Accounting basis                type_of            Methods / procedures / conventions
Accounting conventions          same_as            Accounting basis
Accuracy                        characteristic_of  Quality
Adjustment                      type_of            Compilation practices
Adjustment methods              same_as            Adjustment
Administrative data             has_a              Administrative source
Administrative data             type_of            Data
Administrative data collection  collection_of      Administrative data
Administrative item             part_of            Administrative record
Administrative record           part_of            Administrative data
Age                             attributeOf        Person
Agency or organisation          typeOf             Analytical unit
Agency or organisation          has                Comment
Aggregation                     group_of           Category
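
The relation table can be read as a list of (Concept_1, relation, Concept_2) triples. The following is a minimal Python sketch under that reading, using only a few of the rows shown. The helper names `group_by_relation` and `find_synonyms`, and the idea of treating `same_as` pairs as redundancy candidates, are illustrative assumptions rather than part of the MCV work itself.

```python
from collections import defaultdict

# A few rows copied from the relation table; the list is illustrative, not complete.
TRIPLES = [
    ("Accessibility", "characteristic_of", "Quality"),
    ("Accounting conventions", "same_as", "Accounting basis"),
    ("Accuracy", "characteristic_of", "Quality"),
    ("Adjustment methods", "same_as", "Adjustment"),
    ("Administrative data", "type_of", "Data"),
    ("Administrative record", "part_of", "Administrative data"),
]


def group_by_relation(triples):
    """Index the (subject, object) pairs by relation name."""
    index = defaultdict(list)
    for subject, relation, obj in triples:
        index[relation].append((subject, obj))
    return dict(index)


def find_synonyms(triples):
    """Hypothetical helper: pairs linked by same_as are redundancy candidates."""
    return [(s, o) for s, r, o in triples if r == "same_as"]


if __name__ == "__main__":
    for pair in find_synonyms(TRIPLES):
        print("possible redundancy:", pair)
```

Keeping the relations as plain tuples makes it easy to later export them to RDF or to check them for gaps, which is the stated purpose of building the concept map.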


Using Resource Description Framework (RDF)

• RDF is a framework for representing information on the Web.
• RDF is particularly concerned with meaning.
• An RDF graph is a collection of triples, each consisting of a subject, a predicate and an object, e.g. “MetadataExchange is-a DataAndMetadataExchange”.
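
As a rough illustration of the triple structure, the example statement can be written as one subject/predicate/object record and serialized as a single N-Triples line. This is a sketch under two assumptions: the namespace is borrowed from the OWL excerpt in this deck, and the informal "is-a" predicate is mapped to `rdfs:subClassOf`. Neither choice is confirmed by the slides.

```python
from typing import NamedTuple

# Namespace taken from the OWL excerpt in this deck; reusing it here is an assumption.
MCV = "http://www.semanticweb.org/ontologies/2008/8/MCV.owl#"
# Mapping the informal "is-a" predicate to rdfs:subClassOf is an assumption of this sketch.
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

    def to_ntriples(self) -> str:
        """Serialize as one N-Triples line (all three terms are IRIs here)."""
        return f"<{self.subject}> <{self.predicate}> <{self.obj}> ."


example = Triple(
    MCV + "MetadataExchange",
    RDFS_SUBCLASS_OF,
    MCV + "DataAndMetadataExchange",
)

if __name__ == "__main__":
    print(example.to_ntriples())
```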


Middle-range solution

Using SKOS (Simple Knowledge Organization System), currently developed within the W3C framework.

• A bridging technology between “chaos” and the more rigorous logical formalism of ontology languages (like OWL).
• It is an application of the Resource Description Framework (RDF), providing a model for expressing the basic structure and content of concept schemes such as thesauri.


SKOS example: concept “data”

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <!-- ........... -->
  <skos:Concept rdf:about="http://www.my.com/#data">
    <skos:definition>Characteristics or information, usually numerical, that are collected through observation</skos:definition>
    <skos:prefLabel>data</skos:prefLabel>
    <skos:altLabel></skos:altLabel>
    <skos:broader rdf:resource="http://www.my.com/#information"/>
    <skos:related rdf:resource="http://www.my.com/#Characteristic"/>
    <skos:scopeNote>Data is the physical representation of information in a manner suitable for communication, interpretation, or processing by human beings or by automatic means (Economic Commission for Europe of the United Nations (UNECE), "Terminology on Statistical Metadata", Conference of European Statisticians Statistical Standards and Studies, No. 53, Geneva, 2000).</skos:scopeNote>
  </skos:Concept>
</rdf:RDF>
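
To show how such a SKOS record could be consumed, here is a minimal sketch that reads a shortened copy of the example with Python's standard `xml.etree.ElementTree` module. The function name `read_concepts` and the shape of the returned dictionary are assumptions made for illustration. The standard RDF and SKOS namespace declarations are supplied so that the snippet parses on its own.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

# Shortened copy of the slide's example; definition and scopeNote are omitted.
SKOS_XML = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://www.my.com/#data">
    <skos:prefLabel>data</skos:prefLabel>
    <skos:broader rdf:resource="http://www.my.com/#information"/>
    <skos:related rdf:resource="http://www.my.com/#Characteristic"/>
  </skos:Concept>
</rdf:RDF>
"""

RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]


def read_concepts(xml_text):
    """Return {concept IRI: {"label", "broader", "related"}} for each skos:Concept."""
    root = ET.fromstring(xml_text)
    concepts = {}
    for node in root.findall("skos:Concept", NS):
        concepts[node.get(RDF_ABOUT)] = {
            "label": node.findtext("skos:prefLabel", default="", namespaces=NS),
            "broader": [b.get(RDF_RESOURCE) for b in node.findall("skos:broader", NS)],
            "related": [r.get(RDF_RESOURCE) for r in node.findall("skos:related", NS)],
        }
    return concepts


if __name__ == "__main__":
    for iri, info in read_concepts(SKOS_XML).items():
        print(iri, info)
```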


Ontologies

Ontology = explicit formal specification of the terms in a domain (here, statistical metadata) and of the relations among them. It is a model of reality, created through iterative design.

Ontologies are built with an ontology editing and modeling system such as Protégé (open-source software available at http://protege.stanford.edu).


Ontology reasoning

It is essential to provide tools and services (reasoners) that help users answer queries over ontologies, classes and instances, e.g.:

• find more general/specific classes;
• retrieve individuals matching a given query.

Example: is there any survey with quarterly frequency that uses a classification system and has an on-line database as its dissemination format?
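
The two kinds of query can be imitated in a few lines of plain Python. This is only a sketch of what is being asked, not a description of how a real reasoner works. Apart from the `ComputerAssistedInterviewing`/`DataCollection` pair, which appears in the OWL excerpt of this deck, the class names, surveys and property values below are hypothetical.

```python
# Hypothetical class hierarchy (child -> parent); only the first pair comes from the deck.
SUBCLASS_OF = {
    "ComputerAssistedInterviewing": "DataCollection",
    "DataCollection": "StatisticalProcess",
}

# Hypothetical survey instances with made-up property values.
SURVEYS = [
    {"name": "Survey A", "frequency": "quarterly",
     "classification": "NACE", "dissemination_format": "on-line database"},
    {"name": "Survey B", "frequency": "annual",
     "classification": None, "dissemination_format": "publication"},
]


def superclasses(cls, hierarchy=SUBCLASS_OF):
    """Return all more general classes of cls, nearest first."""
    result = []
    while cls in hierarchy:
        cls = hierarchy[cls]
        result.append(cls)
    return result


def matching_surveys(surveys):
    """Surveys that are quarterly, use some classification, and are published on-line."""
    return [
        s["name"] for s in surveys
        if s["frequency"] == "quarterly"
        and s["classification"] is not None
        and s["dissemination_format"] == "on-line database"
    ]


if __name__ == "__main__":
    print(superclasses("ComputerAssistedInterviewing"))
    print(matching_surveys(SURVEYS))
```

A real reasoner would derive such answers from the ontology's axioms instead of from hand-written filters. The sketch only makes the shape of the questions concrete.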


Ontologies - methodology

Developing an ontology:

1. Defining classes

2. Arranging classes in a taxonomic hierarchy (classes and subclasses)

3. Defining slots (same as roles or properties)

4. Describing allowed values for these slots (facets, role restrictions)

5. Filling in the values for slots for instances (individuals)
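
The five steps above can be sketched with ordinary Python data classes. This is an illustrative model only. The names `Slot`, `OntologyClass` and `Instance`, and the example frequency values, are assumptions; an ontology editor such as Protégé provides far richer constructs.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    """Steps 3 and 4: a property plus a facet listing its allowed values."""
    name: str
    allowed_values: frozenset


@dataclass
class OntologyClass:
    """Steps 1 and 2: a class with an optional parent in the taxonomy."""
    name: str
    parent: "OntologyClass | None" = None
    slots: dict = field(default_factory=dict)

    def all_slots(self):
        """Slots defined here plus those inherited from ancestors."""
        inherited = self.parent.all_slots() if self.parent else {}
        return {**inherited, **self.slots}


@dataclass
class Instance:
    """Step 5: an individual with concrete slot values."""
    cls: OntologyClass
    values: dict

    def validate(self):
        """Check every value against the facet of its slot."""
        slots = self.cls.all_slots()
        for slot_name, value in self.values.items():
            if slot_name not in slots:
                raise ValueError(f"unknown slot: {slot_name}")
            if value not in slots[slot_name].allowed_values:
                raise ValueError(f"{value!r} not allowed for {slot_name}")
        return True


# Hypothetical example data, chosen only to exercise the five steps.
data_collection = OntologyClass(
    "DataCollection",
    slots={"frequency": Slot("frequency", frozenset({"monthly", "quarterly", "annual"}))},
)
cai = OntologyClass("ComputerAssistedInterviewing", parent=data_collection)
survey = Instance(cai, {"frequency": "quarterly"})

if __name__ == "__main__":
    print(survey.validate())
```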


Ontology - Classes

A first attempt at building an ontology of statistical metadata: main classes created from the MCV (according to SDMX Content-Oriented Guidelines: Framework, Draft March 2006, p. 6):

1. General metadata (derived from ISO, UNECE and UN documents);

2. Metadata describing statistical methodologies;

3. Metadata describing quality assessment;

4. Terms referring to data and metadata exchange (SDMX information model, data structure definitions, etc.).


Classes and subclasses (Protégé)


Classes and subclasses


Classes and subclasses

Quality


Properties

Diagram labels: Property, Class, relevance

(e.g. “Quality, according to Eurostat, has a dimension called relevance”)


Codification - Web Ontology Language (OWL)

…………………..
<owl:Ontology rdf:about="">
  <rdfs:comment>Metadata Common Vocabulary (MCV) ontology.</rdfs:comment>
</owl:Ontology>
………………………

<!-- Object Properties -->
<!-- http://www.semanticweb.org/ontologies/2008/8/MCV.owl#uses -->
<owl:ObjectProperty rdf:about="#uses">
  <owl:inverseOf rdf:resource="#isUsedBy"/>
</owl:ObjectProperty>
………………………..

<!-- Classes -->
<!-- http://www.semanticweb.org/ontologies/2008/8/MCV.owl#ComputerAssistedInterviewing -->
<owl:Class rdf:about="#ComputerAssistedInterviewing">
  <rdfs:subClassOf rdf:resource="#DataCollection"/>
</owl:Class>
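
As a hedged illustration of how such an OWL/RDF-XML file could be inspected outside Protégé, the sketch below wraps the two declarations from the excerpt in an `rdf:RDF` root and lists the subclass and inverse-property pairs with Python's standard library. The wrapper element, the standard namespace declarations and the function names `subclass_pairs` and `inverse_properties` are assumptions added so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}

# The slide's fragment, wrapped in an rdf:RDF root so that it parses on its own.
OWL_XML = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:ObjectProperty rdf:about="#uses">
    <owl:inverseOf rdf:resource="#isUsedBy"/>
  </owl:ObjectProperty>
  <owl:Class rdf:about="#ComputerAssistedInterviewing">
    <rdfs:subClassOf rdf:resource="#DataCollection"/>
  </owl:Class>
</rdf:RDF>
"""

ABOUT = "{%s}about" % NS["rdf"]
RESOURCE = "{%s}resource" % NS["rdf"]


def subclass_pairs(xml_text):
    """Return (subclass, superclass) pairs declared with rdfs:subClassOf."""
    root = ET.fromstring(xml_text)
    return [
        (cls.get(ABOUT), sup.get(RESOURCE))
        for cls in root.findall("owl:Class", NS)
        for sup in cls.findall("rdfs:subClassOf", NS)
    ]


def inverse_properties(xml_text):
    """Return (property, inverse) pairs declared with owl:inverseOf."""
    root = ET.fromstring(xml_text)
    return [
        (prop.get(ABOUT), inv.get(RESOURCE))
        for prop in root.findall("owl:ObjectProperty", NS)
        for inv in prop.findall("owl:inverseOf", NS)
    ]


if __name__ == "__main__":
    print(subclass_pairs(OWL_XML))
    print(inverse_properties(OWL_XML))
```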


Conclusion

Since an ontology is a strict, rigorous and formal way to represent knowledge, mapping a glossary like the Metadata Common Vocabulary into a Statistical Metadata Ontology can help to reduce possible inconsistencies, incompleteness and lack of structure.

This may facilitate the harmonization of concepts describing data (semantic univocity) for SDMX users.