Dataset Quality Ontology: An engineering experience
Jeremy Debattista, University of Bonn / Fraunhofer IAIS, Germany


Post on 14-Jan-2017


TRANSCRIPT

Page 1: Dataset Quality Ontology - An Engineering Experience

Dataset Quality Ontology: An engineering experience

Jeremy Debattista
University of Bonn / Fraunhofer IAIS
Germany

Page 2: Dataset Quality Ontology - An Engineering Experience

… who am I

• PhD Student at the University of Bonn
• Originally from Malta (tiny island in the Mediterranean between Italy and Libya)

deba*[email protected] 2

Page 3: Dataset Quality Ontology - An Engineering Experience

… who am I

• B.Sc (Hons) in Computer Science – University of Malta
  – Thesis: Collaborative Editing and Expert Finding
• M.AppSc in Computer Science – DERI (now Insight), National University of Ireland, Galway
  – Thesis: Ontology-based rules for User-Controlled Support in Ubiquitous Environments

Page 4: Dataset Quality Ontology - An Engineering Experience

… my PhD – the big picture

• Work related to Data Quality (in LD)
  – representing quality metadata (daQ)
  – assessing data quality (Luzzu)
  – identifying new metrics from standard vocabularies (like PROV-O) using different techniques for scalability.

Page 5: Dataset Quality Ontology - An Engineering Experience

… agenda

• Definitions of Quality
• Look at some quality aspects re: Ontology Engineering
• Our experience in developing daQ
  – contributions towards a W3C vocab
• VoCoL as a tool for collaborative vocabulary development
• VOWL as a tool for visual representation of ontologies

Page 6: Dataset Quality Ontology - An Engineering Experience

… quality?

Robert Pirsig, Joseph Juran, Phillip Crosby

Page 7: Dataset Quality Ontology - An Engineering Experience

Robert Pirsig

… the result of care
Zen and the Art of Motorcycle Maintenance (1974)

Page 8: Dataset Quality Ontology - An Engineering Experience

… fitness for use
Quality Control Handbook (1974)

Joseph Juran

Page 9: Dataset Quality Ontology - An Engineering Experience

… conformance to requirements
Quality is Free: The Art of Making Quality Certain. Mentor book. (1979)

Phillip Crosby

Page 10: Dataset Quality Ontology - An Engineering Experience

… what is quality for you?

Page 11: Dataset Quality Ontology - An Engineering Experience

… Quality as defined in a dictionary

D1. how good or bad something is
D2. a characteristic or feature that someone or something has
D3. a high level of value or excellence

… definitions from http://www.merriam-webster.com

Page 12: Dataset Quality Ontology - An Engineering Experience

… dereferenceability

“Any HTTP URI should be dereferenceable, meaning that HTTP clients can look up the URI using the HTTP protocol and retrieve a description of the resource that is identified by the URI.” – Tom Heath, Chris Bizer: Linked Data Book (LDEvol)

… the process that retrieves a representation of the requested resource

[Diagram: URI lookup answered with 303 See Other, followed by 200 OK]
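The lookup pattern on this slide can be sketched as a small check over an already recorded chain of HTTP responses. This is only an illustration: the `Hop` tuple, the `RDF_MEDIA_TYPES` set and the function name are assumptions made for the example, and requiring an RDF media type on the final response is one possible interpretation of "a description of the resource".

```python
from typing import List, Optional, Tuple

# One recorded HTTP response: (status code, Location header, Content-Type header).
Hop = Tuple[int, Optional[str], Optional[str]]

# Assumed set of media types that count as an RDF description.
RDF_MEDIA_TYPES = {
    "text/turtle",
    "application/rdf+xml",
    "application/ld+json",
    "application/n-triples",
}

REDIRECT_CODES = (301, 302, 303, 307, 308)


def is_dereferenceable(hops: List[Hop]) -> bool:
    """Return True if the chain ends in 200 OK carrying an RDF media type.

    Every hop before the last one must be a redirect with a Location header,
    for example the 303 See Other shown on the slide.
    """
    if not hops:
        return False
    *redirects, final = hops
    for status, location, _ in redirects:
        if status not in REDIRECT_CODES or not location:
            return False
    status, _, content_type = final
    if status != 200 or content_type is None:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    return media_type in RDF_MEDIA_TYPES


# Hypothetical lookup: 303 See Other, then 200 OK with Turtle.
chain = [
    (303, "http://example.org/doc/thing", None),
    (200, None, "text/turtle; charset=utf-8"),
]
print(is_dereferenceable(chain))
```

Working on recorded responses keeps the check independent of any particular HTTP client; a real assessment would first have to collect the hops by issuing the requests.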

Page 13: Dataset Quality Ontology - An Engineering Experience

… unknown external ontology used

• Usage of ontologies that cannot be dereferenced.
  – rdfs:domain
  – rdfs:range
  – rdfs:subClassOf
  – …
• Usage of deprecated classes (classes tagged with owl:DeprecatedClass)

Page 14: Dataset Quality Ontology - An Engineering Experience

… licensing

“Specify an appropriate open data license. Data (in our case ontology) reuse is more likely to occur when there is a clear statement about the origin, ownership and terms related to the use of the published data” – BP for publishing Linked Data (https://www.w3.org/TR/ld-bp/)

• A way to define clear boundaries
• Inclusion of Machine and Human Readable License to ontology’s meta information

Page 15: Dataset Quality Ontology - An Engineering Experience

… ontology declaration

• Describing an ontology using owl:Ontology (and voaf:Vocabulary – for LOV inclusion)
  – Metadata includes: creator, date modified, description, version info, preferred namespace uri, preferred prefix …
  – other provenance information such as history of changes etc…
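A declaration check along these lines could look like the sketch below. It is a hedged illustration, not how daQ or Luzzu implement it: triples are modelled as plain tuples of prefixed names, and the mapping from the metadata named on the slide to concrete properties (`dcterms:*`, `owl:versionInfo`, `vann:*`, plus `dcterms:license` for the license mentioned on the licensing slide) is an assumption.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Assumed mapping of the slide's metadata items to concrete properties.
EXPECTED_PROPERTIES = [
    "dcterms:creator",
    "dcterms:modified",
    "dcterms:description",
    "owl:versionInfo",
    "vann:preferredNamespaceUri",
    "vann:preferredNamespacePrefix",
    "dcterms:license",
]


def missing_ontology_metadata(triples: Iterable[Triple]) -> Dict[str, List[str]]:
    """For every owl:Ontology resource, list the expected properties it lacks."""
    triples = list(triples)
    ontologies = [s for s, p, o in triples if p == "rdf:type" and o == "owl:Ontology"]
    report: Dict[str, List[str]] = {}
    for ontology in ontologies:
        present = {p for s, p, _ in triples if s == ontology}
        report[ontology] = [p for p in EXPECTED_PROPERTIES if p not in present]
    return report


# Hypothetical ontology header with only a creator and a license.
example = [
    ("ex:onto", "rdf:type", "owl:Ontology"),
    ("ex:onto", "dcterms:creator", "ex:someone"),
    ("ex:onto", "dcterms:license", "http://creativecommons.org/publicdomain/zero/1.0/"),
]
print(missing_ontology_metadata(example))
```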

Page 16: Dataset Quality Ontology - An Engineering Experience

… domain and range definition

• Open domain-range is not recommended
• Reduces interoperability and thus “understanding” of resources’ properties
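To make the "open domain-range" problem concrete, the following sketch lists declared properties that lack an rdfs:domain or rdfs:range statement. The tuple-based triple model, the `PROPERTY_TYPES` set and the function name are assumptions for illustration only.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Assumed set of types that mark a resource as a property.
PROPERTY_TYPES = {"rdf:Property", "owl:ObjectProperty", "owl:DatatypeProperty"}


def open_properties(triples: Iterable[Triple]) -> Dict[str, List[str]]:
    """Map each declared property to the axioms (rdfs:domain, rdfs:range) it lacks."""
    triples = list(triples)
    properties = sorted(
        {s for s, p, o in triples if p == "rdf:type" and o in PROPERTY_TYPES}
    )
    report: Dict[str, List[str]] = {}
    for prop in properties:
        declared = {p for s, p, _ in triples if s == prop}
        missing = [a for a in ("rdfs:domain", "rdfs:range") if a not in declared]
        if missing:
            report[prop] = missing
    return report


# Hypothetical vocabulary: ex:value has a domain but no range.
vocabulary = [
    ("ex:hasMetric", "rdf:type", "owl:ObjectProperty"),
    ("ex:hasMetric", "rdfs:domain", "ex:Dimension"),
    ("ex:hasMetric", "rdfs:range", "ex:Metric"),
    ("ex:value", "rdf:type", "owl:DatatypeProperty"),
    ("ex:value", "rdfs:domain", "ex:Observation"),
]
print(open_properties(vocabulary))
```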

Page 17: Dataset Quality Ontology - An Engineering Experience

… ontology hijacking

• Redefinition of classes and properties in a vocabulary that is not their natural namespace.
  – e.g. redefining foaf:Person in your own ontology to be a subclass of a newly defined Person concept.
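The foaf:Person example can be turned into a simple detector. The sketch below flags statements that define a term whose prefix is not the ontology's own. The list of `DEFINING_PREDICATES` is an assumed, non-exhaustive choice, and comparing prefixes of prefixed names is a simplification of comparing namespace IRIs.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Assumed, non-exhaustive list of predicates that (re)define their subject.
DEFINING_PREDICATES = {
    "rdfs:subClassOf",
    "rdfs:subPropertyOf",
    "rdfs:domain",
    "rdfs:range",
    "owl:equivalentClass",
    "owl:equivalentProperty",
}


def hijacked_statements(triples: Iterable[Triple], own_prefix: str) -> List[Triple]:
    """Return defining statements whose subject lies outside `own_prefix`."""
    return [
        (s, p, o)
        for s, p, o in triples
        if p in DEFINING_PREDICATES and not s.startswith(own_prefix)
    ]


# Hypothetical ontology published under the "ex:" prefix.
ontology = [
    ("ex:Person", "rdf:type", "owl:Class"),
    ("foaf:Person", "rdfs:subClassOf", "ex:Person"),   # hijacks foaf:Person
    ("ex:Student", "rdfs:subClassOf", "foaf:Person"),  # ordinary reuse
]
print(hijacked_statements(ontology, "ex:"))
```

Subclassing an external term, as `ex:Student` does, only adds statements about a local term, so it is not reported.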

Page 18: Dataset Quality Ontology - An Engineering Experience

… consistency checking

• Possible problems when using axioms such as:
  – owl:InverseFunctionalProperty
  – owl:disjointClass
  – owl:disjointWith
  – owl:inverseOf
  – …
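One of the listed axioms, owl:disjointWith, lends itself to a small worked example. The sketch below reports resources that are explicitly typed with two classes declared disjoint. It performs no reasoning (no subclass inference), so it is far weaker than a real OWL consistency check; the triple model and function name are assumptions.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def disjointness_violations(triples: Iterable[Triple]) -> List[Tuple[str, str, str]]:
    """Return (instance, class_a, class_b) for instances typed with disjoint classes."""
    triples = list(triples)
    # owl:disjointWith is symmetric, so store each pair as an unordered set.
    disjoint_pairs = {
        frozenset((s, o)) for s, p, o in triples if p == "owl:disjointWith"
    }
    types: Dict[str, Set[str]] = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    violations = []
    for instance, classes in types.items():
        for pair in disjoint_pairs:
            if len(pair) == 2 and pair <= classes:
                class_a, class_b = sorted(pair)
                violations.append((instance, class_a, class_b))
    return sorted(violations)


# Hypothetical data: ex:a is typed with two disjoint classes.
data = [
    ("ex:Cat", "owl:disjointWith", "ex:Dog"),
    ("ex:a", "rdf:type", "ex:Cat"),
    ("ex:a", "rdf:type", "ex:Dog"),
    ("ex:b", "rdf:type", "ex:Cat"),
]
print(disjointness_violations(data))
```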

Page 19: Dataset Quality Ontology - An Engineering Experience

… other possible measures

• Multilingualism
• Human readable labels and comments
• Interlinking with similar terms/concepts
• Valid Syntax
• Un-typed classes and properties
• … others?

Page 20: Dataset Quality Ontology - An Engineering Experience

… the daQ meta-model

Page 21: Dataset Quality Ontology - An Engineering Experience

… daQ – History

• Describing Quality Metadata in a standardised manner
• Started around the end of 2013
• Forms the basis of the upcoming W3C Data Quality Vocabulary (DQV) standard

Page 22: Dataset Quality Ontology - An Engineering Experience

… daQ – The First Version

Page 23: Dataset Quality Ontology - An Engineering Experience

… daQ – Subsequent Versions

• Always pen, paper and a simple text editor
  – Git as a version control system
• 4 versions before the current version
• Use Case iteration testing

Page 24: Dataset Quality Ontology - An Engineering Experience

… daQ – 2nd Version

• Introduced: QualityGraph, and 3 levels of Abstraction (Based on Zaveri et al. categorisation)

[Diagram: classes rdfg:Graph, QualityGraph, Category, Dimension, Metric, rdfs:Resource, xsd:dateTime; properties hasDimension, hasMetric, dateComputed, requires, value, computedOn]

Page 25: Dataset Quality Ontology - An Engineering Experience

… daQ – Abstraction

• Hiding Complexity
• Distinction between daQ concepts and tangible quality measure concepts
• Abstract classes cannot be instantiated directly (rdf:type), but instead should be sub-classed (rdfs:subClassOf)
• There is no way to check for abstract class violation unless there is an application that checks such syntax errors.
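The application-level check mentioned in the last bullet might be sketched as follows. The set of abstract classes (written here as `daq:Category`, `daq:Dimension`, `daq:Metric`) and the function name are assumptions for illustration; the slide does not show how such a check is implemented.

```python
from typing import FrozenSet, Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Assumed names of the abstract daQ classes.
ABSTRACT_CLASSES: FrozenSet[str] = frozenset(
    {"daq:Category", "daq:Dimension", "daq:Metric"}
)


def abstract_class_violations(
    triples: Iterable[Triple],
    abstract_classes: FrozenSet[str] = ABSTRACT_CLASSES,
) -> List[str]:
    """Return resources that are typed directly with an abstract class."""
    return sorted(
        s for s, p, o in triples if p == "rdf:type" and o in abstract_classes
    )


# Hypothetical quality metadata.
graph = [
    ("ex:Accessibility", "rdfs:subClassOf", "daq:Category"),      # allowed
    ("ex:DerefMetric", "rdfs:subClassOf", "daq:Metric"),          # allowed
    ("ex:derefInstance", "rdf:type", "ex:DerefMetric"),           # allowed
    ("ex:badMetric", "rdf:type", "daq:Metric"),                   # violation
]
print(abstract_class_violations(graph))
```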

Page 26: Dataset Quality Ontology - An Engineering Experience

… daQ – Why Abstract Properties?

• Best Practice to avoid doubt and ambiguity:
  – A metric is attached to one dimension only.
  – A dimension is attached to one category only.
• Unified view also presented in Zaveri et al. Data Quality Survey
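The two "one only" rules can be checked with a single helper, sketched below under the assumption that a dimension points to its metrics with `daq:hasMetric` and a category points to its dimensions with `daq:hasDimension` (the direction is read off the daQ diagrams; the helper name is hypothetical).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def multiply_attached(triples: Iterable[Triple], predicate: str) -> Dict[str, List[str]]:
    """Return objects of `predicate` that are attached to more than one subject."""
    owners = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            owners[o].add(s)
    return {
        child: sorted(parents)
        for child, parents in owners.items()
        if len(parents) > 1
    }


# Hypothetical quality model: ex:M1 is attached to two dimensions.
model = [
    ("ex:C1", "daq:hasDimension", "ex:D1"),
    ("ex:C1", "daq:hasDimension", "ex:D2"),
    ("ex:D1", "daq:hasMetric", "ex:M1"),
    ("ex:D2", "daq:hasMetric", "ex:M1"),
    ("ex:D2", "daq:hasMetric", "ex:M2"),
]
print(multiply_attached(model, "daq:hasMetric"))
print(multiply_attached(model, "daq:hasDimension"))
```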

Page 27: Dataset Quality Ontology - An Engineering Experience

… daQ – 3rd Version

• Introduced: The Data Cube Vocabulary

[Diagram: classes rdfg:Graph, QualityGraph, qb:DataSet, Category, Dimension, Metric, rdfs:Resource, xsd:dateTime, qb:Observation; properties definesQBDataSet, hasDimension, hasMetric, dateComputed, requires, value, hasObservation, computedOn, metric, qb:dataSet]

Page 28: Dataset Quality Ontology - An Engineering Experience

… daQ – 4th Version

• Modified: QualityGraph; Introduced: expectedDataType; Added: date to Observation

[Diagram: classes rdfg:Graph, QualityGraph, qb:DataSet, Category, Dimension, Metric, rdfs:Resource, xsd:anySimpleType, qb:Observation; properties hasDimension, hasMetric, expectedDataType, requires, value, hasObservation, computedOn, metric, qb:dataSet, dc:date]

Page 29: Dataset Quality Ontology - An Engineering Experience

… daQ – can a metric return a value other than a simple data type?

“This property from DAQ is defined to have range xsd:anySimpleType. While it seems useful to define the expected data type for a metric, a simple type may too narrow: in many cases a metric will be determined on a data record or a subgraph.” – [Bailer Warner 28/10/2015] W3C DWBP Public Comments List - https://lists.w3.org/Archives/Public/public-dwbp-comments/2015Oct/0019.html

Page 30: Dataset Quality Ontology - An Engineering Experience

… daQ - 5th Version

• Introduced: isEstimate, computedBy

[Diagram: classes rdfg:Graph, QualityGraph, qb:DataSet, Category, Dimension, Metric, rdfs:Resource, xsd:anySimpleType, qb:Observation, xsd:boolean, prov:Agent; properties hasDimension, hasMetric, expectedDataType, requires, value, hasObservation, computedOn, metric, qb:dataSet, dc:date, isEstimate, computedBy]

Page 31: Dataset Quality Ontology - An Engineering Experience

… daQ – Current Version

• Removed: computedBy, qb:Observation; Added: daq:Observation

[Diagram: classes rdfg:Graph, QualityGraph, qb:DataSet, Category, Dimension, Metric, rdfs:Resource, xsd:anySimpleType, daq:Observation, xsd:boolean, xsd:dateTime, qb:Observation, prov:Entity; properties hasDimension, hasMetric, expectedDataType, requires, value, hasObservation, computedOn, metric, qb:dataSet, sdmx-dimension:timePeriod, isEstimate]

Page 32: Dataset Quality Ontology - An Engineering Experience

… involvement in W3C

• W3C Working Group – Data on the Web Best Practices
  – develop open data ecosystem
  – provide guidance to publishers
  – foster trust in data
• 3 Deliverables:
  – Best Practices
  – Data Quality Vocabulary (DQV)
  – Data Usage Vocabulary

Page 33: Dataset Quality Ontology - An Engineering Experience

… involvement in W3C - DQV

• A meta-model to cover many quality aspects of a dataset (linked data or not)
• The core component describing quantitative measures is inspired by daQ

Page 34: Dataset Quality Ontology - An Engineering Experience

… involvement in W3C - DQV

Page 35: Dataset Quality Ontology - An Engineering Experience

… involvement in W3C - DQV

• Notable issues (204, 205) between daQ and DQV (https://www.w3.org/2013/dwbp/track/issues/xxx), where xxx is 204 or 205
  – Usage of abstract classes and properties
  – Defining Category-Dimension-Metric as subclass of skos:Concept

Page 36: Dataset Quality Ontology - An Engineering Experience

… collaborative framework

• VoCoL – an IDE for collaborative vocabulary development with VCS integration
• A (exchangeable) component based system
  – Human Readable Document Generation
  – Intelligent Turtle Editor
  – Evolution Tracker
  – Ontology Visualisation
  – SPARQL Endpoint Service
  – Client-Side validation before commit to VCS
• Online Demo: http://butterbur06.iai.uni-bonn.de/

Page 37: Dataset Quality Ontology - An Engineering Experience

… visualising ontologies - VOWL

• VOWL – A visual notation for OWL
  – Intuitive
  – Self-explaining
  – Comprehensible
  – Well-specified
  – Complete
  – Device-independent

http://vowl.visualdataweb.org

Page 38: Dataset Quality Ontology - An Engineering Experience

… VOWL implementations

Protégé Plugin

WebVOWL

Page 39: Dataset Quality Ontology - An Engineering Experience

… daQ in VOWL

Page 40: Dataset Quality Ontology - An Engineering Experience

… References and Links

• (daQ) - Representing dataset quality metadata using multi-dimensional views – J. Debattista, C. Lange, S. Auer
• (DQV) - https://www.w3.org/TR/vocab-dqv/
• (VoCoL) - https://github.com/vocol/vocol
• (Zaveri et al.) - Quality Assessment for Linked Data: A Survey
• (LDEvol) - http://linkeddatabook.com/editions/1.0/
• (VOWL) - http://vowl.visualdataweb.org