loupe model - use cases and requirements
LOUPE’S MODEL
USE CASES AND REQUIREMENTS
Nandana Mihindukulasooriya, María Poveda Villalón,
Raúl García Castro
Ontology Engineering Group. Departamento de Inteligencia Artificial.
Facultad de Informática, Universidad Politécnica de Madrid.
Campus de Montegancedo s/n.
28660 Boadilla del Monte. Madrid. Spain
{nandana, mpoveda, rgarcia}@fi.upm.es
Introduction to Loupe
Loupe - Overview
Explore the vocabularies used and the abstract triple patterns in 5+ billion triples, including all DBpedia datasets, Wikidata, LinkedBrainz, and Bio2RDF.
Loupe helps to understand data, uncover patterns, formulate queries, and detect quality issues.
No RDF data, No Public API
Loupe - Google Analytics
• Users from 86 countries
• Spain (23.76%), US (16.69%), Germany (10.64%), UK (9.14%), Italy (4.51%)
Next Steps
Louping the LOD Cloud
Loupe – LOD Laundromat integration
• LOD Laundromat
• 32 billion triples from 650K documents
• cleaned for syntax errors and duplicates
• coverage of smaller documents
• Collaboration with VU University Amsterdam
• Indexing all data from LOD Laundromat
Use Cases
What can we do with data indexed in Loupe?
Dataset descriptions
• Bridge between publishers and consumers
• A dataset description expresses metadata about RDF datasets (e.g., DCAT, VoID)
  • statistics, vocabularies, structural metadata
• A dataset profile is a set of dataset characteristics that allow us
  • to describe a dataset in the best possible way
  • to separate it maximally from other datasets
• Can be used for dataset recommendation
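A minimal sketch of what a VoID-style dataset description might look like when generated from basic statistics. The function name, the dataset IRI, and the counts are made up for illustration; only the VoID namespace and the terms void:Dataset, void:triples, void:classes, and void:properties come from the VoID vocabulary.

```python
def void_description(dataset_iri, triples, classes, properties):
    """Render basic VoID statistics for one dataset as a Turtle string."""
    return "\n".join([
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "",
        f"<{dataset_iri}> a void:Dataset ;",
        f"    void:triples {triples} ;",
        f"    void:classes {classes} ;",
        f"    void:properties {properties} .",
    ])


# Hypothetical dataset IRI and made-up counts, for illustration only.
description = void_description("http://example.org/dataset/demo", 1000, 12, 45)
print(description)
```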
Dataset Statistics
UC::ex1 - Compare dataset statistics (I)
[Chart: size (in # of triples) of the DBpedia (2015-04) datasets]
UC::ex1 - Compare dataset statistics (II)
[Chart: # of classes used in the DBpedia (2015-04) datasets]
UC::ex2 - Monitor evolution of a dataset
Vocabulary Usage - Classes
[Screenshot of the esDBpedia dataset: Classes and Properties tabs, # of classes per vocabulary, common instances, the dbo:Place class]
UC::ex3 - Dataset summary generation
Auto-generated dataset schema
Visual descriptions
[Diagram of the OpenAIRE dataset schema, with the elements foaf:Person, openaire:result, foaf:Organization, foaf:firstName, foaf:lastName, openaire:isAuthorOf, openaire:hasAuthor, foaf:member, dcterms:dateAccepted, openaire:resultType, dcterms:language, openaire:legalPerson, openaire:enterprise, openaire:sme, xsd:String, xsd:boolean]
UC::ex4 - Automatic Dataset Classification
• Generic vs domain-specific datasets
  • size
  • number of vocabularies
  • number of classes
  • number of properties
• Detection of the domain using the vocabularies used
  • High-level domains (e.g., cross domain, life sciences, publications, government, geographic)
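One way such a classification could be sketched, assuming per-vocabulary triple counts are already available. The prefix-to-domain mapping, the thresholds, and both function names are hypothetical; they are not part of Loupe.

```python
# Hypothetical mapping from vocabulary prefixes to high-level domains.
VOCAB_DOMAINS = {
    "geo": "geographic",
    "bibo": "publications",
}


def guess_domain(vocab_usage):
    """Pick the domain whose vocabularies account for the most triples.

    vocab_usage maps a vocabulary prefix to the number of triples using it.
    Returns None when no vocabulary appears in the mapping.
    """
    totals = {}
    for vocab, triple_count in vocab_usage.items():
        domain = VOCAB_DOMAINS.get(vocab)
        if domain is not None:
            totals[domain] = totals.get(domain, 0) + triple_count
    return max(totals, key=totals.get) if totals else None


def looks_generic(num_vocabularies, num_classes,
                  min_vocabularies=20, min_classes=200):
    """Flag a dataset as generic when it is broad on both counts.

    The default thresholds are made up for illustration.
    """
    return num_vocabularies >= min_vocabularies and num_classes >= min_classes


print(guess_domain({"geo": 500, "bibo": 20}))
print(looks_generic(num_vocabularies=35, num_classes=700))
```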
Property Information
E.g., dbo:placeOfBirth property - Analysis of objects
<?subject, dbo:placeOfBirth, ?object>
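An analysis of the objects of a property could, for example, count the classes those objects belong to. The sketch below assumes triples are plain Python tuples of prefixed names; the sample data and the "(untyped)" bucket are made up for illustration.

```python
from collections import Counter

RDF_TYPE = "rdf:type"


def object_type_counts(triples, prop):
    """Count the classes of the objects of `prop`; untyped objects are grouped."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    counts = Counter()
    for s, p, o in triples:
        if p == prop:
            for cls in types.get(o, {"(untyped)"}):
                counts[cls] += 1
    return counts


# Made-up sample data.
triples = [
    ("ex:alice", "dbo:placeOfBirth", "ex:Madrid"),
    ("ex:bob", "dbo:placeOfBirth", "ex:Spain"),
    ("ex:carol", "dbo:placeOfBirth", "ex:Unknown"),
    ("ex:Madrid", RDF_TYPE, "dbo:City"),
    ("ex:Spain", RDF_TYPE, "dbo:Country"),
]
object_counts = object_type_counts(triples, "dbo:placeOfBirth")
print(object_counts)
```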
UC::ex5 - Quality Report Generation
• Violations
  • Object / datatype property violations
  • Domain / range constraint violations
  • Disjoint class violations
• Outlier detection
• Detection of antipatterns
• Data repair guidelines
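As an illustration of one item in such a report, the sketch below flags range violations. It assumes the declared range is treated as a closed-world constraint and that no subclass reasoning is applied; the function name and the sample data are hypothetical.

```python
RDF_TYPE = "rdf:type"


def range_violations(triples, ranges):
    """Return triples whose object is not typed with the property's declared range."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    return [
        (s, p, o)
        for s, p, o in triples
        if p in ranges and ranges[p] not in types.get(o, set())
    ]


# Made-up sample data: the second triple points at a person, not a place.
triples = [
    ("ex:alice", "dbo:birthPlace", "ex:Madrid"),
    ("ex:bob", "dbo:birthPlace", "ex:Carol"),
    ("ex:Madrid", RDF_TYPE, "dbo:Place"),
    ("ex:Carol", RDF_TYPE, "foaf:Person"),
]
ranges = {"dbo:birthPlace": "dbo:Place"}  # assumed range declaration
violations = range_violations(triples, ranges)
print(violations)
```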
UC::ex6 - Data validation with RDF Shapes
Pattern Extraction → Domain Expert Review → RDF Shape Generation (SHACL Shapes) → Data Validation → Data Repair
Multilingual String Counts
3Cixty Dataset
[Charts: string count by language; language-tagged string count by property]
UC::ex6 - Dataset Discovery / Search
• Simple
  • I want to find dataset(s) that contain information about persons with some concrete information
  • E.g., “give me datasets that have more than 500 instances of foaf:Person that have the dbo:birthPlace property”
• Advanced
  • I want to find dataset(s) that
    • can answer a given SPARQL query
    • contain data that fits a given W3C RDF data shape
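The simple search case could be sketched as a lookup over per-dataset counts. The index layout (dataset name mapped to counts per class and property pair), the function name, and the numbers are assumptions made for illustration.

```python
def find_datasets(index, cls, prop, min_instances):
    """Return names of datasets with more than min_instances matching instances."""
    return sorted(
        name
        for name, counts in index.items()
        if counts.get((cls, prop), 0) > min_instances
    )


# Hypothetical index: number of instances of a class that have a property.
index = {
    "dataset-a": {("foaf:Person", "dbo:birthPlace"): 1200},
    "dataset-b": {("foaf:Person", "dbo:birthPlace"): 40},
    "dataset-c": {("foaf:Person", "foaf:name"): 9000},
}
matches = find_datasets(index, "foaf:Person", "dbo:birthPlace", 500)
print(matches)
```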
UC::ex7 - Dataset ranking
• Ranking metrics
  • Size
    • number of triples (of a given pattern)
    • number of instances of a given class
  • Richness
    • the average number of properties per instance
  • General vs domain-specific dataset
    • # of classes, # of properties, # of triples
  • Provenance information
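A sketch of ranking by the richness metric, under the assumption that "properties per instance" means distinct properties per instance of a class, not counting rdf:type. The function name and the two toy datasets are made up.

```python
RDF_TYPE = "rdf:type"


def richness(triples, cls):
    """Average number of distinct properties per instance of cls."""
    instances = {s for s, p, o in triples if p == RDF_TYPE and o == cls}
    if not instances:
        return 0.0
    props = {x: set() for x in instances}
    for s, p, o in triples:
        if s in props and p != RDF_TYPE:
            props[s].add(p)
    return sum(len(ps) for ps in props.values()) / len(instances)


# Made-up datasets for illustration.
datasets = {
    "dataset-a": [
        ("ex:alice", RDF_TYPE, "foaf:Person"),
        ("ex:alice", "foaf:name", "Alice"),
        ("ex:bob", RDF_TYPE, "foaf:Person"),
    ],
    "dataset-b": [
        ("ex:carol", RDF_TYPE, "foaf:Person"),
        ("ex:carol", "foaf:name", "Carol"),
        ("ex:carol", "foaf:mbox", "mailto:carol@example.org"),
    ],
}
ranking = sorted(
    datasets,
    key=lambda name: richness(datasets[name], "foaf:Person"),
    reverse=True,
)
print(ranking)
```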
Ontology development UC
• Reuse ontology elements used in datasets
• Look for patterns
• Ontology reuse reports
• Ontology monitoring
  • Why are some classes or properties not used?
  • Aren’t they relevant?
  • Are other classes used for the same purpose?
• Ontology comparison reports
We want YOU to tell us your use cases!
Loupe Model
Model
http://ont-loupe.linkeddata.es/def/core#
Datasets and named graphs
Metadata from DCAT
Classes and properties
How many instances of a given class are there?
< x, a, C >
Fixed: the class C. Count: the instances x.
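The count for this pattern can be sketched over an in-memory list of triples. Representing triples as tuples of prefixed names is an assumption made for illustration, as are the function name and the sample data; duplicates are collapsed so each instance is counted once.

```python
RDF_TYPE = "rdf:type"


def count_instances(triples, cls):
    """Count the distinct x matching < x, a, cls >."""
    return len({s for s, p, o in triples if p == RDF_TYPE and o == cls})


# Made-up sample data, including one duplicate triple.
triples = [
    ("ex:Madrid", RDF_TYPE, "dbo:Place"),
    ("ex:Madrid", RDF_TYPE, "dbo:Place"),
    ("ex:Paris", RDF_TYPE, "dbo:Place"),
    ("ex:alice", RDF_TYPE, "foaf:Person"),
]
place_count = count_instances(triples, "dbo:Place")
print(place_count)
```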
How many instances of a given class that have a given property are there?
< x, a, C >
< x, P, o >
Fixed: the class C and the property P. Count: the instances x.
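A sketch of this count as a set intersection: the subjects typed with the class, intersected with the subjects that have the property. The tuple representation, the function name, and the sample data are assumptions made for illustration.

```python
RDF_TYPE = "rdf:type"


def count_instances_with_property(triples, cls, prop):
    """Count the distinct x matching both < x, a, cls > and < x, prop, o >."""
    of_class = {s for s, p, o in triples if p == RDF_TYPE and o == cls}
    with_prop = {s for s, p, o in triples if p == prop}
    return len(of_class & with_prop)


# Made-up sample data.
triples = [
    ("ex:alice", RDF_TYPE, "foaf:Person"),
    ("ex:alice", "dbo:birthPlace", "ex:Madrid"),
    ("ex:bob", RDF_TYPE, "foaf:Person"),
    ("ex:rex", RDF_TYPE, "dbo:Animal"),
    ("ex:rex", "dbo:birthPlace", "ex:Paris"),
]
person_count = count_instances_with_property(triples, "foaf:Person", "dbo:birthPlace")
print(person_count)
```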
How many triples that have a given property are there?
< s, P, o >
Fixed: the property P. Count: the matching triples.
Triple patterns
How many triples that have a given subject class, property, and object class are there?
< s, P, o >
< s, a, C1 >
< o, a, C2 >
Count: the matching triples < s, P, o >.
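A sketch of how these abstract triple patterns could be counted in one pass over typed data. It assumes triples are tuples of prefixed names and that a triple contributes once per combination of subject class and object class; the function name and the sample data are made up.

```python
from collections import Counter

RDF_TYPE = "rdf:type"


def triple_pattern_counts(triples):
    """Count triples per (subject class, property, object class) combination."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    counts = Counter()
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        for c1 in types.get(s, ()):
            for c2 in types.get(o, ()):
                counts[(c1, p, c2)] += 1
    return counts


# Made-up sample data.
triples = [
    ("ex:alice", RDF_TYPE, "foaf:Person"),
    ("ex:bob", RDF_TYPE, "foaf:Person"),
    ("ex:Madrid", RDF_TYPE, "dbo:Place"),
    ("ex:alice", "dbo:birthPlace", "ex:Madrid"),
    ("ex:bob", "dbo:birthPlace", "ex:Madrid"),
]
patterns = triple_pattern_counts(triples)
print(patterns)
```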
Languages
How many strings tagged with a given language are there?
< x, b, “”@lang >
Fixed: the language tag lang. Count: the strings.
How many triples tagged with a given language are there?
< s, b, “”@lang >
Fixed: the language tag lang. Count: the triples.
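The two language counts could be sketched as follows, reading "strings" as distinct literal values and "triples" as every triple whose object carries the tag; that reading is an assumption. Language-tagged literals are modelled here as (text, language) tuples, which is a choice made for this sketch only.

```python
from collections import Counter


def language_counts(triples):
    """Return (triples per language, distinct strings per language)."""
    triple_counts = Counter()
    strings = {}
    for s, p, o in triples:
        if isinstance(o, tuple):  # (text, language tag)
            text, lang = o
            triple_counts[lang] += 1
            strings.setdefault(lang, set()).add(text)
    string_counts = {lang: len(texts) for lang, texts in strings.items()}
    return triple_counts, string_counts


# Made-up sample data; the last triple has no language-tagged object.
triples = [
    ("ex:Madrid", "rdfs:label", ("Madrid", "es")),
    ("ex:Madrid", "rdfs:label", ("Madrid", "en")),
    ("ex:MadridProvince", "rdfs:label", ("Madrid", "es")),
    ("ex:Madrid", "dbo:country", "ex:Spain"),
]
triple_counts, string_counts = language_counts(triples)
print(triple_counts, string_counts)
```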
Vocabularies
Classes and properties declared in namespaces.
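Grouping terms by vocabulary can be sketched by cutting each IRI at its last "#" or "/". This is a common heuristic and an assumption here, not necessarily how Loupe determines namespaces; the function names are hypothetical.

```python
from collections import defaultdict


def namespace_of(iri):
    """Return the IRI up to and including its last '#' or '/'."""
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[: cut + 1]


def group_by_vocabulary(term_iris):
    """Group class and property IRIs by their namespace."""
    groups = defaultdict(list)
    for iri in term_iris:
        groups[namespace_of(iri)].append(iri)
    return dict(groups)


terms = [
    "http://xmlns.com/foaf/0.1/Person",
    "http://xmlns.com/foaf/0.1/name",
    "http://www.w3.org/2000/01/rdf-schema#label",
]
vocabularies = group_by_vocabulary(terms)
print(vocabularies)
```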
Questions?
Backup Slides
Data Catalog Vocabulary (DCAT)
https://www.w3.org/TR/vocab-dcat/
Vocabulary of Interlinked Datasets (VoID)
https://www.w3.org/TR/void/