Programming with Semantic Broad Data
Steffen Staab (@ststaab)
Institute for Web Science and Technologies · University of Koblenz-Landau, Germany &
Web and Internet Science Group · ECS · University of Southampton, UK
west.uni-koblenz.de
The World of Big Data – Volume & Velocity
Genome data
• Up to 200 GB/person
Video data
• Upload 300 hrs/min
Sensor data
• 5000 sensors/jet engine
• 1 Tera bit/s
360 TB/disc
https://flic.kr/p/8zuDTm
https://flic.kr/p/59jc2h
Complex?
Hard in some dimensions
18 concepts
Noise amplitudes
The World of Big Data – Variety
Data models
• Graph data
• Relational
• XML
• RDF
• CSV
• JPEG
• MPEG-1, 2, 4
• DICOM
• PDF
• Excel
• ...
Conceptual models
aka ER schemata
aka Logical schemata
aka XML schemata
aka RDFS / OWL ontologies
FOAF, Dublin Core, Marc81, Unifact, ...
Dozens - Hundreds
The World of Big Data – Variety – 15 years ago
SAP
• On the order of 10,000 ‘concepts’
• Days to find the right column
Medical information system (Lars)
• Treating transplant patients
• Approx. 10,000 concepts
Only my very limited experiences
Big consulting business
The World of Big Data – Variety – Today!
Wikidata
• 1,148,230 concepts
• 2515 relations
UMLS
• 1 million concepts
Bioinformatics
• 1000s of public databases
• 35 in Bio2RDF (11 billion triples)
eGov datasets
• 200,000 by Fraunhofer FOKUS
• 20,000 by ODI
Knowledge Graphs
• Ask Google, Microsoft, Samsung, HP, ...
Sensor types
• 330 broad types in Wikipedia
• Tens of thousands
How to write valid, robust programs?
How to find data?
How to write a valid, robust program?
SELECT ?x WHERE { ?x a CONCEPT15 }
SELECT ?x WHERE { ?x a CONCEPT151735 }
https://flic.kr/p/8zuDTm
18 concepts
1,148,230 concepts (March ’16)
1,166,040 concepts (Sept ’16)
How to approach big data
In the following I am guessing what Axel Polleres might have told you about Enterprise Linked Data
Traditional Information Architecture
Business Logics
Structured Data
Unstructured Data
Presentation and Interaction
Characteristics:
• Processes are known
• Data structures are known
• Meaning of data primarily in schema and code
Big Data in Today’s Information Architecture
Characteristics:
• Little structure
• Semi-structured data
• Meaning of data of primary importance!
Variety Issue 1: Data Models
Data Models:
• Relational
• Tree (XML, ...)
• Document oriented
• Stream
• Array
• Graph-DB
RDF
Graph data model as common denominator
Dealing with Issue 1: RDF as Data Model
RDF
Graph data model as common denominator
[Graph: Bowie --knows--> Sarandon; Bowie --bornOn--> 8-1-1947]
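The graph on this slide can be written down as subject-predicate-object triples. The following is a minimal sketch in plain Python without any RDF library; the names `triples` and `objects_of` are illustrative assumptions and do not come from the talk.

```python
# Each edge of the slide's graph becomes one (subject, predicate, object) triple.
triples = {
    ("Bowie", "knows", "Sarandon"),
    ("Bowie", "bornOn", "8-1-1947"),
}


def objects_of(graph, subject, predicate):
    """Return all objects reachable from `subject` via `predicate`."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


print(objects_of(triples, "Bowie", "knows"))  # {'Sarandon'}
```

Because every data model ends up as the same three-column shape, one lookup function serves relational, tree, or document data once it has been mapped to triples.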
Variety Issue 2: Conceptual Models
Conceptual Models:
• ER
• UML
• ...
RDFS
Ontology as common denominator
Variety Issue 2: RDFS as common conceptual meta model
RDFS
for explicit conceptual description
[Graph: Bowie --knows--> Sarandon; Bowie --bornOn--> 8-1-1947; Bowie --type--> MusicArtist; Sarandon --type--> Actor]
Variety Issue 3: System Boundaries
IRIs
for globally unique referencing
[Graph: m:Bowie --f:knows--> d:Sarandon; m:Bowie --m:bornOn--> 8-1-1947; m:Bowie --rdf:type--> m:MusicArtist; d:Sarandon --rdf:type--> d:Actor]
m = http://musicbrainz.org
d = http://dbpedia.org
f = http://xmlns.com/foaf/0.1/
rdf = http://www.w3.org/1999/02/22-rdf-syntax-ns#
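Prefixed names such as f:knows are shorthand for full IRIs. The sketch below expands them with the slide's prefix table; the `rdf` entry is the standard W3C RDF namespace, and the rule of inserting a "/" when a base IRI ends in neither "/" nor "#" is an assumption made here so that m: and d: names stay well-formed. The function name `expand` is hypothetical.

```python
# Prefix table following the slide; the rdf entry is the standard RDF namespace.
PREFIXES = {
    "m": "http://musicbrainz.org",
    "d": "http://dbpedia.org",
    "f": "http://xmlns.com/foaf/0.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Expand a prefixed name such as 'f:knows' into a full IRI."""
    prefix, _, local = prefixed_name.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"unknown prefix: {prefix!r}")
    base = prefixes[prefix]
    # Assumption: add a separator when the base does not already end in one.
    separator = "" if base.endswith(("/", "#")) else "/"
    return base + separator + local


print(expand("f:knows"))   # http://xmlns.com/foaf/0.1/knows
print(expand("rdf:type"))  # http://www.w3.org/1999/02/22-rdf-syntax-ns#type
```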
A Practical Perspective on Broad Data with LITEQ
Drosophila: Linked Open Data Cloud
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
Dozens of domains
Hundreds of data sources
Thousands of concepts
Millions of entities
Billions of triples
Semantic Broad Data
Programming with Linked Data
Tasks of the Programmer
1 Schema exploration
2 Programming code types
3 Programming queries
4 Programming procedures for
• creating,
• manipulating,
• persisting
objects
Node Path Query Language Using Autocompletion
Exploration of classes
Exploration of relations
Node Path Query Language: Query Formulation
Exploration of classes
Exploration of relations
Querying for instances
Type set of mo:MusicArtist
No definition or declaration needed
Node Path Query Language for Code Development
Exploration of classes
Exploration of relations
Querying for instances
Developing code with queries
All translated into SPARQL queries at
• Development time
• Type inference at compile time (but also as part of IDE)
• Querying again at run time
One language to bind them all
Node Path Query Language for Code Development
Exploration of classes
Exploration of relations
Querying for instances
Developing code with queries
Developing code with new classes
All translated into SPARQL queries at
• Development time
• Run time update
• Persistence!
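To give a feel for what "translated into SPARQL" can mean, here is a rough sketch that compiles a start class plus a chain of properties into a SELECT query. The (class, property list) representation and the function `path_to_sparql` are hypothetical stand-ins; they are not the actual NPQL grammar or the LITEQ implementation, and prefix declarations are omitted.

```python
def path_to_sparql(start_class, properties):
    """Compile 'instances of start_class, then follow properties' to SPARQL.

    Hypothetical mini-representation for illustration, not real NPQL.
    """
    patterns = [f"?x0 a {start_class} ."]
    for i, prop in enumerate(properties):
        # Each navigation step links the previous variable to a fresh one.
        patterns.append(f"?x{i} {prop} ?x{i + 1} .")
    result_var = f"?x{len(properties)}"
    body = " ".join(patterns)
    return f"SELECT DISTINCT {result_var} WHERE {{ {body} }}"


print(path_to_sparql("mo:MusicArtist", []))
# SELECT DISTINCT ?x0 WHERE { ?x0 a mo:MusicArtist . }
print(path_to_sparql("mo:MusicArtist", ["foaf:made"]))
# SELECT DISTINCT ?x1 WHERE { ?x0 a mo:MusicArtist . ?x0 foaf:made ?x1 . }
```

The same compiled query could in principle be issued at development time (for autocompletion) and again at run time (for the actual data), which is the point the slide makes.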
Formal NPQL Syntax
Data browsing
Restricting Class Expressions
Evaluating Class Expressions
Navigating from Data to Classes
Navigating from Data to Property Types
URI set
Intensional Queries
Extensional Queries
Navigational Queries
NPQL Algebra (Example)
Reversibility
can be used to simplify path expressions.
Summary on LITEQ
Language Integrated Types, Extensions, and Queries
NPQL (Node Path Query Language)
• Navigational Queries
• Intensional Queries
• Extensional Queries
• Compilation to SPARQL
LITEQ
• Implementation of NPQL as F# Type Provider in Visual Studio
• Autocompletion using NPQL queries
• Automatic typing of extensional query results by intensional queries
“That seems to work very well in practice, but how does it work in theory?”
let allArtists = Store.NPQL().``mo:MusicArtist``.Extension
What is implied by such a line...
...for the programme?
...for the compiler?
A Foundational Perspective on Semantic Broad Data Using λ_DL
What we want to have: Static Type Checking
But:
• In LITEQ: Queries must receive types
• Number of types in our system very/infinitely large
• Existing type systems expect complete knowledge
Programming with Data from a Knowledge Base
Issue in our prototype
Related Work
Generic Types
• Everything is a node or an edge
• No type checking!
Only 2nd place in Halo competition
Mapping approaches
• Hibernate
• LITEQ
• ActiveRDF
• Summer / Winter
• ...
Preferred in SemWeb now
Been there, done that
Example – and Issues with Mapping
Mapping DL types to PL types is problematic because of
1. Mix of nominal (MusicArtist) and structural typing (∃recorded.Song)
2. Schema-less information (influencedBy)
3. Inference (hendrix:MusicArtist)
4. Sheer size of terminology
How to type a query?
Example
Code
To be rejected is not subtype of
How to type a query?
Example
Code
To be accepted is a
How to type a query?
What we want to have: Static Type Checking
Challenge:
• A programming language that accepts concept expressions as types and can deal with inferences
Programming with Data from a Knowledge Base
λ_DL
Core Ideas of λ_DL
Given
• Atomic Types: A = {... Ai ...}
• Plus Function types: T = {... Ai ..., ... Ti → Tj ...}
Add elements
• Concept expressions (Intensional NPQL queries)
• Instances (Extensional NPQL queries)
Add knowledge
• Typing and subtyping derived from knowledge base
Description Logics Fragment
Concept Forming Expressions   Syntax       Semantics
Top                           ⊤            Δ^I
Bottom                        ⊥            ∅
Concept Name                  A            A^I
Intersection                  A ⊓ B        A^I ∩ B^I
Negation                      ¬A           Δ^I \ A^I
Existential Restriction       ∃R.C         { a ∈ Δ^I | ∃b: (a,b) ∈ R^I and b ∈ C^I }
Axioms                        Syntax       Semantics
T-Box Subclass                C ⊑ D        C^I ⊆ D^I
A-Box Concept assertion       a : C        a^I ∈ C^I
A-Box Role assertion          (a,b) : R    (a^I, b^I) ∈ R^I
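The semantics column maps every concept expression to a set of individuals. As a minimal sketch, the function below computes that set for one finite, fully known interpretation; the tuple encoding of concepts, the function name `extension`, and the individual `song1` are assumptions made for illustration. A real DL reasoner works over all models of a knowledge base, which this sketch does not attempt.

```python
def extension(concept, domain, concepts, roles):
    """Return the individuals denoted by `concept` in one finite interpretation."""
    kind = concept[0]
    if kind == "top":
        return set(domain)
    if kind == "bottom":
        return set()
    if kind == "name":
        return set(concepts.get(concept[1], ()))
    if kind == "and":
        left = extension(concept[1], domain, concepts, roles)
        right = extension(concept[2], domain, concepts, roles)
        return left & right
    if kind == "not":
        return set(domain) - extension(concept[1], domain, concepts, roles)
    if kind == "exists":
        # Individuals with at least one role successor inside the filler concept.
        fillers = extension(concept[2], domain, concepts, roles)
        return {a for (a, b) in roles.get(concept[1], ()) if b in fillers}
    raise ValueError(f"unknown concept constructor: {kind!r}")


# Toy interpretation; 'song1' is a made-up individual.
domain = {"hendrix", "bowie", "song1"}
concepts = {"MusicArtist": {"bowie"}, "Song": {"song1"}}
roles = {"recorded": {("hendrix", "song1")}}

recorded_a_song = ("exists", "recorded", ("name", "Song"))
print(extension(recorded_a_song, domain, concepts, roles))  # {'hendrix'}
```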
λ-Calculus
Universal model of computation
• Abstraction
• Application
Example:
• λf.λx.f (f x)
Evaluation rules
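The example term, λf.λx.f (f x), takes a function and applies it twice to its argument. It can be written directly with Python lambdas; the names `twice` and `increment` are chosen here for illustration.

```python
# Abstraction: lambda f. lambda x. f (f x)
twice = lambda f: lambda x: f(f(x))

increment = lambda n: n + 1

# Application: (twice increment) 0 reduces to increment (increment 0).
print(twice(increment)(0))  # 2
```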
Core λ_DL: Evaluation and Typing
Nominal DL-Type
Subtyping
∞ many types
Add KB knowledge only when needed for checking application, not proactively
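One way to read "only when needed" is that the subtype relation is never materialized up front; a check walks just the axioms reachable from the type in question. The sketch below does this for atomic concept names only. The dictionary encoding of the T-Box, the function `is_subtype`, and the concept names `Artist` and `Person` are assumptions for illustration; subsumption between arbitrary concept expressions would need a DL reasoner.

```python
def is_subtype(sub, sup, tbox):
    """Check sub ⊑ sup by following only the axioms reachable from `sub`.

    `tbox` maps a concept name to the set of its directly stated superconcepts.
    """
    seen = set()
    stack = [sub]
    while stack:
        current = stack.pop()
        if current == sup:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(tbox.get(current, ()))
    return False


# Hypothetical axioms: MusicArtist ⊑ Artist, Artist ⊑ Person.
tbox = {"MusicArtist": {"Artist"}, "Artist": {"Person"}}

print(is_subtype("MusicArtist", "Person", tbox))  # True
print(is_subtype("Person", "MusicArtist", tbox))  # False
```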
Further issues and opportunities in λ_DL
• Queries return sets
• Concept set type needed
• Set operators needed
• Map, Fold, Element
• Queries may return infinite sets
• No theoretical problem, but lack of well-defined stopping conditions in KBs
• Type dispatch based on inferencing
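The three set operators named above (map, fold, element test) can be illustrated on a query result modelled as a finite Python set. This is only a sketch of the operators themselves; the result set `artists` is made-up data, and nothing here reflects how the calculus types these operations.

```python
from functools import reduce

# Stand-in for an extensional query result (made-up individuals).
artists = frozenset({"bowie", "hendrix"})

# Map: apply a function to every element of the result set.
capitalized = set(map(str.title, artists))

# Fold: combine all elements into a single value.
total_length = reduce(lambda acc, name: acc + len(name), artists, 0)

# Element: membership test.
has_bowie = "bowie" in artists

print(capitalized, total_length, has_bowie)
```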
Result for λ_DL
Theorem: A well-typed closed term does not get stuck during evaluation (with common exceptions).
Typing is a safety net, but does not solve the halting problem
(empty list)
Present of Broad Data
Broad data
• has grown from 10^4 to 10^6 concepts (plus data)
• continues to grow
– more integration of distributed databases
– more sensors of different types
– more crowdwork
• has not been recognized as a problem of its own, yet
• will lead to
– brittleness
– high maintenance efforts
– loss of opportunities
Future of Broad Data
New Methods for Broad data
• Explore
– Understand
• Find
• Relate (see e.g. Linda’s talk today)
• Program
• Maintain
Thank you for your attention!
Thanks to my collaborators for this work:
Stefan Scheglmann, Martin Leinberger, Matthias Thimm (WeST, Koblenz)
Evelyne Viegas (Microsoft Research, Redmond)
Ralf Lämmel (SOFTLANG, Koblenz)