Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab
TRANSCRIPT
![Page 1: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/1.jpg)
Harald Sack
Internet Technologies and Systems (ITS), Future Internet Technologies
Hasso-Plattner-Institute for IT Systems Engineering
5th Annual Symposium on Future Trends in Service-Oriented Computing
June 16th, 2010
Hasso-Plattner-Institute for IT Systems Engineering, Potsdam
Linked Open Data Universe
![Page 2: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/2.jpg)
Harald Sack, 5th Annual Symposium on Future Trends in Service-Oriented Computing, June 16th, 2010, HPI, Potsdam
The Web is huge...
To be more precise, the WWW is rather huge...
• more than 25 × 10^9 documents in search engine indexes (TNL Blog: Google has 24 billion items index, considers MSN search nearest competitor, September 2005)
• Google Web Crawler found more than 10^12 documents (The Official Google Blog: We knew the Web was Big....., July 25, 2008)
• New Google Search Index Caffeine comprises 100 million gigabytes of data, i.e. 10^17 bytes (SMX Video: Google’s Matt Cutts On Caffeine Launch, June 9, 2010, http://searchengineland.com/smx-video-googles-matt-cutts-on-caffeine-launch-43933)
• And then, there is also the Deep Web (Darkweb) ... and it is supposed to be up to 500 times larger than the Surface Web (Bergman, 2001)
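The unit conversion behind the Caffeine figure can be checked in a few lines. This is only a sketch, assuming decimal (SI) units, i.e. 1 gigabyte = 10^9 bytes; the variable names are illustrative.

```python
# Assumption: decimal (SI) units, 1 gigabyte = 10**9 bytes.
GIGABYTE = 10**9
PETABYTE = 10**15

# "100 million gigabytes" as quoted for the Caffeine index.
caffeine_index_bytes = 100 * 10**6 * GIGABYTE

# The same quantity expressed in petabytes.
caffeine_index_petabytes = caffeine_index_bytes // PETABYTE

print(caffeine_index_bytes, caffeine_index_petabytes)
```

Under this assumption the index size corresponds to 100 petabytes.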
![Page 3: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/3.jpg)
The Web is growing...
Multimedia, Real-Time Data, Sensor Data, ...
in 06/2010: 7 TB/day
in 05/2010:
• 24 h of video upload / minute
• 2 billion streamed videos per day
![Page 5: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/5.jpg)
How to find something on the Web?
![Page 6: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/6.jpg)
The ‘Web of Data‘
Semantic Web Technologies
• Interoperable and machine-understandable data semantics
• Based on formal knowledge representations
• Creating a ‘Web of Data‘
![Page 7: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/7.jpg)
• Topic: Semantic Web and Linked Data
•Problems and Experiments
•Application: Exploratory Multimedia Search
Linked Open Data Universe
![Page 8: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/8.jpg)
Semantic Web and Linked Data
From World Wide Web to Web of Data
„The Web was designed as an information space, with the goal that it should be useful not only for human-human communication, but also that machines would be able to participate and help…“
Tim Berners-Lee, Semantic Web Roadmap, Sept 1998
Prerequisites:
• Content can be read and interpreted correctly (= understood) by machines
Natural Language Processing
• Technology from traditional Information Retrieval (WWW Search Engines)
Semantic Web
• (natural language) web content is explicitly annotated with semantic metadata
• semantic metadata encode the meaning (semantics) of web content and can be read and interpreted correctly by machines
![Page 13: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/13.jpg)
Semantic Web and Linked Data
Understanding Web Content - I
Natural Language Processing
• Technology from traditional Information Retrieval (WWW Search Engines)
Entity Mapping / Disambiguation: what does the text „FAB“ refer to?
• fabulous?
• Fabio Capello, Manager of UK National Football Team?
• David James, Goal Keeper of UK National Football Team?
• ...?
![Page 16: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/16.jpg)
Semantic Web and Linked Data
Understanding Web Content - II
Entity Mapping: text „FAB“ → Fabio Capello
Fabio Capello is a Soccer Manager
Soccer Manager is a Person
![Page 19: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/19.jpg)
Semantic Web and Linked Data
Understanding Web Content - III
Fabio Capello (entity) is a Soccer Manager (class)
→ class membership: has type
Soccer Manager (class) is a Person (class)
→ Soccer Manager (subclass) is subclass of Person (superclass)
![Page 24: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/24.jpg)
Semantic Web and Linked Data
Understanding Web Content - IV
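Classes:
• Soccer Manager is a Person
• Person hasBirthPlace Place
• Person hasBirthDate Date
Entities:
• Fabio Capello is a Soccer Manager
• Fabio Capello hasBirthDate 1946-06-18 (1946-06-18 is a Date)
• Fabio Capello hasBirthPlace San Canzian d‘Isonzo (San Canzian d‘Isonzo is a Place)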
![Page 25: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/25.jpg)
Semantic Web and Linked Data
![Page 26: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/26.jpg)
Semantic Web and Linked Data
Fabio Capello http://dbpedia.org/resource/Fabio_Capello
URI - Uniform Resource Identifier
![Page 27: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/27.jpg)
Semantic Web and Linked Data
http://dbpedia.org/resource/Fabio_Capello
http://en.wikipediapedia.org/resource/Fabio_Capello
![Page 28: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/28.jpg)
Semantic Web and Linked Data
http://dbpedia.org/resource/Fabio_Capello
![Page 29: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/29.jpg)
Semantic Web and Linked Data
http://dbpedia.org/resource/Fabio_Capello
RDF Resource Description Framework
:Fabio_Capello dbpp:birthPlace :San_Canzian_d%27Isonzo .
:Fabio_Capello dbpp:birthDate "1946-06-18" .
:Fabio_Capello rdf:type dbpo:SoccerManager .
:Fabio_Capello rdf:type dbpo:Person .
...
RDF Triple = RDF Subject, RDF Property, RDF Object:
:Fabio_Capello (subject) rdf:type (property) dbpo:SoccerManager (object) .
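To make the subject / property / object structure concrete, here is a minimal sketch that holds the triples from this slide as plain Python tuples. The `Triple` type and the `objects` helper are illustrative names, not part of any RDF library, and prefixed names are kept as plain strings rather than expanded to full URIs. `rdf:type` is used as the standard RDF property for class membership.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate (property), object."""
    subject: str
    predicate: str
    obj: str


# The statements about Fabio Capello, with prefixed names kept as strings.
triples = [
    Triple(":Fabio_Capello", "dbpp:birthPlace", ":San_Canzian_d%27Isonzo"),
    Triple(":Fabio_Capello", "dbpp:birthDate", '"1946-06-18"'),
    Triple(":Fabio_Capello", "rdf:type", "dbpo:SoccerManager"),
    Triple(":Fabio_Capello", "rdf:type", "dbpo:Person"),
]


def objects(data, subject, predicate):
    """Return all objects of statements with the given subject and predicate."""
    return [t.obj for t in data if t.subject == subject and t.predicate == predicate]


types_of_capello = objects(triples, ":Fabio_Capello", "rdf:type")
print(types_of_capello)
```

A real application would more likely use an RDF library and a triple store; the tuple list only illustrates the data model.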
![Page 30: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/30.jpg)
Semantic Web and Linked Data
http://dbpedia.org/ontology/soccer_manager
dbpo:SoccerManager rdf:type owl:Class .
dbpo:SoccerManager rdfs:subClassOf dbpo:Person .
dbpo:SoccerManager rdfs:label "Soccer Manager" .
dbpp:birthPlace rdf:type rdf:Property .
dbpp:birthPlace rdfs:domain dbpo:Person .
dbpp:birthPlace rdfs:range dbpo:Place .
dbpp:birthDate rdf:type rdf:Property .
dbpp:birthDate rdfs:domain dbpo:Person .
dbpp:birthDate rdfs:range xsd:date .
...
RDF Schema
Person hasBirthPlace Place
Person hasBirthDate Date
Soccer Manager is a Person
![Page 31: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/31.jpg)
Semantic Web and Linked Data
Understanding Web Content - V
Fabio Capello is a LivingPeople
LivingPeople is a Person
DeadPeople is a Person
Person hasBirthDate Date
Fabio Capello hasBirthDate 1946-06-18 (1946-06-18 is a Date)
logical constraint: LivingPeople ∩ DeadPeople = ∅
+ Rules (Description Logics)
∀x. (∃y. hasDeathDate(x,y) ∧ Person(x) ∧ Date(y)) → DeadPeople(x)
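The rule and the disjointness constraint can be read operationally. The sketch below applies them to a tiny hand-made dataset using Python sets; the function names and the sample individuals are assumptions for illustration, and a real system would use a Description Logic reasoner instead.

```python
def infer_dead_people(persons, dates, has_death_date):
    """Rule: Person(x) and hasDeathDate(x, y) and Date(y) imply DeadPeople(x)."""
    return {x for (x, y) in has_death_date if x in persons and y in dates}


def disjointness_violations(living, dead):
    """Constraint: LivingPeople and DeadPeople must not share members."""
    return living & dead


# Illustrative data (not taken from the slides except for Fabio Capello).
persons = {"Fabio_Capello", "Albert_Einstein"}
dates = {"1946-06-18", "1955-04-18"}
has_death_date = {("Albert_Einstein", "1955-04-18")}
living_people = {"Fabio_Capello"}

dead_people = infer_dead_people(persons, dates, has_death_date)
conflicts = disjointness_violations(living_people, dead_people)
print(dead_people, conflicts)
```

If an individual asserted to be in LivingPeople also had a death date, `conflicts` would be non-empty, which is the kind of inconsistency the logical constraint is meant to expose.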
![Page 32: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/32.jpg)
Semantic Web and Linked Data
SELECT DISTINCT ?l ?l2 ?g
FROM <http://dbpedia.org>
WHERE {
  ?s dbpp:nationalteam ?o .
  ?s rdfs:label ?l FILTER langMatches( lang(?l), "EN" ) .
  ?s dbpp:nationalgoals ?g FILTER(?g > 10) .
  ?s dbpp:nationalteam ?nat .
  ?nat rdfs:label ?l2 FILTER langMatches( lang(?l2), "EN" ) .
}
ORDER BY DESC(?g)
Select all players of a national soccer team that have scored more than 10 goals while in the team
![Page 33: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/33.jpg)
Semantic Web and Linked Data
Select all players of a soccer nationalteam that have scored more than 10 goals while in the team
![Page 34: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/34.jpg)
Semantic Web and Linked Data
Linked Data
■ Term was originally coined by Tim Berners-Lee (Tim Berners-Lee, Linked Data, 2006, http://www.w3.org/DesignIssues/LinkedData.html)
The Web of Data is about a data model (RDF) and a naming model (URI) on the Web
M. Hausenblas, Quick Linked Data Introduction, http://www.slideshare.net/mediasemanticweb/quick-linked-data-introduction
![Page 35: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/35.jpg)
Semantic Web and Linked Data
Linked Data
■ Technical Principles
□ use URIs to identify things uniquely (not only documents...)
□ use HTTP URIs (URLs) so that these things can be referred to and looked up ("dereferenced") by people and user agents
□ use RDF as a universal data model to provide useful information about these things
□ include links to other, related URIs in the exposed data to improve discovery of other related information on the Web.
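Dereferencing, as named in the second principle, can be sketched as an HTTP request that asks for an RDF serialization via the `Accept` header. This is a minimal sketch: it assumes the server supports content negotiation and can return `text/turtle`, and `build_rdf_request` is an illustrative helper name. The request is only constructed here; the network call is left commented out.

```python
import urllib.request


def build_rdf_request(uri, accept="text/turtle"):
    """Build an HTTP GET request that asks for an RDF serialization of `uri`."""
    return urllib.request.Request(uri, headers={"Accept": accept})


request = build_rdf_request("http://dbpedia.org/resource/Fabio_Capello")
print(request.full_url, request.get_header("Accept"))

# To actually dereference the URI (needs network access; the response
# format depends on what the server offers):
# with urllib.request.urlopen(request) as response:
#     rdf_text = response.read().decode("utf-8")
```

Following the links found in the returned RDF to further URIs is what the fourth principle relies on for discovery.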
![Page 36: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/36.jpg)
Semantic Web and Linked Data
Linked Data
□ The application of the Linked Data principles leads to the creation of a ‘Web of Data‘
![Page 37: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/37.jpg)
Semantic Web and Linked Data
Linking Open Data
■ Publicly available structured data should be published as Linked Data
■ Various data sources should be interlinked
LOD-WikiPage: http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData/
![Page 38: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/38.jpg)
Semantic Web and Linked Data
![Page 39: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/39.jpg)
Linked Data Achievements
■ Extension of the Web with a data commons (14b RDF triples = facts)
■ Vibrant global RTD community
■ Industrial uptake starting (BBC, Thomson Reuters, etc.)
■ Emerging governmental adoption in sight
■ Establishing Linked Data as a deployment path for the Semantic Web
Semantic Web and Linked Data
![Page 40: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/40.jpg)
Linked Data Challenges
■ Coherence: relatively few, expensively maintained links
■ Quality: partly low quality data and inconsistencies
■ Performance: still substantial penalties compared to relational database technologies
■ Data consumption: large scale processing, schema mapping and data fusion still in their infancy
■ Usability: missing direct end user tools and network effect
Semantic Web and Linked Data
Sören Auer: "Linked Data: Now what?", ESWC 2010 Panel Discussion
![Page 41: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/41.jpg)
• Topic: Semantic Web and Linked Data
•Problems and Experiments
•Application: Exploratory Multimedia Search
Linked Open Data Universe
![Page 42: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/42.jpg)
Problems and Experiments
![Page 43: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/43.jpg)
Problems and Experiments
![Page 44: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/44.jpg)
Problems and Experiments
A. Hogan et al.: Weaving the Pedantic Web, LDOW 2010
![Page 45: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/45.jpg)
Problems and Experiments
Experiment Summary
(1) Crawling the Semantic Web
(2) Structural Analysis
(3) Content-based Analysis
(4) Data Cleansing
(5) Heuristics for Ranking Semantic Web Data
(6) Augmenting Semantic Web Infrastructure
![Page 46: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/46.jpg)
Problems and Experiments
So what?
■ Interesting facts to find out about Semantic Web & Linked Data
■ How big is the Semantic Universe?
  ■ # triples
  ■ # documents
  ■ # interlinking
■ Linking Open Data covers only the vocabularies/data registered in the LOD-Wiki → 14b RDF triples
■ What else is out there ... and how much of it?
■ ...and how do we get it?
![Page 47: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/47.jpg)
Problems and Experiments
(1) Crawling the Semantic Web
■ Of course we are not the first to be out there...
■ Swoogle: Li Ding et al.: Finding and Ranking Knowledge on the Semantic Web, ISWC 2005
■ Scutter/Slug: Leigh Dodds: Slug: A Semantic Web Crawler, 2006
■ Sindice: Giovanni Tummarello et al.: Sindice.com - weaving the open linked data, ISWC 2007
  → 2.1b RDF triples
■ SWSE: Andreas Harth et al.: SWSE: Objects before Documents, Semantic Web Challenge 2008, ISWC 2008
  → 1.1b RDF triples
■ Falcons: G. Cheng et al.: Falcons: Searching and Browsing Entities on the Semantic Web, WWW17 2008
  → 2.9b RDF triples
![Page 48: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/48.jpg)
Problems and Experiments
(1) Crawling the Semantic Web
■ First experiments:
  ■ Adapting & improving the Slug crawler
    ■ for parallelization (48 cores) and
    ■ lots of RAM (256 GB - 2 TB)
  ■ first test run: > 1 GB RDF data per hour
■ What‘s new:
  ■ crawl not only RDF/RDFS and OWL resources
  ■ include (X)HTML with RDFa extensions and
  ■ dynamic documents with (semantic) sitemaps
■ What‘s next...?
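The idea of parallelizing a crawl can be sketched as a breadth-first traversal in which each frontier level is fetched by a pool of workers. This is not the Slug crawler's actual architecture; it is a minimal sketch in which `FAKE_WEB`, `fetch_links` and the example.org URIs are hypothetical stand-ins for downloading and parsing real RDF documents.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for the Web: URI -> URIs linked from it.
FAKE_WEB = {
    "http://example.org/a": ["http://example.org/b", "http://example.org/c"],
    "http://example.org/b": ["http://example.org/c"],
    "http://example.org/c": ["http://example.org/a", "http://example.org/d"],
    "http://example.org/d": [],
}


def fetch_links(uri):
    """Stand-in for downloading one document and extracting its outgoing links."""
    return FAKE_WEB.get(uri, [])


def crawl(seeds, max_workers=4):
    """Breadth-first crawl; all URIs of one frontier level are fetched in parallel."""
    visited = set()
    frontier = set(seeds)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while frontier:
            visited |= frontier
            results = pool.map(fetch_links, sorted(frontier))
            # The next frontier holds every newly discovered, unvisited URI.
            frontier = {link for links in results for link in links} - visited
    return visited


crawled = crawl(["http://example.org/a"])
print(sorted(crawled))
```

Threads suit the network-bound fetching step; CPU-bound parsing on many cores would more likely call for separate processes, which this sketch does not cover.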
![Page 49: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/49.jpg)
Problems and Experiments
(2) Analyzing the Semantic Web I - Structural Analysis
■ Again we are not the first to be out there...
■ Structural Analysis of the ‘early‘ WWW
  ■ IN: 44m nodes
  ■ SCC: 56m nodes
  ■ OUT: 44m nodes
  ■ tunnels, appendices, unconnected components
A. Broder et al.: Graph structure in the Web. In Comput. Netw. 33, 1-6 (Jun. 2000), 309-320.
![Page 50: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/50.jpg)
Problems and Experiments
(2) Analyzing the Semantic Web I - Structural Analysis
■ Again we are not the first to be there...
■ Structural Analysis of the ‘early‘ Semantic Web
Weiyi Ge et al.: Object Link Structure in the Semantic Web, ESWC 2010
■ Experimental Setup
  ■ 18m RDF documents (Falcons crawl 2009)
  ■ 110m nodes with 190m edges
■ Analysis of RDF link graph
  ■ average node degree: ≈3.4
  ■ effective diameter: ≈11.5
  ■ largest connected component: ≈88% of all nodes
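Two of these measures are easy to reproduce on a small graph. The sketch below assumes the link graph is treated as undirected, so that the average degree is 2·|E| / |V|; with the rounded figures from the slide this gives roughly 3.45, in the same range as the reported ≈3.4. The toy graph and the function names are illustrative only.

```python
from collections import defaultdict, deque


def average_degree(num_nodes, num_edges):
    """Average degree of an undirected graph: every edge touches two nodes."""
    return 2 * num_edges / num_nodes


def largest_component_fraction(nodes, edges):
    """Share of nodes in the largest connected component (edges as undirected)."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen = set()
    largest = 0
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search over one component, counting its nodes.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        largest = max(largest, size)
    return largest / len(nodes)


# Rounded figures from the slide: 110m nodes, 190m edges.
slide_estimate = average_degree(110e6, 190e6)

# Toy graph with two components: {a, b, c} and {d, e}.
toy_nodes = ["a", "b", "c", "d", "e"]
toy_edges = [("a", "b"), ("b", "c"), ("d", "e")]
toy_fraction = largest_component_fraction(toy_nodes, toy_edges)
print(round(slide_estimate, 2), toy_fraction)
```

The effective diameter needs shortest-path distances between many node pairs and is omitted here.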
![Page 51: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/51.jpg)
Problems and Experiments
(3) Analyzing the Semantic Web II - Content-Based Analysis
■ Again we are not the first to be there...
http://pedantic-web.org/
A. Hogan et al.: Weaving the Pedantic Web, LDOW 2010
■ 150k documents with more than 12m RDF triples
■ Discovered categories of symptoms:
■ incomplete → dead links
■ incoherent → no correct interpretation (local)
■ hijack → no correct interpretation (remote)
■ inconsistent → contradictions
![Page 55: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/55.jpg)
Problems and Experiments
(3) Analyzing the Semantic Web II - Content-Based Analysis
■ Again we are not the first to be there...
Urbani et al.: OWL Reasoning with WebPIE: Calculating the Closure of 100 Billion Triples, ESWC 2010
■ Artificial benchmark dataset used: Lehigh University Benchmark (LUBM) with 100b RDF triples
■ Computing the transitive closure (= reasoning)
■ Making implicit knowledge explicit
Person hasBirthPlace Place
Fabio Capello hasBirthPlace San Canzian d‘Isonzo
→ Fabio Capello is a Person (class membership can be deduced)
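The deduction shown here can be sketched as naive forward chaining over three RDFS rules (domain, range, subclass) until no new triples appear. This is a quadratic toy version for illustration only and not the WebPIE approach; the function name is an assumption, and the schema triples follow the RDF Schema slide earlier in the talk.

```python
def rdfs_closure(triples):
    """Apply rdfs:domain, rdfs:range and rdfs:subClassOf rules to a fixpoint."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                # (p rdfs:domain C) and (s p o) imply (s rdf:type C)
                if p2 == "rdfs:domain" and s2 == p:
                    new.add((s, "rdf:type", o2))
                # (p rdfs:range C) and (s p o) imply (o rdf:type C)
                if p2 == "rdfs:range" and s2 == p:
                    new.add((o, "rdf:type", o2))
                # (s rdf:type C) and (C rdfs:subClassOf D) imply (s rdf:type D)
                if p == "rdf:type" and p2 == "rdfs:subClassOf" and s2 == o:
                    new.add((s, "rdf:type", o2))
        if new <= inferred:
            return inferred
        inferred |= new


schema = {
    ("dbpp:birthPlace", "rdfs:domain", "dbpo:Person"),
    ("dbpp:birthPlace", "rdfs:range", "dbpo:Place"),
}
facts = {
    (":Fabio_Capello", "dbpp:birthPlace", ":San_Canzian_d%27Isonzo"),
}

closure = rdfs_closure(schema | facts)
for triple in sorted(closure - schema - facts):
    print(triple)
```

From the single birth place statement the closure adds that Fabio Capello is a Person and that San Canzian d'Isonzo is a Place, which is the implicit knowledge made explicit.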
![Page 56: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/56.jpg)
Problems and Experiments
(4) Analyzing the Semantic Web III - Data Cleansing
■ Trying to clean up Linked Open Data and possibly also (partially) the Semantic Web...
(1) Identify inconsistencies and ambiguities by (automated) content-based analysis
(2) Resolve inconsistencies and ambiguities
■ if possible by reasoning
■ else by crowdsourcing (game-based evaluation, etc.)
Cleaning out the Augean stables...
Augean stables: extremely nasty and smelly warehouses of filth, straw and manure
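The two cleansing steps could look roughly like the following Python sketch. It assumes one specific kind of inconsistency (a property that should be single-valued has several values) and one specific reasoning step (values linked by `owl:sameAs` are not a real conflict). The `ex:` resources and the helper names `find_conflicts` and `triage` are hypothetical.

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"


def find_conflicts(triples, functional_properties):
    """Step (1): find single-valued properties that carry several values."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p in functional_properties:
            values[(s, p)].add(o)
    return {key: vals for key, vals in values.items() if len(vals) > 1}


def triage(triples, functional_properties):
    """Step (2): resolve by reasoning where possible, else hand over to people."""
    same = {frozenset((s, o)) for s, p, o in triples if p == SAME_AS}
    resolved, for_crowd = {}, {}
    for key, vals in find_conflicts(triples, functional_properties).items():
        ordered = sorted(vals)
        # Resolvable here only if every pair of values is declared identical.
        all_same = all(
            frozenset((a, b)) in same
            for i, a in enumerate(ordered)
            for b in ordered[i + 1:]
        )
        (resolved if all_same else for_crowd)[key] = ordered
    return resolved, for_crowd


# Hypothetical data for illustration only.
data = [
    ("ex:Alice", "ex:birthPlace", "ex:Potsdam"),
    ("ex:Alice", "ex:birthPlace", "ex:Potsdam_Germany"),
    ("ex:Potsdam", SAME_AS, "ex:Potsdam_Germany"),
    ("ex:Bob", "ex:birthPlace", "ex:Berlin"),
    ("ex:Bob", "ex:birthPlace", "ex:Graz"),
]
resolved, for_crowd = triage(data, {"ex:birthPlace"})
print("resolved by reasoning:", resolved)
print("left for crowdsourcing:", for_crowd)
```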
![Page 57: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/57.jpg)
Problems and Experiments
(5) Analyzing the Semantic Web IV - Data Ranking
■ Linked Data provides (unbiased) knowledge
■ unbiased = no distinction between what is important and what is not
■ e.g., Albert Einstein (http://dbpedia.org/page/Albert_Einstein)
■ > 600 facts (triples)
■ > 80 properties
■ no ranking
■ no relevance
![Page 61: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/61.jpg)
Problems and Experiments
(5) Analyzing the Semantic Web IV - Data Ranking
■ We have developed heuristics for ranking objects and properties, e.g.
■ :Albert_Einstein rdf:type :AmericanVegetarian
■ :Albert_Einstein rdf:type :Scientist
■ :Alfred_Kleiner rdf:type :Scientist
■ :Bill_Cosby rdf:type :AmericanVegetarian
■ :Albert_Einstein :doctoralAdviser :Alfred_Kleiner (considered to be relevant)
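The slide does not spell out the heuristics themselves. One possible reading of the example, sketched below in Python, is that a class of a resource counts as more relevant when directly linked resources share it. The scoring rule and the function name `rank_types` are assumptions, not the authors' method; the triples are taken from the example.

```python
RDF_TYPE = "rdf:type"


def rank_types(entity, triples):
    """Rank the classes of `entity` by how many directly linked resources share them."""
    types = {}
    neighbours = set()
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
        elif s == entity:
            neighbours.add(o)
        elif o == entity:
            neighbours.add(s)
    # Every class of the entity starts with a score of zero.
    scores = {cls: 0 for cls in types.get(entity, set())}
    for neighbour in neighbours:
        for cls in types.get(neighbour, set()):
            if cls in scores:
                scores[cls] += 1
    # Highest score first; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


triples = [
    (":Albert_Einstein", RDF_TYPE, ":AmericanVegetarian"),
    (":Albert_Einstein", RDF_TYPE, ":Scientist"),
    (":Alfred_Kleiner", RDF_TYPE, ":Scientist"),
    (":Bill_Cosby", RDF_TYPE, ":AmericanVegetarian"),
    (":Albert_Einstein", ":doctoralAdviser", ":Alfred_Kleiner"),
]
ranking = rank_types(":Albert_Einstein", triples)
print(ranking)
```

Under this reading, :Scientist ranks above :AmericanVegetarian for Albert Einstein because the doctoral adviser link leads to another scientist, while nothing links Einstein to Bill Cosby.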
![Page 62: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/62.jpg)
Problems and Experiments
(6) Semantic Web Infrastructure - Triple Stores
■ RDF(S) data is stored in triple stores
■ Basic idea:
■ Use 1 table with 3 columns (s, p, o)
■ For every column / column combination, create index structures for fast access (spo, sop, pos, pso, ops, osp)
■ Drawback: many self-joins needed (memory consumption)
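The basic idea can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch, assuming a relational backend; the table name, index names and the sample triples (reused from the ranking example) are illustrative and do not describe any particular triple store.

```python
import sqlite3
from itertools import permutations

conn = sqlite3.connect(":memory:")

# One table with three columns: subject, predicate, object.
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")

# One composite index per column ordering: spo, sop, pso, pos, osp, ops.
for order in permutations("spo"):
    name = "idx_" + "".join(order)
    conn.execute(f"CREATE INDEX {name} ON triples ({', '.join(order)})")

conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        (":Albert_Einstein", "rdf:type", ":Scientist"),
        (":Albert_Einstein", ":doctoralAdviser", ":Alfred_Kleiner"),
        (":Alfred_Kleiner", "rdf:type", ":Scientist"),
    ],
)

# "Which scientists have a doctoral adviser who is also a scientist?"
# Three triple patterns mean the table is joined with itself twice.
rows = conn.execute(
    """
    SELECT t1.s, t2.o
    FROM triples AS t1
    JOIN triples AS t2 ON t2.s = t1.s AND t2.p = ':doctoralAdviser'
    JOIN triples AS t3 ON t3.s = t2.o AND t3.p = 'rdf:type' AND t3.o = ':Scientist'
    WHERE t1.p = 'rdf:type' AND t1.o = ':Scientist'
    """
).fetchall()

index_names = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)
print(rows)
print(index_names)
```

Each additional triple pattern in a query adds another self-join on the same table, which is the drawback named above.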
![Page 63: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/63.jpg)
Problems and Experiments
Experiment Summary
(1) Crawling the Semantic Web
(2) Structural Analysis
(3) Content-based Analysis
(4) Data Cleansing
(5) Heuristics for Ranking Semantic Web Data
(6) Augmenting Semantic Web Infrastructure
![Page 64: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/64.jpg)
• Topic: Semantic Web and Linked Data
• Problem Definition and Experiments
• Application: Exploratory Multimedia Search
Linked Open Data Universe
![Page 65: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/65.jpg)
Application: Exploratory Multimedia Search
Yovisto semantic video search engine (http://www.yovisto.com)
■ specialized in academic video content, e.g., lecture recordings
■ enables searching within the content of videos
■ automated video analysis: video scene cut detection, intelligent character recognition, complemented by collaborative user annotation
■ more than 8,000 h of video
Semantic Metadata:
■ Ontology: http://www.yovisto.com/ontology/0.9/
■ DBpedia, FOAF, Dublin Core, MPEG-7, Tagging
■ RDFa annotation
■ public SPARQL endpoint: http://sparql.yovisto.com/
J. Waitelonis, H. Sack: Augmenting Video Search with Linked Open Data, in Proc. of International Conference on Semantic Systems 2009 (i-semantics 2009), September 2-4, 2009, Graz, Journal of Universal Computer Science
![Page 68: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/68.jpg)
Application: Exploratory Multimedia Search
■ Semantic Annotation
■ Metadata Extraction (over time)
■ e.g., person xy, location yz, event abc
■ Entity Recognition / Mapping
■ e.g., bibliographical data, geographical data, encyclopedic data, ...
![Page 69: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/69.jpg)
Application: Exploratory Multimedia Search
Exploratory Search
• Is a kind of investigation task, where the user is
(a) not familiar with the domain of the search result, i.e., before entering appropriate keywords, she needs to learn about the domain
(b) not sure how to reach the search destination (concerning search process and search technology)
(c) not really sure what she's looking for, i.e., "Can you please find something out about ... ?"
Example: "Which modern philosophers build on the theories of the Greek philosopher Plato?"
White, R.W., Kules, B., Drucker, S.M., and schraefel, M.C.: Supporting Exploratory Search, Introduction to Special Section of Communications of the ACM, Vol. 49, Issue 4 (2006), pp. 36-39.
![Page 70: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/70.jpg)
[Screenshot of the exploratory search interface, with labeled areas: history, search term, related resources with properties]
Waitelonis, Sack: Augmenting Video Search with Linked Open Data, in Proc. I-Semantics, Graz 2009.
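The "related resources with properties" shown for a search term can be approximated by a simple lookup over triples. The following Python sketch is not the Yovisto implementation; the function name `related_resources`, the property names and the sample triples are assumptions chosen for illustration.

```python
from collections import defaultdict


def related_resources(term, triples):
    """Group resources linked to `term` by the connecting property and direction."""
    related = defaultdict(set)
    for s, p, o in triples:
        if s == term:
            related[p].add(o)
        elif o == term:
            # Incoming link: the term is the object of the statement.
            related["is " + p + " of"].add(s)
    return {prop: sorted(res) for prop, res in sorted(related.items())}


# Illustrative triples; the property names are hypothetical.
triples = [
    (":Plato", ":influencedBy", ":Socrates"),
    (":Aristotle", ":influencedBy", ":Plato"),
    (":Plato", ":birthPlace", ":Athens"),
]
suggestions = related_resources(":Plato", triples)
for prop, resources in suggestions.items():
    print(prop, "->", resources)
```

Following incoming as well as outgoing links matters for exploratory questions such as the Plato example: the philosophers who build on Plato point to him, not the other way round.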
![Page 71: Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab](https://reader033.vdocuments.net/reader033/viewer/2022052618/554ea3a6b4c9055f7b8b48da/html5/thumbnails/71.jpg)
• Topic: Semantic Web and Linked Data
• Problem Definition and Experiments
• Application: Exploratory Multimedia Search
Linked Open Data Universe
Thank you for your Attention!