wtf is the semantic web and linked data
DESCRIPTION
Talk given at UT ISchool on Nov 17, 2011
TRANSCRIPT
WTF is the Semantic Web and Linked Data
Juan F. Sequeda
Department of Computer Science
University of Texas at Austin
Nov 17, 2011
Semantic Web? Linked Data?
WTF?
WTF is the Semantic Web?
Internet != Web
What is the Web?
“… the Web, is a system of interlinked hypertext documents accessed via the Internet. With a web browser, one can view web pages that may contain text, images […] and navigate between them via hyperlinks”
http://en.wikipedia.org/wiki/World_Wide_Web
Current Web = internet + links + docs
History of the Web
• Created by Tim Berners-Lee at CERN in 1989
• Mosaic browser in 1993
• W3C created in 1994
• Exponential growth mid 90s
• Amazon, Ebay – 1995
• Search engines – Google 1998
• Dot-com boom 1997 – 2001
• Web 2.0 – blogs, Facebook, Twitter, etc
What is the problem?
WHAT’S THE WEATHER IN
AUSTIN TODAY?
http://www.flickr.com/photos/jamieca/31631256/
What is the problem?
• The web is full of documents
• We aren’t always interested in documents
– We are interested in THINGS
– These THINGS might be in documents
• We can read an HTML document rendered in a browser and find what we are searching for
– This is hard for computers.
– Computers have to guess (even though they are pretty good at it)
The Web of Documents
Search
Crawler
Search Engine
The Web is a Data Shredder
Structured Data
Unstructured Data
Thanks Martin Hepp
What would we like?
• Make it easy for computers/software to find THINGS
Do you SEARCH or do you FIND?
Search for
Football Players who went to the University of Texas at Austin, played for
the Dallas Cowboys as Cornerback
Why can’t we just FIND it…
Guess how I FOUND out?
On a Semantic Web
• Besides publishing documents on the web
– which computers can’t understand easily
• Let’s publish on the web something that computers can understand
DATA
The Semantic Web is a web of data
The current web is a web of documents
But wait… doesn’t the web already have data?
Current Data on the Web
• Relational Databases
• APIs
• XML
• CSV
• XLS
• …
• Can’t computers and applications already consume that data on the web?
Yes! But it is all in different formats and data models!
This makes it hard to integrate data
The data in different data sources aren’t linked
For example, how do I know that the Juan Sequeda in Facebook is the same as the Juan Sequeda in Twitter?
Or if I create a mashup from different services, I have to learn different APIs and I get different
formats of data back
Data is Siloed
Wouldn’t it be great if we had a standard way of publishing data on the Web?
We have a standardized way of publishing documents on the web, right?
HTML
Then why can’t we have a standard way of publishing data on the Web?
Good question! And the answer is YES. There is!
RDF
Resource Description Framework (RDF)
• Data Model = a way to model data
– e.g. relational databases use the relational data model
• RDF is a graph data model
Key Value vs Graph
• Key Values
– firstName Juan
– lastName Sequeda
– livesIn Austin
– knows Stephane Corlosquet
• But what are these key/values describing?
– ME!
RDF is a Graph
• Let’s group the Key/Values together
– <JuanSequeda> <firstName> “Juan”
– <JuanSequeda> <lastName> “Sequeda”
– <JuanSequeda> <livesIn> “Austin”
– <JuanSequeda> <knows> <StephaneCorlosquet>
– ..
– <StephaneCorlosquet> <firstName> “Stephane”
– <StephaneCorlosquet> <lastName> “Corlosquet”
– <StephaneCorlosquet> <livesIn> “Boston”
In each triple, the first element is the identifier for the “group”; the other two are the key and the value.
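A minimal sketch of that grouping step, in Python. The identifiers and property names come from the slide; the helper name `to_triples` is hypothetical, and plain tuples stand in for real RDF terms.

```python
# Key/value pairs on their own do not say what they describe.
juan = {"firstName": "Juan", "lastName": "Sequeda", "livesIn": "Austin"}
stephane = {"firstName": "Stephane", "lastName": "Corlosquet", "livesIn": "Boston"}


def to_triples(subject, key_values):
    """Attach an identifier to every key/value pair: (subject, key, value)."""
    return [(subject, key, value) for key, value in key_values.items()]


graph = to_triples("JuanSequeda", juan) + to_triples("StephaneCorlosquet", stephane)

# The value of a triple can be the identifier of another group.
# That edge between two subjects is what turns the list into a graph.
graph.append(("JuanSequeda", "knows", "StephaneCorlosquet"))

for subject, key, value in graph:
    print(subject, key, value)
```

Each tuple plays the role of one RDF statement; a real RDF library would use URIs and typed literals instead of bare strings.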
RDF can be serialized in different ways
• RDF/XML
• RDFa (RDF in HTML)
• N3
• Turtle
• JSON
RDFa
RDF/XML
RDF/N-triples
RDF/Turtle
So does that mean that I have to publish my data in RDF now?
You don’t have to… but we would like you to
Schema.org
Rich Snippets
…
An example
Document on the Web
Databases back up documents
Isbn | Title | Author | PublisherID | ReleasedDate
978-0-596-15381-6 | Programming the Semantic Web | Toby Segaran | 1 | July 2009
… | … | … | … | …

PublisherID | PublisherName
1 | O’Reilly Media
… | …
This is a THING: a book titled “Programming the Semantic Web” by Toby Segaran, …
THINGS have PROPERTIES: a book has a title, an author, …
Let’s represent the data in RDF
[Figure: RDF graph. A book node has title “Programming the Semantic Web”, isbn “978-0-596-15381-6”, author “Toby Segaran”, and a publisher edge to a Publisher node whose name is “O’Reilly”.]
Remember that we are on the web
Everything on the web is identified by a URI
And now let’s link the data to other data
[Figure: the same graph with URIs as identifiers. http://…/isbn978 has title “Programming the Semantic Web”, isbn “978-0-596-15381-6”, author “Toby Segaran”, and a publisher edge to http://…/publisher1, whose name is “O’Reilly”.]
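The step from the two tables to this graph can be sketched in Python. This is only an illustration under stated assumptions: the slides elide the real URIs, so `BASE` is a hypothetical `example.org` namespace, and the helper names `book_uri` and `publisher_uri` are made up for this sketch.

```python
BASE = "http://example.org/"  # hypothetical namespace, not the one on the slides

# The two relational tables from the earlier slide, as lists of rows.
books = [
    {
        "Isbn": "978-0-596-15381-6",
        "Title": "Programming the Semantic Web",
        "Author": "Toby Segaran",
        "PublisherID": 1,
    }
]
publishers = [{"PublisherID": 1, "PublisherName": "O'Reilly Media"}]


def book_uri(row):
    """Mint a URI for a book from its primary key (hypothetical scheme)."""
    return f"{BASE}isbn{row['Isbn']}"


def publisher_uri(publisher_id):
    """Mint a URI for a publisher from its primary key (hypothetical scheme)."""
    return f"{BASE}publisher{publisher_id}"


triples = []
for row in books:
    subject = book_uri(row)
    triples.append((subject, BASE + "title", row["Title"]))
    triples.append((subject, BASE + "isbn", row["Isbn"]))
    triples.append((subject, BASE + "author", row["Author"]))
    # The foreign key becomes a link to another resource, not a literal value.
    triples.append((subject, BASE + "publisher", publisher_uri(row["PublisherID"])))

for row in publishers:
    triples.append(
        (publisher_uri(row["PublisherID"]), BASE + "name", row["PublisherName"])
    )

for triple in triples:
    print(triple)
```

Rows become subjects, column names become properties, and the PublisherID foreign key becomes an edge between two URIs.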
And now consider the data from Revyu.com
[Figure: Revyu.com graph. http://…/isbn978 hasReview http://…/review1, which has description “Awesome Book” and a reviewer edge to http://…/reviewer, whose name is “Juan Sequeda”.]
Let’s start to link data
[Figure: the two graphs side by side. The first http://…/isbn978 node (title “Programming the Semantic Web”, isbn “978-0-596-15381-6”, author “Toby Segaran”, publisher http://…/publisher1 with name “O’Reilly”) is connected by owl:sameAs to the Revyu http://…/isbn978 node, which hasReview http://…/review1 (description “Awesome Book”), whose hasReviewer is http://…/reviewer (name “Juan Sequeda”).]
Juan Sequeda publishes data too
[Figure: http://juansequeda.com/id has name “Juan Sequeda” and livesIn http://dbpedia.org/Austin.]
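Let’s link more data
[Figure: the Revyu graph (http://…/isbn978 hasReview http://…/review1, with description “Awesome Book” and hasReviewer http://…/reviewer, name “Juan Sequeda”) next to Juan Sequeda’s own data. http://…/reviewer and http://juansequeda.com/id are connected by sameAs; http://juansequeda.com/id has name “Juan Sequeda” and livesIn http://dbpedia.org/Austin.]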
And more
[Figure: all three graphs joined. The book node http://…/isbn978 (title, isbn, author, publisher http://…/publisher1 with name “O’Reilly”) is connected by owl:sameAs to the Revyu node http://…/isbn978. That node hasReview http://…/review1 (description “Awesome Book”), whose hasReviewer http://…/reviewer (name “Juan Sequeda”) is connected by owl:sameAs to http://juansequeda.com/id, which livesIn http://dbpedia.org/Austin.]
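What the sameAs links buy us can be sketched in Python: once two identifiers are declared the same, their statements can be merged onto one node. This is a simplified illustration, not how any particular triple store does it. The prefixes `store:`, `revyu:` and `juan:` are hypothetical shorthands for the elided URIs on the slides, and the function names are made up for this sketch.

```python
SAME_AS = "owl:sameAs"

# Three independently published datasets, plus the links between them.
triples = [
    ("store:isbn978", "title", "Programming the Semantic Web"),
    ("store:isbn978", SAME_AS, "revyu:isbn978"),
    ("revyu:isbn978", "hasReview", "revyu:review1"),
    ("revyu:review1", "description", "Awesome Book"),
    ("revyu:review1", "hasReviewer", "revyu:reviewer"),
    ("revyu:reviewer", SAME_AS, "juan:id"),
    ("juan:id", "livesIn", "dbpedia:Austin"),
]


def canonical_ids(triples):
    """Build a lookup that maps every identifier to one representative
    of its sameAs group (a small union-find)."""
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    for subject, predicate, obj in triples:
        if predicate == SAME_AS:
            parent[find(obj)] = find(subject)
    return find


def merge_same_as(triples):
    """Rewrite all triples onto canonical identifiers and drop the sameAs links."""
    find = canonical_ids(triples)
    return {(find(s), p, find(o)) for s, p, o in triples if p != SAME_AS}


merged = merge_same_as(triples)
for triple in sorted(merged):
    print(triple)
```

After merging, the title and the review hang off the same book node, and the reviewer node carries the livesIn statement published on a different site.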
Data on the Web that is in RDF and is linked to other RDF data is
LINKED DATA
Linked Data Principles
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up (dereference) those names.
3. When someone looks up a URI, provide useful information.
4. Include links to other URIs so that they can discover more things.
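Principles 2 and 3 amount to a plain HTTP lookup in which the client asks for data rather than an HTML page. A hedged Python sketch of such a request follows. The URI is the simplified one used on the slides, the helper name `build_lookup_request` is hypothetical, and whether a given server answers with Turtle for this `Accept` header is an assumption, not something the talk states.

```python
import urllib.request


def build_lookup_request(uri, media_type="text/turtle"):
    """Prepare (but do not send) an HTTP GET that asks for RDF instead of HTML."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


# Simplified URI taken from the slides.
request = build_lookup_request("http://dbpedia.org/Austin")
print(request.get_method(), request.full_url, request.get_header("Accept"))

# To actually dereference the URI (needs network access):
#     with urllib.request.urlopen(request) as response:
#         data = response.read()
```

The returned data would itself contain more URIs, which is what principle 4 relies on: a client can keep following links to discover more things.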
Linked Data makes the web appear as ONE GIANT HUGE GLOBAL DATABASE!
I can query a database with SQL. Is there a way to query Linked Data with a query language?
Yes! There is actually a standardized language for that
SPARQL
FIND all the reviews on the book “Programming the Semantic Web” by people who live in Austin
SELECT ?review ?comment
WHERE {
  isbn:978 ex:hasReview ?review .
  ?review ex:description ?comment .
  ?review ex:hasReviewer ?person .
  ?person ex:lives dbpedia:Austin .
}
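To show roughly what a SPARQL engine does with the WHERE clause, here is a toy Python matcher for triple patterns. It is a sketch of the idea (match each pattern, carry variable bindings forward), not a SPARQL implementation. The function names are hypothetical, the second review is invented so that the filter has something to exclude, and the data assumes the sameAs links have already been merged.

```python
def is_var(term):
    """Variables are written ?name, as in SPARQL."""
    return term.startswith("?")


def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`.
    Returns the extended binding, or None if they conflict."""
    binding = dict(binding)
    for pattern_term, value in zip(pattern, triple):
        if is_var(pattern_term):
            if binding.setdefault(pattern_term, value) != value:
                return None
        elif pattern_term != value:
            return None
    return binding


def query(patterns, triples):
    """Evaluate the patterns one after another, joining on shared variables."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return bindings


triples = [
    ("isbn:978", "ex:hasReview", "ex:review1"),
    ("ex:review1", "ex:description", "Awesome Book"),
    ("ex:review1", "ex:hasReviewer", "ex:juan"),
    ("ex:juan", "ex:lives", "dbpedia:Austin"),
    # Hypothetical second review, by someone who does not live in Austin.
    ("isbn:978", "ex:hasReview", "ex:review2"),
    ("ex:review2", "ex:description", "Too short"),
    ("ex:review2", "ex:hasReviewer", "ex:someone"),
    ("ex:someone", "ex:lives", "dbpedia:Boston"),
]

# The four triple patterns of the query's WHERE clause.
patterns = [
    ("isbn:978", "ex:hasReview", "?review"),
    ("?review", "ex:description", "?comment"),
    ("?review", "ex:hasReviewer", "?person"),
    ("?person", "ex:lives", "dbpedia:Austin"),
]

results = [(b["?review"], b["?comment"]) for b in query(patterns, triples)]
print(results)
```

Only the review whose reviewer lives in Austin survives all four patterns; the SELECT clause corresponds to picking `?review` and `?comment` out of each surviving binding.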
SPARQL
[Figure: the linked book, review and reviewer graph from the previous slides, shown alongside the same query.]
OWL
• Here is where the real semantics shows up
• Web Ontology Language
• Define schema/vocabulary
• Classes, Properties, Inheritance, etc
• Subclasses, Subproperties
• …
• You can get more complicated with rules…
Prefixes:
auth: <http://dblp.l3s.de/d2r/page/authors/>
dexa: <http://dblp.l3s.de/d2r/page/publications/conf/dexa/>
dc: <http://purl.org/dc/elements/1.1/>
sw: <http://data.semanticweb.org/person/>
swrc: <http://swrc.ontoware.org/ontology#>
owl: <http://www.w3.org/2002/07/owl#>
rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
rdfs: <http://www.w3.org/2000/01/rdf-schema#>

[Figure: RDF graph for the paper dexa:TirmiziSM08, with dc:title “Translating SQL Applications to the Semantic Web” and dc:creator edges to auth:Juan_Sequeda, auth:Daniel_P._Miranker and auth:Syed_Hamid_Tirmizi. owl:sameAs edges connect the authors to sw:juan-f-sequeda, sw:daniel-miranker and sw:syed-tirmizi. The classes foaf:Person, swrc:InProceedings and swrc:Publication are attached through rdf:type and rdfs:subClassOf edges. The diagram labels one region RDF and the other OWL.]
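The inheritance bullet can be illustrated with a small Python sketch that derives new rdf:type statements from rdfs:subClassOf. Two assumptions are made loudly here: the two triples are one reading of the diagram (the paper typed as swrc:InProceedings, which is a subclass of swrc:Publication), and the rule shown is only the simple subclass rule, far less than what an OWL reasoner does. The function name `infer_types` is hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Assumed reading of the diagram: one instance statement, one schema statement.
triples = {
    ("dexa:TirmiziSM08", RDF_TYPE, "swrc:InProceedings"),
    ("swrc:InProceedings", SUBCLASS_OF, "swrc:Publication"),
}


def infer_types(triples):
    """Add the rdf:type triples implied by rdfs:subClassOf, repeating
    until no new triple appears (so chains of subclasses are followed)."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_pairs = [(s, o) for s, p, o in inferred if p == SUBCLASS_OF]
        for subject, predicate, cls in list(inferred):
            if predicate != RDF_TYPE:
                continue
            for sub, sup in subclass_pairs:
                if cls == sub and (subject, RDF_TYPE, sup) not in inferred:
                    inferred.add((subject, RDF_TYPE, sup))
                    changed = True
    return inferred


for triple in sorted(infer_types(triples)):
    print(triple)
```

Nobody published the statement that the paper is a swrc:Publication; it follows from the schema. A query for all publications would now find it.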
RDB and the Semantic Web
[Figure: two parallel stacks along a TIME axis. Relational databases: RELATIONAL MODEL, TABLE DEFINITION, CONSTRAINTS, TRIGGERS. Semantic Web: RDF, RDFS, OWL, RIF.]
This looks cool, but let’s be realistic. What is the incentive to publish Linked Data?
What was your incentive to publish an HTML page in 1990?
1) Share data in documents
2) Because your neighbor was doing it
… later on …
3) Marketing, Advertising, …, SEO
So why should we publish Linked Data in 2011?
1) Share data as data
2) Because your neighbor is doing it
…
3) Marketing, Advertising, SEO ++
Linked Data Publishers
• UK Government
• US Government
• BBC
• Open Calais – Thomson Reuters
• Freebase/Google
• NY Times
• Best Buy
• Sears
• Kmart
• Overstock.com
• CNET
• Dbpedia
• O’Reilly Media
• …
[Figures: snapshots of the Linking Open Data cloud diagram from May 2007, Oct 2007, Nov 2007, Feb 2008, Mar 2008, Sept 2008, Mar 2009 (1), Mar 2009 (2), July 2009, September 2010 and September 2011.]
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
YOU GET THE PICTURE
IT’S BIG and getting BIGGER and BIGGER
What is the Web
• Web of Documents: HTML
• Web of Data: RDF
• Global Unique IDs: HTTP URIs
• Schema/Ontologies: OWL
• Query RDF: SPARQL
Now what can we do with this data?
Generic Applications
Linked Data Browsers
• Not actually separate browsers. Run inside of HTML browsers
• View the data that is returned after looking up a URI in tabular form
• User can navigate between data sources by following RDF Links
• (IMO) No usability
Linked Data Browsers
• http://browse.semanticweb.org/
• Tabulator
• OpenLink Dataexplorer
• Zitgist
• Marbles
• Explorator
• Disco
• LinkSailor
Linked Data (Semantic Web) Search Engines
• Just like conventional search engines (Google, Bing, Yahoo), crawl RDF documents and follow RDF links.
– Current search engines don’t crawl data, unless it’s RDFa
• Human focus Search
– Falcons - Keyword
– SWSE – Keyword
– VisiNav – Complex Queries
• Machine focus Search
– Sindice – data instances
– Swoogle - ontologies
– Watson - ontologies
– Uberblic – curated integrated data instances
(Semantic) SEO ++
• Mark up your HTML with RDFa
• Use standard vocabularies (ontologies)
– Google Vocabulary
– Good Relations
– Dublin Core
• Google and Yahoo will crawl this data and use it for better rendering
On-the-fly Mashups
http://sig.ma
Domain Specific Applications
• Government
– Data.gov
– Data.gov.uk
– http://data-gov.tw.rpi.edu/wiki/Demos
• Music
– Seevl.net
• Dbpedia Mobile
• Life Science
– LinkedLifeData
• Sports
– BBC World Cup
Faceted Browsers
http://dbpedia.neofonie.de/browse/
Query your data
Find all the locations of all the original paintings of Modigliani
Select all proteins that are linked to a curated interaction from the literature and to inflammatory response
http://linkedlifedata.com/
http://tata.csres.utexas.edu:8080/specify/data/taxon51807
Links to other Data Sources
Linked Data is Data Integration
[Figure: architecture diagram showing the data sources Specify, Morphbank and Morphster, three Ultrawrap instances, Diamond, and a SPARQL query.]
Example 1 (Specify – DBpedia)
• Get full name and guid from taxon with id http://tata.csres.utexas.edu:8080/specify/data/taxon51807#thing
• AND find any subjects it may have (“skos:subject”)
Result Example 1
• Note that http://dbpedia.org/resource/Category:Fish_of_Australia comes from a different data source (dbpedia.org)
Example 2 (Specify-Morphbank)
• Get full name and guid from taxon with id http://tata.csres.utexas.edu:8080/specify/data/taxon42947#thing
• AND the rank and kingdom from Morphbank
Result Example 2
• Note that full name and guid come from Specify http://tata.csres.utexas.edu:8080/specify/data/taxon42947
• AND rank and kingdom come from Morphbank http://tata.csres.utexas.edu:8080/morphbank/data/taxa398354
The killer app for Semantic Technology is YOUR life (online)
– Tom Gruber
A little semantics goes a long way
- Jim Hendler
Knowledge is Power
- Jim Hendler
The novel part of the Semantic Web is not the Semantics, but the Web
- Frank van Harmelen
Occupy Your Data
- Tim Finin
RAW DATA NOW
- Tim Berners-Lee
Linked Data is the (Semantic) Web done right
- Tim Berners-Lee
QUESTIONS?