semantic web - introduction
TRANSCRIPT
What is the Semantic Web?
www.pleso.net
for Codecamp 2009
Kyiv, Ukraine
2009-01-15, Amsterdam, The Netherlands
Ivan Herman, W3C
ReadWriteWeb
Microformats
This is just a generic slide set. It should be adapted and reviewed, possibly with slides removed, for a specific event. Rule of thumb: on average, a slide a minute
Let's organize a trip to Budapest using the Web!
You try to find a proper flight with
a big, reputable airline, or
the airline of the target country, or
a low-cost one
You have to find a hotel, so you look for
a really cheap accommodation, or
a really luxurious one, or
an intermediate one
oops, that one is no good: the page is in Hungarian, which almost nobody understands, but
this one could work
Of course, you could decide to trust a specialized site
like this one, or
this one
You may want to know something about Budapest; look for some photographs
on flickr
on Google
or you can look at mine
but you can also look at a (social) travel site
What happened here?
You had to consult a large number of sites, all different in style, purpose, possibly language
You had to mentally integrate all that information to achieve your goals
We all know that, sometimes, this is a long and tedious process!
All those pages are only tips of respective icebergs:
the real data is hidden somewhere in databases, XML files, Excel sheets, etc.
you have only access to what the Web page designers allow you to see
Specialized sites (Expedia, TripAdvisor) do a bit more:
they gather and combine data from other sources (usually with the approval of the data owners)
but they still control how you see those sources
But sometimes you want to personalize: access the original data and combine it yourself!
Another example: social sites. I have a list of friends on
Dopplr,
Twine,
LinkedIn,
and, of course, the ubiquitous Facebook
I had to type in and connect with friends again and again for each site independently
This is even worse than before: I feed the icebergs, but I still do not have easy access to the data
What would we like to have?
Use the data on the Web the same way as we do with documents:
be able to link to data (independently of their presentation)
use that data the way I want (present it, mine it, etc)
agents, programs, scripts, etc, should be able to interpret part of that data
But wait! Isn't that what mashup sites are already doing?
A mashup example:
In some ways, yes, and that shows the huge power of what such a Web of data provides
But mashup sites are forced to do very ad-hoc jobs
various data sources expose their data via Web Services
each with a different API, a different logic, different structure
these sites are forced to reinvent the wheel many times because there is no standard way of doing things
Let us put it together
What we need for a Web of Data:
use URIs to publish data, not only full documents
allow the data to link to other data
characterize/classify the data and the links (the terms) to convey some extra meaning
and use standards for all these!
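A minimal sketch of what these requirements amount to, using Python data structures in place of RDF; all URIs here are invented for illustration:

```python
# Data on the Web of Data is a set of (subject, predicate, object)
# triples, where subjects and predicates are URIs -- so any dataset
# can point into any other. All URIs below are made-up examples.
triples = {
    ("http://bookshop-a.example/isbn/0-00-651409-X",   # subject: a book
     "http://bookshop-a.example/terms/author",          # predicate: a term
     "http://bookshop-a.example/person/xyz"),           # object: another URI
    ("http://bookshop-a.example/person/xyz",
     "http://bookshop-a.example/terms/name",
     "Ghosh, Amitav"),                                  # object: a literal
}

# Because the object of the first triple is itself a URI, anyone else
# can publish further triples about it -- that is the "link data to
# other data" requirement in action.
for s, p, o in sorted(triples):
    print(s, p, o)
```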
So What is the Semantic Web?
It is a collection of standard technologies to realize a Web of Data
WWW → GGG (Giant Global Graph)
It is that simple
Of course, the devil is in the details
a common model has to be provided for machines to describe, query, etc, the data and their connections
the classification of the terms can become very complex for specific knowledge areas: this is where ontologies, thesauri, etc, enter the game
but these details are fleshed out by experts as we speak!
Towards a Semantic Web
The current Web represents information using
natural language (English, Hungarian, Chinese, etc.)
graphics, multimedia, page layout
Humans can process this easily
can deduce facts from partial information
can create mental associations
are used to various sensory information
(well, sort of: people with disabilities may have serious problems on the Web with rich media!)
Towards a Semantic Web
Tasks often require combining data on the Web:
hotel and travel information may come from different sites
searches in different digital libraries
etc.
Again, humans combine this information easily
even if different terminologies are used!
However
However: machines are ignorant!
partial information is unusable
difficult to make sense from, e.g., an image
drawing analogies automatically is difficult
difficult to combine information automatically
is [one thing] the same as [another]?
Example: automatic airline reservation
Your automatic airline reservation
knows about your preferences
builds up a knowledge base using your past
can combine the local knowledge with remote services:
airline preferences
dietary requirements
calendaring
etc
It communicates with remote information
(M. Dertouzos: The Unfinished Revolution)
What is needed?
(Some) data should be available for machines for further processing
It should be possible to combine and merge data on a Web scale
Sometimes, data may describe other data
but sometimes the data is to be exchanged by itself, like my calendar or my travel preferences
Machines may also need to reason about that data
The rough structure of data integration
Map the various data onto an abstract data representation
make the data independent of its internal representation
Merge the resulting representations
Start making queries on the whole!
queries not possible on the individual data sets
A simplified bookstore data (dataset A)
1st: export your data as a set of relations
Some notes on exporting the data
Data export does not necessarily mean physical conversion of the data
relations can be generated on-the-fly at query time
via SQL bridges
scraping HTML pages
extracting data from Excel sheets
etc.
One can export part of the data
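The "export as relations" step can be sketched as follows; this is a toy illustration, not a real SQL bridge, and the table URI and column names are invented:

```python
# Hedged sketch: exporting a relational row as triples "on the fly",
# without physically converting the database. The subject URI is minted
# from the table URI plus the row's key; every other column becomes a
# predicate. Table and column names are hypothetical.
def row_to_triples(table_uri, row, key="ID"):
    """Turn one relational row (a dict) into (subject, predicate, object) triples."""
    subject = f"{table_uri}/{row[key]}"
    return [(subject, f"{table_uri}#{col}", val)
            for col, val in row.items() if col != key]

row = {"ID": "0-00-651409-X", "Author": "id_xyz",
       "Title": "The Glass Palace", "Publisher": "id_qpr", "Year": "2000"}
triples = row_to_triples("http://bookshop-a.example/book", row)
for t in triples:
    print(t)
```

The same mapping could be computed lazily at query time, which is why no physical conversion is needed.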
Another bookstore data (dataset F)
2nd: export your second set of data
3rd: start merging your data
3rd: start merging your data (cont.)
3rd: merge identical resources
Start making queries
User of data F can now ask queries like:
give me the title of the original
This information is not in dataset F
but can be retrieved by merging with dataset A!
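The merge-then-query step above can be sketched like this; the URIs and the a:/f: terms are invented stand-ins for the two bookstores' vocabularies:

```python
# Hedged sketch: dataset F (French) links its translation to the
# original via the original's URI; dataset A knows that original's
# title. Once both triple sets are merged (a plain set union), a query
# neither dataset could answer alone becomes a simple two-step lookup.
book = "http://books.example/isbn/0-00-651409-X"
dataset_a = {(book, "a:title", "The Glass Palace"),
             (book, "a:author", "http://books.example/person/xyz")}
dataset_f = {("http://books.example/isbn/2020386682", "f:original", book),
             ("http://books.example/isbn/2020386682", "f:titre", "Le Palais des miroirs")}

merged = dataset_a | dataset_f   # merging is just set union of triples

# "Give me the title of the original" for the French translation:
original = next(o for s, p, o in merged
                if s == "http://books.example/isbn/2020386682" and p == "f:original")
title = next(o for s, p, o in merged if s == original and p == "a:title")
print(title)   # The Glass Palace
```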
However, more can be achieved
We feel that a:author and f:auteur should be the same
But an automatic merge does not know that!
Let us add some extra information to the merged data:
a:author same as f:auteur
both identify a Person
a term that a community may have already defined:
a Person is uniquely identified by his/her name and, say, homepage
it can be used as a category for certain type of resources
3rd revisited: use the extra knowledge
Start making richer queries!
User of dataset F can now query:
give me the home page of the original's author
The information is not in datasets F or A
but was made available by:
merging dataset A and dataset F
adding three simple extra statements as extra glue
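One way to picture what the glue statements buy: if a:author and f:auteur are declared equivalent, a processor can rewrite one predicate into the other before querying. This sketch simulates that with a lookup table; real owl:sameAs reasoning is richer, and the terms here are invented:

```python
# Hedged sketch: the "a:author sameAs f:auteur" glue, simulated by
# rewriting equivalent predicates to one canonical term before querying.
merged = {
    ("f:book1", "f:auteur", "person:xyz"),
    ("person:xyz", "a:homepage", "http://www.amitavghosh.com"),
}
glue = {"f:auteur": "a:author"}          # the extra glue statement

def normalize(triples, glue):
    """Rewrite equivalent predicates to one canonical term."""
    return {(s, glue.get(p, p), o) for s, p, o in triples}

canonical = normalize(merged, glue)
# "Home page of the author": the query uses a: terms, yet it now also
# matches data that was originally stated with f: terms.
author = next(o for s, p, o in canonical if p == "a:author")
home = next(o for s, p, o in canonical if s == author and p == "a:homepage")
print(home)
```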
Combine with different datasets
Via, e.g., the Person, the dataset can be combined with other sources
For example, data in Wikipedia can be extracted using dedicated tools
Merge with Wikipedia data
Merge with Wikipedia data (cont.)
Merge with Wikipedia data (cont.)
It could become even more powerful
We could add extra knowledge to the merged datasets
e.g., a full classification of various types of library data
geographical information
etc.
This is where ontologies, extra rules, etc, come in
ontologies/rule sets can be relatively simple and small, or huge, or anything in between
Even more powerful queries can be asked as a result
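To make "extra knowledge" concrete, here is a toy forward-chaining rule engine applying a single subclass rule to a fixed point; the class names are invented examples of library-data classification, not any real ontology:

```python
# Hedged sketch: one ontology rule -- if X is of type C and C is a
# subclass of D, then X is also of type D -- applied until no new
# facts appear (naive fixed-point iteration).
facts = {("book1", "type", "Novel")}
subclass_of = {("Novel", "Book"), ("Book", "LibraryItem")}

changed = True
while changed:
    changed = False
    for s, p, o in list(facts):          # iterate over a snapshot
        if p == "type":
            for sub, sup in subclass_of:
                if o == sub and (s, "type", sup) not in facts:
                    facts.add((s, "type", sup))
                    changed = True

# A query for all LibraryItems now finds book1, although that fact
# was never stated explicitly -- this is the "even more powerful
# queries" the slide promises.
print(("book1", "type", "LibraryItem") in facts)
```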
Simple SPARQL example
SELECT ?isbn ?price ?currency   # note: not ?x!
WHERE { ?isbn a:price ?x.
        ?x rdf:value ?price.
        ?x p:currency ?currency. }
Simple SPARQL example
Returns:
[[…,33,£], […,50,€], […,60,€], […,78,$]]
SELECT ?isbn ?price ?currency   # note: not ?x!
WHERE { ?isbn a:price ?x.
        ?x rdf:value ?price.
        ?x p:currency ?currency. }
Pattern constraints
SELECT ?isbn ?price ?currency   # note: not ?x!
WHERE { ?isbn a:price ?x.
        ?x rdf:value ?price.
        ?x p:currency ?currency.
        FILTER(?currency = €) }
Returns: [[…,50,€], […,60,€]]
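How such a pattern is evaluated can be sketched without a SPARQL engine: match each triple pattern, bind the shared variable, then apply the FILTER. The data mimics the price/currency structure of the slide's example; the isbn/x identifiers are invented placeholders:

```python
# Hedged sketch of basic graph pattern matching with a FILTER.
# Each book's price is a blank-ish node ?x carrying a value and a
# currency; the query joins the three patterns on ?x.
data = [("isbn1", "a:price", "x1"), ("x1", "rdf:value", 33), ("x1", "p:currency", "£"),
        ("isbn2", "a:price", "x2"), ("x2", "rdf:value", 50), ("x2", "p:currency", "€"),
        ("isbn3", "a:price", "x3"), ("x3", "rdf:value", 60), ("x3", "p:currency", "€"),
        ("isbn4", "a:price", "x4"), ("x4", "rdf:value", 78), ("x4", "p:currency", "$")]

results = [(isbn, value, cur)
           for isbn, p1, x in data if p1 == "a:price"           # ?isbn a:price ?x
           for x2, p2, value in data if p2 == "rdf:value" and x2 == x   # ?x rdf:value ?price
           for x3, p3, cur in data if p3 == "p:currency" and x3 == x    # ?x p:currency ?currency
           if cur == "€"]                                        # the FILTER clause
print(results)   # [('isbn2', 50, '€'), ('isbn3', 60, '€')]
```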
What did we do? (cont)
The network effect
Through URIs we can link any data to any data
The network effect is extended to the (Web) data
Mashups on steroids become possible
Semantic Web technologies stack
Yahoo's SearchMonkey
Search results may be customized via small applications using content metadata in, e.g., RDFa
Users can customize their search pages
Linking Open Data Project
Goal: expose open datasets in RDF
Set RDF links among the data items from different datasets
Billions of triples, millions of links
The important point here is that (1) the data becomes available to the world via a unified format (i.e., RDF), regardless of how it is stored internally, and (2) the various datasets are interlinked, i.e., they are not independent islands. DBpedia is probably the most important 'hub' in the project.
DBpedia: Extracting structured data from Wikipedia
http://en.wikipedia.org/wiki/Kolkata
dbpedia:native_name "Kolkata (Calcutta)"@en ;
dbpedia:altitude 9 ;
dbpedia:populationTotal 4580544 ;
dbpedia:population_metro 14681589 ;
geo:lat "22.56970024108887"^^xsd:float ; ...
Automatic links among open datasets
DBpedia: ⟨Kolkata resource⟩ owl:sameAs ⟨Geonames resource⟩ ; ...
Geonames: ⟨Kolkata resource⟩ wgs84_pos:lat 22.5697222 ;
    wgs84_pos:long 88.3697222 ;
    sws:population 4631392 ; ...
Processors can switch automatically from one to the other
Faviki: social bookmarking with Wiki tagging
Tag bookmarks via Wikipedia terms/DBpedia URIs
Helps disambiguate tag usage
Lots of Tools (not an exhaustive list!)
Categories:
Triple Stores
Inference engines
Converters
Search engines
Middleware
CMS
Semantic Web browsers
Development environments
Semantic Wikis
Some names:
Jena, AllegroGraph, Mulgara, Sesame, flickcurl,
TopBraid Suite, Virtuoso environment, Falcon, Drupal 7, Redland, Pellet,
Disco, Oracle 11g, RacerPro, IODT, Ontobroker, OWLIM, Talis Platform,
RDF Gateway, RDFLib, Open Anzo, DartGrid, Zitgist, Ontotext, Protégé,
Thetus Publisher, SemanticWorks, SWI-Prolog, RDFStore
Application patterns
It is fairly difficult to categorize applications (there are always overlaps)
With this caveat, some of the application patterns:
data integration (i.e., integrating data from major databases)
intelligent (specialized) portals (with improved local search based on vocabularies and ontologies)
content and knowledge organization
knowledge representation, decision support
X2X integration (often combined with Web Services)
data registries, repositories
collaboration tools (e.g., social network applications)
Microformats currently supported
hCalendar - putting event & to-do data on the web (iCalendar)
hCard - electronic business card/self-identification (vCard)
rel-license - to declare licenses for content
Example:
rel-tag Allow authors to assign keywords to stuff.
Example: ...
VoteLinks
XFN - distributed social networks (XHTML Friends Network)
Example: Molly Holzschlag
XOXO - eXtensible Open XHTML Outlines (you are looking at one!)
Microformats coming in the not-so-distant future
adr - for marking up address information
geo - for marking up geographic coordinates (latitude; longitude)
hAtom - format to standardize feeds/syndicating episodic content (e.g. weblog postings)
hAudio
hProduct
hRecipe
hResume - for publishing resumes and CVs
Microformats coming in the not-so-distant future (cont'd)
hReview - publishing reviews of products, events, people, etc.
rel-directory - distributed directory building
rel-enclosure - for indicating attachments (e.g. files) to download and cache
rel-home - indicate a hyperlink to the homepage of the site
rel-payment - indicate a payment mechanism
xFolk
Semantic Web
Machines talking to machines
Making the Web more 'intelligent'
Tim Berners-Lee: computers "analyzing all the data on the Web: the content, links, and transactions between people and computers."
Bottom Up = annotate, metadata, RDF!
Top Down = Simple
Image credit: dullhunk
Top-down:
Leverage existing web information
Apply specific, vertical semantic knowledge
Deliver the results as a consumer-centric web app
Semantic Apps
What is a Semantic App?
- Not necessarily W3C Semantic Web
An app that determines the meaning of text and other data, and then creates connections for users
Data portability and connectibility are keys (ref: Nova Spivack)
Example: Calais. Reuters, the international business and financial news giant, launched an API called Open Calais in Feb '08.
The API does semantic markup of unstructured HTML documents, recognizing people, places, companies, and events. Ref: Reuters Wants The World To Be Tagged; Alex Iskold, ReadWriteWeb, Feb '08
Top 10 Semantic Web Products of 2008
Yahoo! SearchMonkey
Powerset
SearchMonkey allows developers to build applications on top of Yahoo! search, including allowing site owners to share structured data with Yahoo!, using semantic markup (microformats, RDF), standardized XML feeds, APIs (OpenSearch or other web services), and page extraction.
Powerset (see our initial coverage here and here) is a natural language search engine. It's fair to say that Powerset has had a great 2008, having been acquired by Microsoft in July this year.
(acquired by Microsoft in '08)
Top 10 Semantic Web Products of 2008
Open Calais (Thomson Reuters)
Calais - a toolkit of products that enables users to incorporate semantic functionality within a blog, content management system, website, or application.
Dapper MashupAds
Serves up a banner ad that's related to whatever movie the page happens to be about.
Top 10 Semantic Web Products of 2008
BooRah
BooRah is a restaurant review site. BooRah uses semantic analysis and natural language processing to aggregate reviews from food blogs. Because of this, BooRah can recognize praise and criticism in these reviews and rate restaurants accordingly.
BlueOrganizer (AdaptiveBlue)
AdaptiveBlue are the makers of the Firefox plugin BlueOrganizer. The basic idea is that it gives you added information about the webpages you visit and offers useful links based on the subject matter.
Top 10 Semantic Web Products of 2008
Hakia
- a search engine focusing on natural language processing methods to try to deliver 'meaningful' search results. Hakia attempts to analyze the concept of a search query, in particular by doing sentence analysis.
TripIt
TripIt is an app that manages your travel planning.
Top 10 Semantic Web Products of 2008
Zemanta
Zemanta is a blogging tool to add relevant content to your posts. Users can now incorporate their own social networks, RSS feeds, and photos into their blog posts.
UpTake
Semantic search startup UpTake (formerly Kango) aims to make the process of booking travel online easier. It covers hotels and activities (over 400,000 of them) from more than 1,000 different travel sites, plus over 20 million reviews and opinions.
Thanks!
http://www.pleso.net/
[email protected]
Credits:
* 2009-01-15, What is the Semantic Web? (in 15 minutes), Ivan Herman, ISOC New Year's Reception in Amsterdam, the Netherlands
* 2008-09-24, Introduction to the Semantic Web (tutorial), Ivan Herman, 2nd European Semantic Technology Conference in Vienna, Austria
* ReadWriteWeb - Web Technology Trends for 2008 and Beyond
(http://www.readwriteweb.com/), 10 best semantic applications
* Microformats (http://microformats.org/)
Copyright 2009, W3C
Dataset A (relational tables):

Books:
ID                  Author  Title             Publisher  Year
ISBN 0-00-651409-X  id_xyz  The Glass Palace  id_qpr     2000

Authors:
ID      Name           Home Page
id_xyz  Ghosh, Amitav  http://www.amitavghosh.com

Publishers:
ID      Publ. Name       City
id_qpr  Harpers Collins  London

Dataset F (a spreadsheet, columns A-E):
Row  ID                 Titre                  Auteur  Traducteur  Original
2    ISBN 0 2020386682  Le Palais des miroirs  A7      A8          ISBN 0-00-651409-X
6                       Nom
7                       Ghosh, Amitav
8                       Besse, Christianne