semantic web - introduction
TRANSCRIPT
What is the Semantic Web?
www.pleso.net
for Codecamp 2009
Kyiv, Ukraine
2009-01-15, Amsterdam, The Netherlands
Ivan Herman, W3C
ReadWriteWeb
Microformats
This is just a generic slide set. It should be adapted and reviewed, possibly with slides removed, for a specific event. Rule of thumb: on average, a slide a minute
Let's organize a trip to Budapest using the Web!
You try to find a proper flight with
a big, reputable airline, or
the airline of the target country, or
a low-cost one
You have to find a hotel, so you look for
a really cheap accommodation, or
a really luxurious one, or
an intermediate one
oops, that one is no good: the page is in Hungarian, which almost nobody understands, but
this one could work
Of course, you could decide to trust a specialized site
like this one, or
this one
You may want to know something about Budapest; look for some photographs
on flickr
on Google
or you can look at mine
but you can also look at a (social) travel site
What happened here?
You had to consult a large number of sites, all different in style, purpose, possibly language
You had to mentally integrate all that information to achieve your goals
We all know that, sometimes, this is a long and tedious process!
All those pages are only tips of respective icebergs:
the real data is hidden somewhere in databases, XML files, Excel sheets, etc.
you have only access to what the Web page designers allow you to see
Specialized sites (Expedia, TripAdvisor) do a bit more:
they gather and combine data from other sources (usually with the approval of the data owners)
but they still control how you see those sources
But sometimes you want to personalize: access the original data and combine it yourself!
Another example: social sites. I have a list of friends on
Dopplr,
Twine,
LinkedIn,
and, of course, the ubiquitous Facebook
I had to type in and connect with friends again and again for each site independently
This is even worse than before: I feed the icebergs, but I still do not have easy access to the data
What would we like to have?
Use the data on the Web the same way as we do with documents:
be able to link to data (independently of their presentation)
use that data the way I want (present it, mine it, etc)
agents, programs, scripts, etc, should be able to interpret part of that data
But wait! Isn't that what mashup sites are already doing?
A mashup example:
In some ways, yes, and that shows the huge power of what such a Web of data provides
But mashup sites are forced to do very ad-hoc jobs
various data sources expose their data via Web Services
each with a different API, a different logic, different structure
these sites are forced to reinvent the wheel many times because there is no standard way of doing things
Let us put it together
What we need for a Web of Data:
use URIs to publish data, not only full documents
allow the data to link to other data
characterize/classify the data and the links (the terms) to convey some extra meaning
and use standards for all these!
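A minimal sketch of what these requirements amount to, using Python data structures in place of RDF; all URIs here are invented for illustration:

```python
# Data on the Web of Data is a set of (subject, predicate, object)
# triples, where subjects and predicates are URIs -- so any dataset
# can point into any other. All URIs below are made-up examples.
triples = {
    ("http://bookshop-a.example/isbn/0-00-651409-X",   # subject: a book
     "http://bookshop-a.example/terms/author",          # predicate: a term
     "http://bookshop-a.example/person/xyz"),           # object: another URI
    ("http://bookshop-a.example/person/xyz",
     "http://bookshop-a.example/terms/name",
     "Ghosh, Amitav"),                                  # object: a literal
}

# Because the object of the first triple is itself a URI, anyone else
# can publish further triples about it -- that is the "link data to
# other data" requirement in action.
for s, p, o in sorted(triples):
    print(s, p, o)
```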
So What is the Semantic Web?
It is a collection of standard technologies to realize a Web of Data
WWW → GGG (Giant Global Graph)
It is that simple
Of course, the devil is in the details
a common model has to be provided for machines to describe, query, etc, the data and their connections
the classification of the terms can become very complex for specific knowledge areas: this is where ontologies, thesauri, etc, enter the game
but these details are fleshed out by experts as we speak!
Towards a Semantic Web
The current Web represents information using
natural language (English, Hungarian, Chinese, etc.)
graphics, multimedia, page layout
Humans can process this easily
can deduce facts from partial information
can create mental associations
are used to various sensory information
(well, sort of: people with disabilities may have serious problems on the Web with rich media!)
Towards a Semantic Web
Tasks often require combining data on the Web:
hotel and travel information may come from different sites
searches in different digital libraries
etc.
Again, humans combine this information easily
even if different terminologies are used!
However
However: machines are ignorant!
partial information is unusable
difficult to make sense from, e.g., an image
drawing analogies automatically is difficult
difficult to combine information automatically
is [one thing] the same as [another]?
Example: automatic airline reservation
Your automatic airline reservation
knows about your preferences
builds up a knowledge base using your past
can combine the local knowledge with remote services:
airline preferences
dietary requirements
calendaring
etc
It communicates with remote information
(M. Dertouzos: The Unfinished Revolution)
What is needed?
(Some) data should be available for machines for further processing
It should be possible to combine and merge data on a Web scale
Sometimes, data may describe other data
but sometimes the data is to be exchanged by itself, like my calendar or my travel preferences
Machines may also need to reason about that data
The rough structure of data integration
Map the various data onto an abstract data representation
make the data independent of its internal representation
Merge the resulting representations
Start making queries on the whole!
queries not possible on the individual data sets
A simplified bookstore data (dataset A)
1st: export your data as a set of relations
Some notes on exporting the data
Data export does not necessarily mean physical conversion of the data
relations can be generated on-the-fly at query time
via SQL bridges
scraping HTML pages
extracting data from Excel sheets
etc.
One can export part of the data
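The "export as relations" step can be sketched as follows; this is a toy illustration, not a real SQL bridge, and the table URI and column names are invented:

```python
# Hedged sketch: exporting a relational row as triples "on the fly",
# without physically converting the database. The subject URI is minted
# from the table URI plus the row's key; every other column becomes a
# predicate. Table and column names are hypothetical.
def row_to_triples(table_uri, row, key="ID"):
    """Turn one relational row (a dict) into (subject, predicate, object) triples."""
    subject = f"{table_uri}/{row[key]}"
    return [(subject, f"{table_uri}#{col}", val)
            for col, val in row.items() if col != key]

row = {"ID": "0-00-651409-X", "Author": "id_xyz",
       "Title": "The Glass Palace", "Publisher": "id_qpr", "Year": "2000"}
triples = row_to_triples("http://bookshop-a.example/book", row)
for t in triples:
    print(t)
```

The same mapping could be computed lazily at query time, which is why no physical conversion is needed.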
Another bookstore data (dataset F)
2nd: export your second set of data
3rd: start merging your data
3rd: start merging your data (cont.)
3rd: merge identical resources
Start making queries
User of data F can now ask queries like:
give me the title of the original
This information is not in dataset F
but can be retrieved by merging with dataset A!
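The merge-then-query step above can be sketched like this; the URIs and the a:/f: terms are invented stand-ins for the two bookstores' vocabularies:

```python
# Hedged sketch: dataset F (French) links its translation to the
# original via the original's URI; dataset A knows that original's
# title. Once both triple sets are merged (a plain set union), a query
# neither dataset could answer alone becomes a simple two-step lookup.
book = "http://books.example/isbn/0-00-651409-X"
dataset_a = {(book, "a:title", "The Glass Palace"),
             (book, "a:author", "http://books.example/person/xyz")}
dataset_f = {("http://books.example/isbn/2020386682", "f:original", book),
             ("http://books.example/isbn/2020386682", "f:titre", "Le Palais des miroirs")}

merged = dataset_a | dataset_f   # merging is just set union of triples

# "Give me the title of the original" for the French translation:
original = next(o for s, p, o in merged
                if s == "http://books.example/isbn/2020386682" and p == "f:original")
title = next(o for s, p, o in merged if s == original and p == "a:title")
print(title)   # The Glass Palace
```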
However, more can be achieved
We feel that a:author and f:auteur should be the same
But an automatic merge does not know that!
Let us add some extra information to the merged data:
a:author same as f:auteur
both identify a Person
a term that a community may have already defined:
a Person is uniquely identified by his/her name and, say, homepage
it can be used as a category for certain type of resources
3rd revisited: use the extra knowledge
Start making richer queries!
User of dataset F can now query:
give me the home page of the original's author
The information is not in datasets F or A
but was made available by:
merging dataset A and dataset F
adding three simple extra statements as extra glue
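One way to picture what the glue statements buy: if a:author and f:auteur are declared equivalent, a processor can rewrite one predicate into the other before querying. This sketch simulates that with a lookup table; real owl:sameAs reasoning is richer, and the terms here are invented:

```python
# Hedged sketch: the "a:author sameAs f:auteur" glue, simulated by
# rewriting equivalent predicates to one canonical term before querying.
merged = {
    ("f:book1", "f:auteur", "person:xyz"),
    ("person:xyz", "a:homepage", "http://www.amitavghosh.com"),
}
glue = {"f:auteur": "a:author"}          # the extra glue statement

def normalize(triples, glue):
    """Rewrite equivalent predicates to one canonical term."""
    return {(s, glue.get(p, p), o) for s, p, o in triples}

canonical = normalize(merged, glue)
# "Home page of the author": the query uses a: terms, yet it now also
# matches data that was originally stated with f: terms.
author = next(o for s, p, o in canonical if p == "a:author")
home = next(o for s, p, o in canonical if s == author and p == "a:homepage")
print(home)
```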
Combine with different datasets
Via, e.g., the Person, the dataset can be combined with other sources
For example, data in Wikipedia can be extracted using dedicated tools
Merge with Wikipedia data
Merge with Wikipedia data (cont.)
Merge with Wikipedia data (cont.)
It could become even more powerful
We could add extra knowledge to the merged datasets
e.g., a full classification of various types of library data
geographical information
etc.
This is where ontologies, extra rules, etc, come in
ontologies/rule sets can be relatively simple and small, or huge, or anything in between
Even more powerful queries can be asked as a result
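To make "extra knowledge" concrete, here is a toy forward-chaining rule engine applying a single subclass rule to a fixed point; the class names are invented examples of library-data classification, not any real ontology:

```python
# Hedged sketch: one ontology rule -- if X is of type C and C is a
# subclass of D, then X is also of type D -- applied until no new
# facts appear (naive fixed-point iteration).
facts = {("book1", "type", "Novel")}
subclass_of = {("Novel", "Book"), ("Book", "LibraryItem")}

changed = True
while changed:
    changed = False
    for s, p, o in list(facts):          # iterate over a snapshot
        if p == "type":
            for sub, sup in subclass_of:
                if o == sub and (s, "type", sup) not in facts:
                    facts.add((s, "type", sup))
                    changed = True

# A query for all LibraryItems now finds book1, although that fact
# was never stated explicitly -- this is the "even more powerful
# queries" the slide promises.
print(("book1", "type", "LibraryItem") in facts)
```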
Simple SPARQL example
SELECT ?isbn ?price ?currency   # note: not ?x!
WHERE { ?isbn a:price ?x.
        ?x rdf:value ?price.
        ?x p:currency ?currency. }
Simple SPARQL example
Returns:
[[…,33,£], […,50,€], […,60,€], […,78,$]]
SELECT ?isbn ?price ?currency   # note: not ?x!
WHERE { ?isbn a:price ?x.
        ?x rdf:value ?price.
        ?x p:currency ?currency. }
Pattern constraints
SELECT ?isbn ?price ?currency   # note: not ?x!
WHERE { ?isbn a:price ?x.
        ?x rdf:value ?price.
        ?x p:currency ?currency.
        FILTER(?currency = €) }
Returns: [[…,50,€], […,60,€]]
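How such a pattern is evaluated can be sketched without a SPARQL engine: match each triple pattern, bind the shared variable, then apply the FILTER. The data mimics the price/currency structure of the slide's example; the isbn/x identifiers are invented placeholders:

```python
# Hedged sketch of basic graph pattern matching with a FILTER.
# Each book's price is a blank-ish node ?x carrying a value and a
# currency; the query joins the three patterns on ?x.
data = [("isbn1", "a:price", "x1"), ("x1", "rdf:value", 33), ("x1", "p:currency", "£"),
        ("isbn2", "a:price", "x2"), ("x2", "rdf:value", 50), ("x2", "p:currency", "€"),
        ("isbn3", "a:price", "x3"), ("x3", "rdf:value", 60), ("x3", "p:currency", "€"),
        ("isbn4", "a:price", "x4"), ("x4", "rdf:value", 78), ("x4", "p:currency", "$")]

results = [(isbn, value, cur)
           for isbn, p1, x in data if p1 == "a:price"           # ?isbn a:price ?x
           for x2, p2, value in data if p2 == "rdf:value" and x2 == x   # ?x rdf:value ?price
           for x3, p3, cur in data if p3 == "p:currency" and x3 == x    # ?x p:currency ?currency
           if cur == "€"]                                        # the FILTER clause
print(results)   # [('isbn2', 50, '€'), ('isbn3', 60, '€')]
```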
What did we do? (cont)
The network effect
Through URIs we can link any data to any data
The network effect is extended to the (Web) data
Mashups on steroids become possible
Semantic Web technologies stack
Yahoo's SearchMonkey
Search results may be customized via small applications using content metadata in, e.g., RDFa
Users can customize their search pages
Linking Open Data Project
Goal: expose open datasets in RDF
Set RDF links among the data items from different datasets
Billions of triples, millions of links
The important point here is that (1) the data becomes available to the world via a unified format (i.e., RDF), regardless of how it is stored internally, and (2) the various datasets are interlinked, i.e., they are not independent islands. DBpedia is probably the most important 'hub' in the project.
DBpedia: Extracting structured data from Wikipedia
http://en.wikipedia.org/wiki/Kolkata
dbpedia:native_name "Kolkata (Calcutta)"@en ;
dbpedia:altitude 9 ;
dbpedia:populationTotal 4580544 ;
dbpedia:population_metro 14681589 ;
geo:lat "22.56970024108887"^^xsd:float ; ...
Automatic links among open datasets
DBpedia: ⟨Kolkata resource⟩ owl:sameAs ⟨Geonames resource⟩ ; ...
Geonames: ⟨Kolkata resource⟩ wgs84_pos:lat 22.5697222 ;
    wgs84_pos:long 88.3697222 ;
    sws:population 4631392 ; ...
Processors can switch automatically from one to the other
Faviki: social bookmarking with Wiki tagging
Tag bookmarks via Wikipedia terms/DBpedia URIs
Helps disambiguate tag usage
Lots of Tools (not an exhaustive list!)
Categories:
Triple Stores
Inference engines
Converters
Search engines
Middleware
CMS
Semantic Web browsers
Development environments
Semantic Wikis
Some names:
Jena, AllegroGraph, Mulgara, Sesame, flickcurl,
TopBraid Suite, Virtuoso environment, Falcon, Drupal 7, Redland, Pellet,
Disco, Oracle 11g, RacerPro, IODT, Ontobroker, OWLIM, Talis Platform,
RDF Gateway, RDFLib, Open Anzo, DartGrid, Zitgist, Ontotext, Protégé,
Thetus Publisher, SemanticWorks, SWI-Prolog, RDFStore
Application patterns
It is fairly difficult to categorize applications (there are always overlaps)
With this caveat, some of the application patterns:
data integration (i.e., integrating data from major databases)
intelligent (specialized) portals (with improved local search based on vocabularies and ontologies)
content and knowledge organization
knowledge representation, decision support
X2X integration (often combined with Web Services)
data registries, repositories
collaboration tools (e.g., social network applications)
Microformats currently supported
hCalendar - putting event & to-do data on the web (iCalendar)
hCard - electronic business card/self-identification (vCard)
rel-license - to declare licenses for content
Example:
rel-tag Allow authors to assign keywords to stuff.
Example: ...
VoteLinks
XFN - distributed social networks (XHTML Friends Network)
Example: Molly Holzschlag
XOXO - eXtensible Open XHTML Outlines (you are looking at one!)
Microformats coming in the not-so-distant future
adr - for marking up address information
geo - for marking up geographic coordinates (latitude; longitude)
hAtom - format to standardize feeds/syndicating episodic content (e.g. weblog postings)
hAudio
hProduct
hRecipe
hResume - for publishing resumes and CVs
Microformats coming in the not-so-distant future (cont'd)
hReview - publishing reviews of products, events, people, etc.
rel-directory - distributed directory building
rel-enclosure - for indicating attachments (e.g. files) to download and cache
rel-home - indicate a hyperlink to the homepage of the site
rel-payment - indicate a payment mechanism
xFolk
Semantic Web
Machines talking to machines
Making the Web more 'intelligent'
Tim Berners-Lee: computers "analyzing all the data on the Web: the content, links, and transactions between people and computers."
Bottom Up = annotate, metadata, RDF!
Top Down = Simple
Image credit: dullhunk
Top-down:
Leverage existing web information
Apply specific, vertical semantic knowledge
Deliver the results as a consumer-centric web app
Semantic Apps
What is a Semantic App?
- Not necessarily W3C Semantic Web
An app that determines the meaning of text and other data, and then creates connections for users
Data portability and connectibility are keys (ref: Nova Spivack)
Example: Calais. Reuters, the international business and financial news giant, launched an API called Open Calais in Feb '08.
The API does semantic markup of unstructured HTML documents, recognizing people, places, companies, and events. Ref: Reuters Wants The World To Be Tagged; Alex Iskold, ReadWriteWeb, Feb '08
Top 10 Semantic Web Products of 2008
Yahoo! SearchMonkey
Powerset
SearchMonkey allows developers to build applications on top of Yahoo! search, including allowing site owners to share structured data with Yahoo!, using semantic markup (microformats, RDF), standardized XML feeds, APIs (OpenSearch or other web services), and page extraction.
Powerset (see our initial coverage here and here) is a natural language search engine. It's fair to say that Powerset has had a great 2008, having been acquired by Microsoft in July this year.
(acquired by Microsoft in '08)
Top 10 Semantic Web Products of 2008
Open Calais (Thomson Reuters)
Calais - a toolkit of products that enables users to incorporate semantic functionality within a blog, content management system, website, or application.
Dapper MashupAds
Serves up a banner ad that's related to whatever movie the page happens to be about.
Top 10 Semantic Web Products of 2008
BooRah
BooRah is a restaurant review site. BooRah uses semantic analysis and natural language processing to aggregate reviews from food blogs. Because of this, BooRah can recognize praise and criticism in these reviews and rate restaurants accordingly.
BlueOrganizer (AdaptiveBlue)
AdaptiveBlue are the makers of the Firefox plugin BlueOrganizer. The basic idea is that it gives you added information about the webpages you visit and offers useful links based on the subject matter.
Top 10 Semantic Web Products of 2008
Hakia
- a search engine focusing on natural language processing methods to try to deliver 'meaningful' search results. Hakia attempts to analyze the concept of a search query, in particular by doing sentence analysis.
TripIt
TripIt is an app that manages your travel planning.
Top 10 Semantic Web Products of 2008
Zemanta
Zemanta is a blogging tool to add relevant content to your posts. Users can now incorporate their own social networks, RSS feeds, and photos into their blog posts.
UpTake
Semantic search startup UpTake (formerly Kango) aims to make the process of booking travel online easier. It covers hotels and activities (over 400,000 of them) from more than 1,000 different travel sites, plus over 20 million reviews and opinions.
Thanks!
http://www.pleso.net/
[email protected]
Credits:
* 2009-01-15, What is the Semantic Web? (in 15 minutes), Ivan Herman, ISOC New Year's Reception in Amsterdam, the Netherlands
* 2008-09-24, Introduction to the Semantic Web (tutorial), Ivan Herman, 2nd European Semantic Technology Conference in Vienna, Austria
* ReadWriteWeb - Web Technology Trends for 2008 and Beyond
(http://www.readwriteweb.com/), 10 best semantic applications
* Microformats (http://microformats.org/)
Copyright 2009, W3C
Dataset A (relational tables):

Books:
ID                  Author  Title             Publisher  Year
ISBN 0-00-651409-X  id_xyz  The Glass Palace  id_qpr     2000

Authors:
ID      Name           Home Page
id_xyz  Ghosh, Amitav  http://www.amitavghosh.com

Publishers:
ID      Publ. Name       City
id_qpr  Harpers Collins  London

Dataset F (a spreadsheet, columns A-E):
Row  ID                 Titre                  Auteur  Traducteur  Original
2    ISBN 0 2020386682  Le Palais des miroirs  A7      A8          ISBN 0-00-651409-X
6                       Nom
7                       Ghosh, Amitav
8                       Besse, Christianne