Towards The Semantic Web

TAP R.V.Guha, IBM Research Rob McCool, Stanford KSL


Post on 08-Jul-2015


TRANSCRIPT

Page 1: Towards The Semantic Web

TAP

R.V. Guha, IBM Research
Rob McCool, Stanford KSL

Page 2: Towards The Semantic Web

TAP: Context

• Islands of XML from disparate web services
• Example: Tori Amos
• Up to consumer to put these chunks together
• Situation analogous to pre-web hypertext systems and RDBMS today

Page 3: Towards The Semantic Web

TAP Goal

• Create a coherent semantic web from disparate chunks

• Effectively make the web a giant distributed DB

• Why --- Bringing the Internet to programs

Page 4: Towards The Semantic Web

TAP: What We Do

• Inspired by DNS and early web --- simple contracts, everything decentralized

• Protocols to publish & navigate
– A small, simple set of publishing & access guidelines that knit together to create a schematically unified whole

• Bootstrapping: Create comprehensive chunks of the semantic web in a few areas

• Applications: Semantic Search, Internet Wet Lab

Page 5: Towards The Semantic Web

TAP Protocol : GetData

• Simple API to navigate this web
• DNS: GetHostByName(<host>) => ip addr.
• TAP: GetData(<resource>, <property>) => value
– GetData(<Tori Amos>, birthplace) => <Newton, NC>
– GetData(<Newton, NC>, temperature) => 57 F
– GetData(<Newton, NC>, locatedIn) => <North Carolina>
• Publisher exposes data as a graph via GetData
• Consumer uses GetData to navigate graph
• Key tech. issues: Caching, Directories, Names
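The GetData contract on this slide can be illustrated with a minimal Python sketch. It is not TAP's actual implementation: the in-memory dictionary, the function names `get_data` and `follow`, and the list-valued return type are all assumptions made for illustration. Only the three facts in the graph come from the slide.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory graph: (resource, property) -> values.
# The three facts below are the examples given on the slide.
GRAPH: Dict[Tuple[str, str], List[str]] = {
    ("Tori Amos", "birthplace"): ["Newton, NC"],
    ("Newton, NC", "temperature"): ["57 F"],
    ("Newton, NC", "locatedIn"): ["North Carolina"],
}


def get_data(resource: str, prop: str) -> List[str]:
    """Return every value reachable from `resource` along the arc `prop`."""
    return list(GRAPH.get((resource, prop), []))


def follow(resource: str, path: List[str]) -> List[str]:
    """Navigate the graph by chaining GetData calls along a list of arcs."""
    frontier = [resource]
    for prop in path:
        frontier = [value for node in frontier for value in get_data(node, prop)]
    return frontier


if __name__ == "__main__":
    print(get_data("Tori Amos", "birthplace"))
    print(follow("Tori Amos", ["birthplace", "locatedIn"]))
```

Here `get_data("Tori Amos", "birthplace")` returns `["Newton, NC"]`, and chaining `birthplace` then `locatedIn` reaches `["North Carolina"]`. This is the consumer-side navigation the slide describes, reduced to a single local publisher.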

Page 6: Towards The Semantic Web

The Name Problem

We don’t get nice sub-graphs like these, with easy-to-use assembly instructions

[Figure: four source sub-graphs (Geo Almanac, Weather channel, CDNow, People Magazine) that all use the shared node names “Newton, NC” and “Tori Amos”. Other nodes: North Carolina, USA, Crucify, Under The Pink, EMI, Atlantic, Musician, Author, City, Music Album. Edge labels: instanceof, Located in, birthplace, Date Of Birth (“8/22/63”), temperature (62 F), publisher.]

Page 7: Towards The Semantic Web

We get a mess like this

[Figure: the same four source sub-graphs (Geo Almanac, Weather channel, CDNow, People Magazine), but in place of the shared names “Newton, NC” and “Tori Amos” each source uses its own identifier: NTNC, Newton,_NorthCar, USNC0491, 0,9855,109071,00, 328723677.]

Page 8: Towards The Semantic Web

The Name Problem

• Names are crucial in information exchange
– 2 parties cannot exchange information about an object without agreeing on how they are going to refer to it
• The Problem: too many names to keep track of!
– No URN for <Newton, NC> or <Tori Amos>
– Different sites have different names for the same thing!
– URN efforts to date largely failures
– Traditional Approach: Name-Mapping tables

Page 9: Towards The Semantic Web

[Figure: the four source sub-graphs (Geo Almanac, Weather channel, CDNow, People Magazine) with their site-specific identifiers (NTNC, Newton,_NorthCar, USNC0491, 0,9855,109071,00, 328723677). A calling program sits between them and holds name-mapping tables:
328723677 <-> 0,9855,1…
USNC0491 <-> NTNC <-> ...]
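The name-mapping tables held by the calling program can be sketched roughly as follows. The two identifier pairs are taken from the slide's mapping line; treating the truncated “0,9855,1…” as the full “0,9855,109071,00” that appears elsewhere in the diagram is an assumption, and `aliases` and `pairwise_tables` are hypothetical helper names.

```python
from itertools import combinations
from typing import FrozenSet, List

# Groups of identifiers that denote the same thing, per the slide's mapping line.
# ASSUMPTION: the truncated "0,9855,1…" is the full "0,9855,109071,00".
SAME_AS: List[FrozenSet[str]] = [
    frozenset({"328723677", "0,9855,109071,00"}),
    frozenset({"USNC0491", "NTNC"}),
]


def aliases(identifier: str) -> FrozenSet[str]:
    """Return the other identifiers known to denote the same thing."""
    for group in SAME_AS:
        if identifier in group:
            return group - {identifier}
    return frozenset()  # unknown name: the table has nothing to offer


def pairwise_tables(n_sources: int) -> int:
    """Count the source-to-source tables needed if every pair is mapped directly."""
    return len(list(combinations(range(n_sources), 2)))


if __name__ == "__main__":
    print(aliases("NTNC"))
    print(pairwise_tables(4))
```

With direct pairwise mapping, four sources already need six tables, and every new source adds one table per existing source. That growth is one way to read the slide's complaint about having too many names to keep track of.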

Page 10: Towards The Semantic Web

TAP Naming

• Reference by descriptions
– E.g., “A Musician whose firstName is ‘Tori’ and whose lastName is ‘Amos’ and whose …”
– Names are degenerate descriptions
• Amzn:B000002UB2, CDNOW: 328723677
– Description based name negotiation
• Core Insight
– Don’t require globally unique names for everything if we can describe things using a starting vocabulary
– Need a description language, starting vocabulary and negotiation mechanism
– Bootstrapping some shared meaning into more shared meaning
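One way to picture “names are degenerate descriptions” is to model a description as a set of property/value constraints, so that a name becomes a description with a single constraint. The sketch below assumes that representation; the record, the property names, and the helpers `matches` and `name_as_description` are hypothetical, and the slide does not say which entity the CDNOW identifier denotes.

```python
from typing import Dict

# A description is modeled here as property -> required value (an assumption).
Description = Dict[str, str]


def matches(description: Description, node: Dict[str, str]) -> bool:
    """True if the node satisfies every constraint in the description."""
    return all(node.get(prop) == value for prop, value in description.items())


def name_as_description(scheme: str, identifier: str) -> Description:
    """A name is a degenerate description: one site-specific property."""
    return {scheme: identifier}


# Hypothetical record. ASSUMPTION: attaching this CDNOW id to this record is
# for illustration only; the slide gives the id without saying what it names.
record = {
    "type": "Musician",
    "firstName": "Tori",
    "lastName": "Amos",
    "CDNOW": "328723677",
}

by_properties = {"type": "Musician", "firstName": "Tori", "lastName": "Amos"}
by_name = name_as_description("CDNOW", "328723677")

if __name__ == "__main__":
    print(matches(by_properties, record), matches(by_name, record))
```

Both descriptions select the same record, but only the first can be understood by a site that has never seen the CDNOW identifier, provided both sides share the starting vocabulary.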

Page 11: Towards The Semantic Web

The vision: descriptions choreograph the integration

[Figure: the four source sub-graphs (Geo Almanac, Weather channel, CDNow, People Magazine) with their site-specific identifiers (NTNC, Newton,_NorthCar, USNC0491, 0,9855,109071,00, 328723677). The calling program now sends descriptions rather than mapped names; its links to the sources are labeled D1, D2, or both.
D1 = description of Newton, NC
D2 = description of Tori Amos]

Page 12: Towards The Semantic Web

Description based References

• The core protocol: GetData
– GetData(Resource Description, arc-label)
– GetData(<Tori Amos>, birthplace)
– GetData(RDF Description of Tori Amos, birthplace)

• A form of loose coupling:
– Handling Ambiguity, Failure to denote, …

• The core contract:
– Expose your data as a Graph
– Map incoming descriptions to nodes in your graph

• In return, your data is now integrated into the global semantic web
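The publisher side of this contract might look like the sketch below: resolve an incoming description to local nodes, then answer GetData only when exactly one node matches. The node store, the second musician record, and the status strings `"ok"`, `"ambiguous"`, and `"no-denotation"` are all assumptions; the slides name ambiguity and failure to denote as cases to handle but do not say how they are reported.

```python
from typing import Dict, List, Tuple

Description = Dict[str, str]  # property -> required value (assumed representation)

# Hypothetical publisher graph. Local node ids stay private to the publisher.
NODES: Dict[str, Dict[str, str]] = {
    "n1": {"type": "Musician", "firstName": "Tori", "lastName": "Amos",
           "birthplace": "Newton, NC"},
    "n2": {"type": "Musician", "firstName": "Tori", "lastName": "Example"},
}


def resolve(description: Description) -> List[str]:
    """Map an incoming description to the local nodes that satisfy it."""
    return [
        node_id
        for node_id, props in NODES.items()
        if all(props.get(k) == v for k, v in description.items())
    ]


def get_data(description: Description, arc: str) -> Tuple[str, List[str]]:
    """Answer GetData(description, arc-label) with a status and the values."""
    candidates = resolve(description)
    if not candidates:
        return ("no-denotation", [])  # nothing here fits the description
    if len(candidates) > 1:
        return ("ambiguous", [])      # caller should send a richer description
    value = NODES[candidates[0]].get(arc)
    return ("ok", [value] if value is not None else [])


if __name__ == "__main__":
    print(get_data({"firstName": "Tori", "lastName": "Amos"}, "birthplace"))
    print(get_data({"firstName": "Tori"}, "birthplace"))
```

The loose coupling shows up in the return value: a caller never needs the publisher's internal ids, and an ambiguous or failed reference is an ordinary outcome rather than an error.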

Page 13: Towards The Semantic Web

Infrastructure: Kernel Vocabulary

• Provides vocabulary for descriptions

• Purpose is to provide the infrastructure for constructing descriptions with which programs can refer to things

• E.g., “A Musician whose firstName is ‘Tori’ and whose lastName is ‘Amos’ and whose …”

• It doesn’t reside anywhere : it’s a specification

Page 14: Towards The Semantic Web

Applications

• Good infrastructures have waves of applications
– WWW: home pages, portals, ecommerce, …
– DNS: email, telnet, ftp, gopher, … WWW

• Semantic Search
– Adding Semantics to Search
– Crawl, grab, index model of search doesn’t work for dynamic web sites or web applications
– Semantic based Search Augmentation enables search to cover time sensitive data

• Internet Wet Lab

Page 15: Towards The Semantic Web

Semantic Web Application: Semantic Search

Page 16: Towards The Semantic Web

Search Augmentation Example

Page 17: Towards The Semantic Web

How the Semantic Infrastructure gets used in Semantic Search

[Figure: data flow for the query “Yo Yo Ma”. Components: Search Front End, KB, UDDI++, Caching & Buffering, and the sources EBay, CDNow, AllMusic, TicketMaster. The KB turns the query into the description “Musician whose genre is ClassicalMusic, First name is …”. UDDI++ is asked “Who has concert dates? discography? auctions? bio? For musician whose …”. Requests to the sources: “Concert Dates for Musician whose …”, “Bio for …”, “Discography for …”, “Auctions for …”.]
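The flow in this diagram can be sketched as three steps: turn the query into a description, ask a directory which sources hold which properties, and build one request per source. Everything below is hypothetical apart from the query, the genre, and the source names. In particular, which site serves which property is an assumption, and the caching layer is left out.

```python
from typing import Dict, List, Tuple

Description = Dict[str, str]
Request = Tuple[str, str, Description]  # (source, property, description)

# Hypothetical KB: normalized search phrase -> description of what it denotes.
KB: Dict[str, Description] = {
    "yo yo ma": {"type": "Musician", "genre": "ClassicalMusic"},
}

# Hypothetical directory, standing in for the slide's "UDDI++" box.
# ASSUMPTION: the pairing of properties with sites is illustrative only.
DIRECTORY: Dict[str, List[str]] = {
    "concertDates": ["TicketMaster"],
    "discography": ["CDNow"],
    "auctions": ["EBay"],
    "bio": ["AllMusic"],
}


def plan_requests(query: str) -> List[Request]:
    """Build the GetData-style requests a search front end would send out."""
    description = KB.get(query.strip().lower())
    if description is None:
        return []  # the KB does not recognize the query; plain search only
    return [
        (source, prop, description)
        for prop, sources in DIRECTORY.items()
        for source in sources
    ]


if __name__ == "__main__":
    for source, prop, description in plan_requests("Yo Yo Ma"):
        print(source, prop, description)
```

Each request carries the description rather than a site-specific name, so the sources can map it onto their own nodes.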

Page 18: Towards The Semantic Web

TAP KBs for Semantic Search

• Large Knowledge Base of specific musicians, cities, athletes, …
– Currently covers about 20% of search terms
– Built in a largely automated fashion
• Scrapers for free data sources
• Simple noun phrase analysis of news articles
– AP, Reuters, …

• Scrapers for important sites to bootstrap

• KB also helps bootstrap the semantic web

Page 19: Towards The Semantic Web

KB Coverage Today

• Music
– Musicians, instr., styles
• Movies
– Movies, actors, tv-shows
• Authors
– Top authors, classic books,
• Sports
– Athletes, sports, sports teams, equipment
• Autos
– Auto models, motorcycles, …
• Companies
– Fortune 500
• Home Appliances
– Types, brands
• Toys
– Types, brands
• Baby products
– Types, brands
• Places
– Countries, cities, tourist attractions, …
• Consumer electronics
– Audio/Video, Communication
– Game: consoles, titles, …
• Health
– Diseases, Drugs, …

Page 20: Towards The Semantic Web

Semantic Site Search

• Semantic Search useful not just for internet wide search, but also for site search

• Same principles as internet-wide search
• KBs created for searching related individual sites can be shared between sites
• These KBs feed into global semantic web
• Example: Semantic Search for www.w3.org

Page 21: Towards The Semantic Web

TAP Appl: Internet Wet Lab

• In many sciences, more data will be produced in the next 2 years than exists today

• Increasingly, research consists of writing programs that mine this data

• Data is isolated as islands in different labs
• Data from one lab not easily available to programs in another lab
• We want to use TAP to create a single virtual net-wide “database” containing all this experimental data

• Example : Clinical Trial Data

Page 22: Towards The Semantic Web

TAP Organization

• TAP is a multi-organization research effort
– IBM, Stanford KSL, Stanford Logic Group, CMU West,

• KBs, source-code, etc. freely available (via BSD license)

• A number of new projects starting up … places, entertainment, …

• We invite you to join

• URL: http://tap.stanford.edu/

Page 23: Towards The Semantic Web

TAP: Summary

• Small set of guidelines that create a coherent semantic web out of disparate web services

• Potential solution to naming problem
– Relevant to all web services

• Semantic Search & Internet Wet Lab as driving applications

• TAP is a research project
– Lot of fundamental work remains to be done
– Everything freely available. We want you to join!

Page 24: Towards The Semantic Web

Questions

Page 25: Towards The Semantic Web

[Figure: a single graph in which “Tori Amos” and “Newton, NC” each appear once. Sources: Weather channel, People Magazine, CDNow, Geo Almanac, and “Bg KB”. Other nodes: North Carolina, USA, Crucify, Under The Pink, EMI, Atlantic, Musician, Author, City, US State, Country, Music Album. Edge labels: instanceof, Located in, birthplace, Date Of Birth (“8/22/63”), temperature (62 F), publisher.]

Page 26: Towards The Semantic Web

[Figure: four source sub-graphs (Geo Almanac, Weather channel, CDNow, People Magazine) that all use the shared node names “Newton, NC” and “Tori Amos”. Other nodes: North Carolina, USA, Crucify, Under The Pink, EMI, Atlantic, Musician, Author, City, Music Album. Edge labels: instanceof, Located in, birthplace, Date Of Birth (“8/22/63”), temperature (62 F), publisher.]

Page 27: Towards The Semantic Web

TAP : Summary

• Focus is shifting from just storing and retrieving data to exchanging data. XML provides syntax. We need semantics

• We need infrastructure layer for semantics

• Applications drive infrastructures. The driving application for this layer is Semantics based Search & News Augmentation.

Page 28: Towards The Semantic Web

What is an Internet Infrastructure Layer?

• There is a data structure, pieces of which are in different places on the net

• DNS: Hash table of host names to ip addresses accessed via GetHostByName

• WWW : Directed graph of documents accessed via HTTP GET/POST

• Infrastructure layer provides a set of standards & APIs to unify the different pieces so that a client can pretend it is all local

Page 29: Towards The Semantic Web

Application 2 : RTA for news articles

Page 30: Towards The Semantic Web

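RTA for News Articles

[Figure: data flow for a news article. Components: Search/Syndication Front End, Text analysis, Knowledge Base, Directory, and the sources EBay, AOL Shopping, AllPosters, MLB.com. Entities found in the article: SportsTeam_TexasRangers, AthleteRodriguez_Alex, … Question put to the Directory: “Whose team schedule? posters? auctions? bio?”. Requests to the sources: “Team Schedule for team whose title …”, “Poster for …”, “Videos for …”, “Auctions for …”.]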