COMS E6125 Web-enHanced Information Management (WHIM)

Prof. Gail Kaiser, Spring 2011 (February 22, 2011)


Page 1: COMS E6125 Web-enHanced Information Management (WHIM)

February 22, 2011 COMS 6125 1

COMS E6125 Web-enHanced Information Management (WHIM)

Prof. Gail Kaiser

Spring 2011

Today’s Topic:

• Introduction to the Semantic Web

• RDF

• Ontologies

Simplicity is Good

• The World Wide Web contains huge amounts of information created by many different organizations, communities and individuals for many different reasons

• Web users can easily access this information by specifying a known URL or using a search engine, and following links to find other related resources

• This simplicity is a key aspect that made the Web so popular

Simplicity is Bad

• The simplicity of the current Web has a price

• It is very easy to get lost, or discover irrelevant or unrelated information

• For instance, if we search for courses taught by a person named “Gail Kaiser”, we might find all kinds of other information

• http://www.google.com/search?hl=&q=course+taught+by+gail+kaiser&sourceid=navclient-ff&rlz=1B3GGGL_enUS253US253&ie=UTF-8

• The problem is that the search engine does not know what “courses” or “taught” means

Machine accessible meaning

(What it’s like to be a machine)

[Figure: a CV whose sections are labeled name, education, work and private]

So what does this mean?

• What’s a “CV”?

• What’s a “name”?

• Etc.

Need semantics

What to do?

• Develop enabling standards and technologies

– to help machines understand more information on the Web

– so that they can support richer discovery, data integration, navigation and automation of tasks

Add Metadata

• Associate semantically rich, descriptive information with any resource

• For instance, add metadata about teaching, so we can search for documents that have metadata specifying “Gail Kaiser” as a “teacher” (or “instructor”)
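The contrast between keyword search and metadata search can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two documents, their URLs and the metadata keys are hypothetical and do not come from the lecture.

```python
# Hypothetical documents: free text plus a small metadata record each.
documents = [
    {"url": "http://example.org/whim",
     "text": "WHIM, Spring 2011",
     "metadata": {"instructor": "Gail Kaiser"}},
    {"url": "http://example.org/news",
     "text": "Gail Kaiser attended a course on pottery",
     "metadata": {}},
]

def keyword_search(docs, words):
    """Return URLs of documents whose text contains every keyword."""
    return [d["url"] for d in docs
            if all(w.lower() in d["text"].lower() for w in words)]

def metadata_search(docs, name, roles=("teacher", "instructor")):
    """Return URLs of documents whose metadata names `name` in a teaching role."""
    return [d["url"] for d in docs
            if any(d["metadata"].get(role) == name for role in roles)]

print(keyword_search(documents, ["course", "gail", "kaiser"]))
print(metadata_search(documents, "Gail Kaiser"))
```

The keyword search matches the unrelated news page, while the metadata search finds only the page that actually records a teaching role.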

The Semantic Web

• Provides a common framework that allows data to be shared and reused across application, enterprise and community boundaries

• Provides URLs not only for documents, but also for people, concepts and relationships

• By giving unique identifiers to the person, the role “teacher” and the concept of “course”, we make very clear who the person is and the corresponding relation between this person and a particular document

What’s the difference?

• Most Web content today is designed for humans to read, not for computer programs to manipulate meaningfully

• Computers can adeptly parse Web pages for layout and routine processing—here a header, there a link to another page—but in general, computers have no reliable way to process the semantics

• The Semantic Web brings structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can carry out sophisticated tasks for users


What’s the difference?

The Semantic Web is not a separate web but an extension of the current web, in which information is given well-defined meaning, better enabling computers and people to work in co-operation.

[Berners-Lee et al., 2001]

Wasn’t that what XML was supposed to do?

• Yes and no

• For the Semantic Web to function, computers must have access to structured collections of information and to sets of inference rules that they can use to conduct automated reasoning

Isn’t that just Knowledge Representation?

• Traditional knowledge representation systems typically have been centralized, requiring everyone to share exactly the same definition of common concepts such as “parent” or “vehicle”

• But central control is stifling, and doesn’t scale

• Which is why centralized hypertext link servers were abandoned for WWW


What about Web Services?

• Web services are computational programs accessed using Web technologies

• They may or may not operate on Web pages as data

• But when they do, the semantics are implied by WSDL descriptions but basically hidden inside the code

• There is no way for an arbitrary Web service or other program to “understand” the semantics of Web pages

Semantic Web Layers (T. Berners-Lee)

[Figure: Semantic Web layer diagram]

Start with XML, not HTML

HTML:

    <H1>WHIM</H1>
    <UL>
      <LI>Instructor: Gail Kaiser
      <LI>Students: Donald Duck
    </UL>

XML:

    <course date="Spring 2011">
      <title>WHIM</title>
      <instructor>Gail Kaiser</instructor>
      <students>Donald Duck</students>
    </course>

XML document = labeled tree

    <course date="...">
      <title>...</title>
      <instructor>...</instructor>
      <name>...</name>
      <http>...</http>
      <students>...</students>
    </course>

=

[Figure: the same document drawn as a tree rooted at course, with nodes title, instructor, students, name and http]

• XML Schema: grammars for describing legal trees and datatypes

• node = label + attr/values + contents
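The "node = label + attr/values + contents" view can be made concrete with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch that reuses the course example from the earlier slide; the helper name `describe` is introduced here for illustration only.

```python
import xml.etree.ElementTree as ET

COURSE_XML = """
<course date="Spring 2011">
  <title>WHIM</title>
  <instructor>Gail Kaiser</instructor>
  <students>Donald Duck</students>
</course>
""".strip()

def describe(node, depth=0):
    """Return one line per node: label, attributes and text contents."""
    text = (node.text or "").strip()
    lines = [f"{'  ' * depth}{node.tag} {node.attrib} {text!r}"]
    for child in node:
        lines.extend(describe(child, depth + 1))
    return lines

root = ET.fromstring(COURSE_XML)
print("\n".join(describe(root)))
```

The parser recovers the tree structure, but nothing in it says what "instructor" means, which is the point the following slides make.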

Why not use XML Tags to represent Semantics?

• Syntax: the structure of your data

• Semantics: the meaning of your data

• Two conditions necessary for interoperability:

– Adopt a common syntax: enables applications to parse the data

– Adopt a means for understanding the semantics: enables applications to use the data

XML and Semantics?

<title> … </title>

• But what does “title” mean?

• If we ask Google, we get (on the 1st page)

– Boxing and martial arts equipment

– Prefix or suffix added to person’s name

– HTML tag

– Women’s underwear

– US Laws

– Home purchase insurance

– Library search

XML Limitations for Semantic Markup

• XML makes no commitment on:

– Domain-specific vocabulary

– Modeling primitives

• Requires pre-arranged agreement on both

• Only feasible for closed collaboration

– agents in a small & stable community

– pages on a small & stable intranet

• Not suited for sharing Web resources

XML machine accessible meaning

[Figure: the CV labels (name, education, work, private) wrapped in XML tags]

Beyond XML

• XML lets everyone create their own tags

• Scripts, or programs, can make use of these tags in sophisticated ways, but the programmer has to know what the page writer uses each tag for

• XML allows users to add structure to their documents but says nothing about what the structures mean

Semantic Web Layers

[Figure: Semantic Web layer diagram]

Add RDF = Resource Description Framework

• Encodes meaning in sets of triples - subject, predicate and object - analogous to the subject, verb and object of an elementary sentence

• Makes assertions that particular things (people, Web pages or whatever) have properties (such as “is a sister of”, “is the author of”) with certain values (another person, another Web page)

• This structure can describe much of the data processed by machines

Example

• Imagine that we want to state the fact that someone named Gail Kaiser wrote a particular Web page

• A straightforward way to state this in English would be in the form of a simple statement such as:

http://www.cs.columbia.edu/~kaiser/index.html has an author whose value is Gail Kaiser


Making Statements about Resources

• We need a way to identify the thing we want to describe (the Web page)

• We need a way to identify a specific property (author) of the thing that we want to describe

• We need a way to identify the thing we want to assign as the value of this property (who the author is), for the thing we want to describe


Making Statements about Resources

• In the example, we used the Web page's URL (Uniform Resource Locator) to identify it - subject

• We used the word “author” to identify the property we want to talk about - predicate

• And the phrase “Gail Kaiser” to identify the thing (a person) we want to say is the value of this property - object


Many Statements can be made

• We could state other properties of this Web page by writing additional English statements of the same general form

http://www.cs.columbia.edu/~kaiser/index.html has a modification-date whose value is January 07, 2011

http://www.cs.columbia.edu/~kaiser/index.html has a size whose value is 18,985 bytes
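The three English statements above map directly onto (subject, predicate, object) triples. A minimal sketch in plain Python follows; tuples stand in for RDF statements here, and the helper `properties_of` is a name chosen for this illustration, not part of any RDF library.

```python
PAGE = "http://www.cs.columbia.edu/~kaiser/index.html"

# Each statement is one (subject, predicate, object) triple.
triples = [
    (PAGE, "author", "Gail Kaiser"),
    (PAGE, "modification-date", "January 07, 2011"),
    (PAGE, "size", "18,985 bytes"),
]

def properties_of(subject, statements):
    """Collect every predicate/object pair stated about one subject."""
    return {p: o for s, p, o in statements if s == subject}

print(properties_of(PAGE, triples))
```

Adding another fact about the page is just one more tuple; no schema has to change.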

But what do these Statements actually mean?

• Subject and object can each be identified by a URL, just as used in a link on a Web page

• The verbs – predicates – can also be identified by URLs, which enables anyone to define a new concept, a new predicate, just by defining a URL for it somewhere on the Web (a “Web resource”)

• The URLs ensure that concepts are not just words in a document, but are tied to a unique definition that everyone can find on the Web

Web Resources

• RDF is a language for representing information about resources on the World Wide Web

• It is particularly intended for representing metadata about Web resources, such as the title, author, modification date and size of a Web page

Generalized Resources

• By generalizing the concept of a “Web resource”, RDF can be used to represent information about things that can be identified on the Web, even when they can't be directly retrieved on the Web

• Examples include the author of the web page

Reconsider Example

http://www.cs.columbia.edu/~kaiser/index.html has an author whose value is Gail Kaiser

Neither the notion of an “author” nor Gail Kaiser can be retrieved from the Web

Thus we need URIs in addition to URLs

Concept Graphs

• RDF is based on the idea of identifying things using URIs

• And describing resources (subjects) in terms of simple properties (verbs or predicates) and property values (objects)

• This enables RDF to represent related concepts as a graph of nodes and arcs representing the resources, their properties and values

Concept Graph Example

• XML syntax

• Chained triples form a graph

[Figure: graph with nodes http://bank.cs.columbia.edu/classes/cs6125/, Kaiser, kaiser+6125@..., W3C and http://www.w3.org/RDF, and arcs labeled site-owner, email and describes]

    <rdf:Description rdf:about="#Kaiser">
      <email>kaiser+6125@...</email>
    </rdf:Description>
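How chained triples form a graph can be sketched by following a path of predicates from node to node. The arcs below are one assumed reading of the figure (the course site has site-owner Kaiser, who has an email address); the function `follow` is a hypothetical helper, and the address is left elided exactly as in the slide.

```python
COURSE = "http://bank.cs.columbia.edu/classes/cs6125/"

# Assumed reading of the figure: the object of one triple is the subject of the next.
triples = [
    (COURSE, "site-owner", "#Kaiser"),
    ("#Kaiser", "email", "kaiser+6125@..."),
]

def follow(start, path, statements):
    """Follow a sequence of predicates from `start`; return the set of nodes reached."""
    frontier = {start}
    for predicate in path:
        frontier = {o for s, p, o in statements
                    if s in frontier and p == predicate}
    return frontier

print(follow(COURSE, ["site-owner", "email"], triples))
```

Because "#Kaiser" appears as an object in one triple and as a subject in another, the two statements chain into a two-step path through the graph.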

Information Exchange

• RDF provides a common framework for expressing this information so it can be exchanged between applications without loss of meaning

• The ability to exchange information between different applications means that the information may be made available to applications other than those for which it was originally created

• Application designers can leverage the availability of common RDF parsers and processing tools

• RDF can be written in XML format, further leveraging XML tools and experience

What is RDF (again)?

• RDF is a data model

– the model is domain-neutral and application-neutral

– the model can be viewed as directed, labeled graphs or as an object-oriented model (object/attribute/value)

• RDF data model is an abstract, conceptual layer independent of XML

• consequently, XML is a transfer syntax for RDF, not a component of RDF

• RDF data might never occur in XML form

RDF Model

• RDF “statements” consist of

– resources (= nodes) = subject

– which have properties = predicate

– which have values (= nodes, strings) = object

RDF Model

[Figure: the resource http://www.w3.org/TR/REC-rdf-syntax/ linked by the property editor to the value “Dave Beckett”]

“http://www.w3.org/TR/REC-rdf-syntax/ has the editor Dave Beckett”

RDF Model Example

[Figure: the resource http://www.w3.org/TR/REC-rdf-syntax/ with three properties: dc:Creator = “Dave Beckett”, dc:Date = “2004-02-10”, dc:Publisher = “W3C”]

Complex Values

• So far, values of properties have been strings

• A graph node (corresponding to a resource) also can be the value of a property

– arbitrarily complex tree and graph structures are possible

– syntactically, values can be embedded (i.e., lexically in-line) or referenced (linked)

Complex Values

[Figure: http://www.w3.org/TR/REC-rdf-syntax/ has dc:Creator pointing to an intermediate node, which has p:Name “Dave Beckett” and p:EMail “mailto:[email protected]”]

Complex Values

• Corresponding triples

{ “http://www.w3.org/TR/REC-rdf-syntax/”, dc:Creator, x }

{ x, p:Name, “Dave Beckett” }

{ x, p:EMail, “[email protected]” }
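The difference between a referenced value (the intermediate node x) and an embedded value can be sketched by expanding the triples into a nested structure. This is an illustration only: the identifier "_:x" and the function `embed` are assumptions made for the example, and the address is kept exactly as it appears in the transcript.

```python
SPEC = "http://www.w3.org/TR/REC-rdf-syntax/"
X = "_:x"  # hypothetical identifier for the intermediate node

triples = [
    (SPEC, "dc:Creator", X),
    (X, "p:Name", "Dave Beckett"),
    (X, "p:EMail", "[email protected]"),
]

def embed(node, statements):
    """Expand a node into nested dictionaries; leaves are plain values.

    Assumes the graph has no cycles, otherwise the recursion would not end.
    """
    props = {p: o for s, p, o in statements if s == node}
    if not props:
        return node
    return {p: embed(o, statements) for p, o in props.items()}

print(embed(SPEC, triples))
```

The flat triple list is the referenced form; the nested dictionary is the embedded form of the same information.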

Containers

• Containers are collections - allow grouping of resources (or literal values)

• It is possible to make statements about the container (as a whole) or about its members individually

• Different types of containers

– bag - unordered collection

– seq - ordered collection (= “sequence”)

– alt - represents alternatives

• It is possible to create collections based on URI patterns – e.g., all files in a particular web site

• Duplicate values are permitted - no mechanism to enforce unique value constraints

Containers

[Figure: http://www.w3.org/TR/REC-rdf-syntax has dc:Creator pointing to a container node of rdf:Type rdf:Seq, whose members are rdf:_1 “Dave Beckett” and rdf:_2 “Brian McBride”]
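The sequence container in the figure can be written as triples whose membership predicates carry the position. The sketch below assumes a hypothetical node identifier "_:creators" for the container and a helper `members`; it deliberately lists the members out of order to show that the order comes from the rdf:_n predicates, not from the order of the statements.

```python
SPEC = "http://www.w3.org/TR/REC-rdf-syntax"
SEQ = "_:creators"  # hypothetical identifier for the container node

triples = [
    (SPEC, "dc:Creator", SEQ),
    (SEQ, "rdf:Type", "rdf:Seq"),
    (SEQ, "rdf:_2", "Brian McBride"),
    (SEQ, "rdf:_1", "Dave Beckett"),
]

def members(container, statements):
    """Return the container's members ordered by their rdf:_n index."""
    prefix = "rdf:_"
    indexed = []
    for s, p, o in statements:
        if s == container and p.startswith(prefix) and p[len(prefix):].isdigit():
            indexed.append((int(p[len(prefix):]), o))
    return [o for _, o in sorted(indexed)]

print(members(SEQ, triples))
```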

Higher-order Statements

• One can make RDF statements about other RDF statements

• Example: “The Library of Congress affiliates Dave Beckett as the author of the RDF Syntax spec”

• Allow us to express beliefs (and other modalities)

• Important for trust models, digital signatures, etc.

• Constitute metadata about metadata

• Represented by modeling RDF in RDF itself

Reification

[Figure: the statement (http://www.w3.org/TR/REC-rdf-syntax, dc:Creator, “Dave Beckett”) enclosed in a dotted box, with “Library of Congress” attached to the box by a dc:Creator arc]

• The dotted box corresponds to the following statements

• { x, rdf:predicate, “dc:creator” }

• { x, rdf:subject, “http://www.w3.org/TR/REC-rdf-syntax” }

• { x, rdf:object, “Dave Beckett” }

• { x, rdf:type, “rdf:statement” }
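Reification can be sketched as a small function that turns one triple into the four statements listed above, after which the reified statement can itself be the subject of a further triple. The identifier "_:x" and the function name `reify` are assumptions for this example; the linking predicate follows the dc:Creator arc shown in the figure.

```python
def reify(statement_id, subject, predicate, obj):
    """Describe one triple with four triples about a statement node."""
    return [
        (statement_id, "rdf:type", "rdf:statement"),
        (statement_id, "rdf:subject", subject),
        (statement_id, "rdf:predicate", predicate),
        (statement_id, "rdf:object", obj),
    ]

triples = reify("_:x", "http://www.w3.org/TR/REC-rdf-syntax",
                "dc:creator", "Dave Beckett")

# A statement about the statement: who makes the claim.
triples.append(("_:x", "dc:Creator", "Library of Congress"))

for t in triples:
    print(t)
```

Note that the original triple is described, not asserted: the list says that the Library of Congress stands behind the claim, without stating the claim itself as a fact.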


Reification

• Reification allows a computer to process an abstraction as if it were any other datum

• RDF is not really second-order

• But it does provide a built-in predicate vocabulary for reification

Reification

[Figure: NYT linked by claims to the statement (pers05, Author-of, ISBN...)]

    <rdf:Description rdf:about="#NYT">
      <claims>
        <rdf:Description rdf:about="#pers05">
          <authorOf>ISBN...</authorOf>
        </rdf:Description>
      </claims>
    </rdf:Description>

Any statement can be an object (graphs can be nested)

RDF Schema

• Defines small vocabulary for RDF:

– Class, subClassOf, type

– Property, subPropertyOf

– domain, range

• Organizes this vocabulary in a typed hierarchy

• Vocabulary can be used to define other vocabularies for your application domain

[Figure: class hierarchy with Student and Researcher as subClassOf Person; a property hasSuperVisor with domain and range arcs; instances Swap and Gail connected by type and hasSuperVisor arcs]

RDF Schema syntax in XML

    <rdf:Description ID="MotorVehicle">
      <rdf:type resource="http://www.w3.org/...#Class"/>
      <rdf:subClassOf rdf:resource="http://www.w3.org/...#Resource"/>
    </rdf:Description>

    <rdf:Description ID="Truck">
      <rdf:type resource="http://www.w3.org/...#Class"/>
      <rdf:subClassOf rdf:resource="#MotorVehicle"/>
    </rdf:Description>

    <rdf:Description ID="registeredTo">
      <rdf:type resource="http://www.w3.org/...#Property"/>
      <rdf:domain rdf:resource="#MotorVehicle"/>
      <rdf:range rdf:resource="#Person"/>
    </rdf:Description>

    <rdf:Description ID="ownedBy">
      <rdf:type resource="http://www.w3.org/...#Property"/>
      <rdf:subPropertyOf rdf:resource="#registeredTo"/>
    </rdf:Description>
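What a program can derive from such a schema can be sketched with a small fixed-point loop over the subClassOf, subPropertyOf, domain and range declarations. This is a simplified illustration, not a full RDF Schema reasoner: the instances truck42 and alice are hypothetical, and the function `infer` is a name chosen for this example.

```python
# Schema facts taken from the XML above, written as triples.
schema = [
    ("Truck", "subClassOf", "MotorVehicle"),
    ("ownedBy", "subPropertyOf", "registeredTo"),
    ("registeredTo", "domain", "MotorVehicle"),
    ("registeredTo", "range", "Person"),
]

# Hypothetical instance data.
data = [("truck42", "type", "Truck"), ("truck42", "ownedBy", "alice")]

def infer(schema, data):
    """Apply the four schema rules repeatedly until no new facts appear."""
    facts = set(data)
    changed = True
    while changed:
        new = set()
        for s, p, o in facts:
            for a, rel, b in schema:
                if rel == "subClassOf" and p == "type" and o == a:
                    new.add((s, "type", b))       # instance of subclass
                elif rel == "subPropertyOf" and p == a:
                    new.add((s, b, o))            # property implies super-property
                elif rel == "domain" and p == a:
                    new.add((s, "type", b))       # subject gets the domain class
                elif rel == "range" and p == a:
                    new.add((o, "type", b))       # object gets the range class
        changed = not new <= facts
        facts |= new
    return facts

for fact in sorted(infer(schema, data)):
    print(fact)
```

From two stated facts the loop derives that truck42 is a MotorVehicle, that it is registeredTo alice, and that alice is a Person.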

Conclusions about RDF

• Next step up from plain XML

– modeling primitives

– possible to define vocabulary

• However:

– no precisely described meaning

– no inference model

• Problematic examples:

– “Columbus believed that the world is flat”

– “Gloria believes that the Web should be delivered on CD-ROM”

Where do we get the precisely defined meaning?

• Two databases may use different identifiers for the same concept, such as zip code vs. postal code

• A program that wants to compare or combine information across the two databases has to know that these two terms mean the same thing

• The program must have a way to discover such common meanings for whatever databases it encounters

• A solution to this problem is provided by collections of information called ontologies

Semantic Web Layers

[Figure: Semantic Web layer diagram]


What is an Ontology?

• In philosophy, an ontology is a theory about the nature of existence, of what types of things exist; ontology as a discipline studies such theories

• Semantic Web researchers (and various other communities) have co-opted the term for their own jargon

• For Semantic Web researchers, an ontology is a document or file that formally defines the relationships among terms

• The most typical kind of ontology for the Web has a taxonomy and a set of inference rules

What is a Taxonomy?

Taxonomy = segmentation, classification and ordering of elements into a classification system according to their relationships to each other

[Figure: tree rooted at Object with children Person, Topic and Document; lower nodes include Researcher, Student, Semantics, Doctoral Student, PhD Student, Ontology and F-Logic]

Taxonomies

• A taxonomy defines classes of objects and relations among them

• For example, an address may be defined as a type of location, and city codes may be defined to apply only to locations

• If city codes must be of type city and cities generally have Web sites, we can discuss the Web site associated with a city code even if no database links a city code directly to a Web site

An Ontology also provides a form of Thesaurus

• Terminology for specific domain

• Graph with primitives, fixed relationships (similar, synonym)

[Figure: the taxonomy tree (Object; Person, Topic, Document; Researcher, Student, Semantics; PhD Student, Doctoral Student; Ontology, F-Logic) with added arcs labeled similar and synonym]

An Ontology also provides a Topic Map

• Topics (nodes), relationships and occurrences (to documents)

• Useful for navigation and visualization

[Figure: the same graph (Object; Person, Topic, Document; Researcher, Student, Semantics; PhD Student, Doctoral Student; Ontology, F-Logic) extended with arcs labeled knows, writes and described_in and with the nodes Affiliation and Tel]

The Taxonomy is Augmented by Inference Rules

[Figure: the topic map (Object; Person, Topic, Document; Researcher, Student; Semantics, Ontology, F-Logic; PhD Student, Doctoral Student; Affiliation, Tel) with arcs knows, writes, described_in, is_about, similar, is_a and instance_of; Swapneel Sheth appears as an instance]

Rules (as far as they can be recovered from the figure):

• P writes D and D is_about T implies P knows T

• a second rule relates described_in and is_about between a document D and a topic T

Inference Rules

• An ontology may express the rule “If a city code is associated with a state code, and an address uses that city code, then that address has the associated state code”

• A program could then deduce, for instance, that a Columbia University address, being in New York City, must be in New York State, which is in the U.S., and therefore should be formatted to U.S. standards

• The computer doesn't truly “understand” any of this information

• But it can now manipulate the terms much more effectively in ways that are useful and meaningful to the human user
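The city-code rule can be sketched as a pair of lookups applied to an address record. This is a toy illustration: the codes "NYC", "NY" and "US", the field names and the function `complete` are all assumptions made for the example, not part of any ontology language.

```python
# Hypothetical ontology facts: which state a city code belongs to,
# and which country a state code belongs to.
city_state = {"NYC": "NY"}
state_country = {"NY": "US"}

def complete(address):
    """Fill in state and country when the rule lets us deduce them."""
    out = dict(address)
    # Rule: an address using a city code has the state code tied to that city.
    if "state" not in out and out.get("city") in city_state:
        out["state"] = city_state[out["city"]]
    # Same pattern one level up: state code to country.
    if "country" not in out and out.get("state") in state_country:
        out["country"] = state_country[out["state"]]
    return out

print(complete({"name": "Columbia University", "city": "NYC"}))
```

The program never "understands" what a city is; it only chains the associations, which is enough to decide that the address should follow U.S. formatting.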


Solution to Terminology Problems

• The meaning of terms or XML tags used on a Web page can be defined by pointers from the page to an ontology

• The same problems as before now arise if I point to an ontology that defines addresses as containing a zip code and you point to one that uses postal code

• This can be resolved if ontologies (or other Web services) provide equivalence relations: one or both of our ontologies may contain the information that my zip code is equivalent to your postal code
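An equivalence relation between two vocabularies can be sketched as a mapping applied before records are compared. The mapping, the sample records and the function `normalize` below are hypothetical and serve only to illustrate the zip code vs. postal code case.

```python
# Hypothetical equivalence asserted by one of the ontologies:
# "postal code" in your vocabulary means "zip code" in mine.
EQUIVALENT = {"postal code": "zip code"}

def normalize(record):
    """Rename every field to its canonical term, leaving other fields alone."""
    return {EQUIVALENT.get(field, field): value for field, value in record.items()}

mine = {"zip code": "10027"}
yours = {"postal code": "10027"}

print(normalize(mine) == normalize(yours))
```

Once both records use the same term, a program can compare or merge them without knowing in advance which vocabulary each source chose.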

Using Ontologies

• Ontologies can be used in a simple fashion to improve the accuracy of Web searches

• The search program can look for only those pages that refer to a precise concept instead of all the ones using ambiguous keywords

• More advanced applications could use ontologies to relate the information on a page to the associated knowledge structures and inference rules

Example

• Suppose you wish to find the Ms. Cook you met at a trade conference last year

• You don't remember her first name, but you remember that she worked for one of your clients and that her brother was a student at your alma mater

Example

• An intelligent search program can sift through all the pages of people whose name is “Cook”

• Sidestep all the pages relating to cooks, cooking, the Cook Islands and so forth

• Find the person named Cook who works for a company that's on your client list

• And follow links to Web pages of their relatives to track down if any are in school at the right place
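The search steps above can be sketched as a query over a handful of triples. Everything in the data is invented for illustration: the person identifiers, the predicates surname, worksFor, sibling and studentAt, and the company and school names are assumptions, not real data.

```python
# Hypothetical facts gathered from Web pages.
triples = [
    ("p1", "surname", "Cook"), ("p1", "worksFor", "Acme"), ("p1", "sibling", "p2"),
    ("p2", "studentAt", "Columbia"),
    ("p3", "surname", "Cook"), ("p3", "worksFor", "Other Corp"),
    ("islands", "name", "Cook Islands"),
]

def objects(subject, predicate):
    """All values of one property of one subject."""
    return {o for s, p, o in triples if s == subject and p == predicate}

def find(surname, clients, school):
    """People with the surname who work for a client and have a sibling at the school."""
    hits = []
    for s, p, o in triples:
        if p == "surname" and o == surname:
            works_for_client = bool(objects(s, "worksFor") & clients)
            sibling_at_school = any(school in objects(r, "studentAt")
                                    for r in objects(s, "sibling"))
            if works_for_client and sibling_at_school:
                hits.append(s)
    return hits

print(find("Cook", {"Acme"}, "Columbia"))
```

Because the query matches on the surname property rather than on the word "Cook", the Cook Islands entry is never considered.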

Agents

• The real power of the Semantic Web will be realized when people create (many) programs that collect Web content from diverse sources, process the information and exchange the results with other programs

• The effectiveness of such software agents will increase exponentially as more machine-readable Web content and automated services (including other agents) become available

Proofs

• The Semantic Web promotes this synergy: even agents that were not expressly designed to work together can transfer data among themselves when the data comes with semantics

• An important facet of agents' functioning will be the exchange of “proofs”

Example

• Suppose Ms. Cook's contact information has been located by an online service, which places her in Baghdad

• You want to check this, so your computer asks the service for a proof of its answer

• An inference engine on your computer verifies this proof, i.e., that this Ms. Cook indeed matches the one you were seeking, and it can show you the relevant Web pages if you still have doubts


Service Discovery

• Many automated Web-based services already exist without semantics

• But current service discovery initiatives attack the problem at a structural or syntactic level, and rely heavily on standardization of a predetermined set of functionality descriptions

Service Discovery

• Other programs such as agents have no way to locate a service that will perform a specific function

• This process can happen only when there is a common language to describe a service in a way that lets other agents “understand” both the function offered and how to take advantage of it

• The consumer and producer agents can reach a shared understanding by exchanging ontologies, which provide the vocabulary needed for discussion

• Semantics also makes it easier to take advantage of a service that only partially matches a request

Non-Web Applications

• The Semantic Web can extend into our physical world

• URIs can point to anything, including physical entities, which means we can use RDF to describe devices such as cell phones and TVs

• Such devices can advertise their functionality (what they can do and how they are controlled), much like software agents

• Semantic descriptions of device capabilities and functionality will let us achieve “home automation” with minimal human intervention

Examples

• When you answer your phone, other sound is automatically turned down

– Instead of having to program each specific appliance, you could program such a function once and for all to cover every local device that advertises having a volume control: the TV, the DVD player, the media players on the laptop, …

• Your Web-enabled microwave oven consults the frozen-food manufacturer's Web site for optimal cooking parameters

OWL Delivers Ontologies that Work on the Web

• What's needed next is a way to develop domain specific vocabularies

• An ontology defines the terms used to describe and represent an area of knowledge

• Ontologies include computer-usable definitions of basic concepts in the domain and the relationships among them, making that knowledge reusable

OWL = Web Ontology Language

• For defining structured, Web-based ontologies enabling richer integration and interoperability of data among descriptive communities

• Uses URIs for naming

• Uses RDF and RDF Schema for description

• Adds vocabulary for describing relations between classes (e.g. disjointness), cardinality (e.g. "exactly one"), characteristics of properties (e.g. symmetry)

Semantic Web Layers

[Figure: Semantic Web layer diagram]

Semantic Web Layers

• The Unicode and URI layers make sure that we use international character sets and provide means for identifying the objects in the Semantic Web

• The XML layer with namespaces and schema definitions makes sure we can integrate the Semantic Web definitions with other XML-based standards

Semantic Web Layers

• RDF and RDF Schema make it possible to make statements about objects with URIs and define vocabularies that can be referred to by URIs

• RDF Schema defines the XML vocabulary for defining classes, subclasses, properties and subproperties

• The Ontology layer (OWL) supports the evolution of vocabularies as it can define relations between the different concepts

Semantic Web Layers

• The top layers, Logic, Proof and Trust, are “under development”

• The Logic layer will enable the writing of rules

• The Proof layer will execute the rules

• The Trust layer together with the Digital Signature layer will provide mechanisms for applications to determine whether to trust the given proof or not

Semantic Web Layers

[Figure: the layer diagram annotated with status labels: RFC, Standard, Standard, Standard, Work in Progress]

Next Assignments

• Full paper due Tuesday March 8th

• Project Proposal due Tuesday March 8th
