COMS E6125 Web-enHanced Information Management (WHIM)


February 22, 2011 COMS 6125 1

COMS E6125 Web-enHanced Information Management (WHIM)

Prof. Gail Kaiser

Spring 2011


Today’s Topic:

• Introduction to the Semantic Web

• RDF

• Ontologies


Simplicity is Good

• The World Wide Web contains huge amounts of information created by many different organizations, communities and individuals for many different reasons

• Web users can easily access this information by specifying a known URL or using a search engine, and following links to find other related resources

• This simplicity is a key aspect that made the Web so popular


Simplicity is Bad

• The simplicity of the current Web has a price

• It is very easy to get lost, or discover irrelevant or unrelated information

• For instance, if we search for courses taught by a person named “Gail Kaiser”, we might find all kinds of other information

• http://www.google.com/search?hl=&q=course+taught+by+gail+kaiser&sourceid=navclient-ff&rlz=1B3GGGL_enUS253US253&ie=UTF-8

• The problem is that the search engine does not know what “courses” or “taught” means


Machine accessible meaning

(What it’s like to be a machine)

[Figure: a CV page with the fields name, education, work and private]


So what does this mean?

• What’s a “CV”?

• What’s a “name”?

• Etc.

Need semantics


What to do?

• Develop enabling standards and technologies
– to help machines understand more information on the Web
– so that they can support richer discovery, data integration, navigation and automation of tasks


Add Metadata

• Associate semantically rich, descriptive information with any resource

• For instance, add metadata about teaching, so we can search for documents that have metadata specifying “Gail Kaiser” as a “teacher” (or “instructor”)


The Semantic Web

• Provides a common framework that allows data to be shared and reused across application, enterprise and community boundaries

• Provides URLs not only for documents, but also for people, concepts and relationships

• By giving unique identifiers to the person, the role “teacher” and the concept of “course”, we make very clear who the person is and the corresponding relation between this person and a particular document


What’s the difference?

• Most Web content today is designed for humans to read, not for computer programs to manipulate meaningfully

• Computers can adeptly parse Web pages for layout and routine processing—here a header, there a link to another page—but in general, computers have no reliable way to process the semantics

• The Semantic Web brings structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can carry out sophisticated tasks for users


What’s the difference?

The Semantic Web is not a separate web but an extension of the current web, in which information is given well-defined meaning, better enabling computers and people to work in co-operation.

[Berners-Lee et al., 2001]


Wasn’t that what XML was supposed to do?

• Yes and no

• For the Semantic Web to function, computers must have access to structured collections of information and to sets of inference rules that they can use to conduct automated reasoning


Isn’t that just Knowledge Representation?

• Traditional knowledge representation systems typically have been centralized, requiring everyone to share exactly the same definition of common concepts such as “parent” or “vehicle”

• But central control is stifling, and doesn’t scale

• Which is why centralized hypertext link servers were abandoned for WWW


What about Web Services?

• Web services are computational programs accessed using Web technologies

• They may or may not operate on Web pages as data

• When they do, the semantics may be implied by WSDL descriptions, but are basically hidden inside the code

• There is no way for an arbitrary Web service or other program to “understand” the semantics of Web pages

Semantic Web Layers (T. Berners-Lee)


Start with XML, not HTML

HTML:

<H1>WHIM</H1>
<UL>
  <LI>Instructor: Gail Kaiser
  <LI>Students: Donald Duck
</UL>

XML:

<course date="Spring 2011">
  <title>WHIM</title>
  <instructor>Gail Kaiser</instructor>
  <students>Donald Duck</students>
</course>


XML document = labeled tree

[Figure: tree with root course and children title, instructor and students; instructor has the children name and http]

<course date="...">
  <title>...</title>
  <instructor>
    <name>...</name>
    <http>...</http>
  </instructor>
  <students>...</students>
</course>

• XML Schema: grammars for describing legal trees and datatypes

• node = label + attr/values + contents
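To make the labeled-tree view concrete, the sketch below uses Python's standard xml.etree.ElementTree module to walk the WHIM course document from the previous slide and list each node's label, attributes and text content. The helper name describe is our own choice for illustration.

```python
import xml.etree.ElementTree as ET

# The WHIM course document from the "Start with XML, not HTML" slide
COURSE_XML = """
<course date="Spring 2011">
  <title>WHIM</title>
  <instructor>Gail Kaiser</instructor>
  <students>Donald Duck</students>
</course>
"""


def describe(node, depth=0):
    """Return (depth, label, attributes, text) for a node and its descendants, depth-first."""
    text = (node.text or "").strip()
    rows = [(depth, node.tag, dict(node.attrib), text)]
    for child in node:
        rows.extend(describe(child, depth + 1))
    return rows


root = ET.fromstring(COURSE_XML.strip())
for depth, label, attrs, text in describe(root):
    # Indent by depth so the tree shape is visible
    print("  " * depth + f"{label} {attrs} {text!r}")
```

Every node comes back as a label plus attribute/value pairs plus contents, but nothing in the output says what instructor means: the program sees only a string, which is the limitation the next slides discuss.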


Why not use XML Tags to represent Semantics?

• Syntax: the structure of your data

• Semantics: the meaning of your data

• Two conditions necessary for interoperability:
– Adopt a common syntax: enables applications to parse the data
– Adopt a means for understanding the semantics: enables applications to use the data


XML and Semantics?

<title> … </title>

• But what does “title” mean?

• If we ask Google, we get (on the 1st page):
– Boxing and martial arts equipment
– Prefix or suffix added to person’s name
– HTML tag
– Women’s underwear
– US Laws
– Home purchase insurance
– Library search


XML Limitations for Semantic Markup

• XML makes no commitment on:
– Domain-specific vocabulary
– Modeling primitives

• Requires pre-arranged agreement on both vocabulary and modeling primitives

• Only feasible for closed collaboration:
– agents in a small & stable community
– pages on a small & stable intranet

• Not suited for sharing Web resources


XML machine accessible meaning

[Figure: the same CV, with the fields CV, name, education, work and private each marked up with XML tags]


Beyond XML

• XML lets everyone create their own tags

• Scripts, or programs, can make use of these tags in sophisticated ways, but the programmer has to know what the page writer uses each tag for

• XML allows users to add structure to their documents but says nothing about what the structures mean


Semantic Web Layers


Add RDF = Resource Description Framework

• Encodes meaning in sets of triples (subject, predicate and object), analogous to the subject, verb and object of an elementary sentence

• Makes assertions that particular things (people, Web pages or whatever) have properties (such as “is a sister of”, “is the author of”) with certain values (another person, another Web page)

• This structure can describe much of the data processed by machines


Example

• Imagine that we want to state the fact that someone named Gail Kaiser wrote a particular Web page

• A straightforward way to state this in English would be in the form of a simple statement such as:

http://www.cs.columbia.edu/~kaiser/index.html has an author whose value is Gail Kaiser


Making Statements about Resources

• We need a way to identify the thing we want to describe (the Web page)

• We need a way to identify a specific property (author) of the thing that we want to describe

• We need a way to identify the thing we want to assign as the value of this property (who the author is), for the thing we want to describe


Making Statements about Resources

• In the example, we used the Web page's URL (Uniform Resource Locator) to identify it - subject

• We used the word “author” to identify the property we want to talk about - predicate

• And the phrase “Gail Kaiser” to identify the thing (a person) we want to say is the value of this property - object


Many Statements can be made

• We could state other properties of this Web page by writing additional English statements of the same general form

http://www.cs.columbia.edu/~kaiser/index.html has a modification-date whose value is January 07, 2011

http://www.cs.columbia.edu/~kaiser/index.html has a size whose value is 18,985 bytes
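A minimal sketch, assuming we simply store each statement as a Python (subject, predicate, object) tuple, encodes the three English statements above. The predicate URIs under http://example.org/terms/ are hypothetical placeholders; the slides do not name a vocabulary.

```python
# Each RDF statement is a (subject, predicate, object) triple.
PAGE = "http://www.cs.columbia.edu/~kaiser/index.html"
EX = "http://example.org/terms/"  # hypothetical namespace for the predicate URIs

triples = {
    (PAGE, EX + "author", "Gail Kaiser"),
    (PAGE, EX + "modification-date", "January 07, 2011"),
    (PAGE, EX + "size", "18,985 bytes"),
}


def objects(store, subject, predicate):
    """All values of `predicate` stated for `subject`, sorted."""
    return sorted(o for s, p, o in store if s == subject and p == predicate)


def properties_of(store, subject):
    """All (predicate, object) pairs stated about `subject`, sorted."""
    return sorted((p, o) for s, p, o in store if s == subject)


print(objects(triples, PAGE, EX + "author"))  # ['Gail Kaiser']
for predicate, value in properties_of(triples, PAGE):
    print(predicate, "->", value)
```

Because every statement has the same three-part shape, one query function serves for author, modification date and size alike.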


But what do these Statements actually mean?

• Subject and object can each be identified by a URL, just as used in a link on a Web page

• The verbs (predicates) can also be identified by URLs, which enables anyone to define a new concept, a new predicate, just by defining a URL for it somewhere on the Web (a “Web resource”)

• The URLs ensure that concepts are not just words in a document, but are tied to a unique definition that everyone can find on the Web


Web Resources

• RDF is a language for representing information about resources on the World Wide Web

• It is particularly intended for representing metadata about Web resources, such as the title, author, modification date and size of a Web page


Generalized Resources

• By generalizing the concept of a “Web resource”, RDF can be used to represent information about things that can be identified on the Web, even when they can't be directly retrieved on the Web

• Examples include the author of the web page


Reconsider Example

http://www.cs.columbia.edu/~kaiser/index.html has an author whose value is Gail Kaiser

Neither the notion of an “author” nor Gail Kaiser can be retrieved from the Web

Thus we need URIs in addition to URLs


Concept Graphs

• RDF is based on the idea of identifying things using URIs

• And describing resources (subjects) in terms of simple properties (verbs or predicates) and property values (objects)

• This enables RDF to represent related concepts as a graph of nodes and arcs representing the resources, their properties and values


Concept Graph Example

• XML syntax

• Chained triples form a graph

[Figure: graph linking http://bank.cs.columbia.edu/classes/cs6125/, Kaiser, kaiser+6125@..., W3C and http://www.w3.org/RDF through site-owner, email and describes arcs]

<rdf:Description rdf:about="#Kaiser">
  <email>kaiser+6125@...</email>
</rdf:Description>


Information Exchange

• RDF provides a common framework for expressing this information so it can be exchanged between applications without loss of meaning

• The ability to exchange information between different applications means that the information may be made available to applications other than those for which it was originally created

• Application designers can leverage the availability of common RDF parsers and processing tools

• RDF can be written in XML (RDF/XML), further leveraging XML tools and experience


What is RDF (again)?

• RDF is a data model

– the model is domain-neutral and application-neutral

– the model can be viewed as directed, labeled graphs or as an object-oriented model (object/attribute/value)

• RDF data model is an abstract, conceptual layer independent of XML

• consequently, XML is a transfer syntax for RDF, not a component of RDF

• RDF data might never occur in XML form


RDF Model

• RDF “statements” consist of resources (= nodes), which have properties, which have values (= nodes, strings)
– resource = subject
– property = predicate
– value = object


RDF Model

[Figure: the resource http://www.w3.org/TR/REC-rdf-syntax/ linked by the property editor to the value “Dave Beckett”]

“http://www.w3.org/TR/REC-rdf-syntax/ has the editor Dave Beckett”


RDF Model Example

[Figure: the resource http://www.w3.org/TR/REC-rdf-syntax/ with three properties: dc:Creator “Dave Beckett”, dc:Date “2004-02-10” and dc:Publisher “W3C”]


Complex Values

• So far, values of properties have been strings

• A graph node (corresponding to a resource) also can be the value of a property
– arbitrarily complex tree and graph structures are possible
– syntactically, values can be embedded (i.e., lexically in-line) or referenced (linked)


Complex Values

[Figure: http://www.w3.org/TR/REC-rdf-syntax/ linked by dc:Creator to an unnamed node, which has p:Name “Dave Beckett” and p:EMail “mailto:dave@dajobe.org”]


Complex Values

• Corresponding triples:

{ “http://www.w3.org/TR/REC-rdf-syntax/”, dc:Creator, x }
{ x, p:Name, “Dave Beckett” }
{ x, p:EMail, “mailto:dave@dajobe.org” }
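The chain of triples can be followed like arcs in a graph. This sketch writes the unnamed node x as _:x (the blank-node notation used by RDF's text formats) and follows dc:Creator and then p:Name or p:EMail from the document; the function name follow is illustrative.

```python
# Triples from the slide; "_:x" stands for the unnamed (blank) node x.
DOC = "http://www.w3.org/TR/REC-rdf-syntax/"
triples = [
    (DOC, "dc:Creator", "_:x"),
    ("_:x", "p:Name", "Dave Beckett"),
    ("_:x", "p:EMail", "mailto:dave@dajobe.org"),
]


def follow(store, start, path):
    """Follow a chain of predicates from `start`; return every node reached at the end."""
    frontier = [start]
    for predicate in path:
        frontier = [o for node in frontier
                     for s, p, o in store if s == node and p == predicate]
    return frontier


print(follow(triples, DOC, ["dc:Creator", "p:Name"]))   # ['Dave Beckett']
print(follow(triples, DOC, ["dc:Creator", "p:EMail"]))  # ['mailto:dave@dajobe.org']
```

The blank node never needs a global name: it only has to be the same label within one set of triples so the chain connects.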


Containers

• Containers are collections: they allow grouping of resources (or literal values)

• It is possible to make statements about the container (as a whole) or about its members individually

• Different types of containers:
– bag: unordered collection
– seq: ordered collection (= “sequence”)
– alt: represents alternatives

• It is possible to create collections based on URI patterns – e.g., all files in a particular web site

• Duplicate values are permitted - no mechanism to enforce unique value constraints


Containers

[Figure: http://www.w3.org/TR/REC-rdf-syntax linked by dc:Creator to an unnamed node of rdf:type rdf:Seq, whose members are rdf:_1 “Dave Beckett” and rdf:_2 “Brian McBride”]
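Because rdf:Seq members are attached through the numbered properties rdf:_1, rdf:_2 and so on, a program recovers the order from those numbers rather than from the order in which triples happen to be stored. The blank-node label _:s and the helper seq_members are assumptions of this sketch.

```python
import re

DOC = "http://www.w3.org/TR/REC-rdf-syntax"
triples = [
    (DOC, "dc:Creator", "_:s"),
    ("_:s", "rdf:type", "rdf:Seq"),
    ("_:s", "rdf:_2", "Brian McBride"),  # deliberately stored out of order
    ("_:s", "rdf:_1", "Dave Beckett"),
]

MEMBER = re.compile(r"rdf:_(\d+)$")  # matches membership properties rdf:_1, rdf:_2, ...


def seq_members(store, container):
    """Return the members of a sequence container ordered by their rdf:_n index."""
    indexed = []
    for s, p, o in store:
        m = MEMBER.match(p)
        if s == container and m:
            indexed.append((int(m.group(1)), o))
    # Sort numerically so that rdf:_10 comes after rdf:_9
    return [o for _, o in sorted(indexed)]


print(seq_members(triples, "_:s"))  # ['Dave Beckett', 'Brian McBride']
```

Converting the index to an integer matters: sorting the property names as strings would put rdf:_10 before rdf:_2.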


Higher-order Statements

• One can make RDF statements about other RDF statements

• Example: “The Library of Congress affiliates Dave Beckett as the author of the RDF Syntax spec”

• Allow us to express beliefs (and other modalities)

• Important for trust models, digital signatures, etc.

• Constitute metadata about metadata

• Represented by modeling RDF in RDF itself

Reification

[Figure: the statement http://www.w3.org/TR/REC-rdf-syntax dc:Creator “Dave Beckett”, drawn in a dotted box that is itself linked by dc:Creator to “Library of Congress”]

• The dotted box corresponds to the following statements:
{ x, rdf:predicate, “dc:Creator” }
{ x, rdf:subject, “http://www.w3.org/TR/REC-rdf-syntax” }
{ x, rdf:object, “Dave Beckett” }
{ x, rdf:type, “rdf:Statement” }
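The four reification triples can be generated mechanically for any statement. This is a sketch that assumes fresh blank-node labels of the form _:stmtN are acceptable; reify is a hypothetical helper name.

```python
from itertools import count

_fresh = count(1)  # counter for generating new blank-node labels


def reify(statement):
    """Return a new node x and the four triples that describe `statement` as a resource."""
    subject, predicate, obj = statement
    x = f"_:stmt{next(_fresh)}"
    return x, [
        (x, "rdf:type", "rdf:Statement"),
        (x, "rdf:subject", subject),
        (x, "rdf:predicate", predicate),
        (x, "rdf:object", obj),
    ]


claim = ("http://www.w3.org/TR/REC-rdf-syntax", "dc:Creator", "Dave Beckett")
x, reified = reify(claim)
# A statement about the statement: who made the claim
store = reified + [(x, "dc:Creator", "Library of Congress")]
for triple in store:
    print(triple)
```

The store now records that the Library of Congress made the claim, but the claim triple itself is not in the store: describing a statement with the reification vocabulary does not assert it.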


Reification

• Reification allows a computer to process an abstraction as if it were any other datum

• RDF is not really second-order

• But it does provide a built-in predicate vocabulary for reification


Reification

[Figure: NYT claims the nested statement pers05 Author-of ISBN...]

<rdf:Description rdf:about="#NYT">
  <claims>
    <rdf:Description rdf:about="#pers05">
      <authorOf>ISBN...</authorOf>
    </rdf:Description>
  </claims>
</rdf:Description>

Any statement can be an object (graphs can be nested)


RDF Schema

• Defines small vocabulary for RDF:
– Class, subClassOf, type
– Property, subPropertyOf
– domain, range

• Organizes this vocabulary in a typed hierarchy

• Vocabulary can be used to define other vocabularies for your application domain

[Figure: Student and Researcher are subClassOf Person; the property hasSuperVisor has a domain and a range; the instances Swap and Gail are linked by type arcs to their classes, and Swap hasSuperVisor Gail]


RDF Schema syntax in XML

<rdf:Description rdf:ID="MotorVehicle">
  <rdf:type rdf:resource="http://www.w3.org/...#Class"/>
  <rdfs:subClassOf rdf:resource="http://www.w3.org/...#Resource"/>
</rdf:Description>

<rdf:Description rdf:ID="Truck">
  <rdf:type rdf:resource="http://www.w3.org/...#Class"/>
  <rdfs:subClassOf rdf:resource="#MotorVehicle"/>
</rdf:Description>

<rdf:Description rdf:ID="registeredTo">
  <rdf:type rdf:resource="http://www.w3.org/...#Property"/>
  <rdfs:domain rdf:resource="#MotorVehicle"/>
  <rdfs:range rdf:resource="#Person"/>
</rdf:Description>

<rdf:Description rdf:ID="ownedBy">
  <rdf:type rdf:resource="http://www.w3.org/...#Property"/>
  <rdfs:subPropertyOf rdf:resource="#registeredTo"/>
</rdf:Description>
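To show what this schema lets a program infer, here is a small forward-chaining sketch that applies five RDFS entailment patterns (subClassOf transitivity, type inheritance, subPropertyOf, domain and range) until nothing new appears. The instance data truck42 and alice are hypothetical, and the loop is deliberately naive rather than a complete RDFS reasoner.

```python
def rdfs_closure(triples):
    """Apply a few RDFS entailment patterns repeatedly until no new triples appear."""
    g = set(triples)
    while True:
        new = set()
        for s, p, o in g:
            for s2, p2, o2 in g:
                # subClassOf is transitive
                if p == "rdfs:subClassOf" and p2 == "rdfs:subClassOf" and o == s2:
                    new.add((s, "rdfs:subClassOf", o2))
                # an instance of a class is an instance of its superclasses
                if p == "rdf:type" and p2 == "rdfs:subClassOf" and o == s2:
                    new.add((s, "rdf:type", o2))
                # a statement using a subproperty also holds for the superproperty
                if p2 == "rdfs:subPropertyOf" and p == s2:
                    new.add((s, o2, o))
                # domain types the subject, range types the object
                if p2 == "rdfs:domain" and p == s2:
                    new.add((s, "rdf:type", o2))
                if p2 == "rdfs:range" and p == s2:
                    new.add((o, "rdf:type", o2))
        if new <= g:
            return g
        g |= new


schema = {
    ("MotorVehicle", "rdfs:subClassOf", "rdfs:Resource"),
    ("Truck", "rdfs:subClassOf", "MotorVehicle"),
    ("registeredTo", "rdfs:domain", "MotorVehicle"),
    ("registeredTo", "rdfs:range", "Person"),
    ("ownedBy", "rdfs:subPropertyOf", "registeredTo"),
}
data = {("truck42", "rdf:type", "Truck"), ("truck42", "ownedBy", "alice")}  # hypothetical

closure = rdfs_closure(schema | data)
print(sorted(t for t in closure if t[0] in {"truck42", "alice"}))
```

From the single fact that truck42 is ownedBy alice, the rules derive that truck42 is registeredTo alice, that truck42 is a MotorVehicle and that alice is a Person.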


Conclusions about RDF

• Next step up from plain XML:
– modeling primitives
– possible to define vocabulary

• However:
– no precisely described meaning
– no inference model

• Problematic examples:
– “Columbus believed that the world is flat”
– “Gloria believes that the Web should be delivered on CD-ROM”


Where do we get the precisely defined meaning?

• Two databases may use different identifiers for the same concept, such as zip code vs. postal code

• A program that wants to compare or combine information across the two databases has to know that these two terms mean the same thing

• The program must have a way to discover such common meanings for whatever databases it encounters

• A solution to this problem is provided by collections of information called ontologies


Semantic Web Layers


What is an Ontology?

• In philosophy, an ontology is a theory about the nature of existence, of what types of things exist; ontology as a discipline studies such theories

• Semantic Web researchers (and various other communities) have co-opted the term for their own jargon

• For Semantic Web researchers, an ontology is a document or file that formally defines the relationships among terms

• The most typical kind of ontology for the Web has a taxonomy and a set of inference rules


What is a Taxonomy?

Taxonomy = segmentation, classification and ordering of elements into a classification system according to their relationships with one another

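[Figure: taxonomy rooted at Object, with the subclasses Person, Topic and Document, refined further into terms such as Researcher, Student, Doctoral Student, PhD Student, Semantics, Ontology, F-Logic and Menu]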


Taxonomies

• A taxonomy defines classes of objects and relations among them

• For example, an address may be defined as a type of location, and city codes may be defined to apply only to locations

• If city codes must be of type city and cities generally have Web sites, we can discuss the Web site associated with a city code even if no database links a city code directly to a Web site


An Ontology also provides a form of Thesaurus

[Figure: the taxonomy from the previous slide, with PhD Student and Doctoral Student linked as synonyms and Ontology and F-Logic linked as similar]

• Terminology for specific domain

• Graph with primitives, fixed relationships (similar, synonym)


An Ontology also provides a Topic Map

• Topics (nodes), relationships and occurrences (to documents)

• Useful for navigation and visualization

[Figure: the taxonomy extended with relationships such as knows, writes and described_in, attributes such as Affiliation and Tel, and the similar and synonym links]

The Taxonomy is Augmented by Inference Rules

[Figure: the taxonomy with is_a links (Researcher and Student under Person), relations such as knows, writes, described_in and is_about, attributes Affiliation and Tel, and the instance Swapneel Sheth linked by instance_of]

Rules:

• P writes D ∧ D is_about T ⇒ P knows T

• D is_about T ⇔ T described_in D
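The two rules can be applied by simple forward chaining: keep deriving new triples until a pass adds nothing. The people and documents below (researcher1, paper1 and so on) are hypothetical, and the sketch follows the reading of the rules given above.

```python
def apply_rules(kb):
    """Forward-chain the two rules until no new triple can be derived."""
    kb = set(kb)
    while True:
        derived = set()
        for s, r, o in kb:
            if r == "writes":
                # P writes D, D is_about T  =>  P knows T
                for d, r2, t in kb:
                    if d == o and r2 == "is_about":
                        derived.add((s, "knows", t))
            elif r == "is_about":
                # D is_about T  =>  T described_in D
                derived.add((o, "described_in", s))
            elif r == "described_in":
                # T described_in D  =>  D is_about T
                derived.add((o, "is_about", s))
        if derived <= kb:
            return kb
        kb |= derived


facts = {  # hypothetical instance data
    ("researcher1", "writes", "paper1"),
    ("paper1", "is_about", "Ontology"),
    ("researcher2", "writes", "paper2"),
    ("Semantics", "described_in", "paper2"),
}
kb = apply_rules(facts)
print(sorted(t for t in kb if t[1] == "knows"))
# [('researcher1', 'knows', 'Ontology'), ('researcher2', 'knows', 'Semantics')]
```

The second person's knowledge is only derivable after the described_in fact has been turned around into is_about, which is why the loop runs until a fixpoint instead of making a single pass.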


Inference Rules

• An ontology may express the rule “If a city code is associated with a state code, and an address uses that city code, then that address has the associated state code”

• A program could then deduce, for instance, that a Columbia University address, being in New York City, must be in New York State, which is in the U.S., and therefore should be formatted to U.S. standards

• The computer doesn't truly “understand” any of this information

• But it can now manipulate the terms much more effectively in ways that are useful and meaningful to the human user
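The city-code rule can be sketched the same way. All identifiers here (addr:columbia, city:NYC and the relation names) are hypothetical, and the second rule, which carries the state's country over to the address, is an extra assumption added to reach the conclusion about U.S. formatting.

```python
# Hypothetical identifiers invented for this sketch
facts = {
    ("addr:columbia", "usesCityCode", "city:NYC"),
    ("city:NYC", "associatedStateCode", "state:NY"),
    ("state:NY", "inCountry", "country:US"),
}


def derive(kb):
    """Apply the address rules repeatedly until nothing new is derived."""
    kb = set(kb)
    while True:
        new = set()
        for a, p, c in kb:
            for c2, q, x in kb:
                if c2 != c:
                    continue
                # Rule from the slide: address uses city code C, C has state code S
                # => address has state code S
                if p == "usesCityCode" and q == "associatedStateCode":
                    new.add((a, "hasStateCode", x))
                # Hypothetical extra rule: address has state S, S is in country K
                # => address is in K
                if p == "hasStateCode" and q == "inCountry":
                    new.add((a, "inCountry", x))
        if new <= kb:
            return kb
        kb |= new


def address_format(kb, address):
    """Pick a formatting convention from the derived country."""
    return "US" if (address, "inCountry", "country:US") in kb else "unknown"


kb = derive(facts)
print(address_format(kb, "addr:columbia"))  # US
```

No fact states the address's country directly; it is reached only by chaining the two rules, which mirrors the deduction described on the slide.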


Solution to Terminology Problems

• The meaning of terms or XML tags used on a Web page can be defined by pointers from the page to an ontology

• The same problems as before now arise if I point to an ontology that defines addresses as containing a zip code and you point to one that uses postal code

• This can be resolved if ontologies (or other Web services) provide equivalence relations: one or both of our ontologies may contain the information that my zip code is equivalent to your postal code
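One way a program might use such equivalence statements is to map every term to a single representative before comparing records. In this sketch the term names my:zipCode and your:postalCode and the record values are hypothetical; the grouping uses a simple union-find structure, so chains of equivalences (A ≡ B, B ≡ C) also collapse to one term.

```python
EQUIVALENT = [("my:zipCode", "your:postalCode")]  # hypothetical equivalence statements


def canonical_map(pairs):
    """Map every term to one representative of its equivalence class (union-find)."""
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            term = parent[term]
        return term

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two classes
    return {term: find(term) for term in parent}


def normalize(record, canon):
    """Rename a record's fields to their canonical terms; unknown fields are kept."""
    return {canon.get(key, key): value for key, value in record.items()}


canon = canonical_map(EQUIVALENT)
db1 = {"my:zipCode": "12345"}       # hypothetical record from my database
db2 = {"your:postalCode": "12345"}  # hypothetical record from yours
print(normalize(db1, canon) == normalize(db2, canon))  # True
```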


Using Ontologies

• Ontologies can be used in a simple fashion to improve the accuracy of Web searches

• The search program can look for only those pages that refer to a precise concept instead of all the ones using ambiguous keywords

• More advanced applications could use ontologies to relate the information on a page to the associated knowledge structures and inference rules


Example

• Suppose you wish to find the Ms. Cook you met at a trade conference last year

• You don't remember her first name, but you remember that she worked for one of your clients and that her brother was a student at your alma mater


Example

• An intelligent search program can sift through all the pages of people whose name is “Cook”

• Sidestep all the pages relating to cooks, cooking, the Cook Islands and so forth

• Find the person named Cook who works for a company that's on your client list

• And follow links to Web pages of their relatives to track down if any are in school at the right place
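A sketch of the Ms. Cook search, assuming the pages have already been turned into typed triples. Filtering on the type Person and on relations such as worksFor and hasBrother (hypothetical names, like every identifier below) is what lets the program skip the Cook Islands page even though it contains the keyword.

```python
# Hypothetical annotated data; all identifiers are invented for this sketch.
facts = {
    ("person:a", "rdf:type", "Person"),
    ("person:a", "familyName", "Cook"),
    ("person:a", "worksFor", "org:acme"),
    ("person:a", "hasBrother", "person:b"),
    ("person:b", "studentAt", "org:myAlmaMater"),
    ("person:c", "rdf:type", "Person"),
    ("person:c", "familyName", "Cook"),
    ("person:c", "worksFor", "org:unrelated"),
    ("page:islands", "rdf:type", "Document"),
    ("page:islands", "about", "Cook Islands"),
}


def values(kb, subject, predicate):
    """All objects of (subject, predicate, ?) in the knowledge base."""
    return {o for s, p, o in kb if s == subject and p == predicate}


def find_ms_cook(kb, clients, alma_mater):
    """People named Cook who work for a client and have a brother studying at alma_mater."""
    people = sorted(s for s, p, o in kb if p == "rdf:type" and o == "Person")
    return [
        person
        for person in people
        if "Cook" in values(kb, person, "familyName")
        and values(kb, person, "worksFor") & clients
        and any(alma_mater in values(kb, brother, "studentAt")
                for brother in values(kb, person, "hasBrother"))
    ]


print(find_ms_cook(facts, {"org:acme"}, "org:myAlmaMater"))  # ['person:a']
```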


Agents

• The real power of the Semantic Web will be realized when people create (many) programs that collect Web content from diverse sources, process the information and exchange the results with other programs

• The effectiveness of such software agents will increase exponentially as more machine-readable Web content and automated services (including other agents) become available


Proofs

• The Semantic Web promotes this synergy: even agents that were not expressly designed to work together can transfer data among themselves when the data comes with semantics

• An important facet of agents' functioning will be the exchange of “proofs”


Example

• Suppose Ms. Cook's contact information has been located by an online service, and places her in Baghdad

• You want to check this, so your computer asks the service for a proof of its answer

• An inference engine on your computer verifies this proof, i.e., that this Ms. Cook indeed matches the one you were seeking, and it can show you the relevant Web pages if you still have doubts


Service Discovery

• Many automated Web-based services already exist without semantics

• But current service discovery initiatives attack the problem at a structural or syntactic level, and rely heavily on standardization of a predetermined set of functionality descriptions


Service Discovery

• Other programs such as agents have no way to locate a service that will perform a specific function

• This process can happen only when there is a common language to describe a service in a way that lets other agents “understand” both the function offered and how to take advantage of it

• The consumer and producer agents can reach a shared understanding by exchanging ontologies, which provide the vocabulary needed for discussion

• Semantics also makes it easier to take advantage of a service that only partially matches a request
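As a rough illustration of partial matching, the sketch below scores each service by the fraction of requested concepts it offers, assuming requester and providers already share one ontology of concept names. The service names, the onto: terms and the overlap score are all hypothetical; this is a simple heuristic, not a standard matching algorithm.

```python
# Hypothetical service descriptions using concept names from a shared ontology
SERVICES = {
    "svc:travelAgent": {"onto:FlightBooking", "onto:HotelBooking"},
    "svc:airline": {"onto:FlightBooking"},
    "svc:carRental": {"onto:CarRental"},
    "svc:weather": {"onto:Forecast"},
}


def rank_services(request, services):
    """Return (score, name) pairs for services offering at least one requested concept.

    The score is the fraction of requested concepts a service offers, so a
    service that only partially matches still appears, ranked below better ones.
    """
    if not request:
        return []
    scored = []
    for name, offered in services.items():
        overlap = len(request & offered) / len(request)
        if overlap > 0:
            scored.append((overlap, name))
    # Highest score first; ties broken by name for a stable order
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored


request = {"onto:FlightBooking", "onto:HotelBooking", "onto:CarRental"}
for score, name in rank_services(request, SERVICES):
    print(f"{name}: {score:.2f}")
```

No single service covers the whole request here, yet the shared vocabulary still lets the requester see which ones cover part of it.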


Non-Web Applications

• The Semantic Web can extend into our physical world

• URIs can point to anything, including physical entities, which means we can use RDF to describe devices such as cell phones and TVs

• Such devices can advertise their functionality —what they can do and how they are controlled —much like software agents

• Semantic descriptions of device capabilities and functionality will let us achieve “home automation” with minimal human intervention


Examples

• When you answer your phone, other sound is automatically turned down
– Instead of having to program each specific appliance, you could program such a function once and for all to cover every local device that advertises having a volume control: the TV, the DVD player, the media players on the laptop, …

• Your Web-enabled microwave oven consults the frozen-food manufacturer's Web site for optimal cooking parameters
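The “program it once” idea can be sketched as a function that selects devices by the capability they advertise rather than by device model. The Device class and the onto:volumeControl term are hypothetical stand-ins for an RDF description of device capabilities.

```python
from dataclasses import dataclass, field

VOLUME = "onto:volumeControl"  # hypothetical capability term


@dataclass
class Device:
    name: str
    capabilities: dict = field(default_factory=dict)  # capability term -> current setting


def on_phone_answered(devices, level=2):
    """Turn down every device that advertises a volume control; return their names."""
    lowered = []
    for device in devices:
        if VOLUME in device.capabilities:
            # Never raise a volume that is already below the target level
            device.capabilities[VOLUME] = min(device.capabilities[VOLUME], level)
            lowered.append(device.name)
    return lowered


home = [
    Device("TV", {VOLUME: 30}),
    Device("DVD player", {VOLUME: 20}),
    Device("laptop media player", {VOLUME: 1}),
    Device("microwave oven", {"onto:cookingProgram": None}),
]
print(on_phone_answered(home))  # ['TV', 'DVD player', 'laptop media player']
```

A device bought later is covered automatically as long as it advertises the same capability term.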


OWL Delivers Ontologies that Work on the Web

• What's needed next is a way to develop domain-specific vocabularies

• An ontology defines the terms used to describe and represent an area of knowledge

• Ontologies include computer-usable definitions of basic concepts in the domain and the relationships among them, making that knowledge reusable


OWL = Web Ontology Language

• For defining structured, Web-based ontologies enabling richer integration and interoperability of data among descriptive communities

• Uses URIs for naming

• Uses RDF and RDF Schema for description

• Adds vocabulary for describing relations between classes (e.g. disjointness), cardinality (e.g. “exactly one”), characteristics of properties (e.g. symmetry)
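Two of these OWL features can be sketched directly over triples: a property declared symmetric lets a reasoner add the reverse of every statement that uses it, and two classes declared disjoint cannot share an instance. The ex: terms and individuals are hypothetical, and unlike a real OWL reasoner this sketch ignores subclass reasoning.

```python
def symmetric_closure(triples, symmetric_props):
    """For each property declared symmetric, add the reverse of every statement using it."""
    closed = set(triples)
    for s, p, o in triples:
        if p in symmetric_props:
            closed.add((o, p, s))
    return closed


def disjointness_conflicts(triples, disjoint_pairs):
    """Individuals typed with both classes of a disjoint pair (an inconsistency)."""
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    conflicts = []
    for individual in sorted(types):
        for a, b in disjoint_pairs:
            if a in types[individual] and b in types[individual]:
                conflicts.append((individual, a, b))
    return conflicts


data = {  # hypothetical statements
    ("ex:ann", "ex:marriedTo", "ex:bob"),
    ("ex:x1", "rdf:type", "ex:Truck"),
    ("ex:x1", "rdf:type", "ex:Person"),
    ("ex:bob", "rdf:type", "ex:Person"),
}
closed = symmetric_closure(data, {"ex:marriedTo"})
print(("ex:bob", "ex:marriedTo", "ex:ann") in closed)  # True
print(disjointness_conflicts(data, [("ex:Truck", "ex:Person")]))
# [('ex:x1', 'ex:Truck', 'ex:Person')]
```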


Semantic Web Layers


Semantic Web Layers

• The Unicode and URI layers make sure that we use international character sets and provide means for identifying the objects in the Semantic Web

• The XML layer, with namespaces and schema definitions, makes sure we can integrate the Semantic Web definitions with other XML-based standards


Semantic Web Layers

• RDF and RDF Schema make it possible to make statements about objects with URIs and define vocabularies that can be referred to by URIs

• RDF Schema defines the vocabulary for defining classes, subclasses, properties and subproperties

• The Ontology layer (OWL) supports the evolution of vocabularies as it can define relations between the different concepts


Semantic Web Layers

• The top layers, Logic, Proof and Trust, are “under development”

• The Logic layer will enable the writing of rules

• The Proof layer will execute the rules

• The Trust layer together with the Digital Signature layer will provide mechanisms for applications to determine whether to trust the given proof or not


Semantic Web Layers

[Figure: the layer diagram annotated with the status of each layer: RFC, Standard, Standard, Standard and Work in Progress]


Next Assignments

• Full paper due Tuesday March 8th

• Project Proposal due Tuesday March 8th
