Web Semantic & Mining


Upload: mohammed-al-haj

Post on 06-Apr-2018


TRANSCRIPT

Page 1: Web Semantic & Mining

8/3/2019 Web Semantic & Mining

http://slidepdf.com/reader/full/web-semantic-mining 1/50

The Semantic Web Mining
The Next Evolution of the WWW


Overview

What is the Semantic Web?

Background

Components of the Semantic Web

Why the Semantic Web is needed

Machine Learning & the Semantic Web

What is Text Mining?

Mining the Web

How Is All This Related to the Semantic Web?

Mining the Semantic Web

Uses of the Semantic Web

Implementing the Semantic Web

Examples


Dream

I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web - the content, links, and transactions between people and computers. A 'Semantic Web', which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The 'intelligent agents' people have touted for ages will finally materialize.

- Tim Berners-Lee, 1999


Computers don't understand Meaning

"My mouse is broken. I need a new one..."

"My mouse is broken" vs. "My mouse is dead"


What is the Semantic Web?

The Semantic Web is a group of methods and technologies to allow machines to understand the meaning - or "semantics" - of information on the World Wide Web.

[Figure: Before the Semantic Web - Web content, users and creators on the WWW and beyond]

[Figure: Semantic Web structure - semantic annotations, ontologies, logical support, languages, tools, and applications/services added around Web content, users and creators on the WWW and beyond]


What is the Semantic Web? (cont)

A framework that:
- Adds meaning to data
- Provides a mechanism for organizing, interpreting, and making use of that meaning

The Semantic Web is "an extended web of machine-readable information and automated services that amplify the Web far beyond current capabilities" (Daconta et al., 2003)


What is the Semantic Web? (cont)

An enhancement to the current Web, not a replacement

"The Semantic Web will bring structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users" (Berners-Lee et al., 2001)


Background

1968 - Internet used as a communications network by DOD
1989 - Tim Berners-Lee (and others) at CERN develop HTML from SGML
Early 1990s - Web browsers created to interpret HTML
1996 - XML developed
1990s+ - Tim Berners-Lee & W3C continue to pursue development of the Semantic Web


Components of the Semantic Web

Four major components:
- XML
- Resource Description Framework (RDF)
- Ontologies
- Agents


Supplemental Components of the Semantic Web

Supplemental components:
- Uniform Resource Identifiers (URIs)
- Web services
- Inference rules
- Service discovery
- Semantic-aware applications
- Security and trust
- XML and RDF schemas


XML

HTML (XHTML) is a series of predefined tags that add presentation to data

<b>This text is bold</b>

XML is a series of user-defined tags that add information and structure to data

<author>John Smith</author>


XML (cont)

"XML has become the universal syntax for exchanging data between organizations" (Daconta et al., 2003)

Issue: Some mechanism must exist for coordinating the meaning of the user-defined tags and for understanding the context of that information

Company A: <name>Smith</name>
Company B: <employee>Jones</employee>
Company C: <name>Williams</name>
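The coordination issue can be sketched in a few lines of Python. This is only an illustration: the `TAG_TO_CONCEPT` table and the concept name `person_name` are assumptions that stand in for whatever shared vocabulary the companies would have to agree on.

```python
import xml.etree.ElementTree as ET

# The three fragments from the slide, one per company.
fragments = {
    "Company A": "<name>Smith</name>",
    "Company B": "<employee>Jones</employee>",
    "Company C": "<name>Williams</name>",
}

# Hypothetical agreement on what each company's tag means.
# Without such a mapping a program cannot know that <name> and
# <employee> refer to the same kind of thing.
TAG_TO_CONCEPT = {
    ("Company A", "name"): "person_name",
    ("Company B", "employee"): "person_name",
    ("Company C", "name"): "person_name",
}

def normalize(company, xml_text):
    """Parse one fragment and return (shared concept, value)."""
    element = ET.fromstring(xml_text)
    concept = TAG_TO_CONCEPT.get((company, element.tag), "unknown")
    return concept, element.text

records = [normalize(company, text) for company, text in fragments.items()]
```

The parser has no trouble reading the tags; what it lacks is the mapping table, which is the role the later slides assign to RDF and ontologies.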


Resource Description Framework

(RDF)

A framework, commonly serialized as XML, used to describe resources

Resources can include entities, concepts, properties and relations

Captures the metadata about the "externals" of a document

Can use a serialized model, RDF triples, special notation, or graphs to describe data


Resource Description Framework

(RDF) (cont)

RDF triple (subject, predicate, object/literal):

[Figure: a Subject node linked by one Predicate arrow to an Object and by another Predicate arrow to a Literal]

The company sells software
The company is named Microsoft
John Smith is the president of Company X

Company --sells--> Software (object)
Company --is named--> Microsoft (literal)
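As a minimal sketch, the example statements can be held as plain (subject, predicate, object) tuples and queried by pattern. The `match` helper and the use of `None` as a wildcard are illustrative choices, not part of RDF or of any RDF library.

```python
# Each statement is a (subject, predicate, object) tuple.
triples = [
    ("Company", "sells", "Software"),
    ("Company", "is named", "Microsoft"),
    ("John Smith", "is president of", "Company X"),
]

def match(pattern, store):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        triple for triple in store
        if all(want is None or want == have
               for want, have in zip(pattern, triple))
    ]

# Everything stated about "Company".
about_company = match(("Company", None, None), triples)

# Every statement that uses the predicate "sells".
who_sells = match((None, "sells", None), triples)
```

Because every statement has the same three-part shape, a single generic query function covers all of them, which is the practical appeal of the triple model.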


Ontologies

Provide the repositories for meaning interpretations

Provide a mechanism for defining the relationships among different words and, for the Semantic Web, relationships among different resources

"the common words and concepts (the meaning) used to describe and represent an area of knowledge" (Daconta et al., 2003)


Ontologies (cont)

Consist of:
- Taxonomies
  "An organized set of terms." (McComb, 2004)
  A classification and a tree (Daconta et al., 2003)
  Hierarchical, tree-like structures similar to organizational charts
- Sets of inference rules

Used to organize semantics


Taxonomy

[Figure: taxonomy tree rooted at Object, with the nodes Person, Topic, Document, Researcher, Student, Doctoral Student, PhD Student, Semantics, Ontology and F-Logic]

Taxonomy := Segmentation, classification and ordering of elements into a classification system according to their relationships between each other
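A taxonomy of this kind can be sketched as a child-to-parent table. The edges below are an assumption based on the node labels of the figure, not a transcription of it, and the helper names `ancestors` and `is_a` are hypothetical.

```python
# Child -> parent links. The exact edges are assumed for illustration.
PARENT = {
    "Person": "Object",
    "Topic": "Object",
    "Document": "Object",
    "Researcher": "Person",
    "Student": "Person",
    "PhD Student": "Student",
    "Semantics": "Topic",
}

def ancestors(term):
    """Walk up the tree and collect every broader term, nearest first."""
    result = []
    while term in PARENT:
        term = PARENT[term]
        result.append(term)
    return result

def is_a(term, broader):
    """True if `broader` lies on the path from `term` to the root."""
    return broader in ancestors(term)

phd_chain = ancestors("PhD Student")
```

The only relationship a pure taxonomy can express is this broader/narrower ordering; the thesaurus, topic map and ontology slides that follow each add further kinds of links.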


Thesaurus

[Figure: thesaurus graph over the terms Object, Person, Topic, Document, Researcher, Student, PhD Student, Doktoral Student, Semantics, Ontology and F-Logic, with edges labelled similar and synonym]

Terminology for a specific domain
Graph with primitives, 2 fixed relationships (similar, synonym)
Originates from bibliography


Topic Map

[Figure: topic map over the nodes Object, Person, Topic, Document, Researcher, Student, PhD Student, Doktoral Student, Semantics, Ontology and F-Logic, with the associations knows, writes, described_in, similar and synonym, and the attributes Affiliation and Tel]

Topics (nodes), relationships and occurrences (to documents)
ISO standard
Typically used for navigation and visualisation


Ontology (in our sense)

[Figure: ontology graph. Concepts: Object, Person, Topic, Document, Researcher, Student, PhD Student, Doktoral Student, Semantics, Ontology, F-Logic. Relations: is_a, instance_of, subTopicOf, similar, knows, writes, described_in, is_about. Attributes: Tel, Affiliation. Instance: York Sure, Affiliation AIFB, Tel +49 721 608 6592]

Rules:
P writes D and D is_about T implies P knows T
described_in and is_about relate a topic T and a document D in opposite directions

Representation Language: Predicate Logic (F-Logic)
Standards: RDF(S); upcoming standard: OWL
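What sets an ontology apart from the earlier structures is that rules can derive new facts. The sketch below assumes the figure's rule reads "P writes D and D is_about T implies P knows T"; that reading, the document id `doc1` and the function name `infer_knows` are all assumptions made for illustration.

```python
# Known facts as (subject, relation, object) tuples.
facts = {
    ("York Sure", "writes", "doc1"),      # doc1 is a made-up document id
    ("doc1", "is_about", "Ontology"),
}

def infer_knows(known):
    """Apply the rule: P writes D and D is_about T  =>  P knows T."""
    derived = set()
    for person, relation, document in known:
        if relation != "writes":
            continue
        for subject, other_relation, topic in known:
            if other_relation == "is_about" and subject == document:
                derived.add((person, "knows", topic))
    return derived

new_facts = infer_knows(facts)
```

The derived "knows" fact was never stated explicitly; it follows from two stored facts plus the rule, which is the kind of inference a taxonomy or thesaurus alone cannot provide.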


Agents

Also known as software agents

Provide automation services

Should not be designed to replace humans or to make decisions

Automated agents perform tasks for users of the Semantic Web using this data


What Machine Learning Can Do for the Semantic Web

Upgrading the current web to a semantic web involves a lot of work

Can partially be automated!

Examples:
- Learning ontologies
- Automatic document classification
- Information integration
- ...


Learning Ontologies

View:
- Manually creating ontologies is very labour-intensive
- Fully automating the creation of ontologies is not feasible
- Hence: develop a tool that helps build ontologies

Basic components:
- Good graphical interface (man-machine interaction)
- Powerful underlying machine learning techniques


Some Useful Techniques for Learning Ontologies

Term extraction from texts
- Identification of concepts

Hierarchical clustering
- Clustering: finding groups of "similar" things
- Hierarchical clustering: clusters of clusters
- A taxonomy can be constructed through hierarchical clustering of concepts

Association rules
- Find sets of terms that often occur together
- May indicate important relations
  E.g., events in texts often co-occur with locations
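To make the hierarchical clustering step concrete, here is a small single-link agglomerative sketch in pure Python. The term/document data is invented, and Jaccard distance over document sets is just one possible similarity measure, chosen here for brevity.

```python
from itertools import combinations

# Hypothetical data: for each term, the ids of documents it occurs in.
occurrences = {
    "hotel": {1, 2, 3},
    "hostel": {2, 3},
    "golf": {4, 5},
    "tennis": {4, 5, 6},
}

def distance(a, b):
    """Jaccard distance between the document sets of two terms."""
    docs_a, docs_b = occurrences[a], occurrences[b]
    return 1 - len(docs_a & docs_b) / len(docs_a | docs_b)

def single_link(c1, c2):
    """Cluster distance = distance of the closest pair of members."""
    return min(distance(a, b) for a in c1 for b in c2)

def cluster(terms):
    """Agglomerative clustering; returns the merges in the order made."""
    clusters = [frozenset([term]) for term in terms]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: single_link(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(c1 | c2)
        merges.append((set(c1), set(c2)))
    return merges

merges = cluster(sorted(occurrences))
```

Read bottom-up, the merge list is a candidate taxonomy: the two early merges form sibling groups and the final merge is their common parent, which a human editor would then name and correct.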


What is Text Mining?

Text mining is about knowledge discovery from large collections of unstructured text.

It's not the same as data mining, which is more about discovering patterns in structured data stored in databases.

Similar techniques are sometimes used; however, text mining has many additional constraints caused by the unstructured nature of the text and the use of natural language.


Mining the Web

Analyze data that are available on the Web

Distinguish 3 types:
- Web content mining
  Look in contents of documents (text, ...)
- Web structure mining
  Look at links between documents
- Web usage mining
  Look at user logs (e.g. who accessed a web page, which links are often used, ...)
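As a small illustration of the second type, structure mining in its simplest form just counts incoming links. The link graph below is hypothetical, and real systems use far richer measures than in-degree.

```python
from collections import Counter

# Hypothetical link graph: page -> pages it links to.
links = {
    "home.html": ["search.html", "about.html"],
    "search.html": ["hotel.html"],
    "about.html": ["home.html", "hotel.html"],
    "hotel.html": ["home.html"],
}

# Count how many pages point at each target.
in_degree = Counter(
    target for targets in links.values() for target in targets
)

# The two most linked-to pages.
most_linked = in_degree.most_common(2)
```

Content mining would instead look inside each page, and usage mining would look at who requested which page and in what order.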


How Is All This Related to the Semantic Web?

- Machine learning can help with building the Semantic Web
- The Semantic Web will help with mining the Web, making Web interfaces and agents more intelligent


What the Semantic Web Can Do for Web Mining

Will make mining the web much easier

Reason 1: removal of ambiguity
- More precise knowledge of what is meant by certain terms

Reason 2: structured vs. unstructured data
- Learning from structured data is much easier than from unstructured data

Reason 3: availability of background knowledge
- Can be used to make better decisions when learning


Removal of Ambiguity

Example: text document classification
- E.g., given a text, tell which newsgroup it belongs to

Typical approach: "bag of words"
- Look only at which words occur in the text, and how often
- Each time a word occurs that occurs mainly in one particular class, increase the probability for that class
- But words are ambiguous!
- Increased classification accuracy can be expected by removing ambiguity
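The ambiguity problem shows up even in a toy bag-of-words classifier. The sketch below uses invented training texts and newsgroup names, and it scores classes by raw word counts rather than true probabilities, which is a deliberate simplification.

```python
from collections import Counter

# Hypothetical training texts for two newsgroups.
training = {
    "comp.hardware": ["my mouse is broken", "new keyboard and mouse"],
    "rec.pets": ["my mouse is dead", "feeding a pet mouse"],
}

# Bag of words: per class, count how often each word occurs.
word_counts = {
    label: Counter(word for text in texts for word in text.split())
    for label, texts in training.items()
}

def classify(text):
    """Score each class by summing the class counts of the text's words."""
    scores = {
        label: sum(counts[word] for word in text.split())
        for label, counts in word_counts.items()
    }
    return max(scores, key=scores.get), scores

# "mouse" alone scores the same for both classes.
label_ambiguous, scores_ambiguous = classify("mouse")

# A disambiguating neighbour word tips the balance.
label_clear, scores_clear = classify("broken mouse")
```

If the text carried a semantic annotation saying which sense of "mouse" is meant, the tie in the first call would not arise, which is the improvement the slide anticipates.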


Mining From (Un)structured Data

Mining data = intensively querying data

Answering a query is
- Easy in structured data
  Relational database, XML, ...
- Harder in semi-structured data (e.g., HTML)
- Hard in unstructured data
  Information extraction needed
  Could do this by learning a "wrapper"
  This involves one extra layer of learning
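The difference in effort can be sketched by asking the same question of two renderings of one made-up fact. The hotel name, both documents and the regular expression are assumptions, and the "wrapper" here is written by hand rather than learned.

```python
import re
import xml.etree.ElementTree as ET

# The same (made-up) fact in structured and in semi-structured form.
xml_doc = "<hotel><name>Example Inn</name><stars>5</stars></hotel>"
html_doc = "<p>The <b>Example Inn</b> is a 5-star hotel.</p>"

# Structured data: the query is a direct path lookup.
stars_from_xml = int(ET.fromstring(xml_doc).findtext("stars"))

# Semi-structured data: a "wrapper" (here a regular expression) has to
# be supplied or learned before the same query can be answered.
wrapper = re.compile(r"(\d+)-star")
found = wrapper.search(html_doc)
stars_from_html = int(found.group(1)) if found else None
```

The wrapper only works while pages keep this exact phrasing, which is why learning it is described as an extra layer on top of the actual mining task.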


Availability of Background Knowledge

Learning = finding relevant patterns in behaviour

Important to have the right context to describe these patterns

Example:
- Making interesting offers to clients
- "People who bought this book also bought ..."
- = "Instance-based" learning
  Estimate profile of user
  Find users with similar profile
  Look at behaviour of those users to help current user


Availability of Background Knowledge

Can work better if more background knowledge is available, e.g., type of book, author, ...
- For instance, for books:
  "similar profile" = users that up till now bought the same books as this user
  - May not be many people
  "similar" = often bought books by the same author
  - Probably many more people, allows for a more reasonable guess
  "similar" = often bought books of the same genre (fiction, ...)
  - May work even better

Ontologies (among others) provide such background knowledge
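The effect of background knowledge on "similar users" can be sketched as follows. The purchase histories, book ids and genre table are invented, and the two similarity functions are simplified stand-ins for a real profile comparison.

```python
# Hypothetical purchase histories and book metadata.
purchases = {
    "current": {"book_a"},
    "user_1": {"book_b", "book_c"},
    "user_2": {"book_d"},
}
genre = {
    "book_a": "fiction",
    "book_b": "fiction",
    "book_c": "fiction",
    "book_d": "cooking",
}

def similar_by_books(user):
    """Users who bought at least one identical book."""
    mine = purchases[user]
    return [other for other, books in purchases.items()
            if other != user and mine & books]

def similar_by_genre(user):
    """Users who bought at least one book of a shared genre."""
    mine = {genre[book] for book in purchases[user]}
    return [other for other, books in purchases.items()
            if other != user and mine & {genre[book] for book in books}]

exact = similar_by_books("current")
by_genre = similar_by_genre("current")
```

With identical books as the only criterion nobody matches the current user; once the genre table is available a neighbour appears, so a recommendation becomes possible.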


Web Mining Revisited

Semantic Web will change
- Content mining
  Clearer view on contents and meaning of documents
- Structure mining
  More relevant structure
- Usage mining
  More relevant information on actions of user

Will in general improve intelligence of systems
- E.g. mail filter gets a better view of contents of mails


Mining the Semantic Web

Knowledge base:
Hotel: Wellnesshotel
GolfCourse: Seaview
belongsTo(Seaview, Wellnesshotel)
...

Association rule mining:
Hotel(x), GolfCourse(y), belongsTo(y, x) → hasStars(x, 5)
support = 0.4 %, confidence = 89 %

Ontology:
[Figure: concepts Organization, Hotel and GolfCourse; relation belongsTo; attributes name and cooperatesWith]

FORALL X, Y
Y:Hotel[cooperatesWith ->> X] <-
X:ProjectHotel[cooperatesWith ->> Y].
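Support and confidence for a rule like the one on the slide can be computed as below. The hotel records are invented (only the name Wellnesshotel comes from the slide, and its star rating is assumed), and the rule is flattened to "has a golf course implies five stars" to keep the sketch short.

```python
# Hypothetical knowledge base: one record per hotel.
hotels = [
    {"name": "Wellnesshotel", "has_golf_course": True,  "stars": 5},
    {"name": "Hotel B",       "has_golf_course": True,  "stars": 5},
    {"name": "Hotel C",       "has_golf_course": True,  "stars": 4},
    {"name": "Hotel D",       "has_golf_course": False, "stars": 3},
    {"name": "Hotel E",       "has_golf_course": False, "stars": 5},
]

def rule_stats(records):
    """Support and confidence of: has golf course => five stars."""
    body = [r for r in records if r["has_golf_course"]]
    both = [r for r in body if r["stars"] == 5]
    support = len(both) / len(records)   # body and head hold together
    confidence = len(both) / len(body)   # head holds, given the body
    return support, confidence

support, confidence = rule_stats(hotels)
```

Support says how much of the knowledge base the rule covers, while confidence says how often the conclusion holds when the premise does; the slide's figures of 0.4 % and 89 % are the same two quantities on its own data.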


Semantic Web Usage Mining

p3ee24304.dip.t-dialin.net - - [19/Mar/2002:12:03:51 +0100] "GET /search.html?l=ostsee%20strand&syn=023785&ord=asc HTTP/1.0" 200 1759
p3ee24304.dip.t-dialin.net - - [19/Mar/2002:12:05:06 +0100] "GET /search.html?l=ostsee%20strand&p=low&syn=023785&ord=desc HTTP/1.0" 200 8450
p3ee24304.dip.t-dialin.net - - [19/Mar/2002:12:06:41 +0100] "GET /mlesen.html?Item=3456&syn=023785 HTTP/1.0" 200 3478

[Figure: the three requests annotated as Search by Location, then (Refine search) Search by Location and Price, then (Choose item) Look at individual Hotel]

From logfile analysis ...
... to semantic logfile analysis:

Basic idea: associate each requested page with one or more ontological entities, to better understand the process of navigation
[Berendt & Spiliopoulou 2000; Berendt 2002; Oberle 2003]

Use the gained knowledge to
- understand search strategies
- improve navigation design
- personalization
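The step from raw log lines to ontological entities can be sketched with the standard library. The request paths come from the log excerpt on the slide; the mapping rules inside `to_concept` (for example, that the `p` parameter means a price filter) are assumptions made for illustration.

```python
from urllib.parse import urlparse, parse_qs

# Request paths taken from the log lines on the slide.
requests = [
    "/search.html?l=ostsee%20strand&syn=023785&ord=asc",
    "/search.html?l=ostsee%20strand&p=low&syn=023785&ord=desc",
    "/mlesen.html?Item=3456&syn=023785",
]

def to_concept(request):
    """Map one request to a concept label (mapping rules are assumed)."""
    url = urlparse(request)
    params = parse_qs(url.query)
    if url.path == "/search.html":
        if "p" in params:
            return "Search by Location and Price"
        return "Search by Location"
    if url.path == "/mlesen.html":
        return "Look at individual Hotel"
    return "Unknown"

session = [to_concept(request) for request in requests]
```

Once a session is a sequence of concept labels rather than URLs, sessions from differently structured pages become comparable, which is what makes search strategies visible.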


Text Document Clustering of Crawled Documents

[Figure labels: WWW, Focused Crawling, Clustering, Explanation]


Uses of the Semantic Web

Improve e-business processes

Improve business-to-business (B2B) communication

"assist human users in their day-to-day online activities" (Antoniou & van Harmelen, 2004)

"build knowledge and understanding from raw data" (Daconta et al., 2003)
- Improve knowledge management
- Improve information retrieval
- Automate tasking
- Integrate data
- Maximize customer value and profits


Implementing the Semantic Web

Convert data to XML format according to defined XML schemas

Expose applications as Web services

Build ontologies that specify semantic meanings and the relationships between data

Create agents that make use of the semantic data, automate search processes, and automate other business processes


Issues Concerned with Implementing the Semantic Web

Cost

Security

Nonstandard technology issues

Semantic precision


Any Questions?


References

Antoniou, G., & van Harmelen, F. (2004). A semantic Web primer. Cambridge, MA: The MIT Press.

Athauda, R. I. (2000). Integration and querying of heterogeneous, autonomous, distributed database systems (Vol. 61/06, pp. 3126): Florida International University.

Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The Semantic Web. Scientific American, 284(5), 34-43.

Carey, P., & Kemper, M. (2003). New perspectives on creating Web pages with HTML and Dynamic HTML (2nd ed.). Boston: Course Technology.

Daconta, M. C., Obrst, L. J., & Smith, K. T. (2003). The Semantic Web: A guide to the future of XML, Web services, and knowledge management. Indianapolis, IN: Wiley Publishing, Inc.

Ewalt, D. M. (2002, October 14). Semantic Web. InformationWeek, 35-44.

Galitz, W. O. (2002). The essential guide to user interface design. New York: John Wiley & Sons, Inc.


Gould, M. (1996). Rules in the virtual society. International Review of Law, Computers & Technology, 10(2), 199-218.

Kalakota, R., & Robinson, M. (2001). e-Business 2.0: Roadmap for success. Upper Saddle River, NJ: Addison-Wesley.

Lexico Publishing Group, L. (2004). Inference. Retrieved December 7, 2004, from http://dictionary.reference.com/search?q=inference

McComb, D. (2004). Semantics in business systems: The savvy manager's guide. San Francisco, CA: Morgan Kaufmann Publishers.

Tiwana, A. (2002). The knowledge management toolkit. Upper Saddle River, NJ: Prentice Hall PTR.

Warren, P. (2003). The next steps for the WWW: Putting meaning into the Web. Computing & Control Engineering, 14(2), 27-31.

Young, M. J. (2002). XML step by step (2nd ed.). Redmond, WA: Microsoft Press.
