Web Classification: Ontology and Taxonomy
Post on 20-Dec-2015
References
- Using Ontologies to Discover Domain-Level Web Usage Profiles. {hdai,mobasher}@cs.depaul.edu
- Learning to Construct Knowledge Bases from the World Wide Web. M. Craven, D. DiPasquo, T. Mitchell, K. Nigam, S. Slattery (Carnegie Mellon University, Pittsburgh, USA); D. Freitag, A. McCallum (Just Research, Pittsburgh, USA)
Definitions
- Ontology: an explicit formal specification of how to represent the objects, concepts and other entities that are assumed to exist in some area of interest, and the relationships that hold among them.
- Taxonomy: a classification of organisms into groups based on similarities of structure or origin, etc.
Goal
Capture and model behavioral patterns and profiles of users interacting with a web site.
Why?
- Collaborative filtering
- Personalization systems
- Improve the organization and structure of the site
- Provide dynamic recommendations (www.recommend-me.com)
Algorithm 0 (by Rafa’s brother, Gabriel)
Recommend pages viewed by other users with similar page ranks.
Problems:
- New item problem
- Doesn’t consider content similarity or item-to-item relationships.
User session
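- User session $s = \langle w(p_1,s), w(p_2,s), \ldots, w(p_n,s) \rangle$, where $w(p_i,s)$ is the weight associated with page $p_i$ in session $s$.
- Session clusters $\{cl_1, cl_2, \ldots\}$, where each $cl_i$ is a subset of the set of sessions.
- Usage profile $pr_{cl} = \{\langle p, weight(p, pr_{cl}) \rangle : weight(p, pr_{cl}) \ge \mu\}$, with $weight(p, pr_{cl}) = \frac{1}{|cl|} \sum_{s \in cl} w(p,s)$.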
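The usage profile definition can be sketched in a few lines of Python. This is only an illustration: the dictionary representation of a session, the function name `usage_profile` and the sample weights are assumptions, not part of the original slides.

```python
from collections import defaultdict

def usage_profile(cluster, mu):
    """Build a usage profile from one session cluster.

    cluster: list of sessions, each a dict mapping page -> w(p, s).
    mu: threshold below which a page is dropped from the profile.
    """
    totals = defaultdict(float)
    for session in cluster:
        for page, weight in session.items():
            totals[page] += weight
    size = len(cluster)
    # weight(p, pr_cl) = (1 / |cl|) * sum over sessions of w(p, s)
    return {page: total / size
            for page, total in totals.items()
            if total / size >= mu}

# Hypothetical cluster of three sessions
cluster = [
    {"p1": 1.0, "p2": 0.5},
    {"p1": 0.8, "p3": 0.2},
    {"p1": 0.6, "p2": 0.7},
]
profile = usage_profile(cluster, mu=0.3)
print(profile)
```

Pages that are missing from a session simply contribute a weight of zero, so `p3` falls below the threshold here and is left out of the profile.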
Algorithm 1
1. For every session, create a vector containing the viewed pages and a weight for each page.
2. Each vector represents a point in an N-dimensional space, so we can identify clusters.
3. For a new session, check which cluster its vector/point belongs to, and recommend the high-scoring pages of that cluster.
Problems:
- New item problem
- Doesn’t consider content similarity or item-to-item relationships
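Step 3 of Algorithm 1 might look roughly like the sketch below. It assumes the clusters have already been found and summarized as page-weight profiles, and it uses cosine similarity to pick the closest cluster; the slides do not say which similarity measure is used, so that choice, the function names and the sample data are all assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse page-weight vectors (dicts)."""
    dot = sum(weight * v.get(page, 0.0) for page, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(session, profiles, top_n=2):
    """Pick the closest cluster profile and return its best unseen pages."""
    best = max(profiles, key=lambda prof: cosine(session, prof))
    unseen = [(page, w) for page, w in best.items() if page not in session]
    unseen.sort(key=lambda pw: pw[1], reverse=True)
    return [page for page, _ in unseen[:top_n]]

# Hypothetical cluster profiles (page -> weight)
profiles = [
    {"home": 0.9, "movies": 0.8, "actors": 0.6},
    {"home": 0.7, "listings": 0.9, "mortgage": 0.5},
]
recs = recommend({"home": 1.0, "listings": 1.0}, profiles)
print(recs)
```

Because only pages already present in a cluster profile can be returned, a page that no user has visited yet can never be recommended, which is the new item problem listed above.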
Algorithm 2: keyword search
- Solves the new item problem.
- Not good enough:
  - A page can contain information about more than one object.
  - Fundamental data may be linked from the page rather than included in it.
  - What exactly is a keyword?
Solution: domain ontologies for objects
Domain Ontologies
- Domain-level aggregate profile: a set of pseudo-objects, each characterizing objects of a different type that occur commonly across the user sessions.
- Class: $C$
- Attribute $a$: $\langle D_a, T_a, \le_a, \Psi_a \rangle$
  - $T_a$: type of the attribute
  - $D_a$: domain of the values for $a$ (red, blue, ...)
  - $\le_a$: ordering relation among the values in $D_a$
  - $\Psi_a$: combination function
Example – movie web site
- Classes: movies, actors, directors, etc.
- Attributes:
  - Movies: title, genre, starring actors
  - Actors: name, filmography, gender, nationality
- Functions:
  - $\Psi_{actor}(\langle\{S: 0.7;\ T: 0.2;\ U: 0.1\}, 1\rangle, \langle\{S: 0.5;\ T: 0.5\}, 0.7\rangle) = \sum_i (w_i \cdot w_o) / \sum_i w_i$
  - $\Psi_{year}(\{1991\}, \{1994\}) = \{1991, 1994\}$
  - $\Psi_{is\_a}(\{person, student\}, \{person, TA\}) = \{person\}$
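The three combination functions can be sketched as follows. The reading of the actor formula is an assumption: $w_i$ is taken to be the significance weight of object $i$ in the profile and $w_o$ the weight of an actor inside that object. The union for years and the intersection for is_a are inferred from the single examples given, and all function names are hypothetical.

```python
def psi_actor(objects):
    """Weighted average of actor weights.

    objects: list of (actor_weights, object_weight) pairs, where
    actor_weights maps actor -> weight inside that object.
    """
    total = sum(w_obj for _, w_obj in objects)
    combined = {}
    for actors, w_obj in objects:
        for actor, w_actor in actors.items():
            combined[actor] = combined.get(actor, 0.0) + w_obj * w_actor
    return {actor: value / total for actor, value in combined.items()}

def psi_year(*year_sets):
    """Assumed here to be a plain union of the year sets."""
    return set().union(*year_sets)

def psi_is_a(*concept_sets):
    """Assumed here to keep only the concepts shared by all objects."""
    return set.intersection(*map(set, concept_sets))

actors = psi_actor([
    ({"S": 0.7, "T": 0.2, "U": 0.1}, 1.0),
    ({"S": 0.5, "T": 0.5}, 0.7),
])
print(actors)
print(psi_year({1991}, {1994}))
print(psi_is_a({"person", "student"}, {"person", "TA"}))
```

With this reading the combined actor weights still sum to 1, since each input distribution sums to 1 and the result is normalized by the total object weight.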
Movie

| Title | Genre | Actor | Year |
| --- | --- | --- | --- |
| About a Boy | {Romantic; Comedy; Family} | {H. Grant: 0.6; R. Weisz: 0.1; T. Collette: 0.3} | 2002 |
Creating an Aggregated Representation of a Usage Profile
- $pr = \{\langle o_1, w_{o_1}\rangle, \ldots, \langle o_n, w_{o_n}\rangle\}$
- $o_i$: an object; $w_{o_i}$: its significance in the profile $pr$
- Assume all the objects are instances of the same class.
- Create a new virtual object $o'$ with attributes $a_i' = \Psi_i(o_1, \ldots, o_n)$.
Item level usage profile

| Name | Genre | Actor | Year |
| --- | --- | --- | --- |
| {A} | Genre-all; Romance; Romance Comedy; Comedy; Kids & family | {S: 0.7; T: 0.2; U: 0.1} | {2002} |
| {B} | Genre-all; Romance; Comedy | {S: 0.5; T: 0.5} | {1999} |
| {C} | Genre-all; Romance | {W: 0.6; S: 0.4} | {2001} |
| {A: 1; B: 1; C: 1} | Genre-all; Romance | {S: 0.58; T: 0.27; W: 0.09; U: 0.05} | {1999, 2002} |
A real (estate property) example

Property

| Price | Location | Room num |
| --- | --- | --- |
| {300K} | {Chicago} | {5} |
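Item Level Usage Profile

| Weight | Price | Location | Room num |
| --- | --- | --- | --- |
| 1 | 475K | Chicago | 5 |
| 0.7 | 299K | Chicago | 4 |
| 0.18 | 272K | Evanston | 4 |
| 0.18 | 99K | Chicago | 3 |
| 1 | 365K | {Chicago, Evanston} | 4 |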
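The last row of the profile appears to be the aggregate of the four items above it. A quick check, assuming the price and room count are combined as weight-weighted means (the slides do not state the combination function for this example), reproduces the 365K and 4 shown; the helper name `weighted_mean` is hypothetical.

```python
# (weight, price in thousands, rooms) for the four items in the profile
items = [
    (1.0, 475, 5),
    (0.7, 299, 4),
    (0.18, 272, 4),
    (0.18, 99, 3),
]

def weighted_mean(pairs):
    """Mean of the values in (weight, value) pairs, weighted by weight."""
    total = sum(w for w, _ in pairs)
    return sum(w * x for w, x in pairs) / total

price = weighted_mean([(w, p) for w, p, _ in items])
rooms = weighted_mean([(w, r) for w, _, r in items])
print(round(price), round(rooms))
```

The location attribute is not numeric, so it is combined differently; in the table it is shown as the set {Chicago, Evanston}.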
Algorithm 2
Do not just recommend items viewed by other users; recommend items similar to the class representative.
Advantages:
- Higher accuracy
- Needs fewer examples
- No new item problem
- Also considers content similarity (item-to-item relationships).
Item Level Usage Profile

| Weight | Price | Location | Room# |
| --- | --- | --- | --- |
| 1 | 475K | Chicago | 5 |
| 0.7 | 299K | Chicago | 4 |
| 0.18 | 272K | Evanston | 4 |
| 0.18 | 99K | Chicago | 3 |
| 1 | 365K | {Chicago, Evanston} | 4 |
| 1 | 370K | Chicago | 4 |
Final Algorithm
Given a web site:
1. Classify its contents into classes and attributes.
2. Merge the objects of each user profile and create a pseudo-object.
3. Recommend according to this pseudo-object.
Problems
- A per-topic solution
- Found patterns can be incomplete
- User patterns may change with time: the (for movies) “I loved ET” problem.
- Needs cookies and other methods to identify users.
- How is the weight calculated?
- May need many examples: the “I loved American Beauty” problem.
- How to automatically group the web pages?
Break?
Constructing Knowledge Base from WWW
Goal: automatically create a computer-understandable knowledge base from the web.
Why?
- To use in the previously described work, and similar
- Find all universities that offer Java Programming courses
- Make me hotel and flight arrangements for the upcoming Linux conference
…Constructing Knowledge Base from WWW
How?
- Use machine learning to create information extraction methods for each of the desired types of knowledge.
- Apply them to extract symbolic, probabilistic statements directly from the web: Student-of(Rafa, sdbi) = 99%
Method used:
- Provide an initial ontology (classes and relations).
- Training examples: 3 out of 4 university sites (8000 web pages, 1400 web-page pairs).
Example of web pages
- Fundamentals of CS Home Page. Instructors: Jim, Tom
- Jim’s Home Page. I teach several courses: Fundamentals of CS, Intro to AI. My research includes: intelligent web agents
Classes: Faculty, Research-project, Student, Staff, (Person), Course, Department, Other
Relations: instructor-of, members-of-project, department-of.
Ontology
- Entity: Homepage, Homepage title
- Activity
- Other
- Person: Department_of, Project of, Course taught by, Name of
- Course: Instructor of, TAs of
- Faculty: Project led by, Student of
- Research Project: Members of project
Web KB instances
- Jim: Courses taught by: Fundamentals of CS, Intro to AI; Home-page: …
- Fundamentals of CS: Instructor of: Jim, Tom; Home-page: …
Problem Assumption
- Class instance: one instance per web page. This does not hold for:
  - Multiple instances in one web page
  - Multiple linked/related web pages for one instance
  - The Elvis problem
- A relation R(A,B) is represented by:
  - Hyperlinks A→B or A→C→D→…→B
  - Inclusion in a particular context (I teach Intro2cs)
  - A statistical model of typical words
To Learn
1. Recognizing class instances by classifying bodies of hypertext
2. Recognizing relation instances by classifying chains of hyperlinks
3. Extracting text fields
Recognizing class instances by classifying bodies of hypertext
1. Statistical bag-of-words approach
   1. Full text
   2. Hyperlinks
   3. Title/Head
2. Learning first order rules
Combine the previous 4 methods
Statistical bag-of-words approach
- Context-less classification
- Given a set of classes $C = \{c_1, c_2, \ldots, c_N\}$
- Given a document consisting of $n \le 2000$ words $\{w_1, w_2, \ldots, w_n\}$
- $c^* = \arg\max_c \Pr(c \mid w_1, \ldots, w_n)$
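One common way to evaluate the argmax above is a naive Bayes classifier, which treats the words as independent given the class. The sketch below assumes that model with add-one smoothing; the slides do not spell out the exact estimator, and the tiny training set is made up for illustration.

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (class_label, list_of_words) training pairs."""
    class_counts = Counter(label for label, _ in docs)
    word_counts = {label: Counter() for label in class_counts}
    for label, words in docs:
        word_counts[label].update(words)
    vocab = {w for _, words in docs for w in words}
    return class_counts, word_counts, vocab

def classify(words, model):
    """Return argmax_c of log Pr(c) + sum_i log Pr(w_i | c)."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())

    def log_posterior(c):
        total = sum(word_counts[c].values())
        score = math.log(class_counts[c] / n_docs)
        for w in words:
            # Add-one smoothing so unseen words do not zero out a class
            score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        return score

    return max(class_counts, key=log_posterior)

# Hypothetical training documents
model = train([
    ("course", ["homework", "assignment", "due", "class"]),
    ("faculty", ["professor", "research", "systems"]),
    ("student", ["my", "home", "page", "am"]),
])
label = classify(["homework", "due", "hours"], model)
print(label)
```

Working in log space avoids numeric underflow when a document has hundreds or thousands of words.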
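Rows: predicted class; columns: actual class.

| Predicted \ Actual | Course | Student | Faculty | Staff | Research | Dept | Other | Accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Course | 202 | 17 | 0 | 0 | 1 | 0 | 552 | 26.2 |
| Student | 0 | 421 | 14 | 17 | 2 | 0 | 519 | 43.3 |
| Faculty | 5 | 56 | 118 | 16 | 3 | 0 | 264 | 17.9 |
| Staff | 0 | 15 | 1 | 4 | 0 | 0 | 45 | 6.2 |
| Research | 8 | 9 | 10 | 5 | 62 | 0 | 384 | 13 |
| Dept | 10 | 8 | 3 | 1 | 5 | 4 | 209 | 1.7 |
| Other | 19 | 32 | 7 | 3 | 12 | 0 | 1064 | 93.6 |
| Coverage | 82.8 | 75.4 | 77.1 | 8.7 | 72.9 | 100 | 35 | |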
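Reading the rows as predicted classes and the columns as actual classes, accuracy is the diagonal count divided by its row total and coverage is the diagonal count divided by its column total. The sketch below recomputes both under that reading; the function names are hypothetical.

```python
classes = ["course", "student", "faculty", "staff", "research", "dept", "other"]

# matrix[i][j]: pages predicted as classes[i] whose actual class is classes[j]
matrix = [
    [202, 17, 0, 0, 1, 0, 552],
    [0, 421, 14, 17, 2, 0, 519],
    [5, 56, 118, 16, 3, 0, 264],
    [0, 15, 1, 4, 0, 0, 45],
    [8, 9, 10, 5, 62, 0, 384],
    [10, 8, 3, 1, 5, 4, 209],
    [19, 32, 7, 3, 12, 0, 1064],
]

def accuracy(i):
    """Percentage of pages predicted as class i that really are class i."""
    return 100 * matrix[i][i] / sum(matrix[i])

def coverage(j):
    """Percentage of pages of actual class j that were predicted as class j."""
    return 100 * matrix[j][j] / sum(row[j] for row in matrix)

course = classes.index("course")
print(round(accuracy(course), 1), round(coverage(course), 1))
```

For the course class this matches the 26.2 and 82.8 in the table. Recomputing the faculty row the same way gives roughly 25.5 rather than the listed 17.9, so that one entry may not follow from the counts as transcribed.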
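Statistical bag-of-words approach: words ranked by $\Pr(w_i \mid c) \log \frac{\Pr(w_i \mid c)}{\Pr(w_i \mid \neg c)}$

| student | score | faculty | score | course | score |
| --- | --- | --- | --- | --- | --- |
| my | 0.0247 | DDDD | 0.0138 | course | 0.0151 |
| page | 0.0109 | of | 0.0113 | DD:DD | 0.013 |
| home | 0.0104 | and | 0.0109 | homework | 0.0106 |
| am | 0.0085 | professor | 0.0088 | will | 0.0088 |
| university | 0.0061 | computer | 0.0073 | D | 0.008 |
| computer | 0.006 | research | 0.006 | assignments | 0.0079 |
| science | 0.0059 | science | 0.0057 | class | 0.0073 |
| me | 0.0058 | university | 0.0049 | hours | 0.0059 |
| at | 0.0049 | DDD | 0.0042 | assignment | 0.0058 |
| here | 0.0046 | systems | 0.0042 | due | 0.0058 |

| research-project | score | department | score | other | score |
| --- | --- | --- | --- | --- | --- |
| group | 0.006 | department | 0.0179 | D | 0.0374 |
| project | 0.0049 | science | 0.0153 | DD | 0.0246 |
| research | 0.0049 | computer | 0.0111 | the | 0.0153 |
| of | 0.003 | faculty | 0.007 | eros | 0.001 |
| laboratory | 0.0029 | information | 0.0069 | hplayD | 0.0097 |
| systems | 0.0028 | undergraduate | 0.0058 | uDDb | 0.0067 |
| and | 0.0027 | graduate | 0.0047 | to | 0.0064 |
| our | 0.0026 | staff | 0.0045 | bluto | 0.0052 |
| system | 0.0024 | server | 0.0042 | gt | 0.005 |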
Accuracy/Coverage tradeoff for full-text classifiers
Accuracy/coverage tradeoff for hyperlinks classifiers
Accuracy/Coverage for title heading classifiers
Learning first order rules
The previous method doesn’t consider relations between pages
A page is a course home page if it contains the words textbook and TA and points to a page containing the word assignment.
FOIL is a learning system that constructs Horn clause programs from examples
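To make the example rule concrete, the sketch below writes it by hand as a Python predicate over a toy page graph. It only illustrates what such a first-order rule tests, not how FOIL learns it; the page names, words and data layout are made up.

```python
# Hypothetical mini web: page -> set of words and list of outgoing links
pages = {
    "cs101": {"words": {"textbook", "ta", "syllabus"}, "links": ["cs101/hw"]},
    "cs101/hw": {"words": {"assignment", "due"}, "links": []},
    "jim": {"words": {"professor", "research"}, "links": ["cs101"]},
}

def is_course_home_page(name):
    """course(A) :- has_textbook(A), has_ta(A), link_to(A, B), has_assignment(B)."""
    page = pages[name]
    if not {"textbook", "ta"} <= page["words"]:
        return False
    # The rule looks beyond the page itself, at the pages it links to
    return any("assignment" in pages[target]["words"]
               for target in page["links"] if target in pages)

print(is_course_home_page("cs101"))
print(is_course_home_page("jim"))
```

The second condition is what a bag-of-words classifier cannot express, since it depends on the content of a neighboring page.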
Relations
- has_word(Page): stemmed words (computer = computing = comput); 200 occurrences, but less than 30% in other-class pages
- link_to(Page, Page)
- m-estimate accuracy $= (n_c + m \cdot p) / (n + m)$
  - $n_c$: number of instances correctly classified by the rule
  - $n$: total number of instances classified by the rule
  - $m = 2$
  - $p$: proportion of instances in the training set that belong to that class
- Predict each class with confidence = best_match / total_#_of_matches
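The m-estimate is easy to compute directly. In this small sketch the function and parameter names are hypothetical, and the numbers in the usage lines are made up to show how the prior pulls the estimate for rules that match few instances.

```python
def m_estimate(n_correct, n_total, prior, m=2):
    """m-estimate of rule accuracy: (n_c + m * p) / (n + m)."""
    return (n_correct + m * prior) / (n_total + m)

# A rule that matched 10 instances and got 9 right, for a class that
# makes up 10% of the training set
print(m_estimate(9, 10, 0.1))

# A rule that matched only once: the estimate stays close to the prior
print(m_estimate(1, 1, 0.1))
```

With no matches at all the estimate equals the class prior $p$, so a rule never gets a perfect score from a single lucky match.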
New learned rules
student(A) :- not(has_data(A)), not(has_comment(A)), link_to(B,A), has_jame(B), has_paul(B), not(has_mail(B)).
faculty(A) :- has_professor(A), has_ph(A), link_to(B,A), has_faculti(B).
course(A) :- has_instructor(A), not(has_good(A)), link_to(A,B), not(link_to(B, 1)), has_assign(B).
Accuracy/coverage for FOIL page classifiers
Boosting
- The best-performing classifier depends on the class.
- Combine the predictions using the confidence measure.
Accuracy/coverage tradeoff for combined classifiers (2000 words vocabulary)
Boosting
- Disappointing: somehow the combination is not uniformly better.
- Possible solutions:
  - Using reduced-size dictionaries (next)
  - Using other methods for combining predictions (voting instead of best_match / total_#_of_matches)
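The two combination strategies mentioned here can be contrasted in a short sketch. It assumes that combining by confidence means keeping the single most confident prediction, which the slides do not state explicitly; the function names and the sample predictions are hypothetical.

```python
from collections import Counter

def most_confident(predictions):
    """Keep the class proposed with the highest confidence."""
    return max(predictions, key=lambda p: p[1])[0]

def majority_vote(predictions):
    """Keep the class proposed by the largest number of classifiers."""
    votes = Counter(label for label, _ in predictions)
    return votes.most_common(1)[0][0]

# (class, confidence) from three hypothetical classifiers for one page
predictions = [("student", 0.55), ("student", 0.60), ("faculty", 0.90)]
print(most_confident(predictions))
print(majority_vote(predictions))
```

The two rules disagree on this input, which shows why the choice of combination method can change the accuracy/coverage curve.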
Accuracy/coverage tradeoff for combined classifiers (200 words vocabulary)
Multi-Page segments
- The group is the longest prefix (indicated in parentheses):
  - (@/{user,faculty,people,home,projects}/*)/*.{html,htm}
  - (@/{cs???,www/,*})/*.{html,htm}
  - (@/{cs???,www/,*})/
  - …
- A primary page is any page whose URL matches:
  - @/index.{html,htm}
  - @/home.{html,htm}
  - @/%1/%1.{html,htm}
  - …
- If no page in the group matches one of these patterns, then the page with the highest score for any non-Other class is the primary page.
- Any non-primary page is tagged as Other.
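The primary-page patterns can be approximated with regular expressions. The sketch below assumes that `@` stands for any URL prefix and that `%1` denotes a path component repeated as the file name; both readings are guesses from the notation, only the three listed patterns are covered, and the example URLs are made up.

```python
import re

# Assumed translations of the three listed primary-page patterns
PRIMARY_PATTERNS = [
    re.compile(r"/index\.html?$"),        # @/index.{html,htm}
    re.compile(r"/home\.html?$"),         # @/home.{html,htm}
    re.compile(r"/([^/]+)/\1\.html?$"),   # @/%1/%1.{html,htm}
]

def is_primary(url):
    """True if the URL matches one of the primary-page patterns."""
    return any(pattern.search(url) for pattern in PRIMARY_PATTERNS)

print(is_primary("http://example.edu/~jim/index.html"))
print(is_primary("http://example.edu/cs101/cs101.htm"))
print(is_primary("http://example.edu/cs101/hw1.html"))
```

The fallback described above, picking the highest-scoring page when no URL in the group matches, would need the classifier scores and is not shown here.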
Accuracy/coverage tradeoff for the full text after URL grouping heuristics
Conclusion - Recognizing Classes
- Hypertext provides redundant information.
- We can classify using several methods:
  - Full text
  - Heading/title
  - Hyperlinks
  - Text in neighboring pages
  - + Grouping pages
- No method alone is good enough.
- Combining the predictions of several classification methods gives a better result.
Learning to Recognize Relation Instances
- Assume: relations are represented by hyperlinks.
- Given the following background relations:
  - Class(Page)
  - Link-to(Hyperlink, P1, P2)
  - Has-word(H): the word is part of the hyperlink
  - All-words-capitalized(H)
  - Has-alphanumeric-word(H): e.g. I Teach CS2765
  - Has-neighborhood-word(H): neighborhood = paragraph
…Learning to Recognize Relation Instances
Try to learn the following:
- Members-of-project(P1,P2)
- Instructors_of_course(P1,P2)
- Department_of_person(P1,P2)
Learned relations
- instructors_of(A,B) :- course(A), person(B), link_to(C,B,A).
  Test set: 133 Pos, 5 Neg
- department_of(A,B) :- person(A), department(B), link_to(C,D,A), link_to(E,F,D), link_to(G,B,F), has_neighborhood_word_graduate(E).
  Test set: 371 Pos, 4 Neg
- members_of_project(A,B) :- research_project(A), person(B), link_to(C,A,D), link_to(E,D,B), has_neighborhood_word_people(C).
  Test set: 18 Pos, 0 Neg
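The first learned rule can be read as: A is a course, B is a person, and some hyperlink C goes from B's page to A's page. The sketch below evaluates that rule over a toy link graph, assuming the argument order Link-to(Hyperlink, P1, P2) means a link from P1 to P2; the pages and links are made up.

```python
# Hypothetical classified pages and hyperlinks (link id, source, target)
page_class = {
    "cs101": "course",
    "jim": "person",
    "tom": "person",
    "dept": "department",
}
links = [
    ("l1", "jim", "cs101"),
    ("l2", "dept", "jim"),
    ("l3", "tom", "dept"),
]

def instructors_of(course, person):
    """instructors_of(A,B) :- course(A), person(B), link_to(C,B,A)."""
    return (page_class.get(course) == "course"
            and page_class.get(person) == "person"
            and any(src == person and dst == course for _, src, dst in links))

print(instructors_of("cs101", "jim"))
print(instructors_of("cs101", "tom"))
```

The other two rules follow chains of two or three links, so evaluating them would mean searching paths in the same link list rather than single edges.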
Accuracy/Coverage tradeoff for learned relation rules
Learning to Extract Text Fields
- Sometimes we want a small fragment of text, not the whole web page or class (such as Jon, Peter, etc.).
- Make me hotel and flight arrangements for the upcoming Linux conference
Predefined predicates
- Let F = w1, w2, …, wj be a fragment of text.
- length({<, >, =, …}, N)
- some(Var, Path, Feat, Value), e.g. some(A, [next_token, next_token], numeric, true)
- position(Var, From, Relop, N)
- relpos(Var1, Var2, Relop, N)
A wrong example
ownername(Fragment) :- some(A, [prev_token], word, "gmt"), some(A, [ ], in_title, true), some(A, [ ], word, unknown), some(A, [ ], quadrupletonp, false), length(<, 3).

Last-Modified: Wednesday, 26-Jun-96 01:37:46 GMT
<title>
Bruce Randall Donald
</title>
<h1>
<img src="ftp://ftp.cs.cornell.edu/pub/brd/images/brd.gif">
<p>
Bruce Randall Donald<br>
Associate Professor<br>
Accuracy/coverage tradeoff for Name Extraction
Conclusions
- Used machine learning algorithms to create information extraction methods for each desired type of knowledge.
- WebKB achieves 70% accuracy at 30% coverage.
- Bag-of-words (hyperlinks, web pages and full text) and first-order learning can be used to boost the confidence.
- First-order learning can be used to look outward from the page and consider its neighbors.
Problems
- Not as accurate as we want:
  - You can get more accuracy at the cost of coverage.
  - Use linguistic features (verbs).
  - Add new methods to the booster (predict the department of a professor based on the department of his student advisees).
- A per-topic, per-language, per-… method.
- Needs hand-made labeling to learn.
  - Learners with high accuracy can be used to teach learners with low accuracy.