Detecting Online Commercial Intention (OCI)

Honghua Dai, Zaiqing Nie, Lee Wang, Lingzhi Zhao, Ji-Rong Wen, Ying Li. WWW’06. Advisor: Chia-Hui Chang. Student: Teng-Kai Fan. Date: 2009-08-24

Page 1: Detecting Online Commercial Intention (OCI)

Detecting Online Commercial Intention (OCI)

Honghua Dai, Zaiqing Nie, Lee Wang, Lingzhi Zhao, Ji-Rong Wen, Ying Li

WWW’06

Advisor: Chia-Hui Chang
Student: Teng-Kai Fan

Date: 2009-08-24

Page 2: Detecting Online Commercial Intention (OCI)

Outline

- Introduction
- Defining OCI (Online Commercial Intention)
- Learning Online Commercial Intention
  - Web Page OCI Detector
  - Query OCI Detector
- Experiment
- Conclusion

Page 3: Detecting Online Commercial Intention (OCI)

Introduction

Two major online user activities:
- Browsing activity
- Searching activity

Three categories of user search intention:
- Navigational: reach a particular Web site.
- Informational: acquire information from Web pages.
- Transactional: perform some “web-mediated” activity.

Page 4: Detecting Online Commercial Intention (OCI)

Introduction cont.

OCI (Online Commercial Intention): understanding whether a user intends to make a purchase or participate in a commercial service.

Page 5: Detecting Online Commercial Intention (OCI)

Defining OCI (Online Commercial Intention)

OCI is defined as a function from a query or a Web page to a binary value: Commercial or Non-Commercial.

The goal is to compute two functions, one over queries (Q) and one over Web pages (P):
- OCI: Q → {Commercial, Non-Commercial}
- OCI: P → {Commercial, Non-Commercial}

Page 6: Detecting Online Commercial Intention (OCI)

Learning Online Commercial Intention

Taxonomy-based approach:
- Uses existing concept hierarchies or categories.

Machine learning approach:
- Extracts features from page content and builds classifiers based on those features.
- Labeling process: human-evaluation approach.

Page 7: Detecting Online Commercial Intention (OCI)

Web Page OCI Detector

- Input: a Web page P
- Output: OCI (commercial or non-commercial) of P
- Classifier: SVM

Page 8: Detecting Online Commercial Intention (OCI)

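Keyword Extraction and Selection

Keyword extraction: keywords are extracted from both the inner text and the tag attributes of all the training data.

Feature selection:

Pr(k|C): the probability of the keyword k occurring in a Web page belonging to class C. Below, $\bar{C}$ denotes the complement class of $C$.

$$\mathrm{Sig}(k) = \frac{\max\{\Pr(k \mid C),\ \Pr(k \mid \bar{C})\}}{\Pr(k \mid C) + \Pr(k \mid \bar{C})} - \frac{1}{2}$$

$$\mathrm{Freq}(k) = \Pr(k \mid C \cup \bar{C})$$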
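The two keyword statistics can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes that Sig(k) is the larger of Pr(k|C) and Pr(k|C-bar) divided by their sum, minus 1/2, and that Freq(k) is the fraction of all training pages (both classes) containing k. The function names, the set-of-keywords page representation, and the rule "keep a keyword only if both values reach a threshold" are assumptions made for this example.

```python
def keyword_stats(pages_c, pages_not_c, keyword):
    """Return (Sig, Freq) for one keyword.

    pages_c / pages_not_c: lists of keyword sets, one set per training page,
    for class C and its complement (hypothetical representation).
    """
    n_c = sum(keyword in page for page in pages_c)
    n_not_c = sum(keyword in page for page in pages_not_c)
    pr_c = n_c / len(pages_c)              # Pr(k | C)
    pr_not_c = n_not_c / len(pages_not_c)  # Pr(k | C-bar)
    total = pr_c + pr_not_c
    # Assumed form of Sig(k): 0 when k is equally likely in both classes,
    # 0.5 when k occurs in only one class.
    sig = max(pr_c, pr_not_c) / total - 0.5 if total > 0 else 0.0
    # Freq(k): share of all training pages that contain k.
    freq = (n_c + n_not_c) / (len(pages_c) + len(pages_not_c))
    return sig, freq


def select_keywords(pages_c, pages_not_c, sig_threshold, freq_threshold):
    """Keep keywords whose Sig and Freq both reach the thresholds (assumed rule)."""
    vocabulary = set().union(*pages_c, *pages_not_c)
    kept = []
    for keyword in vocabulary:
        sig, freq = keyword_stats(pages_c, pages_not_c, keyword)
        if sig >= sig_threshold and freq >= freq_threshold:
            kept.append(keyword)
    return sorted(kept)
```

With this form, a keyword that is rare overall or that appears about equally often in both classes is dropped, which shrinks the feature space before training.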

Page 9: Detecting Online Commercial Intention (OCI)

Keyword Extraction and Selection cont.

Two properties are defined for each keyword k in a page p:

$$nit(k, p) = \frac{\text{\# of elements in } p \text{ whose inner text contains keyword } k}{\text{total number of elements in page } p}$$

$$nta(k, p) = \frac{\text{\# of elements in } p \text{ whose tag attributes contain keyword } k}{\text{total number of elements in page } p}$$

A page p with n keywords can therefore be represented in 2*n dimensions.
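As a rough illustration of the 2*n-dimensional representation, the sketch below computes nit(k, p) and nta(k, p) with Python's standard `html.parser`. Several choices here are assumptions for the example and are not taken from the slides: "inner text" is taken to be the text directly inside an element, "tag attributes" is taken to mean attribute values, matching is a case-insensitive substring test, and the names `ElementCollector` and `keyword_features` are hypothetical. Pages with unclosed tags would need a more forgiving parser.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay "open".
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ElementCollector(HTMLParser):
    """Record, for every element, its attribute values and its direct text."""

    def __init__(self):
        super().__init__()
        self.elements = []  # one dict per element: {"text": ..., "attrs": ...}
        self._open = []     # indices of currently open (non-void) elements

    def handle_starttag(self, tag, attrs):
        attr_text = " ".join(value for _, value in attrs if value).lower()
        self.elements.append({"text": "", "attrs": attr_text})
        if tag not in VOID_TAGS:
            self._open.append(len(self.elements) - 1)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text is credited to the innermost open element only.
        if self._open:
            self.elements[self._open[-1]]["text"] += data.lower()


def keyword_features(html, keywords):
    """Return [nit(k1), nta(k1), nit(k2), nta(k2), ...] for one page."""
    parser = ElementCollector()
    parser.feed(html)
    parser.close()
    total = len(parser.elements) or 1  # avoid division by zero on empty pages
    features = []
    for keyword in keywords:
        k = keyword.lower()
        nit = sum(k in e["text"] for e in parser.elements) / total
        nta = sum(k in e["attrs"] for e in parser.elements) / total
        features.extend([nit, nta])
    return features
```

The resulting list has two entries per keyword, so a vocabulary of n selected keywords yields the 2*n-dimensional vector that is passed to the classifier.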

Page 10: Detecting Online Commercial Intention (OCI)

Query OCI Detector

Four types of data sources for query OCI:
- Constituent terms of the search query, e.g. “airline ticket deals”, “digital camera price”.
- Content of the top landing pages recommended by the search engine.
- Content of the search result page, including titles, short descriptions, and URL links.
- The number of user clicks on the landing pages recommended by the search engine.

Page 11: Detecting Online Commercial Intention (OCI)

Detecting OCI based on Top Search Result Landing Pages

- Uses the top-10 result pages generated by MSN.
- Uses the Web page OCI detector to detect the OCI of the top 10 landing pages.
- $p_{qn}$ is the Web page that has rank n in the search result of query q.
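The idea of reusing the page detector for queries can be sketched as follows. The transcript does not preserve how the per-page results are combined, so the majority vote below is only a stand-in assumption, and `query_oci_from_landing_pages`, `page_oci`, and `min_commercial_fraction` are hypothetical names introduced for this example.

```python
from typing import Callable, Sequence


def query_oci_from_landing_pages(
    landing_pages: Sequence[str],
    page_oci: Callable[[str], bool],
    top_n: int = 10,
    min_commercial_fraction: float = 0.5,
) -> bool:
    """Label a query as commercial from the OCI of its top-ranked landing pages.

    landing_pages: page contents ordered by search rank (index n-1 holds p_qn).
    page_oci: any Web page OCI detector returning True for "commercial".
    """
    top = list(landing_pages[:top_n])
    if not top:
        return False  # no evidence: default to non-commercial (assumption)
    votes = [page_oci(page) for page in top]
    # Stand-in combination rule: fraction of commercial pages vs. a threshold.
    return sum(votes) / len(votes) >= min_commercial_fraction
```

A rank-aware combination, for example one that weights higher-ranked pages more, would slot in at the last line without changing the rest.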

Page 12: Detecting Online Commercial Intention (OCI)

Detecting OCI based on Top Search Result Landing Pages cont.

Page 13: Detecting Online Commercial Intention (OCI)

Detecting OCI based on First Search Result Page

Page 14: Detecting Online Commercial Intention (OCI)

Experiments

Data:
- 1408 US English queries.
- The first search result page, collected for each of the 1408 queries.
- The top 10 landing pages, collected for each of the 1408 queries.
- 26186 randomly picked English Web pages.

Labeling Analysis

Page 15: Detecting Online Commercial Intention (OCI)

Evaluation Methodology

For the Web page OCI detector: because the classes are unbalanced, all commercial pages and an equal number of non-commercial pages were selected to train the model.
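The balancing step can be illustrated with a small undersampling sketch. It assumes that commercial pages are the minority class and that the non-commercial pages are drawn uniformly at random. The slide states neither, and `balance_by_undersampling` is a hypothetical helper name.

```python
import random


def balance_by_undersampling(pages, labels, seed=0):
    """Keep every commercial page plus an equal-sized random sample of the rest.

    labels: True for commercial, False for non-commercial.
    Returns a shuffled list of (page, label) pairs.
    """
    rng = random.Random(seed)
    commercial = [p for p, y in zip(pages, labels) if y]
    non_commercial = [p for p, y in zip(pages, labels) if not y]
    # Assumes commercial pages are the minority; min() only guards against
    # asking for more non-commercial pages than exist.
    k = min(len(commercial), len(non_commercial))
    sampled = rng.sample(non_commercial, k)
    balanced = [(p, True) for p in commercial] + [(p, False) for p in sampled]
    rng.shuffle(balanced)
    return balanced
```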

For the query OCI detector:
- Compare the model based on the first search result page with the model based on the top N result landing pages.
- Use 3-fold cross-validation.

Measures: Precision, Recall and F-Measure
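For reference, the three measures and a 3-fold split can be written out directly. The sketch assumes that the measures are computed for the commercial class and that "F-Measure" means the balanced F1 score. The fold assignment (every k-th item) and both function names are illustrative choices, not details from the slides.

```python
def commercial_prf(y_true, y_pred):
    """Precision, recall and F1 for the commercial class (True = commercial)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def k_fold_indices(n_items, k=3):
    """Yield (train_indices, test_indices) for each of k folds."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]
```

Each item lands in exactly one test fold, so averaging the three per-fold scores uses every labeled example once for evaluation.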

Page 16: Detecting Online Commercial Intention (OCI)

Evaluating Page OCI Detector

CP (Precision), CR (Recall), CF (F-measure)

Page 17: Detecting Online Commercial Intention (OCI)

Evaluating Page OCI Detector cont.

Page 18: Detecting Online Commercial Intention (OCI)

Evaluating Query OCI Detector

Page 19: Detecting Online Commercial Intention (OCI)

OCI Analysis for a Stratified Query Sample based on Query Frequency

- Query frequency was divided into five levels: Single, Very low, Low, Mid, and High.
- 10000 queries were randomly selected for each level.

Observation: query sets with high frequency have a larger portion of queries with commercial intention.
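A stratified sample of this kind might be drawn as in the sketch below. The slides name the five levels but not their boundaries, so the cut-offs in `LEVELS` are purely hypothetical, as are the function names. Only the overall shape (bucket by frequency, then sample the same number from each bucket) follows the slide.

```python
import random
from collections import defaultdict

# Hypothetical upper bounds per level; the real boundaries are not given.
LEVELS = [("Single", 1), ("Very low", 10), ("Low", 100),
          ("Mid", 1000), ("High", float("inf"))]


def frequency_level(count):
    """Map a query's frequency to the first level whose upper bound covers it."""
    for name, upper in LEVELS:
        if count <= upper:
            return name


def stratified_sample(query_counts, per_level, seed=0):
    """Randomly pick up to per_level queries from each frequency level."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for query, count in query_counts.items():
        buckets[frequency_level(count)].append(query)
    return {
        name: rng.sample(sorted(queries), min(per_level, len(queries)))
        for name, queries in buckets.items()
    }
```

Running the query OCI detector on each sampled level and comparing the share of commercial labels would reproduce the kind of per-level comparison described in the observation.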

Page 20: Detecting Online Commercial Intention (OCI)

Conclusion

The authors present a framework for building machine learning models that learn OCI (for queries and Web pages) based on any Web page content.