opinion integration through semi-supervised topic modelingsifaka.cs.uiuc.edu/~yuelu2/pub/ ·...

30
1 Opinion Integration Through Opinion Integration Through Semi Semi supervised Topic Modeling supervised Topic Modeling Yue Lu and Chengxiang Zhai University of Illinois at UrbanaChampaign

Upload: others

Post on 19-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Opinion Integration Through Semi-supervised Topic Modelingsifaka.cs.uiuc.edu/~yuelu2/pub/ · 2008-05-10 · 3 4,773,658 results Two Kinds of Opinions Expert opinions •CNET editor’s

1

Opinion Integration Through Opinion Integration Through SemiSemi‐‐supervised Topic Modelingsupervised Topic Modeling

Yue Lu and Chengxiang Zhai

University of Illinois at Urbana‐Champaign

Page 2: Opinion Integration Through Semi-supervised Topic Modelingsifaka.cs.uiuc.edu/~yuelu2/pub/ · 2008-05-10 · 3 4,773,658 results Two Kinds of Opinions Expert opinions •CNET editor’s

2

Why Opinion Integration?Why Opinion Integration?

• Web 2.0  huge amount of opinions

• What have been said about Hillary Clinton?190,451 posts

4,773,658 results

How to digest all?

Page 3: Opinion Integration Through Semi-supervised Topic Modelingsifaka.cs.uiuc.edu/~yuelu2/pub/ · 2008-05-10 · 3 4,773,658 results Two Kinds of Opinions Expert opinions •CNET editor’s

3

4,773,658 results

Two Kinds of OpinionsTwo Kinds of Opinions

Expert opinions

•CNET editor’s review

•Wikipedia article

•Well‐structured

•Easy to access

•Maybe biased

•Outdated soon

190,451 posts

Ordinary opinions

•Forum discussions

•Blog articles

•fragmental

•Hard to access

•Represent the majority

•Up to date

How to benefit from both?

Page 4: Opinion Integration Through Semi-supervised Topic Modelingsifaka.cs.uiuc.edu/~yuelu2/pub/ · 2008-05-10 · 3 4,773,658 results Two Kinds of Opinions Expert opinions •CNET editor’s

4

Research QuestionsResearch Questions

• How do we formalize the problem of opinion integration?

• How do we solve the problem in a general way?

• How do we evaluate it?

Page 5: Opinion Integration Through Semi-supervised Topic Modelingsifaka.cs.uiuc.edu/~yuelu2/pub/ · 2008-05-10 · 3 4,773,658 results Two Kinds of Opinions Expert opinions •CNET editor’s

5

Problem DefinitionProblem Definition

cute… tiny… ..thicker..

last many hrs

die out soon

could afford it

still expensive

DesignBatteryPrice..

DesignBatteryPrice..

Topic: iPod

Expert review with aspects

Weblogs

Integrated Summary

DesignBattery

Price

DesignBattery

Price

iTunes … easy to use…

warranty …better to extend..

Similar opinions

Supplementaryopinions

Output

Review

Aspects

Extra Aspects

Input

Page 6: Opinion Integration Through Semi-supervised Topic Modelingsifaka.cs.uiuc.edu/~yuelu2/pub/ · 2008-05-10 · 3 4,773,658 results Two Kinds of Opinions Expert opinions •CNET editor’s

6

ChallengesChallenges

cute… tiny… ..thicker..

last many hrs

die out soon

could afford it

still expensive

Integrated Summary

DesignBattery

Price

DesignBattery

Price

iTunes … easy to use…

warranty …better to extend..

Rev

iew

Asp

ects

Extr

a A

spec

ts

Similar opinions

Supplementaryopinions

1. How to align

opinions to expert aspects or extra

aspects?

1. How to align

opinions to expert aspects or extra

aspects?

2. How to distinguish similar opinions with supplementary ones?

2. How to distinguish similar opinions with supplementary ones?

3. How to extract representative

opinions with support?

3. How to extract representative

opinions with support?

Page 7: Opinion Integration Through Semi-supervised Topic Modelingsifaka.cs.uiuc.edu/~yuelu2/pub/ · 2008-05-10 · 3 4,773,658 results Two Kinds of Opinions Expert opinions •CNET editor’s

7

Two Major StepsTwo Major Steps

• Step 1: opinion sentences retrieval

• Step 2: opinion integration using probabilistic topic models (3 subtasks)

Weblogs on iPod

Query = “iPod”

General Weblogs

Page 8: Opinion Integration Through Semi-supervised Topic Modelingsifaka.cs.uiuc.edu/~yuelu2/pub/ · 2008-05-10 · 3 4,773,658 results Two Kinds of Opinions Expert opinions •CNET editor’s

8

Subtask 1: Aspect AlignmentSubtask 1: Aspect Alignment

cute… tiny… ..thicker..

last many hrs die out soon

could afford it still expensiveWeblogs

DesignBattery

Price

DesignBattery

Price

iTunes … easy to use…

warranty …better to extend..

Align opinion sentences to aspects

Rev

iew

Asp

ects

Extr

a A

spec

ts

Page 9: Opinion Integration Through Semi-supervised Topic Modelingsifaka.cs.uiuc.edu/~yuelu2/pub/ · 2008-05-10 · 3 4,773,658 results Two Kinds of Opinions Expert opinions •CNET editor’s

9

Subtask 2: Opinions Separation Subtask 2: Opinions Separation 

cute… tiny… ..thicker..

last many hrs die out soon

could afford it still expensive

DesignBattery

Price

DesignBattery

Price

iTunes … easy to use…

warranty …better to extend..

Separate sim opinions from supp onesR

evie

w A

spec

tsEx

tra

Asp

ects

Page 10: Opinion Integration Through Semi-supervised Topic Modelingsifaka.cs.uiuc.edu/~yuelu2/pub/ · 2008-05-10 · 3 4,773,658 results Two Kinds of Opinions Expert opinions •CNET editor’s

10

Subtask 3: Opinion SummarySubtask 3: Opinion Summary

cute… tiny… ..thicker..

last many hrs die out soon

could afford it still expensive

DesignBattery

Price

DesignBattery

Price

iTunes … easy to use…

warranty …better to extend..

• Representativesentence:• Support = 3

Representative Opinion(RO)

Summarize each block with representative sentences and support

Rev

iew

Asp

ects

Extr

a A

spec

ts

Page 11: Opinion Integration Through Semi-supervised Topic Modelingsifaka.cs.uiuc.edu/~yuelu2/pub/ · 2008-05-10 · 3 4,773,658 results Two Kinds of Opinions Expert opinions •CNET editor’s

11

Basic PLSA: Generation ProcessBasic PLSA: Generation Process

w

Topics

Collection background

λB

B

Document

Is 0.05the 0.04a 0.03 ..

θ1

θ2

θk

πd1

πd2

πdk

battery 0.3 life 0.2..

design 0.1screen 0.05

price 0.2purchase 0.15

Generate a word in a document

Generate a word in a document

Topic model = unigram language model= multinomial distribution

[Hofmann 99], [Zhai et al. 04]

Page 12: Opinion Integration Through Semi-supervised Topic Modelingsifaka.cs.uiuc.edu/~yuelu2/pub/ · 2008-05-10 · 3 4,773,658 results Two Kinds of Opinions Expert opinions •CNET editor’s

12

Basic PLSA: EstimationBasic PLSA: Estimation

w

Topics

Collection background

λB

B

Document

Is ?the ?a ?

θ1

θ2

θk

πd1

πd2

πdk

battery ? life ?

design ?screen ?

price ?purchase ?

Generate a word in a document

Generate a word in a document

[Hofmann 99], [Zhai et al. 04]

?

?

? Log-likelihood of the collection

Log-likelihood of the collection

Estimated with Maximum Likelihood Estimator (MLE) through an EM algorithm

Page 13: Opinion Integration Through Semi-supervised Topic Modelingsifaka.cs.uiuc.edu/~yuelu2/pub/ · 2008-05-10 · 3 4,773,658 results Two Kinds of Opinions Expert opinions •CNET editor’s

13

Basic PLSA: Problem?Basic PLSA: Problem?

Design..Battery..Price..…

Design..Battery..Price..…

Expert review with aspects

Weblogson iPod

iPod nanoiPod shuffleiPod touch…

Basic PLSABasic PLSA

Extracted topics may not align with expert

review aspects

Extracted topics may not align with expert

review aspects

Solution: conjugate priorsSemi-supervised PLSA

Page 14: Opinion Integration Through Semi-supervised Topic Modelingsifaka.cs.uiuc.edu/~yuelu2/pub/ · 2008-05-10 · 3 4,773,658 results Two Kinds of Opinions Expert opinions •CNET editor’s

14

SemiSemi‐‐supervised PLSAsupervised PLSA

Maximum A Posterior (MAP) Estimation

Maximum A Posterior (MAP) Estimation

Maximum Likelihood Estimation (MLE)

Maximum Likelihood Estimation (MLE)Add Dirichlet priorsAdd Dirichlet priors

1 - λB

w

Topics

Collection background

λB

BIs 0.05the 0.04a 0.03 ..

θ1

θ2

θk

πd1

πd2

πdk

Document

battery life

design screen

r1

r2

•Confidence in priors•Regularization

Page 15: Opinion Integration Through Semi-supervised Topic Modelingsifaka.cs.uiuc.edu/~yuelu2/pub/ · 2008-05-10 · 3 4,773,658 results Two Kinds of Opinions Expert opinions •CNET editor’s

15

Subtask 1: Aspect AlignmentSubtask 1: Aspect Alignment

cute… tiny… ..thicker..

last many hrs die out soon

could afford it still expensiveWeblogs

DesignBattery

Price

DesignBattery

Price

iTunes … easy to use…

warranty …better to extend..

Rev

iew

Asp

ects

Extr

a A

spec

ts

Page 16: Opinion Integration Through Semi-supervised Topic Modelingsifaka.cs.uiuc.edu/~yuelu2/pub/ · 2008-05-10 · 3 4,773,658 results Two Kinds of Opinions Expert opinions •CNET editor’s

16

Align toreview aspects

Align toreview aspects

Discover extra aspects

Discover extra aspects

Subtask 1: Aspect AlignmentSubtask 1: Aspect Alignment

1 - λB

w

Topics

Collection background

λB

BIs 0.05the 0.04a 0.03 ..

θ1

θ2

θk

πd1

πd2

πdk

Document

battery life

design screen

r1

r2

“aspect words”(nouns)

Opinion sentence aligned to the most relevant aspect

Page 17: Opinion Integration Through Semi-supervised Topic Modelingsifaka.cs.uiuc.edu/~yuelu2/pub/ · 2008-05-10 · 3 4,773,658 results Two Kinds of Opinions Expert opinions •CNET editor’s

17

Subtask 2: Opinions Separation Subtask 2: Opinions Separation 

cute… tiny… ..thicker..

last many hrs die out soon

could afford it still expensive

DesignBattery

Price

DesignBattery

Price

iTunes … easy to use…

warranty …better to extend..

Rev

iew

Asp

ects

Extr

a A

spec

ts

Page 18: Opinion Integration Through Semi-supervised Topic Modelingsifaka.cs.uiuc.edu/~yuelu2/pub/ · 2008-05-10 · 3 4,773,658 results Two Kinds of Opinions Expert opinions •CNET editor’s

18

Subtask 2: Opinions SeparationSubtask 2: Opinions Separation

Similar opinionsSimilar opinions

Supplementary opinions

Supplementary opinions

1 - λB

w

Topics

Collection background

λB

BIs 0.05the 0.04a 0.03 ..

θsim

θsupp

πd1

πd2

Document

long

manyr1

Sub-Collection: Battery

“opinion words”about Battery

Page 19: Opinion Integration Through Semi-supervised Topic Modelingsifaka.cs.uiuc.edu/~yuelu2/pub/ · 2008-05-10 · 3 4,773,658 results Two Kinds of Opinions Expert opinions •CNET editor’s

19

Subtask 3: Opinion SummarySubtask 3: Opinion Summary

cute… tiny… ..thicker..

last many hrs die out soon

could afford it still expensive

DesignBattery

Price

DesignBattery

Price

iTunes … easy to use…

warranty …better to extend..

• Representativesentence:• Support = 3

Representative Opinion(RO) R

evie

w A

spec

tsEx

tra

Asp

ects

Page 20: Opinion Integration Through Semi-supervised Topic Modelingsifaka.cs.uiuc.edu/~yuelu2/pub/ · 2008-05-10 · 3 4,773,658 results Two Kinds of Opinions Expert opinions •CNET editor’s

20

Subtask 3: Opinion SummarySubtask 3: Opinion Summary

1 - λB

w

RO clusters

Collection background

λB

BIs 0.05the 0.04a 0.03 ..

θ1

θ2

θk

πd1

πd2

πdk

DocumentSub-Collection: A Block

Centroid Sentence 1

Centroid Sentence 2

…Support = cluster size

Page 21: Opinion Integration Through Semi-supervised Topic Modelingsifaka.cs.uiuc.edu/~yuelu2/pub/ · 2008-05-10 · 3 4,773,658 results Two Kinds of Opinions Expert opinions •CNET editor’s

21

Experiment SetupExperiment Setup

• Expert review data: 

Topic Source # words # aspects

iPhone CNET 4434 19

Barack Obama Wikipedia 312 14

• Ordinary opinion data: 

Topic Query Terms # articles # sentences

iPhone iPhone 552 3000

Barack Obama Barack+Obama 639 1000

Page 22: Opinion Integration Through Semi-supervised Topic Modelingsifaka.cs.uiuc.edu/~yuelu2/pub/ · 2008-05-10 · 3 4,773,658 results Two Kinds of Opinions Expert opinions •CNET editor’s

22

Results: Product (Results: Product (iPhoneiPhone))

• Opinion Integration with review aspectsReview article Similar opinions Supplementary opinions

You can make emergency calls, but you can't use any other functions…

N/A … methods for unlocking the iPhone have emerged on the Internet in the past few weeks, although they involve tinkering with the iPhone hardware…

rated battery life of 8 hours talk time, 24 hours of musicplayback, 7 hours of video playback, and 6 hours on Internet use.

iPhone will Feature Up to 8 Hours of Talk Time, 6 Hours of Internet Use, 7 Hours of Video Playback or 24 Hours of AudioPlayback 

Playing relatively high bitrateVGA H.264 videos, our iPhonelasted almost exactly 9 freaking hours of continuous playback with cell and WiFi on (but Bluetooth off). 

Unlock/hack iPhone

Activation

Battery

Confirm the opinions from

the review

Additional info under real usage

Page 23: Opinion Integration Through Semi-supervised Topic Modelingsifaka.cs.uiuc.edu/~yuelu2/pub/ · 2008-05-10 · 3 4,773,658 results Two Kinds of Opinions Expert opinions •CNET editor’s

23

Results: Product (Results: Product (iPhoneiPhone))

• Opinions on extra aspects

support Supplementary opinions on extra aspects

15 You may have heard of iASign … an iPhone Dev Wiki tool that allows you to activate your phone without going through the iTunes rigamarole. 

13 Cisco has owned the trademark on the name "iPhone" since 2000, when it acquired InfoGear Technology Corp., which originally registered the name. 

13 With the imminent availability of Apple's uber cool iPhone, a look at 10 things current smartphones like the Nokia N95 have been able to do for a while and that the iPhone can't currently match... 

Another way to activate iPhone

iPhone trademark originally owned

by Cisco

A better choice for smart phones?

Page 24: Opinion Integration Through Semi-supervised Topic Modelingsifaka.cs.uiuc.edu/~yuelu2/pub/ · 2008-05-10 · 3 4,773,658 results Two Kinds of Opinions Expert opinions •CNET editor’s

24

Results: Product (Results: Product (iPhoneiPhone))

• Support statistics for review aspectsPeople care about price

People comment a lot about the unique

wi-fi feature

Controversy: activation requires contract with AT&T

Page 25: Opinion Integration Through Semi-supervised Topic Modelingsifaka.cs.uiuc.edu/~yuelu2/pub/ · 2008-05-10 · 3 4,773,658 results Two Kinds of Opinions Expert opinions •CNET editor’s

25

Quantitative EvaluationQuantitative Evaluation

• Goal– Evaluate human agreement (how hard is opinion integration?)

– Evaluate how our approach could reproduce human choice (how well is our method doing?)

• Method– Ask 3 users to perform 3 tasks 

– Tasks designed from the Obama example

Page 26: Opinion Integration Through Semi-supervised Topic Modelingsifaka.cs.uiuc.edu/~yuelu2/pub/ · 2008-05-10 · 3 4,773,658 results Two Kinds of Opinions Expert opinions •CNET editor’s

26

Task 1:Distinguish Extra AspectsTask 1:Distinguish Extra Aspects

• Result– Low human agreement (1/7)– Our method recovers 52.4% of user choices on avg

cute… tiny… ..thicker..

last many hrs

die out soon

could afford it

still expensive

DesignBattery

Price

DesignBattery

Price

iTunes … easy to use…

warranty …better to extend..

34 opinions

7 extra aspects

Page 27: Opinion Integration Through Semi-supervised Topic Modelingsifaka.cs.uiuc.edu/~yuelu2/pub/ · 2008-05-10 · 3 4,773,658 results Two Kinds of Opinions Expert opinions •CNET editor’s

27

Task 2: Aspect AlignmentTask 2: Aspect Alignment

Results:– Users agree on 13/27 = 48% sentences

– Our method recovers 10.67/27 = 40% sentences on avg.

cute… tiny… ..thicker..

last many hrs

die out soon

could afford it

still expensive

DesignBattery

Price

DesignBattery

Price

Rev

iew

Asp

ects• Mix 27 opinions

• Label each with one of 14 aspects

Page 28: Opinion Integration Through Semi-supervised Topic Modelingsifaka.cs.uiuc.edu/~yuelu2/pub/ · 2008-05-10 · 3 4,773,658 results Two Kinds of Opinions Expert opinions •CNET editor’s

28

Quantitative Evaluation: Task 3Quantitative Evaluation: Task 3

• Mix one sim opinion with many supp opinions

• Select one opinion most similar to the review opinion

• Result: recovers 60%of human choice

cute… tiny… ..thicker..

last many hrs

die out soon

could afford it

still expensive

DesignBattery

Price

DesignBattery

PriceRev

iew

Asp

ects

Page 29: Opinion Integration Through Semi-supervised Topic Modelingsifaka.cs.uiuc.edu/~yuelu2/pub/ · 2008-05-10 · 3 4,773,658 results Two Kinds of Opinions Expert opinions •CNET editor’s

29

SummarySummary

• Novel problem: opinion integration

• Unified approach: semi‐supervised probabilistic topic modeling

• Many potential interesting applications

• Future Work– More rigorous evaluation

– More general setup: many expert reviews instead of one

Page 30: Opinion Integration Through Semi-supervised Topic Modelingsifaka.cs.uiuc.edu/~yuelu2/pub/ · 2008-05-10 · 3 4,773,658 results Two Kinds of Opinions Expert opinions •CNET editor’s

30

Thank you!Thank you!