Social Context Summarization

Zi Yang, Keke Cai, Jie Tang, Li Zhang, Zhong Su, Juanzi Li

Tsinghua University and IBM China Research Lab

July 25, 2011, SIGIR 2011, Beijing




Outline

• Motivation

• Prior Work

• Social Context Summarization

• Solving via Dual Wing Factor Graph Model

• Experimental Result and Case Study

• Conclusion and Future Work


Motivation

• Document summarization

Document → Summary


Motivation

• Context and social context

[Figure: a Web document surrounded by its contexts: related documents, hyperlinks, click-through data, discussing tweets, opinions/comments, associated authors/users, user network, and message network.]


Motivation

• How do (social) contexts help summarization?

[Figure labels: related documents, discussing tweets, user network, message network]

– Textual information: integrate context information to help estimate the informativeness of sentences.

– Social influence: if a user is an opinion leader, his or her comments should be more important than others'.

– Information propagation: if a comment has been forwarded or replied to by many other users, it should be more important than other comments.


Challenges

• To incorporate users, user-generated content, social networks, and implicit networks (such as the forward/reply network).

– How to formally define the social context?

• To capture the dependency between the social context information and the quality of summaries.

– How to formalize the problem in a principled framework?

• The social context inevitably contains noise.

– How to validate on a real-world data set?


Prior Work

• Document summarization w/o context

[Figure: example per-sentence scores]

Sentence Scoring

– Linguistic features: rhetorical structure, lexical chains

– Statistical features: term significance, sentence similarity

Sentence Selection

– Unsupervised approaches

– Supervised approaches

• Binary classification: Naïve Bayes [Kupiec:95], Maximum Entropy [Osborne:02], Genetic Algorithm [Yeh:05]

• Sequence labeling: HMM [Conroy:01], CRF [Shen:07]


Prior Work

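• Binary classification vs. sequence labeling

– Example: document d with sentence set $S_d = \{s_1, s_2, s_3, s_4\}$ associated with $Y_d = \{y_1, y_2, y_3, y_4\}$.

Binary classification: each label $y_i$ of sentence $s_i$ is attached only to its own factor $f_i$ (factors $f_1, f_2, f_3, f_4$).

$p(\mathbf{x}) \propto \prod_{i=1}^{4} f_i$

Sequence labeling: adjacent labels are additionally linked by the factors $g_{12}, g_{23}, g_{34}$.

$p(\mathbf{x}) \propto \prod_{i=1}^{4} f_i \prod_{i=1}^{3} g_{i,i+1}$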


Factor graph representation

– Nodes represent variables or factors

– Edges associate factors with variables
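The contrast between the two formulations can be sketched on the four-sentence example. This is only an illustration: the local scores in `f` and the pairwise factor `g` are made-up values, not taken from the talk, and the exhaustive search is practical only for toy sizes.

```python
from itertools import product

# Hypothetical local scores: f[i][y] says how well label y fits sentence i.
f = [
    {0: 1.0, 1: 2.0},
    {0: 1.0, 1: 1.5},
    {0: 2.0, 1: 1.0},
    {0: 1.0, 1: 3.0},
]


def g(y_left, y_right):
    """Hypothetical pairwise factor: penalize selecting two adjacent sentences."""
    return 0.5 if (y_left == 1 and y_right == 1) else 1.0


def score_binary(labels):
    """Unnormalized score with local factors only (binary classification)."""
    result = 1.0
    for i, y in enumerate(labels):
        result *= f[i][y]
    return result


def score_sequence(labels):
    """Unnormalized score with local factors plus chain factors g12, g23, g34."""
    result = score_binary(labels)
    for left, right in zip(labels, labels[1:]):
        result *= g(left, right)
    return result


def best_labeling(score):
    """Exhaustively search all 2^4 labelings (fine for a toy example)."""
    return max(product([0, 1], repeat=len(f)), key=score)


best_binary = best_labeling(score_binary)
best_sequence = best_labeling(score_sequence)
```

With these made-up numbers, the local-only model selects sentences independently, while the chain model can change its choice for a sentence because of its neighbor's label.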


Prior Work

• Document summarization with context

– Contents from external documents [Wan:08], cited articles [Mei:08], documents selected by users [Mani:98]

– Text surrounding the hyperlink [Amitay:00, Delort:03]

– Search engine click-through data [Sun:05]

– Commented sentence selection [Hu:08] or opinionated text [Ganesan:10, Paul:10]


LIMITATION

Lack of a unified approach

– to explore the non-textual information from social context

– to take full advantage of conventional techniques in summarization


Social Context Summarization

• Social Context and SCAN (in Twitter scenario)

Web document d with sentence set $S_d = \{s_1, s_2, s_3, s_4\}$

Associated tweet set $M_d = \{m_i\}$

Associated user set $U_d = \{u_i\}$

User relations $E_d^u = \{(u_i, u_j)\}$

Tweet relations $E_d^m = \{(m_i, m_j)\}$

Sentence relations $E_d^s = \{(s_i, s_j)\}$

Social context of d: $C_d = \langle M_d, U_d \rangle$

Social Context Augmented Network (SCAN): $G_d = (S_d, C_d, E_d)$, where $E_d = E_d^s \cup E_d^m \cup E_d^u$
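The SCAN definition maps naturally onto a small container type. The sketch below is one possible encoding, not the authors' implementation: the class name, field names, and the example edges (in particular the user edges) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SCAN:
    """Social Context Augmented Network G_d = (S_d, C_d, E_d) for one document."""
    sentences: list                                     # S_d
    tweets: list                                        # M_d
    users: list                                         # U_d
    sentence_edges: set = field(default_factory=set)    # E_d^s
    tweet_edges: set = field(default_factory=set)       # E_d^m
    user_edges: set = field(default_factory=set)        # E_d^u

    @property
    def social_context(self):
        """C_d = <M_d, U_d>."""
        return (self.tweets, self.users)

    @property
    def edges(self):
        """E_d is the union of the sentence, tweet, and user relations."""
        return self.sentence_edges | self.tweet_edges | self.user_edges


# Hypothetical example instance; the edge lists are illustrative only.
scan = SCAN(
    sentences=["s1", "s2", "s3", "s4"],
    tweets=["m5", "m6", "m7", "m8"],
    users=["u1", "u2", "u3"],
    sentence_edges={("s1", "s2"), ("s2", "s3"), ("s3", "s4")},
    tweet_edges={("m5", "m6"), ("m6", "m7"), ("m6", "m8")},
    user_edges={("u1", "u2"), ("u2", "u3")},
)
```

Keeping the three relation sets separate makes it easy to attach a different factor type to each kind of edge later on.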


Social Context Summarization

• SCAN

– Three-layer perspective

[Figure: three layers. User layer: users u1, u2, u3 linked by follow relations. Microblog layer: tweets m5 to m10 linked by reply and retweet relations. Document layer: documents d1 and d2, with sentences s1 to s4.]

• Summarization objective

A summary consists of

– Important sentences Sd* (s1 and s3 in the figure). Subproblem 1: Key Sentence Extraction

– Important tweets Md* (m7 and m9 in the figure). Subproblem 2: Tweet Summarization


QUESTION

How can the mutual reinforcement between the two subproblems be captured to help generate a high-quality (HQ) summary?


Solving via Dual Wing Factor Graph Model

• Overview

– Supervised framework

[Figure: supervised pipeline with boxes for the training set and the testing set (each a Web document plus its social context), document modeling, social context modeling, sentence & social context scoring, sentence & social context selection, and the summarized social context.]

– Take advantage of conventional techniques in summarization

– Explore the non-textual information from social context

– Reinforce!


Solving via Dual Wing Factor Graph Model

• Modeling

[Figure: the three-layer SCAN (document, microblog, and user layers) mapped to a factor graph. Sentences s1 to s4 carry labels y1 to y4, with local attribute factors f1 to f4 and sentence dependency factors g12, g23, g34. Tweets m5 to m8 carry labels y5 to y8, with local attribute factors f5 to f8 and tweet dependency factors g56, g67, g68.]

[Figure, continued: cross-domain dependency factors h15, h16, h37, h47 connect the sentence labels y1, y3, y4 to the tweet labels y5, y6, y7.]


QUESTION

How should these factors be defined?


Solving via Dual Wing Factor Graph Model

• Modeling

– Local attribute factor

• Maximum entropy

$f_{i,c}(\lambda_c, y_i) = \exp\{\lambda_c \, x_{i,c} \, y_i\}$

– Dependency factor

• Binary value

$g_{ij,c}(\mu_c, y_i, y_j) = \begin{cases} e^{\mu_c} & \text{if some condition holds} \\ 1 & \text{otherwise} \end{cases}$

What condition?

• Sentence dependency $(s_i, s_j)$: $y_i \neq 1$ or $y_j \neq 1$

• Tweet dependency $(m_i, m_j)$: $y_i \leq y_j$

• Cross-domain dependency $(s_i, m_j)$: $y_i = y_j$

Only for pairs with similarity > θg or θh
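The factor definitions on this slide can be sketched in a few lines. The sketch assumes the local attribute factor has the exponential form exp(weight × feature × label) and that a dependency factor equals exp(weight) when its condition holds and 1 otherwise. Pairing each condition with a dependency type follows the order in which they appear on the slide, which is an assumption; all function and variable names are hypothetical.

```python
import math


def local_factor(weight, feature_value, y):
    """Local attribute factor, assumed to be exp(weight * feature * y)."""
    return math.exp(weight * feature_value * y)


# Conditions as listed on the slide, keyed by dependency type.
# The pairing of condition and type is an assumption based on slide order.
CONDITIONS = {
    "sentence": lambda yi, yj: yi != 1 or yj != 1,
    "tweet": lambda yi, yj: yi <= yj,
    "cross": lambda yi, yj: yi == yj,
}


def dependency_factor(kind, weight, yi, yj):
    """Binary-valued dependency factor: exp(weight) if the condition holds, else 1."""
    return math.exp(weight) if CONDITIONS[kind](yi, yj) else 1.0
```

For example, `dependency_factor("sentence", 0.7, 1, 1)` returns 1.0 because both sentences are selected, so the sentence condition fails and the pair earns no bonus.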


Solving via Dual Wing Factor Graph Model

• Modeling

– Objective function

$\max \; \frac{1}{Z} \prod_{i,j \in T,\; c \in C} f_{i,c}(\lambda_c, y_i) \, g_{ij,c}(\mu_c, y_i, y_j) \, h_{ij}(\nu, y_i, y_j)$

Parameters: $\lambda_c$, $\mu_c$, $\nu$

• Parameter estimation

– L-BFGS

• Inference algorithm

– Loopy belief propagation

– Could be distributed easily!
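To make the role of the normalizer Z and of inference concrete, the sketch below evaluates a tiny two-wing graph exactly. It deliberately swaps loopy belief propagation for exhaustive enumeration, which is exact but exponential in the number of nodes and therefore only suitable for toy graphs. The graph, the local scores, the weights, and the pairing of conditions with dependency types are all assumptions made for illustration.

```python
import math
from itertools import product

# Hypothetical toy graph: two sentences (indices 0, 1) and two tweets (2, 3).
local_scores = [0.8, -0.2, 0.5, -0.4]   # assumed weight * feature per node
sentence_pairs = [(0, 1)]               # g factors on the document wing
tweet_pairs = [(2, 3)]                  # g factors on the tweet wing
cross_pairs = [(0, 2)]                  # h factors joining the two wings
MU_S, MU_T, NU = 0.3, 0.3, 0.6          # assumed dependency weights


def unnormalized(labels):
    """Product of all factor values for one joint labeling."""
    log_score = sum(s * y for s, y in zip(local_scores, labels))
    for i, j in sentence_pairs:
        if labels[i] != 1 or labels[j] != 1:
            log_score += MU_S
    for i, j in tweet_pairs:
        if labels[i] <= labels[j]:
            log_score += MU_T
    for i, j in cross_pairs:
        if labels[i] == labels[j]:
            log_score += NU
    return math.exp(log_score)


labelings = list(product([0, 1], repeat=len(local_scores)))
Z = sum(unnormalized(y) for y in labelings)   # partition function
probabilities = {y: unnormalized(y) / Z for y in labelings}
map_labeling = max(probabilities, key=probabilities.get)

# Marginal probability that each node is selected (label 1).
marginals = [
    sum(p for y, p in probabilities.items() if y[k] == 1)
    for k in range(len(local_scores))
]
```

On real SCANs the number of joint labelings is far too large to enumerate, which is where an approximate method such as loopy belief propagation comes in.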


Experimental Result and Case Study

• Data preparation

– 4,874,389 users

– 404,544,462 tweets

– 200,000 most frequent URLs

– 12,964,166 tweets

• Distribution of frequency

[Figure: frequency distribution, with one region labeled "Ad. pages"]

• Evaluate on only HQ websites

– CNN, BBC, MTV, ESPN, Mashable.


Experimental Result and Case Study

• Annotation method

– Manual annotation on

– 1,145 HITs (Human Intelligence Tasks) from 158 different users

• Evaluation methods

– F1 and ROUGE-1,2

• Baseline methods

– Supervised: SVM, logistic regression (LR), CRF, SVM+, LR+ (include neighbors’ features)

– Unsupervised: random, Doc/ParaLead, PageRank
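The evaluation measures named on this slide can be sketched as follows. F1 is computed over sets of selected items, and ROUGE-N is shown in its recall form with whitespace tokenization; both choices are assumptions, since the slides do not say which ROUGE variant or tokenizer was used. The function names are hypothetical.

```python
from collections import Counter


def f1_score(selected, gold):
    """F1 between the set of selected items and the annotated (gold) items."""
    selected, gold = set(selected), set(gold)
    overlap = len(selected & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(selected)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n):
    """ROUGE-N recall: clipped n-gram overlap divided by the reference n-gram count."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total
```

For instance, selecting {s1, s3} against a gold set {s1, s2} gives precision and recall of 0.5 each, so F1 is 0.5.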


Experimental Result and Case Study

• Feature

– 6 basic features for sentences

– 5 basic features for tweets

• author’s PageRank (influence from user network)

• Result

[Figure: bar charts of F1, ROUGE-1 (R1), and ROUGE-2 (R2) for sentence and tweet summarization, with a vertical axis from 0 to 0.7. Legend: improved with social context; not improved.]
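The author's PageRank feature listed among the tweet features can be illustrated with a plain power iteration over a follow graph. This is a generic sketch, not the configuration used in the experiments: the damping factor, the iteration count, the edge direction (rank flows from a follower to the user being followed), and the example graph are all assumptions.

```python
def pagerank(follows, damping=0.85, iterations=50):
    """Power-iteration PageRank over a follow graph.

    follows maps each user to the list of users they follow, so rank flows
    from a follower to the users being followed.
    """
    users = set(follows) | {v for targets in follows.values() for v in targets}
    rank = {u: 1.0 / len(users) for u in users}
    for _ in range(iterations):
        new_rank = {u: (1.0 - damping) / len(users) for u in users}
        for u in users:
            targets = follows.get(u, [])
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
            else:
                # Dangling user (follows nobody): spread rank uniformly.
                for v in users:
                    new_rank[v] += damping * rank[u] / len(users)
        rank = new_rank
    return rank


# Hypothetical follow graph: u2 and u3 both follow u1; u1 follows u2.
ranks = pagerank({"u1": ["u2"], "u2": ["u1"], "u3": ["u1"]})
```

In this toy graph u1 has the most followers and ends up with the highest rank, which is the kind of signal an opinion-leader feature is meant to capture.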


Experimental Result and Case Study

• Analysis on results

– Why is Tweet Summarization not significantly improved?

– Significantly improved on the MTV and ESPN domains

• Fewer documents and corresponding tweets

[Figure labels: users' interests, annotations, tweets]


Experimental Result and Case Study

• Analysis on results

– Impact of thresholds θg and θh

• Smaller θh: more LQ (low-quality) relations

• Greater θh: fewer HQ (high-quality) relations

• Smaller θg: more LQ relations

• Greater θg: fewer HQ relations
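The threshold trade-off can be seen in a small sketch that builds relations only between pairs whose similarity exceeds a threshold. Cosine similarity over bag-of-words counts is used here as a stand-in, since the slides do not state which similarity measure was used; the function names and example texts are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words term-count vectors."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def build_relations(texts, threshold):
    """Keep only the pairs whose similarity exceeds the threshold."""
    return [
        (i, j)
        for i, j in combinations(range(len(texts)), 2)
        if cosine_similarity(texts[i], texts[j]) > threshold
    ]


# Hypothetical example texts.
tweets = [
    "women try to take body on plane",
    "two women try to take a body on a plane",
    "please do not disturb my friend",
]
loose = build_relations(tweets, threshold=0.1)
strict = build_relations(tweets, threshold=0.9)
```

With the low threshold the two near-duplicate texts are linked, while the high threshold drops even that relation, mirroring the trade-off between admitting low-quality relations and losing high-quality ones.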


Experimental Result and Case Study

• Analysis on results

– Factor contribution analysis

[Figure: factor contribution charts for sentence and tweet summarization. Labeled features: sentence position in paragraph, sentence length, averaged TF-IDF, and tweet length.]


Experimental Result and Case Study

• Case study

This is both weird and macabre at the same time.

Weekend At Bernie's come true (or "Wochenende an Bernie" in German)

BBC News, 'Women try to take body on plane at Liverpool airport'

Please don't disturb my friend, he's dead tired <--- Lmao

Please don't disturb my friend, he's dead tired

And in comedy news today, two women try to re-enact Weekend at Bernie’s at Liverpool Airport

Police have arrested two women after they tried to take the body of a dead relative on to a plane at Liverpool John Lennon Airport.

The women - his widow and step-daughter - said they thought he was asleep.

"I [did not] kill my Willi.

My Willi is my god.

I [have loved] my Willi for 22 years."

And she insisted that with his eyes closed they believed he was asleep.

“A dead person you cannot carry to Germany, there are too many people checking and security.

How can you bring a dead person to Germany?”

[Figure: scores assigned to the sentences and tweets above by each factor type: sentence local, tweet local, sentence relation, tweet relation, and sentence-tweet relation.]

High-frequency terms in the document: women, dead person, body, Liverpool Airport

High-frequency terms in tweets: Liverpool airport, Weekend At Bernie’s

“Women try to take body on plane at Liverpool airport

additional


Conclusion and Future Work

• Conclusion

– Formally defined the social context, which incorporates users, user-generated content, and associated networks.

– Proposed a Dual Wing Factor Graph Model to capture the dependencies within SCANs.

– Conducted experiments on a real-world data set; the proposed method outperforms the baseline methods.

• Future work

– Explore friendship in the user network more “elegantly”.


Thanks! Q&A

Zi Yang

July 25, 2011, SIGIR 2011, Beijing