Mining Bulletin Board Systems Using Community Generation Ming Li, Zhongfei (Mark) Zhang, and Zhi-Hua Zhou PAKDD’08 Reporter: Che-Wei, Liang Date: 2008.07.10


Mining Bulletin Board Systems Using Community Generation

Ming Li, Zhongfei (Mark) Zhang, and Zhi-Hua Zhou
PAKDD’08

Reporter: Che-Wei, Liang
Date: 2008.07.10


Outline

• Introduction
• General Model
• Interest-Sharing Group Identification
• Predicting User Behavior Using Generated Community
• Experiment


Introduction

• Bulletin Board System (BBS)
  – Platform for exchanging and sharing information
  – Consists of a number of boards
  – Users can read/post messages on different topics

• Users with similar interests may have similar actions

• Effective discovery of relationships between users of a BBS is essential


General Model

• Consider the posted messages
  – The title is used to fully determine the topics of a message
  – Keywords are extracted from the titles
  – Keywords are mapped to the collected topics

• A BBS user tends to join discussions on topics that he or she is interested in
  – Messages that a user posted may reflect the user’s interests
  – Users’ interests are time-dependent
  – Frequency of messages posted should also be assessed
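The title-to-topic step above can be sketched as a keyword lookup. This is only an illustration: the slides do not list the collected topics or describe the keyword extractor, so `TOPIC_KEYWORDS`, `topics_of_title`, and `topic_frequencies` are hypothetical names, and simple word matching stands in for whatever extraction the authors used.

```python
import re
from collections import Counter

# Hypothetical topic vocabulary; the slides do not give the real one.
TOPIC_KEYWORDS = {
    "programming": {"python", "java", "compiler", "bug"},
    "games": {"rpg", "console", "multiplayer"},
}

def topics_of_title(title):
    """Return the set of topics whose keywords occur in the title."""
    words = set(re.findall(r"[a-z0-9]+", title.lower()))
    return {topic for topic, kws in TOPIC_KEYWORDS.items() if words & kws}

def topic_frequencies(titles):
    """Count how many of a user's message titles fall under each topic."""
    counts = Counter()
    for title in titles:
        counts.update(topics_of_title(title))
    return counts
```

Counting per topic gives the posting frequency that the last bullet asks to assess; splitting the titles by time instant first would make the counts time-dependent.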


General Model

• Access pattern of BBS users
  – View of Topics
    • A set of topics and user access frequencies of the messages posted to different boards by different users along the timeline
  – View of Boards
    • A set of boards and frequencies of messages posted to the boards along the timeline


General Model

• BBS model
  – A collection of users, each represented by two timelines of actions: one on the Boards view and one on the Topics view
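One way to picture this model is as a mapping from user id to a record holding the two timelines. The sketch below assumes a timeline is a mapping from a time instant (here a calendar day) to per-board or per-topic posting counts; the class and field names are illustrative and do not come from the slides.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date

def _timeline():
    # time instant -> {board or topic: posting frequency}
    return defaultdict(lambda: defaultdict(int))

@dataclass
class User:
    user_id: str
    boards_view: dict = field(default_factory=_timeline)
    topics_view: dict = field(default_factory=_timeline)

    def record_post(self, day, board, topics):
        """Register one posted message in both views."""
        self.boards_view[day][board] += 1
        for topic in topics:
            self.topics_view[day][topic] += 1

# The BBS model itself: user id -> User
bbs = {}
```

For example, `bbs["u1"] = User("u1")` followed by `bbs["u1"].record_post(date(2003, 1, 1), "Programming", ["python"])` adds one action to each of that user's timelines.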



Interest-Sharing Group Identification

• Given two timelines of actions X and Y of two users id_x and id_y

• A straightforward way
  – Similarity between X_i and Y_j = (formula not preserved in this transcript)


Interest-Sharing Group Identification

• Average frequency differences of actions

• Local similarity between X_i and Y_j


Interest-Sharing Group Identification

• Hybrid similarity between X_i and Y

• Global similarity between X and Y
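The formulas for these similarities did not survive in the transcript, so the following is not the authors' definition. It is a generic stand-in that only mirrors the three levels named on the slides, under these assumptions: each time instant is a dictionary of action frequencies, local similarity is plain cosine similarity, hybrid similarity takes the best local match within the other timeline, and global similarity averages the hybrid values.

```python
import math

def local_similarity(x_i, y_j):
    """Cosine similarity between two {action: frequency} dicts (assumed measure)."""
    dot = sum(f * y_j.get(a, 0) for a, f in x_i.items())
    norm_x = math.sqrt(sum(f * f for f in x_i.values()))
    norm_y = math.sqrt(sum(f * f for f in y_j.values()))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)

def hybrid_similarity(x_i, timeline_y):
    """Best local match of one instant X_i against the whole timeline Y."""
    return max((local_similarity(x_i, y_j) for y_j in timeline_y), default=0.0)

def global_similarity(timeline_x, timeline_y):
    """Average hybrid similarity over all instants of X."""
    if not timeline_x:
        return 0.0
    total = sum(hybrid_similarity(x_i, timeline_y) for x_i in timeline_x)
    return total / len(timeline_x)
```

Users whose global similarity exceeds some threshold could then be linked into an interest-sharing group; the threshold and the linking rule are likewise left open here.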


Predict User Behavior Using Generated Community

• Given a user id_i
  – Predict what action id_i may take in the near future

• Actions that have been taken by id_i may be closely related to id_i’s future actions
  – Possible solution
    • Compute posterior probability
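As a rough illustration of the "posterior probability" idea, the sketch below estimates the probability of each candidate action from a user's own history with add-alpha smoothing, which is the posterior predictive estimate under a symmetric Dirichlet prior. The slides do not say which probability model was used, so the function name, the `alpha` parameter, and the model itself are assumptions.

```python
from collections import Counter

def action_probabilities(history, actions, alpha=1.0):
    """Estimate P(next action | history) for each candidate action.

    `actions` is assumed to contain every action that appears in `history`;
    otherwise the returned values will not sum to 1.
    """
    counts = Counter(history)
    total = len(history) + alpha * len(actions)
    return {a: (counts[a] + alpha) / total for a in actions}
```

With a short history the estimate is dominated by the prior, which is one reason to bring in information from similar users.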


Predict User Behavior Using Generated Community

• Resolved with interest-sharing groups
  – Similar users may take similar actions at some time instants
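A minimal sketch of how group members could inform a prediction is given below. It is not the BPUC algorithm, whose details are not in the transcript: it simply lets each neighbor in the user's interest-sharing group vote for the actions it has taken, weighted by an assumed similarity score, alongside the user's own history.

```python
from collections import Counter

def predict_with_group(own_history, neighbor_histories, top_k=1):
    """Rank candidate actions for a user.

    neighbor_histories: list of (similarity, history) pairs for the user's
    neighbors in the generated community (hypothetical input format).
    """
    scores = Counter()
    for action in own_history:
        scores[action] += 1.0          # the user's own actions count fully
    for similarity, history in neighbor_histories:
        for action in history:
            scores[action] += similarity  # neighbors count by similarity
    return [action for action, _ in scores.most_common(top_k)]
```

With no neighbors the ranking falls back to the user's own most frequent action.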


BPUC algorithm


Experiment

• Data Set
  – BBS of Nanjing University
  – Messages collected from January 1st, 2003 to December 1st, 2005 on the 17 most popular boards
  – 4512 topics of 17 boards, 1109 users

• Evaluation set
  – 42 volunteers: 18 users interested in modern weapons, 12 users fond of programming skills; the rest are interested in computer games


Experiments on Community Generation

• Neighborhood accuracy
  – Describes how accurately the neighbors of a user in a generated community share interests similar to those of the user

• Component accuracy
  – Measures how well the generated groups represent certain interests that are common to the individuals of the groups


Experiments on Community Generation

• Example
  – A generated community, 7 links between similar users, 10 links between dissimilar users
  – Neighborhood accuracy = (7+10)/21 = 0.810
  – Component accuracy = (7+0)/21 = 0.333
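The arithmetic of this example can be checked directly. The numbers are taken from the slide; the transcript does not spell out what each term of the two fractions counts (the accompanying figure is missing), so the variable names below are neutral placeholders rather than definitions.

```python
# Values as printed on the slide
first_term = 7
second_term_neighborhood = 10
second_term_component = 0
denominator = 21

neighborhood_accuracy = (first_term + second_term_neighborhood) / denominator
component_accuracy = (first_term + second_term_component) / denominator

print(f"{neighborhood_accuracy:.3f}")
print(f"{component_accuracy:.3f}")
```

Both fractions round to the values shown on the slide.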


Experiments on Community Generation

• Compare with CORAL


Experiments on Community Generation

• Running time comparison


Experiments on User Behavior Prediction

• 1056 days for training the probability model
• Last 10 days for testing
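The split is consistent with the collection period given for the data set, assuming both endpoint dates are counted. The short check below derives the boundary dates; the variable names are illustrative, and the resulting dates are computed from that assumption, not stated on the slides.

```python
from datetime import date, timedelta

first_day = date(2003, 1, 1)
last_day = date(2005, 12, 1)

total_days = (last_day - first_day).days + 1   # inclusive of both endpoints
test_days = 10
train_days = total_days - test_days

train_end = first_day + timedelta(days=train_days - 1)
test_start = train_end + timedelta(days=1)

print(total_days, train_days, train_end, test_start)
```

Under this assumption the training period has 1056 days, and the test period is the final 10 days ending on the last collection day.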
