Mining Bulletin Board Systems Using Community Generation
Ming Li, Zhongfei (Mark) Zhang, and Zhi-Hua Zhou
PAKDD’08
Reporter: Che-Wei, Liang
Date: 2008.07.10
Outline
• Introduction
• General Model
• Interest-Sharing Group Identification
• Predicting User Behavior Using Generated Community
• Experiment
Introduction
• Bulletin Board System (BBS)
  – Information exchanging and sharing platform
  – Consists of a number of boards
  – Users can read/post messages on different topics
• Users with similar interests may have similar actions
• Effective discovery of relationships between users of a BBS is essential
General Model
• Consider the posted messages
  – The title is used to fully determine the topics of a message
  – Key words are extracted from titles
  – Key words are mapped to the collected topics
• A BBS user tends to join discussions on topics that he or she is interested in
  – Messages that a user posts may reflect the user’s interests
  – Users’ interests are time-dependent
  – The frequency of posted messages should also be assessed
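The title-to-topic step can be illustrated with a minimal sketch. The keyword table `TOPIC_KEYWORDS` and the function `topics_of_title` are hypothetical names introduced here. The slides do not list the collected topics or describe the keyword extractor, so a plain word match stands in for it.

```python
import re

# Hypothetical keyword-to-topic table (assumption: the real topic list is not in the slides).
TOPIC_KEYWORDS = {
    "programming": {"python", "compiler", "pointer"},
    "games": {"warcraft", "patch", "replay"},
}


def topics_of_title(title):
    """Return the set of topics whose keywords appear in a message title."""
    # Assumption: lowercase word tokens are a good enough stand-in for "key words".
    words = set(re.findall(r"[a-z0-9]+", title.lower()))
    return {topic for topic, keywords in TOPIC_KEYWORDS.items() if words & keywords}
```

For example, `topics_of_title("Python compiler question")` yields `{"programming"}`, while a title with no known keyword maps to the empty set.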
General Model
• Access pattern of BBS users
  – View of Topics
    • A set of topics and user access frequencies of the messages posted to different boards by different users along the timeline
  – View of Boards
    • A set of boards and frequencies of messages posted to the boards along the timeline
General Model
• BBS model
  – A collection of users, each being represented by two timelines of actions on Boards view and Topics view
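One way to picture this model is a per-user record holding two timelines, each mapping a time slot to action frequencies. This is a sketch under assumptions: the class name `UserTimelines`, the method `record_post`, and the use of a day index as the time slot are illustrative choices, not taken from the paper.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class UserTimelines:
    """A BBS user represented by two timelines of actions (hypothetical layout)."""

    user_id: str
    # Boards view: time slot -> Counter of boards posted to in that slot.
    boards: dict = field(default_factory=lambda: defaultdict(Counter))
    # Topics view: time slot -> Counter of topics posted about in that slot.
    topics: dict = field(default_factory=lambda: defaultdict(Counter))

    def record_post(self, day, board, topic):
        """Count one posted message in both views for the given time slot."""
        self.boards[day][board] += 1
        self.topics[day][topic] += 1


# Usage: two posts on the same day to the same board, on different topics.
user = UserTimelines("id_x")
user.record_post(0, "Games", "warcraft")
user.record_post(0, "Games", "replay")
```

Keeping frequencies per time slot preserves both points made on the earlier General Model slide: interests are time-dependent, and posting frequency matters.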
Interest-Sharing Group Identification
• Given two timelines of actions X and Y of two users id_x and id_y
• A straightforward way
  – Similarity between X_i and Y_j =
Interest-Sharing Group Identification
• Average frequency differences of actions
• Local similarity between X_i and Y_j
Interest-Sharing Group Identification
• Hybrid similarity between X_i and Y
• Global similarity between X and Y
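The three levels named on these slides (local, hybrid, global) nest inside one another, and the sketch below shows only that nesting. The formulas themselves are not given here, so every function body is a stand-in chosen for illustration, not the authors' definition. The local score is one minus a normalized L1 frequency difference, the hybrid score takes the best local match, and the global score averages the hybrid scores. All function names are hypothetical.

```python
from collections import Counter


def local_similarity(x_i, y_j):
    """Stand-in: 1 minus the normalized L1 distance between two frequency maps."""
    actions = set(x_i) | set(y_j)
    total = sum(x_i.values()) + sum(y_j.values())
    if total == 0:
        return 0.0
    diff = sum(abs(x_i.get(a, 0) - y_j.get(a, 0)) for a in actions)
    return 1.0 - diff / total


def hybrid_similarity(x_i, y):
    """Stand-in: best local match of one time slot X_i against every slot of Y."""
    return max((local_similarity(x_i, y_j) for y_j in y), default=0.0)


def global_similarity(x, y):
    """Stand-in: average hybrid similarity over all time slots of X."""
    if not x:
        return 0.0
    return sum(hybrid_similarity(x_i, y) for x_i in x) / len(x)


# Usage: each timeline is a list of per-slot action frequencies.
x = [Counter(a=2), Counter(b=1)]
y = [Counter(a=2)]
score = global_similarity(x, y)
```

With these stand-ins the score is asymmetric (`global_similarity(x, y)` need not equal `global_similarity(y, x)`), so a symmetric variant would need an extra step such as averaging both directions.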
Predict User Behavior Using Generated Community
• Given a user id_i
  – Predict what action id_i may take in the near future
• Actions that have been taken by id_i may be closely related to id_i’s future actions
  – Possible solution
    • Compute posterior probability
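As a rough illustration of predicting from a user's own history, the sketch below estimates the probability of the next action given the current one by counting consecutive pairs. This first-order counting scheme is an assumption made for the example. The slides do not specify how the posterior is modeled, and `conditional_action_probs` is a hypothetical name.

```python
from collections import Counter, defaultdict


def conditional_action_probs(history):
    """Estimate P(next action | current action) from one user's action sequence."""
    pair_counts = defaultdict(Counter)
    # Count how often each action is immediately followed by each other action.
    for current, following in zip(history, history[1:]):
        pair_counts[current][following] += 1
    # Normalize the counts for each current action into probabilities.
    return {
        current: {a: c / sum(nexts.values()) for a, c in nexts.items()}
        for current, nexts in pair_counts.items()
    }


# Usage with a toy history of topics a user posted on.
probs = conditional_action_probs(["games", "games", "code", "games", "code"])
```

An estimate like this relies only on the user's own past actions, so it has little to go on when that history is short.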
Predict User Behavior Using Generated Community
• Resolved with interest-sharing groups
  – Similar users may take similar actions at some time instants
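The idea that similar users take similar actions can be sketched as a similarity-weighted vote among a user's neighbors in the generated community. This is not the BPUC algorithm named in the deck. It is a simplified stand-in, and `predict_from_neighbors` and its parameters are hypothetical.

```python
from collections import Counter


def predict_from_neighbors(neighbor_actions, similarities, top_k=1):
    """Rank candidate actions by the summed similarity of neighbors who took them."""
    scores = Counter()
    for user_id, actions in neighbor_actions.items():
        weight = similarities.get(user_id, 0.0)  # unknown neighbors contribute nothing
        for action in actions:
            scores[action] += weight
    return [action for action, _ in scores.most_common(top_k)]


# Usage: one close neighbor posting on "games" outweighs two weaker ones posting on "code".
recent = {"u1": ["games"], "u2": ["code"], "u3": ["code"]}
sims = {"u1": 0.9, "u2": 0.3, "u3": 0.4}
prediction = predict_from_neighbors(recent, sims)
```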
BPUC algorithm
Experiment
• Data Set
  – BBS of Nanjing University
  – Messages collected from January 1st, 2003 to December 1st, 2005 on the 17 most popular boards
  – 4512 topics on 17 boards, 1109 users
• Evaluation set
  – 42 volunteers: 18 interested in modern weapons, 12 fond of programming skills, and the remaining 12 interested in computer games
Experiments on Community Generation
• Neighborhood accuracy
  – Describes how accurately the neighbors of a user in a generated community share that user’s interests
• Component accuracy
  – Measures how well the generated groups represent interests that are common to the individuals in each group
Experiments on Community Generation
• Example
  – A generated community, 7 links between similar users, 10 links between dissimilar users
  – Neighborhood accuracy = (7+10)/21 = 0.810
  – Component accuracy = (7+0)/21 = 0.333
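The arithmetic in this example can be checked directly. The snippet only reproduces the two ratios as they are written on the slide. The denominator 21 and the numerator terms are taken as given, and the variable names are illustrative.

```python
# Values copied from the slide; their interpretation is not restated here.
denominator = 21

neighborhood_accuracy = (7 + 10) / denominator
component_accuracy = (7 + 0) / denominator

print(f"{neighborhood_accuracy:.3f}")  # 0.810
print(f"{component_accuracy:.3f}")     # 0.333
```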
Experiments on Community Generation
• Compare with CORAL
Experiments on Community Generation
• Running time comparison
Experiments on User Behavior Prediction
• 1056 days for training the probability model
• Last 10 days for testing
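The split sizes agree with the collection period given on the Experiment slide (January 1st, 2003 to December 1st, 2005), assuming both end dates are counted. A short check under that assumption:

```python
from datetime import date

start = date(2003, 1, 1)
end = date(2005, 12, 1)

# Assumption: the collection period includes both the start and the end date.
total_days = (end - start).days + 1
test_days = 10
train_days = total_days - test_days

print(total_days, train_days)  # 1066 1056
```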