Tag Clouds Revisited

Date : 2011/12/12
Source : CIKM’11
Speaker : I-Chih Chiu
Advisor : Dr. Koh. Jia-ling


Index

• Introduction
• Tag Selection Framework
• Tag Selection Strategies
    Based on Frequency
    Based on Diversity
    Based on Rank Aggregation
• Evaluation Methodology
• Experimental Evaluation
• Conclusions

Introduction

• Tagging has become a very common feature in Web 2.0 applications, providing a simple and effective way for users to freely annotate resources to facilitate their discovery and management.
• Tag clouds have become popular as a summarized representation of a collection of tagged resources.

Introduction

• Motivation
    How effective is the strategy of ranking tags in item collections based on their frequency?
    Are there any better strategies for this task?

Tag Selection Framework

• Definition:
    G : A set of (possibly overlapping) groups
    U : A set of objects
    T : A set of tags
    T(u) : The set of tags assigned to an object u
    U(t) : The set of objects tagged with t

[Figure: example group G of five objects u1 to u5, whose tag sets are {t1, t2, t3, t4, t5}, {t1, t3, t4}, {t1, t2, t5}, {t1, t3} and {t1, t3, t5}]

T(G) = {t1, t2, t3, t4, t5}
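The definitions above map naturally onto plain dictionaries and sets. The following is only a minimal sketch: the assignment of the five tag sets to the names u1 to u5 is an assumption (the figure layout does not pin it down), and the helper names `tags_of_group` and `objects_by_tag` are hypothetical.

```python
from collections import defaultdict

# Assumed assignment of the five tag sets to u1..u5 (not confirmed by the slide).
group = {
    "u1": {"t1", "t2", "t3", "t4", "t5"},
    "u2": {"t1", "t3", "t4"},
    "u3": {"t1", "t2", "t5"},
    "u4": {"t1", "t3"},
    "u5": {"t1", "t3", "t5"},
}

def tags_of_group(group):
    """T(G): union of T(u) over all objects u in the group."""
    return set().union(*group.values())

def objects_by_tag(group):
    """U(t), restricted to this group: tag -> set of objects carrying it."""
    index = defaultdict(set)
    for obj, tags in group.items():
        for tag in tags:
            index[tag].add(obj)
    return dict(index)
```

Building U(t) once as an inverted index keeps every later per-tag score a simple lookup.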

Tag Selection Framework

• Define the overall utility value of TG from the rank of each tag t, a scoring function f(t), and a discount function.

Example: for group G with T(G) = {t1, t2, t3, t4, t5}, take TG = {t1, t3, t5} and assume the parameter value 0.5.

[Figure: the example group G of five objects u1 to u5 tagged with t1 to t5]

Tag Selection Framework

• The optimal tag cloud for G is the set TG that is a subset of T(G) with size k and maximizes the utility function F.
• Different tag selection methods are proposed, based on different approaches for defining the utility function f for the members of the tag cloud.

Candidate tag clouds for k = 3: TG = {t1, t2, t3}, TG = {t1, t2, t4}, TG = {t1, t2, t5}, …, TG = {t3, t4, t5}
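The selection problem can be illustrated by brute force: enumerate every size-k subset of T(G) and keep the one with the highest utility. This is a sketch under stated assumptions, not the paper's definition of F: the utility used here (scores discounted by 1 / (1 + alpha * rank), best-scored tag at rank 1) and the function names are hypothetical, and the per-tag scores are the frequency values of the five-object example.

```python
from itertools import combinations

# Per-tag scores f(t) of the running example (frequency scores).
scores = {"t1": 1.0, "t2": 0.4, "t3": 0.8, "t4": 0.4, "t5": 0.6}

def utility(cloud, scores, alpha=0.5):
    """Hypothetical F: discounted sum of scores, best-scored tag at rank 1."""
    ranked = sorted(cloud, key=lambda t: (-scores[t], t))
    return sum(scores[t] / (1 + alpha * rank)
               for rank, t in enumerate(ranked, start=1))

def best_cloud(scores, k, alpha=0.5):
    """Enumerate all size-k subsets of T(G) and keep the one maximizing F."""
    candidates = combinations(sorted(scores), k)
    return max(candidates, key=lambda c: utility(c, scores, alpha))

print(best_cloud(scores, 3))
```

Exhaustive search is only practical for tiny tag sets; with an additive utility like this one, simply taking the k best-scored tags gives the same result.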

Tag Selection Strategies

• Based on Frequency
    Frequency scoring
    TF.IDF scoring
    Graph-based scoring
• Based on Diversity
    Diversity
    Novelty
• Based on Rank Aggregation

Based on Frequency

• Frequency scoring
    The number of objects to which a tag is assigned.

[Figure: the example group G of five objects u1 to u5 tagged with t1 to t5, annotated with the frequency scores f(t1) = 1, f(t2) = 0.4, f(t3) = 0.8, f(t4) = 0.4, f(t5) = 0.6]
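The five values on this slide (1, 0.4, 0.8, 0.4, 0.6) are what one gets by dividing the number of objects carrying each tag by the group size of 5. A minimal sketch, assuming a hypothetical assignment of the five tag sets to the names u1 to u5 (the scores do not depend on that assignment):

```python
# Assumed assignment of the five tag sets to u1..u5.
group = {
    "u1": {"t1", "t2", "t3", "t4", "t5"},
    "u2": {"t1", "t3", "t4"},
    "u3": {"t1", "t2", "t5"},
    "u4": {"t1", "t3"},
    "u5": {"t1", "t3", "t5"},
}

def frequency_scores(group):
    """f(t) = (number of objects in the group tagged with t) / |G|."""
    counts = {}
    for tags in group.values():
        for tag in set(tags):
            counts[tag] = counts.get(tag, 0) + 1
    return {tag: n / len(group) for tag, n in counts.items()}

scores = frequency_scores(group)
```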

Based on Frequency

• TF.IDF scoring
    The utility score of a tag t with respect to a group G relies not only on the contents of this particular group but also on the contents of the other groups in the collection.

$f_r(t, G) = \frac{|U(t)|}{|G|}$

[Figure: the example group G of five objects u1 to u5 tagged with t1 to t5]
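The slide does not spell out the TF.IDF formula, so the sketch below falls back on the textbook weighting as an assumption: the in-group frequency of a tag multiplied by log(number of groups / number of groups that use the tag). The second group `H`, its objects, and the function name are hypothetical.

```python
import math

# Two groups; "H" and its objects are made up for illustration.
groups = {
    "G": {
        "u1": {"t1", "t2", "t3", "t4", "t5"},
        "u2": {"t1", "t3", "t4"},
        "u3": {"t1", "t2", "t5"},
        "u4": {"t1", "t3"},
        "u5": {"t1", "t3", "t5"},
    },
    "H": {"v1": {"t1", "t6"}, "v2": {"t1"}},
}

def tfidf_scores(groups, target):
    """In-group frequency of each tag, damped by how many groups use the tag."""
    target_group = groups[target]
    scores = {}
    for tag in set().union(*target_group.values()):
        tf = sum(tag in tags for tags in target_group.values()) / len(target_group)
        groups_with_tag = sum(
            any(tag in tags for tags in g.values()) for g in groups.values()
        )
        scores[tag] = tf * math.log(len(groups) / groups_with_tag)
    return scores

scores = tfidf_scores(groups, "G")
```

In this toy collection t1 appears in every group, so its score drops to 0 even though it is the most frequent tag inside G; that is the effect the slide's description points at.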

Based on Frequency

• Graph-based scoring
    Considering combinations of tags that occur together rather than individual tags may be more informative.

Google similarity distance:

$n_{sky} = \log|U(t_{sky})| = \log 9$
$n_{sea} = \log|U(t_{sea})| = \log 8$
$n_{sky,sea} = \log|U(t_{sky}) \cap U(t_{sea})| = \log 4$
$\exp\left[-\frac{\log 9 - \log 4}{\log 13 - \log 8}\right] \approx \exp[-1.67]$

$f_0(t) = f_r(t, G) = \frac{|U(t)|}{|G|}$
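The sky/sea numbers can be checked with the standard normalized Google distance, turned into a similarity with exp(-distance). This is a sketch: reading 13 as the total number of objects is an assumption, and the function name is hypothetical.

```python
import math

def ngd_similarity(n_a, n_b, n_ab, n_total):
    """exp(-NGD), where NGD is the normalized Google distance from occurrence counts."""
    log_a, log_b = math.log(n_a), math.log(n_b)
    distance = (max(log_a, log_b) - math.log(n_ab)) / (
        math.log(n_total) - min(log_a, log_b)
    )
    return math.exp(-distance)

# |U(t_sky)| = 9, |U(t_sea)| = 8, 4 objects carry both tags, 13 objects assumed in total.
sim_sky_sea = ngd_similarity(9, 8, 4, 13)
```

The ratio of log differences is the same for any logarithm base, so the choice of natural logarithm here does not affect the result.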


Based on Diversity

• Diversity
    Select tags that are as dissimilar as possible from each other, in the sense that they appear in different sets of objects.

Example: with sea and sky already in the cloud, a candidate t = {beach} is compared through sim(beach, sea) and sim(beach, sky), and a candidate t = {forest} through sim(forest, sea) and sim(forest, sky).
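One way to act on this idea is a greedy step: among the candidates, pick the tag whose highest similarity to any tag already in the cloud is lowest. The sketch below is an illustration only; the Jaccard similarity, the object sets, and the function names are all assumptions rather than the paper's definitions.

```python
def jaccard(a, b):
    """Similarity of two tags as the overlap of their object sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pick_most_diverse(candidates, cloud, objects_by_tag):
    """Return the candidate whose highest similarity to any cloud tag is lowest."""
    def worst_case(tag):
        return max(jaccard(objects_by_tag[tag], objects_by_tag[c]) for c in cloud)
    return min(sorted(candidates), key=worst_case)

# Hypothetical object ids per tag, for illustration only.
objects_by_tag = {
    "sea": {1, 2, 3, 4},
    "sky": {3, 4, 5},
    "beach": {1, 2, 3},
    "forest": {6, 7},
}
choice = pick_most_diverse({"beach", "forest"}, ["sea", "sky"], objects_by_tag)
```

With these made-up sets, beach shares most of its objects with sea, so forest is the more diverse addition.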

Based on Diversity

• Novelty
    Emphasize the novelty of newly selected tags while the cloud is being constructed, by means of a discount function.
• This function can be defined to return 1 if $n_{v,T_G} = 0$, and 0 otherwise.
• For example, if a tag t appears only in a single object u, and there is already another tag of u in the cloud, then the utility score of t is 0.

Example: TG = {sea, sky} and u = {sea, beach, sun}.
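The 0/1 discount can be sketched as follows: an object contributes to a candidate tag's score only if none of its tags is in the cloud yet. Dividing by the group size and the two extra objects `w` and `x` are assumptions added for illustration; the function name is hypothetical.

```python
def novelty_score(tag, cloud, group):
    """Share of objects that carry `tag` and have no tag in the cloud yet."""
    def discount(obj_tags):
        already_in_cloud = len(obj_tags & cloud)  # tags of this object already selected
        return 1 if already_in_cloud == 0 else 0

    carriers = [tags for tags in group.values() if tag in tags]
    return sum(discount(tags) for tags in carriers) / len(group)

# Object u is taken from the slide; w and x are hypothetical.
group = {
    "u": {"sea", "beach", "sun"},
    "w": {"sky"},
    "x": {"forest", "sun"},
}
cloud = {"sea", "sky"}

beach_score = novelty_score("beach", cloud, group)
forest_score = novelty_score("forest", cloud, group)
```

Here beach occurs only in u, which is already reachable through sea, so its score is 0, matching the example in the bullet above.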

Based on Rank Aggregation

• The order in which the tags appear in these objects is taken into account.
• Define a utility function based on the Borda Count method.

[Figure: the example group G of five objects u1 to u5 tagged with t1 to t5]

Assume the parameter value 0.5:

$f(t_5) = \frac{\frac{2}{3} + \frac{2}{3} + \frac{2}{3}}{5} = 0.4$

$f(t_1) = \frac{\frac{1}{2} + \frac{2}{5} + \frac{2}{5} + \frac{2}{5} + \frac{2}{7}}{5} \approx 0.397$
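Both worked values can be reproduced by giving each object that carries the tag a weight of 1 / (1 + alpha * (|T(u)| - pos + 1)), where pos is the tag's position inside the object, and averaging over the group size. This weight is inferred from the numbers alone; whether it is the paper's exact Borda-based definition is an assumption, as are the tag order inside each object (taken as listed), the object names, and the function name.

```python
from fractions import Fraction

# Assumed: tags are ordered inside each object as listed; object names are hypothetical.
group = {
    "u1": ["t1", "t2", "t3", "t4", "t5"],
    "u2": ["t1", "t3", "t4"],
    "u3": ["t1", "t2", "t5"],
    "u4": ["t1", "t3"],
    "u5": ["t1", "t3", "t5"],
}

def borda_like_score(tag, group, alpha=Fraction(1, 2)):
    """Average over |G| of 1 / (1 + alpha * (|T(u)| - pos + 1)) for objects with the tag."""
    total = Fraction(0)
    for tags in group.values():
        if tag in tags:
            pos = tags.index(tag) + 1      # 1-based position of the tag in the object
            points = len(tags) - pos + 1   # inferred count entering the weight
            total += 1 / (1 + alpha * points)
    return total / len(group)

f_t5 = borda_like_score("t5", group)   # Fraction(2, 5)
f_t1 = borda_like_score("t1", group)   # Fraction(139, 350)
```

Exact fractions make it easy to compare against the slide: 2/5 is 0.4 and 139/350 is roughly 0.397.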

Evaluation Methodology

• Metrics for Search and Navigation
    Coverage
    Overlap
    Selectivity
• User Navigation Model
• Group Recommendation Accuracy

Metrics for Search and Navigation

• Coverage
    Since a tag cloud aims at providing an entry point for searching and navigating, for every object at least one of its tags should appear in the tag cloud.

[Figure: the example group G of five objects u1 to u5 tagged with t1 to t5]
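Coverage as described here can be computed as the fraction of objects in the group that carry at least one tag of the cloud. A minimal sketch, with a hypothetical assignment of the five tag sets to the names u1 to u5 and a hypothetical function name:

```python
# Assumed assignment of the five tag sets to u1..u5.
group = {
    "u1": {"t1", "t2", "t3", "t4", "t5"},
    "u2": {"t1", "t3", "t4"},
    "u3": {"t1", "t2", "t5"},
    "u4": {"t1", "t3"},
    "u5": {"t1", "t3", "t5"},
}

def coverage(cloud, group):
    """Fraction of objects in the group that carry at least one tag of the cloud."""
    covered = sum(1 for tags in group.values() if tags & cloud)
    return covered / len(group)

full = coverage({"t1"}, group)            # every object carries t1
partial = coverage({"t2", "t4"}, group)   # only three of the five objects are reached
```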

Metrics for Search and Navigation

• Overlap
    The tag cloud should avoid cases where different tags in the cloud, when selected, lead to the same or very similar subsets of objects.

[Figure: the example group G of five objects u1 to u5 tagged with t1 to t5]
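The slide gives no formula for overlap, so the sketch below uses one plausible stand-in: the mean Jaccard overlap between the object sets of every pair of tags in the cloud. Both that choice and the names used are assumptions.

```python
from itertools import combinations

# U(t) for the example group, under an assumed naming of the objects.
objects_by_tag = {
    "t1": {"u1", "u2", "u3", "u4", "u5"},
    "t2": {"u1", "u3"},
    "t3": {"u1", "u2", "u4", "u5"},
    "t4": {"u1", "u2"},
    "t5": {"u1", "u3", "u5"},
}

def overlap(cloud, objects_by_tag):
    """Mean Jaccard overlap of the object sets of every pair of cloud tags."""
    pairs = list(combinations(sorted(cloud), 2))
    if not pairs:
        return 0.0

    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    total = sum(jaccard(objects_by_tag[a], objects_by_tag[b]) for a, b in pairs)
    return total / len(pairs)

low = overlap({"t2", "t4"}, objects_by_tag)    # the two tags share only u1
high = overlap({"t1", "t3"}, objects_by_tag)   # t3's objects are all reachable via t1
```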

Metrics for Search and Navigation

• Selectivity
    A tag cloud should help users drill down to specific objects of interest.

[Figure: the example group G of five objects u1 to u5 tagged with t1 to t5]

User Navigation Model

• The goal is to measure the total cost for finding an item.
    cp : The cost of scanning one page of objects.
    ct : The cost of selecting a tag, set equal to the cost of scanning np pages, i.e., ct = np · cp.
    n1 : The number of tag selections.
    n2 : The number of page scans.
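The cost formula itself did not survive in the transcript. A natural reading of the quantities defined above is total cost = n1 · ct + n2 · cp with ct = np · cp; the sketch below assumes exactly that, and the function and parameter names are hypothetical.

```python
def navigation_cost(n_tag_selections, n_page_scans, pages_per_tag, page_cost=1.0):
    """Assumed total cost: n1 * ct + n2 * cp, with ct = np * cp."""
    tag_cost = pages_per_tag * page_cost  # ct expressed in page-scan units
    return n_tag_selections * tag_cost + n_page_scans * page_cost

# Two tag selections, three page scans, one tag click costing as much as two pages.
cost = navigation_cost(2, 3, pages_per_tag=2)
```

Expressing ct as a multiple of cp means a single unit (page scans) is enough to compare tag clouds.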

Group Recommendation Accuracy

• So far, the task considered is using the tag cloud to find items of interest within a group.
• Another important and common task is to recommend groups for new items.

[Figure: the example group G of five objects u1 to u5 tagged with t1 to t5, together with a new object u tagged with t2, t5 and t6]

Assume the parameter value 0.5 and TG = {t2, t4}:

$sim(u, G) = \frac{\frac{1}{0.5 \cdot 1 + 1}}{3} = \frac{2}{9}$
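The value 2/9 is reproduced if each cloud tag found on the new object contributes 1 / (alpha * rank + 1), with rank being its position in the cloud, and the sum is divided by the number of tags on the object. That reading is inferred from the worked number and is an assumption, as are the function name and the choice of t2 at rank 1.

```python
from fractions import Fraction

def cloud_similarity(object_tags, ranked_cloud, alpha=Fraction(1, 2)):
    """Sum of 1 / (alpha * rank + 1) over cloud tags on the object, divided by |T(u)|."""
    total = Fraction(0)
    for rank, tag in enumerate(ranked_cloud, start=1):
        if tag in object_tags:
            total += 1 / (alpha * rank + 1)
    return total / len(object_tags)

# New object u = {t2, t5, t6}; cloud TG = [t2, t4], with t2 assumed at rank 1.
sim = cloud_similarity({"t2", "t5", "t6"}, ["t2", "t4"])
```

To recommend groups for a new item, one would compute this value against each group's cloud and rank the groups by it.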

Experimental Evaluation

• Dataset
    Top 60 groups.
    2000 photos for each group.

Results

• Coverage, Overlap and Selectivity
    Increasing the size of the tag cloud improves the performance of all methods in all metrics.

Results

• Navigation Cost
    The navigation cost is affected by coverage and selectivity.
    The navigation cost decreases for all methods as the tag cloud size increases.

Results

• Recommendation Accuracy
    The goal is to recommend groups for a given photo based on the groups' tag clouds.

Conclusions

• Methods employing diversification or rank aggregation can improve the performance of tag clouds with respect to these metrics, compared to the traditional frequency-based ranking.
• There exist several interesting directions for future work; these include extracting semantics of tags and exploiting content-based similarity of objects.