item selection by “hub-authority” profit ranking presented by: thomas su
Post on 19-Dec-2015
Item Selection By “Hub-Authority” Profit Ranking
Presented by: Thomas Su
Agenda
• Introduction
• Overview of HITS
• Our Approach – Hub-Authority Profit Ranking
• Estimate Profitability
• Empirical Study
• Experiment Results
Introduction
• Difficulty of Item Selection/Ranking problem – the “cross-selling effect”.
• Size-constrained selection
• Cost-constrained selection
[Figure: estimated profit and selection cost plotted against # of items selected, with the optimal cutoff marked]
Web Page Ranking Algorithm – HITS (Hyperlink-Induced Topic Search)
• Ranking the relevance of web pages on a given topic
• The Mutually Reinforcing Relationship
– Hub pages
– Authority pages
• Starts by finding a set of candidate pages for a given topic
Computing Hub/Authority Weights for web pages
• Each page i is associated with a non-negative hub weight h(i) and authority weight a(i)
• Before each iteration, the weights are normalized so that $\sum_i a(i)^2 = 1$ and $\sum_i h(i)^2 = 1$
• For each iteration, the weights are updated as follows:
– $a(i) = \sum_{j \to i} h(j)$, summing over all pages j that have a link to i
– $h(i) = \sum_{i \to j} a(j)$, summing over all pages j that i has a link to
• The iteration continues until a(i) and h(i) converge to stable values.
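The update rules above can be sketched in Python as follows. This is a minimal illustration, not the reference HITS implementation: the function names `hits` and `normalize` and the fixed iteration count are assumptions, and the weights are normalized after each update rather than before, which gives the same result up to scaling.

```python
import math

def normalize(weights):
    """Scale weights so that their squares sum to 1."""
    norm = math.sqrt(sum(v * v for v in weights.values())) or 1.0
    return {k: v / norm for k, v in weights.items()}

def hits(links, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # a(i) = sum of h(j) over all pages j that have a link to i
        new_auth = {p: 0.0 for p in pages}
        for j, targets in links.items():
            for i in targets:
                new_auth[i] += hub[j]
        # h(i) = sum of a(j) over all pages j that i has a link to
        new_hub = {p: sum(new_auth[j] for j in links.get(p, [])) for p in pages}
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth

# Toy graph (hypothetical): pages A and B both link to C.
hub, auth = hits({"A": ["C"], "B": ["C"], "C": []})
```

In this toy graph C collects all the authority weight, while A and B share the hub weight equally.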
Hub-Authority Profit Ranking
• Find frequent items and 2-itemsets
• Each item is a node in the graph
• conf(i → j) forms the link between items i and j
– The larger the conf(i → j), the stronger the link
– Every frequent item i has a link to itself since conf(i → i) = 100%
• Item j is a good authority if it is necessary for many other items i.
• Item i is a good hub if it implies buying many other items j.
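One way to build this graph from transaction data is sketched below. It assumes the standard definition of confidence, conf(i → j) = supp(i, j) / supp(i), and a single minimum support threshold for both items and pairs; the function name `confidence_links` and the sample transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations

def confidence_links(transactions, min_support):
    """Return {(i, j): conf(i -> j)} over frequent items and frequent 2-itemsets."""
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for t in transactions:
        items = sorted(set(t))          # ignore duplicates within a transaction
        item_count.update(items)
        pair_count.update(combinations(items, 2))
    frequent = {i for i, c in item_count.items() if c / n >= min_support}
    # Every frequent item links to itself with confidence 100%.
    conf = {(i, i): 1.0 for i in frequent}
    for (i, j), c in pair_count.items():
        if c / n >= min_support:        # frequent 2-itemset
            conf[(i, j)] = c / item_count[i]
            conf[(j, i)] = c / item_count[j]
    return conf

# Hypothetical transactions, for illustration only.
links = confidence_links([["X", "Y"], ["X"], ["X", "Z"], ["Y", "Z"]], 0.25)
```

Here X appears in three transactions and together with Y in one, so conf(X → Y) is 1/3 while conf(Y → X) is 1/2.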
Model the Individual Profit
• Individual profit – the recorded profit of item i in all transactions.
• Individual profit cannot serve as the initial authority weights, because the converged weights are independent of the initial weights.
Model the Individual Profit (cont’d)
• Solution: incorporate the individual profit into links.
• For each iteration,
– Updating authority weights: $a(i) = \sum_{j \to i} \mathrm{prof}(j) \cdot \mathrm{conf}(j \to i) \cdot h(j)$
– Updating hub weights: $h(i) = \sum_{i \to j} \mathrm{prof}(i) \cdot \mathrm{conf}(i \to j) \cdot a(j)$
Computing the weights- Iterative Algorithm
• Initialize hub/authority weights to (1, 1, 1, .., 1)
• For i = 1, 2, .., k
– Update the authority weights
– Update the hub weights
– Normalize authority weights
– Normalize hub weights
• Return hub/authority weights
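The iterative algorithm can be sketched in Python as below. This is an illustrative reading of the slides, not the author's code: the names `hap_rank` and `_normalize`, the dictionary layout, and the default k = 50 are assumptions. Links are weighted by prof(i) × conf(i → j), and each frequent item carries a self link with confidence 1.

```python
import math

def _normalize(weights):
    """Scale weights so that their squares sum to 1."""
    norm = math.sqrt(sum(v * v for v in weights.values())) or 1.0
    return {k: v / norm for k, v in weights.items()}

def hap_rank(prof, conf, k=50):
    """prof: {item: profit}; conf: {(i, j): conf(i -> j)}, self links included."""
    items = list(prof)
    hub = {i: 1.0 for i in items}
    auth = {i: 1.0 for i in items}
    for _ in range(k):
        # a(i) = sum over links j -> i of prof(j) * conf(j -> i) * h(j)
        new_auth = {i: 0.0 for i in items}
        for (j, i), c in conf.items():
            new_auth[i] += prof[j] * c * hub[j]
        # h(i) = sum over links i -> j of prof(i) * conf(i -> j) * a(j)
        new_hub = {i: 0.0 for i in items}
        for (i, j), c in conf.items():
            new_hub[i] += prof[i] * c * new_auth[j]
        auth = _normalize(new_auth)
        hub = _normalize(new_hub)
    return hub, auth

# Input taken from the X, Y, Z example in this deck.
prof = {"X": 5.0, "Y": 1.0, "Z": 0.1}
conf = {
    ("X", "X"): 1.0, ("Y", "Y"): 1.0, ("Z", "Z"): 1.0,
    ("X", "Y"): 0.2, ("Y", "X"): 0.06,
    ("X", "Z"): 0.8, ("Z", "X"): 0.2,
    ("Y", "Z"): 0.5, ("Z", "Y"): 0.375,
}
hub, auth = hap_rank(prof, conf)
```

With this input the authority weights settle close to the values quoted in the example slides, roughly 0.767 for X, 0.166 for Y, and 0.620 for Z.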
Example
• Suppose we have frequent items X, Y, and Z
– prof(X) = $5
– prof(Y) = $1
– prof(Z) = $0.1
– conf(X → Y) = 0.2, conf(Y → X) = 0.06
– conf(X → Z) = 0.8, conf(Z → X) = 0.2
– conf(Y → Z) = 0.5, conf(Z → Y) = 0.375
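Example (cont’d)
[Figure: graph with nodes X, Y, and Z; each link i → j is labeled with prof(i) × conf(i → j)]
• prof(X) × conf(X → X) = 5.0
• prof(Y) × conf(Y → X) = 0.06
• prof(X) × conf(X → Y) = 1.0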
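Example (cont’d) – updating a(X)
[Figure: graph with nodes X, Y, and Z]
$a(X) = \mathrm{prof}(X) \cdot \mathrm{conf}(X \to X) \cdot h(X) + \mathrm{prof}(Y) \cdot \mathrm{conf}(Y \to X) \cdot h(Y) + \mathrm{prof}(Z) \cdot \mathrm{conf}(Z \to X) \cdot h(Z)$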
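As a worked illustration, the first update of a(X) can be evaluated by plugging in the example's profits and confidences, assuming the hub weights still hold their initial value h(X) = h(Y) = h(Z) = 1 and that normalization has not yet been applied:

$$
\begin{aligned}
a(X) &= 5 \cdot 1 \cdot 1 + 1 \cdot 0.06 \cdot 1 + 0.1 \cdot 0.2 \cdot 1 \\
     &= 5 + 0.06 + 0.02 \\
     &= 5.08
\end{aligned}
$$

The self link of X contributes almost all of this first value; the contributions of Y and Z are small because both their profits and their confidences toward X are low.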
Example (cont’d)
• prof(X) = $5, prof(Y) = $1, prof(Z) = $0.1
• a(X) = 0.767, a(Y) = 0.166, a(Z) = 0.620
• Ranking based on authority weights is different from ranking based on individual profit
– The cross-selling effect increases the importance of Z
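The difference between the two rankings can be checked with a few lines of Python. The values are copied from the slide; the helpers `rank` and `top_n` are hypothetical names, and picking the n highest-ranked items is shown only as one plausible form of size-constrained selection.

```python
# Values copied from the example slide.
prof = {"X": 5.0, "Y": 1.0, "Z": 0.1}
auth = {"X": 0.767, "Y": 0.166, "Z": 0.620}

def rank(scores):
    """Items ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)

def top_n(scores, n):
    """Size-constrained selection: keep the n highest-scoring items."""
    return rank(scores)[:n]

by_profit = rank(prof)        # order by individual profit
by_authority = rank(auth)     # order by authority weight
selected = top_n(auth, 2)     # the two items kept under a size limit of 2
```

Ranked by individual profit, Y comes before Z; ranked by authority weight, Z moves ahead of Y, so a size limit of two items keeps X and Z.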
Estimate Profitability
• Estimate the profit generated by the selected items.
• Consider the transaction (A, B, C)
– All items are selected
– None of the items are selected
– Only some of the items are selected.
• Compute the loss based on the statistics
Empirical Study
                 Drug Store               Synthetic
Transaction #    193,995                  10,000
Item #           26,128                   1,000
Total profit     $1,006,970               $317,579
minsupp          0.1%        0.05%        0.5%        0.1%
Freq. items      332         999          602         879
Freq. pairs      39          115          15          11322
Experiment Results
[Figure: drug store dataset, minsupp = 0.1%. Estimated profit (y-axis, 28,000 to 40,000) versus # of selected items (x-axis, 100 to 325) for HAP, PROFSET*, and Naïve.]
* PROFSET [4]
Experiment Results (cont’d)
[Figure: drug store dataset, minsupp = 0.05%. Estimated profit (y-axis, 90,000 to 190,000) versus # of selected items (x-axis, 300 to 900) for Hub-authority, PROFSET, and Naïve.]
Experiment Results (cont’d)
[Figure: synthetic dataset, minsupp = 0.5%. Estimated profit (y-axis, 0 to 140,000) versus # of selected items (x-axis, 100 to 600) for HAP, PROFSET, and Naïve.]
Experiment Results (cont’d)
[Figure: synthetic dataset, minsupp = 0.1%. Estimated profit (y-axis, 0 to 300,000) versus # of selected items (x-axis, 100 to 850) for HAP, PROFSET, and Naïve.]
References
• [1] R. Agrawal. IBM synthetic data generator. http://www.almaden.ibm.com/cs/quest/Syndata.html#assocSynData. IBM
• [2] R. Agrawal, T. Imielinski, and A. N. Swami. “Mining association rules between sets of items in large databases.” In SIGMOD, pp. 207-216, 1993.
• [3] R. Agrawal and R. Srikant. “Fast algorithms for mining association rules.” In VLDB, pp. 487-499, September 1994.
• [4] T. Brijs, B. Goethals, G. Swinnen, K. Vanhoof, and G. Wets. “A Data Mining Framework for Optimal Product Selection in Retail Supermarket Data: The Generalized PROFSET Model.” In ACM SIGKDD, August 2000.
References (cont’d)
• [5] T. Brijs, G. Swinnen, K. Vanhoof, and G. Wets. “Using Association Rules for Product Assortment Decisions: A Case Study.” In KDD-99, pp. 254-260, August 1999.
• [6] ILOG CPLEX: http://www.ilog.com/products/cplex/
• [7] G. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, 1989.
• [8] J. M. Kleinberg. “Authoritative sources in a hyperlinked environment.” In Proceedings of the 9th ACM-SIAM Symposium on Discrete Algorithms, pp. 668-677. ACM, 1998.
• [9] MINITAB: http://www.minitab.com/