
Item Selection By “Hub-Authority” Profit Ranking

Presented by: Thomas Su

Agenda

• Introduction

• Overview of HITS

• Our Approach – Hub-Authority Profit Ranking

• Estimate Profitability

• Empirical Study

• Experiment Results

Introduction

• Difficulty of Item Selection/Ranking problem – the “cross-selling effect”.

• Size-constrained selection

• Cost-constrained selection

[Figure: estimated profit and selection cost plotted against # of items selected, with the optimal cutoff marked]

Web Page Ranking Algorithm – HITS (Hyperlink-Induced Topic Search)

• Ranking the relevance of web pages on a given topic

• The Mutually Reinforcing Relationship

– Hub pages

– Authority pages

• Starts by finding a set of candidate pages for a given topic

Computing Hub/Authority Weights for web pages

• Each page i is associated with a non-negative hub weight h(i) and authority weight a(i)

• Before each iteration, the weights are normalized so that $\sum_i a(i)^2 = 1$ and $\sum_i h(i)^2 = 1$

• For each iteration, the weights are updated as follows:

– $a(i) = \sum_{j \to i} h(j)$, summed over all pages j that have a link to i

– $h(i) = \sum_{i \to j} a(j)$, summed over all pages j that i has a link to

• The iteration continues until a(i) and h(i) converge to stable values.
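
The update rules above can be sketched in a few lines of Python. This is a minimal illustration, not Kleinberg's reference implementation: the function name `hits`, the `links` dictionary format, and the fixed iteration count (used in place of an explicit convergence test) are all assumptions made for the sketch.

```python
import math


def normalize(weights):
    """Scale the weights so that their squares sum to 1."""
    norm = math.sqrt(sum(v * v for v in weights.values()))
    if norm == 0:
        return dict(weights)
    return {k: v / norm for k, v in weights.items()}


def hits(links, iterations=50):
    """links maps each page to the list of pages it links to (assumed format)."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # a(i) = sum of h(j) over all pages j that have a link to i
        new_auth = {p: 0.0 for p in pages}
        for j, targets in links.items():
            for i in targets:
                new_auth[i] += hub[j]
        # h(i) = sum of a(j) over all pages j that i has a link to
        new_hub = {p: sum(new_auth[j] for j in links.get(p, [])) for p in pages}
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


# Two pages pointing at a third: C becomes the only authority.
hub, auth = hits({"A": ["C"], "B": ["C"], "C": []})
print(sorted(auth.items()))
```

A fixed number of rounds keeps the sketch short; a stricter version would stop once the weights change by less than some tolerance between rounds.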

Hub-Authority Profit Ranking

• Find frequent items and 2-itemsets

• Each item is a node in the graph

• conf(i → j) forms the link between items i and j

– The larger conf(i → j), the stronger the link

– Every frequent item i has a link to itself, since conf(i → i) = 100%

• Item j is a good authority if it is necessary for many other items i.

• Item i is a good hub if it implies buying many other items j.
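
As a rough illustration of how the link graph might be built, the sketch below counts items and pairs in a list of transactions and uses the standard association-rule definition conf(i → j) = supp({i, j}) / supp({i}). The function name `confidence_links`, the list-of-lists transaction format, and the dictionary keyed by `(i, j)` are assumptions for this example, not details taken from the slides.

```python
from collections import Counter
from itertools import combinations


def confidence_links(transactions, minsupp):
    """Return {(i, j): conf(i -> j)} over frequent items and frequent 2-itemsets."""
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for t in transactions:
        items = sorted(set(t))  # ignore duplicates within one transaction
        item_count.update(items)
        pair_count.update(combinations(items, 2))

    frequent = {i for i, c in item_count.items() if c / n >= minsupp}

    # Every frequent item links to itself with confidence 100%.
    conf = {(i, i): 1.0 for i in frequent}
    for (i, j), c in pair_count.items():
        # A frequent pair implies that both of its items are frequent.
        if c / n >= minsupp:
            conf[(i, j)] = c / item_count[i]
            conf[(j, i)] = c / item_count[j]
    return conf


transactions = [["X", "Y"], ["X"], ["X", "Y"], ["Y", "Z"]]
for link, c in sorted(confidence_links(transactions, minsupp=0.5).items()):
    print(link, round(c, 3))
```

Counting every pair in memory is only practical for small data; a real run on a large dataset would normally use an Apriori-style miner such as the one in [3].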

Model the Individual Profit

• Individual profit – the recorded profit of item i in all transactions.

• Individual profit cannot be used as the initial authority weights, since the converged weights are independent of the initial weights.

Model the Individual Profit (cont’d)

• Solution: incorporate the individual profit into links.

• For each iteration,

– Updating authority weights:

$a(i) = \sum_{j \to i} \mathrm{prof}(j) \cdot \mathrm{conf}(j \to i) \cdot h(j)$

– Updating hub weights:

$h(i) = \sum_{i \to j} \mathrm{prof}(i) \cdot \mathrm{conf}(i \to j) \cdot a(j)$

Computing the Weights – Iterative Algorithm

• Initialize hub/authority weights to (1, 1, ..., 1)

• For i = 1, 2, ..., k

– Update the authority weights

– Update the hub weights

– Normalize authority weights

– Normalize hub weights

• Return hub/authority weights
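
The loop above might look as follows in Python. This is a sketch under stated assumptions: the name `hap_weights`, the `prof` dictionary, and the `conf` dictionary keyed by `(source, target)` pairs (self-links included) are illustrative choices, and each link j → i is weighted by prof(j) · conf(j → i) with j as the source item.

```python
import math


def _normalize(weights):
    """Scale the weights so that their squares sum to 1."""
    norm = math.sqrt(sum(v * v for v in weights.values()))
    if norm == 0:
        return dict(weights)
    return {k: v / norm for k, v in weights.items()}


def hap_weights(prof, conf, k=50):
    """prof: {item: profit}; conf: {(i, j): conf(i -> j)} including self-links."""
    items = list(prof)
    # Initialize hub/authority weights to (1, 1, ..., 1).
    hub = {i: 1.0 for i in items}
    auth = {i: 1.0 for i in items}
    for _ in range(k):
        # Update the authority weights: each link j -> i contributes
        # prof(j) * conf(j -> i) * h(j) to a(i).
        new_auth = {i: 0.0 for i in items}
        for (j, i), c in conf.items():
            new_auth[i] += prof[j] * c * hub[j]
        # Update the hub weights: each link i -> j contributes
        # prof(i) * conf(i -> j) * a(j) to h(i).
        new_hub = {i: 0.0 for i in items}
        for (i, j), c in conf.items():
            new_hub[i] += prof[i] * c * new_auth[j]
        # Normalize authority weights, then hub weights.
        auth = _normalize(new_auth)
        hub = _normalize(new_hub)
    return hub, auth


# Tiny made-up input: two items that imply each other half of the time.
prof = {"A": 2.0, "B": 1.0}
conf = {("A", "A"): 1.0, ("B", "B"): 1.0, ("A", "B"): 0.5, ("B", "A"): 0.5}
hub, auth = hap_weights(prof, conf)
print(auth)
```

The profits and confidences in the usage lines are invented purely to exercise the function; they do not come from the datasets in the study.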

Example

• Suppose we have frequent items X, Y, and Z

– prof(X) = $5

– prof(Y) = $1

– prof(Z) = $0.1

– conf(X → Y) = 0.2, conf(Y → X) = 0.06

– conf(X → Z) = 0.8, conf(Z → X) = 0.2

– conf(Y → Z) = 0.5, conf(Z → Y) = 0.375

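Example (cont’d)

[Figure: link graph over items X, Y, and Z; each link j → i is weighted by prof(j) · conf(j → i)]

– prof(X) · conf(X → X) = 5.0

– prof(Y) · conf(Y → X) = 0.06

– prof(X) · conf(X → Y) = 1.0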

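Example (cont’d) – updating a(X)

[Figure: link graph over items X, Y, and Z]

$a(X) = \mathrm{prof}(X) \cdot \mathrm{conf}(X \to X) \cdot h(X) + \mathrm{prof}(Y) \cdot \mathrm{conf}(Y \to X) \cdot h(Y) + \mathrm{prof}(Z) \cdot \mathrm{conf}(Z \to X) \cdot h(Z)$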

Example (cont’d)

• prof(X) = $5, prof(Y) = $1, prof(Z) = $0.1

• a(X) = 0.767, a(Y) = 0.166, a(Z) = 0.620

• Ranking based on authority weights is different from ranking based on individual profit

– The cross-selling effect increases the importance of Z
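
The X, Y, Z example can be checked numerically with a short script. The sketch below assumes self-links with confidence 1.0, link weights prof(j) · conf(j → i), and 50 rounds of updates; the variable names are illustrative. Under those assumptions the authority weights should land close to the values quoted above.

```python
import math

items = ["X", "Y", "Z"]
prof = {"X": 5.0, "Y": 1.0, "Z": 0.1}
conf = {
    ("X", "Y"): 0.2, ("Y", "X"): 0.06,
    ("X", "Z"): 0.8, ("Z", "X"): 0.2,
    ("Y", "Z"): 0.5, ("Z", "Y"): 0.375,
}
for item in items:
    conf[(item, item)] = 1.0  # every frequent item links to itself


def unit(vector):
    """Scale a list of weights so that their squares sum to 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


# w[j][i] holds the weight of the link j -> i, i.e. prof(j) * conf(j -> i).
w = [[prof[j] * conf[(j, i)] for i in items] for j in items]

n = len(items)
hub = [1.0] * n
auth = [1.0] * n
for _ in range(50):
    # Normalizing right after each update only rescales the vectors,
    # so the limit matches the update-then-normalize order.
    auth = unit([sum(w[j][i] * hub[j] for j in range(n)) for i in range(n)])
    hub = unit([sum(w[i][j] * auth[j] for j in range(n)) for i in range(n)])

authority = dict(zip(items, auth))
by_profit = sorted(items, key=prof.get, reverse=True)
by_authority = sorted(items, key=authority.get, reverse=True)

print({item: round(a, 3) for item, a in authority.items()})
print("by individual profit:", by_profit)
print("by authority weight: ", by_authority)
```

Sorting by authority weight places Z ahead of Y even though Z has the smallest individual profit, which is the cross-selling effect the example is meant to show.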

Estimate Profitability

• Estimate the profit generated by the selected items.

• Consider the transaction (A, B, C)

– All items are selected

– None of the items are selected

– Only some of the items are selected

• Compute the loss based on the statistics

Empirical Study

                 Drug Store              Synthetic
Transaction #    193,995                 10,000
Item #           26,128                  1,000
Total profit     $1,006,970              $317,579
minsupp          0.1%       0.05%        0.5%       0.1%
Freq. items      332        999          602        879
Freq. pairs      39         115          15         11,322

Experiment Results

[Figure: Drug store dataset, minsupp = 0.1%. Estimated profit vs. # of selected items (100 to 325) for HAP, PROFSET*, and Naïve]

*PROFSET [4]

Experiment Results (cont’d)

[Figure: Drug store dataset, minsupp = 0.05%. Estimated profit vs. # of selected items (300 to 900) for Hub-authority, PROFSET, and Naïve]


[Figure: Synthetic dataset, minsupp = 0.5%. Estimated profit vs. # of selected items (100 to 600) for HAP, PROFSET, and Naïve]


[Figure: Synthetic dataset, minsupp = 0.1%. Estimated profit vs. # of selected items (100 to 850) for HAP, PROFSET, and Naïve]

References

• [1] R. Agrawal. IBM synthetic data generator. http://www.almaden.ibm.com/cs/quest/Syndata.html#assocSynData. IBM.

• [2] R. Agrawal, T. Imielinski, and A. N. Swami. “Mining association rules between sets of items in large databases.” In SIGMOD, pp. 207-216, 1993.

• [3] R. Agrawal and R. Srikant. “Fast algorithms for mining association rules.” In VLDB, pp. 487-499, September 1994.

• [4] T. Brijs, B. Goethals, G. Swinnen, K. Vanhoof, and G. Wets. “A Data Mining Framework for Optimal Product Selection in Retail Supermarket Data: The Generalized PROFSET Model.” In ACM SIGKDD, August 2000.


References

• [5] T. Brijs, G. Swinnen, K. Vanhoof, and G. Wets. “Using Association Rules for Product Assortment Decisions: A Case Study.” In KDD-99, pp. 254-260, August 1999.

• [6] ILOG CPLEX: http://www.ilog.com/products/cplex/

• [7] G. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, 1989.

• [8] J. M. Kleinberg. “Authoritative sources in a hyperlinked environment.” In Proceedings of the 9th ACM-SIAM Symposium on Discrete Algorithms, pp. 668-677. ACM, 1998.

• [9] MINITAB: http://www.minitab.com/

