Decision-theoretic Clustering of Strategies
Nolan Bard, Deon Nicholas, Csaba Szepesvári, and Michael Bowling
AAAI Spring Symposium 2015 on Applied Computational Game Theory
March 23rd, 2015
(appearing in AAMAS 2015)
University of Alberta Computer Poker Research Group
Motivation
Given: Knowledge of agents/entities
Goal: Maximize utility by exploiting data
Problem: Limited response personalization
• Resource constrained
• Online learning cost
[Figure: portfolio P of responses; utility]
Solution
• Cluster agents/entities into groups
• Tailor responses to the aggregate clusters
• But…
One of these things…
[Figure: entities (E) 1, 2, and 3 plotted against Rock, Paper, and Scissors; a second frame adds responses labelled R, P, and S]
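Objective
[Figure: utility matrix u(e, r) with (E)ntities as rows and (R)esponses as columns; entries shown: 4, 7, 2, 9, 3, 5]
k-element partition of rows:
$$\operatorname*{argmax}_{P \in \mathrm{Part}_k(E)} \sum_{C \in P} \max_{r \in R} \sum_{e \in C} u(e, r), \qquad P = \{C_1, \ldots, C_k\} \in \mathrm{Part}_k(E)$$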
Segmentation Problems
The k-element row-partition objective is a segmentation problem: cluster based on actionability. [Kleinberg et al.]
Maximum Coverage
k-element subset of columns:
$$\operatorname*{argmax}_{R' \subseteq R,\ |R'| = k} \sum_{e \in E} \max_{r \in R'} u(e, r)$$
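Segmentation Problems
The same problem, written three ways:
• k-element subset of columns (responses):
$$\operatorname*{argmax}_{R' \subseteq R,\ |R'| = k} \sum_{e \in E} \max_{r \in R'} u(e, r)$$
• k-element partition of rows (entities):
$$\operatorname*{argmax}_{P \in \mathrm{Part}_k(E)} \sum_{C \in P} \max_{r \in R} \sum_{e \in C} u(e, r)$$
• one label per entity, with k labels used in total:
$$\operatorname*{argmax}_{l_{e_1}, \ldots, l_{e_m}\ :\ \left|\bigcup_{e \in E} l_e\right| = k} \sum_{e \in E} u(e, r_{l_e})$$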
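The agreement between the column-subset and row-partition objectives can be checked by brute force on a tiny instance. The sketch below is illustrative only: the matrix `U` and the function names are assumptions made for this example, and the enumeration is exponential, so it is only meant for toy sizes with k no larger than the number of entities or responses.

```python
from itertools import combinations, product

# Illustrative utilities u[e][r]: rows are entities, columns are responses.
U = [
    [1, 2, 5],
    [5, 4, 1],
    [3, 4, 1],
]

def best_column_subset(u, k):
    """Best value of sum_e max_{r in R'} u(e, r) over k-element subsets R'."""
    n_resp = len(u[0])
    return max(
        sum(max(row[r] for r in subset) for row in u)
        for subset in combinations(range(n_resp), k)
    )

def best_row_partition(u, k):
    """Best value of sum_C max_r sum_{e in C} u(e, r) over k-cluster partitions."""
    n_ent, n_resp = len(u), len(u[0])
    best = None
    # Enumerate labelings of the entities; keep those that use all k labels.
    for labels in product(range(k), repeat=n_ent):
        if len(set(labels)) != k:
            continue
        value = 0
        for c in range(k):
            cluster = [e for e in range(n_ent) if labels[e] == c]
            value += max(sum(u[e][r] for e in cluster) for r in range(n_resp))
        if best is None or value > best:
            best = value
    return best

for k in (1, 2, 3):
    print(k, best_column_subset(U, k), best_row_partition(U, k))
```

On this matrix both functions return the same value for each k (10, 13, and 14).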
[Animation: the utility matrix, with (E)ntities as rows and (R)esponses as columns, is stepped through in two sequences; the first is annotated O(|E||R|) and the second O(k|E|)]
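One plausible reading of the two complexity annotations is the cost of converting between the views: choosing a best response for each cluster of a given partition touches every (entity, response) pair once, and assigning each entity to the best of k given responses touches k entries per entity. The sketch below illustrates that reading; the function names and the matrix `U` are assumptions for this example, not taken from the slides.

```python
# Illustrative utilities u[e][r]: rows are entities, columns are responses.
U = [
    [1, 2, 5],
    [5, 4, 1],
    [3, 4, 1],
]

def responses_for_partition(u, partition):
    """Pick, for each cluster, the response with the highest summed utility.

    Every (entity, response) entry is read once overall: O(|E||R|).
    """
    n_resp = len(u[0])
    return [
        max(range(n_resp), key=lambda r: sum(u[e][r] for e in cluster))
        for cluster in partition
    ]

def partition_for_responses(u, responses):
    """Assign each entity to its best response among the k given: O(k|E|)."""
    clusters = {r: [] for r in responses}
    for e, row in enumerate(u):
        best = max(responses, key=lambda r: row[r])
        clusters[best].append(e)
    return clusters

print(responses_for_partition(U, [[0], [1, 2]]))
print(partition_for_responses(U, [1, 2]))
```

Ties are broken toward the lower response index here, which is an arbitrary choice.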
Maximum Coverage
$$\operatorname*{argmax}_{R' \subseteq R,\ |R'| = k} \sum_{e \in E} \max_{r \in R'} u(e, r)$$
Exact: NP-hard
Approximation [Nemhauser et al.]
• Greedy submodular
• (1 − 1/e)-approximation
• Complexity: O(k|E||R|)
Nemhauser’s Greedy
Objective:
$$\operatorname*{argmax}_{R' \subseteq R,\ |R'| = k} \sum_{e \in E} \max_{r \in R'} u(e, r)$$
Utility matrix, (E)ntities as rows and (R)esponses as columns:

      r1  r2  r3
e1     1   2   5
e2     5   4   1
e3     3   4   1

Step 1: R’ = {}, per-entity values 0, 0, 0, total 0
• Add r1: values 1, 5, 3; marginal gain 9
• Add r2: values 2, 4, 4; marginal gain 10
• Add r3: values 5, 1, 1; marginal gain 7
• Select r2: R’ = {r2}, per-entity values 2, 4, 4, total 10

Step 2: R’ = {r2}, total 10
• Add r1: values 2, 5, 4; marginal gain 1
• Add r3: values 5, 4, 4; marginal gain 3
• Select r3: R’ = {r2, r3}, per-entity values 5, 4, 4, total 13
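The walkthrough above can be reproduced with a few lines of code. This is a minimal sketch of the greedy selection rule, assuming non-negative utilities (so the empty set is worth 0) and a dense matrix small enough to scan; the function name is an assumption for this example, and responses are 0-indexed, so index 1 is r2 and index 2 is r3.

```python
# Utility matrix from the walkthrough: rows are entities, columns are responses.
U = [
    [1, 2, 5],
    [5, 4, 1],
    [3, 4, 1],
]

def greedy_max_coverage(u, k):
    """Repeatedly add the response with the largest marginal gain."""
    n_ent, n_resp = len(u), len(u[0])
    chosen = []
    current = [0] * n_ent  # best utility per entity under the chosen set
    for _ in range(k):
        best_r, best_gain = None, None
        for r in range(n_resp):
            if r in chosen:
                continue
            # Gain from adding r: improvement in each entity's best utility.
            gain = sum(max(current[e], u[e][r]) - current[e] for e in range(n_ent))
            if best_gain is None or gain > best_gain:
                best_r, best_gain = r, gain
        if best_r is None:  # fewer than k responses available
            break
        chosen.append(best_r)
        current = [max(current[e], u[e][best_r]) for e in range(n_ent)]
    return chosen, sum(current)

print(greedy_max_coverage(U, 2))  # ([1, 2], 13)
```

Each of the k rounds scans all |E||R| entries, which matches the O(k|E||R|) cost quoted for the greedy approximation.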
Problem
Infinitely/exponentially large response space?
• Nemhauser et al.’s greedy is infeasible
Structured Utility
• May be able to exploit structure in utility
$$\operatorname*{argmax}_{P \in \mathrm{Part}_k(E)} \sum_{C \in P} \max_{r \in R} \sum_{e \in C} u(e, r)$$
Response Oracle
$$f(C) \equiv \operatorname*{argmax}_{r \in R} \sum_{e \in C} u(e, r)$$
Example: Sequence-form games
• # Responses: at least exponential in infosets
• Best response: linear in infosets
Greedy Heuristic
Greedy agglomerative (“bottom up”) clustering
Initialize: singletons
Iteration: merge the pair $C_i, C_j \in P$ with min marginal loss
Clustering Loss:
$$\sum_{e \in E} \max_{r^* \in R} u(e, r^*) - \sum_{C \in P} \max_{r \in R} \sum_{e \in C} u(e, r)$$
[Figure: entities 1, 2, and 3 plotted against Rock, Paper, and Scissors]
• O(|E|^2) oracle calls (using memoization)
• Feasible given efficient oracle
• k not needed in advance
• Lazy evaluations and parallelizable
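A minimal sketch of the bottom-up procedure follows. It assumes a hypothetical oracle interface, `oracle_value(cluster)`, that returns the value max_r Σ_{e∈C} u(e, r) for a cluster given as a frozenset; here the oracle is a brute-force scan of a small illustrative matrix `U`, whereas a real oracle (for example a best-response computation) would replace it. It also omits the lazy evaluation and parallelism mentioned above.

```python
from functools import lru_cache
from itertools import combinations

# Illustrative utilities u[e][r]: rows are entities, columns are responses.
U = [
    [1, 2, 5],
    [5, 4, 1],
    [3, 4, 1],
]

def oracle_value(cluster):
    """Hypothetical oracle: value of the best single response to a cluster."""
    return max(sum(U[e][r] for e in cluster) for r in range(len(U[0])))

def greedy_agglomerative(entities, oracle, k):
    """Start from singletons and merge the cheapest pair until k clusters remain."""
    value = lru_cache(maxsize=None)(oracle)  # memoize oracle calls per cluster
    partition = [frozenset([e]) for e in entities]
    while len(partition) > k:
        # Marginal loss of merging Ci and Cj: the utility given up by
        # sharing one response instead of using two.
        ci, cj = min(
            combinations(partition, 2),
            key=lambda p: value(p[0]) + value(p[1]) - value(p[0] | p[1]),
        )
        partition = [c for c in partition if c not in (ci, cj)] + [ci | cj]
    return partition

clusters = greedy_agglomerative(range(3), oracle_value, 2)
print(clusters, sum(oracle_value(c) for c in clusters))
```

On this matrix the cheapest merge joins entities 1 and 2 (loss 1), leaving clusters {0} and {1, 2} with total utility 13. Because the loop records nothing about k until it stops, the full merge sequence could be kept to read off a clustering for every k.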
Results
Worst-case Approximation Bounds
Lower:
$$u(G_k) \ge \max\left(\frac{1}{k}, \frac{k}{m}\right) u^*_k \ge \frac{1}{\sqrt{m}}\, u^*_k$$
Upper:
$$u(G_k) - \varepsilon \le \frac{2}{\sqrt{m}}\, u^*_k$$
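The last step of the lower bound, from max(1/k, k/m) to 1/√m, can be verified by cases on k. The short check below assumes 1 ≤ k ≤ m and is a verification added here, not a step taken from the slides:

```latex
\[
\max\!\left(\frac{1}{k}, \frac{k}{m}\right) \ge
\begin{cases}
\dfrac{1}{k} \ge \dfrac{1}{\sqrt{m}} & \text{if } k \le \sqrt{m}, \\[1.5ex]
\dfrac{k}{m} > \dfrac{\sqrt{m}}{m} = \dfrac{1}{\sqrt{m}} & \text{if } k > \sqrt{m}.
\end{cases}
\]
```

The two terms balance at k = √m, which is where the guarantee is weakest.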
Experimental Design
• Sampled 200 static strategies uniformly
• Compared to k-means/Lloyd’s
• k-means++ seeding [Arthur and Vassilvitskii]
• Feature vectors: sequence-form
• 50 random restarts
Qualitative: Kuhn Poker
[Figure: K-means clustering; axes η and ξ, each from 0.0 to 1.0]
[Figure: Greedy clustering; axes η and ξ, each from 0.0 to 1.0]
Quantitative
[Figure: Kuhn Poker. x-axis: k (number of clusters), 2 to 12; y-axis: mean loss vs. singleton responses (mbb/g), 0 to 60; series: Greedy, k-means, Optimal]
[Figure: Leduc Hold’em. x-axis: k (number of clusters), 10 to 60; y-axis: mean loss vs. singleton responses (mbb/g), 0 to 350; series: Greedy, k-means]
Questions?
poker.cs.ualberta.ca
Contact: [email protected]