
Decision-theoretic Clustering of Strategies

Nolan Bard, Deon Nicholas, Csaba Szepesvári, and Michael Bowling

AAAI Spring Symposium 2015 on Applied Computational Game Theory

March 23rd, 2015

(appearing in AAMAS 2015)

University of Alberta, Computer Poker Research Group

Motivation

Given: Knowledge of agents/entities

Goal: Maximize utility by exploiting data

Problem: Limited response personalization

• Resource constrained

• Online learning cost


[Figure: portfolio P of responses; utility]

Solution

• Cluster agents/entities into groups

• Tailor responses to the aggregate clusters

• But…


One of these things…

[Figure: diagram over Rock, Paper, and Scissors with points labeled E1, E2, E3; the second build adds the labels P, S, R]

Objective

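[Figure: utility matrix with rows indexed by (E)ntities and columns by (R)esponses; sample entries 4, 7, 2, 9, 3, 5]

k-element partition of rows:

$$\operatorname*{argmax}_{P \in \mathrm{Part}_k(E)} \sum_{C \in P} \max_{r \in R} \sum_{e \in C} u(e, r), \qquad P = \{C_1, \ldots, C_k\} \in \mathrm{Part}_k(E)$$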
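A minimal sketch of how the partition objective can be evaluated for a finite utility matrix. The function name `partition_value` and the list-of-lists encoding are assumptions made for illustration; the 3x3 matrix is the one used in the Nemhauser's Greedy slides of this deck.

```python
from typing import List, Sequence


def partition_value(u: Sequence[Sequence[float]], partition: List[List[int]]) -> float:
    """Sum, over clusters, of the total utility of the best single response.

    u[e][r] is the utility of response r against entity e (hypothetical encoding).
    """
    n_responses = len(u[0])
    total = 0.0
    for cluster in partition:
        # Inner max: the one response that does best against the whole cluster.
        total += max(sum(u[e][r] for e in cluster) for r in range(n_responses))
    return total


# Rows are entities, columns are responses.
U = [[1, 2, 5],
     [5, 4, 1],
     [3, 4, 1]]

value_one_cluster = partition_value(U, [[0, 1, 2]])     # k = 1
value_two_clusters = partition_value(U, [[0], [1, 2]])  # k = 2
```

Splitting the first entity into its own cluster lets it receive its own best response, which is why the two-cluster value is higher than the single-cluster value.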

Segmentation Problems

$$\operatorname*{argmax}_{P \in \mathrm{Part}_k(E)} \sum_{C \in P} \max_{r \in R} \sum_{e \in C} u(e, r)$$

Cluster based on actionability. [Kleinberg et al.]

Maximum Coverage

k-element subset of columns:

$$\operatorname*{argmax}_{R' \subseteq R,\ |R'| = k} \sum_{e \in E} \max_{r \in R'} u(e, r)$$

Segmentation Problems

Over k-element subsets of responses:

$$\operatorname*{argmax}_{R' \subseteq R,\ |R'| = k} \sum_{e \in E} \max_{r \in R'} u(e, r)$$

Over k-element partitions of entities:

$$\operatorname*{argmax}_{P \in \mathrm{Part}_k(E)} \sum_{C \in P} \max_{r \in R} \sum_{e \in C} u(e, r)$$

Over per-entity labels:

$$\operatorname*{argmax}_{l_{e_1}, \ldots, l_{e_m}:\ \left|\bigcup_{e \in E} \{l_e\}\right| = k} \sum_{e \in E} u(e, r_{l_e})$$


[Figure sequence: builds over the utility matrix of (E)ntities by (R)esponses, annotated with the costs O(|E||R|) and O(k|E|)]
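One plausible reading of the two cost annotations, offered as an assumption rather than something the slides state, is that they are the costs of converting between the formulations: picking a best response for every cluster of a partition touches each matrix entry once, O(|E||R|), and assigning every entity to its best response among k chosen ones takes O(k|E|). The helper names below are hypothetical.

```python
def responses_from_partition(u, partition):
    """Pick, for each cluster, the response with the highest summed utility.

    The clusters are disjoint, so each matrix entry is read once: O(|E||R|).
    """
    n_responses = len(u[0])
    return [
        max(range(n_responses), key=lambda r: sum(u[e][r] for e in cluster))
        for cluster in partition
    ]


def partition_from_responses(u, responses):
    """Assign each entity to its best response among the chosen ones: O(k|E|)."""
    clusters = {r: [] for r in responses}
    for e, row in enumerate(u):
        best = max(responses, key=lambda r: row[r])
        clusters[best].append(e)
    # Drop responses that no entity prefers.
    return [members for members in clusters.values() if members]


# Rows are entities, columns are responses (0-indexed here).
U = [[1, 2, 5],
     [5, 4, 1],
     [3, 4, 1]]

chosen = responses_from_partition(U, [[0], [1, 2]])
clusters = partition_from_responses(U, [1, 2])
```

Ties are broken toward the lowest response index, which is an arbitrary choice of this sketch.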

Maximum Coverage

$$\operatorname*{argmax}_{R' \subseteq R,\ |R'| = k} \sum_{e \in E} \max_{r \in R'} u(e, r)$$

Exact: NP-hard

Approximation [Nemhauser et al.]

• Greedy submodular

• $(1 - 1/e)$-approximation

• Complexity: $O(k|E||R|)$

Nemhauser’s Greedy

Utility matrix u(e, r), with rows as (E)ntities and columns as (R)esponses:

      r1  r2  r3
e1     1   2   5
e2     5   4   1
e3     3   4   1

$$\operatorname*{argmax}_{R' \subseteq R,\ |R'| = k} \sum_{e \in E} \max_{r \in R'} u(e, r)$$

Start: R' = {}. Each entity's current value is 0, so the objective $\sum_{e \in E} \max_{r \in R'} u(e, r)$ is 0.

Round 1, marginal gain of adding each response to R' = {}:

• r1: per-entity values 1, 5, 3; marginal gain 9

• r2: per-entity values 2, 4, 4; marginal gain 10

• r3: per-entity values 5, 1, 1; marginal gain 7

Select r2: R' = {r2}, per-entity values 2, 4, 4, objective 10.

Round 2, marginal gain of adding each remaining response to R' = {r2}:

• r1: per-entity values 2, 5, 4; marginal gain 1

• r3: per-entity values 5, 4, 4; marginal gain 3

Select r3: R' = {r2, r3}, per-entity values 5, 4, 4, objective 13.
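The walk-through above can be reproduced with a short sketch of the greedy selection. It assumes non-negative utilities (each entity starts at value 0, as in the slides) and uses a hypothetical function name `greedy_max_coverage`; responses are 0-indexed, so r2 and r3 appear as 1 and 2.

```python
def greedy_max_coverage(u, k):
    """Greedily choose up to k responses (columns) by largest marginal gain.

    u[e][r] is the utility of response r for entity e. Cost is O(k|E||R|).
    """
    n_entities, n_responses = len(u), len(u[0])
    chosen = []
    best_so_far = [0] * n_entities  # current value of each entity under `chosen`
    for _ in range(k):
        best_gain, best_r = None, None
        for r in range(n_responses):
            if r in chosen:
                continue
            # Marginal gain: how much each entity improves if r is added.
            gain = sum(max(u[e][r] - best_so_far[e], 0) for e in range(n_entities))
            if best_gain is None or gain > best_gain:
                best_gain, best_r = gain, r
        if best_r is None:  # no responses left to add
            break
        chosen.append(best_r)
        best_so_far = [max(best_so_far[e], u[e][best_r]) for e in range(n_entities)]
    return chosen, sum(best_so_far)


U = [[1, 2, 5],
     [5, 4, 1],
     [3, 4, 1]]

selected, value = greedy_max_coverage(U, 2)
```

The inner loop scans every response in every round, which is exactly what becomes infeasible once the response space is exponentially large.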

Problem

Infinitely/exponentially large response space?

• Nemhauser et al.’s greedy is infeasible


Structured Utility

• May be able to exploit structure in utility

$$\operatorname*{argmax}_{P \in \mathrm{Part}_k(E)} \sum_{C \in P} \max_{r \in R} \sum_{e \in C} u(e, r)$$

$$f(C) \equiv \operatorname*{argmax}_{r \in R} \sum_{e \in C} u(e, r)$$

Response Oracle

$$f(C) \equiv \operatorname*{argmax}_{r \in R} \sum_{e \in C} u(e, r)$$

Example: Sequence-form games

• # Responses: at least exponential in infosets

• Best response: linear in infosets

Greedy Heuristic

Greedy agglomerative (“bottom up”) clustering

Initialize: singletons

Iteration: merge the pair $C_i, C_j \in P$ with min marginal loss

Clustering Loss:

$$\sum_{e \in E} \max_{r^* \in R} u(e, r^*) - \sum_{C \in P} \max_{r \in R} \sum_{e \in C} u(e, r)$$

[Figure: Rock, Paper, Scissors diagram with points labeled 1, 2, 3]

Greedy Heuristic


• $O(|E|^2)$ oracle calls (using memoization)

• Feasible given efficient oracle

• k not needed in advance

• Lazy evaluations and parallelizable
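A minimal sketch of the greedy agglomerative heuristic with a memoized response oracle. The oracle here is a brute-force scan over the columns of a small matrix, standing in for an efficient best-response computation; the function names are hypothetical, and lazy evaluation and parallelism are not attempted.

```python
from itertools import combinations


def greedy_agglomerative(u, k):
    """Merge clusters bottom-up, always taking the merge with least marginal loss."""
    n_responses = len(u[0])
    cache = {}  # memoized oracle values, keyed by cluster (a frozenset)

    def oracle_value(cluster):
        # Value of the best single response to `cluster`.
        # Brute force here; an assumed stand-in for an efficient oracle.
        if cluster not in cache:
            cache[cluster] = max(
                sum(u[e][r] for e in cluster) for r in range(n_responses)
            )
        return cache[cluster]

    def marginal_loss(pair):
        a, b = pair
        return oracle_value(a) + oracle_value(b) - oracle_value(a | b)

    partition = [frozenset([e]) for e in range(len(u))]  # initialize: singletons
    while len(partition) > k:
        a, b = min(combinations(partition, 2), key=marginal_loss)
        partition = [c for c in partition if c not in (a, b)] + [a | b]
    return partition


U = [[1, 2, 5],
     [5, 4, 1],
     [3, 4, 1]]

result = greedy_agglomerative(U, 2)
as_lists = sorted(sorted(cluster) for cluster in result)
```

The stopping parameter `k` is only a convenience of this sketch: running the loop down to one cluster and recording each merge yields a clustering for every k, consistent with k not being needed in advance. Because merged clusters are cached, each iteration only calls the oracle for pairs involving the newly formed cluster.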

Results


Worst-case Approximation Bounds

Lower:

$$u(G_k) \ge \max\left(\frac{1}{k}, \frac{k}{m}\right) u^*_k \ge \frac{1}{\sqrt{m}}\, u^*_k$$

Upper:

$$u(G_k) - \varepsilon \le \frac{2}{\sqrt{m}}\, u^*_k$$
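The step from $\max(1/k, k/m)$ to $1/\sqrt{m}$ in the lower bound follows from a two-case argument on k. It is sketched here under the assumption that m denotes the number of entities |E|, as the labels $l_{e_1}, \ldots, l_{e_m}$ earlier in the deck suggest:

```latex
\[
\max\!\left(\frac{1}{k}, \frac{k}{m}\right) \ge
\begin{cases}
\dfrac{1}{k} \ge \dfrac{1}{\sqrt{m}} & \text{if } k \le \sqrt{m},\\[2ex]
\dfrac{k}{m} > \dfrac{\sqrt{m}}{m} = \dfrac{1}{\sqrt{m}} & \text{if } k > \sqrt{m}.
\end{cases}
\]
```

The two terms balance at $k = \sqrt{m}$, which is where the guarantee is weakest.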

Experimental Design

• Sampled 200 static strategies uniformly

• Compared to k-means/Lloyd’s

• k-means++ seeding [Arthur and Vassilvitskii]

• Feature vectors: sequence-form

• 50 random restarts


Qualitative: Kuhn Poker

[Figure: K-means clustering of Kuhn poker strategies; axes η and ξ, each from 0.0 to 1.0]

[Figure: Greedy clustering of Kuhn poker strategies; axes η and ξ, each from 0.0 to 1.0]

Quantitative

[Figure: Kuhn Poker. x-axis: k (number of clusters), 2 to 12; y-axis: mean loss vs. singleton responses (mbb/g), 0 to 60; series: Greedy, k-means, Optimal]

[Figure: Leduc Hold'em. x-axis: k (number of clusters), 10 to 60; y-axis: mean loss vs. singleton responses (mbb/g), 0 to 350; series: Greedy, k-means]

Questions?

poker.cs.ualberta.ca

Contact: [email protected]
