Nesterov’s excessive gap technique and poker
Andrew Gilpin
CMU Theory Lunch
Feb 28, 2007
Joint work with: Samid Hoda, Javier Peña, Troels Sørensen, Tuomas Sandholm
Outline
• Two-person zero-sum sequential games
• First-order methods for convex optimization
• Nesterov’s excessive gap technique (EGT)
• EGT for sequential games
• Heuristics for EGT
• Application to Texas Hold’em poker
We want to solve: $\min_{x \in Q_1} \max_{y \in Q_2} x^{\mathsf T} A y$
If Q1 and Q2 are simplices, this is the Nash equilibrium problem for two-person zero-sum matrix games
If Q1 and Q2 are complexes, this is the Nash equilibrium problem for two-person zero-sum sequential games
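For the matrix-game case, the two sides of the saddle-point problem can be evaluated directly. The sketch below assumes the problem has the bilinear form min over x of max over y of xᵀAy, with the upper function Φ(x) = max_y xᵀAy and the lower function f(y) = min_x xᵀAy. The function names and the rock-paper-scissors matrix are illustrative choices, not taken from the talk.

```python
from fractions import Fraction

def upper_value(A, x):
    """Phi(x): worst case for the x-player, max over pure strategies j of (x^T A)_j."""
    rows, cols = len(A), len(A[0])
    return max(sum(x[i] * A[i][j] for i in range(rows)) for j in range(cols))

def lower_value(A, y):
    """f(y): worst case for the y-player, min over pure strategies i of (A y)_i."""
    rows, cols = len(A), len(A[0])
    return min(sum(A[i][j] * y[j] for j in range(cols)) for i in range(rows))

# Rock-paper-scissors. A[i][j] is what the x-player pays when x plays i and y plays j
# (order: rock, paper, scissors), so the x-player minimizes and the y-player maximizes.
A = [[0, 1, -1],
     [-1, 0, 1],
     [1, -1, 0]]

uniform = [Fraction(1, 3)] * 3
rock = [1, 0, 0]
paper = [0, 1, 0]

# Weak duality: f(y) <= Phi(x) for every pair; the gap is zero exactly at an equilibrium.
gap_at_uniform = upper_value(A, uniform) - lower_value(A, uniform)
gap_at_pure = upper_value(A, rock) - lower_value(A, paper)
```

Because a linear function over a simplex attains its extremes at vertices, both functions only need to scan pure strategies. For complexes the inner problem is still linear, but it is no longer a scan over coordinates.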
What’s a complex?
It’s just like a simplex, but more complex.
Each player’s complex encodes her set of realization plans in the game
In particular, player 1’s complex is $Q_1 = \{x \ge 0 : Ex = e\}$
where E and e depend on the game…
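A small example may help make the complex concrete. The sketch below assumes the usual sequence-form description, in which the complex is the set of nonnegative vectors x with Ex = e. The toy game is hypothetical: player 1 first picks L or R, and after L picks a or b. The helper names are likewise illustrative.

```python
from fractions import Fraction

# Sequences of player 1 in the hypothetical toy game, in this order:
#   0: empty sequence, 1: L, 2: R, 3: L then a, 4: L then b
E = [
    [1, 0, 0, 0, 0],    # x_empty = 1
    [-1, 1, 1, 0, 0],   # x_L + x_R = x_empty
    [0, -1, 0, 1, 1],   # x_La + x_Lb = x_L
]
e = [1, 0, 0]

def in_complex(E, e, x):
    """Check x >= 0 and E x = e exactly (use Fractions to avoid rounding)."""
    nonnegative = all(xi >= 0 for xi in x)
    balanced = all(
        sum(row[j] * x[j] for j in range(len(x))) == rhs
        for row, rhs in zip(E, e)
    )
    return nonnegative and balanced

def realization_plan(p_left, p_a):
    """Turn behavioral probabilities into sequence weights by multiplying along paths."""
    return [Fraction(1), p_left, 1 - p_left, p_left * p_a, p_left * (1 - p_a)]

plan = realization_plan(Fraction(1, 2), Fraction(1, 4))
```

With a single decision point the constraints reduce to "entries sum to one", which is the simplex; each further decision point adds one flow-conservation row to E.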
Recall our problem: $\min_{x \in Q_1} \max_{y \in Q_2} x^{\mathsf T} A y$
where Q1 and Q2 are complexes
Since Q1 and Q2 have a linear description, this problem can be solved as an LP. However, current LP solution methods do not scale.
(Un)scalability of LP solvers
• Rhode Island poker [Shi & Littman 01]
– LP has 91 million rows and columns
– Applying the GameShrink automated abstraction algorithm yields an LP with only 1.2 million rows and columns, and 50 million non-zeros [G. & Sandholm, 06a]
– Solution requires 25 GB RAM and over a week of CPU time
• Texas Hold’em poker
– ~10^18 nodes in game tree
– Lossy abstractions need to be performed
– Limitations of current solver technology are the primary obstacle to achieving expert-level strategies [G. & Sandholm 06b, 07a]
• Instead of standard LP solvers, what about a first-order method?
Convex optimization
Suppose we want to solve $\min_{x \in \mathbb{R}^n} f(x)$
where f is convex.
For general f, convergence requires O(1/ε²) iterations (e.g., for subgradient methods)
For smooth, strongly convex f with Lipschitz-continuous gradient, can be done in O(1/√ε) iterations
Note that this formulation captures ALL convex optimization problems (can model feasible space using an indicator function)
Analysis based on black-box oracle access model. Can we do better by looking inside the box?
Strong convexity
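A function $d : Q \to \mathbb{R}$ is strongly convex if there exists $\sigma > 0$ such that
$d(\alpha x + (1-\alpha) y) \le \alpha\, d(x) + (1-\alpha)\, d(y) - \tfrac{1}{2}\sigma\alpha(1-\alpha)\|x-y\|^2$
for all $x, y \in Q$ and all $\alpha \in [0,1]$
$\sigma$ is the strong convexity parameter of d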
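Recall our problem: $\min_{x \in Q_1} \max_{y \in Q_2} x^{\mathsf T} A y$
where Q1 and Q2 are complexes
Equivalently: $\min_{x \in Q_1} \Phi(x) = \max_{y \in Q_2} f(y)$
where $\Phi(x) = \max_{y \in Q_2} x^{\mathsf T} A y$
and $f(y) = \min_{x \in Q_1} x^{\mathsf T} A y$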
Unfortunately, Φ and f are non-smooth
Fortunately, they have a special structure
Let d1,d2 be smooth and strongly convex on Q1,Q2
These are called prox-functions
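Now let μ > 0 and consider: $f_\mu(y) = \min_{x \in Q_1} \{x^{\mathsf T} A y + \mu\, d_1(x)\}$ and $\Phi_\mu(x) = \max_{y \in Q_2} \{x^{\mathsf T} A y - \mu\, d_2(y)\}$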
These are well-defined smooth functions
Excessive gap condition
From weak duality, we have that f(y) ≤ Φ(x)
The excessive gap condition requires the reverse inequality for the smoothed functions:
fμ(y) ≥ Φμ(x) (EGC)
The algorithm maintains (EGC), and gradually decreases μ
As μ decreases, the smoothed functions approach the non-smooth functions, and thus iterates satisfying (EGC) converge to optimal solutions
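Why maintaining (EGC) while shrinking μ forces convergence can be spelled out in two lines. The derivation below is a sketch under stated assumptions: the smoothed functions are taken to be $f_\mu(y) = \min_{x \in Q_1}\{x^{\mathsf T} A y + \mu d_1(x)\}$ and $\Phi_\mu(x) = \max_{y \in Q_2}\{x^{\mathsf T} A y - \mu d_2(y)\}$, the prox-functions satisfy $0 \le d_i \le D_i$ on $Q_i$ (the constants $D_1, D_2$ are introduced here for illustration), and (EGC) is read as $\Phi_\mu(x) \le f_\mu(y)$.

```latex
\begin{align*}
f(y) \;\le\; f_\mu(y) \;&\le\; f(y) + \mu D_1,
\qquad
\Phi(x) - \mu D_2 \;\le\; \Phi_\mu(x) \;\le\; \Phi(x), \\[4pt]
0 \;\le\; \Phi(x) - f(y)
\;&\le\; \bigl(\Phi_\mu(x) + \mu D_2\bigr) - \bigl(f_\mu(y) - \mu D_1\bigr)
\;\le\; \mu\,(D_1 + D_2).
\end{align*}
```

The last inequality uses $\Phi_\mu(x) \le f_\mu(y)$. Under these assumptions the duality gap of any pair satisfying the condition is at most a constant times μ, so driving μ toward zero drives the gap toward zero.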
Nesterov’s main theorem
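Theorem [Nesterov 05] There exists an algorithm such that after at most N iterations, the iterates have duality gap at most $\frac{4\|A\|}{N+1}\sqrt{\frac{D_1 D_2}{\sigma_1 \sigma_2}}$, where $D_i = \max_{x \in Q_i} d_i(x)$ and $\sigma_i$ is the strong convexity parameter of $d_i$
Furthermore, each iteration only requires solving three problems of the form $\max_{x \in Q_i} \{g^{\mathsf T} x - d_i(x)\}$
and performing three matrix-vector product operations on A.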
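To get a feel for what a gap bound of this kind implies, the sketch below assumes the bound has the form 4‖A‖/(N+1) · √(D1·D2/(σ1·σ2)), as commonly quoted for Nesterov's theorem, and inverts it to find how many iterations guarantee a target gap ε. The function names and the constants in the example are hypothetical placeholders, not values from the talk.

```python
import math

def egt_gap_bound(norm_A, D1, D2, sigma1, sigma2, N):
    """Assumed bound on the duality gap after N iterations: C / (N + 1)."""
    return 4.0 * norm_A / (N + 1) * math.sqrt(D1 * D2 / (sigma1 * sigma2))

def iterations_for_gap(norm_A, D1, D2, sigma1, sigma2, eps):
    """Smallest N for which the assumed bound is at most eps."""
    c = 4.0 * norm_A * math.sqrt(D1 * D2 / (sigma1 * sigma2))
    return max(0, math.ceil(c / eps - 1))

# Hypothetical constants, chosen only to keep the arithmetic exact.
n_half = iterations_for_gap(1.0, 1.0, 1.0, 1.0, 1.0, eps=0.5)
n_quarter = iterations_for_gap(1.0, 1.0, 1.0, 1.0, 1.0, eps=0.25)
```

Halving ε roughly doubles the iteration count, which is the O(1/ε) behaviour; a method with an O(1/ε²) rate would need four times as many.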
Nice prox functions
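A prox function d for Q is nice if:
1. It is strongly convex and continuous everywhere in Q, and differentiable in the relative interior of Q
2. The min of d over Q is 0
3. The following maps are easily computable: $\mathrm{smax}(g) = \max_{x \in Q} \{g^{\mathsf T} x - d(x)\}$ and $\mathrm{sargmax}(g) = \operatorname{argmax}_{x \in Q} \{g^{\mathsf T} x - d(x)\}$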
Nice simplex prox function 1: Entropy
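The formula on this slide is not preserved in the transcript. The sketch below assumes the standard entropy prox function on the n-dimensional simplex, d(x) = ln n + Σ xᵢ ln xᵢ, and assumes the two maps of interest are smax(g) = max over the simplex of ⟨g, x⟩ − d(x) and sargmax(g), its maximizer. Under those assumptions both have closed forms: a softmax and a shifted log-sum-exp. The function names are illustrative.

```python
import math

def entropy_prox(x):
    """d(x) = ln(n) + sum_i x_i ln x_i, using the convention 0 ln 0 = 0."""
    return math.log(len(x)) + sum(xi * math.log(xi) for xi in x if xi > 0)

def entropy_sargmax(g):
    """Maximizer of <g, x> - d(x) over the simplex: softmax of g (shifted for stability)."""
    m = max(g)
    weights = [math.exp(gi - m) for gi in g]
    total = sum(weights)
    return [w / total for w in weights]

def entropy_smax(g):
    """Maximum of <g, x> - d(x) over the simplex: logsumexp(g) - ln(n)."""
    m = max(g)
    return m + math.log(sum(math.exp(gi - m) for gi in g)) - math.log(len(g))

g = [1.0, 2.0, 0.5]
x_star = entropy_sargmax(g)
value_at_x_star = sum(gi * xi for gi, xi in zip(g, x_star)) - entropy_prox(x_star)
```

Both maps cost O(n), and d attains its minimum value 0 at the uniform distribution, which matches the second requirement of a nice prox function.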
Nice simplex prox function 2: Euclidean
sargmax can be computed in O(n log n) time
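The O(n log n) cost comes from a sort. The sketch below assumes the Euclidean prox function is d(x) = ½ Σ (xᵢ − 1/n)², in which case maximizing ⟨g, x⟩ − d(x) over the simplex is the same as projecting the point g + 1/n onto the simplex. The projection uses the well-known sort-and-threshold procedure; the function names are illustrative.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x >= 0, sum(x) = 1} via sorting, O(n log n)."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, uk in enumerate(u, start=1):
        cumulative += uk
        candidate = (cumulative - 1.0) / k
        # The condition holds for a prefix of k values; keep the last threshold that passes.
        if uk - candidate > 0:
            theta = candidate
    return [max(vi - theta, 0.0) for vi in v]

def euclidean_sargmax(g):
    """Maximizer of <g, x> - 0.5 * sum((x_i - 1/n)^2) over the simplex (assumed prox form)."""
    n = len(g)
    return project_to_simplex([gi + 1.0 / n for gi in g])

flat = euclidean_sargmax([0.0, 0.0, 0.0])      # no preference: uniform
peaked = euclidean_sargmax([10.0, 0.0, 0.0])   # strong preference: a vertex
```

Unlike the entropy version, this maximizer can land on the boundary of the simplex, so some coordinates become exactly zero.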
From the simplex to the complex
Theorem [Hoda, G., Peña 06]
A nice prox function can be constructed for
the complex via a recursive application of
any nice prox function for the simplex
Prox function example
Given any nice simplex prox function, the prox function for this matrix is:
Solving
(similar to b(i-vii))
Heuristics [G., Hoda, Peña, Sandholm 07]
• Heuristic 1: Aggressive μ reduction
– The μ given in the previous algorithm is a conservative choice guaranteeing convergence
– In practice, we can do much better by aggressively pushing μ down, while checking that the excessive gap condition is satisfied
• Heuristic 2: Balanced μ reduction
– To prevent one μ from dominating the other, we also perform periodic adjustments to keep them within a small factor of one another
Matrix-vector multiplication in poker [G., Hoda, Peña, Sandholm 07]
• The main time and space bottleneck of the algorithm is the matrix-vector product on A
• Instead of storing the entire matrix, we can represent it as a composition of Kronecker products
• We can also effectively take advantage of parallelization in the matrix-vector product to achieve near-linear speedup
Memory usage comparison
| Instance | CPLEX IPM | CPLEX Simplex | EGT |
|---|---|---|---|
| 10k | 0.082 GB | >0.051 GB | 0.012 GB |
| 160k | 2.25 GB | >0.664 GB | 0.035 GB |
| RI | 25.2 GB | >3.45 GB | 0.15 GB |
| Texas | >458 GB | >458 GB | 2.49 GB |
Poker
• Poker is a recognized challenge problem in AI because (among other reasons)
– the other players’ cards are hidden;
– bluffing and other deceptive strategies are needed in a good player;
– there is uncertainty about future events.
• Texas Hold’em: most popular variant of poker
• Two-player game tree has ~10^18 nodes
Potential-aware automated abstraction [G., Sandholm, Sørensen 07]
• Most prior automated abstraction algorithms employ a myopic expected value computation as a similarity metric
– This ignores hands like flush draws where although the probability of winning is small, the payoff could be high
• Our newest algorithm considers higher-dimensional spaces consisting of histograms over abstracted classes of states from later stages of the game
• This enables our bottom-up abstraction algorithm to automatically take into account positive and negative potential
Solving the four-round model
• Computed abstraction with
– 20 first-round buckets
– 800 second-round buckets
– 4800 third-round buckets
– 28800 fourth-round buckets
• Algorithm using 30 GB RAM
– Simply representing as an LP requires 32 TB
– Outputs new, improved solution every 2.5 days
[G., Sandholm, Sørensen 07]
Future research
• Customizing second-order methods (e.g., interior-point methods) for the equilibrium problem
• Additional heuristics for improving practical performance of EGT algorithm
• Techniques for finding an optimal solution from an ε-solution
Thank you ☺