Portfolios of Artificial Intelligences
+ playing with random seeds
1. What is a portfolio
2. Offline portfolio
3. Online portfolio
4. Mathematics (sorry)
5. Experiments
J.-B. Hoock, D. L. St-Pierre, O. Teytaud
Portfolio
I have K algorithms for solving a given task:
MCTS
Alpha-Beta
Parametric script
Nested MC
I want to choose the best one
Two frameworks
Offline: I do some work before the competition
I combine all my algorithms into one
Simple version: compute some probability vector p
For each game, use Algo(i) with probability p(i)
Online: for each game, use Algo(i) with probability p(i)
Update p when the game is over
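In the offline "simple version", a portfolio is just a categorical draw per game. A minimal Python sketch, assuming algorithms are represented by their names and p is already known (the function `pick_algorithm` and the example vector are hypothetical):

```python
import random

def pick_algorithm(algorithms, p, rng):
    """Return Algo(i) with probability p(i) (offline portfolio, simple version)."""
    if len(algorithms) != len(p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must be a probability vector over the algorithms")
    return rng.choices(algorithms, weights=p, k=1)[0]

algorithms = ["MCTS", "Alpha-Beta", "Parametric script", "Nested MC"]
p = [0.5, 0.2, 0.2, 0.1]  # hypothetical probability vector
rng = random.Random(42)
picks = [pick_algorithm(algorithms, p, rng) for _ in range(1000)]
```

The online framework reuses the same draw; the only difference is that p is recomputed after each game instead of being frozen before the competition.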
2. Offline portfolio
Offline Nash portfolio
K algorithms for Black: BAI(1), ..., BAI(K)
K' algorithms for White: WAI(1), ..., WAI(K')
Definition: Mij = probability that BAI(i) beats WAI(j)
Define (p,q) = Nash equilibrium of M
p = best stochastic portfolio for Black (Nash sense)
q = best stochastic portfolio for White (Nash sense)
Portfolio:
Black: play BAI(i) with probability p(i)
White: play WAI(j) with probability q(j)
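One way to get (p,q) numerically is to let both colors run Hedge (multiplicative weights) against each other on M and average their strategies; in a zero-sum game these averages approach a Nash equilibrium. This is a swapped-in standard technique, not necessarily the solver used here, and `approx_nash`, `iters` and `eta` are hypothetical names and values:

```python
import math

def softmax(logits):
    """Numerically stable softmax: log-weights -> probability vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def approx_nash(M, iters=5000, eta=0.05):
    """Approximate Nash equilibrium (p, q) of the zero-sum game M.

    M[i][j] = probability that BAI(i) (Black, maximizer) beats WAI(j)
    (White, minimizer). Both colors run Hedge simultaneously and the
    time-averaged mixed strategies are returned.
    """
    K, K_prime = len(M), len(M[0])
    log_wb = [0.0] * K        # Black's log-weights over BAI(1..K)
    log_ww = [0.0] * K_prime  # White's log-weights over WAI(1..K')
    p_avg = [0.0] * K
    q_avg = [0.0] * K_prime
    for _ in range(iters):
        p, q = softmax(log_wb), softmax(log_ww)
        # Win rate of each BAI(i) against White's current mixture q
        black_gain = [sum(M[i][j] * q[j] for j in range(K_prime)) for i in range(K)]
        # Black's win rate against each WAI(j) under Black's current mixture p
        white_loss = [sum(p[i] * M[i][j] for i in range(K)) for j in range(K_prime)]
        for i in range(K):
            log_wb[i] += eta * black_gain[i]  # Black reinforces winning arms
            p_avg[i] += p[i] / iters
        for j in range(K_prime):
            log_ww[j] -= eta * white_loss[j]  # White penalizes arms that lose
            q_avg[j] += q[j] / iters
    return p_avg, q_avg

M = [[1.0, 0.0],
     [0.0, 0.5]]  # hypothetical 2x2 win-rate matrix
p, q = approx_nash(M)
```

For this M the exact equilibrium is p = q = (1/3, 2/3) with value 1/3, so the averaged strategies should come out roughly there; more iterations with a smaller eta tighten the approximation.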
Other offline portfolios
K algorithms for Black: BAI(1), ..., BAI(K)
K' algorithms for White: WAI(1), ..., WAI(K')
Definitions:
Uniform portfolio: p(i) = 1/K, q(j) = 1/K'
Fixed seed: p(i) = 1, q(j) = 1 for some i, j
Best arm: fixed seed with i the best row of M and j the best column of M
Portfolio:
Black: play BAI(i) with probability p(i)
White: play WAI(j) with probability q(j)
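The three baselines can be built directly from M. A minimal sketch, assuming that "best row" means the BAI(i) with the highest average win rate over all WAI(j) and "best column" the WAI(j) with the lowest average win rate for Black (this reading, and all function names, are assumptions):

```python
def uniform_portfolio(M):
    """p(i) = 1/K, q(j) = 1/K'."""
    K, K_prime = len(M), len(M[0])
    return [1.0 / K] * K, [1.0 / K_prime] * K_prime

def fixed_seed_portfolio(M, i, j):
    """p(i) = 1 and q(j) = 1: a single deterministic algorithm per color."""
    K, K_prime = len(M), len(M[0])
    p, q = [0.0] * K, [0.0] * K_prime
    p[i], q[j] = 1.0, 1.0
    return p, q

def best_arm_portfolio(M):
    """Fixed seed on the best row (for Black) and best column (for White)."""
    K, K_prime = len(M), len(M[0])
    # Assumption: best row = highest mean win rate for Black
    i = max(range(K), key=lambda r: sum(M[r]) / K_prime)
    # Assumption: best column = lowest mean win rate for Black
    j = min(range(K_prime), key=lambda c: sum(M[r][c] for r in range(K)) / K)
    return fixed_seed_portfolio(M, i, j)

M = [[0.6, 0.7],
     [0.3, 0.9],
     [0.5, 0.2]]  # hypothetical win rates, K = 3, K' = 2
p_u, q_u = uniform_portfolio(M)
p_b, q_b = best_arm_portfolio(M)
```

Here row 0 has the highest mean (0.65) and column 0 the lowest mean (about 0.47), so the best-arm portfolio puts all its mass on BAI(1) and WAI(1).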
3. Online portfolio
Online portfolio (for Black)
Just apply UCBT (or your favorite bandit)
Before playing a game:
p(i) = frequency of wins for BAI(i)
n(i) = number of times BAI(i) was used
N = sum of the n(i)
sc(i) = p(i) + C log(N)/n(i) + C' sqrt(p(i)(1 - p(i)) log(N)/n(i))
Choose i* maximizing sc(i)
Play with BAI(i*)
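The score sc(i) above translates almost line by line into code. A minimal Python sketch, assuming C = C' = 1 and that each BAI(i) is tried once before the formula is used (sc(i) is undefined while n(i) = 0); the class name and the simulated opponent are hypothetical:

```python
import math
import random

class UCBTPortfolio:
    """Online portfolio for Black: pick the BAI(i) maximizing sc(i)."""

    def __init__(self, k, c=1.0, c_prime=1.0):
        self.wins = [0] * k  # games won with BAI(i)
        self.n = [0] * k     # n(i): games played with BAI(i)
        self.c, self.c_prime = c, c_prime  # C and C' (assumed values)

    def score(self, i):
        log_n = math.log(sum(self.n))       # log(N)
        p = self.wins[i] / self.n[i]        # p(i): empirical win frequency
        return (p + self.c * log_n / self.n[i]
                + self.c_prime * math.sqrt(p * (1 - p) * log_n / self.n[i]))

    def choose(self):
        # Assumption: play every BAI(i) once first, since sc(i) needs n(i) > 0.
        for i, n_i in enumerate(self.n):
            if n_i == 0:
                return i
        return max(range(len(self.n)), key=self.score)

    def update(self, i, won):
        self.n[i] += 1
        self.wins[i] += int(won)

# Simulated stationary opponent: hypothetical true win rates of BAI(0), BAI(1)
true_win = [0.2, 0.8]
rng = random.Random(0)
ucbt = UCBTPortfolio(len(true_win))
for _ in range(2000):
    i = ucbt.choose()
    ucbt.update(i, rng.random() < true_win[i])
```

After the loop, `ucbt.n` should show almost all games played with BAI(1), the stronger arm against this stationary opponent.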
4. Mathematics (sorry)
Nash
Computed exactly in polynomial time (e.g. by linear programming)
Approximated with precision ε in expected time O((K + K') log²(K + K') / ε²)
The best portfolio in terms of:
Worst-case win rate against the WAI(j)
Worst-case win rate against WAI(j) for j drawn from any probability distribution
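Written with the matrix M defined earlier (the simplex notation Δ is ours), the Nash portfolio for Black solves the max-min problem below. Because pᵀMq is linear in q, the inner minimum over distributions q is reached at a single WAI(j), which is why both worst-case criteria lead to the same portfolio.

```latex
p^\ast \in \arg\max_{p \in \Delta_K} \; \min_{q \in \Delta_{K'}} \; p^\top M q
       = \arg\max_{p \in \Delta_K} \; \min_{1 \le j \le K'} \; \sum_{i=1}^{K} p(i)\, M_{ij},
\qquad
\Delta_K = \Big\{ p \in [0,1]^K : \textstyle\sum_{i=1}^{K} p(i) = 1 \Big\}
```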
UCBT for Black
Nearly zero computational overhead
Asymptotically optimal win rate among the BAI(i) against a stationary opponent
We did not try discounted UCB
5. Experiments on 9x9 Go
First portfolio: random seeds
Pick a stochastic algorithm
Choose K random seeds
You get K algorithms
Hint: the random seed has a significant impact. Yes, it works by rote learning (a kind of opening book).
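Fixing a seed is what turns one stochastic program into K deterministic ones. A minimal sketch, assuming the generator is reset at the start of every game (so the same seed replays the same line of play against the same opponent moves); `SeededPlayer` and its random move choice are hypothetical stand-ins for a real stochastic search such as MCTS:

```python
import random

class SeededPlayer:
    """A stochastic move policy made deterministic by a fixed seed."""

    def __init__(self, seed):
        self.seed = seed
        self.rng = random.Random(seed)

    def new_game(self):
        # Reset the generator so every game starts from the same random state.
        self.rng = random.Random(self.seed)

    def move(self, legal_moves):
        # Hypothetical stand-in for a stochastic search (e.g. MCTS):
        # here we simply pick a legal move at random.
        return self.rng.choice(sorted(legal_moves))

K = 4
players = [SeededPlayer(seed) for seed in range(K)]  # K deterministic algorithms
```

Because each player is deterministic, a learning opponent can memorize a winning line against it, which is the rote-learning effect mentioned above.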
Performance of Nash portfolio
(learnt offline), in generalization
Against new seeds
Vs uniform ==> this means we outperform the default version (which uses randomized seeds).
Portfolios are here a distribution on random seeds.
We get an improved algorithm (winning rate 66%) just with that.
(Figure: X-axis = K = K', the number of seeds per color)
Remarks
Nash portfolio: good
Best-arm seed: very good
But we will see that best arm has weaknesses ==> it can be overfitted, i.e. easily beaten by a learning opponent.
UCBT crushes fixed seed and wins against uniform
(Figure: X-axis = log2(number of games), max. 512 games.) Dots decreasing to 0.
Fixed seeds (deterministic algorithms) are overfitted after 64 games
Other experiments: variants of some algorithm
Gnugo with options (32 variants)
Nash-portfolio or UCBT portfolio: only a few percent of improvement over a single ad hoc variant.
==> less impressive than with random seeds
Conclusions
Nice application for Nash-portfolio:
Choose a stochastic algorithm
Build a matrix M of games randomSeed vs randomSeed
Compute the Nash equilibrium
You get a new probability distribution on random seeds
It should be stronger than the original algorithm.
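The first step of this recipe, filling M with seed-vs-seed games, can be sketched as follows. Assumptions: `toy_game` is a hypothetical deterministic stand-in for a full 9x9 Go game between two seeded programs, and `games_per_pair` defaults to 1 because replaying a deterministic pairing gives the same result:

```python
import random

def toy_game(black_seed, white_seed):
    """Hypothetical deterministic game: True if Black wins.

    Stands in for a real game between two programs with fixed seeds.
    """
    return random.Random(black_seed * 1_000_003 + white_seed).random() < 0.5

def build_matrix(play_game, black_seeds, white_seeds, games_per_pair=1):
    """M[i][j] = empirical probability that Black seed i beats White seed j."""
    M = []
    for b in black_seeds:
        row = []
        for w in white_seeds:
            wins = sum(play_game(b, w) for _ in range(games_per_pair))
            row.append(wins / games_per_pair)
        M.append(row)
    return M

M = build_matrix(toy_game, black_seeds=range(8), white_seeds=range(100, 108))
```

The resulting M can then go to any zero-sum matrix-game solver; Black's equilibrium strategy p is the new distribution on random seeds.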
Nice application for UCBT-portfolio:
Play against it
As long as you lose, it will keep the same line of play
Conclusions
Further work:
Better Nash approximation
Increase fun (should UCBT explore more or less? discount?)
Bigger experiments (bigger games? 19x19?)
Comments? We forgot to cite your paper? We did not try on your favorite game? Our results are bullshit? Please tell us :-)