building a computer mahjong player based on monte carlo … · 2019. 8. 23. · •computer poker...
TRANSCRIPT
![Page 1: Building a Computer Mahjong Player Based on Monte Carlo … · 2019. 8. 23. · •Computer poker •Nash equilibrium strategy •CFR+ method has solved Heads-up limit hold’ em](https://reader034.vdocuments.net/reader034/viewer/2022051902/5ff250fe1374df0dd72b765b/html5/thumbnails/1.jpg)
Building a Computer Mahjong Player Based onMonte Carlo Simulation and Opponent Models
Naoki Mizukami1 and Yoshimasa Tsuruoka1
1 The University of Tokyo
1
![Page 2: Building a Computer Mahjong Player Based on Monte Carlo … · 2019. 8. 23. · •Computer poker •Nash equilibrium strategy •CFR+ method has solved Heads-up limit hold’ em](https://reader034.vdocuments.net/reader034/viewer/2022051902/5ff250fe1374df0dd72b765b/html5/thumbnails/2.jpg)
Introduction
• Imperfect information games are challenging research• Contract bridge [Ginsberg 2001]• Skat [Buro et al 2009]• Texas Hold’em [Bowling et al 2015]
• We focus on Japanese Mahjong• Multiplayer• Imperfect information• Enormous number of information sets
• Mahjong : 1060
• Texas Hold’em: 1018
2
![Page 3: Building a Computer Mahjong Player Based on Monte Carlo … · 2019. 8. 23. · •Computer poker •Nash equilibrium strategy •CFR+ method has solved Heads-up limit hold’ em](https://reader034.vdocuments.net/reader034/viewer/2022051902/5ff250fe1374df0dd72b765b/html5/thumbnails/3.jpg)
Related work
• Computer poker• Nash equilibrium strategy
• CFR+ method has solved Heads-up limit hold’ em poker [Bowling et al 2015]
• Opponent modeling• Opponent modeling and Monte Carlo tree search for
exploitation [Van der Kleij 2010]
• The program updates a hand rank distribution in the current game state when the showdown occurs [Aaron 2002]
3
![Page 4: Building a Computer Mahjong Player Based on Monte Carlo … · 2019. 8. 23. · •Computer poker •Nash equilibrium strategy •CFR+ method has solved Heads-up limit hold’ em](https://reader034.vdocuments.net/reader034/viewer/2022051902/5ff250fe1374df0dd72b765b/html5/thumbnails/4.jpg)
Japanese Mahjong
• Rules• It play with four players• A player can win round by completing a winning hand
consisting of 13 tiles• One game of mahjong consists of 4 or 8 rounds
• Terms• Waiting
• A player’s hand needs only one tile to win
• Folding• A player gives up to win and only tries to avoid discarding a
winning tile for opponents• Is not action but strategy
4
![Page 5: Building a Computer Mahjong Player Based on Monte Carlo … · 2019. 8. 23. · •Computer poker •Nash equilibrium strategy •CFR+ method has solved Heads-up limit hold’ em](https://reader034.vdocuments.net/reader034/viewer/2022051902/5ff250fe1374df0dd72b765b/html5/thumbnails/5.jpg)
One-player mahjong [Mizukami et al 2014]• Implement folding system
• One-player Mahjong • A One-player Mahjong player only tries to win• It is trained by supervised learning using game records• It plays an important role in our Monte Carlo simulation
• Recognizing Folding situations• Folding system is realized by supervised learning• Positions in game records are annotated manually
• Result: Beyond average human players
• Problem: It is difficult to annotate required data
5
![Page 6: Building a Computer Mahjong Player Based on Monte Carlo … · 2019. 8. 23. · •Computer poker •Nash equilibrium strategy •CFR+ method has solved Heads-up limit hold’ em](https://reader034.vdocuments.net/reader034/viewer/2022051902/5ff250fe1374df0dd72b765b/html5/thumbnails/6.jpg)
Original game
Opponent modelingby supervised learning
Waiting
Winning tile
Hand score
Abstraction of opponent
Abstracted gameDecides moves
Proposed method
• Overview
• Advantage• It is not necessary to predict opponents’ specific hands• Can be trained models only using game records
Monte Carlo Simulation
6
![Page 7: Building a Computer Mahjong Player Based on Monte Carlo … · 2019. 8. 23. · •Computer poker •Nash equilibrium strategy •CFR+ method has solved Heads-up limit hold’ em](https://reader034.vdocuments.net/reader034/viewer/2022051902/5ff250fe1374df0dd72b765b/html5/thumbnails/7.jpg)
Training setting
• Game records• Internet Mahjong site called ``Tenhou’’
• Dataset• Training data 1.7 × 107
• Test data 100
• Models• Waiting: logistic regression model
• Winning tile: logistic regression model
• Hand score: Linear regression model
7
![Page 8: Building a Computer Mahjong Player Based on Monte Carlo … · 2019. 8. 23. · •Computer poker •Nash equilibrium strategy •CFR+ method has solved Heads-up limit hold’ em](https://reader034.vdocuments.net/reader034/viewer/2022051902/5ff250fe1374df0dd72b765b/html5/thumbnails/8.jpg)
Waiting
• The model predicts whether an opponent is waiting or not
Label: waiting
Input Discarded tiles
Opponent’s handrevealed melds
Output 𝑃 𝑜𝑝𝑝𝑜𝑛𝑒𝑛𝑡 = 𝑤𝑎𝑖𝑡𝑖𝑛𝑔 = 0.8
8
![Page 9: Building a Computer Mahjong Player Based on Monte Carlo … · 2019. 8. 23. · •Computer poker •Nash equilibrium strategy •CFR+ method has solved Heads-up limit hold’ em](https://reader034.vdocuments.net/reader034/viewer/2022051902/5ff250fe1374df0dd72b765b/html5/thumbnails/9.jpg)
Evaluation and result
• Evaluation• Area Under the Curve
•
Player AUC
Expert player 0.778
Prediction model 0.777
-Discarded tiles 0.772
-Number of revealed melds
0.770
・ Same prediction ability as the expert player・ Expert player: Top 0.1% of the players
9
![Page 10: Building a Computer Mahjong Player Based on Monte Carlo … · 2019. 8. 23. · •Computer poker •Nash equilibrium strategy •CFR+ method has solved Heads-up limit hold’ em](https://reader034.vdocuments.net/reader034/viewer/2022051902/5ff250fe1374df0dd72b765b/html5/thumbnails/10.jpg)
Winning tiles
• Model predicts opponents’ winning tiles
• In general, there are one or more winning tiles
→ Build prediction models for all kinds of tiles Discarded tiles
Opponent’s handrevealed melds
Winning tile or
Output ∶ 0. 0 ∶ 0.10 ∶ 0.15
Input
10
![Page 11: Building a Computer Mahjong Player Based on Monte Carlo … · 2019. 8. 23. · •Computer poker •Nash equilibrium strategy •CFR+ method has solved Heads-up limit hold’ em](https://reader034.vdocuments.net/reader034/viewer/2022051902/5ff250fe1374df0dd72b765b/html5/thumbnails/11.jpg)
Evaluation method
1: Input opponents’ information e.g winning tiles
2: Tiles that a player has are arranged in ascending orderof probability of being a winning tile for opponent
Evaluation value = 6 / (14-2)=0.5
Ranking about winning tiles for opponent
11
![Page 12: Building a Computer Mahjong Player Based on Monte Carlo … · 2019. 8. 23. · •Computer poker •Nash equilibrium strategy •CFR+ method has solved Heads-up limit hold’ em](https://reader034.vdocuments.net/reader034/viewer/2022051902/5ff250fe1374df0dd72b765b/html5/thumbnails/12.jpg)
Result
• Random: Tiles are arranged randomly
Player Evaluation value
Expert player 0.744
Prediction model 0.676
-Revealed melds 0.675
-Discarded tiles 0.673
Random 0.502
12
![Page 13: Building a Computer Mahjong Player Based on Monte Carlo … · 2019. 8. 23. · •Computer poker •Nash equilibrium strategy •CFR+ method has solved Heads-up limit hold’ em](https://reader034.vdocuments.net/reader034/viewer/2022051902/5ff250fe1374df0dd72b765b/html5/thumbnails/13.jpg)
Hand Score (HS)
• The model predicts the score that the player has to pay
Discarded tiles
Opponent’s handrevealed melds
Input
Hand Score 2,600
Output 2,000
13
![Page 14: Building a Computer Mahjong Player Based on Monte Carlo … · 2019. 8. 23. · •Computer poker •Nash equilibrium strategy •CFR+ method has solved Heads-up limit hold’ em](https://reader034.vdocuments.net/reader034/viewer/2022051902/5ff250fe1374df0dd72b765b/html5/thumbnails/14.jpg)
Evaluation method and result
• Evaluation method• Mean Squared Error (MSE)
Player MSE
Prediction model 0.37
-Revealed Melds 0.38
-Revealed fan value 0.38
Expert player 0.40
Performance of prediction model is higher than that of an expert player
14
![Page 15: Building a Computer Mahjong Player Based on Monte Carlo … · 2019. 8. 23. · •Computer poker •Nash equilibrium strategy •CFR+ method has solved Heads-up limit hold’ em](https://reader034.vdocuments.net/reader034/viewer/2022051902/5ff250fe1374df0dd72b765b/html5/thumbnails/15.jpg)
Waiting
Winning tile
Hand score
Abstraction of opponent
Abstracted game
Overview of proposed method
𝐻𝑆 = 𝒘𝑇𝒙
𝑃 𝑇𝑖𝑙𝑒 = 𝑤𝑖𝑛𝑛𝑖𝑛𝑔 =1
1 + exp(−𝒘𝑇𝒙𝑝)
𝑃 𝑝 = 𝑤𝑎𝑖𝑡𝑖𝑛𝑔 =1
1 + exp(−𝒘𝑇𝒙𝑝)
15
![Page 16: Building a Computer Mahjong Player Based on Monte Carlo … · 2019. 8. 23. · •Computer poker •Nash equilibrium strategy •CFR+ method has solved Heads-up limit hold’ em](https://reader034.vdocuments.net/reader034/viewer/2022051902/5ff250fe1374df0dd72b765b/html5/thumbnails/16.jpg)
Application of opponent models
• Using three prediction models to estimate an expected value
• LP (Losing probability)
• EL (Expected Loss)
L𝑃 𝑝, 𝑇𝑖𝑙𝑒 = 𝑃 𝑝 = 𝑤𝑎𝑖𝑡𝑖𝑛𝑔 × 𝑃(𝑇𝑖𝑙𝑒 = 𝑤𝑖𝑛𝑛𝑖𝑛𝑔)
E𝐿 𝑝, 𝑇𝑖𝑙𝑒 = 𝐿𝑃 𝑝, 𝑇𝑖𝑙𝑒 × 𝐻𝑆(𝑝, 𝑇𝑖𝑙𝑒)
16
![Page 17: Building a Computer Mahjong Player Based on Monte Carlo … · 2019. 8. 23. · •Computer poker •Nash equilibrium strategy •CFR+ method has solved Heads-up limit hold’ em](https://reader034.vdocuments.net/reader034/viewer/2022051902/5ff250fe1374df0dd72b765b/html5/thumbnails/17.jpg)
Monte Carlo simulation
• The program calculates Score 𝑇𝑖𝑙𝑒 for each tile• Program selects the tile that has the highest Score 𝑇𝑖𝑙𝑒
• Procedure of sim 𝑇𝑖𝑙𝑒1: Discard a tile2: Opponent’s turn 3: Program’s turn4: Repeat 2,35: Get reward
Score 𝑇𝑖𝑙𝑒 = sim 𝑇𝑖𝑙𝑒 × 𝑝∈𝑜𝑝𝑝𝑜𝑛𝑒𝑛𝑡𝑠
𝑝∈𝑜𝑝𝑝𝑜𝑛𝑒𝑛𝑡𝑠
1 − 𝐿𝑃 𝑝, 𝑇𝑖𝑙𝑒 − 𝐸𝐿 𝑝, 𝑇𝑖𝑙𝑒
My hand
𝑇𝑖𝑙𝑒1 𝑇𝑖𝑙𝑒2 ・・・
𝑠𝑖𝑚(𝑇𝑖𝑙𝑒1) 𝑠𝑖𝑚(𝑇𝑖𝑙𝑒2)
17
![Page 18: Building a Computer Mahjong Player Based on Monte Carlo … · 2019. 8. 23. · •Computer poker •Nash equilibrium strategy •CFR+ method has solved Heads-up limit hold’ em](https://reader034.vdocuments.net/reader034/viewer/2022051902/5ff250fe1374df0dd72b765b/html5/thumbnails/18.jpg)
Evaluation setting
• Compared to our previous work• Moves are computed in a second
• Length of a game is four rounds
• VS state-of-the-art program• Mattari Mahjong
• Duplicate mode• can generate same tile sequences
• can compare the result
• VS human players• Internet Mahjong site ``Tenhou’’
18
![Page 19: Building a Computer Mahjong Player Based on Monte Carlo … · 2019. 8. 23. · •Computer poker •Nash equilibrium strategy •CFR+ method has solved Heads-up limit hold’ em](https://reader034.vdocuments.net/reader034/viewer/2022051902/5ff250fe1374df0dd72b765b/html5/thumbnails/19.jpg)
Result
• VS Mattari Mahjong
• VS Human players
1st (%) 2nd(%) 3rd(%) 4th(%) Average rank games
Proposed method 24.1 28.1 24.8 23.0 2.46±0.04 2634
[Mizukami + 2014] 25.3 24.8 25.1 24.8 2.49±0.07 1441
1st (%) 2nd(%) 3rd(%) 4th(%) Average rank Games
Proposed method 25.2 25.6 24.7 24.5 2.48±0.07 1000
Mattari Mahjong 24.8 24.7 25.0 25.5 2.51±0.07 1000
[Mizukami+ 2014] 24.3 22.6 22.2 30.9 2.59±0.07 1000
19
![Page 20: Building a Computer Mahjong Player Based on Monte Carlo … · 2019. 8. 23. · •Computer poker •Nash equilibrium strategy •CFR+ method has solved Heads-up limit hold’ em](https://reader034.vdocuments.net/reader034/viewer/2022051902/5ff250fe1374df0dd72b765b/html5/thumbnails/20.jpg)
Conclusion and Future work
• Conclusion• Performance of the three prediction models is high
• Our program outperforms state-of-the-art program by Monte Carlo simulation
• Future work• Consider final rank
• Improve players’ actions in simulation
20
![Page 21: Building a Computer Mahjong Player Based on Monte Carlo … · 2019. 8. 23. · •Computer poker •Nash equilibrium strategy •CFR+ method has solved Heads-up limit hold’ em](https://reader034.vdocuments.net/reader034/viewer/2022051902/5ff250fe1374df0dd72b765b/html5/thumbnails/21.jpg)
Training of 1-player mahjong players
• A weight vector is updated so that the player can make moves as expert players.
• We used the averaged perceptron
21
3
-1
Evaluation value Record of a game’s
move1
2𝑊′ = 𝑊 + 𝑋 − 𝑋
Updateweight vector
x:𝑓𝑒𝑎𝑢𝑟𝑒 𝑣𝑒𝑐𝑡𝑜𝑟 𝑊 ∶ 𝑤𝑒𝑖𝑔ℎ𝑡 𝑣𝑒𝑐𝑡𝑜𝑟
![Page 22: Building a Computer Mahjong Player Based on Monte Carlo … · 2019. 8. 23. · •Computer poker •Nash equilibrium strategy •CFR+ method has solved Heads-up limit hold’ em](https://reader034.vdocuments.net/reader034/viewer/2022051902/5ff250fe1374df0dd72b765b/html5/thumbnails/22.jpg)
Recognizing folding situations
• We train a classifier for folding situations using a machine learning approach
• This approach requires training data.
Positions in game records are annotated manually
22
Folding situationsThe player
folded.Because
were discarded
Humanplayers
Beforediscarding
![Page 23: Building a Computer Mahjong Player Based on Monte Carlo … · 2019. 8. 23. · •Computer poker •Nash equilibrium strategy •CFR+ method has solved Heads-up limit hold’ em](https://reader034.vdocuments.net/reader034/viewer/2022051902/5ff250fe1374df0dd72b765b/html5/thumbnails/23.jpg)
Setting
• Dataset• Training data 1.77 × 107
• Test data 100
• Features• Discarded tiles, number of revealed melds, and so on
• 6,888 dimension
• logistic regression model
23
𝑃 𝑝 = 𝑤𝑎𝑖𝑡𝑖𝑛𝑔 =1
1 + exp(−𝒘𝑇𝒙𝑝)
![Page 24: Building a Computer Mahjong Player Based on Monte Carlo … · 2019. 8. 23. · •Computer poker •Nash equilibrium strategy •CFR+ method has solved Heads-up limit hold’ em](https://reader034.vdocuments.net/reader034/viewer/2022051902/5ff250fe1374df0dd72b765b/html5/thumbnails/24.jpg)
Setting
• Dataset• Training data 1.77 × 107
• Test data 100
• Features• Discarded tiles, number of revealed melds, and so on
• 31,416 dimension
• logistic regression model
𝑃 𝑇𝑖𝑙𝑒 = 𝑤𝑖𝑛𝑛𝑖𝑛𝑔 =1
1 + exp(−𝒘𝑇𝒙𝑝)
24
![Page 25: Building a Computer Mahjong Player Based on Monte Carlo … · 2019. 8. 23. · •Computer poker •Nash equilibrium strategy •CFR+ method has solved Heads-up limit hold’ em](https://reader034.vdocuments.net/reader034/viewer/2022051902/5ff250fe1374df0dd72b765b/html5/thumbnails/25.jpg)
Setting
• Dataset• Training data 5.92 × 107
• Test data 100
• Features• Revealed Melds, Revealed fan value and so on
• 26,889 dimension
• Linear regression model
25
𝐻𝑆 = 𝒘𝑇𝒙
![Page 26: Building a Computer Mahjong Player Based on Monte Carlo … · 2019. 8. 23. · •Computer poker •Nash equilibrium strategy •CFR+ method has solved Heads-up limit hold’ em](https://reader034.vdocuments.net/reader034/viewer/2022051902/5ff250fe1374df0dd72b765b/html5/thumbnails/26.jpg)
Evaluation and result
• Evaluation• Area Under the Curve
•
Player AUC
Expert player 0.778
Prediction model 0.777
-Discarded tiles 0.772
-Number of revealed melds
0.770
・ Same prediction ability as the expert player・ Expert player: Top 0.1% of the players
False positive rate
Tru
e p
osi
tive
rat
e
26
![Page 27: Building a Computer Mahjong Player Based on Monte Carlo … · 2019. 8. 23. · •Computer poker •Nash equilibrium strategy •CFR+ method has solved Heads-up limit hold’ em](https://reader034.vdocuments.net/reader034/viewer/2022051902/5ff250fe1374df0dd72b765b/html5/thumbnails/27.jpg)
Flowchart of program’s turn
Win checkYES
Win
Decide one-player mahjong moves
Discard a tile and compute ODEV
Win check for opponentYES
Win
Next player
Pick up a tile
Fold
・ODEV (One-Depth Expected Value)
is an expected value that is calculated by
searching game trees until the program’s
next turn.
・Fold: a player picks up a tile and discards no tiles
27
![Page 28: Building a Computer Mahjong Player Based on Monte Carlo … · 2019. 8. 23. · •Computer poker •Nash equilibrium strategy •CFR+ method has solved Heads-up limit hold’ em](https://reader034.vdocuments.net/reader034/viewer/2022051902/5ff250fe1374df0dd72b765b/html5/thumbnails/28.jpg)
Flowchart of opponent’s turn
Win checkYES
Win
Change two parameters, fold and waiting
Pick up a tile
FoldYES
Next players
Discard a tile or fold
Win check for opponentYES
Win
Next players
Fold
Next player
・Opponent player has two binaryparameters indicating whether he is waiting or folding
28