Transcript

Hidden Information, Teamwork, and Prediction in Trick-Taking Card Games

Hadi Elzayn, Mikhail Hayhoe, Harshat Kumar, Mohammad Fereydounian
[email protected], [email protected], [email protected], [email protected]

Results

Win rate (%) of the row strategy against each column opponent:

                  Random   Greedy   Heuristic   Over400
Heuristic Win%     100      100        -          10.5
Over400 Win%       100      100       89.5         -

    Rules of Four Hundred

    • Players sit across from their team members and are dealt 13 cards each
    • Players bet the expected number of 'tricks' they plan to take
    • In each round after bets are placed, players take turns choosing a card to play in order, starting with the player who took the previous trick
    • The suit of the first card played determines the 'lead suit'
    • Players must play the 'lead suit' if they have it
    • The winner of the trick is determined by the highest card of the 'lead suit', or the highest card of the 'trump suit' (Hearts) if any were played
    • After 13 rounds, the score of each player is increased by the number of tricks they bet if they met or exceeded that bet
    • The score of a player is decreased by their bet if they were unable to meet it
    • The game concludes when one team member has 41 points or more and the other player has positive points
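    The trick-resolution and per-round scoring rules above can be sketched as follows (the card representation and helper names are illustrative, not from the poster):

    ```python
    TRUMP = "H"  # Hearts is the trump suit

    def trick_winner(plays):
        """plays: list of (player, card) in play order, where card = (rank, suit)
        and rank runs 2..14 (ace high). Returns the player who takes the trick."""
        lead_suit = plays[0][1][1]
        trumps = [(rank, p) for p, (rank, suit) in plays if suit == TRUMP]
        if trumps:
            return max(trumps)[1]      # highest trump wins if any were played
        follows = [(rank, p) for p, (rank, suit) in plays if suit == lead_suit]
        return max(follows)[1]         # otherwise highest card of the lead suit

    def score_round(bet, tricks_won):
        """After 13 rounds: +bet if the bet was met or exceeded, else -bet."""
        return bet if tricks_won >= bet else -bet
    ```

    For example, a single Heart beats any off-trump card regardless of rank, which is what makes under-trumping considerations central to play.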

    [Figure: Results panels]
    (a) Tricks won over 1,000 games, based on the bet made; no bets were higher than 6
    (b) Histogram showing the difference between tricks won and the bet made
    (c) Average score showing the performance of the self-play NN model
    (d) Loss function for the betting NN, trained in tandem with the playing NN

    Neural Network Architecture for Card Play

    Neural Network Architecture for Betting

    Learning Problem: Find an optimal playing policy π^{i,*} to maximize the expected reward for player i, given their hand and the play of the others.

    Reinforcement Learning Approach:

    We define a reward at the end of each round.

    The approach is informed by Neural Fitted Q-iteration [2], in which the reward is provided as a label to the state representation; we define the label from this per-round reward.

    We include an indicator as reward shaping [3] to speed up convergence.
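    A minimal sketch of this label construction, in the spirit of fitted Q-iteration (the poster's exact reward, indicator term, and discount are shown only as images, so the shaping bonus and target form here are assumed placeholders):

    ```python
    import numpy as np

    def nfq_labels(transitions, q_net, gamma=0.9, shaping_bonus=0.1):
        """Build (state, label) training pairs for Neural Fitted Q-iteration.
        transitions: list of (state, reward, took_trick, next_state, done).
        The indicator bonus for taking a trick plays the role of reward
        shaping, densifying the otherwise end-of-round reward signal."""
        X, y = [], []
        for state, reward, took_trick, next_state, done in transitions:
            target = reward + shaping_bonus * float(took_trick)  # shaped reward
            if not done:
                target += gamma * np.max(q_net(next_state))      # bootstrap from Q
            X.append(state)
            y.append(target)
        return np.array(X), np.array(y)
    ```

    Because the labels are regenerated from the current Q-network each iteration, the playing network is retrained on a fixed supervised dataset at every step, which is the key data-efficiency idea of NFQ.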

    The state representation:
    • Similar to AlphaGo [4], we encode the state by a matrix representation in which the location and value indicate the card, the time it was played, and by whom it was played
    • Initial bets and the number of tricks won are also encoded in the state representation
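    One way to realize such an encoding (the dimensions and entry convention here are an assumption, not the poster's exact layout): a 4×13 card grid whose entries record when and by whom each card was played, concatenated with the bet and trick counters.

    ```python
    import numpy as np

    NUM_PLAYERS, NUM_RANKS, NUM_SUITS = 4, 13, 4

    def encode_state(plays, bets, tricks_won):
        """plays: list of (player, suit_idx, rank_idx) in play order.
        Each grid entry is t * NUM_PLAYERS + player + 1 (0 = not yet played),
        so a single number captures the card, the time it was played, and
        who played it, as in the matrix representation described above."""
        board = np.zeros((NUM_SUITS, NUM_RANKS))
        for t, (player, suit, rank) in enumerate(plays):
            board[suit, rank] = t * NUM_PLAYERS + player + 1
        # append the initial bets and tricks won per player
        return np.concatenate([board.flatten(), np.array(bets), np.array(tricks_won)])
    ```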

    Learning Problem: Find an optimal betting policy β^{i,*} to maximize the expected reward for player i, given their hand and the play of the others.

    Supervised Learning Approach: Given some particular card-playing strategy and initial hand, the player can expect to win some number of tricks. Therefore, we develop a model which predicts the tricks taken by the end of the round given the observed initial hand composition.

    Generate Data: Initial hands serve as input data; the number of tricks won serves as the label.

    We implement a neural network for regression to minimize a loss defined by the squared difference between the score received and the best possible score.

    The loss function is asymmetric: it penalizes more heavily bets which are higher than the number of tricks obtained.
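    An asymmetric squared loss of this kind can be sketched as follows (the over-bet penalty weight is an assumed hyperparameter, not a value from the poster):

    ```python
    import numpy as np

    def asymmetric_bet_loss(bet, tricks_won, over_penalty=2.0):
        """Squared error between the bet and tricks won, weighted more
        heavily when the bet exceeds the tricks obtained, since an unmet
        bet loses points under the scoring rules."""
        diff = bet - tricks_won
        weight = np.where(diff > 0, over_penalty, 1.0)  # over-bet costs more
        return weight * diff ** 2
    ```

    This mirrors the game's scoring asymmetry: under-betting merely forgoes points, while over-betting actively subtracts them.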

    Baselines

    Random Play – Random Bet: Random Bet selects a random value from {2, 3, 4, 5} during the betting stage, and in each round chooses a random card to play from the set of valid cards.

    Greedy Play – Model Bet: During play, Greedy simply selects the highest card in its hand from the set of valid cards, without consideration of its partner.

    Heuristic Play – Model Bet: A heuristic strategy was defined based on human knowledge of the game. This heuristic player takes into account available knowledge of the team member's actions as well as the opponents' actions.
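    Minimal sketches of the Random and Greedy baselines (the card and hand representations are illustrative):

    ```python
    import random

    def random_bet():
        """Random Bet: uniform choice from {2, 3, 4, 5}."""
        return random.choice([2, 3, 4, 5])

    def valid_cards(hand, lead_suit):
        """A player must follow the lead suit when possible; cards are (rank, suit)."""
        follows = [c for c in hand if c[1] == lead_suit]
        return follows if follows else list(hand)

    def random_play(hand, lead_suit):
        """Random Play: any valid card, uniformly at random."""
        return random.choice(valid_cards(hand, lead_suit))

    def greedy_play(hand, lead_suit):
        """Greedy Play: highest valid card, ignoring the partner entirely."""
        return max(valid_cards(hand, lead_suit))
    ```

    Greedy's weakness is visible even in this sketch: it will overtake a trick its partner has already secured, wasting high cards that the heuristic and learned players hold back.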

    Future Work

    • Examine the approach on similar games such as Spades, Hearts, and Tarneeb
    • Consider invariance discovery for generalization

    References

    [1] M. Campbell, A. J. Hoane Jr., and F. Hsu. Deep Blue. Artificial Intelligence, 134(1-2):57-83, 2002.
    [2] M. Riedmiller. Neural Fitted Q Iteration – first experiences with a data efficient neural reinforcement learning method. European Conference on Machine Learning, 317-328, 2005.
    [3] V. Mnih et al. Human-level control through deep reinforcement learning. Nature, 518:529-533, 2015.
    [4] D. Silver et al. Mastering the game of Go without human knowledge. Nature, 550:354-359, 2017.
