Post on 26-Jan-2021
Sections: Rules of Four Hundred, Card Play, Betting, Baselines, Results, Future Work, References
Hidden Information, Teamwork, and Prediction in Trick-Taking Card Games
Hadi Elzayn, Mikhail Hayhoe, Harshat Kumar, Mohammad Fereydounian
hads@sas.upenn.edu, mhayhoe@seas.upenn.edu, harshat@seas.upenn.edu, mferey@seas.upenn.edu
Results

Head-to-head win percentages against each opponent:

                  Random   Greedy   Heuristic   Over400
Heuristic Win %     100      100        -         10.5
Over400 Win %       100      100       89.5        -
Rules of Four Hundred
• Players sit across from team members and are dealt 13 cards each
• Players bet the expected number of 'tricks' they plan to take
• In each round, after bets are placed:
  • Players take turns choosing a card to play in order, starting with the player who took the previous trick
  • The suit of the first card played determines the 'lead suit'
  • Players must play the 'lead suit' if they have it
  • The winner of the trick is determined by the highest card of the 'lead suit', or the highest card of the 'trump suit' (Hearts) if any were played
• After 13 rounds, the score of each player is increased by the number of tricks they bet, if they met or exceeded it
• The score of a player is decreased by their bet if they were unable to meet it
• The game concludes when one team member has 41 points or more and the other player has positive points
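As a concrete illustration, the trick-winner rule above can be sketched in Python. The `Card` tuple, suit names, and rank convention (2 through 14, with 14 as Ace) are assumptions for illustration, not the authors' implementation:

```python
from collections import namedtuple

# Hypothetical card representation for this sketch.
Card = namedtuple("Card", ["suit", "rank"])  # rank 2..14, 14 = Ace

TRUMP = "Hearts"  # Hearts is the trump suit in Four Hundred

def trick_winner(plays):
    """Given plays as a list of (player, Card) in play order,
    return the player who wins the trick."""
    lead_suit = plays[0][1].suit  # suit of the first card led
    trumps = [(p, c) for p, c in plays if c.suit == TRUMP]
    if trumps:  # any trump beats any non-trump card
        return max(trumps, key=lambda pc: pc[1].rank)[0]
    followers = [(p, c) for p, c in plays if c.suit == lead_suit]
    return max(followers, key=lambda pc: pc[1].rank)[0]
```

For example, a lone low Heart wins a trick led in Spades, even against the Ace of Spades, since Hearts is trump.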
[Figure panels]
(a) Tricks won over 1,000 games, based on bet made; no bets were higher than 6
(b) Histogram showing the difference between tricks won and bet made
(c) Average score showing the performance of the self-play NN model
(d) Loss function for the betting NN, trained in tandem with the playing NN
[Figures: Neural Network Architecture for Card Play; Neural Network Architecture for Betting]
Card Play

Learning Problem: Find an optimal playing policy π_i* to maximize the expected reward for player i, given the play of the others.

Reinforcement Learning Approach:
We define the reward at the end of each round. The approach is informed by Neural Fitted Q-iteration [2], in which the reward is provided as a label for the state representation. We include an indicator term as reward shaping [3] to speed up convergence.

The state representation:
• Similar to AlphaGo [4], we encode the state by a matrix representation, where the location and value indicate the card, the time it was played, and by whom it was played
• Initial bets and the number of tricks won are also encoded in the state representation
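A minimal sketch of such a card-by-card matrix encoding follows. The exact layout is not given in the poster, so the dimensions, column ordering, and the `encode_state` signature here are assumptions:

```python
import numpy as np

NUM_CARDS = 52    # 4 suits x 13 ranks
NUM_PLAYERS = 4
NUM_TRICKS = 13

def card_index(suit, rank):
    """Map a card to a row index; suit in 0..3, rank in 2..14."""
    return suit * 13 + (rank - 2)

def encode_state(played, bets, tricks_won):
    """played: list of (suit, rank, player, trick_number) seen so far.
    Rows are cards; columns record who played each card and in which
    trick. Bets and tricks won are kept in a separate side vector."""
    state = np.zeros((NUM_CARDS, NUM_PLAYERS + NUM_TRICKS))
    for suit, rank, player, trick in played:
        row = card_index(suit, rank)
        state[row, player] = 1.0               # by whom it was played
        state[row, NUM_PLAYERS + trick] = 1.0  # when it was played
    side = np.array(bets + tricks_won, dtype=float)
    return state, side
```

The sparse one-hot layout mirrors the AlphaGo-style idea that position in the matrix, rather than a learned embedding, carries the card's identity and timing.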
Betting

Learning Problem: Find an optimal betting policy β_i* to maximize the expected reward for player i, given the play of the others.

Supervised Learning Approach: Given a particular card-playing strategy and an initial hand, the player can expect to win some number of tricks. We therefore develop a model that predicts the tricks taken by the end of the round, given the observed initial hand composition.

Generate Data: Initial hands serve as input data; the number of tricks won serves as the label.

We implement a neural network for regression, minimizing a loss defined by the squared difference between the score received and the best possible score.

The loss function is asymmetric: it penalizes bets that are higher than the tricks obtained more heavily.
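One way to realize such an asymmetric loss is a weighted squared error. The weight `over_penalty` and the exact functional form are assumptions; the poster does not show the authors' formula:

```python
import numpy as np

def asymmetric_bet_loss(bet, tricks, over_penalty=2.0):
    """Squared-error loss on the bet, weighted more heavily when the
    bet exceeds the tricks actually obtained (a failed bet costs
    points, so overbidding is riskier than underbidding)."""
    bet = np.asarray(bet, dtype=float)
    tricks = np.asarray(tricks, dtype=float)
    err = bet - tricks
    weight = np.where(err > 0, over_penalty, 1.0)  # overbids weighted more
    return np.mean(weight * err ** 2)
```

Under this sketch, overbidding by two tricks costs twice as much as underbidding by two, pushing the regressor toward conservative bets.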
Baselines

Random Play - Random Bet: Random bet selects a random value from {2, 3, 4, 5} during the betting stage, and in each round chooses a random card to play from the set of valid cards.

Greedy Play - Model Bet: During play, greedy simply selects the highest card in its hand from the set of valid cards, without consideration of its partner.

Heuristic Play - Model Bet: A heuristic strategy was defined based on human knowledge of the game. This heuristic player takes into account available knowledge of the team member's actions as well as the opponents' actions.
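The random and greedy baselines above can be sketched directly; the `Card` tuple and the rank convention are assumptions carried over for illustration:

```python
import random
from collections import namedtuple

Card = namedtuple("Card", ["suit", "rank"])  # rank 2..14, 14 = Ace

def valid_cards(hand, lead_suit):
    """Players must follow the lead suit if they can; otherwise
    any card in hand is playable."""
    if lead_suit is not None:
        follow = [c for c in hand if c.suit == lead_suit]
        if follow:
            return follow
    return list(hand)

def random_play(hand, lead_suit):
    return random.choice(valid_cards(hand, lead_suit))

def greedy_play(hand, lead_suit):
    # Highest valid card, ignoring the partner's position in the trick.
    return max(valid_cards(hand, lead_suit), key=lambda c: c.rank)
```

Both baselines share the `valid_cards` filter, so they differ only in how a card is picked from the legal set, which keeps the comparison in the results table about play policy rather than rule compliance.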
Future Work
• Examine the approach on similar games such as Spades, Hearts, and Tarneeb
• Consider invariance discovery for generalization
References
[1] M. Campbell, A. J. Hoane Jr., and F. Hsu. Deep Blue. Artificial Intelligence, 134(1-2):57-83, 2002.
[2] M. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method. European Conference on Machine Learning, 05:317-328, 2005.
[3] V. Mnih et al. Human-level control through deep reinforcement learning. Nature, 518:529-533, 2015.
[4] D. Silver et al. Mastering the game of Go without human knowledge. Nature, 550:354-359, 2017.