Transcript

Hidden Information, Teamwork, and Prediction in Trick-Taking Card Games

Hadi Elzayn, Mikhail Hayhoe, Harshat Kumar, Mohammad Fereydounian
[email protected], [email protected], [email protected], [email protected]

Results

Win rate (%) of the row strategy against each column opponent:

                  Random   Greedy   Heuristic   Over400
Heuristic Win%     100      100        -          10.5
Over400 Win%       100      100       89.5         -

    Rules of Four Hundred

    • Players sit across from their team members and are dealt 13 cards each
    • Players bet the expected number of 'tricks' they plan to take
    • In each round after bets are placed, players take turns choosing a card to play in order, starting with the player who took the previous trick
    • The suit of the first card played determines the 'lead suit'
    • Players must play the 'lead suit' if they have it
    • The winner of the trick is determined by the highest card of the 'lead suit', or the highest card of the 'trump suit' (Hearts) if any were played
    • After 13 rounds, the score of each player is increased by the number of tricks they bet if they met or exceeded that bet
    • The score of a player is decreased by their bet if they were unable to meet it
    • The game concludes when one team member has 41 points or more and the other player has positive points
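    The trick-resolution and per-round scoring rules above can be sketched as follows (the card representation and helper names are illustrative, not from the poster):

    ```python
    TRUMP = "H"  # Hearts is the trump suit

    def trick_winner(plays):
        """plays: list of (player, card) in play order, where card = (rank, suit)
        and rank runs 2..14 (ace high). Returns the player who takes the trick."""
        lead_suit = plays[0][1][1]
        trumps = [(rank, p) for p, (rank, suit) in plays if suit == TRUMP]
        if trumps:
            return max(trumps)[1]      # highest trump wins if any were played
        follows = [(rank, p) for p, (rank, suit) in plays if suit == lead_suit]
        return max(follows)[1]         # otherwise highest card of the lead suit

    def score_round(bet, tricks_won):
        """After 13 rounds: +bet if the bet was met or exceeded, else -bet."""
        return bet if tricks_won >= bet else -bet
    ```

    For example, a single Heart beats any off-trump card regardless of rank, which is what makes under-trumping considerations central to play.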

    [Figure: Results panels]
    (a) Tricks won over 1,000 games, based on the bet made; no bets were higher than 6
    (b) Histogram showing the difference between tricks won and the bet made
    (c) Average score showing the performance of the self-play NN model
    (d) Loss function for the betting NN, trained in tandem with the playing NN

    Neural Network Architecture for Card Play

    Neural Network Architecture for Betting

    Learning Problem: Find an optimal playing policy π^{i,*} to maximize the expected reward for player i, given their hand and the play of the others.

    Reinforcement Learning Approach:

    We define a reward at the end of each round.

    The approach is informed by Neural Fitted Q-iteration [2], in which the reward is provided as a label to the state representation; we define the label from this per-round reward.

    We include an indicator as reward shaping [3] to speed up convergence.
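    A minimal sketch of this label construction, in the spirit of fitted Q-iteration (the poster's exact reward, indicator term, and discount are shown only as images, so the shaping bonus and target form here are assumed placeholders):

    ```python
    import numpy as np

    def nfq_labels(transitions, q_net, gamma=0.9, shaping_bonus=0.1):
        """Build (state, label) training pairs for Neural Fitted Q-iteration.
        transitions: list of (state, reward, took_trick, next_state, done).
        The indicator bonus for taking a trick plays the role of reward
        shaping, densifying the otherwise end-of-round reward signal."""
        X, y = [], []
        for state, reward, took_trick, next_state, done in transitions:
            target = reward + shaping_bonus * float(took_trick)  # shaped reward
            if not done:
                target += gamma * np.max(q_net(next_state))      # bootstrap from Q
            X.append(state)
            y.append(target)
        return np.array(X), np.array(y)
    ```

    Because the labels are regenerated from the current Q-network each iteration, the playing network is retrained on a fixed supervised dataset at every step, which is the key data-efficiency idea of NFQ.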

    The state representation:
    • Similar to AlphaGo [4], we encode the state by a matrix representation in which the location and value indicate the card, the time it was played, and by whom it was played
    • Initial bets and the number of tricks won are also encoded in the state representation
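    One way to realize such an encoding (the dimensions and entry convention here are an assumption, not the poster's exact layout): a 4×13 card grid whose entries record when and by whom each card was played, concatenated with the bet and trick counters.

    ```python
    import numpy as np

    NUM_PLAYERS, NUM_RANKS, NUM_SUITS = 4, 13, 4

    def encode_state(plays, bets, tricks_won):
        """plays: list of (player, suit_idx, rank_idx) in play order.
        Each grid entry is t * NUM_PLAYERS + player + 1 (0 = not yet played),
        so a single number captures the card, the time it was played, and
        who played it, as in the matrix representation described above."""
        board = np.zeros((NUM_SUITS, NUM_RANKS))
        for t, (player, suit, rank) in enumerate(plays):
            board[suit, rank] = t * NUM_PLAYERS + player + 1
        # append the initial bets and tricks won per player
        return np.concatenate([board.flatten(), np.array(bets), np.array(tricks_won)])
    ```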

    Learning Problem: Find an optimal betting policy β^{i,*} to maximize the expected reward for player i, given their hand and the play of the others.

    Supervised Learning Approach: Given some particular card-playing strategy and initial hand, the player can expect to win some number of tricks. Therefore, we develop a model which predicts the tricks taken by the end of the round given the observed initial hand composition.

    Generate Data: Initial hands serve as input data; the number of tricks won serves as the label.

    We implement a neural network for regression to minimize a loss defined by the squared difference between the score received and the best possible score.

    The loss function is asymmetric: it penalizes more heavily bets which are higher than the number of tricks obtained.
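    An asymmetric squared loss of this kind can be sketched as follows (the over-bet penalty weight is an assumed hyperparameter, not a value from the poster):

    ```python
    import numpy as np

    def asymmetric_bet_loss(bet, tricks_won, over_penalty=2.0):
        """Squared error between the bet and tricks won, weighted more
        heavily when the bet exceeds the tricks obtained, since an unmet
        bet loses points under the scoring rules."""
        diff = bet - tricks_won
        weight = np.where(diff > 0, over_penalty, 1.0)  # over-bet costs more
        return weight * diff ** 2
    ```

    This mirrors the game's scoring asymmetry: under-betting merely forgoes points, while over-betting actively subtracts them.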

    Baselines

    Random Play – Random Bet: Random Bet selects a random value from {2, 3, 4, 5} during the betting stage, and in each round chooses a random card to play from the set of valid cards.

    Greedy Play – Model Bet: During play, Greedy simply selects the highest card in its hand from the set of valid cards, without consideration of its partner.

    Heuristic Play – Model Bet: A heuristic strategy was defined based on human knowledge of the game. This heuristic player takes into account available knowledge of the team member's actions as well as the opponents' actions.
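    Minimal sketches of the Random and Greedy baselines (the card and hand representations are illustrative):

    ```python
    import random

    def random_bet():
        """Random Bet: uniform choice from {2, 3, 4, 5}."""
        return random.choice([2, 3, 4, 5])

    def valid_cards(hand, lead_suit):
        """A player must follow the lead suit when possible; cards are (rank, suit)."""
        follows = [c for c in hand if c[1] == lead_suit]
        return follows if follows else list(hand)

    def random_play(hand, lead_suit):
        """Random Play: any valid card, uniformly at random."""
        return random.choice(valid_cards(hand, lead_suit))

    def greedy_play(hand, lead_suit):
        """Greedy Play: highest valid card, ignoring the partner entirely."""
        return max(valid_cards(hand, lead_suit))
    ```

    Greedy's weakness is visible even in this sketch: it will overtake a trick its partner has already secured, wasting high cards that the heuristic and learned players hold back.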

    Future Work

    • Examine the approach on similar games such as Spades, Hearts, and Tarneeb
    • Consider invariance discovery for generalization

    References

    [1] M. Campbell, A. J. Hoane Jr., and F. Hsu. Deep Blue. Artificial Intelligence, 134(1-2):57-83, 2002.
    [2] M. Riedmiller. Neural Fitted Q Iteration – first experiences with a data efficient neural reinforcement learning method. European Conference on Machine Learning, 317-328, 2005.
    [3] V. Mnih et al. Human-level control through deep reinforcement learning. Nature, 518:529-533, 2015.
    [4] D. Silver et al. Mastering the game of Go without human knowledge. Nature, 550:354-359, 2017.
