Post on 26-Jan-2021
Sections: Rules of Four Hundred, Card Play, Betting, Baselines, Results, Future Work, References
Hidden Information, Teamwork, and Prediction in Trick-Taking Card Games
Hadi Elzayn, Mikhail Hayhoe, Harshat Kumar, Mohammad Fereydounian
hads@sas.upenn.edu, mhayhoe@seas.upenn.edu, harshat@seas.upenn.edu, mferey@seas.upenn.edu
Results

Head-to-head win percentages against each opponent:

                  Random   Greedy   Heuristic   Over400
Heuristic Win %     100      100        -         10.5
Over400 Win %       100      100       89.5        -
Rules of Four Hundred
• Players sit across from team members and are dealt 13 cards each
• Players bet the expected number of 'tricks' they plan to take
• In each round, after bets are placed:
  • Players take turns choosing a card to play in order, starting with the player who took the previous trick
  • The suit of the first card played determines the 'lead suit'
  • Players must play the 'lead suit' if they have it
  • The winner of the trick is determined by the highest card of the 'lead suit', or the highest card of the 'trump suit' (Hearts) if any were played
• After 13 rounds, the score of each player is increased by the number of tricks they bet, if they met or exceeded it
• The score of a player is decreased by their bet if they were unable to meet it
• The game concludes when one team member has 41 points or more and the other player has positive points
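As a concrete illustration, the trick-winner rule above can be sketched in Python. The `Card` tuple, suit names, and rank convention (2 through 14, with 14 as Ace) are assumptions for illustration, not the authors' implementation:

```python
from collections import namedtuple

# Hypothetical card representation for this sketch.
Card = namedtuple("Card", ["suit", "rank"])  # rank 2..14, 14 = Ace

TRUMP = "Hearts"  # Hearts is the trump suit in Four Hundred

def trick_winner(plays):
    """Given plays as a list of (player, Card) in play order,
    return the player who wins the trick."""
    lead_suit = plays[0][1].suit  # suit of the first card led
    trumps = [(p, c) for p, c in plays if c.suit == TRUMP]
    if trumps:  # any trump beats any non-trump card
        return max(trumps, key=lambda pc: pc[1].rank)[0]
    followers = [(p, c) for p, c in plays if c.suit == lead_suit]
    return max(followers, key=lambda pc: pc[1].rank)[0]
```

For example, a lone low Heart wins a trick led in Spades, even against the Ace of Spades, since Hearts is trump.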
[Figure panels]
(a) Tricks won over 1,000 games, based on bet made; no bets were higher than 6
(b) Histogram showing the difference between tricks won and bet made
(c) Average score showing the performance of the self-play NN model
(d) Loss function for the betting NN, trained in tandem with the playing NN
[Figures: Neural Network Architecture for Card Play; Neural Network Architecture for Betting]
Card Play

Learning Problem: Find an optimal playing policy π_i* to maximize the expected reward for player i, given the play of the others.

Reinforcement Learning Approach:
We define the reward at the end of each round. The approach is informed by Neural Fitted Q-iteration [2], in which the reward is provided as a label for the state representation. We include an indicator term as reward shaping [3] to speed up convergence.

The state representation:
• Similar to AlphaGo [4], we encode the state by a matrix representation, where the location and value indicate the card, the time it was played, and by whom it was played
• Initial bets and the number of tricks won are also encoded in the state representation
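A minimal sketch of such a card-by-card matrix encoding follows. The exact layout is not given in the poster, so the dimensions, column ordering, and the `encode_state` signature here are assumptions:

```python
import numpy as np

NUM_CARDS = 52    # 4 suits x 13 ranks
NUM_PLAYERS = 4
NUM_TRICKS = 13

def card_index(suit, rank):
    """Map a card to a row index; suit in 0..3, rank in 2..14."""
    return suit * 13 + (rank - 2)

def encode_state(played, bets, tricks_won):
    """played: list of (suit, rank, player, trick_number) seen so far.
    Rows are cards; columns record who played each card and in which
    trick. Bets and tricks won are kept in a separate side vector."""
    state = np.zeros((NUM_CARDS, NUM_PLAYERS + NUM_TRICKS))
    for suit, rank, player, trick in played:
        row = card_index(suit, rank)
        state[row, player] = 1.0               # by whom it was played
        state[row, NUM_PLAYERS + trick] = 1.0  # when it was played
    side = np.array(bets + tricks_won, dtype=float)
    return state, side
```

The sparse one-hot layout mirrors the AlphaGo-style idea that position in the matrix, rather than a learned embedding, carries the card's identity and timing.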
Betting

Learning Problem: Find an optimal betting policy β_i* to maximize the expected reward for player i, given the play of the others.

Supervised Learning Approach: Given a particular card-playing strategy and an initial hand, the player can expect to win some number of tricks. We therefore develop a model that predicts the tricks taken by the end of the round, given the observed initial hand composition.

Generate Data: Initial hands serve as input data; the number of tricks won serves as the label.

We implement a neural network for regression, minimizing a loss defined by the squared difference between the score received and the best possible score.

The loss function is asymmetric: it penalizes bets that are higher than the tricks obtained more heavily.
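One way to realize such an asymmetric loss is a weighted squared error. The weight `over_penalty` and the exact functional form are assumptions; the poster does not show the authors' formula:

```python
import numpy as np

def asymmetric_bet_loss(bet, tricks, over_penalty=2.0):
    """Squared-error loss on the bet, weighted more heavily when the
    bet exceeds the tricks actually obtained (a failed bet costs
    points, so overbidding is riskier than underbidding)."""
    bet = np.asarray(bet, dtype=float)
    tricks = np.asarray(tricks, dtype=float)
    err = bet - tricks
    weight = np.where(err > 0, over_penalty, 1.0)  # overbids weighted more
    return np.mean(weight * err ** 2)
```

Under this sketch, overbidding by two tricks costs twice as much as underbidding by two, pushing the regressor toward conservative bets.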
Baselines

Random Play - Random Bet: Random bet selects a random value from {2, 3, 4, 5} during the betting stage, and in each round chooses a random card to play from the set of valid cards.

Greedy Play - Model Bet: During play, greedy simply selects the highest card in its hand from the set of valid cards, without consideration of its partner.

Heuristic Play - Model Bet: A heuristic strategy was defined based on human knowledge of the game. This heuristic player takes into account available knowledge of the team member's actions as well as the opponents' actions.
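The random and greedy baselines above can be sketched directly; the `Card` tuple and the rank convention are assumptions carried over for illustration:

```python
import random
from collections import namedtuple

Card = namedtuple("Card", ["suit", "rank"])  # rank 2..14, 14 = Ace

def valid_cards(hand, lead_suit):
    """Players must follow the lead suit if they can; otherwise
    any card in hand is playable."""
    if lead_suit is not None:
        follow = [c for c in hand if c.suit == lead_suit]
        if follow:
            return follow
    return list(hand)

def random_play(hand, lead_suit):
    return random.choice(valid_cards(hand, lead_suit))

def greedy_play(hand, lead_suit):
    # Highest valid card, ignoring the partner's position in the trick.
    return max(valid_cards(hand, lead_suit), key=lambda c: c.rank)
```

Both baselines share the `valid_cards` filter, so they differ only in how a card is picked from the legal set, which keeps the comparison in the results table about play policy rather than rule compliance.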
Future Work
• Examine the approach on similar games such as Spades, Hearts, and Tarneeb
• Consider invariance discovery for generalization
References
[1] M. Campbell, A. J. Hoane Jr., and F. Hsu. Deep Blue. Artificial Intelligence, 134(1-2):57-83, 2002.
[2] M. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method. European Conference on Machine Learning, 05:317-328, 2005.
[3] V. Mnih et al. Human-level control through deep reinforcement learning. Nature, 518:529-533, 2015.
[4] D. Silver et al. Mastering the game of Go without human knowledge. Nature, 550:354-359, 2017.