The Implementation of Machine Learning in the Game of Checkers
Billy Melicher
Computer Systems lab 08-09
Abstract
• Machine learning uses past information to predict future states
• Can be used in any situation where the past will predict the future
• Will adapt to situations
Introduction
• Checkers is used to explore machine learning
• Checkers has many tactical aspects that make it good for studying
Background
• Minimax
• Heuristics
• Learning
Minimax
• Method of adversarial search
• Every pattern (board) can be given a fitness value (heuristic)
• Each player chooses the outcome that is best for them from the choices they have
Minimax
[Figure: minimax chart, from Wikipedia]
Minimax
• Has an exponential growth rate
• Can only evaluate a certain number of actions into the future (the ply)
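The ply limit can be illustrated with a minimal depth-limited minimax sketch. The `children` and `evaluate` callbacks and the toy tree are hypothetical stand-ins for move generation and the board heuristic; they are not taken from the original project.

```python
def minimax(state, depth, maximizing, children, evaluate):
    """Return the minimax value of `state`, searching `depth` plies ahead."""
    successors = children(state)
    if depth == 0 or not successors:
        # Ply limit reached (or no moves left): fall back on the heuristic.
        return evaluate(state)
    values = [minimax(s, depth - 1, not maximizing, children, evaluate)
              for s in successors]
    return max(values) if maximizing else min(values)

# Toy game tree (assumption): inner nodes are lists, leaves are heuristic values.
tree = [[3, 5], [2, 9], [0, 7]]

def children(node):
    return node if isinstance(node, list) else []

def evaluate(node):
    # Placeholder heuristic: leaves carry their value, inner nodes score 0.
    return node if not isinstance(node, list) else 0

best = minimax(tree, 2, True, children, evaluate)
shallow = minimax(tree, 1, True, children, evaluate)
```

With two plies the search sees the leaves; with one ply it has to rely on the placeholder heuristic for the inner nodes, which is why the quality of the heuristic matters once the ply limit is hit.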
Heuristic
• Heuristics predict the outcome of a board
• Fitness value of a board: the higher the value, the better the outcome
• Not perfect
• Requires expertise in the situation to create
Heuristics
• $H(s) = c_0 F_0(s) + c_1 F_1(s) + \dots + c_n F_n(s)$
• $H(s)$ = heuristic
• Has many different terms
• In checkers, the terms could be:
Number of checkers
Number of kings
Number of checkers on an edge
How far checkers have advanced on the board
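The weighted sum can be sketched in a few lines. This is only an illustration: the square codes, the two feature functions (written here as own-minus-opponent differences), and the weight values are all assumptions, not the project's actual features or weights.

```python
# Assumed square codes (not from the slides): 0 empty, 1 own checker,
# 2 own king, 3 opponent checker, 4 opponent king.
EMPTY, OWN_MAN, OWN_KING, OPP_MAN, OPP_KING = range(5)

def count_checkers(board):
    """F0 (assumed form): own plain checkers minus opponent plain checkers."""
    return board.count(OWN_MAN) - board.count(OPP_MAN)

def count_kings(board):
    """F1 (assumed form): own kings minus opponent kings."""
    return board.count(OWN_KING) - board.count(OPP_KING)

def heuristic(board, weights, features):
    """H(s) = c0*F0(s) + c1*F1(s) + ... + cn*Fn(s)."""
    return sum(c * f(board) for c, f in zip(weights, features))

FEATURES = [count_checkers, count_kings]
WEIGHTS = [1.0, 2.5]  # illustrative values only

# A made-up position on the 32 playable squares.
board = [OWN_MAN] * 3 + [OWN_KING] + [OPP_MAN] * 2 + [EMPTY] * 26
score = heuristic(board, WEIGHTS, FEATURES)
```

Keeping the features as a list of functions means a new term, such as checkers on an edge, can be added without touching `heuristic`.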
Learning by Rote
• Stores every game played
• Connects the moves made for each board
• Relates the moves made from a particular board to the outcome of the board
• More likely to make moves that result in a win, less likely to make moves resulting in a loss
• Good in end game, not as good in mid game
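One way to picture rote learning is a lookup table from (board, move) pairs to recorded outcomes. The `RoteTable` class below is a hypothetical sketch, not the project's data structure; it picks the best recorded move greedily, whereas the slides only say that winning moves become more likely.

```python
from collections import defaultdict

class RoteTable:
    """Maps (board_key, move) to win/loss counts from stored games."""

    def __init__(self):
        self.stats = defaultdict(lambda: [0, 0])  # [wins, losses]

    def record_game(self, history, won):
        """history: list of (board_key, move) pairs made by the learner."""
        for board_key, move in history:
            self.stats[(board_key, move)][0 if won else 1] += 1

    def win_rate(self, board_key, move):
        """Fraction of recorded games won after this move; 0.5 if unseen."""
        wins, losses = self.stats.get((board_key, move), (0, 0))
        total = wins + losses
        return wins / total if total else 0.5

    def best_move(self, board_key, legal_moves):
        """Prefer the legal move with the best recorded win rate."""
        return max(legal_moves, key=lambda m: self.win_rate(board_key, m))

table = RoteTable()
table.record_game([(10, "a"), (11, "b")], won=True)
table.record_game([(10, "c")], won=False)
table.record_game([(10, "a")], won=False)
choice = table.best_move(10, ["c", "a"])
```

Because every distinct board seen needs its own entries, a table like this grows with the number of stored games, which matches the memory cost noted in the results.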
How I store data
• I convert each checkerboard into a 32-digit base-5 number, where each digit corresponds to a playable square and the digit's value corresponds to what occupies that square.
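A minimal sketch of such an encoding follows. The slides do not say which digit value stands for which piece or in what order the squares are packed, so the 0-4 codes and the "first square is the most significant digit" ordering are assumptions.

```python
def encode_board(squares):
    """Pack 32 square codes (each 0-4) into one integer, read as base 5."""
    if len(squares) != 32 or any(not 0 <= s <= 4 for s in squares):
        raise ValueError("expected 32 square codes in the range 0-4")
    key = 0
    for code in squares:
        key = key * 5 + code  # shift left one base-5 digit, then append
    return key

def decode_board(key):
    """Recover the 32 square codes from the packed integer."""
    squares = []
    for _ in range(32):
        key, code = divmod(key, 5)  # peel off the least significant digit
        squares.append(code)
    return squares[::-1]

# Made-up position: 12 pieces of one code, 8 empty squares, 12 of another.
start = [1] * 12 + [0] * 8 + [3] * 12
key = encode_board(start)
```

Note that 5^32 is larger than 2^64, so the full range does not fit in a 64-bit integer; Python integers are arbitrary precision, but other languages would need a wider type or a string key.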
Learning by Generalization
• Uses a heuristic function to guide moves
• Changes the heuristic function after games based on the outcome
• Good in mid game but not as good in early and end games
• Requires identifying the features that affect the game
Development
• Use of the minimax algorithm with alpha-beta pruning
• Use of both learning by rote and learning by generalization
• Temporal difference learning
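For the first item, here is a minimal sketch of minimax with alpha-beta pruning on a toy tree. The `children` and `evaluate` callbacks are hypothetical stand-ins for move generation and the heuristic, and the `alpha` and `beta` parameters here are search bounds, unrelated to the alpha value used in temporal difference learning.

```python
import math

def alphabeta(state, depth, alpha, beta, maximizing, children, evaluate):
    """Minimax value of `state`, skipping branches that cannot change it."""
    successors = children(state)
    if depth == 0 or not successors:
        return evaluate(state)
    if maximizing:
        value = -math.inf
        for s in successors:
            value = max(value, alphabeta(s, depth - 1, alpha, beta, False,
                                         children, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:  # the minimizer already has a better option
                break
        return value
    value = math.inf
    for s in successors:
        value = min(value, alphabeta(s, depth - 1, alpha, beta, True,
                                     children, evaluate))
        beta = min(beta, value)
        if alpha >= beta:  # the maximizer already has a better option
            break
    return value

# Toy tree (assumption): inner nodes are lists, leaves are heuristic values.
tree = [[3, 5], [2, 9], [0, 7]]
visited = []  # leaves actually evaluated, in order

def children(node):
    return node if isinstance(node, list) else []

def evaluate(node):
    visited.append(node)
    return node

result = alphabeta(tree, 2, -math.inf, math.inf, True, children, evaluate)
```

In this toy tree the leaves 9 and 7 are never evaluated, yet the result is the same as plain minimax would give; that saving is what lets the search look more plies ahead in the same time.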
Temporal Difference Learning
• In temporal difference learning, you adjust the heuristic based on the difference between the heuristic at one time and at another
• Equilibrium moves toward ideal function
• $U(s) \leftarrow U(s) + \alpha \, (R(s) + \gamma U(s') - U(s))$
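The update rule can be tried out on a small table of values. This is a sketch under assumptions: the dictionary `U`, the state names, and the default of 0 for unseen states are illustrative choices, not part of the original program.

```python
def td_update(U, s, s_next, reward, alpha, gamma=1.0):
    """Apply U(s) <- U(s) + alpha * (R(s) + gamma * U(s') - U(s)) in place."""
    u_s = U.get(s, 0.0)
    u_next = U.get(s_next, 0.0)
    U[s] = u_s + alpha * (reward + gamma * u_next - u_s)
    return U[s]

# Two made-up states: a mid-game board and a winning end board.
U = {"mid": 0.0, "end": 1.0}
first = td_update(U, "mid", "end", reward=0.0, alpha=0.5)
second = td_update(U, "mid", "end", reward=0.0, alpha=0.5)
```

Each call moves the estimate for the earlier state part of the way toward the estimate for the later state, so repeated updates settle toward an equilibrium.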
Temporal Difference Learning
• No proof that predictions closer to the end of the game are better, but common sense says they are
• Changes heuristic so that it better predicts the value of all boards
• Adjusts the weights of the heuristic
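When the heuristic is a weighted sum of features, one standard way to adjust the weights is the semi-gradient TD(0) rule, sketched below. This is the textbook form, not necessarily the update used in this project, and the feature vectors and numbers are made up for illustration.

```python
def td_weight_update(weights, feats, feats_next, reward, alpha, gamma=1.0):
    """One semi-gradient TD(0) step for H(s) = sum(c_i * F_i(s))."""
    h = sum(c * f for c, f in zip(weights, feats))
    h_next = sum(c * f for c, f in zip(weights, feats_next))
    delta = reward + gamma * h_next - h  # temporal-difference error
    # Each weight moves in proportion to its feature's value in state s.
    return [c + alpha * delta * f for c, f in zip(weights, feats)]

new_weights = td_weight_update(
    weights=[1.0, 2.0],
    feats=[1.0, 0.0],       # features of the current board (made up)
    feats_next=[1.0, 1.0],  # features of the following board (made up)
    reward=0.0,
    alpha=0.1,
)
```

A weight whose feature is zero on the current board is left unchanged, since that feature played no part in the prediction being corrected.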
Alpha Value
• The alpha value reduces how much the heuristic changes, based on how much data you have
• Decreasing returns
• Necessary for ensuring rare occurrences do not change the heuristic too much
Development
• Equation for learning applied to each weight:
• w=(previous-current)(previous+current/2)
• Equation for alpha value:
• $a = 50/(49+n)$
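The alpha schedule is easy to tabulate. The slides do not define n; the sketch below assumes it counts the data seen so far, starting at 1.

```python
def alpha(n):
    """Learning rate a = 50 / (49 + n); n is assumed to count data seen so far."""
    return 50 / (49 + n)

# Alpha at a few sample counts, to show how quickly it falls off.
schedule = {n: alpha(n) for n in (1, 51, 451)}
```

Under that assumption alpha starts at 1, halves by n = 51, and is down to a tenth by n = 451, so later observations move the weights less and less.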
Results
• Value of weight reaches equilibrium
• Changes to reflect the learning of the program
• Occasionally requires programmer intervention when it reaches a false equilibrium
Results
• Learning by rote requires a large data set
• Requires large amounts of memory
• Necessary for determining alpha value in temporal difference learning
Conclusions
• Good way to find equilibrium weight
• Sometimes requires intervention
• Doesn't require much memory
• Substantial learning could be achieved with relatively few runs
• Learning did not require the program to know strategies but does require it to play towards a win