AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search
TRANSCRIPT
AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search

Karel Ha
based on the article by Google DeepMind
Optimization Seminar, 20th April 2016
Why AI?
Applications of AI

- spam filters
- recommender systems (Netflix, YouTube)
- predictive text (SwiftKey)
- audio recognition (Shazam, SoundHound)
- self-driving cars
Artistic-Style Painting (1/2)

[1] Gatys, Ecker, and Bethge 2015; [2] Li and Wand 2016
Artistic-Style Painting (2/2)

Champandard 2016
C Code Generated Character by Character

Karpathy 2015
Algebraic Geometry Generated Character by Character

Karpathy 2015
Game of Thrones Generated Character by Character
JON
He leaned close and onions, barefoot from
his shoulder. “I am not a purple girl,” he
said as he stood over him. “The sight of
you sell your father with you a little choice.”
“I say to swear up his sea or a boy of stone
and heart, down,” Lord Tywin said. “I love
your word or her to me.”
Darknet (on Linux)
JON
Each in days and the woods followed his
king. “I understand.”
“I am not your sister Lord Robert?”
“The door was always some cellar to do his
being girls and the Magnar of Baratheon,
and there were thousands of every bite of
half the same as though he was not a great
knight should be seen, and not to look at
the Redwyne two thousand men.”
Darknet (on OS X)
http://pjreddie.com/darknet/rnns-in-darknet/
DeepDrumpf: a Twitter bot / neural network which learned the language of Donald Trump from his speeches

- We’ve got nuclear weapons that are obsolete. I’m going to create jobs just by making the worst thing ever.
- The biggest risk to the world, is me, believe it or not.
- I am what ISIS doesn’t need.
- I’d like to beat that @HillaryClinton. She is a horror. I told my supporter Putin to say that all the time. He has been amazing.
- I buy Hillary, it’s beautiful and I’m happy about it.

Hayes 2016
Atari Player by Google DeepMind

https://youtu.be/0X-NdPtFKq0?t=21m13s

Mnih et al. 2015
Heads-up Limit Hold’em Poker Is Solved!

Cepheus: http://poker.srv.ualberta.ca/
exploitable by at most 0.000986 big blinds per game on expectation

Bowling et al. 2015
Basics of Machine Learning
https://dataaspirant.com/2014/09/19/supervised-and-unsupervised-learning/
Supervised Learning (SL)

1. data collection: Google Search, Facebook “Likes”, Siri, Netflix, YouTube views, LHC collisions, KGS Go Server...
2. training on the training set
3. testing on the testing set
4. deployment

http://www.nickgillian.com/
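The train-and-test steps above can be sketched in a few lines; the toy dataset and the nearest-neighbour model here are invented for illustration:

```python
import random

random.seed(0)

# 1. "Collected" data: points in [0, 1] labelled by which side of 0.5 they lie on.
data = [(x, int(x > 0.5)) for x in (random.random() for _ in range(200))]

# 2./3. Split into a training set and a testing set.
train, test = data[:150], data[150:]

def predict(x):
    """A trivial model: copy the label of the nearest training point (1-NN)."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

accuracy = sum(predict(x) == y for x, y in test) / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

Only points very close to the 0.5 boundary can be misclassified, so the held-out accuracy is near 1.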
Regression
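A concrete instance of regression: fitting a line y = ax + b by least squares. The data points are invented for the example:

```python
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]  # roughly y = 2x + 1 with noise

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
# Slope and intercept from the normal equations.
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x
print(f"fit: y = {a:.2f}x + {b:.2f}")  # → fit: y = 1.99x + 1.04
```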
Mathematical Regression
https://thermanuals.wordpress.com/descriptive-analysis/sampling-and-regression/
Classification
https://kevinbinz.files.wordpress.com/2014/08/ml-svm-after-comparison.png
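A minimal classifier, far simpler than the SVM in the linked figure: assign each point to the nearest class centroid. The 2-D data are invented:

```python
# Two labelled clusters of 2-D points (toy data).
class_a = [(1.0, 1.2), (0.8, 1.0), (1.3, 0.9)]
class_b = [(3.0, 3.1), (3.2, 2.8), (2.9, 3.3)]

def centroid(points):
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

ca, cb = centroid(class_a), centroid(class_b)

def classify(p):
    """Label a point by its squared distance to each class centroid."""
    da = (p[0] - ca[0]) ** 2 + (p[1] - ca[1]) ** 2
    db = (p[0] - cb[0]) ** 2 + (p[1] - cb[1]) ** 2
    return "A" if da < db else "B"

print(classify((1.1, 1.1)))  # → A
print(classify((3.0, 3.0)))  # → B
```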
Underfitting and Overfitting

Beware of overfitting!
It is like learning for a mathematics exam by memorizing proofs.

https://www.researchgate.net/post/How_to_Avoid_Overfitting
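The exam analogy can be made concrete: a model that memorizes the training set (“learns the proofs by heart”) gets zero training error, yet a worse test error than a simple rule. Data and models below are invented for illustration:

```python
# Noisy samples of the underlying rule y = 2x (invented numbers).
train = [(0.0, 0.4), (0.25, 0.2), (0.5, 1.4), (0.75, 1.2), (1.0, 2.3)]
# Fresh test points that lie exactly on the underlying rule.
test = [(0.1, 0.2), (0.4, 0.8), (0.6, 1.2), (0.9, 1.8)]

def memorize(x):
    """Overfit: return the label of the nearest memorized training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def simple(x):
    """The underlying trend, deliberately ignoring the noise."""
    return 2 * x

def mse(predict, data):
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

print(mse(memorize, train))  # 0.0 — the training set is reproduced perfectly
print(mse(memorize, test))   # > 0 — but it generalizes poorly
print(mse(simple, test))     # 0.0 — the simple rule generalizes
```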
Reinforcement Learning (RL)

Especially: games of self-play

https://youtu.be/0X-NdPtFKq0?t=16m57s
Monte Carlo Tree Search
Tree Search

The optimal value v*(s) determines the outcome of the game:

- from every board position or state s
- under perfect play by all players.

It is computed by recursively traversing a search tree containing approximately b^d possible sequences of moves, where

- b is the game’s breadth (number of legal moves per position)
- d is its depth (game length)

Silver et al. 2016
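The recursive traversal can be sketched on a game small enough to search exhaustively. The take-1-or-2-stones game below is an invented stand-in, not Go:

```python
from functools import lru_cache

# Exhaustive tree search for v*(s) in a tiny two-player game:
# players alternately remove 1 or 2 stones; taking the last stone wins.
@lru_cache(maxsize=None)
def optimal_value(stones):
    """+1 if the player to move wins under perfect play, -1 otherwise."""
    if stones == 0:
        return -1  # the opponent just took the last stone: we lost
    # Try every legal move; a child's value is negated from our perspective.
    return max(-optimal_value(stones - take) for take in (1, 2) if take <= stones)

print([optimal_value(n) for n in range(1, 8)])
# → [1, 1, -1, 1, 1, -1, 1]  (multiples of 3 are losses for the mover)
```

Even here the raw tree has roughly b^d = 2^d nodes; memoization keeps it tractable, which is exactly what fails for Go-sized b and d.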
Game tree of Go

Sizes of trees for various games:

- chess: b ≈ 35, d ≈ 80
- Go: b ≈ 250, d ≈ 150 ⇒ more positions than atoms in the universe!

That makes Go a googol (10^100) times more complex than chess.

https://deepmind.com/alpha-go.html

How to handle the size of the game tree?

- for the breadth: a neural network to select moves
- for the depth: a neural network to evaluate the current position
- for the tree traversal: Monte Carlo tree search (MCTS)

Allis et al. 1994
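The b^d estimates are easy to check in orders of magnitude:

```python
import math

# Exponent of 10 for b^d, using the slide's estimates of b and d.
chess = 80 * math.log10(35)    # chess tree: about 10^124 move sequences
go = 150 * math.log10(250)     # Go tree: about 10^360 move sequences

print(int(chess), int(go))  # → 123 359
```

The gap of over 200 orders of magnitude is why brute-force traversal is hopeless for Go.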
Monte Carlo tree search
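The four MCTS phases (selection, expansion, simulation, backpropagation) can be sketched with UCB1 on the invented take-1-or-2-stones game; this is an illustration, not AlphaGo’s actual search:

```python
import math
import random

random.seed(0)

# Toy game: players alternately take 1 or 2 stones; taking the last stone wins.

class Node:
    def __init__(self, stones, parent=None):
        self.stones = stones    # stones left; it is someone's turn to move here
        self.parent = parent
        self.children = {}      # move -> child Node
        self.visits = 0
        self.wins = 0           # wins for the player whose move led INTO this node

def legal_moves(stones):
    return [m for m in (1, 2) if m <= stones]

def rollout_value(stones):
    """Value of the position for the player to move, under random play."""
    if stones == 0:
        return -1               # the opponent just took the last stone
    sign = +1
    while True:
        stones -= random.choice(legal_moves(stones))
        if stones == 0:
            return sign         # the player who just moved wins
        sign = -sign

def mcts(root_stones, iterations=2000):
    root = Node(root_stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend via UCB1 while the node is fully expanded.
        while node.stones > 0 and len(node.children) == len(legal_moves(node.stones)):
            node = max(node.children.values(),
                       key=lambda c: c.wins / c.visits
                       + math.sqrt(2 * math.log(node.visits) / c.visits))
        # 2. Expansion: add one untried move, unless the node is terminal.
        if node.stones > 0:
            move = random.choice([m for m in legal_moves(node.stones)
                                  if m not in node.children])
            node.children[move] = Node(node.stones - move, node)
            node = node.children[move]
        # 3. Simulation: random playout from the new node.
        value = rollout_value(node.stones)
        # 4. Backpropagation: flip the sign at each level up the tree.
        while node is not None:
            node.visits += 1
            if value == -1:     # the mover here loses, so the move into here won
                node.wins += 1
            value = -value
            node = node.parent
    # Recommend the most visited move from the root.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

print(mcts(5))  # from 5 stones the winning move is to take 2, leaving 3
```

Visits concentrate on the winning move without ever enumerating the full tree, which is the point of MCTS.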
Neural Networks
Neural Networks (NN): Inspiration

- inspired by the neuronal structure of the mammalian cerebral cortex
- but on much smaller scales
- suitable to model systems with a high tolerance to error
  - e.g. audio or image recognition

http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html
Neural Networks: Modes

Two modes:

- feedforward for making predictions
- backpropagation for learning

Dieterle 2003
Neural Networks: an Example of Feedforward
http://stevenmiller888.github.io/mind-how-to-build-a-neural-network/
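A single feedforward pass through a tiny 2-2-1 network with sigmoid units; the weights are made up for illustration (the linked post uses its own numbers):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def forward(x1, x2):
    # Hidden layer: two neurons, each a weighted sum passed through a sigmoid.
    h1 = sigmoid(0.8 * x1 + 0.2 * x2)
    h2 = sigmoid(0.4 * x1 + 0.9 * x2)
    # Output layer: one neuron combining the hidden activations.
    return sigmoid(0.3 * h1 + 0.5 * h2)

print(round(forward(1.0, 1.0), 2))  # ≈ 0.65
```

Prediction is nothing more than this chain of weighted sums and nonlinearities, layer by layer.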
Gradient Descent in Neural Networks

Motto: “Learn by mistakes!”

However, error functions are not necessarily convex, nor this “smooth”.

http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html
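The update rule itself is simple; here it is on a deliberately smooth, convex toy error E(w) = (w - 3)^2, which real neural-network error surfaces (as the slide warns) are not:

```python
# Gradient descent: repeatedly step against the gradient of the error.
w = 0.0
learning_rate = 0.1
for _ in range(100):
    gradient = 2 * (w - 3)    # dE/dw for E(w) = (w - 3)^2
    w -= learning_rate * gradient
print(round(w, 4))  # → 3.0, the minimum of E
```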
![Page 75: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/75.jpg)
Convolutional Neural Networks (CNN or ConvNet)
http://code.flickr.net/2014/10/20/introducing-flickr-park-or-bird/ 21
![Page 76: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/76.jpg)
(Deep) Convolutional Neural Networks
The hierarchy of concepts is captured in the number of layers: the deep in “Deep Learning”.
http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html 22
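The basic building block of such a network is the convolution itself. The sketch below (illustrative only, not AlphaGo's 13-layer network) slides a hand-crafted 3x3 edge filter over a tiny "image"; stacking many learned filters layer upon layer is what produces the concept hierarchy.

```python
import numpy as np

def conv2d(image, kernel):
    # Valid (no-padding) 2D convolution: slide the kernel over the image
    # and record the weighted sum at each location.
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = (image[i:i+kh, j:j+kw] * kernel).sum()
    return out

image = np.zeros((6, 6))
image[:, 3:] = 1.0                      # left half dark, right half bright
vertical_edge = np.array([[-1, 0, 1],
                          [-1, 0, 1],
                          [-1, 0, 1]], dtype=float)
fmap = conv2d(image, vertical_edge)     # feature map: strong response at the edge
```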
![Page 78: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/78.jpg)
Rules of Go
![Page 79: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/79.jpg)
Backgammon: Man vs. Fate
Chess: Man vs. Man
22
![Page 81: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/81.jpg)
Go: Man vs. Self
Robert Samal (White) versus Karel Kral (Black), Spring School of Combinatorics 2016 22
![Page 82: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/82.jpg)
Rules of Go
Black versus White. Black starts the game.
• the rule of liberty
• the "ko" rule
Handicap for difference in ranks: Black can place 1 or more stones
in advance (compensation for White’s greater strength).
23
![Page 88: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/88.jpg)
Scoring Rules: Area Scoring
A player’s score is:
• the number of stones that the player has on the board
• plus the number of empty intersections surrounded by that player's stones
• plus komi (komidashi) points for the White player,
which compensate for the Black player's first-move advantage
https://en.wikipedia.org/wiki/Go_(game) 24
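The scoring rule above can be sketched as code. The board and its positions below are hypothetical, and the flood-fill is a simplification (it ignores life-and-death, seki, etc.), but it captures the definition: own stones, plus empty regions bordered only by the player's stones, plus komi for White.

```python
def area_score(board, player, komi=0.0):
    # board: square grid of "B", "W", or "." (empty).
    size = len(board)
    stones = sum(row.count(player) for row in board)
    seen = set()
    territory = 0
    for r in range(size):
        for c in range(size):
            if board[r][c] != "." or (r, c) in seen:
                continue
            # Flood-fill the empty region and record which colours border it.
            region, borders, stack = [], set(), [(r, c)]
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                region.append((y, x))
                for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                    if 0 <= ny < size and 0 <= nx < size:
                        if board[ny][nx] == "." and (ny, nx) not in seen:
                            seen.add((ny, nx))
                            stack.append((ny, nx))
                        elif board[ny][nx] != ".":
                            borders.add(board[ny][nx])
            # The region counts only if it is surrounded by this player alone.
            if borders == {player}:
                territory += len(region)
    return stones + territory + (komi if player == "W" else 0.0)

board = [list(row) for row in [".B.W.",
                               ".B.W.",
                               ".B.W.",
                               ".B.W.",
                               ".B.W."]]
black = area_score(board, "B")              # 5 stones + 5 territory
white = area_score(board, "W", komi=7.5)    # 5 stones + 5 territory + komi
```

Note that the middle column borders both colours, so it counts for neither player.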
![Page 92: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/92.jpg)
Ranks of Players
Kyu and Dan ranks
or alternatively, Elo ratings
https://en.wikipedia.org/wiki/Go_(game) 25
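The Elo alternative mentioned above has a simple closed form: the expected score of a player follows a logistic curve in the rating difference, and ratings move toward the observed results. A minimal sketch (the ratings and K-factor below are arbitrary examples):

```python
def elo_expected(rating_a, rating_b):
    # Expected score of A against B; a 400-point gap means roughly 10:1 odds.
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating, expected, actual, k=32):
    # Move the rating toward the observed result (actual in {0, 0.5, 1}).
    return rating + k * (actual - expected)

e = elo_expected(2000, 1600)     # a strong favourite...
new = elo_update(2000, e, 0.0)   # ...loses rating points after losing the game
```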
![Page 94: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/94.jpg)
Chocolate micro-break
25
![Page 95: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/95.jpg)
AlphaGo: Inside Out
![Page 96: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/96.jpg)
Policy and Value Networks
Silver et al. 2016 26
![Page 97: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/97.jpg)
Training the (Deep Convolutional) Neural Networks
Silver et al. 2016 27
![Page 98: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/98.jpg)
SL Policy Network (1/2)
• 13-layer deep convolutional neural network
• goal: to predict expert human moves
• a task of classification
• trained on 30 million positions from the KGS Go Server
• stochastic gradient ascent:
∆σ ∝ ∂ log pσ(a|s) / ∂σ
(to maximize the likelihood of the human move a selected in state s)
Results:
• 44.4% accuracy (the state of the art from other groups)
• 55.7% accuracy (raw board position + move history as input)
• 57.0% accuracy (all input features)
Silver et al. 2016 28
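The ascent rule above can be made concrete with a toy stand-in for the network. Below, a linear softmax "policy" over 4 hypothetical moves replaces the real 13-layer ConvNet; for a softmax, the gradient of log pσ(a|s) has the well-known form outer(s, one_hot(a) − pσ(·|s)).

```python
import numpy as np

sigma = np.zeros((3, 4))   # toy weights: 3 state features -> 4 candidate moves

def p_sigma(s):
    # Softmax policy over the 4 moves for state features s.
    logits = s @ sigma
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def sgd_ascent_step(s, a, lr=0.5):
    # One step of stochastic gradient ASCENT on log p_sigma(a|s):
    # for a linear softmax, the gradient is outer(s, one_hot(a) - p_sigma(s)).
    global sigma
    probs = p_sigma(s)
    one_hot = np.eye(4)[a]
    sigma += lr * np.outer(s, one_hot - probs)

s = np.array([1.0, 0.0, 1.0])   # a fixed toy "position"
expert_move = 2                 # the move the (hypothetical) human expert chose
before = p_sigma(s)[expert_move]
for _ in range(100):
    sgd_ascent_step(s, expert_move)
after = p_sigma(s)[expert_move]
```

Each step pushes probability mass toward the expert's move, exactly the maximum-likelihood objective stated above.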
![Page 108: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/108.jpg)
SL Policy Network (2/2)
Small improvements in accuracy led to large improvements
in playing strength
Silver et al. 2016 29
![Page 109: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/109.jpg)
Training the (Deep Convolutional) Neural Networks
Silver et al. 2016 30
![Page 110: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/110.jpg)
Rollout Policy
• Rollout policy pπ(a|s) is faster but less accurate than the SL
policy network.
• accuracy of 24.2%
• It takes 2 µs to select an action, compared to 3 ms for the
SL policy network.
Silver et al. 2016 31
![Page 113: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/113.jpg)
Training the (Deep Convolutional) Neural Networks
Silver et al. 2016 32
![Page 114: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/114.jpg)
RL Policy Network (1/2)
• identical in structure to the SL policy network
• goal: to win games of self-play
• a task of classification
• weights ρ initialized to the SL network's values, ρ := σ
• games of self-play
• between the current RL policy network and a randomly
selected previous iteration
• to prevent overfitting to the current policy
• stochastic gradient ascent:
∆ρ ∝ ∂ log pρ(at|st) / ∂ρ · zt
at time step t, where the reward zt is +1 for winning and −1 for losing.
Silver et al. 2016 33
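The update above is a REINFORCE-style rule: the log-likelihood gradient is scaled by the outcome zt, so winning moves are reinforced and losing moves suppressed. A toy sketch with a 3-action softmax policy standing in for the real network (the actions and outcomes below are made up):

```python
import numpy as np

rho = np.zeros(3)   # toy policy weights: one logit per action

def p_rho():
    # Softmax policy over the 3 actions.
    exp = np.exp(rho - rho.max())
    return exp / exp.sum()

def reinforce_step(action, z, lr=0.1):
    # Delta rho ∝ z_t * d log p_rho(a_t|s_t) / d rho.
    # For a softmax, d log p(a) / d rho = one_hot(a) - p_rho.
    global rho
    grad_log = np.eye(3)[action] - p_rho()
    rho += lr * z * grad_log

# Pretend action 0 tends to win (z = +1) and action 1 tends to lose (z = -1):
for _ in range(200):
    reinforce_step(action=0, z=+1.0)
    reinforce_step(action=1, z=-1.0)

probs = p_rho()   # the policy now strongly prefers the winning action
```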
![Page 124: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/124.jpg)
RL Policy Network (2/2)
Results (by sampling each move at ∼ pρ(·|st)):
• 80% win rate against the SL policy network
• 85% win rate against the strongest open-source Go program,
Pachi (Baudis and Gailly 2011)
• The previous state of the art, based only on SL of a CNN:
11% "win" rate against Pachi
Silver et al. 2016 34
![Page 128: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/128.jpg)
Training the (Deep Convolutional) Neural Networks
Silver et al. 2016 35
![Page 129: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/129.jpg)
Value Network (1/2)
• similar architecture to the policy network, but outputs a single
prediction instead of a probability distribution
• goal: to estimate a value function
vp(s) = E[zt | st = s, at...T ∼ p]
that predicts the outcome from position s (of games played
using policy p)
• double approximation: vθ(s) ≈ vpρ(s) ≈ v∗(s)
• a task of regression
• stochastic gradient descent:
∆θ ∝ ∂vθ(s) / ∂θ · (z − vθ(s))
(to minimize the mean squared error (MSE) between the predicted vθ(s) and the true z)
Silver et al. 2016 36
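The regression update above can be sketched with a toy value function in place of the real ConvNet. Here vθ(s) = tanh(s·θ) squashes a linear score into (−1, 1) to mimic a win/loss value; the positions, features, and outcomes are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(4)   # toy value-function weights

def v_theta(s):
    # Toy value function: tanh keeps the prediction in (-1, 1), like z.
    return float(np.tanh(s @ theta))

def sgd_descent_step(s, z, lr=0.1):
    # Delta theta ∝ (z - v_theta(s)) * d v_theta(s) / d theta,
    # i.e. one stochastic step DOWN the MSE gradient.
    global theta
    v = v_theta(s)
    dv_dtheta = (1 - v**2) * s            # derivative of tanh(s . theta)
    theta += lr * (z - v) * dv_dtheta

# Synthetic data: positions (feature vectors) with outcomes z in {-1, +1}.
positions = rng.normal(size=(50, 4))
outcomes = np.sign(positions @ np.array([1.0, -2.0, 0.5, 0.0]))

def mse():
    return float(np.mean([(z - v_theta(s))**2 for s, z in zip(positions, outcomes)]))

before = mse()
for _ in range(200):
    for s, z in zip(positions, outcomes):
        sgd_descent_step(s, z)
after = mse()
```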
![Page 133: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/133.jpg)
Value Network (1/2)
� similar architecture to the policy network, but outputs a single
prediction instead of a probability distribution
� goal: to estimate a value function
vp(s) = E[zt |st = s, at...T ∼ p]
that predicts the outcome from position s (of games played
by using policy p)
� Double approximation: vθ(s) ≈ vpρ(s) ≈ v∗(s).
� task of regression
� stochastic gradient descent:
∆θ ∝ ∂vθ(s)
∂θ(z − vθ(s))
(to minimize the mean squared error (MSE) between the predicted vθ(s) and the true z)
Silver et al. 2016 36
![Page 134: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/134.jpg)
Value Network (2/2)
Beware of overfitting!
• Consecutive positions are strongly correlated.
• The value network memorized the game outcomes, rather than
generalizing to new positions.
• Solution: generate 30 million (new) positions, each sampled
from a separate game
• almost the accuracy of Monte Carlo rollouts (using pρ), but
with 15,000 times less computation!
Silver et al. 2016 37
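The de-correlation trick above amounts to drawing a single training position per game. A minimal sketch with hypothetical game records (each game represented as a list of position labels):

```python
import random

def one_position_per_game(games, seed=0):
    # Sample exactly one position from each game, so no two training
    # examples come from the same (strongly correlated) game.
    rng = random.Random(seed)
    return [rng.choice(game) for game in games]

# Hypothetical game records: 5 games of 200 positions each.
games = [["pos_%d_%d" % (g, t) for t in range(200)] for g in range(5)]
sample = one_position_per_game(games)   # 5 positions, one per game
```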
Evaluation Accuracy in Various Stages of a Game
Move number is the number of moves that had been played in the given position.
Each position evaluated by:
• forward pass of the value network vθ
• 100 rollouts, played out using the corresponding policy
Silver et al. 2016 38
Elo Ratings for Various Combinations of Networks
Silver et al. 2016 39
The Main Algorithm
Silver et al. 2016 39
MCTS Algorithm
The next action is selected by lookahead search, using simulation:
1. selection phase
2. expansion phase
3. evaluation phase
4. backup phase (at the end of all simulations)
Each edge (s, a) keeps:
• an action value Q(s, a)
• a visit count N(s, a)
• a prior probability P(s, a) (from the SL policy network pσ)
The tree is traversed by simulation (descending the tree) from the
root state.
Silver et al. 2016 40
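The per-edge statistics listed above can be sketched as a small Python class. This is an illustrative skeleton, not DeepMind's implementation: Q(s, a) is stored implicitly as the running sum of backed-up leaf values divided by the visit count.

```python
from dataclasses import dataclass, field

@dataclass
class Edge:
    """Statistics MCTS keeps for each (state, action) edge."""
    prior: float              # P(s, a), from the SL policy network
    visits: int = 0           # N(s, a)
    total_value: float = 0.0  # sum of leaf evaluations backed up so far

    @property
    def q(self):
        """Q(s, a): mean evaluation of simulations through this edge."""
        return self.total_value / self.visits if self.visits else 0.0

@dataclass
class Node:
    edges: dict = field(default_factory=dict)  # action -> Edge
```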
MCTS Algorithm: Selection
At each time step t, an action at is selected from state st:
at = argmaxa (Q(st, a) + u(st, a))
where the bonus
u(st, a) ∝ P(s, a) / (1 + N(s, a))
Silver et al. 2016 41
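The selection rule can be sketched directly from the formula. A bare sketch: edges are plain dicts with keys P (prior), N (visit count), and W (total backed-up value, so Q = W/N), and the proportionality constant is folded into c.

```python
def select_action(node, c=5.0):
    """Pick argmax_a Q(s,a) + u(s,a), with u proportional to P/(1+N).

    c scales the exploration bonus: high-prior, rarely-visited moves
    get a boost that decays as their visit count grows.
    """
    def score(edge):
        q = edge["W"] / edge["N"] if edge["N"] else 0.0  # Q(s, a)
        u = c * edge["P"] / (1 + edge["N"])              # exploration bonus
        return q + u
    return max(node, key=lambda a: score(node[a]))

node = {
    "A": {"P": 0.6, "N": 10, "W": 5.0},  # Q = 0.5, small decayed bonus
    "B": {"P": 0.3, "N": 0, "W": 0.0},   # unvisited: bonus 5 * 0.3 = 1.5
}
best = select_action(node)  # "B": its fresh bonus beats A's Q + bonus
```

Once B has been visited many times without paying off, its bonus decays and selection switches back to the better-scoring A.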
MCTS Algorithm: Expansion
A leaf position may be expanded (just once) by the SL policy network pσ.
The output probabilities are stored as priors P(s, a) := pσ(a|s).
Silver et al. 2016 42
MCTS: Evaluation
A leaf position is evaluated in two ways:
• by the value network vθ(s)
• by the outcome z of a rollout played with the fast rollout policy pπ until the end of the game
Using a mixing parameter λ, the final leaf evaluation V (s) is
V (s) = (1 − λ)vθ(s) + λz
Silver et al. 2016 43
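The mixed leaf evaluation is a one-liner; here is a sketch with stand-in values (the paper reports that λ = 0.5 performed best):

```python
def leaf_evaluation(v_theta, z, lam=0.5):
    """V(s) = (1 - lam) * v_theta(s) + lam * z.

    lam = 0 trusts the value network alone, lam = 1 trusts the rollout
    outcome alone; AlphaGo mixed the two (lam = 0.5 in the paper).
    """
    return (1 - lam) * v_theta + lam * z

V = leaf_evaluation(v_theta=0.4, z=1.0)  # value net mildly positive, rollout won
```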
![Page 162: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/162.jpg)
MCTS: Evaluation
� evaluation from the value network vθ(s)
� evaluation by the outcome z using the fast rollout policy pπ until the end of game
Using a mixing parameter λ, the final leaf evaluation V (s) is
V (s) = (1− λ)vθ(s) + λz
Silver et al. 2016 43
![Page 163: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/163.jpg)
MCTS: Evaluation
� evaluation from the value network vθ(s)
� evaluation by the outcome z using the fast rollout policy pπ until the end of game
Using a mixing parameter λ, the final leaf evaluation V (s) is
V (s) = (1− λ)vθ(s) + λz
Silver et al. 2016 43
![Page 164: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/164.jpg)
MCTS: Evaluation
� evaluation from the value network vθ(s)
� evaluation by the outcome z using the fast rollout policy pπ until the end of game
Using a mixing parameter λ, the final leaf evaluation V (s) is
V (s) = (1− λ)vθ(s) + λz
Silver et al. 2016 43
![Page 165: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/165.jpg)
MCTS: Evaluation
� evaluation from the value network vθ(s)
� evaluation by the outcome z using the fast rollout policy pπ until the end of game
Using a mixing parameter λ, the final leaf evaluation V (s) is
V (s) = (1− λ)vθ(s) + λz
Silver et al. 2016 43
![Page 166: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/166.jpg)
MCTS: Backup
At the end of a simulation, each traversed edge is updated by accumulating:
• the action value Q
• the visit count N
Silver et al. 2016 44
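A sketch of the backup step, again using per-edge dicts (N for visit count, W for total value, so Q = W/N is updated implicitly by accumulating both):

```python
def backup(path, V):
    """Propagate the leaf evaluation V up the traversed path.

    Each edge on the path gains one visit and accumulates the leaf
    value; the action value Q(s, a) is recovered as the mean W / N.
    """
    for edge in path:
        edge["N"] += 1
        edge["W"] += V

path = [{"N": 3, "W": 1.5}, {"N": 0, "W": 0.0}]  # root edge, then leaf edge
backup(path, V=1.0)
```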
Once the search is complete, the algorithm
chooses the most visited move from the root
position.
Silver et al. 2016 44
Percentage of Simulations
The percentage frequency with which actions were selected from the root during simulations.
Silver et al. 2016 45
Principal Variation (Path with Maximum Visit Count)
The moves are presented in a numbered sequence.
• AlphaGo selected the move indicated by the red circle;
• Fan Hui responded with the move indicated by the white square;
• in his post-game commentary, he preferred the move (labelled 1) predicted by AlphaGo.
Silver et al. 2016 46
Scalability
• asynchronous multi-threaded search
• simulations on CPUs
• computation of neural networks on GPUs
AlphaGo:
• 40 search threads
• 40 CPUs
• 8 GPUs
Distributed version of AlphaGo (on multiple machines):
• 40 search threads
• 1,202 CPUs
• 176 GPUs
Silver et al. 2016 47
Elo Ratings for Various Combinations of Threads
Silver et al. 2016 48
Results: the strength of AlphaGo
Tournament with Other Go Programs
Silver et al. 2016 49
Fan Hui
• professional 2 dan
• European Go Champion in 2013, 2014 and 2015
• European Professional Go Champion in 2016
• biological neural network:
• 100 billion neurons
• 100 to 1,000 trillion neuronal connections
https://en.wikipedia.org/wiki/Fan_Hui 50
AlphaGo versus Fan Hui
AlphaGo won 5:0 in a formal match in October 2015.
[AlphaGo] is very strong and stable, it seems
like a wall. ... I know AlphaGo is a computer,
but if no one told me, maybe I would think
the player was a little strange, but a very
strong player, a real person.
Fan Hui
51
Lee Sedol “The Strong Stone”
• professional 9 dan
• ranked 2nd in international titles
• the 5th youngest (12 years 4 months) to become
a professional Go player in South Korean history
• Lee Sedol would win 97 out of 100 games against Fan Hui.
• biological neural network comparable to Fan Hui’s (in number
of neurons and connections)
https://en.wikipedia.org/wiki/Lee_Sedol 52
I heard Google DeepMind’s AI is surprisingly
strong and getting stronger, but I am
confident that I can win, at least this time.
Lee Sedol
...even beating AlphaGo by 4:1 may allow
the Google DeepMind team to claim its de
facto victory and the defeat of him
[Lee Sedol], or even humankind.
interview in JTBC Newsroom
52
![Page 208: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/208.jpg)
AlphaGo versus Lee Sedol
In March 2016, AlphaGo won 4:1 against the legendary Lee Sedol.
AlphaGo won all but the 4th game; all games were won
by resignation.
The winner of the match was slated to win $1 million.
Since AlphaGo won, Google DeepMind stated that the prize would be
donated to charities, including UNICEF, and to Go organisations.
Lee received $170,000 ($150,000 for participating in all five
games, plus an additional $20,000 for each game won).
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol 53
![Page 214: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/214.jpg)
Who’s next?
53
![Page 216: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/216.jpg)
AlphaGo versus Ke Jie?
• professional 9 dan
• 1st on the (unofficial) world ranking list
• the youngest player to win 3 major international tournaments
• head-to-head record of 8:2 against Lee Sedol
• biological neural network comparable to Fan Hui’s, and thus,
by transitivity, also comparable to Lee Sedol’s
https://en.wikipedia.org/wiki/Ke_Jie 54
![Page 222: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/222.jpg)
I believe I can beat it. Machines can be very
strong in many aspects but still have
loopholes in certain calculations.
Ke Jie
Now facing AlphaGo, I do not feel the same
strong instinct of victory when I play a
human player, but I still believe I have the
advantage against it. It’s 60 percent in
favor of me.
Ke Jie
Even though AlphaGo may have defeated
Lee Sedol, it won’t beat me.
Ke Jie
54
![Page 225: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/225.jpg)
Conclusion
![Page 226: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/226.jpg)
Difficulties of Go
• challenging decision-making
• intractable search space
• complex optimal solution
It appears infeasible to approximate the optimal solution directly
using a policy or value function!
Silver et al. 2016 55
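The scale of the "intractable search space" can be made concrete with the classic b^d back-of-the-envelope estimate. The figures below (chess: b≈35, d≈80; Go: b≈250, d≈150) are commonly cited approximations, not numbers from the slides:

```python
import math

def log10_tree_size(branching: float, depth: int) -> float:
    """Return log10(branching ** depth): the number of decimal
    digits in the naive game-tree size estimate."""
    return depth * math.log10(branching)

chess = log10_tree_size(35, 80)    # ~123.5, i.e. chess tree ~10^123
go = log10_tree_size(250, 150)     # ~359.7, i.e. Go tree ~10^360

print(f"chess ~10^{chess:.0f}, Go ~10^{go:.0f}")
```

The gap of more than 200 orders of magnitude is why exhaustive search, which was viable enough for chess-style pruning approaches, is hopeless for Go.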
![Page 229: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/229.jpg)
AlphaGo: summary
• Monte Carlo tree search
• effective move selection and position evaluation
  • through deep convolutional neural networks
  • trained by a novel combination of supervised and reinforcement
  learning
• new search algorithm combining
  • neural network evaluation
  • Monte Carlo rollouts
• scalable implementation
  • multi-threaded simulations on CPUs
  • parallel GPU computations
  • distributed version over multiple machines
Silver et al. 2016 56
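The "combining neural network evaluation and Monte Carlo rollouts" point can be sketched concretely: at each search-tree leaf, Silver et al. (2016) mix the value-network estimate with the fast-rollout outcome using a mixing parameter λ (0.5 in the paper). A minimal sketch:

```python
def leaf_value(value_net_estimate: float, rollout_outcome: float,
               lam: float = 0.5) -> float:
    """Mixed leaf evaluation V(s) = (1 - lam) * v_theta(s) + lam * z,
    where v_theta(s) is the value network's estimate and z is the
    outcome (+1 win / -1 loss) of a fast rollout from s."""
    return (1 - lam) * value_net_estimate + lam * rollout_outcome

# Example: the value net is optimistic (+0.8) but the rollout was lost (-1);
# the mixed evaluation lands slightly negative.
mixed = leaf_value(0.8, -1.0)
print(mixed)
```

With λ = 0.5 neither evaluator dominates; the paper found this balanced setting performed best.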
![Page 240: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/240.jpg)
Novel approach
During the match against Fan Hui, AlphaGo evaluated thousands
of times fewer positions than Deep Blue did against Kasparov.
It compensated for this by:
• selecting positions more intelligently (policy network)
• evaluating them more precisely (value network)
Deep Blue relied on a handcrafted evaluation function.
AlphaGo, by contrast, was trained directly and automatically from
gameplay, using general-purpose learning.
This approach is not specific to the game of Go. The algorithm
can be used for a much wider class of (so far seemingly)
intractable problems in AI!
Silver et al. 2016 57
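"Selecting positions more intelligently" happens through the in-tree selection rule, a PUCT variant (Silver et al. 2016): choose the action maximising Q(s,a) + u(s,a), where the exploration bonus u is proportional to the policy-network prior P(s,a) and decays with the edge's visit count N(s,a). The data layout below is illustrative, not the paper's implementation:

```python
import math

def select_action(edges: dict, c_puct: float = 5.0) -> str:
    """edges maps action -> (Q, P, N): action value, policy prior,
    visit count. Returns argmax over Q(s,a) + u(s,a), with
    u = c_puct * P * sqrt(total visits) / (1 + N)."""
    total_visits = sum(n for _, _, n in edges.values())

    def score(edge):
        q, p, n = edge
        u = c_puct * p * math.sqrt(total_visits) / (1 + n)
        return q + u

    return max(edges, key=lambda a: score(edges[a]))

# A rarely visited move with a strong prior can outrank a well-explored one:
edges = {"d4": (0.55, 0.2, 400), "q16": (0.50, 0.6, 10)}
print(select_action(edges))  # -> q16
```

This is exactly how the policy network steers search effort toward promising moves before their Q values are well estimated.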
![Page 248: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/248.jpg)
Thank you!
Questions?
57
![Page 249: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/249.jpg)
Backup Slides
![Page 250: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/250.jpg)
Input features for rollout and tree policy
Silver et al. 2016
![Page 251: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/251.jpg)
Selection of Moves by the SL Policy Network
move probabilities taken directly from the SL policy network pσ (reported as a percentage if above 0.1%).
Silver et al. 2016
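The figure reports pσ(a|s) as a percentage for every legal move above a 0.1% cutoff. A minimal sketch of that post-processing, assuming softmax over network outputs (the logit values here are made up for illustration):

```python
import math

def move_percentages(logits: dict, threshold: float = 0.1) -> dict:
    """Softmax the per-move logits into percentages, then keep only
    moves at or above the reporting threshold (in percent)."""
    m = max(logits.values())                       # for numerical stability
    exps = {a: math.exp(z - m) for a, z in logits.items()}
    total = sum(exps.values())
    pct = {a: 100.0 * e / total for a, e in exps.items()}
    return {a: round(p, 1) for a, p in pct.items() if p >= threshold}

# "a1" falls far below 0.1% and is dropped from the report:
print(move_percentages({"d4": 2.0, "q16": 1.5, "a1": -6.0}))
```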
![Page 252: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/252.jpg)
Selection of Moves by the Value Network
evaluation of all successors s′ of the root position s, using vθ(s)
Silver et al. 2016
![Page 253: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/253.jpg)
Tree Evaluation from Value Network
action values Q(s, a) for each tree-edge (s, a) from root position s (averaged over value network evaluations only)
Silver et al. 2016
![Page 254: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/254.jpg)
Tree Evaluation from Rollouts
action values Q(s, a), averaged over rollout evaluations only
Silver et al. 2016
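The two backup-slide figures show the same action values Q(s,a) averaged over value-network evaluations only and over rollout evaluations only. That requires each tree edge to keep two separate running means; a minimal sketch (class and method names are illustrative):

```python
class Edge:
    """Per-edge statistics: separate running sums for the two evaluators."""

    def __init__(self):
        self.n_value = 0       # count of value-network evaluations
        self.w_value = 0.0     # sum of value-network evaluations
        self.n_rollout = 0     # count of rollout outcomes
        self.w_rollout = 0.0   # sum of rollout outcomes (+1 / -1)

    def add_value_eval(self, v: float):
        self.n_value += 1
        self.w_value += v

    def add_rollout(self, z: float):
        self.n_rollout += 1
        self.w_rollout += z

    def q_value_only(self) -> float:
        return self.w_value / self.n_value

    def q_rollout_only(self) -> float:
        return self.w_rollout / self.n_rollout
```

Keeping the two means separate is what makes figures like these possible, even though the search itself acts on their λ-weighted mixture.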
![Page 255: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/255.jpg)
Results of a tournament between different Go programs
Silver et al. 2016
![Page 256: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/256.jpg)
Results of a tournament between AlphaGo and distributed AlphaGo,
testing scalability with hardware
Silver et al. 2016
![Page 257: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/257.jpg)
AlphaGo versus Fan Hui: Game 1
Silver et al. 2016
![Page 258: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/258.jpg)
AlphaGo versus Fan Hui: Game 2
Silver et al. 2016
![Page 259: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/259.jpg)
AlphaGo versus Fan Hui: Game 3
Silver et al. 2016
![Page 260: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/260.jpg)
AlphaGo versus Fan Hui: Game 4
Silver et al. 2016
![Page 261: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/261.jpg)
AlphaGo versus Fan Hui: Game 5
Silver et al. 2016
![Page 262: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/262.jpg)
AlphaGo versus Lee Sedol: Game 1
https://youtu.be/vFr3K2DORc8
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
![Page 263: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/263.jpg)
AlphaGo versus Lee Sedol: Game 2 (1/2)
https://youtu.be/l-GsfyVCBu0
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
![Page 264: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/264.jpg)
AlphaGo versus Lee Sedol: Game 2 (2/2)
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
![Page 265: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/265.jpg)
AlphaGo versus Lee Sedol: Game 3
https://youtu.be/qUAmTYHEyM8
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
![Page 266: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/266.jpg)
AlphaGo versus Lee Sedol: Game 4
https://youtu.be/yCALyQRN3hw
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
![Page 267: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/267.jpg)
AlphaGo versus Lee Sedol: Game 5 (1/2)
https://youtu.be/mzpW10DPHeQ
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
![Page 268: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/268.jpg)
AlphaGo versus Lee Sedol: Game 5 (2/2)
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
![Page 269: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/269.jpg)
Further Reading I
AlphaGo:
� Google Research Blog
http://googleresearch.blogspot.cz/2016/01/alphago-mastering-ancient-game-of-go.html
� an article in Nature
http://www.nature.com/news/google-ai-algorithm-masters-ancient-game-of-go-1.19234
� a Reddit article claiming that AlphaGo is even stronger than it appears to be:
“AlphaGo would rather win by fewer points, but with higher probability.”
https://www.reddit.com/r/baduk/comments/49y17z/the_true_strength_of_alphago/
� a video of how AlphaGo works (put in layman’s terms) https://youtu.be/qWcfiPi9gUU
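The Reddit claim above reflects a property of AlphaGo's design: its value network and search optimize the probability of winning, not the margin of victory, so it happily trades territory for certainty. A minimal sketch of that move-selection criterion (the move dictionaries and estimates here are hypothetical, not from the paper):

```python
def pick_move(moves):
    """Select the candidate move with the highest estimated win probability,
    ignoring the expected point margin entirely."""
    return max(moves, key=lambda m: m["win_prob"])

# Hypothetical candidate moves with illustrative estimates:
candidates = [
    {"move": "A", "win_prob": 0.92, "margin": 1.5},   # safe: wins by little, almost surely
    {"move": "B", "win_prob": 0.70, "margin": 20.0},  # risky: wins by a lot, less often
]

best = pick_move(candidates)
print(best["move"])  # prints "A": higher win probability beats bigger margin
```

A margin-maximizing agent would pick move B here; a win-probability maximizer picks A, which is why AlphaGo's wins can look narrow on the board while being very secure.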
Articles by Google DeepMind:
� Atari player: a DeepRL system which combines Deep Neural Networks with Reinforcement Learning (Mnih
et al. 2015)
� Neural Turing Machines (Graves, Wayne, and Danihelka 2014)
Artificial Intelligence:
� Artificial Intelligence course at MIT
http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/
6-034-artificial-intelligence-fall-2010/index.htm
![Page 270: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/270.jpg)
Further Reading II
� Introduction to Artificial Intelligence at Udacity
https://www.udacity.com/course/intro-to-artificial-intelligence--cs271
� General Game Playing course https://www.coursera.org/course/ggp
� Singularity http://waitbutwhy.com/2015/01/artificial-intelligence-revolution-1.html + Part 2
� The Singularity Is Near (Kurzweil 2005)
Combinatorial Game Theory (founded by John H. Conway to study endgames in Go):
� Combinatorial Game Theory course https://www.coursera.org/learn/combinatorial-game-theory
� On Numbers and Games (Conway 1976)
� Computer Go as a sum of local games: an application of combinatorial game theory (Muller 1995)
Chess:
� Deep Blue beats G. Kasparov in 1997 https://youtu.be/NJarxpYyoFI
Machine Learning:
� Machine Learning course
https://www.coursera.org/learn/machine-learning/
� Reinforcement Learning http://reinforcementlearning.ai-depot.com/
� Deep Learning (LeCun, Bengio, and Hinton 2015)
![Page 271: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/271.jpg)
Further Reading III
� Deep Learning course https://www.udacity.com/course/deep-learning--ud730
� Two Minute Papers https://www.youtube.com/user/keeroyz
� Applications of Deep Learning https://youtu.be/hPKJBXkyTKM
Neuroscience:
� http://www.brainfacts.org/
![Page 272: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/272.jpg)
References I
Allis, Louis Victor et al. (1994). Searching for solutions in games and artificial intelligence. Ponsen & Looijen.
Baudiš, Petr and Jean-loup Gailly (2011). “Pachi: State of the art open source Go program”. In: Advances in
Computer Games. Springer, pp. 24–38.
Bowling, Michael et al. (2015). “Heads-up limit hold’em poker is solved”. In: Science 347.6218, pp. 145–149. url:
http://poker.cs.ualberta.ca/15science.html.
Champandard, Alex J (2016). “Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artworks”. In:
arXiv preprint arXiv:1603.01768.
Conway, John Horton (1976). “On Numbers and Games”. In: London Mathematical Society Monographs 6.
Dieterle, Frank Jochen (2003). “Multianalyte quantifications by means of integration of artificial neural networks,
genetic algorithms and chemometrics for time-resolved analytical data”. PhD thesis. Universität Tübingen.
Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge (2015). “A Neural Algorithm of Artistic Style”. In:
CoRR abs/1508.06576. url: http://arxiv.org/abs/1508.06576.
Graves, Alex, Greg Wayne, and Ivo Danihelka (2014). “Neural turing machines”. In: arXiv preprint
arXiv:1410.5401.
Hayes, Bradley (2016). url: https://twitter.com/deepdrumpf.
Karpathy, Andrej (2015). The Unreasonable Effectiveness of Recurrent Neural Networks. url:
http://karpathy.github.io/2015/05/21/rnn-effectiveness/ (visited on 04/01/2016).
![Page 273: AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search](https://reader031.vdocuments.net/reader031/viewer/2022030305/587071321a28ab48378b7969/html5/thumbnails/273.jpg)
References II
Kurzweil, Ray (2005). The singularity is near: When humans transcend biology. Penguin.
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton (2015). “Deep learning”. In: Nature 521.7553, pp. 436–444.
Li, Chuan and Michael Wand (2016). “Combining Markov Random Fields and Convolutional Neural Networks for
Image Synthesis”. In: CoRR abs/1601.04589. url: http://arxiv.org/abs/1601.04589.
Mnih, Volodymyr et al. (2015). “Human-level control through deep reinforcement learning”. In: Nature 518.7540,
pp. 529–533. url:
https://storage.googleapis.com/deepmind-data/assets/papers/DeepMindNature14236Paper.pdf.
Müller, Martin (1995). “Computer Go as a sum of local games: an application of combinatorial game theory”.
PhD thesis. TU Graz.
Silver, David et al. (2016). “Mastering the game of Go with deep neural networks and tree search”. In: Nature
529.7587, pp. 484–489.