A Scalable Machine Learning Approach to Go

Pierre Baldi and Lin Wu
UC Irvine

Contents

• Introduction to Go
• Existing approaches
• Our approach
• Results
• Conclusion & Future work

What is Go?

• Black and White play alternately
• Stones with zero liberties are removed from the board
• The player with more territory wins
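The capture rule above can be illustrated with a short sketch. It assumes a board stored as a list of strings with `X` for black, `O` for white, and `.` for empty. The function names and the board encoding are hypothetical choices for illustration, not part of the original slides.

```python
def group_and_liberties(board, start):
    """Flood-fill the group containing `start`; return (stones, liberties)."""
    rows, cols = len(board), len(board[0])
    color = board[start[0]][start[1]]
    if color == ".":
        raise ValueError("start must be a stone")
    stones, liberties, stack = {start}, set(), [start]
    while stack:
        r, c = stack.pop()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if board[nr][nc] == ".":
                liberties.add((nr, nc))        # empty neighbour = liberty
            elif board[nr][nc] == color and (nr, nc) not in stones:
                stones.add((nr, nc))           # same colour = same group
                stack.append((nr, nc))
    return stones, liberties


def remove_captured(board, color):
    """Return a new board with every zero-liberty group of `color` removed."""
    grid = [list(row) for row in board]
    seen = set()
    for r, row in enumerate(board):
        for c, cell in enumerate(row):
            if cell == color and (r, c) not in seen:
                stones, libs = group_and_liberties(board, (r, c))
                seen |= stones
                if not libs:
                    for sr, sc in stones:
                        grid[sr][sc] = "."
    return ["".join(row) for row in grid]


board = [".X.",
         "XOX",
         ".X."]
after = remove_captured(board, "O")   # the surrounded white stone is removed
```

A full rules engine would also handle suicide and ko, which this sketch leaves out.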

Why is Go interesting?

• Go is a hard game for computers
  – The best Go programs are easily defeated by an average human amateur
• Other board games have expert-level programs
  – Chess: Deep Blue (1997) and Fritz (2002)
  – Checkers: Chinook (1994)
  – Othello (Reversi): Logistello (2002)
  – Backgammon: TD-Gammon (1992)

Why is Go interesting for AI?

• Poses unique opportunities and challenges for AI and machine learning
  – Hard to build a high-quality evaluation function
  – Large branching factor, 200-300, compared with 35-40 for chess
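To make the branching-factor gap concrete, the following sketch counts move sequences at a fixed search depth. The values 35 and 250 are picked from the ranges quoted above, and the depth of 4 is an arbitrary illustration.

```python
def sequences_at_depth(branching_factor, depth):
    """Number of move sequences of length `depth` with a fixed branching factor."""
    return branching_factor ** depth


chess_like = sequences_at_depth(35, 4)    # 1,500,625
go_like = sequences_at_depth(250, 4)      # 3,906,250,000
ratio = go_like / chess_like              # Go-like tree is thousands of times larger
```

Even four plies deep, exhaustive search in Go visits thousands of times more sequences than in chess, which is why a good evaluation function matters so much.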

Existing approaches

• Hard-coded programs
• Evaluate the next move by playing a large number of random games
• Use machine learning to learn the evaluation function

Existing approaches ── Hard-coded programs

• Hand-tailored pattern libraries
• Hard-coded rules to choose among multiple hits
• Tactical search (or reading)
• E.g. “Many Faces of Go”, “GnuGo”

Existing approaches ── Hard-coded programs

• Pros
  – Good performance
• Cons
  – Intensive manual work
  – Pattern library is never complete
  – Hard to manage and improve

Existing approaches ── Random games

• Play a huge number of random games from the given position
• Use the results of the games to evaluate all the legal moves
• Choose the legal move with the best evaluation
• E.g. Gobble, Go81
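The random-games idea can be sketched generically: score each legal move by the fraction of random playouts that end in a win. The function below is a hypothetical illustration, not the implementation used by Gobble or Go81. To stay short and runnable it is demonstrated on a toy take-away game rather than Go, and all names are assumptions.

```python
import random


def evaluate_by_random_games(state, legal_moves, play, winner, player,
                             n_games=200, seed=0):
    """Score each legal move by the fraction of random playouts `player` wins."""
    rng = random.Random(seed)
    scores = {}
    for move in legal_moves(state):
        wins = 0
        for _ in range(n_games):
            s = play(state, move)
            while winner(s) is None:           # finish with uniformly random moves
                s = play(s, rng.choice(legal_moves(s)))
            wins += winner(s) == player
        scores[move] = wins / n_games
    return scores


# Toy stand-in game: take 1-3 stones per turn; whoever takes the last stone wins.
# A state is (stones_left, player_to_move) with players 0 and 1.
def toy_moves(state):
    stones, _ = state
    return [k for k in (1, 2, 3) if k <= stones]


def toy_play(state, k):
    stones, to_move = state
    return (stones - k, 1 - to_move)


def toy_winner(state):
    stones, to_move = state
    return (1 - to_move) if stones == 0 else None


scores = evaluate_by_random_games((3, 0), toy_moves, toy_play, toy_winner, player=0)
best_move = max(scores, key=scores.get)   # taking all 3 stones wins immediately
```

For Go, the four callables would wrap a real rules engine, and the cost of each playout grows with board size.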

Existing approaches ── Random games

• Pros
  – Easy to implement
  – Reasonable performance
• Cons
  – Small boards only; cannot scale to the standard 19x19 board

Existing approaches ── Machine learning

• Schraudolph et al., 1994
  – TD(0)
  – Neural network
• Graepel et al., 2001
  – Condensed graph based on the common fate property
  – SVM
• Stern, Graepel, and MacKay, 2005
  – Conditional Markov random field

Existing approaches ── Machine learning

• Pros
  – Learns automatically
• Cons
  – Poor performance

Our approach

• Use scalable algorithms to learn high quality evaluation functions automatically

• Imitate the human evaluation process

Our approach ── Human evaluation process

• Three key components
  – The understanding of patterns
  – The ability to combine patterns
  – The ability to relate strategic rewards to tactical ones

Our approach ── System components

• 3x3 pattern library
  – Learns tactical patterns automatically
• A structure-rich recursive neural network (RNN)
  – Propagates interactions between patterns
  – Learns the correlation between strategic rewards (targets) and tactical rewards (inputs)

Our approach ── RNN architecture

• Six planes
  – One input plane
  – One output plane
  – Four hidden planes

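Our approach ── Update sequence

$$
\begin{aligned}
O_{i,j} &= N_O\left(I_{i,j},\, H^{NW}_{i,j},\, H^{NE}_{i,j},\, H^{SW}_{i,j},\, H^{SE}_{i,j}\right) \\
H^{NE}_{i,j} &= N_{NE}\left(I_{i,j},\, H^{NE}_{i-1,j},\, H^{NE}_{i,j-1}\right) \\
H^{NW}_{i,j} &= N_{NW}\left(I_{i,j},\, H^{NW}_{i+1,j},\, H^{NW}_{i,j-1}\right) \\
H^{SW}_{i,j} &= N_{SW}\left(I_{i,j},\, H^{SW}_{i+1,j},\, H^{SW}_{i,j+1}\right) \\
H^{SE}_{i,j} &= N_{SE}\left(I_{i,j},\, H^{SE}_{i-1,j},\, H^{SE}_{i,j+1}\right)
\end{aligned}
$$

where $N_{NE}$, $N_{NW}$, $N_{SW}$, and $N_{SE}$ are the same function by weight sharing.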
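The update sequence on this slide has one output plane fed by the input plane and four hidden planes, and each hidden plane propagates information across the board from one corner. It can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' network: each hidden unit is a single scalar, the shared hidden function is a hand-set `tanh` unit, the output is a sigmoid, and both the weights and the neighbor offsets chosen for each plane are hypothetical.

```python
import math


def propagate(inputs, w_in=0.5, w_a=0.3, w_b=0.3, w_out=0.25):
    """Run four directional hidden planes over a 2D input, then combine them."""
    n, m = len(inputs), len(inputs[0])

    def hidden_unit(x, h_row, h_col):
        # One function shared by all four planes (weight sharing).
        return math.tanh(w_in * x + w_a * h_row + w_b * h_col)

    # For each plane: (row offset, column offset) of the two parent cells.
    planes = {"NE": (-1, -1), "NW": (+1, -1), "SW": (+1, +1), "SE": (-1, +1)}
    hidden = {}
    for name, (di, dj) in planes.items():
        h = [[0.0] * m for _ in range(n)]
        # Visit cells so that both parents are computed before the child.
        rows = range(n) if di < 0 else range(n - 1, -1, -1)
        cols = range(m) if dj < 0 else range(m - 1, -1, -1)
        for i in rows:
            for j in cols:
                h_row = h[i + di][j] if 0 <= i + di < n else 0.0
                h_col = h[i][j + dj] if 0 <= j + dj < m else 0.0
                h[i][j] = hidden_unit(inputs[i][j], h_row, h_col)
        hidden[name] = h

    # Output plane: each cell sees its input and the four hidden values.
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            z = w_in * inputs[i][j] + w_out * sum(hidden[p][i][j] for p in planes)
            out[i][j] = 1.0 / (1.0 + math.exp(-z))
    return out


empty = propagate([[0.0] * 3 for _ in range(3)])
one_stone = propagate([[0.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, 0.0]])
```

With a single nonzero input in the center, every output cell moves away from its empty-board value, because the four planes together carry information in all directions.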

Our approach ── Provide relevant inputs

• For intersections
  – Intersection type: black, white, or empty
  – Influence: influence from stones of the same and the opposite color
  – Pattern stability: a statistical value calculated from 3x3 patterns
• For groups
  – Number of eyes
  – Number of 1st-, 2nd-, 3rd-, and 4th-order liberties
  – Number of liberties of the 1st and 2nd weakest opponent groups

Our approach ── Pattern stability (I)

• The 9x9 board is split into 10 unique locations for 3x3 patterns, taking mirror and rotation symmetries into account
• Stability is measured for each intersection of each pattern within each unique location

Our approach ── Pattern stability (II)

• Ten unique pattern locations
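The count of ten locations can be checked with a short enumeration. The sketch assumes that a 3x3 pattern must lie entirely on the board, so its center ranges over the inner 7x7 grid of a 9x9 board, and that two centers are equivalent when one of the eight board symmetries (rotations and mirrors) maps one onto the other. The function name is hypothetical.

```python
def unique_pattern_locations(board_size=9, pattern_size=3):
    """Canonical centers of on-board patterns, up to rotation and mirror symmetry."""
    half = pattern_size // 2
    last = board_size - 1

    def symmetries(r, c):
        # The eight images of (r, c) under the symmetries of the square board.
        return [(r, c), (r, last - c), (last - r, c), (last - r, last - c),
                (c, r), (c, last - r), (last - c, r), (last - c, last - r)]

    centers = [(r, c)
               for r in range(half, board_size - half)
               for c in range(half, board_size - half)]
    # Represent each symmetry class by its lexicographically smallest member.
    return sorted({min(symmetries(r, c)) for r, c in centers})


locations = unique_pattern_locations()
```

Under these assumptions `len(locations)` is 10, matching the slide; the representatives form a triangle running from the corner-most center `(1, 1)` to the board center `(4, 4)`.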

Our approach ── Pattern stability (III)

The pattern stability $S_i(p)$ for grid point $i$ of pattern $p$ is

$$
S_i(p) = \frac{NB_i(p) - NW_i(p)}{NB_i(p) + NW_i(p) + C}
$$

where $NB_i(p)$ (or $NW_i(p)$) is the number of times that pattern $p$ ends with a black (or white) stone at intersection $i$ at the end of the games in the training data. $C$ is a regularization constant set to 1.
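A minimal sketch of the stability computation, assuming the ratio has the signed form (NB − NW) / (NB + NW + C) with C = 1. The function and argument names are hypothetical, and the counts in the examples are invented for illustration rather than taken from the training data.

```python
def pattern_stability(n_black, n_white, c=1.0):
    """Stability of one grid point of a pattern from end-of-game stone counts.

    n_black / n_white: how often the point ended as a black / white stone.
    c: regularization constant that damps estimates from rarely seen patterns.
    """
    return (n_black - n_white) / (n_black + n_white + c)


always_black = pattern_stability(9, 0)   # strongly black
contested = pattern_stability(5, 5)      # no preference
seen_once = pattern_stability(1, 0)      # damped by the regularizer
```

The regularizer keeps a pattern seen only once from receiving the extreme score of 1; with C = 1 it scores 0.5 instead.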

Our approach ── Pattern stability results (I)

Our approach ── Pattern stability results (II)

Results ── Validation error

Results ── Results on move predictions

Results ── Matched move (I)

Results ── Matched move (II)

Conclusion & Future work
