
Page 1: Atari Game State Representation using Convolutional Neural Networks

Training a Multi Layer Perceptron with Expert Data and Game State Representation using Convolutional Neural Networks

JOHN STAMFORD, MSC INTELLIGENT SYSTEMS AND ROBOTICS

Page 2: Atari Game State Representation using Convolutional Neural Networks

Contents

Background and Initial Brief

Previous Work

Motivation

Technical Frameworks

State Representation

Testing

Results

Conclusion

Future work

Page 3: Atari Game State Representation using Convolutional Neural Networks

Background / Brief

Based on a project by Google/DeepMind

Build an app to capture gameplay data
◦ Users play Atari games on a mobile device
◦ We capture the data (somehow)

Use the data in machine learning
◦ Reduce the costly nature of Reinforcement Learning

Page 4: Atari Game State Representation using Convolutional Neural Networks

DeepMind

Bought by Google for £400 million

"Playing Atari with Deep Reinforcement Learning" (2013)

General Agent
◦ No prior knowledge of the environment
◦ Inputs (States) and Outputs (Actions)
◦ Learns Policies
◦ Maps States to Actions

Deep Reinforcement Learning

Deep Q-Networks (DQN)

2015 paper release (with Lua source code)

Page 5: Atari Game State Representation using Convolutional Neural Networks

Motivation

Started from the Q-Learning sample code
◦ Deep Reinforcement Learning (Q-Learning)
◦ Links to DeepMind (Mnih et al. 2013)

Costly nature of Reinforcement Learning
◦ Trial-and-error approach
◦ Issues with long-term goals
◦ Makes lots of mistakes
◦ Celiberto et al. (2010) state that "this technique is not efficient enough to be used in applications with real world demands due to the time that the agent needs to learn"

Page 6: Atari Game State Representation using Convolutional Neural Networks

Background: Q-Learning (RL)

◦ Learn the optimal policy, i.e. which action to take at each state
◦ Represented as the function Q(s, a)

Watkins and Dayan (1992) state that the system:
◦ observes its current state x_n
◦ selects and performs an action a_n
◦ observes the subsequent state y_n and receives the reward r_n
◦ updates the Q_n(s, a) values using a learning rate α and a discount factor γ

Q_n(s, a) = (1 - α_n) Q_{n-1}(s, a) + α_n [ r_n + γ max_a Q_{n-1}(y_n, a) ]
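A minimal Python sketch of this update rule, using a dictionary as a tabular Q store; the action set, state names and learning parameters below are illustrative, not values from the project.

# Tabular Q-learning update following Watkins and Dayan (1992).
# Q maps (state, action) -> value; unseen pairs default to 0.0.
ACTIONS = ["up", "down", "left", "right", "fire"]   # illustrative action set

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in ACTIONS)
    # Q_n(s, a) = (1 - alpha) * Q_{n-1}(s, a) + alpha * (r + gamma * max_a' Q_{n-1}(s', a'))
    Q[(s, a)] = (1 - alpha) * Q.get((s, a), 0.0) + alpha * (r + gamma * best_next)

Q = {}
q_update(Q, s="state_0", a="right", r=1.0, s_next="state_1")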

Page 7: Atari Game State Representation using Convolutional Neural Networks

Pseudo Code

Figure: the deep Q-learning with experience replay pseudocode (not reproduced here). Source: Mnih et al. (2013)

Page 8: Atari Game State Representation using Convolutional Neural Networks

Representation of Q(s, a)

Figure: a table indexed by States and Actions, holding the corresponding Q Values
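As a concrete illustration of the table in the figure (not the project's actual data structure), a small numpy sketch with states as rows and actions as columns:

import numpy as np

# Q table: one row per discrete state, one column per action. Sizes are illustrative;
# with raw screen images as states the table would be far too large, which is why
# DQN approximates Q with a network instead.
ACTIONS = ["noop", "fire", "left", "right"]
n_states = 1000
Q = np.zeros((n_states, len(ACTIONS)))

best_action = ACTIONS[int(np.argmax(Q[42]))]   # greedy action for state 42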

Page 9: Atari Game State Representation using Convolutional Neural Networks

Other Methods: Imitation Learning (IL)

◦ Applied to robotics, e.g. Nemec et al. (2010), Schmidts et al. (2011) and Kunze et al. (2013)

Could this be applied to the games agent?
◦ Potentially, by mapping the states and the actions from observed gameplay
◦ Manually updating the policies

Hamahata et al. (2008) state that "imitation learning consisting of a simple observation cannot give us the sophisticated skill"

Page 10: Atari Game State Representation using Convolutional Neural Networks

Other Methods: Combining RL and IL

◦ Kulkarni (2012, p. 4) refers to this as 'semi-supervised learning'
◦ Barto and Rosenstein (2004) suggest the use of a model which acts as both a supervisor and an actor

Figure: Supervisor Information (Barto and Rosenstein, 2004)

State Representation

Page 11: Atari Game State Representation using Convolutional Neural Networks

The Plan (at this point)

Reduce the costly impact of RL
◦ Use some form of critic or early reward system
◦ If no Q value exists for that state, then check with an expert (see the sketch below)

Capture Expert Data
◦ States
◦ Actions
◦ Rewards

Build a model

Use the model to inform the Q-Learning system
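A hedged sketch of this fallback idea; ExpertModel stands in for the MLP trained on captured gameplay, and every name and value here is illustrative rather than taken from the project code.

# If a Q value exists for the state, act greedily on it; otherwise consult the expert model.
ACTIONS = ["noop", "fire", "left", "right"]

class ExpertModel:
    def predict(self, state):
        return "fire"                      # placeholder for the trained MLP's prediction

def choose_action(Q, state, expert):
    known = {a: Q[(state, a)] for a in ACTIONS if (state, a) in Q}
    if known:
        return max(known, key=known.get)   # use learned Q values when available
    return expert.predict(state)           # otherwise ask the expert

print(choose_action({}, "state_0", ExpertModel()))   # no Q values yet -> expert's action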

Page 12: Atari Game State Representation using Convolutional Neural Networks

Data Capture Plan

Capture Input Data using a Stella VCS based Android solution

User Actions: Up, Down, Left, Right, ...

Account for the SEED variant: setSeed(12345679)

Replay in the Lab

Extract Score & States using ALE

Page 13: Atari Game State Representation using Convolutional Neural Networks

The Big Problem

We couldn't account for the randomisation
◦ ALE is based on Stella
◦ Version problems
◦ Tested various approaches
◦ Replayed games over Skype

We could save the state!
◦ But had some problems

Other problems

Page 14: Atari Game State Representation using Convolutional Neural Networks

Technical Implementation

Arcade Learning Environment (ALE) (Bellemare et al. 2013)
◦ General agent testing environment using Atari games
◦ Supports 50+ games
◦ Based on the Stella VCS Atari emulator
◦ Supports agents in C++, Java and more

Python 2.7 (Anaconda Distribution)

Theano (ML framework written in Python)
◦ Mnih et al. (2013)
◦ Q-Learning sample code
◦ Korjus (2014)

Linux, then Windows 8 with CUDA support

Page 15: Atari Game State Representation using Convolutional Neural Networks

Computational Requirements

Test System
◦ Simple CNN / MLP
◦ 16,000 grayscale 28x28 images

Results
◦ Significant difference with CUDA support
◦ The CNN process is very computationally costly

Figure: MLP Speed Test Results

Figure: CNN Speed Test Results

Page 16: Atari Game State Representation using Convolutional Neural Networks

States and Actions

States - Screen Data
◦ Raw screen data
◦ SDL (SDL_Surface)
◦ BMP file

Actions - Controller Inputs

Resulted in...
◦ Lots of images matched to entries in a CSV file (sketched below)
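A minimal sketch of that data layout, assuming each captured screen is saved as a BMP and a parallel CSV records which action was taken on each frame; the file names and column headings are illustrative.

import csv

# Match saved screen images to controller actions, one row per frame.
rows = [("frame_000001.bmp", "LEFT"),
        ("frame_000002.bmp", "FIRE")]      # example entries only

with open("actions.csv", "w") as f:
    writer = csv.writer(f)
    writer.writerow(["frame", "action"])
    writer.writerows(rows)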

Page 17: Atari Game State Representation using Convolutional Neural Networks

Rewards

ALE Reward Data (from BreakoutSettings in the ALE source):

void BreakoutSettings::step(const System& system) {
    // update the reward
    int x = readRam(&system, 77);
    int y = readRam(&system, 76);
    reward_t score = 1 * (x & 0x000F) + 10 * ((x & 0x00F0) >> 4) + 100 * (y & 0x000F);
    m_reward = score - m_score;
    m_score = score;

    // update terminal status
    int byte_val = readRam(&system, 57);
    if (!m_started && byte_val == 5) m_started = true;
    m_terminal = m_started && byte_val == 0;
}
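The two RAM bytes above hold the score as binary-coded decimal digits, one digit per 4-bit nibble. A Python rendering of the same decoding (the RAM addresses come from the ALE snippet; the example byte values are illustrative):

# Decode the Breakout score from two RAM bytes, as in the ALE code above.
def breakout_score(ram_77, ram_76):
    ones = ram_77 & 0x0F            # low nibble of byte 77: ones digit
    tens = (ram_77 & 0xF0) >> 4     # high nibble of byte 77: tens digit
    hundreds = ram_76 & 0x0F        # low nibble of byte 76: hundreds digit
    return ones + 10 * tens + 100 * hundreds

print(breakout_score(0x23, 0x01))   # -> 123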

Page 18: Atari Game State Representation using Convolutional Neural Networks

State Representation

Screen pixels: 160 x 210 RGB

If we used them as inputs...
◦ RGB: 100,800 values
◦ Greyscale: 33,600 values

Mnih et al. (2013) use cropped 84 x 84 images
◦ Good - high resolution, lots of features present
◦ Bad - when handling lots of training data

The MNIST example set uses 28 x 28 images
◦ Good - computationally acceptable
◦ Bad - limited detail

The problem (see the preprocessing sketch below)
◦ Unable to process large amounts of hi-res images
◦ Low-res images gave poor results
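A minimal preprocessing sketch of this trade-off, assuming Pillow is available; the target sizes match the ones discussed above (84 x 84 as in Mnih et al., 28 x 28 as in MNIST), but the resampling filter and file names are illustrative.

from PIL import Image

# Convert a raw 160x210 screen capture to grayscale and downsample it.
def preprocess(path, size=28):
    img = Image.open(path).convert("L")             # to grayscale
    return img.resize((size, size), Image.BILINEAR)

# preprocess("frame_000001.bmp", size=84).save("frame_000001_84.png")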

Page 19: Atari Game State Representation using Convolutional Neural Networks

Original System - Image Processing

Image Resize Methods

Temporal Data (Frame Merging)
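The image-processing figures are not reproduced here. As one possible reading of "frame merging", a pixel-wise maximum over consecutive frames is a common way to fold temporal information into one image; the sketch below illustrates that idea only, not necessarily the method used in the project.

import numpy as np

# Merge two consecutive grayscale frames by taking the pixel-wise maximum, so
# a moving object (e.g. the ball) appears at both positions in a single image.
def merge_frames(frame_a, frame_b):
    return np.maximum(frame_a, frame_b)

a = np.zeros((28, 28), dtype=np.uint8)
b = np.zeros((28, 28), dtype=np.uint8)
b[10, 10] = 255                     # object position in the later frame
merged = merge_frames(a, b)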

Page 20: Atari Game State Representation using Convolutional Neural Networks

Original System - Training Results

28x28 images: 7 minutes for 16,000 images

64x64 images: 18 minutes for 4,000 images

84x84 images: memory error (at 4,100 images)

Page 21: Atari Game State Representation using Convolutional Neural Networks

Development

Figures: Original and Revised systems (diagrams not reproduced)

Page 22: Atari Game State Representation using Convolutional Neural Networks

CNN Framework

Mnih et al. (2013) make use of Convolutional Neural Networks

Feature extraction
◦ Can be used to reduce the dimensionality of the domain space
◦ Examples include:
◦ Handwriting classification: Yuan et al. (2012), Bottou et al. (1994)
◦ Face detection: Garcia and Delakis (2004) and Chen et al. (2006)

A CNN can provide the inputs for a fully connected MLP (Bergstra et al. 2010)

Page 23: Atari Game State Representation using Convolutional Neural Networks

Convolutional Neural Networks

Feature Extraction

Developed as a result of the work of LeCun et al. (1998)

Take inspiration from the visual processes of cats and monkeys (Hubel and Wiesel, 1962, 1968)

Can accommodate changes in scale, rotation, stroke width, etc.

Can handle Noise

See: http://yann.lecun.com/exdb/lenet/index.html

Page 24: Atari Game State Representation using Convolutional Neural Networks

Convolution of an Image

Example Kernel:
0 0 0
0 1 0
0 0 0

Source: https://developer.apple.com/library/ios/documentation/Performance/Conceptual/vImage/ConvolutionOperations/ConvolutionOperations.html

Page 25: Atari Game State Representation using Convolutional Neural Networks

Other Examples

0 -1 0
-1 5 -1
0 -1 0

0 1 0
1 -4 1
0 1 0

1 0 -1
0 0 0
-1 0 1

-1 -1 -1
-1 8 -1
-1 -1 -1

Source: http://en.wikipedia.org/wiki/Kernel_(image_processing)
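A small sketch of applying one of these 3x3 kernels to a grayscale image, assuming scipy is available; the kernel is the sharpen example above and the input image is a random stand-in.

import numpy as np
from scipy.ndimage import convolve

# Convolve a grayscale image with the 3x3 sharpen kernel shown above.
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]], dtype=float)

image = np.random.randint(0, 256, size=(28, 28)).astype(float)   # stand-in image
sharpened = convolve(image, sharpen, mode="nearest")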

Page 26: Atari Game State Representation using Convolutional Neural Networks

CNN Feature Extraction

Single Convolutional Layer

◦ From Full Resolution Images (160 x 210 RGB)

1,939 Inputs

130 Inputs

Page 27: Atari Game State Representation using Convolutional Neural Networks

CNN Feature Extraction: Binary Conversion

◦ Accurate state representation

Lower Computational Costs
◦ Single convolution layer (15 seconds for 2,391 images / 11.7 seconds for 1,790)
◦ Reduced number of inputs for the MLP
◦ More manageable
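A minimal sketch of the binary conversion step, assuming the frame is already a grayscale numpy array; the threshold value is illustrative.

import numpy as np

# Binary conversion: pixels above the threshold become 1, everything else 0.
def to_binary(frame, threshold=128):
    return (frame > threshold).astype(np.uint8)

frame = np.random.randint(0, 256, size=(210, 160))   # stand-in 160x210 grayscale screen
binary = to_binary(frame)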

Page 28: Atari Game State Representation using Convolutional Neural Networks

Problems & Limitations

Binary conversion was too severe (Breakout)

Figure: a feature removed by the binary conversion

In Seaquest, the system could not differentiate between the enemies and the goals

Page 29: Atari Game State Representation using Convolutional Neural Networks

New System Training Results

Test Configuration

Results

Lowest Error Rate: 32.50%

Page 30: Atari Game State Representation using Convolutional Neural Networks

Evidence of Learning

MLP - New System

Page 31: Atari Game State Representation using Convolutional Neural Networks

More Testing

Page 32: Atari Game State Representation using Convolutional Neural Networks

Conclusion

Large amounts of data

CNN as a preprocessor...
◦ Reduced computational costs
◦ Allowed for good state representation
◦ Reduced dimensionality for the MLP

Old System
◦ No evidence of learning

New System
◦ Evidence of the system learning
◦ Needs to be implemented as an agent to test real-world effectiveness

Page 33: Atari Game State Representation using Convolutional Neural Networks

What would I do differently?

Better Evaluation Methodology
◦ What was the frequency/distribution of controls?
◦ Was the system better at different games or controls?

Went too far with the image conversion...

Page 34: Atari Game State Representation using Convolutional Neural Networks

Future Work

1. Data Collection Methods

2. Foundation for Q-Learning

Page 35: Atari Game State Representation using Convolutional Neural Networks

Future Work

3. State Representation

Step 1: Identify areas of interest

Step 2: Process and classify the area

Step 3: Update the state representation

Page 36: Atari Game State Representation using Convolutional Neural Networks

Future Work

4. Explore the effects of multiple convolutional layers

5. Build a working agent...!


Page 37: Atari Game State Representation using Convolutional Neural Networks

Useful Links

ALE (Visual Studio Version)

https://github.com/mvacha/A.L.E.-0.4.4.-Visual-Studio

Replicating the Paper “Playing Atari with Deep Reinforcement Learning” - Kristjan Korjus et al

https://courses.cs.ut.ee/MTAT.03.291/2014_spring/uploads/Main/Replicating%20DeepMind.pdf

Github for the above project

https://github.com/kristjankorjus/Replicating-DeepMind/tree/master/src

ALE : http://www.arcadelearningenvironment.org/

ALE Old Site: http://yavar.naddaf.name/ale/

Page 38: Atari Game State Representation using Convolutional Neural Networks

Bibliography

Barto, A. G. and Rosenstein, M. T. (2004), 'Supervised actor-critic reinforcement learning', Handbook of Learning and Approximate Dynamic Programming 2, 359.

Bellemare, M. G., Naddaf, Y., Veness, J. and Bowling, M. (2013), 'The arcade learning environment: An evaluation platform for general agents', Journal of Artificial Intelligence Research 47, 253-279.

Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D. and Bengio, Y. (2010), Theano: a CPU and GPU math expression compiler, in 'Proceedings of the Python for Scientific Computing Conference (SciPy)'. Oral Presentation.

Celiberto, L., Matsuura, J., Lopez de Mantaras, R. and Bianchi, R. (2010), Using transfer learning to speed-up reinforcement learning: A case-based approach, in 'Robotics Symposium and Intelligent Robotic Meeting (LARS), 2010 Latin American', pp. 55-60.

Korjus, K., Kuzovkin, I., Tampuu, A. and Pungas, T. (2014), Replicating the paper "Playing Atari with Deep Reinforcement Learning", Technical report, University of Tartu.

Kulkarni, P. (2012), Reinforcement and systemic machine learning for decision making, John Wiley & Sons, Hoboken.

Kunze, L., Haidu, A. and Beetz, M. (2013), Acquiring task models for imitation learning through games with a purpose, in `Intelligent Robots and Systems (IROS), 2013 IEEE/RSJ International Conference on', pp. 102-107.

Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D. and Riedmiller, M. (2013), Playing Atari with deep reinforcement learning, in `NIPS Deep Learning Workshop'.

Nemec, B., Zorko, M. and Zlajpah, L. (2010), Learning of a ball-in-a-cup playing robot, in `Robotics in Alpe-Adria-Danube Region (RAAD), 2010 IEEE 19th International Workshop on', pp. 297-301.

Schmidts, A. M., Lee, D. and Peer, A. (2011), Imitation learning of human grasping skills from motion and force data, in `Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on', pp. 1002-1007.

Watkins, C. J. C. H. and Dayan, P. (1992), 'Technical note: Q-learning', Machine Learning 8, 279-292.

Page 39: Atari Game State Representation using Convolutional Neural Networks

Thank you