Optimizing Neural Network Topologies for Reinforcement Learning Backgammon Players

One of the most famous practical successes in reinforcement learning is Gerry Tesauro's TD-Gammon, which learned to play backgammon near the level of the world's strongest grandmasters. TD-Gammon used TD(λ), a standard temporal difference reinforcement learning method, to learn a value function represented as a multi-layer feed-forward neural network. One potential limitation of Tesauro's approach is that the network's topology (how many hidden nodes there are and how they are connected) must be designed manually. Such design decisions can dramatically affect performance: networks that are too simple perform suboptimally, while those that are too complex may take infeasible amounts of time to train.
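To make the learning rule concrete, the following is a hedged sketch of a single TD(λ) update with eligibility traces. For brevity it uses a linear value function rather than TD-Gammon's multi-layer network (where the TD error is backpropagated through the hidden layer), and all function names and parameter values here are illustrative, not Tesauro's actual code.

```python
def td_lambda_update(w, e, features, next_features, reward,
                     alpha=0.1, gamma=1.0, lam=0.7):
    """One TD(lambda) step for a linear value function V(s) = w . x(s).

    w        -- weight vector (the value function's parameters)
    e        -- eligibility trace vector, same length as w
    features -- feature vector x(s) for the current state
    next_features -- feature vector x(s') for the successor state
    Returns updated (w, e). Illustrative sketch, not TD-Gammon's code.
    """
    v = sum(wi * xi for wi, xi in zip(w, features))
    v_next = sum(wi * xi for wi, xi in zip(w, next_features))
    delta = reward + gamma * v_next - v            # TD error
    # Decay old traces, then accumulate the gradient of V(s),
    # which for a linear value function is just the feature vector.
    e = [gamma * lam * ei + xi for ei, xi in zip(e, features)]
    w = [wi + alpha * delta * ei for wi, ei in zip(w, e)]
    return w, e
```

Run over the states of a self-play game, with the reward delivered only at the end, this update propagates the game's outcome back through the positions that led to it.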

The aim of this project is to improve TD-Gammon's performance by automating the discovery of effective network topologies. Such discovery is possible via neuroevolution, wherein evolutionary algorithms optimize both the structures and the weights of a population of neural networks. Such methods can solve difficult reinforcement learning problems either in lieu of or in conjunction with traditional temporal difference methods.
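The core idea can be illustrated with a hedged, toy-scale sketch of a neuroevolution loop: each genome encodes a tiny network's topology (a hidden-node count) plus its connection weights, and mutation can either perturb weights or grow the structure. The fitness function, network shape, and all parameters below are illustrative stand-ins, not the method this project would necessarily use (e.g. NEAT adds crossover and speciation on top of this).

```python
import math
import random

def forward(genome, x):
    """Evaluate a 1-input, 1-output net with n_hidden tanh units."""
    n_hidden, w = genome
    w_in, w_out = w[:n_hidden], w[n_hidden:]
    hidden = [math.tanh(wi * x) for wi in w_in]
    return sum(wo * h for wo, h in zip(w_out, hidden))

def mutate(genome, rng, p_add_node=0.1, sigma=0.2):
    """Perturb all weights; occasionally add a hidden node (topology growth)."""
    n_hidden, w = genome
    w = [wi + rng.gauss(0, sigma) for wi in w]
    if rng.random() < p_add_node:
        n_hidden += 1
        w.insert(n_hidden - 1, rng.gauss(0, 1))  # new input->hidden weight
        w.append(rng.gauss(0, 1))                # new hidden->output weight
    return (n_hidden, w)

def evolve(fitness, generations=40, pop_size=20, seed=0):
    """Truncation-selection evolutionary loop over (topology, weights) genomes."""
    rng = random.Random(seed)
    pop = [(1, [rng.gauss(0, 1), rng.gauss(0, 1)]) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[:pop_size // 4]              # keep the best quarter
        pop = elite + [mutate(rng.choice(elite), rng)
                       for _ in range(pop_size - len(elite))]
    return max(pop, key=fitness)
```

For backgammon, the fitness function would be win rate against a benchmark opponent (or in self-play tournaments) rather than the toy scalar target used in a quick test; that is where the bulk of the computation goes.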

This project involves a direct collaboration with Gerry Tesauro at IBM Research, who has offered to share code for the original TD-Gammon.

Keywords:
Backgammon, Neural Networks, Reinforcement Learning, Neuroevolution
Study:
Artificial Intelligence
Contact:
Shimon Whiteson
Location:
Universiteit van Amsterdam