In this talk we will look at a few machine learning algorithms and their Python implementations to build AIs that can play LaserCat, a video game written with Pygame. We start by building an imitation AI that mimics a human player's decision-making. Then we build a self-evolving AI that explores and learns on its own, eventually reaching super-human performance.
LaserCat is a video game our team wrote with Pygame. The project uses the game as a test bed for implementing various machine learning algorithms in Python, to gain insight into building video game AI, and AI in general.
In LaserCat, the player controls the cat to shoot mouse-saucers flying across the screen from left to right. The player earns points for each kill and loses points when the cat collides with a mouse-saucer or lets one escape.
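The scoring rule above can be sketched as a small per-frame update. The specific point values here are assumptions for illustration; the talk does not state the game's actual constants.

```python
# Sketch of LaserCat's scoring rule. The point values are assumed,
# not the game's actual constants.
KILL_REWARD = 10        # points gained per mouse-saucer destroyed
COLLISION_PENALTY = -5  # points lost when the cat collides with a saucer
ESCAPE_PENALTY = -5     # points lost when a saucer escapes off-screen

def update_score(score, kills, collisions, escapes):
    """Apply one frame's events to the running score."""
    return (score
            + KILL_REWARD * kills
            + COLLISION_PENALTY * collisions
            + ESCAPE_PENALTY * escapes)
```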
We start by building an imitation AI that mimics a human player's decision-making. We recorded 2 hours of my gameplay as training data, including the coordinates of the cat, lasers, and mice, as well as the action I chose in each frame. This turns the task into a supervised learning problem in which the coordinates are the predictors and my action is the response.
However, since not every object is on screen at once, not all the coordinates are available at any given time. We handle the missing values with a hierarchical random forest model: we divide the data into 32 subsets according to the 32 missing-value patterns and train a random forest, using scikit-learn, on each subset. Then, when the imitation AI plays the game, in each frame it extracts the coordinates of all the objects on the screen, identifies the missing-value pattern, and uses the corresponding random forest to predict the action the human player would take.
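The per-pattern approach can be sketched as follows: group the training rows by which coordinate columns are missing, fit one forest per pattern on the columns that are present, and at play time dispatch each frame to the forest matching its pattern. Function names and hyperparameters here are illustrative, not the project's actual code.

```python
# Sketch of the hierarchical random forest idea: one forest per
# missing-value pattern. Names and parameters are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fit_pattern_forests(X, y):
    """X: (n, d) float array with NaN marking missing coordinates.
    Returns {pattern: fitted forest}, keyed by the boolean missing mask."""
    forests = {}
    masks = np.isnan(X)
    for pattern in {tuple(row) for row in masks}:
        rows = np.all(masks == pattern, axis=1)      # rows with this pattern
        cols = [i for i, m in enumerate(pattern) if not m]  # present columns
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        clf.fit(X[rows][:, cols], y[rows])
        forests[pattern] = clf
    return forests

def predict_action(forests, frame):
    """Route one frame's coordinates to the forest for its pattern."""
    pattern = tuple(np.isnan(frame))
    cols = [i for i, m in enumerate(pattern) if not m]
    return forests[pattern].predict(frame[cols].reshape(1, -1))[0]
```

In the real game there are 32 such patterns; the dispatch logic is the same regardless of the count.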
Next, we consider the more interesting and challenging problem of building a self-evolving AI that learns on its own. This is a reinforcement learning problem whose goal is to maximize the score gained per minute. In reinforcement learning, the AI must learn the optimal strategy through educated trial and error.
The full set of coordinates in each frame is too much information for the AI to learn from efficiently, so we simplify each frame by building a grid that moves with the leading mouse and tracks the cat's approximate relative position. We then implement Bayesian Q-learning to let the AI learn the best action to take given the cat's position in the grid. The AI cat gets smarter and smarter as it plays the game and explores different actions; after 7 hours of training, it plays at a super-human level.
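The learning loop can be illustrated with a plain tabular Q-learning update over the grid cells; the talk uses a Bayesian variant, so this epsilon-greedy version is a simpler stand-in, and the action set and hyperparameters are assumptions.

```python
# Illustrative tabular Q-learning over grid-cell states. The talk's AI
# uses Bayesian Q-learning; this plain epsilon-greedy version is a
# simpler stand-in. Actions and constants are assumed, not the game's.
import random

ACTIONS = ["left", "right", "up", "down", "shoot"]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # assumed hyperparameters

def choose_action(q, state):
    """Epsilon-greedy choice for the cat's current grid cell `state`."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)           # explore
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))  # exploit

def q_update(q, state, action, reward, next_state):
    """Standard Q-learning backup: move Q(s, a) toward the TD target."""
    best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)
```

Each frame, the AI reads the cat's cell in the moving grid, picks an action, observes the score change as the reward, and applies the update; over many hours of play the Q-values converge toward the best action per cell.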