Project 5: Tic-Tac-Go!

Deliverables: To be on track for completing the Red Belt (A in course), it is recommended that you complete this assignment by Monday, 11 April. Bring in to class a stapled turn-in containing your written answers to all the questions, including all the code your wrote.

Purpose

  1. Learn about object-oriented programming and programming using classes and inheritance in Python.

  2. Get more practice programming with functions and analyzing execution cost.

  3. Get a more realistic programming experience, where you are given less guidance how to write code to solve an interesting problem.

  4. Experience simple machine learning that can achive super-human performance (at least on super-silly tasks).

Collaboration Policy

For this assignment, you may work alone or work with a partner.

If you work with a partner, you and your partner should submit a single assignment with both of your names and UVA ids on it. You and your partner should work on all the problems together, and both of you must completely understand everything you turn in. You should also take turns driving (typing) when you are at the computer.

Regardless of whether you work alone or with a partner, you may discuss the assignment with anyone you want and get help from others, so long as it is help in the spirit of learning how to do things yourself not getting answers you don't understand. You should understand everything you turn in for the assignment well enough to be able to produce it completely on your own.

Remember to follow the course pledge you read and signed at the beginning of the semester. For this assignment, you may consult any outside resources, including books, papers, web sites and people, you wish except for materials from previous cs1120 courses or direct solutions to the given problems. You may consult anyone you want, but that person cannot type anything in for you and all work must remain your own and outside sources should never give you specific answers to problem set questions.

If you use resources other than the class materials, lectures and course staff, you should document this in your turn-in.

You are strongly encouraged to take advantage of the scheduled help hours and office hours for this course.

Objects and Classes

Objects provide a way to package functions and state in ways that make programming more convenient and (sometimes) make programs easier to modify and understand.

Python provides the class construct for defining a class which provides a way to make objects. A class provides an __init__ method which is used to create a new object of the class type, and may define methods that are functions operating on objects of the class type.

In this project, we will use classes to represent the state of a board game, players who play a game, and other objects useful in building a generic game-playing bot.

The intial repository provides most of the code for a Tic-Tac-Toe game including a bot that learns to play a passable (but not perfect) game of Tic-Tac-Toe. Your main take for this project is to create new classes that implement the game of Go and a Go-playing bot.

Project 5 Repository

Set up your project5 repository in the same way as you did for previous projects.

The project5 repository contains many files, and a good solution to project5 will involve creating several new files of your own. The provided code is a Tic-Tac-Toe game and learning player. You can use this code in any way you want to build a (mini) Go learning player.

Unlike previous projects, there is no template for writing your answers! It is up to you to figure out how to solve the problems. A good solution will take advantage of the provided code by using inheritance, but not need to make changes to it.

Warning: This project expects you to figure out a lot more on your own than previous projects. It is unwise to jump in and start programming without thinking through your design carefully and making sure you understand what you should about the provided code. We recommend you start early and take advantage of the scheduled help hours.

Understanding Tic-Tac-Toe

Before getting started with Go, you should understand the provided Tic-Tac-Toe game and learning player.

There are two main ways to understand complex programs: top-down and bottom-up. With a top-down approach, we look at the high level design of the program, treating many of the modules as "black boxes" (until we later go into their details). In most cases, it is best to combine top-down and bottom-up understanding: get an understanding of the high-level components of the system, and then drill down into the details of particular components as necessary.

We will provide a walkthrough introduction to the provided code (along with some questions you need to answer to verify understanding it), but you should also feel free to examine and play around with the provided code on your own.

Playing a Game

The game.py module provides the Game class that implements a generic, turn-based perfect-information game.

Here is the class definition (which will be explained below): (this is the full code, but with comments removed, see game.py for the commented code)

class Game:
    """
    A Game represents a turn-based game with
    public and complete state (perfect information).
    """

    def __init__(self, players, state):
        """
        Initializes a new Game with the given
        players and initial state.
        """
        # all players must be of type Player
        assert all([isinstance(player, Player) for player in players])
        self._players = players

        # the state must be a GameState
        assert isinstance(state, GameState)
        self._state = state

    def play(self, show_moves = False):
        """
        Play the game.  Each players gets a turn in order,
        until the game is over.

        Returns a pair: (winner, list of moves)
        where each move in moves is a list of the moves
        for each player for that round.
        """
        moves = []
        all_pass = False
        while not all_pass:
            moves.append([])
            all_pass = True

            for player in self._players:
                if self._state.game_over():
                    break

                move = player.turn(self._state)
                moves[-1].append(move)

                if move:
                    assert self._state.legal_move(player, move)
                    self._state.make_move(player, move)
                    all_pass = False

                if show_moves:
                    print("State after " + player.mark() + " moves " \
                          + str(move) if move else "[pass]")
                    print(str(self._state))

        return (self._state.outcome(), moves)

The Game class defines two methods: __init__ initializes a game, recording the players and initial state in instance variables, _players and _state. We use the _ at the beginning of these names to indicate that they are meant to be internal to the class implementation, and should not be used outside it. (Unlike many languages, Python does not provide any enforcement of attribute visibility; it is only by convention that names beginning with _ are not supposed to be used outside the class.)

The play method plays a game, recording the moves. Note that it does not make any assumptions about the actual game or players, just that the players all take turns making moves in order, and the came continues until it is over. It is up to the objects passed in as players and state when the game is created to determine the rules of the game and how it is played.

The play method uses methods provided by the state object, and requires that they have these behaviors (for the game play to work):

It is up to the class that implements the state object, which should be a subclass of GameState to implement these methods to provide the desired behavior of the game.

The play method also uses one method provided by the player object:

It is up to the class that implements the player object to define the turn method to make a good, legal move in the game. (Note that the play method verifies that the move is a legal_move before making it, to catch players cheating.)

To make sure you understand the definition of the play method, answer Problems 1 and 2.

Problem 1. For a standard game of Tic-Tac-Toe on a 3x3 board with 2 players, what is the maximum length of the list of moves returned as the second return value returned by play?

Problem 2. For each of the following games, answer if they could fit into what would be supported by the Game class. If a game does not fit this generic model, explain why.

a. Chess

b. Minesweeper

c. Poker

d. Soccer

State of Play

The GameState class defines a generic game state, but doesn't implement any particular game:

class GameState:
    """
    Generic GameState module. Actual games will need
    to provide a subclass of GameState that actually
    implements these methods.
    """
    def __init__(self):
        pass

    def make_move(self, player, move):
        raise RuntimeException("No generic update implemented!")

    def legal_move(self, player, move):
        raise RuntimeException("No generic legal_move implemented!")

    def game_over(self):
        raise RuntimeException("No generic game_over implemented!")

    def outcome(self):
        raise RuntimeException("No generic outcome implemented!")

    def serialize(self):

        raise RuntimeException("No generic serialize implemented!")

    def __str__(self):
        return ""

This class provides no useful functionality, but is a starting base class for implementing the state of other games. For both Tic-Tac-Toe and Go, the state can be represented as a board, where each position is either empty or contains one object.

We have provided a Board class that implements a generic board (as a subclass of GameState) in board.py (excerpted below, see board.py for the full implementation):

import copy
from gamestate import GameState

class Board(GameState):
    """
    Board game state - a set of positions, each of which can
    contain one object (or nothing).
    """

    def __init__(self, size):
        """
        Initializes an empty board with size squares,
        each containing None.
        """
        self._positions = size * [None]

    def valid_position(self, position):
        return position >= 0 and position < len(self._positions)

    def empty_positions(self):
        return [position for position in range(len(self._positions))
                if not self.has_object(position)]

    def has_object(self, position):
        assert self.valid_position(position)
        return self._positions[position] is not None

    def get_object(self, position):
        """
        Returns the object at position, or None if it is empty.
        """
        assert self.valid_position(position)
        return self._positions[position]

    def remove_object(self, position):
        """
        Removes the object at position from the board.
        """
        assert self.has_object(position)
        self._positions[position] = None

    def place_object(self, position, obj):
        """
        Places the object obj at position.
        Requires position is a valid, empty position on the board.
        """
        assert self.valid_position(position)
        assert not self.has_object(position)
        self._positions[position] = obj

    ... # methods for converting a Board to a string not shown

Note that the Board has no structure - it just represents the state of the game with a flat list, with one element corresponding to each board position.

Many games, including Tic-Tac-Toe and Go, are played on square boards, where it is helpful to think of the positions being organized in a two-dimensional grid. The SquareBoard class inherits from Board to implement a two-dimensional, square board (see squareboard.py for the full implementation). It also defines a Position class for representing a position on a square board with two coordinates.

from board import Board

class Position:
    """
    (x, y) position on a board
    """
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return "(" + str(self.x) + ", " + str(self.y) + ")"

class SquareBoard(Board):
    """
    Square board (NxN positions arranged in a square).
    """

    def __init__(self, N):
        super().__init__(N * N)
        self._size = N

    def _convert_position(self, pos):
        if isinstance(pos, Position):
            return pos.x + (pos.y * self._size)
        else:
            return pos

    def _convert_to_position(self, pos):
        res = Position(pos % self._size, pos // self._size)
        assert self._convert_position(res) == pos
        return res

    def valid_position(self, pos):
        return super().valid_position(self, self._convert_position(pos))

    def empty_positions(self):
        return [self._convert_to_position(pos)
                for pos in super().empty_positions()]

    def has_object(self, pos):
        return super().has_object(self._convert_position(pos))

    def get_object(self, pos):
        return super().get_object(self._convert_position(pos))

    def remove_object(self, pos):
        return super().remove_object(self._convert_position(pos))

    def place_object(self, pos, obj):
        return super().place_object(self._convert_position(pos), obj)

    def __str__(self):            
        return "[" \
               + '\n '.join([' '.join([self.obj_unparse(self.get_object(Position(x, y)))
                                  for x in range(self._size)])    
                        for y in range(self._size)]) \
                    + "]"

Notice how SquareBoard uses super() to access the superclass methods. All of the methods for accessing positions on the square board are implemented by converting the position to the corresponding location on the superclass Board.

Problem 3. Why does _convert_position need to check isinstance(pos, Position)? A good answer would explain how _convert_position might be called with an input that is the number of a square on the superclass Board representation. (Hint: when place_object is invoked on a SquareBoard object, the super().place_object(...) method is called. Inside the Board.place_object method, it calls self.has_object(position). What method will be called when the self object is a SquareBoard? You are encouraged to add tracing prints to the code to verify what is really going on here.)

Noughts and Crosses

To play Tic-Tac-Toe, we need a game state that knows the rules of Tic-Tac-Toe. The provided TicTacToeBoard is a subclass of SquareBoard that encodes the rules of Tic-Tac-Toe.

The rules are simple, other than determining the outcome of the game, which requires checking all vertical, horizontal, and diagonal lines.

class TicTacToeBoard(SquareBoard):
    """
    Board for playing (generalized) Tic-Tac-Toe.
    """

    def __init__(self, N):
        super().__init__(N)

    def possible_moves(self, _player):
        """
        Returns the list of possible moves: all empty squares.
        """
        return self.empty_positions()

    def legal_move(self, player, position):
        """
        A move is legal if the selected position is empty.
        """
        return not self.has_object(position)

    def make_move(self, player, position):
        """
        Places the mark for player at position.
        """
        assert self.legal_move(player, position)
        self.place_object(position, player.mark())

    def game_over(self):
        """
        The game is over if the board is full, or there is a winner.
        """
        return not self.empty_positions() or self.outcome()

    def outcome(self):
        """
        Returns state of game:
            Mark: win for player with that mark
            None: draw

        A player wins the game if they have a line of N of their marks
        (either vertical, horizontal, or diagonal).  If more than one
        player has a winning line, the game is invalid.
        """
        N = self._size

        lines = \
          [[Position(x, y) for x in range(N)] for y in range(N)] + \
          [[Position(x, y) for y in range(N)] for x in range(N)] + \
          [[Position(x, x) for x in range(N)]] + \
          [[Position((N - 1) - x, x) for x in range(N)]]

        winner = None
        for line in lines:
            mark = self.get_object(line[0])
            if mark:
                for position in line:
                    if not self.get_object(position) == mark:
                        mark = None
                        break
                if mark:
                    # fails if there are two winners!
                    assert not winner or winner == mark
                    winner = mark
        return winner

Problem 4. What is the asymptotic running time for the outcome method? Be careful to state any important assumptions you need, and to define any variables you use in your answer.

(Hint 1: your answer should involve N, the dimension of the board.) (Hint 2: If you don't understand the list comprehensions used to initializes lines, try adding some code to print out the value of lines.)

Your Move

Games are not much fun without someone (or at least an object!) to play them.

We have provided several classes for different kinds of players: Player (generic player), RandomPlayer (picks a random legal move), HumanPlayer (requests moves as user input), and LearningPlayer.

Problem 5. Define a play_human_game() function that allows a human to play a 3x3 game of Tic-Tac-Toe against a RandomPlayer. Test out your code (and your Tic-Tac-Toe skills!) by playing a few games.

You should create a new file for your function. You will need to use import statements to import the necessary definitions from other modules. For example, from humanplayer import HumanPlayer will import the HumanPlayer class definition.

Learning to Play

Tic-Tac-Toe is such a small and simple game that it is easy to play optimally by searching the space of all possible games. But, instead of building an optimal player, we will use a generic learning method that can (in theory) be used to play any game.

The LearningPlayer class defines a learning player. The idea is very simple: keep a dictionary that associates game states with outcomes, and after each training game, update the outcomes associated with each game state seen during the game to reflect the result of the game. For example, if the player won the game, all of the states seen during the game led to a win, so are "good" states. (Of course, whether a state is actually good depends on what moves the other player made also, so it is not guaranteed that just because a particular state was seen in a winning game that it is always good.)

The most important code in learningplayer.py is excerpted below (see the file for the full code):

class LearningPlayer(Player):
    """
    Grandmaster of every game (in theory)!
    """

    def __init__(self, name, mark):
        super().__init__(name, mark)

        # each key in training is a serialized game state,
        # the entry is a [win, draw, lose] count for all
        # training games containing that state.
        self._training = {}

    def learn_from_game(self, state, winner, players, moves):
        """
        Learns from the game represented by moves (starting from
        initial state with players given), that had result winner.

        The training data for each state seen in the game is updated
        according to the outcome.
        """
        result = self._game_result(winner) # 0 = win, 1 = draw, 2 = loss
        for rmove in moves:
            for pi in range(len(rmove)):
                pmove = rmove[pi]
                player = players[pi]
                staterep = state.serialize()
                if staterep not in self._training:
                    self._training[staterep] = [0, 0, 0]
                self._training[staterep][result] += 1
                state.make_move(player, pmove)

    def train(self, state, players):
        """
        Trains the player by playing a game with players
        with initial state given by state.
        """
        origstate = state.copy()
        ttg = Game(players, state)
        (winner, moves) = ttg.play()
        self.learn_from_game(origstate, winner, players, moves)

    def turn(self, state):
        moves = state.possible_moves(self)
        if moves:
            bestscore = None
            bestmove = None
            untried_moves = []
            # Try each move, and see how the outcomes are at the resulting state.
            for move in moves:
               newstate = state.copy()
               newstate.make_move(self, move)
               canonical = newstate.serialize()
               if canonical in self._training:
                   outcomes = self._training[canonical]
                   # wins are good, loses are bad (ignore draws)
                   noutcomes = sum(outcomes)
                   score = outcomes[0] - 2 * outcomes[2]
                   if not bestmove or score > bestscore:
                       bestmove = move
                       bestscore = score
               else: # unseen position
                   untried_moves.append(move)

            if bestmove:
                if score > 20:
                    return bestmove
                elif score > 0 and random.random() > 0.2:
                    return bestmove
                elif untried_moves: 
                    return random.choice(untried_moves)
                else:
                    return bestmove
            else:                               
                return random.choice(untried_moves) 
        else:
            return None

Problem 6. Define a train_player(learner, initialstate, opponent, ngames) function that trains a LearningPlayer object by playing ngames against the opponent (a Player object), starting with the initialstate (e.g., TicTacToeBoard(3) to train a player to play tic-tac-toe).

You should test your LearningPlayer object after training by playing 1000 games against a RandomPlayer (or try playing it yourself as a HumanPlayer). You may give your LearningPlayer the advantage of playing first.

Problem 7. Try some experiments with your train_player function to answer (1) how many training games are needed to train a passable tic-tac-toe player? (2) does it make a difference who the player is trained against? (3) (Challenge bonus!) how close to optimal is your trained player?

Walloping AlphaGo

Playing Tic-Tac-Toe isn't very exciting since every 7-year-old knows how to always get at least a draw. But, because of object-oriented programming and the power of inheritance, it doesn't take much to extend our code to make a Go-playing monster.

It should implement the rules of Go as presented in Class 24 (and described in more detail on the Go Wikipedia Page:

The liberties of a stone are the empty squares that are directly adjacent to that square (horizontally or vertically; diagonal does not count). If a stone is adjacent to a stone of its own color, the liberties of the stones are merged. This continues for all adjacent stones of one color, so you can think of these connected stones as one super-stone.

If you are confused about the rules of Go, please ask, and we'll add clarifications here.

Problem 8. Extend the provided code to make a GoBoard game state class, and train a Go player that can play a passable game of Go on a very small board.

For now, we leave it completely up to you to figure out how to do this. But, depending on how it is going, we may add hints and suggestions if this seems too daunting. If you've got here and have no ideas how to proceed, post messages on Slack (#general is fine) asking for help.

Quadruple-Gold-Star Challenge! Make a Go player that can play a competitive (with an amateur human Go player) game of Go on a 9x9 board. A good solution is worth a paid summer position in Dave's research group as well as automatic Red Belt promotion.

Deeper Reading

David Silver, et al. Mastering the Game of Go with Deep Neural Networks and Tree Search, Nature, January 2016.

Although the DeepMind team claims that building a world champion Go player was believed to be decades away, other researchers were not so sure. In this 2007 article, Feng-Hsiung Hsu (who led the Deep Blue project at IBM), predicts "I believe that a world-champion-level Go machine can be built within 10 years" (so by 2017, which is just about spot on!): Cracking GO: Brute-force computation has eclipsed humans in chess, and it could soon do the same in this ancient Asian game, IEEE Spectrum, 1 October 2007.

The Turing Archive has Alan Turing's thoughts on machines playing chess.

Credits: This project is new for cs1120 Spring 2016, and was developed by David Evans.