Can an ANN learn to play tic-tac-toe with TensorFlow?

I am experimenting with ANNs as tic-tac-toe players using TensorFlow Learn (formerly Scikit Flow).

The simulator code is here. My current simple NN-based player wins about 90% of its games against a random player, but it cannot defend at all, because I don't know how to tell the NN what it should not do (i.e., that it must block when the opponent has two in a row). In addition, it trains only on the final board state and the winning move.
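For context, a minimal self-contained sketch of the setup (random vs. random for simplicity; the function names and win check are my simplifications of the linked simulator, with the board held as a 3x3 array of 0, 1, -1):

import numpy as np

# All eight winning lines: 3 rows, 3 columns, 2 diagonals
WIN_LINES = [[(r, c) for c in range(3)] for r in range(3)] + \
            [[(r, c) for r in range(3)] for c in range(3)] + \
            [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]

def winner(board):
    for line in WIN_LINES:
        s = sum(board[r, c] for r, c in line)
        if abs(s) == 3:
            return int(np.sign(s))   # 1 or -1
    return 0                         # no winner (yet)

def play_random_game():
    board = np.zeros((3, 3), dtype=int)
    history, player = [], 1
    for move in np.random.permutation(9):   # random legal move order
        r, c = divmod(int(move), 3)
        board[r, c] = player
        history.append(board.copy())
        if winner(board) != 0:
            break
        player = -player
    return history, winner(board)

history, result = play_random_game()
print('winner:', result, 'moves played:', len(history))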

Questions:

  • What would be the best approach to solving this problem (using ANNs and the zero-sum nature of the game)?

  • I guess an RNN / LSTM will help. Here is a small example from the TensorFlow Learn website:

classifier = skflow.TensorFlowRNNClassifier(
    rnn_size=EMBEDDING_SIZE,
    n_classes=15,
    cell_type='gru',
    input_op_fn=input_op_fn,
    num_layers=1,
    bidirectional=False,
    sequence_length=None,
    steps=1000,
    optimizer='Adam',
    learning_rate=0.01,
    continue_training=True)

But I don't understand how this would work for the tic-tac-toe example. What would the rnn_size=EMBEDDING_SIZE and input_op_fn parameters be?
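From the text-classification example in the TensorFlow Learn repo, my current understanding is that input_op_fn has to turn the input batch into a list of per-time-step tensors for the RNN. Here is a sketch of what I imagine it could look like for tic-tac-toe, assuming each time step is one flattened 3x3 board (9 values), shorter games are zero-padded to MAX_MOVES = 9 steps, and n_classes becomes 9 (one class per cell). MAX_MOVES is my own name, and tf.split / tf.squeeze are used with the TF 0.x signatures that skflow was built against:

import tensorflow as tf

MAX_MOVES = 9  # a game has at most 9 moves; shorter games would be zero-padded

def input_op_fn(x):
    # x: [batch_size, MAX_MOVES, 9] -- one flattened 3x3 board per time step.
    # The RNN classifier expects a list of MAX_MOVES tensors,
    # each of shape [batch_size, 9], one per time step.
    steps = tf.split(1, MAX_MOVES, x)  # MAX_MOVES tensors of [batch, 1, 9]
    return [tf.squeeze(step, squeeze_dims=[1]) for step in steps]

If I read the example correctly, rnn_size would then just be the hidden-state size of the GRU cell (the text example happens to reuse EMBEDDING_SIZE for it), not anything tic-tac-toe specific.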

Edit: The training code is as follows (Player source):

import numpy as np

def train(self, history):

    X = np.zeros((len(history), 18))
    y = np.zeros((len(history), 1))
    i = 0

    for game in history:

        # Train only on wins of player A (+1 values)
        if game.state == game.WIN_P1:

            # Use the two board states that preceded the winning move
            X[i] = np.concatenate([
                game.history[-2],
                game.history[-3]
            ]).flatten()

            y[i] = game.last_move
            i += 1

        # TODO: How can we train lost games with a DNN classifier?
        # if game.state == game.WIN_PMINUS1:

    # Drop the unused rows so we don't fit on all-zero samples
    self.classifier.fit(X[:i], y[:i])

So, a row in X contains the last two board states of a game (3x3 numpy arrays containing 0, 1, -1 for empty, player 1, and player 2, respectively), and y contains the last move (the one that led to the win).
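As a concrete example of this encoding (the boards and the move index below are made up for illustration; I assume moves are cell indices 0..8):

import numpy as np

# Two consecutive board states from a hypothetical game won by player 1
second_last = np.array([[ 1, -1,  0],
                        [ 0,  1, -1],
                        [ 0,  0,  0]])
third_last  = np.array([[ 1, -1,  0],
                        [ 0,  1,  0],
                        [ 0,  0,  0]])

# One training row: both boards flattened into 18 features
x_row = np.concatenate([second_last, third_last]).flatten()
print(x_row.shape)   # (18,)

# The label is the winning move as a cell index 0..8
# (here player 1 completes the diagonal by playing cell 8)
y_row = 8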
