I am experimenting with ANNs as tic-tac-toe players using TensorFlow Learn (formerly Scikit Flow).
The simulator code is here. My current simple NN-based player wins about 90% of its games against a random player, but it cannot defend at all, because I don't know how to tell the NN what it should not do (moves that are wrong when the opponent has two in a row). In addition, it trains only on the last state of the board and the winning move.
Questions:
What would be the best approach to solving this problem (using an ANN and zero knowledge of the game rules)?
I guess an RNN/LSTM NN would help. Here is a small example from the TensorFlow Learn website:
classifier = skflow.TensorFlowRNNClassifier(
    rnn_size=EMBEDDING_SIZE, n_classes=15, cell_type='gru',
    input_op_fn=input_op_fn, num_layers=1, bidirectional=False,
    sequence_length=None, steps=1000, optimizer='Adam',
    learning_rate=0.01, continue_training=True)
But I don't understand how this would work for the tic-tac-toe example. What would the rnn_size=EMBEDDING_SIZE and input_op_fn parameters be?
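Here is my current guess at what those two parameters would have to do for tic-tac-toe; this is only a sketch of the idea, not something I have run against the simulator, and part of my question is whether this direction even makes sense. As far as I can tell, rnn_size is just the size of the RNN's hidden state (in the website example it happens to equal the word-embedding size), and input_op_fn is the hook that turns the flat feature matrix into a list of per-time-step tensors. The SEQ_LEN, BOARD_SIZE, HIDDEN_SIZE names, the padding of every game to nine moves, and n_classes=9 (one class per cell) are my own assumptions:

import numpy as np
import tensorflow as tf
from tensorflow.contrib import learn as skflow  # in older versions: import skflow

SEQ_LEN = 9       # assumption: pad every game to at most 9 moves
BOARD_SIZE = 9    # a flattened 3x3 board per time step
HIDDEN_SIZE = 32  # assumption: hidden-state size, playing the role of EMBEDDING_SIZE

def input_op_fn(X):
    # X arrives as one flat row per game: [batch, SEQ_LEN * BOARD_SIZE].
    # The RNN expects a Python list with one [batch, BOARD_SIZE] tensor per move.
    boards = tf.reshape(X, [-1, SEQ_LEN, BOARD_SIZE])
    # Note: tf.split's argument order changed between TF 0.x and 1.x;
    # this is the old (0.x) form used in the skflow examples.
    steps = tf.split(1, SEQ_LEN, boards)
    return [tf.squeeze(step, squeeze_dims=[1]) for step in steps]

classifier = skflow.TensorFlowRNNClassifier(
    rnn_size=HIDDEN_SIZE,   # size of the GRU hidden state, not an embedding here
    n_classes=9,            # assumption: one class per board cell the player could move into
    cell_type='gru',
    input_op_fn=input_op_fn,
    num_layers=1,
    steps=1000,
    optimizer='Adam',
    learning_rate=0.01)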
Edit: the training code looks like this (Player source):
def train(self, history):
    # One training row per recorded game: 18 features (two flattened 3x3 boards)
    # and a single label (the winning move).
    X = np.zeros((len(history), 18))
    y = np.zeros((len(history), 1))
    i = 0
    for game in history:
        # Only games won by player 1 are used as training examples.
        if game.state == game.WIN_P1:
            # Concatenate the last two board states before the winning move
            # into one 18-element feature vector.
            X[i] = np.concatenate([
                game.history[-2],
                game.history[-3]
            ]).flatten()
            y[i] = game.last_move
            i += 1
    self.classifier.fit(X, y)
So each row of X contains the last two states of the game (3x3 numpy arrays containing 0, 1, and -1 for empty, player 1, and player 2, respectively), and y contains the last move (the one that led to the win).
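To illustrate the data format, here is a tiny standalone example of how one (X, y) row is built under that scheme. The specific board position and the encoding of last_move as a flat cell index 0-8 are my assumptions for illustration, not necessarily what the simulator actually does:

import numpy as np

# Hypothetical end of a game won by player 1 (1 = player 1, -1 = player 2, 0 = empty).
# State just before the winning move (game.history[-2]):
prev_state = np.array([[ 1,  1,  0],
                       [-1, -1,  0],
                       [ 0,  0,  0]])
# State one move earlier (game.history[-3]):
earlier_state = np.array([[ 1,  1,  0],
                          [-1,  0,  0],
                          [ 0,  0,  0]])

# One row of X: both boards concatenated and flattened into 18 features.
x_row = np.concatenate([prev_state, earlier_state]).flatten()
assert x_row.shape == (18,)

# The label: the winning move, here assumed to be encoded as the flat index of the
# top-right cell (2), which completes player 1's top row.
y_row = 2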