Twist on tic tac toe

I am writing a tictacto program, but this is not your traditional tictacto.

First of all, the board is 4x4, and the way to win is to get 3 types and 1 of your opponents in a row, column or diagonal. So the next win for β€œO” is through the first column:

O|_|X|_ O|X|_|_ O| |_|_ X|_|_|_ 

I am trying to implement a minimax algorithm to give the program a "hard" mode that cannot be beaten.

My problem is that I cannot hope to create a tree with all possible game states, and therefore I have to come up with some kind of function that evaluates the game states that I can generate.

I think my question is, how can I come up with such a function?

+6
source share
2 answers

This game is definitely small enough for brute force.

You can simply list all the states. There are 16 squares with 3 possible values ​​for each square (X, O, or empty).

3 ^ 16 = 43046721, about 43 million.

This means that a table with one byte to describe the gain of each state will be equal to only 43 megabytes.

I would create a function that matches each state with an index between one and 43 million (you only need states, not possible playback orders), basically representing it as a number in base-3 and letting you create a state from the index.

Choose 4 win values, each state can take - win for O, win for X, not winning and unknown.

Highlight a buffer 43046721 in length to save the winnings in each game state.

Iterate over all index numbers, indicating won states. Then go through and iteratively fill the winnings of each of the other states, if they are known (check all successor states based on who does it). It will take no more than 16 iterations on a set of indices, so I see no reason why brute force does not work here.

There is an optimization, for example, the use of symmetry, using the fact that all states with n parts are reduced with n + 1 pieces, etc., but I don’t think you need it at first.

+4
source

A heuristic function for a game is an evaluation that evaluates the state of a game. Here - the state basically consists of two parts: (1) The board itself. (2) Whose turn.

Possible heuristic function:

  • The maximum number of X (or O, according to the rated player) per row / column / diagonal
  • The number of "almost winning" lines (with one missing movement) - can be affective to maximize the number of winning opportunities.

I guess you can think of more heuristics.
You can combine your various heuristics into one β€œbig” heuristic function as follows:

 a_1 * h_1(state) + a_2 * h_2(state) + ... + a_n * h_n(state) 

The hard part will be to find out the grades for a_1, ..., a_n - this can be done in different ways - one of them is monte carlo training - which basically means: create different agents with different values a_1,..,a_n , start a tournament between them, and when the tournament is completed - adjust the scales according to the winners - and repeat the process while you still have it (this is the algorithm at any time ).
Once you're done, use the lessons learned for your final agent.

PS The number of possible games is ~ 16! (you need to determine the order of the selected squares - he chooses how the rest of the game will end) - ask yourself if it is "little" enough to be developed in your limitations - or is it too much, and a heuristic solution is really necessary.

+2
source

All Articles