How many threads are suitable for tic-tac-toe using minimax?

Take the 5x5 tic-tac-toe example. Say this is my AI. Then

  • I make 25 moves (in each cell in principle, of course, if it's legal to move),
  • create a thread for each movement (25 threads total),
  • calling a minimax function for each move made,
  • then when all the results come from each thread,
  • Compare the scores and select the move with the best result.

Here are my questions:

  • Is it efficient to use 25 threads? What does using 25 threads mean?

  • Is it 25 times faster (most likely not)? What does it depend? Of course, on a computer, but how can I find out how many threads can be used based on computer resources?

  • What happens if I use too many threads (I don’t think anything ...)?

Is my idea a good one? Thanks.

+7
java algorithm minimax
source share
6 answers

For a typical computing-related application, a good rule is to use as many threads as you have hardware cores (or hyper-threads). Using more threads than cores will not make your application run faster. Instead, it will cause your application to use more memory than necessary. Each thread usually has a stack of 0.5 to 1 MB ... depending on your hardware and version of Java. If you create too many threads, additional memory usage will lead to a significant performance hit; i.e. more threads => slow program!

Another thing to keep in mind is that Java threads are very expensive to build on a typical JVM. Therefore, if a thread does not work enough (in its life), there is a risk that you spend more time creating threads than you get using several cores in the calculation.

Finally, you may find that the work does not spread evenly across all threads, depending on your minmax algorithm ... and the state of the game.


If I tried to implement this, I would start by implementing as a single-threaded application, and then:

  • compare it to find out how long it will take to calculate more with a batch run,
  • profile to get rid of any bottlenecks
  • repeat the test to see if it is enough already.

If and only if he needs to go faster, I would then study the code and (if necessary) add some monitoring to see how to break the calculation into large enough pieces that will be executed in parallel.

Finally, I would use these results to develop and implement a multi-threaded version.

I would also look at alternatives ... for example using Java 7 fork / join instead of threads.


To answer your direct questions:

Is it efficient to use 25 threads?

Probably not. This would be effective if you had so many cores (unlikely!). And even then you will only get good acceleration from using a large number of threads, if you get more by running parallel operations than you lose because of the overhead associated with the stream. (In other words, it depends on how efficiently you use these threads.)

What does using 25 threads mean?

I assume that you mean that you created and started 25 threads, either explicitly or using some existing thread pool implementation.

But the bottom line is that if you have (say) 4 cores, then no more than 4 of these 25 threads can be executed at a time. The remaining threads will wait ...

Is it 25 times faster (most likely not)? What does it depend? Of course, on a computer, but how do you know how many threads can be used based on computer resources?

The main factor limiting performance is the number of cores. See above.

What happens if I use too many threads (I don’t think anything ...)?

Too many threads means that you are using more memory and that your application is running slower due to competition in memory bandwidth, competition for pages of physical memory, and additional garbage collection. These factors are application and platform dependent and difficult to quantify; that is, to predict or measure.

Depending on the nature of your application (that is, how you implement the algorithms), too many threads can lead to additional lock conflicts and thread context switching. It will also make your application slower.

It is impossible to predict what will happen without seeing your actual code. But the number of cores gives a theoretical upper bound on how much accelerates. If you have 4 cores, then you cannot get more than 4x speedup with multi-threaded access.

+12
source share

So, the answers to threading are given in order, but it seemed to me that they were overlooking the alpha-beta function for minimax search.

If you start a thread for each "next transition" from your current position, then the communication of these threads with each other is slower and painful to write correctly. But, if they can’t talk to each other, then you won’t get the depth increase that comes from trimming alpha-beta until one level goes down.

This will act against the effectiveness of the result.

For general cases of improving computation time, the best case tends to be 1 thread per core, or simply assigning tasks to a thread if they all have the same time (for example, matrix multiplication) or have a “set” of tasks, with each thread capturing the next non-running when he finishes his current task. (this has some locking tasks, but if they are small compared to the resolution cost, this is very effective).

So, for a quad-core system and ~ 25 natural tasks, you can hope for acceleration in the range of 3.5-4x. (you would do 4 in parallel ~ 5 times, and then end randomly). But in the minimax case, you have lost the alpha-beta-trimming aspect, which, as I understand it, reduces the "effective width" from N to about sqrt (N). For the case of ~ 25, this means an effective branching ratio of 5. This means that using 4 cores and skipping pruning for the first level can hurt you.

So where are we being left?

  • Unsubscribe from mutli-thread. or,
  • based on available cores. Up to 4 times faster, as well as up to sqrt (25) - 5 times slower. or,
  • go multi-thread, but distribute beta versions to your threads. This will require some locking code, but hopefully it will be too expensive. You decrease alpha-beta trimming efficiency, since you will look for subtrees that you would not look in the strict left-right passage, but any thread that happens when searching for redundant areas is still a little worse than the kernel does nothing. Thus, redundant searches should be more than offset by additional useful work. (but it’s much harder to code then a simple stream mapping & lt - →). The real problem here may be to be / find someone who is really experiencing both alpha beta pruning and multithreading for speed. This will not hit me like code that I would trust many people to write correctly. (for example, I had written many multi-threaded programs and several minimax searches at the time, but I don’t know about the top of the head if you need to distribute beta versions or alpha or both between streams to search from the top node).
+3
source share

Like all my friends said they use as many threads as your computer has capacity.

but adding them to you, you should go with the improvement of the algorithm.

for example, in 5x5 tic tac toe both get 12 or 13 moves. therefore, the number of possible movements is nCr (combination equation) base 25C12 = 5,200,300. so now you have reduced the number of threads that are now going to the best choice, you have a better way to find the best solution, only 12 (position to win) and 12, to lose the worst state, everyone else is a jitter condition. so now you can just check these 12 conditions from the threads and leave an extra combination with the calculations you need to create 12! * 12 threads, which are very low compared to 25 !.

therefore, your thread number will decrease, you can think about it to reduce the number of threads.

when as you move your movements you can go with alpha-beta cropping so you can improve your algorithm.

+3
source share

If you use streams, then to prevent memory loss, simply use them for the first mini-max calls, and then combine the result of the stream to get the result. This is a drain if you use 25 threads or something so large because the available cores are smaller than what you can do is plan only the thread threads, equivalent to the available cores at a time in different states, and combine all the results in end.

Here is the pseudo code: -

int miniMax(State,Player,depth) { // normal minimax code } State ParaMiniMax(State,Player) { int totalThreads = Runtime.getRuntime().availableProcessors()); NextStates = getNextStates(State); while(NextStates.size()>0) { k = totalThreads; while(k>0 && NextStates.size>0) { //Schedule thread with nextState. with run calling miniMax with other player //Store (score,state) in Result List k--; NextStates.removeTop(); } wait(); // waits for threads to complete } if(player==max) { return(maxScore(Result).State); } else return(minScore(Result).State); } 
+1
source share

You should use only the number of threads equal to the number of cores that the machine has. Scheduling tasks for these threads is another matter.

0
source share

Consider the symmetry of your problem. In fact, there is only a very limited number of “unique” initial movements - the rest is the same, but for reflection or rotation (therefore, of the same strategic importance). The unique steps for a 5x5 board are:

 xxx.. .xx.. ..x.. ..... ..... 

Or just 6 initial steps. Bam - you just reduced the complexity to> 4x without threads.

As others said, more threads than you have cores, usually does not help to speed up if individual threads do not spend time "waiting" - on inputs, memory access, other results. Perhaps it would be very useful to start six threads.

To convince you of symmetry, I mark equivalent positions with the same number - see if you agree

 12321 24542 35653 24542 12321 

This is the same when you rotate a few 90 degrees or reflect a diagonal or left right up.

PS I understand that in fact you are not answering the question you asked, but I believe that it very directly affects your main question - "how can I effectively solve 5x5 tic-tac-toe completely." Therefore, I will not be upset if you choose a different answer, but I hope you will accept my advice.

0
source share

All Articles