Understanding Threads + Asynchronous

I have a program that sends GET requests to a large number of URLs (10,000+), and I need it to be as fast as possible. When I first wrote the program, I just put the connections in a for loop, but it was very slow because it had to wait for each connection to complete before continuing. I wanted to make it faster, so I tried using threads, and that made it a little faster, but I'm still not satisfied.

I assume that the correct way to do this, and to do it very quickly, is to use asynchronous connections and connect to all of the URLs at once. Is this the right approach?

Also, I am trying to understand threads and how they work, but I can't seem to get it. The computer I'm working on has a quad-core Intel Core i7-3610QM processor. According to Intel's specification page for this processor, it has 8 threads. Does this mean that I can create 8 threads in a Java application and all of them will run simultaneously? And that with more than 8 there will be no increase in speed?

What exactly is the number next to "Threads" on the Performance tab of the task manager? Currently, my task manager shows more than 1000 threads. Why is this number so high, and how can it even go past 8 if that is all my processor supports? I also noticed that when I tried my program with 500 threads as a test, the number in the task manager increased by 500, but the program ran at the same speed as when I set it to use 8 threads. So, if the number increases with the number of threads I use in my Java application, why is the speed the same?

In addition, I tried to do a small test with threads in Java, but the output does not make sense to me. Here is my test class:

    import java.text.SimpleDateFormat;
    import java.util.Date;

    public class Test {
        private static int numThreads = 3;
        private static int numLoops = 100000;
        private static SimpleDateFormat dateFormat = new SimpleDateFormat("[hh:mm:ss] ");

        public static void main(String[] args) throws Exception {
            for (int i=1; i<=numThreads; i++) {
                final int threadNum = i;
                new Thread(new Runnable() {
                    public void run() {
                        System.out.println(dateFormat.format(new Date()) + "Start of thread: " + threadNum);
                        for (int i=0; i<numLoops; i++)
                            for (int j=0; j<numLoops; j++);
                        System.out.println(dateFormat.format(new Date()) + "End of thread: " + threadNum);
                    }
                }).start();
                Thread.sleep(2000);
            }
        }
    }

It produces output like this:

    [09:48:51] Start of thread: 1
    [09:48:53] Start of thread: 2
    [09:48:55] Start of thread: 3
    [09:48:55] End of thread: 3
    [09:48:56] End of thread: 1
    [09:48:58] End of thread: 2

Why does the third thread start and finish almost immediately, while the first and second each take 5 seconds? If I add three more threads, the same thing happens for every thread above 2.

Sorry if this was a long read, I had a lot of questions. Thanks in advance.

+6
3 answers

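Your processor has 4 physical cores and, with Hyper-Threading, 8 hardware threads (logical cores). These are not the same thing as the threads you create in software. In practice it means that only 8 things can be running at any given instant. It does not mean that you are limited to only 8 threads.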
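As a rough illustration of that distinction, here is a minimal sketch that asks the JVM how many logical processors it sees and then starts several times that many threads anyway. The class name LogicalProcessors and the helper runThreads are made up for this example; Runtime.availableProcessors() is the standard call.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LogicalProcessors {

    /** Starts `count` trivial threads, waits for all of them, and returns how many actually ran. */
    static int runThreads(int count) throws InterruptedException {
        AtomicInteger ran = new AtomicInteger();
        Thread[] threads = new Thread[count];
        for (int i = 0; i < count; i++) {
            threads[i] = new Thread(ran::incrementAndGet);
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return ran.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Number of logical processors visible to the JVM (hardware threads, not software threads).
        int logical = Runtime.getRuntime().availableProcessors();
        System.out.println("Logical processors: " + logical);

        // The OS happily accepts far more threads than that; it just cannot run them all at once.
        int finished = runThreads(logical * 4);
        System.out.println("Threads started and finished: " + finished);
    }
}
```

The first number is a property of the machine; the second is only limited by memory and operating system settings, which is why the task manager can show more than 1000 threads.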

When a thread synchronously opens a connection to a URL, it will often sleep while it waits for the remote server to respond. While that thread is asleep, other threads can do work. If you have 500 threads and all 500 are asleep, you are not using any of your processor cores.

On the flip side, if you have 500 threads and all 500 want to do something, they cannot all run at once. There is a dedicated mechanism for handling this scenario: the operating system (with some help from the processor) has a scheduler that decides which threads actively run on the processor at any given time. Schedulers follow many different rules, and sometimes a degree of randomness is involved. This may explain why, in the example above, thread 3 always seems to finish first. Perhaps the scheduler prefers thread 3 because it was the most recent thread started by the main thread. Sometimes the behavior is simply impossible to predict.

Now to answer the question about performance. If opening a connection never involved sleeping, it would not matter whether you handled things synchronously or asynchronously: you would not get any performance gain beyond 8 threads. In reality, much of the time spent opening a connection is spent sleeping. The difference between asynchronous and synchronous is how the time spent sleeping is handled. In theory, you should be able to get nearly equal performance from the two.

With a multi-threaded model, you simply create more threads than there are cores. When threads go to sleep, they let other threads do work. This can sometimes be easier to manage, because you do not have to write any scheduling or coordination between the threads yourself.
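A minimal sketch of the multi-threaded model follows. The blocking network call is simulated with Thread.sleep, which is an assumption made so the example runs without a network; the names BlockingPoolDemo, fakeBlockingRequest and runWithPool are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BlockingPoolDemo {

    /** Stand-in for a synchronous HTTP request: the thread just sleeps, as it would while waiting on a server. */
    static void fakeBlockingRequest() throws InterruptedException {
        Thread.sleep(100);
    }

    /** Runs `requests` simulated requests on a pool of `poolSize` threads and returns the elapsed milliseconds. */
    static long runWithPool(int poolSize, int requests) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        long start = System.nanoTime();
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                futures.add(pool.submit(() -> {
                    fakeBlockingRequest();
                    return null;
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for each request to finish
            }
        } finally {
            pool.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("1 thread:   " + runWithPool(1, 16) + " ms");
        System.out.println("16 threads: " + runWithPool(16, 16) + " ms");
    }
}
```

Because the simulated requests spend their time asleep rather than computing, the larger pool should finish much sooner even on a machine with far fewer than 16 cores. With CPU-bound work in place of the sleep, that advantage would be expected to disappear past the core count.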

With an asynchronous model, you create only one thread per core. When such a thread would otherwise need to sleep, it does not sleep; instead it must have code that switches over to the next connection. For example, suppose there are three steps to opening a connection (A, B, C):

    // Pseudocode: Connection, its states, and stepA/stepB/stepC are illustrative only.
    while (!connectionsList.isEmpty()) {
        Iterator<Connection> it = connectionsList.iterator();
        while (it.hasNext()) {
            Connection connection = it.next();
            switch (connection.getState()) {
                case READY_FOR_A:
                    // This method should return immediately, and the connection
                    // should go into the WAITING state for some time before
                    // going into the READY_FOR_B state.
                    connection.stepA();
                    break;
                case READY_FOR_B:
                    connection.stepB(); // same immediate-return behavior as above
                    break;
                case READY_FOR_C:
                    connection.stepC(); // same immediate-return behavior as above
                    break;
                case WAITING:
                    // Do nothing, skip over.
                    break;
                case FINISHED:
                    // Remove through the iterator; removing from the list directly
                    // while iterating over it would throw ConcurrentModificationException.
                    it.remove();
                    break;
            }
        }
    }

Note that the thread never sleeps here, so there is no point in having more threads than you have cores. Ultimately, whether to go with a synchronous or an asynchronous approach is a matter of personal preference. Only at the absolute extremes will there be a performance difference between the two, and you would need to spend a long time profiling to reach the point where that is the bottleneck in your application.
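The same idea can be tried out with standard library pieces. The sketch below is not a real HTTP client: the "server wait" is simulated by scheduling a callback on a single-threaded ScheduledExecutorService, and the names SingleThreadAsyncDemo and run are made up for this example. It only aims to show that one thread can have many waits in flight when nothing blocks.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SingleThreadAsyncDemo {

    /**
     * Handles `connections` simulated connections on ONE thread.
     * Each connection "waits" `waitMillis` for its server, but the wait is a
     * scheduled callback rather than a sleeping thread.
     * Returns the total elapsed time in milliseconds.
     */
    static long run(int connections, long waitMillis) throws InterruptedException {
        ScheduledExecutorService loop = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch done = new CountDownLatch(connections);
        Runnable onResponse = done::countDown; // step B: runs when the simulated response "arrives"
        long start = System.nanoTime();
        try {
            for (int i = 0; i < connections; i++) {
                // Step A: returns immediately after registering the callback.
                loop.execute(() -> loop.schedule(onResponse, waitMillis, TimeUnit.MILLISECONDS));
            }
            done.await(); // the caller waits; the loop thread itself never sleeps on a connection
        } finally {
            loop.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        long elapsed = run(500, 100);
        System.out.println("500 simulated connections on one thread: " + elapsed + " ms");
    }
}
```

If each connection blocked the thread, 500 waits of 100 ms would take about 50 seconds on one thread. Here the waits overlap, so the total should be close to a single wait.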

It looks like you are creating a lot of threads and not getting a performance boost. There may be several reasons for this.

  • It is possible that establishing a connection really does keep the thread awake, in which case I would not expect performance to improve beyond 8 threads. I do not think this is likely, though.
  • It is possible that all of the threads share some common resource. In that case, other threads cannot do work because the sleeping thread is holding the shared resource. Is there any object that all the threads share? Does that object have any synchronized methods?
  • Perhaps you have synchronization of your own. This can create the problem described above.
  • Perhaps each thread has to do some setup / allocation work that cancels out the advantage you gain from using multiple threads.
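To make the shared-resource case concrete, here is a small sketch in which every thread waits 100 ms, either while holding a common lock or without it. The wait is again simulated with Thread.sleep, and SharedLockDemo, LOCK and run are names invented for this illustration.

```java
public class SharedLockDemo {

    private static final Object LOCK = new Object();

    /**
     * Starts `threads` workers that each wait 100 ms.
     * If holdLockWhileWaiting is true, each worker waits inside a synchronized
     * block on a shared lock, so the waits cannot overlap.
     * Returns the total elapsed time in milliseconds.
     */
    static long run(int threads, boolean holdLockWhileWaiting) throws InterruptedException {
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    if (holdLockWhileWaiting) {
                        synchronized (LOCK) {
                            Thread.sleep(100); // sleeping while holding the lock blocks everyone else
                        }
                    } else {
                        Thread.sleep(100);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Without shared lock: " + run(5, false) + " ms");
        System.out.println("With shared lock:    " + run(5, true) + " ms");
    }
}
```

With the lock held during the wait, the five workers effectively run one after another, so adding threads buys nothing. That is the pattern to look for if 500 threads perform no better than 8.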

If I were you, I would use the JVisualVM tool to profile your application while it runs with a small number of threads (20). JVisualVM has a nice color-coded thread timeline that shows when threads are running, blocked, or sleeping. This will help you understand the thread / core relationship, since you should see that the number of running threads is no more than the number of cores you have. In addition, if you see a lot of blocked threads, that can lead you to your bottleneck: use JVisualVM to take a thread dump at that moment and see what the threads are blocked on.

+9

Some concepts:

There can be many threads in the system, but only some of them (at most 8 in your case) will be scheduled on the CPU at any given time. Thus, for CPU-bound work, you cannot get more performance than 8 threads running in parallel. In fact, performance is likely to degrade as the number of threads grows, because of the work involved in creating, destroying, and managing threads.

Threads can be in different states: http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/Thread.State.html Of these states, only threads in the RUNNABLE state are eligible for a slice of processor time. The operating system decides how processor time is assigned to threads. In a typical system with 1000 threads, it can be completely unpredictable when a given thread will get processor time and how long it will stay on the CPU.
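The states can be observed directly with Thread.getState(). The following sketch parks a thread on a CountDownLatch and records its state before starting, while waiting, and after finishing; ThreadStateDemo and observe are names chosen for this example.

```java
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;

public class ThreadStateDemo {

    /** Returns the worker's state before start, while it waits on a latch, and after it finishes. */
    static Thread.State[] observe() throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                release.await(); // the thread parks here and uses no CPU
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread.State before = worker.getState(); // not started yet
        worker.start();

        // Poll until the worker has actually parked on the latch.
        while (worker.getState() != Thread.State.WAITING) {
            Thread.sleep(1);
        }
        Thread.State parked = worker.getState();

        release.countDown(); // let the worker continue
        worker.join();
        Thread.State after = worker.getState();

        return new Thread.State[] { before, parked, after };
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(Arrays.toString(observe())); // prints [NEW, WAITING, TERMINATED]
    }
}
```

A thread in WAITING (or TIMED_WAITING, or BLOCKED) is counted by the task manager but is not competing for a core, which is how a machine can carry over 1000 threads with only 8 hardware threads.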

About the problem you are solving:

It seems that you have arrived at the right solution: make parallel, asynchronous network requests. In practice, however, starting 10,000+ threads and that many network connections at the same time can strain system resources, and it may simply not work. There are many suggestions for asynchronous I/O in Java in this post. (Tip: do not look only at the accepted answer.)
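One common way to avoid starting everything at once is to cap the number of tasks in flight with a fixed-size pool. The sketch below simulates each request with a short Thread.sleep and records the peak number of tasks running together; BoundedConcurrencyDemo and maxInFlight are hypothetical names, and the limit of 50 is an arbitrary value for illustration, not a recommendation.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedConcurrencyDemo {

    /** Runs `tasks` simulated requests with at most `limit` in flight; returns the observed peak. */
    static int maxInFlight(int tasks, int limit) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(limit);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                int now = inFlight.incrementAndGet();
                peak.accumulateAndGet(now, Math::max); // remember the highest concurrency seen
                try {
                    Thread.sleep(5); // stand-in for a network request
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                inFlight.decrementAndGet();
            });
        }

        pool.shutdown(); // accept no new tasks; queued ones still run
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Peak concurrent requests: " + maxInFlight(2000, 50));
    }
}
```

All the tasks are queued up front, but the pool never runs more than the chosen limit at a time, so the number of open connections stays bounded however long the URL list is.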

+1

This answer relates more specifically to the general problem of making 10k requests as quickly as possible. I suggest you drop the built-in Java HTTP libraries and use Apache HttpClient instead. Its documentation has several suggestions for maximizing performance that may be helpful. I have also heard that the Apache HttpClient library is faster, lighter, and has less overhead.

0

Source: https://habr.com/ru/post/927892/

