When is a thread pool used?

So, my understanding of how Node.js works is this: it has a single listener thread that receives an event and then delegates it to a worker pool. The worker thread notifies the listener when it is done, and the listener then returns the response to the caller.

My question is this: if I stand up an HTTP server in Node.js and call sleep on one of my routed path events (for example, "/test/sleep"), the whole system comes to a halt. Even the single listener thread. But my understanding was that this code runs in the worker pool.

Now, by contrast, when I use Mongoose to talk to MongoDB, a database read is an expensive I/O operation. Node seems to be able to delegate the work to a thread and receive a callback when it completes; the time taken to load from the database does not seem to block the system.

How does Node.js decide whether to use a thread pool thread or the listener thread? Why can't I write event code that sleeps and only blocks a thread pool thread?

+70
events
Mar 25 '14 at 19:20
3 answers

Your understanding of how node works is not correct, but it is a common misconception, because the reality of the situation is actually fairly complex and is usually boiled down to pithy little phrases like "node is single-threaded" that oversimplify things.

For the moment, we will ignore explicit multi-processing / multi-threading through cluster and webworker-threads, and just talk about typical non-threaded node.

Node runs in a single event loop. It is single-threaded, and you only ever get that one thread. All of the javascript you write executes in this loop, and if a blocking operation happens in that code, it blocks the entire loop and nothing else will happen until it finishes. This is the single-threaded nature of node that you hear so much about. But it is not the whole picture.
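
To make the blocking behaviour concrete, here is a minimal sketch. The busyWait helper is hypothetical (it is not a Node API); it simply spins on the CPU. A timer that is already due cannot fire until the synchronous code has finished.

```javascript
// Hypothetical helper: spins on the CPU for `ms` milliseconds.
// It never yields, so it blocks the single JavaScript thread.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // burn CPU
  }
}

const events = [];

// This timer is due almost immediately...
setTimeout(() => {
  events.push('timer fired');
  console.log(events);
}, 0);

// ...but it cannot run while this synchronous code holds the loop.
busyWait(50);
events.push('busyWait finished');
```

The timer callback only gets its turn once the script's synchronous code has returned control to the event loop, which is why 'busyWait finished' is recorded first.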

Certain functions and modules, usually written in C/C++, support asynchronous I/O. When you call these functions and methods, they internally manage passing the call on to a worker thread. For instance, when you use the fs module to request a file, the fs module passes that call on to a worker thread, and that worker waits for the response, which it then hands back to the event loop that has been churning on without it in the meantime. All of this is abstracted away from you, the node developer, and some of it is abstracted away from the module developers through the use of libuv.
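
A small sketch of the fs case described above, assuming a scratch file in the OS temp directory (the file name is made up for the example). The call to fs.readFile returns straight away, and the callback is run on the event loop later, once the data is available.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical scratch file so the example is self-contained.
const file = path.join(os.tmpdir(), `threadpool-demo-${process.pid}.txt`);
fs.writeFileSync(file, 'hello from disk');

const order = [];

// The read is handed off by the fs module; the callback is queued back
// onto the event loop once the data is available.
fs.readFile(file, 'utf8', (err, data) => {
  if (err) throw err;
  fs.unlinkSync(file); // clean up the scratch file
  order.push(`read finished: ${data}`);
  console.log(order);
});

// This line runs first: readFile returned without waiting for the disk.
order.push('readFile returned');
```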

As Denis Dollfus pointed out in the comments (from this answer to a similar question), the strategy libuv uses to achieve asynchronous I/O is not always a thread pool; for the http module in particular, a different strategy appears to be used at this time. For our purposes, the important points are how the asynchronous context is achieved (by using libuv) and that the thread pool maintained by libuv is one of several strategies that library offers for achieving asynchronicity.




On a mostly related tangent, there is a much deeper analysis of how node achieves asynchronicity, along with some related potential problems and how to deal with them, in this excellent article. Most of it expands on what I have written above, but it additionally points out:

  • Any external module that you include in your project that makes use of native C++ and libuv is likely to use the thread pool (think: database access)
  • libuv has a default thread pool size of 4 and uses a queue to manage access to the thread pool. The upshot is that if you have 5 long-running database queries all going at the same time, one of them (and any other asynchronous action that relies on the thread pool) will be waiting for those queries to finish before it even gets started.
  • You can mitigate this by increasing the size of the thread pool through the UV_THREADPOOL_SIZE environment variable, so long as you do it before the thread pool is required and created: process.env.UV_THREADPOOL_SIZE = 10;
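
A rough way to see the queueing effect from the list above. This sketch assumes crypto.pbkdf2 as a stand-in for "a long-running task that uses the thread pool" (it is not a database query), and the iteration count is arbitrary. With the pool size set to 4, one would typically expect the fifth task to finish noticeably later than the first four, though the exact timings depend on the machine.

```javascript
// Must be set before the thread pool is first used. Setting it in the
// shell before starting node is the safer option.
process.env.UV_THREADPOOL_SIZE = '4';

const crypto = require('crypto');

const TASKS = 5;
const started = Date.now();
const finished = [];

for (let task = 1; task <= TASKS; task++) {
  // pbkdf2 with a callback does its hashing on a thread pool thread.
  crypto.pbkdf2('secret', 'salt', 100000, 64, 'sha512', (err) => {
    if (err) throw err;
    finished.push({ task, ms: Date.now() - started });
    if (finished.length === TASKS) {
      console.table(finished); // compare the completion times
    }
  });
}
```

Raising the value (for example to '10') and re-running should let all five tasks start at once, at the cost of more threads competing for the same CPU cores.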



If you want traditional multi-processing or multi-threading in node, you can get it through the built-in cluster module or various other modules such as the aforementioned webworker-threads, or you can fake it by chunking up your work manually and using setTimeout, setImmediate or process.nextTick to pause the work and continue it in a later loop iteration, so that other pending work gets a chance to complete (but this is not recommended).
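
For illustration only, here is a sketch of the "fake it" approach using setImmediate. The function name sumInChunks and the chunk size are made up for the example; the point is that each chunk yields back to the event loop before the next one runs.

```javascript
// Hypothetical example: sum the integers 0..n-1 without holding the
// event loop for the whole computation.
function sumInChunks(n, chunkSize, done) {
  let i = 0;
  let total = 0;

  function runChunk() {
    const end = Math.min(i + chunkSize, n);
    for (; i < end; i++) {
      total += i;
    }
    if (i < n) {
      setImmediate(runChunk); // yield, then continue in a later loop iteration
    } else {
      done(total);
    }
  }

  setImmediate(runChunk); // always asynchronous, even for small inputs
}

let result = null;
sumInChunks(1000000, 100000, (total) => {
  result = total;
  console.log('sum:', total);
});
console.log('scheduled; result so far:', result);
```

setImmediate is used here rather than process.nextTick because nextTick callbacks run before the loop moves on to pending I/O, so a long chain of them can still starve other work.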

Please note: if you are writing long-running / blocking code in javascript, you are probably making a mistake. Other languages will perform much more efficiently.

+152
Mar 25 '14 at 19:44

So, my understanding of how Node.js works is this: it has a single listener thread that receives an event and then delegates it to a worker pool. The worker thread notifies the listener when it is done, and the listener then returns the response to the caller.

This is not really accurate. Node.js has one and only one "worker" thread that executes javascript. There are threads inside node that handle I/O processing, but thinking of them as "workers" is a misconception. There is really just I/O handling and a few other details of node's internal implementation, and as a programmer you cannot influence their behavior other than through a few miscellaneous parameters such as MAX_LISTENERS.

My question is this: if I stand up an HTTP server in Node.js and call sleep on one of my routed path events (for example, "/test/sleep"), the whole system comes to a halt. Even the single listener thread. But my understanding was that this code runs in the worker pool.

JavaScript has no sleep mechanism. We could discuss this more concretely if you posted a code snippet of what you think "sleep" means. There is no function you can call to simulate something like time.sleep(30) in python, for example. There is setTimeout, but that is fundamentally NOT sleep. setTimeout and setInterval explicitly release, rather than block, the event loop, so that other bits of code can execute on the main execution thread. The only thing you can do is busy-loop the CPU with in-memory computation, which will indeed starve the main execution thread and render your program unresponsive.
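
To illustrate the "release, rather than block" point, here is a sketch of a timer-based pause. The sleep and handler names are hypothetical; sleep is just a promise wrapped around setTimeout. Two "handlers" are started back to back, and the shorter one finishes first because neither holds the event loop while it waits.

```javascript
// Hypothetical helper: resolves after `ms` milliseconds without blocking.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

const log = [];

async function handler(name, ms) {
  log.push(`${name} start`);
  await sleep(ms); // gives the event loop back while waiting
  log.push(`${name} end`);
}

// Both handlers start before either one finishes.
const all = Promise.all([handler('slow', 100), handler('fast', 10)]);

all.then(() => {
  console.log(log); // 'fast end' is logged before 'slow end'
});
```

Had the handlers spun in a busy loop instead, the second one could not even have started until the first was done.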

How does Node.js decide whether to use a thread pool thread or the listener thread? Why can't I write event code that sleeps and only blocks a thread pool thread?

Network IO is always asynchronous. End of story. Disk IO has both synchronous and asynchronous APIs, so there is no "decision" being made. Node.js will behave according to the core API functions you call: sync versus normal async. For example: fs.readFile vs fs.readFileSync. For child processes, there are likewise separate child_process.exec and child_process.execSync APIs.
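
A side-by-side sketch of the two fs calls named above, again assuming a scratch file in the OS temp directory (the file name and contents are made up). The sync variant hands the data back as its return value; the async variant returns nothing useful and delivers the data through the callback later.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical scratch file so the example is self-contained.
const file = path.join(os.tmpdir(), `sync-vs-async-${process.pid}.txt`);
fs.writeFileSync(file, 'config=1');

// Synchronous API: blocks the event loop until the data is in memory,
// then returns it directly.
const syncData = fs.readFileSync(file, 'utf8');

// Asynchronous API: returns undefined immediately; the data arrives
// later through the callback.
let asyncData = null;
const returned = fs.readFile(file, 'utf8', (err, data) => {
  if (err) throw err;
  asyncData = data;
  fs.unlinkSync(file); // clean up the scratch file
  console.log({ syncData, asyncData });
});
```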

The rule of thumb is to always use the asynchronous APIs. The valid reasons to use the sync APIs are initialization code in a network service, before it starts listening for connections, and simple scripts that do not accept network requests, such as build tools and that kind of thing.

+13
Mar 25 '14 at 19:38

This misunderstanding is simply the difference between pre-emptive multitasking and cooperative multitasking...

The sleep shuts down the entire carnival because there is really only one line for all the rides, and you have closed the gate. Think of it as "a JS interpreter and some other things" and ignore the threads... as far as you are concerned, there is only one thread, ...

... so don't block it.

0
Apr 02 '17 at 22:56