Why are threads so expensive that non-blocking, event-driven IO wins in benchmarks?

I recently started learning node.js, a JavaScript runtime on top of V8, known for its non-blocking IO and impressive speed.

As far as I understand, node does not wait for an IO response; instead it runs an event loop (similar to a game loop) that keeps checking for incomplete operations and continues/completes them as soon as the IO answers. Benchmarks comparing node to Apache HTTPD show node to be significantly faster while using less memory.

Now, if you read about Apache, you will find that it uses one thread per connection, which supposedly slows it down significantly, and here is my question:

If you compare threads with what node does inside its event loop, you begin to see similarities: both are abstractions of an incomplete task waiting for a resource to respond, both check regularly whether the operation has completed, and in between they should not occupy the CPU (at least, I assume a good blocking API sleeps for a few milliseconds before re-checking).

So where is the startling, critical difference that makes threads so much worse?

2 answers

The difference here is context switching. To switch threads, the OS needs:

  • saving the instruction pointer (where execution currently stands)
  • saving the CPU registers (this may be unnecessary if the thread made a blocking call, but is required if it was preempted)
  • swapping call stacks. Even if both stacks live in the same virtual memory space, this is at least one write and a few reads, and it applies even to green threads (fibers).
  • when switching to a thread of another process: a trap into kernel mode, an update of the virtual memory tables, and a return to user mode.

In the case of an event queue:

  • the state is updated. This has to happen anyway.
  • the event handler returns. Instead of swapping call stacks, the current call stack is simply unwound.
  • the event queue is checked for pending requests. If none is pending, the application waits. This can be done by sleeping and polling (as the OP suggests) or, better, by making a blocking call on the event queue. If the event queue (for example, a set of TCP sockets) is managed by the OS, the OS takes care of notifying the application of new events (a socket has received more data).

Under heavy load, the only bookkeeping the event-queue approach does per task is returning from a handler, reading the queue, and calling the next handler. The threaded approach adds the overhead of switching threads on top of that.

In addition, as PST mentioned, the threaded approach introduces the need for locking. A lock itself is cheap, but waiting for a resource held by another thread costs an extra context switch, since the waiting thread cannot continue. A thread may even be switched in just to acquire a lock, only to be switched out a few clock cycles later because it also needs another, contended resource. Compare how much work the OS does (reading the thread queue and swapping call stacks, at the very least) with how much the event handler does (returning from one call and making another).
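
The flip side of this in a single-threaded event loop: shared state needs no locks at all, because handlers can never interleave mid-operation. A sketch (the counter and handler names are made up for illustration):

```javascript
// Because every callback runs to completion on the same thread,
// this counter needs no mutex: two "concurrent" handlers can never
// interleave in the middle of the read-modify-write.
let requestsServed = 0;

function onRequest() {
  // In a threaded server this increment would be a critical section
  // guarded by a lock; here it is safe as-is.
  requestsServed += 1;
}

// Schedule many "concurrent" handlers; they still run one at a time.
for (let i = 0; i < 1000; i++) {
  setImmediate(onRequest);
}
```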


In one respect, this depends on the thread implementation specific to the language. In general, though, it is creating a thread that is the expensive part, not running one. That is why some environments (such as .NET) keep a thread pool of threads simply lying around, so you can grab one that is essentially already created, which reduces the cost.
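
The pooling idea can be sketched in a few lines. This is a generic object-pool sketch, not .NET's ThreadPool or Node's worker_threads; the "workers" are plain objects standing in for expensive-to-create resources:

```javascript
// A toy worker pool: pay the (expensive) construction cost up front,
// then reuse workers instead of building a new one per task.
class WorkerPool {
  constructor(size, makeWorker) {
    // All workers are created once, when the pool is built.
    this.idle = Array.from({ length: size }, makeWorker);
  }

  acquire() {
    // Hand out an already-built worker; null means the pool is empty.
    return this.idle.pop() ?? null;
  }

  release(worker) {
    this.idle.push(worker);   // returned workers are reused, not rebuilt
  }
}
```

Acquiring and releasing never triggers new construction, which is exactly the cost a thread pool avoids.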

The problem with threads, according to my professor, is also that every language has an equivalent of Thread.Yield(), but nobody actually uses it; so every thread you encounter is extremely aggressive in its scheduling, which sets off all kinds of wars between mutexes and semaphores. Some threads never get to run at all because of the level of aggression involved, which is a problem in itself.

Threads do have the advantage that they offload work from other loops, such as the GUI loop, improving responsiveness. Events, as far as I know, still execute on a single thread (unless stated otherwise).

