Linux (Gentoo x64), C++. I have a daemon application, several instances of which run on the same machine. The application is multithreaded. For some time I have been observing strange delays in its execution.
After adding some debugging code, I found something strange: several instances of the daemon literally block at the same moment, apparently because of some external cause. To keep things simple, the code path is this:
- record the time (t1)
- lock a mutex
- call std::list::push_back() / pop_back() (i.e., very simple computation)
- unlock the mutex
- record the time (t2)
From time to time I clearly see that this sequence, executed by several independent (!) processes, blocks at step 2 (or possibly at step 4) for far longer than the trivial work in step 3 could take (for example, 0.5 to 1.0 seconds). As evidence, t2 in the logs is practically identical for all processes (differing by a few microseconds). Threads of different processes enter the section at noticeably different times (t1 clearly differs by 0.5 to 1 second), lock the mutex, and then unlock it at the SAME moment, so the log shows an unreasonably long time spent in the lock (t2 - t1). This seems creepy to me.
The problem shows up relatively rarely, roughly every 5 to 10 minutes under moderate load. During the test no NTP time adjustments were recorded (that was actually my first idea). Besides, if it were NTP, there would be no ACTUAL service delays, only wrong timestamps in the log.
Where should I start? Should I look into the scheduler settings? What could, in theory, block an entire multithreaded process on Linux?
c++ mutex linux-kernel scheduler blocking
neoxic