The elapsed ("wall-clock") time spent in a wait(timeout) call is usually the timeout value plus the time until the thread is rescheduled and actually runs again.
See the Javadoc for the Object.wait(long timeout) method:
"The thread T is then removed from the wait set for this object and re-enabled for thread scheduling. It then competes in the usual manner with other threads for the right to synchronize on the object; [...]"
So there is no real-time guarantee; it is rather a "best effort", depending on the current system load and possibly also on other lock dependencies in your application. If the system is under heavy load or your application runs a lot of threads, the wait can take significantly longer.
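To see the overwait for yourself, you can measure it. The sketch below is an illustration, not code from the question: the class name, the hypothetical done flag (never set here) and the 1000 ms value are assumptions. It waits on obj in a loop until a deadline, because the Javadoc also allows spurious wakeups, and reports how much longer than requested the call actually took.

```java
import java.util.concurrent.TimeUnit;

class TimedWaitDemo {
    private static final Object obj = new Object();
    private static boolean done = false; // hypothetical condition, guarded by obj; never set here

    // Waits on obj until 'done' becomes true or timeoutMillis have passed and
    // returns the elapsed wall-clock time in ms. The loop re-waits after
    // spurious wakeups, so without a notification the result is never below
    // timeoutMillis, but it can be larger: after every wait the thread must be
    // rescheduled and must re-acquire the lock on obj.
    static long waitWithDeadline(long timeoutMillis) throws InterruptedException {
        long start = System.nanoTime();
        long deadline = start + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        synchronized (obj) {
            long remaining;
            while (!done && (remaining = deadline - System.nanoTime()) > 0) {
                // timedWait calls obj.wait(ms, ns) and never obj.wait(0), which would wait forever
                TimeUnit.NANOSECONDS.timedWait(obj, remaining);
            }
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws InterruptedException {
        long requested = 1000;
        long actual = waitWithDeadline(requested);
        System.out.println("requested " + requested + " ms, waited " + actual
                + " ms, overwait " + (actual - requested) + " ms");
    }
}
```

On an idle machine the overwait is typically small; under load it grows.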
PS
The quote @nathan-hughes mentioned in his comment on your question is probably the key hint in the Javadoc of the wait method: "The specified amount of real time has elapsed, more or less."
PPS
Based on the update to your question with additional context ("very complex software", "high traffic", "high expectations"): you should find all the places where your obj object is used as a lock and determine how these uses interact with each other.
This can get really complicated. Here I will try to outline a "simple" scenario of what might go wrong with only two simple threads, for example:
```java
// thread 1
synchronized (obj) {
    // wait 1000ms
    obj.wait(1000);
}
// check for overwait

// thread 2, after, let's say, 500 ms
synchronized (obj) {
    obj.notify();
}
```
A simple scenario where everything goes fine; the order of execution is approximately:
- 0ms: T1 acquires the lock on 'obj'
- 0ms: T1 registers itself as waiting on 'obj' and is excluded from thread scheduling. While it is excluded from thread scheduling, the lock on 'obj' is released again (!)
- 500ms: T2 acquires the lock on 'obj', notifies one thread that is waiting for notification (which one is chosen is arbitrary, at the discretion of the implementation) and releases the lock on 'obj'
- 500ms + X: T1 is re-enabled for thread scheduling; it waits until it re-acquires the lock on 'obj' (!), after which it completes its block and releases the lock on 'obj'.
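The steps above can be reproduced with a small runnable program. This is a sketch under assumptions: the class name, the sleepQuietly helper and the polling loop that starts T2 only once T1 is inside wait() are illustrative additions, not part of the original snippet. It returns how long T1 actually spent in obj.wait(1000).

```java
class NotifyDemo {
    private static final Object obj = new Object();

    // Sleeps without throwing; restores the interrupt flag if interrupted.
    private static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Runs the T1/T2 scenario and returns how long T1 spent in obj.wait(1000), in ms.
    static long run() throws InterruptedException {
        long[] waited = new long[1];
        Thread t1 = new Thread(() -> {
            synchronized (obj) {                 // 0ms: T1 acquires the lock on obj
                long start = System.nanoTime();
                try {
                    obj.wait(1000);              // releases the lock while waiting
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                // wait() returns only after T1 has re-acquired the lock
                waited[0] = (System.nanoTime() - start) / 1_000_000;
            }
        }, "T1");
        Thread t2 = new Thread(() -> {
            sleepQuietly(500);                   // "after, let's say, 500 ms"
            synchronized (obj) {                 // 500ms: T2 acquires the lock
                obj.notify();                    // wakes T1, the only waiter
            }                                    // and releases the lock at once
        }, "T2");
        t1.start();
        Thread.State s;
        while ((s = t1.getState()) != Thread.State.TIMED_WAITING && s != Thread.State.TERMINATED) {
            Thread.sleep(1);                     // start T2 only once T1 is inside wait()
        }
        t2.start();
        t1.join();
        t2.join();
        return waited[0];                        // join() makes the write visible here
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("T1 waited " + run() + " ms for a 1000 ms timeout");
    }
}
```

On a quiet machine T1 returns after roughly 500 ms, shortly after the notify.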
These are just 2 simple threads and synchronized blocks. Let's make it more complicated with poorly written code. What if the 2nd thread, after calling notify(), performs some complex operation while still holding the lock on 'obj'?
In this case, even though T1 has been notified (or its wait time has possibly elapsed), it has to wait until it re-acquires the lock on 'obj', which is still held by T2 for as long as the complex operation runs (step 3 in the previous list)! This may take up to several seconds.
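A runnable sketch of such a second thread: it is the same T1/T2 pair as before, except that T2 simulates its complex operation with a hypothetical doComplexOperation() method that simply sleeps for 2 seconds inside the synchronized block. The method name and the 2-second duration are assumptions for illustration only.

```java
class SlowNotifierDemo {
    private static final Object obj = new Object();

    // Sleeps without throwing; restores the interrupt flag if interrupted.
    private static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Hypothetical stand-in for T2's "complex operation" (I/O, computation, ...).
    // Thread.sleep() keeps every lock the thread holds, so obj stays locked.
    private static void doComplexOperation() {
        sleepQuietly(2000);
    }

    // Runs the scenario and returns how long T1 spent in obj.wait(1000), in ms.
    static long run() throws InterruptedException {
        long[] waited = new long[1];
        Thread t1 = new Thread(() -> {
            synchronized (obj) {
                long start = System.nanoTime();
                try {
                    obj.wait(1000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                waited[0] = (System.nanoTime() - start) / 1_000_000;
            }
        }, "T1");
        Thread t2 = new Thread(() -> {
            sleepQuietly(500);
            synchronized (obj) {
                obj.notify();            // T1 is notified after about 500 ms ...
                doComplexOperation();    // ... but obj stays locked for ~2 s more
            }
        }, "T2");
        t1.start();
        Thread.State s;
        while ((s = t1.getState()) != Thread.State.TIMED_WAITING && s != Thread.State.TERMINATED) {
            Thread.sleep(1);             // start T2 only once T1 is inside wait()
        }
        t2.start();
        t1.join();
        t2.join();
        return waited[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("T1 waited " + run() + " ms for a 1000 ms timeout");
    }
}
```

T1 asked for at most 1000 ms but typically returns only after about 2500 ms, when T2 finally leaves its synchronized block.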
Even harder: let's return to our initial simple threads T1 and T2, but add a third thread that also synchronizes on 'obj' to perform a complicated operation.
The order of execution may become approximately:
- 0ms: T1 acquires the lock on 'obj'
- 0ms: T1 registers itself as waiting on 'obj' and is excluded from thread scheduling. While it is excluded from thread scheduling, the lock on 'obj' is released again (!)
- 500ms: T2 acquires the lock on 'obj', notifies one thread that is waiting for notification (which one is chosen is arbitrary, at the discretion of the implementation) and releases the lock on 'obj'
- 500ms + X: T1 is re-enabled for thread scheduling, but does not get the lock on 'obj', because
- 500ms + X: T3 is scheduled by the thread scheduler before T1, acquires the lock on 'obj' (!) and starts performing a complicated operation. T1 can do nothing but wait!
- 500ms + MANY: T3 releases the lock on 'obj'.
- 500ms + MANY: T1 re-acquires the lock on 'obj' (!), then exits its synchronized block and releases the lock on 'obj'.
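A runnable sketch of this three-thread case. Java code cannot force the scheduler to pick T3 before T1, so this sketch assumes (hypothetically) that T3 grabs the lock at about 400 ms, shortly before T2's notify at 500 ms, and holds it for 2 seconds. In this variant T2's notify is itself stuck behind T3, so T1's 1000 ms timeout expires first, yet T1 still cannot return before T3 releases 'obj'. Class name, helper and timings are illustrative assumptions.

```java
class ThirdThreadDemo {
    private static final Object obj = new Object();

    // Sleeps without throwing; restores the interrupt flag if interrupted.
    private static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns how long T1 spent in obj.wait(1000), in ms.
    static long run() throws InterruptedException {
        long[] waited = new long[1];
        Thread t1 = new Thread(() -> {
            synchronized (obj) {
                long start = System.nanoTime();
                try {
                    obj.wait(1000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                waited[0] = (System.nanoTime() - start) / 1_000_000;
            }
        }, "T1");
        Thread t2 = new Thread(() -> {
            sleepQuietly(500);
            synchronized (obj) {         // blocks here until T3 releases obj
                obj.notify();
            }
        }, "T2");
        Thread t3 = new Thread(() -> {
            sleepQuietly(400);           // assumption: T3 gets the lock just before T2
            synchronized (obj) {
                sleepQuietly(2000);      // hypothetical "complicated operation"
            }
        }, "T3");
        t1.start();
        Thread.State s;
        while ((s = t1.getState()) != Thread.State.TIMED_WAITING && s != Thread.State.TERMINATED) {
            Thread.sleep(1);             // start T2 and T3 only once T1 is inside wait()
        }
        t2.start();
        t3.start();
        t1.join();
        t2.join();
        t3.join();
        return waited[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("T1 waited " + run() + " ms for a 1000 ms timeout");
    }
}
```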
This only scratches the surface of what can happen in your "very complex software" with "high traffic". Add more threads, possibly poorly coded (for example, doing too much in synchronized blocks), plus high traffic, and you can easily get the long waits you mentioned.
OPTIONS
How to solve this depends on the purpose and complexity of your software; there is no simple recipe. Nothing more can be said based on the information available.
Perhaps re-analyzing the code with pen and paper is enough; perhaps profiling can help you find the locks; maybe you can get the necessary information about current locks through JMX or a thread dump (via a signal, jconsole, jcmd, or jvisualvm), or by monitoring with Java Mission Control and Java Flight Recorder (available since JDK 7u40, I think).
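As a starting point for the JMX route, the platform ThreadMXBean can tell you which thread is blocked on which monitor and who owns it. The sketch below is illustrative: it creates its own "holder" and "contender" threads (hypothetical names) so there is something to observe; in your application you would query your real threads instead.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class LockContentionProbe {
    private static final Object obj = new Object();

    // Sleeps without throwing; restores the interrupt flag if interrupted.
    private static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Polls until thread t reaches the given state.
    private static void awaitState(Thread t, Thread.State state) throws InterruptedException {
        while (t.getState() != state) {
            if (t.getState() == Thread.State.TERMINATED) {
                throw new IllegalStateException(t.getName() + " finished early");
            }
            Thread.sleep(1);
        }
    }

    // Lets one thread hold obj while another blocks on it, then asks the
    // ThreadMXBean (the data JMX clients such as jconsole show) about the blocked one.
    static ThreadInfo probe() throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (mx.isThreadContentionMonitoringSupported()) {
            mx.setThreadContentionMonitoringEnabled(true); // enables getBlockedTime()
        }
        Thread holder = new Thread(() -> {
            synchronized (obj) {
                sleepQuietly(1000);                        // holds the lock on obj for ~1 s
            }
        }, "holder");
        Thread contender = new Thread(() -> {
            synchronized (obj) {
                // nothing to do; it only needs the lock
            }
        }, "contender");
        holder.start();
        awaitState(holder, Thread.State.TIMED_WAITING);    // holder sleeps while owning obj
        contender.start();
        awaitState(contender, Thread.State.BLOCKED);       // contender is stuck at synchronized (obj)
        ThreadInfo info = mx.getThreadInfo(contender.getId());
        holder.join();
        contender.join();
        return info;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadInfo info = probe();
        System.out.println(info.getThreadName() + " is " + info.getThreadState()
                + " on " + info.getLockName() + ", owned by " + info.getLockOwnerName()
                + ", blocked so far: " + info.getBlockedTime() + " ms");
    }
}
```

A thread dump (for example via jcmd) shows the same "waiting to lock ... held by ..." relationships for all threads at once.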
You asked in a comment whether Thread.sleep(timeout) would help: that can't be said without additional information. Maybe it will help; keep in mind that, unlike wait, Thread.sleep does not release any locks the thread holds. Or perhaps reentrant locks or other locking options (see the java.util.concurrent, java.util.concurrent.atomic, and java.util.concurrent.locks packages) would be more appropriate. It depends on your code, your use case, and your version of Java.
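As one example of such an option (not necessarily the right one for your code), a java.util.concurrent.CountDownLatch lets a thread wait for a signal without owning any monitor, so a signaller that keeps 'obj' locked afterwards does not delay the waiter. The sketch reuses the slow T2 from above; names and timings are hypothetical. It only helps if T1 does not need the lock on 'obj' after waking up.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class LatchDemo {
    private static final Object obj = new Object();

    // Sleeps without throwing; restores the interrupt flag if interrupted.
    private static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Same slow T2 as before, but T1 waits on a latch instead of obj.wait(),
    // so returning from the wait does not require the lock on obj.
    static long run() throws InterruptedException {
        CountDownLatch signal = new CountDownLatch(1);
        long[] waited = new long[1];
        Thread t1 = new Thread(() -> {
            long start = System.nanoTime();
            try {
                signal.await(1000, TimeUnit.MILLISECONDS); // returns false if the timeout elapsed
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            waited[0] = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        }, "T1");
        Thread t2 = new Thread(() -> {
            sleepQuietly(500);
            synchronized (obj) {
                signal.countDown();   // T1 can return from await() right away ...
                sleepQuietly(2000);   // ... even though obj stays locked for ~2 s
            }
        }, "T2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return waited[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("T1 waited " + run() + " ms");
    }
}
```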
If GC is not the problem (see below), you have analyzed the code, it "looks fine", and you think the cause is high traffic, you can also consider tuning the JVM's lock optimizations, such as biased locking (note that biased locking was deprecated in JDK 15). See the Java 7 JVM options for more details (this article provides links to Java 8 JVM options).
GARBAGE COLLECTION
By the way, "high traffic" should have made me ask about this earlier: garbage collection, have you monitored it? If it is not configured, or not configured properly, GC can also frequently lead to very significant pauses! (I had such a case this week: 15-30 seconds for a full GC ...)
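A quick way to check whether GC is involved, sketched here with the standard GarbageCollectorMXBean (the class and method names GcStats/report are illustrative). The counters are cumulative since JVM start, so sampling them before and after a suspicious overwait shows whether collections happened in between. For individual pause lengths, GC logging (for example -verbose:gc) is more precise.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcStats {
    // One line per collector: how often it ran and the accumulated approximate
    // collection time in ms since JVM start (-1 means "undefined" for that collector).
    static List<String> report() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + ": " + gc.getCollectionCount() + " collections, "
                    + gc.getCollectionTime() + " ms total");
        }
        return lines;
    }

    public static void main(String[] args) {
        report().forEach(System.out::println);
    }
}
```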