Always report CPU as ok with Riemann

We use Riemann and riemann-health to monitor our servers. However, I now get quite a few critical CPU warnings because the CPU peaked for a very short time. I don't think this is something I need to know about. In my opinion, sustained high CPU usage will raise the load average, which is also reported and sounds more useful.

I do not want to disable CPU reporting; I only want every level to be reported as ok. If possible, I would like to change the events on the Riemann server, so that I do not need to change all the servers.

Here is our Riemann configuration: https://gist.github.com/iGEL/e352764a8c559440c851

1 answer

I don't have a complete solution, but in theory you should be able to filter CPU-related events with the where function and unconditionally set their state to "ok" using with, as follows:

 (streams (where (service #"cpu") (with :state "ok" index))) 

On the other hand, relying on the load average is not a good idea, since a high load average can also mean that a large number of processes are waiting on IO.

Instead of disabling CPU warnings, you could alert only if the CPU has been in a non-ok state for more than X time units. Even better, alert on a higher-level metric that represents a problem affecting the customer, such as response latency, HTTP status codes, error rates, etc. After all, if CPU usage is high but has no effect on the system, the warning is likely to be noise.
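
As a rough sketch of the "non-ok for more than X time units" idea, something like the following might work. It is untested against your configuration and makes some assumptions: index is bound with (index) as in a typical riemann.config, the 300-second window is an arbitrary placeholder for X, and the info log call is a stand-in for whatever notification stream you actually use.

```clojure
(let [index (index)]
  (streams
    ; Keep indexing every event unchanged (assumed to match the existing config).
    index

    ; Only look at CPU events, tracked separately per host and service.
    (where (service #"cpu")
      (by [:host :service]
        ; stable passes events on only once :state has stayed the same
        ; for at least 300 seconds (placeholder value), so short spikes
        ; never reach the children.
        (stable 300 :state
          (where (not (state "ok"))
            ; Placeholder action: log the event. Replace with an email or
            ; pager stream as needed.
            #(info "CPU not ok for at least 300s" %)))))))
```

With this approach the CPU events still arrive in the index with their real state, but only a state that persists for the whole window triggers the action.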
