This can help describe your script more. Since you are trying to find rare events, I assume that you have a working definition not infrequently (for some problem spaces this is really difficult).
For example, let's say that we have some process that is not a random walk process, such as using a CPU for some service. If you want to detect rare events, you can take advantage of average use, and then look at a few standard deviations. The methods from Statistical Process Control are useful here.
If we have a random wandering process, such as stock prices (a worm may open ... please just assume for the sake of simplicity). The direction of travel from t to t + 1 is random. A random event can be a certain number of consecutive movements in one direction or a large movement in one direction in one time step. See Stochastic calculus for basic concepts.
If the process in step t depends only on step t-1, we can use Markov Chains to simulate the process.
This is a short list of mathematical methods available to you. Now about mechanical learning. Why do you want to use machine learning? (Itβs always good to think to make sure that youβre not complicating the problem too much) Suppose you are doing this and that is the right solution. The actual algorithm you are using is not very important at this stage. What you need to do is determine what a rare event is. Conversely, you can determine what a normal event is and look for things that are not normal. Please note that this is not the same thing. Let's say we create many rare events r1 ... rn. Each of these rare events will have some features associated with it. For example, if a computer fails, functions may appear such as the last time it was seen on the network, its switch port status, etc. This is actually the most important part of machine learning, building training sets. This usually consists of manually marking a set of examples for training the model. After you better understand the space with possibilities, you can prepare another model for marking. Repeat this process until you are satisfied.
Now, if you can define your rare set of events, it may be cheaper to just generate a heuristic. To detect rare events, I always found that this works better.
source share