Find events that occurred at least X times within a given period of time

Let's say I have the following table:

    CREATE TABLE `occurences` (
      `object_id` int(10) NOT NULL,
      `seen_timestamp` int(10) NOT NULL
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

It contains the object's identifier, which is not unique because the same ID is inserted many times, and the timestamp (an integer) at which that object was seen.

Observation runs 24/7. Every time an object is observed, a row with its identifier and the current timestamp is inserted.

Now I want to write a query that selects the identifiers of all objects that were seen at least 7 times within any 10-minute period.

It is meant to work as a simple form of intrusion detection.

The DenyHosts script uses a similar algorithm to detect failed SSH logins: if it finds the configured number of occurrences within the configured time period, it blocks the IP.
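Observation is continuous, so the rule can also be checked as each sighting arrives instead of scanning the whole table. The following is a minimal sketch in Python. It assumes that timestamps are integer seconds, that they arrive in non-decreasing order per object, and that "within 10 minutes" means the first and last sighting are at most 600 seconds apart. The class and method names are hypothetical.

```python
from collections import defaultdict, deque


class OccurrenceDetector:
    """Hypothetical helper: flags an object seen `threshold` times within `window` seconds."""

    def __init__(self, threshold=7, window=600):
        self.threshold = threshold
        self.window = window
        # object_id -> timestamps of its sightings inside the current window
        self._recent = defaultdict(deque)

    def observe(self, object_id, ts):
        """Record one sighting and return True if object_id has now hit the threshold.

        Assumes timestamps for an object arrive in non-decreasing order (a live feed).
        """
        recent = self._recent[object_id]
        recent.append(ts)
        # Forget sightings more than `window` seconds before this one;
        # the first and last sighting may be exactly `window` apart.
        while ts - recent[0] > self.window:
            recent.popleft()
        return len(recent) >= self.threshold


detector = OccurrenceDetector()
alerts = [detector.observe(42, t) for t in range(0, 700, 100)]
print(alerts)  # [False, False, False, False, False, False, True]
```

Once an object is flagged, it may keep returning True for further sightings in the window. The caller has to decide whether to block it once or react to every hit.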

Any good suggestions?

mysql count group-by
3 answers

This should work:

    SET @num_occurences = 7; -- how many occurrences must fall within the interval
    SET @max_period = 600;   -- the interval in seconds (10 minutes)

    SELECT offset_start.object_id
    FROM (SELECT @rownum_start := @rownum_start + 1 AS idx, object_id, seen_timestamp
          FROM occurences, (SELECT @rownum_start := 0) r
          ORDER BY object_id ASC, seen_timestamp ASC) offset_start
    JOIN (SELECT @rownum_end := @rownum_end + 1 AS idx, object_id, seen_timestamp
          FROM occurences, (SELECT @rownum_end := 0) r
          ORDER BY object_id ASC, seen_timestamp ASC) offset_end
      ON offset_start.object_id = offset_end.object_id
     AND offset_start.idx + @num_occurences - 1 = offset_end.idx
     AND offset_end.seen_timestamp - offset_start.seen_timestamp <= @max_period
    GROUP BY offset_start.object_id;

You can move @num_occurences and @max_period into your code and pass them as parameters of your statement. Depending on your client, you can also move the initialization of @rownum_start and @rownum_end out of the query into a separate SET statement that runs before it. This may improve performance, but you should verify that. It is only a gut feeling based on comparing the EXPLAIN output of both versions.
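Below, the two thresholds are bound as ordinary statement parameters rather than user variables. This is a minimal sketch that uses Python's built-in sqlite3 module instead of MySQL, which is an assumption made so the example is self-contained. ROW_NUMBER() replaces the @rownum_* counters. In SQLite, window functions require version 3.25 or newer. MySQL 8.0 also supports ROW_NUMBER() and deprecates assigning user variables inside SELECT expressions, so this form may be the safer choice there too. The function name and the sample rows are made up.

```python
import sqlite3

# Same idea as the @rownum_* query: number each object's sightings in time order,
# then pair sighting idx with sighting idx + num_occurrences - 1 of the same object.
QUERY = """
WITH numbered AS (
    SELECT object_id,
           seen_timestamp,
           ROW_NUMBER() OVER (PARTITION BY object_id ORDER BY seen_timestamp) AS idx
    FROM occurences
)
SELECT DISTINCT s.object_id
FROM numbered AS s
JOIN numbered AS e
  ON e.object_id = s.object_id
 AND e.idx = s.idx + ? - 1
 AND e.seen_timestamp - s.seen_timestamp <= ?
ORDER BY s.object_id
"""


def find_suspects(conn, num_occurrences=7, max_period=600):
    """Return IDs of objects seen num_occurrences times within max_period seconds."""
    # Both values are bound as statement parameters, in the order the ? appear.
    rows = conn.execute(QUERY, (num_occurrences, max_period)).fetchall()
    return [object_id for (object_id,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE occurences (object_id INTEGER NOT NULL, seen_timestamp INTEGER NOT NULL)")
sightings = [(1, t) for t in range(0, 700, 100)]      # 7 sightings within 600 s
sightings += [(2, t) for t in range(0, 7000, 1000)]   # 7 sightings spread over 6000 s
conn.executemany("INSERT INTO occurences VALUES (?, ?)", sightings)
print(find_suspects(conn))  # [1]
```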

Here's how it works:

It selects the entire table twice and joins each row of offset_start to the row of offset_end whose index is @num_occurences - 1 higher. That row would be the last of @num_occurences consecutive occurrences starting at the offset_start row. The @rownum_* variables give each row an index, which mimics the ROW_NUMBER() function known from other RDBMSs.
It then checks that both rows refer to the same object_id and that their timestamps are no more than @max_period apart.
This check is done for every occurrence row. An object_id is therefore returned several times if it has more than @num_occurences occurrences in the period, so the result is grouped at the end to return each object_id only once.
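The offset logic can be modelled in a few lines of plain Python. This can help when reasoning about edge cases before tuning the SQL. It is a sketch, not the answer's code. The function name is hypothetical, and it assumes rows are (object_id, seen_timestamp) tuples with integer timestamps in seconds.

```python
from itertools import groupby


def offset_pairs(rows, num_occurrences=7, max_period=600):
    """Pure-Python model of the offset self-join.

    rows: iterable of (object_id, seen_timestamp) tuples, in any order.
    """
    flagged = []
    # Sorting by (object_id, seen_timestamp) mirrors the ORDER BY in both derived tables.
    for object_id, group in groupby(sorted(rows), key=lambda row: row[0]):
        stamps = [ts for _, ts in group]
        # Pair sighting i with sighting i + num_occurrences - 1 of the same object.
        pairs = zip(stamps, stamps[num_occurrences - 1:])
        # any() stops at the first qualifying pair, so each ID is reported once,
        # which is the job the GROUP BY does in the SQL.
        if any(end - start <= max_period for start, end in pairs):
            flagged.append(object_id)
    return flagged


# Object 5 has two qualifying windows (0-600 and 100-700) but is reported once.
result = offset_pairs([(5, t) for t in range(0, 800, 100)] + [(9, 0), (9, 50)])
print(result)  # [5]
```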


You can try

    SELECT object_id, COUNT(seen_timestamp) AS tot
    FROM occurences
    WHERE seen_timestamp BETWEEN DATE_ADD(your_dt, INTERVAL -10 MINUTE) AND your_dt
    GROUP BY object_id
    HAVING tot >= 7;

I don't understand why you are using int(10) for seen_timestamp; you could use DATETIME instead.
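This answer counts sightings in one fixed window that ends at your_dt. As a result, it only catches a burst if the check runs while the whole burst is still inside that window. The sketch below uses Python's sqlite3 module and keeps the question's integer timestamps in seconds to show the effect. The function name is hypothetical, and `now` stands in for your_dt.

```python
import sqlite3


def window_counts(conn, now, window=600, threshold=7):
    """Objects seen at least `threshold` times in the single window [now - window, now]."""
    sql = """
        SELECT object_id, COUNT(*) AS tot
        FROM occurences
        WHERE seen_timestamp BETWEEN ? AND ?
        GROUP BY object_id
        HAVING COUNT(*) >= ?
        ORDER BY object_id
    """
    return conn.execute(sql, (now - window, now, threshold)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE occurences (object_id INTEGER NOT NULL, seen_timestamp INTEGER NOT NULL)")
conn.executemany("INSERT INTO occurences VALUES (?, ?)", [(1, t) for t in range(0, 700, 100)])
print(window_counts(conn, now=600))  # [(1, 7)]
print(window_counts(conn, now=650))  # [] because the sighting at t=0 has left the window
```

The same 7 sightings are reported when the check runs at t=600 but missed at t=650. The sliding-window queries in the other answers do not depend on when the check runs.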


You can use the following query:

    SELECT DISTINCT oc1.object_id
    FROM occurences oc1
    JOIN occurences oc2
      ON oc1.object_id = oc2.object_id
     AND oc1.seen_timestamp >= (oc2.seen_timestamp - 600)
     AND oc1.seen_timestamp < oc2.seen_timestamp
    GROUP BY oc1.object_id, oc1.seen_timestamp
    -- oc2 only counts the occurrences after oc1, so 6 of them plus oc1 itself make 7
    HAVING COUNT(oc2.object_id) >= 6;

It is neither very fast nor very clean, so let me know if anyone finds a better solution!
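The self-join can be tried locally with Python's sqlite3 module. In the sketch below, the join conditions are rewritten from oc1's point of view, which is equivalent to the answer's conditions. COUNT(oc2.object_id) only counts sightings that come after oc1, so seven sightings in total need a count of at least num_occurrences - 1. The composite index on (object_id, seen_timestamp) is an assumption about what speeds this join up, not something the answer states; check the actual query plan on your MySQL server. All function and index names are hypothetical.

```python
import sqlite3

SELF_JOIN = """
SELECT DISTINCT oc1.object_id
FROM occurences AS oc1
JOIN occurences AS oc2
  ON oc1.object_id = oc2.object_id
 AND oc2.seen_timestamp > oc1.seen_timestamp
 AND oc2.seen_timestamp <= oc1.seen_timestamp + ?
GROUP BY oc1.object_id, oc1.seen_timestamp
HAVING COUNT(oc2.object_id) >= ? - 1
ORDER BY oc1.object_id
"""


def build_db(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE occurences (object_id INTEGER NOT NULL, seen_timestamp INTEGER NOT NULL)")
    # Hypothetical composite index: lets the join seek one object and range-scan its timestamps.
    conn.execute("CREATE INDEX idx_object_seen ON occurences (object_id, seen_timestamp)")
    conn.executemany("INSERT INTO occurences VALUES (?, ?)", rows)
    return conn


def self_join_suspects(conn, num_occurrences=7, max_period=600):
    # Parameters are bound in the order the ? appear: window first, then the count.
    rows = conn.execute(SELF_JOIN, (max_period, num_occurrences)).fetchall()
    return [object_id for (object_id,) in rows]


rows = [(1, t) for t in range(0, 700, 100)]    # exactly 7 sightings within 600 s
rows += [(2, t) for t in range(0, 600, 100)]   # only 6 sightings
print(self_join_suspects(build_db(rows)))  # [1]
```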

