Mysql select the number of rows between time intervals

I am trying to get the total number of rows between a specific time or time. Basically, let's say the following table:

CREATE TABLE IF NOT EXISTS `downloads` ( `id` int(7) NOT NULL AUTO_INCREMENT, `stuff_id` int(7) NOT NULL, `user_id` int(7) NOT NULL, `dl_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (`id`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1; 

And this table is populated every time someone loads something.

So I really need to get a list of users (user_id) who have done more than, for example, 100 downloads over time intervals, for example, 24 hours. Not in the last 24 hours, but in that exact period of time, even if it was during Christmas last year =)

Any ideas at all ?!

+6
sql mysql
source share
4 answers

Ok, I understand that I was a little late, but I still wanted to post my answer :-)

What you need can be done using a subquery, but a large table may take age ...

Reflecting on the issue, I came to two different approaches.

One of them has already been considered in other answers, it works, starting from a certain point in time, looking at an interval that starts at that time, and then looks at a gap of equal duration, which immediately follows it. This leads to clear, understandable results and is likely to require (for example, the user should not exceed 100 downloads per calendar day). This, however, would completely miss the situation when the user makes 99 downloads an hour before midnight and another 99 in the first hour of a new day.

So, if the desired result is more related to the "top ten lists of bootloaders", then this is a different approach. The results here may not be as clear at first glance, because a single download can count on several intervals. This is because the intervals will (and should) overlap.

Here is my setup. I created a table from your statement and added two indexes:

 CREATE INDEX downloads_timestamp on downloads (dl_date); CREATE INDEX downloads_user_id on downloads (user_id); 

The data that I inserted into the table:

 SELECT * FROM downloads; +----+----------+---------+---------------------+ | id | stuff_id | user_id | dl_date | +----+----------+---------+---------------------+ | 1 | 1 | 1 | 2011-01-24 09:00:00 | | 2 | 1 | 1 | 2011-01-24 09:30:00 | | 3 | 1 | 1 | 2011-01-24 09:35:00 | | 4 | 1 | 1 | 2011-01-24 10:00:00 | | 5 | 1 | 1 | 2011-01-24 11:00:00 | | 6 | 1 | 1 | 2011-01-24 11:15:00 | | 7 | 1 | 1 | 2011-01-25 09:15:00 | | 8 | 1 | 1 | 2011-01-25 09:30:00 | | 9 | 1 | 1 | 2011-01-25 09:45:00 | | 10 | 1 | 2 | 2011-01-24 08:00:00 | | 11 | 1 | 2 | 2011-01-24 12:00:00 | | 12 | 1 | 2 | 2011-01-24 12:01:00 | | 13 | 1 | 2 | 2011-01-24 12:02:00 | | 14 | 1 | 2 | 2011-01-24 12:03:00 | | 15 | 1 | 2 | 2011-01-24 12:00:00 | | 16 | 1 | 2 | 2011-01-24 12:04:00 | | 17 | 1 | 2 | 2011-01-24 12:05:00 | | 18 | 1 | 2 | 2011-01-24 12:06:00 | | 19 | 1 | 2 | 2011-01-24 12:07:00 | | 20 | 1 | 2 | 2011-01-24 12:08:00 | | 21 | 1 | 2 | 2011-01-24 12:09:00 | | 22 | 1 | 2 | 2011-01-24 12:10:00 | | 23 | 1 | 2 | 2011-01-25 14:00:00 | | 24 | 1 | 2 | 2011-01-25 14:12:00 | | 25 | 1 | 2 | 2011-01-25 14:25:00 | +----+----------+---------+---------------------+ 25 rows in set (0.00 sec) 

As you can see, all downloads occurred either yesterday or today and were performed by two different users.

Now we need to remember the following: there is (mathematically) an infinite number of 24-hour intervals (or intervals of any other duration) between '2011-01-24 0:00' and '2011-01 -25 23:59:59'. But since the server’s accuracy is one second, it is up to 86,400 intervals:

 First interval: 2011-01-24 0:00:00 -> 2011-01-25 0:00:00 Second interval: 2011-01-24 0:00:01 -> 2011-01-25 0:00:01 Third interval: 2011-01-24 0:00:02 -> 2011-01-25 0:00:02 . . . 86400th interval: 2011-01-24 23:59:59 -> 2011-01-25 23:59:59 

Thus, we could use a loop to iterate over all these intervals and calculate the number of downloads per user and per interval. Of course, not all intervals are of the same interest for us, so we can skip some of them using timestamps in the table as the "beginning of the interval."

Here is what the following query does. It uses each load timestamp in the table as the “start of interval”, adds the duration of the interval, and then requests the number of downloads per user for that interval.

 SET @duration = '24:00:00'; SET @limit = 5; SELECT * FROM (SELECT t1.user_id, t1.dl_date startOfPeriod, ADDTIME(t1.dl_date,@duration) endOfPeriod, (SELECT COUNT(1) FROM downloads t2 WHERE t1.user_id = t2.user_id AND t1.dl_date <= t2.dl_date AND ADDTIME(t1.dl_date,@duration) >= t2.dl_date) count FROM downloads t1) t3 WHERE count > @limit; 

Here is the result:

 +---------+---------------------+---------------------+-------+ | user_id | startOfPeriod | endOfPeriod | count | +---------+---------------------+---------------------+-------+ | 1 | 2011-01-24 09:00:00 | 2011-01-25 09:00:00 | 6 | | 1 | 2011-01-24 09:30:00 | 2011-01-25 09:30:00 | 7 | | 1 | 2011-01-24 09:35:00 | 2011-01-25 09:35:00 | 6 | | 1 | 2011-01-24 10:00:00 | 2011-01-25 10:00:00 | 6 | | 2 | 2011-01-24 08:00:00 | 2011-01-25 08:00:00 | 13 | | 2 | 2011-01-24 12:00:00 | 2011-01-25 12:00:00 | 12 | | 2 | 2011-01-24 12:01:00 | 2011-01-25 12:01:00 | 10 | | 2 | 2011-01-24 12:02:00 | 2011-01-25 12:02:00 | 9 | | 2 | 2011-01-24 12:03:00 | 2011-01-25 12:03:00 | 8 | | 2 | 2011-01-24 12:00:00 | 2011-01-25 12:00:00 | 12 | | 2 | 2011-01-24 12:04:00 | 2011-01-25 12:04:00 | 7 | | 2 | 2011-01-24 12:05:00 | 2011-01-25 12:05:00 | 6 | +---------+---------------------+---------------------+-------+ 12 rows in set (0.00 sec) 
+6
source share

This returns a list of user_id that has made over 100 downloads for any period of 1 day:

 SELECT user_id, count(user_id) as downloads_count, DATE(dl_date) FROM downloads GROUP BY user_id, DATE(dl_date) HAVING count(user_id) > 100; 
+2
source share

If you have a period that is less than or equal to 24 hours:

 SET @period_start='2010-10-10 06:00:00'; SET @period_end='2010-10-11 05:59:59'; 

then

 SELECT user_id, COUNT(id) AS num FROM downloads WHERE dl_date>= @period_start AND dl_date<= @period_end GROUP BY user_id HAVING num> 100; 

But if you have a period that is more than 24 hours:

 SET @period_start='2010-10-10 06:00:00'; SET @period_end='2011-09-17 13:15:12'; 

How do you want to calculate your download number? This is in 24h pieces from @period_end or from @period_start. Or do you just need the last 24 hour piece?

+1
source share

You want to filter out two date values ​​using the BETWEEN group on user_id, and then use HAVING to filter the grouped results.

Three parameters, --Date1--, --Date2-- and --Threshhold -

 select user_id , count(*) from downloads where dl_date between --Date1-- and --Date2-- group by user_id having count(*) > --Threshhold-- 
0
source share

All Articles