MySQL: group by consecutive days and count the groups

I have a database table that stores all user check-ins in cities. I need to know how many days a user was in a city, and then how many visits the user made to that city (a visit consists of consecutive days spent in the city).

So, consider that I have the following table (simplified, containing only DATETIME - same user and city):

    datetime
    -------------------
    2011-06-30 12:11:46
    2011-07-01 13:16:34
    2011-07-01 15:22:45
    2011-07-01 22:35:00
    2011-07-02 13:45:12
    2011-08-01 00:11:45
    2011-08-05 17:14:34
    2011-08-05 18:11:46
    2011-08-06 20:22:12

The number of days during which this user was in this city will be 6 (30.06, 01.07, 02.07, 01.08, 05.08, 06.08).

I thought about doing this with SELECT COUNT(id) FROM table GROUP BY DATE(datetime)

Then, for the number of visits that this user made to this city, the query should return 3 (30.06-02.07, 01.08, 05.08-06.08).

The problem is that I have no idea how to build this query.

Any help would be greatly appreciated!

+8
mysql datetime gaps-and-islands
5 answers

You can find the first day of each visit by finding check-ins that have no check-in on the day before.

    select count(distinct date(start_of_visit.datetime))
    from checkin start_of_visit
    left join checkin previous_day
      on start_of_visit.user = previous_day.user
      and start_of_visit.city = previous_day.city
      and date(start_of_visit.datetime) - interval 1 day = date(previous_day.datetime)
    where previous_day.id is null

There are several important parts to this query.

First, each check-in is joined to any check-in from the previous day. Because this is an outer join, if there were no check-ins on the previous day, the right side of the join will be NULL. The WHERE filter is applied after the join, so it keeps only the check-ins on the left side that have no match on the right side. LEFT OUTER JOIN / WHERE IS NULL is really handy for finding rows that have no counterpart.

It then counts distinct dates so that a visit is not counted twice if the user checked in several times on the first day of the visit. (I actually added this part in an edit when I spotted the possible error.)

Edit: I just re-read the query you proposed for the first question. That query returns the number of check-ins on each date rather than the number of dates. I think you want something like this:

 select count(distinct date(datetime)) from checkin where user='some user' and city='some city' 
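
As a quick sanity check of the two queries in this answer, here is a minimal sketch that runs them against the sample data from the question. It uses Python's built-in sqlite3 module rather than MySQL, so the date arithmetic is written in SQLite's dialect (date(x, '-1 day') instead of - interval 1 day). The column names user_id and dt and the values 'u1' and 'c1' are placeholders made up for the demo.

```python
import sqlite3

# Sample check-ins from the question (one user, one city).
ROWS = [
    "2011-06-30 12:11:46", "2011-07-01 13:16:34", "2011-07-01 15:22:45",
    "2011-07-01 22:35:00", "2011-07-02 13:45:12", "2011-08-01 00:11:45",
    "2011-08-05 17:14:34", "2011-08-05 18:11:46", "2011-08-06 20:22:12",
]


def count_days_and_visits(rows):
    """Return (number_of_days, number_of_visits) for the given timestamps."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE checkin (id INTEGER PRIMARY KEY, user_id TEXT, city TEXT, dt TEXT)"
    )
    # 'u1' and 'c1' are hypothetical values; every row belongs to the same user and city.
    conn.executemany(
        "INSERT INTO checkin (user_id, city, dt) VALUES ('u1', 'c1', ?)",
        [(r,) for r in rows],
    )
    # Number of distinct calendar days with at least one check-in.
    days = conn.execute("SELECT COUNT(DISTINCT date(dt)) FROM checkin").fetchone()[0]
    # Anti-join: keep check-ins with no check-in on the previous day,
    # then count the distinct dates that start a visit.
    visits = conn.execute(
        """
        SELECT COUNT(DISTINCT date(s.dt))
        FROM checkin AS s
        LEFT JOIN checkin AS p
               ON s.user_id = p.user_id
              AND s.city = p.city
              AND date(s.dt, '-1 day') = date(p.dt)
        WHERE p.id IS NULL
        """
    ).fetchone()[0]
    conn.close()
    return days, visits


DAYS, VISITS = count_days_and_visits(ROWS)
print(DAYS, VISITS)
```

For the sample data this gives 6 days and 3 visits, which matches what the question expects.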
+10

Try adapting this code to your task:

    CREATE TABLE visits(
      user_id INT(11) NOT NULL,
      dt DATETIME DEFAULT NULL
    );

    INSERT INTO visits VALUES
      (1, '2011-06-30 12:11:46'),
      (1, '2011-07-01 13:16:34'),
      (1, '2011-07-01 15:22:45'),
      (1, '2011-07-01 22:35:00'),
      (1, '2011-07-02 13:45:12'),
      (1, '2011-08-01 00:11:45'),
      (1, '2011-08-05 17:14:34'),
      (1, '2011-08-05 18:11:46'),
      (1, '2011-08-06 20:22:12'),
      (2, '2011-08-30 16:13:34'),
      (2, '2011-08-31 16:13:41');

    SET @i = 0;
    SET @last_dt = NULL;
    SET @last_user = NULL;

    SELECT v.user_id,
      COUNT(DISTINCT(DATE(dt))) number_of_days,
      MAX(days) number_of_visits
    FROM
      (SELECT user_id, dt,
        -- restart at 1 for a new user; add 1 when the gap to the last date exceeds one day
        @i := IF(@last_user IS NULL OR @last_user <> user_id, 1,
              IF(@last_dt IS NULL OR (DATE(dt) - INTERVAL 1 DAY) > DATE(@last_dt), @i + 1, @i)) AS days,
        @last_dt := DATE(dt),
        @last_user := user_id
      FROM visits
      ORDER BY user_id, dt
      ) v
    GROUP BY v.user_id;

    ----------------
    Output:

    +---------+----------------+------------------+
    | user_id | number_of_days | number_of_visits |
    +---------+----------------+------------------+
    |       1 |              6 |                3 |
    |       2 |              2 |                1 |
    +---------+----------------+------------------+

Explanation:

To understand how this works, look at the subquery on its own:

    SET @i = 0;
    SET @last_dt = NULL;
    SET @last_user = NULL;

    SELECT user_id, dt,
      @i := IF(@last_user IS NULL OR @last_user <> user_id, 1,
            IF(@last_dt IS NULL OR (DATE(dt) - INTERVAL 1 DAY) > DATE(@last_dt), @i + 1, @i)) AS days,
      @last_dt := DATE(dt) lt,
      @last_user := user_id lu
    FROM visits
    ORDER BY user_id, dt;

As you can see, the query returns all rows and ranks them by visit number. This is a well-known variable-based ranking technique; note that the rows are sorted by the user and date fields. The query numbers each user's visits and produces the following data set, where the days column holds the visit number for each row:

    +---------+---------------------+------+------------+----+
    | user_id | dt                  | days | lt         | lu |
    +---------+---------------------+------+------------+----+
    |       1 | 2011-06-30 12:11:46 |    1 | 2011-06-30 |  1 |
    |       1 | 2011-07-01 13:16:34 |    1 | 2011-07-01 |  1 |
    |       1 | 2011-07-01 15:22:45 |    1 | 2011-07-01 |  1 |
    |       1 | 2011-07-01 22:35:00 |    1 | 2011-07-01 |  1 |
    |       1 | 2011-07-02 13:45:12 |    1 | 2011-07-02 |  1 |
    |       1 | 2011-08-01 00:11:45 |    2 | 2011-08-01 |  1 |
    |       1 | 2011-08-05 17:14:34 |    3 | 2011-08-05 |  1 |
    |       1 | 2011-08-05 18:11:46 |    3 | 2011-08-05 |  1 |
    |       1 | 2011-08-06 20:22:12 |    3 | 2011-08-06 |  1 |
    |       2 | 2011-08-30 16:13:34 |    1 | 2011-08-30 |  2 |
    |       2 | 2011-08-31 16:13:41 |    1 | 2011-08-31 |  2 |
    +---------+---------------------+------+------------+----+

Then we group this data set by user and apply aggregate functions: COUNT(DISTINCT(DATE(dt))) counts the number of days, and MAX(days) gives the number of visits, since it is the highest value of the days field from our subquery.
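
If the user-variable bookkeeping is hard to follow, the same logic can be sketched procedurally. This is only an illustration in Python, not part of the SQL: visit_no, last_day and last_user stand in for @i, @last_dt and @last_user, and the function name rank_visits is made up for the example.

```python
from datetime import datetime, timedelta

# Sample rows from the visits table: (user_id, dt).
CHECKINS = [
    (1, "2011-06-30 12:11:46"), (1, "2011-07-01 13:16:34"),
    (1, "2011-07-01 15:22:45"), (1, "2011-07-01 22:35:00"),
    (1, "2011-07-02 13:45:12"), (1, "2011-08-01 00:11:45"),
    (1, "2011-08-05 17:14:34"), (1, "2011-08-05 18:11:46"),
    (1, "2011-08-06 20:22:12"), (2, "2011-08-30 16:13:34"),
    (2, "2011-08-31 16:13:41"),
]


def rank_visits(checkins):
    """Return {user_id: (number_of_days, number_of_visits)}."""
    visit_no = 0       # plays the role of @i
    last_day = None    # plays the role of @last_dt
    last_user = None   # plays the role of @last_user
    days_seen = {}     # user_id -> set of distinct dates
    max_visit = {}     # user_id -> highest visit number seen (MAX(days))
    # sorted() mirrors ORDER BY user_id, dt; ISO timestamps sort chronologically.
    for user_id, ts in sorted(checkins):
        day = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date()
        if last_user is None or last_user != user_id:
            visit_no = 1                      # new user: restart the counter
        elif day - timedelta(days=1) > last_day:
            visit_no += 1                     # gap of more than one day: new visit
        last_day, last_user = day, user_id
        days_seen.setdefault(user_id, set()).add(day)
        max_visit[user_id] = max(max_visit.get(user_id, 0), visit_no)
    return {u: (len(days_seen[u]), max_visit[u]) for u in days_seen}


RESULT = rank_visits(CHECKINS)
print(RESULT)
```

For the sample rows this yields 6 days and 3 visits for user 1, and 2 days and 1 visit for user 2, the same as the output table of the SQL version.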

That's all ;)

+3

Using the sample data provided by Devart, the inner "PreQuery" works with SQL variables. @LUser defaults to -1 (a user ID that most likely does not exist), and the IF() test checks for any difference between the last user and the current one. As soon as a new user appears, the row gets a value of 1. Likewise, if the last date is more than 1 day before the new check-in date, the row gets a value of 1. The subsequent columns then reset @LUser and @LDate to the values of the record just tested, ready for the next row. The outer query simply sums and counts the rows to produce the final results for Devart's data set:

    User ID   Distinct Visits   Total Days
    1         3                 9
    2         1                 2

    select PreQuery.User_ID,
      sum( PreQuery.NextVisit ) as DistinctVisits,
      -- count(*) counts check-in rows, so user 1 gets 9 here rather than 6 distinct dates
      count(*) as TotalDays
    from
      ( select v.user_id,
          if( @LUser <> v.User_ID OR @LDate < ( date( v.dt ) - Interval 1 day ), 1, 0 ) as NextVisit,
          @LUser := v.user_id,
          @LDate := date( v.dt )
        from Visits v,
          ( select @LUser := -1, @LDate := date(now()) ) AtVars
        order by v.user_id, v.dt ) PreQuery
    group by PreQuery.User_ID
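
The flag-and-sum idea can also be sketched outside SQL. The following Python is a rough illustration only: flag_new_visits plays the part of the inner PreQuery and summarize the outer query, both names being invented for the example. The distinct-day count is an extra that is not in the query; it is included to show how it differs from the row count.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Sample rows from Devart's visits table: (user_id, dt).
CHECKINS = [
    (1, "2011-06-30 12:11:46"), (1, "2011-07-01 13:16:34"),
    (1, "2011-07-01 15:22:45"), (1, "2011-07-01 22:35:00"),
    (1, "2011-07-02 13:45:12"), (1, "2011-08-01 00:11:45"),
    (1, "2011-08-05 17:14:34"), (1, "2011-08-05 18:11:46"),
    (1, "2011-08-06 20:22:12"), (2, "2011-08-30 16:13:34"),
    (2, "2011-08-31 16:13:41"),
]


def flag_new_visits(checkins):
    """Yield (user_id, day, next_visit) per row, like the inner PreQuery."""
    # -1 mirrors @LUser := -1 and is assumed not to be a real user id, so the
    # date comparison is never evaluated for the very first row.
    last_user, last_day = -1, None
    for user_id, ts in sorted(checkins):  # order by user_id, dt
        day = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date()
        is_new = last_user != user_id or last_day < day - timedelta(days=1)
        yield user_id, day, 1 if is_new else 0
        last_user, last_day = user_id, day


def summarize(checkins):
    """Return {user_id: (distinct_visits, row_count, distinct_days)}."""
    visits = defaultdict(int)
    rows = defaultdict(int)
    days = defaultdict(set)
    for user_id, day, flag in flag_new_visits(checkins):
        visits[user_id] += flag   # sum(NextVisit)
        rows[user_id] += 1        # count(*)
        days[user_id].add(day)    # extra: distinct dates, not in the SQL
    return {u: (visits[u], rows[u], len(days[u])) for u in rows}


SUMMARY = summarize(CHECKINS)
print(SUMMARY)
```

For user 1 the row count is 9 while the number of distinct days is 6, so counting distinct dates is the variant to use if calendar days are what you need.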
+1

For the first subtask:

 select count(*) from ( select TO_DAYS(pd) from p group by TO_DAYS(pd) ) t 
0

I think you should consider changing the structure of the database. You can add a visits table and a visit_id column to your checkin table. Each time you register a new check-in, you check whether there is a check-in from the day before. If there is, you add the new check-in with the visit_id of yesterday's check-in. If not, you add a new visit to visits and a new check-in with the new visit_id.

Then you can get the data in one query with something like this: SELECT COUNT(DISTINCT DATE(datetime)) AS number_of_days, COUNT(DISTINCT visit_id) AS number_of_visits FROM checkin GROUP BY user, city

This is not optimal, but it is still better than anything you can do with the current structure, and it will work. Also, if the two results can come from separate queries, it will run very fast.

But, of course, there are drawbacks: you will need to change the structure of the database, write some more code, and convert the current data to the new structure (i.e. you will need to add visit_id to the existing rows).
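
Here is a rough sketch of the insert-time logic described in this answer, with in-memory Python lists standing in for the checkin and visits tables. The class CheckinStore and its methods are hypothetical names. It assumes check-ins arrive in chronological order, and that a second check-in on the same day also reuses the current visit, which the answer does not spell out.

```python
from datetime import datetime, timedelta


class CheckinStore:
    """Toy stand-in for the proposed checkin and visits tables."""

    def __init__(self):
        self.visits = []    # rows: (visit_id, user, city)
        self.checkins = []  # rows: (user, city, day, visit_id)

    def add_checkin(self, user, city, ts):
        """Store a check-in and return the visit_id it was assigned."""
        day = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date()
        yesterday = day - timedelta(days=1)
        visit_id = None
        # Reuse the visit of any check-in by this user in this city
        # from the same day or the day before.
        for u, c, d, v in self.checkins:
            if (u, c) == (user, city) and d in (day, yesterday):
                visit_id = v
                break
        if visit_id is None:
            # No adjacent check-in found: open a new visit.
            visit_id = len(self.visits) + 1
            self.visits.append((visit_id, user, city))
        self.checkins.append((user, city, day, visit_id))
        return visit_id

    def stats(self, user, city):
        """Return (number_of_days, number_of_visits) for one user and city."""
        mine = [(d, v) for u, c, d, v in self.checkins if (u, c) == (user, city)]
        return len({d for d, _ in mine}), len({v for _, v in mine})


# Replaying the question's sample data in order also shows how existing rows
# could be backfilled with a visit_id. 'u1' and 'c1' are placeholder values.
SAMPLE = [
    "2011-06-30 12:11:46", "2011-07-01 13:16:34", "2011-07-01 15:22:45",
    "2011-07-01 22:35:00", "2011-07-02 13:45:12", "2011-08-01 00:11:45",
    "2011-08-05 17:14:34", "2011-08-05 18:11:46", "2011-08-06 20:22:12",
]
store = CheckinStore()
ASSIGNED = [store.add_checkin("u1", "c1", ts) for ts in SAMPLE]
print(ASSIGNED, store.stats("u1", "c1"))
```

In a real database the lookup would be a query on checkin by user, city and date, and an index on those columns would presumably be needed to keep it cheap.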

0
