Grouping consecutive posts in a user feed

I have a custom feed of image messages. Each user can send individual images, but often repeats the action, say, uploading several images within an hour.

How can I efficiently design a database table so that, when a user sends multiple images (one after another) within one hour, I can easily group these consecutive messages together, either on INSERT or on SELECT?


Please do not suggest a multi-upload form. That is not the case here: I just described the task in more general terms :)

+4
4 answers

This is a playground:

 CREATE TABLE `feed` (
   `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
   `tm` INT UNSIGNED NOT NULL COMMENT 'timestamp',
   `user_id` INT UNSIGNED NOT NULL COMMENT 'author id',
   `image` VARCHAR(255) NOT NULL COMMENT 'posted image filename',
   `group` INT UNSIGNED NULL DEFAULT NULL COMMENT 'post group',
   PRIMARY KEY (`id`),
   INDEX (`user_id`),
   INDEX (`tm`, `group`)
 );

We would like to group messages that are close to each other in time.

First, declare the desired granularity, i.e. the threshold for temporal proximity (here, one hour in seconds):

 SET @granularity:=60*60; 

Each row forms a group whose identifier is the row's own id (a timestamp would work as well):

 SELECT `g`.`id` AS `group` FROM `feed` `g`; 

Each group contains the rows that come from the same user and were posted no more than @granularity seconds before the row that forms the group:

 SELECT `g`.`id` AS `group`, `f`.*
 FROM `feed` `g`
 CROSS JOIN `feed` `f`
   ON (`f`.`user_id` = `g`.`user_id`
       AND `f`.`tm` BETWEEN `g`.`tm` - @granularity AND `g`.`tm`);

Each row now belongs to several groups. For each row, we select the "widest" group, which is the one with the largest row id:

 SELECT MAX(`g`.`id`) AS `group`, `f`.*
 FROM `feed` `g`
 CROSS JOIN `feed` `f`
   ON (`f`.`user_id` = `g`.`user_id`
       AND `f`.`tm` BETWEEN `g`.`tm` - @granularity AND `g`.`tm`)
 GROUP BY `f`.`id`;

With this query, the most recently updated group always moves to the top (if you sort by group DESC). However, if you want group ids to stay constant (for example, so that rows do not move from one group to another), use MIN instead of MAX and let the window look forward from the row that forms the group:

 SELECT MIN(`g`.`id`) AS `group`, `f`.*
 FROM `feed` `g`
 CROSS JOIN `feed` `f`
   ON (`f`.`user_id` = `g`.`user_id`
       AND `f`.`tm` BETWEEN `g`.`tm` AND `g`.`tm` + @granularity)
 GROUP BY `f`.`id`;

Now we are going to update the group column. First, MySQL cannot update a table while reading from the same table in a subquery, so we need a temporary table. Second, we only update rows whose group column is NULL, or rows posted no earlier than UNIX_TIMESTAMP()-2*@granularity:

 CREATE TEMPORARY TABLE `_feedg`
 SELECT MAX(`g`.`id`) AS `group`, `f`.`id`
 FROM `feed` `g`
 CROSS JOIN `feed` `f`
   ON (`f`.`user_id` = `g`.`user_id`
       AND `f`.`tm` BETWEEN `g`.`tm` - @granularity AND `g`.`tm`)
 WHERE `f`.`group` IS NULL
    OR `f`.`tm` >= (UNIX_TIMESTAMP() - 2 * @granularity)
 GROUP BY `f`.`id`;

And update the group column:

 UPDATE `feed` `f` CROSS JOIN `_feedg` `g` USING(`id`) SET `f`.`group` = `g`.`group`; 

Here's SQLFiddle: http://sqlfiddle.com/#!2/be9ce/15

+2

Could you store a timestamp with each message, and then select every message whose timestamp is within some threshold of the next one?

Another idea is to store both a timestamp and a "group number" with each message. Before storing a message, perform a SELECT to look for messages from the same user that were sent in the last n minutes. If you find one, use its group number for the new message. If you do not, give the new message an incremented group number. Then you can select by group number to find the items you need.
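The insert-time idea above can be sketched as follows. This is only an illustration under assumptions: it uses Python's built-in sqlite3 with an in-memory database instead of MySQL, the column is named grp (hypothetical) to avoid the reserved word, the helper names make_db and insert_post are made up, and "sent in the last n minutes" is interpreted as "within the threshold of the same user's latest message".

```python
import sqlite3

GRANULARITY = 60 * 60  # seconds; the one-hour threshold from the question


def make_db():
    """Create an in-memory stand-in for the feed table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE feed ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " tm INTEGER NOT NULL,"
        " user_id INTEGER NOT NULL,"
        " image TEXT NOT NULL,"
        " grp INTEGER NOT NULL)"
    )
    return conn


def insert_post(conn, user_id, image, tm):
    """Insert a post and return its group number.

    The group of the user's latest post is reused when that post is at most
    GRANULARITY seconds old; otherwise a new group number is allocated.
    """
    last = conn.execute(
        "SELECT tm, grp FROM feed WHERE user_id = ?"
        " ORDER BY tm DESC, id DESC LIMIT 1",
        (user_id,),
    ).fetchone()
    if last is not None and 0 <= tm - last[0] <= GRANULARITY:
        grp = last[1]
    else:
        grp = conn.execute(
            "SELECT COALESCE(MAX(grp), 0) + 1 FROM feed"
        ).fetchone()[0]
    conn.execute(
        "INSERT INTO feed (tm, user_id, image, grp) VALUES (?, ?, ?, ?)",
        (tm, user_id, image, grp),
    )
    return grp


conn = make_db()
print(insert_post(conn, 1, "a.jpg", 0))     # first post of user 1
print(insert_post(conn, 2, "b.jpg", 100))   # another user gets its own group
print(insert_post(conn, 1, "c.jpg", 1800))  # 30 minutes later, same group
print(insert_post(conn, 1, "d.jpg", 9000))  # two hours later, new group
```

In a real multi-user setup the SELECT and the INSERT would need to run in one transaction (or under a lock), otherwise two concurrent uploads could allocate the same new group number.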

+3

I assume that the data model will look something like this:

(image: data model diagram)

Just make sure that the time difference between messages is greater than the TIMESTAMP resolution (or be prepared to handle PK violations correctly).

In a DBMS that supports analytic functions, you can easily group messages that are temporally close to each other. For example, an Oracle query that groups messages (for a given user) arriving within an hour of one another would look like this:

 SELECT T.*, SUM(DIFF) OVER (ORDER BY TIMESTAMP) GROUPING
 FROM (
   SELECT IMAGE.*,
     CASE
       WHEN TIMESTAMP <= LAG(TIMESTAMP) OVER (ORDER BY TIMESTAMP) + INTERVAL '1' HOUR THEN 0
       ELSE 1
     END DIFF
   FROM IMAGE
   WHERE USER_ID = :user_id
 ) T;

The resulting GROUPING field identifies individual groups of rows whose TIMESTAMP values are "close enough". This query is also quite efficient: it is just a range scan on the PK index. You can play with it in SQL Fiddle.

Unfortunately, MySQL versions before 8.0 do not support analytic functions, but it is not difficult to do almost the same at the application level. Just SELECT ... ORDER BY TIMESTAMP, iterate linearly over the results, and check the difference between the current and the previous row.
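The application-level pass described above might look roughly like this in Python. It is a sketch under assumptions: the rows are taken to be (id, timestamp) pairs for a single user, already sorted by timestamp (as an ORDER BY would return them), timestamps are in seconds, and the function name group_consecutive is hypothetical.

```python
GRANULARITY = 60 * 60  # seconds; one hour, as in the question


def group_consecutive(rows, granularity=GRANULARITY):
    """Split (id, tm) rows, sorted by tm, into lists of ids.

    A new group starts whenever the gap to the previous row exceeds
    the granularity; otherwise the row joins the current group.
    """
    groups = []
    prev_tm = None
    for row_id, tm in rows:
        if prev_tm is None or tm - prev_tm > granularity:
            groups.append([])
        groups[-1].append(row_id)
        prev_tm = tm
    return groups


# Posts at 1:00, 1:40, 2:30 and a much later one (seconds since midnight).
rows = [(1, 3600), (2, 6000), (3, 9000), (4, 20000)]
print(group_consecutive(rows))  # [[1, 2, 3], [4]]
```

Note that this chains rows: the first three posts end up together because each one is within an hour of the previous one, even though the first and third are 90 minutes apart.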

+2

The solution by "o_O Tync" will not group all elements that are each within 1 hour of the previous one if they are added at, for example, 1:00, 1:40 and 2:30. Only the last two will be grouped.
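To check this claim without a database, the MAX-based join from the first answer can be simulated in a few lines of Python. This is only a sketch: the function name window_max_groups is hypothetical, rows are assumed to be (id, user_id, tm) tuples with tm in seconds, and the join condition is copied from the `f`.`tm` BETWEEN `g`.`tm` - @granularity AND `g`.`tm` query.

```python
GRANULARITY = 60 * 60  # seconds


def window_max_groups(rows, granularity=GRANULARITY):
    """Return {row id: group id} using the MAX(g.id) self-join rule.

    Row f is matched with every row g of the same user for which
    g.tm - granularity <= f.tm <= g.tm, and gets the largest matching g.id.
    """
    result = {}
    for f_id, f_user, f_tm in rows:
        candidates = [
            g_id
            for g_id, g_user, g_tm in rows
            if g_user == f_user and g_tm - granularity <= f_tm <= g_tm
        ]
        # Every row matches itself, so candidates is never empty.
        result[f_id] = max(candidates)
    return result


# One user posting at 1:00, 1:40 and 2:30 (seconds since midnight).
rows = [(1, 7, 3600), (2, 7, 6000), (3, 7, 9000)]
print(window_max_groups(rows))  # {1: 2, 2: 3, 3: 3}
```

The first post lands in group 2 while the other two share group 3, which matches the behaviour described above.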

Here is a very fast MySQL solution without temporary tables and without extra joins (it reads from a single table).

  CREATE TABLE `feed` (
   `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
   `tm` INT UNSIGNED NOT NULL COMMENT 'timestamp',
   `user_id` INT UNSIGNED NOT NULL COMMENT 'author id',
   `image` VARCHAR(255) NOT NULL COMMENT 'posted image filename',
   `group` INT UNSIGNED NULL DEFAULT NULL COMMENT 'post group',
   PRIMARY KEY (`id`),
   INDEX (`user_id`),
   INDEX (`tm`, `group`)
   );


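 SET @granularity := 60 * 60;
 -- Walk the rows per user from newest to oldest; a row starts a new group
 -- (takes its own id) when the user changes or the gap exceeds @granularity.
 -- Note: this relies on user variables being assigned in row order, which MySQL does not guarantee.
 UPDATE feed f CROSS JOIN (
   SELECT
     g.id,
     @id := COALESCE(IF(ISNULL(@prev_date) OR (user_id != @prev_user_id) OR NOT (@prev_date - tm BETWEEN 0 AND @granularity), g.id, NULL), @id)
     + LEAST(0, @prev_date := tm)
     + LEAST(0, @prev_user_id := user_id) AS group_id
   FROM (SELECT @prev_date := NULL, @id := NULL, @prev_user_id := NULL) r, feed g
   ORDER BY user_id DESC, tm DESC
 ) z USING (id)
 SET f.`group` = z.group_id;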

http://sqlfiddle.com/#!2/02a98/1/0

+1

Source: https://habr.com/ru/post/1411882/

