Inclusion / exclusion of MySQL messages

This post takes a considerable amount of time to print, because I try to be as clear as possible, so please bear with me if it is still unclear.

Basically, I have a message table in the database where users can add privacy settings.

ID | owner_id | post | other_info | privacy_level (int value) 

From there, users can add their privacy data, allowing everyone to see it [privacy_level = 0), friends (privacy_level = 1), anyone (privacy_level = 3), or specific people or filters (privacy_level = 4). For privacy levels that define specific people (4), the query refers to the table "post_privacy_includes_for" in the subquery to find out if the user (or the filter to which the user belongs) exists in the row in the table.

 ID | post_id | user_id | list_id 

In addition, the user has the opportunity to prohibit some people from viewing their post within a larger group by excluding them (for example, by setting the option of viewing for everyone, but hiding it from the stalker user). For this, another reference table "post_privacy_exclude_from" is added - it looks identical to the setting as "post_privacy_includes_for".

My problem is that it does not scale. At all. At the moment, there are about 1-2 million messages, most of which are intended for viewing by all. For each message on the page, he must check if there is a line that excludes the message from the user - this moves very slowly on the page, which can be filled with 100-200 messages. This can take up to 2-4 seconds, especially when additional restrictions are added to the request.

It also creates extremely large and complex queries that are simply ... inconvenient.

 SELECT t.* FROM posts t WHERE ( (t.privacy_level = 3 AND t.owner_id = ?) OR (t.privacy_level = 4 AND EXISTS ( SELECT i.id FROM PostPrivacyIncludeFor i WHERE i.user_id = ? AND i.thought_id = t.id) OR t.privacy_level = 4 AND t.owner_id = ?) OR (t.privacy_level = 4 AND EXISTS (SELECT i2.id FROM PostPrivacyIncludeFor i2 WHERE i2.thought_id = t.id AND EXISTS (SELECT r.id FROM FriendFilterIds r WHERE r.list_id = i2.list_id AND r.friend_id = ?)) OR t.privacy_level = 4 AND t.owner_id = ?) OR (t.privacy_level = 1 AND EXISTS (SELECT G.id FROM Following G WHERE follower_id = t.owner_id AND following_id = ? AND friend = 1) OR t.privacy_level = 1 AND t.owner_id = ?) OR (NOT EXISTS (SELECT e.id FROM PostPrivacyExcludeFrom e WHERE e.thought_id = t.id AND e.user_id = ? AND NOT EXISTS (SELECT e2.id FROM PostPrivacyExcludeFrom e2 WHERE e2.thought_id = t.id AND EXISTS (SELECT l.id FROM FriendFilterIds l WHERE l.list_id = e2.list_id AND l.friend_id = ?))) AND t.privacy_level IN (0, 1, 4)) AND t.owner_id = ? ORDER BY t.created_at LIMIT 100 

(The query layout is similar to the query I'm using now in Doctrine ORM. It's a mess, but you get what I'm saying.)

I think my question is, how would you approach this situation in order to optimize it? Is there a better way to set up my database? I am ready to completely abandon the method that I just created, but I do not know what to move on.

Thanks guys.

Update: Correct the query to reflect the values โ€‹โ€‹that I defined for the privacy level above (I forgot to update it because I simplified the values)

+4
source share
2 answers

Your query is too long to give a final solution, but the approach I will follow is simply to search for the data, converting the subqueries to joins, and then create the logic in the list of where clauses and select statement column:

 select t.*, i.*, r.*, G.*, e.* from posts t left join PostPrivacyIncludeFor i on i.user_id = ? and i.thought_id = t.id left join FriendFilterIds r on r.list_id = i.list_id and r.friend_id = ? left join Following G on follower_id = t.owner_id and G.following_id = ? and G.friend=1 left join PostPrivacyExcludeFrom e on e.thought_id = t.id and e.user_id = ? 

(This may require an extension: I could not follow the logic of the final sentence.)

If you can get a simple choice that works quickly AND including all the necessary information, then all you need to do is create logic in the selection list and where is the sentence.

+1
source

There would be a quick hit to simplify this, your original design too overhauled.

Using this solution, your web page can now simply call the following stored procedure to get a list of filtered messages for a given user for a specified period.

 call list_user_filtered_posts( <user_id>, <day_interval> ); 

The whole script can be found here: http://pastie.org/1212812

I have not fully tested all of this, and you may find that this solution is not effective enough for your needs, but it can help you fine-tune / modify the existing design.

Tables

Drop your post_privacy_exclude_from table and add a user_stalkers table, which works almost the same as the back of user_friends. We kept the original post_privacy_includes_for table according to your design, as this allows the user to restrict a specific record to a subset of people.

 drop table if exists users; create table users ( user_id int unsigned not null auto_increment primary key, username varbinary(32) unique not null ) engine=innodb; drop table if exists user_friends; create table user_friends ( user_id int unsigned not null, friend_user_id int unsigned not null, primary key (user_id, friend_user_id) ) engine=innodb; drop table if exists user_stalkers; create table user_stalkers ( user_id int unsigned not null, stalker_user_id int unsigned not null, primary key (user_id, stalker_user_id) ) engine=innodb; drop table if exists posts; create table posts ( post_id int unsigned not null auto_increment primary key, user_id int unsigned not null, privacy_level tinyint unsigned not null default 0, post_date datetime not null, key user_idx(user_id), key post_date_user_idx(post_date, user_id) ) engine=innodb; drop table if exists post_privacy_includes_for; create table post_privacy_includes_for ( post_id int unsigned not null, user_id int unsigned not null, primary key (post_id, user_id) ) engine=innodb; 

Saved Procedures

The stored procedure is relatively simple - first it selects ALL records for the specified period, and then filters the messages according to your initial requirements. I have not tested this sproc with large volumes, but since the initial choice is relatively small, it should be strong enough, and also simplify your application / mid-level code.

 drop procedure if exists list_user_filtered_posts; delimiter # create procedure list_user_filtered_posts ( in p_user_id int unsigned, in p_day_interval tinyint unsigned ) proc_main:begin drop temporary table if exists tmp_posts; drop temporary table if exists tmp_priv_posts; -- select ALL posts in the required date range (or whatever selection criteria you require) create temporary table tmp_posts engine=memory select p.post_id, p.user_id, p.privacy_level, 0 as deleted from posts p where p.post_date between now() - interval p_day_interval day and now() order by p.user_id; -- purge stalker posts (0,1,3,4) update tmp_posts inner join user_stalkers us on us.user_id = tmp_posts.user_id and us.stalker_user_id = p_user_id set tmp_posts.deleted = 1 where tmp_posts.user_id != p_user_id; -- purge other users private posts (3) update tmp_posts set deleted = 1 where user_id != p_user_id and privacy_level = 3; -- purge friend only posts (1) ie where p_user_id is not a friend of the poster /* requires another temp table due to mysql temp table problem/bug http://dev.mysql.com/doc/refman/5.0/en/temporary-table-problems.html */ -- the private posts (1) this user can see create temporary table tmp_priv_posts engine=memory select tp.post_id from tmp_posts tp inner join user_friends uf on uf.user_id = tp.user_id and uf.friend_user_id = p_user_id where tp.user_id != p_user_id and tp.privacy_level = 1; -- remove private posts this user cant see update tmp_posts left outer join tmp_priv_posts tpp on tmp_posts.post_id = tpp.post_id set tmp_posts.deleted = 1 where tpp.post_id is null and tmp_posts.privacy_level = 1; -- purge filtered (4) truncate table tmp_priv_posts; -- reuse tmp table insert into tmp_priv_posts select tp.post_id from tmp_posts tp inner join post_privacy_includes_for ppif on tp.post_id = ppif.post_id and ppif.user_id = p_user_id where tp.user_id != p_user_id and tp.privacy_level = 4; -- remove private posts this user cant see update tmp_posts left outer join tmp_priv_posts tpp on tmp_posts.post_id = tpp.post_id set tmp_posts.deleted = 1 where tpp.post_id is null and tmp_posts.privacy_level = 4; drop temporary table if exists tmp_priv_posts; -- output filtered posts (display ALL of these on web page) select p.* from posts p inner join tmp_posts tp on p.post_id = tp.post_id where tp.deleted = 0 order by p.post_id desc; -- clean up drop temporary table if exists tmp_posts; end proc_main # delimiter ; 

Test Data

Some basic test data.

 insert into users (username) values ('f00'),('bar'),('alpha'),('beta'),('gamma'),('omega'); insert into user_friends values (1,2),(1,3),(1,5), (2,1),(2,3),(2,4), (3,1),(3,2), (4,5), (5,1),(5,4); insert into user_stalkers values (4,1); insert into posts (user_id, privacy_level, post_date) values -- public (0) (1,0,now() - interval 8 day), (1,0,now() - interval 8 day), (2,0,now() - interval 7 day), (2,0,now() - interval 7 day), (3,0,now() - interval 6 day), (4,0,now() - interval 6 day), (5,0,now() - interval 5 day), -- friends only (1) (1,1,now() - interval 5 day), (2,1,now() - interval 4 day), (4,1,now() - interval 4 day), (5,1,now() - interval 3 day), -- private (3) (1,3,now() - interval 3 day), (2,3,now() - interval 2 day), (4,3,now() - interval 2 day), -- filtered (4) (1,4,now() - interval 1 day), (4,4,now() - interval 1 day), (5,4,now()); insert into post_privacy_includes_for values (15,4), (16,1), (17,6); 

Testing

As I mentioned earlier, I did not fully test this, but on the surface it seems to work.

 select * from posts; call list_user_filtered_posts(1,14); call list_user_filtered_posts(6,14); call list_user_filtered_posts(1,7); call list_user_filtered_posts(6,7); 

Hope you find some of these features.

0
source

All Articles