The most efficient way to determine whether a list of values fully matches one side of a one-to-many relationship (MySQL)

I have a one-to-many relationship between rooms and their inhabitants:

    Room | User
    1    | 1
    1    | 2
    1    | 4
    2    | 1
    2    | 2
    2    | 3
    2    | 5
    3    | 1
    3    | 3

Given a list of users, for example 1, 3, what is the most efficient way to determine which room is completely filled by exactly those users? In this case the query should return room 3: although both users are also in room 2, room 2 has other occupants, so it is not an "ideal" match.

I can come up with several solutions for this, but am not sure about their efficiency. For example, I can do a GROUP_CONCAT of users (sorted ascending), grouped by room, which will give me comma-separated strings such as "1,2,4", "1,2,3,5" and "1,3". Then I can sort the input list in ascending order and look for an exact match with "1,3".
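The string-signature idea can be modelled in a few lines of Python to see exactly what it computes. This is a minimal sketch, not MySQL: the helper names `signature`, `room_signatures` and `exact_rooms_by_signature` are hypothetical, and in MySQL the per-room string would come from `GROUP_CONCAT(user_id ORDER BY user_id)`. Both sides must be sorted numerically, otherwise "10" sorts before "2" and equal sets produce different strings. MySQL also truncates `GROUP_CONCAT` output at `group_concat_max_len` (1024 bytes by default).

```python
from collections import defaultdict

# (room, user) pairs from the question's sample data.
ROOM_USER = [(1, 1), (1, 2), (1, 4),
             (2, 1), (2, 2), (2, 3), (2, 5),
             (3, 1), (3, 3)]


def signature(user_ids):
    """Comma-separated ids in ascending numeric order, e.g. [3, 1] -> '1,3'."""
    return ",".join(str(u) for u in sorted(set(user_ids)))


def room_signatures(pairs):
    """Model of GROUP_CONCAT(user_id ORDER BY user_id) ... GROUP BY room_id."""
    by_room = defaultdict(list)
    for room, user in pairs:
        by_room[room].append(user)
    return {room: signature(users) for room, users in by_room.items()}


def exact_rooms_by_signature(pairs, wanted):
    """Rooms whose occupant string equals the sorted input string."""
    target = signature(wanted)
    return sorted(room for room, sig in room_signatures(pairs).items()
                  if sig == target)


print(room_signatures(ROOM_USER))                   # {1: '1,2,4', 2: '1,2,3,5', 3: '1,3'}
print(exact_rooms_by_signature(ROOM_USER, [3, 1]))  # [3]
```

The model also makes the cost visible: a string is built for every room before any comparison happens, so the work grows with the whole table rather than with the roughly 25 users being searched for.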

Or I can count the total number of users in each room that contains both users 1 and 3, and then select the room whose user count is two.
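The counting idea is a standard exact relational division and needs no string building. Below is a minimal sketch using Python's built-in `sqlite3` as a stand-in for MySQL, assuming a table `room_user(room_id, user_id)` with a composite primary key (the table, column and function names are assumptions; the question does not name them). A room qualifies when it has exactly *n* occupants and all *n* of them are on the list. The `CASE` form is portable; in MySQL, `SUM(user_id IN (...))` also works because comparisons evaluate to 1 or 0.

```python
import sqlite3


def make_db():
    """In-memory copy of the question's sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE room_user (
                        room_id INTEGER NOT NULL,
                        user_id INTEGER NOT NULL,
                        PRIMARY KEY (room_id, user_id))""")
    conn.executemany("INSERT INTO room_user VALUES (?, ?)",
                     [(1, 1), (1, 2), (1, 4),
                      (2, 1), (2, 2), (2, 3), (2, 5),
                      (3, 1), (3, 3)])
    return conn


def exact_rooms(conn, users):
    """Rooms whose occupants are exactly `users`, no more and no fewer."""
    users = sorted(set(users))  # duplicates would throw the counts off
    if not users:
        return []
    n = len(users)
    marks = ", ".join("?" for _ in users)  # one placeholder per user id
    sql = f"""
        SELECT room_id
        FROM room_user
        GROUP BY room_id
        HAVING COUNT(*) = ?                          -- exactly n occupants
           AND SUM(CASE WHEN user_id IN ({marks})
                        THEN 1 ELSE 0 END) = ?       -- and all n are listed
        ORDER BY room_id
    """
    return [row[0] for row in conn.execute(sql, [n, *users, n])]


conn = make_db()
print(exact_rooms(conn, [1, 3]))  # [3]
```

As written, this groups every room in the table. At the scale mentioned in the question, one option worth measuring is to restrict the candidates first to rooms containing one of the listed users (for example with `WHERE room_id IN (SELECT room_id FROM room_user WHERE user_id = ?)`), which can use an index on `user_id`.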

Note: I want the most efficient way, or at least one that scales to millions of users and rooms. Each query will involve about 25 users. Another thing to consider is how to pass this list to the database. Should I build the query by concatenating AND userid = 1 AND userid = 3 AND userid = 5 etc.? Or is there a way to pass the values as an array into a stored procedure?
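A chain of `AND userid = 1 AND userid = 3 ...` can never match, because a single row holds only one user id; the list has to go into `IN (...)` (or an `OR` chain). Below is a minimal sketch of building that clause with bound parameters, using Python's `sqlite3` as a stand-in; `in_clause` is a hypothetical helper. SQLite's driver uses `?` placeholders, while common Python MySQL drivers use `%s`. Binding the values rather than pasting them into the SQL text also avoids SQL injection. MySQL stored procedures have no array parameter type, so the usual workarounds are a delimited string or a temporary table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE room_user (room_id INTEGER, user_id INTEGER)")
conn.executemany("INSERT INTO room_user VALUES (?, ?)",
                 [(1, 1), (1, 2), (1, 4),
                  (2, 1), (2, 2), (2, 3), (2, 5),
                  (3, 1), (3, 3)])


def in_clause(column, values):
    """Return ('column IN (?, ?, ...)', params) for a qmark-style driver.

    `column` is interpolated, so it must come from code, never from user input.
    """
    if not values:
        raise ValueError("IN () needs at least one value")
    marks = ", ".join("?" for _ in values)
    return f"{column} IN ({marks})", list(values)


users = [1, 3, 5]
where, params = in_clause("user_id", users)
rooms = [r[0] for r in conn.execute(
    f"SELECT DISTINCT room_id FROM room_user WHERE {where} ORDER BY room_id",
    params)]

# The AND chain asks one row to carry three different user ids at once.
and_hits = conn.execute(
    "SELECT COUNT(*) FROM room_user"
    " WHERE user_id = ? AND user_id = ? AND user_id = ?", users).fetchone()[0]

print(where, params)  # user_id IN (?, ?, ?) [1, 3, 5]
print(rooms)          # [1, 2, 3]
print(and_hits)       # 0
```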

Any help would be appreciated.

2 answers

For example, I can do a GROUP_CONCAT of users (sorted ascending), grouped by room, which will give me comma-separated strings such as "1,2,4", "1,2,3,5" and "1,3". Then I can sort the input list in ascending order and look for an exact match with "1,3".

First, a word of advice to improve how you function as a developer. Stop thinking about data and solutions in terms of CSV. It confines your thinking to spreadsheets and prevents you from thinking in terms of relational data. You do not need to build strings and then match the strings; when the data is in the database, you can match it there.

Solution

Now, in terms of relational data, what exactly do you want? You want the rooms in which the number of users matching your argument's user list is highest. Is that right? If so, the code is simple.

You did not specify your tables. I assume room, user and room_user, with ID keys in the first two and a composite key in the third. I can give you an SQL solution; you will need to work out how to translate it into MySQL's dialect.
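The assumed tables might look like the following. This is a minimal sketch in SQLite syntax via Python's `sqlite3`; the column names, the index name `room_user_by_user` and the helper `try_link` are assumptions, since neither the question nor the answer gives any DDL. The composite primary key stops a user from being listed twice in the same room, which any counting query relies on, and the extra index on `(user_id, room_id)` supports lookups that start from a user.

```python
import sqlite3

SCHEMA = """
CREATE TABLE room (room_id INTEGER PRIMARY KEY);
CREATE TABLE user (user_id INTEGER PRIMARY KEY);
CREATE TABLE room_user (
    room_id INTEGER NOT NULL REFERENCES room (room_id),
    user_id INTEGER NOT NULL REFERENCES user (user_id),
    PRIMARY KEY (room_id, user_id)      -- composite key: one row per pair
);
-- secondary index for "which rooms is this user in?"
CREATE INDEX room_user_by_user ON room_user (user_id, room_id);
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO room VALUES (?)", [(r,) for r in (1, 2, 3)])
    conn.executemany("INSERT INTO user VALUES (?)", [(u,) for u in range(1, 6)])
    conn.executemany("INSERT INTO room_user VALUES (?, ?)",
                     [(1, 1), (1, 2), (1, 4),
                      (2, 1), (2, 2), (2, 3), (2, 5),
                      (3, 1), (3, 3)])
    return conn


def try_link(conn, room_id, user_id):
    """Insert a room/user pair; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO room_user VALUES (?, ?)", (room_id, user_id))
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_db()
print(try_link(conn, 1, 1))  # False: user 1 is already in room 1
print(try_link(conn, 9, 1))  # False: room 9 does not exist
print(try_link(conn, 3, 2))  # True
```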

Another thing to consider is how to pass this list to the database. Should I build the query by concatenating AND userid = 1 AND userid = 3 AND userid = 5 etc.? Or is there a way to pass the values as an array into a stored procedure?

  • To pass a list to a stored proc, where the caller needs a single parameter of variable length, you need to build a CSV list of users. Let's call that parameter @user_list. (Note that this is not treating the data as CSV; it is only a way of passing the list to the proc in one batch, because you cannot otherwise pass an unknown number of separate user ids to a proc.)

  • Since you build @user_list on the client, you can also compute @user_count (the number of members in the list) there while you are at it, and pass that to the proc as well.
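Building the two arguments on the client could look like this. It is a minimal sketch: `build_user_list` is a hypothetical helper, and the 255-character limit simply mirrors the parameter width declared in the proc. De-duplicating before counting keeps `@user_count` honest, since a repeated id would be counted twice in the list but matched only once in the table.

```python
def build_user_list(user_ids, max_len=255):
    """Return (user_list, user_count), e.g. [3, 1, 3] -> ('1,3', 2)."""
    ids = sorted({int(u) for u in user_ids})  # numeric sort, duplicates removed
    if not ids:
        raise ValueError("user list is empty")
    user_list = ",".join(str(u) for u in ids)
    if len(user_list) > max_len:
        raise ValueError(
            f"user list is {len(user_list)} chars; the parameter holds {max_len}")
    return user_list, len(ids)


print(build_user_list([3, 1, 3]))  # ('1,3', 2)
```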

Something like:

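    CREATE PROC room_user_match_sp (
        @user_list  VARCHAR(255),  -- CSV of user ids, e.g. '1,3' (CHAR would pad with spaces)
        @user_count INT            -- number of ids in @user_list
        -- ...
    ) AS
    -- validate parms, etc ...
    WITH match_room AS (                      -- rooms with any matched users
        SELECT room_id,
               COUNT(user_id) AS match_count  -- no of users matched
        FROM room_user
        -- IN ( @user_list ) would compare user_id with one string,
        -- so test membership in the CSV instead (MySQL: FIND_IN_SET)
        WHERE CHARINDEX(',' + CONVERT(VARCHAR(11), user_id) + ',',
                        ',' + @user_list + ',') > 0
        GROUP BY room_id                      -- get one row per room
    )
    SELECT room_id,
           match_count,
           match_count * 100.0 / @user_count AS match_pct
    FROM match_room
    WHERE match_count = ( SELECT MAX(match_count) FROM match_room )  -- remove this while testing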
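The proc's query can be tried without a server. Below is a minimal sketch that runs the same CTE against SQLite through Python's `sqlite3`, with SQLite's `instr` and `||` standing in for `CHARINDEX` and string `+`; the named parameters `:user_list` and `:user_count` play the role of the proc's parameters (this translation is an assumption, not the answer's code). Wrapping `user_id` in a string expression like this means an index on `user_id` cannot be used for the filter.

```python
import sqlite3

BEST_MATCH_SQL = """
WITH match_room AS (                          -- rooms with any matched users
    SELECT room_id, COUNT(user_id) AS match_count
    FROM room_user
    WHERE instr(',' || :user_list || ',', ',' || user_id || ',') > 0
    GROUP BY room_id
)
SELECT room_id,
       match_count,
       match_count * 100.0 / :user_count AS match_pct
FROM match_room
WHERE match_count = (SELECT MAX(match_count) FROM match_room)
ORDER BY room_id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE room_user (room_id INTEGER, user_id INTEGER,"
             " PRIMARY KEY (room_id, user_id))")
conn.executemany("INSERT INTO room_user VALUES (?, ?)",
                 [(1, 1), (1, 2), (1, 4),
                  (2, 1), (2, 2), (2, 3), (2, 5),
                  (3, 1), (3, 3)])

rows = conn.execute(BEST_MATCH_SQL,
                    {"user_list": "1,3", "user_count": 2}).fetchall()
print(rows)  # [(2, 2, 100.0), (3, 2, 100.0)]
```

With the question's data, rooms 2 and 3 tie at 100%, because counting matched users alone never looks at room 2's other occupants.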

It is not clear whether you want only complete matches. If you do, replace the final WHERE clause with:

  WHERE match_count = @user_count 
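With the sample data, `match_count = @user_count` still returns room 2 as well as room 3, because it only counts matched users. Below is a sketch of the extra condition the question's "ideal" match needs, again in SQLite through Python's `sqlite3` as a stand-in for the proc (same assumed `room_user` table and named parameters): the room's total occupancy must also equal the list size.

```python
import sqlite3

# True when user_id appears in the CSV parameter (stand-in for FIND_IN_SET).
IN_LIST = "instr(',' || :user_list || ',', ',' || user_id || ',') > 0"

MATCH_ONLY_SQL = f"""
SELECT room_id FROM room_user
GROUP BY room_id
HAVING SUM(CASE WHEN {IN_LIST} THEN 1 ELSE 0 END) = :user_count
ORDER BY room_id
"""

EXACT_SQL = f"""
SELECT room_id FROM room_user
GROUP BY room_id
HAVING SUM(CASE WHEN {IN_LIST} THEN 1 ELSE 0 END) = :user_count
   AND COUNT(*) = :user_count          -- and nobody else is in the room
ORDER BY room_id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE room_user (room_id INTEGER, user_id INTEGER,"
             " PRIMARY KEY (room_id, user_id))")
conn.executemany("INSERT INTO room_user VALUES (?, ?)",
                 [(1, 1), (1, 2), (1, 4),
                  (2, 1), (2, 2), (2, 3), (2, 5),
                  (3, 1), (3, 3)])

args = {"user_list": "1,3", "user_count": 2}
match_only = [r[0] for r in conn.execute(MATCH_ONLY_SQL, args)]
exact = [r[0] for r in conn.execute(EXACT_SQL, args)]
print(match_only)  # [2, 3]
print(exact)       # [3]
```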

Caveat

You asked for a proc-based solution, so I gave you one. Yes, it is the fastest. But keep in mind that, for a requirement and solution like this, you could equally build the SQL string on the client and execute it on the "server" in the usual way, without a proc. The proc is faster here only because its code is precompiled, which removes the compilation step that is otherwise performed every time the client sends the "server" an SQL string.

The point I am making is that, with data in a reasonably relational form, you can get the result you are looking for with a single SELECT; you do not need to mess around with worksheets, temporary tables, or intermediate steps that would require a proc. A proc is not required here; you use one for performance reasons.

I make this point because it is clear from your question that your intended solution is "gee, I can't get the result directly, so I will work on the data first; I am willing and ready to do that". Such intermediate work steps are required only when the data is not relational.


Perhaps not the most efficient SQL, but something like:

    SELECT x.room_id,
           SUM(x.occupants) AS occupants,
           SUM(x.selectees) AS selectees,
           SUM(x.selectees) / SUM(x.occupants) AS percentage
    FROM (
        SELECT room_id, COUNT(user_id) AS occupants, NULL AS selectees
        FROM Rooms
        GROUP BY room_id
        UNION ALL
        SELECT room_id, NULL AS occupants, COUNT(user_id) AS selectees
        FROM Rooms
        WHERE user_id IN (1, 3)
        GROUP BY room_id
    ) x
    GROUP BY x.room_id
    ORDER BY percentage DESC

will give you a list of rooms ordered by "best fit" percentage.

That is, it computes a completion percentage: the number of people from your set who are in the room, divided by the total number of people in the room.
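To check the ranking against the question's data, here is the same query run in SQLite through Python's `sqlite3` (a stand-in for MySQL; the table name `Rooms` follows the answer). One adjustment is needed: SQLite divides two integers with integer division, so this sketch multiplies by `1.0`, whereas MySQL's `/` already returns a decimal. A `room_id` tie-breaker is added to make the order deterministic.

```python
import sqlite3

BEST_FIT_SQL = """
SELECT x.room_id,
       SUM(x.occupants) AS occupants,
       SUM(x.selectees) AS selectees,
       SUM(x.selectees) * 1.0 / SUM(x.occupants) AS percentage  -- avoid integer division
FROM (
    SELECT room_id, COUNT(user_id) AS occupants, NULL AS selectees
    FROM Rooms
    GROUP BY room_id
    UNION ALL
    SELECT room_id, NULL AS occupants, COUNT(user_id) AS selectees
    FROM Rooms
    WHERE user_id IN (?, ?)
    GROUP BY room_id
) x
GROUP BY x.room_id
ORDER BY percentage DESC, x.room_id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Rooms (room_id INTEGER, user_id INTEGER)")
conn.executemany("INSERT INTO Rooms VALUES (?, ?)",
                 [(1, 1), (1, 2), (1, 4),
                  (2, 1), (2, 2), (2, 3), (2, 5),
                  (3, 1), (3, 3)])

rows = conn.execute(BEST_FIT_SQL, (1, 3)).fetchall()
print([r[0] for r in rows])  # [3, 2, 1]
```

Note that a room whose only occupant is user 1 would also score 1.0, so a top percentage on its own does not prove an exact match with the whole list.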
