MySQL - aggregation function to select first choice, second choice, third choice, etc.?

Suppose I have the following data in a table called "messages":

 message_id | language_id | message
 ------------------------------------
 1          | en          | Hello
 1          | de          | Hallo
 1          | es          | Hola
 2          | en          | Goodbye
 2          | es          | Adios

(Note that I do not have a German translation for "Goodbye.")

I want to select messages for a user who speaks English and German, but prefers German.

Meaning, I want the result set to look like this:

 message_id | language_id | message
 ------------------------------------
 1          | de          | Hallo
 2          | en          | Goodbye

But, um, this is complicated. Any ideas?

+7
mysql
8 answers

The fastest solution I found that gives the results I need is described in this article:

http://onlamp.com/pub/a/mysql/2007/03/29/emulating-analytic-aka-ranking-functions-with-mysql.html

-2
    select message_id, language_id, message
    from (
        select if(language_id = 'de', 0, 1) as choice, m.*
        from messages m
        where m.language_id in ('de', 'en')
        order by choice
    ) z
    group by message_id

Adjust the IF expression in the inner select to match your preferences: it sorts the preferred language to the top of the derived table, so that GROUP BY picks that row.

You can also do this, but the answer above is probably more accurate for what you want to use it for.

    select *
    from messages m
    where m.language_id = 'de'
       or (m.language_id = 'en'
           and not exists (
               select 1
               from messages n
               where n.language_id = 'de'
                 and n.message_id = m.message_id))
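
The fallback logic of this second query can be tried out without a MySQL server. The following is a minimal sketch that assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL and loads the sample rows from the question; the in-memory database and the variable names are illustrative only.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (message_id INT, language_id TEXT, message TEXT)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [(1, "en", "Hello"), (1, "de", "Hallo"), (1, "es", "Hola"),
     (2, "en", "Goodbye"), (2, "es", "Adios")],
)

# Take every German row, plus the English row only when no German row
# exists for the same message_id.
rows = conn.execute(
    """
    SELECT m.message_id, m.language_id, m.message
    FROM messages m
    WHERE m.language_id = 'de'
       OR (m.language_id = 'en'
           AND NOT EXISTS (SELECT 1
                           FROM messages n
                           WHERE n.language_id = 'de'
                             AND n.message_id = m.message_id))
    ORDER BY m.message_id
    """
).fetchall()

print(rows)  # [(1, 'de', 'Hallo'), (2, 'en', 'Goodbye')]
```

This form only covers two languages; every additional fallback language needs another NOT EXISTS branch.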

In response to your comments: if you are not comfortable relying on the MySQL-specific GROUP BY behavior (selecting non-aggregated columns without aggregate functions), you can use this more standard code:

    select *
    from messages m
    where m.language_id in ('de', 'en')
      and if(m.language_id = 'de', 0, 1) <= (
          select min(if(n.language_id = 'de', 0, 1))
          from messages n
          where n.message_id = m.message_id)
+2

This query will do exactly what you need:

    SELECT *
    FROM (
        SELECT *
        FROM messages
        WHERE language_id IN ('en', 'de')
        ORDER BY FIELD(language_id, 'en', 'de') DESC
    ) m
    GROUP BY message_id;

The languages in FIELD(language_id, 'en', 'de') should be listed in ascending order of priority: the last one ("de" in this case) has the highest priority, then "en", then all the others (for which FIELD returns 0).

The WHERE clause is optional here; it is needed only if you do not want any results in cases where there is no translation for either "en" or "de".

Edit: Sean mentioned that using GROUP BY with non-aggregated columns can lead to unreliable results. This may be true; at least the MySQL manual says so (although in practice the first matching row is always (?) used).

Anyway, here is another query based on the same idea, but without that problem.

    SELECT m1.*
    FROM messages AS m1
    INNER JOIN (
        SELECT message_id, MAX(FIELD(language_id, 'en', 'de')) AS weight
        FROM messages
        WHERE language_id IN ('en', 'de')
        GROUP BY message_id
    ) AS m2 USING (message_id)
    WHERE FIELD(m1.language_id, 'en', 'de') = m2.weight;
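
To see the weight join at work on the question's data, here is a rough sketch that assumes SQLite (through Python's sqlite3 module) instead of MySQL. SQLite has no FIELD() function, so a CASE expression is swapped in to produce the same weights; the WEIGHT template and the variable names are hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (message_id INT, language_id TEXT, message TEXT)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [(1, "en", "Hello"), (1, "de", "Hallo"), (1, "es", "Hola"),
     (2, "en", "Goodbye"), (2, "es", "Adios")],
)

# Stand-in for FIELD(col, 'en', 'de'): a higher weight means a higher
# priority, and 0 means "not one of the wanted languages".
WEIGHT = "CASE {col} WHEN 'en' THEN 1 WHEN 'de' THEN 2 ELSE 0 END"
inner_weight = WEIGHT.format(col="language_id")
outer_weight = WEIGHT.format(col="m1.language_id")

query = f"""
SELECT m1.message_id, m1.language_id, m1.message
FROM messages AS m1
JOIN (
    -- best available weight per message
    SELECT message_id, MAX({inner_weight}) AS weight
    FROM messages
    WHERE language_id IN ('en', 'de')
    GROUP BY message_id
) AS m2 ON m2.message_id = m1.message_id
WHERE {outer_weight} = m2.weight
ORDER BY m1.message_id
"""

rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'de', 'Hallo'), (2, 'en', 'Goodbye')]
```

Because the subquery is filtered to the wanted languages, a message that has neither translation produces no row at all.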
+2

Here is one possible solution:

First I just set up your tables:

    DROP TEMPORARY TABLE IF EXISTS messages;
    CREATE TEMPORARY TABLE messages (
        message_id INT,
        language_id INT,
        message VARCHAR(64)
    );
    INSERT INTO messages VALUES
        (1, 1, "Hello"),
        (1, 2, "Hellode"),
        (1, 3, "Hola"),
        (2, 1, "Goodbye"),
        (2, 3, "Adios");

Then add a new table for the user's language preferences:

    DROP TEMPORARY TABLE IF EXISTS user_language_preference;
    CREATE TEMPORARY TABLE user_language_preference (
        user_id INT,
        language_id INT,
        preference INT
    );
    INSERT INTO user_language_preference VALUES
        (1, 1, 10),   # knows English
        (1, 2, 100);  # but prefers 'de'

And the queries:

Hello:

    SET @user_id = 1;
    SET @message_id = 1;

    # Returns 'Hellode', 'Hello'
    SELECT m.language_id, message
    FROM messages AS m, user_language_preference AS l
    WHERE message_id = @message_id
      AND m.language_id = l.language_id
      AND user_id = @user_id
    ORDER BY preference DESC;

Goodbye:

    SET @message_id = 2;

    # Returns 'Goodbye' as 'de' doesn't have a message there
    SELECT m.language_id, message
    FROM messages AS m, user_language_preference AS l
    WHERE message_id = @message_id
      AND m.language_id = l.language_id
      AND user_id = @user_id
    ORDER BY preference DESC;

Edit: In response to the comment:

    SELECT m.message_id, m.language_id, message
    FROM messages AS m, user_language_preference AS l
    WHERE m.language_id = l.language_id
      AND user_id = @user_id
    ORDER BY m.message_id, preference DESC;
0

Use the max-concat trick (a relative of the group-concat trick) to get this in a single query:

    select message_id,
           substring(max(concat(if(language_id = 'de', 9, if(language_id = 'en', 8, 0)), message)), 2) as message,
           substring(max(concat(if(language_id = 'de', 9, if(language_id = 'en', 8, 0)), language_id)), 2) as language
    from messages
    group by message_id;

Just add conditions and corresponding priorities to the IF expressions to support additional fallback languages.
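
The trick is easier to follow on concrete data. The sketch below assumes SQLite (through Python's sqlite3 module) rather than MySQL, so `||` and `substr` replace CONCAT and SUBSTRING, and a CASE expression replaces the nested IFs; the PRIORITY constant is a hypothetical name.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (message_id INT, language_id TEXT, message TEXT)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [(1, "en", "Hello"), (1, "de", "Hallo"), (1, "es", "Hola"),
     (2, "en", "Goodbye"), (2, "es", "Adios")],
)

# One-character priority prefix: '9' for de, '8' for en, '0' for the rest.
PRIORITY = "CASE language_id WHEN 'de' THEN '9' WHEN 'en' THEN '8' ELSE '0' END"

# MAX() picks the string with the highest prefix in each group;
# substr(..., 2) then strips the prefix again.
rows = conn.execute(
    f"""
    SELECT message_id,
           substr(MAX({PRIORITY} || message), 2)     AS message,
           substr(MAX({PRIORITY} || language_id), 2) AS language
    FROM messages
    GROUP BY message_id
    ORDER BY message_id
    """
).fetchall()

print(rows)  # [(1, 'Hallo', 'de'), (2, 'Goodbye', 'en')]
```

The prefix has to keep a fixed width (a single digit here), otherwise stripping it with a fixed offset no longer works. Note also that languages sharing the prefix 0 are compared by their text, so a message with no "de" or "en" translation can return a message and a language taken from different rows.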

0
    SELECT *
    FROM messages
    WHERE (message_id,
           CASE language_id WHEN 'de' THEN 1 WHEN 'en' THEN 2 ELSE NULL END) IN (
        SELECT message_id,
               MIN(CASE language_id WHEN 'de' THEN 1 WHEN 'en' THEN 2 ELSE NULL END) pref_language_id
        FROM `messages`
        GROUP BY message_id
    )

You must change CASE language_id WHEN 'de' THEN 1 WHEN 'en' THEN 2 ELSE NULL END to match the user's preferred language(s). If the user has a third one, just add another WHEN branch, for example: CASE language_id WHEN 'de' THEN 1 WHEN 'en' THEN 2 WHEN 'es' THEN 3 ELSE NULL END.

0

This is a good example of a groupwise-maximum query. http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/

Here is what I came up with, using the same data and schema as simendsjo.

    SELECT prefered.message_id, p2.language_id, message
    FROM (
        SELECT message_id, MAX(preference) AS prefered
        FROM messages m
        JOIN user_language_preference p
          ON p.language_id = m.language_id AND p.user_id = 1
        GROUP BY m.message_id
    ) AS prefered
    JOIN user_language_preference p2
      ON prefered = p2.preference AND p2.user_id = 1
    JOIN messages m2
      ON p2.language_id = m2.language_id AND m2.message_id = prefered.message_id

Here's how it works.

  • The inner query, prefered, joins all messages to the user's language preferences and computes the maximum preference for each message ( GROUP BY m.message_id ). If there is no translation in the most preferred language, the maximum falls to the next preferred language, and so on.
  • The outer query consists of two joins. The first join gets the language ID that corresponds to the maximum preference ( MAX(preference) = prefered = p2.preference ) for this user.
  • The last join, m2, just selects the translation for the now-known preferred language_id and message_id.
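
The three steps above can be sketched end to end as follows. This assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL and reuses simendsjo's integer language IDs; the aliases `best` and `top_preference` and the bound parameter `:uid` are illustrative names, not part of the original query.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (message_id INT, language_id INT, message TEXT);
INSERT INTO messages VALUES
    (1, 1, 'Hello'), (1, 2, 'Hellode'), (1, 3, 'Hola'),
    (2, 1, 'Goodbye'), (2, 3, 'Adios');
CREATE TABLE user_language_preference (user_id INT, language_id INT, preference INT);
INSERT INTO user_language_preference VALUES (1, 1, 10), (1, 2, 100);
""")

rows = conn.execute(
    """
    SELECT best.message_id, p2.language_id, m2.message
    FROM (
        -- step 1: highest preference available for each message
        SELECT m.message_id, MAX(p.preference) AS top_preference
        FROM messages m
        JOIN user_language_preference p
          ON p.language_id = m.language_id AND p.user_id = :uid
        GROUP BY m.message_id
    ) AS best
    -- step 2: map that preference back to a language ID
    JOIN user_language_preference p2
      ON p2.preference = best.top_preference AND p2.user_id = :uid
    -- step 3: fetch the translation in that language
    JOIN messages m2
      ON m2.language_id = p2.language_id AND m2.message_id = best.message_id
    ORDER BY best.message_id
    """,
    {"uid": 1},
).fetchall()

print(rows)  # [(1, 2, 'Hellode'), (2, 1, 'Goodbye')]
```

Binding the user ID as a named parameter means it only has to be supplied once, even though the query uses it twice. The approach also assumes that a user never gives two languages the same preference value.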

PS. Remember to change both occurrences of user_id.

0

Edited to add alternative solutions, in keeping with the nature of the question. :D
(FWIW: second choice was my first implementation)

First choice

This one should give the best performance, although it is a little harder to follow.
More importantly, it scales best to 4, 5, 6, etc. languages. The solution requires a temporary table that defines the language priorities (create it with whatever method works best in MySQL).
The meat of the solution is in the finder subquery; once it has determined the best available language for each message, a simple join retrieves the actual messages.

    declare @prio table (prio_id int, lid varchar(5))
    insert into @prio values (1, 'de')
    insert into @prio values (2, 'en')
    insert into @prio values (3, 'es')

    select m.*
    from (
        select message_id, MIN(prio_id) prio_id
        from @messages m
        inner join @prio p on p.lid = m.language_id
        group by message_id
    ) finder
    inner join @prio p on p.prio_id = finder.prio_id
    inner join @messages m
        on m.message_id = finder.message_id
       and m.language_id = p.lid

Second choice

The following query structure should be simple enough to follow. Each branch of the union adds the messages whose IDs are not yet in the result set.
UNION ALL is sufficient because each subsequent query is guaranteed not to produce duplicates.
An index on (language_id, message_id) should offer better performance (especially if it is clustered).

    select message_id, language_id, message
    from messages
    where language_id = 'de'
    union all
    select message_id, language_id, message
    from messages
    where language_id = 'en'
      and message_id not in (select message_id from messages where language_id in ('de'))
    union all
    select message_id, language_id, message
    from messages
    where language_id = 'es'
      and message_id not in (select message_id from messages where language_id in ('de', 'en'))
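
Since every branch follows the same pattern, the union can also be generated from a priority list. The following is only a sketch under stated assumptions: it uses SQLite (through Python's sqlite3 module) as a stand-in for MySQL, and `fallback_query` is a hypothetical helper, not part of any library.

```python
import sqlite3


def fallback_query(languages):
    """Build one UNION ALL branch per language, in priority order."""
    branches, params = [], []
    for i, lang in enumerate(languages):
        sql = ("SELECT message_id, language_id, message "
               "FROM messages WHERE language_id = ?")
        params.append(lang)
        if i:  # skip messages already covered by a higher-priority language
            marks = ", ".join("?" * i)
            sql += (" AND message_id NOT IN (SELECT message_id FROM messages"
                    f" WHERE language_id IN ({marks}))")
            params.extend(languages[:i])
        branches.append(sql)
    return " UNION ALL ".join(branches) + " ORDER BY message_id", params


# Assumption: an in-memory SQLite database stands in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (message_id INT, language_id TEXT, message TEXT)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [(1, "en", "Hello"), (1, "de", "Hallo"), (1, "es", "Hola"),
     (2, "en", "Goodbye"), (2, "es", "Adios")],
)

sql, params = fallback_query(["de", "en", "es"])
rows = conn.execute(sql, params).fetchall()
print(rows)  # [(1, 'de', 'Hallo'), (2, 'en', 'Goodbye')]
```

The language codes are passed as bound parameters, so the generated SQL text contains only placeholders.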

Third choice

This is a variation using the COALESCE function.
However, I do not expect it to perform well on large amounts of data.

    select *,
        COALESCE(
            (select language_id from @messages where message_id = m.message_id and language_id = 'de'),
            (select language_id from @messages where message_id = m.message_id and language_id = 'en'),
            (select language_id from @messages where message_id = m.message_id and language_id = 'es')
        ) language_id,
        COALESCE(
            (select message from @messages where message_id = m.message_id and language_id = 'de'),
            (select message from @messages where message_id = m.message_id and language_id = 'en'),
            (select message from @messages where message_id = m.message_id and language_id = 'es')
        ) message
    from (
        select distinct message_id from @messages
    ) m
0