How can I simplify / improve the performance of this MySQL query?

Question

I am very new to MySQL, and thanks to the great support from more experienced people like you, I have managed to get by while learning a lot in the process.

I have a query that does exactly what I want. However, it looks very messy to me, and I'm sure there must be a way to simplify it.

How can this query be improved and optimized for performance?

Many thanks

    $sQuery = "
        SELECT SQL_CALC_FOUND_ROWS " . str_replace(" , ", " ", implode(", ", $aColumns)) . "
        FROM $sTable b
        LEFT JOIN (
            SELECT COUNT(*) AS projects_count, a.songs_id
            FROM $sTable2 a
            GROUP BY a.songs_id
        ) bb ON bb.songs_id = b.songsID
        LEFT JOIN (
            SELECT AVG(rating) AS rating, COUNT(rating) AS ratings_count, c.songid
            FROM $sTable3 c
            GROUP BY c.songid
        ) bbb ON bbb.songid = b.songsID
        LEFT JOIN (
            SELECT c.songid, c.userid,
                CASE WHEN EXISTS (
                    SELECT songid FROM $sTable3 WHERE songid = c.songid
                ) THEN 'User Voted' ELSE 'Not Voted' END AS voted
            FROM $sTable3 c
            WHERE c.userid = $userid
            GROUP BY c.songid
        ) bbbb ON bbbb.songid = b.songsID
    ";

EDIT: here is a description of what the query does:

I have three tables:

$sTable = songs table (songid, mp3link, artwork, useruploadid, etc.)
$sTable2 = projects table with related songs (projectid, songid, project name, etc.)
$sTable3 = song ratings table (songid, userid, rating)

All of this data is output as a JSON array and displayed in a table in my application, giving a list of songs together with their project data and ratings.

The query itself performs the following steps, in this order:

Collects all rows from $sTable
Joins $sTable2 on songID and counts the number of rows (projects) in that table with the same songID
Joins $sTable3 on songID and returns the average rating of the rows in that table with the same songID
At the same point, it also counts the rows in $sTable3 with the same song ID, which gives the total number of votes
Finally, it checks these rows to see whether $userid (a variable containing the logged-in user's ID) matches the "userid" column in $sTable3, i.e. whether the user has already voted for that song ID. If it matches, it returns "User Voted"; if not, it returns "Not Voted". This is output as a separate column in my JSON array, which I then check on the client side in my application to add a class.
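The steps above can be reproduced in a small, self-contained sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, with hypothetical table names (songs, projects, ratings) and made-up sample rows. Like the original query, it aggregates each table in its own derived table and LEFT JOINs the results onto the songs table, so every song is kept. CASE replaces MySQL's IF(), SQL_CALC_FOUND_ROWS is left out because SQLite has no equivalent, and the per-user check is simplified to a DISTINCT lookup.

```python
import sqlite3

# Hypothetical schema mirroring $sTable (songs), $sTable2 (projects), $sTable3 (ratings).
SETUP = """
CREATE TABLE songs (songsID INTEGER PRIMARY KEY, song_name TEXT);
CREATE TABLE projects (projectid INTEGER PRIMARY KEY, songs_id INTEGER);
CREATE TABLE ratings (songid INTEGER, userid INTEGER, rating INTEGER);
INSERT INTO songs VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
INSERT INTO projects VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO ratings VALUES (1, 7, 4), (1, 8, 2), (2, 8, 5);
"""

# One derived table per aggregate (one row per song each), LEFT JOINed to songs
# so songs without projects or ratings still appear (with NULL aggregates).
QUERY = """
SELECT s.songsID, s.song_name,
       p.projects_count,
       r.rating, r.ratings_count,
       CASE WHEN u.songid IS NULL THEN 'Not Voted' ELSE 'User Voted' END AS voted
FROM songs s
LEFT JOIN (SELECT songs_id, COUNT(*) AS projects_count
           FROM projects GROUP BY songs_id) p ON p.songs_id = s.songsID
LEFT JOIN (SELECT songid, AVG(rating) AS rating, COUNT(rating) AS ratings_count
           FROM ratings GROUP BY songid) r ON r.songid = s.songsID
LEFT JOIN (SELECT DISTINCT songid FROM ratings WHERE userid = ?) u
       ON u.songid = s.songsID
ORDER BY s.songsID
"""

def song_list(userid):
    """Return one row per song with project/rating aggregates and the voted flag."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(QUERY, (userid,)).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in song_list(7):
        print(row)
```

With this data, song 3 has neither projects nor ratings, so it comes back with NULL (None) aggregates but is still listed.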

If you need more details, please just let me know. Thanks to everyone.

EDIT:

Thanks to Aurimas's excellent first attempt, I am close to a much simpler solution.

This is the code I tried based on that suggestion:

    $sQuery = "
        SELECT SQL_CALC_FOUND_ROWS " . str_replace(" , ", " ", implode(", ", $aColumns)) . "
        FROM (
            SELECT $sTable.songsID, COUNT(rating) AS ratings_count, AVG(rating) AS ratings
            FROM $sTable
            LEFT JOIN $sTable2 ON $sTable.songsID = $sTable2.songs_id
            LEFT JOIN $sTable3 ON $sTable.songsID = $sTable3.songid
            GROUP BY $sTable.songsID
        ) AS A
        LEFT JOIN $sTable3 AS B ON A.songsID = B.songid AND B.userid = $userid
    ";

However, there are several problems. First, I had to remove the first line of your answer, as it caused a 500 Internal Server Error:

 IF(B.userid = NULL, "Not voted", "User Voted") AS voted

Obviously, the “voted” check is now lost.
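The 500 error is most likely a PHP parse error. $sQuery is a double-quoted PHP string, so the double quotes around "Not voted" end it early. Writing the labels in single quotes ('Not voted') avoids that. There is a second, separate problem: B.userid = NULL can never be true, because comparing anything with NULL yields NULL (unknown). The test needs IS NULL instead. A minimal sketch of that second point follows, using Python's built-in sqlite3 as a stand-in for MySQL, with CASE in place of MySQL's IF().

```python
import sqlite3

def null_checks():
    """Evaluate '= NULL' versus 'IS NULL', both directly and inside a CASE."""
    conn = sqlite3.connect(":memory:")
    row = conn.execute(
        "SELECT NULL = NULL, "   # yields NULL, not true
        "NULL IS NULL, "         # yields 1 (true)
        "CASE WHEN NULL = NULL THEN 'Not voted' ELSE 'User Voted' END, "
        "CASE WHEN NULL IS NULL THEN 'Not voted' ELSE 'User Voted' END"
    ).fetchone()
    conn.close()
    return row

if __name__ == "__main__":
    print(null_checks())
```

With `= NULL`, the condition is never satisfied, so every song falls through to the ELSE branch regardless of whether the user voted.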

Also, and more importantly, it does not return all the columns defined in my array, only songsID. My JSON response contains the error Unknown column 'event_name' in 'field list'. If I remove that column from the $aColumns array, it of course just fails on the next one.
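That error follows from how derived tables work. The outer SELECT can only see the columns that the derived tables expose, and A exposes only songsID, ratings_count and ratings. Below is a minimal sketch using Python's sqlite3 as a stand-in for MySQL (SQLite reports "no such column" rather than "Unknown column"). The table names and rows are hypothetical. It shows the failure and one common fix: join the aggregated derived table back to the songs table to reach the other columns.

```python
import sqlite3

SETUP = """
CREATE TABLE songs (songsID INTEGER PRIMARY KEY, song_name TEXT);
CREATE TABLE ratings (songid INTEGER, userid INTEGER, rating INTEGER);
INSERT INTO songs VALUES (1, 'Alpha'), (2, 'Beta');
INSERT INTO ratings VALUES (1, 7, 4), (1, 8, 2);
"""

# The derived table A only exposes songsID, ratings_count and ratings.
DERIVED = """
(SELECT songs.songsID, COUNT(rating) AS ratings_count, AVG(rating) AS ratings
 FROM songs LEFT JOIN ratings ON songs.songsID = ratings.songid
 GROUP BY songs.songsID) AS A
"""

# song_name is not a column of A, so this fails.
BROKEN = "SELECT song_name, ratings_count FROM " + DERIVED

# Joining A back to songs makes the non-aggregated columns available again.
FIXED = ("SELECT s.song_name, A.ratings_count FROM " + DERIVED +
         " JOIN songs s ON s.songsID = A.songsID ORDER BY s.songsID")

def run(sql):
    """Return the rows, or the error message if SQLite rejects the query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        conn.close()

if __name__ == "__main__":
    print(run(BROKEN))
    print(run(FIXED))
```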

I define my columns at the beginning of my script, since this array is used to filter and assemble the output for JSON encoding. This is the definition of $aColumns:

 $aColumns = array( 'songsID', 'song_name', 'artist_band_name', 'author', 'song_artwork', 'song_file', 'genre', 'song_description', 'uploaded_time', 'emotion', 'tempo', 'user', 'happiness', 'instruments', 'similar_artists', 'play_count', 'projects_count', 'rating', 'ratings_count', 'voted');

To quickly test the rest of the query, I changed the first line of the subquery to select $sTable.* instead of $sTable.songsID (remember that $sTable is the songs table).

The query then worked, though with terrible performance, as expected. However, it returned only 24 songs from my test data set of 5,000. So I changed your first JOIN to a LEFT JOIN, and all 5,000 songs were returned. To clarify, the query must return ALL rows in the songs table, with the additional data from the projects and ratings tables attached to each song.
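The 24-out-of-5,000 result is what an inner JOIN does. Songs with no matching row in the projects table are discarded, while a LEFT JOIN keeps every song and fills the project columns with NULL. Here is a minimal sketch using Python's sqlite3 as a stand-in for MySQL, with hypothetical table contents.

```python
import sqlite3

# Hypothetical data: four songs, only songs 1 and 3 appear in any project.
SETUP = """
CREATE TABLE songs (songsID INTEGER PRIMARY KEY);
CREATE TABLE projects (projectid INTEGER PRIMARY KEY, songs_id INTEGER);
INSERT INTO songs VALUES (1), (2), (3), (4);
INSERT INTO projects VALUES (10, 1), (11, 3);
"""

def surviving_songs(join_kind):
    """Song IDs returned when songs is joined to projects with join_kind."""
    # join_kind is restricted to two fixed literals, never user input.
    if join_kind not in ("JOIN", "LEFT JOIN"):
        raise ValueError("join_kind must be 'JOIN' or 'LEFT JOIN'")
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    sql = ("SELECT songs.songsID FROM songs " + join_kind + " projects "
           "ON songs.songsID = projects.songs_id "
           "GROUP BY songs.songsID ORDER BY songs.songsID")
    ids = [row[0] for row in conn.execute(sql)]
    conn.close()
    return ids

if __name__ == "__main__":
    print(surviving_songs("JOIN"))       # inner join drops songs 2 and 4
    print(surviving_songs("LEFT JOIN"))  # all four songs are kept
```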

So we're getting there, and I'm sure this is a much better approach that just needs some adjustment. Thanks for your help so far, Aurimas.

mysql query-optimization

gordyr Nov 29 '11 at 11:56

2 answers

Let me try based on your description rather than the query. For clarity, I use Songs for $sTable, Projects for $sTable2 and Ratings for $sTable3.

    SELECT /* [column list again] */,
        IF(B.userid IS NULL, 'Not voted', 'Voted') AS voted
    FROM (
        SELECT Songs.SongID,
            COUNT(rating) AS total_votes,
            AVG(rating) AS average_rating
            /* [, ... other columns as you need them] */
        FROM Songs
        JOIN Projects ON Songs.SongID = Projects.SongID
        LEFT JOIN Ratings ON Songs.SongID = Ratings.SongID
        GROUP BY Songs.SongID
    ) AS A
    LEFT JOIN Ratings AS B ON A.SongID = B.SongID AND B.userid = ? /* your user id */

As you can see, you can get all the song information in one relatively simple query, using just GROUP BY with COUNT() and AVG(). To find out whether a particular user has rated a song, wrap that in a subquery and LEFT JOIN the user's ratings onto it. If the user ID comes back NULL, you know the user did not vote.
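This user check relies on the condition B.userid = ? sitting in the ON clause of the LEFT JOIN. Below is a minimal sketch using Python's sqlite3 as a stand-in for MySQL, with hypothetical data. It contrasts that placement with putting the same filter in WHERE, which discards the NULL-extended rows the check depends on.

```python
import sqlite3

SETUP = """
CREATE TABLE songs (songsID INTEGER PRIMARY KEY);
CREATE TABLE ratings (songid INTEGER, userid INTEGER, rating INTEGER);
INSERT INTO songs VALUES (1), (2), (3);
INSERT INTO ratings VALUES (1, 7, 4), (2, 8, 5);  -- user 7 rated only song 1
"""

# User filter in ON: songs the user did not rate survive with b.userid NULL.
ON_CLAUSE = """
SELECT s.songsID, CASE WHEN b.userid IS NULL THEN 'Not voted' ELSE 'Voted' END
FROM songs s
LEFT JOIN ratings b ON b.songid = s.songsID AND b.userid = ?
ORDER BY s.songsID
"""

# User filter in WHERE: rows with b.userid NULL fail the test and are dropped,
# which effectively turns the LEFT JOIN into an inner join.
WHERE_CLAUSE = """
SELECT s.songsID, CASE WHEN b.userid IS NULL THEN 'Not voted' ELSE 'Voted' END
FROM songs s
LEFT JOIN ratings b ON b.songid = s.songsID
WHERE b.userid = ?
ORDER BY s.songsID
"""

def run(sql, userid):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(sql, (userid,)).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    print(run(ON_CLAUSE, 7))
    print(run(WHERE_CLAUSE, 7))
```

There is one caveat. If a user can rate the same song more than once, this join returns one row per rating, so a song can appear twice. Counting the user's ratings inside the aggregate, as the accepted answer below does, avoids that.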

I didn't dig deeply into your query, since it really does look complicated. Maybe I missed something; if so, please update the description and I can try again :)

Aurimas Nov 30 '11 at 9:11

newtover · Accepted Answer · 2011-12-10T20:43:01+0000

    SELECT SQL_CALC_FOUND_ROWS
        songsID, song_name, artist_band_name, author, song_artwork, song_file,
        genre, song_description, uploaded_time, emotion, tempo, `user`,
        happiness, instruments, similar_artists, play_count,
        projects_count, rating, ratings_count,
        IF(user_ratings_count, 'User Voted', 'Not Voted') AS voted
    FROM (
        SELECT sp.songsID, projects_count,
            AVG(rating) AS rating,
            COUNT(rating) AS ratings_count,
            COUNT(IF(userid = $userid, 1, NULL)) AS user_ratings_count
        FROM (
            -- COUNT(p.songs_id) is 0 for songs without projects
            SELECT songsID, COUNT(p.songs_id) AS projects_count
            FROM $sTable s
            LEFT JOIN $sTable2 p ON s.songsID = p.songs_id
            GROUP BY songsID
        ) AS sp
        LEFT JOIN $sTable3 r ON sp.songsID = r.songid
        GROUP BY sp.songsID
    ) AS spr
    JOIN $sTable s USING (songsID);
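The accepted query can be tried end to end with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, hypothetical sample rows, and a reduced column list (song_name only). SQL_CALC_FOUND_ROWS is dropped and CASE replaces IF(), because SQLite has neither. The answer's structure is kept: group projects per song (sp), join ratings and aggregate once more (spr), then join back to the songs table for the remaining columns.

```python
import sqlite3

SETUP = """
CREATE TABLE songs (songsID INTEGER PRIMARY KEY, song_name TEXT);
CREATE TABLE projects (projectid INTEGER PRIMARY KEY, songs_id INTEGER);
CREATE TABLE ratings (songid INTEGER, userid INTEGER, rating INTEGER);
INSERT INTO songs VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
INSERT INTO projects VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO ratings VALUES (1, 7, 4), (1, 8, 2), (2, 8, 5);
"""

QUERY = """
SELECT songsID, song_name, projects_count, rating, ratings_count,
       CASE WHEN user_ratings_count > 0 THEN 'User Voted' ELSE 'Not Voted' END AS voted
FROM (
    -- spr: ratings joined to exactly one row per song, so no row multiplication
    SELECT sp.songsID, projects_count,
           AVG(rating) AS rating, COUNT(rating) AS ratings_count,
           COUNT(CASE WHEN userid = ? THEN 1 END) AS user_ratings_count
    FROM (
        -- sp: projects per song; COUNT of the joined column gives 0 when none
        SELECT songsID, COUNT(p.songs_id) AS projects_count
        FROM songs s LEFT JOIN projects p ON s.songsID = p.songs_id
        GROUP BY songsID
    ) AS sp
    LEFT JOIN ratings r ON sp.songsID = r.songid
    GROUP BY sp.songsID
) AS spr
JOIN songs s USING (songsID)
ORDER BY songsID
"""

def song_list(userid):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(QUERY, (userid,)).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in song_list(7):
        print(row)
```

Because the user's votes are counted with a conditional COUNT, the "voted" flag comes out of the same aggregation pass as the rating statistics, with no correlated subquery.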

You will need the following indexes:

an index on (songs_id) in $sTable2
a composite index on (songid, rating, userid) in $sTable3
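Here is a sketch of what the composite index buys. The per-song lookup that the ratings join performs reads only songid, rating and userid, so an index on exactly those columns can answer it without touching the table rows (a covering index). The sketch uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 as a stand-in for MySQL, and the index name idx_ratings_song is hypothetical. In MySQL itself, the equivalent DDL would be along the lines of CREATE INDEX idx_ratings_song ON ratings (songid, rating, userid), and EXPLAIN typically reports "Using index" in the Extra column for a covering index.

```python
import sqlite3

# Per-song lookup similar to what the ratings join performs (user 7, song 1).
LOOKUP = ("SELECT AVG(rating), COUNT(rating), "
          "COUNT(CASE WHEN userid = ? THEN 1 END) "
          "FROM ratings WHERE songid = ?")

def plan(with_index):
    """Return the EXPLAIN QUERY PLAN detail strings for LOOKUP."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ratings (songid INTEGER, userid INTEGER, rating INTEGER)")
    if with_index:
        # Composite index from the answer: (songid, rating, userid).
        conn.execute("CREATE INDEX idx_ratings_song ON ratings (songid, rating, userid)")
    # The detail text is the last column of each plan row.
    details = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + LOOKUP, (7, 1))]
    conn.close()
    return details

if __name__ == "__main__":
    print("without index:", plan(False))
    print("with index:   ", plan(True))
```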

The ideas behind the query:

the subqueries only handle INT columns, so their intermediate results fit easily in memory
the LEFT JOINs are grouped separately to avoid a Cartesian product (row multiplication) between projects and ratings
the user's votes are counted in the same subquery as the other ratings, avoiding expensive correlated subqueries
all the remaining song information is retrieved in the final join
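The second idea is easiest to see with numbers. Aurimas's query joins Projects and Ratings at the same level before grouping, so each rating row is repeated once per project. Below is a minimal sketch using Python's sqlite3 as a stand-in for MySQL, with a hypothetical song that has 3 projects and 2 ratings.

```python
import sqlite3

# Hypothetical data: one song with 3 projects and 2 ratings.
SETUP = """
CREATE TABLE songs (songsID INTEGER PRIMARY KEY);
CREATE TABLE projects (projectid INTEGER PRIMARY KEY, songs_id INTEGER);
CREATE TABLE ratings (songid INTEGER, userid INTEGER, rating INTEGER);
INSERT INTO songs VALUES (1);
INSERT INTO projects VALUES (10, 1), (11, 1), (12, 1);
INSERT INTO ratings VALUES (1, 7, 4), (1, 8, 2);
"""

# Both LEFT JOINs at the same level: 3 projects x 2 ratings = 6 joined rows,
# so both counts are inflated to 6.
FLAT = """
SELECT COUNT(p.projectid), COUNT(r.rating)
FROM songs s
LEFT JOIN projects p ON p.songs_id = s.songsID
LEFT JOIN ratings r ON r.songid = s.songsID
GROUP BY s.songsID
"""

# Projects grouped first, so ratings join to exactly one row per song.
GROUPED = """
SELECT sp.projects_count, COUNT(r.rating)
FROM (SELECT s.songsID, COUNT(p.projectid) AS projects_count
      FROM songs s LEFT JOIN projects p ON p.songs_id = s.songsID
      GROUP BY s.songsID) AS sp
LEFT JOIN ratings r ON r.songid = sp.songsID
GROUP BY sp.songsID
"""

def counts(sql):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    row = conn.execute(sql).fetchone()
    conn.close()
    return row

if __name__ == "__main__":
    print("flat:   ", counts(FLAT))
    print("grouped:", counts(GROUPED))
```

AVG(rating) happens to survive this duplication, because every rating is repeated the same number of times, but COUNT(rating) does not. That is why the vote totals need the grouped form.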
