What is the most efficient way to structure a two-dimensional MySQL query?

I have a MySQL database with the following tables and fields:

  • Student (id)
  • Class (id)
  • Grade (id, student_id, class_id, grade)

The student and class tables are indexed by id (primary keys). The grades table is indexed by id (primary key) and has separate single-column indexes on student_id, class_id and grade.

I need to build a query that, given a class ID, returns a list of all the other classes along with, for each one, the number of students who scored higher in that other class than in the given class.

For example, given the following data in the grades table:

    student_id | class_id | grade
    -----------+----------+------
    1          | 1        | 87
    1          | 2        | 91
    1          | 3        | 75
    2          | 1        | 68
    2          | 2        | 95
    2          | 3        | 84
    3          | 1        | 76
    3          | 2        | 88
    3          | 3        | 71

A query for class 1 should return:

    class_id | total
    ---------+------
    2        | 3
    3        | 1

Ideally it should complete within a few seconds, since I want to use it in a web interface.

The problem is that my database has over 1,300 classes and 160,000 students. My grades table has almost 15 million rows, so the query takes a very long time.

Here is what I have tried so far, along with how long each query took:

    -- I manually stopped execution after 2 hours
    SELECT c.id, COUNT(*) AS total
    FROM classes c
    INNER JOIN grades a ON a.class_id = c.id
    INNER JOIN grades b ON b.grade < a.grade
                       AND a.student_id = b.student_id
                       AND b.class_id = 1
    WHERE c.id != 1
    GROUP BY c.id;

    -- I manually stopped execution after 20 minutes
    SELECT c.id,
           (SELECT COUNT(*)
            FROM grades g
            WHERE g.class_id = c.id
              AND g.grade > (SELECT grade
                             FROM grades
                             WHERE student_id = g.student_id
                               AND class_id = 1)) AS total
    FROM classes c
    WHERE c.id != 1;

    -- 1 min 12 sec
    CREATE TEMPORARY TABLE temp_blah (student_id INT(11) PRIMARY KEY, grade INT);
    INSERT INTO temp_blah SELECT student_id, grade FROM grades WHERE class_id = 1;
    SELECT c.id,
           (SELECT COUNT(*)
            FROM grades g
            INNER JOIN temp_blah t ON g.student_id = t.student_id
            WHERE g.class_id = c.id
              AND t.grade < g.grade) AS total
    FROM classes c
    WHERE c.id != 1;

    -- Same thing but with joins instead of a subquery: 1 min 54 sec
    SELECT c.id, COUNT(*) AS total
    FROM classes c
    INNER JOIN grades g ON c.id = g.class_id
    INNER JOIN temp_blah t ON g.student_id = t.student_id
    WHERE c.id != 1
      AND t.grade < g.grade
    GROUP BY c.id;
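
The temp-table strategy from the third and fourth attempts can be checked on the sample data with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so it confirms what the query returns but says nothing about MySQL performance; the helper names `load_sample` and `temp_table_counts` are hypothetical.

```python
import sqlite3

# Sample rows from the question: (student_id, class_id, grade)
SAMPLE_GRADES = [
    (1, 1, 87), (1, 2, 91), (1, 3, 75),
    (2, 1, 68), (2, 2, 95), (2, 3, 84),
    (3, 1, 76), (3, 2, 88), (3, 3, 71),
]


def load_sample() -> sqlite3.Connection:
    """Hypothetical helper: an in-memory `grades` table holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE grades (id INTEGER PRIMARY KEY, student_id INT, class_id INT, grade INT)"
    )
    conn.executemany(
        "INSERT INTO grades (student_id, class_id, grade) VALUES (?, ?, ?)", SAMPLE_GRADES
    )
    return conn


def temp_table_counts(conn: sqlite3.Connection, class_id: int) -> list:
    """Copy the reference class into temp_blah, then join it back against every other class."""
    conn.execute("DROP TABLE IF EXISTS temp_blah")
    conn.execute("CREATE TEMP TABLE temp_blah (student_id INTEGER PRIMARY KEY, grade INT)")
    conn.execute(
        "INSERT INTO temp_blah SELECT student_id, grade FROM grades WHERE class_id = ?",
        (class_id,),
    )
    return conn.execute(
        """
        SELECT g.class_id, COUNT(*) AS total
        FROM grades g
        JOIN temp_blah t ON g.student_id = t.student_id
        WHERE g.class_id <> ? AND t.grade < g.grade
        GROUP BY g.class_id
        ORDER BY g.class_id
        """,
        (class_id,),
    ).fetchall()


print(temp_table_counts(load_sample(), 1))  # [(2, 3), (3, 1)]
```

For class 1 this returns [(2, 3), (3, 1)], matching the expected output above. As with the original join-based queries, a class where nobody scored higher simply does not appear in the result.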

I also considered creating a 2D table with students as rows and classes as columns, but I see two problems with this:

  • MySQL enforces a maximum number of columns per table (4096) and a maximum row size (in bytes), either of which this approach could exceed
  • I can't think of a good way to query this structure to get the results I need.
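
To see why the 2D layout would not remove the work, here is a sketch with a hypothetical in-memory pivot, `matrix[student_id][class_id] = grade`, where Python dicts stand in for the wide table. Answering the question still means visiting every student row and comparing the reference column with every other column, so it touches every grade once, just as a scan of the flat `grades` table would. The names `pivot` and `counts_from_matrix` are hypothetical.

```python
from collections import Counter

# Sample rows from the question: (student_id, class_id, grade)
SAMPLE_GRADES = [
    (1, 1, 87), (1, 2, 91), (1, 3, 75),
    (2, 1, 68), (2, 2, 95), (2, 3, 84),
    (3, 1, 76), (3, 2, 88), (3, 3, 71),
]


def pivot(rows):
    """Build the hypothetical students x classes matrix as nested dicts."""
    matrix = {}
    for student_id, class_id, grade in rows:
        matrix.setdefault(student_id, {})[class_id] = grade
    return matrix


def counts_from_matrix(matrix, class_id):
    """Per other class (column), count students (rows) who beat their reference-column grade."""
    totals = Counter()
    for row in matrix.values():
        reference = row.get(class_id)
        if reference is None:
            continue  # this student has no grade in the reference class
        for other_class, grade in row.items():
            if other_class != class_id and grade > reference:
                totals[other_class] += 1
    return dict(sorted(totals.items()))


print(counts_from_matrix(pivot(SAMPLE_GRADES), 1))  # {2: 3, 3: 1}
```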

I also considered performing these calculations as background tasks and storing the results somewhere, but for the information to stay current (and it needs to), the results would have to be recalculated every time a student, class, or grade record was created or updated.
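
One way to keep precomputed results current without a full recalculation, sketched under assumptions: store a count for every ordered class pair (a, b), namely how many students scored higher in b than in a. With about 1,300 classes that is at most 1,300 × 1,299 = 1,688,700 counters, and inserting or changing one grade only touches the pairs involving that student's other classes (at most 2 × 1,299 counters). All names below are hypothetical, and plain Python stands in for whatever table or trigger would hold these counts in MySQL.

```python
from collections import Counter

# Sample rows from the question: (student_id, class_id, grade)
SAMPLE_GRADES = [
    (1, 1, 87), (1, 2, 91), (1, 3, 75),
    (2, 1, 68), (2, 2, 95), (2, 3, 84),
    (3, 1, 76), (3, 2, 88), (3, 3, 71),
]


def pivot(rows):
    """Hypothetical per-student view: matrix[student_id][class_id] = grade."""
    matrix = {}
    for student_id, class_id, grade in rows:
        matrix.setdefault(student_id, {})[class_id] = grade
    return matrix


def build_pair_counts(matrix):
    """Full rebuild: higher[(a, b)] = students whose grade in class b beats their grade in class a."""
    higher = Counter()
    for row in matrix.values():
        for a, grade_a in row.items():
            for b, grade_b in row.items():
                if a != b and grade_b > grade_a:
                    higher[(a, b)] += 1
    return higher


def update_grade(matrix, higher, student_id, class_id, new_grade):
    """Insert or change one grade, adjusting only the pairs that involve class_id for this student."""
    row = matrix.setdefault(student_id, {})
    old_grade = row.get(class_id)
    for other_class, grade in row.items():
        if other_class == class_id:
            continue
        if old_grade is not None:  # retract what the old grade contributed
            if grade > old_grade:
                higher[(class_id, other_class)] -= 1
            if old_grade > grade:
                higher[(other_class, class_id)] -= 1
        if grade > new_grade:  # add what the new grade contributes
            higher[(class_id, other_class)] += 1
        if new_grade > grade:
            higher[(other_class, class_id)] += 1
    row[class_id] = new_grade


matrix = pivot(SAMPLE_GRADES)
higher = build_pair_counts(matrix)
print(higher[(1, 2)], higher[(1, 3)])  # 3 1

update_grade(matrix, higher, student_id=1, class_id=1, new_grade=70)
print(higher[(1, 3)], higher[(3, 1)])  # 2 1
```

The answer for a class c is then a lookup of every `higher[(c, other)]` rather than a scan. Deleting a grade would need the same retraction step without the additions; that case, and how the counts would be stored in MySQL, are left out of this sketch.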

Does anyone know a more efficient way to build this query?

EDIT: CREATE TABLE statements:

    CREATE TABLE `classes` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB AUTO_INCREMENT=1331 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

    CREATE TABLE `students` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB AUTO_INCREMENT=160803 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

    CREATE TABLE `grades` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `student_id` int(11) DEFAULT NULL,
      `class_id` int(11) DEFAULT NULL,
      `grade` int(11) DEFAULT NULL,
      PRIMARY KEY (`id`),
      KEY `index_grades_on_student_id` (`student_id`),
      KEY `index_grades_on_class_id` (`class_id`),
      KEY `index_grades_on_grade` (`grade`)
    ) ENGINE=InnoDB AUTO_INCREMENT=15507698 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

EXPLAIN output for the fastest query (1 min 12 sec):

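    +----+--------------------+-------+--------+---------------------------------------------------------------------------+--------------------------+---------+-------------------+--------+--------------------------+
    | id | select_type        | table | type   | possible_keys                                                             | key                      | key_len | ref               | rows   | Extra                    |
    +----+--------------------+-------+--------+---------------------------------------------------------------------------+--------------------------+---------+-------------------+--------+--------------------------+
    | 1  | PRIMARY            | c     | range  | PRIMARY                                                                   | PRIMARY                  | 4       |                   | 683    | Using where; Using index |
    | 2  | DEPENDENT SUBQUERY | g     | ref    | index_grades_on_student_id,index_grades_on_class_id,index_grades_on_grade | index_grades_on_class_id | 5       | mydb.c.id         | 830393 | Using where              |
    | 2  | DEPENDENT SUBQUERY | t     | eq_ref | PRIMARY                                                                   | PRIMARY                  | 4       | mydb.g.student_id | 1      | Using where              |
    +----+--------------------+-------+--------+---------------------------------------------------------------------------+--------------------------+---------+-------------------+--------+--------------------------+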

EDIT 2: EXPLAIN output for sgeddes's query:

    +----+-------------+------------+--------+---------------+------+---------+------+----------+----------------------------------------------+
    | id | select_type | table      | type   | possible_keys | key  | key_len | ref  | rows     | Extra                                        |
    +----+-------------+------------+--------+---------------+------+---------+------+----------+----------------------------------------------+
    | 1  | PRIMARY     | <derived2> | ALL    | NULL          | NULL | NULL    | NULL | 14953992 | Using where; Using temporary; Using filesort |
    | 2  | DERIVED     | <derived3> | system | NULL          | NULL | NULL    | NULL | 1        | Using filesort                               |
    | 2  | DERIVED     | G          | ALL    | NULL          | NULL | NULL    | NULL | 15115388 |                                              |
    | 3  | DERIVED     | NULL       | NULL   | NULL          | NULL | NULL    | NULL | NULL     | No tables used                               |
    +----+-------------+------------+--------+---------------+------+---------+------+----------+----------------------------------------------+
2 answers

I think this should work for you, using SUM and CASE:

    SELECT C.Id,
           SUM(CASE WHEN G.Grade > C2.Grade THEN 1 ELSE 0 END)
    FROM Class C
    INNER JOIN Grade G ON C.Id = G.Class_Id
    LEFT JOIN (
        SELECT Grade, Student_Id, Class_Id
        FROM Class
        JOIN Grade ON Class.Id = Grade.Class_Id
        WHERE Class.Id = 1
    ) C2 ON G.Student_Id = C2.Student_Id
    WHERE C.Id <> 1
    GROUP BY C.Id

SQL Fiddle Demo
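
A sketch that runs this query on the question's sample data, using Python's sqlite3 module as a stand-in for MySQL and assuming the answer's `Class`/`Grade` schema. The only changes are `?` placeholders for the class ID and an ORDER BY for a stable result order; `SUM_CASE_SQL` and `sum_case_counts` are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Class (Id INTEGER PRIMARY KEY);
    CREATE TABLE Grade (Id INTEGER PRIMARY KEY, Student_Id INT, Class_Id INT, Grade INT);
    INSERT INTO Class (Id) VALUES (1), (2), (3);
    INSERT INTO Grade (Student_Id, Class_Id, Grade) VALUES
        (1, 1, 87), (1, 2, 91), (1, 3, 75),
        (2, 1, 68), (2, 2, 95), (2, 3, 84),
        (3, 1, 76), (3, 2, 88), (3, 3, 71);
    """
)

# The answer's query, with the class ID as a parameter and ORDER BY added for stable output
SUM_CASE_SQL = """
    SELECT C.Id, SUM(CASE WHEN G.Grade > C2.Grade THEN 1 ELSE 0 END)
    FROM Class C
    INNER JOIN Grade G ON C.Id = G.Class_Id
    LEFT JOIN (
        SELECT Grade, Student_Id, Class_Id
        FROM Class JOIN Grade ON Class.Id = Grade.Class_Id
        WHERE Class.Id = ?
    ) C2 ON G.Student_Id = C2.Student_Id
    WHERE C.Id <> ?
    GROUP BY C.Id
    ORDER BY C.Id
"""


def sum_case_counts(class_id):
    """Run the SUM/CASE query for one reference class."""
    return conn.execute(SUM_CASE_SQL, (class_id, class_id)).fetchall()


print(sum_case_counts(1))  # [(2, 3), (3, 1)]
```

Because of the LEFT JOIN and SUM(CASE ...), every other class that has grades appears in the result, with 0 when nobody scored higher; for class 2 the sample data gives [(1, 0), (3, 0)].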

- EDIT -

In response to your comment, here is another try, which should be much faster:

    SELECT Class_Id,
           SUM(CASE WHEN Grade > minGrade THEN 1 ELSE 0 END)
    FROM (
        SELECT Student_Id,
               @classToCheck := IF(G.Class_Id = 1, Grade, @classToCheck) minGrade,
               Class_Id,
               Grade
        FROM Grade G
        JOIN (SELECT @classToCheck := 0) t
        ORDER BY Student_Id, IF(Class_Id = 1, 0, 1)
    ) t
    WHERE Class_Id <> 1
    GROUP BY Class_Id

And another SQL Fiddle demo.
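
This second query works by sorting each student's rows so that the class-1 row comes first and carrying that grade forward in `@classToCheck`. Below is a plain-Python emulation of that single sorted pass; it is not MySQL code, and `carry_forward_counts` is a hypothetical name. Two caveats are worth checking before relying on the SQL version: the MySQL manual states that the order of evaluation for expressions involving user variables is undefined (and setting user variables in statements other than SET is deprecated as of MySQL 8.0.13), and the variable is never reset per student, so a student with no class-1 grade would be compared against the previous student's class-1 grade. The sketch resets the carried value whenever the student changes.

```python
from collections import Counter

# Sample rows from the question: (student_id, class_id, grade)
SAMPLE_GRADES = [
    (1, 1, 87), (1, 2, 91), (1, 3, 75),
    (2, 1, 68), (2, 2, 95), (2, 3, 84),
    (3, 1, 76), (3, 2, 88), (3, 3, 71),
]


def carry_forward_counts(rows, class_id):
    """Emulate the @classToCheck pass: one sorted scan, carrying the reference grade forward."""
    # Same order as ORDER BY Student_Id, IF(Class_Id = 1, 0, 1)
    ordered = sorted(rows, key=lambda r: (r[0], 0 if r[1] == class_id else 1))
    totals = Counter()
    current_student, reference = None, None
    for student_id, cls, grade in ordered:
        if student_id != current_student:
            # Reset per student so a missing reference grade is never inherited
            current_student, reference = student_id, None
        if cls == class_id:
            reference = grade
        else:
            # Mirrors SUM(CASE WHEN Grade > minGrade THEN 1 ELSE 0 END)
            totals[cls] += 1 if reference is not None and grade > reference else 0
    return dict(sorted(totals.items()))


print(carry_forward_counts(SAMPLE_GRADES, 1))  # {2: 3, 3: 1}
```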


Can you try this against your original data? It uses only one join :)

    select final.class_id, count(*) as total
    from (
        select *
        from (select student_id as p_student_id, grade as p_grade
              from table1
              where class_id = 1) as partial
        inner join table1 on table1.student_id = partial.p_student_id
        where table1.class_id <> 1
          and table1.grade > partial.p_grade
    ) as final
    group by final.class_id;

sqlfiddle link
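
A sketch that runs this query on the sample data with Python's sqlite3 module standing in for MySQL. It assumes `table1` has `student_id`, `class_id` and `grade` columns, as the query implies; the ORDER BY is added only to make the output order deterministic, and `SINGLE_JOIN_SQL` is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (student_id INT, class_id INT, grade INT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        (1, 1, 87), (1, 2, 91), (1, 3, 75),
        (2, 1, 68), (2, 2, 95), (2, 3, 84),
        (3, 1, 76), (3, 2, 88), (3, 3, 71),
    ],
)

# The answer's query; ORDER BY added only for a deterministic result order
SINGLE_JOIN_SQL = """
    select final.class_id, count(*) as total
    from (
        select *
        from (select student_id as p_student_id, grade as p_grade
              from table1
              where class_id = 1) as partial
        inner join table1 on table1.student_id = partial.p_student_id
        where table1.class_id <> 1
          and table1.grade > partial.p_grade
    ) as final
    group by final.class_id
    order by final.class_id
"""

rows = conn.execute(SINGLE_JOIN_SQL).fetchall()
print(rows)  # [(2, 3), (3, 1)]
```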
