Best query strategy for sorting files by file hash frequency and file size

I wrote this query without much thought, and as a beginner I'm fairly sure it could be written better.

Here it is:

    SELECT filehash, filename, filesize, group_files
    FROM files
    INNER JOIN (SELECT filehash AS group_id, COUNT(filehash) AS group_files
                FROM files
                GROUP BY filehash) groups
        ON files.filehash = groups.group_id
    ORDER BY group_files DESC, filesize DESC;

Table definition:

    CREATE TABLE files (
        fileid   INTEGER PRIMARY KEY AUTOINCREMENT,
        filename TEXT,
        filesize INTEGER,
        filehash TEXT
    );

Definition of indices:

    CREATE INDEX files_filehash_idx ON files(filehash);
    CREATE UNIQUE INDEX files_filename_idx ON files(filename);
    CREATE INDEX files_filesize_idx ON files(filesize);
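To try the query end to end, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The schema and index statements are the ones above; the sample rows (file names, sizes, hashes) are hypothetical, and the derived table is aliased g instead of groups purely for brevity.

```python
import sqlite3

# In-memory database using the schema and indices from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (fileid INTEGER PRIMARY KEY AUTOINCREMENT,
                        filename TEXT, filesize INTEGER, filehash TEXT);
    CREATE INDEX files_filehash_idx ON files(filehash);
    CREATE UNIQUE INDEX files_filename_idx ON files(filename);
    CREATE INDEX files_filesize_idx ON files(filesize);
""")

# Hypothetical sample rows: hash "aaa" occurs 3 times, "bbb" and "ddd" twice, "ccc" once.
conn.executemany(
    "INSERT INTO files (filename, filesize, filehash) VALUES (?, ?, ?)",
    [
        ("a1.bin", 100, "aaa"), ("a2.bin", 100, "aaa"), ("a3.bin", 100, "aaa"),
        ("b1.bin", 500, "bbb"), ("b2.bin", 500, "bbb"),
        ("d1.bin", 50, "ddd"), ("d2.bin", 50, "ddd"),
        ("c1.bin", 900, "ccc"),
    ],
)

# The question's query: every file row, tagged with how many files share its hash.
QUERY = """
    SELECT filehash, filename, filesize, group_files
    FROM files
    INNER JOIN (SELECT filehash AS group_id, COUNT(filehash) AS group_files
                FROM files
                GROUP BY filehash) AS g
        ON files.filehash = g.group_id
    ORDER BY group_files DESC, filesize DESC
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Groups with the same count (bbb and ddd) are then ordered by file size, so the larger bbb files come first. Rows inside one hash group share a size, so their relative order is not defined.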

Output of EXPLAIN QUERY PLAN for this query:

    selectid  order  from  detail
    1         0      0     SCAN TABLE files USING COVERING INDEX files_filehash_idx (~1000000 rows)
    0         0      1     SCAN SUBQUERY 1 AS groups (~100 rows)
    0         1      0     SEARCH TABLE files USING INDEX files_filehash_idx (filehash=?) (~10 rows)
    0         0      0     USE TEMP B-TREE FOR ORDER BY
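A plan like this can be reproduced from Python's standard sqlite3 module with a sketch such as the following. The column names and the wording of the detail text vary between SQLite versions, so expect output that resembles the table above rather than matching it exactly; the table can stay empty because EXPLAIN QUERY PLAN does not run the query.

```python
import sqlite3

# Same schema and indices as in the question; no rows are needed for EXPLAIN.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (fileid INTEGER PRIMARY KEY AUTOINCREMENT,
                        filename TEXT, filesize INTEGER, filehash TEXT);
    CREATE INDEX files_filehash_idx ON files(filehash);
    CREATE UNIQUE INDEX files_filename_idx ON files(filename);
    CREATE INDEX files_filesize_idx ON files(filesize);
""")

QUERY = """
    SELECT filehash, filename, filesize, group_files
    FROM files
    INNER JOIN (SELECT filehash AS group_id, COUNT(filehash) AS group_files
                FROM files
                GROUP BY filehash) AS g
        ON files.filehash = g.group_id
    ORDER BY group_files DESC, filesize DESC
"""

# Prefixing a statement with EXPLAIN QUERY PLAN returns the plan instead of the results.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
for row in plan:
    # Each plan row has four columns; the last one is the readable description.
    print(row[-1])
```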

Could you correct me if I am wrong? Thank you in advance.

2 answers

What do you think of this version?

    SELECT filehash, group_concat(filename), filesize, COUNT(*) AS group_files
    FROM files
    GROUP BY filehash
    ORDER BY group_files DESC, filesize DESC;

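This is likely to run faster, since it reads the table once and needs no join back to files. Note that it returns one row per hash, with the matching file names combined by group_concat, rather than one row per file. Does it do what you need?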
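Here is a minimal sketch of the grouped version, run on hypothetical sample rows, with filesize DESC added as a tie-breaker to match the ordering the question asks for. It relies on SQLite accepting the bare filesize column in an aggregate query (the value comes from some row of the group), which is safe only because files with the same hash should have the same size.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (fileid INTEGER PRIMARY KEY AUTOINCREMENT, "
    "filename TEXT, filesize INTEGER, filehash TEXT)"
)

# Hypothetical sample rows: hash "aaa" occurs 3 times, "bbb" and "ddd" twice, "ccc" once.
conn.executemany(
    "INSERT INTO files (filename, filesize, filehash) VALUES (?, ?, ?)",
    [
        ("a1.bin", 100, "aaa"), ("a2.bin", 100, "aaa"), ("a3.bin", 100, "aaa"),
        ("b1.bin", 500, "bbb"), ("b2.bin", 500, "bbb"),
        ("d1.bin", 50, "ddd"), ("d2.bin", 50, "ddd"),
        ("c1.bin", 900, "ccc"),
    ],
)

# One row per hash: the file names are collapsed into a comma-separated string.
QUERY = """
    SELECT filehash, group_concat(filename) AS filenames, filesize,
           COUNT(*) AS group_files
    FROM files
    GROUP BY filehash
    ORDER BY group_files DESC, filesize DESC
"""
rows = conn.execute(QUERY).fetchall()
for filehash, filenames, filesize, group_files in rows:
    # group_concat does not guarantee an order, so sort the names for display.
    print(filehash, group_files, filesize, sorted(filenames.split(",")))
```

If you need one row per file (for example, to delete all but one copy), the join in the question is still the more direct fit; the grouped form is handier for a summary report.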


Nope. It looks fine to me.

I don't think you need the index on filename for this query. There are query plans in which an index on filesize would be useful, but SQLite does not use one here (the plan above never touches files_filesize_idx). You might be better off replacing the two separate indexes with a single composite index on (filehash, filesize). Then again, it might not help at all.
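The composite-index suggestion can be tried with a sketch like the one below, assuming you are free to drop the two single-column indexes. The index name files_hash_size_idx and the sample rows are hypothetical. The UNIQUE index on filename is left out because this query does not use it; keep it if you rely on it to reject duplicate file names. The sketch only confirms that the results are unchanged and prints the new plan; whether the composite index is actually faster depends on your data, so time it on a realistic table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (fileid INTEGER PRIMARY KEY AUTOINCREMENT,
                        filename TEXT, filesize INTEGER, filehash TEXT);
    -- Hypothetical composite index replacing files_filehash_idx and files_filesize_idx.
    CREATE INDEX files_hash_size_idx ON files(filehash, filesize);
""")

# Hypothetical sample rows: hash "aaa" occurs 3 times, "bbb" and "ddd" twice, "ccc" once.
conn.executemany(
    "INSERT INTO files (filename, filesize, filehash) VALUES (?, ?, ?)",
    [
        ("a1.bin", 100, "aaa"), ("a2.bin", 100, "aaa"), ("a3.bin", 100, "aaa"),
        ("b1.bin", 500, "bbb"), ("b2.bin", 500, "bbb"),
        ("d1.bin", 50, "ddd"), ("d2.bin", 50, "ddd"),
        ("c1.bin", 900, "ccc"),
    ],
)

# The question's query, unchanged apart from the derived-table alias.
QUERY = """
    SELECT filehash, filename, filesize, group_files
    FROM files
    INNER JOIN (SELECT filehash AS group_id, COUNT(filehash) AS group_files
                FROM files
                GROUP BY filehash) AS g
        ON files.filehash = g.group_id
    ORDER BY group_files DESC, filesize DESC
"""
rows = conn.execute(QUERY).fetchall()

# List the indices that exist and show how the planner now handles the query.
index_names = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")]
print(index_names)
for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY):
    print(row[-1])
```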

