Best query strategy for sorting files by file hash frequency and file size

Question

I wrote this query without much thought, but as a beginner I am pretty sure it could be written better.

Here it is:

    SELECT filehash, filename, filesize, group_files
    FROM files
    INNER JOIN (SELECT filehash group_id, COUNT(filehash) group_files
                FROM files
                GROUP BY filehash) groups
            ON files.filehash = groups.group_id
    ORDER BY group_files DESC, filesize DESC

Table definition:

    CREATE TABLE files (fileid INTEGER PRIMARY KEY AUTOINCREMENT, filename TEXT, filesize INTEGER, filehash TEXT);

Definition of indices:

    CREATE INDEX files_filehash_idx ON files(filehash);
    CREATE UNIQUE INDEX files_filename_idx ON files(filename);
    CREATE INDEX files_filesize_idx ON files(filesize);
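To reproduce the setup locally, here is a minimal Python sketch using the standard `sqlite3` module. It builds the table and indexes above in an in-memory database, inserts a handful of made-up rows, and runs the query. The sample data is invented for illustration, and the subquery alias is renamed from `groups` to `grp` only to stay clear of `GROUPS`, which newer SQLite versions treat as a keyword.

```python
import sqlite3

# Schema and indexes from the question.
SCHEMA = """
CREATE TABLE files (fileid INTEGER PRIMARY KEY AUTOINCREMENT,
                    filename TEXT, filesize INTEGER, filehash TEXT);
CREATE INDEX files_filehash_idx ON files(filehash);
CREATE UNIQUE INDEX files_filename_idx ON files(filename);
CREATE INDEX files_filesize_idx ON files(filesize);
"""

# The question's query; only the subquery alias is renamed (groups -> grp).
QUERY = """
SELECT filehash, filename, filesize, group_files
FROM files
INNER JOIN (SELECT filehash AS group_id, COUNT(filehash) AS group_files
            FROM files
            GROUP BY filehash) AS grp
        ON files.filehash = grp.group_id
ORDER BY group_files DESC, filesize DESC
"""

# Made-up sample rows: (filename, filesize, filehash).
SAMPLE = [
    ("a.txt", 100, "h1"), ("b.txt", 100, "h1"), ("c.txt", 100, "h1"),
    ("d.bin", 500, "h2"), ("e.bin", 500, "h2"),
    ("f.log", 50, "h3"), ("g.iso", 900, "h4"),
]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO files (filename, filesize, filehash) VALUES (?, ?, ?)", SAMPLE
)

# One output row per file, with the size of the file's hash group attached.
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With this data the three `h1` rows come first (group of 3), then the two `h2` rows, then the single-file groups ordered by size, largest first.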

Output of EXPLAIN QUERY PLAN for the query:

    selectid  order  from  detail
    1         0      0     SCAN TABLE files USING COVERING INDEX files_filehash_idx (~1000000 rows)
    0         0      1     SCAN SUBQUERY 1 AS groups (~100 rows)
    0         1      0     SEARCH TABLE files USING INDEX files_filehash_idx (filehash=?) (~10 rows)
    0         0      0     USE TEMP B-TREE FOR ORDER BY
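The plan can also be obtained from Python's standard `sqlite3` module. This hedged sketch prefixes the query with `EXPLAIN QUERY PLAN` and prints the human-readable detail text, which is the last column of each plan row. Newer SQLite versions report the plan with different columns and wording than the 2011 output above, so expect the text to differ; the table is also empty here, so the plan reflects SQLite's default estimates rather than real statistics. The alias is renamed from `groups` to `grp` as a precaution against the newer `GROUPS` keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (fileid INTEGER PRIMARY KEY AUTOINCREMENT,
                    filename TEXT, filesize INTEGER, filehash TEXT);
CREATE INDEX files_filehash_idx ON files(filehash);
CREATE UNIQUE INDEX files_filename_idx ON files(filename);
CREATE INDEX files_filesize_idx ON files(filesize);
""")

# The question's query (subquery alias renamed from groups to grp).
QUERY = """
SELECT filehash, filename, filesize, group_files
FROM files
INNER JOIN (SELECT filehash AS group_id, COUNT(filehash) AS group_files
            FROM files
            GROUP BY filehash) AS grp
        ON files.filehash = grp.group_id
ORDER BY group_files DESC, filesize DESC
"""

# Each plan row ends with a human-readable "detail" string.
details = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
for line in details:
    print(line)
```

Because `group_files` is computed by the subquery, no index can supply that ordering, so a temporary B-tree for the ORDER BY should appear in any version's plan.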

Could you correct me if I am wrong? Thank you in advance.

sqlite query-optimization

Paulo freitas Mar 05 '11 at 20:27

2 answers

Nope. It looks fine to me.

I don't think you need an index on the file name for this query. There are plans in which an index on file size would be useful, but SQLite does not choose them for this query. You might be better off replacing the two separate indexes with a single composite index on (filehash, filesize). Or you might not!
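A minimal sketch of the composite-index suggestion, assuming the rest of the schema stays as in the question. The index name `files_hash_size_idx` is hypothetical. The unique index on `filename` is kept: even if this query does not use it, a UNIQUE index also enforces that no two rows share a filename. Whether the composite index actually helps is best judged by comparing the printed plan with the original one on real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (fileid INTEGER PRIMARY KEY AUTOINCREMENT,
                    filename TEXT, filesize INTEGER, filehash TEXT);
-- Kept: also enforces unique filenames.
CREATE UNIQUE INDEX files_filename_idx ON files(filename);
-- Hypothetical composite index replacing files_filehash_idx and files_filesize_idx.
CREATE INDEX files_hash_size_idx ON files(filehash, filesize);
""")

QUERY = """
SELECT filehash, filename, filesize, group_files
FROM files
INNER JOIN (SELECT filehash AS group_id, COUNT(filehash) AS group_files
            FROM files
            GROUP BY filehash) AS grp
        ON files.filehash = grp.group_id
ORDER BY group_files DESC, filesize DESC
"""

# Print the plan chosen with the composite index in place.
for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY):
    print(row[-1])

# List the indexes that now exist on the table.
index_names = sorted(
    name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'files'"
    )
)
print(index_names)
```

Since `filehash` is the leading column, the composite index can still serve lookups and grouping by hash, which is why it can stand in for the single-column hash index.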

Tom anderson Mar 05 '11 at 10:58

Kragen javier sitaker · Accepted Answer · 2011-03-15T06:00:58+0000

What do you think of this version?

    select filehash, group_concat(filename), filesize, count(*) as group_files
    from files
    group by filehash
    order by group_files desc

That seems likely to run faster. Does it do what you need?
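To check whether this version does what the question needs, here is a sketch that runs it on a few made-up rows with Python's standard `sqlite3` module. Note that it returns one row per hash, with the filenames joined by commas, rather than one row per file. The order of names inside `group_concat` is not guaranteed, and `filesize` comes from an arbitrary row of each group, which is harmless only if files with the same hash always have the same size. The `filesize DESC` tie-breaker is an addition that mirrors the question's second sort key; the answer itself sorts only by `group_files`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (fileid INTEGER PRIMARY KEY AUTOINCREMENT, "
    "filename TEXT, filesize INTEGER, filehash TEXT)"
)
# Made-up sample rows: (filename, filesize, filehash).
conn.executemany(
    "INSERT INTO files (filename, filesize, filehash) VALUES (?, ?, ?)",
    [
        ("a.txt", 100, "h1"), ("b.txt", 100, "h1"), ("c.txt", 100, "h1"),
        ("d.bin", 500, "h2"), ("e.bin", 500, "h2"),
        ("f.log", 50, "h3"), ("g.iso", 900, "h4"),
    ],
)

# The accepted answer's query, plus a "filesize DESC" tie-breaker that the
# answer does not have (added to mirror the question's second sort key).
QUERY = """
SELECT filehash, group_concat(filename) AS filenames, filesize,
       count(*) AS group_files
FROM files
GROUP BY filehash
ORDER BY group_files DESC, filesize DESC
"""

rows = conn.execute(QUERY).fetchall()
for filehash, filenames, filesize, group_files in rows:
    # group_concat joins names with "," in no guaranteed order.
    print(filehash, filesize, group_files, sorted(filenames.split(",")))
```

A single scan with GROUP BY avoids the self-join, which is the likely source of the speedup; the trade-off is that per-file rows have to be recovered by splitting the concatenated names.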
