I wrote this query without much thought, and as a beginner I am fairly sure it could be written better.
Here it is:
SELECT filehash, filename, filesize, group_files
FROM files
INNER JOIN (
    SELECT filehash group_id, COUNT(filehash) group_files
    FROM files
    GROUP BY filehash
) groups ON files.filehash = groups.group_id
ORDER BY group_files DESC, filesize DESC;
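One possible alternative, offered only as a sketch: since SQLite 3.25.0 the per-hash count can be computed with a window function instead of joining `files` against an aggregated copy of itself. The harness below uses Python's `sqlite3` module and an in-memory database; the sample rows are made up, the subquery alias is renamed to `grp`, and `AS` is written out explicitly. The `WHERE filehash IS NOT NULL` keeps the result equivalent, because the inner join already drops rows whose hash is NULL.

```python
import sqlite3

# Schema from the question; the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (fileid INTEGER PRIMARY KEY AUTOINCREMENT,
                    filename TEXT, filesize INTEGER, filehash TEXT);
CREATE INDEX files_filehash_idx ON files(filehash);
""")
conn.executemany(
    "INSERT INTO files (filename, filesize, filehash) VALUES (?, ?, ?)",
    [
        ("a.txt", 100, "h1"),
        ("copy_of_a.txt", 100, "h1"),
        ("b.bin", 5000, "h2"),
        ("c.log", 20, "h3"),
        ("c_backup.log", 20, "h3"),
        ("c_old.log", 20, "h3"),
    ],
)

# The original join-based query (alias renamed to "grp" for this sketch).
JOIN_QUERY = """
SELECT filehash, filename, filesize, group_files
FROM files
INNER JOIN (SELECT filehash AS group_id, COUNT(filehash) AS group_files
            FROM files
            GROUP BY filehash) AS grp
        ON files.filehash = grp.group_id
ORDER BY group_files DESC, filesize DESC
"""

# Window-function variant (needs SQLite >= 3.25.0): no self-join.
# NULL hashes are filtered out to match the inner join's behaviour.
WINDOW_QUERY = """
SELECT filehash, filename, filesize,
       COUNT(filehash) OVER (PARTITION BY filehash) AS group_files
FROM files
WHERE filehash IS NOT NULL
ORDER BY group_files DESC, filesize DESC
"""

join_rows = conn.execute(JOIN_QUERY).fetchall()
window_rows = conn.execute(WINDOW_QUERY).fetchall()

# Rows with equal sort keys may come back in either order,
# so the two results are compared as sorted lists.
print(sorted(join_rows) == sorted(window_rows))  # True
print([row[3] for row in join_rows])  # [3, 3, 3, 2, 2, 1]
```

Whether the window version is actually faster on a real table is not shown here; comparing both with EXPLAIN QUERY PLAN on the real data would be the way to check.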
Table definition:
CREATE TABLE files (
    fileid   INTEGER PRIMARY KEY AUTOINCREMENT,
    filename TEXT,
    filesize INTEGER,
    filehash TEXT
);
Index definitions:
CREATE INDEX files_filehash_idx ON files(filehash);
CREATE UNIQUE INDEX files_filename_idx ON files(filename);
CREATE INDEX files_filesize_idx ON files(filesize);
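The indices encode the data model the query depends on: `filename` is unique, while `filehash` is deliberately not, so identical content stored under different names shares a hash and ends up in the same group. A minimal sketch of that behaviour using Python's `sqlite3` module and an in-memory database (the file names and hashes are hypothetical):

```python
import sqlite3

# Schema and indices exactly as defined in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (fileid INTEGER PRIMARY KEY AUTOINCREMENT,
                    filename TEXT, filesize INTEGER, filehash TEXT);
CREATE INDEX files_filehash_idx ON files(filehash);
CREATE UNIQUE INDEX files_filename_idx ON files(filename);
CREATE INDEX files_filesize_idx ON files(filesize);
""")

insert = "INSERT INTO files (filename, filesize, filehash) VALUES (?, ?, ?)"
conn.execute(insert, ("a.txt", 100, "h1"))
# Same hash under a different name is accepted: this is what gets grouped.
conn.execute(insert, ("copy_of_a.txt", 100, "h1"))

# Inserting the same filename again violates files_filename_idx.
try:
    conn.execute(insert, ("a.txt", 100, "h1"))
    duplicate_name_rejected = False
except sqlite3.IntegrityError:
    duplicate_name_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]
print(duplicate_name_rejected)  # True
print(row_count)  # 2
```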
Output of EXPLAIN QUERY PLAN for the query:
selectid  order  from  detail
1         0      0     SCAN TABLE files USING COVERING INDEX files_filehash_idx (~1000000 rows)
0         0      1     SCAN SUBQUERY 1 AS groups (~100 rows)
0         1      0     SEARCH TABLE files USING INDEX files_filehash_idx (filehash=?) (~10 rows)
0         0      0     USE TEMP B-TREE FOR ORDER BY
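To reproduce a plan like this from a script, `EXPLAIN QUERY PLAN` can be prefixed to the query and run like any other statement. This is a sketch using Python's `sqlite3` module on an empty in-memory copy of the schema; the subquery alias is renamed to `grp`, and the exact wording and row estimates will differ between SQLite versions, so it only prints the `detail` column. The temporary B-tree for ORDER BY is expected in any version, since `group_files` is computed and no index can supply rows in that order.

```python
import sqlite3

# Empty in-memory copy of the question's schema and indices.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (fileid INTEGER PRIMARY KEY AUTOINCREMENT,
                    filename TEXT, filesize INTEGER, filehash TEXT);
CREATE INDEX files_filehash_idx ON files(filehash);
CREATE UNIQUE INDEX files_filename_idx ON files(filename);
CREATE INDEX files_filesize_idx ON files(filesize);
""")

# The question's query, with the subquery alias renamed to "grp".
QUERY = """
SELECT filehash, filename, filesize, group_files
FROM files
INNER JOIN (SELECT filehash AS group_id, COUNT(filehash) AS group_files
            FROM files
            GROUP BY filehash) AS grp
        ON files.filehash = grp.group_id
ORDER BY group_files DESC, filesize DESC
"""

# Each plan row has four columns; the last one is the human-readable detail.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
details = [row[-1] for row in plan]
for detail in details:
    print(detail)
```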
Could you correct me if I am doing anything wrong? Thank you in advance.
Paulo freitas