I had the same problem: a table with over 300 million records and an indexed field with few distinct values. I could not get rid of the seq scan, so I created this function to emulate a DISTINCT search using the index, if one exists. If your table has a number of distinct values proportional to the total number of records, this function is not suitable. It would also need to be adapted for DISTINCT over multiple columns. Warning: this function is wide open to SQL injection and should only be used in a secured environment.
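If the table and column are known up front, the same skip-scan idea can be written without dynamic SQL as a recursive CTE. A minimal sketch, assuming the observations(id_source) table from the sample calls further down:

WITH RECURSIVE t AS (
   -- seed: the smallest value, fetched via the index
   (SELECT id_source FROM observations ORDER BY id_source LIMIT 1)
   UNION ALL
   -- step: the next value strictly greater than the last one found
   SELECT (SELECT o.id_source FROM observations o
           WHERE o.id_source > t.id_source
           ORDER BY o.id_source LIMIT 1)
   FROM t
   WHERE t.id_source IS NOT NULL
)
SELECT id_source FROM t WHERE id_source IS NOT NULL;

Both approaches perform one index probe per distinct value instead of scanning every row, which is what makes the runtime difference below possible.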
EXPLAIN ANALYZE results:
Query with a regular SELECT DISTINCT: Total runtime: 598310.705 ms
Query with SELECT small_distinct(...): Total runtime: 1.156 ms
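Presumably the timings came from something like the following (a sketch; the table and column names are taken from the sample calls below):

EXPLAIN ANALYZE SELECT DISTINCT id_source FROM observations;
EXPLAIN ANALYZE SELECT small_distinct('observations','id_source',1);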
CREATE OR REPLACE FUNCTION small_distinct(
    tableName varchar, fieldName varchar, sample anyelement = ''::varchar)
  -- Search a few distinct values in a possibly huge table
  -- Parameters: tableName or query expression, fieldName,
  --             sample: any value to specify result type (default is varchar)
  -- Author: T.Husson, 2012-09-17, distribute/use freely
  RETURNS TABLE ( result anyelement ) AS
$BODY$
BEGIN
  EXECUTE 'SELECT '||fieldName||' FROM '||tableName||' ORDER BY '||fieldName
    ||' LIMIT 1' INTO result;
  WHILE result IS NOT NULL LOOP
    RETURN NEXT;
    EXECUTE 'SELECT '||fieldName||' FROM '||tableName
      ||' WHERE '||fieldName||' > $1 ORDER BY '||fieldName||' LIMIT 1'
      INTO result USING result;
  END LOOP;
END;
$BODY$ LANGUAGE plpgsql VOLATILE;
Sample Calls:
SELECT small_distinct('observations','id_source',1);
SELECT small_distinct('(select * from obs where id_obs > 12345) as temp',
                      'date_valid','2000-01-01'::timestamp);
SELECT small_distinct('addresses','state');
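To blunt the SQL-injection risk mentioned above, one option is to quote the column name with format('%I', ...) and take the table as a regclass, which PostgreSQL validates as an existing relation. A hedged sketch only: small_distinct_safe is a hypothetical name, and this variant gives up the original's ability to pass a subquery as the table:

CREATE OR REPLACE FUNCTION small_distinct_safe(
    tbl regclass, col text, sample anyelement = ''::varchar)
  -- Same loose index scan as small_distinct, but identifiers are
  -- quoted with %I / regclass instead of concatenated raw.
  RETURNS TABLE ( result anyelement ) AS
$BODY$
BEGIN
  EXECUTE format('SELECT %I FROM %s ORDER BY %I LIMIT 1', col, tbl, col)
    INTO result;
  WHILE result IS NOT NULL LOOP
    RETURN NEXT;
    EXECUTE format('SELECT %I FROM %s WHERE %I > $1 ORDER BY %I LIMIT 1',
                   col, tbl, col, col)
      INTO result USING result;
  END LOOP;
END;
$BODY$ LANGUAGE plpgsql VOLATILE;

Usage would look like: SELECT small_distinct_safe('addresses','state');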
Le droid