Slow SELECT DISTINCT query in Postgres

I make the following two queries quite often against a table that essentially collects audit log records. Both select the distinct values of a column over a huge number of rows, but each column holds fewer than 10 distinct values.

Here is the EXPLAIN output for the two DISTINCT queries the page issues:

    marchena=> explain select distinct auditrecor0_.bundle_id as col_0_0_ from audit_records auditrecor0_;
                                           QUERY PLAN
    ----------------------------------------------------------------------------------------------
     HashAggregate  (cost=1070734.05..1070734.11 rows=6 width=21)
       ->  Seq Scan on audit_records auditrecor0_  (cost=0.00..1023050.24 rows=19073524 width=21)
    (2 rows)

    marchena=> explain select distinct auditrecor0_.server_name as col_0_0_ from audit_records auditrecor0_;
                                           QUERY PLAN
    ----------------------------------------------------------------------------------------------
     HashAggregate  (cost=1070735.34..1070735.39 rows=5 width=13)
       ->  Seq Scan on audit_records auditrecor0_  (cost=0.00..1023051.47 rows=19073547 width=13)
    (2 rows)

Both perform a sequential scan over the whole table. However, if I disable enable_seqscan (despite the name, this only disables sequential scans where an index is available), the queries use the index but are even slower:

    marchena=> set enable_seqscan = off;
    SET
    marchena=> explain select distinct auditrecor0_.bundle_id as col_0_0_ from audit_records auditrecor0_;
                                                         QUERY PLAN
    ------------------------------------------------------------------------------------------------------------------------
     Unique  (cost=0.00..19613740.62 rows=6 width=21)
       ->  Index Scan using audit_bundle_idx on audit_records auditrecor0_  (cost=0.00..19566056.69 rows=19073570 width=21)
    (2 rows)

    marchena=> explain select distinct auditrecor0_.server_name as col_0_0_ from audit_records auditrecor0_;
                                                         QUERY PLAN
    ------------------------------------------------------------------------------------------------------------------------
     Unique  (cost=0.00..45851449.96 rows=5 width=13)
       ->  Index Scan using audit_server_idx on audit_records auditrecor0_  (cost=0.00..45803766.04 rows=19073570 width=13)
    (2 rows)

Both the bundle_id and server_name columns have btree indexes on them. Should I be using a different index type to select the distinct values quickly?
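For what it's worth, the planner's own estimate of how many distinct values these columns hold can be cross-checked in the pg_stats view (only a sanity check; the table and column names are the ones from the queries above):

    SELECT attname, n_distinct
    FROM pg_stats
    WHERE tablename = 'audit_records'
      AND attname IN ('bundle_id', 'server_name');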

+8
postgresql
4 answers
    BEGIN;
    CREATE TABLE dist ( x INTEGER NOT NULL );
    INSERT INTO dist SELECT random()*50 FROM generate_series( 1, 5000000 );
    COMMIT;
    CREATE INDEX dist_x ON dist(x);
    VACUUM ANALYZE dist;

    EXPLAIN ANALYZE SELECT DISTINCT x FROM dist;
     HashAggregate  (cost=84624.00..84624.51 rows=51 width=4) (actual time=1840.141..1840.153 rows=51 loops=1)
       ->  Seq Scan on dist  (cost=0.00..72124.00 rows=5000000 width=4) (actual time=0.003..573.819 rows=5000000 loops=1)
     Total runtime: 1848.060 ms

PG cannot (yet) use an index for DISTINCT (i.e. skip over runs of identical values while scanning), but you can do this:

    CREATE OR REPLACE FUNCTION distinct_skip_foo()
      RETURNS SETOF INTEGER
      LANGUAGE plpgsql
      STABLE
    AS $$
    DECLARE
      _x INTEGER;
    BEGIN
      _x := min(x) FROM dist;
      WHILE _x IS NOT NULL LOOP
        RETURN NEXT _x;
        _x := min(x) FROM dist WHERE x > _x;
      END LOOP;
    END;
    $$ ;

    EXPLAIN ANALYZE SELECT * FROM distinct_skip_foo();
     Function Scan on distinct_skip_foo  (cost=0.00..260.00 rows=1000 width=4) (actual time=1.629..1.635 rows=51 loops=1)
     Total runtime: 1.652 ms
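The same skip-scan idea can also be written as plain SQL with a recursive CTE (PostgreSQL 8.4 or later), without the helper function; this is only a sketch against the dist table from the example above:

    -- Emulated "loose index scan": repeatedly jump to the next larger value.
    WITH RECURSIVE t AS (
        SELECT min(x) AS x FROM dist                     -- smallest value, found via the index
        UNION ALL
        SELECT (SELECT min(x) FROM dist WHERE x > t.x)   -- next value above the previous one
        FROM t
        WHERE t.x IS NOT NULL                            -- stop once no larger value exists
    )
    SELECT x FROM t WHERE x IS NOT NULL;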
+15

You are selecting the distinct values over the entire table, which automatically leads to a seq scan. You have millions of rows, so it is bound to be slow.

There is a trick to get the distinct values faster, but it only works when the data has a known (and reasonably small) set of possible values. For instance, I take it your bundle_id references some kind of bundles table, which is much smaller. That means you can write:

    select bundles.bundle_id
    from bundles
    where exists (
      select 1
      from audit_records
      where audit_records.bundle_id = bundles.bundle_id
    );

This should result in a nested loop: a seq scan over bundles driving index scans on audit_records via the index on bundle_id.
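If no such smaller lookup table exists yet, one way to bootstrap it is sketched below; the bundles name matches the query above, but materializing such a table and keeping it up to date (via the application or a trigger) is an assumption on my part:

    -- One-off: collect the known bundle ids into a small lookup table (one seq scan).
    CREATE TABLE bundles AS
        SELECT DISTINCT bundle_id
        FROM audit_records
        WHERE bundle_id IS NOT NULL;

    -- A primary key keeps it duplicate-free and indexed for the EXISTS probe.
    ALTER TABLE bundles ADD PRIMARY KEY (bundle_id);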

+7

I have the same problem with tables of more than 300 million records and an indexed field with only a few distinct values. I could not get rid of the seq scan, so I created this function to simulate a DISTINCT search using the index, if one exists. If your table's number of distinct values is proportional to the total number of records, this function is not appropriate. It would also have to be adapted for distinct values over multiple columns. Warning: this function is wide open to SQL injection and should only be used in a secure environment.

EXPLAIN ANALYZE results:
Query with a regular SELECT DISTINCT: Total runtime: 598310.705 ms
Query with SELECT small_distinct(...): Total runtime: 1.156 ms

    CREATE OR REPLACE FUNCTION small_distinct(
        tableName varchar, fieldName varchar, sample anyelement = ''::varchar)
      -- Search a few distinct values in a possibly huge table
      -- Parameters: tableName or query expression, fieldName,
      --             sample: any value to specify result type (default is varchar)
      -- Author: T.Husson, 2012-09-17, distribute/use freely
      RETURNS TABLE ( result anyelement )
    AS $BODY$
    BEGIN
      EXECUTE 'SELECT '||fieldName||' FROM '||tableName||' ORDER BY '||fieldName
        ||' LIMIT 1' INTO result;
      WHILE result IS NOT NULL LOOP
        RETURN NEXT;
        EXECUTE 'SELECT '||fieldName||' FROM '||tableName
          ||' WHERE '||fieldName||' > $1 ORDER BY '||fieldName||' LIMIT 1'
          INTO result USING result;
      END LOOP;
    END;
    $BODY$ LANGUAGE plpgsql VOLATILE;

Sample Calls:

    SELECT small_distinct('observations','id_source',1);

    SELECT small_distinct('(select * from obs where id_obs > 12345) as temp',
      'date_valid','2000-01-01'::timestamp);

    SELECT small_distinct('addresses','state');
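Applied to the table from the original question, the calls would presumably look like this (assuming both columns are text-like, since the default sample type is varchar):

    SELECT small_distinct('audit_records','bundle_id');
    SELECT small_distinct('audit_records','server_name');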
+4

On PostgreSQL 9.3, starting from Denis's answer:

    select bundles.bundle_id
    from bundles
    where exists (
      select 1
      from audit_records
      where audit_records.bundle_id = bundles.bundle_id
    );

just adding "limit 1" to the subquery got me a 60x speedup (for my use case, with 8 million records, a composite index and 10k combinations), going from 1800 ms down to 30 ms:

    select bundles.bundle_id
    from bundles
    where exists (
      select 1
      from audit_records
      where audit_records.bundle_id = bundles.bundle_id
      limit 1
    );
+1
