Slow SELECT DISTINCT query in Postgres

I make the following two queries quite often against a table that essentially collects audit log records. Both select the distinct values of a column over a huge number of rows, but each column holds fewer than 10 distinct values.

Here is the EXPLAIN output for the two DISTINCT queries the page issues:

    marchena=> explain select distinct auditrecor0_.bundle_id as col_0_0_ from audit_records auditrecor0_;
                                           QUERY PLAN
    ----------------------------------------------------------------------------------------------
     HashAggregate  (cost=1070734.05..1070734.11 rows=6 width=21)
       ->  Seq Scan on audit_records auditrecor0_  (cost=0.00..1023050.24 rows=19073524 width=21)
    (2 rows)

    marchena=> explain select distinct auditrecor0_.server_name as col_0_0_ from audit_records auditrecor0_;
                                           QUERY PLAN
    ----------------------------------------------------------------------------------------------
     HashAggregate  (cost=1070735.34..1070735.39 rows=5 width=13)
       ->  Seq Scan on audit_records auditrecor0_  (cost=0.00..1023051.47 rows=19073547 width=13)
    (2 rows)

Both perform a sequential scan over the whole table. However, if I disable enable_seqscan (despite the name, this only disables sequential scans where an index is available), the queries use the index but are even slower:

    marchena=> set enable_seqscan = off;
    SET
    marchena=> explain select distinct auditrecor0_.bundle_id as col_0_0_ from audit_records auditrecor0_;
                                                         QUERY PLAN
    ------------------------------------------------------------------------------------------------------------------------
     Unique  (cost=0.00..19613740.62 rows=6 width=21)
       ->  Index Scan using audit_bundle_idx on audit_records auditrecor0_  (cost=0.00..19566056.69 rows=19073570 width=21)
    (2 rows)

    marchena=> explain select distinct auditrecor0_.server_name as col_0_0_ from audit_records auditrecor0_;
                                                         QUERY PLAN
    ------------------------------------------------------------------------------------------------------------------------
     Unique  (cost=0.00..45851449.96 rows=5 width=13)
       ->  Index Scan using audit_server_idx on audit_records auditrecor0_  (cost=0.00..45803766.04 rows=19073570 width=13)
    (2 rows)

Both the bundle_id and server_name columns have btree indexes on them. Should I be using a different index type to select the distinct values quickly?
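For what it's worth, the planner's own estimate of how many distinct values these columns hold can be cross-checked in the pg_stats view (only a sanity check; the table and column names are the ones from the queries above):

    SELECT attname, n_distinct
    FROM pg_stats
    WHERE tablename = 'audit_records'
      AND attname IN ('bundle_id', 'server_name');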

+8
postgresql
4 answers
    BEGIN;
    CREATE TABLE dist ( x INTEGER NOT NULL );
    INSERT INTO dist SELECT random()*50 FROM generate_series( 1, 5000000 );
    COMMIT;
    CREATE INDEX dist_x ON dist(x);
    VACUUM ANALYZE dist;

    EXPLAIN ANALYZE SELECT DISTINCT x FROM dist;
     HashAggregate  (cost=84624.00..84624.51 rows=51 width=4) (actual time=1840.141..1840.153 rows=51 loops=1)
       ->  Seq Scan on dist  (cost=0.00..72124.00 rows=5000000 width=4) (actual time=0.003..573.819 rows=5000000 loops=1)
     Total runtime: 1848.060 ms

PG cannot (yet) use an index for DISTINCT (i.e. skip over runs of identical values while scanning), but you can do this:

    CREATE OR REPLACE FUNCTION distinct_skip_foo()
      RETURNS SETOF INTEGER
      LANGUAGE plpgsql
      STABLE
    AS $$
    DECLARE
      _x INTEGER;
    BEGIN
      _x := min(x) FROM dist;
      WHILE _x IS NOT NULL LOOP
        RETURN NEXT _x;
        _x := min(x) FROM dist WHERE x > _x;
      END LOOP;
    END;
    $$ ;

    EXPLAIN ANALYZE SELECT * FROM distinct_skip_foo();
     Function Scan on distinct_skip_foo  (cost=0.00..260.00 rows=1000 width=4) (actual time=1.629..1.635 rows=51 loops=1)
     Total runtime: 1.652 ms
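The same skip-scan idea can also be written as plain SQL with a recursive CTE (PostgreSQL 8.4 or later), without the helper function; this is only a sketch against the dist table from the example above:

    -- Emulated "loose index scan": repeatedly jump to the next larger value.
    WITH RECURSIVE t AS (
        SELECT min(x) AS x FROM dist                     -- smallest value, found via the index
        UNION ALL
        SELECT (SELECT min(x) FROM dist WHERE x > t.x)   -- next value above the previous one
        FROM t
        WHERE t.x IS NOT NULL                            -- stop once no larger value exists
    )
    SELECT x FROM t WHERE x IS NOT NULL;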
+15

You are selecting the distinct values over the entire table, which automatically leads to a seq scan. You have millions of rows, so it is bound to be slow.

There is a trick to get the distinct values faster, but it only works when the data has a known (and reasonably small) set of possible values. For instance, I take it your bundle_id references some kind of bundles table, which is much smaller. That means you can write:

    select bundles.bundle_id
    from bundles
    where exists (
      select 1
      from audit_records
      where audit_records.bundle_id = bundles.bundle_id
    );

This should result in a nested loop: a seq scan over bundles driving index scans on audit_records via the index on bundle_id.
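If no such smaller lookup table exists yet, one way to bootstrap it is sketched below; the bundles name matches the query above, but materializing such a table and keeping it up to date (via the application or a trigger) is an assumption on my part:

    -- One-off: collect the known bundle ids into a small lookup table (one seq scan).
    CREATE TABLE bundles AS
        SELECT DISTINCT bundle_id
        FROM audit_records
        WHERE bundle_id IS NOT NULL;

    -- A primary key keeps it duplicate-free and indexed for the EXISTS probe.
    ALTER TABLE bundles ADD PRIMARY KEY (bundle_id);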

+7

I have the same problem with tables of more than 300 million records and an indexed field with only a few distinct values. I could not get rid of the seq scan, so I created this function to simulate a DISTINCT search using the index, if one exists. If your table's number of distinct values is proportional to the total number of records, this function is not appropriate. It would also have to be adapted for distinct values over multiple columns. Warning: this function is wide open to SQL injection and should only be used in a secure environment.

EXPLAIN ANALYZE results:
Query with a regular SELECT DISTINCT: Total runtime: 598310.705 ms
Query with SELECT small_distinct(...): Total runtime: 1.156 ms

    CREATE OR REPLACE FUNCTION small_distinct(
        tableName varchar, fieldName varchar, sample anyelement = ''::varchar)
      -- Search a few distinct values in a possibly huge table
      -- Parameters: tableName or query expression, fieldName,
      --             sample: any value to specify result type (default is varchar)
      -- Author: T.Husson, 2012-09-17, distribute/use freely
      RETURNS TABLE ( result anyelement )
    AS $BODY$
    BEGIN
      EXECUTE 'SELECT '||fieldName||' FROM '||tableName||' ORDER BY '||fieldName
        ||' LIMIT 1' INTO result;
      WHILE result IS NOT NULL LOOP
        RETURN NEXT;
        EXECUTE 'SELECT '||fieldName||' FROM '||tableName
          ||' WHERE '||fieldName||' > $1 ORDER BY '||fieldName||' LIMIT 1'
          INTO result USING result;
      END LOOP;
    END;
    $BODY$ LANGUAGE plpgsql VOLATILE;

Sample Calls:

    SELECT small_distinct('observations','id_source',1);

    SELECT small_distinct('(select * from obs where id_obs > 12345) as temp',
      'date_valid','2000-01-01'::timestamp);

    SELECT small_distinct('addresses','state');
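Applied to the table from the original question, the calls would presumably look like this (assuming both columns are text-like, since the default sample type is varchar):

    SELECT small_distinct('audit_records','bundle_id');
    SELECT small_distinct('audit_records','server_name');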
+4

On PostgreSQL 9.3, starting from Denis's answer:

    select bundles.bundle_id
    from bundles
    where exists (
      select 1
      from audit_records
      where audit_records.bundle_id = bundles.bundle_id
    );

just adding "limit 1" to the subquery got me a 60x speedup (for my use case, with 8 million records, a composite index and 10k combinations), going from 1800 ms down to 30 ms:

    select bundles.bundle_id
    from bundles
    where exists (
      select 1
      from audit_records
      where audit_records.bundle_id = bundles.bundle_id
      limit 1
    );
+1
