Bitmap Scan Performance

I have a large report table, and the Bitmap Heap Scan step on it takes more than 5 seconds.

Is there something I can do? Would adding columns to the index that it uses help?

I combine and summarize the data, so I do not return the 500K records to the client. I am using Postgres 9.1.
Here is the explanation:

 Bitmap Heap Scan on foo_table  (cost=24747.45..1339408.81 rows=473986 width=116) (actual time=422.210..5918.037 rows=495747 loops=1)
   Recheck Cond: ((foo_id = 72) AND (date >= '2013-04-04 00:00:00'::timestamp without time zone) AND (date <= '2013-05-05 00:00:00'::timestamp without time zone))
   Filter: ((foo)::text = 'foooooo'::text)
   ->  Bitmap Index Scan on foo_table_idx  (cost=0.00..24628.96 rows=573023 width=0) (actual time=341.269..341.269 rows=723918 loops=1)

Query:

 explain analyze
 SELECT CAST(date as date) AS date, foo_id, ....
 from foo_table
 where foo_id = 72
   and date >= '2013-04-04'
   and date <= '2013-05-05'
   and foo = 'foooooo'

Index definition:

 Index "public.foo_table_idx"
    Column    |            Type
 -------------+-----------------------------
  foo_id      | bigint
  date        | timestamp without time zone
 btree, for table "public.external_channel_report"

Table:
foo is a text field with 4 distinct values.
foo_id is a bigint with currently ~10K distinct values.

2 answers

Create a composite index on (foo_id, foo, date) (in that order).
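A minimal sketch of that index; the index name here is only a placeholder:

 CREATE INDEX foo_table_foo_id_foo_date_idx
     ON foo_table (foo_id, foo, date);

With this column order, the two equality conditions (foo_id, foo) narrow the scan to one contiguous index range, and the range condition on date is resolved within it.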

Please note: if you select 500k records (and return them all to the client), this can take a lot of time.

Are you sure that you need all 500k records on the client (and not some aggregate or a LIMIT)?
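For example, if the client only needs daily totals, an aggregate along these lines returns a handful of rows instead of 500k (the some_value column is an assumption, since the full column list was not shown):

 SELECT CAST(date AS date) AS day, foo_id, count(*) AS row_count, sum(some_value) AS total
 FROM   foo_table
 WHERE  foo_id = 72
   AND  date >= '2013-04-04'
   AND  date <  '2013-05-05'
   AND  foo = 'foooooo'
 GROUP  BY 1, 2;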


Reply to comment

Do the columns in the WHERE clause need to be in the same order as in the index?

The order of expressions in the WHERE clause is completely irrelevant; SQL is not a procedural language.

Fix mistakes

The timestamp column should not be named "date", for several reasons. Obviously, it is a timestamp, not a date. But more importantly, date is a reserved word in all SQL standards, as well as a type and function name in Postgres, and should not be used as an identifier.
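If renaming the column is an option, something along these lines avoids the reserved word (the new name created_at is only a suggestion; queries and index definitions referring to the old name must be updated afterwards):

 ALTER TABLE foo_table RENAME COLUMN date TO created_at;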

You should provide complete information with your question, including the full table definition and details of existing indexes. It might be a good idea to start by reading the chapter about indexes in the manual.

The WHERE conditions on the timestamp are most likely incorrect:

 and date >= '2013-04-04' and date <= '2013-05-05' 

The upper bound for the timestamp column should probably be excluded:

 and date >= '2013-04-04' and date < '2013-05-05' 

Index

With the multicolumn index @Quassnoi provided, your query will be much faster, since all qualifying rows can be read from one continuous block of the index. No row is read in vain (and disqualified later), as is the case now.
But 500k rows will still take some time. Normally, Postgres still has to visit the table to check visibility and to retrieve additional columns. An index-only scan may be an option in Postgres 9.2+.

This column order is best because of the rule of thumb: columns for equality conditions first, then columns for ranges. More explanation and links in this related answer on dba.SE.

CLUSTER / pg_repack

You can speed things up further by physically ordering the rows in the table according to this index, so that a minimum of blocks has to be read from the table. That is, unless you have other requirements that conflict with this!

If you can afford to lock your table exclusively for a few seconds (for example, after hours) to rewrite the table and order the rows according to the index:

 CLUSTER foo_table USING foo_table_idx;

If concurrent use is a problem, consider pg_repack, which can do the same without an exclusive lock.
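A typical invocation looks like this (assuming the pg_repack extension is installed in the database, here called mydb):

 pg_repack --table=foo_table mydb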

Effect: fewer blocks have to be read from the table, and everything is pre-sorted. This is a one-time effect that deteriorates over time as you write to the table, so you need to repeat it from time to time.

I copied and adapted the last chapter from this related answer on dba.SE.

