Why is this count query so slow?

Hi, I'm hosted on Heroku, running PostgreSQL 9.1.6 on their Ika plan (7.5 GB cache). I have a table called cars, and I need to run the following:

SELECT COUNT(*) FROM "cars" WHERE "cars"."reference_id" = 'toyota_hilux' 

Right now it takes a very long time (64 seconds!):

 Aggregate  (cost=2849.52..2849.52 rows=1 width=0) (actual time=63388.390..63388.391 rows=1 loops=1)
   ->  Bitmap Heap Scan on cars  (cost=24.76..2848.78 rows=1464 width=0) (actual time=1169.581..63387.361 rows=739 loops=1)
         Recheck Cond: ((reference_id)::text = 'toyota_hilux'::text)
         ->  Bitmap Index Scan on index_cars_on_reference_id  (cost=0.00..24.69 rows=1464 width=0) (actual time=547.530..547.530 rows=832 loops=1)
               Index Cond: ((reference_id)::text = 'toyota_hilux'::text)
 Total runtime: 64112.412 ms

A bit of background:

The table contains about 3.2 million rows, and the column I'm counting on is defined as:

 reference_id character varying(50); 

and index:

 CREATE INDEX index_cars_on_reference_id
   ON cars
   USING btree (reference_id COLLATE pg_catalog."default");

What am I doing wrong? I assume this is not the performance I should expect — or is it?

performance postgresql
1 answer

What @Satya suggests in his comment is not quite correct. With a matching index, the planner only resorts to a full table scan if the table statistics suggest the query would return more than roughly 5 % of the table (the threshold depends on several factors), because then it is faster to scan the whole table.

As you can see from your own question, that does not apply to your query. It uses a bitmap index scan followed by a bitmap heap scan, though I would have expected a plain index scan. (?)

Two more things stand out in your explain output:
The bitmap index scan finds 832 rows, while the recheck in the heap scan reduces the count to 739. This indicates that you have a lot of dead tuples in your index.
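One way to check this suspicion (a sketch, assuming you have access to the statistics collector views) is to look at `pg_stat_user_tables`:

```sql
-- Compare live vs. dead tuple estimates and see when the table
-- was last vacuumed. n_dead_tup is an estimate maintained by the
-- statistics collector, not an exact count.
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM   pg_stat_user_tables
WHERE  relname = 'cars';
```

A large `n_dead_tup` relative to `n_live_tup` would support the dead-tuple theory.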

Check the runtime after each of the following steps with EXPLAIN ANALYZE, and maybe add the results to your question:

First, rerun the query with EXPLAIN ANALYZE two or three times to populate the cache. How does the last run compare to the first?
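For instance, the query from the question could be timed like this (the output will vary between runs):

```sql
-- Run this a few times in a row: the first run has to read pages
-- from disk, while later runs benefit from data already in the cache.
EXPLAIN ANALYZE
SELECT count(*)
FROM   cars
WHERE  reference_id = 'toyota_hilux';
```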

Further:

 VACUUM ANALYZE cars; 

Rerun the query.

If you have many write operations on the table, I would set the fillfactor below 100. For instance:

 ALTER TABLE cars SET (fillfactor=90); 
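Note that a lower fillfactor only applies to newly written pages; existing pages are only repacked when the table is rewritten (by VACUUM FULL or CLUSTER). You can verify the current setting with a catalog query like this (a sketch):

```sql
-- reloptions lists per-table storage parameters such as fillfactor;
-- NULL means all parameters are at their defaults.
SELECT reloptions
FROM   pg_class
WHERE  relname = 'cars';
```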

If the row size is large, or you have many write operations on the table, then:

 VACUUM FULL ANALYZE cars; 

This takes a while, as it rewrites the whole table.

Or, if you can afford it (and other important queries don't have conflicting requirements):

 CLUSTER cars USING index_cars_on_reference_id; 

This rewrites the table in the physical order of the index, which should make this kind of query much faster.
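Be aware that CLUSTER takes an exclusive lock on the table while it runs, and it does not update planner statistics, so it is worth following up with:

```sql
-- CLUSTER rewrites the table in index order but leaves the planner
-- statistics stale; refresh them explicitly afterwards.
ANALYZE cars;
```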


Normalize the schema

If you want this to be really fast, create a car_type table with a serial primary key and reference it from the cars table. This will shrink the needed index to a fraction of its current size.

It goes without saying that you back up before trying any of this.

 CREATE TABLE car_type (
   car_type_id serial PRIMARY KEY
 , car_type    text
 );

 INSERT INTO car_type (car_type)
 SELECT DISTINCT reference_id FROM cars ORDER BY reference_id;

 ANALYZE car_type;

 CREATE UNIQUE INDEX car_type_uni_idx ON car_type (car_type);  -- unique types

 ALTER TABLE cars RENAME COLUMN reference_id TO car_type;      -- rename old col
 ALTER TABLE cars ADD COLUMN car_type_id int;                  -- add new int col

 UPDATE cars c
 SET    car_type_id = ct.car_type_id
 FROM   car_type ct
 WHERE  ct.car_type = c.car_type;

 ALTER TABLE cars DROP COLUMN car_type;                        -- drop old varchar col

 CREATE INDEX cars_car_type_id_idx ON cars (car_type_id);

 ALTER TABLE cars ADD CONSTRAINT cars_car_type_id_fkey
   FOREIGN KEY (car_type_id) REFERENCES car_type (car_type_id)
   ON UPDATE CASCADE;                                          -- add fk

 VACUUM FULL ANALYZE cars;

Or, if you want to cluster the table on the new index:

 CLUSTER cars USING cars_car_type_id_idx; 

Your request will now look like this:

 SELECT count(*)
 FROM   cars
 WHERE  car_type_id = (SELECT car_type_id FROM car_type
                       WHERE  car_type = 'toyota_hilux');
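An equivalent formulation with an explicit join (just a different way to spell the same lookup) would be:

```sql
-- Join the small lookup table to the big table on the integer key;
-- the planner resolves the type name once and counts matching rows.
SELECT count(*)
FROM   cars c
JOIN   car_type ct USING (car_type_id)
WHERE  ct.car_type = 'toyota_hilux';
```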

And it should be even faster, mainly because the index and the table are now smaller, but also because integer comparisons are faster than varchar comparisons. The gain over a table clustered on the varchar column will not be dramatic, though.

A welcome side effect: if you ever need to rename a type, it is now a tiny UPDATE of a single row in car_type, instead of a costly update over the whole big table.
