Index is not used when LIMIT is used in postgres

I have a words table with an index on (language_id, state). The following are the results for EXPLAIN ANALYZE:

Without LIMIT

 explain analyze SELECT "words".* FROM "words" WHERE (words.language_id = 27) AND (state IS NULL);

 Bitmap Heap Scan on words  (cost=10800.38..134324.10 rows=441257 width=96) (actual time=233.257..416.026 rows=540556 loops=1)
   Recheck Cond: ((language_id = 27) AND (state IS NULL))
   ->  Bitmap Index Scan on ls  (cost=0.00..10690.07 rows=441257 width=0) (actual time=230.849..230.849 rows=540556 loops=1)
         Index Cond: ((language_id = 27) AND (state IS NULL))
 Total runtime: 460.277 ms
 (5 rows)

With LIMIT 100

 explain analyze SELECT "words".* FROM "words" WHERE (words.language_id = 27) AND (state IS NULL) LIMIT 100;

 Limit  (cost=0.00..51.66 rows=100 width=96) (actual time=0.081..0.184 rows=100 loops=1)
   ->  Seq Scan on words  (cost=0.00..227935.59 rows=441257 width=96) (actual time=0.080..0.160 rows=100 loops=1)
         Filter: ((state IS NULL) AND (language_id = 27))
 Total runtime: 0.240 ms
 (4 rows)

Why is this happening? How can I make the index be used in all cases?

Thanks.

4 answers

I think the PostgreSQL query planner simply decides that in the second case, the one with the LIMIT, the index is not worth using because the LIMIT is so small. So this is not a problem.
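As a quick sanity check, you can rerun the EXPLAIN with a larger limit; past some threshold the planner should switch back to the index. The LIMIT value below is only an illustrative guess, since the exact cutoff depends on your statistics and cost settings:

 -- Illustrative only: with a large enough LIMIT the planner should go
 -- back to the bitmap/index scan; the exact cutoff depends on your data.
 EXPLAIN ANALYZE
 SELECT "words".*
 FROM "words"
 WHERE words.language_id = 27
   AND state IS NULL
 LIMIT 100000;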


Without LIMIT: rows=540556, loops=1, total runtime: 460.277 ms

With LIMIT: rows=100, loops=1, total runtime: 0.240 ms

I do not see a problem here. A query that returns 500K rows is simply going to take longer than one that returns 100.


Take a look at the PostgreSQL documentation on Using EXPLAIN and on query planning. The reason the query planner prefers a sequential scan over an index scan in the LIMIT 100 case is simply that the sequential scan is cheaper.

There is no ORDER BY in the query, so the planner is free to return the first 100 (arbitrary) rows that match the filter condition. An index scan would first have to read index pages and then read data pages to fetch the matching rows, whereas a sequential scan only has to read data pages. Your table statistics suggest that enough rows match the filter condition that reading pages sequentially until 100 matches are found is estimated to be cheaper than reading the index first and then fetching the actual rows. You may see a different plan if you raise the limit or if fewer rows match the filter condition.
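You can see this in the cost numbers from the plans above: the cost of the Limit node is just the sequential scan's total cost scaled down to the fraction of rows the planner expects to need, and it is far below what the bitmap scan must spend before returning anything:

 -- Numbers taken from the plans in the question:
 --   Seq Scan total cost: 227935.59 for an estimated 441257 matching rows
 --   Cost to produce only 100 of them: 227935.59 * (100 / 441257) ≈ 51.66
 -- which matches the Limit node's cost (0.00..51.66) and is much cheaper
 -- than the 10800.38 the bitmap index scan needs before it can return
 -- its first row.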

With the default settings, the planner puts the cost of a random page read (random_page_cost) at four times the cost of a sequential page read (seq_page_cost). These settings can be adjusted to tune query plans (for example, when the whole database fits in RAM, a random page read is not more expensive than a sequential one, so index scans should be preferred). You can also try out different query plans by enabling/disabling particular scan types, for example:

 set enable_seqscan = [on | off]
 set enable_indexscan = [on | off]

While it is possible to enable/disable particular scan types globally, this should only be done ad hoc, per session, for debugging or troubleshooting.

Also run VACUUM ANALYZE words before testing query plans; otherwise automatic analysis (autovacuum) between tests can affect the results.
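A minimal sketch of that workflow in a single session (the table and query are the ones from the question; everything else is standard PostgreSQL):

 -- Refresh the statistics first so the planner works with current numbers.
 VACUUM ANALYZE words;

 -- Disable sequential scans for this transaction only, look at the plan,
 -- and let ROLLBACK restore the default setting afterwards.
 BEGIN;
 SET LOCAL enable_seqscan = off;
 EXPLAIN ANALYZE
 SELECT "words".*
 FROM "words"
 WHERE words.language_id = 27
   AND state IS NULL
 LIMIT 100;
 ROLLBACK;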


It is also odd that the two queries report a different number of rows; I assume you have been inserting in the meantime... Hmm, what about using a subselect?

 select * from (select ...) limit 100; 
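Note that PostgreSQL requires an alias on a subquery in FROM, so written out with the question's table and filter this would look roughly like the sketch below; whether it actually changes the plan depends on whether the planner flattens the subquery away:

 SELECT *
 FROM (SELECT "words".*
       FROM "words"
       WHERE words.language_id = 27
         AND state IS NULL) AS sub
 LIMIT 100;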
