Take a look at the PostgreSQL documentation on Using EXPLAIN and Planning Queries. The reason the query planner prefers a sequential scan over an index scan in the case of LIMIT 100 is simply that the sequential scan is estimated to be cheaper.
There is no ORDER BY in the query, so the planner is free to return the first 100 (arbitrary) rows that match the filter condition. To use an index, PostgreSQL must first read the index pages and then read the matching data pages to fetch the rows; a sequential scan only reads data pages. Your table statistics seem to suggest that enough rows match the filter condition that sequentially reading pages until 100 matching rows are found is estimated to be cheaper than first reading the index and then fetching the corresponding rows. You will see a different plan when you increase the LIMIT or when fewer rows match the filter condition.
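You can watch this trade-off directly by comparing plans at different LIMIT values. A minimal sketch; the table name words comes from your query below, but the word column and the LIKE filter are placeholders for whatever your actual filter condition is:

-- Small LIMIT: the planner expects to find 100 matches quickly,
-- so it typically picks Limit -> Seq Scan with a Filter line.
EXPLAIN SELECT * FROM words WHERE word LIKE 'a%' LIMIT 100;

-- Much larger LIMIT: the estimated cost of scanning sequentially
-- grows, and an index scan may now come out cheaper.
EXPLAIN SELECT * FROM words WHERE word LIKE 'a%' LIMIT 100000;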
At default settings, the planner estimates the cost of reading a page at random (random_page_cost) at four times the cost of reading a page sequentially (seq_page_cost). These settings can be adjusted to tune query plans: for example, when the entire database fits in RAM, a random page read is hardly more expensive than a sequential one, and index scans should be favored. You can also try out different query plans by enabling/disabling certain scan types, for example:
set enable_seqscan = [on | off]
set enable_indexscan = [on | off]
While these scan types can be disabled globally, they should only be toggled per session, ad hoc, for debugging or troubleshooting.
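If the motivation is that the data set fits in RAM, lowering the random page cost for the session is usually a better experiment than disabling scan types outright. A sketch, assuming the default cost settings beforehand and the same placeholder filter as above:

-- Check the current cost constants.
SHOW random_page_cost;   -- default: 4
SHOW seq_page_cost;      -- default: 1

-- For a fully cached database, bring random reads close to
-- sequential ones. This affects only the current session.
SET random_page_cost = 1.1;

-- Re-run the plan to see whether the planner now prefers the index.
EXPLAIN SELECT * FROM words WHERE word LIKE 'a%' LIMIT 100;

-- Restore the default afterwards.
RESET random_page_cost;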
Also run VACUUM ANALYZE words before testing query plans; otherwise an automatic analysis (autovacuum) between tests can affect the results.
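Putting it together, a typical debugging session might look like the sketch below (again with a placeholder filter). EXPLAIN ANALYZE actually executes the query and reports real row counts next to the estimates:

-- Refresh statistics first so the planner's estimates are current.
VACUUM ANALYZE words;

-- Compare estimated vs. actual rows; large discrepancies point to
-- stale or insufficient statistics.
EXPLAIN ANALYZE SELECT * FROM words WHERE word LIKE 'a%' LIMIT 100;

-- Strongly discourage sequential scans (the planner adds a large
-- cost penalty) to see what the index plan would cost.
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM words WHERE word LIKE 'a%' LIMIT 100;
RESET enable_seqscan;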