PostgreSQL - the query runs much faster with enable_nestloop = false. Why is the planner not doing the right thing?

Question

I have a query that runs much slower (~5 minutes) with the default enable_nestloop = true than with enable_nestloop = false (~10 seconds).

EXPLAIN ANALYZE output for both cases:

Machine A nestloop = true - http://explain.depesz.com/s/nkj0 (~5 minutes)
Machine A nestloop = false - http://explain.depesz.com/s/wBM (~10 seconds)

On another, slightly slower machine, the same query on a copy of the database with the default enable_nestloop = true takes ~20 seconds.

Machine B nestloop = true - (~ 20 seconds)

For all of the above cases, I made sure to run ANALYZE before executing the queries. No other queries were running in parallel.

Both machines run Postgres 8.4. Machine A runs Ubuntu 10.04 32 bit, while Machine B runs Ubuntu 8.04 32 bit.

The actual query is available here. It is a reporting query with many joins, since the database is mainly used for transaction processing.

Without resorting to something like materialized views, what can I do to make the planner do what I achieved by setting enable_nestloop = false?
From the research I did, it seems that the reason the planner selects a seemingly suboptimal plan is the huge difference between the estimated and actual row counts. How can I bring the estimate closer to the actual figure?
, ?
, , B. ?

+5

postgresql

Mohan 25 . '11 5:41

4

, , , .

. Wiki PostgreSQL . random_page_cost default_statistics_target.
, Planner .

, statistics target :

ALTER TABLE postgres.products ALTER COLUMN id SET STATISTICS 1000;
ALTER TABLE postgres.sales_orders ALTER COLUMN retailer_id SET STATISTICS 1000;
ALTER TABLE postgres.sales_orders ALTER COLUMN company_id SET STATISTICS 1000;

ALTER TABLE goods_return_notes ALTER COLUMN retailer_id SET STATISTICS 1000;
ALTER TABLE goods_return_notes ALTER COLUMN company_id SET STATISTICS 1000;

ALTER TABLE retailer_category_leaf_nodes ALTER COLUMN tree_left SET STATISTICS 1000;
ALTER TABLE channels ALTER COLUMN principal_id SET STATISTICS 1000;
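
A minimal sketch of how the effect of the statements above might be checked, assuming the table and column names shown there. A changed statistics target only affects statistics gathered afterwards, so the tables need to be re-analyzed; the pg_stats query simply shows what the planner knows about one column.

```sql
-- Re-collect statistics so the new per-column targets are actually used.
ANALYZE postgres.sales_orders;
ANALYZE goods_return_notes;

-- Inspect what the planner now knows about one of the columns.
SELECT tablename, attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'sales_orders'
  AND attname = 'retailer_id';
```

Comparing estimated and actual row counts in the EXPLAIN ANALYZE output before and after is one way to see whether the estimates improved.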

,

.

. , . - 100. → 1000 . . ANALYZE , .

postgres(sales_orders.retailer_id) WHERE retailer_id IS NOT NULL ( , NULL).

, , - 9.1. .

+2

Erwin Brandstetter 25 . '11 7:37

: PostgreSQL JOINs.

JOIN, JOINING.

The query has 15 JOINs. The number of possible JOIN orders grows factorially (n!), so for 15 JOINs that is 15! = 1307674368000 possible orders.
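
The figure can be checked directly in PostgreSQL. A quick sketch using the built-in factorial() function:

```sql
-- Number of possible orderings of 15 items: 15!
SELECT factorial(15);  -- 1307674368000
```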

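For queries like this, PostgreSQL uses the Genetic Query Optimizer. The "geqo_threshold" setting determines how many JOINs a query must have before the Genetic Query Optimizer is used.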
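
For reference, a sketch of how the GEQO-related settings can be inspected and changed for the current session. geqo and geqo_threshold are standard PostgreSQL settings; whether changing them helps this particular query is an assumption that would need testing.

```sql
-- Show the current values.
SHOW geqo;
SHOW geqo_threshold;

-- Session-only experiment: disable the genetic optimizer so the planner
-- uses its regular (non-genetic) join search instead.
-- Planning time may increase noticeably for queries with many JOINs.
SET geqo = off;
```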

, PostgreSQL ( ). , ANALYZE, .

I think that in general, if you have this many tables to JOIN, you are better off doing what you did: rewriting the query in the optimal JOIN order.
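
One way to make the planner keep the JOIN order as written is the join_collapse_limit setting: with a value of 1, explicit JOINs are not reordered. A minimal sketch, in which the query is only a stand-in for the real reporting query and the column names so.id and so.product_id are hypothetical:

```sql
BEGIN;
-- Only for this transaction: keep explicit JOINs in the order written.
SET LOCAL join_collapse_limit = 1;

-- Hypothetical stand-in for the real query; column names are assumptions.
SELECT so.id, p.id
FROM postgres.sales_orders AS so
JOIN postgres.products AS p ON p.id = so.product_id;
COMMIT;
```

SET LOCAL limits the change to the enclosing transaction, so other queries on the same connection keep the default planner behavior.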

+2

Rauni Lillemets Dec 29 '11 at 8:33


Usually there is only one reason for different plans for the same data and the same queries on two servers running the same PostgreSQL version: a different configuration, most often the value of work_mem. A hash join is usually faster, but requires a lot of available memory.
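
A sketch of how this could be checked on both machines. work_mem is a standard setting; the value 64MB below is an arbitrary example, not a recommendation from this answer.

```sql
-- Compare this value on Machine A and Machine B.
SHOW work_mem;

-- Session-only experiment: allow more memory per sort/hash operation,
-- then re-run EXPLAIN ANALYZE on the query and compare the plan.
SET work_mem = '64MB';
```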

0

Pavel Stehule Oct 25 '11 at 16:43


Mohan · Accepted Answer · 2011-11-08T07:24:46+0000

, . , . , , . , , , , .
