Sample query to show a PostgreSQL cardinality estimation error

I am working on a project using PostgreSQL 9.3. I used the query below to show how selectivity estimation errors can increase query execution time severalfold for a TPC-H workload on PostgreSQL 8.3.

select n_name, sum(l_extendedprice * (1 - l_discount)) as revenue
from customer, orders, lineitem, supplier, nation, region
where c_custkey = o_custkey
  and l_orderkey = o_orderkey
  and l_suppkey = s_suppkey
  and c_nationkey = s_nationkey
  and s_nationkey = n_nationkey
  and n_regionkey = r_regionkey
  and (r_name = 'ASIA' or r_name = 'AFRICA')
  and o_orderdate >= date '1994-01-01'
  and o_orderdate < date '1994-01-01' + interval '1 year'
  and l_shipdate <= l_receiptdate
  and l_commitdate <= l_shipdate + integer '90'
  and l_extendedprice <= 20000
  and c_name like '%r#00%'
  and c_acctbal <= 2400
group by n_name
order by revenue desc

The problem was that PostgreSQL 8.3 chose a plan with many Nested Loop joins, because the selectivity estimates for lineitem and customer were off by a wide margin. I think this was mainly due to the LIKE pattern matching. The optimal plan would have used Hash Joins.
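One way to see such a misestimate directly is to compare the planner's estimated row count with the actual one for the suspect predicates in isolation. A minimal sketch, assuming a standard TPC-H schema with the table and column names used in the query above:

```sql
-- EXPLAIN ANALYZE executes the query and prints, for every plan node,
-- the planner's estimated "rows=" next to the actual "rows=".
EXPLAIN ANALYZE
SELECT *
FROM   customer
WHERE  c_name LIKE '%r#00%'
AND    c_acctbal <= 2400;
```

A large gap between the two numbers on the scan node for customer would point at the filter's selectivity estimate rather than at the join order.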

I recently upgraded to PostgreSQL 9.3 for my project and noticed that the above query no longer produces a bad plan. I have spent some time trying to find a query with a large cardinality estimation error on TPC-H 1GB, without success so far. Does any PostgreSQL geek know an off-the-shelf TPC-H query, or any other query, that demonstrates a cardinality estimation error in PostgreSQL 9.3?

sql postgresql
Jul 28 '14 at 18:53
1 answer

This is in response to the comment from @Twelfth as well as the question itself.

Three quotes from this chapter in the manual:
"Controlling the Planner with Explicit JOIN Clauses"

Explicit inner join syntax (INNER JOIN, CROSS JOIN, or unadorned JOIN) is semantically the same as listing the input relations in FROM, so it does not constrain the join order.

...

To force the planner to follow the join order laid out by explicit JOINs, set the join_collapse_limit run-time parameter to 1. (Other possible values are discussed below.)

...

Constraining the planner's search in this way is a useful technique both for reducing planning time and for directing the planner to a good query plan.

Emphasis mine. Conversely, you can abuse the same technique to direct the query planner to a poor query plan for testing purposes. Read the whole manual page. It should be helpful.
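A minimal sketch of that abuse, assuming the TPC-H tables from the question. The join order shown is an arbitrary choice for illustration, not one claimed to be the worst possible:

```sql
BEGIN;

-- With join_collapse_limit = 1 the planner keeps the join order
-- exactly as written in the explicit JOIN clauses below.
-- SET LOCAL reverts automatically at the end of the transaction.
SET LOCAL join_collapse_limit = 1;

EXPLAIN ANALYZE
SELECT n_name, sum(l_extendedprice * (1 - l_discount)) AS revenue
FROM   lineitem
JOIN   supplier ON l_suppkey = s_suppkey
JOIN   orders   ON l_orderkey = o_orderkey
JOIN   customer ON c_custkey = o_custkey
               AND c_nationkey = s_nationkey
JOIN   nation   ON s_nationkey = n_nationkey
JOIN   region   ON n_regionkey = r_regionkey
WHERE  r_name IN ('ASIA', 'AFRICA')
GROUP  BY n_name
ORDER  BY revenue DESC;

ROLLBACK;
```

Note that EXPLAIN ANALYZE actually executes the query, which can take a while with a deliberately poor join order. Plain EXPLAIN shows the chosen plan without running it.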

In addition, you can force nested loops by disabling alternative methods one by one (best done in your session only). Like:

 SET enable_hashjoin = off; 

Etc.
About checking and setting parameters:

  • Query a parameter (postgresql.conf setting) like "max_connections"
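As a sketch of how disabling join methods and checking the result might look within a single session (these settings affect only the current session and are undone with RESET):

```sql
-- Discourage the alternatives so the planner falls back to nested loops.
SET enable_hashjoin  = off;
SET enable_mergejoin = off;

-- Check a single parameter ...
SHOW enable_hashjoin;

-- ... or several at once, including where each value came from.
SELECT name, setting, source
FROM   pg_settings
WHERE  name IN ('enable_hashjoin', 'enable_mergejoin', 'enable_nestloop');

-- Restore the defaults when done.
RESET enable_hashjoin;
RESET enable_mergejoin;
```

Setting one of these parameters to off does not forbid the join method outright. It makes the planner treat it as very expensive, so it is still used when there is no alternative.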

Force actual estimation errors

One obvious way would be to disable autovacuum and add or remove rows from the table. The query planner then works with outdated statistics. Note that some other commands update statistics as well.
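A minimal sketch of the stale-statistics approach, assuming the TPC-H customer table. The scratch table name customer_stale and the choice of which rows to delete are made up for illustration:

```sql
-- Scratch copy with autovacuum (and thereby autoanalyze) disabled
-- for this table only, so the real table stays untouched.
CREATE TABLE customer_stale
WITH (autovacuum_enabled = false) AS
SELECT * FROM customer;

-- Gather statistics once, while the table is still full.
ANALYZE customer_stale;

-- Remove most rows; the statistics are not refreshed.
DELETE FROM customer_stale WHERE c_custkey % 10 <> 0;

-- The estimated row count should still reflect the old table size,
-- while the actual row count is much smaller.
EXPLAIN ANALYZE SELECT * FROM customer_stale;

DROP TABLE customer_stale;
```

Deleting rows is used here rather than inserting them because the planner also looks at the current physical size of the table, so growth from inserts is partly compensated for even with stale statistics.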

Statistics are stored in the catalog tables pg_class and pg_statistic.

 SELECT * FROM pg_class WHERE oid = 'mytable'::regclass;
 SELECT * FROM pg_statistic WHERE starelid = 'mytable'::regclass;
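The raw rows in pg_statistic are hard to read. A sketch of a more readable look at the same information, using the pg_stats view and assuming the TPC-H customer table lives in the public schema:

```sql
-- Table-level estimates: row count and number of pages.
SELECT relname, reltuples, relpages
FROM   pg_class
WHERE  oid = 'customer'::regclass;

-- Column-level statistics for the columns filtered in the question.
SELECT attname, null_frac, n_distinct, most_common_vals, histogram_bounds
FROM   pg_stats
WHERE  schemaname = 'public'     -- assumption: default schema
AND    tablename  = 'customer'
AND    attname IN ('c_name', 'c_acctbal');
```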

This leads me to another option. You can forge entries in these two tables. Superuser privileges are required.
You don't strike me as a newbie, but a warning to the general public: if you break something in the catalog tables, your database (cluster) might go belly-up. You have been warned.
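A cautious sketch of forging a table-level statistic, run as superuser on a throwaway copy inside a transaction that is rolled back. The table name customer_fake and the forged value are made up for illustration, and the exact estimate the planner reports may differ between versions:

```sql
BEGIN;

-- Work on a scratch copy, never on a table you care about.
CREATE TABLE customer_fake AS SELECT * FROM customer;
ANALYZE customer_fake;

-- Forge the row count the planner bases its estimates on.
UPDATE pg_class
SET    reltuples = 10
WHERE  oid = 'customer_fake'::regclass;

-- The estimated row count should now be far below the real one.
EXPLAIN SELECT * FROM customer_fake;

-- Undo everything: the forged entry and the scratch table.
ROLLBACK;
```

Only reltuples is changed here while relpages is left alone, since the planner derives its estimate from the ratio of the two combined with the table's current size.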

Jul 28 '14 at 19:36