Advanced indexing using OR-ed conditions (pgsql)

I'm starting to get a better understanding of PostgreSQL indexing, but I've run into a problem with OR-ed conditions: I don't know how to optimize my indexes so that such a query runs faster.

I have 6 conditions that each seem fast when run individually. Here are abbreviated examples of those queries, including the timings from the query plan.

(NOTE: I have left out the full query plans for the queries below to reduce complexity, but they all use nested loop left joins and index scans, as I would expect with proper indexing. If necessary, I can include the query plans for a more meaningful answer.)

    EXPLAIN ANALYZE
    SELECT t1.*, t2.*, t3.*
    FROM t1
    LEFT JOIN t2 ON t2.id = t1.t2_id
    LEFT JOIN t3 ON t3.id = t1.t3_id
    WHERE (conditions1)
    LIMIT 10;

    QUERY PLAN
    -------------------------------------------------------------------------------------
    Limit (cost=0.25..46.69 rows=1 width=171) (actual time=0.031..0.031 rows=0 loops=1)

    EXPLAIN ANALYZE
    SELECT t1.*, t2.*, t3.*
    FROM t1
    LEFT JOIN t2 ON t2.id = t1.t2_id
    LEFT JOIN t3 ON t3.id = t1.t3_id
    WHERE (conditions2)
    LIMIT 10;

    QUERY PLAN
    -------------------------------------------------------------------------------------
    Limit (cost=0.76..18.97 rows=1 width=171) (actual time=14.764..14.764 rows=0 loops=1)

    /* snip */

    EXPLAIN ANALYZE
    SELECT t1.*, t2.*, t3.*
    FROM t1
    LEFT JOIN t2 ON t2.id = t1.t2_id
    LEFT JOIN t3 ON t3.id = t1.t3_id
    WHERE (conditions6)
    LIMIT 10;

    QUERY PLAN
    -------------------------------------------------------------------------------------
    Limit (cost=0.51..24.48 rows=1 width=171) (actual time=0.252..5.332 rows=10 loops=1)

My problem is that I want to join these 6 conditions together with OR operators, so that a row matching any one of them qualifies. My combined query looks something like this:

    EXPLAIN ANALYZE
    SELECT t1.*, t2.*, t3.*
    FROM t1
    LEFT JOIN t2 ON t2.id = t1.t2_id
    LEFT JOIN t3 ON t3.id = t1.t3_id
    WHERE (conditions1 OR conditions2 OR conditions3 OR conditions4 OR conditions5 OR conditions6)
    LIMIT 10;

Unfortunately, this leads to a MASSIVE increase in the cost of the query plan, which no longer uses my indexes (instead, it chooses a hash left join rather than a nested loop left join, and performs various sequential scans in place of the index scans it used before).

 Limit (cost=142.62..510755.78 rows=1 width=171) (actual time=30.591..30.986 rows=10 loops=1) 

Is there anything special I need to know about indexing regarding OR-ed conditions that will improve my final query?

UPDATE: If I UNION the individual SELECTs, this does seem to speed up the query. However, would this prevent me from ordering my results if I want to in the future? Here is what I did to speed up the query with UNION:

    EXPLAIN ANALYZE
    SELECT t1.*, t2.*, t3.*
    FROM t1
    LEFT JOIN t2 ON t2.id = t1.t2_id
    LEFT JOIN t3 ON t3.id = t1.t3_id
    WHERE (conditions1)
    UNION
    SELECT t1.*, t2.*, t3.*
    FROM t1
    LEFT JOIN t2 ON t2.id = t1.t2_id
    LEFT JOIN t3 ON t3.id = t1.t3_id
    WHERE (conditions2)
    UNION
    SELECT t1.*, t2.*, t3.*
    FROM t1
    LEFT JOIN t2 ON t2.id = t1.t2_id
    LEFT JOIN t3 ON t3.id = t1.t3_id
    WHERE (conditions3)
    UNION
    SELECT t1.*, t2.*, t3.*
    FROM t1
    LEFT JOIN t2 ON t2.id = t1.t2_id
    LEFT JOIN t3 ON t3.id = t1.t3_id
    WHERE (conditions4)
    UNION
    SELECT t1.*, t2.*, t3.*
    FROM t1
    LEFT JOIN t2 ON t2.id = t1.t2_id
    LEFT JOIN t3 ON t3.id = t1.t3_id
    WHERE (conditions5)
    UNION
    SELECT t1.*, t2.*, t3.*
    FROM t1
    LEFT JOIN t2 ON t2.id = t1.t2_id
    LEFT JOIN t3 ON t3.id = t1.t3_id
    WHERE (conditions6)
    LIMIT 10;

    QUERY PLAN
    -------------------------------------------------------------------------------------
    Limit (cost=219.14..219.49 rows=6 width=171) (actual time=125.579..125.653 rows=10 loops=1)
2 answers

Depending on the conditions, it may be logically impossible to use any index to help a complex condition built from OR expressions.

Like MySQL, PostgreSQL 8.0 and earlier state in their documentation on indexes:

Note that a query or data manipulation command can use at most one index per table.

With PostgreSQL 8.1, this changed: the planner can now combine several indexes on the same table using bitmap scans.
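
As a rough sketch of what that change makes possible: with one index per OR-ed column, the planner may combine them with a bitmap OR. The columns `a` and `b` and the index names below are hypothetical stand-ins for whatever your conditions actually test, and whether the planner picks this plan depends on its row estimates.

```sql
-- Hypothetical columns a and b on t1; substitute the columns your conditions use.
CREATE INDEX t1_a_idx ON t1 (a);
CREATE INDEX t1_b_idx ON t1 (b);
ANALYZE t1;  -- refresh planner statistics after creating the indexes

EXPLAIN
SELECT * FROM t1 WHERE a = 1 OR b = 2;
-- If both indexes are used, the plan shows a BitmapOr node fed by two
-- Bitmap Index Scan nodes, underneath a Bitmap Heap Scan on t1.
```

Note that a bitmap OR combines indexes of a single table only, so it cannot help when the OR-ed conditions refer to columns of different joined tables.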

However, if this does not help, you can use the UNION solution that you tried (a common workaround for MySQL users, since MySQL still has a limit of one index per table).

You should be able to order the results of a UNION query, but you should use parentheses to indicate that ORDER BY applies to the UNION result, not just the last subquery in the chain.

    (SELECT ... )
    UNION
    (SELECT ... )
    UNION
    (SELECT ... )
    ORDER BY columnname;

I hope this helps; I am not an expert in the PostgreSQL optimizer. You can try searching the mailing list archives or asking on the IRC channel.


(Sorry, I do not know how to reply to an answer, so this is posted at the top level.)

To clarify: PG used to use only one index per table scan. If you had a query joining three tables, and each of them had a useful index, it was always smart enough to use all three.

In your particular case, what is probably happening is that there is some kind of correlation between the ORed conditions. PostgreSQL does not know this, and so it ends up estimating that the query will match more rows than it actually does. Enough rows to tip it into a different query plan.

Also, your UNIONed query is not quite the same as the separate ones, since in the separate queries you LIMIT each small result individually, whereas here the LIMIT applies to the whole UNIONed result set.
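
One way to check for this kind of misestimate, sketched here with hypothetical columns `a` and `b` in place of the real conditions, is to compare the planner's estimated row count with the actual one in the EXPLAIN ANALYZE output. Collecting more detailed per-column statistics is one thing to try, though it is not guaranteed to capture correlation between columns.

```sql
-- Hypothetical columns a and b; substitute your real OR-ed conditions.
EXPLAIN ANALYZE
SELECT * FROM t1 WHERE a = 1 OR b = 2;
-- In each plan node, compare rows= in the (cost=...) part (the estimate)
-- with rows= in the (actual time=...) part (what really happened).

-- Optionally collect more detailed statistics for a column, then re-analyze.
ALTER TABLE t1 ALTER COLUMN a SET STATISTICS 500;
ANALYZE t1;
```

If the estimated and actual row counts differ by a large factor, that is a hint that the plan change is driven by the estimate rather than by the indexes themselves.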
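
To illustrate the difference, here is a sketch with a LIMIT on each branch as well as on the combined result. The columns `a` and `b` are hypothetical, and only `t1` is queried to keep the example short. Parentheses are needed so that each inner ORDER BY and LIMIT binds to its own branch.

```sql
-- Hypothetical columns a and b stand in for conditions1 and conditions2.
(SELECT t1.* FROM t1 WHERE a = 1 ORDER BY id LIMIT 10)
UNION
(SELECT t1.* FROM t1 WHERE b = 2 ORDER BY id LIMIT 10)
ORDER BY id   -- applies to the combined result
LIMIT 10;     -- applies to the combined result
```

Because every branch is ordered by the same key before being limited, the first 10 rows of the combined result are the same as they would be without the inner limits, while each branch can stop early.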

"You should be able to order the results of a UNION query, but you should use parentheses to indicate that ORDER BY applies to the UNION result, not just the last subquery in the chain."

This is not true: ORDER BY applies to the whole UNION result, even without parentheses.
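
A minimal query that shows this, using literal values rather than real tables:

```sql
-- No parentheses: ORDER BY still sorts the combined result.
SELECT 2 AS n
UNION
SELECT 3
UNION
SELECT 1
ORDER BY n;
-- returns the rows 1, 2, 3 in that order
```

Parentheses are only needed when you want an ORDER BY or LIMIT to apply to one individual branch instead of the whole result.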

HTH
