To put it another way: given a (complex) query with JOINs, SUBSELECTs, and UNIONs, is it possible to reduce it to a simpler, equivalent SQL statement that produces the same result by applying some transformation rules?
That's exactly what optimizers do for a living (not that I'm saying they always do it well).
Since SQL is a set-based language, there is usually more than one way to transform one query into another.
For example, this query:
SELECT * FROM mytable WHERE col1 > @value1 OR col2 < @value2
can be transformed into this:
SELECT * FROM mytable WHERE col1 > @value1 UNION SELECT * FROM mytable WHERE col2 < @value2
or this:
SELECT mo.* FROM ( SELECT id FROM mytable WHERE col1 > @value1 UNION SELECT id FROM mytable WHERE col2 < @value2 ) mi JOIN mytable mo ON mo.id = mi.id
which looks uglier but can yield better execution plans.
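A quick way to convince yourself that these three forms are interchangeable is to run them side by side. The sketch below does this in SQLite through Python's sqlite3 module on randomly generated rows. The table layout and the switch from @value parameters to :value placeholders are assumptions of the sketch. It only checks that the results match and says nothing about which plan a given optimizer picks. Note that the UNION forms rely on id being unique: UNION removes duplicate rows, so without a key the results could differ.

```python
import random
import sqlite3

# Hypothetical stand-in for "mytable"; the id primary key is what makes
# the UNION forms return exactly the same rows as the OR form.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, col1 INTEGER, col2 INTEGER)")
rng = random.Random(42)
conn.executemany(
    "INSERT INTO mytable (id, col1, col2) VALUES (?, ?, ?)",
    [(i, rng.randint(0, 100), rng.randint(0, 100)) for i in range(1, 501)],
)

QUERIES = {
    "or": "SELECT * FROM mytable WHERE col1 > :value1 OR col2 < :value2",
    "union": """
        SELECT * FROM mytable WHERE col1 > :value1
        UNION
        SELECT * FROM mytable WHERE col2 < :value2""",
    "union_join": """
        SELECT mo.*
        FROM (
            SELECT id FROM mytable WHERE col1 > :value1
            UNION
            SELECT id FROM mytable WHERE col2 < :value2
        ) mi
        JOIN mytable mo ON mo.id = mi.id""",
}

def run(name, value1, value2):
    """Run one variant and return its rows sorted, so results compare as sets."""
    rows = conn.execute(QUERIES[name], {"value1": value1, "value2": value2}).fetchall()
    return sorted(rows)

for v1, v2 in [(50, 50), (90, 10), (0, 100)]:
    assert run("or", v1, v2) == run("union", v1, v2) == run("union_join", v1, v2)
print("all three forms agree")
```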
One of the most common transformations is replacing this query:
SELECT * FROM mytable WHERE col IN ( SELECT othercol FROM othertable )
with this:
SELECT * FROM mytable mo WHERE EXISTS ( SELECT NULL FROM othertable o WHERE o.othercol = mo.col )
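The IN-to-EXISTS rewrite can be checked the same way. In this sketch (SQLite via Python's sqlite3, made-up data), othertable deliberately contains a duplicate value and a NULL. Neither form duplicates rows of mytable because of the duplicate, and neither matches a row whose col is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, col INTEGER);
    CREATE TABLE othertable (othercol INTEGER);
    INSERT INTO mytable (id, col) VALUES (1, 10), (2, 20), (3, 30), (4, NULL), (5, 20);
    -- a duplicate (20) and a NULL on the subquery side
    INSERT INTO othertable (othercol) VALUES (20), (20), (30), (NULL), (40);
""")

# Semi-join written with IN.
in_rows = conn.execute(
    "SELECT * FROM mytable WHERE col IN (SELECT othercol FROM othertable) ORDER BY id"
).fetchall()

# The same semi-join written with a correlated EXISTS.
exists_rows = conn.execute("""
    SELECT * FROM mytable mo
    WHERE EXISTS (SELECT NULL FROM othertable o WHERE o.othercol = mo.col)
    ORDER BY id
""").fetchall()

assert in_rows == exists_rows
print(exists_rows)  # [(2, 20), (3, 30), (5, 20)]
```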
In some RDBMSs (PostgreSQL, for example), DISTINCT and GROUP BY use different execution plans, so it is sometimes better to replace one with the other:
SELECT mo.grouper, ( SELECT SUM(col) FROM mytable mi WHERE mi.grouper = mo.grouper ) FROM ( SELECT DISTINCT grouper FROM mytable ) mo
vs.
SELECT grouper, SUM(col) FROM mytable GROUP BY grouper
In PostgreSQL, DISTINCT sorts while GROUP BY hashes (this holds for versions before 8.4, which added hashing for DISTINCT).
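Both forms return one row per distinct grouper along with its sum, provided grouper is never NULL. With a NULL grouper, the correlated subquery's mi.grouper = mo.grouper never matches, so that group's sum would come back NULL. The sketch below checks the equivalence in SQLite through Python's sqlite3 on a few hypothetical rows; SQLite is only used to show that the results agree and says nothing about PostgreSQL's plans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (grouper TEXT, col INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("a", 1), ("a", 2), ("b", 5), ("c", 7), ("c", 3), ("c", 10)],
)

# DISTINCT to get the groups, then a correlated subquery per group.
distinct_form = conn.execute("""
    SELECT mo.grouper,
           (SELECT SUM(col) FROM mytable mi WHERE mi.grouper = mo.grouper)
    FROM (SELECT DISTINCT grouper FROM mytable) mo
    ORDER BY mo.grouper
""").fetchall()

# Plain GROUP BY.
group_by_form = conn.execute(
    "SELECT grouper, SUM(col) FROM mytable GROUP BY grouper ORDER BY grouper"
).fetchall()

assert distinct_form == group_by_form == [("a", 3), ("b", 5), ("c", 20)]
```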
MySQL lacks FULL OUTER JOIN, so this query:
SELECT t1.col1, t2.col2 FROM table1 t1 FULL OUTER JOIN table2 t2 ON t1.id = t2.id
can be rewritten as follows:
SELECT t1.col1, t2.col2 FROM table1 t1 LEFT JOIN table2 t2 ON t1.id = t2.id UNION ALL SELECT NULL, t2.col2 FROM table1 t1 RIGHT JOIN table2 t2 ON t1.id = t2.id WHERE t1.id IS NULL
but see this blog post on how to do this more efficiently in MySQL:
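Here is the FULL OUTER JOIN emulation run against a tiny hypothetical data set in SQLite (via Python's sqlite3). One adjustment is an assumption of the sketch: the RIGHT JOIN branch is written as a LEFT JOIN with the tables swapped, because SQLite only gained RIGHT JOIN in version 3.39.0 and the build bundled with Python may be older. The result is compared against the rows a real FULL OUTER JOIN would return.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, col1 TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, col2 TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table2 VALUES (2, 'x'), (3, 'y'), (4, 'z');
""")

# Every table1 row (matched or not), plus the table2 rows with no partner
# in table1. The second branch is the RIGHT JOIN ... WHERE t1.id IS NULL
# from the text, expressed as a LEFT JOIN with the tables swapped.
emulated = conn.execute("""
    SELECT t1.col1, t2.col2
    FROM table1 t1 LEFT JOIN table2 t2 ON t1.id = t2.id
    UNION ALL
    SELECT NULL, t2.col2
    FROM table2 t2 LEFT JOIN table1 t1 ON t1.id = t2.id
    WHERE t1.id IS NULL
""").fetchall()

# What FULL OUTER JOIN would return; sorted by repr() because None and str
# cannot be compared directly.
expected = [("a", None), ("b", "x"), ("c", "y"), (None, "z")]
assert sorted(emulated, key=repr) == sorted(expected, key=repr)
```

UNION ALL is safe here because the two branches cannot overlap: the second one only returns table2 rows that found no partner in table1. Plain UNION would additionally collapse legitimate duplicate rows and cost an extra deduplication step.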
This hierarchical query in Oracle:
SELECT DISTINCT(animal_id) AS animal_id FROM animal START WITH animal_id = :id CONNECT BY PRIOR animal_id IN (father, mother) ORDER BY animal_id
can be transformed into this:
SELECT DISTINCT(animal_id) AS animal_id FROM ( SELECT 0 AS gender, animal_id, father AS parent FROM animal UNION ALL SELECT 1, animal_id, mother FROM animal ) START WITH animal_id = :id CONNECT BY parent = PRIOR animal_id ORDER BY animal_id
the latter being more efficient.
See this article in my blog for details on the execution plans:
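SQLite has no CONNECT BY, so the sketch below uses a recursive CTE as a stand-in; this is a swapped-in technique, not Oracle syntax. Run through Python's sqlite3 on a hypothetical pedigree, it walks an animal's descendants twice: once with an IN (father, mother) join condition, and once over a UNION ALL view that unpivots father and mother into a single parent column so that each recursive step is a plain equality. It then checks that both return the same animals. The speed-up described above is Oracle-specific and is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE animal (animal_id INTEGER PRIMARY KEY, father INTEGER, mother INTEGER);
    INSERT INTO animal VALUES
        (1, NULL, NULL), (2, NULL, NULL), (3, 1, 2), (4, 3, NULL),
        (5, NULL, 3), (6, 4, 5), (7, 2, NULL);
""")

# Form 1: the recursive step joins on an IN over two columns, like
# CONNECT BY PRIOR animal_id IN (father, mother).
IN_FORM = """
    WITH RECURSIVE d(animal_id) AS (
        SELECT animal_id FROM animal WHERE animal_id = :id
        UNION
        SELECT a.animal_id
        FROM animal a JOIN d ON d.animal_id IN (a.father, a.mother)
    )
    SELECT animal_id FROM d ORDER BY animal_id
"""

# Form 2: father and mother are unpivoted into one "parent" column first,
# so the recursive step is parent = prior animal_id.
UNPIVOT_FORM = """
    WITH RECURSIVE p(animal_id, parent) AS (
        SELECT animal_id, father FROM animal
        UNION ALL
        SELECT animal_id, mother FROM animal
    ),
    d(animal_id) AS (
        SELECT animal_id FROM animal WHERE animal_id = :id
        UNION
        SELECT p.animal_id FROM p JOIN d ON p.parent = d.animal_id
    )
    SELECT animal_id FROM d ORDER BY animal_id
"""

def descendants(sql, animal_id):
    """Return the animal itself plus all of its descendants, sorted."""
    return [row[0] for row in conn.execute(sql, {"id": animal_id})]

for start in range(1, 8):
    assert descendants(IN_FORM, start) == descendants(UNPIVOT_FORM, start)
print(descendants(UNPIVOT_FORM, 3))  # [3, 4, 5, 6]
```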
To find all ranges that overlap a given range, you can use the following query:
SELECT * FROM ranges WHERE end_date >= @start AND start_date <= @end
but in SQL Server this more complex query gives the same results faster:
SELECT * FROM ranges WHERE (start_date > @start AND start_date <= @end) OR (@start BETWEEN start_date AND end_date)
and, believe it or not, I have an article on this in my blog too:
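The two predicates are equivalent only under two assumptions that are left implicit above: every stored range has start_date <= end_date, and the query range has @start <= @end. Under those assumptions, a range overlaps [@start, @end] either because it starts inside it (first branch) or because it starts at or before @start and has not ended by then (second branch). The sketch below checks the equivalence on random integer "dates" in SQLite via Python's sqlite3; the :range_start and :range_end placeholder names are assumptions of the sketch.

```python
import random
import sqlite3

# Hypothetical "ranges" table with integer day numbers as dates.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ranges (id INTEGER PRIMARY KEY, start_date INTEGER, end_date INTEGER)"
)
rng = random.Random(7)
rows = []
for i in range(1, 1001):
    start = rng.randint(0, 365)
    rows.append((i, start, start + rng.randint(0, 30)))  # keeps start_date <= end_date
conn.executemany("INSERT INTO ranges VALUES (?, ?, ?)", rows)

# The straightforward overlap test.
SIMPLE = """
    SELECT id FROM ranges
    WHERE end_date >= :range_start AND start_date <= :range_end
    ORDER BY id
"""

# The split form: starts inside the range, or covers its start point.
SPLIT = """
    SELECT id FROM ranges
    WHERE (start_date > :range_start AND start_date <= :range_end)
       OR (:range_start BETWEEN start_date AND end_date)
    ORDER BY id
"""

def overlapping(sql, range_start, range_end):
    """Return the ids of stored ranges matched by the given predicate."""
    params = {"range_start": range_start, "range_end": range_end}
    return [row[0] for row in conn.execute(sql, params)]

for _ in range(200):
    s = rng.randint(0, 400)
    e = s + rng.randint(0, 60)  # the query range must also satisfy start <= end
    assert overlapping(SIMPLE, s, e) == overlapping(SPLIT, s, e)
```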
SQL Server also lacks an efficient way to compute cumulative aggregates, so this query:
SELECT mi.id, SUM(mo.value) AS running_sum FROM mytable mi JOIN mytable mo ON mo.id <= mi.id GROUP BY mi.id
can be rewritten more efficiently using, Lord help me, cursors (you heard me right: cursors, more efficiently, and SQL Server in one sentence).
Check out this blog post on how to do this:
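The reason a cursor can win is visible even outside SQL Server. The self-join touches every pair (mi, mo) with mo.id <= mi.id, roughly n²/2 rows, while a cursor reads the rows once in id order and carries the running total along. The sketch below is not the T-SQL cursor from the article: it is a Python loop over an SQLite cursor (via sqlite3) on hypothetical data, used only to show that a single ordered pass yields the same running sums as the self-join.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, value INTEGER)")
rng = random.Random(1)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(i, rng.randint(-50, 50)) for i in range(1, 201)],
)

# Set-based form from the text: a triangular self-join, O(n^2) row pairs.
self_join = conn.execute("""
    SELECT mi.id, SUM(mo.value) AS running_sum
    FROM mytable mi JOIN mytable mo ON mo.id <= mi.id
    GROUP BY mi.id
    ORDER BY mi.id
""").fetchall()

def running_sum_with_cursor(connection):
    """One ordered pass over the rows, carrying the total from row to row."""
    result = []
    total = 0
    for row_id, value in connection.execute("SELECT id, value FROM mytable ORDER BY id"):
        total += value
        result.append((row_id, total))
    return result

assert self_join == running_sum_with_cursor(conn)
```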
There is a certain kind of query, commonly found in financial applications, that looks up the effective rate for a currency, like this one in Oracle:
SELECT TO_CHAR(SUM(xac_amount * rte_rate), 'FM999G999G999G999G999G999D999999')
FROM t_transaction x
JOIN t_rate r
  ON (rte_currency, rte_date) IN
  (
    SELECT xac_currency, MAX(rte_date)
    FROM t_rate
    WHERE rte_currency = xac_currency
      AND rte_date <= xac_date
  )
This query can be heavily rewritten to use an equality condition, which allows a HASH JOIN instead of NESTED LOOPS:
WITH v_rate AS
(
  SELECT cur_id AS eff_currency, dte_date AS eff_date, rte_rate AS eff_rate
  FROM (
    SELECT cur_id, dte_date,
      (
        SELECT MAX(rte_date)
        FROM t_rate ri
        WHERE rte_currency = cur_id
          AND rte_date <= dte_date
      ) AS rte_effdate
    FROM (
      SELECT (SELECT MAX(rte_date) FROM t_rate) - level + 1 AS dte_date
      FROM dual
      CONNECT BY level <= (SELECT MAX(rte_date) - MIN(rte_date) FROM t_rate)
    ) v_date,
    (
      SELECT 1 AS cur_id FROM dual
      UNION ALL
      SELECT 2 AS cur_id FROM dual
    ) v_currency
  ) v_eff
  LEFT JOIN t_rate
    ON rte_currency = cur_id
   AND rte_date = rte_effdate
)
SELECT TO_CHAR(SUM(xac_amount * eff_rate), 'FM999G999G999G999G999G999D999999')
FROM (
  SELECT xac_currency, TRUNC(xac_date) AS xac_date, SUM(xac_amount) AS xac_amount, COUNT(*) AS cnt
  FROM t_transaction x
  GROUP BY xac_currency, TRUNC(xac_date)
)
JOIN v_rate
  ON eff_currency = xac_currency
 AND eff_date = xac_date
Despite being bulky, the latter query is 6 times as fast.
The main idea here is replacing <= with =, which requires building an in-memory calendar table to JOIN against.
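The calendar idea can be sketched on a small scale in SQLite via Python's sqlite3. Several choices here are assumptions of the sketch and not part of the Oracle query above. Dates are plain integers (day numbers), and amounts and rates are integers so the sums are exact. The calendar comes from a recursive CTE instead of CONNECT BY level, and the currency list is derived from t_rate instead of being hard-coded. The inequality form looks up the latest rate for every transaction. The calendar form resolves <= once per currency and day, then joins the pre-aggregated transactions to that table on plain equality.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_rate (rte_currency INTEGER, rte_date INTEGER, rte_rate INTEGER);
    CREATE TABLE t_transaction (xac_currency INTEGER, xac_date INTEGER, xac_amount INTEGER);
    INSERT INTO t_rate VALUES (1, 1, 100), (1, 5, 110), (1, 10, 120), (2, 1, 50), (2, 7, 55);
    INSERT INTO t_transaction VALUES
        (1, 3, 10), (1, 3, 1), (1, 5, 2), (1, 12, 1), (2, 6, 4), (2, 7, 3), (2, 7, 5);
""")

# Inequality form: for each transaction, the latest rate dated <= the
# transaction date (scalar MAX instead of a row-value IN).
INEQUALITY = """
    SELECT SUM(x.xac_amount * r.rte_rate)
    FROM t_transaction x
    JOIN t_rate r
      ON r.rte_currency = x.xac_currency
     AND r.rte_date = (
            SELECT MAX(ri.rte_date) FROM t_rate ri
            WHERE ri.rte_currency = x.xac_currency AND ri.rte_date <= x.xac_date
         )
"""

# Calendar form: one effective rate per (currency, day), then an equi-join.
CALENDAR = """
    WITH RECURSIVE calendar(dte_date) AS (
        SELECT MIN(rte_date) FROM t_rate
        UNION ALL
        SELECT dte_date + 1 FROM calendar
        WHERE dte_date < (SELECT MAX(xac_date) FROM t_transaction)
    ),
    v_currency(cur_id) AS (
        SELECT DISTINCT rte_currency FROM t_rate
    ),
    v_rate(eff_currency, eff_date, eff_rate) AS (
        SELECT c.cur_id, cal.dte_date, r.rte_rate
        FROM calendar cal
        CROSS JOIN v_currency c
        JOIN t_rate r
          ON r.rte_currency = c.cur_id
         AND r.rte_date = (
                SELECT MAX(ri.rte_date) FROM t_rate ri
                WHERE ri.rte_currency = c.cur_id AND ri.rte_date <= cal.dte_date
             )
    )
    SELECT SUM(x.xac_amount * v.eff_rate)
    FROM (
        SELECT xac_currency, xac_date, SUM(xac_amount) AS xac_amount
        FROM t_transaction
        GROUP BY xac_currency, xac_date
    ) x
    JOIN v_rate v
      ON v.eff_currency = x.xac_currency
     AND v.eff_date = x.xac_date
"""

def total(sql):
    """Return the single SUM produced by the query."""
    return conn.execute(sql).fetchone()[0]

assert total(INEQUALITY) == total(CALENDAR)
print(total(CALENDAR))  # 2080
```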