SQL: comparing tuples

In my current application, I need to run a query of this kind:

    SELECT MIN((colA, colB, colC))
    FROM mytable
    WHERE (colA, colB, colC) BETWEEN (200, 'B', 'C') AND (1000, 'E', 'F')

and get the answer (333, 'B', 'B'), given this data:

    +------+------+------+
    | colA | colB | colC |
    +------+------+------+
    |   99 | A    | A    |
    |  200 | A    | Z    |
    |  200 | B    | B    |
    |  333 | B    | B    |
    |  333 | C    | D    |
    |  333 | C    | E    |
    |  333 | D    | C    |
    | 1000 | E    | G    |
    | 1000 | F    | A    |
    +------+------+------+
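To pin down the expected result outside SQL, here is a small Python sketch. Python tuples also compare lexicographically, so the intended query is just a range filter followed by `min()`. `ROWS` and `min_in_range` are names made up for this illustration, and the range is taken as inclusive at both ends, like SQL's BETWEEN.

```python
# The toy data from the question, one tuple per row.
ROWS = [
    (99, "A", "A"), (200, "A", "Z"), (200, "B", "B"),
    (333, "B", "B"), (333, "C", "D"), (333, "C", "E"),
    (333, "D", "C"), (1000, "E", "G"), (1000, "F", "A"),
]

def min_in_range(rows, low, high):
    """Smallest row r with low <= r <= high, or None if no row qualifies."""
    candidates = [r for r in rows if low <= r <= high]
    return min(candidates) if candidates else None

answer = min_in_range(ROWS, (200, "B", "C"), (1000, "E", "F"))
print(answer)  # (333, 'B', 'B')
```

Note that (200, 'B', 'B') is excluded because it sorts before the lower bound (200, 'B', 'C').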

What is the most efficient way to do this in real SQL? Keep in mind that this is a toy example: my actual application has tables with various columns and data types and hundreds of millions of rows. I use MySQL, if that helps. You may also assume that these columns are covered by a PRIMARY or UNIQUE index.

If the solution extends easily to more or fewer columns, even better.


Comparison of tuples:

Several people asked, so I am adding this to the question. Tuples are ordered lexicographically: two tuples are ordered the same way as their first elements that differ. For example, (1, 2, x) < (1, 2, y) returns the same as x < y.
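The rule can be written out directly: scan for the first position where the two tuples differ and let that position alone decide. The sketch below (the name `lex_less` is hypothetical) implements exactly that and cross-checks it against Python's built-in tuple ordering, which is lexicographic.

```python
from itertools import product

def lex_less(x, y):
    """True if tuple x sorts strictly before tuple y (same length assumed).

    The first position where the tuples differ decides the order;
    equal tuples are not "less".
    """
    for a, b in zip(x, y):
        if a != b:
            return a < b
    return False

# Cross-check against Python's own tuple comparison on all small 3-tuples.
values = [1, 2, 3]
for x in product(values, repeat=3):
    for y in product(values, repeat=3):
        assert lex_less(x, y) == (x < y)

print(lex_less((1, 2, 5), (1, 2, 7)))  # True: decided by the third element
```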

It is worth noting that SQL (or at least MySQL) implements this correctly:

    mysql> select (200, 'B', 'C') < (333, 'B', 'B') and (333, 'B', 'B') < (1000, 'E', 'F');
    +--------------------------------------------------------------------------+
    | (200, 'B', 'C') < (333, 'B', 'B') and (333, 'B', 'B') < (1000, 'E', 'F') |
    +--------------------------------------------------------------------------+
    |                                                                        1 |
    +--------------------------------------------------------------------------+
    1 row in set (0.00 sec)

Here is the SQL to create the example:

    create table mytable select 333 colA, 'B' colB, 'B' colC;
    insert into mytable values
        (200, 'B', 'B'), (333, 'C', 'D'), (1000, 'E', 'G'),
        (200, 'A', 'Z'), (1000, 'F', 'A'), (333, 'C', 'E'),
        (333, 'D', 'C'), (99, 'A', 'A');
    alter table mytable add unique index myindex (colA, colB, colC);

Interestingly, adding this index makes the table come out sorted lexicographically. This does not match our production system.

2 answers

Just do:

    SELECT colA, colB, colC
    FROM mytable
    WHERE ('A', 'B', 'C') <= (colA, colB, colC)
      AND (colA, colB, colC) <= ('D', 'E', 'F')
    ORDER BY colA, colB, colC
    LIMIT 1;

It works, and I suspect it should be fairly fast too.
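Here is a runnable sketch of this approach. It uses SQLite through Python's standard `sqlite3` module as a stand-in for MySQL (an assumption: SQLite also supports row-value comparisons, from version 3.15 on), and it plugs in the question's numeric bounds rather than the answer's ('A', 'B', 'C') / ('D', 'E', 'F'). `make_table` and `first_in_range` are illustrative names.

```python
import sqlite3

# The toy data from the question.
ROWS = [
    (99, "A", "A"), (200, "A", "Z"), (200, "B", "B"),
    (333, "B", "B"), (333, "C", "D"), (333, "C", "E"),
    (333, "D", "C"), (1000, "E", "G"), (1000, "F", "A"),
]

def make_table():
    """In-memory SQLite stand-in for mytable, with the composite unique index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (colA INTEGER, colB TEXT, colC TEXT)")
    conn.execute("CREATE UNIQUE INDEX myindex ON mytable (colA, colB, colC)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)
    return conn

def first_in_range(conn, low, high):
    # Same shape as the answer: two row-value comparisons, ORDER BY, LIMIT 1.
    sql = """
        SELECT colA, colB, colC
        FROM mytable
        WHERE (?, ?, ?) <= (colA, colB, colC)
          AND (colA, colB, colC) <= (?, ?, ?)
        ORDER BY colA, colB, colC
        LIMIT 1
    """
    return conn.execute(sql, (*low, *high)).fetchone()

conn = make_table()
result = first_in_range(conn, (200, "B", "C"), (1000, "E", "F"))
print(result)  # (333, 'B', 'B')
```

Because ORDER BY matches the index column order, the engine can in principle read the index in order and stop at the first match; whether it actually does is up to each optimizer.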


This is equivalent, but may have better performance depending on your tables:

    SELECT m.colA, m.colB, m.colC
    FROM mytable m
    WHERE ('A', 'B', 'C') <= (m.colA, m.colB, m.colC)
      AND (m.colA, m.colB, m.colC) <= ('D', 'E', 'F')
      AND NOT EXISTS (
          SELECT 1
          FROM mytable b
          WHERE (b.colA, b.colB, b.colC) < (m.colA, m.colB, m.colC)
            AND ('A', 'B', 'C') <= (b.colA, b.colB, b.colC)
      );
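The NOT EXISTS form keeps a row m only if no other row is both inside the lower bound and strictly smaller than m. The upper bound does not need repeating inside the subquery, because b < m <= high already. Below is a sketch under the same assumptions as before (SQLite 3.15+ via `sqlite3` standing in for MySQL, question's numeric bounds; `min_via_not_exists` is a made-up name).

```python
import sqlite3

ROWS = [
    (99, "A", "A"), (200, "A", "Z"), (200, "B", "B"),
    (333, "B", "B"), (333, "C", "D"), (333, "C", "E"),
    (333, "D", "C"), (1000, "E", "G"), (1000, "F", "A"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (colA INTEGER, colB TEXT, colC TEXT)")
conn.execute("CREATE UNIQUE INDEX myindex ON mytable (colA, colB, colC)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)

# Keep m only if no row b is >= the lower bound and strictly smaller than m.
NOT_EXISTS_SQL = """
    SELECT m.colA, m.colB, m.colC
    FROM mytable m
    WHERE (?, ?, ?) <= (m.colA, m.colB, m.colC)
      AND (m.colA, m.colB, m.colC) <= (?, ?, ?)
      AND NOT EXISTS (
          SELECT 1
          FROM mytable b
          WHERE (b.colA, b.colB, b.colC) < (m.colA, m.colB, m.colC)
            AND (?, ?, ?) <= (b.colA, b.colB, b.colC)
      )
"""

def min_via_not_exists(conn, low, high):
    # Placeholders in order: lower bound, upper bound, lower bound again.
    return conn.execute(NOT_EXISTS_SQL, (*low, *high, *low)).fetchall()

rows = min_via_not_exists(conn, (200, "B", "C"), (1000, "E", "F"))
print(rows)  # [(333, 'B', 'B')]
```

No ORDER BY or LIMIT is needed: on a table with unique tuples, at most one row can survive the NOT EXISTS filter.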
Second answer:

--- EDIT --- (previous incorrect attempts removed)

The second attempt (not quite relational algebra).

This works, but only when all the fields are CHAR(1):

    SELECT colA, colB, colC
    FROM mytable
    WHERE CONCAT(colA, colB, colC) BETWEEN CONCAT('A', 'B', 'C') AND CONCAT('D', 'E', 'F')
    ORDER BY colA, colB, colC
    LIMIT 1;
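Why the CHAR(1) restriction matters: concatenation turns the tuple comparison into a single string comparison, and that only preserves the order when every field has a fixed width. The sketch below shows it failing on the question's own data (numbers of different lengths) and on variable-length strings. `concat_key` is a hypothetical helper, and Python compares strings by code point, which stands in for a binary collation; MySQL collations may differ (for example, case-insensitive ones), but the length problem is the same.

```python
def concat_key(row):
    """Rough stand-in for CONCAT(colA, colB, colC): one string per row."""
    return "".join(str(v) for v in row)

low, high = (200, "B", "C"), (1000, "E", "F")
row = (333, "B", "B")

tuple_says = low <= row <= high  # True: 200 < 333 < 1000 decides it
concat_says = concat_key(low) <= concat_key(row) <= concat_key(high)
print(tuple_says, concat_says)  # True False ('333BB' > '1000EF' as strings)

# Variable-length strings break it too, even without numbers.
a, b = ("A", "BC"), ("AB", "A")
print(a < b, concat_key(a) < concat_key(b))  # True False ('ABC' > 'ABA')

# With single-character fields, the concatenation keeps the tuple order.
x, y = ("A", "B", "C"), ("B", "A", "A")
print(x < y, concat_key(x) < concat_key(y))  # True True
```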

I thought that a view listing every pair of tuples from mytable in which the first is less than or equal to the second could be useful, since it can be reused for other comparisons:

    CREATE VIEW lessORequal AS
    SELECT a.colA AS smallA, a.colB AS smallB, a.colC AS smallC,
           b.colA AS largeA, b.colB AS largeB, b.colC AS largeC
    FROM mytable a
    JOIN mytable b
      ON (a.colA < b.colA)
      OR ((a.colA = b.colA)
          AND ((a.colB < b.colB)
               OR (a.colB = b.colB AND a.colC <= b.colC)));
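A quick way to see what this view describes: for n distinct tuples it contains every pair (a, b) with a <= b, which is n(n + 1)/2 rows, so its size grows quadratically with the table. The sketch below builds the same view in SQLite (a stand-in for MySQL, via Python's `sqlite3`) and shows one possible reuse, a row's 1-based rank computed as the number of rows at or below it. The rank query is an illustration of "other comparisons", not something from the answer.

```python
import sqlite3

ROWS = [
    (99, "A", "A"), (200, "A", "Z"), (200, "B", "B"),
    (333, "B", "B"), (333, "C", "D"), (333, "C", "E"),
    (333, "D", "C"), (1000, "E", "G"), (1000, "F", "A"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (colA INTEGER, colB TEXT, colC TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)

# The view from the answer, unchanged apart from layout.
conn.execute("""
    CREATE VIEW lessORequal AS
    SELECT a.colA AS smallA, a.colB AS smallB, a.colC AS smallC,
           b.colA AS largeA, b.colB AS largeB, b.colC AS largeC
    FROM mytable a
    JOIN mytable b
      ON (a.colA < b.colA)
      OR ((a.colA = b.colA)
          AND ((a.colB < b.colB)
               OR (a.colB = b.colB AND a.colC <= b.colC)))
""")

n = len(ROWS)
pairs = conn.execute("SELECT COUNT(*) FROM lessORequal").fetchone()[0]
print(n, pairs)  # 9 45, i.e. n * (n + 1) // 2

# One possible reuse: a row's 1-based rank is the number of rows <= it.
rank = conn.execute("""
    SELECT COUNT(*) FROM lessORequal
    WHERE largeA = ? AND largeB = ? AND largeC = ?
""", (333, "B", "B")).fetchone()[0]
print(rank)  # 4
```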

Using a similar technique, the following query solves the problem. It works with fields of any type (INT, FLOAT, CHAR of any length), although it gets awkward as you add more fields.

    SELECT colA, colB, colC
    FROM mytable m
    -- lower bound: (colA, colB, colC) >= ('A', 'B', 'C')
    WHERE (   ('A' < colA)
           OR (('A' = colA) AND (('B' < colB) OR ('B' = colB AND 'C' <= colC))))
    -- upper bound: (colA, colB, colC) <= ('D', 'E', 'F')
      AND (   (colA < 'D')
           OR ((colA = 'D') AND ((colB < 'E') OR (colB = 'E' AND colC <= 'F'))))
    ORDER BY colA, colB, colC
    LIMIT 1;
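The awkwardness of adding fields can be pushed into a small generator that writes the nested OR/AND predicate for any number of columns. This is a sketch, not the answer's method: `lex_predicate` and `lex_params` are hypothetical helpers, the column name goes on the left of each test (equivalent to the answer's constant-on-the-left form), values are bound as `?` parameters, and SQLite via `sqlite3` stands in for MySQL.

```python
import sqlite3

ROWS = [
    (99, "A", "A"), (200, "A", "Z"), (200, "B", "B"),
    (333, "B", "B"), (333, "C", "D"), (333, "C", "E"),
    (333, "D", "C"), (1000, "E", "G"), (1000, "F", "A"),
]

def lex_predicate(columns, op):
    """Expand a lexicographic row comparison into nested OR/AND tests.

    op is ">=" (row at or above the bound) or "<=" (at or below it).
    Every leading column gets a strict test plus an equality test that
    guards the next column; the last column allows equality. Example for
    ["a", "b"] and ">=":  (a > ? OR (a = ? AND (b >= ?)))
    Column names are pasted into the SQL, so they must be trusted identifiers.
    """
    if op not in (">=", "<="):
        raise ValueError("op must be '>=' or '<='")
    strict = op[0]  # ">" or "<"
    *leading, last = columns
    expr = f"{last} {op} ?"
    for col in reversed(leading):
        expr = f"{col} {strict} ? OR ({col} = ? AND ({expr}))"
    return f"({expr})"

def lex_params(bound):
    """Values in placeholder order: two per leading column, one for the last."""
    *leading, last = bound
    params = []
    for value in leading:
        params += [value, value]
    params.append(last)
    return params

cols = ["colA", "colB", "colC"]
low, high = (200, "B", "C"), (1000, "E", "F")
col_list = ", ".join(cols)
sql = (
    f"SELECT {col_list} FROM mytable"
    f" WHERE {lex_predicate(cols, '>=')} AND {lex_predicate(cols, '<=')}"
    f" ORDER BY {col_list} LIMIT 1"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (colA INTEGER, colB TEXT, colC TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)
result = conn.execute(sql, lex_params(low) + lex_params(high)).fetchone()
print(result)  # (333, 'B', 'B')
```

Adding a fourth column then only means extending `cols` and the bound tuples.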

You can also define a function:

    CREATE FUNCTION IslessORequalThan(
        lowA CHAR(1), lowB CHAR(1), lowC CHAR(1),
        highA CHAR(1), highB CHAR(1), highC CHAR(1)
    )
    RETURNS BOOLEAN
    -- DETERMINISTIC is required when binary logging is enabled,
    -- unless log_bin_trust_function_creators = 1
    DETERMINISTIC
    RETURN (lowA < highA)
        OR ((lowA = highA)
            AND ((lowB < highB)
                 OR ((lowB = highB) AND (lowC <= highC))));

and use it to solve the same or similar problems. The query below solves the problem again. It is elegant, but if you change the type or number of fields, you have to create a new function.

    SELECT colA, colB, colC
    FROM mytable
    WHERE IslessORequalThan('A', 'B', 'C', colA, colB, colC)
      AND IslessORequalThan(colA, colB, colC, 'D', 'E', 'F')
    ORDER BY colA, colB, colC
    LIMIT 1;
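MySQL stored functions have a fixed, typed parameter list, which is why a new function is needed for every shape. For contrast, SQLite lets a host-language function accept any number of arguments; the sketch below registers a Python one through `sqlite3`'s `create_function` with `narg=-1`. The name `is_less_or_equal` is made up, and this variadic trick is specific to SQLite/Python, not something a MySQL stored function can do.

```python
import sqlite3

ROWS = [
    (99, "A", "A"), (200, "A", "Z"), (200, "B", "B"),
    (333, "B", "B"), (333, "C", "D"), (333, "C", "E"),
    (333, "D", "C"), (1000, "E", "G"), (1000, "F", "A"),
]

def is_less_or_equal(*args):
    """Variadic analogue of IslessORequalThan: the first half of the
    arguments is the low tuple, the second half the high tuple."""
    if len(args) % 2:
        raise ValueError("expected an even number of arguments")
    half = len(args) // 2
    # Python compares tuples lexicographically; SQLite has no boolean
    # type, so return 0 or 1.
    return int(args[:half] <= args[half:])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (colA INTEGER, colB TEXT, colC TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)
# narg=-1 registers the function for any number of arguments.
conn.create_function("is_less_or_equal", -1, is_less_or_equal)

result = conn.execute("""
    SELECT colA, colB, colC
    FROM mytable
    WHERE is_less_or_equal(?, ?, ?, colA, colB, colC)
      AND is_less_or_equal(colA, colB, colC, ?, ?, ?)
    ORDER BY colA, colB, colC
    LIMIT 1
""", (200, "B", "C", 1000, "E", "F")).fetchone()
print(result)  # (333, 'B', 'B')
```

One trade-off applies to both versions: a function call wrapped around the columns is opaque to the query planner, so it cannot be turned into an index range and has to be evaluated row by row.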

Until now, because the condition

    (colA, colB, colC) BETWEEN ('A', 'B', 'C') AND ('D', 'E', 'F')

is not allowed in MySQL, I had assumed that

    ('A', 'B', 'C') <= (colA, colB, colC)

was not allowed either. I was wrong.

