From a set of values, how do I find the values not stored in a table column?

I have a table in which hundreds of thousands of integers will potentially be stored:

    desc id_key_table;
    +----------------+--------------+------+-----+---------+-------+
    | Field          | Type         | Null | Key | Default | Extra |
    +----------------+--------------+------+-----+---------+-------+
    | id_key         | int(16)      | NO   | PRI | NULL    |       |
    +----------------+--------------+------+-----+---------+-------+

In my program, I have a large set of integers. I would like to see which of these integers are NOT in the id_key column above.

So far I have come up with the following approaches:

1) Loop over each integer and run:

 select count(*) count from id_key_table where id_key = :id_key 

When count is 0, id_key is not in the table.

It seems like a terrible, terrible way to do it.


2) Create a temporary table, insert each of the values into it, and JOIN the two tables.

    create temporary table id_key_table_temp (id_key int(16) primary key);

    insert into id_key_table_temp values (1),(2),(3),...,(500),(501);

    select temp.id_key
    from id_key_table_temp temp
    left join id_key_table as main on temp.id_key = main.id_key
    where main.id_key is null;

    drop table id_key_table_temp;

This seems like a better approach, but I'm sure there is a much better approach that I haven't thought about yet. I would prefer not to create a temporary table and use a single query to determine which integers are missing.
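For reference, approach 2 could be driven from a program roughly as follows. This is only a minimal sketch: it uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs on its own, the helper name `find_missing` is hypothetical, and `INSERT OR IGNORE` is SQLite syntax (the MySQL spelling is `INSERT IGNORE`).

```python
import sqlite3


def find_missing(conn, candidates):
    """Return the candidate integers that are absent from id_key_table.

    Hypothetical helper: stages the candidates in a temporary table and
    anti-joins it against the main table (approach 2).
    """
    cur = conn.cursor()
    cur.execute("CREATE TEMP TABLE id_key_table_temp (id_key INTEGER PRIMARY KEY)")
    try:
        # Duplicate candidates are silently skipped by OR IGNORE.
        cur.executemany(
            "INSERT OR IGNORE INTO id_key_table_temp (id_key) VALUES (?)",
            ((value,) for value in candidates),
        )
        # Anti-join: keep staged rows that have no match in the main table.
        cur.execute(
            """
            SELECT c.id_key
            FROM id_key_table_temp AS c
            LEFT JOIN id_key_table AS m ON c.id_key = m.id_key
            WHERE m.id_key IS NULL
            ORDER BY c.id_key
            """
        )
        return [row[0] for row in cur.fetchall()]
    finally:
        cur.execute("DROP TABLE id_key_table_temp")


if __name__ == "__main__":
    # Assumed sample data: an in-memory table holding 2, 3, 5 and 7.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE id_key_table (id_key INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO id_key_table (id_key) VALUES (?)",
        [(n,) for n in (2, 3, 5, 7)],
    )
    print(find_missing(conn, [1, 2, 3, 4, 5]))  # [1, 4]
```

The whole candidate set costs one bulk insert and one query, instead of the one round trip per integer that approach 1 needs.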

Is there a valid query for this type of search?

(databases)

1 answer

Using the code from the second approach in your question, I created two stored procedures (SPs): one to load a sample table with prime numbers as keys, and the other to find the missing integers.

Here is the first SP:

    DELIMITER $$

    DROP PROCEDURE IF EXISTS `test`.`CreateSampleTable` $$
    CREATE PROCEDURE `test`.`CreateSampleTable` (maxinttoload INT)
    BEGIN
        DECLARE X,OKTOUSE,MAXLOOP INT;

        DROP TABLE IF EXISTS test.id_key_table;
        CREATE TABLE test.id_key_table (id_key INT(16)) ENGINE=MyISAM;

        SET X=2;
        WHILE X <= maxinttoload DO
            INSERT INTO test.id_key_table VALUES (X);
            SET X = X + 1;
        END WHILE;
        ALTER TABLE test.id_key_table ADD PRIMARY KEY (id_key);

        SET MAXLOOP = FLOOR(SQRT(maxinttoload));
        SET X = 2;
        WHILE X <= MAXLOOP DO
            DELETE FROM test.id_key_table WHERE MOD(id_key,X) = 0 AND id_key > X;
            SELECT MIN(id_key) INTO OKTOUSE FROM test.id_key_table WHERE id_key > X;
            SET X = OKTOUSE;
        END WHILE;

        OPTIMIZE TABLE test.id_key_table;
        SELECT * FROM test.id_key_table;
    END $$

    DELIMITER ;

Here is the second SP:

    DELIMITER $$

    DROP PROCEDURE IF EXISTS `test`.`GetMissingIntegers` $$
    CREATE PROCEDURE `test`.`GetMissingIntegers` (maxinttoload INT)
    BEGIN
        DECLARE X INT;

        DROP TABLE IF EXISTS test.id_key_table_temp;
        CREATE TEMPORARY TABLE test.id_key_table_temp (id_key INT(16)) ENGINE=MyISAM;

        SET X=1;
        WHILE X <= maxinttoload DO
            INSERT INTO test.id_key_table_temp VALUES (X);
            SET X = X + 1;
        END WHILE;
        ALTER TABLE test.id_key_table_temp ADD PRIMARY KEY (id_key);

        SELECT temp.id_key
        FROM test.id_key_table_temp temp
        LEFT JOIN test.id_key_table main USING (id_key)
        WHERE main.id_key IS NULL;
    END $$

    DELIMITER ;

Here is a sample run of the first SP, using 25 as the upper bound for the prime numbers:

    mysql> CALL test.CreateSampleTable(25);
    +-------------------+----------+----------+----------+
    | Table             | Op       | Msg_type | Msg_text |
    +-------------------+----------+----------+----------+
    | test.id_key_table | optimize | status   | OK       |
    +-------------------+----------+----------+----------+
    1 row in set (0.16 sec)

    +--------+
    | id_key |
    +--------+
    |      2 |
    |      3 |
    |      5 |
    |      7 |
    |     11 |
    |     13 |
    |     17 |
    |     19 |
    |     23 |
    +--------+
    9 rows in set (0.17 sec)

    mysql>

Here is a run of the second SP, using the integers 1 through 25 as the full list to compare against:

    mysql> CALL test.GetMissingIntegers(25);
    +--------+
    | id_key |
    +--------+
    |      1 |
    |      4 |
    |      6 |
    |      8 |
    |      9 |
    |     10 |
    |     12 |
    |     14 |
    |     15 |
    |     16 |
    |     18 |
    |     20 |
    |     21 |
    |     22 |
    |     24 |
    |     25 |
    +--------+
    16 rows in set (0.03 sec)

    Query OK, 0 rows affected (0.05 sec)

    mysql>
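The two results can be cross-checked outside the database. The sketch below is not part of the stored procedures; it mirrors their logic in plain Python (the function names `primes_up_to` and `missing_integers` are made up for illustration) so the sieve loop and the anti-join can be followed step by step.

```python
import math


def primes_up_to(limit):
    """Mirror CreateSampleTable: load 2..limit, then delete multiples."""
    if limit < 2:
        return []
    remaining = set(range(2, limit + 1))
    max_loop = math.isqrt(limit)  # FLOOR(SQRT(maxinttoload))
    x = 2
    while x is not None and x <= max_loop:
        # DELETE ... WHERE MOD(id_key, X) = 0 AND id_key > X
        remaining -= {n for n in remaining if n % x == 0 and n > x}
        # SELECT MIN(id_key) ... WHERE id_key > X
        larger = [n for n in remaining if n > x]
        x = min(larger) if larger else None
    return sorted(remaining)


def missing_integers(limit, present):
    """Mirror GetMissingIntegers: values in 1..limit that are not present."""
    return sorted(set(range(1, limit + 1)) - set(present))


if __name__ == "__main__":
    primes = primes_up_to(25)
    print(primes)                        # the 9 keys loaded by the first SP
    print(missing_integers(25, primes))  # the 16 values reported by the second SP
```

The set difference here plays the same role as the `LEFT JOIN ... WHERE main.id_key IS NULL` in the second SP.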

Although this solution works for small samples, large lists become a major headache. You might want to make the lookup table permanent (use CREATE TABLE once instead of running CREATE TEMPORARY TABLE again and again), keep it loaded with the numbers 1 .. MAX(id_key), and keep it up to date through a trigger on id_key_table.

One question, out of curiosity: are you doing this to see whether auto_increment keys from the table can be reused?
