How to check if a record exists in the database - quick method

I have a table where I store unique text strings, and I check whether a given string already exists in the database by selecting:

String checkIfAlreadyScanned = "SELECT id FROM \"STRINGS_DB\" where STR ='" + mystring + "'"; 

Then I check whether a value was returned. My table has about 5 million records; can I improve on this method?

Maybe I could create a new attribute (hashedSTR, for example), convert each string to some unique numeric value, and then look up these numbers instead of the strings? Would that be faster? (Would it work at all?)

+4
source share
9 answers

To ensure the fastest processing, make sure that:

  • The field you search on is indexed (you mentioned "unique" strings, so I assume it already is. For this reason "LIMIT 1" is not required; otherwise it should be added)
  • You call the ExecuteScalar() method on the Command object
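
A minimal sketch of both points, using Python's built-in sqlite3 module as a stand-in for the real database and driver (an assumption; the question uses a different stack). The table and column names follow the question, and `already_scanned` is a hypothetical helper. The UNIQUE constraint is what creates the index, and reading `fetchone()[0]` plays the role of ExecuteScalar(): it fetches a single value and nothing else.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real server (assumption).
conn = sqlite3.connect(":memory:")

# UNIQUE makes the engine build an index on STR, so lookups avoid a full scan.
conn.execute(
    'CREATE TABLE "STRINGS_DB" (id INTEGER PRIMARY KEY, STR TEXT NOT NULL UNIQUE)'
)
conn.executemany(
    'INSERT INTO "STRINGS_DB" (STR) VALUES (?)',
    [("alpha",), ("beta",)],
)


def already_scanned(connection, value):
    """Hypothetical helper: return True if value is stored in STRINGS_DB."""
    # The ? placeholder binds value as data; it is never spliced into the SQL.
    row = connection.execute(
        'SELECT EXISTS (SELECT 1 FROM "STRINGS_DB" WHERE STR = ?)',
        (value,),
    ).fetchone()
    # EXISTS always yields exactly one row holding 0 or 1.
    return bool(row[0])


found = already_scanned(conn, "alpha")
missing = already_scanned(conn, "gamma")
tricky = already_scanned(conn, "x' OR '1'='1")
```

Because the value is bound as a parameter, the last call is treated as an ordinary (absent) string rather than as SQL.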
+4
source

Testing separately makes no sense; just include the "test" in the WHERE clause:

 INSERT INTO silly_table(the_text)
 SELECT 'literal_text'
 WHERE NOT EXISTS (
     SELECT * FROM silly_table
     WHERE the_text = 'literal_text'
 );

Now the test is done only when it is needed: by the end of the statement, the row will exist. There is no such thing as "try".

For those who do not see why testing makes no sense: a test would only be meaningful if the situation could not change after the test. That would require a test-and-lock scenario. Or, even worse: a test inside a transaction.

UPDATE: the version that works (basically the same):

 DROP TABLE exitsnot CASCADE;
 CREATE TABLE exitsnot
     ( id SERIAL NOT NULL PRIMARY KEY
     , val INTEGER -- REFERENCES something
     , str varchar -- REFERENCES something
     );

 INSERT INTO exitsnot (val)
 SELECT 42
 WHERE NOT EXISTS (
     SELECT * FROM exitsnot
     WHERE val = 42
     );

 INSERT INTO exitsnot (str)
 SELECT 'silly text'
 WHERE NOT EXISTS (
     SELECT * FROM exitsnot
     WHERE str = 'silly text'
     );

 SELECT version();

Output:

 DROP TABLE
 NOTICE: CREATE TABLE will create implicit sequence "exitsnot_id_seq" for serial column "exitsnot.id"
 NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "exitsnot_pkey" for table "exitsnot"
 CREATE TABLE
 INSERT 0 1
 INSERT 0 1
                                            version
 ----------------------------------------------------------------------------------------------
  PostgreSQL 9.1.2 on i686-pc-linux-gnu, compiled by gcc (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 32-bit
 (1 row)
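
The single-statement approach from this answer can also be driven from application code. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption), reuses the `silly_table` name from the answer, and introduces a hypothetical helper `insert_if_absent`. Calling it twice with the same text leaves exactly one row.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE silly_table (id INTEGER PRIMARY KEY, the_text TEXT)")


def insert_if_absent(connection, text):
    """Hypothetical helper: insert text unless present; True if a row was added."""
    cursor = connection.execute(
        "INSERT INTO silly_table (the_text) "
        "SELECT ? WHERE NOT EXISTS "
        "(SELECT 1 FROM silly_table WHERE the_text = ?)",
        (text, text),
    )
    # rowcount is the number of rows the INSERT actually added (0 or 1 here).
    return cursor.rowcount == 1


first = insert_if_absent(conn, "literal_text")
second = insert_if_absent(conn, "literal_text")
total = conn.execute("SELECT COUNT(*) FROM silly_table").fetchone()[0]
```

With concurrent writers, two sessions can still pass the NOT EXISTS check at the same moment, so a UNIQUE constraint on the column remains a sensible backstop.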
+2
source
 String checkIfAlreadyScanned = "SELECT 1 FROM \"STRINGS_DB\" where STR ='" + mystring + "'"; 

If your result set contains a row, then the record exists.

+1
source

Limit the result set to 1:

 String checkIfAlreadyScanned = @" SELECT id FROM ""STRINGS_DB"" where STR ='" + mystring + @"' limit 1"; 

This, together with an index on that column and @Laurent's suggestion to use ExecuteScalar(), will give the best result.

In addition, if there is any chance that mystring is influenced by the user, parameterize the query to avoid SQL injection.

Cleaner version (note that this is still plain string substitution, not a parameterized query):

 String checkIfAlreadyScanned = @" SELECT id FROM ""STRINGS_DB"" where STR = '@mystring' limit 1 ".Replace("@mystring", mystring); 
+1
source

How long are these text strings? If they are very long, you can improve performance by storing a hash of each string (along with the original string).

 CREATE TABLE strings_db ( id INT PRIMARY KEY, text TEXT, hash TEXT ); 

Your hash column can store an MD5, CRC32, or any other hash of your choice. It must be indexed.

Then change your query to something like:

 SELECT id FROM strings_db WHERE hash=calculate_hash(?) 

If the average size of your text fields is much larger than the size of your hashes, searching on the shorter field will reduce disk I/O. It also means extra CPU overhead for calculating the hash on every insert and select, and additional disk space for storing the hash. All of these factors must be weighed against each other.

PS Always use prepared statements to avoid SQL injection attacks!
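
A minimal sketch of the hash-column idea with prepared-statement placeholders, written with Python's built-in sqlite3 and hashlib modules (a stand-in for the real database, which is an assumption). `calculate_hash` is the placeholder name from the query above, implemented here as MD5. The helpers `add_string` and `find_id` and the index name are hypothetical. The lookup also compares the original text, which is an addition to the query above, so that two strings sharing a hash cannot produce a false match.

```python
import hashlib
import sqlite3

# In-memory SQLite database as a stand-in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE strings_db (id INTEGER PRIMARY KEY, text TEXT, hash TEXT)"
)
# The lookup goes through this index on the short hash column.
conn.execute("CREATE INDEX strings_db_hash_idx ON strings_db (hash)")


def calculate_hash(value):
    """MD5 hex digest of the UTF-8 bytes (one possible hash choice)."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def add_string(connection, value):
    """Hypothetical helper: store value together with its hash."""
    connection.execute(
        "INSERT INTO strings_db (text, hash) VALUES (?, ?)",
        (value, calculate_hash(value)),
    )


def find_id(connection, value):
    """Hypothetical helper: return the id of value, or None if not stored."""
    # Comparing text as well guards against two strings sharing a hash.
    row = connection.execute(
        "SELECT id FROM strings_db WHERE hash = ? AND text = ? LIMIT 1",
        (calculate_hash(value), value),
    ).fetchone()
    return row[0] if row else None


long_string = "lorem ipsum " * 1000
add_string(conn, long_string)
hit = find_id(conn, long_string)
miss = find_id(conn, "not stored")
```

Here the hash is computed in the application; computing it inside the database is equally possible if the server offers a suitable function.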

+1
source

Actually, there is exactly such a thing as you ask for, but it has some limitations. PostgreSQL supports the hash index type:

 CREATE INDEX strings_hash_idx ON "STRINGS_DB" USING hash (str); 

It works for simple equality searches with =, just like yours. I quote the manual on the limitations:

Hash index operations are not presently WAL-logged, so hash indexes might need to be rebuilt with REINDEX after a database crash. They are also not replicated over streaming or file-based replication. For these reasons, hash index use is presently discouraged.


Quick test on a real-life table with 433 thousand rows, only 59 MB:

 SELECT * FROM tbl WHERE email = 'some.user@some.domain.com'
 -- No index, sequential scan: Total runtime: 188 ms
 -- B-tree index (default):    Total runtime: 0.046 ms
 -- Hash index:                Total runtime: 0.032 ms

This is not huge, but it is something. The difference will be more significant with strings longer than the email addresses in my test. Creating an index took 1 or 2 seconds.
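
Before comparing timings, it helps to confirm that a lookup uses an index at all. The sketch below asks the engine for its query plan before and after creating an index. It uses Python's built-in sqlite3 module as a stand-in (an assumption): SQLite has no hash index type, and in PostgreSQL the equivalent check would be EXPLAIN ANALYZE. The table data, the index name `tbl_email_idx`, and the helper `query_plan` are hypothetical.

```python
import sqlite3

# In-memory SQLite database with made-up data (assumption, not the real table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO tbl (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(1000)],
)

QUERY = "SELECT id FROM tbl WHERE email = ?"


def query_plan(connection):
    """Hypothetical helper: return SQLite's plan for QUERY as one string."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN " + QUERY, ("user42@example.com",)
    ).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


plan_without_index = query_plan(conn)   # full table scan
conn.execute("CREATE INDEX tbl_email_idx ON tbl (email)")
plan_with_index = query_plan(conn)      # search through tbl_email_idx
```

The exact plan wording differs between SQLite versions, so only the presence of the index name is worth checking programmatically.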

+1
source

[Edit] Limit the results to the first record that meets the criteria. For SQL Server: SELECT TOP 1 ...; for MySQL / PostgreSQL: SELECT ... LIMIT 1;

If there can be multiple matches, adding "TOP 1" to your SELECT statement may make it return faster.

 String checkIfAlreadyScanned = "SELECT TOP 1 id FROM \"STRINGS_DB\" where STR ='" + mystring + "'"; 

That way, it only has to find the first instance of the string.

However, if you have no duplicate values, you are unlikely to see much benefit from this approach.

As others have said, an index could help.

0
source

Assuming you really don't need the id column, I think this gives the compiler the greatest chance of optimizing:

 select 1 where exists( select 1 from STRINGS_DB where STR = 'MyString' ) 
0
source

Although all the answers here have their merits, I would like to mention one more aspect.

Building a query this way and passing a string will not help the database engine optimize your query. Instead, you should write a stored procedure, call it passing one parameter, and let the database engine build a query plan and reuse your command.

Of course, the field must be indexed.

0
source
