Optimized insertion of millions of records in chunks, MySQL and PHP

I need to populate a MySQL table with random SHA-1 hash values generated in PHP. I am trying to optimize the insert by breaking it into batches of 10,000 rows. My question is: is the following approach effective? Here is the code.

```php
// MySQL server connection routines are above this point
if ($select_db) {
    $time_start = microtime(true);
    // base query, extended with one VALUES tuple per row
    $base  = 'INSERT INTO sha1_hash (sha1_hash) VALUES ';
    $query = $base;
    $count = 0;
    for ($i = 1; $i <= 1000000; $i++) {
        $query .= "('" . sha1(genRandomString(8)) . "'),";
        $count++;
        if ($count == 10000) {
            mysql_query(rtrim($query, ',')) or die('Query error: ' . mysql_error());
            // reset the accumulators for the next batch
            $query = $base;
            $count = 0;
        }
    }
    $time_end = microtime(true);
    echo '<br/>' . ($time_end - $time_start);
}

// function to generate a random string
function genRandomString($length) {
    $charset = 'abcdefghijklmnopqrstuvwxyz0123456789';
    $count = strlen($charset);
    $str = '';
    while ($length--) {
        $str .= $charset[mt_rand(0, $count - 1)];
    }
    return $str;
}
```

EDIT: the variables $time_start and $time_end are there ONLY for performance measurement. Also, the MySQL table has just two fields, ID int(11) UNSIGNED NOT NULL AUTO_INCREMENT and sha1_hash varchar(48) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL, and the engine is MyISAM. EDIT2: the computer's hardware is not relevant to the question.

+4
1 answer

Inserts are usually performed in large batches because indexes are updated after each insert. Batching lets you insert many records and then update the indexes only once at the end, rather than after every row.

However, with an auto-increment primary key, the index has to grow just to add each new row, and since you have no other indexes, batching saves you nothing on index maintenance here.

Batching also saves some query-parsing and locking overhead. However, you might also consider using parameterized queries (PDO).

Inserting one record at a time with a PDO prepared statement will also be very fast, since MySQL only has to parse the query once, and from then on the data travels in a compact binary form.
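A sketch of that approach: the DSN, username and password below are placeholders, not values from the question, and genRandomString() is the helper from the question's code.

```php
<?php
// Hypothetical connection details; replace with your own.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'password', [
    PDO::ATTR_ERRMODE          => PDO::ERRMODE_EXCEPTION,
    // Ask for real server-side prepares so the statement is parsed once.
    PDO::ATTR_EMULATE_PREPARES => false,
]);

// Parsed and planned a single time by the server.
$stmt = $pdo->prepare('INSERT INTO sha1_hash (sha1_hash) VALUES (?)');

for ($i = 0; $i < 1000000; $i++) {
    // Each execute() sends only the bound parameter value.
    $stmt->execute([sha1(genRandomString(8))]);
}
```

Note that with emulated prepares (the PDO default for MySQL) the statement would be re-parsed on every execute, which is why the example turns emulation off.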

You can lock the table before inserting with LOCK TABLES . This will save a little table-locking overhead per statement.
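For example, wrapping the batched inserts like this (table name taken from the question):

```sql
LOCK TABLES sha1_hash WRITE;
-- ... run the batched INSERT statements here ...
UNLOCK TABLES;
```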

Also, since a SHA-1 hash is always 40 ASCII hexadecimal characters, you should use CHAR(40) instead of VARCHAR(48) . This will speed things up. And if the sha1_hash column is indexed, use a single-byte character set instead of utf8 to shrink the index and make it faster.
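Putting those suggestions together, the table definition might look like this (a sketch; the column names and MyISAM engine follow the question, and ascii is one possible single-byte charset choice):

```sql
CREATE TABLE sha1_hash (
    id        INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
    -- SHA-1 in hex is always exactly 40 single-byte characters
    sha1_hash CHAR(40) CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL,
    PRIMARY KEY (id)
) ENGINE=MyISAM;
```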

+4

Source: https://habr.com/ru/post/1411123/

