How to improve INSERT performance on a very large MySQL table

I am working on a large MySQL database and I need to improve INSERT performance on a specific table. It contains about 200 million rows, and its structure is as follows:

(A small premise: I'm not a database expert, so the code I wrote may be based on wrong assumptions. Please help me correct my mistakes :))

    CREATE TABLE IF NOT EXISTS items (
        id INT NOT NULL AUTO_INCREMENT,
        name VARCHAR(200) NOT NULL,
        `key` VARCHAR(10) NOT NULL,
        busy TINYINT(1) NOT NULL DEFAULT 1,
        created_at DATETIME NOT NULL,
        updated_at DATETIME NOT NULL,
        PRIMARY KEY (id, name),
        UNIQUE KEY name_key_unique_key (name, `key`),
        INDEX name_index (name)
    ) ENGINE=MyISAM
    PARTITION BY LINEAR KEY(name)
    PARTITIONS 25;

Every day I receive many CSV files in which each row consists of a "name;key" pair, so I have to parse these files (adding the created_at and updated_at values for each row) and insert the values into my table. The combination of name and key MUST be UNIQUE, so I implemented the insert procedure as follows:

    CREATE TEMPORARY TABLE temp_items (
        id INT NOT NULL AUTO_INCREMENT,
        name VARCHAR(200) NOT NULL,
        `key` VARCHAR(10) NOT NULL,
        busy TINYINT(1) NOT NULL DEFAULT 1,
        created_at DATETIME NOT NULL,
        updated_at DATETIME NOT NULL,
        PRIMARY KEY (id)
    ) ENGINE=MyISAM;

    LOAD DATA LOCAL INFILE 'file_to_process.csv'
    INTO TABLE temp_items
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    (name, `key`, created_at, updated_at);

    INSERT INTO items (name, `key`, busy, created_at, updated_at)
    SELECT name, `key`, busy, created_at, updated_at
    FROM temp_items
    ON DUPLICATE KEY UPDATE busy = 1, updated_at = NOW();

    DROP TEMPORARY TABLE temp_items;

The code shown above achieves my goal, but it takes about 48 hours to run, and that is a problem. I think the poor performance is caused by the fact that, for each insert, the script has to check against a very large table (200 million rows) whether the "name;key" pair is unique.

How can I improve the performance of my script?

Thanks to everyone in advance.

+7
performance database mysql insert insert-update
4 answers

Your linear key partitioning by name and your large indexes slow things down.

The LINEAR KEY partition needs to be calculated on every insert. http://dev.mysql.com/doc/refman/5.1/en/partitioning-linear-hash.html

Can you show us some sample data from file_to_process.csv? It might be possible to build a better schema.

Edit: after looking more carefully at

    INSERT INTO items (name, `key`, busy, created_at, updated_at)
    SELECT temp_items.name, temp_items.`key`, temp_items.busy,
           temp_items.created_at, temp_items.updated_at
    FROM temp_items

this will create an on-disk temporary table, which is very slow, so you should not use it if you want better performance. Alternatively, check MySQL configuration settings such as tmp_table_size and max_heap_table_size, which may be misconfigured.
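As a rough sketch, you could inspect and raise those two settings like this (the 256M value below is purely illustrative, not a recommendation for your server):

```sql
-- Check the current limits for in-memory temporary tables
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';

-- Raise both for the current session; MySQL uses the smaller
-- of the two as the effective in-memory limit, so they should
-- be raised together. 256M is an illustrative value.
SET SESSION tmp_table_size = 256 * 1024 * 1024;
SET SESSION max_heap_table_size = 256 * 1024 * 1024;
```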

+2

The following methods can be used to speed up inserts:

  • If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements. If you are adding data to a non-empty table, you can tune the bulk_insert_buffer_size variable to make data insertion even faster.

  • When loading a table from a text file, use LOAD DATA INFILE. This is usually 20 times faster than using INSERT statements.

  • Take advantage of the fact that columns have default values. Insert values explicitly only when the value to be inserted differs from the default. This reduces the parsing that MySQL must perform and improves insert speed.
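For example, the first point above applied to the items table from the question might look like this (the literal values are made up for illustration):

```sql
-- One statement inserting three rows at once, instead of three
-- separate single-row INSERT statements.
INSERT INTO items (name, `key`, busy, created_at, updated_at) VALUES
    ('item-a', 'k1', 1, NOW(), NOW()),
    ('item-b', 'k2', 1, NOW(), NOW()),
    ('item-c', 'k3', 1, NOW(), NOW())
ON DUPLICATE KEY UPDATE busy = 1, updated_at = NOW();
```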

+1

There is a piece of documentation that I would like to point out: INSERT Statement Speed.

0

You can use

    LOAD DATA LOCAL INFILE '' REPLACE INTO TABLE

etc...

REPLACE ensures that any duplicate row is overwritten with the new values. Add SET updated_at = NOW() to the end and you're done.

No need for a temporary table.
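A sketch of this approach using the table and file names from the question (note: REPLACE works by deleting the duplicate row and re-inserting it, so created_at is overwritten too, which is worth checking against your requirements):

```sql
-- Load directly into items; any row whose (name, `key`) collides
-- with an existing row is deleted and re-inserted (REPLACE
-- semantics), so created_at is reset along with updated_at.
LOAD DATA LOCAL INFILE 'file_to_process.csv'
REPLACE INTO TABLE items
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(name, `key`)
SET busy = 1,
    created_at = NOW(),
    updated_at = NOW();
```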

-2
