Slow MySQL Inserts

I use and work on software that uses MySQL as its backend engine (it can use others such as PostgreSQL, Oracle, or SQLite, but MySQL is the one we mainly use). The software was designed so that the binary data we want to access is stored as BLOBs in individual columns: each table has one BLOB column, other columns hold integers or floats that characterize the BLOB, and one string column holds the BLOB's MD5 hash. Tables typically have 2, 3, or 4 indexes, one of which is always on the MD5 column, which is declared UNIQUE. Some tables already have millions of rows and have grown to several gigabytes. We keep a separate MySQL database per year on the same server (so far). The hardware is quite reasonable (I think) for general applications (a Dell PowerEdge 2U server).

SELECT queries are relatively fast, and there is little to complain about, since they run (most of the time) in batch mode. INSERT queries, however, take a long time, and the time grows with the size of the table (the number of rows). Admittedly, this is because the MD5 column is UNIQUE, so each INSERT has to check whether the new row's MD5 value has already been inserted. It is also not too surprising (I think) that performance gets worse when there are other (non-unique) indexes. Still, I can't shake the feeling that this software architecture is not the best choice (I suspect that storing BLOBs in the table rather than on disk has a significant negative impact). The inserts are not critical, but it is an annoying feeling.

Does anyone have experience with this kind of situation, with MySQL or with other (preferably Linux-based) RDBMSes? Any ideas you would like to share, perhaps some performance measurements?

BTW, the working language is C++ (which wraps calls to the MySQL C API).

+7
performance database mysql insert indexing
5 answers

It may be time to think about horizontal partitioning and about moving the BLOB field into a separate table. In this article, in the section “Quick Link to Vertical Separation”, the author removes a larger VARCHAR field from a table, and the query speed increases by about an order of magnitude.

The reason is that physically scanning the data on disk becomes much faster when there is less data to cover, so moving large fields elsewhere improves performance.

Also (and you have probably already done this) it helps to reduce the size of the indexed column to its absolute minimum (CHAR(32) in an ASCII encoding for MD5), since the larger the key, the slower the index is to use.
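
Going one step beyond what this answer suggests, the key could shrink from 32 characters to 16 bytes by storing the raw digest in a BINARY(16) column. The following is a minimal C++ sketch of the client-side conversion under that assumption; `md5_hex_to_bytes` and `md5_hex_to_bytes_example` are hypothetical names, and whether the schema change is worth it would need measuring on the real tables.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Convert one hexadecimal digit to its numeric value; throw on anything else.
static std::uint8_t hex_value(char c) {
    if (c >= '0' && c <= '9') return static_cast<std::uint8_t>(c - '0');
    if (c >= 'a' && c <= 'f') return static_cast<std::uint8_t>(c - 'a' + 10);
    if (c >= 'A' && c <= 'F') return static_cast<std::uint8_t>(c - 'A' + 10);
    throw std::invalid_argument("not a hexadecimal digit");
}

// Hypothetical helper: pack a 32-character hex MD5 string into 16 raw bytes,
// suitable for binding to a BINARY(16) column (an assumed schema change).
std::array<std::uint8_t, 16> md5_hex_to_bytes(const std::string& hex) {
    if (hex.size() != 32) {
        throw std::invalid_argument("MD5 hex string must be 32 characters");
    }
    std::array<std::uint8_t, 16> out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        // Two hex digits form one byte: high nibble first, then low nibble.
        out[i] = static_cast<std::uint8_t>(
            (hex_value(hex[2 * i]) << 4) | hex_value(hex[2 * i + 1]));
    }
    return out;
}

// Usage example: the MD5 of the empty string.
void md5_hex_to_bytes_example() {
    const auto bytes = md5_hex_to_bytes("d41d8cd98f00b204e9800998ecf8427e");
    assert(bytes[0] == 0xd4 && bytes[15] == 0x7e);
}
```

Queries that look up a row by its hash would then have to bind the same 16 bytes rather than the hex string.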

If you do multiple inserts at a time into InnoDB tables, you can significantly increase insert speed by wrapping them in a transaction and doing multiple inserts in one query:

 START TRANSACTION;
 INSERT INTO x (id, md5, field1, field2) VALUES
   (1, '123dab...', 'data1', 'data2'),
   (2, 'ab2...', 'data3', 'data4'),
   ...;
 COMMIT;
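
Since the application is written in C++, the multi-row statement would have to be assembled on the client. The following is a minimal sketch that builds only the statement text, with `?` placeholders so that the values (including the BLOB) can be bound through the C API's prepared statements (`mysql_stmt_prepare`, `mysql_stmt_bind_param`, `mysql_stmt_execute`) instead of being escaped by hand. `build_multi_row_insert` is a hypothetical helper, and the table and column names are assumed to be trusted identifiers taken from the application, never from user input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: build a multi-row INSERT such as
//   INSERT INTO x (id, md5) VALUES (?, ?), (?, ?)
// with one "(?, ..., ?)" group per row to be inserted.
std::string build_multi_row_insert(const std::string& table,
                                   const std::vector<std::string>& columns,
                                   std::size_t rows) {
    assert(!columns.empty() && rows > 0);

    // Build the column list and one placeholder group side by side.
    std::string column_list;
    std::string group = "(";
    for (std::size_t i = 0; i < columns.size(); ++i) {
        if (i > 0) {
            column_list += ", ";
            group += ", ";
        }
        column_list += columns[i];
        group += "?";
    }
    group += ")";

    // Repeat the placeholder group once per row.
    std::string sql = "INSERT INTO " + table + " (" + column_list + ") VALUES ";
    for (std::size_t r = 0; r < rows; ++r) {
        if (r > 0) sql += ", ";
        sql += group;
    }
    return sql;
}
```

For the table in the example above, `build_multi_row_insert("x", {"id", "md5", "field1", "field2"}, 2)` yields `INSERT INTO x (id, md5, field1, field2) VALUES (?, ?, ?, ?), (?, ?, ?, ?)`. With large BLOBs the batch size is likely limited by the server's `max_allowed_packet` setting, so the number of rows per statement would need tuning.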
+10

See Speed of INSERT Statements. Do you have frequent MD5 collisions? I assume they should not happen very often, so maybe you can use something like INSERT ... ON DUPLICATE KEY UPDATE to handle them. If you insert only during specific periods, you can disable the keys for the duration of the inserts and re-enable them afterwards. Another option is to use replication, with a master machine for the inserts and a slave for the selects.
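
As a rough illustration of the duplicate-handling idea, the sketch below appends a no-op `ON DUPLICATE KEY UPDATE` clause to an existing INSERT statement, so that a row whose UNIQUE key already exists is left untouched instead of raising an error. `keep_existing_on_duplicate` is a hypothetical helper, and the column name is assumed to be a trusted identifier. This only changes how a collision is reported; the server still has to look the key up in the unique index.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: make an INSERT keep the existing row when the
// UNIQUE key is already present. "col = col" is an update that changes nothing.
std::string keep_existing_on_duplicate(const std::string& insert_sql,
                                       const std::string& unique_column) {
    assert(!insert_sql.empty() && !unique_column.empty());
    return insert_sql + " ON DUPLICATE KEY UPDATE " +
           unique_column + " = " + unique_column;
}
```

For example, `keep_existing_on_duplicate("INSERT INTO x (id, md5) VALUES (?, ?)", "md5")` produces `INSERT INTO x (id, md5) VALUES (?, ?) ON DUPLICATE KEY UPDATE md5 = md5`.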

+5

Do you use MyISAM?
AFAIK MyISAM has very good read performance, but its write performance is poor.

InnoDB should be more balanced between the two.

+1

Does your data fit in RAM? If not, get more RAM until it becomes uneconomical (16 GB is usually a reasonable amount for most people).

Then, do your indexes fit in the MyISAM key buffer?

If you are using a 32-bit OS, don't. Once you are on a 64-bit OS, set the key buffer to about 1/3 of the RAM. The rest of the RAM is used by the OS cache to cache the data files (which is of little use for inserts, but useful for selects).
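
To make the one-third rule of thumb concrete, here is a small sketch of the arithmetic, formatted as a `my.cnf` line for MySQL's `key_buffer_size` variable. The function names are hypothetical, and the result is only a starting point to be checked against the actual index sizes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Rule of thumb from the text above: about one third of physical RAM (in MiB).
std::uint64_t suggested_key_buffer_mib(std::uint64_t ram_mib) {
    return ram_mib / 3;  // integer division rounds down
}

// Format the suggestion as a my.cnf line; the M suffix means mebibytes.
std::string key_buffer_config_line(std::uint64_t ram_mib) {
    assert(ram_mib >= 3);  // below 3 MiB the suggestion would round down to zero
    return "key_buffer_size = " +
           std::to_string(suggested_key_buffer_mib(ram_mib)) + "M";
}
```

For a machine with 16 GiB of RAM (16384 MiB), `key_buffer_config_line(16384)` gives `key_buffer_size = 5461M`.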

Having multi-gigabyte tables in MyISAM can be painful, because an unclean shutdown requires a very long repair operation. However, do not switch MySQL storage engines without significant testing of your application: a switch changes behavior in several ways (not just performance), and it also affects disk usage.

+1

I asked a somewhat related question today.

One of the answers I received suggested INSERT DELAYED, so that the insert goes into a queue and is processed when the database is not busy.

+1
