Varying SQL insert speed

I am working on a script that updates a database table recording the country of use and the status of all (or almost all) IP addresses. At the moment I keep it simple: I just fetch the data from the 5 RIRs (Regional Internet Registries) and save it into my database.

The speeds were initially impractical, but they improved dramatically once I reduced the amount of logging and grouped the SQL inserts into batches of 1000 rows per query. However, when I run the script now I get very large variations in the SQL insert speed, and I was wondering if anyone knows why.
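
For illustration, the batching is along these lines (a simplified sketch, not the actual script; the PDO connection details, the table name ip_data and the sample $rows values are placeholders):

    <?php
    // Simplified sketch: group rows into multi-row INSERTs of 1000 rows each.
    // The DSN/credentials, the table name "ip_data" and the sample records
    // below are placeholders, not the real script or real data.
    $pdo = new PDO('mysql:host=localhost;dbname=ipdb', 'user', 'pass');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    // A couple of made-up sample records purely for illustration
    // (registry, code, type, start, value, date, status).
    $rows = [
        ['apnic',   'CN', 'ipv4', '1.0.1.0', 256,  '2011-04-19 00:00:00', 'allocated'],
        ['ripencc', 'FR', 'ipv4', '2.0.0.0', 1024, '2010-07-12 00:00:00', 'allocated'],
    ];

    foreach (array_chunk($rows, 1000) as $batch) {
        // One "(?, ?, ?, ?, ?, ?, ?)" placeholder group per row in this batch.
        $groups = implode(',', array_fill(0, count($batch), '(?, ?, ?, ?, ?, ?, ?)'));

        // Flatten the batch into a single positional parameter list.
        $params = [];
        foreach ($batch as $row) {
            $params = array_merge($params, array_values($row));
        }

        $pdo->prepare(
            "INSERT INTO ip_data (registry, code, type, start, value, date, status)
             VALUES $groups"
        )->execute($params);
    }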

Here are some of the speeds I recorded. In the tests I separated the time taken by the script's PHP iterations from the time taken by the SQL statements. I have not included the PHP times in the list below, since their effect was insignificant; no more than 1 second even for the largest data blocks.

Speed tests (the number of rows inserted is the same in each test):

Test 1 Total SQL Runtime: 33 seconds

Test 2 Total SQL Runtime: 72 seconds

Test 3 Total SQL Runtime: 78 seconds

Further tests continued to range from ~30 seconds to ~80 seconds.

I have two questions:

1) Do I have to accept these inconsistencies as the way of the world, or is there a reason for them?

2) I felt nervous about lumping the ~185,000 row inserts into one query. Is there any reason I should avoid using one query for these inserts? I have not worked with this amount of data being saved at one time before.

thanks

__

The database table is as follows (a rough CREATE TABLE sketch is included after the column list):

Storage Engine - InnoDB

Columns:

id - int, primary key

registry - varchar(7)

code - varchar(2)

type - varchar(4)

start - varchar(15)

value - int

date - datetime

status - varchar(10)
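
For reference, a rough CREATE TABLE equivalent of the above (the table name ip_data and the AUTO_INCREMENT on id are assumptions):

    <?php
    // Rough reconstruction of the table described above; the table name
    // "ip_data" and the AUTO_INCREMENT on id are assumptions.
    $pdo = new PDO('mysql:host=localhost;dbname=ipdb', 'user', 'pass');
    $pdo->exec("
        CREATE TABLE IF NOT EXISTS ip_data (
            id       INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
            registry VARCHAR(7),
            code     VARCHAR(2),
            type     VARCHAR(4),
            start    VARCHAR(15),
            value    INT,
            date     DATETIME,
            status   VARCHAR(10)
        ) ENGINE=InnoDB
    ");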

1 answer
1) Do I have to accept these inconsistencies as the way of the world, or is there a reason for them?

Variations in speed may be related to competing processes using disk I/O, i.e. waiting for resources. If it is a production server rather than an isolated test server, then of course other processes will be requesting disk access from time to time.

 2) I felt nervous about lumping the ~185000 row inserts into one query. Is there any reason I should avoid using one query for these inserts? I've not worked with this amount of data being saved at one time before. 

You should also split the inserts into groups of X inserts and run each group as a single transaction (see the sketch after this explanation).

It is difficult to determine the value of X in any way other than experimentally.

Grouping the inserts into a transaction ensures that the data is written (committed) to disk only once per transaction, and not after each individual (auto-committed) insert.

This has a good effect on disk I/O, but grouping too many inserts into one transaction can have a bad effect on available memory. If the amount of uncommitted data becomes too large for the available memory, the DBMS will start writing the data to its internal log (on disk).

So X depends on the number of inserts, the amount of data involved in each insert, the memory/user/session parameters allowed, and much more.
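
A minimal sketch of that grouping with PDO transactions (the connection details, the ip_data table name and X = 5000 are assumptions; X still has to be found experimentally):

    <?php
    // Sketch: commit once per group of X inserts instead of once per insert.
    // DSN/credentials, the "ip_data" table and X = 5000 are assumptions.
    $pdo = new PDO('mysql:host=localhost;dbname=ipdb', 'user', 'pass');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $stmt = $pdo->prepare(
        "INSERT INTO ip_data (registry, code, type, start, value, date, status)
         VALUES (?, ?, ?, ?, ?, ?, ?)"
    );

    $rows = [/* ...parsed RIR records (registry, code, type, start, value, date, status)... */];

    $batchSize = 5000; // X: tune experimentally
    foreach (array_chunk($rows, $batchSize) as $batch) {
        $pdo->beginTransaction();   // start one transaction for this group
        foreach ($batch as $row) {
            $stmt->execute($row);   // not auto-committed inside the transaction
        }
        $pdo->commit();             // data is committed to disk once per group
    }

The same grouping also works with the multi-row INSERT statements already used in the question; the important part is that the commit happens once per group rather than once per insert.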


There are some interesting (free) tools from Percona that help track database activity.

You can also watch vmstat: watch -n.5 'vmstat'

Watch the amount of data written to disk, and how it changes, as a result of the production environment's normal activity.

Run your script and watch for a step up in the number of bytes written to disk. If the step up is a largely constant level (above normal usage), the data is being written (and possibly swapped) continuously; if it is rhythmic, the data is being written only at commits.
