How do I set DataAdapter.UpdateBatchSize to an "optimal" value?

I finally got batch inserts working, and now I am tuning the batch size, but I don't see any performance difference between a value of 50 and a value of 10000. That seems very strange to me, but I don't know what is going on behind the scenes, so maybe this is normal behavior.

I am inserting 160,000 rows into the table, and the average time across the values I tested is 115 +/- 2 seconds. Without batching it takes 210 seconds, so I'm quite happy with the improvement. The target table:

CREATE TABLE [dbo].[p_DataIdeas](
    [wave] [int] NOT NULL,
    [idnumber] [int] NOT NULL,
    [ideaID] [int] NOT NULL,
    [haveSeen] [bit] NOT NULL CONSTRAINT [DF_p_DataIdeas_haveSeen] DEFAULT ((0)),
    CONSTRAINT [PK_p_DataIdeas] PRIMARY KEY CLUSTERED
    (
        [wave] ASC,
        [idnumber] ASC,
        [ideaID] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
            ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
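For reference, a simplified sketch of the kind of batched insert setup I mean (not my exact code; the connection string and column mappings below are placeholders):

using System;
using System.Data;
using System.Data.SqlClient;

// Build the rows to insert (roughly 160,000 of them).
var table = new DataTable("p_DataIdeas");
table.Columns.Add("wave", typeof(int));
table.Columns.Add("idnumber", typeof(int));
table.Columns.Add("ideaID", typeof(int));
table.Columns.Add("haveSeen", typeof(bool));
// ... table.Rows.Add(...) for each row ...

using (var conn = new SqlConnection("Server=.;Database=MyDb;Integrated Security=true"))
{
    conn.Open();

    var insert = new SqlCommand(
        "INSERT INTO dbo.p_DataIdeas (wave, idnumber, ideaID, haveSeen) " +
        "VALUES (@wave, @idnumber, @ideaID, @haveSeen)", conn);
    insert.Parameters.Add("@wave", SqlDbType.Int, 0, "wave");
    insert.Parameters.Add("@idnumber", SqlDbType.Int, 0, "idnumber");
    insert.Parameters.Add("@ideaID", SqlDbType.Int, 0, "ideaID");
    insert.Parameters.Add("@haveSeen", SqlDbType.Bit, 0, "haveSeen");
    insert.UpdatedRowSource = UpdateRowSource.None;  // required for batched updates

    var adapter = new SqlDataAdapter { InsertCommand = insert, UpdateBatchSize = 50 };
    adapter.Update(table);  // sends the pending inserts in batches of 50
}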

I read What to look for when setting up UpdateBatchSize, and the answer there was simply to test a few different values. I can understand that, but shouldn't it be possible to calculate, or at least estimate, a good value if you know the table design, the SQL statement, and the data to be inserted?

Are there any best practices that anyone can recommend?

+6
performance
3 answers

You can see the effect of batching by looking at SQL Profiler or by calling SqlConnection.RetrieveStatistics(). What you should see is that each batch corresponds to a single round trip to the database.
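For example, a quick way to confirm the round-trip count (a sketch; conn, adapter, and table are assumed to be the connection, adapter, and DataTable from the question):

// Enable client-side statistics and check round trips for one Update() call.
conn.StatisticsEnabled = true;
conn.ResetStatistics();

adapter.UpdateBatchSize = 50;
adapter.Update(table);

var stats = conn.RetrieveStatistics();
Console.WriteLine("ServerRoundtrips: " + stats["ServerRoundtrips"]);
Console.WriteLine("ExecutionTime ms: " + stats["ExecutionTime"]);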

As for optimizing the batch size, a very rough rule of thumb is that performance tends to stop improving with batch sizes above about 50; in fact, sometimes larger batches can run more slowly than smaller ones. If I'm too busy to test, I usually start with a batch size of around 20 (unless I'm using table-valued parameters, where batches of up to 500 can be faster than smaller ones). However, the optimal number depends on things like the total size of the inserts (will they all fit in RAM), how fast the disks holding your database log are, whether the log is on a drive/LUN of its own (big performance cost if it's not), and so on.
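If you do test, a rough timing loop is usually enough. In the sketch below, BuildDataTable() is a hypothetical helper that recreates the 160k pending rows, and the target table is truncated between runs so the primary key is not violated:

foreach (var batchSize in new[] { 1, 20, 50, 500 })
{
    new SqlCommand("TRUNCATE TABLE dbo.p_DataIdeas", conn).ExecuteNonQuery();
    var rows = BuildDataTable();              // hypothetical helper: fresh rows in Added state
    adapter.UpdateBatchSize = batchSize;

    var sw = System.Diagnostics.Stopwatch.StartNew();
    adapter.Update(rows);
    sw.Stop();

    Console.WriteLine("UpdateBatchSize=" + batchSize + ": " + sw.Elapsed.TotalSeconds + " s");
}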

The achievable speed is usually limited first by the number of round trips, then by the transaction size, then by log disk speed (in particular, whether it can be written sequentially or whether access becomes random because of contention with other files on the same spindles), and finally by RAM. All of these factors are also interrelated to some degree.

The first step in improving the performance of your inserts would be to run them in transactions, perhaps one transaction per batch or two. Beyond that, table-valued parameters are probably the next step, using a stored procedure with INSERT INTO Table SELECT column FROM @TableArgument.
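A sketch of that approach; the type, procedure, and parameter names below are made up for illustration:

// One-time T-SQL setup (illustrative names):
//   CREATE TYPE dbo.IdeaTableType AS TABLE
//       (wave int, idnumber int, ideaID int, haveSeen bit);
//   CREATE PROCEDURE dbo.InsertIdeas @Ideas dbo.IdeaTableType READONLY AS
//       INSERT INTO dbo.p_DataIdeas (wave, idnumber, ideaID, haveSeen)
//       SELECT wave, idnumber, ideaID, haveSeen FROM @Ideas;

using (var tx = conn.BeginTransaction())
{
    var cmd = new SqlCommand("dbo.InsertIdeas", conn, tx)
    {
        CommandType = CommandType.StoredProcedure
    };
    var p = cmd.Parameters.AddWithValue("@Ideas", table);  // the DataTable of rows
    p.SqlDbType = SqlDbType.Structured;
    p.TypeName = "dbo.IdeaTableType";

    cmd.ExecuteNonQuery();  // all rows inserted by one statement
    tx.Commit();
}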

+5

Although changing UpdateBatchSize will help to some extent, the basic approach of using a DataAdapter to update a large number of records is going to be slow. This is because the DataAdapter generates a separate SQL statement (insert, update, or delete) for every changed row. UpdateBatchSize only affects how many of those individual statements are grouped into a single TSQL batch when they are sent to SQL Server.

To get a much bigger performance improvement, you want SQL Server to insert/update/delete many records in a single statement (typically using a JOIN of some kind). Table-valued parameters (as RickNZ mentioned) are one way to do this. Another possibility is SqlBulkCopy (although you will usually need to use a staging table for anything other than straight inserts).
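A sketch of the SqlBulkCopy route; the staging table name is illustrative, and for plain inserts you could also bulk copy straight into the target table:

// Bulk copy into a staging table, then fold the rows into the target with one
// set-based statement. dbo.p_DataIdeas_Staging is assumed to have the same columns.
using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "dbo.p_DataIdeas_Staging" })
{
    bulk.BatchSize = 5000;
    bulk.WriteToServer(table);
}

new SqlCommand(
    "INSERT INTO dbo.p_DataIdeas (wave, idnumber, ideaID, haveSeen) " +
    "SELECT s.wave, s.idnumber, s.ideaID, s.haveSeen " +
    "FROM dbo.p_DataIdeas_Staging AS s " +
    "LEFT JOIN dbo.p_DataIdeas AS t " +
    "  ON t.wave = s.wave AND t.idnumber = s.idnumber AND t.ideaID = s.ideaID " +
    "WHERE t.wave IS NULL", conn).ExecuteNonQuery();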

+1

Making sure there is an active transaction will also improve performance significantly (about 30x in my tests using MySqlDataAdapter).
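For example, a minimal sketch with SqlClient (the same pattern applies with the MySQL connector classes):

// Wrap the adapter update in one explicit transaction instead of letting every
// generated statement auto-commit on its own.
using (var tx = conn.BeginTransaction())
{
    adapter.InsertCommand.Transaction = tx;  // also UpdateCommand/DeleteCommand if used
    adapter.Update(table);
    tx.Commit();
}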

0
