Need advice on pushing the envelope with SqlBulkCopy on SQL Server

I am developing an application, one aspect of which is that it has to ingest huge amounts of data into a SQL database. I designed the database structure as a single table with a bigint identity, something like this:

CREATE TABLE MainTable ( _id bigint IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED, field1, field2, ... ) 

I will omit how I intend to perform queries, since that is not relevant to the question I have.

I wrote a prototype that inserts data into this table using SqlBulkCopy. It seemed to work fine in the lab: I was able to insert tens of millions of records at a rate of ~3K records/sec (each record is fairly large, ~4 KB). Since the only index on this table is the bigint auto-increment, I did not see a slowdown even after a significant number of rows had been pushed in.

Given that the lab SQL Server was a virtual machine with a relatively weak configuration (4 GB RAM, disk subsystem shared with other VMs), I expected to get significantly better throughput on a physical machine, but that did not happen; or rather, the performance increase was negligible. I could get maybe 25% faster inserts on the physical machine. Even after I configured a 3-drive RAID0, which performed 3 times faster than a single drive (measured by benchmarking software), I got no improvement. Basically: a faster drive subsystem, a dedicated physical CPU, and double the RAM translated into practically no performance gain.

Then I repeated the test using the largest instance on Azure (8 cores, 16 GB), and got the same result. So adding more cores did not change the insert speed.

So far I have played with the following software options, without any significant performance boost (a rough sketch of these settings follows the list):

  • Changing SqlBulkCopy.BatchSize
  • Inserting from multiple threads simultaneously, and tuning the number of threads
  • Using the table lock option on SqlBulkCopy (SqlBulkCopyOptions.TableLock)
  • Eliminating network latency by inserting from a local process over the shared memory protocol
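Roughly, those settings boil down to something like the following. This is only a simplified sketch: the connection string, database and table names are placeholders, and the numeric values are just examples of what I tried.

```csharp
using System.Data.SqlClient;

static class BulkCopyFactory
{
    public static SqlBulkCopy GetBulkCopy()
    {
        // The "lpc:" prefix requests the shared memory protocol for a local
        // server, which takes network latency out of the picture.
        const string connectionString =
            "Data Source=lpc:(local);Initial Catalog=MyDb;Integrated Security=SSPI;";

        // TableLock takes a bulk update lock instead of per-row locks.
        return new SqlBulkCopy(connectionString, SqlBulkCopyOptions.TableLock)
        {
            DestinationTableName = "dbo.MainTable",
            BatchSize = 5000,     // one of the batch sizes experimented with
            BulkCopyTimeout = 0   // no timeout for long-running loads
        };
    }
}
```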

I am trying to get at least a 2-3x increase in performance, and my original idea was that throwing more hardware at it would do the trick, but so far it doesn't look that way.

So, can anyone recommend:

  • Which resource should I suspect as the bottleneck, and how can I confirm it?
  • Is there a methodology I could try to get a reliably scalable improvement in bulk insert throughput, given a single SQL Server system?

UPDATE I am positive that the loading app is not the problem. It creates records in a queue on a separate thread, so when an insert happens, it goes like this (simplified):

===> start logging time

int batchCount = (queue.Count - 1) / targetBatchSize + 1;

Enumerable.Range(0, batchCount).AsParallel()
    .WithDegreeOfParallelism(MAX_DEGREE_OF_PARALLELISM)
    .ForAll(i =>
    {
        var batch = queue.Skip(i * targetBatchSize).Take(targetBatchSize);
        var data = MYRECORDTYPE.MakeDataTable(batch);
        var bcp = GetBulkCopy();
        bcp.WriteToServer(data);
    });

===> end logging time

Timings are logged, and the part that creates the queue never takes up a significant chunk of time.

UPDATE2 I instrumented how long each operation in this loop takes, and the breakdown looks like this:

  • queue.Skip().Take() - negligible
  • MakeDataTable(batch) - 10%
  • GetBulkCopy() - Minor
  • WriteToServer(data) - 90%

UPDATE3 I am targeting the Standard edition of SQL Server, so I cannot rely on partitioning, since it is only available in the Enterprise edition. But I did try a variant of a partitioning-like scheme:

  • 16 filegroups were created (G0 through G15),
  • 16 insert-only tables were created (T0 through T15), each bound to its own filegroup. The tables have no indexes at all, not even a clustered identity column.
  • The threads that insert data cycle through all 16 tables, which all but guarantees that each bulk insert operation gets its own table (a simplified sketch of this round-robin follows the list).
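The round-robin itself is trivial. A simplified sketch (connection string supplied by the caller, destination tables T0..T15 as described above):

```csharp
using System.Data;
using System.Data.SqlClient;
using System.Threading;

static class RoundRobinBulkInsert
{
    private const int TableCount = 16;
    private static int _counter = -1;

    // Each call targets the next heap table T0..T15 (one per filegroup), so
    // concurrent bulk inserts almost never contend for the same table.
    public static void InsertBatch(DataTable batch, string connectionString)
    {
        int slot = (Interlocked.Increment(ref _counter) & int.MaxValue) % TableCount;

        using (var bcp = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.TableLock))
        {
            bcp.DestinationTableName = "dbo.T" + slot;
            bcp.BulkCopyTimeout = 0;
            bcp.WriteToServer(batch);
        }
    }
}
```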

This gave a ~20% improvement in bulk insert throughput. CPU cores, the LAN interface, and drive I/O were not maxed out, and were used at about 25% of maximum capacity.

UPDATE4 I think it is now about as good as it gets. I was able to push inserts through at reasonable speeds using the following techniques:

  • Each bulk insert goes into its own table, and the results are then merged into the main table (a sketch of this step follows the list)
  • Tables are recreated fresh for each bulk insert, and table locks are used
  • An IDataReader implementation from here is used instead of a DataTable
  • Bulk inserts are made from multiple clients
  • Each client accesses SQL Server over a separate gigabit VLAN
  • Side processes accessing the main table use the NOLOCK option
  • I examined sys.dm_os_wait_stats and sys.dm_os_latch_stats to eliminate contention
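For illustration, the "own table, then merge" step looks roughly like this. This is only a sketch: the column names are placeholders, and whether the INSERT...SELECT ends up minimally logged depends on the recovery model, the indexes on the target, and trace flag 610.

```csharp
using System.Data.SqlClient;

static class StagingMerge
{
    // Sketch of the "own table per bulk insert, then merge" step.
    public static void MergeStagingIntoMain(string connectionString, string stagingTable)
    {
        string sql =
            "INSERT INTO dbo.MainTable WITH (TABLOCK) (field1, field2) " +
            "SELECT field1, field2 FROM " + stagingTable + "; " +
            "TRUNCATE TABLE " + stagingTable + ";";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.CommandTimeout = 0;
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}
```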

I am having a hard time deciding at this point who gets credit for answering the question. To those of you who don't get the "answer", my apologies; it was a very difficult decision, and I thank you all.

UPDATE5 The following item could use some optimization:

  • An IDataReader implementation from here is used instead of a DataTable

Unless you run your program on a machine with a massive CPU core count, it could use some refactoring. Because it uses reflection to generate the get/set methods, that becomes a major CPU load. If performance is key, hand-coding the IDataReader so that the accessors are compiled, instead of generated through reflection, adds a lot of performance.
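To illustrate what hand-coding buys you, here is a minimal sketch of such a reader. MyRecord and its two fields are stand-ins for the real record type, and only the members that this kind of use appears to need are implemented properly; the rest are stubbed.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Stand-in for the real record type.
class MyRecord
{
    public long ExternalId;
    public string Payload;
}

// Hand-coded reader: GetValue is a plain switch over ordinals, so no
// reflection runs per row.
class MyRecordReader : IDataReader
{
    private readonly IEnumerator<MyRecord> _rows;

    public MyRecordReader(IEnumerable<MyRecord> rows) { _rows = rows.GetEnumerator(); }

    public int FieldCount { get { return 2; } }
    public bool Read() { return _rows.MoveNext(); }

    public object GetValue(int i)
    {
        switch (i)
        {
            case 0: return _rows.Current.ExternalId;
            case 1: return (object)_rows.Current.Payload ?? DBNull.Value;
            default: throw new IndexOutOfRangeException();
        }
    }

    public string GetName(int i) { return i == 0 ? "ExternalId" : "Payload"; }
    public Type GetFieldType(int i) { return i == 0 ? typeof(long) : typeof(string); }

    public int GetOrdinal(string name)
    {
        if (name == "ExternalId") return 0;
        if (name == "Payload") return 1;
        throw new IndexOutOfRangeException(name);
    }

    public int GetValues(object[] values)
    {
        int n = Math.Min(values.Length, FieldCount);
        for (int i = 0; i < n; i++) values[i] = GetValue(i);
        return n;
    }

    public bool IsDBNull(int i) { return GetValue(i) == DBNull.Value; }
    public object this[int i] { get { return GetValue(i); } }
    public object this[string name] { get { return GetValue(GetOrdinal(name)); } }

    public void Dispose() { _rows.Dispose(); }
    public void Close() { }
    public bool IsClosed { get { return false; } }
    public int Depth { get { return 0; } }
    public int RecordsAffected { get { return -1; } }
    public bool NextResult() { return false; }
    public DataTable GetSchemaTable() { throw new NotSupportedException(); }

    // Typed getters are not needed for this usage; stubbed.
    public bool GetBoolean(int i) { throw new NotSupportedException(); }
    public byte GetByte(int i) { throw new NotSupportedException(); }
    public long GetBytes(int i, long fo, byte[] buf, int off, int len) { throw new NotSupportedException(); }
    public char GetChar(int i) { throw new NotSupportedException(); }
    public long GetChars(int i, long fo, char[] buf, int off, int len) { throw new NotSupportedException(); }
    public Guid GetGuid(int i) { throw new NotSupportedException(); }
    public short GetInt16(int i) { throw new NotSupportedException(); }
    public int GetInt32(int i) { throw new NotSupportedException(); }
    public long GetInt64(int i) { throw new NotSupportedException(); }
    public float GetFloat(int i) { throw new NotSupportedException(); }
    public double GetDouble(int i) { throw new NotSupportedException(); }
    public string GetString(int i) { throw new NotSupportedException(); }
    public decimal GetDecimal(int i) { throw new NotSupportedException(); }
    public DateTime GetDateTime(int i) { throw new NotSupportedException(); }
    public string GetDataTypeName(int i) { throw new NotSupportedException(); }
    public IDataReader GetData(int i) { throw new NotSupportedException(); }
}
```

A reader like this can be passed straight to bcp.WriteToServer(new MyRecordReader(records)), so rows stream to the server without building a DataTable first.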

3 answers

For recommendations on tuning SQL Server for bulk loads, see the MS whitepaper The Data Loading Performance Guide, and also Guidelines for Optimizing Bulk Import in Books Online. Although they focus on bulk loading from within SQL Server, most of the advice applies to bulk loading through the client API. These documents apply to SQL 2008; you don't say which version of SQL Server you are targeting. Both contain quite a lot of information that is worth working through in detail. However, some key points:

  • Minimally log the bulk operation: use the bulk-logged or simple recovery model. You may need to enable trace flag 610 (but see the caveats on doing this). A sketch of this setup follows the list.
  • Tune the batch size
  • Consider partitioning the target table
  • Consider dropping indexes during the bulk load
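For example, the first point could be applied from the client side with something along these lines. This is just a sketch: the database name is a placeholder, try it against a test system first, and the guide's caveats about trace flag 610 apply.

```csharp
using System.Data.SqlClient;

static class BulkLoadPrep
{
    // Sketch: switch the target database to bulk-logged recovery and turn on
    // trace flag 610 before a large load.
    public static void PrepareForBulkLoad(string connectionString, string databaseName)
    {
        string sql =
            "ALTER DATABASE [" + databaseName + "] SET RECOVERY BULK_LOGGED; " +
            "DBCC TRACEON (610, -1);";   // -1 enables the flag globally

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}
```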

This flowchart from The Data Loading Performance Guide gives an overview of the process: (flowchart image from the guide)

As others have said, you need to capture some performance counters to establish the source of the bottleneck, since your experiments suggest that IO is not the limitation. The Data Loading Performance Guide includes a list of SQL wait types and performance counters to monitor (there are no anchors in the document to link to; the section is about 75% of the way through, under "Optimizing Bulk Load").
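As a starting point on the wait types, something like this will dump the top accumulated waits; take a snapshot before and after a test run and compare (a sketch only; you will want to filter out the benign system wait types):

```csharp
using System;
using System.Data.SqlClient;

static class WaitStats
{
    // Sketch: print the ten largest accumulated waits on the instance.
    public static void PrintTopWaits(string connectionString)
    {
        const string sql =
            "SELECT TOP (10) wait_type, wait_time_ms, waiting_tasks_count " +
            "FROM sys.dm_os_wait_stats ORDER BY wait_time_ms DESC;";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    Console.WriteLine("{0,-45} {1,12} ms {2,10} waits",
                        reader.GetString(0), reader.GetInt64(1), reader.GetInt64(2));
                }
            }
        }
    }
}
```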

UPDATE

It took me a while to find the link, but Thomas Kejser's SQLBits talk is also well worth watching; the slides are available if you don't have time to watch the whole thing. It repeats some of the material linked here, but also covers a couple of other suggestions for dealing with high incidences of particular performance counters.


You seem to have covered a lot already, but I'm not sure whether you have had a chance to study Alberto Ferrari's SqlBulkCopy Performance Analysis report, which describes several factors to consider for SqlBulkCopy-related performance. I would say that many of the things discussed in that paper are still worth trying first.


I'm not sure why you are not getting 100% utilization on CPU, IO, or memory. But if you simply want to improve your bulk load speeds, here are some things to consider:

  • Partition your data file into separate files. Or, if the data comes from different sources, simply create separate data files.
  • Then run multiple bulk inserts simultaneously (see the sketch after this list).
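Something along these lines, for instance. This is only a sketch: LoadFileAsDataTable is a hypothetical helper standing in for whatever parses your file format, and the folder pattern and destination table are placeholders.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.Threading.Tasks;

static class ParallelFileLoad
{
    // Sketch: one bulk insert per data file, several files loaded at once.
    public static void LoadAll(string folder, string connectionString)
    {
        Parallel.ForEach(
            Directory.GetFiles(folder, "*.dat"),
            new ParallelOptions { MaxDegreeOfParallelism = 4 },
            file =>
            {
                DataTable data = LoadFileAsDataTable(file);   // hypothetical parser

                using (var bcp = new SqlBulkCopy(connectionString))
                {
                    bcp.DestinationTableName = "dbo.MainTable";
                    bcp.BulkCopyTimeout = 0;
                    bcp.WriteToServer(data);
                }
            });
    }

    // Placeholder: parse one file into a DataTable that matches the target table.
    private static DataTable LoadFileAsDataTable(string file)
    {
        throw new NotImplementedException("depends on your file format");
    }
}
```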

Depending on your situation, the above may not be possible; but if it is, I'm sure it should improve your load speeds.

