Strategy to optimize this big SQL insert through C#?

I have about 1.5 million files that I need to insert records for in the database. Each record will be inserted with a key that includes the file name.

The catch: currently, the files are not uniquely named.

So here is what we would like to do for each file:

  • Insert the record. One of the fields in the record should be the Amazon S3 key, which should include the ID of the newly inserted record.
  • Rename the file so that its name includes the ID, matching the key format.

The best I can come up with is (roughly sketched in code after this list):

  • Run a separate insert command that returns the ID of the added row.
  • Assign that ID back as a property on the individual business object I'm processing.
  • Generate an update statement that updates the S3 key to include the ID.
  • Write out the file, appending the ID to the end of the file name.
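
For concreteness, a rough sketch of that per-file flow; the table and column names (Files, Id, FileName, S3Key) and the key format are placeholders I made up:

    using System;
    using System.Data.SqlClient;
    using System.IO;

    static void ProcessFile(SqlConnection conn, string filePath)
    {
        // 1. Insert the record and read back the generated identity.
        int id;
        using (var insert = new SqlCommand(
            "INSERT INTO Files (FileName) OUTPUT INSERTED.Id VALUES (@name)", conn))
        {
            insert.Parameters.AddWithValue("@name", Path.GetFileName(filePath));
            id = (int)insert.ExecuteScalar();
        }

        // 2. Update the S3 key on the row so it embeds the new identity.
        string s3Key = Path.GetFileNameWithoutExtension(filePath)
                       + "_" + id + Path.GetExtension(filePath);
        using (var update = new SqlCommand(
            "UPDATE Files SET S3Key = @key WHERE Id = @id", conn))
        {
            update.Parameters.AddWithValue("@key", s3Key);
            update.Parameters.AddWithValue("@id", id);
            update.ExecuteNonQuery();
        }

        // 3. Rename the file so its name matches the key format.
        File.Move(filePath, Path.Combine(Path.GetDirectoryName(filePath), s3Key));
    }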

Needless to say, that looks like:

  • 1.5 million insert statements
    • each executed individually via SqlCommand and read back, because we need the ID
  • 1.5 million object property sets
  • 1.5 million update statements generated and executed
    • perhaps this could become one giant concatenated update statement executed all at once; not sure if that helps
  • 1.5 million file copies

I can't get around the actual file part, but for the rest, is there a better strategy that I'm not seeing?

+7
c# file sql-server insert sqlcommand
2 answers

If you generate the identifiers in the client application, you can use plain SqlBulkCopy to insert all the rows at once. That will be done in seconds.
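
A minimal sketch of that approach, assuming an ID range has already been reserved (the reseed trick below is one way) and using the same hypothetical Files table as above:

    using System.Data;
    using System.Data.SqlClient;
    using System.IO;

    static void BulkInsertFiles(string connectionString, string[] files, int startId)
    {
        // Build all the rows in memory, assigning IDs from the reserved range.
        var table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("FileName", typeof(string));
        table.Columns.Add("S3Key", typeof(string));

        int id = startId;
        foreach (string path in files)
        {
            string key = Path.GetFileNameWithoutExtension(path)
                         + "_" + id + Path.GetExtension(path);
            table.Rows.Add(id, Path.GetFileName(path), key);
            id++;
        }

        // KeepIdentity tells the server to keep our client-assigned Id values.
        using (var bulk = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.KeepIdentity))
        {
            bulk.DestinationTableName = "Files";
            bulk.BatchSize = 100000;
            bulk.WriteToServer(table);
        }
    }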

If you want to keep the IDENTITY property on the column, you can run DBCC CHECKIDENT (RESEED) to bump the identity counter by 1.5 million, which gives you a guaranteed gap that you can insert into. If the number of rows is not known statically, you can insert in smaller chunks, maybe 100k at a time, until you are done.
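
A sketch of that reseed trick; DBCC CHECKIDENT and IDENT_CURRENT are real T-SQL, but the table name and the assumption that nothing else inserts concurrently are mine:

    using System;
    using System.Data.SqlClient;

    static long ReserveIdRange(SqlConnection conn, int rowCount)
    {
        // Read the current identity value for the Files table.
        long current;
        using (var cmd = new SqlCommand("SELECT IDENT_CURRENT('Files')", conn))
        {
            current = Convert.ToInt64(cmd.ExecuteScalar());
        }

        // Reseed past the range we want. IDs current+1 .. current+rowCount
        // are now ours to assign client-side (assuming no concurrent inserts).
        using (var cmd = new SqlCommand(
            "DBCC CHECKIDENT ('Files', RESEED, " + (current + rowCount) + ")", conn))
        {
            cmd.ExecuteNonQuery();
        }

        return current + 1;   // first ID of the reserved range
    }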

+3

You could halve the number of SQL statements by not relying on the database to generate the ID for each row. Do everything locally (including assigning the ID), then do one batch of inserts at the end with IDENTITY_INSERT ON.

That will force SQL Server to use your IDs for that batch of records.
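
A hedged sketch of that, again with the made-up Files table; SET IDENTITY_INSERT is per-table and per-session, so it must run on the same connection as the inserts:

    using System.Data.SqlClient;
    using System.IO;
    using System.Text;

    static void InsertBatchWithExplicitIds(SqlConnection conn, string[] files, int startId)
    {
        var sql = new StringBuilder("SET IDENTITY_INSERT Files ON;\n");

        int id = startId;
        foreach (string path in files)   // in practice, chunk this into batches
        {
            // Naive quoting for illustration only; real code should use
            // parameters or a table-valued parameter instead.
            string name = Path.GetFileName(path).Replace("'", "''");
            string key = (Path.GetFileNameWithoutExtension(path)
                          + "_" + id + Path.GetExtension(path)).Replace("'", "''");
            sql.AppendFormat(
                "INSERT INTO Files (Id, FileName, S3Key) VALUES ({0}, '{1}', '{2}');\n",
                id, name, key);
            id++;
        }

        sql.Append("SET IDENTITY_INSERT Files OFF;");

        using (var cmd = new SqlCommand(sql.ToString(), conn))
        {
            cmd.ExecuteNonQuery();
        }
    }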

If that is still too slow (and with 1.5 million inserts it may be), the next step is to dump your data to a flat file (XML, comma-delimited, or whatever), and then do a bulk import of that file.
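
For example (a sketch; the share path is hypothetical and must be readable by the SQL Server service account):

    using System.Data.SqlClient;
    using System.IO;

    static void BulkImportFromFile(string connectionString, string[] files, int startId)
    {
        // Dump the rows to a comma-delimited file.
        string dataFile = @"\\server\share\files.csv";   // hypothetical path
        using (var writer = new StreamWriter(dataFile))
        {
            int id = startId;
            foreach (string path in files)
            {
                string key = Path.GetFileNameWithoutExtension(path)
                             + "_" + id + Path.GetExtension(path);
                writer.WriteLine("{0},{1},{2}", id, Path.GetFileName(path), key);
                id++;
            }
        }

        // Have SQL Server bulk-load the file; KEEPIDENTITY keeps our IDs.
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            using (var cmd = new SqlCommand(
                "BULK INSERT Files FROM '" + dataFile + "' " +
                "WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n', KEEPIDENTITY)", conn))
            {
                cmd.CommandTimeout = 0;   // a load this size can take a while
                cmd.ExecuteNonQuery();
            }
        }
    }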

That's about as fast as you can make it, I think.

+1
