Improve INSERT INTO ... SELECT FROM SQL query

Currently I have this type of query, generated by a C# program:

INSERT INTO TableName (Field1, Field2, Field3) SELECT Field1, Field2, Field3 FROM TableName2 

The problem is that the SELECT can return a lot of records (for example, a million), so it takes a long time and the result is a connection timeout.

Also, if I split it up into individual inserts (for this example, a million INSERT statements), execution takes a very long time... but it works...

Is there a way to improve this type of query?

I am using SQL Server 2005.

thanks

+6
c# sql sql-server insert
10 answers

I found that if you have many INSERT statements executed sequentially, you can improve performance by adding a GO statement after every xxxx insert statements:

 ...
 INSERT INTO Table ( ... ) VALUES ( ... )
 INSERT INTO Table ( ... ) VALUES ( ... )
 INSERT INTO Table ( ... ) VALUES ( ... )
 GO
 INSERT INTO Table ( ... ) VALUES ( ... )
 INSERT INTO Table ( ... ) VALUES ( ... )
 ...

Another possibility is to make sure your INSERT INTO ... SELECT FROM query doesn't insert everything in one go, but instead uses some kind of paging technique:

 INSERT INTO Table ... SELECT ... FROM OtherTable WHERE Id > x and Id < y 
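
For example, a minimal sketch of that idea as a loop, assuming the source table has an integer Id column as in the snippet above (the batch size is arbitrary):

 -- Copy rows in Id-range batches so no single statement touches a million rows
 DECLARE @minId INT, @maxId INT, @batchSize INT
 SET @batchSize = 10000
 SELECT @minId = MIN(Id), @maxId = MAX(Id) FROM TableName2

 WHILE @minId <= @maxId
 BEGIN
     INSERT INTO TableName (Field1, Field2, Field3)
     SELECT Field1, Field2, Field3
     FROM TableName2
     WHERE Id >= @minId AND Id < @minId + @batchSize

     SET @minId = @minId + @batchSize
 END

In autocommit mode each batch commits on its own, so no single transaction has to cover the whole million rows.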
+8
source share

Well, if this is a complete copy, I wonder if you should look at the bulk-load tools (a sketch follows the list below):

  • BULK INSERT (TSQL)
  • SqlBulkCopy (.NET)
  • bcp (command line)
  • etc.
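
For instance, a minimal BULK INSERT sketch, assuming the source data has already been exported to a flat file (for example with bcp ... out); the file path, terminators, and batch size are placeholders:

 -- Load a pre-exported data file into the destination table
 BULK INSERT TableName
 FROM 'C:\export\TableName2.dat'
 WITH (
     FIELDTERMINATOR = ',',    -- column separator used when the file was exported
     ROWTERMINATOR = '\n',     -- row separator
     TABLOCK,                  -- lock the whole table for the duration of the load
     BATCHSIZE = 10000         -- commit every 10,000 rows
 );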

If you had a WHERE clause, I would check that it was indexed appropriately...

Additionally:

  • it is possible to drop indexes and triggers before executing the INSERT (recreate them afterwards)
  • consider dropping the whole table and using SELECT INTO instead? (see comments)
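
A minimal sketch of the SELECT INTO idea, assuming the destination can simply be rebuilt from scratch (the names are the ones from the question):

 -- SELECT ... INTO creates the destination table as part of the statement,
 -- and is minimally logged under the SIMPLE or BULK_LOGGED recovery model
 IF OBJECT_ID('TableName') IS NOT NULL
     DROP TABLE TableName;

 SELECT Field1, Field2, Field3
 INTO TableName
 FROM TableName2;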
+6

Well, there are a few fundamental issues.

  • I/O. Inserting into one table while reading from another is likely to be disk-bound if the tables are not on separate disks. Put the two tables on physically separate spindles.

  • Transaction log. Make sure your transaction log is on its own disk, or work with smaller transactions (a few thousand rows at a time), or use BCP/BULK INSERT, which is minimally logged.

  • Clustered indexes. If you insert all of these rows into the target table and its clustered index (the physical order in which data is written to disk) is not written to sequentially, the disk I/O requirements go through the roof because of page splits and reallocation. An easy solution is to give the destination table a clustered index on a sequential seed key, such as an identity column. That usually ensures you get sequential writes to the table, almost always at the end (see the sketch after this list).

  • File growth. Make sure SQL Server is set to grow its files at a reasonable rate, for example 10%. Otherwise it will constantly have to resize its files and thrash the disk. There are also ways to stop the resize itself from hammering the disk, such as granting the "Perform volume maintenance tasks" right to the SQL Server service account via group policy (instant file initialization).
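
A minimal sketch of the sequential clustered key idea from the third point above; the column definitions are hypothetical, only the table and field names come from the question:

 -- Give the destination table a monotonically increasing clustered key
 -- so new rows are always appended at the end of the index
 CREATE TABLE TableName (
     RowId  INT IDENTITY(1,1) NOT NULL,
     Field1 INT NULL,
     Field2 VARCHAR(50) NULL,
     Field3 VARCHAR(50) NULL,
     CONSTRAINT PK_TableName PRIMARY KEY CLUSTERED (RowId)
 );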

Quite frankly, apart from that and the few other suggestions here, it is unlikely you will make an insert of millions of rows within a single transaction really fast. If you did it via BULK INSERT it would be significantly faster, although that is perhaps not what you need from the application's point of view.

+3

Set the CommandTimeout property of the SqlCommand you are using to a reasonable value (10 minutes or so). Remember that CommandTimeout is in seconds.

+2

Some good answers here.

I would also add that if you have indexes on the destination table, they will slow the insert down. However, rebuilding the indexes can itself take a long time if you use the drop-and-recreate technique.

If you don't want to drop the indexes, use an ORDER BY in the SELECT that matches the destination table's clustered index; this seems to help (probably by minimizing page splits).
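
A sketch of that, assuming (hypothetically) that the destination's clustered index is on Field1:

 -- Insert in the same order as the destination's clustered index
 -- to reduce page splits during the load
 INSERT INTO TableName (Field1, Field2, Field3)
 SELECT Field1, Field2, Field3
 FROM TableName2
 ORDER BY Field1;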

+1

You don't say what problem you are solving with this approach. Obviously a WHERE clause would narrow down the record set. But if the result set is not changed in the new table, why replicate the data at all? Why not query the source directly?

0

Either bulk load via a file and then bcp/BULK INSERT, or batch the inserts in batches of about 5,000 rows.

0

First, never try to insert a million records through C#, and never process large groups of records one at a time. This is work that should be done in the database, by the database. Use the INSERT ... SELECT, SSIS, or DTS to do it, and schedule it as an after-hours job. If it still takes too long, run it in several batches (you will have to experiment with your own database to see what the best choice is, since the number you can safely handle depends heavily on the tables, on the indexing, on how fast your server is, and on how many users are also trying to work against the same tables).

0

Another way we have used in the past is to create a temp table holding the primary keys we want to move, and use a WHILE loop. That way you can do it in blocks and avoid the large transaction overhead if you cancel it and it has to roll back.

Basically what you do is: INSERT INTO tablename (...) SELECT (...) FROM tablename WHERE the primary key is IN (SELECT TOP 10000 primaryKey FROM temptable).

You then take that same TOP 10000 in a second result set and delete those keys from the temp table so they are not processed again.
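
A rough sketch of that temp-table loop; the key column type and the temp table names are hypothetical, only the table and field names come from the question:

 -- Stage the keys that still need to be copied
 SELECT primaryKey INTO #keysToMove FROM TableName2;

 CREATE TABLE #batch (primaryKey INT NOT NULL PRIMARY KEY);

 WHILE EXISTS (SELECT 1 FROM #keysToMove)
 BEGIN
     -- grab the next block of keys
     INSERT INTO #batch (primaryKey)
     SELECT TOP 10000 primaryKey FROM #keysToMove;

     INSERT INTO TableName (Field1, Field2, Field3)
     SELECT s.Field1, s.Field2, s.Field3
     FROM TableName2 s
     JOIN #batch b ON b.primaryKey = s.primaryKey;

     -- remove the processed keys so they are not picked up again
     DELETE k FROM #keysToMove k JOIN #batch b ON b.primaryKey = k.primaryKey;
     TRUNCATE TABLE #batch;
 END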

Another way is to use cursors to reduce the number of records that you are processing at the same time.

Yet another method would be to do something similar in a WHILE loop:

 declare @stop int
 set @stop = (select count(primaryKey) from sourceTable
              where primaryKey not in (select primaryKey from destinationTable))

 while (@stop > 0)
 begin
     begin transaction

     insert into destinationTable (...)
     select (...)
     from sourceTable
     where primaryKey not in (select primaryKey from destinationTable)

     commit

     set @stop = (select count(primaryKey) from sourceTable
                  where primaryKey not in (select primaryKey from destinationTable))
 end

Not the most efficient, but it will work and should let you keep the transaction log under control. If you don't need that, make sure you use the NOLOCK hint so you don't block other transactions while doing this big move (unless you use BCP or DTS, because they are much faster).

Some of what has already been said is probably your best bet. Use BCP, DTS, or some other bulk tool. If you can drop the indexes, it will speed things up.

0

Have you tried the query in SQL Server Management Studio against the server to see how long it actually takes? I would start there. You may be able to improve the performance of the SELECT, and you may be able to improve the INSERT performance with table hints on the table you are inserting into.
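
For example, a minimal sketch using table hints; TABLOCK on the destination and NOLOCK on the source are common choices for a big one-off copy, but whether they are appropriate depends on your concurrency requirements (NOLOCK allows dirty reads):

 -- TABLOCK takes one table-level lock on the destination instead of many row locks;
 -- NOLOCK reads the source without taking shared locks (dirty reads are possible)
 INSERT INTO TableName WITH (TABLOCK) (Field1, Field2, Field3)
 SELECT Field1, Field2, Field3
 FROM TableName2 WITH (NOLOCK);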

0
