Is bulk insert the best way to do this? Plus, help me fully understand what I have found so far

So, I saw this post here and read it, and it looks like bulk copy might be the way to go:

What is the best way to insert large amounts of data from C#?

I still have questions and want to know how everything works.

So, I found 2 tutorials.

http://www.codeproject.com/KB/cs/MultipleInsertsIn1dbTrip.aspx#_Toc196622241

http://www.codeproject.com/KB/linq/BulkOperations_LinqToSQL.aspx

The first tutorial uses two ADO.NET 2.0 techniques: BulkInsert and BulkCopy. The second uses LINQ to SQL and OpenXML.

The second one appeals to me, since I already use LINQ to SQL and prefer it over ADO.NET. However, as one person pointed out in the comments there, he simply bypassed LINQ to SQL because of its performance (nothing wrong with that in my opinion).

First, I will talk about the two approaches from the first tutorial.

I am using VS2010 Express (the tutorials were written with VS2008, and I don't know which .NET version they target; I just downloaded their sample files and ran them), .NET 4.0, MVC 2.0, and SQL Server 2005.

  • Is ADO.NET 2.0 the latest version?
  • Based on the technologies I'm using, are there newer versions of what I'm about to show that would somehow improve it?
  • Is there anything these tutorials don't mention that I should know about?

BulkInsert

I use this table for all examples.

    CREATE TABLE [dbo].[TBL_TEST_TEST]
    (
        ID INT IDENTITY(1,1) PRIMARY KEY,
        [NAME] [varchar](50)
    )

SP code

    USE [Test]
    GO
    /****** Object: StoredProcedure [dbo].[sp_BatchInsert] Script Date: 05/19/2010 15:12:47 ******/
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    ALTER PROCEDURE [dbo].[sp_BatchInsert] (@Name VARCHAR(50))
    AS
    BEGIN
        INSERT INTO TBL_TEST_TEST VALUES (@Name);
    END

C# code

    /// <summary>
    /// Another ado.net 2.0 way that uses a stored procedure to do a bulk insert.
    /// Seems slower than the "BatchBulkCopy" way and it crashes when you try to insert 500,000 records in one go.
    /// http://www.codeproject.com/KB/cs/MultipleInsertsIn1dbTrip.aspx#_Toc196622241
    /// </summary>
    private static void BatchInsert()
    {
        // Get the DataTable with Rows State as RowState.Added
        DataTable dtInsertRows = GetDataTable();

        SqlConnection connection = new SqlConnection(connectionString);
        SqlCommand command = new SqlCommand("sp_BatchInsert", connection);
        command.CommandType = CommandType.StoredProcedure;
        command.UpdatedRowSource = UpdateRowSource.None;

        // Set the Parameter with appropriate Source Column Name
        command.Parameters.Add("@Name", SqlDbType.VarChar, 50, dtInsertRows.Columns[0].ColumnName);

        SqlDataAdapter adpt = new SqlDataAdapter();
        adpt.InsertCommand = command;
        // Specify the number of records to be Inserted/Updated in one go. Default is 1.
        adpt.UpdateBatchSize = 1000;

        connection.Open();
        int recordsInserted = adpt.Update(dtInsertRows);
        connection.Close();
    }

So, first, the batch size. Why would you set the batch size to anything other than the number of records you are sending? For example, I am sending 500,000 records, so I set the batch size to 500,000.

Then, why do I get this exception when I do that? If I set the batch size to 1000, it works fine.

    System.Data.SqlClient.SqlException was unhandled
      Message="A transport-level error has occurred when sending the request to the server. (provider: Shared Memory Provider, error: 0 - No process is on the other end of the pipe.)"
      Source=".Net SqlClient Data Provider"
      ErrorCode=-2146232060
      Class=20
      LineNumber=0
      Number=233
      Server=""
      State=0
      StackTrace:
           at System.Data.Common.DbDataAdapter.UpdatedRowStatusErrors(RowUpdatedEventArgs rowUpdatedEvent, BatchCommandInfo[] batchCommands, Int32 commandCount)
           at System.Data.Common.DbDataAdapter.UpdatedRowStatus(RowUpdatedEventArgs rowUpdatedEvent, BatchCommandInfo[] batchCommands, Int32 commandCount)
           at System.Data.Common.DbDataAdapter.Update(DataRow[] dataRows, DataTableMapping tableMapping)
           at System.Data.Common.DbDataAdapter.UpdateFromDataTable(DataTable dataTable, DataTableMapping tableMapping)
           at System.Data.Common.DbDataAdapter.Update(DataTable dataTable)
           at TestIQueryable.Program.BatchInsert() in C:\Users\a\Downloads\TestIQueryable\TestIQueryable\TestIQueryable\Program.cs:line 124
           at TestIQueryable.Program.Main(String[] args) in C:\Users\a\Downloads\TestIQueryable\TestIQueryable\TestIQueryable\Program.cs:line 16
      InnerException:

Inserting 500,000 records with an insert batch size of 1000 took "2 minutes and 54 seconds".

Of course, this is not an official time; I sat there with a stopwatch (I'm sure there are better ways, but I was too lazy to look up what they were).

Either way, I find this method slow compared to all my others (except the LINQ to SQL insert-one-at-a-time way), and I'm not quite sure why.
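For timing, System.Diagnostics.Stopwatch is probably the easiest step up from a hand-held stopwatch. A minimal sketch (the BatchInsert stub here just stands in for the real method from this post):

    using System;
    using System.Diagnostics;

    class TimingExample
    {
        static void Main()
        {
            // Time one of the insert methods with Stopwatch instead of by hand.
            Stopwatch timer = Stopwatch.StartNew();
            BatchInsert();   // or BatchBulkCopy(), LinqInsertXMLBatch(), ...
            timer.Stop();
            Console.WriteLine("BatchInsert took {0}", timer.Elapsed);
        }

        // Stand-in for the real BatchInsert from this post.
        static void BatchInsert() { }
    }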

Then I looked at bulk copy.

    /// <summary>
    /// An ado.net 2.0 way to mass insert records. This seems to be the fastest.
    /// http://www.codeproject.com/KB/cs/MultipleInsertsIn1dbTrip.aspx#_Toc196622241
    /// </summary>
    private static void BatchBulkCopy()
    {
        // Get the DataTable
        DataTable dtInsertRows = GetDataTable();

        using (SqlBulkCopy sbc = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.KeepIdentity))
        {
            sbc.DestinationTableName = "TBL_TEST_TEST";

            // Number of records to be processed in one go
            sbc.BatchSize = 500000;

            // Map the Source Column from DataTable to the Destination Columns in SQL Server 2005 Person Table
            // sbc.ColumnMappings.Add("ID", "ID");
            sbc.ColumnMappings.Add("NAME", "NAME");

            // Number of records after which client has to be notified about its status
            sbc.NotifyAfter = dtInsertRows.Rows.Count;

            // Event that gets fired when NotifyAfter number of records are processed.
            sbc.SqlRowsCopied += new SqlRowsCopiedEventHandler(sbc_SqlRowsCopied);

            // Finally write to server
            sbc.WriteToServer(dtInsertRows);
            sbc.Close();
        }
    }

It seemed very fast and didn't even need a stored procedure (can you use a stored procedure with bulk copy? If you can, would it be better?).
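From what I understand, SqlBulkCopy always writes directly into a table, so a stored procedure cannot replace WriteToServer itself. A common workaround, if you want a procedure involved, is to bulk copy into a staging table and then call a procedure that validates or moves the rows. This is only a sketch; the staging table TBL_TEST_STAGING and the procedure sp_MergeStaging are made-up names:

    using System.Data;
    using System.Data.SqlClient;

    class BulkCopyThenProc
    {
        // Sketch: bulk copy into a staging table, then let a stored procedure
        // (a hypothetical sp_MergeStaging) move the rows into the real table.
        static void BulkCopyViaStaging(string connectionString, DataTable rows)
        {
            using (SqlConnection connection = new SqlConnection(connectionString))
            {
                connection.Open();

                using (SqlBulkCopy sbc = new SqlBulkCopy(connection))
                {
                    sbc.DestinationTableName = "TBL_TEST_STAGING"; // hypothetical staging table
                    sbc.ColumnMappings.Add("NAME", "NAME");
                    sbc.WriteToServer(rows);
                }

                // The procedure would do something like:
                // INSERT INTO TBL_TEST_TEST (NAME) SELECT NAME FROM TBL_TEST_STAGING
                using (SqlCommand merge = new SqlCommand("sp_MergeStaging", connection))
                {
                    merge.CommandType = CommandType.StoredProcedure;
                    merge.ExecuteNonQuery();
                }
            }
        }
    }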

BulkCopy had no problems with a batch size of 500,000. Again, why would you make the batch size less than the number of records you want to send?

I found that with BulkCopy and a batch size of 500,000, it took 5 seconds to complete. Then I tried a batch size of 1000, and it took 8 seconds.

So it is much faster than the BulkInsert above.

Then I tried the other tutorial.

    USE [Test]
    GO
    /****** Object: StoredProcedure [dbo].[spTEST_InsertXMLTEST_TEST] Script Date: 05/19/2010 15:39:03 ******/
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    ALTER PROCEDURE [dbo].[spTEST_InsertXMLTEST_TEST](@UpdatedProdData nText)
    AS
    DECLARE @hDoc int

    exec sp_xml_preparedocument @hDoc OUTPUT, @UpdatedProdData

    INSERT INTO TBL_TEST_TEST(NAME)
    SELECT XMLProdTable.NAME
        FROM OPENXML(@hDoc, 'ArrayOfTBL_TEST_TEST/TBL_TEST_TEST', 2)
            WITH (
                ID Int,
                NAME varchar(100)
            ) XMLProdTable

    EXEC sp_xml_removedocument @hDoc

C# code

    /// <summary>
    /// This is using linq to sql to make the table objects.
    /// It is then serialized to an xml document and sent to a stored procedure
    /// that then does a bulk insert (I think with OpenXML).
    /// http://www.codeproject.com/KB/linq/BulkOperations_LinqToSQL.aspx
    /// </summary>
    private static void LinqInsertXMLBatch()
    {
        using (TestDataContext db = new TestDataContext())
        {
            TBL_TEST_TEST[] testRecords = new TBL_TEST_TEST[500000];
            for (int count = 0; count < 500000; count++)
            {
                TBL_TEST_TEST testRecord = new TBL_TEST_TEST();
                testRecord.NAME = "Name : " + count;
                testRecords[count] = testRecord;
            }

            StringBuilder sBuilder = new StringBuilder();
            System.IO.StringWriter sWriter = new System.IO.StringWriter(sBuilder);
            XmlSerializer serializer = new XmlSerializer(typeof(TBL_TEST_TEST[]));
            serializer.Serialize(sWriter, testRecords);
            db.insertTestData(sBuilder.ToString());
        }
    }

So I like this one because I get to work with objects, even if it's a little redundant. What I don't understand is how the stored procedure works; I really don't get it at all. I don't know whether OPENXML does some kind of batch insert under the hood, and I don't even know how to take this stored procedure example and modify it to fit my own tables, since, as I said, I don't know what is going on in it.
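As far as I can tell, the stored procedure works like this: sp_xml_preparedocument parses the XML string into a document handle, OPENXML then exposes every node matched by the path 'ArrayOfTBL_TEST_TEST/TBL_TEST_TEST' as a row whose columns are defined by the WITH clause, and the INSERT ... SELECT writes all of those rows into the table in one statement. Here is a small sketch that just prints the XML the serializer produces, so you can see where those element names come from (the TBL_TEST_TEST class below is only a stand-in for the LINQ to SQL entity):

    using System;
    using System.IO;
    using System.Xml.Serialization;

    public class TBL_TEST_TEST   // stand-in for the LINQ to SQL entity (assumption)
    {
        public int ID { get; set; }
        public string NAME { get; set; }
    }

    class ShowXmlShape
    {
        static void Main()
        {
            TBL_TEST_TEST[] records =
            {
                new TBL_TEST_TEST { NAME = "Name : 0" },
                new TBL_TEST_TEST { NAME = "Name : 1" }
            };

            XmlSerializer serializer = new XmlSerializer(typeof(TBL_TEST_TEST[]));
            StringWriter writer = new StringWriter();
            serializer.Serialize(writer, records);

            // Prints something like:
            // <ArrayOfTBL_TEST_TEST> <TBL_TEST_TEST> <ID>0</ID> <NAME>Name : 0</NAME> </TBL_TEST_TEST> ...
            // which is exactly the path that OPENXML walks in the stored procedure.
            Console.WriteLine(writer.ToString());
        }
    }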

I also don't know what happens when an object spans more than one table. For example, suppose I have a ProductName table that is related to a Product table, or something like that.

In LINQ to SQL, you can get the ProductName object and make changes to the related Product table through that same object, so I am not sure how to account for that. I don't know whether I would have to do separate bulk inserts or what.
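I'm not sure either tutorial covers related tables, but the usual pattern with SqlBulkCopy is one bulk copy per table in dependency order: load the parent table first, then the child table with the foreign keys filled in. A rough sketch with made-up Product and ProductName tables (getting the generated parent IDs back onto the child rows is the fiddly part and is only hinted at here):

    using System.Data;
    using System.Data.SqlClient;

    class ParentChildBulkCopy
    {
        // Sketch only: Product and ProductName are hypothetical tables; real code
        // needs a reliable way to get the parent keys onto the child rows
        // (e.g. a natural key to join on, or pre-assigned IDs with KeepIdentity).
        static void InsertProductsWithNames(string connectionString, DataTable products, DataTable productNames)
        {
            using (SqlConnection connection = new SqlConnection(connectionString))
            {
                connection.Open();

                // 1) Bulk insert the parent rows first.
                using (SqlBulkCopy sbc = new SqlBulkCopy(connection))
                {
                    sbc.DestinationTableName = "Product";
                    sbc.WriteToServer(products);
                }

                // 2) Fill in the ProductID foreign key on each child row
                //    (however you choose to map it), then bulk insert the children.
                using (SqlBulkCopy sbc = new SqlBulkCopy(connection))
                {
                    sbc.DestinationTableName = "ProductName";
                    sbc.WriteToServer(productNames);
                }
            }
        }
    }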

The time was pretty good: 500,000 records took 52 seconds.

The last way, of course, is to simply use LINQ to SQL to do it all, and that was very bad.

    /// <summary>
    /// This is using linq to sql to insert lots of records.
    /// This way is slow as it uses no mass insert.
    /// Only tried to insert 50,000 records as I did not want to sit around till it did 500,000 records.
    /// http://www.codeproject.com/KB/linq/BulkOperations_LinqToSQL.aspx
    /// </summary>
    private static void LinqInsertAll()
    {
        using (TestDataContext db = new TestDataContext())
        {
            db.CommandTimeout = 600;

            for (int count = 0; count < 50000; count++)
            {
                TBL_TEST_TEST testRecord = new TBL_TEST_TEST();
                testRecord.NAME = "Name : " + count;
                db.TBL_TEST_TESTs.InsertOnSubmit(testRecord);
            }
            db.SubmitChanges();
        }
    }

I did just 50,000 records and it took a minute.

So, I have really narrowed it down to the LINQ to SQL bulk insert (OpenXML) or bulk copy. I'm just not sure how to do either when you have relationships. I'm also not sure how they hold up when doing updates instead of inserts, as I have not gotten around to trying that yet.

I don't think I will ever need to insert/update more than 50,000 records of one type, but at the same time I know I will have to do validation on the records before inserting them, which will slow things down, and that makes LINQ to SQL nicer since you have objects, especially if you are first parsing the data from an XML file before inserting it into the database.
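If the records need validating first, one option (just a sketch, with a made-up length check and a stand-in for the LINQ to SQL entity) is to validate the objects in memory and then copy only the valid ones into a DataTable for SqlBulkCopy:

    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;

    public class TBL_TEST_TEST   // stand-in for the LINQ to SQL entity (assumption)
    {
        public int ID { get; set; }
        public string NAME { get; set; }
    }

    class ValidateThenBulkCopy
    {
        // Sketch: validate the entities as objects, then push only the valid
        // ones through SqlBulkCopy via a DataTable.
        static void InsertValidated(string connectionString, IEnumerable<TBL_TEST_TEST> records)
        {
            DataTable table = new DataTable();
            table.Columns.Add("NAME", typeof(string));

            foreach (TBL_TEST_TEST record in records)
            {
                // Hypothetical validation rule; replace with whatever checks you need.
                if (!string.IsNullOrEmpty(record.NAME) && record.NAME.Length <= 50)
                {
                    table.Rows.Add(record.NAME);
                }
            }

            using (SqlBulkCopy sbc = new SqlBulkCopy(connectionString))
            {
                sbc.DestinationTableName = "TBL_TEST_TEST";
                sbc.ColumnMappings.Add("NAME", "NAME");
                sbc.WriteToServer(table);
            }
        }
    }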

Full C# code

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Xml.Serialization;
    using System.Data;
    using System.Data.SqlClient;

    namespace TestIQueryable
    {
        class Program
        {
            private static string connectionString = "";

            static void Main(string[] args)
            {
                BatchInsert();
                Console.WriteLine("done");
            }

            /// <summary>
            /// This is using linq to sql to insert lots of records.
            /// This way is slow as it uses no mass insert.
            /// Only tried to insert 50,000 records as I did not want to sit around till it did 500,000 records.
            /// http://www.codeproject.com/KB/linq/BulkOperations_LinqToSQL.aspx
            /// </summary>
            private static void LinqInsertAll()
            {
                using (TestDataContext db = new TestDataContext())
                {
                    db.CommandTimeout = 600;

                    for (int count = 0; count < 50000; count++)
                    {
                        TBL_TEST_TEST testRecord = new TBL_TEST_TEST();
                        testRecord.NAME = "Name : " + count;
                        db.TBL_TEST_TESTs.InsertOnSubmit(testRecord);
                    }
                    db.SubmitChanges();
                }
            }

            /// <summary>
            /// This is using linq to sql to make the table objects.
            /// It is then serialized to an xml document and sent to a stored procedure
            /// that then does a bulk insert (I think with OpenXML).
            /// http://www.codeproject.com/KB/linq/BulkOperations_LinqToSQL.aspx
            /// </summary>
            private static void LinqInsertXMLBatch()
            {
                using (TestDataContext db = new TestDataContext())
                {
                    TBL_TEST_TEST[] testRecords = new TBL_TEST_TEST[500000];
                    for (int count = 0; count < 500000; count++)
                    {
                        TBL_TEST_TEST testRecord = new TBL_TEST_TEST();
                        testRecord.NAME = "Name : " + count;
                        testRecords[count] = testRecord;
                    }

                    StringBuilder sBuilder = new StringBuilder();
                    System.IO.StringWriter sWriter = new System.IO.StringWriter(sBuilder);
                    XmlSerializer serializer = new XmlSerializer(typeof(TBL_TEST_TEST[]));
                    serializer.Serialize(sWriter, testRecords);
                    db.insertTestData(sBuilder.ToString());
                }
            }

            /// <summary>
            /// An ado.net 2.0 way to mass insert records. This seems to be the fastest.
            /// http://www.codeproject.com/KB/cs/MultipleInsertsIn1dbTrip.aspx#_Toc196622241
            /// </summary>
            private static void BatchBulkCopy()
            {
                // Get the DataTable
                DataTable dtInsertRows = GetDataTable();

                using (SqlBulkCopy sbc = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.KeepIdentity))
                {
                    sbc.DestinationTableName = "TBL_TEST_TEST";

                    // Number of records to be processed in one go
                    sbc.BatchSize = 500000;

                    // Map the Source Column from DataTable to the Destination Columns in SQL Server 2005 Person Table
                    // sbc.ColumnMappings.Add("ID", "ID");
                    sbc.ColumnMappings.Add("NAME", "NAME");

                    // Number of records after which client has to be notified about its status
                    sbc.NotifyAfter = dtInsertRows.Rows.Count;

                    // Event that gets fired when NotifyAfter number of records are processed.
                    sbc.SqlRowsCopied += new SqlRowsCopiedEventHandler(sbc_SqlRowsCopied);

                    // Finally write to server
                    sbc.WriteToServer(dtInsertRows);
                    sbc.Close();
                }
            }

            /// <summary>
            /// Another ado.net 2.0 way that uses a stored procedure to do a bulk insert.
            /// Seems slower than the "BatchBulkCopy" way and it crashes when you try to insert 500,000 records in one go.
            /// http://www.codeproject.com/KB/cs/MultipleInsertsIn1dbTrip.aspx#_Toc196622241
            /// </summary>
            private static void BatchInsert()
            {
                // Get the DataTable with Rows State as RowState.Added
                DataTable dtInsertRows = GetDataTable();

                SqlConnection connection = new SqlConnection(connectionString);
                SqlCommand command = new SqlCommand("sp_BatchInsert", connection);
                command.CommandType = CommandType.StoredProcedure;
                command.UpdatedRowSource = UpdateRowSource.None;

                // Set the Parameter with appropriate Source Column Name
                command.Parameters.Add("@Name", SqlDbType.VarChar, 50, dtInsertRows.Columns[0].ColumnName);

                SqlDataAdapter adpt = new SqlDataAdapter();
                adpt.InsertCommand = command;
                // Specify the number of records to be Inserted/Updated in one go. Default is 1.
                adpt.UpdateBatchSize = 500000;

                connection.Open();
                int recordsInserted = adpt.Update(dtInsertRows);
                connection.Close();
            }

            private static DataTable GetDataTable()
            {
                // You first need a DataTable that has all the insert values in it
                DataTable dtInsertRows = new DataTable();
                dtInsertRows.Columns.Add("NAME");

                for (int i = 0; i < 500000; i++)
                {
                    DataRow drInsertRow = dtInsertRows.NewRow();
                    string name = "Name : " + i;
                    drInsertRow["NAME"] = name;
                    dtInsertRows.Rows.Add(drInsertRow);
                }

                return dtInsertRows;
            }

            static void sbc_SqlRowsCopied(object sender, SqlRowsCopiedEventArgs e)
            {
                Console.WriteLine("Number of records affected : " + e.RowsCopied.ToString());
            }
        }
    }
3 answers

Batch size exists to reduce the impact of network latency. It does not need to be more than a few thousand. Multiple statements are grouped together and sent as one unit, so you take the hit of one network round trip per N statements, rather than once per statement.
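In terms of the code in the question, that just means leaving the batch size at a few thousand rather than setting it to the full row count (the same idea applies to SqlDataAdapter.UpdateBatchSize); a small sketch with illustrative values only:

    using System.Data;
    using System.Data.SqlClient;

    class BatchSizeExample
    {
        // Sketch: keep the batch at a few thousand rows, not the whole 500,000.
        static void BulkCopyInModestBatches(string connectionString, DataTable rows)
        {
            using (SqlBulkCopy sbc = new SqlBulkCopy(connectionString))
            {
                sbc.DestinationTableName = "TBL_TEST_TEST";
                sbc.ColumnMappings.Add("NAME", "NAME");
                sbc.BatchSize = 5000;   // a few thousand rows per round trip (illustrative value)
                sbc.WriteToServer(rows);
            }
        }
    }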


Is this a one-off bulk copy, or one you run regularly?

If it is a one-off, or only runs once a day for example, use BCP; it is much faster because it uses a special API that is faster than ADO.NET.


Oh, I'm glad we're not the only ones suffering from this InsertOnSubmit() problem.

Our company recently moved data centers, which left our SQL Server machines 6,000 miles away rather than in the same country.

And suddenly, saving a batch of 1800 records took 3.5 minutes, not 3-4 seconds. Our users were not happy!

The solution was to replace the InsertOnSubmit calls with bulk insert code.

Read the section "Insert Records Using Bulk Insert" on the page below. It shows the (very few) changes you actually need to make to your code to get around this latency problem.

You just need to add three lines of code and use a couple of C# classes that are provided on that page.

http://mikesknowledgebase.com/pages/LINQ/InsertAndDeletes.htm
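I can't reproduce that page's helper classes here, but the general idea behind such helpers (a sketch of the pattern only, not that library's actual API) is to turn the pending entities into a DataTable and send them in one SqlBulkCopy call instead of issuing one INSERT per InsertOnSubmit:

    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;

    class BulkInsertInsteadOfInsertOnSubmit
    {
        // General idea only: map each pending record to a DataTable row, then do
        // one SqlBulkCopy instead of thousands of individual INSERTs over the wire.
        static void BulkInsertNames(string connectionString, IEnumerable<string> names)
        {
            DataTable table = new DataTable();
            table.Columns.Add("NAME", typeof(string));

            foreach (string name in names)
            {
                table.Rows.Add(name);
            }

            using (SqlBulkCopy sbc = new SqlBulkCopy(connectionString))
            {
                sbc.DestinationTableName = "TBL_TEST_TEST";
                sbc.ColumnMappings.Add("NAME", "NAME");
                sbc.WriteToServer(table);
            }
        }
    }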

