So, I saw this post here and read it, and it looks like bulk copy might be the way to go:
What is the best way to do bulk inserts into a database from C#?
I still have some questions, though, and want to understand how everything actually works.
So, I found two tutorials:
http://www.codeproject.com/KB/cs/MultipleInsertsIn1dbTrip.aspx#_Toc196622241
http://www.codeproject.com/KB/linq/BulkOperations_LinqToSQL.aspx
The first uses two ADO.NET 2.0 techniques, which it calls BulkInsert and BulkCopy. The second uses LINQ to SQL and OpenXML.
The second one appeals to me, since I already use LINQ to SQL and prefer it over plain ADO.NET. However, as one person pointed out in the comments, the author is really just working around LINQ to SQL's performance problem (nothing wrong with that in my opinion).
First, I will go through the two methods from the first tutorial.
I am using VS2010 Express (to test the tutorials I used VS2008; I don't know which .NET version, since I just downloaded the sample files and ran them), .NET 4.0, MVC 2.0, and SQL Server 2005.
- Is ADO.NET 2.0 the most recent version?
- Based on the technologies I'm using, are there newer alternatives that would improve on what I'm about to show?
- Is there anything these tutorials don't mention that I should know about?
BulkInsert
I am using this table for all the examples.
CREATE TABLE [dbo].[TBL_TEST_TEST]
(
    ID INT IDENTITY(1,1) PRIMARY KEY,
    [NAME] [varchar](50)
)
SP code
USE [Test]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[sp_BatchInsert] (@Name VARCHAR(50))
AS
BEGIN
    INSERT INTO TBL_TEST_TEST VALUES (@Name);
END
C# code:
/// <summary>
/// Another ADO.NET 2.0 way that uses a stored procedure to do a bulk insert.
/// Seems slower than the "BatchBulkCopy" way, and it crashes when you try to insert 500,000 records in one go.
/// http://www.codeproject.com/KB/cs/MultipleInsertsIn1dbTrip.aspx
/// </summary>
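(The method body isn't shown above, so here is a minimal sketch of what I understand the tutorial's BatchInsert to be doing: fill a DataTable, point a SqlDataAdapter's InsertCommand at sp_BatchInsert, and set UpdateBatchSize. This is my reconstruction, not the tutorial's exact code; it needs System.Data and System.Data.SqlClient, and connectionString is the placeholder from my Program class.)

```csharp
// Sketch of the batched stored-procedure insert (my reconstruction).
// Assumes the sp_BatchInsert procedure and TBL_TEST_TEST table from above.
private static void BatchInsert()
{
    // Build all the rows to insert in memory first.
    DataTable table = new DataTable();
    table.Columns.Add("NAME", typeof(string));
    for (int count = 0; count < 500000; count++)
    {
        table.Rows.Add("Name : " + count);
    }

    using (SqlConnection connection = new SqlConnection(connectionString))
    {
        SqlCommand insert = new SqlCommand("sp_BatchInsert", connection);
        insert.CommandType = CommandType.StoredProcedure;
        // Map the @Name parameter to the NAME column of the DataTable.
        insert.Parameters.Add("@Name", SqlDbType.VarChar, 50, "NAME");
        // Required for batching: don't try to read results back per row.
        insert.UpdatedRowSource = UpdateRowSource.None;

        using (SqlDataAdapter adapter = new SqlDataAdapter())
        {
            adapter.InsertCommand = insert;
            adapter.UpdateBatchSize = 1000; // the batch size I keep asking about
            adapter.Update(table);          // sends the inserts in batches
        }
    }
}
```

(This won't run without a live SQL Server and a real connection string, so treat it as illustration only.)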
So first, the batch size. Why would you set the batch size to anything other than the number of records you are sending? For example, I'm sending 500,000 records, so I set the batch size to 500,000.
Then, why does the following happen when I do that? If I set the batch size to 1,000, it works fine.
System.Data.SqlClient.SqlException was unhandled
  Message="A transport-level error has occurred when sending the request to the server. (provider: Shared Memory Provider, error: 0 - No process is on the other end of the pipe.)"
  Source=".Net SqlClient Data Provider"
  ErrorCode=-2146232060
  Class=20
  LineNumber=0
  Number=233
  Server=""
  State=0
  StackTrace:
       at System.Data.Common.DbDataAdapter.UpdatedRowStatusErrors(RowUpdatedEventArgs rowUpdatedEvent, BatchCommandInfo[] batchCommands, Int32 commandCount)
       at System.Data.Common.DbDataAdapter.UpdatedRowStatus(RowUpdatedEventArgs rowUpdatedEvent, BatchCommandInfo[] batchCommands, Int32 commandCount)
       at System.Data.Common.DbDataAdapter.Update(DataRow[] dataRows, DataTableMapping tableMapping)
       at System.Data.Common.DbDataAdapter.UpdateFromDataTable(DataTable dataTable, DataTableMapping tableMapping)
       at System.Data.Common.DbDataAdapter.Update(DataTable dataTable)
       at TestIQueryable.Program.BatchInsert() in C:\Users\a\Downloads\TestIQueryable\TestIQueryable\TestIQueryable\Program.cs:line 124
       at TestIQueryable.Program.Main(String[] args) in C:\Users\a\Downloads\TestIQueryable\TestIQueryable\TestIQueryable\Program.cs:line 16
  InnerException:
Inserting 500,000 records with a batch size of 1,000 took 2 minutes and 54 seconds.
Of course, that's not official timing; I sat there with a stopwatch (I'm sure there are better ways, but I was too lazy to look up what they were).
So I find this method slow compared to all my others (except the LINQ to SQL one-by-one insert), and I'm not quite sure why.
Then I looked at bulk copy.
/// <summary>
/// An ADO.NET 2.0 way to mass insert records. This seems to be the fastest.
/// http://www.codeproject.com/KB/cs/MultipleInsertsIn1dbTrip.aspx
/// </summary>
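(Again the method body isn't shown, so here's a minimal sketch, assuming "BulkCopy" means SqlBulkCopy; the table and column names match my test table, and the rest is my guess at the tutorial's code. Needs System.Data and System.Data.SqlClient.)

```csharp
// Sketch of the SqlBulkCopy approach (my reconstruction, not the tutorial's
// exact code). SqlBulkCopy streams the rows; no stored procedure is needed.
private static void BatchBulkCopy()
{
    DataTable table = new DataTable();
    table.Columns.Add("NAME", typeof(string));
    for (int count = 0; count < 500000; count++)
    {
        table.Rows.Add("Name : " + count);
    }

    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connectionString))
    {
        bulkCopy.DestinationTableName = "TBL_TEST_TEST";
        bulkCopy.BatchSize = 500000; // one batch for everything
        // Map only NAME; ID is an IDENTITY column, so the server fills it in.
        bulkCopy.ColumnMappings.Add("NAME", "NAME");
        bulkCopy.WriteToServer(table);
    }
}
```

(As above, this is illustration only; it needs a live SQL Server to actually run.)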
It seemed very fast and didn't even need an SP (can you use an SP with bulk copy? If you can, would it be better?).
BatchBulkCopy had no problems with a batch size of 500,000. So again, why make the batch size smaller than the number of records you want to send?
I found that with BatchBulkCopy and a batch size of 500,000, it took 5 seconds to complete. I then tried a batch size of 1,000, and it took 8 seconds.
So much faster than the BulkInsert method above.
Now I tried the other tutorial.
USE [Test]
GO
/****** Object: StoredProcedure [dbo].[spTEST_InsertXMLTEST_TEST] Script Date: 05/19/2010 15:39:03 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[spTEST_InsertXMLTEST_TEST](@UpdatedProdData nText)
AS
DECLARE @hDoc int

EXEC sp_xml_preparedocument @hDoc OUTPUT, @UpdatedProdData

INSERT INTO TBL_TEST_TEST(NAME)
SELECT XMLProdTable.NAME
FROM OPENXML(@hDoc, 'ArrayOfTBL_TEST_TEST/TBL_TEST_TEST', 2)
WITH (
    ID Int,
    NAME varchar(100)
) XMLProdTable

EXEC sp_xml_removedocument @hDoc
C# code:
/// <summary>
/// This is using LINQ to SQL to make the table objects.
/// They are then serialized to an XML document and sent to a stored procedure
/// that then does a bulk insert (I think with OPENXML).
/// http://www.codeproject.com/KB/linq/BulkOperations_LinqToSQL.aspx
/// </summary>
private static void LinqInsertXMLBatch()
{
    using (TestDataContext db = new TestDataContext())
    {
        TBL_TEST_TEST[] testRecords = new TBL_TEST_TEST[500000];
        for (int count = 0; count < 500000; count++)
        {
            TBL_TEST_TEST testRecord = new TBL_TEST_TEST();
            testRecord.NAME = "Name : " + count;
            testRecords[count] = testRecord;
        }

        StringBuilder sBuilder = new StringBuilder();
        System.IO.StringWriter sWriter = new System.IO.StringWriter(sBuilder);
        XmlSerializer serializer = new XmlSerializer(typeof(TBL_TEST_TEST[]));
        serializer.Serialize(sWriter, testRecords);
        db.insertTestData(sBuilder.ToString());
    }
}
So I like this because I get to use objects, even if it's a little redundant. What I don't understand is how the SP works; I just don't follow it at all. I don't know whether OPENXML does a batched insert under the hood, and I don't even know how to take this example SP and adapt it to my own tables, since, as I said, I don't know what's going on in it.
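If I understand XmlSerializer correctly, the string sent to the SP should look roughly like this (trimmed to two records, and the serializer also adds xsi/xsd namespace attributes on the root). The 'ArrayOfTBL_TEST_TEST/TBL_TEST_TEST' path in the OPENXML call matches these elements, and the WITH clause pulls out the child ID and NAME elements as columns, which the SELECT then inserts:

```xml
<ArrayOfTBL_TEST_TEST xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                      xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <TBL_TEST_TEST>
    <ID>0</ID>
    <NAME>Name : 0</NAME>
  </TBL_TEST_TEST>
  <TBL_TEST_TEST>
    <ID>0</ID>
    <NAME>Name : 1</NAME>
  </TBL_TEST_TEST>
</ArrayOfTBL_TEST_TEST>
```

(So, as far as I can tell, the whole set of records goes over as one XML document, and the INSERT ... SELECT does a single set-based insert rather than one insert per row.)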
I also don't know what happens when an object spans more than one table. For example, say I have a ProductName table related to a Product table.
In LINQ to SQL, you can get the ProductName object and make changes to the related Product through that same object, so I'm not sure how to account for that here. I don't know whether I'd have to do separate inserts, or what.
The time was pretty good: 500,000 records took 52 seconds.
The last way, of course, is simply to use LINQ to SQL for the whole thing, and that was very bad.
/// <summary>
/// This is using LINQ to SQL to insert lots of records.
/// This way is slow, as it does no mass insert.
/// Only tried to insert 50,000 records, as I did not want to sit around until it did 500,000.
/// http://www.codeproject.com/KB/linq/BulkOperations_LinqToSQL.aspx
/// </summary>
private static void LinqInsertAll()
{
    using (TestDataContext db = new TestDataContext())
    {
        db.CommandTimeout = 600;
        for (int count = 0; count < 50000; count++)
        {
            TBL_TEST_TEST testRecord = new TBL_TEST_TEST();
            testRecord.NAME = "Name : " + count;
            db.TBL_TEST_TESTs.InsertOnSubmit(testRecord);
        }
        db.SubmitChanges();
    }
}
I only did 50,000 records, and it took a minute.
So I've really narrowed it down to the LINQ to SQL XML bulk insert or bulk copy. I'm just not sure how to do either one when you have relationships between tables. I'm also not sure how they stack up when doing updates instead of inserts, since I haven't gotten around to trying that yet.
I don't think I will ever need to insert/update more than 50,000 records at a time, but at the same time I know I will have to do validation on records before inserting, which will slow things down, and that makes LINQ to SQL nicer since you have objects, especially if you're first parsing data from an XML file before inserting it into the database.
Full C# code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Serialization;
using System.Data;
using System.Data.SqlClient;

namespace TestIQueryable
{
    class Program
    {
        private static string connectionString = "";

        static void Main(string[] args)
        {
            BatchInsert();
            Console.WriteLine("done");
        }