C# Excel Interop slow when looping through cells

Question

I am trying to extract all the text data from an Excel document in C# and I am having performance issues. In the following code, I open the workbook, loop over all the worksheets, and loop over all the cells in the used range, extracting the text from each cell as I go. The problem is that it takes 14 seconds to complete.

    public class ExcelFile
    {
        public string Path = @"C:\test.xlsx";
        private Excel.Application xl = new Excel.Application();
        private Excel.Workbook WB;
        public string FullText;
        private Excel.Range rng;
        private Dictionary<string, string> Variables;

        public ExcelFile()
        {
            WB = xl.Workbooks.Open(Path);
            xl.Visible = true;
            foreach (Excel.Worksheet CurrentWS in WB.Worksheets)
            {
                rng = CurrentWS.UsedRange;
                for (int i = 1; i < rng.Count; i++)
                {
                    FullText += rng.Cells[i].Value;
                }
            }
            WB.Close(false);
            xl.Quit();
        }
    }

In VBA, by contrast, I would do something like this, which takes about 1 second:

    Sub run()
        Dim strText As String
        For Each ws In ActiveWorkbook.Sheets
            For Each c In ws.UsedRange
                strText = strText & c.Text
            Next c
        Next ws
    End Sub

Or, even faster (less than 1 second):

    Sub RunFast()
        Dim strText As String
        Dim varCells As Variant
        For Each ws In ActiveWorkbook.Sheets
            varCells = ws.UsedRange
            For i = 1 To UBound(varCells, 1)
                For j = 1 To UBound(varCells, 2)
                    strText = strText & CStr(varCells(i, j))
                Next j
            Next i
        Next ws
    End Sub

Perhaps something is happening in the for loop in C# that I am not aware of? Is it possible to load a range into an array-type object (as in my last example) so that I can iterate over just the values rather than the cell objects?

+7

performance c# excel office-interop

pwwolff Mar 04 '17 at 23:29

4 answers

I use this function. The loops only convert the result to an array starting at index 0; the main work is done in object[,] tmp = range.Value.

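    // row and col are zero-based; Worksheet is the Excel.Worksheet being read.
    public object[,] GetTable(int row, int col, int width, int height)
    {
        object[,] arr = new object[height, width];

        // Excel cell coordinates are 1-based, hence the + 1 on the top-left corner.
        Range c1 = (Range)Worksheet.Cells[row + 1, col + 1];
        Range c2 = (Range)Worksheet.Cells[row + height, col + width];
        Range range = Worksheet.get_Range(c1, c2);

        // One interop call fetches the whole block of values.
        object[,] tmp = range.Value;

        // Copy into a zero-based array, offsetting by the source's lower bounds.
        for (int i = 0; i < height; ++i)
        {
            for (int j = 0; j < width; ++j)
            {
                arr[i, j] = tmp[i + tmp.GetLowerBound(0), j + tmp.GetLowerBound(1)];
            }
        }
        return arr;
    }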
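To see why the copy loop offsets by GetLowerBound, here is a minimal sketch that needs no Excel at all. It assumes, as the answer's code implies, that the array interop returns for a multi-cell range starts at index 1 in both dimensions, and it fakes such an array with Array.CreateInstance. The class and method names (OneBasedArrayDemo, MakeOneBased, ToZeroBased) are made up for illustration.

```csharp
using System;

public static class OneBasedArrayDemo
{
    // Builds a 2-D object array whose indices start at 1 in both dimensions,
    // standing in for what a multi-cell range read is assumed to return.
    public static object[,] MakeOneBased(int rows, int cols)
    {
        var arr = (object[,])Array.CreateInstance(
            typeof(object), new[] { rows, cols }, new[] { 1, 1 });
        for (int r = 1; r <= rows; r++)
            for (int c = 1; c <= cols; c++)
                arr[r, c] = "R" + r + "C" + c;
        return arr;
    }

    // Copies any 2-D array into a zero-based one, whatever its lower bounds are.
    public static object[,] ToZeroBased(object[,] source)
    {
        int rowBase = source.GetLowerBound(0);
        int colBase = source.GetLowerBound(1);
        int rows = source.GetLength(0);
        int cols = source.GetLength(1);

        var result = new object[rows, cols];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                result[r, c] = source[r + rowBase, c + colBase];
        return result;
    }
}
```

Reading the bounds from the array instead of hard-coding 1 keeps the copy correct even if the source happens to be zero-based.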

+3

Antonín Lejsek Mar 05 '17 at 3:46

I sympathize with you, pwwolff. Looping through Excel cells can be expensive. Antonín and Max are both correct, but John Wu's answer sums it up nicely. Using a StringBuilder may speed things up, and building an object array from the used range is, IMHO, about as fast as you are going to get with interop. I understand there are third-party libraries that may perform better. With interop, looping through each cell will take an unacceptable amount of time if the file is large.

In the tests below, I used a workbook with a single sheet whose used range has 11 columns and 100 rows of data. Using the object array implementation, this took a little over a second. With 735 rows, it took about 40 seconds.

I put 3 buttons on a form with a multi-line text box. The first button uses your posted code. The second button takes the range lookups out of the loops. The third button uses the object array approach. Each one is a significant performance improvement over the previous one. I used a text box on the form to output the data; you can use a string as you do, but a StringBuilder would be better if you must have one big string.

Again, if the files are large, you may consider a different implementation. Hope this helps.

    // Requires: using System.Diagnostics; using System.Runtime.InteropServices;
    //           using Excel = Microsoft.Office.Interop.Excel;

    private void button1_Click(object sender, EventArgs e)
    {
        Stopwatch sw = new Stopwatch();
        MessageBox.Show("Start DoExcel...");
        sw.Start();
        DoExcel();
        sw.Stop();
        MessageBox.Show("End DoExcel...Took: " + sw.Elapsed.Seconds + " seconds and " + sw.Elapsed.Milliseconds + " Milliseconds");
    }

    private void button2_Click(object sender, EventArgs e)
    {
        MessageBox.Show("Start DoExcel2...");
        Stopwatch sw = new Stopwatch();
        sw.Start();
        DoExcel2();
        sw.Stop();
        MessageBox.Show("End DoExcel2...Took: " + sw.Elapsed.Seconds + " seconds and " + sw.Elapsed.Milliseconds + " Milliseconds");
    }

    private void button3_Click(object sender, EventArgs e)
    {
        MessageBox.Show("Start DoExcel3...");
        Stopwatch sw = new Stopwatch();
        sw.Start();
        DoExcel3();
        sw.Stop();
        MessageBox.Show("End DoExcel3...Took: " + sw.Elapsed.Seconds + " seconds and " + sw.Elapsed.Milliseconds + " Milliseconds");
    }

    // object[,] array implementation
    private void DoExcel3()
    {
        textBox1.Text = "";
        string Path = @"D:\Test\Book1 - Copy.xls";
        Excel.Application xl = new Excel.Application();
        Excel.Workbook WB;
        Excel.Range rng;
        WB = xl.Workbooks.Open(Path);
        xl.Visible = true;
        int totalRows = 0;
        int totalCols = 0;
        foreach (Excel.Worksheet CurrentWS in WB.Worksheets)
        {
            rng = CurrentWS.UsedRange;
            totalCols = rng.Columns.Count;
            totalRows = rng.Rows.Count;
            // One interop call fetches every value in the used range.
            object[,] objectArray = (object[,])rng.Cells.Value;
            // The array is 1-based, so <= is needed to include the last row and column.
            for (int row = 1; row <= totalRows; row++)
            {
                for (int col = 1; col <= totalCols; col++)
                {
                    if (objectArray[row, col] != null)
                        textBox1.Text += objectArray[row, col].ToString();
                }
                textBox1.Text += Environment.NewLine;
            }
        }
        WB.Close(false);
        xl.Quit();
        Marshal.ReleaseComObject(WB);
        Marshal.ReleaseComObject(xl);
    }

    // Range taken out of loops
    private void DoExcel2()
    {
        textBox1.Text = "";
        string Path = @"D:\Test\Book1 - Copy.xls";
        Excel.Application xl = new Excel.Application();
        Excel.Workbook WB;
        Excel.Range rng;
        WB = xl.Workbooks.Open(Path);
        xl.Visible = true;
        int totalRows = 0;
        int totalCols = 0;
        foreach (Excel.Worksheet CurrentWS in WB.Worksheets)
        {
            rng = CurrentWS.UsedRange;
            totalCols = rng.Columns.Count;
            totalRows = rng.Rows.Count;
            // Cell indices are 1-based, so <= is needed to include the last row and column.
            for (int row = 1; row <= totalRows; row++)
            {
                for (int col = 1; col <= totalCols; col++)
                {
                    textBox1.Text += rng.Rows[row].Cells[col].Value;
                }
                textBox1.Text += Environment.NewLine;
            }
        }
        WB.Close(false);
        xl.Quit();
        Marshal.ReleaseComObject(WB);
        Marshal.ReleaseComObject(xl);
    }

    // original posted code
    private void DoExcel()
    {
        textBox1.Text = "";
        string Path = @"D:\Test\Book1 - Copy.xls";
        Excel.Application xl = new Excel.Application();
        Excel.Workbook WB;
        Excel.Range rng;
        WB = xl.Workbooks.Open(Path);
        xl.Visible = true;
        foreach (Excel.Worksheet CurrentWS in WB.Worksheets)
        {
            rng = CurrentWS.UsedRange;
            for (int i = 1; i < rng.Count; i++)
            {
                textBox1.Text += rng.Cells[i].Value;
            }
        }
        WB.Close(false);
        xl.Quit();
        Marshal.ReleaseComObject(WB);
        Marshal.ReleaseComObject(xl);
    }

+2

Johng Mar 05 '17 at 11:21

One thing that will speed it up is to use a StringBuilder instead of += on the string. Strings are immutable in C#, so you create a ton of extra strings in the process of building the final one.

In addition, you may improve performance by looping over row and column positions rather than looping over the index.

Here is the code modified to use StringBuilder and row, column positioning:

    // Requires: using System.Text;
    public class ExcelFile
    {
        public string Path = @"C:\test.xlsx";
        private Excel.Application xl = new Excel.Application();
        private Excel.Workbook WB;
        public string FullText;
        private Excel.Range rng;
        private Dictionary<string, string> Variables;

        public ExcelFile()
        {
            StringBuilder sb = new StringBuilder();
            WB = xl.Workbooks.Open(Path);
            xl.Visible = true;
            foreach (Excel.Worksheet CurrentWS in WB.Worksheets)
            {
                rng = CurrentWS.UsedRange;
                for (int i = 1; i <= rng.Rows.Count; i++)
                {
                    for (int j = 1; j <= rng.Columns.Count; j++)
                    {
                        // Cast to object so an empty (null) cell binds to Append(object),
                        // which appends nothing, instead of an ambiguous overload.
                        sb.Append((object)rng.Cells[i, j].Value);
                    }
                }
            }
            FullText = sb.ToString();
            WB.Close(false);
            xl.Quit();
        }
    }

+1

Max weinzierl Mar 04 '17 at 23:39

John wu · Accepted Answer · 2017-03-05T11:39:33+0000

Excel and C# run in completely different environments. C# runs in .NET using managed memory, while Excel is a native C++ application and runs in unmanaged memory. Transferring data between the two (a process called "marshaling") is extremely expensive in terms of performance.

Tweaking your code is not going to help. For loops, string building, and so on are all extremely fast compared to the marshaling process. The only way to get significantly better performance is to reduce the number of trips that have to cross the interprocess boundary. Extracting data cell by cell is never going to give you the performance you want.
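One way to cut the number of boundary crossings, in line with the object array approach used in the other answers, is to read a whole range in a single call and do all remaining work in managed memory. The sketch below covers only the managed half. It assumes the value came from one read such as a worksheet's UsedRange.Value2, and that such a read yields a 2-D object array for a multi-cell range, a bare scalar for a single cell, and null for an empty cell. RangeText and Flatten are hypothetical names.

```csharp
using System;
using System.Text;

public static class RangeText
{
    // 'value' is whatever one bulk read of a range returned (assumption):
    // a 2-D object array, a single scalar, or null.
    public static string Flatten(object value)
    {
        var sb = new StringBuilder();
        if (value is object[,] cells)
        {
            // Use the array's own bounds so 1-based and 0-based arrays both work.
            for (int r = cells.GetLowerBound(0); r <= cells.GetUpperBound(0); r++)
                for (int c = cells.GetLowerBound(1); c <= cells.GetUpperBound(1); c++)
                    sb.Append(Convert.ToString(cells[r, c])); // null becomes ""
        }
        else
        {
            // Single-cell or empty range: there is no array to walk.
            sb.Append(Convert.ToString(value));
        }
        return sb.ToString();
    }
}
```

With interop this would be called once per sheet, for example `FullText += RangeText.Flatten((object)CurrentWS.UsedRange.Value2);`, so each sheet costs a handful of interop calls instead of one per cell.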

Here are a few options:

Write a sub or function in VBA that does whatever you want, then call that sub or function via interop.
Use interop to save the worksheet to a temporary file in CSV format, then open the file using C#. You will need to loop through and parse the file to get it into a useful data structure, but this loop will go much faster.
Use interop to save a range of cells to the clipboard, then use C # to read the clipboard directly.
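For the CSV option, the C# side might look roughly like the sketch below. It assumes the worksheet has already been written to a temporary CSV file (for example with Workbook.SaveAs and XlFileFormat.xlCSV) and covers only the parsing. CsvReader, SplitCsvLine and ReadAll are hypothetical names. The splitter handles quoted fields and doubled quotes but not line breaks inside a quoted field.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class CsvReader
{
    // Splits one CSV line into fields, honoring "quoted, fields" and "" escapes.
    public static List<string> SplitCsvLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char ch = line[i];
            if (inQuotes)
            {
                if (ch == '"')
                {
                    // A doubled quote inside a quoted field is a literal quote.
                    if (i + 1 < line.Length && line[i + 1] == '"') { current.Append('"'); i++; }
                    else inQuotes = false;
                }
                else current.Append(ch);
            }
            else if (ch == '"') inQuotes = true;
            else if (ch == ',') { fields.Add(current.ToString()); current.Clear(); }
            else current.Append(ch);
        }
        fields.Add(current.ToString());
        return fields;
    }

    // Reads the whole file into rows of fields, entirely in managed memory.
    public static List<List<string>> ReadAll(string path)
    {
        var rows = new List<List<string>>();
        foreach (string line in File.ReadLines(path))
            rows.Add(SplitCsvLine(line));
        return rows;
    }
}
```

Once the file is on disk, no further interop calls are involved, which is why this loop is cheap compared with reading cells one at a time.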
