I am trying to extract all the text data from an Excel document in C# and I am having performance issues. In the following code, I open the workbook, iterate over all the sheets, and iterate over every cell in each sheet's used range, extracting the text from each cell as I go. The problem is that it takes 14 seconds to complete.
    using System.Collections.Generic;
    using Excel = Microsoft.Office.Interop.Excel;

    public class ExcelFile
    {
        public string Path = @"C:\test.xlsx";
        private Excel.Application xl = new Excel.Application();
        private Excel.Workbook WB;
        public string FullText;
        private Excel.Range rng;
        private Dictionary<string, string> Variables;

        public ExcelFile()
        {
            WB = xl.Workbooks.Open(Path);
            xl.Visible = true;
            foreach (Excel.Worksheet CurrentWS in WB.Worksheets)
            {
                rng = CurrentWS.UsedRange;
                // Cells are 1-based, so the last cell has index rng.Count.
                for (int i = 1; i <= rng.Count; i++)
                {
                    // One call into Excel per cell.
                    FullText += rng.Cells[i].Value;
                }
            }
            WB.Close(false);
            xl.Quit();
        }
    }
In VBA, I would do something like this, which takes about 1 second:

    Sub run()
        Dim strText As String
        For Each ws In ActiveWorkbook.Sheets
            For Each c In ws.UsedRange
                strText = strText & c.Text
            Next c
        Next ws
    End Sub
Or, even faster (less than 1 second):
    Sub RunFast()
        Dim strText As String
        Dim varCells As Variant
        For Each ws In ActiveWorkbook.Sheets
            varCells = ws.UsedRange
            For i = 1 To UBound(varCells, 1)
                For j = 1 To UBound(varCells, 2)
                    strText = strText & CStr(varCells(i, j))
                Next j
            Next i
        Next ws
    End Sub
Is something happening in the C# for loop that I am not aware of? And is it possible to load a range into an array-like object (as in my last VBA example), so that I can iterate over the values only rather than over cell objects?
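For reference, here is a minimal sketch of what the array-based idea might look like in C#. It assumes that `Range.Value2` returns a 1-based `object[,]` for a multi-cell range and a bare scalar (or null) for a single cell, which is my understanding of the interop behaviour. The names `RangeText`, `AsGrid` and `Flatten` are made up for illustration. The likely gain is that each sheet costs one call into Excel instead of one per cell, and that `StringBuilder` avoids repeated string copying. I have not timed it against the workbook above.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the Excel interop API.
public static class RangeText
{
    // Wraps a single-cell result (scalar or null) in a 1x1 grid so that
    // callers can always iterate over a two-dimensional array.
    public static object[,] AsGrid(object value2)
    {
        if (value2 is object[,] grid)
        {
            return grid;
        }
        return new object[,] { { value2 } };
    }

    // Concatenates every non-null value in row-major order.
    // GetLowerBound/GetUpperBound are used because arrays coming back
    // from Excel are assumed to be 1-based rather than 0-based.
    public static string Flatten(object[,] values)
    {
        var sb = new StringBuilder();
        for (int r = values.GetLowerBound(0); r <= values.GetUpperBound(0); r++)
        {
            for (int c = values.GetLowerBound(1); c <= values.GetUpperBound(1); c++)
            {
                object v = values[r, c];
                if (v != null)
                {
                    sb.Append(v);
                }
            }
        }
        return sb.ToString();
    }
}

// Assumed usage inside the foreach over WB.Worksheets:
//
//     object raw = CurrentWS.UsedRange.Value2;   // one call per sheet
//     text.Append(RangeText.Flatten(RangeText.AsGrid(raw)));
//
// where `text` is a StringBuilder that replaces the FullText += ... loop.
```

Note that `Value2` gives the underlying values (for example, dates come back as numbers), whereas the VBA `c.Text` returns the formatted display text, so the output may differ for formatted cells.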
performance c# excel office-interop
pwwolff