Open XML SDK v2.0: performance issue when deleting the first row of an Excel file with 20,000+ rows

Has anyone encountered a performance issue when deleting the first row of an Excel file with more than 20,000 rows using the Open XML SDK v2.0?

I am using the row-deletion code suggested in the Open XML SDK documentation. Deleting the first row takes several minutes with the Open XML SDK, but only about a second in Excel itself.

I eventually found that the bottleneck is the bubble-up approach used for row deletion: every row after the deleted one has to be updated. In my case, that means about 20,000 rows are updated, with the data shifted up row by row.

I wonder if there is a faster way to delete a row.

Does anyone have an idea?

performance excel openxml openxml-sdk
2 answers

Well, the bad news is: yes, that is the way it is.

You may get slightly better performance by stepping outside the SDK itself and into System.IO.Packaging: load all the rows with LINQ to XML into an IEnumerable/List, copy them to a new IEnumerable/List without the first row, rewrite the r attribute of each <row r="?"/> so that it matches the row's new position, and write the result back inside <sheetData/> over the existing child elements.
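As a rough illustration of the renumbering step described above, here is a minimal LINQ to XML sketch. It assumes every <row> and <c> element carries an explicit r attribute and that the <sheetData/> element has already been loaded. The names RowShifter and RemoveFirstRow are made up for this example. Loading and saving the worksheet part through System.IO.Packaging is left out, and formulas, merged cells, and the sheet's dimension element are not adjusted.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper, not part of the Open XML SDK.
public static class RowShifter
{
    // Main SpreadsheetML namespace used by worksheet parts.
    static readonly XNamespace S =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Removes row 1 (if present) and shifts every remaining row up by one
    // in a single pass over <sheetData>.
    public static void RemoveFirstRow(XElement sheetData)
    {
        sheetData.Elements(S + "row")
                 .Where(row => (string)row.Attribute("r") == "1")
                 .Remove();

        foreach (XElement row in sheetData.Elements(S + "row"))
        {
            XAttribute rowRef = row.Attribute("r");
            if (rowRef == null) continue; // assumption: explicit r attributes
            int newIndex = (int)rowRef - 1;
            rowRef.Value = newIndex.ToString();

            // Cell references such as "B7" embed the row number as well.
            foreach (XElement cell in row.Elements(S + "c"))
            {
                XAttribute cellRef = cell.Attribute("r");
                if (cellRef == null) continue;
                string column = Regex.Match(cellRef.Value, "^[A-Z]+").Value;
                cellRef.Value = column + newIndex;
            }
        }
    }

    public static void Main()
    {
        // Tiny in-memory stand-in for the <sheetData> of a real worksheet.
        XElement sheetData = XElement.Parse(
            "<sheetData xmlns='http://schemas.openxmlformats.org/spreadsheetml/2006/main'>" +
            "<row r='1'><c r='A1'><v>10</v></c></row>" +
            "<row r='2'><c r='A2'><v>20</v></c></row>" +
            "<row r='3'><c r='A3'><v>30</v></c><c r='B3'><v>31</v></c></row>" +
            "</sheetData>");

        RemoveFirstRow(sheetData);
        Console.WriteLine(sheetData);
    }
}
```

Because each row is touched exactly once, the cost grows linearly with the number of rows, which is the point of doing the shift in bulk rather than row by row.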

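You will need to do the same for any strings in the sharedStrings.xml file, that is, remove the <sst>/<si> elements that were used by the deleted row. These entries are only implicitly indexed by their position, so you can get away with simply removing them, as long as no remaining cell refers to a removed entry or to one that comes after it, since removing an <si> shifts the index of every later entry.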


The approach of unpacking the file, manipulating it, and repacking it is very cumbersome.

How about this: since you say it works fine in Excel, have you tried using Interop? This starts a new instance of Excel (visible or invisible); you can then open the file, delete the row, save, and close the application again.

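    using System;
    using Excel = Microsoft.Office.Interop.Excel;

    // "path" is a placeholder parameter for the full path of the workbook.
    public void OpenAndCloseExcel(string path)
    {
        Excel.Application excelApp = new Excel.Application();

        // Open the workbook and its first worksheet (Interop indexes start at 1).
        Excel.Workbook workbook = excelApp.Workbooks.Open(path);
        Excel.Worksheet sheet = (Excel.Worksheet)workbook.Worksheets[1];

        // Delete row 1, shifting the remaining rows up, then save.
        ((Excel.Range)sheet.Rows[1]).Delete(Excel.XlDeleteShiftDirection.xlShiftUp);
        workbook.Save();
        workbook.Close();

        excelApp.Quit();
    }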

The Range object serves many purposes, including deleting items; take a look at the MSDN description of Range. Another hint: Interop drives Excel itself, so all collections must be addressed with a 1-based index. For more resources, see https://stackoverflow.com/a/3302408/.
