How to delete rows that meet some criteria in an Excel spreadsheet?

I would like to create a "reduced" version of an Excel spreadsheet (xlsx), i.e. delete some rows according to some criterion, and I would like to know whether this can be done using openpyxl.

In Python-ish pseudocode, what I want to do would look something like this:

    wb = openpyxl.reader.excel.load_workbook('/path/to/workbook.xlsx')
    sh = wb.get_sheet_by_name('someworksheet')

    # weed out the rows of sh according to somecriterion
    sh.rows[:] = [r for r in sh.rows if somecriterion(r)]

    # save the workbook, with the weeded-out sheet
    wb.save('/path/to/workbook_reduced.xlsx')

Is it possible to do something like this with openpyxl, and if so, how?

2 answers

Internally, openpyxl does not seem to have a concept of a "row": it works with cells and keeps track of the sheet's dimensions, and when you use Worksheet.rows it computes a 2D array of cells from those. You can mutate that array, but doing so does not change the worksheet.

If you want to do this within the worksheet, you need to copy the values from their old position to the new position, set the value of the cells that are no longer needed to '' or None, and call Worksheet.garbage_collect().
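The copy-and-clear approach could be sketched roughly as follows. This is only an illustration: `compact_rows` and `keep` are names made up here, only cell values are moved (styles, merged cells and formulas are not adjusted), and it does not call Worksheet.garbage_collect(), since that method may not be available in the openpyxl version you have installed.

```python
from openpyxl import Workbook


def compact_rows(ws, keep):
    """Shift the rows for which keep(values) is true to the top of ws,
    preserving their order, and blank out the rows left at the bottom.

    Hypothetical helper: copies values only, not styles.
    Returns the number of rows kept.
    """
    n_rows = ws.max_row
    n_cols = ws.max_column
    target = 1  # next row to write a kept row into
    for source in range(1, n_rows + 1):
        values = [ws.cell(row=source, column=c).value
                  for c in range(1, n_cols + 1)]
        if not keep(values):
            continue
        if target != source:
            # target is always above source, so no unread row is overwritten
            for c, value in enumerate(values, start=1):
                ws.cell(row=target, column=c).value = value
        target += 1
    # clear the cells that are no longer needed
    for row in range(target, n_rows + 1):
        for c in range(1, n_cols + 1):
            ws.cell(row=row, column=c).value = None
    return target - 1
```

Note that the blanked cells still exist in the sheet, so ws.max_row may keep reporting the old size.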

If your data set is small and of a uniform nature (for example, all strings), you may be better off copying the relevant cells (their content) to a new worksheet, removing the old one, and giving the new one the title of the one just deleted.
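A minimal sketch of that copy-to-a-new-sheet idea, assuming a reasonably recent openpyxl (one where iter_rows accepts values_only). The helper name `rebuild_sheet`, the temporary "_tmp" title suffix and the `keep` callback are invented for this example, and only values are copied, not formatting.

```python
from openpyxl import Workbook


def rebuild_sheet(wb, name, keep):
    """Replace sheet `name` in wb with a copy that contains only the rows
    for which keep(values) is true. Hypothetical helper; values only.
    """
    old = wb[name]
    position = wb.index(old)
    # create the replacement under a temporary title, at the same position
    new = wb.create_sheet(title=name + "_tmp", index=position)
    for values in old.iter_rows(values_only=True):
        if keep(values):
            new.append(values)
    wb.remove(old)
    # the old title is free again now that the old sheet is gone
    new.title = name
    return new
```

Called as `rebuild_sheet(wb, 'someworksheet', somecriterion)`, followed by `wb.save(...)`, this comes close to the pseudocode in the question, with the caveat that `somecriterion` receives a tuple of values rather than a tuple of cells.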

The most elegant thing to do, IMHO, is to extend Worksheet, or a subclass of it, with a delete_rows method. I would implement such a method by changing the coordinates of its Cell objects in place. But this could break if the internals of openpyxl change.


Update 2018: I searched for how to delete a row today and found that this functionality was added in openpyxl 2.5.0-b2. I just tried it and it worked perfectly. Here's the link where I found the answer: https://bitbucket.org/openpyxl/openpyxl/issues/964/delete_rows-does-not-work-on-deleting

And here is the syntax for deleting one row:

 ws.delete_rows(index, 1) 

where "ws" is the worksheet, "index" is the row number (1-based), and "1" is the number of rows to delete.
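To delete every row that meets a criterion, as the question asks, one option is to call delete_rows in a loop. The sketch below walks the sheet from the bottom up so that a deletion never shifts rows that have not been checked yet. `delete_rows_where`, `should_delete` and `first_row` are names made up for this example, and it assumes an openpyxl version that has delete_rows and the values_only option of iter_rows.

```python
from openpyxl import Workbook


def delete_rows_where(ws, should_delete, first_row=1):
    """Delete each row of ws for which should_delete(values) is true.

    Hypothetical helper. Rows above first_row (e.g. a header) are skipped.
    Returns the number of rows deleted.
    """
    deleted = 0
    # iterate bottom-up: deleting a row only moves the rows below it
    for row_idx in range(ws.max_row, first_row - 1, -1):
        values = next(ws.iter_rows(min_row=row_idx, max_row=row_idx,
                                   values_only=True))
        if should_delete(values):
            ws.delete_rows(row_idx, 1)
            deleted += 1
    return deleted
```

For example, `delete_rows_where(ws, lambda v: v[1] is None, first_row=2)` would drop every data row whose second column is empty while leaving a header in row 1 alone. Deleting rows one at a time can be slow on large sheets.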

It is also possible to delete columns, but I have not tried this.
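For columns, the counterpart is ws.delete_cols, which takes the same (index, amount) arguments. A small sketch with made-up data, built in memory rather than loaded from a file:

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.append(["id", "scratch", "name"])
ws.append([1, "x", "alice"])

# remove one column starting at column 2 (column B)
ws.delete_cols(2, 1)

# the values that remain in the first row
first_row = [cell.value for cell in ws[1]]
```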
