What architecture should I use to eliminate this System.OutOfMemoryException while still being able to instantiate a sheet's cells?

Summary

This question continues my effort to design a simple spreadsheet API that stays convenient for people who already know Excel well.

It is related to these two earlier questions:
1. How to implement column self-naming from its index?
2. How to make this worksheet initialization faster?

Purpose

Provide a simplified Excel API that wraps the core Interop components, such as the Application, Workbook, Worksheet and Range classes/interfaces, while exposing only the most commonly used properties of each.

Usage example

This usage example comes from the unit tests that allowed me to bring the solution to where it stands.

    Dim file As String = "C:\Temp\WriteTest.xls"

    Using mgr As ISpreadsheetManager = New SpreadsheetManager()
        Dim wb As IWorkbook = mgr.CreateWorkbook()
        wb.Sheets("Sheet1").Cells("A1").Value = 3.1415926
        wb.SaveAs(file)
    End Using

And now we open it:

    Dim file As String = "C:\Temp\WriteTest.xls"

    Using mgr As ISpreadsheetManager = New SpreadsheetManager()
        Dim wb As IWorkbook = mgr.OpenWorkbook(file)
        ' Working with workbook here...
    End Using

Discussion

When creating an instance of an Excel workbook:

  • A worksheet instance is automatically initialized in the Workbook.Sheets collection;
  • After initialization, the worksheet initializes its cells through the Range object, which can represent one or more cells.

These cells are immediately available with all their properties as soon as the worksheet exists.

I want to reproduce this behavior so that:

  • The constructor of the Workbook class initializes the Workbook.Sheets collection property with its own sheets;
  • The constructor of the Worksheet class initializes the Worksheet.Cells collection property with its own cells.

My problem arises in the constructor of the Worksheet class, when it initializes the Worksheet.Cells collection property, as described in related question #2.

Thoughts

Given the problems encountered in those related questions, I would like to find another architecture that lets me:

  • Use the specific features of a cell's Range when necessary;
  • Expose the most commonly used properties through my ICell interface;
  • Access every cell of the worksheet from the moment it is initialized.

Keep in mind that accessing the Range.Value property is the fastest way to interact with the underlying Excel instance through Interop.

So I thought about initializing my ReadOnlyDictionary(Of String, ICell) with the cell names only, without immediately wrapping a Range instance. I would simply generate the row and column indexes together with the cell name used to index the dictionary, and assign the Cell.NativeCell property only when someone wants to access or format a specific cell or range of cells.

The dictionary would thus be indexed by cell names derived from the column indexes generated in the constructor of the Worksheet class. Then, when you do this:

    Using mgr As ISpreadsheetManager = New SpreadsheetManager()
        Dim wb As IWorkbook = mgr.CreateWorkbook()
        wb.Sheets(1).Cells("A1").Value = 3.1415926 ' #1
    End Using

#1: This would let me use the indexes stored in my Cell class to write the value to the specific cell, which is faster than using its name directly against the Range.
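Deriving a cell name from its row and column indexes, as described above, might look like the following sketch. It is written in C# to match the answer's code. CellAddress, ColumnName and ToA1 are hypothetical names used only for illustration; they are not part of the API described in this question.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper (an assumption for illustration, not the author's code).
public static class CellAddress
{
    // Converts a 1-based column index to Excel column letters (1 -> "A", 27 -> "AA").
    public static string ColumnName(int column)
    {
        if (column < 1) throw new ArgumentOutOfRangeException("column");

        var name = new StringBuilder();
        while (column > 0)
        {
            int remainder = (column - 1) % 26;      // 0..25 maps to 'A'..'Z'
            name.Insert(0, (char)('A' + remainder));
            column = (column - 1) / 26;             // move on to the next letter to the left
        }
        return name.ToString();
    }

    // Builds a relative A1-style address from 1-based row and column indexes.
    public static string ToA1(int row, int column)
    {
        if (row < 1) throw new ArgumentOutOfRangeException("row");
        return ColumnName(column) + row.ToString(CultureInfo.InvariantCulture);
    }
}
```

With this helper, CellAddress.ToA1(1, 1) gives "A1" and CellAddress.ColumnName(256) gives "IV", the last column of a 256-column sheet, so the dictionary keys can be produced without touching Interop at all.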

Questions and concerns

Also, UsedRange.get_Value() and Cells.get_Value() return Object(,) arrays.

1. So should I settle for working with Object(,) arrays for cells, without any way to format them?

2. How should I architect these Worksheet and Cell classes so that I get the best performance when working with Object(,) arrays, while still allowing a Cell instance to represent or wrap a single-cell Range?
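One possible direction for the second question is to keep the values in a single bulk array and only wrap a Range when formatting is needed. The sketch below covers the array half only, in C# to match the answer's code. ValueGrid is a hypothetical name, not part of the API described here. It reads the array's lower bounds instead of assuming them, since arrays coming back from Interop are typically 1-based while arrays created in .NET code are 0-based.

```csharp
using System;

// Hypothetical wrapper (an assumption for illustration): holds one object[,]
// block of values and exposes it by 1-based (row, column) position.
public sealed class ValueGrid
{
    private readonly object[,] _values;
    private readonly int _rowBase;
    private readonly int _columnBase;

    public ValueGrid(object[,] values)
    {
        if (values == null) throw new ArgumentNullException("values");
        _values = values;
        // Read the lower bounds rather than assuming 0 or 1.
        _rowBase = values.GetLowerBound(0);
        _columnBase = values.GetLowerBound(1);
    }

    public int RowCount { get { return _values.GetLength(0); } }
    public int ColumnCount { get { return _values.GetLength(1); } }

    // row and column are 1-based positions inside the block, like Excel's own indexes.
    public object this[int row, int column]
    {
        get { return _values[_rowBase + row - 1, _columnBase + column - 1]; }
        set { _values[_rowBase + row - 1, _columnBase + column - 1] = value; }
    }
}
```

All reads and writes then go through the in-memory array, and the whole block can be written back to the sheet in one assignment, leaving the per-cell Range wrapper for the cases where formatting is actually required.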

Thanks to any of you who take the time to read my post and my sincere thanks to those who respond.

1 answer

The architecture I ended up with relies on a class that I called CellCollection. Here is what it does.

Based on these assumptions:

  • Given that the Excel worksheet has 256 columns and 65536 rows,

  • Given that instantiating every cell would mean creating 16,777,216 (256 * 65536) cell instances,

  • Given that the most common use of a worksheet is less than 1000 rows and less than 100 columns,

  • Given that I need it to be able to refer to cells with their addresses ("A1"); and

  • Given that accessing all the values at once and loading them into an object[,] is the fastest way to work with the underlying Excel worksheet,

I decided not to instantiate any of the cells up front. The CellCollection property of my IWorksheet interface is left initialized but empty when the sheet is created, except for an existing workbook. When I open a workbook, I check whether NativeSheet.UsedRange is empty or returns null (Nothing in Visual Basic). If it is not, the used "native cells" are already in memory, so all that remains is to add them to my internal CellCollection, indexing each one by its address.
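To put the cell count from the assumptions above into perspective, here is a small arithmetic sketch. SheetSizeEstimate is a hypothetical name, and the per-cell size passed to EstimatedBytes is purely an assumed figure for illustration. I did not measure the real size of a wrapped cell.

```csharp
using System;

// Hypothetical helper (an assumption for illustration, not measured data).
public static class SheetSizeEstimate
{
    public const int Columns = 256;
    public const int Rows = 65536;

    // Total number of cells on a full sheet.
    public static long CellCount()
    {
        return (long)Columns * Rows;
    }

    // Rough memory estimate for a given number of cells and an ASSUMED size per cell.
    public static long EstimatedBytes(long cells, int assumedBytesPerCell)
    {
        return cells * assumedBytesPerCell;
    }
}
```

SheetSizeEstimate.CellCount() is 16,777,216. At an assumed 100 bytes per cell, that is 1,677,721,600 bytes (about 1.56 GiB), whereas a typical 1000 x 100 sheet at the same assumed size is only 10,000,000 bytes. Whatever the real per-cell cost is, the gap of more than two orders of magnitude is what makes eager instantiation so wasteful.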

Finally, the Lazy Initialization Design Pattern to the rescue! =)

    using System;
    using System.Collections.Generic;
    using System.Collections.ObjectModel;
    using System.Text.RegularExpressions;
    using Excel = Microsoft.Office.Interop.Excel;

    public class Sheet : ISheet
    {
        public Sheet(Excel.Worksheet nativeSheet)
        {
            NativeSheet = nativeSheet;
            Cells = new CellCollection(this);
        }

        public Excel.Worksheet NativeSheet { get; private set; }
        public CellCollection Cells { get; private set; }
    }

    public sealed class CellCollection
    {
        private readonly IDictionary<string, ICell> _cells;
        private readonly ReadOnlyDictionary<string, ICell> _readonlyCells;

        public CellCollection(ISheet sheet)
        {
            _cells = new Dictionary<string, ICell>();
            _readonlyCells = new ReadOnlyDictionary<string, ICell>(_cells);
            Sheet = sheet;
        }

        // Indexer: native cells are wrapped only the first time they are requested.
        public ReadOnlyDictionary<string, ICell> this[string addresses]
        {
            get
            {
                if (string.IsNullOrWhiteSpace(addresses))
                    throw new ArgumentNullException("addresses");
                if (!Regex.IsMatch(addresses, "(([A-Za-z]{1,3}[0-9]*)[:,]*)"))
                    throw new FormatException("addresses");

                foreach (string address in addresses.Split(','))
                {
                    // C# 4+ indexed-property syntax for COM interop.
                    Excel.Range range = Sheet.NativeSheet.Range[address];
                    foreach (Excel.Range cell in range)
                    {
                        ICell c;
                        if (!_cells.TryGetValue(cell.Address[false, false], out c))
                        {
                            c = new Cell(cell);
                            // Assumes Cell.Name is the same relative address used for the lookup.
                            _cells.Add(c.Name, c);
                        }
                    }
                }
                return _readonlyCells;
            }
        }

        public ISheet Sheet { get; private set; }
    }

Obviously, this is a first attempt, but so far it works very well, with more than acceptable performance. I do feel it could use some optimization, but I will use it this way for now and optimize later if necessary.

After writing this collection, I was able to get the expected behavior. Next, I will try to implement some of the .NET interfaces, such as IEnumerable, IEnumerable<T>, ICollection and ICollection<T>, so that it can be treated as a true .NET collection.
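As a rough idea of what that next step could look like, here is an Interop-free sketch of a lazily populated collection that also implements IReadOnlyCollection<T>, and therefore IEnumerable<T>. LazyCellCollection and the createCell factory delegate are hypothetical stand-ins for CellCollection and `new Cell(range)`. Using case-insensitive keys is my assumption, not something the code above does.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical, Interop-free stand-in (an assumption for illustration):
// cells are created on first access and can then be enumerated.
public sealed class LazyCellCollection<TCell> : IReadOnlyCollection<TCell>
{
    // Assumption: "a1" and "A1" should refer to the same cell.
    private readonly Dictionary<string, TCell> _cells =
        new Dictionary<string, TCell>(StringComparer.OrdinalIgnoreCase);
    private readonly Func<string, TCell> _createCell;

    public LazyCellCollection(Func<string, TCell> createCell)
    {
        if (createCell == null) throw new ArgumentNullException("createCell");
        _createCell = createCell;
    }

    // Number of cells created so far, not the size of the sheet.
    public int Count { get { return _cells.Count; } }

    public TCell this[string address]
    {
        get
        {
            if (string.IsNullOrWhiteSpace(address))
                throw new ArgumentException("An address is required.", "address");

            string key = address.Trim();
            TCell cell;
            if (!_cells.TryGetValue(key, out cell))
            {
                cell = _createCell(key);   // created only on first access
                _cells.Add(key, cell);
            }
            return cell;
        }
    }

    // Enumerates only the cells that have been touched so far.
    public IEnumerator<TCell> GetEnumerator() { return _cells.Values.GetEnumerator(); }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
```

Enumerating such a collection only visits the cells that have actually been requested, which fits the lazy design: a foreach over it never forces the other sixteen million cells into existence.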

Feel free to comment and to suggest constructive alternatives and/or changes to this code to make it even better than it is now.

I hope this will be useful to someone one day.

Thanks for reading! =)
