I convert an Excel spreadsheet to a list of Items (this is a domain term). During this conversion, I need to skip the header rows and throw away invalid rows that cannot be converted.
Now comes the interesting part. I need to capture those invalid records so that I can report on them. I built a crazy LINQ statement (see below). These are extension methods that hide the messy LINQ operations on types from the OpenXml library.
    var elements = sheet
        .Rows()                         // BEGIN sheet data transform
        .SkipColumnHeaders()
        .ToRowLookup()
        .ToCellLookup()
        .SkipEmptyRows()                // END sheet data transform
        .ToElements(strings)            // BEGIN domain transform
        .RemoveBadRecords(out discard)
        .OrderByCompositeKey();
The interesting part begins with ToElements, where I convert the row lookup to a list of domain objects (details: the type is called ElementRow, which is later converted to Element). Bad records are created with just a key (the Excel row index) and can be told apart from a real element.
    public static IEnumerable<ElementRow> ToElements(this IEnumerable<KeyValuePair<UInt32Value, Cell[]>> map)
    {
        return map.Select(pair =>
        {
            try
            {
                return ElementRow.FromCells(pair.Key, pair.Value);
            }
            catch (Exception)
            {
                // Conversion failed: keep only the row index as a bad record.
                return ElementRow.BadRecord(pair.Key);
            }
        });
    }
Then I want to remove these bad records (it is just easier to collect all of them before filtering). That is the RemoveBadRecords method, which started like this:
    public static IEnumerable<ElementRow> RemoveBadRecords(this IEnumerable<ElementRow> elements)
    {
        return elements.Where(el => el.FormatId != 0);
    }
However, I need to report the discarded elements! And I do not want to muddy my extension method with reporting. So I went for an out parameter (keeping in mind the difficulties of using an out parameter inside an anonymous block):
    public static IEnumerable<ElementRow> RemoveBadRecords(this IEnumerable<ElementRow> elements, out List<ElementRow> discard)
    {
        var temp = new List<ElementRow>();
        var filtered = elements.Where(el =>
        {
            if (el.FormatId == 0) temp.Add(el);
            return el.FormatId != 0;
        });
        discard = temp;
        return filtered;
    }
And so! I thought I was hardcore and it would work in one shot ...
    var discard = new List<ElementRow>();
    var elements = data
        .RemoveBadRecords(out discard);

    discard.ForEach(el => failures.Add(el));   // discard is still empty at this point
    foreach (var el in elements) { }
    return new Result(elements, failures);
But there was nothing in my discard list at the time I was iterating over it! I stepped through the code and realized that I had successfully created a fully streaming LINQ statement.
- The temp list was created
- The Where filter was assigned (but not yet run)
- The discard list was assigned
- Then the streaming thing was returned
When discard was iterated, it contained no elements, because elements had not been iterated yet.
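To make the timing concrete, here is a minimal, self-contained sketch of the same pattern. The `Row` type and the `DeferredDemo` class are hypothetical stand-ins for my real ElementRow code (only `FormatId == 0` as the bad-record marker is taken from above), so treat it as an illustration rather than the actual implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for ElementRow: FormatId == 0 marks a bad record.
public sealed class Row
{
    public uint Key { get; }
    public int FormatId { get; }
    public Row(uint key, int formatId) { Key = key; FormatId = formatId; }
}

public static class DeferredDemo
{
    // Same shape as RemoveBadRecords: the lambda only runs while the result is enumerated.
    public static IEnumerable<Row> RemoveBadRecords(this IEnumerable<Row> rows, out List<Row> discard)
    {
        var temp = new List<Row>();
        var filtered = rows.Where(r =>
        {
            if (r.FormatId == 0) temp.Add(r);
            return r.FormatId != 0;
        });
        discard = temp;   // assigned immediately, but still empty
        return filtered;  // nothing has been filtered yet
    }

    // Returns the discard count before and after the filtered sequence is enumerated.
    public static (int Before, int After) CountDiscards()
    {
        var rows = new[] { new Row(1, 7), new Row(2, 0), new Row(3, 9), new Row(4, 0) };
        var elements = rows.RemoveBadRecords(out List<Row> discard);

        int before = discard.Count;      // 0: Where has not run yet
        foreach (var _ in elements) { }  // enumeration executes the lambda
        int after = discard.Count;       // 2: both bad rows were collected

        return (before, after);
    }
}
```

One side effect worth noting: because the lambda runs on every enumeration, iterating `elements` a second time would add the same bad rows to `discard` again.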
Is there a way to fix this problem using the thing I built? Do I have to force an iteration of the data before or during the bad record filter? Is there another design I have missed?
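For reference, the "force an iteration" option might look roughly like the sketch below. It is only one possible direction, not something I have settled on. `Record`, `PartitionSketch`, and `SplitBadRecords` are hypothetical names, and the approach assumes the whole sheet fits comfortably in memory, since `ToLookup` executes immediately and buffers every row.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for ElementRow: FormatId == 0 marks a bad record.
public sealed class Record
{
    public uint Key { get; }
    public int FormatId { get; }
    public Record(uint key, int formatId) { Key = key; FormatId = formatId; }
}

public static class PartitionSketch
{
    // Eagerly splits the input in a single pass instead of streaming it.
    public static (List<Record> Good, List<Record> Bad) SplitBadRecords(this IEnumerable<Record> records)
    {
        // ToLookup runs right away; a missing key yields an empty sequence.
        var byValidity = records.ToLookup(r => r.FormatId != 0);
        return (byValidity[true].ToList(), byValidity[false].ToList());
    }
}
```

A call such as `var (good, bad) = rows.SplitBadRecords();` would hand back both lists already filled, at the cost of giving up the streaming behavior and the out parameter.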
Some comments
John mentioned that the assignment /was/ happening. I just was not waiting for it. If I check the contents of discard after iterating over elements, it is in fact full! So I do not actually have a problem with the assignment. Unless I take John's advice on what is good / bad to have in a LINQ statement.