How to create a streaming LINQ expression that yields the filtered-out items as well as the filtered items?

I convert an Excel spreadsheet into a list of Elements (this is a domain term). During this conversion, I need to skip the header rows and throw away malformed rows that cannot be converted.

Now comes the interesting part: I need to capture these invalid records so that I can report on them. I built a crazy LINQ statement (see below). The calls are extension methods that hide messy LINQ operations on types from the OpenXml library.

    var elements = sheet
        .Rows()                          // BEGIN sheet data transform
        .SkipColumnHeaders()
        .ToRowLookup()
        .ToCellLookup()
        .SkipEmptyRows()                 // END sheet data transform
        .ToElements(strings)             // BEGIN domain transform
        .RemoveBadRecords(out discard)
        .OrderByCompositeKey();

The interesting part begins with ToElements, where I convert the cell lookup into a list of domain objects (details: the type is called ElementRow, and it is later converted into an Element). Bad records are created with only a key (the Excel row index) and are uniquely identifiable compared to a real element.

    public static IEnumerable<ElementRow> ToElements(this IEnumerable<KeyValuePair<UInt32Value, Cell[]>> map)
    {
        return map.Select(pair =>
        {
            try
            {
                return ElementRow.FromCells(pair.Key, pair.Value);
            }
            catch (Exception)
            {
                // Conversion failed: keep only the row key as a "bad record".
                return ElementRow.BadRecord(pair.Key);
            }
        });
    }
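For readers without OpenXml at hand, here is a minimal, self-contained sketch of the same sentinel pattern (try/catch inside Select, returning a key-only record on failure). It assumes a value tuple (RowIndex, FormatId) as a hypothetical stand-in for ElementRow, int.Parse as a hypothetical stand-in for ElementRow.FromCells, and C# 9 top-level statements. It catches only FormatException, a narrower choice than the original's catch (Exception).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical input: row index -> raw cell text (stands in for the Cell[] map).
var rawRows = new List<KeyValuePair<uint, string>>
{
    new KeyValuePair<uint, string>(2, "17"),
    new KeyValuePair<uint, string>(3, "not a number"),
    new KeyValuePair<uint, string>(4, "42"),
};

var rows = ToElements(rawRows).ToList();
foreach (var r in rows)
    Console.WriteLine($"row {r.RowIndex}: FormatId = {r.FormatId}");

// Convert every row; a failed conversion yields a key-only "bad record"
// (FormatId 0) instead of an exception that would end the whole sequence.
static IEnumerable<(uint RowIndex, int FormatId)> ToElements(
    IEnumerable<KeyValuePair<uint, string>> map)
{
    return map.Select(pair =>
    {
        try
        {
            // int.Parse is a hypothetical stand-in for ElementRow.FromCells.
            return (RowIndex: pair.Key, FormatId: int.Parse(pair.Value));
        }
        catch (FormatException)
        {
            // Hypothetical stand-in for ElementRow.BadRecord(pair.Key).
            return (RowIndex: pair.Key, FormatId: 0);
        }
    });
}
```

As in the question, this relies on 0 never being a legitimate FormatId, since that value doubles as the bad-record marker.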

Then I want to filter out these bad records (it is easier to collect them all before filtering). That is the job of the RemoveBadRecords method, which started out like this:

    public static IEnumerable<ElementRow> RemoveBadRecords(this IEnumerable<ElementRow> elements)
    {
        return elements.Where(el => el.FormatId != 0);
    }

However, I need to report the dropped items! And I don't want to muddle my extension method with reporting. So I turned to an out parameter (working around the fact that C# does not allow an out parameter to be used inside a lambda or anonymous method):

    public static IEnumerable<ElementRow> RemoveBadRecords(this IEnumerable<ElementRow> elements, out List<ElementRow> discard)
    {
        // The lambda cannot capture the out parameter, so it fills a local list.
        var temp = new List<ElementRow>();
        var filtered = elements.Where(el =>
        {
            if (el.FormatId == 0) temp.Add(el);
            return el.FormatId != 0;
        });
        discard = temp;
        return filtered;
    }

And there it was! I thought I was being hardcore and that it would work in one shot...

    var discard = new List<ElementRow>();
    var elements = data
        /* snipped long LINQ statement */
        .RemoveBadRecords(out discard)
        /* snipped long LINQ statement */;

    discard.ForEach(el => failures.Add(el));

    foreach (var el in elements)
    {
        /* do more work, maybe add more failures */
    }

    return new Result(elements, failures);

But there was nothing in my discard list by the time I iterated over it! I stepped through the code and realized that I had successfully created a fully streaming LINQ statement:

  • The temp list was created
  • The Where filter was assigned (but not yet run)
  • The discard list was assigned
  • Then the streaming query was returned

When discard was iterated, it contained no elements, because elements had not been iterated yet.
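The behaviour can be reproduced without OpenXml. In this minimal sketch (C# 9 top-level statements, with a hypothetical (RowIndex, FormatId) value tuple standing in for ElementRow and FormatId 0 marking a bad record), discard is already assigned when the method returns, but it stays empty until something enumerates elements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows: FormatId 0 marks a bad record, as in the question.
var source = new List<(int RowIndex, int FormatId)> { (1, 5), (2, 0), (3, 7), (4, 0) };

var elements = RemoveBadRecords(source, out var discard);

int countBefore = discard.Count;   // the Where lambda has not run yet
var good = elements.ToList();      // enumerating runs the lambda for every row
int countAfter = discard.Count;    // the same list object has now been filled

Console.WriteLine($"before: {countBefore}, after: {countAfter}, good: {good.Count}");

// Same shape as the question's out-parameter version of RemoveBadRecords.
static IEnumerable<(int RowIndex, int FormatId)> RemoveBadRecords(
    IEnumerable<(int RowIndex, int FormatId)> elements,
    out List<(int RowIndex, int FormatId)> discard)
{
    // An out parameter cannot be captured by a lambda, hence the local list.
    var temp = new List<(int RowIndex, int FormatId)>();
    var filtered = elements.Where(el =>
    {
        if (el.FormatId == 0) temp.Add(el); // side effect runs only when enumerated
        return el.FormatId != 0;
    });
    discard = temp;   // assigned immediately, but still empty
    return filtered;  // deferred: nothing has been filtered yet
}
```

This prints `before: 0, after: 2, good: 2`: the assignment did happen, it just happened before any filtering.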

Is there a way to fix this problem with what I've built? Do I have to force iteration of the data before or during the bad-record filter? Is there another design I've missed?

Some comments

John pointed out that the assignment was (and is) happening; I just wasn't expecting it to work that way. If I check the contents of discard after iterating over elements, it is in fact full! So I don't actually have an assignment problem, unless I take John's advice on what is good or bad to have in a LINQ statement.

1 answer

When the statement is actually iterated, the Where clause runs and temp is populated, but discard is never assigned again!

It doesn't need to be reassigned, though: the list that was assigned to discard in the calling code is that same temp list, so it does get populated.

However, I would strongly recommend against this approach. Using an out parameter here really goes against the spirit of LINQ. (If you iterate over your results twice, you'll end up with a list containing all the bad elements twice. Ick!)
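A small sketch of that double counting, assuming a hypothetical (RowIndex, FormatId) value tuple in place of ElementRow and C# 9 top-level statements. The predicate's side effect runs on every enumeration, so two passes over the same deferred query add the single bad row to discard twice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var source = new[] { (RowIndex: 1, FormatId: 5), (RowIndex: 2, FormatId: 0), (RowIndex: 3, FormatId: 7) };
var discard = new List<(int RowIndex, int FormatId)>();

// Same side effect as the question's temp.Add(el), inside a deferred Where.
var filtered = source.Where(el =>
{
    if (el.FormatId == 0) discard.Add(el);
    return el.FormatId != 0;
});

var firstPass = filtered.ToList();   // runs the predicate over every row
var secondPass = filtered.ToList();  // runs it again and re-adds the bad row

Console.WriteLine($"good per pass: {firstPass.Count}, discard entries: {discard.Count}");
```

Only one row is bad, yet discard ends up with two entries.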

I suggest materializing the query before removing bad records, and then running separate queries:

    var allElements = sheet
        .Rows()
        .SkipColumnHeaders()
        .ToRowLookup()
        .ToCellLookup()
        .SkipEmptyRows()
        .ToElements(strings)
        .ToList();

    var goodElements = allElements.Where(el => el.FormatId != 0)
                                  .OrderByCompositeKey();
    var badElements = allElements.Where(el => el.FormatId == 0);
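The benefit of materializing can be checked with a counter. In this hedged sketch, a hypothetical ConvertRows function stands in for the sheet pipeline and counts how often its conversion step runs, a (RowIndex, FormatId) value tuple stands in for ElementRow, and OrderBy on FormatId stands in for OrderByCompositeKey (C# 9 top-level statements).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int conversions = 0;

// Hypothetical stand-in for the sheet pipeline; each row costs one "conversion".
IEnumerable<(int RowIndex, int FormatId)> ConvertRows(IEnumerable<int> formatIds) =>
    formatIds.Select((id, i) =>
    {
        conversions++;
        return (RowIndex: i + 1, FormatId: id);
    });

// Materialize once; both follow-up queries read the list, not the pipeline.
var allElements = ConvertRows(new[] { 9, 0, 3, 0, 5 }).ToList();

var goodElements = allElements.Where(el => el.FormatId != 0)
                              .OrderBy(el => el.FormatId) // stand-in for OrderByCompositeKey
                              .ToList();
var badElements = allElements.Where(el => el.FormatId == 0).ToList();

Console.WriteLine($"conversions: {conversions}, good: {goodElements.Count}, bad: {badElements.Count}");
```

With five input rows, conversions ends at 5; if both queries ran against the unmaterialized pipeline, it would reach 10.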

By materializing the query into a List<>, you process each row only once through ToRowLookup, ToCellLookup and so on. It does mean you need enough memory to hold all the elements at once, of course. There are alternative approaches (for example, taking an action on each bad element as it is filtered out), but they are still likely to be fairly fragile.
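One way to take an action on each bad element as it is filtered out is to pass a callback instead of an out parameter. The sketch below is a hypothetical variant of RemoveBadRecords (not from the question), using a (RowIndex, FormatId) value tuple in place of ElementRow and C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var reported = new List<int>();
var source = new[] { (RowIndex: 1, FormatId: 4), (RowIndex: 2, FormatId: 0), (RowIndex: 3, FormatId: 8) };

var good = RemoveBadRecords(source, bad => reported.Add(bad.RowIndex)).ToList();
Console.WriteLine($"good: {good.Count}, reported rows: {string.Join(",", reported)}");

// Hypothetical callback-based filter: reports each bad record as it is
// skipped, instead of handing back a list through an out parameter.
static IEnumerable<(int RowIndex, int FormatId)> RemoveBadRecords(
    IEnumerable<(int RowIndex, int FormatId)> elements,
    Action<(int RowIndex, int FormatId)> onDiscard)
{
    foreach (var el in elements)
    {
        if (el.FormatId == 0)
            onDiscard(el);   // fires during enumeration, once per pass
        else
            yield return el;
    }
}
```

It keeps the pipeline streaming, but it shares the fragility mentioned above: nothing is reported until the result is enumerated, and enumerating it a second time invokes the callback again.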

EDIT: Another option, mentioned by Servy, is to use ToLookup, which materializes and groups the results in a single pass:

    var lookup = sheet
        .Rows()
        .SkipColumnHeaders()
        .ToRowLookup()
        .ToCellLookup()
        .SkipEmptyRows()
        .ToElements(strings)
        .OrderByCompositeKey()
        .ToLookup(el => el.FormatId == 0);

Then you can use:

 foreach (var goodElement in lookup[false]) { ... } 

and

 foreach (var badElement in lookup[true]) { ... } 

Note that this orders all the elements, good and bad. An alternative is to remove the ordering from the original query and use:

 foreach (var goodElement in lookup[false].OrderByCompositeKey()) { ... } 
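A self-contained sketch of the ToLookup approach, assuming a hypothetical (RowIndex, FormatId) value tuple in place of ElementRow, OrderBy on FormatId as a stand-in for OrderByCompositeKey, and C# 9 top-level statements. ToLookup enumerates the source once, eagerly, and here ordering is applied only to the good group.

```csharp
using System;
using System.Linq;

var source = new[]
{
    (RowIndex: 1, FormatId: 6), (RowIndex: 2, FormatId: 0),
    (RowIndex: 3, FormatId: 2), (RowIndex: 4, FormatId: 0),
};

// ToLookup enumerates the source once and groups it by the key immediately.
var lookup = source.ToLookup(el => el.FormatId == 0);

// Order only the good elements; OrderBy stands in for OrderByCompositeKey.
var good = lookup[false].OrderBy(el => el.FormatId).ToList();
var bad = lookup[true].ToList();

Console.WriteLine($"good: {string.Join(",", good.Select(g => g.RowIndex))}; bad: {string.Join(",", bad.Select(b => b.RowIndex))}");
```

If no row is bad, lookup[true] returns an empty sequence rather than throwing, so both loops are safe to run unconditionally.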

I'm personally not wild about grouping by true/false: it feels like a bit of an abuse of what a lookup is normally meant for, but it will certainly work.
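If the true/false keys feel awkward, one possible alternative (not part of the answer above) is a small partition helper that makes a single pass and returns two named lists. Partition below is a hypothetical helper, not a LINQ operator; the (RowIndex, FormatId) value tuple again stands in for ElementRow, and the sketch uses C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var source = new[] { (RowIndex: 1, FormatId: 3), (RowIndex: 2, FormatId: 0), (RowIndex: 3, FormatId: 1) };

var (good, bad) = Partition(source, el => el.FormatId != 0);
Console.WriteLine($"good: {good.Count}, bad: {bad.Count}");

// Hypothetical helper: one pass over the source, results in two named lists,
// so callers read "good"/"bad" instead of lookup[false]/lookup[true].
static (List<T> Matching, List<T> Rest) Partition<T>(IEnumerable<T> source, Func<T, bool> predicate)
{
    var matching = new List<T>();
    var rest = new List<T>();
    foreach (var item in source)
        (predicate(item) ? matching : rest).Add(item);
    return (matching, rest);
}
```

Like ToLookup, it is eager, so the same memory trade-off as materializing with ToList() applies.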

