Linq on the big lists

Question

Linq on the big lists

I have two custom classes Grid and Element :

 public class Grid { public double ID { get; set; } public double X { get; set; } public double Y { get; set; } public double Z { get; set; } public MyClass MoreProperties {get; set;} public Grid(int id, double x, double y, double z) { this.ID = id; this.X = x; this.Y = y; this.Z = z; } }

Element :

 public abstract class Element { public int ID { get; set; } public int NumberOfGrids { get; set; } public List<Grid> Grids { get; set; } //4 Grids in this case public Element() { Grids = new List<Grid>(); } }

To illustrate the situation, see this image:

enter image description here

There is a container for the Data classes:

 class Data : ModelBase { public List<Grid> Grids{ get; set; } public List<Element> Elements { get; set; } }

I read text files that have a lot of data: Grids and Elements This is a grid format (simplified):

GRID ID XYZ

And for the item

ELEMENT ID GRID1 GRID2 GRID3 GRID4

Thus, the Grid record contains the position and identifier of the grid point, and Element contains the grid identifier of this element and its own identifier.

I want to connect all 4 grids for each element, so I will have the coordinates of each grid inside the element of the element.

To do this, I read the file twice (because the element is inserted in front of the grid and simplifies things): the first time I read it, I populate the Grids list (from the Data class). The second time I populate the Elements list and do more. When I populate the Elements list, I can only populate the associated Grid id.

If you've read so far, we have this Data class containing two lists of Grid and Elements .

For the association, I came up with this method:

 public void AsociateGridsToElements() { foreach (Element elem in Elements) { for (int i = 0; i < elem.Grids.Count; i++) { elem.Grids[i] = Grids.Where(g => g.ID == elem.Grids[i].ID).FirstOrDefault(); } } }

It goes through each element, and then through each grid of this element (in this case 4), then it looks through the entire list of grids, which grid has the same ID. When he discovers the first one, he assigns this grid, so the element has a “full” Grid object, and not the one that has only a populated ID (because this is the only thing I can get when I read the file).

Here the problem arises: these files are quite large: about 20,000 grid points and 10,000 elements, if I loop for each element, each time looking at the entire collection of grids (4 times): 20,000 x 10,000 = 200,000,000 operations. Therefore, the computer cannot handle this, and I think it should be improved.

Can someone give a hint or help me optimize this problem? Thanks.

+6

c # linq wpf

Sturm Feb 27 '14 at 15:16

source share

1 answer

davisoa · Accepted Answer · 2014-02-27T15:21:44+0000

If the identifiers of each Grid object are guaranteed to be unique, I would start by creating a dictionary of Grid objects with the identifier as the key in the dictionary. Then searching for a filled Grid while listing items will require a dictionary search instead of a new listing of the Grids list.

 public void AsociateGridsToElements() { var gridLookup = Grids.ToDictionary(grid => grid.ID); foreach (Element elem in Elements) { for (int i = 0; i < elem.Grids.Count; i++) { Grid fullyPopulatedGrid; if (gridLookup.TryGetValue(elem.Grids[i].ID, out fullyPopulatedGrid)) { elem.Grids[i] = fullyPopulatedGrid; } else { // Unable to locate Grid Element } } } }

Creating a dictionary search dramatically improves performance in this case, since it prevents additional listings in the Grids list.

The above code performs the following operations (based on your ratings):

List all the elements of the Grid and create a pair of key values for each element. (about 20,000 steps)
List all Element elements (about 10,000 steps)
List each partial Grid in this Element (let's call it 4 steps)
Do a dictionary search to find a correctly filled Grid (1 Hash Search)

The general stages here are about 20,000 + (10,000 * 4) * 2 (1 hash search per element / grid) = 100,000 steps

The following operations are performed in the source code:

List all Element elements (about 10,000 steps)
List each partial Grid in this Element (let's call it 4 steps)
List all filled Grid elements (about 20,000 steps) to find the first match (a separate iteration is required for each element / grid combination)

The general stages here are about 10,000 * 4 * 20,000 = 800,000,000 steps

Linq on the big lists

More articles: