How to efficiently use Where or Select with Parallel LINQ on a large dataset

I have about 250,000 entries marked as "Boss", and each boss has between 2 and 10 people, about 1,000,000 people in total. Every day I need to get information about the staff, so I use LINQ to build a unique list of the employees who work each day. Consider the following C# LINQ query and models:

    void Main()
    {
        List<Boss> BossList = new List<Boss>()
        {
            new Boss()
            {
                EmpID = 101, Name = "Harry", Department = "Development", Gender = "Male",
                Employees = new List<Person>()
                {
                    new Person() { EmpID = 102, Name = "Peter", Department = "Development", Gender = "Male" },
                    new Person() { EmpID = 103, Name = "Emma Watson", Department = "Development", Gender = "Female" },
                }
            },
            new Boss()
            {
                EmpID = 104, Name = "Raj", Department = "Development", Gender = "Male",
                Employees = new List<Person>()
                {
                    new Person() { EmpID = 105, Name = "Kaliya", Department = "Development", Gender = "Male" },
                    new Person() { EmpID = 103, Name = "Emma Watson", Department = "Development", Gender = "Female" },
                }
            },
            // ..... ~ 250,000 Records ......
        };

        List<Person> staffList = BossList
            .SelectMany(x => new[] { new Person { Name = x.Name, Department = x.Department, Gender = x.Gender, EmpID = x.EmpID } }
                .Concat(x.Employees))
            .GroupBy(x => x.EmpID)   // Group by employee ID
            .Select(g => g.First())  // And select a single instance for each unique employee
            .ToList();
    }

    public class Person
    {
        public int EmpID { get; set; }
        public string Name { get; set; }
        public string Department { get; set; }
        public string Gender { get; set; }
    }

    public class Boss
    {
        public int EmpID { get; set; }
        public string Name { get; set; }
        public string Department { get; set; }
        public string Gender { get; set; }
        public List<Person> Employees { get; set; }
    }

The LINQ query above gives me a list of distinct people (bosses and their employees) that contains more than 1,000,000 entries. In that list I need to find "Raj":

 staffList.Where(m => m.Name.ToLowerInvariant().Contains("Raj".ToLowerInvariant())); 

This operation takes 3-5 minutes to return a result.

How can I make it more efficient?

2 answers

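Would it work for you to change staffList to a dictionary? A better search algorithm, such as the keyed lookup that Dictionary and SortedList provide, will bring you the greatest improvement. Keep in mind that a keyed lookup only pays off for exact-key matches; a substring search such as Contains still has to visit every entry.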

I tested the code below and it runs in just a few seconds.

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;

    class Program
    {
        private static void Main()
        {
            List<Boss> BossList = new List<Boss>();
            var b1 = new Boss()
            {
                EmpID = 101, Name = "Harry", Department = "Development", Gender = "Male",
                Employees = new List<Person>()
                {
                    new Person() { EmpID = 102, Name = "Peter", Department = "Development", Gender = "Male" },
                    new Person() { EmpID = 103, Name = "Emma Watson", Department = "Development", Gender = "Female" },
                }
            };
            var b2 = new Boss()
            {
                EmpID = 104, Name = "Raj", Department = "Development", Gender = "Male",
                Employees = new List<Person>()
                {
                    new Person() { EmpID = 105, Name = "Kaliya", Department = "Development", Gender = "Male" },
                    new Person() { EmpID = 103, Name = "Emma Watson", Department = "Development", Gender = "Female" },
                }
            };

            // Pad both teams with generated people to simulate a large dataset.
            Random r = new Random();
            var genders = new[] { "Male", "Female" };
            for (int i = 0; i < 1500000; i++)
            {
                // Random.Next's upper bound is exclusive, so (0, 2) is needed to pick either gender.
                b1.Employees.Add(new Person { Name = "Name" + i, Department = "Department" + i, Gender = genders[r.Next(0, 2)], EmpID = 200 + i });
                b2.Employees.Add(new Person { Name = "Nam" + i, Department = "Department" + i, Gender = genders[r.Next(0, 2)], EmpID = 1000201 + i });
            }
            BossList.Add(b1);
            BossList.Add(b2);

            Stopwatch sw = new Stopwatch();
            sw.Start();
            var emps = BossList
                .SelectMany(x => new[] { new Person { Name = x.Name, Department = x.Department, Gender = x.Gender, EmpID = x.EmpID } }
                    .Concat(x.Employees))
                .GroupBy(x => x.EmpID)    // Group by employee ID
                .Select(g => g.First());  // One instance per unique employee

            var staffList = emps.ToList();
            // Build the dictionary from the materialized list so the query is not run twice.
            var staffDict = staffList.ToDictionary(p => p.Name.ToLowerInvariant() + p.EmpID);
            var staffSortedList = new SortedList<string, Person>(staffDict);
            Console.WriteLine("Time to load staffList = " + sw.ElapsedMilliseconds + "ms");

            var rajKeyText = "Raj".ToLowerInvariant();

            // 1. Parallel scan of the List
            sw.Restart();
            var rajs1 = staffList.AsParallel().Where(p => p.Name.ToLowerInvariant().Contains(rajKeyText)).ToList();
            Console.WriteLine("Time to find Raj = " + sw.ElapsedMilliseconds + "ms");

            // 2. Parallel scan of the Dictionary keys (already lowercased)
            sw.Restart();
            var rajs2 = staffDict.AsParallel().Where(kvp => kvp.Key.Contains(rajKeyText)).ToList();
            Console.WriteLine("Time to find Raj = " + sw.ElapsedMilliseconds + "ms");

            // 3. Parallel scan of the SortedList keys
            sw.Restart();
            var rajs3 = staffSortedList.AsParallel().Where(kvp => kvp.Key.Contains(rajKeyText)).ToList();
            Console.WriteLine("Time to find Raj = " + sw.ElapsedMilliseconds + "ms");

            Console.ReadLine();
        }

        public class Person
        {
            public int EmpID { get; set; }
            public string Name { get; set; }
            public string Department { get; set; }
            public string Gender { get; set; }
        }

        public class Boss
        {
            public int EmpID { get; set; }
            public string Name { get; set; }
            public string Department { get; set; }
            public string Gender { get; set; }
            public List<Person> Employees { get; set; }
        }
    }

Output 1: [image not available]

Output 2 (using .AsParallel() to search): [image not available]

In other words, if you cannot use a faster data structure, you can still speed up the search simply by changing the query from

 staffList.Where(m => m.Name.ToLowerInvariant().Contains("Raj".ToLowerInvariant())); 

to

 staffList.AsParallel().Where(m => m.Name.ToLowerInvariant().Contains("Raj".ToLowerInvariant())); 
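
The two options in this answer can be put side by side in a minimal sketch, assuming names are never null. The class and method names (`StaffSearch`, `BuildNameIndex`, `FindExact`, `FindContaining`) are illustrative and not part of the original code, and `Person` is a trimmed copy of the question's model. The index uses `ToLookup` rather than `ToDictionary` so that several people may share a name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int EmpID { get; set; }
    public string Name { get; set; }
}

public static class StaffSearch
{
    // Build once and reuse for every query: groups people by name, ignoring case.
    public static ILookup<string, Person> BuildNameIndex(IEnumerable<Person> staff)
    {
        return staff.ToLookup(p => p.Name, StringComparer.OrdinalIgnoreCase);
    }

    // Exact-name query: a hash lookup instead of a scan over the whole list.
    // A name that is not in the index yields an empty list.
    public static List<Person> FindExact(ILookup<string, Person> index, string name)
    {
        return index[name].ToList();
    }

    // Substring query: still visits every entry, but PLINQ spreads the work
    // across cores. The search term is lowercased once, outside the predicate.
    public static List<Person> FindContaining(IEnumerable<Person> staff, string term)
    {
        string needle = term.ToLowerInvariant();
        return staff.AsParallel()
                    .Where(p => p.Name.ToLowerInvariant().Contains(needle))
                    .ToList();
    }
}
```

With this in place, `StaffSearch.FindExact(StaffSearch.BuildNameIndex(staffList), "raj")` would return only people named exactly "Raj" (in any casing), while `StaffSearch.FindContaining(staffList, "Raj")` keeps the original substring behavior. Building the index costs one pass over the data, so it is only worthwhile when the same list is queried more than once.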

If you change Boss to inherit from Person (public class Boss : Person), you no longer need to duplicate the properties in both Person and Boss, and you no longer need to create a new Person for each Boss, because a Boss already is a Person:

    IEnumerable<Person> staff = BossList
        .Concat(BossList.SelectMany(x => x.Employees))
        .DistinctBy(p => p.EmpID)
        .ToList();

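where DistinctBy is defined as follows (on .NET 6 and later, System.Linq already provides an equivalent Enumerable.DistinctBy, so the custom definition is not needed there):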

    public static IEnumerable<TSource> DistinctBy<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        var seenKeys = new HashSet<TKey>();
        foreach (TSource element in source)
        {
            // HashSet.Add returns false for a key that was already seen.
            if (seenKeys.Add(keySelector(element)))
            {
                yield return element;
            }
        }
    }
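
Put together, the inheritance change and the de-duplication step might look like the sketch below. It is a trimmed illustration, not the original poster's code: `StaffQueries` and `UniqueStaff` are hypothetical names, the models carry only two properties, and the helper is called `DistinctByKey` here (also a made-up name) purely so that it cannot clash with the built-in `DistinctBy` on newer frameworks.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int EmpID { get; set; }
    public string Name { get; set; }
}

// A boss is a person who also has a team, so no properties are duplicated.
public class Boss : Person
{
    public List<Person> Employees { get; set; } = new List<Person>();
}

public static class StaffQueries
{
    // Same technique as DistinctBy above: keep the first element seen per key.
    public static IEnumerable<T> DistinctByKey<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var seen = new HashSet<TKey>();
        foreach (var item in source)
        {
            if (seen.Add(keySelector(item)))
            {
                yield return item;
            }
        }
    }

    // Bosses first, then all of their employees, de-duplicated by EmpID.
    public static List<Person> UniqueStaff(IEnumerable<Boss> bosses)
    {
        var bossList = bosses.ToList(); // materialize once; it is enumerated twice below
        return bossList
            .Concat<Person>(bossList.SelectMany(b => b.Employees))
            .DistinctByKey(p => p.EmpID)
            .ToList();
    }
}
```

Compared with GroupBy followed by First, this approach streams through the data once and stores only the keys it has seen, rather than building a group for every employee.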

In addition, your comparison converts every Name to lowercase before comparing, which creates a lot of temporary strings that you don't need. Instead, try something like

 staffList.Where(m => m.Name.Equals("Raj", StringComparison.InvariantCultureIgnoreCase)); 

Also, keep in mind that Contains matches names such as Rajamussen and mirajii as well, which is perhaps not what you expected.
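
The difference between the two kinds of match can be sketched without any temporary lowercase strings. `NameMatch`, `IsExactly` and `ContainsTerm` are illustrative names. This sketch also swaps in `StringComparison.OrdinalIgnoreCase` for the `InvariantCultureIgnoreCase` shown above, on the assumption that names need no culture-aware comparison rules; the two can differ for some non-ASCII text.

```csharp
using System;

public static class NameMatch
{
    // Whole-name match, ignoring case. Safe for null arguments.
    public static bool IsExactly(string name, string wanted)
    {
        return string.Equals(name, wanted, StringComparison.OrdinalIgnoreCase);
    }

    // Substring match, ignoring case. Roughly what
    // name.ToLowerInvariant().Contains(term.ToLowerInvariant()) does,
    // but without allocating a lowercased copy of each name.
    public static bool ContainsTerm(string name, string term)
    {
        return name != null
            && name.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

Either predicate can be dropped into the query, for example `staffList.Where(m => NameMatch.IsExactly(m.Name, "Raj"))`.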
