Comparing SubjectId
not a problem because c.SubjectId
is a primitive type value ( int
, I think). An exception complains about Equals(c.Students)
. c.Students
is a constant (as requested by duplicates
), but not a primitive type.
I would also try to do the comparison in memory, not in the database. In any case, you load all the data into memory when you start your first foreach
: it executes an allClasses
request. Then inside the loop, you extend IQueryable allClasses
to IQueryable duplicates
, which is executed, and then in the inner foreach
. This is one database query for each element of your outer loop! This may explain the poor performance of the code.
So, I will try to execute the contents of the first foreach
in memory. To compare the Students
list, you need to compare element by element, and not links to student collections, because they are exactly different.
var sb = new StringBuilder(); using (var ctx = new Ctx()) { ctx.CommandTimeout = 10000; // Perhaps not necessary anymore var allClasses = ctx.Classes.Include("Students").OrderBy(o => o.Id) .ToList(); // executes query, allClasses is now a List, not an IQueryable // everything from here runs in memory foreach (var c in allClasses) { var duplicates = allClasses.Where( o => o.SubjectId == c.SubjectId && o.Id != c.Id && o.Students.OrderBy(s => s.Name).Select(s => s.Name) .SequenceEqual(c.Students.OrderBy(s => s.Name).Select(s => s.Name))); // duplicates is an IEnumerable, not an IQueryable foreach (var d in duplicates) sb.Append(d.LongName) .Append(" is a duplicate of ") .Append(c.LongName) .Append("<br />"); } } lblResult.Text = sb.ToString();
Ordering a sequence by name is necessary because, I believe, SequenceEqual
compares the length of the sequence, then element 0 with element 0, then element 1 with element 1, etc.
Change To your comment that the first request is still slow.
If you have 1300 classes with 30 students, the performance on loading ( Include
) may suffer from the multiplication of data that is transferred between the database and the client. This is explained here: How much Include can I use in an ObjectSet in an EntityFramework to maintain performance? . The query is complex because it requires a JOIN
between classes and students, and materializing material objects is also complicated because EF must filter out duplicate data when creating objects.
An alternative approach is to load only classes without students in the first query, and then load students one by one inside the loop explicitly. It will look like this:
var sb = new StringBuilder(); using (var ctx = new Ctx()) { ctx.CommandTimeout = 10000; // Perhaps not necessary anymore var allClasses = ctx.Classes.OrderBy(o => o.Id).ToList(); // <- No Include! foreach (var c in allClasses) { // "Explicite loading": This is a new roundtrip to the DB ctx.LoadProperty(c, "Students"); } foreach (var c in allClasses) { // ... same code as above } } lblResult.Text = sb.ToString();
In this example, you will have 1 + 1300 database queries, not just one, but you will not have the data multiplication that occurs at high load, and the queries are simpler (no JOIN
between classes and students).
The loading explanation is explained here:
If you are working with Lazy Loading, the first foreach
with LoadProperty
will not be needed, as the Students
collections will be loaded the first time you access it. This should lead to the same 1300 additional requests, for example, when loading the identifier.