Intersection of two sets in the most optimized way

Given two sets of values, I have to find out whether they have any element in common, that is, whether their intersection is empty or not.

Which of the standard C# collections is best suited for this purpose (in terms of performance)? I know that LINQ has an Intersect extension method for finding the intersection of two lists / arrays, but my focus is on performance in terms of Big-O notation.

But what if I also need to find out the intersection of the two sets?

+8
2 answers

Well, if you use the LINQ Intersect method, it will build a HashSet from the second sequence, and then check each element of the first sequence against it. So that's O(M + N) ... and you can use foo.Intersect(bar).Any() to get an early exit.
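As a minimal sketch of the LINQ approach described above: the class and method names (`IntersectSketch`, `HasCommonElement`, `CommonElements`) are made up for illustration, and only `Intersect`, `Any` and `ToList` come from LINQ itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IntersectSketch
{
    // True if the two sequences share at least one element.
    // Intersect is lazy, so Any() stops at the first match found.
    public static bool HasCommonElement<T>(IEnumerable<T> foo, IEnumerable<T> bar)
    {
        return foo.Intersect(bar).Any();
    }

    // The actual intersection (distinct elements present in both sequences).
    public static List<T> CommonElements<T>(IEnumerable<T> foo, IEnumerable<T> bar)
    {
        return foo.Intersect(bar).ToList();
    }
}
```

For example, `IntersectSketch.HasCommonElement(new[] { 1, 2, 3 }, new[] { 3, 4 })` answers the yes/no question, while `CommonElements` covers the case where the intersection itself is needed.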

Of course, if you store one (either) set in a HashSet<T> to start with, you can just iterate over the other one, checking for containment at each step. You still need to build the set to start with, though.
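The manual version of the same idea might look roughly like this. It is only a sketch: `HashSetSketch`, `HasCommonElement`, `first` and `second` are hypothetical names, and it assumes the elements have sensible `Equals`/`GetHashCode` implementations.

```csharp
using System;
using System.Collections.Generic;

public static class HashSetSketch
{
    public static bool HasCommonElement<T>(IEnumerable<T> first, IEnumerable<T> second)
    {
        // Building the lookup set costs O(M).
        var lookup = new HashSet<T>(first);

        // Each Contains call is O(1) on average, so this loop is O(N),
        // and it exits as soon as a shared element is found.
        foreach (var item in second)
        {
            if (lookup.Contains(item))
            {
                return true;
            }
        }
        return false;
    }
}
```

If one of the collections is already a HashSet<T>, the construction step can be skipped and only the loop remains.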

Essentially, you have an O(M + N) problem whatever you do - you are not going to get cheaper than that (there is always the possibility that you will have to look at every element), and if your hash codes are reasonable, you should be able to achieve that complexity easily. Of course, some solutions may give better constant factors than others ... but that is performance rather than complexity ;)

EDIT: as noted in the comments, there is also ISet<T>.Overlaps - if you already have either set with a static type of ISet<T> or a concrete implementation, calling Overlaps makes it clearer what you are doing. If both of your sets are statically typed as ISet<T>, use larger.Overlaps(smaller) (where larger and smaller refer to the size of each set), as I would expect an implementation of Overlaps to iterate over the argument and check each element against the contents of the set you call it on.
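A short sketch of the Overlaps suggestion, assuming both inputs are already typed as ISet<T>. The wrapper class and method names are hypothetical; `Overlaps`, `IntersectWith` and `Count` are the real ISet<T> / HashSet<T> members.

```csharp
using System;
using System.Collections.Generic;

public static class OverlapsSketch
{
    // Calls Overlaps on the larger set, passing the smaller one as the argument,
    // so that the iteration is expected to run over the smaller set.
    public static bool HasCommonElement<T>(ISet<T> a, ISet<T> b)
    {
        ISet<T> larger = a.Count >= b.Count ? a : b;
        ISet<T> smaller = a.Count >= b.Count ? b : a;
        return larger.Overlaps(smaller);
    }

    // Full intersection without mutating either input:
    // IntersectWith modifies the set it is called on, so work on a copy.
    public static HashSet<T> Intersection<T>(ISet<T> a, ISet<T> b)
    {
        var result = new HashSet<T>(a);
        result.IntersectWith(b);
        return result;
    }
}
```

The `Intersection` helper is one way to handle the follow-up question about obtaining the intersection itself rather than just a yes/no answer.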

+24

As mentioned, using Any() will give you some performance gain.

I tested it on a fairly large dataset and it gave a 25% improvement.

It is also very important to use larger.Intersect(smaller) rather than the other way around; in my case this gave a 35% improvement.

In addition, sorting the lists before applying the intersection gave another 7-8%.

Another thing to keep in mind is that, depending on the use case, you may be able to avoid computing the intersection altogether.

For example, for integer lists, if the [minimum, maximum] ranges of the two lists do not overlap, you do not need to compute the intersection at all, since the lists can never share an element.

The same idea applies to lists of strings, using the first letter as the quick check.
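For the integer case above, the pre-check could be sketched like this. The names `RangeCheckSketch`, `RangesOverlap` and `HasCommonElement` are illustrative only. Note that computing Min() and Max() is itself a linear scan, so the shortcut is most likely to pay off when the bounds are already known or cheap to obtain.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RangeCheckSketch
{
    // True if the closed ranges [min(a), max(a)] and [min(b), max(b)] overlap.
    // An empty collection cannot share an element with anything.
    public static bool RangesOverlap(IReadOnlyCollection<int> a, IReadOnlyCollection<int> b)
    {
        if (a.Count == 0 || b.Count == 0)
        {
            return false;
        }
        return a.Min() <= b.Max() && b.Min() <= a.Max();
    }

    public static bool HasCommonElement(IReadOnlyCollection<int> a, IReadOnlyCollection<int> b)
    {
        // Cheap rejection first; only fall back to the real check if the ranges overlap.
        if (!RangesOverlap(a, b))
        {
            return false;
        }
        return a.Intersect(b).Any();
    }
}
```

Overlapping ranges do not guarantee a common element (for example { 1, 5 } and { 2, 3 }), which is why the full check still runs in that case.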

Again, depending on your case, try to find as many rules as possible that let you skip the intersection entirely.

+2