Performance error setting up a join in Scala

I just ran into weird behavior in the Scala Set APIs. Here is my function, devoid of anything related to the rest of the project

def grade(...): Double = { val setA: HashSet = // get from somewhere else val setB: HashSet = // get from somewhere else if ((setA size) == 0 || (setB size) == 0) return 0 else return (setA & setB size) / (setA | set B size) } 

This function is called a lot of time inside the cycle, and the whole cycle takes about 4.5 seconds. But when replacing the merge size with the sum of the sizes (gross approximation), in order to check the effect of the merge operation, the execution time decreases to about 0.35 s ...

 def grade(...): Double = { val setA: HashSet = // get from somewhere else val setB: HashSet = // get from somewhere else if ((setA size) == 0 || (setB size) == 0) return 0 else return (setA & setB size) / (setA size + set B size) } 
+4
source share
2 answers

Well, you cannot compare a simple operation such as the sum of 2 Ints with a union operation of 2 sets. I expect that the performance of these operations will be very different, especially if your sets contain a lot of elements.

You do not need an alliance because you are already intersecting. Try the following:

 def grade: Double = { val setA: HashSet = // get from somewhere else val setB: HashSet = // get from somewhere else if ((setA size) == 0 || (setB size) == 0) return 0 else { val inter = setA & setB size return inter / ((setA size) + (setB size) - inter) } } 

However, I find your measurement a bit strange, because I expected both operations (union and intersection) to take about the same O (n) time. Removing a connection should improve performance by half (2 s) ...

+5
source

Do you use parallel collections, by any chance? Union is performed sequentially, so any parallel collection is first converted to sequential collection. This may affect performance.

Also, the union is O (n), so you go in O (n) form to O (1), which is of great importance.

0
source

All Articles