First steps in Swift, performance issue when distributing small objects in BST

Question

First steps in Swift, performance issue when distributing small objects in BST

When trying to learn Swift 2.2, I ran into a serious performance hit when trying to highlight many small objects (mostly BST of 262144 elements). My current test is to compile Java 1.8.0_74 from an old utility that I wrote a few years ago that runs on my 2012 Retina Macbook Pro in 59 seconds (59036178 microseconds). The problem that I can observe through the tools is that I get literally dozens of swift_retain_ and swift_release for iteration. Not sure how to avoid them:

import Foundation import Darwin; import Foundation public class BinarySearchTree<T : Comparable> { private var _value : T?; private var _leftTree : BinarySearchTree<T>?; private var _rightTree : BinarySearchTree<T>?; public init(value : T) { _value = value; } var value : T? { get { return self._value; } set { self._value = newValue; } } var leftTree : BinarySearchTree<T>? { get { return self._leftTree; } set { self._leftTree = newValue; } } var rightTree : BinarySearchTree<T>? { get { return self._rightTree; } set { self._rightTree = newValue; } } public func add(newValue : T) -> BinarySearchTree<T> { var navigator : BinarySearchTree<T>?; var subtree : BinarySearchTree<T>?; var done : Bool?; done = false; navigator = self; while (!done!) { if (newValue < navigator?.value) { subtree = navigator?.leftTree; if (subtree != nil) { navigator = subtree; } else { let newNode = BinarySearchTree<T>(value: newValue); navigator!.leftTree = newNode; done = true; } } else if (newValue > navigator?.value) { subtree = navigator?.rightTree; if (subtree != nil) { navigator = subtree; } else { let newNode = BinarySearchTree<T>(value: newValue); navigator?.rightTree = newNode; done = true; } } else { done = true; } } return self; } } /* cut remove/search methods */

And this is the test code that I wrote for the test run

 let count : Int32 = 262144; let base : Int32 = 65536; let target : Int32 = count + 1; var info = mach_timebase_info(numer:0, denom:0); var timebase = mach_timebase_info(&info); let numer = UInt64(info.numer); let denom = UInt64(info.denom); let norm = UInt64(numer/denom); let check1 = (mach_absolute_time() * norm); var root = BinarySearchTree<Int32>(value:base); for var loop in 0 ... count-1 { if (loop % 1000 == 0) { print(loop); } root = root.add(loop); } let check2 = (mach_absolute_time() * norm); print("Creation phase microseconds: [" + String((check2 - check1) / 1000) + "]");

I tried to find the specific issue of quick release / save without any luck, I'm not sure how to do this. Thanks everyone

+7

memory-management swift

Iridiumfx Apr 17 '16 at 12:55

source share

2 answers

The problem is, since you are marking save / release (although this is not really the case, save / release is slightly near ummm capacity ... we will get there at the end). This is not related to distribution. You do not select unnecessary objects, you just save them for a while, and then release them. I'll start with Kenneth code, which optimizes a lot of performance issues in the original, but still has this problem. (I do not consider recursive code because it crashes in your current use case. However, it shies away from some redundant save.)

It’s worth saying that the Kenneth code is good and, as a rule, you should do something (as you will see even more when we go).

First note: when you mentioned -Ofast , for ObjC, not for Swift. The flag for Swift is just -O . You also want -whole-module-optimization , but that doesn't help anything.

Another little thing, then we get to it. Mark final classes anytime. This ensures that there is no dynamic dispatch. It doesn’t matter here when compared to saving / releasing, but hey, take some lightweight material.

Sounds good at 30%?

OK, now great, and this is a trick. I find that I can knock out about 30% of the time (from ~ 6 minutes to ~ 4 minutes for a full import) by rewriting this:

 guard let subtree = navigator.leftTree else { navigator.leftTree = BinarySearchTree<T>(value: newValue) break } navigator = subtree continue

Like this:

 let subtree = navigator.leftTree if subtree == nil { navigator.leftTree = BinarySearchTree(value: newValue) break } navigator = subtree! continue

This is something to be very careful with. In this case, it turns out to be faster, but it may not be so fast on other inputs. This may not be so fast when switching to the optimizer (SIL generation is a bit strange, and I suspect that this may be a mistake, because it seems that in the second case the navigator saved twice, but only if succeeded). But now it is faster. (EDIT: The Swift team was surprised at this finding, and now there is a bug open against . Don't expect this to work in the future.)

How about 85% like that sound?

But as you said, could we avoid all this with structures? But it would be insanely expensive to copy the whole tree every time we touch it. Of course, we could greatly improve this by copying to write, for example using Array. But COW is pretty complicated. If there was only a way to reuse existing material. What if we use Array?

 private struct Node<Element: Comparable> { let value: Element var leftIndex = -1 // Ugly, but ~25% faster than using Int? in my tests var rightIndex = -1 init(_ value: Element) { self.value = value } } // This works exactly the same if you make it a `final class`. Your choice. public struct BinarySearchTree<Element: Comparable> { private var storage: [Node<Element>] = [] init(value: Element) { storage.append(Node(value)) } public mutating func add(newValue: Element) { if storage.isEmpty { storage.append(Node(newValue)) } var index = 0 while (true) { let node = storage[index] if (newValue < node.value) { if node.leftIndex < 0 { storage.append(Node(newValue)) storage[index].leftIndex = storage.count - 1 // Don't use node here; remember value types! break } index = node.leftIndex continue } else if (newValue > node.value) { if node.rightIndex < 0 { storage.append(Node(newValue)) storage[index].rightIndex = storage.count - 1 break } index = node.rightIndex continue } else { break } } } }

It will take about 45 seconds to work on my system. Of course, this makes delete little more complicated. You will either have to accept the "missed" memory (possibly with periodic repackaging), or you will need to support freelist. But freelancer won't be too hard to add.

Try to improve 99.97% without changing `add()` .

And, of course, it is important to remember that this is an almost pathological case for BST. Even if you often transmitted data in order, it would be better for you to shuffle it before inserting it, including the cost of shuffling. For example, using shuffleInPlace (and counting its time), inserting exactly the same values:

 var values = Array(0 ... count - 1) values.shuffleInPlace() for (loop, value) in values.enumerate() { if (loop % 1000 == 0) { print(loop) } root.add(value) }

It takes from 45 to 0.1 s. (Kenneth version and my version of “!” Are approximately 0.2 s on this metric, I would probably use Kenneth's solution, adding final . Even your source code, which has great inefficiency that Kenneth fixed, takes only 0.5 s with this change. Remember that the optimized Kenneth version with the addition in order was 6 minutes on my system.)

Shuffle before inserting. If you get things over time, it might be worth it to dose and shuffle them before inserting. If the tree changes over time, it is worth checking if it is too deep, and periodically rebuild it. Keeping the depth of the tree intelligently overloads every other optimization. Smart ways to work with Swift memory could not touch this change.

Correct the algorithm. Everything else is peanuts in comparison.

+7

Rob napier Apr 18 '16 at 2:51

source share

Colgraff · Accepted Answer · 2016-04-17T14:02:53+0000

I simplified your code by removing some Optionals and your getter / seters, because they were redundant and could contribute to slow code.

I profiled both my code and my own and got this result in the same data set of random elements:

1000 elements:

Yours: microsecond of creation phase: [28680771]
Mine: microseconds of the creation phase: [8564279]

10,000 items:

Yours: microsecond of creation phase: [426233689]
Mine: microsecond of the creation phase: [126725800]

Here is my code:

 public class BinarySearchTree2<T : Comparable> { public init(value : T) { self.value = value } var value : T var leftTree : BinarySearchTree2<T>? var rightTree : BinarySearchTree2<T>? public func add(newValue : T) -> BinarySearchTree2<T> { var navigator = self while (true) { if (newValue < navigator.value) { guard let subtree = navigator.leftTree else { navigator.leftTree = BinarySearchTree2<T>(value: newValue) break } navigator = subtree continue } if (newValue > navigator.value) { guard let subtree = navigator.rightTree else { navigator.rightTree = BinarySearchTree2<T>(value: newValue) break } navigator = subtree continue } break } return self } } /* cut remove/search methods */

Edit:

I also did a more balanced balanced tree test where I created a dataset of 1001 consecutive elements, deleted the middle element, used Shuffle Fisher-Yates to randomize the order, initialized the root with the middle element and launched both sets. Here are my results:

Yours: microseconds of the creation phase: [27648219]
Mine: microsecond of creation phase: [8332361]

Edit 2:

I switched the add() method to use recursion with significant speed gains:

Before (my source code): microsecond of the creation phase: [8088804]
After: microseconds of the creation phase: [1179398]

Here is the new code:

 public class BinarySearchTree3<T : Comparable> { public init(value : T) { self.value = value } let value : T var leftTree : BinarySearchTree3<T>? var rightTree : BinarySearchTree3<T>? public func add(newValue : T) { if (newValue < self.value) { if self.leftTree?.add(newValue) == nil { self.leftTree = BinarySearchTree3<T>(value: newValue) } return } if (newValue > self.value) { if self.rightTree?.add(newValue) == nil { self.rightTree = BinarySearchTree3<T>(value: newValue) } return } } } /* cut remove/search methods */

First steps in Swift, performance issue when distributing small objects in BST

Sounds good at 30%?

How about 85% like that sound?

Try to improve 99.97% without changing add() .

More articles:

Try to improve 99.97% without changing `add()` .