Why is SortedSet <T> .GetViewBetween not O (log N)?

In .NET 4.0+, the SortedSet<T> class has a method called GetViewBetween(l, r) that returns the appearance of the interface on a part of the tree that contains all the values ​​between the two. Given that SortedSet<T> is implemented as a red-black tree, I naturally expect it to start in O(log N) . A similar method in C ++ is std::set::lower_bound/upper_bound , in Java it is TreeSet.headSet/tailSet , and they are logarithmic.

However, this is not true. The following code runs after 32 seconds, while the equivalent version of O(log N) GetViewBetween will make this code work after 1-2 seconds.

 var s = new SortedSet<int>(); int n = 100000; var rand = new Random(1000000007); int sum = 0; for (int i = 0; i < n; ++i) { s.Add(rand.Next()); if (rand.Next() % 2 == 0) { int l = rand.Next(int.MaxValue / 2 - 10); int r = l + rand.Next(int.MaxValue / 2 - 10); var t = s.GetViewBetween(l, r); sum += t.Min; } } Console.WriteLine(sum); 

I decompiled System.dll using dotPeek , and here is what I got:

 public TreeSubSet(SortedSet<T> Underlying, T Min, T Max, bool lowerBoundActive, bool upperBoundActive) : base(Underlying.Comparer) { this.underlying = Underlying; this.min = Min; this.max = Max; this.lBoundActive = lowerBoundActive; this.uBoundActive = upperBoundActive; this.root = this.underlying.FindRange(this.min, this.max, this.lBoundActive, this.uBoundActive); this.count = 0; this.version = -1; this.VersionCheckImpl(); } internal SortedSet<T>.Node FindRange(T from, T to, bool lowerBoundActive, bool upperBoundActive) { SortedSet<T>.Node node = this.root; while (node != null) { if (lowerBoundActive && this.comparer.Compare(from, node.Item) > 0) { node = node.Right; } else { if (!upperBoundActive || this.comparer.Compare(to, node.Item) >= 0) return node; node = node.Left; } } return (SortedSet<T>.Node) null; } private void VersionCheckImpl() { if (this.version == this.underlying.version) return; this.root = this.underlying.FindRange(this.min, this.max, this.lBoundActive, this.uBoundActive); this.version = this.underlying.version; this.count = 0; base.InOrderTreeWalk((TreeWalkPredicate<T>) (n => { SortedSet<T>.TreeSubSet temp_31 = this; int temp_34 = temp_31.count + 1; temp_31.count = temp_34; return true; })); } 

So, FindRange is obviously O(log N) , but after that we call VersionCheckImpl ... which performs a linear stroke of the found subtree only for recounting its nodes!

  • Why do you need to go through this round all the time?
  • Why doesn't .NET contain an O(log N) method for partitioning a tree based on a key, such as C ++ or Java? This is really useful in many situations.
+60
c # complexity-theory sortedset
Mar 24 '12 at 10:29
source share
1 answer

about the version field

Update1:

In my memory, many (maybe all?) Collections in BCL have a version field.

Firstly, near foreach :

according to this msdn link

The foreach statement repeats a group of built-in statements for each element of an array or collection of objects. The foreach statement is used to iterate through the collection to obtain the required information, but should not be used to modify the contents of the collection to avoid unpredictable side effects.

In many other collections, version protection is protected; data does not change during foreach

For example, HashTable MoveNext() :

 public virtual bool MoveNext() { if (this.version != this.hashtable.version) { throw new InvalidOperationException(Environment.GetResourceString("InvalidOperation_EnumFailedVersion")); } .......... } 

But in the SortedSet<T> MoveNext() :

 public bool MoveNext() { this.tree.VersionCheck(); if (this.version != this.tree.version) { ThrowHelper.ThrowInvalidOperationException(ExceptionResource.InvalidOperation_EnumFailedVersion); } .... } 

UPDATE2:

But the O (N) loop can be not only for version , but also for the Count property.

Because the MSDN from GetViewBetween said:

This method returns a view of the range of elements that are between lowerValue and upperValue, as defined by the comparator .... You can make changes to both the view and the base SortedSet (Of T) .

Therefore, for each update, it is necessary to synchronize the Count field (the key and value are already the same). To make sure Count correct

To achieve the goal, there were two policies:

  • from Microsoft
  • Mono

First.MS, in their code, sacrifice the performance of GetViewBetween() and win the performance of Count Property.

VersionCheckImpl() is one way to synchronize the Count property.

Secondly, Mono. In the monocode, GetViewBetween() is faster, but in the GetCount() method:

 internal override int GetCount () { int count = 0; using (var e = set.tree.GetSuffixEnumerator (lower)) { while (e.MoveNext () && set.helper.Compare (upper, e.Current) >= 0) ++count; } return count; } 

This is always an O (N) operation!

+18
Mar 30
source share



All Articles