How to make my function run as fast as Contains in an ArrayList?

I cannot understand the discrepancy between the time it takes for the Contains method to find an element in an ArrayList and the time it takes for a little function that I wrote to do the same. The documentation states that Contains does a linear search, so it should be in O(n) , not some other faster method. However, although the exact values ​​may not be relevant, the Contains method returns in 00:00:00.1087087 seconds, and my function takes 00:00:00.1876165 . It may not be so much, but this difference becomes more apparent when working with even larger arrays. What am I missing and how do I write my function to match Contains performances?

I am using C # on .NET 3.5.

 public partial class Window1 : Window { public bool DoesContain(ArrayList list, object element) { for (int i = 0; i < list.Count; i++) if (list[i].Equals(element)) return true; return false; } public Window1() { InitializeComponent(); ArrayList list = new ArrayList(); for (int i = 0; i < 10000000; i++) list.Add("zzz " + i); Stopwatch sw = new Stopwatch(); sw.Start(); //Console.Out.WriteLine(list.Contains("zzz 9000000") + " " + sw.Elapsed); Console.Out.WriteLine(DoesContain(list, "zzz 9000000") + " " + sw.Elapsed); } } 

EDIT:

Ok now guys look:

 public partial class Window1 : Window { public bool DoesContain(ArrayList list, object element) { int count = list.Count; for (int i = count - 1; i >= 0; i--) if (element.Equals(list[i])) return true; return false; } public bool DoesContain1(ArrayList list, object element) { int count = list.Count; for (int i = 0; i < count; i++) if (element.Equals(list[i])) return true; return false; } public Window1() { InitializeComponent(); ArrayList list = new ArrayList(); for (int i = 0; i < 10000000; i++) list.Add("zzz " + i); Stopwatch sw = new Stopwatch(); long total = 0; int nr = 100; for (int i = 0; i < nr; i++) { sw.Reset(); sw.Start(); DoesContain(list,"zzz"); total += sw.ElapsedMilliseconds; } Console.Out.WriteLine(total / nr); total = 0; for (int i = 0; i < nr; i++) { sw.Reset(); sw.Start(); DoesContain1(list, "zzz"); total += sw.ElapsedMilliseconds; } Console.Out.WriteLine(total / nr); total = 0; for (int i = 0; i < nr; i++) { sw.Reset(); sw.Start(); list.Contains("zzz"); total += sw.ElapsedMilliseconds; } Console.Out.WriteLine(total / nr); } } 

I performed an average of 100 times for two versions of my function (forward and backward loops) and for the default function Contains . The time I have is 136 and 133 milliseconds for my functions and a distant winner 87 for the Contains version. Well, if earlier you could argue that there was little data, and I based my conclusions on the first isolated run, what do you say about this test? Not only does Contains perform better on average, but it delivers consistently better results in every run. So, are there any flaws here for third-party functions or what?

+6
contains arraylist c #
source share
11 answers

Using the code below, I was able to get the following timings relatively sequentially (within a few ms):
1: 190 ms DoContainRev
2: 198ms DoesContainRev1
3: 18ms DoesContainFwd
4: 20ms DoesContainFwd1
5: 199ms Contains

A few comments here.

  • Runs with the release of compiled code from the command line. Many people make the mistake of benchmarking inside the Visual Studio debugging environment, not to mention that someone here has done something that needs to be careful.

  • list[i].Equals(element) looks a bit slower than element.Equals(list[i]) .

     using System; using System.Diagnostics; using System.Collections; namespace ArrayListBenchmark { class Program { static void Main(string[] args) { Stopwatch sw = new Stopwatch(); const int arrayCount = 10000000; ArrayList list = new ArrayList(arrayCount); for (int i = 0; i < arrayCount; i++) list.Add("zzz " + i); sw.Start(); DoesContainRev(list, "zzz"); sw.Stop(); Console.WriteLine(String.Format("1: {0}", sw.ElapsedMilliseconds)); sw.Reset(); sw.Start(); DoesContainRev1(list, "zzz"); sw.Stop(); Console.WriteLine(String.Format("2: {0}", sw.ElapsedMilliseconds)); sw.Reset(); sw.Start(); DoesContainFwd(list, "zzz"); sw.Stop(); Console.WriteLine(String.Format("3: {0}", sw.ElapsedMilliseconds)); sw.Reset(); sw.Start(); DoesContainFwd1(list, "zzz"); sw.Stop(); Console.WriteLine(String.Format("4: {0}", sw.ElapsedMilliseconds)); sw.Reset(); sw.Start(); list.Contains("zzz"); sw.Stop(); Console.WriteLine(String.Format("5: {0}", sw.ElapsedMilliseconds)); sw.Reset(); Console.ReadKey(); } public static bool DoesContainRev(ArrayList list, object element) { int count = list.Count; for (int i = count - 1; i >= 0; i--) if (element.Equals(list[i])) return true; return false; } public static bool DoesContainFwd(ArrayList list, object element) { int count = list.Count; for (int i = 0; i < count; i++) if (element.Equals(list[i])) return true; return false; } public static bool DoesContainRev1(ArrayList list, object element) { int count = list.Count; for (int i = count - 1; i >= 0; i--) if (list[i].Equals(element)) return true; return false; } public static bool DoesContainFwd1(ArrayList list, object element) { int count = list.Count; for (int i = 0; i < count; i++) if (list[i].Equals(element)) return true; return false; } } } 
+1
source share

Firstly, you do not use it many times and compare average values.

Secondly, do not split your method until it starts. Thus, compilation time over time is added at runtime.

A true test will run every time and average the results (any number of things can cause one or the other to be slower to start X out of the total number Y), and your assemblies must be pre-processed using ngen.exe .

+11
source share

How do you use .NET 3.5, why are you using ArrayList , not List<string> ?

A few things to try:

  • You could see if foreach helps instead of a for loop
  • You can cache the counter:

     public bool DoesContain(ArrayList list, object element) { int count = list.Count; for (int i = 0; i < count; i++) { if (list[i].Equals(element)) { return true; } return false; } } 
  • You can change the comparison:

     if (element.Equals(list[i])) 

While I do not expect any of them to make a significant (positive) difference, these are the following things that I will try.

Do you need to do this containment test more than once? If so, you can create a HashSet<T> and reuse it.

+5
source share

I'm not sure that you are allowed to publish Reflector code, but if you open the method using Reflector, you will see that it is essentially the same (there are some optimizations for null values, but your test harness doesn't include null values).

The only difference I see is that the call to list[i] checks the bounds on i , while the Contains method does not.

+2
source share

With a good optimizer, there really shouldn't be any differences at all, because the semantics seem the same. However, an existing optimizer may not optimize your function as well as hard-coded Contains optimized. Some points for optimization:

  • Compared to a property, it can be slower each time than counting down and comparing to 0
  • the function call itself has its own performance
  • using iterators instead of explicit indexing can be faster ( foreach loop instead of plain for )
+1
source share

First, if you use pre-known types, I would suggest using generics. So List instead of ArrayList. Under the hood, ArrayList.Contains actually does a bit more than what you do. From the reflector:

 public virtual bool Contains(object item) { if (item == null) { for (int j = 0; j < this._size; j++) { if (this._items[j] == null) { return true; } } return false; } for (int i = 0; i < this._size; i++) { if ((this._items[i] != null) && this._items[i].Equals(item)) { return true; } } return false; } 

Note that it passes itself when passing a null value to the element. However, since all the values ​​in your example are not equal to zero, an additional check for zero at the beginning and in the second cycle theoretically takes longer.

Are you sure you are dealing with fully compiled code? Ie, when your code runs for the first time, it gets a JIT compilation, where, since the structure is obviously already compiled.

+1
source share

After editing, I copied the code and made several improvements. The difference was not reproducible; it turned out to be a measurement / rounding problem.

To see this, change the runs to this form:

  sw.Reset(); sw.Start(); for (int i = 0; i < nr; i++) { DoesContain(list,"zzz"); } total += sw.ElapsedMilliseconds; Console.WriteLine(total / nr); 

I just moved a few lines. The JIT problem was minor with this number of repetitions.

+1
source share

Revised after reading comments:

It does not use some Hash-alogorithm to provide a quick search.

0
source share

I assume that ArrayList is written in C ++ and may use some micro-optimizations (note: this is an assumption).

For example, in C ++ you can use pointer arithmetic (in particular, increment a pointer to an iteration of an array) faster than using an index.

0
source share

Use SortedList<TKey,TValue> , Dictionary<TKey, TValue> or System.Collections.ObjectModel.KeyedCollection<TKey, TValue> for key-based quick access.

 var list = new List<myObject>(); // Search is sequential var dictionary = new Dictionary<myObject, myObject>(); // key based lookup, but no sequential lookup, Contains fast var sortedList = new SortedList<myObject, myObject>(); // key based and sequential lookup, Contains fast 

KeyedCollection<TKey, TValue> also fast and allows you to index your search, however you need to inherit it because it is abstract. Therefore, you need a certain collection. However, with the following, you can create a generic KeyedCollection .

 public class GenericKeyedCollection<TKey, TValue> : KeyedCollection<TKey, TValue> { public GenericKeyedCollection(Func<TValue, TKey> keyExtractor) { this.keyExtractor = keyExtractor; } private Func<TValue, TKey> keyExtractor; protected override TKey GetKeyForItem(TValue value) { return this.keyExtractor(value); } } 

The advantage of using KeyedCollection is that the Add method does not require a key.

0
source share

using the array structure, you cannot perform a search faster than O (n) without any additional information. if you know that the array is sorted, then you can use the binary search algorithm and spend only o (log (n)) otherwise you should use a set.

0
source share

All Articles