C# static string[] Contains() (slooooow) vs. == operator

Question


Just a quick question: I had a piece of code that compared a string against a long list of values, e.g.

    if (str == "String1" || str == "String2" || str == "String3" || str == "String4")
        DoSomething();

In the interest of code clarity and maintainability, I changed it to:

    public static string[] strValues = { "String1", "String2", "String3", "String4" };
    ...
    if (strValues.Contains(str)) DoSomething();

Only to find that the execution time of the code went from 2.5 seconds to 6.8 seconds (it runs approximately 200,000 times).
I understand there is a small trade-off, but 300%?
Anyway, could I define the static strings differently to improve performance?
Cheers.

+6

c#

Andrew White May 20, '10 at 22:30


6 answers

Both of the methods you tried have O(n) performance, so they will slow down as more strings are added. If you are using .NET 3.5 or later, you can use a HashSet<string> instead and initialize it once when the application starts. Then you get approximately O(1) lookups.
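A minimal sketch of that suggestion, assuming a modern .NET SDK (it uses top-level statements) and hypothetical names `allowedValues` and `IsAllowed`. The set is built once and every lookup reuses it; `StringComparer.Ordinal` gives the same case-sensitive comparison that `==` uses for strings.

```csharp
using System;
using System.Collections.Generic;

// Built once, reused for every lookup (hypothetical name).
// StringComparer.Ordinal: case-sensitive, culture-insensitive, like string ==.
var allowedValues = new HashSet<string>(StringComparer.Ordinal)
{
    "String1", "String2", "String3", "String4"
};

// Average O(1): hash the input once, then compare only against entries in the same bucket.
// HashSet<string>.Contains(null) simply returns false when null was never added.
bool IsAllowed(string str) => allowedValues.Contains(str);

Console.WriteLine(IsAllowed("String3")); // True
Console.WriteLine(IsAllowed("string3")); // False (ordinal comparison is case-sensitive)
```

In the original code this would replace the `string[]` field, and the `if` becomes `if (allowedValues.Contains(str)) DoSomething();`.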

For .NET 2.0, you can emulate a HashSet with a Dictionary<string, object>, using ContainsKey and ignoring the values.
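A sketch of the .NET 2.0 workaround, with hypothetical names `lookup` and `IsAllowed`. The `Dictionary<string, object>` calls themselves exist in .NET 2.0, but the surrounding syntax (top-level statements, `var`, an expression-bodied local function) needs a modern compiler. Unlike `HashSet<string>.Contains`, `Dictionary.ContainsKey` throws `ArgumentNullException` for a null key, so the helper guards against null first.

```csharp
using System;
using System.Collections.Generic;

// Only the keys matter; every value is simply null.
var lookup = new Dictionary<string, object>(StringComparer.Ordinal);
foreach (string s in new[] { "String1", "String2", "String3", "String4" })
{
    lookup[s] = null;
}

// ContainsKey(null) would throw, so reject null up front.
bool IsAllowed(string str) => str != null && lookup.ContainsKey(str);
```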

+5

Mark Byers May 20, '10 at 22:31


Are you really running this 200,000 times in production code? If so, you might want to consider a hash check as a faster negative test.

If it was just 200,000 times to illustrate the difference, I wouldn't worry about it: that is an increase of only about 0.02 milliseconds per call.

Contains is more general than testing against static strings, so there is a small amount of overhead. Unless this code is a bottleneck, as Mark mentioned, it is not worth optimizing. There is a famous quote in CS: "premature optimization is the root of all evil." The quote is not entirely accurate, but it is a good reminder of the ultimate rule of optimization: measure first.
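"Measure first" can be as simple as a Stopwatch loop. This is a rough sketch, not a rigorous benchmark: `Measure` and `Iterations` are hypothetical names, both strategies pay the same delegate-call overhead, and the numbers will vary by machine and runtime, so none are claimed here. A dedicated benchmarking tool gives more trustworthy figures.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

string[] strValues = { "String1", "String2", "String3", "String4" };
const int Iterations = 200_000;

// Times one lookup strategy against the same input and returns the hit count,
// so the result is used and the two strategies can be cross-checked.
int Measure(string label, Func<string, bool> lookup, string input)
{
    lookup(input); // warm-up call so JIT compilation is not part of the timing
    int hits = 0;
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < Iterations; i++)
    {
        if (lookup(input)) hits++;
    }
    sw.Stop();
    Console.WriteLine($"{label}: {sw.Elapsed.TotalMilliseconds:F2} ms ({hits} hits)");
    return hits;
}

int chainHits = Measure("== chain",
    s => s == "String1" || s == "String2" || s == "String3" || s == "String4", "String4");
int containsHits = Measure("array Contains", s => strValues.Contains(s), "String4");
```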

+5

Stephen Cleary May 20, '10 at 22:32


Here is an alternative that you might find readable and maintainable, and that you might want to test for speed. If you do time it, post your results!

    switch (str)
    {
        case "String1":
        case "String2":
        case "String3":
        case "String4":
            DoSomething();
            break;
    }
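To make the switch easy to call and test, it can be wrapped in a helper that returns a bool (the name `Matches` is hypothetical). How the compiler lowers a string switch, for example sequential comparisons for a handful of cases or a hash-based jump for larger ones, is an implementation detail, so timing it is still the only way to know how it compares here. A null `str` falls through to `default`.

```csharp
using System;

// The switch from the answer, returning true instead of calling DoSomething().
bool Matches(string str)
{
    switch (str)
    {
        case "String1":
        case "String2":
        case "String3":
        case "String4":
            return true;
        default:
            return false;
    }
}
```

The call site then reads `if (Matches(str)) DoSomething();`.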

+2

whybird May 20, '10 at 22:57


Although using a HashSet<string> as suggested is probably the better option, the reason strValues.Contains(str) is slower is that it is a generic extension method. Arrays have no public Contains method of their own.

For arrays, it does roughly this:

    if (strValues is ICollection<string>) // true
    {
        return ((ICollection<string>)strValues).Contains(str);
    }

which adds a type check, a typecast, and a virtual (interface) call. Then it iterates over the array (with bounds checks), and only then does it compare the strings. So it does a lot more work.
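If you want to keep the array but skip the type check and cast described above, two options are calling the generic `Array.IndexOf<T>` directly or writing the loop yourself. This is a sketch with hypothetical helper names; both are still O(n), and whether either is measurably faster than `Contains` depends on the runtime version, so time them before committing.

```csharp
using System;

string[] strValues = { "String1", "String2", "String3", "String4" };

// Option 1: generic Array.IndexOf<T>, which uses the default equality comparer
// (ordinal string equality) and handles a null value without throwing.
bool ContainsViaIndexOf(string str) => Array.IndexOf(strValues, str) >= 0;

// Option 2: a hand-written scan with string ==, the same comparison the
// original if-chain performed, just driven by the array.
bool ContainsViaLoop(string str)
{
    foreach (string candidate in strValues)
    {
        if (candidate == str) return true;
    }
    return false;
}
```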

Note that in C# 3 (which you must already be using, since you are calling extension methods), you can simply initialize the HashSet<string> like this:

 public static HashSet<string> strValues = new HashSet<string> { "String1", "String2", "String3", "String4" };

This keeps your program as readable as the array version.

+1

Ruben May 20, '10 at 23:56


You may find that Contains() compares better on a longer list. It could, for example, sort the list and perform a binary search (purely a thought experiment).
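`Enumerable.Contains` on an array does a linear scan and does not sort anything, but the thought experiment can be tried by hand: sort once, then binary-search each lookup. A sketch with hypothetical names `values` and `ContainsSorted`; the key constraint is that the sort and the search use the same comparer.

```csharp
using System;

// Hypothetical data, deliberately out of order.
string[] values = { "String4", "String2", "String1", "String3" };

// Sort once up front, O(n log n), with an explicit ordinal comparer.
Array.Sort(values, StringComparer.Ordinal);

// Each lookup is then O(log n) comparisons instead of O(n).
// It must use the same comparer as the sort, or the search can miss entries.
bool ContainsSorted(string str) =>
    str != null && Array.BinarySearch(values, str, StringComparer.Ordinal) >= 0;
```

For four strings the difference is negligible; binary search only starts to pay off as the list grows, and a HashSet<string> is usually still the simpler choice.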

0

whybird May 20, '10 at 22:43


Rusty · Accepted Answer · 2010-05-21T16:07:59+0000

FYI:

Using:

    private static HashSet<string> strHashSet = new HashSet<string>()
    {
        "0string", "1string", "2string", "3string", "4string", "5string",
        "6string", "7string", "8string", "9string", "Astring", "Bstring"
    };
    private static List<string> strList = strHashSet.ToList();
    private static string[] strArray = strList.ToArray();
    private static Dictionary<int, string> strHashDict = strHashSet.ToDictionary(h => h.GetHashCode());
    private static Dictionary<string, string> strDict = strHashSet.ToDictionary(h => h);

    // Only one test uses this method.
    private static bool ExistsInList(string str)
    {
        return strHashDict.ContainsKey(str.GetHashCode());
    }

Checking the first and last strings in the list, and then a string that is not in the list ("xstring"). Each test performs 500,000 iterations; all times are in milliseconds.

    Test / storage var    First     Last     None  Average
    1.A  result = (str == "0string" || str == "1string" ...
         strArray          3.78    45.90    57.77    35.82
    2.A  ExistsInList(string);
         (none)           36.14    28.97    24.02    29.71
    3.A  .ContainsKey(string.GetHashCode());
         strHashDict      34.86    28.41    21.46    28.24
    4.A  .ContainsKey(string);
         strDict          38.99    32.34    22.75    31.36
    5.A  .Contains(string);
         strHashSet       39.54    34.78    24.17    32.83
         strList          23.36   122.07   127.38    90.94
         strArray        350.34   426.29   426.05   400.90
    6.A  .Any(p => p == string);
         strHashSet       75.70   331.38   339.40   248.82
         strList          72.51   305.00   319.29   232.26
         strArray         38.49   213.63   227.13   159.75

Interesting (if not unexpected) results when changing the strings in the list:

 private static HashSet<string> strHashSet = new HashSet<string>() { "string00", "string01", "string02", "string03", "string04", "string05", "string06", "string07", "string08", "string09", "string10", "string11" };

Using "string99" as the string that is not in the list:

    Test / storage var    First     Last     None  Average
    1.B  result = (str == "string00" || str == "string01" ...
         strArray         85.45    87.06    91.82    88.11
    2.B  ExistsInList(string);
         (none)           30.12    27.97    21.36    26.48
    3.B  .ContainsKey(string.GetHashCode());
         strHashDict      32.51    28.00    20.83    27.11
    4.B  .ContainsKey(string);
         strDict          36.45    32.13    22.39    30.32
    5.B  .Contains(string);
         strHashSet       37.29    34.33    23.56    31.73
         strList          23.34   147.75   153.04   108.04
         strArray        349.62   460.19   459.99   423.26
    6.B  .Any(p => p == string);
         strHashSet       76.26   355.09   361.31   264.22
         strList          70.20   332.33   341.79   248.11
         strArray         37.23   234.70   251.81   174.58

For both sets of strings (A and B), tests 2 and 3 come out ahead.

However, HashSet.Contains(string) is very efficient, is not affected by where the string sits in the list, and has clear syntax... so it might be the better choice.
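One caution about tests 2 and 3: `strHashDict.ContainsKey(str.GetHashCode())` only compares hash codes, and different strings can share a hash code, so it can report a string as present when it is not (and `ToDictionary` would throw if two list entries happened to collide). A minimal collision-safe sketch, with hypothetical names `byHash` and `ExistsInListSafe`, keeps the original string as the value and confirms the match with a full comparison:

```csharp
using System;
using System.Collections.Generic;

var values = new[] { "0string", "1string", "2string", "Bstring" };

// Hash code -> original string. A key hit is only a candidate, not proof.
var byHash = new Dictionary<int, string>();
foreach (string v in values)
{
    // Add throws if two values share a hash code; a real version would need buckets.
    byHash.Add(v.GetHashCode(), v);
}

// Confirm the candidate with a full string comparison to rule out collisions.
bool ExistsInListSafe(string str) =>
    str != null
    && byHash.TryGetValue(str.GetHashCode(), out string candidate)
    && candidate == str;
```

With the confirming comparison (and proper handling of colliding entries) added, this does essentially what HashSet<string>.Contains already does for you.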

Yes, it's true, I have no life.
