Upper and lower case

When making case-insensitive comparisons, is it better to convert the string to uppercase or lowercase? Does it even matter?

This SO post suggests that in C#, ToUpper is more efficient because "Microsoft optimized it that way." But I have also read an argument that whether ToLower or ToUpper is faster depends on which case your strings contain more of, and that strings usually contain more lowercase characters, which makes ToLower more efficient.

In particular, I would like to know:

  • Is there a way to optimize ToUpper or ToLower so that one is faster than the other?
  • Is it faster to do case-insensitive string comparisons in upper or lower case and why?
  • Are there any programming environments (e.g. C, C#, Python, whatever) where one case is clearly better than the other, and why?
+79
string language-agnostic uppercase
Oct 24 '08 at 17:47
10 answers

Converting to uppercase or lowercase in order to do case-insensitive comparisons is incorrect because of the “interesting” casing rules of some cultures, especially Turkish. Instead, use a StringComparer with the appropriate options.

MSDN has some excellent string handling guidelines. You can also check that your code passes the Turkey test.

EDIT: Note Neil's comment about case-insensitive comparisons. This whole area is pretty murky :(
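
Python has no StringComparer, and its str.upper() and str.lower() ignore the current locale, so they never apply the Turkish dotted/dotless i rules. The same class of problem still shows up, though. A minimal sketch (the eq_* helper names are made up for illustration) showing that "compare uppercased" and "compare lowercased" can give different answers, and that str.casefold(), Python's closest built-in to a case-insensitive comparison key, can give yet another:

```python
def eq_upper(a: str, b: str) -> bool:
    """Case-insensitive equality by uppercasing both sides (hypothetical helper)."""
    return a.upper() == b.upper()


def eq_lower(a: str, b: str) -> bool:
    """Case-insensitive equality by lowercasing both sides (hypothetical helper)."""
    return a.lower() == b.lower()


def eq_casefold(a: str, b: str) -> bool:
    """Case-insensitive equality using Unicode case folding."""
    return a.casefold() == b.casefold()


# Dotless i (U+0131) uppercases to plain "I", while "i" lowercases to itself.
# German sharp s uppercases to "SS" but lowercases to itself.
for a, b in [("\u0131", "i"), ("Straße", "STRASSE")]:
    print(repr(a), repr(b), eq_upper(a, b), eq_lower(a, b), eq_casefold(a, b))
```

Uppercasing calls both pairs equal, lowercasing calls neither equal, and case folding matches only the ß pair, so the choice of direction changes correctness, not just speed.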

+88
Oct 24 '08 at 18:28

From Microsoft on MSDN:

Best Practices for Using Strings in the .NET Framework

String Recommendations

Why? From Microsoft:

Normalize strings to uppercase

There is a small group of characters that, when converted to lowercase, cannot make a round trip.

What is an example of such a character that cannot make a round trip?

  • Original: Greek Rho Symbol (U+03F1) ϱ
  • Uppercase: Capital Greek Rho (U+03A1) Ρ
  • Lowercase: Small Greek Rho (U+03C1) ρ

ϱ → Ρ → ρ

That is why, if you want to make case-insensitive comparisons, you convert the strings to uppercase, not lowercase.
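
The rho round trip can be reproduced with Python's built-in (locale-independent) conversions. This sketch only uses str.upper() and str.lower(); the constant names are made up for readability:

```python
RHO_SYMBOL = "\u03f1"   # ϱ GREEK RHO SYMBOL
CAPITAL_RHO = "\u03a1"  # Ρ GREEK CAPITAL LETTER RHO
SMALL_RHO = "\u03c1"    # ρ GREEK SMALL LETTER RHO

# ϱ -> Ρ -> ρ: the round trip does not return the original character.
round_trip = RHO_SYMBOL.upper().lower()
print(round_trip == SMALL_RHO, round_trip == RHO_SYMBOL)  # True False

# Consequence for comparisons: uppercasing treats ϱ and ρ as the same letter,
# lowercasing leaves them different.
print(RHO_SYMBOL.upper() == SMALL_RHO.upper())  # True
print(RHO_SYMBOL.lower() == SMALL_RHO.lower())  # False
```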

+21
Jan 02 '13 at

According to MSDN, it is more efficient to pass the strings unchanged and tell the comparison to ignore case:

String.Compare(strA, strB, StringComparison.OrdinalIgnoreCase) is equivalent to (but faster than) calling

String.Compare(ToUpperInvariant(strA), ToUpperInvariant(strB), StringComparison.Ordinal).

These comparisons are still very quick.
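
The idea behind that equivalence can be sketched in Python, which has no StringComparison enum: instead of building two uppercased copies and comparing them, walk both strings once, uppercase one character at a time, and stop at the first difference. compare_ignore_case and compare_by_copying are hypothetical helpers, not .NET's implementation. Note one difference from .NET: Python's str.upper() applies full mappings such as ß → SS, so the two functions are only guaranteed to agree when every character uppercases to a single character.

```python
def compare_ignore_case(a: str, b: str) -> int:
    """Return -1, 0 or 1, like an ordinal three-way comparison of a.upper() and b.upper().

    Characters are uppercased one at a time, so no uppercased copy of
    either full string is built, and the loop stops at the first mismatch.
    """
    for x, y in zip(a, b):
        ux, uy = x.upper(), y.upper()
        if ux != uy:
            # Ordinal: compare by code point, which is what < does for str.
            return -1 if ux < uy else 1
    # Every shared position matched, so the shorter string sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))


def compare_by_copying(a: str, b: str) -> int:
    """The copy-then-compare version: uppercase both strings, then compare ordinally."""
    ua, ub = a.upper(), b.upper()
    return (ua > ub) - (ua < ub)


print(compare_ignore_case("apple", "APPLE"))   # 0
print(compare_ignore_case("Apple", "banana"))  # -1
```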

Of course, if you are comparing the same string over and over again, this may not hold.
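
When the same strings are compared again and again, the normalization cost can be paid once up front instead of on every comparison. A minimal Python sketch of that idea, assuming str.casefold() as the normalization; build_index and lookup are hypothetical names:

```python
from collections import defaultdict


def build_index(words):
    """Map each word's case-folded form to its original spellings, folding each word once."""
    index = defaultdict(list)
    for word in words:
        index[word.casefold()].append(word)
    return index


def lookup(index, query):
    """Fold the query once; a dictionary lookup replaces per-word comparisons."""
    # .get() does not insert missing keys, unlike index[...] on a defaultdict.
    return index.get(query.casefold(), [])


index = build_index(["Apple", "APPLE", "banana"])
print(lookup(index, "apple"))  # ['Apple', 'APPLE']
```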

+19
Oct 24 '08 at 17:54

Given strings that tend to contain mostly lowercase characters, ToLower should theoretically be faster (lots of comparisons, but few assignments).
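
The "few assignments" argument can be made concrete by counting how many characters each conversion would actually change. This is only a proxy: chars_changed is a made-up helper, and Python's str.lower() and str.upper() build a new string either way, so whether a given runtime really skips unchanged characters is implementation-specific.

```python
def chars_changed(s: str, convert) -> int:
    """Count characters of s that `convert` (e.g. str.lower) would change."""
    return sum(1 for c in s if convert(c) != c)


text = "Hello World"
print(chars_changed(text, str.lower))  # 2: only "H" and "W"
print(chars_changed(text, str.upper))  # 8: every lowercase letter
```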

In C, or when using individually accessible elements of each string (for example, C strings or the STL string type in C++), it is really a byte comparison, so comparing in uppercase is no different from comparing in lowercase.
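
For plain ASCII bytes the claim is small enough to check exhaustively. A Python sketch (the function name is made up) using bytes.upper() and bytes.lower(), which only touch the ASCII letters: for every pair of the 128 ASCII byte values, comparing in uppercase gives the same verdict as comparing in lowercase.

```python
def direction_never_matters() -> bool:
    """True if upper- and lowercase comparison agree for every pair of ASCII bytes."""
    ascii_bytes = [bytes([i]) for i in range(128)]
    return all(
        (a.upper() == b.upper()) == (a.lower() == b.lower())
        for a in ascii_bytes
        for b in ascii_bytes
    )


print(direction_never_matters())  # True
```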

If you were sneaky and loaded your strings into arrays of longs instead, you would get a much faster comparison over the whole string, because it could compare 4 bytes at a time. However, the load time might make it impractical.
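
Comparing several bytes at a time can also be done case-insensitively for ASCII, using a SWAR ("SIMD within a register") trick that lowercases all eight bytes of a 64-bit word with a few additions and masks. The sketch is in Python only for readability; Python integers are not machine words, so it will not be faster here, and in C the same arithmetic would run on uint64_t values. It assumes 7-bit ASCII input and rejects anything else; lower_word and ascii_equal_ignore_case are hypothetical names.

```python
ONES = 0x0101010101010101  # 0x01 in every byte of a 64-bit word
HIGH_BITS = 0x80 * ONES    # 0x80 in every byte


def lower_word(word: int) -> int:
    """Lowercase the ASCII letters in up to 8 packed bytes (every byte must be < 0x80)."""
    # Adding 0x3F sets a byte's high bit iff the byte is >= 'A'; adding 0x25
    # sets it iff the byte is > 'Z'. Bytes are < 0x80, so no carry crosses bytes.
    ge_a = word + (0x80 - ord("A")) * ONES
    gt_z = word + (0x80 - ord("Z") - 1) * ONES
    is_upper = ge_a & ~gt_z & HIGH_BITS
    # 0x80 >> 2 == 0x20, the ASCII case bit, landing in the same byte.
    return word | (is_upper >> 2)


def ascii_equal_ignore_case(a: bytes, b: bytes) -> bool:
    """Compare two ASCII byte strings 8 bytes at a time, ignoring case."""
    if len(a) != len(b):
        return False
    if not (a.isascii() and b.isascii()):
        raise ValueError("this sketch only handles 7-bit ASCII")
    for i in range(0, len(a), 8):
        wa = int.from_bytes(a[i:i + 8], "little")
        wb = int.from_bytes(b[i:i + 8], "little")
        if lower_word(wa) != lower_word(wb):
            return False
    return True


print(ascii_equal_ignore_case(b"Hello, World!", b"hELLO, wORLD!"))  # True
```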

Why do you need to know which is faster? Unless you are doing a metric ton of comparisons, one running a couple of cycles faster is irrelevant to overall execution speed, and it sounds like premature optimization :)
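
If it does matter for a particular workload, measuring beats guessing. A small Python timing harness using the standard timeit module (time_case_folds is a hypothetical name; absolute numbers depend entirely on the interpreter, the machine and the input strings, so none are given here):

```python
import timeit


def time_case_folds(a: str, b: str, number: int = 100_000) -> dict:
    """Seconds taken by `number` case-insensitive comparisons with each strategy."""
    strategies = {
        "lower": lambda: a.lower() == b.lower(),
        "upper": lambda: a.upper() == b.upper(),
        "casefold": lambda: a.casefold() == b.casefold(),
    }
    return {name: timeit.timeit(fn, number=number) for name, fn in strategies.items()}


for name, seconds in time_case_folds("Hello World", "hello world").items():
    print(f"{name:8} {seconds:.4f} s")
```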

+12
Oct 24 '08 at 17:51

Microsoft optimized ToUpperInvariant(), not ToUpper(). The difference is that the invariant version does not depend on the current culture. If you need to make case-insensitive string comparisons on strings that may vary across cultures, use Invariant; otherwise, the performance of the invariant conversion shouldn't matter.

I can’t say whether ToUpper() or ToLower() is faster. I've never tried, since I've never had a situation where performance mattered.
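
Python's built-in conversions behave like the invariant ones: they never consult the locale, so "I".lower() is "i" even for Turkish text. A culture-specific conversion has to be layered on top. A minimal sketch, assuming only the Turkish dotted/dotless i rules matter; turkish_upper and turkish_lower are hypothetical helpers, not a library API:

```python
# Turkish pairs i <-> İ (dotted) and ı <-> I (dotless); pre-map those before the
# locale-independent built-in conversion handles everything else.
_TR_UPPER = str.maketrans({"i": "\u0130", "\u0131": "I"})
_TR_LOWER = str.maketrans({"I": "\u0131", "\u0130": "i"})


def turkish_upper(s: str) -> str:
    return s.translate(_TR_UPPER).upper()


def turkish_lower(s: str) -> str:
    return s.translate(_TR_LOWER).lower()


print("istanbul".upper(), turkish_upper("istanbul"))  # ISTANBUL İSTANBUL
```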

+6
Oct 24 '08 at 17:56

If you are comparing strings in C#, it is much faster to use .Equals() (with a StringComparison such as OrdinalIgnoreCase) than to convert both strings to upper or lower case. Another big plus of using .Equals() is that no extra memory is allocated for two new upper- or lowercase strings.

+4
Oct 24 '08 at 17:56

It really shouldn't matter. With ASCII characters it definitely doesn't matter: it's just a few comparisons and a bit flip in either direction. Unicode can be a bit more complicated, since some characters change case in strange ways, but there really shouldn't be any difference unless your text is full of those special characters.

+1
Oct 24 '08 at 17:52

Done correctly, there should be a small, insignificant speed advantage if you convert to lowercase, but, as many have hinted, this depends on the culture, and it is not inherent in the function but in the strings you convert (lots of lowercase letters means few memory assignments). Converting to uppercase is faster if you have a string with lots of uppercase letters.

+1
Jun 04 '10 at 15:48

It depends. As stated above, for plain ASCII it's identical. In .NET, read about and use String.Compare, which is correct for i18n (culture-specific comparisons and Unicode). If you know anything about the likely input, use the more common case.

Remember that if you are comparing many strings, length is a great first discriminator.
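
A sketch of the length shortcut in Python, assuming a position-by-position (one-to-one) comparison similar in spirit to an ordinal ignore-case match; equals_ignore_case and matches are hypothetical helpers. The shortcut is only valid under that assumption: with full case folding, "straße" and "STRASSE" match even though their lengths differ.

```python
def equals_ignore_case(a: str, b: str) -> bool:
    """Position-by-position case-insensitive equality with a cheap length check first."""
    if len(a) != len(b):
        return False  # cannot match character for character
    return all(x == y or x.upper() == y.upper() for x, y in zip(a, b))


def matches(needle: str, candidates):
    """Return the candidates equal to needle, ignoring case."""
    return [c for c in candidates if equals_ignore_case(needle, c)]


print(matches("abc", ["ABC", "abcd", "aBc", "xyz"]))  # ['ABC', 'aBc']
```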

0
Oct 24 '08 at 18:05

If you are dealing with pure ASCII, it doesn't matter. It's just OR x, 32 vs. AND x, 223 (0xDF), applied only to bytes known to be letters. Unicode, I have no idea...
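
The bit arithmetic, sketched in Python on byte values: ASCII upper- and lowercase letters differ only in bit 5 (0x20), so OR 0x20 lowercases a letter and AND 0xDF uppercases one. Both must be applied only to letters; byte_to_lower and byte_to_upper are made-up names for illustration.

```python
CASE_BIT = 0x20  # the only bit in which 'A'..'Z' and 'a'..'z' differ


def byte_to_lower(b: int) -> int:
    # OR sets the case bit, but only for 'A'..'Z'; everything else passes through.
    return b | CASE_BIT if ord("A") <= b <= ord("Z") else b


def byte_to_upper(b: int) -> int:
    # AND with 0xDF (~0x20 within a byte) clears the case bit for 'a'..'z'.
    return b & 0xDF if ord("a") <= b <= ord("z") else b


print(bytes(byte_to_lower(b) for b in b"Hello, [World]!"))  # b'hello, [world]!'
print(bytes(byte_to_upper(b) for b in b"Hello, [World]!"))  # b'HELLO, [WORLD]!'
# Why the range checks matter: unguarded, '[' | 0x20 is '{'.
print(chr(ord("[") | CASE_BIT))  # {
# Why the mask is 0xDF rather than 0xE0 (224): 'a' & 0xE0 is '`', not 'A'.
print(chr(ord("a") & 0xE0))  # `
```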

-2
Oct 24 '08 at 17:49