Upper and lower case

Question

When making case-insensitive comparisons, is it better to convert the string to uppercase or lowercase? Does it even matter?

This SO post suggests that C# is more efficient with ToUpper because "Microsoft optimized it that way." But I have also read the argument that whether ToLower or ToUpper is faster depends on which case your strings contain more of, and that strings typically contain more lowercase characters, which makes ToLower more efficient.

In particular, I would like to know:

Is there a way to optimize ToUpper or ToLower so that one is faster than the other?
Is it faster to do case-insensitive string comparisons in upper or lower case and why?
Are there any programming environments (e.g. C, C#, Python, whatever) where one case is clearly better than the other, and why?

+79

string language-agnostic uppercase

Parappa Oct 24 '08 at 17:47

10 answers

From Microsoft, on MSDN:

Best Practices for Using Strings in the .NET Framework
String Recommendations
Use String.ToUpperInvariant instead of String.ToLowerInvariant when you normalize strings for comparison.

Why? From Microsoft:

Normalize strings to uppercase
There is a small group of characters that, when converted to lowercase, cannot make a round trip.

What is an example of such a character that cannot make a round trip?

Start: Greek Rho Symbol (U+03F1) ϱ
Uppercase: Capital Greek Rho (U+03A1) Ρ
Lowercase: Small Greek Rho (U+03C1) ρ

ϱ, Ρ, ρ

That is why, if you want to make case insensitive comparisons, you convert the strings to uppercase, not lowercase.
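
The round trip described above can be checked outside .NET as well. Here is a minimal Python sketch, assuming only the built-in str.upper() and str.lower(), which apply Unicode case mappings rather than .NET's invariant-culture tables. The helper names are made up for illustration.

```python
# The three rho characters from the example above.
RHO_SYMBOL = "\u03f1"   # ϱ GREEK RHO SYMBOL
CAPITAL_RHO = "\u03a1"  # Ρ GREEK CAPITAL LETTER RHO
SMALL_RHO = "\u03c1"    # ρ GREEK SMALL LETTER RHO


def equal_via_upper(a: str, b: str) -> bool:
    """Hypothetical helper: compare after normalizing both sides to uppercase."""
    return a.upper() == b.upper()


def equal_via_lower(a: str, b: str) -> bool:
    """Hypothetical helper: compare after normalizing both sides to lowercase."""
    return a.lower() == b.lower()


# ϱ uppercases to Ρ, but Ρ lowercases to ρ, so ϱ does not survive the round trip.
round_trip = RHO_SYMBOL.upper().lower()

print(round_trip == RHO_SYMBOL)                # False
print(equal_via_upper(RHO_SYMBOL, SMALL_RHO))  # True
print(equal_via_lower(RHO_SYMBOL, SMALL_RHO))  # False
```

Normalizing to uppercase treats ϱ and ρ as the same letter, while normalizing to lowercase keeps them apart.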

+21

Ian Boyd Jan 02 '13 at

According to MSDN, it is more efficient to pass in the strings and tell the comparison to ignore case:

String.Compare(strA, strB, StringComparison.OrdinalIgnoreCase) is equivalent to (but faster than) calling
String.Compare(ToUpperInvariant(strA), ToUpperInvariant(strB), StringComparison.Ordinal).
These comparisons are still very fast.

Of course, if you are comparing one string over and over again, this may not hold.
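
The repeated-comparison caveat can be sketched in Python. Python has no StringComparison.OrdinalIgnoreCase, so str.upper() stands in for the normalization here, and the class name is hypothetical. The point is only that the fixed string is converted once instead of on every call.

```python
class UpperKeyMatcher:
    """Hypothetical helper: normalize the fixed string once, then reuse it."""

    def __init__(self, target: str) -> None:
        self._key = target.upper()  # paid once, not per comparison

    def matches(self, candidate: str) -> bool:
        # Only the candidate is converted on each call.
        return candidate.upper() == self._key


matcher = UpperKeyMatcher("Content-Type")
results = [matcher.matches(h) for h in ("content-type", "CONTENT-TYPE", "Accept")]
print(results)  # [True, True, False]
```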

+19

Rob Walker Oct 24 '08 at 17:54

Since strings tend to have more lowercase characters, ToLower should theoretically be faster (many comparisons, but few assignments).

In C, or when using individually accessible elements of each string (for example, C strings or the STL string type in C++), this is actually a byte comparison, so comparing UPPER is no different from comparing lower.

If you were sneaky and loaded your strings into long arrays instead, you would get a very fast comparison over the entire string, because it could compare 4 bytes at a time. However, the load time may make it not worthwhile.

Why do you need to know which is faster? Unless you are doing a huge number of comparisons, one running a couple of cycles faster is irrelevant to the overall execution speed, and sounds like premature optimization :)

+12

warren Oct 24 '08 at 17:51

Microsoft optimized ToUpperInvariant(), not ToUpper(). The difference is that the invariant version is more culture friendly. If you need to make case-insensitive comparisons on strings that may vary across cultures, use Invariant; otherwise the performance of the invariant conversion should not matter.

I can't say whether ToUpper() or ToLower() is faster. I have never tried it, since I have never had a situation where performance mattered that much.

+6

Dan Herbert Oct 24 '08 at 17:56

If you are comparing strings in C#, it is much faster to use .Equals() instead of converting both strings to upper or lower case. Another big plus for using .Equals() is that no extra memory is allocated for the 2 new upper/lower case strings.

+4

Jon Tackabury Oct 24 '08 at 17:56

It really shouldn't matter. With ASCII characters, it definitely doesn't matter: it is just a few comparisons and a bit flip in either direction. Unicode can be a little more complicated, as there are some characters that change case in strange ways, but there really shouldn't be any difference unless your text is full of those special characters.

+1

Adam Rosenfield Oct 24 '08 at 17:52

Done right, there should be a small, insignificant speed advantage if you convert to lowercase, but this is, as many have hinted, culture dependent, and it is inherent not in the function but in the strings you convert (many lowercase letters means few assignments to memory). Converting to uppercase is faster if you have a string with many uppercase letters.

+1

Clearer Jun 04 '10 at 15:48

It depends. As stated above, for plain ASCII it is identical. In .NET, read about and use String.Compare; it is correct for the i18n stuff (languages, cultures and Unicode). If you know anything about the likelihood of the input, use the more common case.

Remember that if you are doing multiple string comparisons, length is an excellent first discriminator.
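
A length check as a first discriminator might look like the following Python sketch. It assumes ASCII-only input, because the shortcut is only safe when case conversion preserves length, which is not true for all of Unicode (ß uppercases to SS). The function name is hypothetical.

```python
def ascii_equal_ignore_case(a: str, b: str) -> bool:
    """Hypothetical helper; assumes both strings are ASCII-only."""
    # Cheap rejection: ASCII case conversion never changes length.
    if len(a) != len(b):
        return False
    # Only strings of equal length pay for the conversion.
    return a.upper() == b.upper()


print(ascii_equal_ignore_case("Hello", "HELLO"))  # True
print(ascii_equal_ignore_case("Hello", "Hell"))   # False
```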

0

Sanjaya R Oct 24 '08 at 18:05

If you are dealing with pure ASCII, it does not matter. It's just OR x, 32 vs AND x, 223. Unicode, I have no idea ...
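
For reference, a minimal Python sketch of that ASCII bit trick. Upper and lower case ASCII letters differ only in bit 5 (value 32), so lowering sets it with OR 32 and raising clears it with AND 0xDF (223). The helper names are hypothetical, and the range checks are there so that digits and punctuation are left alone.

```python
def ascii_to_lower(ch: str) -> str:
    """Hypothetical helper: set bit 5 (OR 32) on an ASCII uppercase letter."""
    code = ord(ch)
    if ord("A") <= code <= ord("Z"):
        code |= 0x20
    return chr(code)


def ascii_to_upper(ch: str) -> str:
    """Hypothetical helper: clear bit 5 (AND 0xDF) on an ASCII lowercase letter."""
    code = ord(ch)
    if ord("a") <= code <= ord("z"):
        code &= 0xDF
    return chr(code)


print("".join(ascii_to_upper(c) for c in "abc-Xyz1"))  # ABC-XYZ1
print("".join(ascii_to_lower(c) for c in "abc-Xyz1"))  # abc-xyz1
```

Either direction costs one range check and one bitwise operation per character.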

-2

Brian Knoblauch Oct 24 '08 at 17:49

Jon Skeet · Accepted Answer · 2008-10-24 18:28

Converting to uppercase or lowercase to make case-insensitive comparisons is incorrect due to the “interesting” characteristics of some cultures, especially Turkey. Instead, use StringComparer with the appropriate parameters.

MSDN has some excellent string handling guidelines. You can also verify that your code passes the Turkey test.
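
Python has no StringComparer, but the same advice applies there. A minimal sketch, assuming the built-in str.casefold() as the closest standard-library analogue: it applies Unicode case folding and is not culture-aware, so it does not by itself make code pass the Turkey test. The helper name is hypothetical.

```python
def equal_ignore_case(a: str, b: str) -> bool:
    """Hypothetical helper: compare using Unicode case folding."""
    return a.casefold() == b.casefold()


# German sharp s: lowercasing keeps "ß", so the lowercased forms differ.
print("straße".lower() == "STRASSE".lower())   # False
print(equal_ignore_case("straße", "STRASSE"))  # True

# Dotless ı uppercases to plain I, so uppercasing conflates two letters
# that Turkish treats as different.
print("\u0131".upper() == "i".upper())         # True
```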

EDIT: Note Neil's comment about case-insensitive comparisons. This whole area is pretty murky :(
