IE8 does not handle ignoreCase RegExps in Greek

If I take several Greek month names and make a case-insensitive regular expression out of them, they will not match the same month in uppercase:

    <!doctype html>
    <html>
    <head>
    </head>
    <body>
    <pre></pre>
    <script>
    var names = [
        'Μάρτιος', 'Μάιος', 'Ιούνιος',
        'Ιούλιος', 'Αύγουστος', 'Νοέμβριος'
    ];
    var pre = document.getElementsByTagName('pre')[0];
    var i;
    for (i = 0; i < names.length; ++i) {
        var m = names[i];
        var r = new RegExp(m, 'i');
        pre.innerHTML += m + ' ' + r.test(m.toLocaleUpperCase()) + '\n';
    }
    </script>
    </body>
    </html>

In IE8, this prints each name followed by false. In other browsers, it prints true.

+7
javascript regex internet-explorer-8 internationalization
3 answers

Just use .toUpperCase() instead of .toLocaleUpperCase().

The latter converts Μάρτιος to ΜΆΡΤΙΟΣ, the former converts it to ΜΆΡΤΙΟς.

I can't say which result is correct, because I don't know the capitalization rules for ς.
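To check whether a given engine is affected, you can test the final sigma against the capital sigma directly. This is only a sketch: the function name `foldsFinalSigma` is made up for illustration and does not come from any library.

```javascript
// Hypothetical helper: reports whether this engine's case-insensitive
// regular expressions treat final sigma (U+03C2) and capital sigma (U+03A3)
// as the same letter.
function foldsFinalSigma() {
  return /\u03C2/i.test('\u03A3');
}

console.log(foldsFinalSigma());
```

If this logs false, a pattern ending in ς will not match an uppercased string ending in Σ, whichever of the two uppercase methods produced it.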

+5

Well, every version of IE I have available always converts Μάρτιος to ΜΆΡΤΙΟς, even when using .toUpperCase().

I assume the problem lies in the variant forms of some letters ( http://de.wikipedia.org/wiki/Griechisches_Alphabet#Klassische_Zeichen ).

For example, the letters Σ, σ, Ϲ and ς are all "sigma". The first two are the classical forms, the others are variants. Another example would be Β, β and ϐ for "beta".

To make sure that these variants are recognized, I recommend substituting them before creating the regular expression.

Here is a short (possibly incomplete) helper function to do this:

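    function regextendVariants(s) {
        // Each row lists a base letter followed by its variant forms.
        var variants = [
            ['β', 'ϐ'], ['ε', 'ϵ'], ['θ', 'ϑ'], ['κ', 'ϰ'],
            ['π', 'ϖ'], ['ρ', 'ϱ'], ['σ', 'Ϲ', 'ς'], ['φ', 'ϕ']
        ];
        for (var j = 0; j < variants.length; j++) {
            var variant = variants[j];
            // Replace the first occurrence of each variant form (not the base
            // letter) with a character class covering the whole row.
            for (var k = 1; k < variant.length; k++) {
                s = s.replace(variant[k], '[' + variant.join('') + ']');
            }
        }
        return s;
    }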

This function converts your strings to:

  • Μάρτιο[σϹς]
  • Μάιο[σϹς]
  • Ιούνιο[σϹς]
  • Ιούλιο[σϹς]
  • Αύγουστο[σϹς]
  • Νοέμβριο[σϹς]

These patterns accept the different variants of the same letter. I'm sure this is grammatically incorrect, but it should make matching the strings more robust.

In your code you have to replace

 var r = new RegExp(m, 'i'); 

with

 var r = new RegExp(regextendVariants(m), 'i'); 

As I said, my versions of IE do not show the error, so I cannot promise that this will solve your problem, but I hope it will ;)
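The helper above replaces only the first occurrence of each variant form. A single-pass version that handles every occurrence, and the base letter as well, might look like the sketch below. The name `expandVariants` and the `VARIANT_GROUPS` table are hypothetical. The groups reuse the letters from the helper above and are just as likely to be incomplete.

```javascript
// Hypothetical table: each string holds a base letter and its variant forms,
// taken from the helper function in this answer.
var VARIANT_GROUPS = ['βϐ', 'εϵ', 'θϑ', 'κϰ', 'πϖ', 'ρϱ', 'σϹς', 'φϕ'];

// Walk the input once and replace every letter that belongs to a group
// with a character class containing the whole group.
function expandVariants(s) {
  var out = '';
  for (var i = 0; i < s.length; i++) {
    var ch = s.charAt(i);
    var piece = ch;
    for (var g = 0; g < VARIANT_GROUPS.length; g++) {
      if (VARIANT_GROUPS[g].indexOf(ch) !== -1) {
        piece = '[' + VARIANT_GROUPS[g] + ']';
        break;
      }
    }
    out += piece;
  }
  return out;
}

var month = 'Αύγουστος';
var pattern = new RegExp(expandVariants(month), 'i');
console.log(expandVariants(month));
console.log(pattern.test(month.toUpperCase()));
```

Because the output is built from the original characters rather than by repeated `replace` calls, a character class that has already been inserted is never rewritten a second time.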

+1

ς is \xCF\x82 in UTF-8, or U+03C2 as a Unicode code point, and has been present since Unicode 1.1.

The Unicode Character Database (UCD) has this entry in SpecialCasing.txt for it:

    # <code>; <lower> ; <title> ; <upper> ; (<condition_list> ;)? # <comment>
    03A3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK CAPITAL LETTER SIGMA

where U+03A3 is the Greek capital letter sigma (Σ). This mapping has been defined since at least Unicode 2.1 Update 3 ( http://www.unicode.org/Public/2.1-Update3/SpecialCasing-1.txt ), so IE8 should support it.

Therefore, Σ is the correct capital letter for ς .
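The code point, the UTF-8 bytes and the uppercase mapping can all be inspected from a script. This is a small sketch; the variable names are chosen here for illustration only, and the last line reports what the current engine does rather than what it should do.

```javascript
var finalSigma = '\u03C2';   // ς
var capitalSigma = '\u03A3'; // Σ

// Code point of ς as a hexadecimal string.
console.log(finalSigma.charCodeAt(0).toString(16));

// UTF-8 bytes of ς, shown percent-encoded.
console.log(encodeURIComponent(finalSigma));

// Whether this engine uppercases ς to Σ.
console.log(finalSigma.toUpperCase() === capitalSigma);
```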

The MSDN documentation for toUpperCase and toLocaleUpperCase says that both use the Unicode case mappings. toLocaleUpperCase additionally uses the case mappings of the current system locale where they conflict with the Unicode ones (for example, some Turkish mappings). So if you just need the Unicode mappings, use toUpperCase.

+1