localeCompare shows inconsistent behavior when sorting words with a leading umlaut

Tested in recent versions of Firefox and Chrome (both of which use the "de" locale on my system):

"Ä".localeCompare("A") 

gives me 1, which means that it considers "Ä" to come after "A" in sorted order, which is correct.

But:

 "Ägypten".localeCompare("Algerien") 

gives me -1, which means that it considers "Ägypten" to come before "Algerien" in sorted order.

Why? Why does the comparison look past the first character of each string, when comparing the first characters on their own says that the first character of the first string should come after the first character of the second string?

2 answers

Here is a method written for exactly this case; just copy it:

It walks through the strings recursively and returns the result of the per-character locale comparison instead of the whole-string one :)

FINAL RESULT, bug fixed: a comparison of the whole strings was added (the previous version stopped too early or recursed endlessly):

    String.prototype.MylocaleCompare = function (right, idx) {
        // idx keeps its initial value; the recursion ends because both
        // strings get one character shorter on every call.
        idx = (idx == undefined) ? 0 : idx;
        var first = this[0].localeCompare(right[0]); // first characters only
        var run = idx < Math.min(this.length, right.length) - 1;
        if (!run) {
            // At most one character is left in the shorter string.
            return first == 0 ? this.localeCompare(right) : first;
        }
        if (this.localeCompare(right) == first) {
            // Whole-string and first-character results agree.
            return this.localeCompare(right);
        }
        var myLeft = this.slice(1);
        var myRight = right.slice(1);
        if (myLeft.localeCompare(myRight) != myLeft[0].localeCompare(myRight[0]) || first == 0) {
            // Undecided so far: continue with the rest of both strings.
            return myLeft.MylocaleCompare(myRight, idx);
        }
        return first;
    };

http://en.wikipedia.org/wiki/Diaeresis_(diacritic)#Printing_conventions_in_German

"When alphabetically sorting German words, the umlaut is usually not distinguished from the underlying vowel, although if two words differ only by an umlaut, the umlauted one comes second [...]
"There is a second system in limited use, mostly for sorting names (colloquially called "telephone directory sorting"), which treats ü like ue, and so on."

Assuming the second type of sorting algorithm is applied, the results you see make sense.

Ä becomes Ae, which is "longer" than your other value A, so sorting A before Ae, and therefore A before Ä, is correct (as you said yourself, you consider this correct; the first algorithm, which simply treats Ä as A, gives the same order).

Now Ägypten becomes Aegypten for sorting purposes, so under the same logic it has to appear before Algerien: the first letters of both terms are equal, so the order is decided by the second letter, and e sorts lexicographically before l. Hence Aegypten comes before Algerien, which means Ägypten comes before Algerien.
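The first rule from the quote (the umlaut only matters when nothing else differs) can be checked roughly like this. It is a minimal sketch, assuming the runtime supports the `locales` and `options` arguments of `localeCompare`; `compareBaseLetters` is a name I made up for illustration.

```javascript
// Hypothetical helper: compares two strings by base letters only,
// so accents/umlauts (and letter case) are ignored.
// Assumption: the engine implements the ECMA-402 arguments of localeCompare.
function compareBaseLetters(a, b) {
  return a.localeCompare(b, "de", { sensitivity: "base" });
}

// With umlauts ignored, "Ä" and "A" are equal ...
console.log(compareBaseLetters("Ä", "A")); // 0

// ... so in the default comparison the umlaut can only act as a tie-breaker.
// For the two country names the base letters already differ at the second
// position (g vs. l), so the result is negative either way.
console.log(compareBaseLetters("Ägypten", "Algerien") < 0); // true
```

So the umlaut only decides the order when the strings are otherwise identical, which would explain both of your results.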


The German Wikipedia goes into more detail on this ( http://de.wikipedia.org/wiki/Alphabetische_Sortierung#Einsortierungsregeln_f.C3.BCr_weitere_Buchstaben ) and notes that there are two variants of the relevant standard, DIN 5007.

DIN 5007-1 states that ä should be treated as a, ö as o and ü as u, and that this type of sorting should be used for dictionaries and the like.

DIN 5007-2 says that ä is treated as ae, etc., and that this should be used mainly for lists of names such as phone books.

Wikipedia further states that this takes into account that personal names can have more than one spelling (someone's last name can be Moeller or Möller, both versions exist), while for words in a dictionary there is usually only one spelling that is considered correct.
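To make the difference between the two variants concrete, here is a rough sketch of both as plain sort keys. This is only my own illustration, not what browsers do internally: `dictionaryKey`, `phoneBookKey` and `compareBy` are made-up helpers, only ä, ö and ü are handled, and ties between words that differ only in the umlaut are not broken.

```javascript
// Hypothetical helpers for illustration only.

// DIN 5007-1 style key: treat ä, ö, ü as a, o, u.
function dictionaryKey(word) {
  return word.toLowerCase()
    .replace(/ä/g, "a")
    .replace(/ö/g, "o")
    .replace(/ü/g, "u");
}

// DIN 5007-2 style key: treat ä, ö, ü as ae, oe, ue.
function phoneBookKey(word) {
  return word.toLowerCase()
    .replace(/ä/g, "ae")
    .replace(/ö/g, "oe")
    .replace(/ü/g, "ue");
}

// Builds a comparator that compares the plain keys code unit by code unit.
function compareBy(keyFn) {
  return (a, b) => {
    const ka = keyFn(a);
    const kb = keyFn(b);
    if (ka < kb) return -1;
    if (ka > kb) return 1;
    return 0; // words that differ only in the umlaut stay in input order
  };
}

const words = ["Algerien", "Ägypten", "Afrika", "Ärger"];

console.log([...words].sort(compareBy(dictionaryKey)));
// [ 'Afrika', 'Ägypten', 'Algerien', 'Ärger' ]

console.log([...words].sort(compareBy(phoneBookKey)));
// [ 'Ägypten', 'Ärger', 'Afrika', 'Algerien' ]
```

Both keys put Ägypten before Algerien, so that pair alone does not tell you which variant is in use. A pair like Ärger and Afrika does, because the two keys order it differently.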


Now, I think, the big remaining question is: can I get browsers to use a different form of sorting for German? To be honest, I do not know.

It would perhaps be desirable to be able to choose between the two forms of sorting, because, as Wikipedia says, both Moeller and Möller exist as personal names, whereas in a dictionary there is only Ägypten and not Aegypten.
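One thing that may be worth trying: the ECMAScript Internationalization API lets you request a collation through the locale tag, and "phonebk" is the Unicode identifier for the German phone book order. The sketch below assumes the engine supports `Intl.Collator` and that its locale data actually includes that collation; `sortWith` is just a made-up helper. If the collation is not available, the engine falls back to its default order, which you can see in `resolvedOptions()`.

```javascript
// Assumption: Intl.Collator is available and the engine's locale data
// contains the German phone book collation ("phonebk").
const dictionaryOrder = new Intl.Collator("de");
const phoneBookOrder = new Intl.Collator("de-u-co-phonebk");

// Hypothetical helper: returns a sorted copy, leaving the input untouched.
function sortWith(collator, words) {
  return [...words].sort(collator.compare);
}

const words = ["Afrika", "Ärger"];

// Reports which collation the engine actually picked.
console.log(phoneBookOrder.resolvedOptions().collation);

console.log(sortWith(dictionaryOrder, words));
console.log(sortWith(phoneBookOrder, words));
```

Where the phone book collation is supported, I would expect the second collator to put Ärger before Afrika (because it compares "Aerger"), while the plain "de" collator puts Afrika first.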


Source: https://habr.com/ru/post/1215062/

