Tom Christiansen is an active contributor to StackOverflow and answers many Perl questions. There is a good chance that he will answer this question.
Certain character sequences, such as ff, can be represented in Unicode either as two characters, f and f, or as a single character (the ligature ﬀ). When you decompose your characters, something like ﬀ becomes two separate characters, which matters for sorting: you want it treated as two separate f letters.
My theory is that when you recompose f and f, they go back to being the same single character, which matters for display (you want it nicely formatted) and for editing (you want to edit it as one character).
Unfortunately, my theory falls apart for things like the Spanish ñ. It is represented as the single character U+00F1 and decomposes into U+006E (n) and U+0303 (a combining ~). Perl may have built-in logic to handle this kind of base letter plus combining mark pair.
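To check what decomposition and recomposition actually do to these characters, here is a minimal sketch. It uses Python's standard unicodedata module rather than Perl, purely for illustration; Perl's core Unicode::Normalize module provides NFD, NFC and NFKD functions for the same normalization forms. The helper name codepoints is made up for this example.

```python
import unicodedata


def codepoints(text):
    """Return the code points of text as a list of 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Spanish n-with-tilde as a single precomposed character.
n_tilde = "\u00F1"
decomposed = unicodedata.normalize("NFD", n_tilde)      # canonical decomposition
recomposed = unicodedata.normalize("NFC", decomposed)   # canonical composition

print(codepoints(n_tilde))     # ['U+00F1']
print(codepoints(decomposed))  # ['U+006E', 'U+0303']
print(codepoints(recomposed))  # ['U+00F1']

# The ff ligature (U+FB00) has only a compatibility decomposition.
ligature = "\uFB00"
ligature_nfd = unicodedata.normalize("NFD", ligature)    # canonical form
ligature_nfkd = unicodedata.normalize("NFKD", ligature)  # compatibility form
ff_nfc = unicodedata.normalize("NFC", "ff")              # composing two plain f's

print(codepoints(ligature_nfd))   # ['U+FB00']
print(codepoints(ligature_nfkd))  # ['U+0066', 'U+0066']
print(codepoints(ff_nfc))         # ['U+0066', 'U+0066']
```

So ñ round-trips cleanly between one and two code points, while the ligature only splits under the compatibility forms (NFKD/NFKC), and composing two plain f characters does not bring the ligature back.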
David W.