I assume that you are sorting according to English collation rules and that your text is alphabetic. The code below is a good start, but the real world is more complex. For example, Chinese text follows different lexicographic rules depending on the context: a general dictionary, a list of karaoke songs, an electronic directory of doorbell names, and so on. I cannot suggest an ideal solution because the question gives so little information.
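To see why the locale assumption matters, here is a minimal sketch that sorts the same three words with the English (`en`) and the Swedish (`sv`) tailorings of Unicode::Collate::Locale. Swedish is only an illustration I picked; `sort_for_locale` is a hypothetical helper, not part of the module. In English, `ö` sorts as a variant of `o`. Swedish treats it as a separate letter placed after `z`.

```perl
use strict;
use warnings;
use utf8;                                  # the word list below contains non-ASCII literals
use Unicode::Collate::Locale;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: sort a word list using the tailoring for the given locale.
sub sort_for_locale {
    my ($locale, @words) = @_;
    my $collator = Unicode::Collate::Locale->new(locale => $locale);
    return $collator->sort(@words);
}

my @words = qw(zebra öl apa);
for my $locale (qw(en sv)) {
    print "$locale: ", join(' ', sort_for_locale($locale, @words)), "\n";
}
```

This should print `apa öl zebra` for `en` and `apa zebra öl` for `sv`. The input is identical and the order differs, so the "right" order depends on who will read the list.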
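```perl
use 5.010;
use strict;
use warnings;
use utf8;                                  # the source contains non-ASCII literals
use Unicode::Collate::Locale 0.96;
use Unicode::Normalize qw(normalize);

# Without an output layer, Perl prints strings such as "écrit" as Latin-1 bytes
# and warns about wider characters; encode all output as UTF-8 instead.
binmode STDOUT, ':encoding(UTF-8)';

# Collator that uses the English tailoring of the Unicode Collation Algorithm.
my $c = Unicode::Collate::Locale->new(locale => 'en');

say for $c->sort(qw(
    eye egg estate etc. eleven eg England ensure educate each
    equipment elephant ex- ending écrit
));

say '-' x 40;

for my $word (qw(écrit Ëmëhntëhtt-Rê Ênio ècole Ēadƿeard Ėmma Ędward Ẽfini)) {
    # NFD splits an accented letter into base letter + combining mark(s),
    # so the first character after decomposition is the plain base letter.
    say sprintf '%s should be stored under the heading %s',
        $word, ucfirst substr(normalize('D', $word), 0, 1);
}

__END__
each
écrit
educate
eg
egg
elephant
eleven
ending
England
ensure
equipment
estate
etc.
ex-
eye
----------------------------------------
écrit should be stored under the heading E
Ëmëhntëhtt-Rê should be stored under the heading E
Ênio should be stored under the heading E
ècole should be stored under the heading E
Ēadƿeard should be stored under the heading E
Ėmma should be stored under the heading E
Ędward should be stored under the heading E
Ẽfini should be stored under the heading E
```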
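The heading code relies on canonical decomposition. An alternative, sketched here under my own assumptions, is to ask the collator itself which A-Z letter the first character matches at comparison level 1. Level 1 ignores accents and case, so headings stay consistent with the sort order. `heading_for` and the `#` bucket for words that match no letter are hypothetical choices, not something the module provides.

```perl
use strict;
use warnings;
use utf8;                                  # the sample words contain non-ASCII literals
use Unicode::Collate::Locale;
use Unicode::Normalize qw(NFC);

binmode STDOUT, ':encoding(UTF-8)';

# Level 1 compares base letters only: accents (level 2) and case (level 3) are ignored.
my $base = Unicode::Collate::Locale->new(locale => 'en', level => 1);

# Hypothetical helper: return the A-Z letter that the word's first character is
# primary-equal to, or '#' (an assumed catch-all bucket) when none matches.
sub heading_for {
    my ($word) = @_;
    my $first = substr(NFC($word), 0, 1);  # NFC keeps "é" as one character
    for my $letter ('A' .. 'Z') {
        return $letter if $base->eq($first, $letter);
    }
    return '#';
}

for my $word (qw(écrit Ēadƿeard Ẽfini apple 42nd)) {
    printf "%s => %s\n", $word, heading_for($word);
}
```

The loop makes up to 26 comparisons per word. That is fine for building an index, but you would cache the result per first character if the list is large.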
daxim