I found a solution that worked in all my test cases (copied from http://php.net/manual/en/transliterator.transliterate.php ):
var_dump(transliterator_transliterate('Any-Latin; Latin-ASCII; [\u0080-\u7fff] remove', "A æ Übérmensch på høyeste nivå! PHP! . fi ¦"));
see http://www.php.net/normalizer
EDIT: This solution does not depend on which locale has been set with setlocale(). Another advantage over iconv() is that even non-Latin characters are not ignored.
EDIT2: I found that there are some characters that are not covered by the transliteration I published originally. Any-Latin translates a Cyrillic character to a character that does not fit into the Latin character set: ʹ ( http://en.wikipedia.org/wiki/Prime_%28symbol%29 ). I added [\u0100-\u7fff] remove to remove all these non-Latin characters. I also added a test to the text;)
I guess that by "Latin" they mean the Latin alphabet here, and not one of the Latin character sets. But in any case, in my opinion Latin-ASCII should then transliterate it into something ASCII ...
EDIT3: Sorry for yet another change here. I had to remove characters starting from \u0080 instead of \u0100 to get only ASCII characters as output. The above test has been updated.
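If you need this in more than one place, the call above can be wrapped in a small helper. This is only a sketch: the function name to_ascii is made up for illustration, and it assumes the intl extension is loaded and PHP 7 or later (for the type declarations). The rule string is the same one used in the test above.

```php
<?php
// Hypothetical helper; the name to_ascii is an assumption, not part of the PHP manual.
// Assumes the intl extension is available.
function to_ascii(string $text): string
{
    // 1. Any-Latin: transliterate other scripts (e.g. Cyrillic) into Latin letters.
    // 2. Latin-ASCII: replace accented Latin letters and ligatures with ASCII equivalents.
    // 3. Remove whatever is still left in the range U+0080..U+7FFF.
    $rules = 'Any-Latin; Latin-ASCII; [\u0080-\u7fff] remove';

    $result = transliterator_transliterate($rules, $text);
    if ($result === false) {
        // transliterator_transliterate() returns false on failure.
        throw new RuntimeException('Transliteration failed: ' . intl_get_error_message());
    }

    return $result;
}

var_dump(to_ascii("A æ Übérmensch på høyeste nivå! PHP! . fi ¦"));
```

Throwing on failure is a design choice of this sketch; returning the original string instead would work just as well if you prefer a silent fallback.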
SimonSimCity Apr 15 '13 at 18:40