ICU: transliterate and then delete all non-alphanumeric characters

Can this be done with ICU alone, without falling back to regex?

I am currently normalizing file names as follows:

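protected function normalizeFilename($filename)
{
    // create() takes a compound transliterator ID; createFromRules()
    // expects rule syntax instead (":: Any-Latin;" and so on).
    $transliterator = Transliterator::create(
        'Any-Latin; Latin-ASCII; [:Punctuation:] Remove;'
    );
    $filename = $transliterator->transliterate($filename);
    // Keep only ASCII letters, digits and the underscore.
    $filename = preg_replace('/[^A-Za-z0-9_]/', '', $filename);
    return $filename;
}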

Is it possible to get rid of regex here and do everything with ICU calls?

1 answer

Use the right tool for the job.

I see nothing wrong with what you are doing now.

ICU transliteration is primarily language-oriented. It tries to preserve the meaning of the text.

Regular expressions, on the other hand, operate on individual characters, so you can be certain that the file name is restricted to the characters you selected.
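To make that guarantee concrete, here is a minimal sketch, assuming ext-intl is loaded. The function name normalizeFilenameSketch and the sample inputs are hypothetical, not taken from the question. Whatever ICU produces in the first step, the final check holds because of the regex step alone.

```php
<?php
// Hypothetical standalone variant of the question's method; requires ext-intl.
function normalizeFilenameSketch(string $filename): string
{
    // Step 1: language-oriented pass. ICU picks Latin/ASCII equivalents where it knows them.
    $transliterator = Transliterator::create(
        'Any-Latin; Latin-ASCII; [:Punctuation:] Remove;'
    );
    if ($transliterator !== null) {
        $result = $transliterator->transliterate($filename);
        if ($result !== false) {
            $filename = $result;
        }
    }

    // Step 2: character-oriented pass. Anything outside the allowed set is dropped.
    return preg_replace('/[^A-Za-z0-9_]/', '', $filename) ?? '';
}

// Sample inputs are illustrative only.
foreach (['Résumé 2024.pdf', 'Корнильев Кирилл', "tab\tand space"] as $name) {
    $clean = normalizeFilenameSketch($name);
    // True for every input: the regex pass enforces it regardless of ICU's output.
    var_dump(preg_match('/^[A-Za-z0-9_]*$/', $clean) === 1); // bool(true)
}
```

The exact transliterated text depends on the ICU data shipped with your PHP build, which is why the sketch checks only the character set and not a specific result.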

For example, [:Punctuation:] Remove; does not catch everything. With your id, a transliterated name can come out as: Kornilʹev Kirill. The ʹ and the space are still there.
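The ʹ in Kornilʹev is U+02B9 MODIFIER LETTER PRIME. A short sketch, assuming ext-intl and PHP 7 or later, shows why a [:Punctuation:] filter does not match it: Unicode classifies it as a modifier letter. The variable names are mine.

```php
<?php
// U+02B9 MODIFIER LETTER PRIME, the mark left in "Kornilʹev".
$prime = "\u{02B9}";

// Not in any punctuation category, so "[:Punctuation:] Remove;" skips it.
var_dump(IntlChar::ispunct($prime)); // bool(false)

// Its general category is Lm (modifier letter).
var_dump(IntlChar::charType($prime) === IntlChar::CHAR_CATEGORY_MODIFIER_LETTER); // bool(true)

// For comparison: the ASCII apostrophe is punctuation, a space is not.
var_dump(IntlChar::ispunct("'")); // bool(true)
var_dump(IntlChar::ispunct(' ')); // bool(false)
```

A character-level whitelist such as /[^A-Za-z0-9_]/ does not depend on these categories, which is what makes it predictable for file names.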

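In short:

  • Let ICU transliterate to the ASCII equivalent. That is what Latin-ASCII; in your id does.
  • Let the regex remove whatever is left outside the characters you allow.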

