Remove garbage characters in UTF-8

I store all my data in MySQL as UTF-8. Before the data is inserted into the database, I need to strip unwanted characters from the strings. The strings are UTF-8 encoded. I know how to use regex and string replacement, but I don't know how to handle Arabic characters.

An example of a string to be cleaned: "████ .. القوانين الجديدة في قسم العناي"

Thank you

php regex arabic
1 answer

As @Jonathan Leffler already mentioned, if you can specify the Unicode ranges of the characters you want to remove, you can use a regular expression to replace them with an empty string.

A Unicode character is written as \x{FFFF} in the pattern (in PHP). In addition, you must set the u modifier so that PHP treats the pattern and the subject string as UTF-8.

So, in the end, you have something like this:

 preg_replace('/[\x{FFFF}-\x{FFFF}]+/u','',$string); 

Where

  • /.../u - delimiters plus modifier
  • [...]+ is a character class plus a quantifier, which matches any of the listed characters one or more times
  • \x{FFFF}-\x{FFFF} is the range of Unicode characters (obviously, you must specify the correct codes / character numbers).
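
As a concrete illustration of the range form, here is a minimal sketch that assumes the garbage in the question's example is made of characters from the Unicode "Block Elements" range (U+2580 to U+259F, which includes U+2588 FULL BLOCK). The function name strip_block_elements is hypothetical, and the range should be adjusted to whatever characters actually need removing.

```php
<?php
// Hypothetical helper: removes characters in the "Block Elements" range.
// Assumption: the unwanted characters fall in U+2580..U+259F.
function strip_block_elements(string $text): string
{
    // The u modifier makes PCRE treat pattern and subject as UTF-8.
    $result = preg_replace('/[\x{2580}-\x{259F}]+/u', '', $text);

    // preg_replace() returns null on error, e.g. when $text is not valid UTF-8.
    if ($result === null) {
        throw new RuntimeException('preg_replace failed: ' . preg_last_error());
    }
    return $result;
}

$dirty = "████ .. القوانين الجديدة في قسم العناي";
echo strip_block_elements($dirty), PHP_EOL;
```

Only the block characters are removed here; the dots and spaces that follow them are left in place, so they would need their own rule if they are also unwanted.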

You can also negate the character class with ^, so that you specify the range you want to keep instead:

 preg_replace('/[^\x{FFFF}-\x{FFFF}]+/u','',$string); 
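
The negated form could look like the following sketch, which assumes the text to keep consists only of characters from the Unicode Arabic block (U+0600 to U+06FF) plus whitespace. The function name keep_arabic is hypothetical, and the whitelist would need extending (for example with digits or punctuation) if those should survive too.

```php
<?php
// Hypothetical helper: keeps Arabic-block characters and whitespace only.
// Assumption: everything outside U+0600..U+06FF and whitespace is garbage.
function keep_arabic(string $text): string
{
    // Remove every run of characters that is neither Arabic nor whitespace.
    $kept = preg_replace('/[^\x{0600}-\x{06FF}\s]+/u', '', $text);
    if ($kept === null) {
        // Typically means the input was not valid UTF-8.
        throw new RuntimeException('preg_replace failed: ' . preg_last_error());
    }

    // Collapse the whitespace runs left behind and trim both ends.
    $collapsed = preg_replace('/\s+/u', ' ', $kept);
    if ($collapsed === null) {
        throw new RuntimeException('preg_replace failed: ' . preg_last_error());
    }
    return trim($collapsed);
}

$dirty = "████ .. القوانين الجديدة في قسم العناي";
echo keep_arabic($dirty), PHP_EOL;
```

A whitelist like this is stricter than removing a known-bad range: it also drops Latin letters and ASCII punctuation, which may or may not be what the data requires.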

