Removing Non-ASCII Hidden Characters [PHP or MySQL]

I am having a problem with hidden non-ASCII characters (spaces) in my database.

How can I replace the existing ones with regular spaces, and convert new values before inserting them to avoid future problems?

I'm still not 100% sure what is happening, but I think these are non-ASCII spaces. Any tips on tracking this down would help.


Here's what happens:

I have a database with keywords, and if I search for "test keyword", nothing is found. I know that "test keyword" is in the database.

If I search for "test" or "keyword", the row is found.

If I run this query:

SELECT * FROM keywords WHERE keyword regexp '[^ -~]';

It returns "test keyword", which leads me to conclude that the space in "test keyword" is a non-ASCII character.

+4
4 answers

This works with PHP:

 str_replace("\xA0", ' ', $keyword) 
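One caveat with the line above: "\xA0" is the non-breaking space as a single byte (Latin-1 style). If the strings are UTF-8, the same character is the two-byte sequence C2 A0, and replacing only A0 would leave a stray C2 byte behind. Here is a minimal sketch that handles both cases before inserting; normalize_nbsp is a hypothetical helper name, and the Latin-1 fallback is an assumption about the data:

```php
<?php
// Hypothetical helper (not from the original answer): replaces non-breaking
// spaces with regular spaces before the keyword is stored.
function normalize_nbsp(string $keyword): string
{
    // An empty pattern with the /u modifier only matches valid UTF-8 input.
    if (preg_match('//u', $keyword) === 1) {
        // Valid UTF-8: U+00A0 is encoded as the two bytes C2 A0.
        return str_replace("\xC2\xA0", ' ', $keyword);
    }
    // Assumption: anything else is a single-byte encoding such as Latin-1,
    // where the non-breaking space is the single byte A0.
    return str_replace("\xA0", ' ', $keyword);
}
```

Calling normalize_nbsp($keyword) just before the INSERT should keep new rows clean, whichever of the two encodings the input arrives in.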

Now I am trying to replace all existing ones in the database.

I think this should work, but it does not:

 update keywords set keyword = replace(keyword, char(160), " ") WHERE keyword regexp char(160); 

Any ideas?

+5

I had the same problem and managed to create an update query to replace (in my case) non-breaking spaces.

First, I analyzed the binary values of the strings that contained these characters (I used the MySQL Workbench "Open value in editor" option). I realized that in my case, the character(s) that I wanted to replace had the hexadecimal value 'a0'.

Then I went to this page http://www.fileformat.info/info/unicode/char/a0/charset_support.htm and checked all the encodings that interpret a0 as a non-breaking space.
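If you would rather do the byte inspection from PHP than from a GUI, something along these lines may help. It is only a sketch: find_non_ascii_bytes is a made-up helper name, and it flags the same range as the '[^ -~]' regexp from the question (anything outside space through tilde).

```php
<?php
// Hypothetical helper: lists every byte outside printable ASCII
// (0x20 through 0x7E), keyed by its byte offset in the string.
function find_non_ascii_bytes(string $value): array
{
    $found = [];
    $length = strlen($value);
    for ($i = 0; $i < $length; $i++) {
        $byte = ord($value[$i]);
        if ($byte < 0x20 || $byte > 0x7E) {
            // Store the byte as two lowercase hex digits, e.g. "a0".
            $found[$i] = sprintf('%02x', $byte);
        }
    }
    return $found;
}
```

For "test\xA0keyword" this returns [4 => 'a0']. If the value is stored as UTF-8, the same character shows up as two entries, 'c2' followed by 'a0', which tells you which byte sequence to replace.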

Next I built this query

 UPDATE keywords SET keyword = TRIM(REPLACE(keyword, CONVERT(char(160) USING hp8), ' ')); 

I chose hp8, but utf8 also worked.

It took me a while to reach this solution, so I hope it saves someone with the same problem from losing their mind trying to figure it out.

+3

What about:

 update keywords set keyword = replace(keyword, char(160), ' ') WHERE keyword LIKE concat('%',char(160),'%'); 
+1

Do you want to delete all non-alphanumeric characters?

 $string = "Here! is some text, and numbers 12345, and symbols !£$%^&";
 $new_string = preg_replace("/[^a-zA-Z0-9\s]/", "", $string);
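Note that the pattern above deletes the offending bytes instead of turning them into a space, so the two words could end up glued together. If the goal is to keep the word boundary, a variation along these lines might be closer to what the question asks for. It is a sketch only: replace_non_printable_ascii is a hypothetical name, and the character class is borrowed from the question's regexp.

```php
<?php
// Hypothetical helper: replaces each run of bytes outside printable ASCII
// (space through tilde) with a single regular space.
function replace_non_printable_ascii(string $keyword): string
{
    // preg_replace returns null on failure; fall back to the input then.
    return preg_replace('/[^ -~]+/', ' ', $keyword) ?? $keyword;
}
```

Be aware that this also replaces tabs, newlines, and legitimate non-ASCII letters such as accented characters, so it only fits data that is supposed to be plain ASCII.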
0