HTML pound symbol from database displayed as? even with charset = UTF-8

We have a bunch of database data that someone entered manually. It contains many British pound signs (£). The original user was copying and pasting the pound sign from somewhere, I'm not sure where (and I'm not sure if this matters or not ...).

In any case, when printing data on a PHP page, the pound signs are displayed as a replacement character . There is <meta charset="utf-8"/> on the page. In the browser, if you change the encoding to ISO-8859-1 , then the pound signs will appear correctly.

After some digging, I came to the conclusion that the original data entry person copied and pasted an ISO-8859-1 encoded pound sign into the database. Therefore, unless the page is displayed as ISO-8859-1, the pound sign does not display correctly.

Here is the header information from Chrome:

    Request URL:http://www.mysite.com/test.php
    Request Method:GET
    Status Code:200 OK

    Request Headers
    Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3
    Accept-Encoding:gzip,deflate,sdch
    Accept-Language:en-US,en;q=0.8
    Cache-Control:max-age=0
    Connection:keep-alive
    Cookie:X-Mapping-goahf....
    Host:www.mysite.com
    User-Agent:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2

    Response Headers
    Connection:Keep-Alive
    Content-Type:text/html; charset=UTF-8
    Date:Wed, 07 Dec 2011 22:38:14 GMT
    Server:Apache/2.2
    Transfer-Encoding:chunked

The MySQL table also reports that it uses latin1_swedish_ci, which was the default.

So how do I solve this problem? I don’t know much about how character encoding works and what happens when you copy / paste characters from one place to another.

I tried to go to this page:

http://www.fileformat.info/info/unicode/char/a3/browsertest.htm

I copied the pound sign from there and pasted it into the database, thinking that would fix it, but it didn't seem to ... How do I make the pound sign in the database a UTF-8 pound sign instead of an ISO-8859-1 one?

+7
2 answers

It doesn't matter where the original pound sign was copied from. It doesn't even matter which encoding it is stored in inside the database. The database works at the character level: if you ask it to store the £ character, it stores the £ character. Exactly how that happens behind the scenes, and which encoding is used for it, is an implementation detail that does not matter here.

What you are missing is that there is a connection encoding. When you connect to a database, you talk to it, implicitly or explicitly, in a specific character set. Any bytes you send to the database are expected to represent characters in this encoding (so the database knows which characters it is receiving), and any text you get back from the database is encoded in this encoding (so you know how to interpret the results). Latin-1 (aka ISO-8859-1) is often the default connection encoding. So when you fetch the £ character from the database, the database converts it on the fly to Latin-1, regardless of the encoding it was stored in. You therefore receive a Latin-1 encoded £ sign and output it as is on your page, while telling the browser to interpret the page as UTF-8. That, of course, leads to a misinterpreted character.
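The mismatch described above can be reproduced at the byte level. The following is a minimal Python sketch (Python is used here only for illustration, the page in the question is PHP, and the variable names are made up for the example):

```python
# The pound sign is a single byte in Latin-1 but two bytes in UTF-8.
pound = "\u00a3"  # the character £

latin1_bytes = pound.encode("latin-1")  # what a Latin-1 connection would hand back
utf8_bytes = pound.encode("utf-8")      # what a UTF-8 page is expected to contain

print(latin1_bytes.hex())  # a3
print(utf8_bytes.hex())    # c2a3

# A browser told "charset=UTF-8" tries to decode the lone 0xA3 byte as UTF-8.
# 0xA3 is a continuation byte with no lead byte, so it becomes U+FFFD.
as_seen_in_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# Switching the browser to ISO-8859-1 decodes the same byte correctly.
as_seen_in_latin1 = latin1_bytes.decode("latin-1")
```

This matches the symptom in the question: the same byte shows up as a replacement character under UTF-8 and as £ under ISO-8859-1.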

You can change the default in several ways: in the MySQL configuration, through the appropriate method of your client library (which you did not specify), or by issuing the query SET NAMES utf8; right after connecting to the database.
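To see why changing the connection encoding helps, here is a rough Python simulation of the round trip. It does not talk to a real MySQL server; `fetch_bytes` and `browser_view` are hypothetical helpers that only mimic the encode and decode steps involved.

```python
def fetch_bytes(stored_text: str, connection_charset: str) -> bytes:
    """Mimic the server converting stored characters to the connection encoding."""
    return stored_text.encode(connection_charset)


def browser_view(page_bytes: bytes, declared_charset: str) -> str:
    """Mimic the browser decoding the page with the declared charset."""
    return page_bytes.decode(declared_charset, errors="replace")


price = "\u00a3" + "5"  # "£5" as stored in the table

# Default connection (Latin-1) with a UTF-8 page: the pound sign is mangled.
before = browser_view(fetch_bytes(price, "latin-1"), "utf-8")

# After the equivalent of SET NAMES utf8: connection and page agree.
after = browser_view(fetch_bytes(price, "utf-8"), "utf-8")
```

In this simulation only the connection charset changes between the two calls; the stored character is the same in both cases.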

+5

You cannot just take raw text in one encoding and use the utf8 meta tag to display it.

I do not know exactly which encoding latin1_swedish_ci corresponds to, but perhaps it is an alias of ISO-8859-1. So either convert the data to UTF-8, or correct the meta tag to declare the encoding the data is actually in.

If you are going to convert it, I suggest iconv. You may also need to make sure MySQL knows about the new encoding. Someone seems to have gone through this already: http://drupal.org/node/62258
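The conversion itself amounts to decoding the bytes with the encoding they are really in and re-encoding them as UTF-8, which is the kind of transformation iconv performs. A minimal Python sketch, assuming the data really is ISO-8859-1 as the question suggests (the function name and sample value are illustrative only):

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8."""
    return raw.decode("iso-8859-1").encode("utf-8")


raw_value = b"\xa3" + b"12.50"  # "£12.50" as Latin-1 bytes
converted = latin1_to_utf8(raw_value)

# The single 0xA3 byte has become the two-byte UTF-8 sequence 0xC2 0xA3.
decoded = converted.decode("utf-8")
```

Running a conversion like this on data that is already UTF-8 would double-encode it, so it should be applied only once and only to data known to be in the source encoding.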

+1