Invalid characters in XML feed data

I am retrieving data from a feed so that it can be stored in a database. The feed provides data in XML format, but the data includes “illegal” characters. For instance:

A GREAT NEIGHBOURHOOD â€" WITH A 

or

 large “country style†eat-in 

or

 Garage 14’x32’, large 

or

  OR…….ENDLESS POSSIBILITIES!! 

My question is, first, how do I determine the encoding of these characters, and second, how do I convert them to the UTF-8 encoding that my database expects?

EDIT: To be clear, there is no database involved at this stage of the process. The data will be added to the database later, but for now I am just reading the data with a PHP script and printing it to the screen using var_dump.

EDIT 2: The data is retrieved from a RETS feed using the PHP PHRETS library.

+6
3 answers

The problem is either that your UTF-8 response is being interpreted with a different encoding somewhere along the way, or that the database is not configured for UTF-8. Here are some places where this can happen and how to fix each one.

Before using cURL

 header("Content-Type: text/html; charset=utf-8"); 

Mysql (my.cnf)

 [client]
 default-character-set=utf8
 [mysql]
 default-character-set=utf8
 [mysqld]
 collation-server = utf8_unicode_ci
 init-connect='SET NAMES utf8'
 character-set-server = utf8

When creating a database manually

 CREATE DATABASE `your_database_name` DEFAULT CHARACTER SET utf8 COLLATE utf8_polish_ci; 

When using frameworks such as Doctrine

 $conn = array(
     'driver'   => 'pdo_mysql',
     'dbname'   => 'test',
     'user'     => 'root',
     'password' => '*****',
     'charset'  => 'utf8',
     // PDO::MYSQL_ATTR_INIT_COMMAND is the named constant for 1002
     'driverOptions' => array(PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES utf8')
 );
+7

It looks like at some point the source or XML data, which is already UTF-8, is being treated as ISO-8859-1 (or the closely related Windows-1252) and converted to UTF-8 a second time. Depending on how the feed is created, this can happen at several points.

The most likely point is the encoding of the database connection. Make sure it is UTF-8.

Another possibility is the Content-Type header you send.
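
If the data really has been double encoded in this way, one round of it can sometimes be undone in PHP. The following is only a sketch, assuming the mbstring extension is available and that the wrong intermediate encoding was Windows-1252; `fix_double_encoding` is a hypothetical helper name, not part of PHP or PHRETS.

```php
<?php
// Hypothetical helper: undo one round of "UTF-8 bytes read as Windows-1252".
// Assumption: the mbstring extension is loaded.
function fix_double_encoding(string $text): string
{
    // Map each character back to the single byte it came from.
    $candidate = mb_convert_encoding($text, 'Windows-1252', 'UTF-8');
    // Re-apply the faulty conversion to verify nothing was lost on the way.
    $roundTrip = mb_convert_encoding($candidate, 'UTF-8', 'Windows-1252');

    // Accept the result only if it is lossless and itself valid UTF-8;
    // otherwise assume the input was not double encoded and return it as-is.
    if ($roundTrip === $text && mb_check_encoding($candidate, 'UTF-8')) {
        return $candidate;
    }
    return $text;
}

// Simulate the bug: UTF-8 bytes (here with U+2014) misread as Windows-1252.
$original = "A GREAT NEIGHBOURHOOD \u{2014} WITH A";
$garbled  = mb_convert_encoding($original, 'UTF-8', 'Windows-1252');

var_dump(fix_double_encoding($garbled) === $original);
```

This cannot tell a genuinely double-encoded string from one that legitimately contains such character sequences, and characters whose UTF-8 bytes include a value with no Windows-1252 mapping (0x9D, for example) may not be recoverable. Fixing the step that causes the double encoding is the more reliable option.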

+4

Please add the database encoding to your question so that we can give a better answer.

To determine the encoding of a string, you can use mb_detect_encoding, which returns a best guess, as follows:

 echo mb_detect_encoding("your-string"); 
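
Since detection is only a guess, it may help to check for valid UTF-8 first and fall back to detection afterwards. This is a rough sketch, assuming the mbstring extension; `guess_encoding` is a hypothetical helper name and the fallback candidate list is an assumption.

```php
<?php
// Hypothetical helper: label a byte string as UTF-8 or fall back to a guess.
// Assumption: the mbstring extension is loaded.
function guess_encoding(string $bytes): string
{
    // Valid UTF-8 is the cheapest and most reliable thing to test for.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return 'UTF-8';
    }
    // Single-byte encodings accept almost any byte sequence, so this
    // result is only a fallback label, not proof of the real encoding.
    $guess = mb_detect_encoding($bytes, ['ISO-8859-1'], true);
    return $guess === false ? 'unknown' : $guess;
}

var_dump(guess_encoding("caf\u{00E9}")); // U+00E9 stored as valid UTF-8
var_dump(guess_encoding("caf\xE9"));     // lone 0xE9 byte, not valid UTF-8
```

Text that has been double encoded is still valid UTF-8, so a check like this reports 'UTF-8' for it and will not reveal that kind of damage.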

You can use the related function mb_convert_encoding to convert from one encoding to another. Note that the target encoding comes before the source encoding:

 $str = mb_convert_encoding($str, $destination_encode, $source_encode); 
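
As an illustration of the conversion step, here is a small sketch that converts bytes to UTF-8 only when they are not already valid UTF-8. It assumes the mbstring extension and that the source is Windows-1252; `to_utf8` is a hypothetical helper name. In mb_convert_encoding the target encoding is the second argument and the source encoding is the third.

```php
<?php
// Hypothetical helper: convert to UTF-8 unless the input already is UTF-8.
// Assumptions: mbstring is loaded; the default source encoding is a guess.
function to_utf8(string $bytes, string $sourceEncoding = 'Windows-1252'): string
{
    // Leave valid UTF-8 alone to avoid encoding it a second time.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    // Argument order: string, target encoding, source encoding.
    return mb_convert_encoding($bytes, 'UTF-8', $sourceEncoding);
}

// 0x93 and 0x94 are the curly double quotes in Windows-1252.
$raw = "large \x93country style\x94 eat-in";
var_dump(to_utf8($raw) === "large \u{201C}country style\u{201D} eat-in");
```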
+4
