Problem with SimpleXML and an undefined entity

I am trying to parse an XML file, but when I load it, SimpleXML gives the following warning:

Warning: simplexml_load_file() [function.simplexml-load-file]: gpr_545.xml:55: parser error : Entity 'Oslash' not defined in import.php on line 35

This is the line in question:

<forenames>B&Oslash;IE</forenames><x> </x> 

Since it is only a warning, I could ignore it, but I would like to understand what is happening.

+2
5 answers

The HTML Latin-1 character entity (here &Oslash;, which names the character Ø) is what trips up the XML parser. XML predefines only &amp;, &lt;, &gt;, &quot; and &apos;; any other named entity must be declared in a DTD. If you control the data, avoid the problem by using an XML-style numeric character reference instead (Ø happens to be &#216;).
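If you cannot change the file itself, the same substitution can be done in PHP before parsing. This is only a minimal sketch: the sample string and its <person> wrapper are made up for illustration, since the rest of gpr_545.xml is not shown.

```php
<?php
// Hypothetical sample standing in for line 55 of gpr_545.xml; the <person>
// wrapper is assumed, since the real document structure is not shown.
$raw = '<person><forenames>B&Oslash;IE</forenames><x> </x></person>';

// Swap the HTML-only named entity for its numeric character reference.
$fixed = str_replace('&Oslash;', '&#216;', $raw);

// The parser now accepts the input and returns the text as UTF-8.
$xml = simplexml_load_string($fixed);
$forenames = (string) $xml->forenames;
echo $forenames, "\n";
```

For a real file you would read it with file_get_contents() first and pass the fixed string to simplexml_load_string() instead of calling simplexml_load_file().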

+2

HTML entities such as &Oslash; are not XML entities. Here is a table for replacing HTML entities with their XML equivalents.

As far as I can tell from one of your comments on another post, you are also having problems with the &sol; entity. I don't know whether this is even a valid HTML entity; my Firefox does not display the character, it only shows the entity name. But I found another table listing most entities and their numeric character references. Try adding them to the substitution table and you should be safe. By the way, &sol; is the numeric reference &#47; (the slash character).
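Instead of maintaining the substitution table by hand, PHP's own entity table can do the lookup. The helper below is a hypothetical sketch, not something from the linked tables: htmlEntitiesToNumeric is a made-up name, and it assumes PHP 7.4+ with the mbstring extension. It leaves the five entities XML already knows alone and turns every other named entity that html_entity_decode() recognises into numeric references.

```php
<?php
// Hypothetical helper: convert named HTML entities to numeric character
// references, leaving XML's five predefined entities untouched.
// Assumes PHP 7.4+ with the mbstring extension.
function htmlEntitiesToNumeric(string $xml): string
{
    $xmlPredefined = ['amp', 'lt', 'gt', 'quot', 'apos'];

    return preg_replace_callback(
        '/&([A-Za-z][A-Za-z0-9]*);/',
        function (array $m) use ($xmlPredefined): string {
            if (in_array($m[1], $xmlPredefined, true)) {
                return $m[0]; // already valid in XML
            }
            $decoded = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
            if ($decoded === $m[0]) {
                return $m[0]; // unknown name: leave it for the parser to report
            }
            // Some HTML5 entities expand to two code points, so emit one
            // numeric reference per code point.
            $refs = '';
            foreach (mb_str_split($decoded, 1, 'UTF-8') as $char) {
                $refs .= '&#' . mb_ord($char, 'UTF-8') . ';';
            }
            return $refs;
        },
        $xml
    );
}

$converted = htmlEntitiesToNumeric('B&Oslash;IE &amp; a&sol;b');
echo $converted, "\n";
```

Run the whole file contents through the helper before handing the result to simplexml_load_string(). Names that PHP does not recognise are passed through unchanged, so the parser will still complain about those.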

+3

I think this is an encoding problem. PHP, or SimpleXML in this particular case, does not like the Danish Ø that you have in the forenames tag. You could try encoding the whole file as UTF-8 and replacing the escaped version in the tag with the character itself. Afterwards, you should be able to read the file in SimpleXML.


+2

I had a very similar problem and solved it as follows. The main idea is to load the file into a string, replace every problematic entity with a placeholder such as "[[entity]]Oslash;", and do the reverse swap before displaying an XML node.

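 // Load the file as a string and hide every "&" behind a placeholder,
 // so the parser never sees an entity reference.
 function readXML($filename) {
     $xml_string = file_get_contents($filename);
     $xml_string = str_replace("&", "[[entity]]", $xml_string);
     return simplexml_load_string($xml_string);
 }

 // Restore the "&" in a node's text, then convert it to the page encoding.
 function xml2str($xml) {
     $str = str_replace("[[entity]]", "&", (string)$xml);
     $str = iconv("UTF-8", "WINDOWS-1251", $str);
     return $str;
 }

 $xml = readXML($filename);
 echo xml2str($xml->forenames);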

I call iconv("UTF-8", "WINDOWS-1251", $str) because my page is encoded as WINDOWS-1251.

+1

Try using this line:

 <forenames><![CDATA[B&Oslash;IE]]></forenames><x> </x> 

and read up on CDATA.
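Keep in mind that the parser does not expand anything inside a CDATA section, so the node text comes back as the literal string B&Oslash;IE and you still have to decode it yourself. A rough sketch, assuming a made-up <person> wrapper around the line above:

```php
<?php
// Hypothetical sample; the <person> wrapper is assumed for illustration.
$raw = '<person><forenames><![CDATA[B&Oslash;IE]]></forenames><x> </x></person>';

// LIBXML_NOCDATA merges CDATA sections into ordinary text nodes.
$xml = simplexml_load_string($raw, 'SimpleXMLElement', LIBXML_NOCDATA);

// The entity is not expanded: this is the literal text "B&Oslash;IE".
$text = (string) $xml->forenames;

// Decode the HTML entity by hand to get the actual character (UTF-8).
$decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo $decoded, "\n";
```

If the text is going straight into an HTML page, you can skip the decoding step and echo $text as it is, since the browser understands &Oslash;.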

0
