PHP: loading an XML document with DOMDocument when the XML data is broken

How do you deal with broken data in XML files? For example, suppose I had:

<text>Some &improper; text here.</text> 

I am trying to do:

    $doc = new DOMDocument();
    $doc->validateOnParse = false;
    $doc->formatOutput = false;
    $doc->load(...xml');

and it fails because there is an unknown entity. Note that I cannot use CDATA because of the way the software is written. I am writing a module that reads and writes XML, and sometimes a user enters text that is not valid XML.

I noticed that DOMDocument->loadHTML() copes with all of this nicely, but how can I continue from there?

3 answers

Perhaps you can use preg_replace_callback to do the hard work with the entities for you:

http://php.net/manual/en/function.preg-replace-callback.php

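    // preg_replace_callback passes the callback an array of matches:
    // $matches[0] is the whole entity (e.g. "&amp;"), $matches[1] is its name.
    function fixEntities($matches) {
        switch ($matches[1]) {
            case 'amp':
            case 'lt':
            case 'gt':
            case 'quot':
            // etc., etc., etc.
                return $matches[0]; // known entity: keep it unchanged
        }
        return ''; // unknown entity: drop it
    }

    $xml = preg_replace_callback('/&([a-zA-Z0-9#]*);/', 'fixEntities', $xml);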
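The callback approach deletes unknown entities. If you would rather keep the user's text, one variant is to escape the stray ampersand instead of removing it. This is only a sketch: escapeStrayAmpersands is a made-up helper name, and it assumes the only entities you want to preserve are the five predefined XML ones plus numeric character references.

```php
<?php
// Hypothetical helper (not from the answer above): turn every "&" that does
// not start a predefined XML entity or a numeric character reference
// into "&amp;", so the parser sees it as literal text.
function escapeStrayAmpersands(string $xml): string
{
    return preg_replace(
        '/&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)/',
        '&amp;',
        $xml
    );
}

$xml = '<text>Some &improper; text here.</text>';

$doc = new DOMDocument();
$loaded = $doc->loadXML(escapeStrayAmpersands($xml));

// The text node now contains the literal string the user typed.
$text = $doc->documentElement->textContent;
```

With this, the broken sample from the question loads, and the text content still reads "Some &improper; text here." rather than losing the entity.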

Use htmlspecialchars to escape XML special characters before embedding input in your XML / XHTML DOM. Although its name is prefixed with "html", judging by the characters it replaces it is just as useful for escaping XML data.
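As a rough illustration of that idea, assuming the user's text is available as a plain string before it is placed into the markup (the variable names here are made up for the example):

```php
<?php
// Assumed input: raw text typed by a user, containing characters
// that are not valid inside XML as-is.
$userInput = 'Some &improper; text & <b>markup</b> here.';

// ENT_XML1 selects XML 1.0 escaping rules; ENT_QUOTES also escapes
// both kinds of quotes, which matters inside attribute values.
$escaped = htmlspecialchars($userInput, ENT_XML1 | ENT_QUOTES, 'UTF-8');

$xml = '<text>' . $escaped . '</text>';

$doc = new DOMDocument();
$ok = $doc->loadXML($xml);

// Reading the node back yields the original, unescaped text.
$roundTrip = $doc->documentElement->textContent;
```

The escaping has to happen on the raw text only, before it is wrapped in tags. Running htmlspecialchars over a whole XML string would escape the markup as well.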


If you are the one who writes the XML, there should be no problem, since you can encode any user input into entities before putting it into the XML.
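If the XML is written through DOMDocument itself, one way to get that encoding without doing it by hand is to add the input as a text node, which is escaped when the document is serialized. A minimal sketch, assuming a single text element like the one in the question:

```php
<?php
// Assumed input: raw user text with an unknown entity in it.
$userInput = 'Some &improper; text here.';

$doc = new DOMDocument('1.0', 'UTF-8');
$element = $doc->createElement('text');

// createTextNode stores the string as literal text; the "&" is
// written out as "&amp;" when the document is saved.
$element->appendChild($doc->createTextNode($userInput));
$doc->appendChild($element);

// Serialize only the element, without the XML declaration.
$xml = $doc->saveXML($element);

// The result parses cleanly on the way back in.
$check = new DOMDocument();
$reloaded = $check->loadXML($xml);
```

Note that passing the text as the second argument of createElement is not equivalent, because that value is not escaped in the same way. That is why the sketch goes through createTextNode.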
