Get XML children without replacing HTML entities in PHP
I have this code:
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <?php
    $strXml = '
    <root>
        <kid><div>ABC•></div></kid>
        <kid2>DEF</kid2>
    </root>';

    $objXml = new SimpleXMLElement($strXml);
    $arrNodes = $objXml->xpath('/root/*');

    foreach ($arrNodes as $objNode) {
        /* @var $objNode SimpleXMLElement */
        echo $objNode->asXML();
    }

The code retrieves the first-level children of the root and prints their contents. The problem is that HTML entities are converted to characters. Is there a way to make the code output the source XML content without any conversion?
Is there a way that the code outputs the source XML content without any conversion?
No.
Beyond that: why does it bother you? They are one and the same character.
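SimpleXML, DOMDocument, and the like will always convert these entities: a numeric character reference is just another spelling of the character, so the parser resolves it on input and the serializer picks its own form on output.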
So either:
- An epic search and replace.
- Or perhaps fix whatever generates the XML?
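To illustrate the "one and the same character" point, here is a minimal sketch. The three sample documents are made up for the illustration; they spell U+2022 as a decimal reference, a hex reference, and a literal character.

```php
<?php
// Three spellings of the same code point, U+2022 (bullet).
$decimal = new SimpleXMLElement('<kid>ABC&#8226;</kid>');
$hex     = new SimpleXMLElement('<kid>ABC&#x2022;</kid>');
$literal = new SimpleXMLElement("<kid>ABC\u{2022}</kid>");

// After parsing, all three hold the identical string value.
$allEqual = (string) $decimal === (string) $hex
    && (string) $hex === (string) $literal;

var_dump($allEqual); // bool(true)
```

Any consumer that parses the XML properly cannot tell the three inputs apart, so the difference only matters to something that reads the raw text.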
This strikes me as truly strange behavior, and I was unable to find any information about it.
It seems to affect all of the relevant XML libraries. It is also worth noting that the characters are stored as regular characters once the XML has been parsed:
    php > print_r($objXml);
    SimpleXMLElement Object
    (
        [kid] => SimpleXMLElement Object
            (
                [div] => ABC•>
            )

        [kid2] => DEF
    )

... they are only written as entities when the XML is converted back to text. I assume all of these libraries use the same internal routine to serialize to text.
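One thing that may be worth trying before anything heavier: as far as I know, the serializer only falls back to numeric references when the document has no declared encoding. This is an assumption to verify on your own PHP and libxml versions, not documented behavior I can point to. A rough sketch with DOMDocument, using a made-up sample document:

```php
<?php
$dom = new DOMDocument();
$dom->loadXML("<root><kid>ABC\u{2022}</kid></root>");

// No encoding declared: the serializer picks its own representation.
$withoutEncoding = $dom->saveXML();

// Assumption: declaring an output encoding lets it emit raw UTF-8 instead.
$dom->encoding = 'UTF-8';
$withEncoding = $dom->saveXML();

echo $withoutEncoding, $withEncoding;

// Whatever the two outputs look like, they parse back to the same text.
$a = new SimpleXMLElement($withoutEncoding);
$b = new SimpleXMLElement($withEncoding);
var_dump((string) $a->kid === (string) $b->kid);
```

Compare the two echoed documents to see how your setup serializes the bullet in each case.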
If you really need this, you can write your own function to escape the characters, something like this:
    // Escape selected UTF-8 characters as XML numeric character references.
    function xmlCharEncode($string) {
        // Each group of four is: range start, range end, offset, mask.
        $convmap = array(
            60, 60, 0, 0xffff,       // <
            62, 62, 0, 0xffff,       // >
            38, 38, 0, 0xffff,       // ampersand
            // you may want to filter quotes or other characters here
            127, 0xffff, 0, 0xffff,  // everything after basic latin
        );
        $out = '';
        $len = mb_strlen($string, 'UTF-8');
        for ($i = 0; $i < $len; $i++) {
            $char = mb_substr($string, $i, 1, 'UTF-8');
            $out .= mb_encode_numericentity($char, $convmap, 'UTF-8');
        }
        return $out;
    }

... and then use XMLReader and XMLWriter to rewrite the XML with your custom escaping routine:
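To see what that conversion map produces, here is a small standalone sketch. It assumes the mbstring extension is loaded and passes the whole string to `mb_encode_numericentity` in one call, which should give the same result as going character by character. The sample input is made up for the illustration.

```php
<?php
// Same conversion map as above: <, >, &, and everything from 127 to 0xffff.
$convmap = [
    60, 60, 0, 0xffff,
    62, 62, 0, 0xffff,
    38, 38, 0, 0xffff,
    127, 0xffff, 0, 0xffff,
];

// "\u{2022}" is the bullet character (decimal 8226); ">" is decimal 62.
$encoded = mb_encode_numericentity("ABC\u{2022}>", $convmap, 'UTF-8');

echo $encoded, "\n"; // ABC&#8226;&#62;
```

Note that the references come out in decimal form, and that `>` becomes `&#62;` rather than `&gt;`. Characters above 0xffff are outside the map and pass through unchanged.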
    // read and write your xml string
    $r = new XMLReader();
    $w = new XMLWriter();
    $r->xml($strXml);
    $w->openMemory();

    while ($r->read()) {
        switch ($r->nodeType) {
            // write elements, attributes, and text nodes
            case XMLReader::ELEMENT:
                // remember this before moving onto the attributes
                $isEmpty = $r->isEmptyElement;
                $w->startElement($r->name);
                while ($r->moveToNextAttribute()) {
                    $w->writeAttribute($r->name, $r->value);
                }
                if ($isEmpty) {
                    // self-closing elements never produce an END_ELEMENT node
                    $w->endElement();
                }
                break;
            case XMLReader::END_ELEMENT:
                $w->endElement();
                break;
            case XMLReader::TEXT:
                $w->writeRaw(xmlCharEncode($r->value)); // the magic happens here
                break;
        }
        echo $w->outputMemory(true);
    }

I'm not sure it is worth the effort, but at least this gives you an idea of what can be done to make it work.
This will work with your original example, by the way.