
Get XML children without replacing HTML entities in PHP

I have this code:

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <?php
    $strXml = '
    <root>
        <kid><div>ABC&#8226;&#62;</div></kid>
        <kid2>DEF</kid2>
    </root>';
    $objXml = new SimpleXMLElement($strXml);
    $arrNodes = $objXml->xpath('/root/*');
    foreach ($arrNodes as $objNode) {
        /* @var $objNode SimpleXMLElement */
        echo $objNode->asXML();
    }

The code retrieves the direct children of the root and prints their content. The problem is that the HTML entities are converted to characters. Is there a way to make the code output the source XML content without any conversion?

+7
3 answers

Is there a way that the code outputs the source XML content without any conversion?

No.

Also: why does it bother you? They are one and the same character.

+1

SimpleXML, DOMDocument, etc. will always convert these entities, because numbered entities are not valid XML.

So either:

  • An epic search-and-replace.
  • Or perhaps fix whatever generates the XML?
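
The search-and-replace option could look roughly like the sketch below. The idea is to add one extra level of escaping to every ampersand before parsing, so the parser treats each entity reference as plain text, and to remove that level again from the serialized output. The helper names `protectEntities()` and `restoreEntities()` are hypothetical and not part of any library. Note the assumptions: while the document is in its protected form, text values read from nodes still contain the escaped references, and literal characters such as `>` may still come out re-escaped by the serializer.

```php
<?php
// Sketch only: protectEntities() and restoreEntities() are hypothetical helpers.

// Escape every ampersand so the parser sees entity references as plain text.
function protectEntities(string $xml): string
{
    return str_replace('&', '&amp;', $xml);
}

// Remove the extra escaping level from the serialized output.
function restoreEntities(string $xml): string
{
    return str_replace('&amp;', '&', $xml);
}

$strXml = '<root><kid><div>ABC&#8226;&#62;</div></kid><kid2>DEF</kid2></root>';

$objXml = new SimpleXMLElement(protectEntities($strXml));

$out = '';
foreach ($objXml->xpath('/root/*') as $objNode) {
    $out .= restoreEntities($objNode->asXML());
}

echo $out, PHP_EOL;
```

Because both steps are blanket replacements, an existing `&amp;` in the source survives the round trip as well: it becomes `&amp;amp;` before parsing and is reduced back to `&amp;` afterwards.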
0

This strikes me as truly strange behavior, and I have been unable to find any information about it.

It seems to affect all of the related XML extensions. It is also worth noting that the entities are stored as regular characters once the XML has been parsed:

    php > print_r($objXml);
    SimpleXMLElement Object
    (
        [kid] => SimpleXMLElement Object
            (
                [div] => ABC•>
            )

        [kid2] => DEF
    )

... and they are written back out as entities when the XML is serialized to text. I assume all of these extensions use the same internal routine for serialization.
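
A small check can make the difference between the in-memory value and the serialized form visible. This is only an illustrative sketch using the question's sample data: it compares the string value of the node with the expected characters, prints whatever the serializer emits, and confirms that decoding the serialized text gives back the same characters. The exact escaping chosen by the serializer is deliberately not assumed here.

```php
<?php
$strXml = '<root><kid><div>ABC&#8226;&#62;</div></kid><kid2>DEF</kid2></root>';
$objXml = new SimpleXMLElement($strXml);

// In memory: the references have already been resolved to characters.
$text = (string) $objXml->kid->div;
var_dump($text === "ABC\u{2022}>");

// On output: the serializer decides how to escape the characters again.
$serialized = $objXml->kid->div->asXML();
echo $serialized, PHP_EOL;

// Decoding the serialized text yields the same characters as the in-memory value.
$decoded = html_entity_decode(strip_tags($serialized), ENT_QUOTES | ENT_XML1, 'UTF-8');
var_dump($decoded === $text);
```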

If you really need this functionality, you can write your own function to escape the characters, something like this:

    // function to escape some utf8 characters with xml character reference
    function xmlCharEncode($string) {
        // each group of four is: range start, range end, offset, mask
        $convmap = array(
            60, 60, 0, 0xffff,      // <
            62, 62, 0, 0xffff,      // >
            38, 38, 0, 0xffff,      // ampersand
            // you may want to filter quotes or other characters here
            127, 0xffff, 0, 0xffff, // everything after basic latin
        );
        $out = '';
        $len = mb_strlen($string, 'UTF-8');
        for ($i = 0; $i < $len; $i++) {
            $char = mb_substr($string, $i, 1, 'UTF-8');
            $enc = mb_encode_numericentity($char, $convmap, 'UTF-8');
            $out .= $enc;
        }
        return $out;
    }

... and then use XMLReader and XMLWriter to rewrite the XML with your custom escaping routine:

    // read and write your xml string
    $r = new XMLReader();
    $w = new XMLWriter();
    $r->xml($strXml);
    $w->openMemory();
    while ($r->read()) {
        switch ($r->nodeType) {
            // write elements, attributes, and text nodes
            case XMLReader::ELEMENT:
                // remember this before the cursor moves onto the attributes
                $isEmpty = $r->isEmptyElement;
                $w->startElement($r->name);
                while ($r->moveToNextAttribute()) {
                    $w->writeAttribute($r->name, $r->value);
                }
                // self-closing elements never produce an END_ELEMENT node
                if ($isEmpty) {
                    $w->endElement();
                }
                break;
            case XMLReader::END_ELEMENT:
                $w->endElement();
                break;
            case XMLReader::TEXT:
                $w->writeRaw(xmlCharEncode($r->value)); // the magic happens here
                break;
        }
        echo $w->outputMemory(true);
    }

I'm not sure it is worth the effort, but at least this gives you an idea of what can be done to make it work.

This will work with your original example, by the way.
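
As a side note, the per-character loop in `xmlCharEncode` above may not be strictly necessary, since `mb_encode_numericentity` accepts a whole string. A shorter variant, sketched under that assumption, could look like this. The name `xmlCharEncodeWhole()` is hypothetical, the conversion map is the same one used above (so characters beyond U+FFFF are still not covered), and the mbstring extension is required.

```php
<?php
// Sketch only: xmlCharEncodeWhole() is a hypothetical name; needs ext-mbstring.
function xmlCharEncodeWhole(string $string): string
{
    // Each group of four is: range start, range end, offset, mask.
    $convmap = [
        60, 60, 0, 0xffff,      // <
        62, 62, 0, 0xffff,      // >
        38, 38, 0, 0xffff,      // ampersand
        127, 0xffff, 0, 0xffff, // everything after basic latin
    ];

    // Encode the whole string in a single call instead of character by character.
    return mb_encode_numericentity($string, $convmap, 'UTF-8');
}

echo xmlCharEncodeWhole("ABC\u{2022}>"), PHP_EOL; // prints ABC&#8226;&#62;
```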

0