You want to use SimpleXMLElement to extract data from XML and convert it to an array.
This is generally possible, but there are some caveats. Apart from the XML namespaces, your XML contains CDATA sections. To convert XML to an array with SimpleXML, the CDATA has to be turned into plain text when the XML string is loaded. This is done with the LIBXML_NOCDATA flag. Example:
$xml = simplexml_load_string($buffer, null, LIBXML_NOCDATA);
print_r($xml);
This gives you the following result:
SimpleXMLElement Object
(
    [@attributes] => Array
        (
            [version] => 2.0
        )

    [title] => Blah
    [description] => Blah
)
As you can see, an array has no natural way to represent attributes, so SimpleXML by convention puts them under the @attributes key.
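A minimal sketch of both points, assuming a trimmed-down, made-up copy of the feed as `$buffer`: it loads the same string with and without LIBXML_NOCDATA and compares what the `(array)` cast produces, then reads the `version` attribute once from the array form and once from the object.

```php
<?php
// Made-up, trimmed-down sample in the shape of the question's feed.
$buffer = '<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">'
    . '<g:id><![CDATA[Blah]]></g:id>'
    . '<title><![CDATA[Blah]]></title>'
    . '<description><![CDATA[Blah]]></description>'
    . '</rss>';

// Without the flag, CDATA stays a CDATA node, which the array conversion
// does not treat as plain text: <title> is kept as a nested SimpleXMLElement.
$keepCdata = (array) simplexml_load_string($buffer);

// With LIBXML_NOCDATA, CDATA is merged into text and <title> becomes a string.
$mergeCdata = (array) simplexml_load_string($buffer, null, LIBXML_NOCDATA);

var_dump(is_object($keepCdata['title']));
var_dump($mergeCdata['title']);

// A string cast reads the CDATA content in both cases.
echo (string) $keepCdata['title'], "\n";

// The "@attributes" key exists only in the array form; on the object,
// attributes are read with array-style access.
$xml = simplexml_load_string($buffer, null, LIBXML_NOCDATA);
echo $mergeCdata['@attributes']['version'], "\n";
echo (string) $xml['version'], "\n";
```

Note that `g:id` does not show up in either array, because it is not in the default namespace, which is the topic of the next part.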
The other issue is handling the multiple XML namespaces. When a SimpleXMLElement is converted to an array, only the elements in the namespace selected for that SimpleXMLElement are taken into account. The previous example did not specify a namespace, so the default namespace was used.
If you specify a namespace when creating the SimpleXMLElement, that namespace is used instead.
Example:
$xml = simplexml_load_string($buffer, null, LIBXML_NOCDATA, "http://base.google.com/ns/1.0");
print_r($xml);
This gives you the following result:
SimpleXMLElement Object
(
    [id] => Blah
    [product_type] => Blah
)
As you can see, this time the namespace specified when creating the SimpleXMLElement (http://base.google.com/ns/1.0) is used for the array conversion.
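The namespace selection can be checked with a small sketch on a made-up sample. It compares the namespace passed as the fourth argument of simplexml_load_string() with SimpleXMLElement::children(), which selects the children of one namespace on an already loaded document; for this sample both casts are expected to contain only the `g:` elements and no `@attributes` key.

```php
<?php
$ns = 'http://base.google.com/ns/1.0';

// Made-up sample: one element in the default namespace, two in the g: namespace.
$buffer = '<rss version="2.0" xmlns:g="' . $ns . '">'
    . '<title>Blah</title>'
    . '<g:id>Blah</g:id>'
    . '<g:product_type>Blah</g:product_type>'
    . '</rss>';

// Namespace selected while loading (fourth parameter).
$byLoad = (array) simplexml_load_string($buffer, null, 0, $ns);

// Namespace selected afterwards on the loaded element.
$byChildren = (array) simplexml_load_string($buffer)->children($ns);

print_r($byLoad);
print_r($byChildren);
```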
As you want all namespaces of the document to be taken into account, you first have to obtain them, including the default one:
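$xml = simplexml_load_string($buffer, null, LIBXML_NOCDATA);
// '' selects the default namespace; it is the parameter's default value (passing null there is deprecated since PHP 8.1)
$namespaces = [''] + $xml->getDocNamespaces(true);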
Then you can iterate over all namespaces and merge the results recursively into a single array:
$array = [];
foreach ($namespaces as $namespace) {
    $xml = simplexml_load_string($buffer, null, LIBXML_NOCDATA, $namespace);
    $array = array_merge_recursive($array, (array) $xml);
}
print_r($array);
This finally creates and outputs the array you are looking for:
Array
(
    [@attributes] => Array
        (
            [version] => 2.0
        )

    [title] => Blah
    [description] => Blah
    [id] => Blah
    [product_type] => Blah
)
As you can see, this is entirely possible with SimpleXMLElement. However, it is important to understand how a SimpleXMLElement is converted to an array (or serialized to JSON, which follows the same rules). To preview the SimpleXMLElement-to-array conversion, you can use print_r for quick output.
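To illustrate that JSON encoding follows the same rules, the sketch below (made-up sample again) prints the array form with print_r and encodes the same object with json_encode(); both should show the `@attributes` key and leave out the `g:id` element, which is not in the default namespace.

```php
<?php
// Made-up sample with one default-namespace and one g: element.
$buffer = '<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">'
    . '<title>Blah</title>'
    . '<g:id>Blah</g:id>'
    . '</rss>';

$xml = simplexml_load_string($buffer);

// print_r() shows the structure the (array) cast produces ...
print_r((array) $xml);

// ... and json_encode() uses the same property view of the object.
$json = json_encode($xml);
echo $json, "\n";
```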
Note that not all XML constructs convert equally well into an array. This is not a limitation specific to SimpleXML; it comes from the difference between the structures XML can represent and the structures an array can represent.
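One such case, sketched with a made-up feed in which the same local name `title` exists in both the default and the `g:` namespace: with the namespace-merging approach shown earlier, array_merge_recursive() puts both values under a single `title` key, and the information which value belonged to which namespace is gone.

```php
<?php
// Made-up sample: "title" exists in two namespaces.
$buffer = '<rss xmlns:g="http://base.google.com/ns/1.0">'
    . '<title>Plain title</title>'
    . '<g:title>Google title</g:title>'
    . '</rss>';

$xml = simplexml_load_string($buffer);
// '' stands for the default namespace.
$namespaces = [''] + $xml->getDocNamespaces(true);

$array = [];
foreach ($namespaces as $namespace) {
    $part = (array) simplexml_load_string($buffer, null, 0, $namespace);
    $array = array_merge_recursive($array, $part);
}

// Both values now share one key; the namespace is no longer visible.
print_r($array);
```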
Therefore, it is often better to keep the XML inside an object such as SimpleXMLElement (or DOMDocument) for accessing and processing the data, rather than in an array.
However, it is perfectly fine to convert the data to an array as long as you know what you are doing and you don't need to write a lot of code to access elements deeper down the tree. Otherwise, SimpleXMLElement is preferable to an array, because it offers dedicated access not only to many XML features but also database-like querying through the SimpleXMLElement::xpath method. With an array, you would need to write many lines of your own code to access data inside the XML tree that comfortably.
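As a small sketch of the XPath point, assuming a made-up feed with `<item>` elements: SimpleXMLElement::registerXPathNamespace() binds a prefix (the name `g` is our own choice) to the Google Base namespace, and a single SimpleXMLElement::xpath() call then selects the title of the item with a given `g:id`, where an array would need hand-written loops.

```php
<?php
// Made-up sample feed with two items.
$buffer = '<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">'
    . '<channel>'
    . '<item><g:id>1</g:id><title>First</title></item>'
    . '<item><g:id>2</g:id><title>Second</title></item>'
    . '</channel>'
    . '</rss>';

$xml = simplexml_load_string($buffer);

// Bind a prefix for use inside XPath expressions.
$xml->registerXPathNamespace('g', 'http://base.google.com/ns/1.0');

// Select the title of the item whose g:id is 2, anywhere in the tree.
$titles = $xml->xpath('//item[g:id = "2"]/title');
echo (string) $titles[0], "\n";
```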
To get the best of both worlds, you can extend SimpleXMLElement to suit your specific conversion needs:
$buffer = <<<BUFFER
<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
...
<g:id><![CDATA[Blah]]></g:id>
<title><![CDATA[Blah]]></title>
<description><![CDATA[Blah]]></description>
<g:product_type><![CDATA[Blah]]></g:product_type>
</rss>
BUFFER;

$feed = new Feed($buffer, LIBXML_NOCDATA);
print_r($feed->toArray());
Which outputs:
Array
(
    [@attributes] => stdClass Object
        (
            [version] => 2.0
        )

    [title] => Blah
    [description] => Blah
    [id] => Blah
    [product_type] => Blah
    [@text] => ...
)
And the base implementation:
class Feed extends SimpleXMLElement implements JsonSerializable
{
    // Without a declared return type, PHP 8.1+ raises a deprecation notice unless this attribute is present.
    #[\ReturnTypeWillChange]
    public function jsonSerialize()
    {
        $array = array();

        // json encode attributes if any.
        if ($attributes = $this->attributes()) {
            $array['@attributes'] = iterator_to_array($attributes);
        }

        $namespaces = [null] + $this->getDocNamespaces(true);

        // json encode child elements if any. group on duplicate names as an array.
        foreach ($namespaces as $namespace) {
            foreach ($this->children($namespace) as $name => $element) {
                if (isset($array[$name])) {
                    if (!is_array($array[$name])) {
                        $array[$name] = [$array[$name]];
                    }
                    $array[$name][] = $element;
                } else {
                    $array[$name] = $element;
                }
            }
        }

        // json encode non-whitespace element simplexml text values.
        $text = trim($this);
        if (strlen($text)) {
            if ($array) {
                $array['@text'] = $text;
            } else {
                $array = $text;
            }
        }

        // return empty elements as NULL (self-closing or empty tags)
        if (!$array) {
            $array = NULL;
        }

        return $array;
    }

    public function toArray()
    {
        return (array) json_decode(json_encode($this));
    }
}
This is an adaptation, extended for namespaces, of the example on changing the JSON encoding rules given in "SimpleXML and JSON Encode in PHP - Part III and End".