Handling namespaces with SimpleXML regardless of structure or namespace

I have a Google Shopping feed like this (extract):

    <?xml version="1.0" encoding="utf-8" ?>
    <rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
    ...
    <g:id><![CDATA[Blah]]></g:id>
    <title><![CDATA[Blah]]></title>
    <description><![CDATA[Blah]]></description>
    <g:product_type><![CDATA[Blah]]></g:product_type>

SimpleXML can read the tags "title" and "description", but it cannot read the tags with the prefix "g:".

There are Stack Overflow solutions for this particular case using the "children" function. But I don't only want to read Google Shopping XML files: the code has to be independent of the structure and the namespaces, because I don't know anything about the file in advance (I walk through the nodes recursively and build a multidimensional array).

BTW: is "http://base.google.com/ns/1.0" offline, or is it just not accessible through a browser?

Is there a way to do this using SimpleXML? I could replace the colons, but I want to be able to store the array and rebuild the XML from it (in this case, specifically for Google Shopping), so I do not want to lose information.

xml php simplexml xml-namespaces
1 answer

You want to use SimpleXMLElement to extract data from XML and convert it to an array.

This is generally possible, but there are some caveats. Before getting to XML namespaces: your XML contains CDATA sections. To convert XML to an array using SimpleXML, you need to convert CDATA to text when loading the XML string. This is done using the LIBXML_NOCDATA flag. Example:

    $xml = simplexml_load_string($buffer, null, LIBXML_NOCDATA);
    print_r($xml); // print_r shows how SimpleXMLElement does array conversion

This gives you the following result:

    SimpleXMLElement Object
    (
        [@attributes] => Array
            (
                [version] => 2.0
            )

        [title] => Blah
        [description] => Blah
    )

As you can see, there is no good form for representing attributes in an array, so SimpleXML by convention puts them in the @attributes key.
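As a small illustration of that convention, the sketch below reads the same attribute once from the array cast and once from the object. It assumes a shortened, closed version of the feed extract (the `...` placeholder is left out); the variable names are only illustrative.

```php
<?php
// Shortened sample modelled on the feed extract (closing tag added for well-formedness).
$buffer = <<<XML
<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
<title><![CDATA[Blah]]></title>
</rss>
XML;

$xml = simplexml_load_string($buffer, null, LIBXML_NOCDATA);

// In the array cast, attributes are collected under the '@attributes' key.
$array = (array) $xml;
echo $array['@attributes']['version'], "\n";

// On the object itself, attributes are read with array-style access.
echo (string) $xml['version'], "\n";
```

Both lines print the value of the `version` attribute; the second form avoids depending on the `@attributes` key.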

The other problem is handling multiple XML namespaces. The previous example did not specify a namespace, so the default one was used. When a SimpleXMLElement is converted to an array, the conversion uses the namespace the element was created with. Since none was specified explicitly, the default namespace applied.

But if you specify a namespace when loading the document, that namespace is used instead.

Example:

    $xml = simplexml_load_string($buffer, null, LIBXML_NOCDATA, "http://base.google.com/ns/1.0");
    print_r($xml);

This gives you the following result:

    SimpleXMLElement Object
    (
        [id] => Blah
        [product_type] => Blah
    )

As you can see, this time the namespace specified when creating the SimpleXMLElement is used in the array conversion: http://base.google.com/ns/1.0.
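For comparison, the prefixed elements can also be reached on an already loaded document through `children()`, the approach the question mentions. This is a minimal sketch on a shortened, closed version of the extract; it only illustrates how the namespace selects which children are visible.

```php
<?php
// Shortened sample modelled on the feed extract (closing tag added for well-formedness).
$buffer = <<<XML
<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
<g:id><![CDATA[Blah]]></g:id>
<title><![CDATA[Blah]]></title>
</rss>
XML;

$xml = simplexml_load_string($buffer, null, LIBXML_NOCDATA);

// Plain property access only sees elements without a namespace.
var_dump(isset($xml->title)); // <title> is visible
var_dump(isset($xml->id));    // <g:id> is not

// children() with the namespace URI switches the view to that namespace.
$g = $xml->children('http://base.google.com/ns/1.0');
echo (string) $g->id, "\n";
```

The limitation is the one raised in the question: the namespace URI has to be known up front.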

Since you wrote that you want all namespaces from the document to be taken into account, you first have to obtain them, including the default one:

    $xml = simplexml_load_string($buffer, null, LIBXML_NOCDATA);
    $namespaces = [null] + $xml->getDocNamespaces(true);
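To make the shape of that list concrete, here is a small sketch that dumps it for a shortened, closed version of the extract. The `+` operator is the array union, so the `null` entry (standing for the default namespace) keeps key 0 and the declared namespaces follow, keyed by prefix.

```php
<?php
// Shortened sample modelled on the feed extract (closing tag added for well-formedness).
$buffer = <<<XML
<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
<g:id><![CDATA[Blah]]></g:id>
<title><![CDATA[Blah]]></title>
</rss>
XML;

$xml = simplexml_load_string($buffer, null, LIBXML_NOCDATA);

// getDocNamespaces(true) collects namespace declarations from the whole document,
// keyed by prefix; the leading null stands for "no namespace".
$namespaces = [null] + $xml->getDocNamespaces(true);
var_dump($namespaces);
```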

Then you can iterate over all namespaces and recursively merge the results into the same array, as shown below:

    $array = [];
    foreach ($namespaces as $namespace) {
        // The namespace parameter is a non-nullable string, so cast null to ''.
        $xml = simplexml_load_string($buffer, null, LIBXML_NOCDATA, (string) $namespace);
        $array = array_merge_recursive($array, (array) $xml);
    }
    print_r($array);

This should finally produce the array you are after:

    Array
    (
        [@attributes] => Array
            (
                [version] => 2.0
            )

        [title] => Blah
        [description] => Blah
        [id] => Blah
        [product_type] => Blah
    )

As you can see, this is entirely possible with SimpleXMLElement. However, it is important to understand how a SimpleXMLElement is converted to an array (or serialized to JSON, which follows the same rules). To preview the SimpleXMLElement-to-array transformation, you can use print_r for quick output.
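The JSON side of that statement can be checked with a short sketch. It assumes a shortened, closed version of the extract and simply encodes the element loaded without a namespace, so only the un-prefixed children and the attributes are expected to appear.

```php
<?php
// Shortened sample modelled on the feed extract (closing tag added for well-formedness).
$buffer = <<<XML
<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
<g:id><![CDATA[Blah]]></g:id>
<title><![CDATA[Blah]]></title>
</rss>
XML;

$xml = simplexml_load_string($buffer, null, LIBXML_NOCDATA);

// json_encode() uses the same view of the element as the array cast and print_r.
$json = json_encode($xml);
echo $json, "\n";

// Decode again to inspect which keys survived.
$data = json_decode($json, true);
var_dump(array_keys($data));
```

The `g:id` element does not show up in the keys, just as it is missing from the print_r output of the element loaded without a namespace.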

Note that not all XML constructs can be converted to an array equally well. This is not a limitation specific to SimpleXML; it comes from the difference between what XML structures can represent and what an array can represent.
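One example of such a mismatch, sketched with two made-up documents (the element names `items` and `item` are purely illustrative): the same element name turns into a string when it occurs once and into a list when it occurs twice, so the shape of the array depends on the data.

```php
<?php
// Two hypothetical documents that differ only in how often <item> occurs.
$one = simplexml_load_string('<items><item>a</item></items>');
$two = simplexml_load_string('<items><item>a</item><item>b</item></items>');

$a = (array) $one;
$b = (array) $two;

// A single child becomes a plain string ...
var_dump(is_string($a['item']));

// ... while repeated children are grouped into a numerically indexed array.
var_dump(is_array($b['item']));
```

Code that consumes such an array has to handle both shapes for every element that may repeat.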

Therefore, it is often better to keep the XML in an object such as SimpleXMLElement (or DOMDocument) for accessing and processing the data, rather than in an array.

However, it is perfectly fine to convert the data to an array as long as you know what you are doing and you don't need to write a lot of code to access elements deeper down the tree. Otherwise, SimpleXMLElement should be preferred to an array: it gives you dedicated access to many XML features and also lets you query the document like a database with the SimpleXMLElement::xpath method. Getting at the data of interest inside the tree that comfortably with an array would take many lines of your own code.
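As a sketch of the XPath route on a shortened, closed version of the extract: the prefix is registered explicitly with `registerXPathNamespace()` before the query. The prefix used in the query is a local choice and does not have to match the one in the document.

```php
<?php
// Shortened sample modelled on the feed extract (closing tag added for well-formedness).
$buffer = <<<XML
<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
<g:id><![CDATA[Blah]]></g:id>
<title><![CDATA[Blah]]></title>
</rss>
XML;

$xml = simplexml_load_string($buffer, null, LIBXML_NOCDATA);

// Bind a prefix to the namespace URI for use inside XPath expressions.
$xml->registerXPathNamespace('g', 'http://base.google.com/ns/1.0');

// Find every g:id element anywhere in the document.
$ids = $xml->xpath('//g:id');
foreach ($ids as $id) {
    echo (string) $id, "\n";
}
```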

To get the best of both worlds, you can extend SimpleXMLElement to your specific conversion needs:

    $buffer = <<<BUFFER
    <?xml version="1.0" encoding="utf-8" ?>
    <rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
    ...
    <g:id><![CDATA[Blah]]></g:id>
    <title><![CDATA[Blah]]></title>
    <description><![CDATA[Blah]]></description>
    <g:product_type><![CDATA[Blah]]></g:product_type>
    </rss>
    BUFFER;

    $feed = new Feed($buffer, LIBXML_NOCDATA);
    print_r($feed->toArray());

Which outputs:

    Array
    (
        [@attributes] => stdClass Object
            (
                [version] => 2.0
            )

        [title] => Blah
        [description] => Blah
        [id] => Blah
        [product_type] => Blah
        [@text] => ...
    )

With the underlying implementation:

    class Feed extends SimpleXMLElement implements JsonSerializable
    {
        // The attribute silences the return-type deprecation on PHP 8.1+;
        // on older versions the line is treated as a comment.
        #[\ReturnTypeWillChange]
        public function jsonSerialize()
        {
            $array = array();

            // json encode attributes if any.
            if ($attributes = $this->attributes()) {
                $array['@attributes'] = iterator_to_array($attributes);
            }

            $namespaces = [null] + $this->getDocNamespaces(true);

            // json encode child elements if any. group on duplicate names as an array.
            foreach ($namespaces as $namespace) {
                foreach ($this->children($namespace) as $name => $element) {
                    if (isset($array[$name])) {
                        if (!is_array($array[$name])) {
                            $array[$name] = [$array[$name]];
                        }
                        $array[$name][] = $element;
                    } else {
                        $array[$name] = $element;
                    }
                }
            }

            // json encode non-whitespace element simplexml text values.
            $text = trim((string) $this);
            if (strlen($text)) {
                if ($array) {
                    $array['@text'] = $text;
                } else {
                    $array = $text;
                }
            }

            // return empty elements as NULL (self-closing or empty tags)
            if (!$array) {
                $array = NULL;
            }

            return $array;
        }

        public function toArray()
        {
            return (array) json_decode(json_encode($this));
        }
    }

This is a namespace-aware adaptation of the example for changing the JSON encoding rules given in "SimpleXML and JSON Encode in PHP - Part III and End".
