How to use XMLReader in PHP?

I have the following XML file. The file is quite large and I could not get SimpleXML to open and read it, so I am trying to use XMLReader in PHP, without success:

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<products>
    <last_updated>2009-11-30 13:52:40</last_updated>
    <product>
        <element_1>foo</element_1>
        <element_2>foo</element_2>
        <element_3>foo</element_3>
        <element_4>foo</element_4>
    </product>
    <product>
        <element_1>bar</element_1>
        <element_2>bar</element_2>
        <element_3>bar</element_3>
        <element_4>bar</element_4>
    </product>
</products>
```

Unfortunately, I have not found a good tutorial on this for PHP, and I would like to know how to get the contents of each element so I can store them in a database.

+63
xml php parsing xmlreader simplexml
Dec 02 '09 at 19:17
8 answers

It all depends on how big the unit of work is, but I guess you are trying to process each <product/> node in succession.

The easiest way to do this is to use XMLReader to get to each node, then use SimpleXML to access its contents. This keeps memory usage low, because you only handle one node at a time, and you still get to use SimpleXML. For example:

```php
$z = new XMLReader;
$z->open('data.xml');

$doc = new DOMDocument;

// move to the first <product /> node
while ($z->read() && $z->name !== 'product');

// now that we're at the right depth, hop to the next <product/> until the end of the tree
while ($z->name === 'product')
{
    // either one should work
    //$node = new SimpleXMLElement($z->readOuterXML());
    $node = simplexml_import_dom($doc->importNode($z->expand(), true));

    // now you can use $node without going insane about parsing
    var_dump($node->element_1);

    // go to next <product />
    $z->next('product');
}
```
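Since the question's goal is to store each element's contents in a database, here is a minimal sketch of the same loop that turns every <product> into an associative array (one key per child element), ready to hand to an INSERT. It assumes the question's sample is inlined as a string (without the XML declaration) instead of read from data.xml, and it uses the readOuterXML() variant mentioned in the loop's comment.

```php
<?php
// Assumption: the question's sample, inlined so the sketch runs without data.xml.
$sample = '<products>
  <last_updated>2009-11-30 13:52:40</last_updated>
  <product><element_1>foo</element_1><element_2>foo</element_2><element_3>foo</element_3><element_4>foo</element_4></product>
  <product><element_1>bar</element_1><element_2>bar</element_2><element_3>bar</element_3><element_4>bar</element_4></product>
</products>';

$reader = new XMLReader();
$reader->XML($sample);

$rows = [];
// Skip ahead to the first <product> element.
while ($reader->read() && $reader->name !== 'product');

while ($reader->name === 'product') {
    $product = new SimpleXMLElement($reader->readOuterXML());
    $row = [];
    // One entry per child element, e.g. ['element_1' => 'foo', ...].
    foreach ($product->children() as $name => $child) {
        $row[$name] = (string) $child;
    }
    $rows[] = $row; // in real code, insert $row into the database here
    $reader->next('product');
}
$reader->close();

print_r($rows);
```

Only one <product> is ever held in memory; the array of rows is kept here just to make the result visible.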

A quick overview of the pros and cons of the various approaches:

XMLReader only

  • Pros: fast, low memory

  • Cons: it's hard to write and debug, and it takes a lot of userland code to do anything useful. Userland code is slow and error-prone, and it leaves you with more lines of code to maintain.

XMLReader + SimpleXML

  • Pros: it does not use a lot of memory (only the memory needed to process one node), and SimpleXML, as the name implies, is very easy to use.

  • Cons: creating a SimpleXMLElement object for each node is not very fast. You really have to benchmark it to see whether that is a problem for you. Even a modest machine can still process thousands of nodes per second.

XMLReader + DOM

  • Pros: uses about as much memory as SimpleXML, and XMLReader::expand() is faster than creating a new SimpleXMLElement. I wish it were possible to use simplexml_import_dom() here, but it does not seem to work in that case.

  • Cons: DOM is annoying to work with. It sits halfway between XMLReader and SimpleXML: not as complicated and awkward as XMLReader, but light years away from working with SimpleXML.
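To illustrate why the "XMLReader only" option takes more userland code, here is a rough sketch that builds the same per-product rows with XMLReader alone. The state tracking ($current and the END_ELEMENT check) is exactly the bookkeeping SimpleXML or DOM would otherwise do for you. It assumes an inlined sample and that no product is ever written as a self-closing <product/>, which would never produce an END_ELEMENT.

```php
<?php
// Assumption: a trimmed version of the question's sample, inlined.
$sample = '<products>
  <last_updated>2009-11-30 13:52:40</last_updated>
  <product><element_1>foo</element_1><element_2>foo</element_2></product>
  <product><element_1>bar</element_1><element_2>bar</element_2></product>
</products>';

$reader = new XMLReader();
$reader->XML($sample);

$rows = [];
$current = null; // row being built for the open <product>, or null outside one
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'product') {
        $current = [];                                  // a new <product> starts
    } elseif ($reader->nodeType === XMLReader::ELEMENT && $current !== null) {
        // a child of <product>: readString() returns its text without moving the cursor
        $current[$reader->name] = $reader->readString();
    } elseif ($reader->nodeType === XMLReader::END_ELEMENT && $reader->name === 'product') {
        $rows[] = $current;                             // the <product> is complete
        $current = null;
    }
}
$reader->close();

print_r($rows);
```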

My advice: write a prototype with SimpleXML and see if it works for you. If performance is paramount, try DOM. Stay as far away as possible from XMLReader on its own. Remember that the more code you write, the higher the chance of introducing bugs or performance regressions.
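Following the advice to measure rather than guess, a rough benchmarking sketch could time the SimpleXML and DOM variants side by side. It assumes a synthetic document of 10,000 small <product> elements (not real data), and walkProducts() is a hypothetical helper. Timings depend entirely on your machine, PHP version, and node size, so no particular result is implied here.

```php
<?php
// Assumption: synthetic input made of 10,000 small <product> elements.
$count = 10000;
$xml = '<products>' . str_repeat(
    '<product><element_1>foo</element_1><element_2>bar</element_2></product>',
    $count
) . '</products>';

// Visit every <product> with XMLReader and hand the reader to $handle.
function walkProducts(string $xml, callable $handle): int
{
    $reader = new XMLReader();
    $reader->XML($xml);
    $visited = 0;
    while ($reader->read() && $reader->name !== 'product');
    while ($reader->name === 'product') {
        $handle($reader);
        $visited++;
        $reader->next('product');
    }
    $reader->close();
    return $visited;
}

$doc = new DOMDocument();
$strategies = [
    // XMLReader + SimpleXML: re-parse the node's outer XML
    'simplexml' => function (XMLReader $r): string {
        $node = new SimpleXMLElement($r->readOuterXML());
        return (string) $node->element_1;
    },
    // XMLReader + DOM: expand the node into an existing DOMDocument
    'dom' => function (XMLReader $r) use ($doc): string {
        return $r->expand($doc)->firstChild->textContent;
    },
];

$results = [];
foreach ($strategies as $label => $handle) {
    $start = microtime(true);
    $results[$label] = walkProducts($xml, $handle);
    printf("%-10s %d nodes in %.3f s\n", $label, $results[$label], microtime(true) - $start);
}
```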

+183
Dec 02 '09 at 19:45

For XML where the data is stored in attributes:

data.xml:

```xml
<building_data>
    <building address="some address" lat="28.902914" lng="-71.007235" />
    <building address="some address" lat="48.892342" lng="-75.0423423" />
    <building address="some address" lat="58.929753" lng="-79.1236987" />
</building_data>
```

PHP code:

```php
$reader = new XMLReader();
if (!$reader->open("data.xml")) {
    die("Failed to open 'data.xml'");
}

while ($reader->read()) {
    // only act on opening <building> tags
    if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'building') {
        $address = $reader->getAttribute('address');
        $latitude = $reader->getAttribute('lat');
        $longitude = $reader->getAttribute('lng');
        // use $address, $latitude and $longitude here
    }
}

$reader->close();
```
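A self-contained variant of the attribute example that collects each building into an array instead of overwriting the same three variables on every iteration. The sample is inlined (an assumption, in place of data.xml), and the coordinates are cast to float because getAttribute() returns strings (or null when the attribute is missing).

```php
<?php
// Assumption: two buildings from the sample above, inlined.
$sample = '<building_data>
  <building address="some address" lat="28.902914" lng="-71.007235" />
  <building address="some address" lat="48.892342" lng="-75.0423423" />
</building_data>';

$reader = new XMLReader();
$reader->XML($sample);

$buildings = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'building') {
        $buildings[] = [
            'address' => $reader->getAttribute('address'),
            // attribute values are strings, so convert the coordinates
            'lat' => (float) $reader->getAttribute('lat'),
            'lng' => (float) $reader->getAttribute('lng'),
        ];
    }
}
$reader->close();

print_r($buildings);
```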
+8
05 Oct

Most of my XML parsing work consists of extracting nuggets of useful information from truckloads of XML (Amazon MWS). So my answer assumes you only want specific information and you know where it is.

I find the easiest way to use XMLReader is to know which tags I want the information from, and to go after just those. If you know the structure of the XML and it has lots of unique tags, the first case is easy to use. Cases 2 and 3 just show how it can be done for more complex tags. This approach is very fast; I discuss speed in What is the fastest XML parser in PHP?

The most important thing to keep in mind when doing tag-based parsing like this is to use if ($myXML->nodeType == XMLReader::ELEMENT) {...}, which ensures we are only dealing with opening tags, not whitespace, closing tags, or anything else.

```php
function parseMyXML($xml) { // pass in an XML string
    $myXML = new XMLReader();
    $myXML->xml($xml);

    while ($myXML->read()) { // start reading
        if ($myXML->nodeType == XMLReader::ELEMENT) { // only opening tags
            $tag = $myXML->name; // $tag now contains the name of the tag
            switch ($tag) {
                case 'Tag1': // no child elements, only the content we need, and it is unique
                    $variable = $myXML->readInnerXML(); // $variable now contains the contents of Tag1
                    break;

                case 'Tag2': // has child elements, of which we only want one
                    while ($myXML->read()) { // so we tell it to keep reading
                        if ($myXML->nodeType == XMLReader::ELEMENT && $myXML->name === 'Amount') { // and when it finds the Amount tag...
                            $variable2 = $myXML->readInnerXML(); // ...put it in $variable2
                            break;
                        }
                    }
                    break;

                case 'Tag3': // also has children, which are not unique, but this time we need two of them
                    while ($myXML->read()) {
                        if ($myXML->nodeType == XMLReader::ELEMENT && $myXML->name === 'Amount') {
                            $variable3 = $myXML->readInnerXML();
                        } elseif ($myXML->nodeType == XMLReader::ELEMENT && $myXML->name === 'Currency') {
                            $variable4 = $myXML->readInnerXML();
                        }
                        // stop once both children have been found
                        if (isset($variable3, $variable4)) {
                            break;
                        }
                    }
                    break;
            }
        }
    }
    $myXML->close();
}
```
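To see why the nodeType check matters, this small sketch logs which nodes read() actually visits in a tiny made-up document. Note that a self-closing element such as <c/> is reported once, as an ELEMENT with isEmptyElement set, and never produces an END_ELEMENT; any code that pairs opening and closing tags has to account for that.

```php
<?php
// Made-up document: one element with text, one self-closing element.
$reader = new XMLReader();
$reader->XML('<a><b>x</b><c/></a>');

$seen = [];
while ($reader->read()) {
    switch ($reader->nodeType) {
        case XMLReader::ELEMENT:
            // isEmptyElement is true for <c/>, which has no matching END_ELEMENT
            $seen[] = 'open ' . $reader->name . ($reader->isEmptyElement ? ' (empty)' : '');
            break;
        case XMLReader::END_ELEMENT:
            $seen[] = 'close ' . $reader->name;
            break;
        case XMLReader::TEXT:
            $seen[] = 'text ' . $reader->value;
            break;
    }
}
$reader->close();

echo implode("\n", $seen), "\n";
```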
+4
Oct 15 '14 at 18:02

XMLReader is well documented on the PHP site. It is an XML pull parser, which means it is used to iterate through the nodes (or DOM nodes) of a given XML document. For example, you could go through the whole document you gave like this:

```php
<?php
$reader = new XMLReader();
if (!$reader->open("data.xml")) {
    die("Failed to open 'data.xml'");
}

while ($reader->read()) {
    $node = $reader->expand();
    // process $node...
}

$reader->close();
?>
```

You then decide how to handle the node returned by XMLReader::expand().
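One way to handle the node returned by expand() is to turn it into a row of child element names and text values. This sketch assumes the question's sample inlined as a string, only expands opening <product> tags rather than every node, and passes a DOMDocument as expand()'s base node so the expanded nodes have an owner document.

```php
<?php
// Assumption: a trimmed version of the question's sample, inlined.
$sample = '<products><product><element_1>foo</element_1><element_2>foo</element_2></product>'
        . '<product><element_1>bar</element_1><element_2>bar</element_2></product></products>';

$reader = new XMLReader();
$reader->XML($sample);
$doc = new DOMDocument(); // owner document for the expanded nodes

$rows = [];
while ($reader->read()) {
    // only expand opening <product> tags, not every node
    if ($reader->nodeType !== XMLReader::ELEMENT || $reader->name !== 'product') {
        continue;
    }
    $node = $reader->expand($doc);
    $row = [];
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_ELEMENT_NODE) {
            $row[$child->nodeName] = $child->textContent;
        }
    }
    $rows[] = $row;
}
$reader->close();

print_r($rows);
```

read() still walks into each product's children afterwards; that is harmless here because none of them is named product.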

+2
Dec 02 '09 at 19:42
Simple example:

```php
public function productsAction()
{
    $saveFileName = 'ceneo.xml';
    $filename = $this->path . $saveFileName;
    if (file_exists($filename)) {
        $reader = new XMLReader();
        $reader->open($filename);
        $countElements = 0;

        while ($reader->read()) {
            // remember the name of the most recently opened element
            if ($reader->nodeType == XMLReader::ELEMENT) {
                $nodeName = $reader->name;
            }

            // text content of that element
            if ($reader->nodeType == XMLReader::TEXT && !empty($nodeName)) {
                switch ($nodeName) {
                    case 'id':
                        var_dump($reader->value);
                        break;
                }
            }

            // count every closing </offer> tag
            if ($reader->nodeType == XMLReader::END_ELEMENT && $reader->name == 'offer') {
                $countElements++;
            }
        }

        $reader->close();
        echo '<pre>';
        var_dump($countElements);
        exit;
    }
}
```
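In the example above, $nodeName keeps the name of the last opened element, so any non-whitespace text that follows a closing tag would still be attributed to it. Here is a small self-contained sketch of the same TEXT-node pattern that also clears the name on every END_ELEMENT and counts closing </offer> tags; the <offers> sample and its id values are made up for illustration.

```php
<?php
// Made-up sample standing in for ceneo.xml.
$sample = '<offers><offer><id>1</id><name>A</name></offer><offer><id>2</id></offer></offers>';

$reader = new XMLReader();
$reader->XML($sample);

$ids = [];
$count = 0;
$nodeName = null; // element whose text we are currently inside, if any
while ($reader->read()) {
    switch ($reader->nodeType) {
        case XMLReader::ELEMENT:
            $nodeName = $reader->name;
            break;
        case XMLReader::TEXT:
        case XMLReader::CDATA:
            if ($nodeName === 'id') {
                $ids[] = $reader->value;
            }
            break;
        case XMLReader::END_ELEMENT:
            if ($reader->name === 'offer') {
                $count++;
            }
            $nodeName = null; // text after a closing tag belongs to no tracked element
            break;
    }
}
$reader->close();
```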
+1
Oct 10 '13 at 7:04

The accepted answer gave me a good start, but it involved more classes and more processing than I wanted, so this is my interpretation:

```php
$xml_reader = new XMLReader;
$xml_reader->open($feed_url);

// move the pointer to the first product
while ($xml_reader->read() && $xml_reader->name != 'product');

// loop through the products
while ($xml_reader->name == 'product')
{
    // load the current xml element into simplexml and we're off and running!
    $xml = simplexml_load_string($xml_reader->readOuterXML());

    // now you can use your simpleXML object ($xml).
    echo $xml->element_1;

    // move the pointer to the next product
    $xml_reader->next('product');
}

// don't forget to close the file
$xml_reader->close();
```
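The same loop can be wrapped in a generator so the calling code simply iterates over products. iterateElements() is a hypothetical helper name; the try/finally closes the reader even if the caller stops iterating early. The usage below writes a tiny sample to a temporary file in place of $feed_url.

```php
<?php
// Hypothetical helper: yields one SimpleXMLElement per <$name> element.
function iterateElements(string $path, string $name): Generator
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Failed to open '$path'");
    }
    try {
        while ($reader->read() && $reader->name !== $name);
        while ($reader->name === $name) {
            yield simplexml_load_string($reader->readOuterXML());
            $reader->next($name);
        }
    } finally {
        $reader->close(); // also runs when the generator is destroyed early
    }
}

// Usage: a tiny sample written to a temporary file.
$path = tempnam(sys_get_temp_dir(), 'products');
file_put_contents($path, '<products><last_updated>2009-11-30 13:52:40</last_updated>'
    . '<product><element_1>foo</element_1></product>'
    . '<product><element_1>bar</element_1></product></products>');

$values = [];
foreach (iterateElements($path, 'product') as $product) {
    $values[] = (string) $product->element_1;
}
unlink($path);
```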
+1
Jun 20 '14 at 6:07

This question is old, but I just found it. Thank God.

My problem is that I have to read an ONIX file (book data) and store it in our database. I used simplexml_load_file before, and although it used a lot of memory, it was still fine for relatively small files (up to 300 MB). Beyond that size, it was a disaster for me.

After reading this thread, especially Francis Lewis's interpretation, I now use a combination of XMLReader and SimpleXML. The result is excellent: memory usage is small, and the inserts into the database are fast enough for me.

Here is my code:

```php
<?php
$dbhost = "localhost"; // MySQL host
$dbuser = "";          // MySQL username
$dbpw   = "";          // MySQL user password
$db     = "";          // MySQL database name

// One PDO connection and one prepared statement are reused for every insert.
$pdo = new PDO("mysql:host=$dbhost;dbname=$db;charset=utf8mb4", $dbuser, $dbpw, [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

// I need to truncate the old data first
$pdo->exec("TRUNCATE ebiblio");

//$xmlFile = $_POST['xmlFile'];
//$xml = simplexml_load_file("ebiblio.xml") or die("Error: Cannot create object");

// stream the XML file instead of loading it whole into memory
$reader = new XMLReader();
if (!$reader->open("ebiblio.xml")) {
    die("Failed to open 'ebiblio.xml'");
}

$insert = $pdo->prepare("INSERT INTO ebiblio VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)");

// move to the first <product>
while ($reader->read() && $reader->name !== 'product');

// handle one <product> at a time; next('product') jumps straight to the following sibling
while ($reader->name === 'product') {
    $xml = simplexml_load_string($reader->readOuterXML());

    $fields = [
        (string)$xml->a001,                      // product code
        (string)$xml->title->b203,               // title
        (string)$xml->contributor->b037,         // author
        (string)$xml->language->b252,            // language
        (string)$xml->subject->b069,             // category
        (string)$xml->othertext->d104,           // description
        (string)$xml->publisher->b081,           // publisher
        (string)$xml->supplydetail->price->j151, // cover price
        (string)$xml->salesrights->b090,         // sales rights
    ];

    // SimpleXML always returns UTF-8 strings, whatever the file's declared encoding
    $insert->execute(array_map(
        fn($value) => htmlentities($value, ENT_QUOTES, 'UTF-8'),
        $fields
    ));

    $reader->next('product');
}

$reader->close();
```
+1
Jul 12 '16 at 3:03

I'm afraid that using XMLReader::expand() can consume quite a lot of RAM when the subtree is not small, so I am not sure it is a good alternative to plain XMLReader. However, I agree that XMLReader is really weak and not well suited to processing complex nested XML trees. Two things I really dislike: first, the current node's path in the XML tree is not available as a property, and second, XPath-like processing cannot be run while reading the nodes. Of course, real XPath queries would be very time-consuming on large XML, but you can use "path hooks" instead: for example, when the path of the current element matches the root of a given subtree, a PHP function or method is invoked. So a few years ago I developed my own classes on top of XMLReader. They are not perfect, and I would probably write them better today, but they may still be useful to someone:

https://bitbucket.org/sdvpartnership/questpc-framework/src/c481a8b051dbba0a6644ab8a77a71e58119e7441/includes/Xml/Reader/?at=master

I build the XML path as 'node1/node2' and then match hooks against it with PCRE, which is less efficient than XPath but was enough for my needs. I have processed quite complex, large XML files with these classes.
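A minimal sketch of the "path hooks" idea described above (not the linked classes): it keeps a stack of open element names, builds a path such as products/product/element_1, and calls every callback whose PCRE pattern matches that path. readWithPathHooks() and the hook format are hypothetical, and callbacks must not move the reader (readString(), readOuterXML() and expand() do not).

```php
<?php
// Hypothetical helper: $hooks maps PCRE patterns to callbacks(XMLReader, string $path).
function readWithPathHooks(XMLReader $reader, array $hooks): void
{
    $stack = []; // names of the currently open elements
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::END_ELEMENT) {
            array_pop($stack);
            continue;
        }
        if ($reader->nodeType !== XMLReader::ELEMENT) {
            continue;
        }
        $stack[] = $reader->name;
        $path = implode('/', $stack);
        foreach ($hooks as $pattern => $callback) {
            if (preg_match($pattern, $path)) {
                $callback($reader, $path);
            }
        }
        if ($reader->isEmptyElement) {
            array_pop($stack); // <x/> produces no END_ELEMENT, so pop it now
        }
    }
}

// Usage: collect every element_1 that sits directly inside a product.
$reader = new XMLReader();
$reader->XML('<products><product><element_1>foo</element_1></product>'
    . '<product><element_1>bar</element_1></product></products>');
$found = [];
readWithPathHooks($reader, [
    '#^products/product/element_1$#' => function (XMLReader $r, string $path) use (&$found) {
        $found[] = $r->readString();
    },
]);
$reader->close();
```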

0
Jan 13 '15 at 8:43


