XML parsing with PHP and XMLReader

I am trying to parse a very large XML file with PHP and XMLReader, but I can't seem to get the results I'm looking for. Basically, I am searching through a ton of information, and if a group (a <headend> element) contains a specific zipcode, I would like to return that bit of XML; otherwise, keep searching until the zipcode is found. In effect, I will be breaking this huge file down into a few small chunks, so instead of looking at thousands or millions of groups of information, it will be maybe 10 or 20.

Here is some XML, annotated with what I would like to do:

    //search through xml
    <lineups country="USA">
        //cache TX02217 as a variable
        <headend headendId="TX02217">
            //cache Grande Gables at The Terrace as a variable
            <name>Grande Gables at The Terrace</name>
            //cache Grande Communications as a variable
            <mso msoId="17541">Grande Communications</mso>
            <marketIds>
                <marketId type="DMA">635</marketId>
            </marketIds>
            //check to see if any of the postal codes are equal to $pc variable that will be set in the php
            <postalCodes>
                <postalCode>11111</postalCode>
                <postalCode>22222</postalCode>
                <postalCode>33333</postalCode>
                <postalCode>78746</postalCode>
            </postalCodes>
            //cache Austin to a variable
            <location>Austin</location>
            <lineup>
                //cache all prgSvcID to an array ie 20014, 10722
                <station prgSvcId="20014">
                    //cache all channels to an array ie 002, 003
                    <chan effDate="2006-01-16" tier="1">002</chan>
                </station>
                <station prgSvcId="10722">
                    <chan effDate="2006-01-16" tier="1">003</chan>
                </station>
            </lineup>
            <areasServed>
                <area>
                    //cache community to a variable $community
                    <community>Thorndale</community>
                    <county code="45331" size="D">Milam</county>
                    //cache state to a variable ie TX
                    <state>TX</state>
                </area>
                <area>
                    <community>Thrall</community>
                    <county code="45491" size="B">Williamson</county>
                    <state>TX</state>
                </area>
            </areasServed>
        </headend>
        //if any of the postal codes matched $pc
        //echo back the xml from <headend> to </headend>
        //if none of the postal codes matched $pc
        //clear variables and move to next <headend>
        <headend>
            etc etc etc
        </headend>
        <headend>
            etc etc etc
        </headend>
        <headend>
            etc etc etc
        </headend>
    </lineups>

PHP:

    <?php
    $pc = "78746";
    $xmlfile = "myFile.xml";
    $reader = new XMLReader();
    $reader->open($xmlfile);
    while ($reader->read()) {
        //search to see if groups contain $pc and echo info
    }

I know I am making this harder than it should be, but I am a bit overwhelmed trying to manipulate such a large file. Any help is appreciated.

2 answers

To get more flexibility with XMLReader, I usually write my own iterators that work on the XMLReader object and provide the traversal steps I need.

These range from a simple iteration over all nodes to an iteration over elements only, optionally restricted to a specific name. Let us call the latter XMLElementIterator; it takes the reader and the element name as parameters.
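To make the idea concrete, here is a minimal sketch of such an element iterator. It is not the implementation from the linked gist: the class name `SimpleElementIterator` is hypothetical, it assumes PHP 7.4 or later, and it simply filters the nodes delivered by `XMLReader::read()` down to start tags with the requested name.

```php
<?php
// Hypothetical, simplified stand-in for the gist's XMLElementIterator
// (assumes PHP 7.4+ for typed properties).
final class SimpleElementIterator implements IteratorAggregate
{
    private XMLReader $reader;
    private ?string $name;

    public function __construct(XMLReader $reader, ?string $name = null)
    {
        $this->reader = $reader;
        $this->name = $name;
    }

    // Yields the reader each time it is positioned on a matching start tag.
    public function getIterator(): Generator
    {
        while ($this->reader->read()) {
            if ($this->reader->nodeType !== XMLReader::ELEMENT) {
                continue; // skip text, whitespace, end tags, comments
            }
            if ($this->name !== null && $this->reader->name !== $this->name) {
                continue; // skip elements with a different name
            }
            yield $this->reader;
        }
    }
}

// Usage on a tiny in-memory document: collect the text of every <a> element.
$reader = new XMLReader();
$reader->XML('<root><a>1</a><b>2</b><a>3</a></root>');

$texts = [];
foreach (new SimpleElementIterator($reader, 'a') as $element) {
    $texts[] = $element->readString();
}
echo implode(', ', $texts), "\n";
```

Because the generator only ever holds the reader's current position, memory use stays flat no matter how large the document is.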

In your scenario, I would then create an iterator that returns a SimpleXMLElement for the current element, taking only the <headend> elements:

    require('xmlreader-iterators.php'); // https://gist.github.com/hakre/5147685

    class HeadendIterator extends XMLElementIterator
    {
        const ELEMENT_NAME = 'headend';

        public function __construct(XMLReader $reader)
        {
            parent::__construct($reader, self::ELEMENT_NAME);
        }

        /**
         * @return SimpleXMLElement
         */
        public function current()
        {
            return simplexml_load_string($this->reader->readOuterXml());
        }
    }

Equipped with this iterator, the rest of your job is mostly a piece of cake. First, open the 10 gigabyte file:

    $pc = "78746";
    $xmlfile = '../data/lineups.xml';
    $reader = new XMLReader();
    $reader->open($xmlfile);

Then check whether each <headend> element contains the postal code and, if so, display its data / XML:

    foreach (new HeadendIterator($reader) as $headend) {
        /* @var $headend SimpleXMLElement */
        if (!$headend->xpath("/*/postalCodes/postalCode[. = '$pc']")) {
            continue;
        }

        echo 'Found, name: ', $headend->name, "\n";
        echo "==========================================\n";
        $headend->asXML('php://stdout');
    }

This does literally what you are trying to achieve: it iterates over the large document (which is memory-friendly) until it finds the elements you are interested in. You then process only that specific element and its XML; XMLReader::readOuterXml() is a great tool here.

Sample output:

    Found, name: Grande Gables at The Terrace
    ==========================================
    <?xml version="1.0"?>
    <headend headendId="TX02217">
        <name>Grande Gables at The Terrace</name>
        <mso msoId="17541">Grande Communications</mso>
        <marketIds>
            <marketId type="DMA">635</marketId>
        </marketIds>
        <postalCodes>
            <postalCode>11111</postalCode>
            <postalCode>22222</postalCode>
            <postalCode>33333</postalCode>
            <postalCode>78746</postalCode>
        </postalCodes>
        <location>Austin</location>
        <lineup>
            <station prgSvcId="20014">
                <chan effDate="2006-01-16" tier="1">002</chan>
            </station>
            <station prgSvcId="10722">
                <chan effDate="2006-01-16" tier="1">003</chan>
            </station>
        </lineup>
        <areasServed>
            <area>
                <community>Thorndale</community>
                <county code="45331" size="D">Milam</county>
                <state>TX</state>
            </area>
            <area>
                <community>Thrall</community>
                <county code="45491" size="B">Williamson</county>
                <state>TX</state>
            </area>
        </areasServed>
    </headend>
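If pulling in the iterator library is not an option, the same streaming idea can be sketched with plain XMLReader calls. The helper name `findHeadends` and the second `<headend>` in the sample data are hypothetical; the sketch assumes all `<headend>` elements are siblings, so `XMLReader::next('headend')` can jump from one to the next while skipping each subtree.

```php
<?php
/**
 * Hypothetical helper: returns the XML of every <headend> whose
 * <postalCode> children include $pc. Only one <headend> is held
 * in memory at a time.
 */
function findHeadends(XMLReader $reader, string $pc): array
{
    $matches = [];

    // Advance to the first <headend> start tag (if there is one).
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'headend') {
            break;
        }
    }

    // Handle the current <headend>, then skip over its subtree to the next one.
    while ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'headend') {
        $headend = simplexml_load_string($reader->readOuterXml());
        if ($headend !== false) {
            foreach ($headend->postalCodes->postalCode as $code) {
                if ((string) $code === $pc) {
                    $matches[] = $headend->asXML();
                    break;
                }
            }
        }
        if (!$reader->next('headend')) {
            break; // no further <headend> siblings
        }
    }

    return $matches;
}

// Small in-memory sample; for a real file use $reader->open($xmlfile) instead.
$xml = <<<'XML'
<lineups country="USA">
  <headend headendId="TX02217">
    <name>Grande Gables at The Terrace</name>
    <postalCodes><postalCode>11111</postalCode><postalCode>78746</postalCode></postalCodes>
  </headend>
  <headend headendId="XX00000">
    <name>Hypothetical Other Headend</name>
    <postalCodes><postalCode>99999</postalCode></postalCodes>
  </headend>
</lineups>
XML;

$reader = new XMLReader();
$reader->XML($xml);
$found = findHeadends($reader, '78746');

foreach ($found as $headendXml) {
    echo $headendXml, "\n";
}
```

Comparing the postal codes as strings avoids building an XPath expression from `$pc`, so the value needs no escaping.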

Edit: Oh, do you want to return the parent piece? One moment.

Here is an example to pull all postal codes into an array.

http://codepad.org/kHss4MdV

    <?php
    $string='<lineups country="USA">
    <headend headendId="TX02217">
        <name>Grande Gables at The Terrace</name>
        <mso msoId="17541">Grande Communications</mso>
        <marketIds>
            <marketId type="DMA">635</marketId>
        </marketIds>
        <postalCodes>
            <postalCode>11111</postalCode>
            <postalCode>22222</postalCode>
            <postalCode>33333</postalCode>
            <postalCode>78746</postalCode>
        </postalCodes>
        <location>Austin</location>
        <lineup>
            <station prgSvcId="20014">
                <chan effDate="2006-01-16" tier="1">002</chan>
            </station>
            <station prgSvcId="10722">
                <chan effDate="2006-01-16" tier="1">003</chan>
            </station>
        </lineup>
        <areasServed>
            <area>
                <community>Thorndale</community>
                <county code="45331" size="D">Milam</county>
                <state>TX</state>
            </area>
            <area>
                <community>Thrall</community>
                <county code="45491" size="B">Williamson</county>
                <state>TX</state>
            </area>
        </areasServed>
    </headend></lineups>';

    $dom = new DOMDocument();
    $dom->loadXML($string);

    $xpath = new DOMXPath($dom);
    $elements= $xpath->query('//lineups/headend/postalCodes/*[text()=78746]');

    if (!is_null($elements)) {
        foreach ($elements as $element) {
            echo "<br/>[". $element->nodeName. "]";

            $nodes = $element->childNodes;
            foreach ($nodes as $node) {
                echo $node->nodeValue. "\n";
            }
        }
    }

Outputs:

 <br/>[postalCode]78746 
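To return the parent chunk rather than the matching postal code, the XPath predicate can be moved onto `<headend>` itself. The following is a rough sketch under a few assumptions: the sample data is trimmed down from the question, `$pc` contains no quote characters (it is interpolated into the XPath expression), and the document is small enough to load fully into memory, which DOMDocument requires.

```php
<?php
$xml = '<lineups country="USA">
  <headend headendId="TX02217">
    <name>Grande Gables at The Terrace</name>
    <postalCodes>
      <postalCode>11111</postalCode>
      <postalCode>78746</postalCode>
    </postalCodes>
    <location>Austin</location>
  </headend>
</lineups>';

$pc = '78746';

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Select every <headend> that has at least one <postalCode> equal to $pc.
$headends = $xpath->query("/lineups/headend[postalCodes/postalCode = '$pc']");

$foundIds = [];
foreach ($headends as $headend) {
    $foundIds[] = $headend->getAttribute('headendId');
    // saveXML() with a node argument serializes just that subtree.
    echo $dom->saveXML($headend), "\n";
}
```

Since the whole document is parsed up front, this variant fits the small chunks better than the original multi-gigabyte file.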
