Case-insensitive XPath in PHP

I have an XML file, for example:

    <volume name="Early">
      <book name="School Years">
        <chapter number="1">
          <line number="1">Here the first line with Chicago in it.</line>
          <line number="2">Here a line that talks about Atlanta</line>
          <line number="3">Here a line that says chicagogo </line>
        </chapter>
      </book>
    </volume>

I am trying to do a simple keyword search using PHP that finds the word and displays the line it was in. I have this working:

    $xml = simplexml_load_file($data);
    $keyword = $_GET['keyword'];
    $kw = $xml->xpath("//line[contains(text(),'$keyword')]");
    ...snip...
    echo $kw[0]." is the first returned item";

However, using this technique, the user must search for “Chicago” and not “chicago,” or the search will return nothing.

I understand that I need to use the translate() function, but all my trial and error has been in vain.

I tried:

    $upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    $lower = "abcdefghijklmnopqrstuvwxyz";
    $kw = $xml->xpath("line[contains(text(),'translate('$keyword','$upper','$lower'))]");

but nothing works. Any tips?

+6
php xpath
3 answers

Gordon's recommendation to call PHP functions from within XPath will prove more flexible, should you choose to use it. However, contrary to his answer, the translate() string function is available in XPath 1.0, so you can use it; your problem is how.

First, there is the obvious typo that Charles pointed out in his comment on the question. Then there is the logic of how you are trying to match the text values.


In word form, you are currently asking: "Does the text contain the lowercased keyword?" That is not what you want to ask. Instead, ask: "Does the lowercased text contain the lowercased keyword?" Translating (pardon the pun) that back into XPath-land gives:

(Note: truncated alphabets for readability)

 //line[contains(translate(text(),'ABC...Z','abc...z'),'chicago')] 

The above lowercases the text contained in the line node, then checks whether it (the lowercased text) contains the keyword chicago.


And now for the obligatory code snippet (though really, the idea above is what you need to take home):

    $xml    = simplexml_load_file($data);
    $search = strtolower($keyword);
    // Lowercase each line's text with translate(), then look for the lowercased keyword
    $nodes  = $xml->xpath("//line[contains(translate(text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), '$search')]");

    echo 'Got ' . count($nodes) . ' matches!' . PHP_EOL;
    foreach ($nodes as $node) {
        echo $node . PHP_EOL;
    }

Edit after dijon's comment

Inside foreach, you can access the line number, chapter number and book name, as shown below.

The line number is just an attribute of the <line> element, which makes access to it super-easy. There are two ways to access it using SimpleXML: $node['number'] or $node->attributes()->number (I prefer the first).

Chapter number: to get this, as you rightly said, we need to traverse up the tree. If we were using the DOM classes, we would have a handy $node->parentNode property leading us directly to the <chapter> (since it is the immediate ancestor of our <line>). SimpleXML has no such property, but we can use a relative XPath query to fetch it. The parent axis allows us to move up the tree.

Since xpath() returns an array, we can cheat and use current() to access the first (and only) item in the array it returns. Then it is just a matter of accessing the number attribute as above.

    // In the near future we can use: current(...)['number'] but not yet
    $chapter = current($node->xpath('./parent::chapter'))->attributes()->number;

Book name: the process is the same as for accessing the chapter number. A relative XPath query from the <line> can use the ancestor axis, for example ./ancestor::book (or ./parent::chapter/parent::book). Hopefully you can now work out how to access its name attribute.
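For reference, the three lookups can be sketched together as follows. This is only an illustration under a few assumptions: the sample XML from the question is inlined, the keyword is hard-coded instead of read from the request, and the $results array and its keys are made-up names rather than anything from the original answer.

```php
<?php
// Sample document from the question, inlined so the sketch is self-contained.
$xml = simplexml_load_string(<<<'XML'
<volume name="Early">
  <book name="School Years">
    <chapter number="1">
      <line number="1">Here the first line with Chicago in it.</line>
      <line number="2">Here a line that talks about Atlanta</line>
      <line number="3">Here a line that says chicagogo </line>
    </chapter>
  </book>
</volume>
XML
);

$search = strtolower('Chicago'); // stand-in for the user-supplied keyword
$upper  = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$lower  = 'abcdefghijklmnopqrstuvwxyz';

$nodes = $xml->xpath("//line[contains(translate(text(), '$upper', '$lower'), '$search')]");

$results = []; // hypothetical structure, one entry per matching line
foreach ($nodes as $node) {
    // xpath() returns an array; current() picks its first (and only) element.
    $chapter = current($node->xpath('./parent::chapter'));
    $book    = current($node->xpath('./ancestor::book'));

    $results[] = [
        'book'    => (string) $book['name'],
        'chapter' => (string) $chapter['number'],
        'line'    => (string) $node['number'],
        'text'    => trim((string) $node),
    ];
}

foreach ($results as $r) {
    echo "{$r['book']}, chapter {$r['chapter']}, line {$r['line']}: {$r['text']}", PHP_EOL;
}
```

The (string) casts matter here: attribute access on a SimpleXMLElement yields another SimpleXMLElement, so casting gives plain strings that are safe to compare or store.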

+7

See salathe's answer for how to do this with SimpleXML and translate().

As an alternative to XPath's own functions, as of PHP 5.3 you can use any PHP function, including user-defined ones, in XPath expressions when using DOM. I'm not sure whether this is available in SimpleXML.

    // create a DOMDocument and load your XML string into it
    $dom = new DOMDocument;
    $dom->loadXML($xml);

    // create a new XPath and register PHP functions as XPath functions
    $xPath = new DOMXPath($dom);
    $xPath->registerNamespace("php", "http://php.net/xpath");
    $xPath->registerPHPFunctions();

    // Setup the query
    $keyword = 'chicago';
    $q = "//line[php:functionString('stripos', text(), '$keyword')]";
    $nodes = $xPath->query($q);

    // Iterate the resulting NodeList
    foreach ($nodes as $node) {
        echo $node->nodeValue, PHP_EOL;
    }

This will output

    Here the first line with Chicago in it.
    Here a line that says chicagogo

See @salathe's blog post and the PHP manual for more details.
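Since user-defined functions are mentioned above, here is a rough sketch of what that could look like. The helper name contains_ci is hypothetical (it is not part of PHP or of the answer above), the XML is a cut-down copy of the question's sample, and only that one function is registered instead of exposing every PHP function to the query.

```php
<?php
// Hypothetical helper: true when $needle occurs in $haystack, ignoring case.
function contains_ci(string $haystack, string $needle): bool
{
    return stripos($haystack, $needle) !== false;
}

$dom = new DOMDocument;
$dom->loadXML('<chapter number="1">'
    . '<line number="1">Here the first line with Chicago in it.</line>'
    . '<line number="2">Here a line that talks about Atlanta</line>'
    . '<line number="3">Here a line that says chicagogo </line>'
    . '</chapter>');

$xPath = new DOMXPath($dom);
$xPath->registerNamespace('php', 'http://php.net/xpath');
// Only expose the one function the query needs.
$xPath->registerPhpFunctions('contains_ci');

$keyword = 'CHICAGO'; // any casing should match
$matches = [];        // collects the number attribute of each matching <line>
foreach ($xPath->query("//line[php:functionString('contains_ci', text(), '$keyword')]") as $node) {
    $matches[] = $node->getAttribute('number');
}

echo implode(', ', $matches), PHP_EOL;
```

Returning an explicit boolean from the helper keeps the predicate unambiguous, and restricting registerPhpFunctions() to a named function limits what a crafted query string could call.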

+2

Perhaps I'm missing something, but here is another approach which, IMHO, is simpler. How about applying PHP's strtolower() to the XML before loading it into SimpleXML via simplexml_load_string()?

i.e.

    $xml = simplexml_load_string(strtolower(file_get_contents($xml_file_path)));
    $keyword = strtolower($_GET['keyword']); // Make sure you sanitize this!
    $kw = $xml->xpath("//line[contains(text(),'$keyword')]");

That way you are comparing lowercase to lowercase.
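On the "sanitize this" comment: XPath 1.0 string literals have no escape character, so a keyword containing an apostrophe would break the query as written. One possible way around that, sketched here with a hypothetical helper named xpath_literal (not a PHP built-in), is to pick the quote style the value does not contain and fall back to concat() when it contains both. The sample XML and keyword below are invented for the illustration.

```php
<?php
// Hypothetical helper: turn an arbitrary string into an XPath 1.0 string literal.
function xpath_literal(string $value): string
{
    if (strpos($value, "'") === false) {
        return "'" . $value . "'";
    }
    if (strpos($value, '"') === false) {
        return '"' . $value . '"';
    }
    // Both quote types present: glue single-quoted pieces together with concat().
    return "concat('" . str_replace("'", "', \"'\", '", $value) . "')";
}

// Made-up sample data; the second line contains an apostrophe.
$source  = '<chapter><line number="1">Here the first line with Chicago in it.</line>'
         . '<line number="2">Here a line about O\'Hare</line></chapter>';

$xml     = simplexml_load_string(strtolower($source));
$keyword = strtolower("O'Hare"); // stand-in for $_GET['keyword']
$kw      = $xml->xpath('//line[contains(text(), ' . xpath_literal($keyword) . ')]');

echo count($kw), PHP_EOL;
```

This only addresses quoting inside the XPath expression; any other validation of the request parameter is a separate concern.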

0