PHP: using strpos to extract a number from a string

I have a large block of HTML to scan. So far I have used preg_match_all to extract the details I need from it. From the very beginning, the problem was that it was extremely complex, so I finally decided to try a different extraction method. I have read some articles that compare the performance of preg_match and strpos; they claim that strpos can be up to 20 times faster than a regular expression. I would like to try this approach, but I do not know where to start.

Let's say I have this html line:

 <li id="ncc-nba-16451" class="che10"><a href="/en/star">23 - Star</a></li> <li id="ncd-bbt-5674" class="che10"><a href="/en/moon">54 - Moon</a></li> <li id="ertw-cxda-c6543" class="che10"><a href="/en/sun">34,780 - Sun</a></li> 

I want to extract only the number from each id and only the text (letters) from the contents of the a tags, so I run this preg_match_all pattern:

'/<li.*?id=".*?([\d]+)".*?<a.*?>.*?([\w]+)<\/a>/s'

Here you can see the result: LINK

Now, if I wanted to replace my method with strpos, what would the approach look like? I understand that strpos returns the index at which the first match starts. But how can I use it to:

  • get all possible matches, not just the first one
  • extract numbers or text from the desired location in a string

Thanks for the help and advice;)

2 answers

Using DOM

    $html = '
    <html>
    <head></head>
    <body>
    <li id="ncc-nba-16451" class="che10"><a href="/en/star">23 - Star</a></li>
    <li id="ncd-bbt-5674" class="che10"><a href="/en/moon">54 - Moon</a></li>
    <li id="ertw-cxda-c6543" class="che10"><a href="/en/sun">34,780 - Sun</a></li>
    </body>
    </html>';

    $dom_document = new DOMDocument();
    $dom_document->loadHTML($html);
    $rootElement = $dom_document->documentElement;

    // Collect the last dash-separated part of each <li> id.
    $getId = $rootElement->getElementsByTagName('li');
    $res = [];
    foreach ($getId as $tag) {
        $data = explode('-', $tag->getAttribute('id'));
        $res['li_id'][] = end($data);
    }

    // Collect the text of each link (read from its parent <li>).
    $getNode = $rootElement->getElementsByTagName('a');
    foreach ($getNode as $tag) {
        $res['a_node'][] = $tag->parentNode->textContent;
    }

    print_r($res);

Output:

    Array
    (
        [li_id] => Array
            (
                [0] => 16451
                [1] => 5674
                [2] => c6543
            )

        [a_node] => Array
            (
                [0] => 23 - Star
                [1] => 54 - Moon
                [2] => 34,780 - Sun
            )

    )
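
The output above still contains the "c" prefix in the third id and the full link text rather than only the letters. A possible refinement is sketched below. It assumes the number is always the trailing run of digits in the id and that the label is whatever follows the last " - " separator; both are assumptions drawn from the three sample lines, not guarantees about the real page. The `<ul>` wrapper and the `id`/`label` keys are illustrative additions.

```php
<?php
// Sample markup from the question, wrapped in a <ul> for well-formedness.
$html = '<html><body><ul>'
    . '<li id="ncc-nba-16451" class="che10"><a href="/en/star">23 - Star</a></li>'
    . '<li id="ncd-bbt-5674" class="che10"><a href="/en/moon">54 - Moon</a></li>'
    . '<li id="ertw-cxda-c6543" class="che10"><a href="/en/sun">34,780 - Sun</a></li>'
    . '</ul></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$res = [];
foreach ($dom->getElementsByTagName('li') as $li) {
    $id = $li->getAttribute('id');
    // Count the trailing digits: reverse the id and measure its leading digit run.
    $len = strspn(strrev($id), '0123456789');
    $link = $li->getElementsByTagName('a')->item(0);
    if ($len === 0 || $link === null) {
        continue; // skip items without a numeric id suffix or without a link
    }
    $text = $link->textContent;
    // Assumption: the label follows the last " - " separator in the link text.
    $sep = strrpos($text, ' - ');
    $res[] = [
        'id'    => substr($id, -$len),
        'label' => $sep === false ? trim($text) : substr($text, $sep + 3),
    ];
}

print_r($res);
```

For the sample markup this pairs 16451 with Star, 5674 with Moon, and 6543 with Sun. Reading the `<a>` inside each `<li>` also keeps the id and its label together in one entry instead of two parallel arrays.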

This regex finds a match in 24 steps with 0 backtracks:

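 (?:id="[^\d]*(\d*))[^<]*(?:<a href="[^>]*>[^a-z]*([a-z]*)) 

Note that [a-z] only matches lowercase letters, so the pattern needs the case-insensitive i modifier to capture "Star" rather than "tar".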

The regex from the question requires 134 steps. Maybe you notice the difference? Please note that a regex can be optimized to minimize backtracking. I used the RegexBuddy debugger to obtain these numbers.
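
For the strpos route the question asks about, a rough sketch could look like the following. It relies on the third argument of strpos (the offset to start searching from) to walk through the string and collect every match, not just the first. The helpers `between` and `trailingRun` are hypothetical names introduced here, and the code assumes the markup always looks like the sample line (`id` is the first attribute of each `<li>`, and the href contains no `>`). It makes no claim about actual speed; that would need to be measured on the real input.

```php
<?php
const DIGITS  = '0123456789';
const LETTERS = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

// Hypothetical helper: text between $open and $close, searching from $offset.
// On success $offset is moved past $close; returns null when nothing is found.
function between(string $s, string $open, string $close, int &$offset): ?string
{
    $start = strpos($s, $open, $offset);
    if ($start === false) {
        return null;
    }
    $start += strlen($open);
    $end = strpos($s, $close, $start);
    if ($end === false) {
        return null;
    }
    $offset = $end + strlen($close);
    return substr($s, $start, $end - $start);
}

// Hypothetical helper: the longest suffix of $s made only of characters in $chars.
function trailingRun(string $s, string $chars): string
{
    $len = strspn(strrev($s), $chars);
    return $len === 0 ? '' : substr($s, -$len);
}

// Sample line from the question.
$html = '<li id="ncc-nba-16451" class="che10"><a href="/en/star">23 - Star</a></li> '
      . '<li id="ncd-bbt-5674" class="che10"><a href="/en/moon">54 - Moon</a></li> '
      . '<li id="ertw-cxda-c6543" class="che10"><a href="/en/sun">34,780 - Sun</a></li>';

$matches = [];
$offset = 0;
// Each pass resumes where the previous one stopped, so all items are visited.
while (($id = between($html, '<li id="', '"', $offset)) !== null) {
    // Everything from after "<a " up to "</a>": attributes, ">", then the link text.
    $inner = between($html, '<a ', '</a>', $offset);
    if ($inner === null) {
        break;
    }
    $gt = strpos($inner, '>');
    $text = $gt === false ? $inner : substr($inner, $gt + 1);
    $matches[] = [
        'id'    => trailingRun($id, DIGITS),    // digits at the end of the id
        'label' => trailingRun($text, LETTERS), // letters at the end of the link text
    ];
}

print_r($matches);
```

On the sample line this collects 16451/Star, 5674/Moon and 6543/Sun. The trade-off is that every structural assumption is hard-coded, so any change in attribute order or spacing breaks it, whereas the DOM and regex versions tolerate more variation.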
