PHP preg_match_all limit

I am using preg_match_all with a very long pattern.

When I run the code, I get this error:

Warning: preg_match_all(): Compilation failed: regular expression is too large at offset 707830

After searching, I found a suggested solution: increase the values of pcre.backtrack_limit and pcre.recursion_limit in php.ini.

But after I increased the values and restarted Apache, I still got the same error. My PHP version is 5.3.8.

php regex preg-match preg-match-all
3 answers

Increasing the PCRE backtrack and recursion limits may fix the problem, but it will fail again as soon as your data grows past the new limits (it does not scale with a lot of data).

Example:

    <?php
    // essential for huge PCREs
    ini_set("pcre.backtrack_limit", "23001337");
    ini_set("pcre.recursion_limit", "23001337");
    // imagine your PCRE here...
    ?>
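Whether one of these limits was actually hit can be checked after the call with preg_last_error(). The following is only a minimal sketch: the helper name preg_error_name and the sample pattern are made up for illustration, and only error constants that exist in PHP 5.3 are listed.

```php
<?php
// Hypothetical helper (not part of PHP): maps a preg_last_error() code
// to the name of the matching PREG_* constant.
function preg_error_name($code)
{
    $names = array(
        PREG_NO_ERROR              => 'PREG_NO_ERROR',
        PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
        PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
        PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
        PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
    );
    return isset($names[$code]) ? $names[$code] : 'unknown (' . $code . ')';
}

// Sample call; replace pattern and subject with your own.
$result = preg_match_all('/\d+/', 'a1 b22 c333', $matches);

if ($result === false) {
    // false means the call itself failed; ask PCRE why.
    echo 'preg_match_all failed: ', preg_error_name(preg_last_error()), "\n";
} else {
    echo "matched $result times\n";
}
```

If this reports PREG_BACKTRACK_LIMIT_ERROR or PREG_RECURSION_LIMIT_ERROR, raising the corresponding ini value is at least aimed at the right limit. Any other result suggests the cause lies elsewhere.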

To really solve the underlying problem, you have to optimize your expression and, if possible, split the complex expression into parts and move some of the logic into PHP. I hope the idea becomes clear from an example. Instead of trying to find the substructure directly with a single PCRE, I demonstrate a more "iterative" approach that goes deeper and deeper into the structure using PHP. Example:

    <?php
    $html = file_get_contents("huge_input.html");

    // first find all tables, and work on those later
    $res = preg_match_all("!<table.*>(?P<content>.*)</table>!isU", $html, $table_matches);

    if ($res) foreach($table_matches['content'] as $table_match) {
        // now find all cells in each table that was found earlier ..
        $res = preg_match_all("!<td.*>(?P<content>.*)</td>!isU", $table_match, $cell_matches);

        if ($res) foreach($cell_matches['content'] as $cell_match) {
            // imagine going deeper and deeper into the structure here...
            echo "found a table cell! content: ", $cell_match;
        }
    }

This error is not related to the performance of the regular expression, but to the regular expression itself. Changing pcre.backtrack_limit and pcre.recursion_limit will have no effect, because the regex never gets a chance to run: it already fails while being compiled. The problem is that the regex is too large, and the solution is to make the regex smaller - much smaller.
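What "smaller" looks like depends on the pattern, which the question does not show. As one purely hypothetical case: if the huge regex is a long alternation of literal words, a tiny generic pattern plus an array lookup in PHP can do the same job. The word list and the function name find_known_words below are invented for illustration.

```php
<?php
// Hypothetical scenario: instead of compiling /\b(?:word1|word2|...)\b/
// with many thousands of alternatives, match every word with a short
// pattern and filter the hits in PHP.
function find_known_words($text, array $words)
{
    // Flip the list so that membership tests become hash lookups.
    $known = array_flip($words);

    $found = array();
    if (preg_match_all('/\b\w+\b/', $text, $matches)) {
        foreach ($matches[0] as $candidate) {
            if (isset($known[$candidate])) {
                $found[] = $candidate;
            }
        }
    }
    return $found;
}

$words = array('apple', 'cherry', 'mango');   // imagine 100,000 entries here
print_r(find_known_words('apple pie and cherry jam', $words));
```

The compiled pattern stays a few bytes long no matter how many words are in the list, so the size limit of the regex compiler no longer matters.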


I am writing this answer because I ran into the same problem. As Alan Moore pointed out, adjusting the backtrack and recursion limits won't help solve the problem.

The described error occurs when the needle (the pattern) exceeds the maximum size that the underlying PCRE library allows. It is NOT caused by PHP itself, but by the PCRE library. This is error message #20, which is defined here:

https://github.com/php/.../pcre_compile.c#L477

PHP just prints the error text it gets from the PCRE library when an error occurs.
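Because the text comes from PCRE, it arrives as an ordinary PHP warning while the call itself returns false. Here is a small sketch that captures that warning with a temporary error handler. It uses a deliberately broken pattern as a stand-in for the oversized one, and the function name pcre_compile_error is made up.

```php
<?php
// Hypothetical helper: returns the warning text PHP emits when the
// pattern cannot be compiled, or null when the pattern compiles fine.
function pcre_compile_error($pattern)
{
    $message = null;
    set_error_handler(function ($errno, $errstr) use (&$message) {
        $message = $errstr;
        return true; // mark as handled so the warning is not printed
    });
    $result = preg_match($pattern, '');
    restore_error_handler();

    return $result === false ? $message : null;
}

var_dump(pcre_compile_error('/abc/'));      // NULL
echo pcre_compile_error('/(abc/'), "\n";    // a "Compilation failed: ..." message
```

The exact wording after "Compilation failed:" depends on the PCRE version PHP was built against, so it is safer to test for that prefix than for the full text.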

In my environment, the error appears when I try to use previously captured fragments as a needle and they are larger than 32 KB.

It can easily be tested with this simple script from the PHP CLI:

    <?php
    // This script demonstrates the above error and prints the needle length
    // on every iteration, until the needle is too long or 64k is reached.
    $expand = $needle = "_^b_";

    while ( ! preg_match( $needle, "Stack Exchange Demo Text" ) )
    {
        // Die after 64 kbytes of accumulated chunk needle
        // Adjust to 32k for a better illustration
        if ( strlen($expand) > 1024*64 ) die();

        // First pass: drop the seed pattern and start with an empty chunk
        if ( $expand == "_^b_" ) $expand = "";

        $expand .= "a";
        // Build the needle from the accumulated chunk ("_" is the delimiter)
        $needle = '_^'.$expand.'_ism';
        echo strlen($needle)."\n";
    }
    ?>

To fix the error, either the resulting needle has to be shortened, or, if everything needs to be matched, several preg_match calls with the additional offset parameter have to be used.

    <?php
    // Shortened needle: match only the start and the end of the big chunk
    if ( preg_match(
            '/'.preg_quote(
                substr( $big_chunk, 0, 20*1024 ), // 1st 20k chars
                '/'                               // also escape the delimiter
            )
            .'.*?'.
            preg_quote(
                substr( $big_chunk, -5 ),         // last 5
                '/'
            )
            .'/',
            $subject
    ) ) {
        // do stuff
    }

    // The "match all needles in the text" attempt
    if ( preg_match( $needle_of_1st_32kbytes_chunk, $subj, $matches,
            $flags = 0, $offset = 32*1024*0 // Offset -> 0
         )
      && preg_match( $needle_of_2nd_32kbytes_chunk, $subj, $matches,
            $flags = 0, $offset = 32*1024*1 // Offset -> 32k
         )
      // && ... as many preg matches as needed
    ) {
        // do stuff
    }
    // it would be nicer to put the texts in a foreach-loop iterating
    // over the existing chunks
    ?>

You get the idea.
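The foreach-loop variant could look roughly like the sketch below. It is one possible reading of the idea, not the exact code above: it splits the big literal into small pieces, finds the first piece anywhere in the subject, and then requires every following piece to match directly after the previous one by combining the offset parameter with the \G anchor. The function name subject_contains_chunked and the default chunk size of 1024 bytes are arbitrary choices.

```php
<?php
// Hypothetical helper: checks whether $bigChunk occurs literally in
// $subject without ever compiling one huge pattern.
function subject_contains_chunked($subject, $bigChunk, $chunkSize = 1024)
{
    if ($bigChunk === '') {
        return true;
    }
    $chunks = str_split($bigChunk, $chunkSize);
    $first  = '/' . preg_quote(array_shift($chunks), '/') . '/';

    $searchFrom = 0;
    while (preg_match($first, $subject, $m, PREG_OFFSET_CAPTURE, $searchFrom) === 1) {
        $start = $m[0][1];                    // where the first chunk was found
        $pos   = $start + strlen($m[0][0]);   // where the next chunk must begin
        $ok    = true;

        foreach ($chunks as $chunk) {
            // \G anchors the match at the offset given as the 5th argument.
            $pattern = '/\G' . preg_quote($chunk, '/') . '/';
            if (preg_match($pattern, $subject, $unused, 0, $pos) !== 1) {
                $ok = false;
                break;
            }
            $pos += strlen($chunk);
        }
        if ($ok) {
            return true;
        }
        $searchFrom = $start + 1; // retry from the next candidate position
    }
    return false;
}

$big     = 'NEEDLE-' . str_repeat('x', 5000) . '-END';
$subject = str_repeat('ab', 50) . $big;
var_dump(subject_contains_chunked($subject, $big));        // bool(true)
var_dump(subject_contains_chunked($subject, $big . '!'));  // bool(false)
```

For a purely literal needle like this one, strpos() would be simpler. The sketch only shows how the offset parameter lets several small patterns stand in for one oversized pattern.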

Although this answer comes very late, I hope it still helps people who run into this problem without finding a good explanation of what causes the error.
