Are sequential strpos() calls faster than a single preg_match()?

Question


I need to check whether any of the strings "hello", "i am", or "dumb" occurs in a longer string called $ohreally. If even one of them is found, my test is complete, and I know that if one of them appears, none of the others will.

Under these conditions, I'd like help finding the most efficient way to write this search.

Three strpos() calls, like this?

if (strpos ($ohreally, 'hello')){return false;} else if (strpos ($ohreally, 'i am')){return false;} else if (strpos ($ohreally, 'dumb')){return false;} else {return true;}

Or a single preg_match()?

 if (preg_match('hello'||'i am'||'dumb', $ohreally)) {return false} else {return true};

I know the preg_match code is incorrect; I would really appreciate it if someone could suggest the correct version.

Thanks!

Answer

Please read what cletus said and the test that middaparka ran. I also ran a microtime test on different strings, long and short, with these results.

If you know how likely each string is, order the checks from most probable to least probable. (I did not notice any meaningful difference when reordering the alternatives in the regular expression itself, that is, between /hello|i am|dumb/ and /i am|dumb|hello/.)

With consecutive strpos() calls, on the other hand, probability matters. For example, if "hello" appears 90% of the time, "i am" 7%, and "dumb" 3%, you would want to organize your code to check "hello" first and exit the function as early as possible.
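The ordering advice above can be sketched as a small loop that tries the needles in order of probability and returns at the first hit. This is a minimal sketch, not the poster's code: the function name containsNone() is hypothetical, and it keeps the question's convention of returning false when a needle is found.

```php
<?php
// Hypothetical helper: returns true when none of the needles occur in $haystack
// (same true/false convention as the question's code).
// Needles are tried in array order and the loop exits at the first hit,
// so list the most probable needle first.
function containsNone(string $haystack, array $needles): bool
{
    foreach ($needles as $needle) {
        // Compare strictly with false: a match at offset 0 returns int(0).
        if (strpos($haystack, $needle) !== false) {
            return false;
        }
    }
    return true;
}

// Assumed order, based on the 90% / 7% / 3% example frequencies.
$needles = ['hello', 'i am', 'dumb'];

var_dump(containsNone('hello world', $needles));    // bool(false)
var_dump(containsNone('nothing to see', $needles)); // bool(true)
```

Because the loop stops at the first match, a haystack containing the most common needle costs a single strpos() call.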

My microtime tests show this.

For haystacks A, B, and C, in which the needle is found by the first, second, and third strpos() respectively, the times are as follows:

strpos():
A: 0.00450 seconds // 1 strpos()
B: 0.00911 seconds // 2 strpos()
C: 0.00833 seconds // 3 strpos()
C: 0.01180 seconds // 4 strpos() (one extra added)

And for preg_match():
A: 0.01919 seconds // 1 preg_match()
B: 0.02252 seconds // 1 preg_match()
C: 0.01060 seconds // 1 preg_match()

As the numbers show, strpos() stays faster until a fourth call is added, so I will use it, since I only have three substrings to check :)


performance optimization string php search

Mohammad Jan 19 '10 at 12:53


3 answers

Crazy idea, but why not run each approach a thousand times in two separate loops, each surrounded by microtime() calls and the relevant debug output?

Based on the code above (with a few fixes), for 1000 iterations I get something like:
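The two-loop measurement described above might look like the following. This is a minimal sketch under stated assumptions: the haystack is made up, $n = 1000 mirrors the iteration count mentioned in the answer, and the strpos() checks use the strict !== false comparison. Timings will vary by machine and PHP version, so no expected numbers are shown.

```php
<?php
// Made-up haystack: the only needle present is at the very end.
$ohreally = str_repeat('lorem ipsum ', 50) . 'dumb';
$n = 1000;

// Loop 1: three sequential strpos() checks, timed with microtime(true).
$start = microtime(true);
for ($i = 0; $i < $n; $i++) {
    $found = strpos($ohreally, 'hello') !== false
        || strpos($ohreally, 'i am') !== false
        || strpos($ohreally, 'dumb') !== false;
}
$strposTime = microtime(true) - $start;

// Loop 2: one preg_match() with an alternation.
$start = microtime(true);
for ($i = 0; $i < $n; $i++) {
    $found = preg_match('/hello|i am|dumb/', $ohreally) === 1;
}
$pregTime = microtime(true) - $start;

printf("strpos test: %.6f preg_match test: %.6f\n", $strposTime, $pregTime);
```

Changing where (or whether) a needle appears in the haystack is an easy way to reproduce the A/B/C comparison from the question.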

 strpos test:     0.003315
 preg_match test: 0.014241

So in this case (with the caveats others have mentioned), strpos() really does seem to be faster, albeit to a largely meaningless degree. (The joy of pointless micro-optimization, etc.)

Never guess what you can measure.


John Parker Jan 19 '10 at 13:03


It depends on the number of strings you want to find and the length of the string you are searching in.

You will need to experiment with a representative dataset to find out which is faster (repeat the operation, say, 1000 times and measure the elapsed time).

BTW, I think the regex you are looking for is '(hello|i am|dumb)'

Also, your code is more verbose than it needs to be:

 return strpos($ohreally, 'hello') !== false
     || strpos($ohreally, 'i am') !== false
     || strpos($ohreally, 'dumb') !== false;

or

 return preg_match('(hello|i am|dumb)',$ohreally);
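To check that the two one-liners above agree, they can be wrapped in functions and run against a few sample strings. The names anyStrpos() and anyRegex() are hypothetical. Both return true when a needle is found, which is the opposite of the question's convention, so the asker would negate the result with `!`. Note that '(hello|i am|dumb)' is valid because PCRE in PHP accepts matching parentheses as pattern delimiters.

```php
<?php
// Both hypothetical functions return true if any needle occurs in $ohreally.
function anyStrpos(string $ohreally): bool
{
    return strpos($ohreally, 'hello') !== false
        || strpos($ohreally, 'i am') !== false
        || strpos($ohreally, 'dumb') !== false;
}

function anyRegex(string $ohreally): bool
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match('(hello|i am|dumb)', $ohreally) === 1;
}

foreach (['hello there', 'so i am here', 'not dumber', 'nothing'] as $s) {
    printf("%-14s strpos=%d regex=%d\n", $s, anyStrpos($s), anyRegex($s));
}
```

'not dumber' counts as a match for both, since neither version checks word boundaries.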

In addition, by all common coding standards, there should be no space between the function name and the opening parenthesis.

C.


symcbean Jan 19 '10 at 13:03


cletus · Accepted Answer · 2010-01-19T12:58:10+0000

The correct syntax is:

 preg_match('/hello|i am|dumb/', $ohreally);

I doubt there's much in it, but I would not be surprised if the strpos() method were faster, depending on the number of strings you are searching for. The performance of strpos() will degrade as the number of search terms increases. The regex will probably degrade too, but not as quickly.
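When the list of search terms grows, the regex approach can be kept to a single call by building the alternation from an array. This is a sketch: buildAlternation() is a hypothetical helper, the extra needle 'a.b' is only there to show escaping, and the arrow function requires PHP 7.4+. preg_quote() escapes regex metacharacters (here the '.') plus the given '/' delimiter, so the needles are matched literally. Whether this beats several strpos() calls for a given list still has to be measured.

```php
<?php
// Hypothetical helper: builds one alternation pattern from literal needles.
function buildAlternation(array $needles): string
{
    // Escape each needle so characters like '.' or '|' are matched literally.
    $quoted = array_map(fn (string $n): string => preg_quote($n, '/'), $needles);
    return '/' . implode('|', $quoted) . '/';
}

$pattern = buildAlternation(['hello', 'i am', 'dumb', 'a.b']);
echo $pattern, "\n";                               // /hello|i am|dumb|a\.b/
var_dump(preg_match($pattern, 'xx a.b yy') === 1); // bool(true)
var_dump(preg_match($pattern, 'xx aXb yy') === 1); // bool(false)
```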

Regular expressions are obviously more powerful. For example, if you want to match the word "dumb" but not "dumber", that is easy to do with:

 preg_match('/\b(hello|i am|dumb)\b/', $ohreally);

which is much harder to do with strpos().
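A quick check of the word-boundary pattern shown above against a few sample strings, chosen here for illustration. Note that strpos('othello', 'hello') would report a match at offset 2, while the \b version rejects it.

```php
<?php
// \b restricts matches to whole words: "dumb" matches, "dumber" does not.
$pattern = '/\b(hello|i am|dumb)\b/';

foreach (['so dumb', 'even dumber', 'hello!', 'othello'] as $s) {
    // preg_match() returns 1 for a match, 0 for no match.
    printf("%-12s %d\n", $s, preg_match($pattern, $s));
}
```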

Note: technically, \b is a zero-width word boundary. "Zero-width" means that it does not consume any of the input string, and "word boundary" means that it matches at the start or end of the string (next to a word character), at a transition from word characters (digits, letters, or underscores) to non-word characters, or at a transition from non-word characters to word characters. Very useful.

Edit: It is also worth noting that this use of strpos() is wrong (though many people make this mistake). Namely:

 if (strpos ($ohreally, 'hello')) { ... }

will not enter the condition block if the needle is at position 0 in the string. The correct usage is:

 if (strpos ($ohreally, 'hello') !== false) { ... }

because of type juggling: otherwise, 0 is converted to false.
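The type-juggling pitfall above is easy to see by inspecting what strpos() returns when the needle is at the very start of the string. The sample string is made up; str_contains() is a real function but only exists in PHP 8.0+, so it is guarded by a version check here.

```php
<?php
$ohreally = 'hello world';

$pos = strpos($ohreally, 'hello');
var_dump($pos);           // int(0): found at the very start
var_dump((bool) $pos);    // bool(false): 0 is falsy, so if ($pos) skips the block
var_dump($pos !== false); // bool(true): strict comparison gets it right

// PHP 8.0+ only: str_contains() returns a plain bool, avoiding the pitfall.
if (PHP_VERSION_ID >= 80000) {
    var_dump(str_contains($ohreally, 'hello')); // bool(true)
}
```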
