Regular expression to match a word of length 1

Question

I am trying to parse product names that use several different abbreviations for sizes. For example, medium may be written as

m, medium, med

I tried a simple

 preg_match('/m|medium|med/i',$prod_name,$matches);

which works for "m xyz product". However, when I try "product s / m abc", I get a false positive match. I also tried

 preg_match('/\bm\b|\bmedium\b|\bmed\b/i',$prod_name,$matches);

to make it match whole words only, but the m in s / m is still matched. I assume this is because the engine treats the "/" in the name as a word delimiter?

So, to summarize, I need to match "m" in the string, but not "s / m" or "small", etc. Any help is appreciated.

+4

php regex

Conor May 29 '12 at 23:02

3 answers

I always think about these things in ERE. According to re_format(7), the ERE word boundaries [[:<:]] and [[:>:]] match the null string at the beginning and end of a word, respectively. So, assuming preg understands the ERE notation, I might go with:

 /[[:<:]](m(ed(ium)?)?)[[:>:]]/

Or for readability, perhaps:

 /[[:<:]](m|med|medium)[[:>:]]/

In PHP, however, you can use the native preg (PCRE) syntax instead of ERE. In PCRE, \b denotes a word boundary, so:

 preg_match('/\b(m(ed(ium)?)?)\b/', $prod_name, $matches);
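As a rough check that the nested optional groups and the flat alternation behave the same way, here is a minimal sketch. The product names are made up for illustration, and the /i flag is an addition carried over from the question, not part of the pattern above.

```php
<?php
// Nested optional groups: "m", optionally followed by "ed", optionally by "ium".
$nested = '/\b(m(ed(ium)?)?)\b/i';
// Flat alternation; the trailing \b forces backtracking to the longest whole word.
$flat = '/\b(m|med|medium)\b/i';

// Hypothetical product names, for illustration only.
$names = ['m xyz product', 'Med blue shirt', 'medium red shirt', 'small green shirt'];

$results = [];
foreach ($names as $name) {
    // Store the captured size word from each pattern, or null when nothing matched.
    $results[$name] = [
        preg_match($nested, $name, $a) === 1 ? $a[1] : null,
        preg_match($flat, $name, $b) === 1 ? $b[1] : null,
    ];
}

var_export($results);
```

Both patterns skip "small" because there is no word boundary between the "s" and the "m".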

+1

ghoti May 30 '12 at 1:40

Try this, it should match medium, med and m.

 medium|med|^m$

0

David May 29 '12 at 23:09

Amadan · Accepted Answer · 2012-05-29T23:11:12+0000

 %\b(?<![/-])(m|med|medium)(?![/-])\b%

You can use a negative lookbehind and lookahead to exclude the unwanted cases. This matches "m"/"med"/"medium" as a word of its own that is neither preceded nor followed by a slash or dash. It also works at the beginning and at the end of a line, since a negative lookahead/lookbehind does not require a character to be present there.
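A minimal sketch of how the negative-lookaround pattern behaves, assuming the slash or dash sits directly next to the letter (as in "s/m"). The sample names are hypothetical and the /i flag is an addition carried over from the question.

```php
<?php
// Whole word m/med/medium, not directly preceded or followed by "/" or "-".
$pattern = '%\b(?<![/-])(m|med|medium)(?![/-])\b%i';

// Hypothetical product names, for illustration only.
$samples = [
    'm xyz product',
    'product med abc',
    'shirt medium',
    'product s/m abc',
    'product m-l abc',
    'small product',
];

$results = [];
foreach ($samples as $name) {
    // true when the name contains a standalone medium marker
    $results[$name] = preg_match($pattern, $name) === 1;
}

var_export($results);
```

The lookarounds inspect only the single character adjacent to the word, so a spaced form such as "s / m" is not excluded by this pattern.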

If you want to treat only whitespace as a delimiter, you can use the positive version:

 %\b(?<=\s|^)(m|med|medium)(?=\s|$)\b%

("m"/"med"/"medium" preceded by whitespace or the beginning of the line, and followed by whitespace or the end of the line)
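For comparison, a small sketch of the whitespace-delimited version. Again the product names are invented for illustration, and the /i flag is an assumption taken from the question.

```php
<?php
// m/med/medium must be bounded by whitespace or by the start/end of the string.
$pattern = '%\b(?<=\s|^)(m|med|medium)(?=\s|$)\b%i';

// Hypothetical product names, for illustration only.
$samples = [
    'm xyz product',
    'shirt med',
    'product (m) abc',
    'product s/m abc',
    'medium, red',
];

$results = [];
foreach ($samples as $name) {
    // true only when the size word is surrounded by whitespace or string edges
    $results[$name] = preg_match($pattern, $name) === 1;
}

var_export($results);
```

This version is stricter than the negative one: any adjacent punctuation, such as parentheses or a trailing comma, prevents a match, which may or may not be what you want for your data.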
