Python regex needed to handle a substring

I want to check whether a string (a product name) contains the word beta. I am not very good at regular expressions. For example:

"Crome beta" "Crome_beta" "Crome beta2" "Crome_betaversion" "Crome 3beta" "CromeBeta2.3" "Beta Crome 4"

In that case I want to raise an error, because such a name is not a valid product name but a beta version of the product. I wrote a regex that handles the strings above:

    import logging
    import re

    product_name = "Crome beta"  # one of the example names above
    parse_beta = re.compile("(beta)", re.I)
    if parse_beta.search(product_name):
        logging.error('Invalid product name')

But if the product name contains a word that has beta inside it as a substring, like "tibetan product", the regex above treats it as a beta version and raises the error. I want to handle this case. Can anyone suggest a regex?
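A minimal reproduction of the problem, using names from the question: re.search() looks for the pattern anywhere in the string, so the "beta" inside "tibetan" matches just like a real beta marker.

```python
import re

# The question's pattern: "beta" in a capturing group, case-insensitive.
parse_beta = re.compile("(beta)", re.I)

# search() scans the whole string, so the letters b-e-t-a inside "tibetan" match too.
results = {name: bool(parse_beta.search(name))
           for name in ["Crome beta", "CromeBeta2.3", "tibetan product"]}
print(results)
```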

Thank you very much.

4 answers

We need to cover every form of beta name that the regex should match.

So we start writing the pattern with the first example, "Crome beta":

 ' [Bb]eta' 

We use [Bb] to match either B or b right after the space.

The second example, "Crome_beta", adds _ as a delimiter:
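A quick check of this first step (the sample names are taken from the question): the pattern accepts a space before beta or Beta, but it does not yet handle the underscore in "Crome_beta".

```python
import re

# Step 1: a space, then "beta" or "Beta".
step1 = re.compile(r' [Bb]eta')

results = {name: bool(step1.search(name))
           for name in ["Crome beta", "Crome Beta", "Crome_beta", "tibetan product"]}
print(results)  # "Crome_beta" is not matched yet: there is no space before beta
```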

 '[ _][Bb]eta' 

The third example, "Crome beta2", and the fourth, "Crome_betaversion", are already covered by this pattern.

The fifth example, "Crome 3beta", forces us to change the pattern like this:
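A small check of the second step, assuming the sample names from the question: with [ _] the underscore cases now match, while "Crome 3beta" still fails because a digit sits between the space and beta.

```python
import re

# Step 2: a space or underscore, then "beta" or "Beta".
step2 = re.compile(r'[ _][Bb]eta')

names = ["Crome_beta", "Crome beta2", "Crome_betaversion", "Crome 3beta"]
results = {name: bool(step2.search(name)) for name in names}
print(results)  # only "Crome 3beta" fails: "3" separates the space from beta
```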

 '[ _]\d*[Bb]eta' 

where \d matches a digit, like [0-9], and * allows zero or more \d characters.

The sixth example, "CromeBeta2.3", shows that beta may have no preceding _ or space and simply start with a capital letter. We cover it with the | alternation, which acts as an "or" operator in a regex:
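Checking the third step the same way (sample names from the question): \d* lets optional digits sit between the delimiter and beta, and because * also allows zero digits, the earlier cases still match. "CromeBeta2.3" has no delimiter at all, so it is still missed.

```python
import re

# Step 3: delimiter, optional digits, then "beta" or "Beta".
step3 = re.compile(r'[ _]\d*[Bb]eta')

names = ["Crome beta", "Crome 3beta", "Crome 3Beta", "CromeBeta2.3"]
results = {name: bool(step3.search(name)) for name in names}
print(results)  # "CromeBeta2.3" has no space or underscore, so it does not match
```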

 '[ _]\d*[Bb]eta|Beta' 
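A check of the alternation (sample names from the question; "beta Chrome 4" is the variant discussed next): the bare, case-sensitive Beta alternative picks up "CromeBeta2.3" and "Beta Crome 4" without matching the lowercase "beta" inside "tibetan", but a name that starts with lowercase beta is still missed.

```python
import re

# Step 4: the previous pattern OR a capitalised "Beta" anywhere.
step4 = re.compile(r'[ _]\d*[Bb]eta|Beta')

names = ["CromeBeta2.3", "Beta Crome 4", "beta Chrome 4", "tibetan product"]
results = {name: bool(step4.search(name)) for name in names}
print(results)  # a leading lowercase "beta" is not covered yet
```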

The seventh example, "Beta Crome 4", is already matched by the shortest alternative, Beta (the name starts with Beta). But it could also be "beta Chrome 4", so we change the pattern like this:

 '[ _]\d*[Bb]eta|Beta|^beta ' 

We do not use ^[Bb]eta, since a leading Beta is already matched by the Beta alternative.
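A check of the final pattern on a few names that should and should not match. "betamax" and "Cromebeta2.3" are examples from the test code in this answer. Note that ^beta requires a trailing space, so a name that is exactly "beta" would not match; that follows from the pattern as written.

```python
import re

# Final pattern from this answer (note the trailing space after ^beta).
final = re.compile(r'[ _]\d*[Bb]eta|Beta|^beta ')

names = ["beta Chrome 4", "betamax", "Cromebeta2.3", "tibetan product"]
results = {name: bool(final.search(name)) for name in names}
print(results)  # only "beta Chrome 4" matches
```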

I should also mention that we cannot use re.I, because the regex has to distinguish between Beta and beta: with re.I, the bare Beta alternative would also match the beta inside "tibetan".

Here is the test code (for Python 2.7):
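The re.I problem in two lines: the same pattern rejects "tibetan product" when it is case-sensitive, and accepts it with re.I, because the Beta alternative then matches lowercase "beta" as well.

```python
import re

PATTERN = r'[ _]\d*[Bb]eta|Beta|^beta '

case_sensitive = bool(re.search(PATTERN, "tibetan product"))
# With re.I the bare "Beta" alternative also matches the "beta" inside "tibetan".
ignore_case = bool(re.search(PATTERN, "tibetan product", re.I))
print(case_sensitive, ignore_case)  # False True
```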

    from __future__ import print_function  # lets the same code run on Python 3
    import re
    import sys

    match_tests = [
        "Crome beta", "Chrome Beta", "Crome_beta", "Crome beta2",
        "Crome_betaversion", "Crome 3beta", "Crome 3Beta", "CromeBeta2.3",
        "Beta Crome 4", "beta Chrome ",
        "Cromebeta2.3",  # no match
        "betamax",       # no match
        "Betamax",       # matches via the Beta alternative
    ]
    compiled = re.compile(r'[ _]\d*[Bb]eta|Beta|^beta ')
    for test in match_tests:
        search_result = compiled.search(test)
        if search_result is not None:
            print("{}: OK".format(test))
        else:
            print("{}: No match".format(test), file=sys.stderr)

I do not see the need for a negative lookbehind. Also, you used a capturing group, (beta) (the parentheses), which is not needed here; it only adds a little extra work for the regex engine.
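To illustrate the point about the capturing group: the parentheses do not change what is matched; they only record an extra copy of the matched text. Any speed difference is not measured here. If grouping is ever needed (for example around an alternation), (?:...) groups without capturing.

```python
import re

name = "Crome_betaversion"
capturing = re.search(r"(beta)", name)
plain = re.search(r"beta", name)

# Same span either way; only the capturing version stores a group.
print(capturing.span(), plain.span())      # (6, 10) (6, 10)
print(capturing.groups(), plain.groups())  # ('beta',) ()
# A non-capturing group matches the same text and stores nothing.
print(re.search(r"(?:beta)", name).groups())  # ()
```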


Try ((?<![a-z])beta|cromebeta) (that is, the word beta not preceded by a letter, or the whole word cromebeta).
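A sketch of this pattern against the question's examples. It assumes the pattern is compiled with re.I, as in the question's own code: with re.I, cromebeta also matches "CromeBeta", and [a-z] also covers A-Z. Without re.I, "CromeBeta2.3" and "Beta Crome 4" would not match.

```python
import re

# Assumption: re.I is used, as in the question's code.
lookbehind = re.compile(r"((?<![a-z])beta|cromebeta)", re.I)

names = ["Crome beta", "Crome_beta", "Crome beta2", "Crome_betaversion",
         "Crome 3beta", "CromeBeta2.3", "Beta Crome 4", "tibetan product"]
results = {name: bool(lookbehind.search(name)) for name in names}
print(results)  # every name except "tibetan product" matches
```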

Here is a quote from http://docs.python.org/library/re.html that explains the first part:

(?<!...) Matches if the current position in the string is not preceded by a match for .... This is called a negative lookbehind assertion. Similar to positive lookbehind assertions, the contained pattern must only match strings of some fixed length. Patterns which start with negative lookbehind assertions may match at the beginning of the string being searched.
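Two points from the quoted documentation, shown in a small sketch: a negative lookbehind succeeds at the very start of the string, because nothing precedes that position; and Python's re module rejects a lookbehind whose body can match different lengths, such as [a-z]+. The exact error text may vary between Python versions, so it is only printed here.

```python
import re

# Nothing precedes position 0, so the negative lookbehind succeeds there.
at_start = bool(re.search(r"(?<![a-z])beta", "beta version"))
print(at_start)  # True

# "[a-z]+" can match strings of different lengths, so re refuses to compile it.
try:
    re.compile(r"(?<![a-z]+)beta")
    error_message = None
except re.error as exc:
    error_message = str(exc)
print(error_message)
```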


From the examples you provided, it looks like your Product Name string really contains two concepts, the product and the version, separated by a space or an underscore. Use a regular expression that separates the two and searches for the word beta only in the version part.
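A minimal sketch of this idea, assuming the product part comes first and is separated from the version by the first space or underscore. The helper name is_beta_version is hypothetical. Names that do not have that shape, such as "CromeBeta2.3" (no separator) or "Beta Crome 4" (beta before the product), are reported as not beta by this sketch and would need separate handling.

```python
import re


def is_beta_version(product_name):
    """Hypothetical helper: check for beta only in the part after the first space/underscore."""
    # Split once: "Crome_betaversion" -> ["Crome", "betaversion"].
    parts = re.split(r"[ _]", product_name, maxsplit=1)
    if len(parts) < 2:
        return False  # no separator, so there is no version part to inspect
    version = parts[1]
    return re.search(r"beta", version, re.I) is not None


for name in ["Crome beta2", "Crome 3beta", "tibetan product"]:
    print(name, "->", is_beta_version(name))
```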

 "[Bb]eta(\d+|$|version)|^[Bb]eta " 

Tested with grep:

    kent$ cat a
    Crome beta
    Crome_beta
    Crome beta2
    Crome_betaversion
    Crome 3beta
    CromeBeta2.3
    tibetans product
    Beta Crome 4
    kent$ grep -P "[Bb]eta(\d+|$|version)|^[Bb]eta " a
    Crome beta
    Crome_beta
    Crome beta2
    Crome_betaversion
    Crome 3beta
    CromeBeta2.3
    Beta Crome 4
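The same check in Python, as a rough equivalent of the grep run above. grep tests each line separately, so here each name is its own string and $ means the end of that name; the input lines are copied from the file used with grep.

```python
import re

pattern = re.compile(r"[Bb]eta(\d+|$|version)|^[Bb]eta ")

lines = ["Crome beta", "Crome_beta", "Crome beta2", "Crome_betaversion",
         "Crome 3beta", "CromeBeta2.3", "tibetans product", "Beta Crome 4"]
# Keep the lines where the pattern is found, like grep does.
matched = [line for line in lines if pattern.search(line)]
print(matched)  # everything except "tibetans product"
```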
