Finding multiplication operands using regex - Python

I am trying to search a file for expressions such as A * B.

A and B may consist of any of [A-Z], [a-z], [0-9] and may include < > ( ) [ ] _ . etc., but not commas, semicolons, spaces, newlines, or any arithmetic operator (+ - \ *). These are the 8 separators. There may also be spaces between A, * and B. Also, the number of opening brackets should equal the number of closing brackets in A and in B.

I tried unsuccessfully something like this (without considering the statements inside A and B):

    import re

    # Compile the pattern once, outside the loop
    p = re.compile(r"( |,|;)(.*)[*](.*)( |,|;|\n)")

    with open("test", "r") as fp:
        for line in fp:
            m = p.match(line)
            if m:
                print('Match found', m.group())
            else:
                print('No match')

Example 1:

(A1 * B1.list(), C * D * E) should give 3 matches:

  • A1 * B1.list()
  • C * D
  • D * E

An extension of the problem would be to let A and B contain commas, semicolons, spaces, newlines or any arithmetic operator (+ - \ *) when these appear inside brackets:

Example 2:

(A * B.max(C * D, E)) must give 2 matches:

  • A * B.max(C * D, E)
  • C * D

I am new to regular expressions and am curious to find a solution to this.

1 answer

Regular expressions have limitations, and the line between what a regex can match and what needs a real grammar is a fine one. IMO using a parser is the more robust solution in your case.

The examples in the question involve nested (recursive) patterns, such as balanced brackets inside an operand. Here again a parser is superior to regex.

Take a look at this suggested solution: Parsing equations in Python.
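To make the parser idea concrete, here is a minimal hand-written scanner, offered as a sketch rather than a full solution. For every `*` it walks left and right while tracking bracket depth, so separators inside brackets are allowed (the extension in Example 2). The names `find_products` and `_scan` are made up for this illustration. It assumes that only `( )` and `[ ]` need balancing (`<` and `>` are treated as ordinary characters), it treats both `/` and `\` as operators, and it does not check that a `(` is closed by `)` rather than `]`.

```python
OPENERS = "(["
CLOSERS = ")]"
# Separators from the question; "/" is added alongside "\" as an assumption.
SEPARATORS = set(" ,;\n\t+-/\\*")


def _scan(text, start, step):
    """Walk from `start` in direction `step` (+1 or -1); return the operand or None."""
    # When walking left, a closing bracket opens a nested region and vice versa.
    opening, closing = (OPENERS, CLOSERS) if step == 1 else (CLOSERS, OPENERS)
    i = start
    # Skip spaces between the operator and the operand.
    while 0 <= i < len(text) and text[i] == " ":
        i += step
    begin = i
    depth = 0
    while 0 <= i < len(text):
        ch = text[i]
        if ch in opening:
            depth += 1
        elif ch in closing:
            if depth == 0:
                break  # bracket belongs to an enclosing expression
            depth -= 1
        elif depth == 0 and ch in SEPARATORS:
            break  # separators only end the operand outside brackets
        i += step
    if depth != 0 or i == begin:
        return None  # unbalanced brackets or empty operand
    return text[begin:i] if step == 1 else text[i + 1:begin + 1]


def find_products(text):
    """Return every 'A * B' pair found in `text`, including nested ones."""
    matches = []
    for pos, ch in enumerate(text):
        if ch != "*":
            continue
        left = _scan(text, pos - 1, -1)
        right = _scan(text, pos + 1, +1)
        if left and right:
            matches.append(f"{left} * {right}")
    return matches


print(find_products("(A1 * B1.list(), C * D * E)"))
print(find_products("(A * B.max(C * D, E))"))
```

On the two examples from the question this yields the three and two matches listed there. For real source files a proper tokenizer or grammar is still the safer route, since this sketch knows nothing about string literals, comments or multi-character operators.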
