Python regex: XOR operator

Suppose I have lines like this:

  • "DT NN IN NN"
  • "DT RB JJ NN"
  • "DT JJ JJ NN"
  • "DT RB RB NN NN"
  • "DT RB RB"

So, I have a list of lines:

lines = ["DT NN IN NN", "DT RB JJ NN", "DT JJ JJ NN", "DT RB RB NN NN", "DT RB RB"]

I have the following code:

import re

pattern = r"(?:DT\s+)+([?:RB\s+|?:JJ\s+])+(?:NN\s+)*NN$"
for item in lines:
    m = re.match(pattern, item)
    if m:
        print(item)

What I want: the pattern should match lines that start with DT (appearing one or more times), followed by either RB or JJ (appearing one or more times) but not both, and ending with NN (again appearing one or more times).

So in the end I should get lines 3 and 4 printed. However, my regular expression also matches line 2, which I do not want. How can I change the pattern to make it work? How do I replace the pipe (OR) with XOR?

Answer:

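Your pattern matches RB JJ because the square brackets create a character class, not a group. Use a group with the | (pipe) instead, and apply the repetition (+) to each alternative separately. Try: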

pattern = r"(?:DT\s+)+(?:(RB\s+)+|(JJ\s+)+)(?:NN\s+)*NN$"
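A quick check of that pattern against the question's sample lines. The variable name lines is used here instead of list, to avoid shadowing Python's built-in list:

```python
import re

lines = ["DT NN IN NN", "DT RB JJ NN", "DT JJ JJ NN", "DT RB RB NN NN", "DT RB RB"]

# Alternation between two separately repeated groups: a run of RBs or a run of JJs, never a mix
pattern = r"(?:DT\s+)+(?:(RB\s+)+|(JJ\s+)+)(?:NN\s+)*NN$"

matched = [item for item in lines if re.match(pattern, item)]
for item in matched:
    print(item)
# DT JJ JJ NN
# DT RB RB NN NN
```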

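Also note that (?:<something>) is a non-capturing group: <something> is matched, but its text is not saved, so you cannot retrieve it afterwards. That is fine here, since you only print item (the whole line). If you do want to get at the parts of the match, use ordinary capturing groups: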

pattern = r"(DT\s+)+((RB\s+)+|(JJ\s+)+)(NN\s*)*NN$"

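Group 0 always holds the whole match, and the other groups are numbered by their opening parentheses from left to right. Note that a group repeated with + or * keeps only its last repetition.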
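A small sketch of how those numbered groups come out for one of the sample lines. repr is used so the trailing whitespace captured by \s+ is visible:

```python
import re

# Capturing version: groups are numbered by their opening parenthesis
pattern = r"(DT\s+)+((RB\s+)+|(JJ\s+)+)(NN\s*)*NN$"
m = re.match(pattern, "DT RB RB NN NN")

print(repr(m.group(0)))  # 'DT RB RB NN NN'  (the whole match)
print(repr(m.group(2)))  # 'RB RB '          (the entire RB-or-JJ run)
print(repr(m.group(3)))  # 'RB '             (only the LAST repetition of (RB\s+))
print(m.group(4))        # None              (the JJ branch did not take part)
```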

Answer:

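[...] is a character class: it matches exactly one character from the set, and inside it characters such as ?, : + and | have no special meaning and are matched literally. So ([...])+ simply matches any run of R, B, J, whitespace and the other listed characters, in any order.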
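The behaviour of the question's bracket expression can be checked directly with re.fullmatch (Python 3.4+), which requires the whole string to match:

```python
import re

# The bracket expression from the question: a set of single characters
char_class = r"[?:RB\s+|?:JJ\s+]"

# The class itself matches exactly one character from the set...
print(bool(re.fullmatch(char_class, "R")))   # True
print(bool(re.fullmatch(char_class, "RB")))  # False: two characters
# ...so repeating it accepts any mix of R, B, J and whitespace, including "RB JJ "
print(bool(re.fullmatch(char_class + "+", "RB JJ ")))  # True
# Inside the brackets, ?, :, + and | are ordinary characters
print(bool(re.fullmatch(char_class + "+", "?:+|")))    # True
```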

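To match either a run of RB or a run of JJ, but not a mix of both, use alternation between two separately repeated groups: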

pattern = r"(?:DT\s+)+(?:(?:RB\s+)+|(?:JJ\s+)+)NN"

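Since re.match anchors only at the start of the string, this checks that a line begins with such a sequence; anything after the first NN is not examined.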

Demo: https://regex101.com/r/iH4lE6/1
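A sketch of what a start-only check means in practice. The line "DT RB NN JJ" is a hypothetical input made up for illustration, not one of the question's samples; re.fullmatch is used for comparison because it must consume the entire line:

```python
import re

pattern = r"(?:DT\s+)+(?:(?:RB\s+)+|(?:JJ\s+)+)NN"

# "DT RB NN JJ" is a hypothetical input with a stray JJ after the NN
for line in ["DT RB RB NN NN", "DT RB NN JJ"]:
    prefix_ok = re.match(pattern, line) is not None    # anchored at the start only
    whole_ok = re.fullmatch(pattern, line) is not None  # must consume the entire line
    print(line, prefix_ok, whole_ok)
# DT RB RB NN NN True False
# DT RB NN JJ True False
```

Both lines pass re.match and both fail re.fullmatch: this pattern stops after a single NN, so checking the whole line also requires allowing further NNs before the end.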

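Since you only need to know whether a line matches, you don't need to capture anything; that is why the groups are written as non-capturing (?:...) rather than (...).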
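The difference can be seen through the groups attribute of a compiled pattern, which reports how many capturing groups it has. The two patterns below are the non-capturing and capturing versions from these answers:

```python
import re

non_capturing = re.compile(r"(?:DT\s+)+(?:(?:RB\s+)+|(?:JJ\s+)+)NN")
capturing = re.compile(r"(DT\s+)+((RB\s+)+|(JJ\s+)+)(NN\s*)*NN$")

print(non_capturing.groups)  # 0
print(capturing.groups)      # 5

m = non_capturing.match("DT JJ JJ NN")
print(m.group(0))  # DT JJ JJ NN  (the whole match is still available)
print(m.groups())  # ()  (nothing was captured)
```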

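If you also want the whole line to be checked, so that it starts with DT and contains nothing but NNs after the RB or JJ run (what your (NN\s+)*NN$ was meant to do), anchor the pattern at both ends: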

pattern = r"^DT(\s+DT)*((\s+RB)+|(\s+JJ)+)(\s+NN)+$"

Demo: https://regex101.com/r/iH4lE6/2
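A spot check of the anchored pattern on the question's lines, plus two hypothetical extra lines made up here: one with a repeated DT (which should pass) and one with a trailing JJ (which should now be rejected):

```python
import re

pattern = r"^DT(\s+DT)*((\s+RB)+|(\s+JJ)+)(\s+NN)+$"

lines = ["DT NN IN NN", "DT RB JJ NN", "DT JJ JJ NN", "DT RB RB NN NN", "DT RB RB"]
# Hypothetical extra inputs, not from the question
extra = ["DT DT RB NN", "DT RB NN JJ"]

results = {line: bool(re.match(pattern, line)) for line in lines + extra}
for line, ok in results.items():
    print(line, ok)
```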

Answer:

As I understand it, you want two kinds of lines:

  • lines made up of one or more DTs, then one or more RBs, then one or more NNs:

    ^DT(\s+DT)*(\s+RB)+(\s+NN)+$
    
  • lines made up of one or more DTs, then one or more JJs, then one or more NNs:

    ^DT(\s+DT)*(\s+JJ)+(\s+NN)+$
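Since the question asks about XOR, here is a sketch that applies XOR in Python rather than inside the regex: evaluate the two patterns above separately and keep a line when exactly one of them matches. This is a swapped-in technique, not part of the answer's regex; because the two patterns can never both match the same line, a plain or would give the same result here:

```python
import re

rb_only = r"^DT(\s+DT)*(\s+RB)+(\s+NN)+$"
jj_only = r"^DT(\s+DT)*(\s+JJ)+(\s+NN)+$"

lines = ["DT NN IN NN", "DT RB JJ NN", "DT JJ JJ NN", "DT RB RB NN NN", "DT RB RB"]

def keep(line):
    # != on two booleans is a logical XOR: true when exactly one pattern matches
    return bool(re.match(rb_only, line)) != bool(re.match(jj_only, line))

kept = [line for line in lines if keep(line)]
print(kept)  # ['DT JJ JJ NN', 'DT RB RB NN NN']
```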
    

Joining the two with alternation (|) gives:

^((DT(\s+DT)*(\s+RB)+(\s+NN)+)|(DT(\s+DT)*(\s+JJ)+(\s+NN)+))$

which can be shortened by factoring out the common parts:

^DT(\s+DT)*((\s+RB)+|(\s+JJ)+)(\s+NN)+$
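A spot check (not a proof) that the long and the shortened forms accept the same lines. The last two samples are hypothetical inputs added for illustration:

```python
import re

combined = r"^((DT(\s+DT)*(\s+RB)+(\s+NN)+)|(DT(\s+DT)*(\s+JJ)+(\s+NN)+))$"
simplified = r"^DT(\s+DT)*((\s+RB)+|(\s+JJ)+)(\s+NN)+$"

samples = ["DT NN IN NN", "DT RB JJ NN", "DT JJ JJ NN", "DT RB RB NN NN", "DT RB RB",
           "DT DT JJ NN", "DT RB NN JJ"]  # last two are hypothetical

for s in samples:
    a = bool(re.match(combined, s))
    b = bool(re.match(simplified, s))
    assert a == b, s  # both forms must agree on every sample
    print(s, a)
```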

Visualized with Regexper:

[Regexper diagram of the regular expression]
