Suppose I have lines like this:
"DT NN IN NN"
"DT RB JJ NN"
"DT JJ JJ NN"
"DT RB RB NN NN"
"DT RB RB"
So, I have a list of lines:
list = ["DT NN IN NN", "DT RB JJ NN", "DT JJ JJ NN", "DT RB RB NN NN", "DT RB RB"]
I have the following code:
import re

pattern = r"(?:DT\s+)+([?:RB\s+|?:JJ\s+])+(?:NN\s+)*NN$"
# Print every line that the pattern matches
for item in list:
    m = re.match(pattern, item)
    if m:
        print(item)
What I want is for pattern to match lines that start with DT (appearing one or more times), then have either RB or JJ (appearing one or more times), but not both, and then end with NN (again, appearing one or more times).
So, in the end, lines 3 and 4 should be printed. However, with my regular expression I also get line 2, which I do not want. How can I change pattern to make it work? How do I replace the pipe (OR) with XOR?
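One possible direction, offered as a sketch rather than a definitive answer: in the pattern above the square brackets form a character class, so that part matches single characters such as R, B, J or whitespace in any order, which is why a mix of RB and JJ gets through. Putting each tag's repetition in its own non-capturing group and alternating between the two groups seems to give the "one or the other, but not both" behaviour. The names lines and matched are illustrative choices, not part of the original code.

```python
import re

# Same sample lines as in the question; `lines` is an illustrative name
# that avoids shadowing the built-in `list`.
lines = ["DT NN IN NN", "DT RB JJ NN", "DT JJ JJ NN", "DT RB RB NN NN", "DT RB RB"]

# One or more DT, then EITHER a run of RB OR a run of JJ (never a mix),
# then one or more NN up to the end of the line.
pattern = re.compile(r"(?:DT\s+)+(?:(?:RB\s+)+|(?:JJ\s+)+)(?:NN\s+)*NN$")

# Keep only the lines the pattern accepts, then print them.
matched = [item for item in lines if pattern.match(item)]
for item in matched:
    print(item)
```

With the five sample lines this prints only "DT JJ JJ NN" and "DT RB RB NN NN". The repetition sits inside each alternative, so once the RB branch is taken a following JJ cannot be consumed, and vice versa.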