How do you use a regex in a list comprehension in Python?

I am trying to find all the index positions of a string in a list of words, and I want the values to be returned as a list. I would like to find the string if it is on its own, or if it is preceded or followed by punctuation, but not if it is a substring of a larger word.

The following code only catches "cow" and misses "test;cow" and "cow.".

```python
myList = ['test;cow', 'one', 'two', 'three', 'cow.', 'cow', 'acow']
myString = 'cow'
indices = [i for i, x in enumerate(myList) if x == myString]
print(indices)
# Output: [5]
```

I tried changing the code to use a regex:

```python
import re
myList = ['test;cow', 'one', 'two', 'three', 'cow.', 'cow', 'acow']
myString = 'cow'
indices = [i for i, x in enumerate(myList) if x == re.match('\W*myString\W*', myList)]
print(indices)
```

But this gives an error: TypeError: expected string or buffer

If someone knows what I'm doing wrong, I'd be glad to hear it. I suspect it's because I'm trying to use a regex where a string is expected. Is there a solution?

The result I'm looking for should look like this:

```python
[0, 4, 5]
```

Thanks!

2 answers

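You don't need to compare the result of match with x: a match object is truthy and None is falsy, so the call can be used as the condition directly. And the match should be done on x, not on the list.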
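A minimal sketch of the "match on x, not the list" point, assuming Python 3 (the exact TypeError message differs between versions; Python 2 says "expected string or buffer"). `try_match_list` is a hypothetical helper used only for this demonstration.

```python
import re

myList = ['test;cow', 'one', 'two', 'three', 'cow.', 'cow', 'acow']

def try_match_list():
    """Hypothetical helper: return True if re.match rejects a list argument."""
    try:
        re.match(r'\W*cow\W*', myList)
    except TypeError as exc:
        # re.match only accepts str or bytes; the message text depends on the version.
        print('TypeError:', exc)
        return True
    return False

list_rejected = try_match_list()  # True

# Matching each element x works, because every x is a string.
element_matches = [x for x in myList if re.match(r'\W*cow\W*', x)]
print(element_matches)  # ['cow.', 'cow']
```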

Also, you need to use re.search instead of re.match, because re.match only tries to match at the beginning of the string, so your pattern '\W*myString\W*' will not match the first element: test; does not match \W*. In fact, you only need to check the characters immediately before and after the word, not the whole string.
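A small sketch of the re.match versus re.search difference on the first element, with the word written directly into the pattern for simplicity:

```python
import re

pattern = r'\W*cow\W*'
s = 'test;cow'

# re.match only tries position 0: \W* matches nothing, then 'c' fails against 't'.
match_result = re.match(pattern, s)
print(match_result)  # None

# re.search tries every position and succeeds at index 4, consuming ';cow'.
search_result = re.search(pattern, s)
print(search_result.group(), search_result.start())  # ;cow 4
```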

So you can use word boundaries (\b) around the word:

```python
import re

pattern = r'\b' + re.escape(myString) + r'\b'
indices = [i for i, x in enumerate(myList) if re.search(pattern, x)]
```
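If the same lookup runs repeatedly, the pattern can be compiled once and wrapped in a function. A minimal sketch; `find_word_indices` is a hypothetical name, not part of any library:

```python
import re

def find_word_indices(words, target):
    """Hypothetical helper: indices of items that contain target as a whole word."""
    # Compile once and reuse the pattern object for every element.
    # (The re module also caches recently used patterns, so this is
    # mainly for readability and reuse.)
    regex = re.compile(r'\b' + re.escape(target) + r'\b')
    return [i for i, x in enumerate(words) if regex.search(x)]

myList = ['test;cow', 'one', 'two', 'three', 'cow.', 'cow', 'acow']
indices = find_word_indices(myList, 'cow')
print(indices)  # [0, 4, 5]
```

Here 'acow' is rejected because 'a' and 'c' are both word characters, so there is no \b between them.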

There are several problems with your code. First, you need to match the regex against the list item (x), not the entire list (myList). Secondly, to insert a variable into the pattern, you have to concatenate it with + (string concatenation); inside the string literal, myString is just the literal text "myString". And finally, use raw string literals (r'\W') so that the backslash in the pattern is interpreted correctly:

```python
import re
myList = ['test;cow', 'one', 'two', 'three', 'cow.', 'cow', 'acow']
myString = 'cow'
indices = [i for i, x in enumerate(myList) if re.match(r'\W*' + myString + r'\W*', x)]
print(indices)
```
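Note that on the question's sample list this re.match version returns [4, 5] rather than [0, 4, 5]: re.match is anchored at the start of each string, so 'test;cow' is rejected. Simply switching to re.search does not fix it either, because \W* can match zero characters and 'acow' then matches too. A quick check:

```python
import re

myList = ['test;cow', 'one', 'two', 'three', 'cow.', 'cow', 'acow']
myString = 'cow'
regex = r'\W*' + myString + r'\W*'

# re.match anchors at the start of each string, so 'test;cow' fails.
match_indices = [i for i, x in enumerate(myList) if re.match(regex, x)]
# re.search scans the whole string: 'test;cow' is found, but so is 'acow'.
search_indices = [i for i, x in enumerate(myList) if re.search(regex, x)]

print(match_indices)   # [4, 5]
print(search_indices)  # [0, 4, 5, 6]
```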

If myString might contain special regular expression characters (for example, a backslash or a period), you also need to apply re.escape to it:

```python
regex = r'\W*' + re.escape(myString) + r'\W*'
indices = [i for i, x in enumerate(myList) if re.match(regex, x)]
```
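To illustrate why re.escape matters, here is a sketch with a hypothetical search term containing a period and made-up sample words:

```python
import re

words = ['cow', 'c.w', 'c+w']
target = 'c.w'  # hypothetical search term containing the metacharacter '.'

# Unescaped, '.' matches any character, so 'cow' and 'c+w' match as well.
raw_hits = [i for i, x in enumerate(words) if re.match(r'\W*' + target + r'\W*', x)]
# re.escape turns '.' into a literal dot, so only 'c.w' matches.
escaped_hits = [i for i, x in enumerate(words) if re.match(r'\W*' + re.escape(target) + r'\W*', x)]

print(raw_hits)      # [0, 1, 2]
print(escaped_hits)  # [1]
```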

As pointed out in the comments, word boundaries are probably the better option:

```python
regex = r'\b' + re.escape(myString) + r'\b'
indices = [i for i, x in enumerate(myList) if re.search(regex, x)]
```
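One caveat, not raised in the thread: \b only behaves as intended when myString begins and ends with word characters (letters, digits, underscore). For a term such as 'c++' there is no word boundary after the final '+', so the \b version finds nothing. A common alternative, swapped in here, is a pair of lookarounds that only forbid a word character on either side. A sketch with hypothetical helper names:

```python
import re

def boundary_pattern(term):
    """Hypothetical helper: the word-boundary approach from the answer above."""
    return r'\b' + re.escape(term) + r'\b'

def lookaround_pattern(term):
    """Hypothetical helper: require no word character immediately before or after."""
    return r'(?<!\w)' + re.escape(term) + r'(?!\w)'

text = 'I like c++'
b_hit = re.search(boundary_pattern('c++'), text)              # None: no boundary after the final '+'
lookaround_hit = re.search(lookaround_pattern('c++'), text)   # matches 'c++'

# For ordinary words, the lookaround version gives the same result on the question's list.
myList = ['test;cow', 'one', 'two', 'three', 'cow.', 'cow', 'acow']
indices = [i for i, x in enumerate(myList) if re.search(lookaround_pattern('cow'), x)]
print(indices)  # [0, 4, 5]
```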
