Can you search backward from offset using Python regex?

Given a string and a character offset into that string, can I search backwards from that offset with a Python regex?

The bigger problem I'm trying to solve is matching the appropriate phrase at a specific offset within the string, where I need the first match that precedes this offset.

When my regular expression is only one character long (e.g. a word boundary), I use a workaround: I reverse the string.

```python
import re

my_string = "Thanks for looking at my question, StackOverflow."
offset = 30
boundary = re.compile(r'\b')
end = boundary.search(my_string, offset)
end_boundary = end.start()
end_boundary
```

Output: 33

```python
end = boundary.search(my_string[::-1], len(my_string) - offset - 1)
start_boundary = len(my_string) - end.start()
start_boundary
```

Output: 25

```python
my_string[start_boundary:end_boundary]
```

Output: 'question'
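
The index arithmetic above can be wrapped in one helper. This is a sketch, not the asker's code: `word_at_offset` is a hypothetical name, and it shifts both start positions by one relative to the snippets above (the forward search starts at `offset + 1`, the reversed one at `len(text) - offset`) so that an offset on the first or last letter of a word also returns that word. It assumes the character at `offset` is a word character.

```python
import re

BOUNDARY = re.compile(r'\b')

def word_at_offset(text, offset):
    """Hypothetical helper: the word containing the character at `offset`."""
    # Next word boundary strictly after `offset`.
    end = BOUNDARY.search(text, offset + 1)
    # A boundary at reversed position r is the boundary at original position
    # len(text) - r, so searching the reversed string from len(text) - offset
    # finds the last boundary at or before `offset` in the original string.
    rev = BOUNDARY.search(text[::-1], len(text) - offset)
    if end is None or rev is None:
        return None
    start = len(text) - rev.start()
    return text[start:end.start()]

s = "Thanks for looking at my question, StackOverflow."
print(word_at_offset(s, 30))  # question
print(word_at_offset(s, 25))  # question (offset on the first letter)
print(word_at_offset(s, 32))  # question (offset on the last letter)
```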

However, this “reverse” method won't work if the regular expression can match more than one character. For example, if I wanted to match the first instance of "ing" that appears before the specified offset:

```python
my_new_string = "Looking feeding dancing prancing"
offset = 16  # on the word "dancing"
m = re.match(r'(.*?ing)', my_new_string)  # ...except searching backwards from offset
```

Ideal output: feeding
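
One way to keep the reversal idea for a multi-character pattern is to mirror the pattern as well as the string. The sketch below does that by hand for a word ending in "ing": `\b\w+ing\b` read right to left is `\bgni\w+\b`. `last_ing_word_before` is a hypothetical name, and this only works for patterns simple enough to reverse manually (no lookarounds or backreferences).

```python
import re

# r'\b\w+ing\b' written right to left
REVERSED_ING_WORD = re.compile(r'\bgni\w+\b')

def last_ing_word_before(text, offset):
    """Hypothetical helper: the last "...ing" word ending at or before `offset`."""
    n = len(text)
    # A position p between characters of the original is position n - p in the
    # reversed string, so a match ending at or before `offset` in the original
    # starts at or after n - offset in the reversed string.
    m = REVERSED_ING_WORD.search(text[::-1], n - offset)
    if m is None:
        return None
    # Map the reversed span back to the original string.
    return text[n - m.end():n - m.start()]

print(last_ing_word_before("Looking feeding dancing prancing", 16))  # feeding
```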

I could probably use other approaches (split the file into lines and iterate backward), but using a regex backwards seems like a conceptually simpler solution.
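
A different route, which is not a true backward search, is to scan forward over only the part of the string before the offset and keep the last match. Compiled patterns accept `pos` and `endpos` arguments, and with `endpos=offset` the engine treats the string as if it ended there. `last_match_before` is a hypothetical name; because the string appears to end at `offset`, a word cut in half there can match as if it ended at `offset`.

```python
import re

def last_match_before(pattern, text, offset):
    """Hypothetical helper: last non-overlapping match lying entirely in text[:offset]."""
    last = None
    # endpos=offset makes the engine treat `text` as if it were `offset` chars long.
    for m in re.compile(pattern).finditer(text, 0, offset):
        last = m
    return last

m = last_match_before(r'\b\w+ing\b', "Looking feeding dancing prancing", 16)
print(m.group() if m else "no match")  # feeding
```

Every earlier match is visited, so the cost grows with `offset`. The pattern choice matters too: with the question's `.*?ing`, the last match before offset 16 would be " feeding", including the leading space.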

python regex
2 answers

Use a positive lookbehind to make sure at least 30 characters precede the end of the matched word:

```python
import re

# with offset = 30 this is r'.*?(\w+)(?<=.{30})'
m = re.match(r'.*?(\w+)(?<=.{%d})' % offset, my_string)
if m:
    print(m.group(1))
else:
    print("no match")
```
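
Python's `re` only allows fixed-width lookbehind, which is why the offset is formatted into the pattern (one pattern per offset). One assumption worth checking: `.` does not match a newline by default, so on text with line breaks (e.g. a whole file) the pattern cannot count characters across them. A sketch with `re.DOTALL` added; `word_ending_at_or_after` is a hypothetical name:

```python
import re

def word_ending_at_or_after(text, offset):
    """Hypothetical wrapper around the lookbehind answer."""
    # (?<=.{offset}) only succeeds once the match end has at least `offset`
    # characters before it; re.DOTALL lets '.' count newlines as well.
    pattern = re.compile(r'.*?(\w+)(?<=.{%d})' % offset, re.DOTALL)
    m = pattern.match(text)
    return m.group(1) if m else None

one_line = "Thanks for looking at my question, StackOverflow."
two_lines = "Thanks for\nlooking at my question, StackOverflow."
print(word_ending_at_or_after(one_line, 30))   # question
print(word_ending_at_or_after(two_lines, 30))  # question
```

Without `re.DOTALL`, the original pattern finds no match at all in `two_lines`.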

Alternatively, a negative lookbehind may help:

```python
import re

my_new_string = "Looking feeding dancing prancing"
offset = 16
m = re.match(r'.*(\b\w+ing)(?<!.{%d})' % offset, my_new_string)
if m:
    print(m.group(1))
```

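The leading `.*` is greedy: it first consumes the whole string, then backtracks until `\b\w+ing` ends at a position preceded by fewer than 16 characters, i.e. where `.{16}` can no longer match behind it and `(?<!.{16})` therefore succeeds.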
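
Read literally, `(?<!.{16})` requires the word to end strictly before index 16, so a word ending exactly at the offset is rejected. The sketch below wraps the answer in a hypothetical `last_ing_word_ending_before` function (adding `re.DOTALL`, as an assumption, so newlines are counted too) and shows that edge:

```python
import re

def last_ing_word_ending_before(text, offset):
    """Hypothetical wrapper around the negative-lookbehind answer."""
    # Greedy .* backtracks from the end of the string until the "...ing" word
    # ends at a position with fewer than `offset` characters before it.
    pattern = re.compile(r'.*(\b\w+ing)(?<!.{%d})' % offset, re.DOTALL)
    m = pattern.match(text)
    return m.group(1) if m else None

text = "Looking feeding dancing prancing"
print(last_ing_word_ending_before(text, 16))  # feeding
print(last_ing_word_ending_before(text, 7))   # None: "Looking" ends exactly at 7
print(last_ing_word_ending_before(text, 8))   # Looking
```

Passing `offset + 1` instead would also accept a word that ends exactly at `offset`.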


We can exploit the greediness of Python's regex engine (sort of) and say that we want as many characters as possible, but no more than 30, followed by a word boundary.

The corresponding regular expression is r'^.{0,30}(\b)'. The position we want is the start of the first capture group.

```python
>>> import re
>>> boundary = re.compile(r'^.{0,30}(\b)')
>>> boundary.search("hello, world; goodbye, world; I am not a pie").start(1)
30
>>> boundary.search("hello, world; goodbye, world; I am not").start(1)
30
>>> boundary.search("hello, world; goodbye, world; I am").start(1)
30
>>> boundary.search("hello, world; goodbye, pie").start(1)
26
>>> boundary.search("hello, world; pie").start(1)
17
```
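
Combining this with an ordinary forward search for the next boundary gives the word around an offset without reversing the string. A sketch: `word_around_offset` is a hypothetical name, the `30` is replaced by the offset, `re.DOTALL` is an added assumption so that newlines count toward the offset, and the character at `offset` is assumed to be a word character.

```python
import re

WORD_BOUNDARY = re.compile(r'\b')

def word_around_offset(text, offset):
    """Hypothetical helper: the word containing the character at `offset`."""
    # Greedy .{0,offset} takes as many characters as it may, then backtracks
    # to the last word boundary at or before `offset`.
    start_m = re.match(r'.{0,%d}(\b)' % offset, text, re.DOTALL)
    # The next boundary strictly after `offset` closes the word.
    end_m = WORD_BOUNDARY.search(text, offset + 1)
    if start_m is None or end_m is None:
        return None
    return text[start_m.start(1):end_m.start()]

print(word_around_offset("Thanks for looking at my question, StackOverflow.", 30))  # question
```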
