Python regex error: unbalanced brackets

I am new to Python. I have a dictionary with some keys in it, and a string. Wherever a key from the dictionary is found in the string, I have to replace it. Both the dictionary and the string are very large. I use regex to search for the patterns.

Everything works fine until a key such as '-(' or '(-)' appears, in which case Python raises an error about unbalanced parentheses.

This is what the code I wrote looks like:

    import re

    somedict = {'-(': 'value1', '(-)': 'value2'}
    somedata = 'this is some data containing -( and (-)'
    for key in somedict.iterkeys():
        somedata = re.sub(key, 'newvalue', somedata)

Here is the error I get in the console:

    Traceback (most recent call last):
      File "<console>", line 2, in <module>
      File "C:\Python27\lib\re.py", line 151, in sub
        return _compile(pattern, flags).sub(repl, string, count)
      File "C:\Python27\lib\re.py", line 244, in _compile
        raise error, v # invalid expression
    error: unbalanced parenthesis

I also tried compiling the pattern with re.compile() many times and searched a lot, but found nothing that solves the problem. Any help is appreciated.

2 answers

You need to escape the key using re.escape():

 somedata = re.sub(re.escape(key), 'newvalue', somedata) 

otherwise, the key's contents are interpreted as a regular expression, in which ( and ) are special characters that open and close a group.
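A minimal, runnable version of the question's loop with the key escaped might look like the following. It assumes Python 3, so the dictionary is iterated directly instead of through the Python 2-only iterkeys(); the re.escape() call itself is the same on Python 2.7.

```python
import re

somedict = {'-(': 'value1', '(-)': 'value2'}
somedata = 'this is some data containing -( and (-)'

# Python 3: iterate over the dict directly (iterkeys() exists only in Python 2).
for key in somedict:
    # re.escape() backslash-escapes regex metacharacters in the key,
    # so '(' and ')' are matched literally instead of opening/closing a group.
    somedata = re.sub(re.escape(key), 'newvalue', somedata)

print(somedata)  # this is some data containing newvalue and newvalue
```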

Here, however, you are not using any regular-expression features at all, so you can simply use str.replace():

 somedata = somedata.replace(key, 'newvalue') 
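The difference between literal and pattern matching also shows up with keys that do not raise an error. The sketch below uses a made-up input 'a.b' (not from the question): an unescaped '.' is accepted by re.sub() without complaint, but it matches any character, so the result is silently wrong.

```python
import re

text = 'a.b'

# str.replace() treats the key as literal text: only the dot is replaced.
literal = text.replace('.', 'X')

# An unescaped key passed to re.sub() is a pattern: '.' matches ANY character,
# so every character is replaced. No error is raised; the result is just wrong.
as_pattern = re.sub('.', 'X', text)

# Escaping restores the literal meaning and agrees with str.replace().
escaped = re.sub(re.escape('.'), 'X', text)

print(literal, as_pattern, escaped)  # aXb XXX aXb
```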

If you want to replace only whole words (that is, the key must be surrounded by spaces or punctuation, or sit at the start or end of the input string), you need some kind of boundary anchor, and then it does make sense to use regular expressions. If your keys consist only of alphanumeric characters (plus underscores), \b will work:

 somedata = re.sub(r'\b{}\b'.format(re.escape(key)), 'newvalue', somedata) 

This puts \b before and after the text you want to replace, so baz in foo baz bar is replaced, but baz in foo bazbaz bar is not.
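A small sketch of the \b approach, wrapped in a hypothetical helper replace_word() (it is not part of the re module), reproduces both cases from the paragraph above. The last call uses a key made only of non-word characters, for which \b never finds a boundary:

```python
import re

def replace_word(key, repl, text):
    # Hypothetical helper. \b is a zero-width word boundary; it only behaves
    # as intended when the key starts and ends with word characters
    # (letters, digits, underscore).
    pattern = r'\b{}\b'.format(re.escape(key))
    return re.sub(pattern, repl, text)

print(replace_word('baz', 'newvalue', 'foo baz bar'))     # foo newvalue bar
print(replace_word('baz', 'newvalue', 'foo bazbaz bar'))  # foo bazbaz bar
# ' ' and '-' are both non-word characters, so there is no \b between them.
print(replace_word('-(', 'newvalue', 'a -( b'))           # a -( b
```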

For input that includes non-alphanumeric words, \b does not help, because a word boundary only exists next to a word character. Instead, you need to match on whitespace-or-start and whitespace-or-end anchors, using look-behind and look-ahead assertions:

 somedata = re.sub(r'(?:^|(?<=\s)){}(?:$|(?=\s))'.format(re.escape(key)), 'newvalue', somedata) 

Here, the pattern (?:^|(?<=\s)) combines two anchors, the start-of-string anchor ^ and a look-behind assertion, so it matches a position that is either the start of the input or immediately preceded by whitespace. Similarly, (?:$|(?=\s)) does the same for the other end, matching the end of the input or a position followed by whitespace.
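Applied to the data from the question, the look-around version handles keys made of punctuation. The helper name replace_token() is hypothetical, chosen only for this sketch:

```python
import re

def replace_token(key, repl, text):
    # Hypothetical helper. The key must be preceded by start-of-string or
    # whitespace, and followed by end-of-string or whitespace.
    pattern = r'(?:^|(?<=\s)){}(?:$|(?=\s))'.format(re.escape(key))
    return re.sub(pattern, repl, text)

data = 'this is some data containing -( and (-)'
print(replace_token('-(', 'value1', data))
# this is some data containing value1 and (-)
print(replace_token('(-)', 'value2', data))
# this is some data containing -( and value2
print(replace_token('-(', 'value1', 'x-( y'))
# x-( y   ('-(' is glued to 'x', so it is not a separate token)
```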


Don't use re for something this simple; just use str.replace():

 somedata = somedata.replace(key, 'newvalue') 

However, if you are building a regex from arbitrary text, use re.escape() to escape the special characters:

 somedata = re.sub(re.escape(key), 'newvalue', somedata)
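Since both the dictionary and the string are very large, one option (not taken from either answer above) is to escape every key, join them into a single alternation, and replace everything in one pass over the string. The sketch below assumes that each key should be replaced by its own value from the dictionary rather than by a fixed 'newvalue', and that no key is an empty string; replace_all() is a hypothetical helper name:

```python
import re

def replace_all(mapping, text):
    # Hypothetical helper; assumes every key in mapping is a non-empty string.
    if not mapping:
        return text
    # Longest keys first, so a key that is a prefix of another key
    # (e.g. '-' vs '-(') does not win the alternation.
    keys = sorted(mapping, key=len, reverse=True)
    # Escape every key so that '(' and ')' are matched literally.
    pattern = re.compile('|'.join(re.escape(k) for k in keys))
    # A function as the replacement looks up the matched text in the dict.
    # Its return value is inserted as-is, so backslashes in values are not
    # interpreted the way they would be in a replacement string.
    return pattern.sub(lambda m: mapping[m.group(0)], text)

somedict = {'-(': 'value1', '(-)': 'value2'}
somedata = 'this is some data containing -( and (-)'
print(replace_all(somedict, somedata))
# this is some data containing value1 and value2
```

Sorting the keys longest first matters because Python's regex alternation takes the first alternative that matches at a given position, not the longest one.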
