The r before the regular expression in the search() call marks the pattern as a raw string literal. This lets you write backslashes in the regular expression as ordinary characters, instead of having Python treat them as the start of an escape sequence. Let me explain...
Before the re module's search() method ever sees the strings passed to it, the Python interpreter makes an initial pass over each string literal. If a literal contains backslashes, the interpreter must decide whether each one starts a Python escape sequence (e.g. \n or \t) or not.
Note: at this stage, Python does not care whether '\' is a regular expression metacharacter.
If the '\' is followed by a recognized Python escape character (t, n, etc.), the backslash and the escape character are replaced together by the actual Unicode or 8-bit character. For example, '\t' is replaced by the ASCII tab character. Otherwise, the backslash is left in place and treated as a literal '\' character. Newer Python versions emit a warning for such unrecognized escapes, but the resulting string is the same.
Consider the following.
>>> s = '\t'   # an actual tab character after preprocessing
>>> print("[" + s + "]")
[       ]
>>> s = '\d'   # still '\d' after preprocessing
>>> print("[" + s + "]")
[\d]
Sometimes we want a string to contain a '\' followed by another character without Python interpreting the pair as an escape sequence. To do this, we escape the '\' with another '\'. When Python sees '\\', it replaces the two backslashes with a single '\' character.
>>> s = '\\t'   # '\t' (backslash, then t) after preprocessing
>>> print("[" + s + "]")
[\t]
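The interpreter's pass can be checked directly by comparing string lengths. This is a minimal sketch; the variable names are only illustrative.

```python
# Python resolves escape sequences in ordinary string literals
# before any function ever sees the string.
tab = '\t'        # recognized escape: a single tab character
escaped = '\\t'   # escaped backslash: a backslash followed by 't'
raw = r'\t'       # raw literal: the same two characters, no escaping needed

print(len(tab))        # 1
print(len(escaped))    # 2
print(escaped == raw)  # True
```

The last line shows that a raw literal and a literal with doubled backslashes produce the same string object contents; they differ only in how they are typed.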
After the Python interpreter has processed both strings, they are passed to the re module's search() method. The search() method parses the regular expression string to identify the regular expression metacharacters.
Now, '\' is also a special regular expression metacharacter, and search() interprets it as one UNLESS it is still escaped by the time search() runs.
Consider the following call.
>>> match = re.search('a\\t', 'a\\t')   # match is None
There is no match. Why? Let's look at the strings after the Python interpreter has made its pass.
String 1: 'a\t'
String 2: 'a\t'
So why is match equal to None? search() treats string 1 as a regular expression, so the backslash in it is read as a metacharacter rather than an ordinary character. The backslash in string 2, however, is not part of a regular expression and has already been processed by the Python interpreter, so it is an ordinary character.
Therefore, search() looks for 'a' followed by a tab character (escape-t) in the string 'a\t' (a, backslash, t), and finds nothing.
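The failed match can be reproduced as follows. This is a small sketch; the names pattern, text, no_match and tab_match are chosen here for illustration.

```python
import re

pattern = 'a\\t'   # after Python's pass: a, backslash, t
text = 'a\\t'      # after Python's pass: a, backslash, t

# The regex engine reads backslash-t in the pattern as "a tab character",
# and the text contains no tab, so nothing matches.
no_match = re.search(pattern, text)
print(no_match)                  # None

# The same pattern does match an 'a' followed by a real tab.
tab_match = re.search(pattern, 'a\t')
print(tab_match is not None)     # True
```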
To fix this, we can tell search() not to interpret '\' as a metacharacter. We do this by escaping it.
Consider the following call.
>>> match = re.search('a\\\\t', 'a\\t')   # match contains 'a\t'
Again, let's look at the strings after the Python interpreter has made its pass.
String 1: 'a\\t'
String 2: 'a\t'
Now, when search() processes the regular expression, it sees that the second backslash is escaped by the first and should not be treated as a metacharacter. It therefore reads the pattern as the literal characters 'a\t', which matches string 2.
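The two levels of escaping can be confirmed with a short check. This is a sketch, with variable names picked only for this example.

```python
import re

pattern = 'a\\\\t'  # after Python's pass: a, backslash, backslash, t
text = 'a\\t'       # after Python's pass: a, backslash, t

match = re.search(pattern, text)

print(len(pattern))           # 4
print(len(text))              # 3
print(match.group() == text)  # True
```

The pattern is one character longer than the text because the regex engine consumes the extra backslash as its own escape.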
An alternative way to get search() to match a literal '\' is to put an r in front of the regular expression. This tells the Python interpreter NOT to process escape sequences in that string literal.
Consider this.
>>> match = re.search(r'a\\t', 'a\\t')   # match contains 'a\t'
Here, the Python interpreter leaves the first string unchanged but still processes the second. The strings passed to search() are:
String 1: 'a\\t'
String 2: 'a\t'
As in the previous example, search() interprets the '\\' as a single literal '\' character rather than as a metacharacter, so the pattern matches string 2.
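To see that the raw-string form and the four-backslash form hand search() the same pattern, the two can be compared directly. This is a minimal sketch with illustrative names.

```python
import re

text = 'a\\t'             # a, backslash, t

raw_pattern = r'a\\t'     # raw literal: Python leaves both backslashes alone
plain_pattern = 'a\\\\t'  # ordinary literal: four backslashes collapse to two

# Both literals produce the same four-character pattern string.
print(raw_pattern == plain_pattern)  # True

match = re.search(raw_pattern, text)
print(match.group() == text)         # True
```

The raw form is usually easier to read because the pattern only has to be escaped once, for the regex engine.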