I have a Selenium/Python project that uses a regex to find HTML elements. The element attributes sometimes contain the Danish/Norwegian characters Æ, Ø and Å. The problem is in the snippet below:
    if (re.match(regexp_expression, compare_string)):
        result = True
    else:
        result = False
Both regexp_expression and compare_string are processed before the match is attempted. If I print them just before the snippet above runs, and print the result afterwards, I get the following output:
    Regex_expression: [^log på$]
    compare string: [log på]
    result = false
I added the brackets to make sure there is no leading or trailing whitespace. They come from the print statements only and are not part of the string variables.
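Brackets only reveal extra whitespace. A stricter check is to list the individual code points of each string. The following is a minimal sketch, assuming Python 3; `describe` is a hypothetical helper name, and the idea that the two strings differ in something invisible (for example an "å" stored as "a" plus a combining ring) is an unconfirmed assumption, not something I have verified in my project.

```python
import unicodedata


def describe(text):
    """Return one 'U+XXXX NAME' entry per code point in text (hypothetical helper)."""
    return [
        "U+%04X %s" % (ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# Two strings that look identical when printed.
composed = "log p\u00e5"                             # "å" as a single code point
decomposed = unicodedata.normalize("NFD", composed)  # "a" followed by a combining ring

# The equality test and the per-code-point listing expose the difference.
print(composed == decomposed)
for entry in describe(decomposed):
    print(entry)
```

Calling `describe(regexp_expression)` and `describe(compare_string)` right before the match would show whether the two values really contain the same characters.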
If I try to reproduce the problem in a separate script, for example:
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import re

    regexp_expression = "^log på$"
    compare_string = "log på"

    if (re.match(regexp_expression, compare_string)):
        print("result true")
        result = True
    else:
        print("result = false")
        result = False
Then the result is correct.
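For comparison, here is a variation of that script that does fail. It is only a sketch under a labeled assumption: that the text coming from the page uses a different Unicode normalization form than the literal typed into the source file. I have not confirmed that this is what happens in the real project. `page_text` and `nfc_match` are hypothetical names, and Python 3 is assumed.

```python
import re
import unicodedata

pattern = "^log p\u00e5$"  # "å" as a single code point, as typed in a source file

# Assumption: the text arrives in decomposed form ("a" + combining ring).
page_text = unicodedata.normalize("NFD", "log p\u00e5")


def nfc_match(regexp_expression, compare_string):
    """Normalize both arguments to NFC, then call re.match (hypothetical helper)."""
    regexp_expression = unicodedata.normalize("NFC", regexp_expression)
    compare_string = unicodedata.normalize("NFC", compare_string)
    return re.match(regexp_expression, compare_string) is not None


print(re.match(pattern, page_text) is not None)  # plain match on the raw strings
print(nfc_match(pattern, page_text))             # match after normalizing both sides
```

If the real strings behave like `page_text` here, the plain match fails even though both values print identically, while the normalized match succeeds.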
How can that be? To make it even weirder, this worked before, and I am not sure what I changed that broke it.
The full module containing the regex comparison function is given below. I did not write it myself, so I do not fully understand the reasoning behind all of the replacements and string manipulation, but I think that should not matter, since I can inspect both strings immediately before the failing match call.
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import re

    def regexp_compare(regexp_expression, compare_string):
        #final int DOTALL
        #try: // include try catch for "PatternSyntaxException" while testing/including a new symbol in this method..
        #catch(PatternSyntaxException e):
        #    System.out.println("Regexp>>"+regexp_expression)
        #    e.printStackTrace()
        #*/
        if(not compare_string.strip() and (not regexp_expression.strip() or regexp_expression.strip().lower() == "*".lower()) or (regexp_expression.strip().lower() == ".*".lower())):
            print("return 1")
            return True
        if(not compare_string or not regexp_expression):
            print("return 2")
            return False
        regexp_expression = regexp_expression.lower()
        compare_string = compare_string.lower()
        if(not regexp_expression.strip()):
            regexp_expression = ""
        if(not compare_string.strip() and (not regexp_expression.strip() or regexp_expression.strip().lower() == "*".lower()) or (regexp_expression.strip().lower() == ".*".lower())):
            regexp_expression = ""
        else:
            regexp_expression = regexp_expression.replace("\\","\\\\")
            regexp_expression = regexp_expression.replace("\\.","\\\\.")
            regexp_expression = regexp_expression.replace("\\*", ".*")
            regexp_expression = regexp_expression.replace("\\(", "\\\\(")
            regexp_expression = regexp_expression.replace("\\)", "\\\\)")
        regexp_expression_arr = regexp_expression.split("|")
        regexp_expression = ""
        for i in range(0, len(regexp_expression_arr)):
            if(not(regexp_expression_arr[i].startswith("^"))):
                regexp_expression_arr[i] = "^"+regexp_expression_arr[i]
            if(not(regexp_expression_arr[i].endswith("$"))):
                regexp_expression_arr[i] = regexp_expression_arr[i]+"$"
            regexp_expression = regexp_expression_arr[i] if regexp_expression == "" else regexp_expression+"|"+regexp_expression_arr[i]
        result = None
        print("Regex_expression: [" + regexp_expression+"]")
        print("compare string: [" + compare_string+"]")
        if (re.match(regexp_expression, compare_string)):
            print("result true")
            result = True
        else:
            print("result = false")
            result = False
        print("return result")
        return result