How to increase the performance of this recursive function?

I am trying to write a function that will look for str for substr, given the various possibilities of writing strange letters, such as æ, ø, å in Danish. For example, you can search for “Ålborg,” and the function will return true if there is, say, “Aalborg” on the page.

The function below works, but the performance is unbearable. What would you recommend to increase productivity?

def danish_tolerating_search(substr, str): '''Figure out if substr is in str, taking into account possible deviations in writing letters æ, ø, å. æ <-> ae a ea ø <-> oe o å <-> aa ao ''' # normalize input substr = substr.lower().replace('aa',u'å') str = str.lower() # normalized recursive search # TODO fix perfomance def s(substr, str): if str.find(substr) >= 0: return True if substr.find(u'æ') >= 0: if s(substr.replace(u'æ','ae', 1), str): return True elif s(substr.replace(u'æ', 'a', 1), str): return True elif s(substr.replace(u'æ','ea', 1), str): return True if str.find(u'æ') >= 0: if s(substr, str.replace(u'æ','ae', 1)): return True elif s(substr, str.replace(u'æ', 'a', 1)): return True elif s(substr, str.replace(u'æ','ea', 1)): return True if substr.find(u'ø') >= 0: if s(substr.replace(u'ø','oe', 1), str): return True elif s(substr.replace(u'ø', 'o', 1), str): return True if str.find(u'ø') >= 0: if s(substr, str.replace(u'ø','oe', 1)): return True elif s(substr, str.replace(u'ø', 'o', 1)): return True if substr.find(u'å') >= 0: if s(substr.replace(u'å','aa', 1), str): return True elif s(substr.replace(u'å', 'a', 1), str): return True elif s(substr.replace(u'å', 'o', 1), str): return True if str.find(u'å') >= 0: if s(substr, str.replace(u'å','aa', 1)): return True elif s(substr, str.replace(u'å', 'a', 1)): return True elif s(substr, str.replace(u'å', 'o', 1)): return True return False return s(substr, str) 
+4
source share
3 answers

I think you should completely eliminate recursion. Instead of doing all this find and replace , you could, for example, select the "normal form" of your input lines, convert them accordingly (that is, replace these "ambiguous" characters) and make simple

 return substring in string_ 

Note also that you do not need to bind find and replace together, the latter is enough. If the search bar is not found, the replacement will simply not replace anything.

+3
source

Try

 #!/usr/bin/env python # -*- coding: utf-8 -*- import re def danish_tolerating_search(search, content): search = search.lower() content = content.lower() variants = { u'a': u'[aæå]', u'o': u'[oøå]', u'ae': u'(?:ae|æ)', u'ea': u'(?:ea|æ)', u'aa': u'(?:aa|å)', u'oe': u'(?:oe|ø)', u'\\å': u'(?:[oå]|aa?)', u'\\ø': u'(?:ø|oe?)', u'\\æ': u'(?:æ|ae?|ea)', } search = re.escape(search) search = re.sub(ur'[ae]a|[ao]e?|\\[åøæ]', lambda m: variants[m.group(0)], search) return re.search(search, content) is not None 

I did not test its performance because OP did not release. I just assume that the regex engine is better optimized than calling OP s() recursively and doing a lot of .find and .replace .

Here, the key letters in the search string are replaced with possible equivalent classes in terms of a regular expression, for example. Ålborg becomes (?:[oå]|aa?)lb[oøå]rg . This regular expression should include all possible variations equivalent to Ålborg (ålbørg "or" ålbårg "or" aalborg "or" aalbørg "or" aalbårg "or" alborg "or" albørg "or" albårg ", as indicated in comment 101100 ) Then the regular expression is simply distorted in the context.

+3
source

This is a classic parser example. Read about things like lex and yacc, you won’t need all their functions, but the principles still apply.

After that, use the python re module to match the corresponding regular expressions. If you need additional functionality, use the pyparsing libraries.

 def danish_tolerating_search(substr, str): '''Figure out if substr is in str, taking into account possible deviations in writing letters æ, ø, å. æ <-> ae a ea ø <-> oe o å <-> aa ao for all of these combinations replace with appropriate regex as in example ''' substring = substring.lower().replace('aa', '[ae]{1,2}') string = string.lower() re.search(substring, string) 
+1
source

All Articles