Without any thought of optimisation, I would start by assuming that I have a list of names and their frequencies, then create a dictionary mapping each prefix to the set of names with that prefix, and then turn each set into a list of only the top 5 names by frequency.
Using a list of boys' names only, obtained from here, I massaged it into a text file where each line holds an integer frequency of occurrence, some spaces, and then the name:
    8427 OLIVER
    7031 JACK
    6862 HARRY
    5478 ALFIE
    5410 CHARLIE
    5307 THOMAS
    5256 WILLIAM
    5217 JOSHUA
    4542 GEORGE
    4351 JAMES
    4330 DANIEL
    4308 JACOB
    ...
The following code builds a dictionary:
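    from collections import defaultdict

    MAX_SUGGEST = 5

    def gen_autosuggest(name_freq_file_name):
        # Read "frequency name" lines, keeping the first frequency seen per name
        with open(name_freq_file_name) as f:
            name2freq = {}
            for nf in f:
                freq, name = nf.split()
                if name not in name2freq:
                    name2freq[name] = int(freq)
        pre2suggest = defaultdict(list)
        for name, freq in sorted(name2freq.items(), key=lambda x: -x[1]):
            # Names arrive in decreasing order of frequency, so each
            # prefix's list is already sorted most popular first
            for i in range(1, len(name) + 1):
                pre2suggest[name[:i]].append((name, freq))
        # Keep only the top MAX_SUGGEST suggestions for each prefix
        return {pre: namefs[:MAX_SUGGEST]
                for pre, namefs in pre2suggest.items()}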
If you give the dict (pre2suggest below) a prefix, it returns your suggestions (along with their frequencies in this case, but those can be discarded if necessary):
    >>> len(pre2suggest)
    15303
    >>> pre2suggest['OL']
    [('OLIVER', 8427), ('OLLIE', 1130), ('OLLY', 556), ('OLIVIER', 175), ('OLIWIER', 103)]
    >>> pre2suggest['OLI']
    [('OLIVER', 8427), ('OLIVIER', 175), ('OLIWIER', 103), ('OLI', 23), ('OLIVER-JAMES', 16)]
    >>>
Look, no tries :-)
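If only the names are wanted, the frequencies can be stripped at lookup time. The following is a minimal sketch of one way to do that; the helper name names_only and the small sample dict are made up for illustration, with the sample values copied from the session above.

```python
def names_only(pre2suggest, prefix):
    """Return the suggested names for `prefix`, without their frequencies."""
    # .get() avoids a KeyError for a prefix that no name starts with
    return [name for name, _freq in pre2suggest.get(prefix, [])]

# Tiny stand-in for the real dict (hypothetical, for illustration only)
sample = {'OL': [('OLIVER', 8427), ('OLLIE', 1130), ('OLLY', 556)]}

print(names_only(sample, 'OL'))   # ['OLIVER', 'OLLIE', 'OLLY']
print(names_only(sample, 'ZZ'))   # []
```

Returning an empty list for an unknown prefix is a design choice here; raising an error would work just as well if the caller prefers it.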
Time killer
If it takes too long to run, you can pre-compute the dict and save it to a file, and then load the pre-computed values when needed, using the pickle module:
    >>> import pickle
    >>>
    >>> savename = 'pre2suggest.pcl'
    >>> with open(savename, 'wb') as f:
    ...     pickle.dump(pre2suggest, f)
    ...
    >>>
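Loading the saved values back is the mirror image of saving them. The following is a rough sketch, not part of the original session: the function name load_autosuggest is made up, and it round-trips a small stand-in dict through a temporary file so that it runs on its own.

```python
import os
import pickle
import tempfile

def load_autosuggest(savename):
    """Load a previously pickled prefix -> suggestions dict."""
    # Binary mode, matching the 'wb' mode used when saving
    with open(savename, 'rb') as f:
        return pickle.load(f)

# Round trip with a small stand-in dict (hypothetical sample data)
sample = {'OL': [('OLIVER', 8427), ('OLLIE', 1130)]}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'pre2suggest.pcl')
    with open(path, 'wb') as f:
        pickle.dump(sample, f)
    restored = load_autosuggest(path)

print(restored == sample)   # True
```

Only unpickle files you created yourself or otherwise trust, since loading a pickle can execute arbitrary code.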