Perhaps you are looking for something like:
    #!/usr/bin/env python3
    def match_prefix(pfx, seq):
        '''Return the subset of seq whose items start with pfx.'''
        results = list()
        for i in seq:
            if i.startswith(pfx):
                results.append(i)
        return results

    def extract_prefixes(lngth, seq):
        '''Return all distinct prefixes of length lngth + 1 found in seq
           (the + 1 compensates for the zero-based range in the caller).'''
        results = dict()
        lngth += 1
        for i in seq:
            if i[0:lngth] not in results:
                results[i[0:lngth]] = True
        return sorted(results.keys())

    def gen_prefix_indexed_list(depth, seq):
        '''Return a dictionary of all words matching each prefix,
           up to the given depth, keyed on those prefixes.'''
        results = dict()
        for each in range(depth):
            for prefix in extract_prefixes(each, seq):
                results[prefix] = match_prefix(prefix, seq)
        return results

    if __name__ == '__main__':
        words = '''Apple Ape Arc Abraid Bridge Braide Bray Boolean'''.split()
        test = gen_prefix_indexed_list(2, words)
        for each in sorted(test.keys()):
            print("%s:\t\t" % each, end='')
            print(' '.join(test[each]))
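For reference, running this under Python 3 should print something along these lines:

    A:		Apple Ape Arc Abraid
    Ab:		Abraid
    Ap:		Apple Ape
    Ar:		Arc
    B:		Bridge Braide Bray Boolean
    Bo:		Boolean
    Br:		Bridge Braide Bray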
That is, you want to generate all the prefixes occurring in the list of words, of every length from one up to some number you specify (2 in this example), and then build an index mapping each of those prefixes to all the words that match it.
I am sure there are more elegant ways to do this. For a quick and easy-to-explain approach, I just built it up from a simple functional decomposition of the apparent specification. If the values of the final result are lists, each corresponding to a given prefix, then we start with a function to filter such matches out of our input. If the keys of the final result are all the prefixes of length 1 through some N appearing in our input, then we need a function to extract those. The specification itself is then just a simple nested loop around the two.
Of course, that nested loop can be a problem. Such things are usually associated with O(n^2) efficiency. As written, this iterates over the original list C * N * N times (C is a small constant representing the number of prefix lengths 1, 2, etc., and N is the length of the list).
If this decomposition gives the desired semantics, we can then look at improving the efficiency. The obvious approach would be to generate the dictionary keys lazily while iterating over the list just once: for each word, for each prefix length, generate a key, append the word to the list/value stored at that key, and move on to the next word.
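Here is a minimal sketch of that single-pass version (gen_prefix_indexed_lazy is my own name for it, not part of the code above):

    from collections import defaultdict

    def gen_prefix_indexed_lazy(depth, seq):
        '''Single pass over seq: for each word, for each prefix
           length 1..depth, append the word to the list stored
           under that prefix.'''
        results = defaultdict(list)
        for word in seq:
            for length in range(1, depth + 1):
                if len(word) >= length:
                    results[word[:length]].append(word)
        return dict(results)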
There is still a nested loop ... but the inner loop is short, just one iteration per key/prefix length. This alternative design also has the advantage of letting us take the words from any iterable, not just an in-memory list. So we could iterate over the lines of a file, the results of a database query, etc. --- without incurring the memory overhead of holding the entire source list of words in memory.
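For instance, assuming the gen_prefix_indexed_lazy sketch above, feeding it words streamed from a hypothetical words.txt (one word per line) might look like this:

    def words_from_file(path):
        '''Yield one stripped word per line without ever loading
           the whole file into memory.'''
        with open(path) as f:
            for line in f:
                word = line.strip()
                if word:
                    yield word

    # index = gen_prefix_indexed_lazy(2, words_from_file('words.txt'))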
Of course, we are still holding the dictionary in memory. But we can change that too, separating the logic from both the input and the storage. As we append each input word under its various prefix keys, we need not care whether those are lists in a dictionary, lines in a set of files, or values retrieved from (and stored back to) a DBM or some other key/value store (for example, some flavor of CouchDB or another "clustered/noSQL" database).
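One way to sketch that separation, assuming a hypothetical store object exposing an add(key, word) method (my invention here, not a standard interface):

    def index_into(depth, seq, store):
        '''Same indexing logic as before, but delegates storage to
           any object exposing add(key, word).'''
        for word in seq:
            for length in range(1, depth + 1):
                if len(word) >= length:
                    store.add(word[:length], word)

    class DictStore:
        '''Trivial in-memory backend; a DBM- or CouchDB-backed class
           would expose the same add() interface.'''
        def __init__(self):
            self.data = {}
        def add(self, key, value):
            self.data.setdefault(key, []).append(value)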
The implementation of this is left as an exercise for the reader.