How many ordinary English words of 4 letters or more can you make from the letters of a given word (each letter can be used only once)

On the back of the calendar block, I found the following riddle:

How many ordinary English words of 4 letters or more can you make from the letters of the word "textbook" (each letter can be used only once)?

The first solution I came up with was:

    from itertools import permutations

    with open('/usr/share/dict/words') as f:
        words = f.readlines()
    words = map(lambda x: x.strip(), words)

    given_word = 'textbook'
    found_words = []
    ps = (permutations(given_word, i) for i in range(4, len(given_word)+1))
    for p in ps:
        for word in map(''.join, p):
            if word in words and word != given_word:
                found_words.append(word)
    print set(found_words)

This gives the result set(['tote', 'oboe', 'text', 'boot', 'took', 'toot', 'book', 'toke', 'betook']), but on my machine it took more than 7 minutes.

My next iteration:

    with open('/usr/share/dict/words') as f:
        words = f.readlines()
    words = map(lambda x: x.strip(), words)

    given_word = 'textbook'
    print [word for word in words
           if len(word) >= 4
           and sorted(filter(lambda letter: letter in word, given_word)) == sorted(word)
           and word != given_word]

This returns almost immediately, but gives only ['book', 'oboe', 'text', 'toot'] as the answer.
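To see why the list-comprehension version loses words like 'took': the filter keeps every occurrence of each letter of given_word that appears in the candidate, so repeated letters inflate the left-hand side of the comparison. A minimal Python 3 demonstration (my own, not from the original post):

```python
given_word = 'textbook'
word = 'took'

# Letters of given_word that also occur in word: both 't's, both 'o's and the 'k'.
kept = [letter for letter in given_word if letter in word]
print(sorted(kept))   # ['k', 'o', 'o', 't', 't']
print(sorted(word))   # ['k', 'o', 'o', 't']
# The two sorted lists differ, so 'took' is (wrongly) filtered out.
```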

What is the fastest, correct and most pythonic solution to this problem?

(Edit: added my earlier permutation solution and its different output.)

6 answers

I thought I would share this slightly interesting trick. It requires more code than the other solutions and is not really "pythonic", but it should be quite fast, judging by the timings the others report.

We do a little preprocessing to speed up the lookups. The basic approach is this: assign a prime number to each letter of the alphabet, e.g. A = 2, B = 3, and so on. Then compute a hash for every word in the dictionary, which is simply the product of the prime representations of each character in the word. Finally, store each word in a dictionary indexed by that hash.
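As a toy illustration of the idea (using a hypothetical three-letter alphabet with A = 2, B = 3, C = 5), anagrams end up with the same product:

```python
# Toy prime table for a hypothetical three-letter alphabet (A=2, B=3, C=5).
PRIMES = {'A': 2, 'B': 3, 'C': 5}

def compute_hash(word):
    # Product of the primes assigned to each letter. Multiplication is
    # commutative, so anagrams hash to the same value.
    h = 1
    for char in word:
        h *= PRIMES[char]
    return h

print(compute_hash('ABC'))  # 30
print(compute_hash('CAB'))  # 30 -- same multiset of letters, same hash
print(compute_hash('AB'))   # 6  -- different multiset, different hash
```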

Now, if we want to know which words are equivalent to, say, "textbook", we only need to compute the same hash for that word and look it up in our dictionary. Usually (say, in C++) we would have to worry about overflow, but in Python it is even simpler: every word stored under the same hash contains exactly the same characters.

Here is the code, with a small optimization: in our case we only need to worry about characters that also appear in the given word, which means we can get by with a much smaller prime table than otherwise (the obvious optimization would be to assign values only to the characters that actually appear in the word; it was fast enough as is, so I didn't bother, and this way we can preprocess only once and reuse it for several words). The prime-hash algorithm is useful quite often, so it's worth having in your toolbox ;)

    from collections import defaultdict

    PRIMES = list(gen_primes(256))  # some arbitrary prime generator

    def get_dict(path):
        res = defaultdict(list)
        with open(path, "r") as file:
            for line in file.readlines():
                word = line.strip().upper()
                hash = compute_hash(word)
                res[hash].append(word)
        return res

    def compute_hash(word):
        hash = 1
        for char in word:
            try:
                hash *= PRIMES[ord(char) - ord(' ')]
            except IndexError:
                # contains some character out of range - always 0 for our purposes
                return 0
        return hash

    def get_result(path, given_word):
        words = get_dict(path)
        given_word = given_word.upper()
        result = set()
        powerset = lambda x: powerset(x[1:]) + [x[:1] + y for y in powerset(x[1:])] if x else [x]
        for word in (word for word in powerset(given_word) if len(word) >= 4):
            hash = compute_hash(word)
            for equiv in words[hash]:
                result.add(equiv)
        return result

    if __name__ == '__main__':
        path = "dict.txt"
        given_word = "textbook"
        result = get_result(path, given_word)
        print(result)

Running it on my Ubuntu word list (98k words) is pretty fast, but not what I would call pythonic, since it is basically a port of a C++ algorithm. Useful if you want to compare many words this way.


How about this?

    from itertools import permutations, chain

    with open('/usr/share/dict/words') as fp:
        words = set(fp.read().split())

    given_word = 'textbook'
    perms = (permutations(given_word, i) for i in range(4, len(given_word)+1))
    pwords = (''.join(p) for p in chain(*perms))
    matches = words.intersection(pwords)
    print matches

which gives

    >>> print matches
    set(['textbook', 'keto', 'obex', 'tote', 'oboe', 'text', 'boot', 'toto', 'took', 'koto', 'bott', 'tobe', 'boke', 'toot', 'book', 'bote', 'otto', 'toke', 'toko', 'oket'])
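Note that since the permutation lengths run up to len(given_word), the intersection also contains 'textbook' itself. A self-contained Python 3 miniature of the same approach, using a tiny in-memory word list instead of /usr/share/dict/words, and discarding the puzzle word afterwards:

```python
from itertools import permutations, chain

# Tiny stand-in word list for illustration (not a real dictionary file).
words = {'book', 'took', 'text', 'textbook', 'python'}
given_word = 'textbook'

perms = (permutations(given_word, i) for i in range(4, len(given_word) + 1))
pwords = (''.join(p) for p in chain(*perms))
matches = words.intersection(pwords)
matches.discard(given_word)  # drop the puzzle word itself if unwanted
print(sorted(matches))  # ['book', 'text', 'took']
```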

There is an itertools.permutations generator with which you can collect all permutations of a sequence with a given length. This makes it easier:

    from itertools import permutations

    GIVEN_WORD = 'textbook'
    with open('/usr/share/dict/words', 'r') as f:
        words = [s.strip() for s in f.readlines()]

    print len(filter(lambda x: ''.join(x) in words, permutations(GIVEN_WORD, 4)))

Edit #1: Oh! It says "4 or more" ;) Forget what I said!

Edit #2: This is the second version I came up with:

    LETTERS = set('textbook')
    with open('/usr/share/dict/words') as f:
        WORDS = filter(lambda x: len(x) >= 4, [l.strip() for l in f])

    matching = filter(lambda x: set(x).issubset(LETTERS)
                      and all([x.count(c) == 1 for c in x]), WORDS)
    print len(matching)
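One caveat worth noting (my observation, not part of the answer): the x.count(c) == 1 test rejects any candidate that repeats a letter, even though 'textbook' itself has two t's and two o's, so valid words such as 'toot' are dropped. A quick Python 3 check:

```python
LETTERS = set('textbook')
word = 'toot'

# Same test as in the answer above, applied to a single candidate.
ok = set(word).issubset(LETTERS) and all(word.count(c) == 1 for c in word)
print(ok)  # False -- 'toot' repeats 't' and 'o', so the check rejects it
```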

Create the entire power set, then check if the dictionary word is in the set (the order of the letters doesn't matter):

    powerset = lambda x: powerset(x[1:]) + [x[:1] + y for y in powerset(x[1:])] if x else [x]
    pw = map(lambda x: sorted(x), powerset(given_word))
    filter(lambda x: sorted(x) in pw, words)
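For clarity, here is what the recursive powerset lambda produces on a tiny input (Python 3):

```python
# Recursively builds every subsequence: the subsets without the first
# character, plus the first character prepended to each of those subsets.
powerset = lambda x: powerset(x[1:]) + [x[:1] + y for y in powerset(x[1:])] if x else [x]

print(powerset('ab'))  # ['', 'b', 'a', 'ab'] -- all 2**2 subsequences
```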

The following simply checks each word in the dictionary to see whether it has the appropriate length, and if so, whether it is a "permutation" of 'textbook'. I borrowed the permutation check from "Checking if two strings are permutations of each other in Python", but changed it slightly.

    given_word = 'textbook'

    with open('/usr/share/dict/words', 'r') as f:
        words = [s.strip() for s in f.readlines()]

    matches = []
    for word in words:
        if word != given_word and 4 <= len(word) <= len(given_word):
            if all(word.count(char) <= given_word.count(char) for char in word):
                matches.append(word)

    print sorted(matches)

It ends almost immediately and gives the correct result.
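The same multiset-containment test can also be written with collections.Counter; the following is a Python 3 sketch of an equivalent check, not the answer's original code:

```python
from collections import Counter

given_word = 'textbook'
given_counts = Counter(given_word)

def is_match(word):
    # word must be 4..len(given_word) letters, differ from given_word, and
    # use each letter at most as often as it appears in given_word.
    if word == given_word or not (4 <= len(word) <= len(given_word)):
        return False
    counts = Counter(word)
    return all(counts[c] <= given_counts[c] for c in counts)

print([w for w in ['book', 'toot', 'beet', 'textbook'] if is_match(w)])
# ['book', 'toot'] -- 'beet' needs two e's, 'textbook' is the word itself
```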


The permutations become very large for longer words. Try, for example, counter-revolution.

I would filter the dictionary for words of length 4 to len(word) (8 for "textbook"). Then I would prefilter with a regular expression, along the lines of "oboe".matches("[textbook]+").
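A rough Python 3 sketch of that length-and-regex prefilter (the tiny candidate list is my own illustration; letter counts still need the separate check described below):

```python
import re

# Keep words of 4..8 letters whose characters all come from "textbook".
# The character class [textbook] matches any of the letters t, e, x, b, o, k.
pattern = re.compile(r'[textbook]+')

candidates = ['oboe', 'book', 'python', 'toot']
kept = [w for w in candidates if 4 <= len(w) <= 8 and pattern.fullmatch(w)]
print(kept)  # ['oboe', 'book', 'toot'] -- 'python' uses letters outside the set
```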

For the remaining words, I would sort them and compare them against the sorted version of your word ("beoo" vs. "bekoottx"), advancing to the next index on each matching character, to detect a wrong number of characters:

    ("beoo", "bekoottx")
    ("eoo", "ekoottx")
    ("oo", "koottx")
    ("oo", "oottx")
    ("o", "ottx")
    ("", "ttx") => matched

    ("bbo", "bekoottx")
    ("bo", "ekoottx") => mismatch

Since I don't speak python, I leave the implementation as an exercise for the audience.
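As one possible take on that exercise, here is a Python 3 sketch of the sorted two-pointer comparison described above (my code, under my reading of the answer):

```python
def is_sub_multiset(word, given_word):
    # Walk the sorted given word; advance the candidate pointer only when
    # the current characters match. Every candidate letter must find its
    # own copy in the given word, so repeated letters are counted correctly.
    a, b = sorted(word), sorted(given_word)
    i = 0
    for ch in b:
        if i < len(a) and a[i] == ch:
            i += 1
    return i == len(a)

print(is_sub_multiset('oboe', 'textbook'))  # True  ("beoo" fits "bekoottx")
print(is_sub_multiset('bobo', 'textbook'))  # False (needs two b's, only one available)
```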

