Scrabble word with maximum score.

I was asked the following question:

You are given a list of characters, a score associated with each character, and a dictionary of valid words (for example, a standard English dictionary). You need to form a word from the list of characters such that the score is maximized and the word is valid.

I thought of a solution involving a trie built from the dictionary and backtracking over the available characters, but I could not formulate it correctly. Does anyone know the right approach, or has anyone come up with one?

+5
6 answers

First, go through the letters and count how many times each character of the English alphabet occurs. Store this in, say, an array of size 26, where the first cell corresponds to a, the second to b, and so on. Call this array cnt. Now go through all the words in the dictionary and build a similar array of size 26 for each word. For each cell in that array, check whether cnt contains at least as many occurrences. If it does for every cell, you can form the word; otherwise you cannot. Whenever a word can be formed, compute its score and keep the maximum in an auxiliary variable.

This approach has linear complexity, which is also the best asymptotic complexity you can achieve (after all, the input itself is linear in size).
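A minimal sketch of this counting approach (the function name, the toy a=1..z=26 scores, and the tiny word list are my own illustrative assumptions):

```python
from collections import Counter

def best_word(letters, words, score):
    """Return the highest-scoring word formable from `letters`, or None."""
    cnt = Counter(letters)
    best, winner = -1, None
    for w in words:
        # the word is formable iff no letter is needed more often
        # than it is available in cnt
        if all(cnt[c] >= k for c, k in Counter(w).items()):
            s = sum(score[c] for c in w)
            if s > best:
                best, winner = s, w
    return winner

# toy scores: a=1, b=2, ..., z=26
score = {c: i + 1 for i, c in enumerate("abcdefghijklmnopqrstuvwxyz")}
print(best_word("aetbzz", ["tea", "eat", "buzz", "be"], score))  # → tea
```

Counting into a `Counter` instead of a fixed array of 26 cells keeps the sketch short; the per-word check is the same "at least as many occurrences" test described above.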

+3

Inspired by the other answers (at first I thought the approach was O(n!), so I dismissed it). It needs O(number of words) preprocessing and then O(2^(characters in the query)) per query. That is 2^8 = 256 for Scrabble, so it is probably what the interviewer expects.

The first observation is that the order of characters in the query or in a word does not matter, so you want to preprocess your word list into a set of character bags. The way to do this is to "sort" each word, so that both "bac" and "cab" become "abc".

Now take your query and iterate over all possible answers: every keep/discard choice for each letter. This is easiest to see in binary: 1111 keeps every letter, 1110 discards the last letter, and so on.

Then check whether each possibility exists in your dictionary (a hash map, for simplicity) and return the one with the maximum score.

```python
import nltk
from string import ascii_lowercase
from itertools import product

# toy scores: a=0, b=1, ..., z=25
scores = {c: s for s, c in enumerate(ascii_lowercase)}
sanitize = lambda w: "".join(c for c in w.lower() if c in scores)
anagram = lambda w: "".join(sorted(w))
# map each sorted anagram to a word that produces it
anagrams = {anagram(sanitize(w)): w for w in nltk.corpus.words.words()}

while True:
    query = input("What do you have? ")
    if not query:
        break
    # make it look like our preprocessed word list
    query = anagram(sanitize(query))
    results = {}
    # all keep/discard variants for our query
    for mask in product((True, False), repeat=len(query)):
        # get the variant given the mask
        masked = "".join(c for i, c in enumerate(query) if mask[i])
        # check whether it is valid
        if masked in anagrams:
            # score it; getting the word back is also nice
            results[sum(scores[c] for c in masked)] = anagrams[masked]
    if results:
        print(*max(results.items()))
    else:
        print("no valid word")
```
+2

Create a trie keyed on the sorted anagram of each dictionary word. This is a one-time cost.

By sorted anagram, I mean: if the word is eat, you represent it as aet. The word tea is also represented as aet; bubble is represented as bbbelu, and so on.

Since this is Scrabble, assuming you have 8 tiles (say you also want to use tiles from the board), you will need to test at most 2^8 possibilities.

For any subset of the 8 tiles, you sort the tiles and look them up in the anagram trie.

There are at most 2^8 such subsets, and this can potentially be optimized (in the case of repeated tiles) by generating the subsets more cleverly.
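A sketch of this subset enumeration, assuming (my simplification) that the anagram table is a plain dict rather than a trie, and that a score map is passed in by the caller:

```python
from itertools import combinations

def best_word(tiles, anagrams, score):
    # anagrams: sorted-letter string -> original word, built once up front
    tiles = sorted(tiles)
    best, winner = -1, None
    # every non-empty subset of the (at most 8) tiles
    for r in range(1, len(tiles) + 1):
        for combo in set(combinations(tiles, r)):  # set() dedups repeated tiles
            key = "".join(combo)  # combinations of sorted tiles stay sorted
            if key in anagrams:
                word = anagrams[key]
                s = sum(score[c] for c in word)
                if s > best:
                    best, winner = s, word
    return winner

anagrams = {"aet": "tea", "be": "be"}  # tiny illustrative table
score = {c: 1 for c in "abcdefghijklmnopqrstuvwxyz"}
print(best_word("taez", anagrams, score))  # → tea
```

Because the tiles are sorted before enumeration, every subset comes out already in sorted-anagram form and can be looked up directly.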

If this is a more general problem, where 2^{number of tiles} can be much larger than the total number of anagram words in the dictionary, it would be better to use letter frequencies, as in Ivaylo's answer, and the lookups could potentially be optimized using multidimensional range queries (26 dimensions, in this case!).

Sorry, this may not help you as things stand (I suppose you are doing an exercise and have constraints), but I hope it helps future readers who do not have those restrictions.

+1

Below is a brute-force approach in Python using an English dictionary containing 58,109 words. It is actually quite fast, taking about 0.3 seconds per run.

```python
from random import shuffle
from string import ascii_lowercase

def getValue(word):
    return sum(key[c] for c in word)

if __name__ == '__main__':
    values = list(range(26))
    shuffle(values)  # assign a random score to each letter
    key = dict(zip(ascii_lowercase, values))
    with open("/Users/james_gaddis/PycharmProjects/Unpack Sentance/hard/words.txt") as f:
        wordDict = f.read().splitlines()
    valued = ((getValue(w), w) for w in wordDict)
    print(max(valued))
```

Here is the dictionary I used, with one entry removed for convenience.

0

Can we assume that the dictionary is fixed, that the scores are fixed, and that only the available letters change (as in Scrabble)? Otherwise, I don't think you can avoid looking at every word in the dictionary, as suggested earlier.

So, suppose we are in that setting. Choose an order < on the letter values, for example Q > Z > J > X > K > ... > A > E > I > ... > U.

Replace the dictionary D with a dictionary D' made up of the anagrams of the words of D, with letters sorted in the order above (for example, the word buzz maps to zzbu), deleting duplicates, as well as words of length > 8 if you have at most 8 letters.

Then build a trie over the words of D', where the child nodes are ordered by the value of their letters (so the first child of the root is Q, the second Z, ..., and the last U). At each trie node, also store the maximum value over all words passing through that node.

Given the set of available letters, you can then explore the trie depth-first, moving from left to right and keeping the current best value in memory. Only explore branches whose stored maximum value is greater than the current best. This way you will explore very few branches after the first one: for example, if you have a Z in your rack, any branch starting with a one-point letter such as A can be discarded, because it can yield at most 8 x 1 points, which is less than Z alone. I bet you will only explore a few branches each time.
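A minimal sketch of this pruned trie search. The tile values are the standard English Scrabble values; the tiny word list, the `Node` class, and the function names are my own illustrative assumptions:

```python
from collections import Counter

# standard English Scrabble tile values
VALUES = dict(zip("abcdefghijklmnopqrstuvwxyz",
                  [1, 3, 3, 2, 1, 4, 2, 4, 1, 8, 5, 1, 3,
                   1, 1, 3, 10, 1, 1, 1, 1, 4, 4, 8, 4, 10]))

def word_value(word):
    return sum(VALUES[c] for c in word)

class Node:
    def __init__(self):
        self.children = {}   # letter -> Node
        self.best = 0        # max value of any word passing through this node
        self.word = None     # a word ending at this node, if any

def build_trie(words):
    root = Node()
    for w in words:
        # insert the word's letters sorted by descending value (the D' form),
        # breaking ties alphabetically so anagrams share one path
        key = sorted(w, key=lambda c: (-VALUES[c], c))
        v, node = word_value(w), root
        node.best = max(node.best, v)
        for c in key:
            node = node.children.setdefault(c, Node())
            node.best = max(node.best, v)
        node.word = w
    return root

def best_word(root, rack):
    avail = Counter(rack)
    best = [0, None]
    def dfs(node):
        if node.word is not None and word_value(node.word) > best[0]:
            best[0], best[1] = word_value(node.word), node.word
        # visit children from highest-valued letter to lowest
        for c, child in sorted(node.children.items(),
                               key=lambda kv: (-VALUES[kv[0]], kv[0])):
            if child.best <= best[0]:
                continue  # prune: nothing below here can beat the current best
            if avail[c] > 0:
                avail[c] -= 1
                dfs(child)
                avail[c] += 1
    dfs(root)
    return best[1]

trie = build_trie(["buzz", "tea", "eat", "bubble", "be"])
print(best_word(trie, "zzbueta"))  # → buzz
```

Since the children are visited in descending letter value and each node knows the best word below it, whole subtrees are skipped as soon as a high-value word like buzz has been found.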

0

If the number of dictionary entries is relatively small (up to a few million), you can use brute force. As a preprocessing step, create a 32-bit mask for each word: set one bit if the letter a/b/c/.../z is used; for the six most common English letters, set an additional bit if the letter is used at least twice.

Create a similar bitmask for the letters you have. Then scan the dictionary for words where every bit the word needs is also set in the bitmask of the available letters. This reduces the problem to words for which you have all the required letters once, and the six most common letters twice where they are needed twice. You still have to check whether the word can actually be formed: for the word bubble, the first test only says that you have the letters b, u, l, e, but not necessarily three b's.
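A sketch of this bitmask prefilter. The choice of "etaoin" as the six most common letters and the helper names are my assumptions, and a real implementation would precompute the word masks once rather than rebuilding them per scan:

```python
from collections import Counter

COMMON = "etaoin"  # six most common English letters (get a doubled bit)

def bitmask(word):
    m = 0
    for c, k in Counter(word).items():
        m |= 1 << (ord(c) - ord('a'))
        # extra bit when a common letter is used at least twice
        if k >= 2 and c in COMMON:
            m |= 1 << (26 + COMMON.index(c))
    return m

def can_form(word, rack_cnt):
    return all(rack_cnt[c] >= k for c, k in Counter(word).items())

def candidates(words, rack):
    rack_mask = bitmask(rack)
    rack_cnt = Counter(rack)
    for w in words:
        # cheap filter: every bit the word needs must be set for the rack
        if bitmask(w) & ~rack_mask == 0:
            # exact multiset check is still needed, e.g. the third 'b' in "bubble"
            if can_form(w, rack_cnt):
                yield w

words = ["bubble", "blue", "tea", "buzz"]
print(list(candidates(words, "bbulue")))  # → ['blue']
```

Here "bubble" survives the bitmask filter (the rack has b, u, l, e) but fails the exact count check, exactly the case described above.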

In addition, if you sort the word list by point value before performing the check, the first hit is the best one. This has another advantage: you can count the points available in your rack and skip words worth more than that entirely. For example, bubble is worth 12 points; if you only have 11 points, there is no need to check that word at all (keep a small table with the index of the first word at each point value).

To handle anagrams better: store each distinct (bitmask, point value) pair only once in the table (so there would be separate entries for bubble and blue, because they have different point values, but only one for team and mate). Then store all the words, possibly more than one, for each bitmask and check all of them. This should reduce the number of bitmasks to check.

0
