Scrabble word finder: building a trie, storing a trie using trie?

What am I trying to do:

  • Create a mobile web application where the user can get help finding words to play when playing scrabble.
  • Users get word sentences by entering any number of letters and 0 or more wildcards

How am I trying to do this:

  • Using a MySQL database with a dictionary containing more than 400 thousand words
  • Using ASP.NET with C # as a server-side programming language
  • Using HTML5, CSS and Javascript

My current plan:

  • Creating Trie with all the words from the database, so that I can quickly and accurately search for words depending on the input of user letters / substitutions

Having a plan doesn’t work if you cannot complete it, here is what I need to help:

  • How do I create Trie from a database? (UPDATE: I want to generate Trie using words that are already in my database, after that I will no longer use the database to match words)
  • How to save Trie for quick and easy access? (UPDATE: I can delete my database)
  • How to use C # to search for words using Trie based on letters and wildcards?

Finally:
Any help is much appreciated, I'm still new to C # and MySQL, so please be careful.

Thank you very much!

+7
source share
1 answer

First of all, consider the limitations of the problem. You want to keep the word list for the game in a data structure that effectively supports the anagram problem. That is, given the "rack" of n letters, all words are n-or-less letters in the list of words that can be made from this rack. the list of words will be about 400 thousand words, and probably about one to ten megabytes of string data when uncompressed.

A trie is the classic data structure used to solve this problem, because it combines both memory efficiency and search efficiency. With a list of words of about 400 thousand words of reasonable length, you should be able to save the trie in mind. (Unlike the b-tree solution, where you store most of the tree on disk, because it is too large to fit into memory immediately).

Basically this is nothing more than a 26-ary tree (assuming you use the Latin alphabet), where each node has a letter and one more bit on each node that says that this is the end of the word.

So let's draw the data structure:

class TrieNode { char Letter; bool IsEndOfWord; List<TrieNode> children; } 

This, of course, is just a sketch; you probably want to make them your own accessories and property constructors and much more. Also, perhaps a flat list is not the best data structure; maybe some kind of dictionary is better. My advice is to run it first and then measure its performance, and if that is unacceptable, experiment with making changes to improve its performance.

You can start with an empty trie:

 TrieNode root = new TrieNode('^', false, new List<TrieNode>()); 

That is, it is the “root” trie node that represents the beginning of the word.

How do you add the word "AA", the first word in the Scrabble dictionary? First make a node for the first letter:

 root.Children.Add('A', false, new List<TrieNode>()); 

Ok now our twi

 ^ | A 

Now add node for the second letter:

 root.Children[0].Children.Add(new trieNode('A', true, new List<TrieNode>())); 

Now our twi

 ^ | A | A$ -- we notate the end of word flag with $ 

Great. Now suppose we want to add AB. We already have a node for "A", so add a "B $" node to it:

 root.Children[0].Children.Add(new trieNode('B', true, new List<TrieNode>()); 

and now we have

  ^ | A / \ A$ B$ 

Continue like this. Of course, instead of writing "root.Children [0] ..." you will write a loop that looks for trie to see if the node you want exists, and if not, create it.

To save the trick to disk - to be honest, I would just save the word list as a regular text file and rebuild three when you need to. This should not take more than 30 seconds or so, and then you can reuse the trie in memory. If you want to save trie in some format that looks more like trie, it shouldn't be hard to find a serialization format.

To search for a trie to match a rack, the idea is to examine each part of the trie, but cut out areas where the rack cannot match. If you do not have an “A” on the rack, there is no need to lower the “A” node. I sketched the search algorithm in your previous question.

I got a functional style implementation of a constant syntax tree that I mean blogging about for a while, but did not bother with it. If I eventually post, I will update this question.

+17
source

All Articles