You need to develop a heuristic that pulls the likely matches out of the domain. The way I would do this is to first find a large corpus of text. For example, you could download Wikipedia.
Then take your corpus and join every two adjacent words. For example, if your sentence is:
quick brown fox jumps over the lazy dog
You will create a list:
quickbrown brownfox foxjumps jumpsover overthe thelazy lazydog
Each of these starts with a count of one. As you parse the corpus, keep track of how often each pair of adjacent words occurs. In addition, for each pair you need to store the original two words, so that a match can be turned back into separate words.
Sort this list by frequency, and then try to find matches in your domain based on these words.
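To make the steps above concrete, here is a minimal sketch in Python. It is only an illustration under some assumptions: the function names `bigram_table` and `split_domain` are made up for this example, the tiny `corpus` string stands in for a real body of text such as a Wikipedia dump, and words are taken to be runs of lowercase ASCII letters.

```python
import re
from collections import Counter


def bigram_table(text):
    """Map each joined adjacent-word pair to (count, (word1, word2))."""
    # Assumption: a "word" is a run of ASCII letters, compared in lowercase.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(zip(words, words[1:]))
    table = {}
    for (w1, w2), n in counts.items():
        key = w1 + w2
        # Different pairs can join into the same string; keep the most frequent split.
        if key not in table or n > table[key][0]:
            table[key] = (n, (w1, w2))
    return table


def split_domain(name, table):
    """Return the stored two words if the name matches a known pair, else None."""
    hit = table.get(name.lower())
    return hit[1] if hit else None


# Toy corpus standing in for a large body of text.
corpus = "the quick brown fox jumps over the lazy dog and the lazy dog sleeps"
table = bigram_table(corpus)

# Most frequent pairs first.
ranked = sorted(table.items(), key=lambda kv: kv[1][0], reverse=True)
```

With this table, `split_domain("lazydog", table)` gives back the stored pair `("lazy", "dog")`, and a name that never appeared as a pair in the corpus gives `None`. A real corpus would need far more text before the frequencies mean anything.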
Finally, run a domain check on the top-ranked two-word phrases to find the ones that are not registered!
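The filtering step could look roughly like the sketch below. The function `top_unregistered` and the `is_registered` callback are hypothetical names for this example; a real check would query WHOIS or a registrar, which is replaced here by a hard-coded set so the snippet runs offline.

```python
def top_unregistered(ranked_phrases, is_registered, limit=2):
    """Return the first `limit` phrases, in rank order, that are not registered."""
    picks = []
    for phrase in ranked_phrases:
        if not is_registered(phrase):
            picks.append(phrase)
            if len(picks) == limit:
                break
    return picks


# Hypothetical stand-in for a real registration lookup.
taken = {"thelazy.com"}
phrases = ["thelazy", "lazydog", "quickbrown", "brownfox"]
available = top_unregistered(phrases, lambda p: (p + ".com") in taken)
```

Here `available` ends up holding the two highest-ranked phrases whose `.com` name is not in the `taken` set.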
I think sites like DomainTool take a list of the highest-ranking words and try to parse those out of the domain first. Depending on the purpose, you may want to use MTurk to do the job. Different people will parse the same words in different ways, and may not do so in proportion to how common the words are.
brianegge