Is there a better way to find anagrams using SQL?

Question

Given the following database table:

WORDS
alphagram....varchar(15)
word.........varchar(15) PK
length.......int

Where:

'alphagram' is the letters of the word in alphabetical order (for example, AEINNRTT is the alphagram of INTRANET)
The primary key is the word, and there are indexes on alphagram and length

I found a way to find the anagrams of a given string of letters in SQL. For example, to find the anagrams of AEINNRTT, this works:

 select alphagram, word, definition from words where length = 8 and alphagram like '%A%' and alphagram like '%E%' and alphagram like '%I%' and alphagram like '%NN%' and alphagram like '%R%' and alphagram like '%TT%'

This will return 1 row (for INTRANET)

And if I want to include a known number of wildcards, for example to find how many words can be made from INTRANET plus one blank (wildcard), I just need to change "length" to the total number of letters plus the number of wildcards.

For example:

 select alphagram, word, definition from words where length = 9 and alphagram like '%A%' and alphagram like '%E%' and alphagram like '%I%' and alphagram like '%NN%' and alphagram like '%R%' and alphagram like '%TT%'

... will return 8 rows (ENTERTAIN, INSTANTER, INTEGRANT, INTRANETS, ITINERANT, NATTERING, RATTENING and TRANSIENT)

My question is this: is there a more efficient way to do this with SQL only?

This works pretty fast in SQL Server, but pretty slow in SQLite. I understand that a %xxx% search is not fast.

eponymous23 Oct 21 '10 at 20:41

4 answers

Glenner003 · Answer 1 · 2010-10-22T12:27:34+0000

You can create a kind of index column for each record that has all the letters of the word in alphabetical order, and then compare them. Each anagram will have the same index value.
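
A minimal sketch of this idea, using Python's standard sqlite3 module and an in-memory table loosely modelled on the WORDS table from the question. The sample words, the index name and the helper names (`alphagram`, `find_anagrams`) are assumptions made for illustration only:

```python
import sqlite3

def alphagram(word):
    """Return the letters of word, upper-cased and sorted alphabetically."""
    return "".join(sorted(word.upper()))

# In-memory stand-in for the WORDS table (sample data, not a real dictionary).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE words (alphagram TEXT, word TEXT PRIMARY KEY, length INTEGER)"
)
conn.execute("CREATE INDEX idx_words_alphagram ON words (alphagram)")
sample = ["INTRANET", "ENTERTAIN", "LISTEN", "SILENT", "ENLIST"]
conn.executemany(
    "INSERT INTO words VALUES (?, ?, ?)",
    [(alphagram(w), w, len(w)) for w in sample],
)

def find_anagrams(letters):
    """Exact anagrams: rows whose alphagram equals the alphagram of letters."""
    rows = conn.execute(
        "SELECT word FROM words WHERE alphagram = ? ORDER BY word",
        (alphagram(letters),),
    )
    return [r[0] for r in rows]

print(find_anagrams("TINSEL"))
```

Because the lookup is a plain equality test on an indexed column, the database can avoid the pattern matching that a series of LIKE '%x%' conditions requires.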

DVK · Answer 2 · 2010-10-21T20:53:55+0000

One idea is to do it like this (for a given word length):

split the word into separate characters (possibly using SUBSTRING() in a loop, although the best approach is probably worth a separate SO question)
generate all permutations
PROFIT!

Although, as a commenter said, I would strongly advise you to do this outside of SQL, unless you have a specific reason to stay in SQL or are just doing it to challenge your skills.
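
For comparison, here is a rough sketch of the permutation approach done outside SQL, in Python. The word list is a hypothetical stand-in for the WORDS table, and the function names are invented for this example:

```python
from itertools import permutations

def candidate_words(letters):
    """All distinct orderings of letters (the count grows factorially)."""
    return {"".join(p) for p in permutations(letters.upper())}

def anagrams_by_permutation(letters, dictionary):
    """Keep only the permutations that are real words in dictionary."""
    return sorted(candidate_words(letters) & set(dictionary))

# Hypothetical word list standing in for the WORDS table.
dictionary = ["LISTEN", "SILENT", "ENLIST", "INTRANET"]
print(anagrams_by_permutation("TINSEL", dictionary))
```

Note that eight letters already produce up to 40,320 orderings, so this gets expensive quickly as the word length grows.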

Akhil reddy · Answer 3 · 2011-07-03T18:42:44+0000

The best way I found to do this: I created columns a through z, analyzed each word, counted the number of occurrences of each letter, and stored each count in the corresponding column. Then, when I was given a word to unscramble, I counted the occurrences of each letter in that word and compared the counts with those of the words in the database. This may be a little hard to follow, so let me know if you need further clarification.
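
The comparison described above can be sketched in Python with `collections.Counter`, which plays the role of the a..z count columns here. The function names are made up for this illustration and the input is assumed to contain letters only:

```python
from collections import Counter

def letter_counts(word):
    """Count occurrences of each letter, as the a..z columns would store them."""
    return Counter(word.upper())

def is_anagram(a, b):
    """Two words are anagrams when every letter count matches."""
    return letter_counts(a) == letter_counts(b)

def contains_letters(word, letters):
    """True if word has at least as many of each letter as letters does."""
    need = letter_counts(letters)
    have = letter_counts(word)
    return all(have[ch] >= n for ch, n in need.items())

print(is_anagram("INTRANET", "AEINNRTT"))
print(contains_letters("ENTERTAIN", "INTRANET"))
```

The second helper is the "at least this many of each letter" test, which is what a search with extra wildcard letters needs.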

Jerome WAGNER · Answer 4 · 2016-01-15T13:05:59+0000

This question is old and I might be misunderstanding something, but it looks like your first query could be

 select alphagram, word, definition from words where length = 8 and alphagram = 'AEINNRTT' and word <> alphagram

This works because all anagrams of a word share the same alphagram. It will use the index on alphagram and be very fast.

For the case with length > 8, a simple query is harder to come by, but I would try adding 26 columns to the table: alpha_a, alpha_b, ..., each containing the number of occurrences of that letter in the word. Each of them can have an index, and then you search:

 select alphagram, word, definition from words where length = 9 and alpha_a >= 1 and alpha_e >= 1 and alpha_i >= 1 and alpha_n >= 2 and alpha_r >= 1 and alpha_t >= 2
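
A hedged sketch of how such a table and query could be built, using Python's standard sqlite3 module. The alpha_a .. alpha_z column names follow the naming above; the sample words and the helper names (`add_word`, `find_with_blanks`) are assumptions for illustration, and the per-column indexes are omitted to keep the example short:

```python
import sqlite3
import string
from collections import Counter

LETTERS = string.ascii_uppercase
COUNT_COLS = ["alpha_" + ch.lower() for ch in LETTERS]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE words (word TEXT PRIMARY KEY, length INTEGER, "
    + ", ".join(col + " INTEGER" for col in COUNT_COLS)
    + ")"
)

def add_word(word):
    """Insert word together with its 26 per-letter counts."""
    counts = Counter(word.upper())
    values = [word.upper(), len(word)] + [counts[ch] for ch in LETTERS]
    placeholders = ", ".join("?" * len(values))
    conn.execute("INSERT INTO words VALUES (" + placeholders + ")", values)

def find_with_blanks(letters, blanks):
    """Words that use all of letters plus `blanks` unknown extra letters."""
    letters = letters.upper()
    # Column names are built from the input, so accept A-Z only.
    if not letters or any(ch not in LETTERS for ch in letters):
        raise ValueError("letters must be A-Z only")
    conditions = ["length = ?"]
    params = [len(letters) + blanks]
    for ch, n in sorted(Counter(letters).items()):
        conditions.append("alpha_" + ch.lower() + " >= ?")
        params.append(n)
    sql = (
        "SELECT word FROM words WHERE "
        + " AND ".join(conditions)
        + " ORDER BY word"
    )
    return [r[0] for r in conn.execute(sql, params)]

for w in ["INTRANET", "ENTERTAIN", "TRANSIENT", "INTRANETS", "LISTENING"]:
    add_word(w)

print(find_with_blanks("INTRANET", 1))
```

With `blanks` set to 0, the same function returns only exact anagrams, since the length condition leaves no room for extra letters.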
