Is there a better way to find anagrams using SQL?

Given the following database table:

 WORDS
 alphagram....varchar(15)
 word.........varchar(15) PK
 length.......int

Where:

  • 'alphagram' is the letters of the word in alphabetical order (for example, AEINNRTT is the alphagram of INTRANET)
  • The primary key is the word, and there are indexes on alphagram and length
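
For readers who want to reproduce the setup, here is a minimal sketch of the table using Python's built-in sqlite3 module. The helper names (`alphagram`, `build_words_table`) and the index names are assumptions made for illustration; only the three columns come from the question.

```python
import sqlite3

def alphagram(word: str) -> str:
    """Return the letters of `word` in alphabetical order, upper-cased."""
    return "".join(sorted(word.upper()))

def build_words_table(conn, words):
    """Create the WORDS table described above and fill it (hypothetical helper)."""
    conn.execute(
        "CREATE TABLE words ("
        " alphagram VARCHAR(15),"
        " word VARCHAR(15) PRIMARY KEY,"
        " length INT)"
    )
    # Index names are invented for this sketch.
    conn.execute("CREATE INDEX idx_words_alphagram ON words (alphagram)")
    conn.execute("CREATE INDEX idx_words_length ON words (length)")
    conn.executemany(
        "INSERT INTO words (alphagram, word, length) VALUES (?, ?, ?)",
        [(alphagram(w), w.upper(), len(w)) for w in words],
    )

conn = sqlite3.connect(":memory:")
build_words_table(conn, ["INTRANET", "ENTERTAIN", "TRANSIENT"])
row = conn.execute("SELECT alphagram FROM words WHERE word = 'INTRANET'").fetchone()
print(row[0])  # AEINNRTT
```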

I found a way to find the anagrams of a given string of letters using SQL. For example, to find the anagrams of AEINNRTT, this will work:

 select alphagram, word, definition
 from words
 where length = 8
   and alphagram like '%A%'
   and alphagram like '%E%'
   and alphagram like '%I%'
   and alphagram like '%NN%'
   and alphagram like '%R%'
   and alphagram like '%TT%'

This will return 1 row (for INTRANET)

And if I want to include a known number of wildcards, for example to find the words that can be made from INTRANET plus one blank (wildcard), I just need to change "length" to the total number of letters plus the number of wildcards.

For example:

 select alphagram, word, definition
 from words
 where length = 9
   and alphagram like '%A%'
   and alphagram like '%E%'
   and alphagram like '%I%'
   and alphagram like '%NN%'
   and alphagram like '%R%'
   and alphagram like '%TT%'

... will return 8 rows (ENTERTAIN, INSTANTER, INTEGRANT, INTRANETS, ITINERANT, NATTERING, RATTENING and TRANSIENT)
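
The two queries above follow a mechanical pattern, so they can be generated from the rack of letters. The sketch below is one possible way to do that with Python and sqlite3; `like_query` and `find` are hypothetical helper names, the sample word list is tiny, and the `definition` column is left out because it is not part of the table description.

```python
import sqlite3
from itertools import groupby

def like_query(letters: str, blanks: int = 0):
    """Build the LIKE-based query for known letters plus `blanks` wildcards."""
    rack = "".join(sorted(letters.upper()))
    # Group repeated letters: AEINNRTT -> A, E, I, NN, R, TT
    runs = ["".join(group) for _, group in groupby(rack)]
    sql = (
        "SELECT alphagram, word FROM words WHERE length = ?"
        + "".join(" AND alphagram LIKE ?" for _ in runs)
        + " ORDER BY word"
    )
    params = [len(rack) + blanks] + [f"%{run}%" for run in runs]
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (alphagram TEXT, word TEXT PRIMARY KEY, length INT)")
for w in ("INTRANET", "INTERNAL", "ENTERTAIN", "NATTERING", "TRANSIENT"):
    conn.execute("INSERT INTO words VALUES (?, ?, ?)", ("".join(sorted(w)), w, len(w)))

def find(letters, blanks=0):
    sql, params = like_query(letters, blanks)
    return [row[1] for row in conn.execute(sql, params)]

print(find("INTRANET"))     # ['INTRANET']
print(find("INTRANET", 1))  # ['ENTERTAIN', 'NATTERING', 'TRANSIENT']
```

Grouping repeated letters into one pattern (NN, TT) relies on the alphagram being sorted, so equal letters are always adjacent.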

My question is this: is there a more efficient way to do this with SQL only?

This works pretty fast in SQL Server, but pretty slow in SQLite. I understand that '%xxx%' searches are not fast.

+6
sql
4 answers

You can create a kind of index column for each record that has all the letters of the word in alphabetical order, and then compare them. Each anagram will have the same index value.

+2

One idea is to do it like this (for a given word length):

  • split the word into separate characters (possibly using SUBSTRING() in a loop, although the best approach is probably worth a separate SO question)

  • generate all permutations

  • PROFIT!

Although, as a commenter said, I would strongly advise you to do this outside of SQL, unless you have a specific reason to do it in SQL or you are just doing it to challenge your skills.
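
As a rough illustration of the permutation idea done outside of SQL, here is a small Python sketch. The function name and the sample dictionary are made up for the example. Note that the number of permutations grows factorially with word length, so this is only practical for short racks.

```python
from itertools import permutations

def anagrams_by_permutation(letters, dictionary):
    """Return the dictionary words that are a permutation of `letters`."""
    # A set removes the duplicate permutations caused by repeated letters.
    candidates = {"".join(p) for p in permutations(letters.upper())}
    return sorted(candidates & set(dictionary))

sample = {"SILENT", "ENLIST", "TINSEL", "LISTEN", "STONE"}
print(anagrams_by_permutation("listen", sample))
# ['ENLIST', 'LISTEN', 'SILENT', 'TINSEL']
```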

0

The best way I found to do this: I created columns a ... z, analyzed each word, counted the number of occurrences of each letter, and stored each count in the corresponding column. Then, when I entered a word to unscramble, I counted the occurrences of each letter in that word and compared the counts with the words in the database. It can be a little difficult to understand, so let me know if you need further clarification.
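
The counting step described here might look something like the following Python sketch. `letter_counts` is a hypothetical helper, and it assumes words contain only the letters A to Z. Two words are anagrams exactly when their 26 counts match.

```python
import string
from collections import Counter

def letter_counts(word):
    """Return a 26-element tuple with the occurrences of A..Z in `word`."""
    counts = Counter(word.upper())
    return tuple(counts.get(ch, 0) for ch in string.ascii_uppercase)

# Each tuple position would map to one of the a ... z columns.
print(letter_counts("LISTEN") == letter_counts("SILENT"))  # True
print(letter_counts("LISTEN") == letter_counts("STONE"))   # False
```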

0

This question is old and I might be misunderstanding something, but it looks like your first query could simply be

 select alphagram, word, definition from words where length = 8 and alphagram = 'AEINNRTT' and word <> alphagram 

This works because all anagrams of a word share the same alphagram. It will use the index on alphagram and be very fast.
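
A small sqlite3 sketch of this equality lookup is shown below. The table contents, the index name and the `exact_anagrams` helper are assumptions for illustration, and the `definition` column is omitted because it is not in the table description. The last loop prints SQLite's query plan so you can check on your own data whether the alphagram index is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE words (alphagram VARCHAR(15), word VARCHAR(15) PRIMARY KEY, length INT)"
)
conn.execute("CREATE INDEX idx_words_alphagram ON words (alphagram)")
conn.executemany(
    "INSERT INTO words VALUES (?, ?, ?)",
    [("".join(sorted(w)), w, len(w)) for w in ("INTRANET", "INTERNAL", "ENTERTAIN")],
)

QUERY = (
    "SELECT alphagram, word FROM words "
    "WHERE length = ? AND alphagram = ? AND word <> alphagram"
)

def exact_anagrams(conn, letters):
    """Look up all words whose alphagram equals the sorted input letters."""
    key = "".join(sorted(letters.upper()))
    rows = conn.execute(QUERY, (len(key), key)).fetchall()
    return [word for _, word in rows]

print(exact_anagrams(conn, "TNETRAIN"))  # ['INTRANET']

# Ask SQLite how it plans to run the query (output format varies by version).
for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, (8, "AEINNRTT")):
    print(row[-1])
```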

For the case with length > 8, a simple query is harder to write, but I would try adding 26 columns to the table: alpha_a, alpha_b, ..., each containing the number of occurrences of that letter in the word. Each of them can have an index, and then the search becomes

 select alphagram, word, definition
 from words
 where length = 9
   and alpha_a >= 1
   and alpha_e >= 1
   and alpha_i >= 1
   and alpha_n >= 2
   and alpha_r >= 1
   and alpha_t >= 2
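
To try the 26-column idea, here is a minimal sqlite3 sketch. The helper names (`create_table`, `insert_word`, `superset_query`) and the sample words are assumptions. The alphagram and definition columns are left out for brevity, and no per-column indexes are created here.

```python
import sqlite3
import string
from collections import Counter

LETTERS = string.ascii_lowercase

def create_table(conn):
    """Create a words table with one count column per letter (alpha_a .. alpha_z)."""
    cols = ", ".join(f"alpha_{c} INT NOT NULL" for c in LETTERS)
    conn.execute(f"CREATE TABLE words (word VARCHAR(15) PRIMARY KEY, length INT, {cols})")

def insert_word(conn, word):
    counts = Counter(word.lower())
    values = [word.upper(), len(word)] + [counts.get(c, 0) for c in LETTERS]
    placeholders = ", ".join("?" for _ in values)
    conn.execute(f"INSERT INTO words VALUES ({placeholders})", values)

def superset_query(letters, blanks):
    """Build the 'at least these letters' query for a rack plus wildcards."""
    rack = letters.lower()
    # Column names are interpolated into the SQL, so only accept a-z.
    if not rack or not set(rack) <= set(LETTERS):
        raise ValueError("letters must be a non-empty string of a-z")
    counts = Counter(rack)
    conds = " AND ".join(f"alpha_{c} >= {n}" for c, n in sorted(counts.items()))
    sql = f"SELECT word FROM words WHERE length = ? AND {conds} ORDER BY word"
    return sql, [len(rack) + blanks]

conn = sqlite3.connect(":memory:")
create_table(conn)
for w in ("INTRANET", "INTERNALS", "ENTERTAIN", "NATTERING", "TRANSIENT"):
    insert_word(conn, w)

sql, params = superset_query("INTRANET", 1)
print([row[0] for row in conn.execute(sql, params)])
# ['ENTERTAIN', 'NATTERING', 'TRANSIENT']
```
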
0
