SQL Server: how to find the most common substring in a field?

I have a table with 1,000,000+ entries, and I would like to find the most common substring with a length of at least 5 characters.

If I have the following entries:

    KDHFOUDHGOENWFIJ 1114H4363SDFHDHGFDG
    GSDLGJSLJSKJDFSG 1114H20SDGDSSFHGSLD
    SLSJDHLJKSSDJFKD 1114HJSDHFJKSDKFSGG

I would like to write a SQL statement that selects 1114H as the most common substring across the rows. How can I do this?

Notes:

  • The substring does not have to be at the same position in each entry.
  • Substrings must be at least 5 characters long.
  • The maximum length of each entry is 50 characters.
2 answers

There is no need to search for the longest substring: every common substring longer than 5 characters contains a 5-character substring that is at least as common, and that one can be used for counting. Therefore, we only need to check substrings of length 5.

The sample data contains three strings that occur three times: _1114H, _1114 and 1114H (_ marks the position of the space).
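To illustrate the reasoning with the sample data: the 6-character string _1114H breaks down into exactly two 5-character windows, _1114 and 1114H, so counting 5-character windows is enough to surface it. A minimal sketch, where the variable @s and the inline list of start positions are made up for the illustration and are not part of the solution:

```sql
-- Assumption: @s is a hypothetical sample value (the 6-character common string).
declare @s varchar(50) = ' 1114H';

-- List every 5-character window of @s.
-- A 6-character string has only two start positions, so they are listed inline.
select N.n as StartPos,
       substring(@s, N.n, 5) as Window5
from (values (1), (2)) as N(n)
where N.n <= len(@s) - 4
order by N.n;

-- StartPos  Window5
-- 1         ' 1114'   (leading space)
-- 2         '1114H'
```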

In this solution, master..spt_values is used instead of a table of numbers.

    declare @T table
    (
      ID int identity,
      Data varchar(50)
    )

    insert into @T values
    ('KDHFOUDHGOENWFIJ 1114H4363SDFHDHGFDG'),
    ('GSDLGJSLJSKJDFSG 1114H20SDGDSSFHGSLD'),
    ('SLSJDHLJKSSDJFKD 1114HJSDHFJKSDKFSGG')

    select top 1 substring(T.Data, N.Number, 5) as Word
    from @T as T
      cross apply (select N.Number
                   from master..spt_values as N
                   where N.type = 'P' and
                         N.number between 1 and len(T.Data)-4) as N
    group by substring(T.Data, N.Number, 5)
    order by count(distinct id) desc

Result:

    Word
    ------
     1114
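
Two caveats about the query above: master..spt_values is not a documented table of numbers, and top 1 picks one winner arbitrarily when several substrings are equally common. A hedged variant, assuming SQL Server 2008 or later: it builds the numbers 1 to 50 inline (50 being the maximum entry length stated in the question) and uses top 1 with ties to return every substring that shares the highest count. The names Digits, Numbers and RowsContaining are made up for this sketch.

```sql
declare @T table (ID int identity, Data varchar(50));

insert into @T values
  ('KDHFOUDHGOENWFIJ 1114H4363SDFHDHGFDG'),
  ('GSDLGJSLJSKJDFSG 1114H20SDGDSSFHGSLD'),
  ('SLSJDHLJKSSDJFKD 1114HJSDHFJKSDKFSGG');

-- Assumption: entries are at most 50 characters, so start positions 1..50 suffice.
with Digits as (
  select V.d
  from (values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) as V(d)
),
Numbers as (
  -- Combine a tens digit and a ones digit into the numbers 1..50.
  select Tens.d * 10 + Ones.d + 1 as n
  from Digits as Ones
    cross join Digits as Tens
  where Tens.d * 10 + Ones.d + 1 <= 50
)
select top 1 with ties
       substring(T.Data, N.n, 5) as Word,
       count(distinct T.ID)      as RowsContaining  -- rows that contain the window
from @T as T
  join Numbers as N
    on N.n <= len(T.Data) - 4                       -- only full 5-character windows
group by substring(T.Data, N.n, 5)
order by count(distinct T.ID) desc;
```

For the three sample rows this should list both ' 1114' (with the leading space) and 1114H, each found in 3 rows.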

This does not fully answer your question, but here is an excerpt from a book on advanced search techniques that mentions a custom function, "LCS" (longest common substring), which may be useful:

http://books.google.com/books?id=wGwVkAt79bEC&pg=PA248&lpg=PA248&dq=sql+full+text+common+substring&source=bl&ots=fveHa8an08&sig=VTWHQDTA6gqSNylY9oR0mPhcP6Y&hl=en&ei=iALcTd_AB-j00gG3iZ3lDw&sa=X&oi=book_result&ct=result&resnum=1&ved=0CBoQ6AEwAA#v=onepage&q&f=false
