How can you get the n most common words in a set of many rows returned in a SQL Server query?

I want to return the 10 most common words from a SQL Server query, so that a set of strings such as:

quick brown fox
slow yellow fox
slow green fox

would return:
fox
slow
quick
brown
yellow
green

+6
sql-server tsql
3 answers

To see how to do this declaratively (i.e. without a WHILE loop), see an answer I worked on (for code golf, of all things): Build an ASCII diagram of the most frequently used words in this text

Note that the code in that link was written to use as few characters as possible, not to be readable. At the very least, use more descriptive names.

+1

I would try running a split function (splitting on spaces) against each returned row to collect all the individual words in a helper table. With the following code you should be able to split a string on its spaces:

    CREATE FUNCTION dbo.Split(@String varchar(8000), @Delimiter char(1))
    RETURNS @temptable TABLE (items varchar(8000))
    AS
    BEGIN
        DECLARE @idx int
        DECLARE @slice varchar(8000)

        SELECT @idx = 1
        IF LEN(@String) < 1 OR @String IS NULL RETURN

        WHILE @idx != 0
        BEGIN
            -- Position of the next delimiter (0 when none is left)
            SET @idx = CHARINDEX(@Delimiter, @String)
            IF @idx != 0
                SET @slice = LEFT(@String, @idx - 1)
            ELSE
                SET @slice = @String

            -- Skip empty slices produced by consecutive delimiters
            IF LEN(@slice) > 0
                INSERT INTO @temptable (items) VALUES (@slice)

            -- Drop the consumed slice together with its delimiter
            SET @String = RIGHT(@String, LEN(@String) - @idx)
            IF LEN(@String) = 0 BREAK
        END
        RETURN
    END

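You have to call this function from a cursor or something similar. Inside the loop, with the current row's text in @row, use something like this (the string comes first and the delimiter second, matching the function signature):

    INSERT INTO #tmp (word) SELECT items FROM dbo.Split(@row, ' ')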

Finally, you need a simple query that groups the words, counts them, and sorts by count in descending order, for example:

    SELECT TOP 10 COUNT(*) AS number, word FROM #tmp GROUP BY word ORDER BY number DESC
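
As a rough sketch, the cursor can probably be avoided by applying the split function to every row with CROSS APPLY. The temp table #Sentences and its Sentence column are hypothetical names used only for illustration, and the dbo.Split function from above is assumed to exist already:

```sql
-- Hypothetical sample data; the table and column names are assumptions.
CREATE TABLE #Sentences (Sentence varchar(8000));
INSERT INTO #Sentences (Sentence)
VALUES ('quick brown fox'), ('slow yellow fox'), ('slow green fox');

-- Run dbo.Split once per row (no cursor), then count each word
-- and keep the 10 most frequent ones.
SELECT TOP 10 s.items AS word, COUNT(*) AS number
FROM #Sentences AS t
CROSS APPLY dbo.Split(t.Sentence, ' ') AS s
GROUP BY s.items
ORDER BY number DESC, word;

DROP TABLE #Sentences;
```

The secondary sort on word only makes the order of tied counts deterministic; drop it if the order of ties does not matter.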

Source here

+3

Another way ("borrowed" from here):

    WITH Sentences AS (
        SELECT 'quick brown fox' AS Sentence
        UNION ALL SELECT 'slow yellow fox'
        UNION ALL SELECT 'slow green fox'
    ),
    Xmlified AS (
        SELECT CAST('<M>' + REPLACE(Sentence, ' ', '</M><M>') + '</M>' AS XML) AS xSentence
        FROM Sentences
    ),
    Words AS (
        SELECT Split.a.value('.', 'VARCHAR(100)') AS word
        FROM Xmlified
        CROSS APPLY xSentence.nodes('/M') Split(a)
    )
    SELECT COUNT(*) AS C, word
    FROM Words
    GROUP BY word
    ORDER BY C DESC
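
This is not part of the answer above, but on SQL Server 2016 or later (database compatibility level 130 or higher) the built-in STRING_SPLIT function could stand in for the XML conversion. A minimal sketch, reusing the same sample sentences and adding TOP 10 to match the question:

```sql
WITH Sentences AS (
    SELECT 'quick brown fox' AS Sentence
    UNION ALL SELECT 'slow yellow fox'
    UNION ALL SELECT 'slow green fox'
)
-- STRING_SPLIT returns one row per token in a column named "value".
SELECT TOP 10 w.value AS word, COUNT(*) AS C
FROM Sentences
CROSS APPLY STRING_SPLIT(Sentence, ' ') AS w
WHERE w.value <> ''          -- ignore empty tokens from repeated spaces
GROUP BY w.value
ORDER BY C DESC;
```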
+1