Finding the most common words in a column using sqlite?

I have data that looks like this:

    movie_id  comment
    1         tom cruise is great
    1         great action movie
    2         got teary eyed
    2         great cast
    1         tom cruise is hott

I need a function that returns the most common words in the comments for whichever movie I choose. For example, if I request movie_id = 1, I would get:

    tom, 2
    cruise, 2
    is, 2
    great, 2
    hott, 1
    action, 1
    movie, 1

If I request movie_id = 2, I would get:

    got, 1
    teary, 1
    eyed, 1
    great, 1
    cast, 1

I saw some solutions using T-SQL, but I have never used it and did not understand the code. I am looking for a way to do this in sqlite3.

1 answer

You can do this with a really ugly query.

    select word, count(*)
    from (select (case when instr(substr(m.comments, nums.n+1), ' ') = 0
                       -- no further space: the rest of the string is the last word
                       then substr(m.comments, nums.n+1)
                       -- otherwise take everything up to the next space
                       else substr(m.comments, nums.n+1, instr(substr(m.comments, nums.n+1), ' ') - 1)
                  end) as word
          from (select ' '||comments as comments from m) m cross join
               (select 1 as n union all select 2 union all select 3) nums
          -- a word starts at n+1 when position n is a space and position n+1 is not
          where substr(m.comments, nums.n, 1) = ' ' and
                substr(m.comments, nums.n+1, 1) <> ' '
         ) w
    group by word
    order by count(*) desc

This is untested. The inner query needs a list of numbers (limited to only 3 here; you can see how to add more), with one number for every character position you want to examine. It then checks whether a word begins at position n + 1. A word begins after a space, so I prepend a space to the comments.
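The list of numbers does not have to be typed out by hand. As a rough sketch, assuming SQLite 3.8.3 or later (which supports recursive common table expressions), the positions can be generated on the fly. The helper name `positions` and the use of Python's `sqlite3` module are illustrative choices, not part of the query above.

```python
import sqlite3

def positions(conn, max_len):
    """Return [1, 2, ..., max_len], generated by SQLite itself."""
    sql = """
        WITH RECURSIVE nums(n) AS (
            SELECT 1                      -- first position
            UNION ALL
            SELECT n + 1 FROM nums        -- next position
            WHERE n < :max_len            -- stop at the requested length
        )
        SELECT n FROM nums ORDER BY n
    """
    return [row[0] for row in conn.execute(sql, {"max_len": max_len})]

conn = sqlite3.connect(":memory:")
print(positions(conn, 5))
```

In the word-splitting query, a `nums` CTE like this would take the place of the `select 1 as n union all select 2 ...` subquery, with `max_len` set to at least the length of the longest comment.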

It then extracts the word so that it can be aggregated.
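To get the per-movie result the question asks for, the same idea can be wrapped in a function that filters on `movie_id`. The following is a minimal sketch, not a definitive implementation. It assumes the table is named `m` with columns `movie_id` and `comments` (as in the query above), that SQLite is 3.8.3 or later so a recursive CTE can supply the numbers, and that words are separated by spaces with no punctuation handling. The function name `most_common_words` is hypothetical.

```python
import sqlite3

QUERY = """
WITH RECURSIVE nums(n) AS (
    -- positions 1 .. length of the longest comment for this movie
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM nums
    WHERE n < (SELECT max(length(comments)) FROM m WHERE movie_id = :movie_id)
),
padded AS (
    -- prepend a space so that every word is preceded by one
    SELECT ' ' || comments AS comments FROM m WHERE movie_id = :movie_id
),
words AS (
    SELECT CASE
             -- no further space: the rest of the string is the last word
             WHEN instr(substr(p.comments, nums.n + 1), ' ') = 0
               THEN substr(p.comments, nums.n + 1)
             -- otherwise take everything up to the next space
             ELSE substr(p.comments, nums.n + 1,
                         instr(substr(p.comments, nums.n + 1), ' ') - 1)
           END AS word
    FROM padded p CROSS JOIN nums
    -- a word starts at n + 1 when position n is a space and n + 1 is a real character
    WHERE substr(p.comments, nums.n, 1) = ' '
      AND substr(p.comments, nums.n + 1, 1) NOT IN (' ', '')
)
SELECT word, count(*) AS freq
FROM words
GROUP BY word
ORDER BY freq DESC, word
"""

def most_common_words(conn, movie_id):
    """Return (word, count) pairs for one movie, most frequent first."""
    return conn.execute(QUERY, {"movie_id": movie_id}).fetchall()
```

With the sample data from the question loaded into `m`, `most_common_words(conn, 2)` returns each of the five words with a count of 1. Ties are broken alphabetically here, which is a choice made for this sketch rather than something the question requires.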
