SQL Server: Find the most popular product category purchased for each user for use in a subquery

Question


I have three tables: categories(id, name), products(id, category_id, name) and purchases(id, user_id, product_id). A product belongs to a category. Users can purchase many products. I want to find the most popular category for each user.

However, I need to use the result set of the query as a subquery, so I cannot use ORDER BY because of a SQL Server limitation (error: The ORDER BY clause is invalid in views, inline functions, derived tables, and subqueries, unless TOP is also specified.).

My approach was to count the purchases for each user per category. Then I use MAX to select the highest count. I join this result back to the original counting query (duplicated as a subquery) to retrieve category_id, and finally I look up the category name.

There are two problems with my query:

1. Obviously, I would prefer not to write the same query twice. However, I cannot use a CTE or temporary tables: the result of this query is meant to be joined to a view that exposes a subset of user data, and the view's code is meant to be consumed by a third-party package that can only process basic SQL.
2. In the case of a tie (say, a user bought 4 products, 2 in each of 2 categories), I get two rows for that user.
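The approach described above can be reconstructed roughly as follows. This is a sketch built from the description, not the query in the fiddle: the column types, the sample rows and the aliases `counts` and `counts2` are assumptions. User 2 bought two products in each of two categories, so joining back on the maximum count matches both categories, and the per-category count has to be written out twice.

```sql
-- Hypothetical schema matching categories(id, name), products(id, category_id, name),
-- purchases(id, user_id, product_id); the column types are assumptions.
CREATE TABLE categories (id INT PRIMARY KEY, name VARCHAR(50));
CREATE TABLE products   (id INT PRIMARY KEY, category_id INT, name VARCHAR(50));
CREATE TABLE purchases  (id INT PRIMARY KEY, user_id INT, product_id INT);

INSERT INTO categories VALUES (1, 'Books'), (2, 'Games');
INSERT INTO products   VALUES (1, 1, 'Novel'), (2, 2, 'Puzzle');
-- Hypothetical rows: user 1 prefers Books; user 2 has a 2-2 tie between Books and Games.
INSERT INTO purchases  VALUES
    (1, 1, 1), (2, 1, 1), (3, 1, 2),
    (4, 2, 1), (5, 2, 1), (6, 2, 2), (7, 2, 2);

-- Step 1: count purchases per user and category (this derived table appears twice).
-- Step 2: take the MAX of that count per user.
-- Step 3: join back on the count to recover category_id, then look up the name.
SELECT m.user_id, c.name AS category_name, m.max_count
FROM (
    SELECT user_id, MAX(cnt) AS max_count
    FROM (
        SELECT pu.user_id, pr.category_id, COUNT(*) AS cnt
        FROM purchases pu
        JOIN products pr ON pu.product_id = pr.id
        GROUP BY pu.user_id, pr.category_id
    ) counts
    GROUP BY user_id
) m
JOIN (
    SELECT pu.user_id, pr.category_id, COUNT(*) AS cnt
    FROM purchases pu
    JOIN products pr ON pu.product_id = pr.id
    GROUP BY pu.user_id, pr.category_id
) counts2 ON counts2.user_id = m.user_id AND counts2.cnt = m.max_count
JOIN categories c ON c.id = counts2.category_id;
-- Three rows, in no guaranteed order: user 1 (Books, 2) once,
-- and user 2 twice (Books, 2 and Games, 2) because of the tie.
```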

Fiddle:

http://sqlfiddle.com/#!6/8821b/5

I would appreciate help making sure only one row is returned per user, and finding a way to remove the duplicated subquery.

Thanks!


sql sql-server

SchmitzIT Jul 30 '14 at 14:57


1 answer

Jim V. · Accepted Answer · 2014-07-30T15:20:53+0000

First, thanks for providing an example in SQL Fiddle. It makes helping a lot easier.

You can use ROW_NUMBER to pick exactly one "top" record per user: it gives every row in a partition a distinct number, even when counts are tied, so filtering on 1 leaves a single row per user. In this example, I use category_name as the tie-breaker after the count.

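SELECT user_id, category_name, category_count
FROM
(
  SELECT 
      p.user_id, COUNT(1) AS category_count, c.name AS category_name, 
      -- Number each user's categories by purchase count; tied counts still get
      -- distinct numbers, with the category name as the tie-breaker.
      ROW_NUMBER() OVER (
          PARTITION BY p.user_id 
          ORDER BY COUNT(1) DESC, c.name ASC) 
          AS ordinal_position
  FROM
      purchases p 
          JOIN products p2 ON p.product_id = p2.id
          JOIN categories c ON p2.category_id = c.id
  GROUP BY p.user_id, c.name
 ) a
WHERE ordinal_position = 1
-- Drop this ORDER BY when the query is used inside a view or subquery.
ORDER BY category_count DESC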

Example in SQL Fiddle.
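Since the question needs this result inside a view, the outer ORDER BY from the query above has to be dropped; whoever selects from the view can sort. The sketch below wraps the same ROW_NUMBER query in a view, assuming hypothetical sample data in which user 2 has a 2-2 tie. The view name `user_top_category`, the aliases `pu`, `pr` and `ranked`, the column types and the rows are all illustrative assumptions.

```sql
-- Hypothetical schema and sample data (types and rows are assumptions, not the fiddle's).
CREATE TABLE categories (id INT PRIMARY KEY, name VARCHAR(50));
CREATE TABLE products   (id INT PRIMARY KEY, category_id INT, name VARCHAR(50));
CREATE TABLE purchases  (id INT PRIMARY KEY, user_id INT, product_id INT);
INSERT INTO categories VALUES (1, 'Books'), (2, 'Games');
INSERT INTO products   VALUES (1, 1, 'Novel'), (2, 2, 'Puzzle');
-- User 2 has a 2-2 tie between Books and Games.
INSERT INTO purchases  VALUES
    (1, 1, 1), (2, 1, 1), (3, 1, 2),
    (4, 2, 1), (5, 2, 1), (6, 2, 2), (7, 2, 2);
GO

-- CREATE VIEW must be the first statement in its batch, hence the GO separators
-- (GO is a batch separator understood by SSMS and sqlcmd, not a T-SQL statement).
-- No outer ORDER BY: SQL Server rejects it in a view unless TOP is also specified.
CREATE VIEW user_top_category AS
SELECT user_id, category_name, category_count
FROM (
    SELECT
        pu.user_id,
        c.name AS category_name,
        COUNT(*) AS category_count,
        -- Tied counts still get distinct numbers; the name breaks ties alphabetically.
        ROW_NUMBER() OVER (
            PARTITION BY pu.user_id
            ORDER BY COUNT(*) DESC, c.name ASC) AS ordinal_position
    FROM purchases pu
    JOIN products pr ON pu.product_id = pr.id
    JOIN categories c ON pr.category_id = c.id
    GROUP BY pu.user_id, c.name
) ranked
WHERE ordinal_position = 1;
GO

-- The caller sorts, not the view.
SELECT user_id, category_name, category_count
FROM user_top_category
ORDER BY user_id;
-- user 1: Books, 2
-- user 2: Books, 2 (tie with Games broken alphabetically)
```

Because the per-category count is computed once and ROW_NUMBER picks the winner, neither the MAX step nor the duplicated subquery is needed, and a tie can no longer produce a second row. If ties should instead keep every top category, RANK() in place of ROW_NUMBER() would do that, since RANK gives tied rows the same number.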

