Recommendation algorithm - calculating related stores based on their category data

Question

I have Store and Category models. A store can have many categories.

I am trying to create a list of related stores for each store.

I would like to calculate a score based on the number of categories that one store shares with another.

I have a plan but don’t know how to start coding in Ruby on Rails.

Any tips?

PS. I think it would be better to have a separate table to store this calculated data for each store, since running the calculation in real time would be database-intensive.

UPDATE: I just noticed a MAJOR flaw in my logic. A few department stores, such as Amazon, will dominate the related stores for every seller (since they belong to almost all categories and will therefore match every category of a niche store). How can I avoid this problem?

algorithm ruby ruby-on-rails ruby-on-rails-3 ruby-on-rails-3.1

Jacob Dec 10 '11 at 4:49

2 answers

Assuming you have models:

    class Store < ActiveRecord::Base
      has_many :categories_stores
      has_many :categories, :through => :categories_stores
    end

    # Join model linking stores and categories.
    class CategoriesStore < ActiveRecord::Base
      belongs_to :category
      belongs_to :store
    end

    class Category < ActiveRecord::Base
      has_many :categories_stores
      has_many :stores, :through => :categories_stores
    end

The basic algorithm, in words:

1. Find the categories (ids) that the selected store has.
2. Find the stores that have at least one of the categories from step 1.
3. For each store found, count how many of the categories from step 1 it has.
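The three steps can be sketched in plain Ruby before touching the database. This is only an illustration: `shared_category_counts` and the `categories_by_store` hash (store id => set of category ids) are hypothetical names, and the sample ids are made up. Unlike a plain join, the sketch skips the selected store itself.

```ruby
require "set"

# Returns { other_store_id => number_of_shared_categories } for one store.
# `categories_by_store` is a hypothetical Hash of store_id => Set of category ids.
def shared_category_counts(store_id, categories_by_store)
  # Step 1: the categories of the selected store.
  own = categories_by_store.fetch(store_id)

  counts = {}
  categories_by_store.each do |other_id, categories|
    next if other_id == store_id             # do not relate a store to itself
    shared = (own & categories).size         # Step 3: size of the overlap
    counts[other_id] = shared if shared > 0  # Step 2: keep stores sharing at least one
  end
  counts
end

# Made-up sample data.
categories_by_store = {
  1 => Set[10, 20],
  2 => Set[10, 20, 30],
  3 => Set[20, 40],
  4 => Set[50]
}

# Store 2 shares two categories with store 1, store 3 shares one, store 4 none.
p shared_category_counts(1, categories_by_store)
```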

All of this can be done in several ways in SQL. For instance:

    SELECT s3.store_id, COUNT(s3.category_id)
    FROM categories_stores s1, categories_stores s2, categories_stores s3
    WHERE s1.store_id = :id
      AND s2.category_id = s1.category_id
      AND s3.store_id = s2.store_id
      AND s3.category_id = s1.category_id
    GROUP BY s3.store_id

Here :id is the query parameter (the id of the selected store). Some parts of the query can be done in pure Ruby, some cannot.

Mark huk Dec 10 '11 at 8:09

Jim mischel · Accepted Answer · 2011-12-10T15:39:59+0000

Your "MAJOR flaw" is not unusual. As you say, Amazon will be "related" to everything. This is a fairly common problem with any recommendation system that tries to use this kind of relationship. I have not done this with store categories, but the problem is very similar to one in a video selection/ranking system that I built.

The usual way to keep popular items from dominating is, instead of using the raw number of matching categories, to give each store a weight. Common weights are 1/category_count or 1/sqrt(category_count).
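As a minimal sketch of that idea, the score can be taken as the number of matching categories multiplied by the candidate store's weight. The method names and the `:inverse` / `:inverse_sqrt` scheme symbols are made up for this illustration.

```ruby
# Weight for a candidate store that has `category_count` categories.
# :inverse is 1/n, :inverse_sqrt is 1/sqrt(n); both symbols are hypothetical names.
def category_weight(category_count, scheme = :inverse_sqrt)
  raise ArgumentError, "category_count must be positive" unless category_count.positive?

  case scheme
  when :inverse      then 1.0 / category_count
  when :inverse_sqrt then 1.0 / Math.sqrt(category_count)
  else raise ArgumentError, "unknown scheme: #{scheme.inspect}"
  end
end

# Similarity = number of matching categories times the candidate's weight.
def similarity(shared_count, category_count, scheme = :inverse_sqrt)
  shared_count * category_weight(category_count, scheme)
end
```

A store with many categories gets a small weight, so each match counts for less than a match with a niche store.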

Consider three stores:

    Jim Books - 2 categories: ["Books", "Music"]
    Amazon - 10 categories: ["Books", "Music", "Movies", "Housewares", etc.]
    Ralph Remainders - 3 categories: ["Books", "Music", "Movies"]

Now, if you are looking for stores similar to Jim Books, you match the categories. Obviously, both Amazon and Ralph include the Books and Music categories, and if you use only the number of matching categories, both will have the same rating.

But if you use a weighting factor, their scores are very different. With a weighting factor of 1/category_count:

    Amazon - 10 categories, weighting factor = 1/10.
    Ralph - 3 categories, weighting factor = 1/3.

Each store matches 2 categories, so Amazon gets a similarity score of 2 × 1/10 = 0.20, and Ralph gets 2 × 1/3 ≈ 0.67.

If the weighting factor is 1/sqrt(category_count), then:

    Amazon - weighting factor = 1/sqrt(10) = 0.316
    Ralph - weighting factor = 1/sqrt(3) = 0.577

In this case, the Amazon score is about 2 × 0.316 = 0.632 and the Ralph score is about 2 × 0.577 = 1.155.
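The arithmetic can be checked by recomputing both scores from the category lists. This is a sketch only: the source names just four of Amazon's ten categories, so the remaining six are filled with clearly labeled placeholders, and the `:inverse` / `:inverse_sqrt` keys are names invented here.

```ruby
require "set"

jim = Set["Books", "Music"]

candidates = {
  # Only four of Amazon's ten categories are named in the example;
  # the other six are placeholders, not real category names.
  "Amazon" => Set["Books", "Music", "Movies", "Housewares"] +
              (1..6).map { |i| "Placeholder #{i}" },
  "Ralph Remainders" => Set["Books", "Music", "Movies"]
}

scores = candidates.map do |name, categories|
  shared = (jim & categories).size  # matching categories (2 for both stores)
  total  = categories.size          # the candidate's category_count
  [name, {
    inverse:      (shared.to_f / total).round(3),
    inverse_sqrt: (shared / Math.sqrt(total)).round(3)
  }]
end.to_h

scores.each do |name, s|
  puts "#{name}: 1/n = #{s[:inverse]}, 1/sqrt(n) = #{s[:inverse_sqrt]}"
end
# Amazon: 1/n = 0.2, 1/sqrt(n) = 0.632
# Ralph Remainders: 1/n = 0.667, 1/sqrt(n) = 1.155
```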

I found that 1/sqrt(category_count) is generally better because it dampens the overwhelming influence of broad stores (i.e. those that have many categories), but not so much that these stores drop out of the results. 1/category_count puts too much emphasis on stores that have only one or two categories.
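To see the difference in ranking, here is a small sketch that adds a hypothetical one-category store ("Bookworm", not part of the example above) matching only "Books". The `ranking` helper is also made up for illustration.

```ruby
# name => [categories shared with Jim Books, total categories]
candidates = {
  "Amazon"           => [2, 10],
  "Ralph Remainders" => [2, 3],
  "Bookworm"         => [1, 1]  # hypothetical one-category store
}

# Orders store names by descending score; the block supplies the scoring rule.
def ranking(candidates)
  candidates
    .map { |name, (shared, total)| [name, yield(shared, total)] }
    .sort_by { |_, score| -score }
    .map(&:first)
end

by_inverse      = ranking(candidates) { |shared, total| shared.to_f / total }
by_inverse_sqrt = ranking(candidates) { |shared, total| shared / Math.sqrt(total) }

puts "1/n:       #{by_inverse.join(', ')}"
puts "1/sqrt(n): #{by_inverse_sqrt.join(', ')}"
# 1/n:       Bookworm, Ralph Remainders, Amazon
# 1/sqrt(n): Ralph Remainders, Bookworm, Amazon
```

With 1/category_count the one-category store outranks Ralph even though it matches only one of Jim's two categories. With 1/sqrt(category_count) Ralph comes first, and Amazon still appears in the list.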
