Rails 3 SQL Query User Mapping Algorithm (COMPLICATED)

I am currently working on an application that matches users based on answers to questions. I implemented my algorithm in regular RoR and ActiveRecord queries, but he does not want to use it. To match one user with 100 other users, you need

Completed 200 OK in 17741ms (Views: 106.1ms | ActiveRecord: 1078.6ms) 

on my local machine. But anyway ... Now I want to understand this in raw SQL in order to get better performance. But I'm really having trouble finding SQL queries in SQL queries, etc. Similar calculations are a plus. My head is about to explode, and I don’t even know where to start.

Here is my algorithm:

 def match(user) @a_score = (self.actual_score(user).to_f / self.possible_score(user).to_f) * 100 @b_score = (user.actual_score(self).to_f / user.possible_score(self).to_f) * 100 if self.common_questions(user) == [] 0.to_f else match = Math.sqrt(@a_score * @b_score) - (100 / self.common_questions(user).count) if match <= 0 0.to_f else match end end end def possible_score(user) i = 0 self.user_questions.select("question_id, importance").find_each do |n| if user.user_questions.select(:id).find_by_question_id(n.question_id) i += Importance.find_by_id(n.importance).value end end return i end def actual_score(user) i = 0 self.user_questions.select("question_id, importance").includes(:accepted_answers).find_each do |n| @user_answer = user.user_questions.select("answer_id").find_by_question_id(n.question_id) unless @user_answer == nil if n.accepted_answers.select(:answer_id).find_by_answer_id(@user_answer.answer_id) i += Importance.find_by_id(n.importance).value end end end return i end 

So, basically the user answers the questions, chooses which answers he accepts, and how important this question is for him. Then the algorithm checks what questions two users have, if user1 answered user2, if so, then user2 is added for each question, which is the result of user1. Also vice versa for the user 2. Divided by the possible score gives a percentage, and both percentages applied to the average average give me one common percentage percentage for both users. Pretty complicated, I know. Tell me if I did not explain it well enough. I just hope I can express this in raw SQL. Performance is all about it.

Here are my database tables:

 CREATE TABLE "users" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "username" varchar(255) DEFAULT '' NOT NULL); (left some unimportant stuff out, it all there in the databse dump i uploaded) CREATE TABLE "user_questions" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "user_id" integer, "question_id" integer, "answer_id" integer(255), "importance" integer, "explanation" text, "private" boolean DEFAULT 'f', "created_at" datetime); CREATE TABLE "accepted_answers" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "user_question_id" integer, "answer_id" integer); 

I think the top of the SQL query should look something like this:

 SELECT u1.id AS user1, u2.id AS user2, COALESCE(SQRT( (100.0*actual_score/possible_score) * (100.0*actual_score/possible_score) ), 0) AS match FROM 

But since I'm not a SQL master, I can only do the usual things that my head is about to explode. Hope someone helps me figure this out. Or at least improve your performance somehow! Thank you very much!

EDIT:

So, based on the answer of the wizard, I managed to get a good SQL statement for "possible_score"

 SELECT SUM(value) AS sum_id FROM user_questions AS uq1 INNER JOIN importances ON importances.id = uq1.importance INNER JOIN user_questions uq2 ON uq1.question_id = uq2.question_id AND uq2.user_id = 101 WHERE uq1.user_id = 1 

I tried to get "actual_score", but that didn't work. My database manager crashed when I executed this.

 SELECT SUM(imp.value) AS sum_id FROM user_questions AS uq1 INNER JOIN importances imp ON imp.id = uq1.importance INNER JOIN user_questions uq2 ON uq2.question_id = uq1.question_id AND uq2.user_id = 101 INNER JOIN accepted_answers as ON as.user_question_id = uq1.id AND as.answer_id = uq2.answer_id WHERE uq1.user_id = 1 

EDIT2

Ok, I'm an idiot! I can't use the β€œlike” as an alias, of course. Changed it to aa and it worked! W00t!

+1
source share
2 answers

I know that you were thinking about switching to an SQL solution, but there are some important performance improvements that can be made to your Ruby code, which may eliminate the need for manual SQL code. When optimizing your code, it is often worth using a profiler to make sure you really know which parts are the problem. In your example, I think some great improvements can be made by removing iterative code and database queries that are executed at each iteration!

In addition, if you use the latest version of ActiveRecord, you can generate queries with subqueries without the need for SQL encoding. Of course, it is important that you have the correct indexes created for your database.

I make a lot of assumptions about your models and relationships based on what I can do from your code. If I am wrong, let me know and I will try to make the appropriate adjustments.

 def match(user) if self.common_questions(user) == [] 0.to_f else # Move a_score and b_score calculation inside this conditional branch since it is otherwise not needed. @a_score = (self.actual_score(user).to_f / self.possible_score(user).to_f) * 100 @b_score = (user.actual_score(self).to_f / user.possible_score(self).to_f) * 100 match = Math.sqrt(@a_score * @b_score) - (100 / self.common_questions(user).count) if match <= 0 0.to_f else match end end end def possible_score(user) # If user_questions.importance contains ID values of importances, then you should set up a relation between UserQuestion and Importance. # Ie UserQuestion belongs_to :importance, and Importance has_many :user_questions. # I'm assuming that user_questions represents join models between users and questions. # Ie User has_many :user_questions, and User has_many :questions, :through => :user_questions. # Question has_many :user_questions, and Question has_many :users, :through => :user_questions # From your code this seems like the logical setup. Let me know if my assumption is wrong. self.user_questions. joins(:importance). # Requires the relation between UserQuestion and Importance I described above where(:question_id => Question.joins(:user_questions).where(:user_id => user.id)). # This should create a where clause with a subselect with recent versions of ActiveRecord sum(:value) # I'm also assuming that the importances table has a `value` column. end def actual_score(user) user_questions. joins(:importance, :accepted_answers). # It looks like accepted_answers indicates an answers table where(:answer_id => Answer.joins(:user_questions).where(:user_id => user.id)). sum(:value) end 

UserQuestion seems to be a model of superconnections between user, question, answer and value. Here are model relationships related to code (not including has_many: through relationships you could create). I think you probably already did this:

 # User has_many :user_questions # UserQuestion belongs_to :user belongs_to :question belongs_to :importance, :foreign_key => :importance # Maybe rename the column `importance` to `importance_id` belongs_to :answer # Question has_many :user_questions # Importance has_many :user_questions # Answer has_many :user_questions 
+1
source

So here is my new match function. I still could not put everything in one query, because SQLite does not support mathematical functions. But as soon as I switch to MySQL, I put everything in one query. All of this has already given me a HUGE performance boost:

 Completed 200 OK in 528ms (Views: 116.5ms | ActiveRecord: 214.0ms) 

to match one user with 100 other users. Pretty good! I should see how good it is when I populate my database with fake 10k users. And additional perks to β€œThe Wizard of Ogz” for indicating my inefficient code!

EDIT:

tried it with only 1000 users, from 10 to 100 UserQuestions each and ...

 Completed 200 OK in 104871ms (Views: 2146.0ms | ActiveRecord: 93780.5ms) 

... the boy did it for a long time! I will have to think about something to solve this problem.

 def match(user) if self.common_questions(user) == [] 0.to_f else @a_score = UserQuestion.find_by_sql(["SELECT 100.0*as1.actual_score/ps1.possible_score AS match FROM (SELECT SUM(imp.value) AS actual_score FROM user_questions AS uq1 INNER JOIN importances imp ON imp.id = uq1.importance INNER JOIN user_questions uq2 ON uq2.question_id = uq1.question_id AND uq2.user_id = ? INNER JOIN accepted_answers aa ON aa.user_question_id = uq1.id AND aa.answer_id = uq2.answer_id WHERE uq1.user_id = ?) AS as1, (SELECT SUM(value) AS possible_score FROM user_questions AS uq1 INNER JOIN importances ON importances.id = uq1.importance INNER JOIN user_questions uq2 ON uq1.question_id = uq2.question_id AND uq2.user_id = ? WHERE uq1.user_id = ?) AS ps1",user.id, self.id, user.id, self.id]).collect(&:match).first.to_f @b_score = UserQuestion.find_by_sql(["SELECT 100.0*as1.actual_score/ps1.possible_score AS match FROM (SELECT SUM(imp.value) AS actual_score FROM user_questions AS uq1 INNER JOIN importances imp ON imp.id = uq1.importance INNER JOIN user_questions uq2 ON uq2.question_id = uq1.question_id AND uq2.user_id = ? INNER JOIN accepted_answers aa ON aa.user_question_id = uq1.id AND aa.answer_id = uq2.answer_id WHERE uq1.user_id = ?) AS as1, (SELECT SUM(value) AS possible_score FROM user_questions AS uq1 INNER JOIN importances ON importances.id = uq1.importance INNER JOIN user_questions uq2 ON uq1.question_id = uq2.question_id AND uq2.user_id = ? WHERE uq1.user_id = ?) AS ps1",self.id, user.id, self.id, user.id]).collect(&:match).first.to_f match = Math.sqrt(@a_score * @b_score) - (100 / self.common_questions(user).count) if match <= 0 0.to_f else match end end end 
0
source

All Articles