How to use preloaded collection in recursive methods

I have the following association:

class Action < ActiveRecord::Base # self referential association has_many :action_parents has_many :parents, through: :action_parents has_many :action_children, class_name: 'ActionParent', foreign_key: 'parent_id' has_many :children, through: :action_children, source: :action โ€ฆ def should_finish should_start + duration end def should_start # my_start is a field in db: if there are no parents (root) it will use this field return my_start if parents.empty? parents.map(&:should_finish).sort.last end end 

My problem is that should_finish and should_start call each other, and even if I preload the parent records, they lead to a lot of queries:

 Action.includes(:parents).last.should_finish # a new query every time it checks for parents 

Any idea on how to cache actions and parents ?

EDIT - let me give you some context:

 # actions table: actions_parents table: # id | duration task_id | parent_id # 1 | 5 2 | 1 # 2 | 10 3 | 1 # 3 | 20 4 | 2 # 4 | 15 4 | 3 # # |--------------| # | action 2 | # |---------- >| duration: 10 | # | |--------------| # | | # |--------------| |--------->|--------------| # | action 1 | | action 4 | # | duration: 5 | | duration: 15 | # |--------------| |--------->|--------------| # | | # | |--------------| # |----------->| action 3 | # | duration: 20 | # |--------------| 

PS: no circular dependencies.

Assuming I have a tree my_start field some day at 10:00:00 :

 # action | should_start | should_finish # ------------------------------------- # 1 | 10:00:00* | 10:00:05 # 2 | 10:00:05 | 10:00:15 # 3 | 10:00:05 | 10:00:25 # 4 | 10:00:25** | 10:00:40 # # * value from db since there is no parent # ** should_finish of parent with latest should_finish (action 3) 

I thought he could preload all the actions using Action.includes(:parents)

+7
ruby ruby-on-rails activerecord ruby-on-rails-4
source share
3 answers

I'll throw wild before I know the specifics

Assuming there are no noticeable loops in the parent structure, you cannot help yourself by caching something other than caching the entire table, because every time you click on the parents, you will hit different parents for each instance of the action and without a caching strategy , including rails alone, saves you from having to move the entire dataset to the cache.

Thing is what you are trying to do is really hard to do with a relational database and, in fact, is the reason why graphical databases were created (see What are graph databases and when to use the graph database and Neo4j on Heroku )

The best you can do before going to the graph database or caching the entire action table is to optimize your queries (use pluck ) and possibly rewrite them into a PLSQL function.

Plan B is to help your knowledge of your data,

  • change values โ€‹โ€‹in should_start , duration and should_finish ? Does this change a lot?
  • Is this critical real-time data? (i.e. itโ€™s normal to get a little outdated information from time to time)
  • makes data structuring more readable or writeable?
  • leading to the question: does it make sense to create database fields of the Action model so that you do not have to go through each time you search?
    • i.e. you read operations much more than records and
    • you can recalculate the calculated fields in the background task
  • Do you often refer to should_start and should_finish for a small time window?
  • how good you are with Neo4j : D
  • ....

EDIT 1

The only solution that I see at the moment is to not capture the problem. Try the following:

store parent structure identifiers in a string / text field, e.g.

  • action 4 would have [1,2,3] ,
  • actions 2 and 3 would have [1] and
  • Action 1 would have []

then when you map the ancestor_ids array to the hash id => action

 def ancestry_hash @ancestry_hash ||= Hash[ Action.includes(:action_parents).where(id: ancestor_ids).map{|action| [action.id, action]} ] end 

and then re-execute the recursive request to move this hash, not the activerecord tree, or you will call additional requests. Something like:

 def should_finish(id = self.id) should_start(id) + ancestry_hash[id].duration end def should_start(id = self.id) # my_start is a field in db: if there are no parents (root) it will use this field action = ancestry_hash[id] return my_start if action.action_parents.empty? action.action_parents.pluck(:parent_id).map{ |parent_id| should_finish(parent_id) }.sort.last end 

I have not tested the code, but I hope you understand that it should be close enough to this

+1
source share

Problem:

In a nutshell, you have 2 questions. First, you are trying to preload more than you need (?!?), And the other is that Rails does not want to load what you really need, due to the recursive nature of the logic.

To clarify in more detail, consider the following:

 my_action.parents.map(&:parents).flatten.map(&:parents) 

The rails will be:

  • first take all parents for this action.
  • then collapse each of these parents and take your parents.
  • then smooth these โ€œgrandparentsโ€ into an array, move on to each of them and bring them to your parents.

Please note that in this case there is not much point in loading the first level parents, since you are just starting with an Action instance, not a collection. The .parents call will display all first-level parents for this action in one go (which would require a download).

So what happens when you start with a collection (ActiveRelation) instead of an instance?

 Action.some_scope.includes(:parents).map(&:parents) 

In this case, the parents of ALL actions included in the scope will be loaded. Calling .map(&:parents) will NOT trigger further SQL calls, and thatโ€™s the whole point of loading with includes() . However, there are two things that hit the whole purpose of this - and you do both of them: /

First, your starting point is not really a set of actions, since you immediately call .last . Thus, the choice of all parents for ALL actions is meaningless - you only need the "last"! Because of this, Rails is smart enough to scale down and will only load parents from the "last" action. However, in this case there was not much benefit from impatient loading, since calling .parents would lead to the same request at a later time. (Although there is little benefit to downloading in advance if subsequent operations should be faster, which is of limited utility in this case). Thus, with or without the .includes statement, you would execute a single request to retrieve the parents for the last action.

More importantly, you recursively call .parents for each of these parents, and Rails was completely unaware that you were going to do this. Moreover, recursive calls are not preliminary in nature (without knowing a few tricks), so there is no way to tell ActiveRecord or use 'vanilla' SQL in this regard to go through the chain and find out which parents are not required until then as it is already done that (by making a moot point). All this leads to a nightmare of the situation N + 1, how you worry.

Some solutions:

There are several ways to mitigate or eliminate the N + 1 problem - in order of complexity of implementation:

  • Pre-fetch to N parental levels (assuming you know what max (N) is)

Action.last.parents.includes(parents: {parents: :parents}) # grabs up to 4 levels

  • Skip SQL completely, load all the actions into the hash of the Action arrays associated with the child_id, and use non-ActiveRecord methods to aggregate what you need using simple Ruby. This will deteriorate rapidly as your data grows, but it can be good enough for you - at least for now.

  • Use a diagram that allows you to predefine the ancestor tree and use this utility to help with this calculation. The example provided by @bbozo is one way to do this - you can also explore gems such as ancestors, act_as_tree, awesome_nested_set, clos_tree and possibly others to help you with this.

  • Use a database-specific function that actually performs recursive calculation in one call. PostgreSQL, Oracle, and MS-SQL have this capability, while MySQL and SQLite do not. This will probably give you better performance, but can be difficult to write using only the ActiveRecord query interface.

+1
source share

Have you tried to remember this?

In the model

 def should_start return my_start if parents.empty? @should_start ||= parents.map(&:should_finish).sort.last end 
0
source share

All Articles