SQLAlchemy Many-to-Many Performance

Question


I have a database relationship with a many-to-many association, but the association table itself contains many attributes that need to be accessed, so I made three classes:

    class User(Base):
        id = Column(Integer, primary_key=True)
        attempts = relationship("UserAttempt", backref="user", lazy="subquery")

    class Challenge(Base):
        id = Column(Integer, primary_key=True)
        attempts = relationship("UserAttempt", backref="challenge", lazy='subquery')

    class UserAttempt(Base):
        challenge_id = Column(Integer, ForeignKey('challenge.id'), primary_key=True)
        user_id = Column(Integer, ForeignKey('user.id'), primary_key=True)

This is, of course, a simplified case in which I left out the other attributes that I need to access. The idea is that each User can attempt any number of Challenges, hence the UserAttempt table, which describes one specific user working on one challenge.

Now the problem: when I query for all users and then look at each attempt, I am fine. But when I look at the challenge for an attempt, it explodes into numerous subqueries. Of course, this is bad for performance.

What I really want from SQLAlchemy is to pull all (or all relevant) challenges at once and then associate them with the corresponding attempts. It does not really matter whether all challenges are pulled or only those that actually have an association later, since the number of challenges is only between 100 and 500.

My solution right now is not very elegant: I pull all the relevant attempts, challenges and users separately and then connect them by hand. I loop through all the attempts, add each one to its challenge and user, and then set the challenge and user on the attempt. This seems like a brute-force solution that should not be necessary.

However, every approach I tried (for example, varying the "lazy" parameters, altering the queries, etc.) leads to anywhere from hundreds to thousands of queries. I also tried writing plain SQL that would produce my desired results and came up with something along the lines of SELECT * FROM challenge WHERE id IN (SELECT challenge_id FROM attempts), and it worked fine, but I can't translate it to SQLAlchemy.

Thank you in advance for any recommendations you can offer.


python sql sqlalchemy

javex Mar 05 '13 at 2:49


1 answer

zzzeek · Accepted Answer · 2013-03-05T03:58:40+0000

"What I really want from SQLAlchemy is to pull all (or all relevant) challenges at once and then associate them with the corresponding attempts. It does not really matter whether all challenges are pulled or only those that actually have an association later,"

First, you'd want to remove the lazy='subquery' directive from the relationship(); fixing relationships to always load everything is why you get the explosion of queries. Specifically, you are getting the Challenge->attempts eager load for each lazy load of UserAttempt->Challenge, so you have designed about the worst possible loading combination here :).

With that fixed, there are two approaches.

One is to keep in mind that a many-to-one association is usually fetched from the Session in memory first, by primary key, and if the object is present, no SQL is emitted. So I think you could get exactly the effect you seem to be describing using a commonly used technique:

    all_challenges = session.query(Challenge).all()

    for user in some_users:  # however you got these
        for attempt in user.attempts:  # however you got these
            do_something_with(attempt.challenge)  # no SQL will be emitted
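The effect described above can be checked with a small script. This is a minimal sketch, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database; the table names, the sample rows and the `record_statement` listener are assumptions made for illustration and are not part of the original question. It preloads every Challenge into the Session, then counts the SQL statements emitted while reading `attempt.challenge`.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, event
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class User(Base):
    __tablename__ = "user"  # table names are assumed from the ForeignKey targets
    id = Column(Integer, primary_key=True)
    attempts = relationship("UserAttempt", backref="user")  # default lazy loading


class Challenge(Base):
    __tablename__ = "challenge"
    id = Column(Integer, primary_key=True)
    attempts = relationship("UserAttempt", backref="challenge")


class UserAttempt(Base):
    __tablename__ = "user_attempt"  # hypothetical table name
    challenge_id = Column(Integer, ForeignKey("challenge.id"), primary_key=True)
    user_id = Column(Integer, ForeignKey("user.id"), primary_key=True)


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)

# Sample data: two users who each attempted all three challenges.
with Session(engine) as session:
    session.add_all([User(id=1), User(id=2)])
    session.add_all([Challenge(id=1), Challenge(id=2), Challenge(id=3)])
    session.add_all(
        [UserAttempt(user_id=u, challenge_id=c) for u in (1, 2) for c in (1, 2, 3)]
    )
    session.commit()

statements = []  # every SQL string sent to the database from here on


@event.listens_for(engine, "before_cursor_execute")
def record_statement(conn, cursor, statement, parameters, context, executemany):
    statements.append(statement)


with Session(engine) as session:
    # Keep the list referenced: the Session only holds weak references
    # to unmodified objects.
    all_challenges = session.query(Challenge).all()
    users = session.query(User).all()
    attempts = [attempt for user in users for attempt in user.attempts]

    statements.clear()  # only count what attempt.challenge triggers
    challenge_ids = sorted(attempt.challenge.id for attempt in attempts)
    statements_for_challenges = len(statements)

print(challenge_ids)              # [1, 1, 2, 2, 3, 3]
print(statements_for_challenges)  # 0
```

Loading `user.attempts` still costs one SELECT per user in this sketch; only the many-to-one hop to Challenge is served from memory.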

If you want to use the above approach with exactly "SELECT * FROM challenge WHERE id IN (SELECT challenge_id FROM attempt)":

    all_challenges = session.query(Challenge).\
        filter(Challenge.id.in_(session.query(UserAttempt.challenge_id))).all()

although this is most likely more efficient as a JOIN:

    all_challenges = session.query(Challenge).\
        join(Challenge.attempts).all()

or with DISTINCT, since I guess the join would return the same challenge once for each time it appears in UserAttempt:

    all_challenges = session.query(Challenge).distinct().\
        join(Challenge.attempts).all()
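The three query shapes above differ in the rows they return at the SQL level. The sketch below uses Python's standard `sqlite3` module with hand-written SQL that roughly corresponds to those shapes; it is not the SQL that SQLAlchemy itself emits. The `user_attempt` table name, the sample rows and the `ids` helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE challenge (id INTEGER PRIMARY KEY);
    CREATE TABLE user_attempt (
        challenge_id INTEGER REFERENCES challenge (id),
        user_id INTEGER,
        PRIMARY KEY (challenge_id, user_id)
    );
    -- Challenge 1 is attempted by two users, challenge 3 by nobody.
    INSERT INTO challenge (id) VALUES (1), (2), (3);
    INSERT INTO user_attempt (challenge_id, user_id) VALUES (1, 1), (1, 2), (2, 1);
    """
)


def ids(sql):
    """Run a query and return the first column of every row, sorted."""
    return sorted(row[0] for row in conn.execute(sql))


in_subquery = ids(
    "SELECT id FROM challenge "
    "WHERE id IN (SELECT challenge_id FROM user_attempt)"
)
plain_join = ids(
    "SELECT challenge.id FROM challenge "
    "JOIN user_attempt ON challenge.id = user_attempt.challenge_id"
)
distinct_join = ids(
    "SELECT DISTINCT challenge.id FROM challenge "
    "JOIN user_attempt ON challenge.id = user_attempt.challenge_id"
)

print(in_subquery)    # [1, 2]
print(plain_join)     # [1, 1, 2]
print(distinct_join)  # [1, 2]
```

The plain JOIN repeats challenge 1 because it has two attempts, which is the duplication the DISTINCT variant removes. All three leave out challenge 3, which nobody attempted.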

The other way is to use eager loading more specifically. You can query for a bunch of users / attempts / challenges within a single Query, which will emit three SELECT statements:

    users = session.query(User).\
        options(subqueryload_all(User.attempts, UserAttempt.challenge)).all()

or, because UserAttempt->Challenge is many-to-one, a join might be better:

    users = session.query(User).\
        options(subqueryload(User.attempts), joinedload(UserAttempt.challenge)).all()

or just from UserAttempt:

    attempts = session.query(UserAttempt).\
        options(joinedload(UserAttempt.challenge)).all()
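For readers on a recent release, here is a minimal sketch of the eager-loading approach, assuming SQLAlchemy 1.4 or later. On those versions `subqueryload_all` is no longer available and loader options are chained instead, so the sketch uses `subqueryload(...).joinedload(...)`. The table names, the sample rows and the `record_statement` listener are assumptions made for illustration.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, event
from sqlalchemy.orm import (
    Session,
    declarative_base,
    joinedload,
    relationship,
    subqueryload,
)

Base = declarative_base()


class User(Base):
    __tablename__ = "user"  # table names are assumed from the ForeignKey targets
    id = Column(Integer, primary_key=True)
    attempts = relationship("UserAttempt", backref="user")


class Challenge(Base):
    __tablename__ = "challenge"
    id = Column(Integer, primary_key=True)
    attempts = relationship("UserAttempt", backref="challenge")


class UserAttempt(Base):
    __tablename__ = "user_attempt"  # hypothetical table name
    challenge_id = Column(Integer, ForeignKey("challenge.id"), primary_key=True)
    user_id = Column(Integer, ForeignKey("user.id"), primary_key=True)


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([User(id=1), User(id=2)])
    session.add_all([Challenge(id=1), Challenge(id=2), Challenge(id=3)])
    session.add_all(
        [
            UserAttempt(user_id=1, challenge_id=1),
            UserAttempt(user_id=1, challenge_id=2),
            UserAttempt(user_id=2, challenge_id=3),
        ]
    )
    session.commit()

statements = []  # every SQL string sent to the database from here on


@event.listens_for(engine, "before_cursor_execute")
def record_statement(conn, cursor, statement, parameters, context, executemany):
    statements.append(statement)


# Users -> attempts via a subquery load, attempts -> challenge via a JOIN.
with Session(engine) as session:
    users = (
        session.query(User)
        .options(subqueryload(User.attempts).joinedload(UserAttempt.challenge))
        .all()
    )
    statements.clear()  # count only what the traversal below triggers
    pairs = sorted(
        (user.id, attempt.challenge.id) for user in users for attempt in user.attempts
    )
    extra_statements = len(statements)

# Starting from UserAttempt: one SELECT that joins in the challenge.
with Session(engine) as session:
    statements.clear()
    attempts = (
        session.query(UserAttempt).options(joinedload(UserAttempt.challenge)).all()
    )
    attempt_challenges = sorted(attempt.challenge.id for attempt in attempts)
    statements_for_attempts = len(statements)

print(pairs)                    # [(1, 1), (1, 2), (2, 3)]
print(extra_statements)         # 0
print(statements_for_attempts)  # 1
```

In both cases the traversal after the initial load emits no further SQL, because the collections and the many-to-one references were populated up front.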
