Can I force SQLAlchemy to load a subquery without repeating the full original query?

Question


Suppose we have the following generated query:

    SELECT company.x AS company_x, ...
    FROM company
    LEFT OUTER JOIN acc ON acc.id = company.acc
    LEFT OUTER JOIN usercomp_links ON company.id = usercomp_links.pid
    LEFT OUTER JOIN usergro_links ON acc.id = usergro_links.pid
    WHERE usergro_links.eid = %s OR usercomp_links.eid = %s

And if we add .options(subqueryload(Company.childs)) to this, we get:

    SELECT company.x AS company_x, ..., anon_1.company_id AS anon_1_company_id
    FROM (
        SELECT company.id AS company_id
        FROM company
        LEFT OUTER JOIN acc ON acc.id = company.acc
        LEFT OUTER JOIN usercomp_links ON company.id = usercomp_links.pid
        LEFT OUTER JOIN usergro_links ON acc.id = usergro_links.pid
        WHERE usergro_links.eid = %s OR usercomp_links.eid = %s
    ) AS anon_1
    INNER JOIN acel_links AS acel_links_1 ON anon_1.company_id = acel_links_1.eid
    INNER JOIN company ON company.id = acel_links_1.pid
    ORDER BY anon_1.company_id

This is wasteful. If I took the company identifiers from the first query and loaded all the children by hand, it would be far faster than what we get here.

I have read the documentation and looked at the code, but I cannot see a way to tell SQLAlchemy to just take the identifiers from the results of the first query and load the children in a separate, comparatively simple query. This example is not the only case: I have had more complex situations where SQLAlchemy simply could not load the query it had constructed. And why redo all the work of the first query anyway?

So, does anyone know how to eager-load without the automatically generated "join from join in join" style of query?

+9

python orm sqlalchemy eager-loading

Mihail krivushin Nov 02 '14 at 20:33


4 answers

Update: the "select in" strategy is now implemented in SQLAlchemy (since version 1.2): see Select IN loading in the documentation.

TL;DR:

I think joinedload should be used where possible, as it is more efficient than the other strategies, including the strategy proposed in the question of loading related data with the "IN" operator.

The "IN" strategy can be implemented fairly easily "outside" of SQLAlchemy (see the code below), and it probably would not be difficult to implement it as a new loading strategy, since it is logically similar to the existing subqueryload strategy.

Full version:

I started with a simple experiment to see the queries created by different strategies.

The full source code for the experiment is on GitHub.

My models look like this:

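    from sqlalchemy import Column, ForeignKey, Integer, String
    from sqlalchemy.orm import backref, relationship

    # ModelBase is assumed to be the declarative base class from the
    # experiment's source; its definition is not shown here.
    class Author(ModelBase):
        __tablename__ = 'authors'
        id = Column(Integer, primary_key=True, nullable=False)
        name = Column(String(255))

    class Book(ModelBase):
        __tablename__ = 'books'
        id = Column(Integer, primary_key=True)
        name = Column(String)
        author_id = Column(Integer, ForeignKey('authors.id'))
        author = relationship('Author', backref=backref('books'))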

Now tests, lazy loading first:

    books = session.query(Book).all()
    print(books[0].author.name)
    session.commit()

Output (cleaned up):

    -------------Lazy--------------
    sqlalchemy.engine.base.Engine:
    SELECT books.id AS books_id, books.name AS books_name, books.author_id AS books_author_id
    FROM books

    SELECT authors.id AS authors_id, authors.name AS authors_name
    FROM authors
    WHERE authors.id = ?
    INFO:sqlalchemy.engine.base.Engine:(1,)
    author1

As expected, lazy loading runs one query to fetch the books and one more query each time we access an author that has not been loaded yet.

Subquery loading:

    books = session.query(Book).options(subqueryload(Book.author)).all()
    print(books[0].author.name)
    session.commit()

    -------------Subquery----------
    SELECT books.id AS books_id, books.name AS books_name, books.author_id AS books_author_id
    FROM books

    SELECT authors.id AS authors_id, authors.name AS authors_name,
           anon_1.books_author_id AS anon_1_books_author_id
    FROM (
        SELECT DISTINCT books.author_id AS books_author_id
        FROM books
    ) AS anon_1
    JOIN authors ON authors.id = anon_1.books_author_id
    ORDER BY anon_1.books_author_id
    author1

With subquery loading we get two queries: first a select of the books, then a select of the authors that embeds the original query as a subquery.

Joined loading:

    books = session.query(Book).options(joinedload(Book.author)).all()
    print(books[0].author.name)
    session.commit()

    -------------Joined------------
    SELECT books.id AS books_id, books.name AS books_name, books.author_id AS books_author_id,
           authors_1.id AS authors_1_id, authors_1.name AS authors_1_name
    FROM books
    LEFT OUTER JOIN authors AS authors_1 ON authors_1.id = books.author_id
    author1

The joined strategy runs just one query to get both the books and the authors.

Immediate loading:

    books = session.query(Book).options(immediateload(Book.author)).all()
    print(books[0].author.name)
    session.commit()

    -------------Immediate---------
    SELECT books.id AS books_id, books.name AS books_name, books.author_id AS books_author_id
    FROM books

    SELECT authors.id AS authors_id, authors.name AS authors_name
    FROM authors
    WHERE authors.id = ?
    INFO:sqlalchemy.engine.base.Engine:(1,)

    SELECT authors.id AS authors_id, authors.name AS authors_name
    FROM authors
    WHERE authors.id = ?
    INFO:sqlalchemy.engine.base.Engine:(2,)
    author1

The immediate strategy loads the books with the first query and then, as part of that same load rather than on first access, fetches the related data with a separate query for each related record.

It seems that joinedload() should be the most efficient in most cases (and more efficient than the "IN" strategy): we just get all the data in one query.

Now let's try to implement the IN strategy outside of SQLAlchemy:

    print('-------------IN----------------')
    books = session.query(Book).all()
    ids = set()
    for b in books:
        ids.add(b.author_id)
    authors = session.query(Author).filter(Author.id.in_(ids)).all()
    print(books[0].author.name)
    print(books[1].author.name)
    print(books[2].author.name)
    print(books[3].author.name)

Output:

    -------------IN----------------
    SELECT books.id AS books_id, books.name AS books_name, books.author_id AS books_author_id
    FROM books

    SELECT authors.id AS authors_id, authors.name AS authors_name
    FROM authors
    WHERE authors.id IN (?, ?)
    INFO:sqlalchemy.engine.base.Engine:(1, 2)
    author1
    author1
    author2
    author2

As we can see, it runs two queries, after which we can access all the authors.

Please note that we do not explicitly attach authors to books, but this still works when we try to access authors through books, because SQLAlchemy finds author entries in the internal identity map and does not start additional database queries.

The "IN" strategy code above can be generalized into a function that works with any model and relationship. The "IN" strategy should also be relatively easy to implement as a new SQLAlchemy loader strategy: like the existing subqueryload, it only has to execute a second query to get the related data.
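One possible shape for such a generalized function is sketched below. The helper name preload_many_to_one and its parameters are hypothetical, the models are re-declared on a local Base so the snippet runs on its own, and SQLAlchemy 1.4 or later with an in-memory SQLite database is assumed. It covers only the many-to-one direction shown above, and it relies on the caller keeping the returned list alive, because the session's identity map holds weak references.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, event
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Author(Base):
    __tablename__ = "authors"
    id = Column(Integer, primary_key=True)
    name = Column(String(255))


class Book(Base):
    __tablename__ = "books"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    author_id = Column(Integer, ForeignKey("authors.id"))
    author = relationship("Author", backref="books")


def preload_many_to_one(session, parents, fk_name, target_cls, pk_name="id"):
    """Hypothetical helper: fetch the many-to-one targets of `parents` in one IN query."""
    ids = sorted({getattr(p, fk_name) for p in parents} - {None})
    if not ids:
        return []
    pk = getattr(target_cls, pk_name)
    # The caller must keep the returned list alive: the identity map is weak-referencing.
    return session.query(target_cls).filter(pk.in_(ids)).all()


engine = create_engine("sqlite://")  # in-memory database, for the demo only
Base.metadata.create_all(engine)

with Session(engine) as session:
    a1, a2 = Author(name="author1"), Author(name="author2")
    session.add_all([
        Book(name="book1", author=a1),
        Book(name="book2", author=a1),
        Book(name="book3", author=a2),
    ])
    session.commit()

statements = []  # SQL text of every statement sent to the database from here on


@event.listens_for(engine, "before_cursor_execute")
def record_statement(conn, cursor, statement, parameters, context, executemany):
    statements.append(statement)


with Session(engine) as session:
    books = session.query(Book).order_by(Book.id).all()
    authors = preload_many_to_one(session, books, "author_id", Author)
    # Each access below is resolved from the identity map, without extra SQL.
    names = [book.author.name for book in books]

print(names)
print(len(statements))
```

The statements list is there only to check how many queries were actually sent; in real code the helper would be called right after the parent query and its result kept in a local variable for as long as the parents are in use.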

+5

Boris Serebrov Aug 22 '16 at 7:43


You can either work at the ORM layer, in which case you model the children attribute as an ORM relationship, something like:

    from sqlalchemy.orm import relationship

    children = relationship("<name of the acl_links class>", lazy="joined")

Using lazy="joined" gives the eager loading that was asked for (this is equivalent to the joinedload already suggested by @vsminkov). From the documentation:

The default loader strategy for any relationship() is configured via the lazy keyword argument ... Below we set it as joined so that the children relationship is eager loaded using a JOIN

There are many settings that you can apply when defining relationships, so check out the documentation to get the most out of it.
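As a rough illustration of the lazy="joined" setting, the sketch below declares a one-to-many relationship that way and records the SQL that a plain query produces. The Author and Book models, the in-memory SQLite engine, and the statement recorder are assumptions made for the demo (SQLAlchemy 1.4 or later); they are not the asker's schema.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, event
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Author(Base):
    __tablename__ = "authors"
    id = Column(Integer, primary_key=True)
    name = Column(String(255))
    # lazy="joined" makes every query for Author pull in the books with a JOIN.
    books = relationship("Book", lazy="joined")


class Book(Base):
    __tablename__ = "books"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    author_id = Column(Integer, ForeignKey("authors.id"))


engine = create_engine("sqlite://")  # in-memory database, for the demo only
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        Author(name="author1", books=[Book(name="book1"), Book(name="book2")]),
        Author(name="author2", books=[Book(name="book3")]),
    ])
    session.commit()

statements = []  # SQL text of every statement sent to the database from here on


@event.listens_for(engine, "before_cursor_execute")
def record_statement(conn, cursor, statement, parameters, context, executemany):
    statements.append(statement)


with Session(engine) as session:
    authors = session.query(Author).order_by(Author.id).all()
    # The collections are already populated, so this loop sends no further SQL.
    titles = {a.name: sorted(b.name for b in a.books) for a in authors}

print(titles)
print(len(statements))
```

Setting the strategy on the relationship applies it to every query for that class; the joinedload() query option does the same thing for a single query.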

Or you can work with the Query API and build the query yourself, for example by running a simple second query:

    list_of_ids_previously_loaded_companies = <information from your previous query>
    the_session.query(<name of the acl_links class>).filter(
        <name of the acl_links class>.eid.in_(list_of_ids_previously_loaded_companies))

You can go even lower and use the expression language, something like:

    # select([...]) is the pre-1.4 calling style; newer versions take select(acl_links)
    q = select([acl_links]).where(acl_links.c.eid.in_(list_of_ids_previously_loaded_companies))
    the_session.execute(q).fetchall()

As a last resort, you can use completely raw SQL:

    from sqlalchemy import text

    children_results = a_db_connection.execute(text(<SQL STATEMENT STRING>)).fetchall()

Choose whichever best fits your needs. Note that you are still responsible for the correctness of your schema and for placing indexes and foreign keys correctly to optimize performance.

0

creativeChips Aug 19 '16 at 10:15


I posted to the SQLAlchemy mailing list about this: https://groups.google.com/d/msg/sqlalchemy/8-kHuliJpr8/PHUZLLtMEQAJ

The "in" loading that Boris Serebrov described seems to work in one direction only. If you access the relationship from the one-to-many side, it will still run queries (unless you are eager loading).
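One way to cover the one-to-many direction by hand, sketched here under assumptions and not necessarily what the linked gist does, is to run the IN query yourself and attach the grouped rows with sqlalchemy.orm.attributes.set_committed_value, which populates the collection without marking it as a pending change. The helper name preload_books, the models, and the in-memory SQLite engine are hypothetical demo choices (SQLAlchemy 1.4 or later).

```python
from collections import defaultdict

from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, event
from sqlalchemy.orm import Session, declarative_base, relationship
from sqlalchemy.orm.attributes import set_committed_value

Base = declarative_base()


class Author(Base):
    __tablename__ = "authors"
    id = Column(Integer, primary_key=True)
    name = Column(String(255))


class Book(Base):
    __tablename__ = "books"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    author_id = Column(Integer, ForeignKey("authors.id"))
    author = relationship("Author", backref="books")


def preload_books(session, authors):
    """Hypothetical helper: fill Author.books for all `authors` with one IN query."""
    ids = [a.id for a in authors]
    grouped = defaultdict(list)
    if ids:
        query = session.query(Book).filter(Book.author_id.in_(ids)).order_by(Book.id)
        for book in query:
            grouped[book.author_id].append(book)
    for a in authors:
        # Marks the collection as loaded, so later access does not lazy-load it.
        set_committed_value(a, "books", grouped.get(a.id, []))


engine = create_engine("sqlite://")  # in-memory database, for the demo only
Base.metadata.create_all(engine)

with Session(engine) as session:
    a1, a2 = Author(name="author1"), Author(name="author2")
    session.add_all([
        Book(name="book1", author=a1),
        Book(name="book2", author=a1),
        Book(name="book3", author=a2),
    ])
    session.commit()

statements = []  # SQL text of every statement sent to the database from here on


@event.listens_for(engine, "before_cursor_execute")
def record_statement(conn, cursor, statement, parameters, context, executemany):
    statements.append(statement)


with Session(engine) as session:
    authors = session.query(Author).order_by(Author.id).all()
    preload_books(session, authors)
    titles = {a.name: [b.name for b in a.books] for a in authors}

print(titles)
print(len(statements))
```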

I ended up with this solution: https://gist.github.com/pawl/df5ba8923d9929dd1f4fc4e683eced40

0

pawl Mar 17 '17 at 3:25


Mihail krivushin · Accepted Answer · 2018-05-03T16:26:58+0000

http://docs.sqlalchemy.org/en/latest/orm/loading_relationships.html#sqlalchemy.orm.selectinload

This has been added to SQLAlchemy, so now you can just use the selectinload strategy.
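A minimal sketch of selectinload in use follows, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database; the Author and Book models and the statement recorder are demo assumptions, not the schema from the question. The second recorded statement is the simple IN query against the child table that the question asked for.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, event
from sqlalchemy.orm import Session, declarative_base, relationship, selectinload

Base = declarative_base()


class Author(Base):
    __tablename__ = "authors"
    id = Column(Integer, primary_key=True)
    name = Column(String(255))


class Book(Base):
    __tablename__ = "books"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    author_id = Column(Integer, ForeignKey("authors.id"))
    author = relationship("Author", backref="books")


engine = create_engine("sqlite://")  # in-memory database, for the demo only
Base.metadata.create_all(engine)

with Session(engine) as session:
    a1, a2 = Author(name="author1"), Author(name="author2")
    session.add_all([
        Book(name="book1", author=a1),
        Book(name="book2", author=a1),
        Book(name="book3", author=a2),
    ])
    session.commit()

statements = []  # SQL text of every statement sent to the database from here on


@event.listens_for(engine, "before_cursor_execute")
def record_statement(conn, cursor, statement, parameters, context, executemany):
    statements.append(statement)


with Session(engine) as session:
    authors = (
        session.query(Author)
        .options(selectinload(Author.books))
        .order_by(Author.id)
        .all()
    )
    # The collections were filled by the second (IN) query, so no SQL runs here.
    titles = {a.name: sorted(b.name for b in a.books) for a in authors}

print(titles)
for statement in statements:
    print(statement)
```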
