Update: the "select in" strategy is now implemented in SQLAlchemy (since version 1.2) as selectinload(): see "Select IN loading" in the documentation.
TL;DR:
I think joinedload should be used wherever possible, as it is more efficient than the other strategies, including the one proposed in the question (loading the related data with an "IN" query).
The "IN" strategy is fairly easy to implement "outside" of SQLAlchemy (see the code below), and it probably would not be hard to add it as a new loading strategy, since it is logically similar to the existing subqueryload strategy.
Full version:
I started with a simple experiment to see the queries generated by the different strategies.
The full source code for the experiment is on GitHub.
My models look like this:
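from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import backref, relationship

# Assumption: ModelBase is a plain declarative base (its definition is not shown here).
ModelBase = declarative_base()


class Author(ModelBase):
    __tablename__ = 'authors'
    id = Column(Integer, primary_key=True, nullable=False)
    name = Column(String(255))


class Book(ModelBase):
    __tablename__ = 'books'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    author_id = Column(Integer, ForeignKey('authors.id'))
    # Many-to-one Book.author; the backref adds the one-to-many Author.books.
    author = relationship('Author', backref=backref('books'))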
Now the tests, starting with lazy loading:
books = session.query(Book).all()
print(books[0].author.name)
session.commit()
Output (cleaned up):
-------------Lazy--------------
sqlalchemy.engine.base.Engine: SELECT books.id AS books_id, books.name AS books_name, books.author_id AS books_author_id
FROM books
SELECT authors.id AS authors_id, authors.name AS authors_name
FROM authors
WHERE authors.id = ?
INFO:sqlalchemy.engine.base.Engine:(1,)
author1
As expected, lazy loading issues one query to fetch the books and then one more query each time we access an author.
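To make the query count concrete without depending on SQLAlchemy, here is a minimal sketch that replays the lazy-loading pattern with Python's standard sqlite3 module. The schema mirrors the models above; the sample rows (two authors, four books) and the run() helper are hypothetical. Unlike SQLAlchemy, this sketch has no identity map, so it issues one query per book rather than per distinct author.

```python
import sqlite3

# Hypothetical sample data: two authors and four books (author1, author1, author2, author2).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name VARCHAR(255));
    CREATE TABLE books (id INTEGER PRIMARY KEY, name VARCHAR,
                        author_id INTEGER REFERENCES authors (id));
    INSERT INTO authors VALUES (1, 'author1'), (2, 'author2');
    INSERT INTO books VALUES (1, 'book1', 1), (2, 'book2', 1),
                             (3, 'book3', 2), (4, 'book4', 2);
""")

queries = []  # every SELECT we send, in order


def run(sql, params=()):
    """Execute one statement, record it, and return all rows."""
    queries.append(sql)
    return conn.execute(sql, params).fetchall()


# Query 1: load the books only, like session.query(Book).all().
books = run("SELECT id, name, author_id FROM books ORDER BY id")

# Lazy pattern: every author access triggers its own SELECT.
# SQLAlchemy would first check its identity map; this sketch does not.
author_names = []
for _book_id, _title, author_id in books:
    rows = run("SELECT id, name FROM authors WHERE id = ?", (author_id,))
    author_names.append(rows[0][1])

print(author_names)
print(len(queries), "queries")
```

With N books this is 1 + N round trips, the classic "N+1 selects" problem. SQLAlchemy's identity map brings that down to 1 + (number of distinct authors not yet loaded), but the count still grows with the data.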
Subquery loading:
from sqlalchemy.orm import subqueryload

books = session.query(Book).options(subqueryload(Book.author)).all()
print(books[0].author.name)
session.commit()

-------------Subquery----------
SELECT books.id AS books_id, books.name AS books_name, books.author_id AS books_author_id
FROM books
SELECT authors.id AS authors_id, authors.name AS authors_name, anon_1.books_author_id AS anon_1_books_author_id
FROM (SELECT DISTINCT books.author_id AS books_author_id FROM books) AS anon_1
JOIN authors ON authors.id = anon_1.books_author_id
ORDER BY anon_1.books_author_id
author1
With subquery loading we get two queries: the first selects the books, and the second selects the authors using a subquery.
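The subquery strategy can be sketched the same way with the standard sqlite3 module. The thing to notice is that the second statement re-runs the original books query, wrapped as a subquery, to find the author ids. The second SQL statement is copied from the log above; the sample data, the run() helper and the ORDER BY on the first query (added only for determinism) are assumptions of this sketch.

```python
import sqlite3

# Hypothetical sample data: two authors and four books.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name VARCHAR(255));
    CREATE TABLE books (id INTEGER PRIMARY KEY, name VARCHAR,
                        author_id INTEGER REFERENCES authors (id));
    INSERT INTO authors VALUES (1, 'author1'), (2, 'author2');
    INSERT INTO books VALUES (1, 'book1', 1), (2, 'book2', 1),
                             (3, 'book3', 2), (4, 'book4', 2);
""")

queries = []  # every SELECT we send, in order


def run(sql, params=()):
    """Execute one statement, record it, and return all rows."""
    queries.append(sql)
    return conn.execute(sql, params).fetchall()


# Query 1: the books.
books = run("SELECT books.id, books.name, books.author_id FROM books ORDER BY books.id")

# Query 2: the authors, driven by the original query wrapped in a subquery
# (the same shape subqueryload emitted in the log above).
authors = run("""
    SELECT authors.id, authors.name, anon_1.books_author_id
    FROM (SELECT DISTINCT books.author_id AS books_author_id FROM books) AS anon_1
    JOIN authors ON authors.id = anon_1.books_author_id
    ORDER BY anon_1.books_author_id
""")

# Match authors to books in memory; no further queries are needed.
author_by_id = {author_id: name for author_id, name, _ in authors}
author_names = [author_by_id[author_id] for _, _, author_id in books]
print(author_names, len(queries), "queries")
```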
Joined loading:
from sqlalchemy.orm import joinedload

books = session.query(Book).options(joinedload(Book.author)).all()
print(books[0].author.name)
session.commit()

-------------Joined------------
SELECT books.id AS books_id, books.name AS books_name, books.author_id AS books_author_id,
       authors_1.id AS authors_1_id, authors_1.name AS authors_1_name
FROM books LEFT OUTER JOIN authors AS authors_1 ON authors_1.id = books.author_id
author1
The joined strategy issues just one query to fetch both the books and the authors.
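As a rough illustration of why this is a single round trip, the sketch below runs the joined query from the log against an in-memory SQLite database using only the standard library. The sample data and the run() helper are hypothetical, and the ORDER BY is added only to make the row order deterministic.

```python
import sqlite3

# Hypothetical sample data: two authors and four books.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name VARCHAR(255));
    CREATE TABLE books (id INTEGER PRIMARY KEY, name VARCHAR,
                        author_id INTEGER REFERENCES authors (id));
    INSERT INTO authors VALUES (1, 'author1'), (2, 'author2');
    INSERT INTO books VALUES (1, 'book1', 1), (2, 'book2', 1),
                             (3, 'book3', 2), (4, 'book4', 2);
""")

queries = []  # every SELECT we send, in order


def run(sql, params=()):
    """Execute one statement, record it, and return all rows."""
    queries.append(sql)
    return conn.execute(sql, params).fetchall()


# One query: each book row carries its author's columns via LEFT OUTER JOIN.
rows = run("""
    SELECT books.id, books.name, books.author_id, authors_1.id, authors_1.name
    FROM books LEFT OUTER JOIN authors AS authors_1
        ON authors_1.id = books.author_id
    ORDER BY books.id
""")

# The author name sits in column 4 of every row.
author_names = [row[4] for row in rows]
print(author_names, len(queries), "query")
```

The trade-off is duplication: an author's columns are repeated on every row for each of their books. The LEFT OUTER JOIN keeps books whose author_id is NULL; joinedload uses an outer join by default, and innerjoin=True switches it to an inner join.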
Immediate loading:
from sqlalchemy.orm import immediateload

books = session.query(Book).options(immediateload(Book.author)).all()
print(books[0].author.name)
session.commit()

-------------Immediate---------
SELECT books.id AS books_id, books.name AS books_name, books.author_id AS books_author_id
FROM books
SELECT authors.id AS authors_id, authors.name AS authors_name
FROM authors
WHERE authors.id = ?
INFO:sqlalchemy.engine.base.Engine:(1,)
SELECT authors.id AS authors_id, authors.name AS authors_name
FROM authors
WHERE authors.id = ?
INFO:sqlalchemy.engine.base.Engine:(2,)
author1
The immediate strategy loads the books with the first query and then, right away as the books are loaded (not when we access the relationship), selects the related data with a separate query for each related record: both author queries appear in the log before author1 is printed.
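A minimal standard-library sketch of the same shape: load the books, then immediately fetch each distinct author with its own SELECT before any attribute is accessed. Fetching each distinct author once is an assumption chosen to match the two author queries (ids 1 and 2) in the log; the sample data and the run() helper are hypothetical.

```python
import sqlite3

# Hypothetical sample data: two authors and four books.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name VARCHAR(255));
    CREATE TABLE books (id INTEGER PRIMARY KEY, name VARCHAR,
                        author_id INTEGER REFERENCES authors (id));
    INSERT INTO authors VALUES (1, 'author1'), (2, 'author2');
    INSERT INTO books VALUES (1, 'book1', 1), (2, 'book2', 1),
                             (3, 'book3', 2), (4, 'book4', 2);
""")

queries = []  # every SELECT we send, in order


def run(sql, params=()):
    """Execute one statement, record it, and return all rows."""
    queries.append(sql)
    return conn.execute(sql, params).fetchall()


# Query 1: the books.
books = run("SELECT id, name, author_id FROM books ORDER BY id")

# Eagerly, right after loading: one SELECT per distinct author.
loaded = {}
for _, _, author_id in books:
    if author_id not in loaded:
        rows = run("SELECT id, name FROM authors WHERE id = ?", (author_id,))
        loaded[author_id] = rows[0][1]

# Later attribute access is served from memory, with no new queries.
author_names = [loaded[author_id] for _, _, author_id in books]
print(author_names, len(queries), "queries")
```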
It seems that joinedload() should be the most efficient option in most cases (and more efficient than the "IN" strategy): we get all the data in a single query.
Now let's try to implement the "IN" strategy outside of SQLAlchemy:
print('-------------IN----------------')
books = session.query(Book).all()
ids = set()
for b in books:
    ids.add(b.author_id)
# The result is not used directly: loading the authors puts them into the
# session's identity map, where book.author can find them.
authors = session.query(Author).filter(Author.id.in_(ids)).all()
print(books[0].author.name)
print(books[1].author.name)
print(books[2].author.name)
print(books[3].author.name)
Output:
-------------IN----------------
SELECT books.id AS books_id, books.name AS books_name, books.author_id AS books_author_id
FROM books
SELECT authors.id AS authors_id, authors.name AS authors_name
FROM authors
WHERE authors.id IN (?, ?)
INFO:sqlalchemy.engine.base.Engine:(1, 2)
author1
author1
author2
author2
As we can see, it issues two queries, after which we can access all the authors.
Note that we do not explicitly attach the authors to the books, but accessing an author through a book still works: SQLAlchemy finds the Author objects in the session's internal identity map and does not issue any additional database queries.
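The identity-map behaviour described above can be sketched with a plain dictionary keyed by primary key. This is a hypothetical stand-in for SQLAlchemy's session identity map, not its implementation: one IN query fills the map, and a lazy-style accessor goes to the database only on a cache miss. The sample data, run() and get_author() are illustrative assumptions.

```python
import sqlite3

# Hypothetical sample data: two authors and four books.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name VARCHAR(255));
    CREATE TABLE books (id INTEGER PRIMARY KEY, name VARCHAR,
                        author_id INTEGER REFERENCES authors (id));
    INSERT INTO authors VALUES (1, 'author1'), (2, 'author2');
    INSERT INTO books VALUES (1, 'book1', 1), (2, 'book2', 1),
                             (3, 'book3', 2), (4, 'book4', 2);
""")

queries = []  # every SELECT we send, in order
identity_map = {}  # hypothetical stand-in: author id -> author name


def run(sql, params=()):
    """Execute one statement, record it, and return all rows."""
    queries.append(sql)
    return conn.execute(sql, params).fetchall()


def get_author(author_id):
    """Lazy-style access: consult the identity map before querying."""
    if author_id not in identity_map:
        rows = run("SELECT id, name FROM authors WHERE id = ?", (author_id,))
        identity_map[author_id] = rows[0][1]
    return identity_map[author_id]


books = run("SELECT id, name, author_id FROM books ORDER BY id")

# Preload every referenced author with one IN query, as in the experiment.
ids = sorted({author_id for _, _, author_id in books})
placeholders = ", ".join("?" for _ in ids)
for author_id, name in run(
        f"SELECT id, name FROM authors WHERE id IN ({placeholders})", ids):
    identity_map[author_id] = name

# All four accesses hit the map, so no further queries are issued.
author_names = [get_author(author_id) for _, _, author_id in books]
print(author_names, len(queries), "queries")
```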
Code for the "IN" strategy, similar to the code above, can be generalized into a function that works with any model/relationship. The "IN" strategy should also be relatively easy to implement as a new SQLAlchemy loading strategy: it is similar to the existing subqueryload, which likewise executes a second query to fetch the related data.
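One way such a generalization might look, sketched over plain sqlite3 rows rather than SQLAlchemy models: load_related_in() is a hypothetical helper, not a SQLAlchemy API, and it assumes the primary key is the first selected column. Table and column names are interpolated into the SQL, so they must be trusted identifiers, never user input. The chunk_size default of 500 is an arbitrary assumption to stay under the database's bound-parameter limit.

```python
import sqlite3

# Hypothetical sample data: two authors and four books.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name VARCHAR(255));
    CREATE TABLE books (id INTEGER PRIMARY KEY, name VARCHAR,
                        author_id INTEGER REFERENCES authors (id));
    INSERT INTO authors VALUES (1, 'author1'), (2, 'author2');
    INSERT INTO books VALUES (1, 'book1', 1), (2, 'book2', 1),
                             (3, 'book3', 2), (4, 'book4', 2);
""")

queries = []  # every SELECT we send, in order


def run(sql, params=()):
    """Execute one statement, record it, and return all rows."""
    queries.append(sql)
    return conn.execute(sql, params).fetchall()


def load_related_in(parent_rows, fk_index, table, pk_column, columns,
                    chunk_size=500):
    """Hypothetical helper: fetch related rows with batched IN queries.

    fk_index is the position of the foreign key in each parent row;
    columns must list the primary key first. Returns {primary key: row}.
    """
    ids = sorted({row[fk_index] for row in parent_rows
                  if row[fk_index] is not None})
    related = {}
    # Batch the keys; an empty id set issues no query at all.
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        placeholders = ", ".join("?" for _ in chunk)
        sql = (f"SELECT {', '.join(columns)} FROM {table} "
               f"WHERE {pk_column} IN ({placeholders})")
        for row in run(sql, chunk):
            related[row[0]] = row
    return related


books = run("SELECT id, name, author_id FROM books ORDER BY id")
authors = load_related_in(books, 2, "authors", "id", ["id", "name"])
author_names = [authors[author_id][1] for _, _, author_id in books]
print(author_names, len(queries), "queries")
```

Batching matters because databases cap the number of bound parameters per statement (SQLite historically defaulted to 999); for small result sets it changes nothing.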