Here are four variations, ordered from slowest to fastest. The timeit results follow the code:
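    import random

    from sqlalchemy.sql import func
    from sqlalchemy.orm import load_only

    # model_name is the mapped model class; db is the Flask-SQLAlchemy object.

    def simple_random():
        # Load every row with all columns, then pick one in Python.
        return random.choice(model_name.query.all())

    def load_only_random():
        # Load every row but only the id column, then pick one in Python.
        return random.choice(model_name.query.options(load_only('id')).all())

    def order_by_random():
        # Let the database shuffle the rows and return the first one.
        return model_name.query.order_by(func.random()).first()

    def optimized_random():
        # Skip a random number of rows (0 .. count - 1) and take one.
        # Note: .all() returns a one-element list here.
        return model_name.query.options(load_only('id')).offset(
            func.floor(
                func.random() *
                db.session.query(func.count(model_name.id))
            )
        ).limit(1).all()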
timeit results for 10,000 runs on my Macbook against a PostgreSQL table with 300 rows:
    simple_random():     90.09954111799925
    load_only_random():  65.94714171699889
    order_by_random():   23.17819356000109
    optimized_random():  19.87806927999918
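For anyone who wants to repeat the measurement, here is a minimal sketch of how numbers like these can be collected with `timeit.timeit`, which returns the total time in seconds for `number` calls. The helper `time_variants` and the two stand-in functions are hypothetical and only there so the snippet runs without a database; in practice you would pass the four query functions instead.

```python
import random
import timeit


def time_variants(variants, number=10_000):
    """Return {function name: total seconds for `number` calls}."""
    return {
        fn.__name__: timeit.timeit(fn, number=number)
        for fn in variants
    }


# Hypothetical stand-ins (not from the answer) so the sketch is runnable
# on its own; replace them with simple_random, load_only_random, etc.
ROWS = list(range(300))


def pick_from_full_copy():
    # Copies the whole list before choosing, like loading every row.
    return random.choice(list(ROWS))


def pick_by_index():
    # Picks a random position directly, without copying anything.
    return ROWS[random.randrange(len(ROWS))]


if __name__ == "__main__":
    results = time_variants([pick_from_full_copy, pick_by_index], number=1_000)
    for name, seconds in results.items():
        print(f"{name}(): {seconds}")
```

Timings depend heavily on the machine, the database, and the table size, so only the relative order should be compared between runs.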
You can easily see that using func.random() is much faster than returning all the results to Python and calling random.choice().
In addition, as the size of the table increases, the performance of order_by_random() will degrade significantly, because ORDER BY requires a full table scan, whereas the COUNT in optimized_random() can use an index.
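To make the difference between the two database-side strategies concrete, here is a rough sketch in plain SQL using the standard-library `sqlite3` module as a stand-in for PostgreSQL. The `items` table and both function names are hypothetical. One deliberate difference from optimized_random(): SQLite's random() returns an integer rather than a float between 0 and 1, so this sketch counts the rows first and picks the offset in Python, which costs two queries instead of one.

```python
import random
import sqlite3

# Hypothetical stand-in table: 300 rows in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (name) VALUES (?)",
    [(f"item-{i}",) for i in range(300)],
)


def order_by_random_sql(conn):
    # The database gives every row a random sort key, sorts, keeps one row.
    return conn.execute(
        "SELECT id, name FROM items ORDER BY random() LIMIT 1"
    ).fetchone()


def count_offset_sql(conn):
    # Count the rows, choose a random offset, then fetch exactly one row.
    (total,) = conn.execute("SELECT count(id) FROM items").fetchone()
    if total == 0:
        return None  # empty table: nothing to pick
    offset = random.randrange(total)  # 0 .. total - 1
    return conn.execute(
        "SELECT id, name FROM items ORDER BY id LIMIT 1 OFFSET ?",
        (offset,),
    ).fetchone()


if __name__ == "__main__":
    print(order_by_random_sql(conn))
    print(count_offset_sql(conn))
```

Because the count and the fetch are separate statements here, a row deleted in between could make the offset point past the end of the table, so callers should be ready for a `None` result.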
Jeff Widman, Nov 07 '15 at 12:55