The following query immediately returns data:
SELECT time, value from data order by time limit 100;
Without the limit clause, it takes a long time before the server starts returning rows:
SELECT time, value from data order by time;
I observe this both with the query tool (psql) and when querying through an API.
Questions / problems:
- The amount of work that the server must perform before starting to return rows should be the same for both select statements. Correct?
- If so, why is there a delay in case 2?
- Is there any fundamental RDBMS problem that I don't understand?
- Is there a way to get postgresql to start returning result rows to the client without a pause, also for case 2?
- EDIT (see below): setFetchSize looks like the key to solving this. In my case, I am executing the query from Python using SQLAlchemy. How can I set this option for a single query (executed with session.execute)? I am using the psycopg2 driver.
The time column is the primary key, BTW.
EDIT:
I believe this excerpt from the JDBC driver documentation describes the problem and hints at a solution (I still need help - see the last bullet point above):
By default, the driver collects all the results for the query at once. This can be inconvenient for large data sets, so the JDBC driver provides a means of basing a ResultSet on a database cursor and only fetching a small number of rows.
and
Changing code to cursor mode is as simple as setting the fetch size of the Statement to the appropriate size. Setting the fetch size back to 0 will cause all rows to be cached (the default behaviour).
// make sure autocommit is off
conn.setAutoCommit(false);
Statement st = conn.createStatement();

// Turn use of the cursor on.
st.setFetchSize(50);
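For comparison, here is a rough sketch of what the psycopg2 counterpart of the JDBC snippet might look like. It assumes that a psycopg2 named (server-side) cursor plays the role of setFetchSize, with the cursor's itersize attribute acting as the batch size. The function name stream_rows, the cursor name and the batch size of 50 are made up for illustration and are not part of any library.

```python
def stream_rows(conn, sql, batch_size=50):
    """Yield rows from `sql` in batches instead of buffering the whole result.

    `conn` is assumed to be an open psycopg2 connection with autocommit off,
    because a named cursor lives inside a transaction.
    """
    # Passing a name asks psycopg2 for a server-side cursor; an unnamed
    # cursor would pull the complete result set to the client first.
    with conn.cursor(name="stream_rows_cursor") as cur:
        # itersize is the number of rows fetched per round trip while iterating.
        cur.itersize = batch_size
        cur.execute(sql)
        for row in cur:
            yield row
```

With a real connection this would be used as `for time, value in stream_rows(conn, "SELECT time, value FROM data ORDER BY time"): ...`. On the SQLAlchemy side, the `stream_results=True` execution option appears to be the intended way to request the same server-side cursor behaviour from the psycopg2 dialect, but I have not verified how it interacts with session.execute.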
python sql postgresql sqlalchemy
codeape