Why doesn't PostgreSQL start returning rows right away?

The following query immediately returns data:

SELECT time, value from data order by time limit 100; 

Without the LIMIT clause, it takes a long time before the server starts returning rows:

 SELECT time, value from data order by time; 

I observe this both with the query tool (psql) and when running the query through the API.

Questions / problems:

  • The amount of work the server must perform before it starts returning rows should be the same for both SELECT statements. Correct?
  • If so, why is there a delay in case 2?
  • Is there some fundamental RDBMS concept that I'm not understanding?
  • Is there a way to get PostgreSQL to start returning result rows to the client without a pause, even in case 2?
  • EDIT (see below): setFetchSize seems to be the key to solving this. In my case, I am running the query from Python using SQLAlchemy. How do I set that parameter for a single query (executed via session.execute)? I am using the psycopg2 driver.

The time column is the primary key, BTW.

EDIT:

I believe this excerpt from the JDBC driver documentation describes the problem and hints at a solution (I still need help - see the last bullet above):

By default, the driver collects all the results for a query at once. This can be inconvenient for large data sets, so the JDBC driver provides a means of basing a ResultSet on a database cursor and only fetching a small number of rows.

and

Changing code to cursor mode is as simple as setting the fetch size of the Statement to the appropriate size. Setting the fetch size back to 0 will cause all rows to be cached (the default behaviour).

    // make sure autocommit is off
    conn.setAutoCommit(false);
    Statement st = conn.createStatement();
    // Turn use of the cursor on.
    st.setFetchSize(50);
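For comparison, here is a rough Python analogue of that JDBC snippet, assuming the psycopg2 driver (the DSN and cursor name are placeholders, not from the original post). Giving the cursor a name makes psycopg2 open a server-side cursor, and itersize plays roughly the role of setFetchSize:

    import psycopg2

    conn = psycopg2.connect("dbname=mydb user=me")  # placeholder DSN
    try:
        # A *named* cursor is a server-side cursor in psycopg2, so rows are
        # fetched in batches instead of being buffered all at once.
        cur = conn.cursor(name="stream_data")
        cur.itersize = 50  # roughly equivalent to st.setFetchSize(50)
        cur.execute("SELECT time, value FROM data ORDER BY time")
        for row in cur:
            print(row)  # rows start arriving without waiting for the full result
    finally:
        conn.close()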
python sql postgresql sqlalchemy
2 answers

The psycopg2 DBAPI driver buffers the entire query result before returning any rows, so you will need to use a server-side cursor to fetch results incrementally. For SQLAlchemy, see server_side_cursors in the docs, and if you are using the ORM, the Query.yield_per() method.

There is currently no way in SQLAlchemy to set this for a single query, but there is a ticket with a patch to implement it.
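As a minimal sketch of what the streaming approach looks like (assuming a reasonably recent SQLAlchemy with the psycopg2 dialect; the connection URL is a placeholder), newer releases accept stream_results as a per-connection execution option:

    from sqlalchemy import create_engine, text

    # Placeholder URL; any PostgreSQL database reachable via psycopg2 will do.
    engine = create_engine("postgresql+psycopg2://user:pass@localhost/mydb")

    with engine.connect() as conn:
        # stream_results=True asks the psycopg2 dialect to use a server-side
        # (named) cursor, so rows are fetched incrementally instead of being
        # buffered in full before the first row is returned.
        result = conn.execution_options(stream_results=True).execute(
            text("SELECT time, value FROM data ORDER BY time")
        )
        for time_, value in result:
            pass  # handle each row as it arrives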


In theory, since your ORDER BY is on the primary key, sorting the results should not be necessary, and the database could indeed return data right away in key order.

I would expect a smart database to notice this and optimize accordingly. PostgreSQL doesn't seem to. *shrug*

You don't notice this effect with LIMIT 100 because those 100 rows come out of the database very quickly, and you would not notice the delay even if they were collected and sorted first before being sent to your client.

I suggest dropping the ORDER BY. Most likely your results will come back ordered by time anyway (there may even be a standard or spec that guarantees this, given your PK), and you may get your results faster.
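One way to check whether this theory holds on a given installation (a sketch only; the DSN and index name are assumptions) is to look at the query plan: an Index Scan on the primary-key index means rows can stream out in key order, while a Sort node means the whole result is ordered before anything is sent:

    import psycopg2

    with psycopg2.connect("dbname=mydb user=me") as conn:  # placeholder DSN
        with conn.cursor() as cur:
            cur.execute("EXPLAIN SELECT time, value FROM data ORDER BY time")
            plan = "\n".join(row[0] for row in cur.fetchall())
            print(plan)
            # Look for "Index Scan using data_pkey" (streaming in key order)
            # versus a top-level "Sort" node (sort first, then send).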

