From the psycopg2 documentation:
When a database query is executed, the Psycopg cursor usually fetches all the records returned by the backend, transferring them to the client process. If the query returned a huge amount of data, a proportionally large amount of memory will be allocated by the client. If the dataset is too big to be practically handled on the client side, it is possible to create a server side cursor.
I would like to query a table with possibly thousands of rows and perform some action for each of them. Will a regular cursor really pull the entire dataset over to the client? That doesn't sound very reasonable. The code is something like:
conn = psycopg2.connect(url)
cursor = conn.cursor()
cursor.execute(sql)
for row in cursor:
    pass  # do some stuff with row
cursor.close()
I expect this to be a streaming operation. My second question is about how many cursors I need. Inside the loop I would like to update another table. Do I have to open a new cursor on every iteration and close it again? Each row's update should be in its own transaction, since I may need to roll it back:
for row in cursor:
    anotherCursor = anotherConn.cursor()
    anotherCursor.execute(update)
    if somecondition:
        anotherConn.commit()
    else:
        anotherConn.rollback()
    anotherCursor.close()
cursor.close()
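Spelled out with error handling, the per-row transaction pattern I have in mind looks something like this (an untested sketch; url, sql, update and somecondition are placeholders as above):

import psycopg2

conn = psycopg2.connect(url)            # url is a placeholder connection string
anotherConn = psycopg2.connect(url)     # a separate connection just for the updates

cursor = conn.cursor()
cursor.execute(sql)                     # sql is a placeholder for the big query

for row in cursor:
    anotherCursor = anotherConn.cursor()
    try:
        anotherCursor.execute(update)   # update is a placeholder statement
        if somecondition:
            anotherConn.commit()        # ends this row's transaction
        else:
            anotherConn.rollback()      # discards this row's changes
    except Exception:
        anotherConn.rollback()          # keep the connection usable after an error
        raise
    finally:
        anotherCursor.close()

cursor.close()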
======== EDIT: MY ANSWER TO FIRST PART ========
Well, I will try to answer the first part of my question myself. A regular cursor does fetch the entire result set as soon as you call execute(), before you start iterating over it. You can verify this by watching the process's memory usage at each step. The need for a server-side cursor, on the other hand, is really about the PostgreSQL server, not the client, and such cursors are documented here: http://www.postgresql.org/docs/9.3/static/sql-declare.html
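For the memory check, something along these lines works (assuming the psutil package is installed; url and sql are placeholders):

import psycopg2
import psutil

def rss_mb():
    # Resident memory of the current process, in megabytes.
    return psutil.Process().memory_info().rss / (1024 * 1024)

conn = psycopg2.connect(url)            # url is a placeholder
cursor = conn.cursor()                  # regular, client-side cursor

print("before execute: %.0f MB" % rss_mb())
cursor.execute(sql)                     # sql is a placeholder for the big query
print("after execute: %.0f MB" % rss_mb())  # already jumped: all rows are on the client

cursor.close()
conn.close()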
Now, this is not immediately obvious from the documentation, but such a cursor can be created implicitly, just for the duration of the transaction. There is no need to explicitly create a function returning a refcursor in the database, to write a special SQL statement, or anything like that. With psycopg2 you only need to pass a name when you create the cursor, and a server-side cursor will be created for that transaction. So instead of:
cursor = conn.cursor()
you just need to:
cursor = conn.cursor('mycursor')
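So the streaming version of my original loop becomes (url and sql are placeholders as before, and the cursor name is arbitrary):

import psycopg2

conn = psycopg2.connect(url)
cursor = conn.cursor('mycursor')        # named cursor => server-side cursor
cursor.execute(sql)                     # psycopg2 issues DECLARE behind the scenes

for row in cursor:                      # rows are fetched in batches, not all at once
    pass  # do some stuff with row

cursor.close()                          # also closes the server-side cursor
conn.close()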
And it just works. I suppose the same thing happens under the covers with JDBC when you set fetchSize; it is just a bit more explicit there. See the docs here: https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor
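The psycopg2 counterpart of fetchSize seems to be the itersize attribute of a named cursor, which sets how many rows are fetched per network round trip during iteration (the default is 2000):

cursor = conn.cursor('mycursor')
cursor.itersize = 500    # fetch 500 rows per round trip instead of the default 2000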
You can verify that this works by querying the pg_cursors view inside the same transaction: the server-side cursor appears once the named client-side cursor has executed its query, and disappears when the client-side cursor is closed. So, bottom line: I'm happy to make this change in my code, but I have to say this was far from obvious for someone with no prior Postgres experience.
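For example, a second regular cursor on the same connection (and therefore in the same transaction) can watch pg_cursors (a sketch; url is a placeholder):

import psycopg2

conn = psycopg2.connect(url)
named = conn.cursor('mycursor')
named.execute("SELECT generate_series(1, 1000000)")

check = conn.cursor()                           # regular cursor, same transaction
check.execute("SELECT name FROM pg_cursors")
print(check.fetchall())                         # [('mycursor',)]

named.close()
check.execute("SELECT name FROM pg_cursors")
print(check.fetchall())                         # [] once the named cursor is closed

check.close()
conn.close()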