Psycopg2 uses memory for a large select request

I use psycopg2 to query a PostgreSQL database and am trying to process all the rows of a table with approximately 380M rows. There are only 3 columns (id1, id2, count), all of integer type. However, when I run the simple select query below, the Python process starts consuming more and more memory until it is killed by the OS.

A minimal working example (assuming mydatabase exists and contains a table called mytable):

import psycopg2

conn = psycopg2.connect("dbname=mydatabase")
cur = conn.cursor()
cur.execute("SELECT * FROM mytable;")

At this point, the program begins to consume memory.

I checked, and the PostgreSQL process is behaving well: it uses a fair bit of CPU, which is fine, and only a very limited amount of memory.

I expected psycopg2 to return an iterator without trying to load the entire result set into memory, so that I could use cur.fetchone() to process the rows one at a time.

So, how can I select from a 380M-row table without exhausting the available memory?

python postgresql psycopg2
3 answers

You can use server-side cursors.

 cur = conn.cursor('cursor-name')  # server-side (named) cursor
 cur.itersize = 10000              # how many rows to fetch to the client at a time
 cur.execute("SELECT * FROM mytable;")

Another way to use a server-side cursor:

 with psycopg2.connect(database_connection_string) as conn:
     with conn.cursor(name='name_of_cursor') as cursor:
         cursor.itersize = 20000
         query = "SELECT * FROM ..."
         cursor.execute(query)
         for row in cursor:
             # process row
             ...

Psycopg2 will fetch itersize rows to the client at a time. Once the for loop exhausts that batch, it will fetch the next one.
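The batching behavior described above can be sketched in plain Python without a database. Here fake_fetchmany is a stand-in for a real cursor's fetchmany, just slicing a list; fetch_in_batches mimics how iterating a named cursor pulls rows in itersize-sized chunks:

```python
def fetch_in_batches(fetchmany, batch_size):
    """Yield rows one at a time, pulling them from the source
    batch_size rows per call -- roughly what iterating a named
    cursor with itersize set does."""
    while True:
        batch = fetchmany(batch_size)
        if not batch:
            break
        for row in batch:
            yield row

# Simulate a result set of 10 rows fetched 4 at a time.
rows = list(range(10))
state = {"pos": 0}

def fake_fetchmany(n):
    batch = rows[state["pos"]:state["pos"] + n]
    state["pos"] += n
    return batch

result = list(fetch_in_batches(fake_fetchmany, 4))
print(result)  # all 10 rows, fetched in 3 batches of 4, 4, 2
```

Only one batch is held in memory at a time, which is why the client's memory use stays bounded regardless of the total result size.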


You can use LIMIT and OFFSET in PostgreSQL to fetch the data in small chunks. More information here: https://www.postgresql.org/docs/8.1/queries-limit.html. Note that the server still has to scan and skip the rows covered by OFFSET on every query, so this approach gets progressively slower as the offset grows.

For example:

 offset = 0
 while True:
     cursor.execute("SELECT * FROM mytable LIMIT 100 OFFSET %s", (offset,))
     rows = cursor.fetchall()  # execute() returns None, so fetch the rows explicitly
     if not rows:
         break
     for row in rows:
         # process row
         ...
     offset += 100
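As a sanity check of the paging logic, the same loop can be simulated without a database, with list slicing standing in for LIMIT/OFFSET (paged is a hypothetical helper, not part of psycopg2):

```python
def paged(rows, page_size):
    """Yield successive pages of rows, mimicking
    LIMIT page_size OFFSET n as n advances by page_size."""
    offset = 0
    while True:
        page = rows[offset:offset + page_size]  # analogue of LIMIT/OFFSET
        if not page:  # empty page == no rows left, like the break above
            break
        yield page
        offset += page_size

pages = list(paged(list(range(10)), 4))
print(pages)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

The loop terminates because an offset past the end of the data yields an empty page, which mirrors the `if not rows: break` check in the answer.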
