I am developing part of a system in which processes are limited to roughly 350 MB of RAM; we use cx_Oracle to download files from an external system for processing.

The external system stores the files as BLOBs, and we can fetch them with something like this:
    # ... set up Oracle connection, then:
    cursor.execute(u"""SELECT filename, data, filesize
                       FROM FILEDATA
                       WHERE ID = :id""", id=the_one_you_wanted)
    filename, lob, filesize = cursor.fetchone()
    with open(filename, "w") as the_file:
        the_file.write(lob.read())
Reading everything at once with lob.read() obviously fails with a MemoryError as soon as a file exceeds roughly 300-350 MB, so we tried reading in chunks instead:
    read_size = 0
    chunk_size = lob.getchunksize() * 100
    while read_size < filesize:
        data = lob.read(chunk_size, read_size + 1)
        read_size += len(data)
        the_file.write(data)
Unfortunately, we still get a MemoryError after a few iterations. Judging by how long each lob.read() call takes, and by the out-of-memory condition we eventually hit, it looks as though lob.read() is pulling (chunk_size + read_size) bytes from the database on every call. In other words, each read takes O(n) time and O(n) memory, even though the buffer we ask for is much smaller.
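To make the symptom concrete, here is a minimal timing sketch; it reuses lob, filesize and the_file from the snippets above and the exact read call from our loop, and simply prints how long each chunked read takes:

    import time

    # Timing sketch only: same objects (lob, filesize, the_file) and the same
    # read call as in the loop above, with a per-iteration timer added.
    read_size = 0
    chunk_size = lob.getchunksize() * 100
    while read_size < filesize:
        start = time.time()
        data = lob.read(chunk_size, read_size + 1)
        elapsed = time.time() - start
        print("got %d bytes starting at %d in %.2f s" % (len(data), read_size + 1, elapsed))
        read_size += len(data)
        the_file.write(data)

If each call really does rescan everything up to the current position, the printed times should grow roughly linearly as the loop progresses, which is consistent with what we are seeing.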
To get around this, we tried something like:
    read_size = 0
    while read_size < filesize:
        q = u'''SELECT dbms_lob.substr(data, 2000, %s)
                FROM FILEDATA
                WHERE ID = :id''' % (read_size + 1)
        cursor.execute(q, id=filedataid[0])
        row = cursor.fetchone()
        read_size += len(row[0])
        the_file.write(row[0])
This pulls 2,000 bytes at a time (argh) and takes forever (something like two hours for a 1.5 GB file). Why 2,000 bytes? According to the Oracle documentation for dbms_lob.substr(), it returns the result as a RAW, which is limited to 2,000 bytes.
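For scale, 2,000-byte slices mean on the order of 800,000 separate queries (and round trips) for that 1.5 GB file:

    >>> (1.5 * 1024**3) / 2000   # 1.5 GB in 2,000-byte slices
    805306.368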
Is there a way to store the result of dbms_lob.substr() in a larger data object and read, say, a few megabytes at a time? How would this be done with cx_Oracle?
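To put it another way, the shape we are after is roughly the sketch below; read_a_few_megabytes_at() is purely hypothetical and just stands in for whatever cx_Oracle / PL/SQL technique would make this possible:

    # Hypothetical target usage: read_a_few_megabytes_at() does not exist, it
    # only illustrates the chunk size we would like per database round trip.
    offset = 1                    # Oracle LOB offsets are 1-based
    chunk = 4 * 1024 * 1024       # a few MB per round trip, well under the RAM cap
    while offset <= filesize:
        data = read_a_few_megabytes_at(cursor, offset, chunk)  # hypothetical helper
        the_file.write(data)
        offset += len(data)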