How to load huge Oracle LOB using cx_Oracle on a memory-limited system?

I am developing a part of a system where processes are limited to approximately 350 MB of RAM; we use cx_Oracle to download files from an external system for processing.

The external system stores the files as a BLOB, and we can capture them by doing something like this:

# ... set up Oracle connection, then cursor.execute(u"""SELECT filename, data, filesize FROM FILEDATA WHERE ID = :id""", id=the_one_you_wanted) filename, lob, filesize = cursor.fetchone() with open(filename, "w") as the_file: the_file.write(lob.read()) 

lob.read() will obviously fail with a MemoryError when we push a file larger than 300-350 MB, so we tried something like this instead of reading all at once:

 read_size = 0 chunk_size = lob.getchunksize() * 100 while read_size < filesize: data = lob.read(chunk_size, read_size + 1) read_size += len(data) the_file.write(data) 

Unfortunately, we still get a MemoryError after several iterations. From the point in time lob.read() , and due to the lack of memory that we finally get, it looks like lob.read() is extracting (chunk_size + read_size) bytes from the database each time . That is, reading takes O (n) time and O (n) memory, although the buffer is slightly smaller.

To get around this, we tried something like:

 read_size = 0 while read_size < filesize: q = u'''SELECT dbms_lob.substr(data, 2000, %s) FROM FILEDATA WHERE ID = :id''' % (read_bytes + 1) cursor.execute(q, id=filedataid[0]) row = cursor.fetchone() read_bytes += len(row[0]) the_file.write(row[0]) 

This takes 2,000 bytes (argh) at a time and takes forever (something like two hours for a 1.5 GB file). Why 2000 bytes? According to Oracle dbms_lob.substr() , dbms_lob.substr() stores the return value in RAW, which is limited to 2000 bytes.

Is there a way to save the results of dbms_lob.substr() in a larger data object and read maybe a few megabytes at a time? How to do this with cx_Oracle?

+6
source share
1 answer

I think the order of the arguments in lob.read () is canceled in your code. The first argument should be the offset, the second argument should be the amount to read. This explains the use of O (n) time and memory.

+1
source

Source: https://habr.com/ru/post/926672/


All Articles