A NumPy memory-mapped array (memmap) may be a good choice here.
You get access to NumPy arrays from a binary file on disk, without immediately loading the entire file into memory.
(Note: I believe, though I am not certain, that NumPy's memmap is not the same as Python's mmap. In particular, NumPy's is array-like, while Python's is file-like.)
Method Signature:
A = NP.memmap(filename, dtype, mode, shape, order='C')
All of the arguments are straightforward (that is, they have the same meaning as elsewhere in NumPy), with the exception of 'order', which refers to the ndarray memory layout. I believe the default is 'C', and the (only) other option is 'F', for Fortran; as elsewhere, these two options represent row-major and column-major order, respectively.
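To make the 'order' argument concrete, here is a minimal sketch that stores the same small array twice, once with order='C' and once with order='F', and then reads the raw file contents back. The file names and the 2 x 3 test array are made up for illustration; only NP.memmap, flush and NP.fromfile are actual NumPy calls.

```python
import os
import tempfile

import numpy as NP

tmp_dir = tempfile.mkdtemp()
c_path = os.path.join(tmp_dir, "c_order.dat")  # hypothetical file names
f_path = os.path.join(tmp_dir, "f_order.dat")

values = NP.arange(6, dtype="int32").reshape(2, 3)  # [[0 1 2], [3 4 5]]

# Same logical array, stored row-major ('C') and column-major ('F').
c_map = NP.memmap(c_path, dtype="int32", mode="w+", shape=(2, 3), order="C")
f_map = NP.memmap(f_path, dtype="int32", mode="w+", shape=(2, 3), order="F")
c_map[:] = values
f_map[:] = values
c_map.flush()
f_map.flush()

# Reading the files as flat arrays exposes the element order on disk.
c_bytes = NP.fromfile(c_path, dtype="int32")
f_bytes = NP.fromfile(f_path, dtype="int32")
print(c_bytes)  # [0 1 2 3 4 5]  rows are contiguous
print(f_bytes)  # [0 3 1 4 2 5]  columns are contiguous
```

Both memmaps index identically (c_map[1, 0] and f_map[1, 0] are both 3); only the layout in the file differs, which matters if another program reads the file directly.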
Two methods:
flush (which writes to disk any changes you make to the array); and
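close (which writes any remaining data to the file and releases the memory map; note that current NumPy releases no longer provide this method on memmap, and deleting the object with del is the documented way to flush changes and drop the mapping)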
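A small sketch of how flushing and releasing a memmap might look in practice. It assumes a recent NumPy, where del on the memmap object takes the place of close; the file name and the four sample values are invented for the example.

```python
import os
import tempfile

import numpy as NP

path = os.path.join(tempfile.mkdtemp(), "flush_demo.dat")  # hypothetical file

writer = NP.memmap(path, dtype="float32", mode="w+", shape=(4,))
writer[:] = [1.0, 2.0, 3.0, 4.0]
writer.flush()  # write pending changes to the file on disk

# The file now holds the four values as raw float32 data.
on_disk = NP.fromfile(path, dtype="float32")

# Dropping the last reference flushes again and releases the mapping.
del writer

# Reopen read-only; dtype and shape are not stored in the file,
# so they have to be supplied again.
reader = NP.memmap(path, dtype="float32", mode="r", shape=(4,))
total = float(reader.sum())
print(on_disk, total)
```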
Usage example:
import numpy as NP
from tempfile import mkdtemp
import os.path as PH

# sample data: a 1000 x 10 array of floats
my_data = NP.random.randint(10, 100, 10000).reshape(1000, 10)
my_data = NP.array(my_data, dtype="float")

fname = PH.join(mkdtemp(), 'tempfile.dat')

# create the memory-mapped file; note that shape must be a tuple
mm_obj = NP.memmap(fname, dtype="float32", mode="w+", shape=(1000, 10))
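The natural next step is to copy the data into the memmap and read it back later. The following is a sketch of one way to do that, not necessarily how you would want to structure it in your own code; the names mm_read, first_rows and round_trip_ok are mine, and the data is cast to float32 up front so that it matches the dtype of the file.

```python
import os.path as PH
from tempfile import mkdtemp

import numpy as NP

# Sample data, cast to the same dtype the memmap will use.
my_data = NP.random.randint(10, 100, 10000).reshape(1000, 10).astype("float32")
fname = PH.join(mkdtemp(), "tempfile.dat")

# Create the file, copy the data in, and push it to disk.
mm_obj = NP.memmap(fname, dtype="float32", mode="w+", shape=(1000, 10))
mm_obj[:] = my_data
mm_obj.flush()
del mm_obj  # release the writable mapping

# Later (or in another process): map the file read-only.
mm_read = NP.memmap(fname, dtype="float32", mode="r", shape=(1000, 10))

# Slicing touches only the part of the file that is requested.
first_rows = NP.array(mm_read[:5])  # copy just these 5 rows into memory
round_trip_ok = NP.array_equal(mm_read, my_data)
print(first_rows.shape, round_trip_ok)
```

Because the file stores only raw values, the dtype and shape passed when reopening must match the ones used when writing; a mismatch will not raise an error by itself but will give you wrong numbers.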