Since you do not know how pickle works internally, you need to use a different storage method. The script below uses the tobytes() function to write each row of the array as raw bytes to a file.
Since the length of each line is known, its offset in the file can be calculated, and the line can be fetched with seek() and read(). It is then converted back to an array using the frombuffer() function.
One big caveat, however: the size of the array is not saved (this can be added, but it brings some extra complications), and this method may not be as portable as a pickled array.
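One way the array size could be stored alongside the data is a small header at the start of the file. The sketch below is only an illustration under assumptions: the header layout (a 4-byte length followed by JSON) and the names `dump_with_header` and `HeaderedLineAccess` are hypothetical and not part of the original script.

```python
import json
import struct

import numpy


def dump_with_header(a, path):
    """Write a JSON header (shape and dtype), then the raw rows."""
    header = json.dumps({"shape": list(a.shape), "dtype": a.dtype.str})
    header = header.encode("ascii")
    with open(path, "wb") as fd:
        # 4-byte little-endian header length, so the reader knows
        # where the row data starts.
        fd.write(struct.pack("<I", len(header)))
        fd.write(header)
        for i in range(a.shape[0]):
            fd.write(a[i, ...].tobytes())


class HeaderedLineAccess(object):
    """Random row access that recovers shape and dtype from the header."""

    def __init__(self, path):
        self.fd = open(path, "rb")
        (header_len,) = struct.unpack("<I", self.fd.read(4))
        meta = json.loads(self.fd.read(header_len).decode("ascii"))
        self.shape = tuple(meta["shape"])
        self.dtype = numpy.dtype(meta["dtype"])
        self.data_start = 4 + header_len
        self.line_length = self.shape[1] * self.dtype.itemsize

    def read_line(self, line):
        # Row offsets are shifted by the size of the header.
        self.fd.seek(self.data_start + line * self.line_length)
        data = self.fd.read(self.line_length)
        return numpy.frombuffer(data, self.dtype)

    def close(self):
        self.fd.close()
```

Storing `dtype.str` (for example `'<f8'`) records the byte order as well, which helps a little with the portability concern, although the file is still not a standard format.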
As @PadraicCunningham pointed out in a comment, memmap is likely an alternative and elegant solution.
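A minimal sketch of the memmap route, assuming the data is stored as a plain C-ordered raw file. The helper names `dump_memmap` and `open_rows` are hypothetical; `numpy.memmap` itself is the real NumPy class. Note that the shape and dtype still have to be supplied when opening, because the raw file holds no metadata.

```python
import numpy


def dump_memmap(a, path):
    """Create a raw file backed by a memmap and copy the array into it."""
    mm = numpy.memmap(path, dtype=a.dtype, mode="w+", shape=a.shape)
    mm[:] = a
    mm.flush()  # make sure the data reaches the file
    del mm


def open_rows(path, dtype, shape):
    """Open the raw file read-only; rows are loaded lazily on access."""
    return numpy.memmap(path, dtype=dtype, mode="r", shape=shape)
```

With this, `open_rows(path, dtype, (lines, cols))[i]` gives row `i` without reading the rest of the file, and `numpy.array(...)` around it makes an in-memory copy if one is needed.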
Performance note: after reading the comments, I ran a short test. On my machine (16 GB of RAM, encrypted SSD), I was able to read 40,000 random lines in 24 seconds (with a 20,000x40,000 matrix, of course, not the 10x10 one from the example).

    from __future__ import print_function
    import numpy
    import random


    def dumparray(a, path):
        # Write the array row by row as raw bytes.
        lines, _ = a.shape
        with open(path, 'wb') as fd:
            for i in range(lines):
                fd.write(a[i, ...].tobytes())


    class RandomLineAccess(object):
        def __init__(self, path, cols, dtype):
            self.dtype = dtype
            self.fd = open(path, 'rb')
            # Every line occupies the same number of bytes.
            self.line_length = cols * dtype.itemsize

        def read_line(self, line):
            # Jump straight to the requested line and read only that line.
            offset = line * self.line_length
            self.fd.seek(offset)
            data = self.fd.read(self.line_length)
            return numpy.frombuffer(data, self.dtype)

        def close(self):
            self.fd.close()


    def main():
        lines = 10
        cols = 10
        path = '/tmp/array'

        a = numpy.zeros((lines, cols))
        dtype = a.dtype

        for i in range(lines):
            ...
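The script above is cut off inside main(). As a self-contained illustration of the same dump, seek and read round trip, here is a minimal sketch. It is not the remainder of the original script: the helper names `dump_rows`, `read_row` and `round_trip_check`, the temporary file, and the way the rows are filled are all assumptions made for the example.

```python
import os
import random
import tempfile

import numpy


def dump_rows(a, path):
    """Write each row of a 2-D array as raw bytes."""
    with open(path, "wb") as fd:
        for row in a:
            fd.write(row.tobytes())


def read_row(fd, row_index, cols, dtype):
    """Seek to one row and turn its bytes back into an array."""
    row_bytes = cols * dtype.itemsize
    fd.seek(row_index * row_bytes)
    return numpy.frombuffer(fd.read(row_bytes), dtype)


def round_trip_check(lines=10, cols=10, samples=20):
    """Dump an array, read random rows back, and compare them."""
    a = numpy.zeros((lines, cols))
    for i in range(lines):
        a[i, :] = i  # assumption: give each row a distinct value
    tmp_fd, path = tempfile.mkstemp()
    os.close(tmp_fd)
    try:
        dump_rows(a, path)
        with open(path, "rb") as fd:
            for _ in range(samples):
                i = random.randrange(lines)
                row = read_row(fd, i, cols, a.dtype)
                if not numpy.array_equal(row, a[i]):
                    return False
        return True
    finally:
        os.remove(path)
```

Calling `round_trip_check()` should return `True` if every randomly chosen row read from disk matches the row in memory.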