MemoryError: numpy.genfromtxt()

I have a file of size 50,000 x 5,000 (floats). When I use x = np.genfromtxt(readFrom, dtype=float) to load the file into memory, the following error appears:

File "C: \ Python27 \ lib \ site-packages \ numpy \ lib \ npyio.py", line 1583, in genfromtxt for (i, converter) in the listing (converters)])
Memoryerror

I want to load the entire file into memory because I compute the Euclidean distance between pairs of vectors using SciPy:

    dis = scipy.spatial.distance.euclidean(x[row1], x[row2])

Is there an efficient way to load a huge matrix file into memory?

Thanks.

Update:

I managed to solve the problem. Here is my solution. I'm not sure whether it is efficient or logically correct, but it works fine for me:

    x = open(readFrom, 'r').readlines()
    y = np.asarray([np.array(s.split()).astype('float32') for s in x], dtype=np.float32)
    ....
    dis = scipy.spatial.distance.euclidean(y[row1], y[row2])
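A variant of the above (a sketch, not benchmarked; it assumes whitespace-separated values and that the 50,000 x 5,000 shape is known up front) would preallocate the array and fill it line by line, so the raw text lines never all sit in memory at once:

    import numpy as np

    # Preallocate the float32 target array (shape taken from the question).
    y = np.empty((50000, 5000), dtype=np.float32)
    with open(readFrom, 'r') as f:
        for i, line in enumerate(f):
            y[i] = np.asarray(line.split(), dtype=np.float32)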

Please help me improve my solution.

+2
2 answers

In fact, you are using 8-byte floats, since Python's float corresponds to a C double (at least on most systems):

    import numpy as np

    a = np.arange(10, dtype=float)
    print(a.dtype)   # float64

You should specify your data type as np.float32. Also, depending on your OS and on whether you are running 32-bit or 64-bit Python, the address space available to numpy may be less than 4 GB, which could also be the problem here.
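For example, the call from the question would become (a minimal sketch; readFrom is the filename variable used there):

    import numpy as np

    # float32 uses 4 bytes per value instead of 8, roughly halving the footprint.
    x = np.genfromtxt(readFrom, dtype=np.float32)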

+1

Depending on your OS and Python version, it is quite possible that you will never be able to allocate a 1 GB array (this adds to mgilson's answer). The problem is not that you are running out of memory, but that you are running out of contiguous memory. If you are on a 32-bit machine (especially running Windows), adding more memory will not help. Moving to a 64-bit architecture probably will.

Using smaller data types can certainly help; depending on the operations you perform, a 16-bit float or even an 8-bit int may suffice.
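For instance (a sketch; whether the reduced precision is acceptable depends on your data, and the int8 scaling factor is only an illustration):

    import numpy as np

    # 16-bit float: 2 bytes per value, half of float32.
    y16 = y.astype(np.float16)

    # An 8-bit int only works if the values can be scaled into its range;
    # this example factor maps the data into [-127, 127].
    scale = 127.0 / np.abs(y).max()
    y8 = (y * scale).astype(np.int8)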

If that does not work, you have to accept that the data simply does not fit in memory, and you will have to process it piecewise (in which case storing the data as an HDF5 array can be very useful).
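A sketch of the piecewise approach using h5py (my assumption; the answer only names HDF5, and the file and dataset names here are made up; row1 and row2 are as in the question). The text file is streamed into an HDF5 dataset once, and afterwards only the two rows involved in a distance are read into RAM:

    import numpy as np
    import scipy.spatial.distance
    import h5py

    # One-time conversion: stream the text file into an HDF5 dataset row by row.
    with open(readFrom, 'r') as src, h5py.File('data.h5', 'w') as f:
        dset = f.create_dataset('x', shape=(50000, 5000), dtype='float32')
        for i, line in enumerate(src):
            dset[i] = np.asarray(line.split(), dtype=np.float32)

    # Later: only the two rows needed for a distance are loaded into memory.
    with h5py.File('data.h5', 'r') as f:
        x = f['x']
        dis = scipy.spatial.distance.euclidean(x[row1], x[row2])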

+1
