Loading a sparse Matlab matrix saved with -v7.3 (HDF5) in Python and working with it

I am new to Python, coming from Matlab. I have a large sparse matrix saved in Matlab's v7.3 (HDF5) format. So far I have found two ways to load the file, using h5py and tables. However, working with the matrix afterwards seems to be very slow. For example, in Matlab:

>> whos     
  Name           Size                   Bytes  Class     Attributes

  M      11337x133338            77124408  double    sparse    

>> tic, sum(M(:)); toc
Elapsed time is 0.086233 seconds.

Using tables:

t = time.time()
sum(f.root.M.data)
elapsed = time.time() - t
print elapsed
35.929461956

Using h5py:

t = time.time()
sum(f["M"]["data"])
elapsed = time.time() - t
print elapsed

(I gave up waiting ...)

[EDIT]

Based on comments from @bpgergo, I should add that I tried to convert the result loaded with h5py (f) into a numpy array or a scipy sparse matrix in the following two ways:

from scipy import sparse
A = sparse.csc_matrix((f["M"]["data"], f["M"]["ir"], f["M"]["jc"]))

or

data = numpy.asarray(f["M"]["data"])
ir = numpy.asarray(f["M"]["ir"])
jc = numpy.asarray(f["M"]["jc"])    
A = sparse.coo_matrix(data, (ir, jc))

but both of these operations are also very slow.

Am I missing something here?


The problem is mostly Python's builtin sum, plus what you are actually timing; two separate points:

First, you are timing reading the data from disk and summing it, whereas in Matlab the matrix is already sitting in memory.

Second, Python's builtin sum is very slow on numpy arrays (generally, numpy arrays should be reduced with numpy routines, not with the Python builtin sum). Use numpy.sum(yourarray) or yourarray.sum() instead.
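As a rough illustration of the difference between the two (a sketch on a made-up dense array, purely for timing; the array and its size are not from the question):

import time
import numpy as np

arr = np.random.rand(10 ** 7)   # hypothetical array, just to compare the two sums

t = time.time()
sum(arr)              # Python's builtin sum loops over the array element by element
print 'builtin sum:', time.time() - t

t = time.time()
arr.sum()             # numpy's sum runs as a single vectorized operation
print 'numpy sum:  ', time.time() - t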

With that in mind, here is a full load-and-sum example (this uses h5py; the pytables case is similar):

import h5py
import numpy as np

f = h5py.File('yourfile.hdf', 'r')
dataset = f['/M/data']

# Load the entire array into memory, like you're doing in Matlab...
data = np.empty(dataset.shape, dataset.dtype)
dataset.read_direct(data)

print data.sum()  # Or alternatively, np.sum(data)
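The pytables case is nearly identical; a rough sketch, assuming the same filename and dataset path as above:

import tables

f = tables.openFile('yourfile.hdf', 'r')
data = f.root.M.data[...]   # [...] reads the whole dataset into a numpy array
print data.sum()
f.close()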

For completeness, here is a function that loads the matrix straight into a scipy sparse matrix:

import tables, warnings
from scipy import sparse

def load_sparse_matrix(fname):
    # Silence PyTables' UserWarning about the Matlab-written (non-native) HDF5 file
    warnings.simplefilter("ignore", UserWarning)
    f = tables.openFile(fname)
    # data/ir/jc are Matlab's CSC arrays; [...] reads each dataset fully into memory
    M = sparse.csc_matrix((f.root.M.data[...], f.root.M.ir[...], f.root.M.jc[...]))
    f.close()
    return M
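A hypothetical usage example (the filename is an assumption; the dataset name M matches the question):

M = load_sparse_matrix('yourfile.hdf')
print M.nnz      # number of stored nonzeros
print M.sum()    # fast now: the whole matrix is in memory as a scipy CSC matrix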
