Python: how do you store a sparse matrix?

I have output in the form of a sparse matrix in Python, and I need to save this sparse matrix to my hard drive. How can I do this? If I have to create a database, how do I do that? This is my code:

    import nltk
    import cPickle
    import numpy
    from scipy.sparse import lil_matrix
    from nltk.corpus import wordnet as wn
    from nltk.corpus import brown

    f = open('spmatrix.pkl','wb')

    def markov(L):
        count=0
        c=len(text1)
        for i in range(0,c-2):
            h=L.index(text1[i])
            k=L.index(text1[i+1])
            mat[h,k]=mat[h,k]+1  # matrix
            cPickle.dump(mat,f,-1)

    text = [w for g in brown.categories() for w in brown.words(categories=g)]
    text1=text[1:500]
    arr=set(text1)
    arr=list(arr)
    mat=lil_matrix((len(arr),len(arr)))
    markov(arr)
    f.close()

I need to save this “mat” in a file and access the matrix value using coordinates.

The sparse matrix output is as follows:

    (173, 168)  2.0
    (173, 169)  1.0
    (173, 172)  1.0
    (173, 237)  4.0
    (174, 231)  1.0
    (175, 141)  1.0
    (176, 195)  1.0

But when I store it in a file and read it back, I get this:

    (0, 68)   1.0
    (0, 77)   1.0
    (0, 95)   1.0
    (0, 100)  1.0
    (0, 103)  1.0
    (0, 110)  1.0
    (0, 112)  2.0
    (0, 132)  1.0
    (0, 133)  2.0
    (0, 139)  1.0
    (0, 146)  2.0
    (0, 156)  1.0
    (0, 157)  1.0
    (0, 185)  1.0
+8
python numpy sparse-matrix
7 answers

Note: this answer addresses the revised question, which now includes code.

You should not call cPickle.dump() inside your function. Build the sparse matrix first, and then dump its contents to a file once.

Try:

    import cPickle
    from scipy.sparse import lil_matrix
    from nltk.corpus import brown

    def markov(L):
        count=0
        c=len(text1)
        for i in range(0,c-2):
            h=L.index(text1[i])
            k=L.index(text1[i+1])
            mat[h,k]=mat[h,k]+1  # matrix

    text = [w for g in brown.categories() for w in brown.words(categories=g)]
    text1=text[1:500]
    arr=set(text1)
    arr=list(arr)
    mat=lil_matrix((len(arr),len(arr)))
    markov(arr)

    # dump once, after the matrix has been filled in
    f = open('spmatrix.pkl','wb')
    cPickle.dump(mat,f,-1)
    f.close()
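
For readers on Python 3, here is a minimal sketch of the same build-first, dump-once order. It rests on a few assumptions that are not in the original: `pickle` stands in for Python 2's `cPickle`, a short made-up token list replaces the Brown corpus slice, a dictionary lookup replaces `L.index()`, and the name `build_transition_matrix` is purely illustrative. It also pairs every adjacent token, whereas `range(0,c-2)` above stops one pair short of the end.

```python
import os
import pickle
import tempfile

from scipy.sparse import lil_matrix

# Hypothetical stand-in for text1 (the question uses a slice of the Brown corpus).
tokens = "the cat sat on the mat and the cat slept".split()


def build_transition_matrix(tokens):
    """Count how often each word is immediately followed by each other word."""
    vocab = sorted(set(tokens))
    index = {word: i for i, word in enumerate(vocab)}  # dict lookup instead of list.index
    mat = lil_matrix((len(vocab), len(vocab)))
    for current, following in zip(tokens, tokens[1:]):
        mat[index[current], index[following]] += 1
    return mat, index


mat, index = build_transition_matrix(tokens)

# Dump once, after the matrix is complete (temporary directory used for the sketch).
path = os.path.join(tempfile.mkdtemp(), "spmatrix.pkl")
with open(path, "wb") as f:
    pickle.dump(mat, f, pickle.HIGHEST_PROTOCOL)

# Read it back to confirm the round trip.
with open(path, "rb") as f:
    restored = pickle.load(f)
```

Because the file holds exactly one pickled object, a single `pickle.load()` returns the finished matrix.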
+4

Assuming you have a numpy matrix or ndarray, as your question and tags imply, there are dump and load methods you can use:

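    import numpy

    your_matrix.dump('output.mat')  # writes a pickle of the array
    # NumPy 1.16.3 and later need numpy.load('output.mat', allow_pickle=True)
    another_matrix = numpy.load('output.mat')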
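
The calls above apply to dense NumPy arrays. For a scipy.sparse matrix like the one in the question, a roughly equivalent pair is `scipy.sparse.save_npz` and `scipy.sparse.load_npz`. This is a different technique from the one in this answer, shown only as a sketch; it assumes SciPy 0.19 or later, and the matrix values here are made up.

```python
import os
import tempfile

from scipy.sparse import lil_matrix, load_npz, save_npz

# Small made-up matrix standing in for the asker's `mat`.
mat = lil_matrix((4, 4))
mat[0, 1] = 2.0
mat[3, 2] = 1.0

# save_npz does not accept LIL or DOK matrices, so convert to CSR first.
path = os.path.join(tempfile.mkdtemp(), "spmatrix.npz")
save_npz(path, mat.tocsr())

loaded = load_npz(path)   # comes back as a CSR matrix
value = loaded[0, 1]      # look up a single entry by (row, column)
```

If you need to keep modifying the matrix after loading, `loaded.tolil()` converts it back to the LIL format used in the question.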
+6

PyTables is a Python interface to the HDF5 data model. It is a fairly popular choice and integrates well with NumPy and SciPy. PyTables lets you access slices of on-disk arrays without loading the entire array back into memory.

I have no concrete experience with sparse matrices as such, and a quick Google search neither confirmed nor denied that sparse matrices are supported.
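
One way this could work, offered as a sketch rather than a confirmed PyTables feature: store the three component arrays of a CSR matrix as ordinary HDF5 arrays, then read only the slice that belongs to one row. The layout (node names `data`, `indices`, `indptr`, `shape`) and the helper `read_row` are assumptions made for illustration.

```python
import os
import tempfile

import numpy as np
import tables  # PyTables
from scipy.sparse import csr_matrix

# Small made-up matrix for the sketch.
mat = csr_matrix(np.array([[0.0, 2.0, 0.0],
                           [0.0, 0.0, 0.0],
                           [1.0, 0.0, 4.0]]))

path = os.path.join(tempfile.mkdtemp(), "spmatrix.h5")

# Store the CSR component arrays plus the shape as plain HDF5 arrays.
with tables.open_file(path, mode="w") as h5:
    h5.create_array(h5.root, "data", mat.data)
    h5.create_array(h5.root, "indices", mat.indices)
    h5.create_array(h5.root, "indptr", mat.indptr)
    h5.create_array(h5.root, "shape", np.array(mat.shape))


def read_row(h5_path, row):
    """Return {column: value} for one row, reading only that row's slice from disk."""
    with tables.open_file(h5_path, mode="r") as h5:
        # indptr[row] and indptr[row + 1] bound the row's entries in data/indices.
        start, stop = (int(n) for n in h5.root.indptr[row:row + 2])
        cols = h5.root.indices[start:stop]
        vals = h5.root.data[start:stop]
    return dict(zip(cols.tolist(), vals.tolist()))


row2 = read_row(path, 2)
```

The CSR layout is what makes the partial read possible: each row's entries sit in one contiguous run of `data` and `indices`.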

+2

In addition to HDF5 support, Python also has NetCDF support, which is well suited to storing matrix-form data, both sparse and dense, and to accessing it quickly. It is included in Python(x,y) for Windows, which many scientific Python users have come across.

You can find more numpy-based examples in this cookbook.

+2

For very large sparse matrices on clusters, you can use PyTrilinos. It has an HDF5 interface that can dump a sparse matrix to disk, and it also works if the matrix is distributed across different nodes.

http://trilinos.sandia.gov/packages/pytrilinos/development/EpetraExt.html#input-output-classes

+2

Depending on the size of the sparse matrix, I usually use cPickle to pickle the matrix:

    import cPickle

    f = open('spmatrix.pkl','wb')
    cPickle.dump(your_matrix,f,-1)
    f.close()

If I deal with really large datasets, I tend to use netcdf4-python.

Edit:

To read the file back, do the following:

    f = open('spmatrix.pkl','rb')  # open the file in read binary mode
    # load the data in the .pkl file into a new variable spmat
    spmat = cPickle.load(f)
    f.close()
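
Since the question also asks about reading values by coordinates, here is a small Python 3 sketch of what you can do with the matrix once it is loaded. It assumes `pickle` in place of `cPickle` and a made-up LIL matrix whose entries are borrowed from the question's output; the variable names are illustrative.

```python
import os
import pickle
import tempfile

from scipy.sparse import lil_matrix

# Made-up matrix with a few of the entries shown in the question.
spmat = lil_matrix((200, 250))
spmat[173, 168] = 2.0
spmat[173, 237] = 4.0
spmat[174, 231] = 1.0

path = os.path.join(tempfile.mkdtemp(), "spmatrix.pkl")
with open(path, "wb") as f:
    pickle.dump(spmat, f, pickle.HIGHEST_PROTOCOL)

with open(path, "rb") as f:  # the with block closes the file automatically
    loaded = pickle.load(f)

single = loaded[173, 237]   # one entry by (row, column)
missing = loaded[0, 0]      # entries that were never set read back as zero

# Walk every stored entry as (row, column, value) triples.
coo = loaded.tocoo()
triples = sorted(zip(coo.row.tolist(), coo.col.tolist(), coo.data.tolist()))
```

Indexing with `loaded[row, col]` is enough for single lookups; the COO conversion is handy when you want to iterate over all nonzero entries.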
+2

For me, passing -1 as the protocol argument to cPickle.dump produced a pickle file that would not load afterwards.

The object that I dumped through cPickle was an instance of scipy.sparse.dok_matrix.

Using only two arguments did the trick for me; the documentation for pickle.dump() indicates that the default value of the protocol parameter is 0.

Tested on Windows 7, Python 2.7.2 (64-bit) and cPickle v 1.71.

Example:

    >>> import cPickle
    >>> print cPickle.__version__
    1.71
    >>> from scipy import sparse
    >>> H = sparse.dok_matrix((135, 654), dtype='int32')
    >>> H[33, 44] = 8
    >>> H[123, 321] = -99
    >>> print str(H)
      (123, 321)    -99
      (33, 44)      8
    >>> fname = 'dok_matrix.pkl'
    >>> f = open(fname, mode="wb")
    >>> cPickle.dump(H, f)
    >>> f.close()
    >>> f = open(fname, mode="rb")
    >>> M = cPickle.load(f)
    >>> f.close()
    >>> print str(M)
      (123, 321)    -99
      (33, 44)      8
    >>> M == H
    True
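
Whether a given protocol round-trips may depend on your Python and SciPy versions, so it can be worth checking in your own environment. The sketch below is a Python 3 adaptation of the session above (`pickle` instead of `cPickle`); it pickles in memory with each protocol the interpreter supports and records whether the matrix comes back unchanged. The `results` dictionary is just an illustrative way to collect the outcome.

```python
import pickle

import numpy as np
from scipy.sparse import dok_matrix

H = dok_matrix((135, 654), dtype="int32")
H[33, 44] = 8
H[123, 321] = -99

# Round-trip through every protocol this interpreter supports, in memory.
results = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    M = pickle.loads(pickle.dumps(H, protocol))
    # Compare dense copies; fine here because the matrix is small.
    results[protocol] = np.array_equal(M.toarray(), H.toarray())
```

Any protocol that maps to `False` in `results`, or raises during the loop, is one to avoid on that setup.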
+2