Convert scipy sparse csr to pandas?

I used

sklearn.preprocessing.OneHotEncoder 

to transform some data. The output is a scipy.sparse.csr.csr_matrix. How can I merge it back into my original DataFrame along with the other columns?

I tried to use pd.concat, but I get:

 TypeError: cannot concatenate a non-NDFrame object 

thanks

+6
2 answers

If A is a csr_matrix, you can use .toarray(). There is also .todense(), which produces a numpy matrix and works with the DataFrame constructor as well:

 df = pd.DataFrame(A.toarray()) 

You can then pass this DataFrame to pd.concat().

 A = csr_matrix([[1, 0, 2], [0, 3, 0]])

   (0, 0)    1
   (0, 2)    2
   (1, 1)    3

 <class 'scipy.sparse.csr.csr_matrix'>

 pd.DataFrame(A.todense())

    0  1  2
 0  1  0  2
 1  0  3  0

 <class 'pandas.core.frame.DataFrame'>
 RangeIndex: 2 entries, 0 to 1
 Data columns (total 3 columns):
 0    2 non-null int64
 1    2 non-null int64
 2    2 non-null int64
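
To tie this back to the question, here is a minimal sketch of the concat step. The frame df, its column names, and the enc_ prefix are made up for illustration, and A stands in for the encoder output. Reusing df.index keeps the rows aligned when the original index is not 0..n-1.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical original frame with two columns the encoder did not touch.
df = pd.DataFrame({"price": [9.5, 3.0], "qty": [4, 7]})

# Stand-in for the encoder output: one row per row of df.
A = csr_matrix([[1, 0, 2], [0, 3, 0]])

# Densify, reuse df's index so rows align, and give the new columns names.
encoded = pd.DataFrame(
    A.toarray(),
    index=df.index,
    columns=[f"enc_{i}" for i in range(A.shape[1])],
)

# axis=1 places the encoded columns next to the original ones.
combined = pd.concat([df, encoded], axis=1)
print(combined)
```

Note that .toarray() allocates the full dense array, so this is only practical when the encoded matrix fits in memory.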

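pandas 0.20 introduced sparse data structures, including SparseDataFrame. Note that SparseDataFrame was removed in pandas 1.0; newer versions keep sparse columns inside a regular DataFrame instead.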
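
A minimal sketch for newer pandas, assuming version 0.25 or later, where pd.DataFrame.sparse.from_spmatrix is available. It builds a DataFrame with sparse columns directly from the scipy matrix, without densifying first. The matrix A is the same toy example as above.

```python
import pandas as pd
from scipy.sparse import csr_matrix

A = csr_matrix([[1, 0, 2], [0, 3, 0]])

# Assumes pandas >= 0.25. Each column is stored as a sparse array,
# so the zeros are not materialized.
sparse_df = pd.DataFrame.sparse.from_spmatrix(A)

print(sparse_df.dtypes)

# Convert back to an ordinary dense DataFrame only if you need one.
dense_df = sparse_df.sparse.to_dense()
print(dense_df)
```

The resulting frame can be passed to pd.concat like any other DataFrame.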

Alternatively, you can pass sparse matrices straight to sklearn and avoid running out of memory when converting back to pandas. Just convert your other data to a sparse format by passing the numpy array to the scipy.sparse.csr_matrix constructor, then use scipy.sparse.hstack to combine the two (see docs).
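
As a rough sketch of that approach: the array other below is a hypothetical stand-in for the remaining numeric columns, and A for the encoder output. Both must have the same number of rows.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack

# Hypothetical dense numeric columns from the original data (2 rows, 2 columns).
other = np.array([[9.5, 4.0], [3.0, 7.0]])

# Stand-in for the encoder output with the same number of rows.
A = csr_matrix([[1, 0, 2], [0, 3, 0]])

# Stack column-wise; format="csr" makes the result format explicit.
X = hstack([csr_matrix(other), A], format="csr")

print(X.shape)
print(X.toarray())
```

X stays sparse throughout, so it can be handed to an estimator that accepts sparse input without ever building a dense DataFrame.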

+8

You can also avoid getting a sparse matrix in the first place by setting the sparse parameter to False when creating the encoder (in scikit-learn 1.2 and later this parameter is named sparse_output).

The OneHotEncoder documentation says:

sparse: boolean, default = True

Will return a sparse matrix, if set to True else will return an array.

Then you can call the DataFrame constructor again to convert the numpy array to a DataFrame.
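
A minimal sketch of this route, with a made-up frame and column names. Because the keyword differs between scikit-learn releases, the example tries sparse_output first and falls back to sparse; which branch runs depends on the installed version.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical frame: one categorical column plus one numeric column.
df = pd.DataFrame({"color": ["red", "blue", "red"], "qty": [4, 7, 1]})

# Newer scikit-learn uses `sparse_output`; older releases use `sparse`.
try:
    encoder = OneHotEncoder(sparse_output=False)
except TypeError:
    encoder = OneHotEncoder(sparse=False)

# With dense output, fit_transform returns a plain numpy array.
dense = encoder.fit_transform(df[["color"]])

# Name the columns after the categories the encoder learned.
encoded = pd.DataFrame(
    dense,
    index=df.index,
    columns=[f"color_{c}" for c in encoder.categories_[0]],
)

combined = pd.concat([df.drop(columns="color"), encoded], axis=1)
print(combined)
```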

+1
