How to calculate the variance of a sparse matrix column in Scipy?

Question

I have a big scipy.sparse.csc_matrix and I want to normalize it. That is, I want to subtract the column mean from each element and divide by the column's standard deviation (std).

scipy.sparse.csc_matrix has .mean(), but is there an efficient way to calculate the variance or std?

+8

python numpy scipy

nickponline Aug 29 '12 at 1:08

2 answers

An efficient way is actually to densify the entire matrix and then standardize it in the usual way:

 X = X.toarray()
 X -= X.mean(axis=0)  # subtract each column's mean
 X /= X.std(axis=0)   # divide by each column's standard deviation

As @Sebastian noted in the comments, standardization destroys the sparsity structure at the subtraction stage (it introduces many nonzero elements), so there is no point in keeping the matrix in a sparse format.
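A small sketch of why the subtraction step removes sparsity. The matrix shape, density and variable names here are made up for illustration; the point is only that every zero entry becomes minus the column mean once the column is centered.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical test matrix: 200 x 4, about 5% of the entries are nonzero.
X = sp.random(200, 4, density=0.05, format="csc", random_state=0)

dense = X.toarray()
centered = dense - dense.mean(axis=0)  # subtract each column's mean

nnz_before = X.nnz                     # stored nonzeros in the sparse matrix
nnz_after = np.count_nonzero(centered) # nonzeros after centering

print(nnz_before, nnz_after)
```

After centering, almost every entry is nonzero, so a sparse format no longer saves memory.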

+3

Fred foo Aug 29 '12 at 12:16

Sicco · Accepted Answer · 2012-08-29T09:31:42+0000

You can calculate the variance yourself from the mean, using the following formula:

 Var[X] = E[X^2] - (E[X])^2

E[X] denotes the mean. To compute E[X^2], square the csc_matrix element-wise and then apply the mean function to the result. To get (E[X])^2, simply square the result of the mean function applied to the original matrix.
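A minimal sketch of this approach for per-column variance, assuming a reasonably recent SciPy. The helper name sparse_column_variance and the test matrix are illustrative and not part of the original answer. It computes the population variance (the same as NumPy's default ddof=0) without ever densifying the matrix.

```python
import numpy as np
import scipy.sparse as sp

def sparse_column_variance(X):
    """Population variance of each column, computed as E[X^2] - (E[X])^2."""
    X = sp.csc_matrix(X, dtype=np.float64)
    # Column means; zeros are counted because mean() divides by the row count.
    mean = np.asarray(X.mean(axis=0)).ravel()
    # multiply() is element-wise, so this is the column mean of the squares.
    mean_sq = np.asarray(X.multiply(X).mean(axis=0)).ravel()
    var = mean_sq - mean ** 2
    # Rounding can leave tiny negative values; clip them to zero.
    return np.maximum(var, 0.0)

# Hypothetical test matrix: 1000 x 5, about 1% of the entries are nonzero.
X = sp.random(1000, 5, density=0.01, format="csc", random_state=0)
col_var = sparse_column_variance(X)
col_std = np.sqrt(col_var)
```

Note that this formula can lose precision when the mean is large relative to the spread of the values, since it subtracts two nearly equal numbers.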
