How to ignore a NaN data point in a numpy array and generate normalized data in Python?

Question


Say I have a numpy array that contains some float('nan') values. I don't want to impute that data yet; I want to normalize the array first and keep the NaN values in their original positions. Is there any way to do this?

Previously I used the normalize function in sklearn.preprocessing, but that function does not seem to accept any array containing NaN as input.

+5

python numpy scipy scikit-learn

xxx222 Jun 10 '16 at 13:46


2 answers

You can use numpy.nansum to calculate the norm while ignoring the NaN values:

    In [54]: x
    Out[54]: array([ 1., 2., nan, 3.])

Here is the norm with the NaN ignored:

    In [55]: np.sqrt(np.nansum(np.square(x)))
    Out[55]: 3.7416573867739413

y is the normalized array; the NaN stays in its original position:

    In [56]: y = x / np.sqrt(np.nansum(np.square(x)))

    In [57]: y
    Out[57]: array([ 0.26726124, 0.53452248, nan, 0.80178373])

    In [58]: np.linalg.norm(y[~np.isnan(y)])
    Out[58]: 1.0
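The session above can be wrapped into a small reusable function. This is only a sketch: the name `nan_l2_normalize` and its `axis` parameter are made up for illustration and are not part of numpy or scikit-learn. It assumes an L2 norm, as in the session, and that an all-NaN or all-zero slice is acceptable to leave as NaN after division by zero.

```python
import numpy as np

def nan_l2_normalize(x, axis=None):
    """Hypothetical helper: L2-normalize x while ignoring NaN entries.

    NaN entries are skipped when computing the norm and remain NaN
    in the result, at their original positions.
    """
    x = np.asarray(x, dtype=float)
    # keepdims=True keeps the norm broadcastable against x for any axis.
    norm = np.sqrt(np.nansum(np.square(x), axis=axis, keepdims=True))
    return x / norm

# 1-D case, same data as in the session above.
x = np.array([1.0, 2.0, np.nan, 3.0])
y = nan_l2_normalize(x)

# 2-D case: normalize each row independently.
X = np.array([[3.0, np.nan, 4.0],
              [np.nan, 5.0, 12.0]])
rows = nan_l2_normalize(X, axis=1)
```

With `axis=1` each row is scaled by its own norm, which roughly mirrors the per-sample behaviour of sklearn's normalize, except that NaN entries are tolerated.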

+2

Warren Weckesser Jun 10 '16 at 14:14


Chiel · Accepted Answer · 2016-06-10T13:52:03+0000

You can mask your array using the numpy.ma.array function and then apply any numpy operation:

    import numpy as np

    a = np.random.rand(10)                # Generate random data.
    a = np.where(a > 0.8, np.nan, a)      # Set all data larger than 0.8 to NaN.
    a = np.ma.array(a, mask=np.isnan(a))  # Use a mask to mark the NaNs.
    a_norm = a / np.sum(a)                # The sum function ignores the masked values.
    a_norm2 = a / np.std(a)               # The std function ignores the masked values.

You can still access the underlying source data, including the NaN values:

    print(a.data)
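To make the masked-array approach reproducible, here is a minimal sketch with fixed data instead of random numbers. It uses `np.ma.masked_invalid` as a shortcut for building the mask and `filled(np.nan)` to convert the result back to a plain array with NaN in the original positions. The variable names are illustrative only, and the z-score step is an assumed example of "any numpy operation", not something the answer itself shows.

```python
import numpy as np

raw = np.array([1.0, 2.0, np.nan, 3.0])

# masked_invalid masks NaN (and inf) entries in one step.
a = np.ma.masked_invalid(raw)

# Sum-normalize: the masked entry is excluded from the sum.
a_norm = a / a.sum()

# Z-score: mean and std are also computed over unmasked entries only.
a_z = (a - a.mean()) / a.std()

# Convert back to ordinary ndarrays, restoring NaN where data was masked.
norm_with_nan = a_norm.filled(np.nan)
z_with_nan = a_z.filled(np.nan)

print(norm_with_nan)
print(z_with_nan)
```

Filling with NaN at the end keeps the missing values in place for a later imputation step, which is what the question asks for.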
