NumPy: how to filter matrix rows

I am new to NumPy and cannot figure out how to filter out a subset of my samples.

I have a matrix of shape (1000, 12), that is, a thousand samples with 12 columns of data each. I want to create two matrices: one with all the outliers in the sample, and one with all the elements that are not outliers. The resulting matrices should have the following shapes:

    norm.shape = (883, 12)
    outliers.shape = (117, 12)

To define an outlier, I use this condition:

    cond_out = (dados[:, RD_EVAL] > _max_rd) | (dados[:, DUT_EVAL] > _max_dut)

That is, for each row of the matrix I look at the values of two columns. If either of them is above its threshold, the row is considered an outlier. The problem is that this condition has shape (1000,), so when I compress the original matrix I get a result of shape (117,). How can I filter the matrix so that the result has shape (117, 12), i.e. a matrix containing all the outlier rows, with all the data columns in each of them?

2 answers
    >>> import numpy as np
    >>> d = np.random.randn(4, 4)
    >>> d
    array([[ 1.16968447, -0.07650322, -0.30519481, -2.09278839],
           [ 0.53350868, -0.8004209 ,  0.38477468,  1.31876924],
           [ 0.06461366,  0.82204993,  0.42034665,  0.30473843],
           [ 1.13469745, -1.47969242,  2.36338208, -0.33700972]])

Let's select all rows whose value in the second column is less than zero:

    >>> d[:, 1] < 0
    array([ True,  True, False,  True], dtype=bool)

As you can see, you get a boolean array that you can use to select the rows you need:

    >>> d[d[:, 1] < 0, :]
    array([[ 1.16968447, -0.07650322, -0.30519481, -2.09278839],
           [ 0.53350868, -0.8004209 ,  0.38477468,  1.31876924],
           [ 1.13469745, -1.47969242,  2.36338208, -0.33700972]])
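
The same selection can be reproduced on a small fixed array, which is easier to check by hand than random data. This is only a minimal sketch: the array values and the names `mask`, `selected`, and `rest` are made up for illustration and do not come from the original post.

```python
import numpy as np

# Fixed illustrative data (hypothetical values): 4 rows, 3 columns.
d = np.array([
    [1.0, -0.5, 2.0],
    [0.3, -0.8, 0.4],
    [0.1,  0.9, 0.5],
    [1.1, -1.4, 2.3],
])

# Boolean mask with one entry per row: True where column 1 is negative.
mask = d[:, 1] < 0

# Indexing the first axis with the mask keeps whole rows.
selected = d[mask, :]   # rows 0, 1 and 3
rest = d[~mask, :]      # the complement: row 2 only

print(mask.tolist())    # [True, True, False, True]
print(selected.shape)   # (3, 3)
print(rest.shape)       # (1, 3)
```

Negating the mask with `~` gives the complementary set of rows, so a single condition is enough to split the matrix into two parts.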

Maybe something like this will work?

    >>> import numpy
    >>> m = numpy.random.random(size=(1000, 12))
    >>> RD_EVAL = 7
    >>> _max_rd = 0.9
    >>> DUT_EVAL = 11
    >>> _max_dut = 0.95
    >>> cond_out = (m[:, RD_EVAL] > _max_rd) | (m[:, DUT_EVAL] > _max_dut)
    >>> cond_out.shape
    (1000,)
    >>>
    >>> norm = m[~cond_out, :]
    >>> outliers = m[cond_out, :]
    >>>
    >>> norm.shape
    (846, 12)
    >>> outliers.shape
    (154, 12)

See the NumPy documentation on advanced indexing.
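
One possible explanation for the (117,) result in the question, assuming the matrix was filtered with `numpy.compress` without an `axis` argument (the question does not show that call): in that case NumPy works on the flattened array and returns a 1-D result. Passing `axis=0` should give the same rows as boolean indexing. The random data, the seed, and the names `flat`, `rows`, and `n_out` below are illustrative only.

```python
import numpy as np

# Illustrative data; the seed is arbitrary and only makes the run repeatable.
rng = np.random.default_rng(0)
m = rng.random((1000, 12))

RD_EVAL, DUT_EVAL = 7, 11
_max_rd, _max_dut = 0.9, 0.95

# One boolean per row: True if either column exceeds its threshold.
cond_out = (m[:, RD_EVAL] > _max_rd) | (m[:, DUT_EVAL] > _max_dut)
n_out = int(cond_out.sum())

# Without axis, compress operates on the flattened array: 1-D result.
flat = np.compress(cond_out, m)

# With axis=0, compress selects whole rows.
rows = np.compress(cond_out, m, axis=0)

# Equivalent boolean indexing, plus the complement for the normal rows.
outliers = m[cond_out]
norm = m[~cond_out]

print(flat.shape == (n_out,))            # 1-D, one value per True entry
print(rows.shape == (n_out, 12))         # full rows
print(np.array_equal(rows, outliers))    # same rows either way
print(norm.shape[0] + outliers.shape[0]) # accounts for every row
```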
