Pandas binning a list based on qcut of another list

Say I have a list:

a = [3, 5, 1, 1, 3, 2, 4, 1, 6, 4, 8] 

and another list b:

 b = [5, 2, 6, 8] 

I would like to get the bins using pd.qcut(a, 2) and count how many values from list b fall into each bin. That is:

 In [84]: pd.qcut(a, 2)
 Out[84]:
 Categorical:
 [[1, 3], (3, 8], [1, 3], [1, 3], [1, 3], [1, 3], (3, 8], [1, 3], (3, 8], (3, 8], (3, 8]]
 Levels (2): Index(['[1, 3]', '(3, 8]'], dtype=object)

Now I know that the bins are [1, 3] and (3, 8], and I would like to know how many values from list "b" fall into each bin. I can do this by hand when the number of bins is small, but what is the best approach when the number of bins is large?

2 answers

You can use the retbins parameter to get the bin edges back from qcut:

 >>> q, bins = pd.qcut(a, 2, retbins=True) 

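Then use pd.cut to get the bin index of each value in b relative to those bins:

 >>> b = np.array(b)
 >>> # .codes replaces the old .labels attribute; copy it because codes are read-only
 >>> hist = pd.cut(b, bins, right=True).codes.copy()
 >>> hist[b == bins[0]] = 0  # a value equal to the lowest edge gets code -1 by default
 >>> hist
 array([1, 0, 1, 1], dtype=int8)

Note that you need to handle the corner case bins[0] separately, because by default cut does not include it in the leftmost bin.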
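To go from per-value bin indices to the per-bin counts the question asks for, something like the following should work. This is only a sketch: it swaps the manual bins[0] fix for include_lowest=True and uses np.bincount for the counting, neither of which is part of the answer above, and the names codes and counts are just illustrative.

```python
import numpy as np
import pandas as pd

a = [3, 5, 1, 1, 3, 2, 4, 1, 6, 4, 8]
b = [5, 2, 6, 8]

# Bin edges taken from the quantiles of a.
_, bins = pd.qcut(a, 2, retbins=True)

# include_lowest=True closes the first interval on the left, so a value
# equal to bins[0] lands in bin 0 instead of being left unbinned.
codes = pd.cut(np.asarray(b), bins, labels=False, include_lowest=True)

# Count the values of b per bin; minlength keeps empty bins in the result.
counts = np.bincount(codes, minlength=len(bins) - 1)
print(counts)  # [1 3]
```

This assumes every value in b lies inside the range of a. A value outside that range comes back as NaN from cut, and np.bincount will then refuse the array.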


As shown in the earlier answer, you can get the bin edges from qcut using the retbins parameter:

 q, bins = pd.qcut(a, 2, retbins=True) 

You can then use cut to put values from another list into these "bins". For instance:

 myList = np.random.random(100)
 # Define bin bounds that cover the range returned by random()
 bins = [0, .1, .9, 1]
 # Now we can get the "bin number" of each value in myList:
 binNum = pd.cut(myList, bins, labels=False, include_lowest=True)
 # And then we can count the number of values in each bin number:
 np.bincount(binNum)

Make sure that your bin edges cover the entire range of values that appear in the second list. To ensure this, you can pad the bin edges with the smallest and largest possible values. For instance:

 cutBins = [float('-inf')] + bins.tolist() + [float('inf')] 
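Putting the padding together with the question's data, a rough end-to-end sketch might look like this. The extra values 0 and 10 in b are made up here to show what happens outside the range of a, and the snake_case names are just illustrative.

```python
import numpy as np
import pandas as pd

a = [3, 5, 1, 1, 3, 2, 4, 1, 6, 4, 8]
# Hypothetical second list: 0 and 10 fall outside the range of a.
b = np.array([5, 2, 6, 8, 0, 10])

# Quantile-based edges from a, padded with -inf and +inf on either side.
_, bins = pd.qcut(a, 2, retbins=True)
cut_bins = [float('-inf')] + bins.tolist() + [float('inf')]

# Intervals are now (-inf, 1], (1, 3], (3, 8], (8, inf].
bin_num = pd.cut(b, cut_bins, labels=False)

# One count per padded bin, including the two overflow bins.
counts = np.bincount(bin_num, minlength=len(cut_bins) - 1)
print(counts)  # [1 1 3 1]
```

One thing to watch: with padding, a value exactly equal to the original lowest edge (1 here) ends up in the first overflow bin (-inf, 1] rather than in the first quantile bin, so you may prefer to replace the outer edges instead of adding new ones.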