Weighted boxplot in Pandas

For the following data frame (df),

       ColA      ColA_weights  ColB      ColB_weights
    0  0.038671  1073          1.859599  1
    1  20.39974  57362         10.59599  1
    2  10.29974  5857          2.859599  1
    3  5.040000  1288          33.39599  1
    4  1.040000  1064          7.859599  1

I want to draw a weighted boxplot in which the weights for each column are given by ColA_weights and ColB_weights respectively. At the moment I just do

 df.boxplot(fontsize=12,notch=0,whis=1.5,vert=1,widths=0.2) 

However, there seems to be no option for including the weights. Any solutions?

thanks!

python pandas boxplot
1 answer

As mentioned in the comments, here is a way to build a list in which each entry appears as many times as its weight indicates. I don't think this is the smartest solution, and someone may well come up with a better one.

My example only handles ColA, but you can apply it to ColB in the same way:

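    import matplotlib.pyplot as plt

    weighted_appearances = []
    for index, row in df.iterrows():
        # iterrows() upcasts the integer weights to float, so cast back to int
        weighted_row = [row.ColA] * int(row.ColA_weights)
        weighted_appearances += weighted_row

    # Draw the boxplot from the expanded list
    plt.boxplot(weighted_appearances)
    plt.show()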
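The same repetition can be written without an explicit Python loop, and for both columns at once. This is only a sketch: it assumes NumPy is installed and that the weights are non-negative integers. The names `expand_column` and `plot_weighted_boxplots` are made up for illustration and are not part of pandas or matplotlib.

```python
import numpy as np
import pandas as pd

# The data frame from the question, rebuilt so the example is self-contained
df = pd.DataFrame({
    "ColA": [0.038671, 20.39974, 10.29974, 5.04, 1.04],
    "ColA_weights": [1073, 57362, 5857, 1288, 1064],
    "ColB": [1.859599, 10.59599, 2.859599, 33.39599, 7.859599],
    "ColB_weights": [1, 1, 1, 1, 1],
})


def expand_column(frame, col):
    """Repeat each value of `col` as many times as `<col>_weights` says.

    Hypothetical helper; assumes the weights are non-negative integers.
    """
    weights = frame[col + "_weights"].to_numpy().astype(int)
    return np.repeat(frame[col].to_numpy(), weights)


def plot_weighted_boxplots(frame, cols=("ColA", "ColB")):
    """Draw one box per column from the expanded values."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    fig, ax = plt.subplots()
    ax.boxplot([expand_column(frame, c) for c in cols])
    ax.set_xticklabels(cols)
    plt.show()
```

Calling `plot_weighted_boxplots(df)` should draw both boxes side by side. The memory cost is the same as with the loop, since every value is still materialised once per unit of weight.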

Pros: a very simple solution to write, and in theory it works in all cases (if your weights are not integers, you will have to convert or round them in whatever way you find acceptable).

Cons: not very efficient. If you work with really large weights, you will need to find a way to "collapse" them so that memory use stays reasonable.
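One possible way around the memory problem, which is a different technique from the repetition described in this answer, is to compute the box statistics directly from the weights and hand them to matplotlib's `Axes.bxp`, which draws boxes from precomputed statistics. The sketch below is an assumption-laden illustration: `weighted_quantile`, `weighted_box_stats` and `draw_boxes` are hypothetical helpers, the weights are assumed to be positive, and the quantile is defined as the smallest value whose cumulative weight reaches the requested fraction, which can differ slightly from the interpolated quartiles that `plt.boxplot` would compute on the expanded list.

```python
import numpy as np


def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight reaches fraction q of the total."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    cum = np.cumsum(weights)
    # First index at which the cumulative weight is >= q * total weight
    idx = np.searchsorted(cum, q * cum[-1], side="left")
    return float(values[min(idx, len(values) - 1)])


def weighted_box_stats(values, weights, whis=1.5, label=None):
    """Build one statistics dict in the format expected by Axes.bxp."""
    values = np.asarray(values, dtype=float)
    q1, med, q3 = (weighted_quantile(values, weights, q) for q in (0.25, 0.5, 0.75))
    iqr = q3 - q1
    lo, hi = q1 - whis * iqr, q3 + whis * iqr
    inside = values[(values >= lo) & (values <= hi)]
    stats = {
        "q1": q1, "med": med, "q3": q3,
        # Whiskers end at the most extreme data points inside the fences
        "whislo": float(inside.min()), "whishi": float(inside.max()),
        # Points outside the fences are drawn individually
        "fliers": values[(values < lo) | (values > hi)],
    }
    if label is not None:
        stats["label"] = label
    return stats


def draw_boxes(stats_list):
    """Draw the boxes from precomputed statistics."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    fig, ax = plt.subplots()
    ax.bxp(stats_list)
    plt.show()


# Values and weights taken from the question's data frame
col_a = weighted_box_stats(
    [0.038671, 20.39974, 10.29974, 5.04, 1.04],
    [1073, 57362, 5857, 1288, 1064],
    label="ColA",
)
col_b = weighted_box_stats(
    [1.859599, 10.59599, 2.859599, 33.39599, 7.859599],
    [1, 1, 1, 1, 1],
    label="ColB",
)
```

`draw_boxes([col_a, col_b])` would then draw both boxes. Memory use depends only on the number of rows, not on the size of the weights, and non-integer weights need no rounding.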
