With `pandas.cut()`, how do I get integer bin edges and avoid a negative lower bound?

The lowest value in my data frame is zero. I am trying to use the precision and include_lowest parameters of pandas.cut(), but I cannot get intervals whose bounds are integers rather than floats with one decimal place. I also cannot get the leftmost interval to start at zero.

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    sns.set(style='white', font_scale=1.3)

    df = pd.DataFrame(range(0,389,8)[:-1], columns=['value'])
    df['binned_df_pd'] = pd.cut(df.value, bins=7, precision=0, include_lowest=True)
    sns.pointplot(x='binned_df_pd', y='value', data=df)
    plt.xticks(rotation=30, ha='right')


I tried setting precision to -1, 0, and 1, but each of them produces bounds with one decimal place. The pandas.cut() documentation mentions that the range of x is extended by 0.1% to include the minimum and maximum values of x, but I thought include_lowest might suppress this behavior. My current workaround involves importing numpy:

    import numpy as np

    bin_counts, edges = np.histogram(df.value, bins=7)
    edges = [int(x) for x in edges]
    df['binned_df_np'] = pd.cut(df.value, bins=edges, include_lowest=True)
    sns.pointplot(x='binned_df_np', y='value', data=df)
    plt.xticks(rotation=30, ha='right')


Is there a way to get non-negative integers as interval bounds directly with pandas.cut() without using numpy?

Edit: I just noticed that specifying right=False makes the lowest interval start at 0 instead of -0.4. It seems to take precedence over include_lowest, since changing the latter has no visible effect in combination with right=False. The remaining intervals are still shown with one decimal place.
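
A minimal sketch of what seems to be happening here, assuming a recent pandas version: with retbins=True, pandas.cut() also returns the computed edges, so the 0.1% padding can be inspected directly. The names `values`, `pad`, `right_edges`, and `left_edges` are illustrative and not part of the original post.

```python
import pandas as pd

# Same data as in the question: 0, 8, ..., 376
values = pd.Series(range(0, 377, 8))

# retbins=True returns the computed (unrounded) bin edges as a second value
_, right_edges = pd.cut(values, bins=7, retbins=True)               # right-closed bins (default)
_, left_edges = pd.cut(values, bins=7, right=False, retbins=True)   # left-closed bins

# 0.1% of the data range, the padding described in the pandas.cut() documentation
pad = (values.max() - values.min()) * 0.001

print("padding:", pad)
print("right=True  edges:", right_edges)   # lowest edge is pushed below the minimum
print("right=False edges:", left_edges)    # highest edge is pushed above the maximum
```

With right-closed bins the lowest edge has to move below the minimum so that the minimum falls inside the first interval; with left-closed bins the same applies to the highest edge instead, which would explain why the lowest interval then starts at exactly 0.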


1 answer

You should set the labels argument explicitly.

Preparation:

    lower, higher = df['value'].min(), df['value'].max()
    n_bins = 7

Create the labels:

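    # range() does not accept a float step in Python 3, so compute the edges explicitly
    step = (higher - lower) / n_bins
    # n_bins + 1 = 8 edges, rounded to integers for display
    edges = [int(round(lower + i * step)) for i in range(n_bins + 1)]
    lbs = ['(%d, %d]' % (edges[i], edges[i + 1]) for i in range(len(edges) - 1)]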

Apply the labels:

    df['binned_df_pd'] = pd.cut(df.value, bins=n_bins, labels=lbs, include_lowest=True)
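
An alternative sketch, not part of the answer above: passing the rounded integer edges themselves as bins, together with matching labels, keeps each label consistent with the edges actually used for binning. It assumes Python 3 and rebuilds the question's data; `edges` and `labels` are illustrative names, and the rounding makes the bins slightly unequal in width.

```python
import pandas as pd

# Rebuild the question's data: 0, 8, ..., 376
df = pd.DataFrame({"value": range(0, 377, 8)})

# Convert to plain Python ints so round() below returns ints
lower, higher = int(df["value"].min()), int(df["value"].max())
n_bins = 7
step = (higher - lower) / n_bins

# n_bins + 1 integer edges spanning the full data range (no numpy needed)
edges = [round(lower + i * step) for i in range(n_bins + 1)]

# One label per bin; the first bin is closed on the left because of include_lowest=True
labels = [f"({edges[i]}, {edges[i + 1]}]" for i in range(n_bins)]
labels[0] = f"[{edges[0]}, {edges[1]}]"

# Explicit edges are used as given, so the labels describe the real bin boundaries
df["binned"] = pd.cut(df["value"], bins=edges, labels=labels, include_lowest=True)
print(df["binned"].value_counts(sort=False))
```

Because the edges are supplied explicitly, pandas does not pad the range, and the labels are plain strings that show integers only.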