Use dplyr to truncate the upper and lower percentiles of a numeric variable

I ran a survey. Since extreme survey weights can lead to very large deviations, I want to follow advice found in many statistics textbooks: truncate the top 5% and bottom 5% of the survey weights. I would like to use dplyr for this.

# load dplyr for %>% and mutate()
library(dplyr)

# generate data
data <- data.frame(id = seq_len(2000),
                   weight = rnorm(2000, mean = 3.16, sd = 1.355686))

# This is how far I got
data2 <- data %>%
  mutate(perc.weight = percent_rank(weight),
         out.of.range = perc.weight > 0.95 | perc.weight < 0.05)

After that, I have two new variables. The first gives the percentile rank of each weight. The second indicates whether the value falls outside the target range.

Now I want to replace the weights above the 95th percentile and the weights below the 5th percentile with the weight values that form the boundaries of these percentiles.

I would be grateful for any help!

1 answer

Use quantile together with pmin and pmax:

data %>% mutate(weight_trunc = pmin(pmax(weight, quantile(weight, .05)), 
                                          quantile(weight, .95)))
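
To see what this does, here is a minimal sketch on toy data where the cut-offs are easy to inspect. The helper name truncate_at_quantiles and the toy data frame are my own illustration, not part of the original answer. pmax() raises every value below the lower quantile up to that quantile, and pmin() caps every value above the upper quantile.

```r
library(dplyr)

# Hypothetical helper (name and arguments are assumptions for illustration):
# clamp a numeric vector to its lower and upper quantiles.
truncate_at_quantiles <- function(x, lower = 0.05, upper = 0.95) {
  bounds <- quantile(x, probs = c(lower, upper), na.rm = TRUE, names = FALSE)
  # pmax lifts values below the lower bound, pmin caps values above the upper bound
  pmin(pmax(x, bounds[1]), bounds[2])
}

# Toy data: weights 1..100, so the affected rows are easy to identify
toy <- data.frame(id = 1:100, weight = as.numeric(1:100))

toy_trunc <- toy %>%
  mutate(weight_trunc = truncate_at_quantiles(weight))

# Compare the range before and after truncation
range(toy$weight)
range(toy_trunc$weight_trunc)

# Count how many weights were replaced by a boundary value
sum(toy_trunc$weight != toy_trunc$weight_trunc)
```

Note that quantile() interpolates by default, so the boundary values need not be values that actually occur in the data. If the survey has strata, the same helper could presumably be used after group_by() to truncate within each group.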