Changing variable values by group with dplyr

I want to replace the missing values in several columns with the mean of each group. I would like to use dplyr, but my attempt does not work.

For example:

iris2 <- iris
set.seed(1)
iris2[-5] <- lapply(iris2[-5], function(x) {
  x[sample(length(x), sample(10, 1))] <- NA
  x
})

impute_missing=function(x){
    x[is.na(x)]=mean(x,na.rm=TRUE)
    return(x)
}

iris2 %>% group_by(Species) %>% sapply(impute_missing)

However, this did not fill the missing values with the mean of each species; it used the mean of all non-missing values in each column. Oddly, the function was also applied to the Species grouping variable. Is there a way to impute with the per-species mean and keep the complete data frame?

1 answer

Try:

 library(dplyr)
 iris2New <- iris2 %>% 
                   group_by(Species) %>%
                   mutate_each(funs(mean=mean(., na.rm=TRUE)), contains("."))

 iris2[,-5][is.na(iris2)[,-5]] <- iris2New[,-5][is.na(iris2)[,-5]]

 iris2

Or you can use ifelse on the original dataset iris2:

  fun1 <- function(x) ifelse(is.na(x), mean(x, na.rm=TRUE), x)
  iris3 <-  iris2 %>% 
                  group_by(Species) %>% 
                  mutate_each(funs(fun1), contains(".") )

  identical(as.data.frame(iris3), iris2)
  #[1] TRUE

Or put the ifelse call directly inside funs, without defining a separate function:

 iris4 <-  iris2 %>% 
                 group_by(Species) %>% 
                 mutate_each(funs(ifelse(is.na(.), mean(., na.rm=TRUE), .)), contains(".") )


 identical(iris3,iris4)
 #[1] TRUE
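
In current dplyr releases, mutate_each() and funs() are deprecated, so the snippets above may warn or fail depending on the installed version. Below is a minimal sketch of the same per-species imputation written with mutate() and across(), assuming dplyr 1.0.0 or later. The names impute_mean and iris5 are made up for illustration and are not part of the original answer.

```r
library(dplyr)

# Rebuild the question's example data so this block runs on its own.
iris2 <- iris
set.seed(1)
iris2[-5] <- lapply(iris2[-5], function(x) {
  x[sample(length(x), sample(10, 1))] <- NA
  x
})

# Hypothetical helper: replace each NA with the mean of the non-missing
# values in the same vector. Inside a grouped mutate(), that vector is
# the column restricted to one group.
impute_mean <- function(x) ifelse(is.na(x), mean(x, na.rm = TRUE), x)

iris5 <- iris2 %>%
  group_by(Species) %>%
  # contains(".") mirrors the column selection used in the answer above
  mutate(across(contains("."), impute_mean)) %>%
  ungroup()

# Check whether any missing values remain
anyNA(iris5)
```

Because the data is grouped before mutate() is called, the mean is computed within each species rather than over the whole column. across() also leaves the grouping column alone, so Species is not passed to the function.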