I have a data frame that has 5 variables and 800 rows:
head(df)
       V1 variable    value element OtolithNum
1 24.9835       V7 130230.0      Mg         25
2 24.9835       V8 145844.0      Mg         25
3 24.9835       V9 126126.0      Mg         25
4 24.9835      V10 103152.0      Mg         25
5 24.9835      V11 129571.9      Mg         25
6 24.9835      V12 114214.0      Mg         25
I need to do the following:
- identify all values (in the variable "value") that are > 2 standard deviations from the median, grouped by the variable "element".
- remove those outliers from the data frame (or create a new data frame with the outliers excluded).
I am using the dplyr package and have used the following code to group by the variable "element" and compute the means:
df1 = df %>%
  group_by(element) %>%
  summarise_each(funs(mean), value)
Could you help me modify or add to the code above so that the outliers (defined above as > 2 sd from the median) are removed, grouped by the variable "element", before I compute the means?
# standardize each column (scale() is used in the outdet function)
scale(dat)
# create a function that flags values >= 2 sd from the mean
outdet <- function(x) abs(scale(x)) >= 2
# index with the function to drop rows containing any flagged value
dat[!apply(sapply(dat, outdet), 1, any), ]
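Note that the snippet above measures distance from the mean and does not group. A minimal dplyr sketch of the definition in the question (more than 2 sd from the median, within each "element") might look like the following. The toy data frame here is made up purely for illustration; only the column names "value" and "element" come from the question, and the sd is assumed to be the ordinary per-group sd().

```r
library(dplyr)

# Hypothetical toy data; only the column names match the question
df <- data.frame(
  element = rep(c("Mg", "Sr"), each = 6),
  value   = c(10, 11, 9, 10, 12, 100,   # 100 is an obvious outlier within Mg
              50, 52, 49, 51, 50, 48)
)

# Keep only rows whose value lies within 2 sd of the median of its own group
df_clean <- df %>%
  group_by(element) %>%
  filter(abs(value - median(value)) <= 2 * sd(value)) %>%
  ungroup()

# Group means computed after the outliers have been dropped
df1 <- df_clean %>%
  group_by(element) %>%
  summarise(value = mean(value))
```

Because filter() is applied after group_by(), median() and sd() are evaluated separately for each element. If "value" can contain NA, na.rm = TRUE would need to be added to median(), sd() and mean().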