How can I normalize the values of a data frame by a sum (get percentages)?

I have the following data frame:

    > str(df)
    'data.frame': 52 obs. of 3 variables:
     $ n    : int 10 20 64 108 128 144 256 320 404 512 ...
     $ step : Factor w/ 4 levels "Step1","Step2",..: 1 1 1 1 1 1 1 1 1 1 ...
     $ value: num 0.00178 0.000956 0.001613 0.001998 0.002975 ...

Now I would like to normalize df$value, i.e. divide each value by the sum of the values belonging to the same n, so that I get percentages. The code below does not work, but it shows what I would like to achieve. Here I precompute in dfa the sum of the values belonging to the same n, and then try to divide each original df$value by the corresponding sum in dfa$value, matching on n:

    dfa <- aggregate(x=df$value, by=list(df$n), FUN=sum)
    names(dfa)[names(dfa)=="Group.1"] <- "n"
    names(dfa)[names(dfa)=="x"] <- "value"
    df$value <- df$value / dfa[dfa$n==df$n,][[1]]
3 answers

I think the following works using the data.table package.

    library(data.table)
    df <- data.table(df)
    # add value2 by reference: each value divided by the sum of its n group
    df[, value2 := value/sum(value), by=n]

I would use ave :

    set.seed(123)
    df <- data.frame(n=rep(c(2,3,6,8), each=5), value = sample(5:60, 20))
    df$value_2 <- ave(df$value, list(df$n), FUN=function(L) L/sum(L))
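If you want actual percentages rather than fractions, you could multiply by 100 and check that every group adds up to 100. A minimal sketch, assuming a small made-up data frame (the column name pct and the toy values are only for illustration):

```r
# Hypothetical toy data: two rows with n = 10, three rows with n = 20.
df <- data.frame(n = c(10, 10, 20, 20, 20), value = c(1, 3, 2, 2, 4))

# ave() applies FUN within each n group and returns a vector
# in the original row order, so it can be assigned straight back.
df$pct <- 100 * ave(df$value, df$n, FUN = function(v) v / sum(v))

# Sanity check: the percentages within each n should total 100.
totals <- tapply(df$pct, df$n, sum)
print(df)
print(totals)
```

Because ave keeps the original row order, no merge or re-sorting is needed afterwards.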

The problem with the code you have is the line:

 df$value <- df$value / dfa[dfa$n==df$n,][[1]] 

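The expression dfa$n==df$n compares the two vectors element by element, recycling the shorter one, so it returns a logical vector of length max(length(dfa$n), length(df$n)) that tells you, for each position, whether the two n values happen to be equal. It does not look up, for each row of df, the row of dfa with the same n, so I don't think you can use it to match dfa$n to df$n.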
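To make the difference concrete, here is a small sketch on made-up data (the toy values and the names idx and share are assumptions for illustration). It contrasts the position-wise comparison with a lookup through match, which is one base R way to do what the original line was aiming for:

```r
# Hypothetical toy data: two values for each n.
df <- data.frame(n = c(10, 10, 20, 20), value = c(1, 3, 2, 6))

# Per-group totals, computed as in the question.
dfa <- aggregate(x = df$value, by = list(df$n), FUN = sum)
names(dfa) <- c("n", "sum.value")

# Position-wise comparison: dfa$n (length 2) is recycled against
# df$n (length 4), so this is not a lookup by key.
print(dfa$n == df$n)

# Lookup by key: for each row of df, match() returns the index of
# the row in dfa that has the same n.
idx <- match(df$n, dfa$n)
df$share <- df$value / dfa$sum.value[idx]
print(df)
```

Note also that [[1]] on the subset of dfa picks its first column, which is n rather than the sum, so the original line would divide by the wrong column even if the rows lined up.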

Using base functions, you can use aggregate and merge :

    dfa <- aggregate(x=df$value, by=list(df$n), FUN=sum)
    names(dfa) <- c("n","sum.value")
    df2 <- merge(df,dfa,by="n",all = TRUE)
    df2$value2 <- df2$value/df2$sum.value