R table: how to sum values instead of counting them?

Suppose I have a data frame in R that looks like this:

    Id Name Price sales Profit Month Category Mode
     1    A     2     5      8     1        X    K
     1    A     2     6      9     2        X    K
     1    A     2     5      8     3        X    K
     1    B     2     4      6     1        Y    L
     1    B     2     3      4     2        Y    L
     1    B     2     5      7     3        Y    L
     2    C     2     5     11     1        X    M
     2    C     2     5     11     2        X    L
     2    C     2     5     11     3        X    K
     2    D     2     8     10     1        Y    M
     2    D     2     8     10     2        Y    K
     2    D     2     5      7     3        Y    K
     3    E     2     5      9     1        Y    M
     3    E     2     5      9     2        Y    L
     3    E     2     5      9     3        Y    M
     3    F     2     4      7     1        Z    M
     3    F     2     5      8     2        Z    L
     3    F     2     5      8     3        Z    M
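To make the example reproducible, here is a sketch that rebuilds the data as a data frame named df (the name the code in this thread uses). It assumes Category is one of X/Y/Z and Mode is one of K/L/M, with the values typed in by hand from the listing above. The last lines run table() on it to show the counts being described.

```r
# Rebuild the example data; values are transcribed by hand from the listing.
df <- data.frame(
  Id       = rep(1:3, each = 6),
  Name     = rep(c("A", "B", "C", "D", "E", "F"), each = 3),
  Price    = 2,
  sales    = c(5, 6, 5, 4, 3, 5, 5, 5, 5, 8, 8, 5, 5, 5, 5, 4, 5, 5),
  Profit   = c(8, 9, 8, 6, 4, 7, 11, 11, 11, 10, 10, 7, 9, 9, 9, 7, 8, 8),
  Month    = rep(1:3, times = 6),
  Category = rep(c("X", "Y", "X", "Y", "Y", "Z"), each = 3),
  Mode     = c("K", "K", "K", "L", "L", "L", "M", "L", "K",
               "M", "K", "K", "M", "L", "M", "M", "L", "M")
)

# table() only counts rows per Category/Mode combination.
counts <- table(df$Category, df$Mode)
print(counts)
```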

If I use the table function for this data, for example:

 table(df$Category, df$Mode) 

It shows how many observations each category has in each mode, i.e. it counts the rows for every Category/Mode combination.

But what if I want the table to show how much Profit (sum or average) each Category earned in each Mode, rather than the number of observations?

Is there a way to do this using the table function or another function in R?

3 answers

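We can use xtabs from base R. By default, xtabs sums the left-hand-side variable (here Profit) within each combination of the factors on the right-hand side.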

    xtabs(Profit ~ Category + Mode, df)
    #         Mode
    # Category  K  L  M
    #        X 36 11 11
    #        Y 17 26 28
    #        Z  0  8 15

Or another base R option, more flexible for applying a different FUN, is tapply.

    with(df, tapply(Profit, list(Category, Mode), FUN = sum))
    #    K  L  M
    # X 36 11 11
    # Y 17 26 28
    # Z NA  8 15
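Since the question also asks about averages, here is a small sketch of the same tapply call with FUN = mean. The data frame is rebuilt by hand with only the three columns needed (an assumption about the original df, transcribed from the question). It also shows one way to turn the NA in the empty Z/K cell into 0 so the sums match the xtabs result.

```r
# Only the columns used here, transcribed by hand from the question's data.
df <- data.frame(
  Profit   = c(8, 9, 8, 6, 4, 7, 11, 11, 11, 10, 10, 7, 9, 9, 9, 7, 8, 8),
  Category = rep(c("X", "Y", "X", "Y", "Y", "Z"), each = 3),
  Mode     = c("K", "K", "K", "L", "L", "L", "M", "L", "K",
               "M", "K", "K", "M", "L", "M", "M", "L", "M")
)

# Average Profit per Category (rows) and Mode (columns).
profit_mean <- with(df, tapply(Profit, list(Category, Mode), FUN = mean))

# Sums again; tapply leaves combinations with no rows as NA,
# so replace them with 0 to get the same table as xtabs.
profit_sum <- with(df, tapply(Profit, list(Category, Mode), FUN = sum))
profit_sum[is.na(profit_sum)] <- 0

print(round(profit_mean, 2))
print(profit_sum)
```

Whether an empty combination should read as NA or 0 depends on the use: 0 is natural for a sum, while NA is arguably the more honest value for a mean over no rows.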

Or we can use dcast to convert from 'long' to 'wide' format. It is more flexible as we can specify fun.aggregate (sum, mean, median, etc.).

    library(reshape2)
    dcast(df, Category ~ Mode, value.var = 'Profit', sum)
    #   Category  K  L  M
    # 1        X 36 11 11
    # 2        Y 17 26 28
    # 3        Z  0  8 15

If you need this in a "long" format, here is one option with data.table. We convert the "data.frame" to a "data.table" with setDT(df), group by "Category" and "Mode", and take the sum of "Profit" within each group.

    library(data.table)
    setDT(df)[, list(Profit = sum(Profit)), by = .(Category, Mode)]

Another possibility is to use the aggregate() function:

    profit_dat <- aggregate(Profit ~ Category + Mode, data = df, sum)
    # > profit_dat
    #   Category Mode Profit
    # 1        X    K     36
    # 2        Y    K     17
    # 3        X    L     11
    # 4        Y    L     26
    # 5        Z    L      8
    # 6        X    M     11
    # 7        Y    M     28
    # 8        Z    M     15
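If both the sum and the average are wanted in one long table, aggregate() can take a function that returns several values. This is only a sketch: the data frame is rebuilt by hand from the question with the three relevant columns, and profit_stats and profit_flat are names made up for illustration.

```r
# Only the columns used here, transcribed by hand from the question's data.
df <- data.frame(
  Profit   = c(8, 9, 8, 6, 4, 7, 11, 11, 11, 10, 10, 7, 9, 9, 9, 7, 8, 8),
  Category = rep(c("X", "Y", "X", "Y", "Y", "Z"), each = 3),
  Mode     = c("K", "K", "K", "L", "L", "L", "M", "L", "K",
               "M", "K", "K", "M", "L", "M", "M", "L", "M")
)

# FUN returns a named vector, so the Profit column becomes a two-column matrix.
profit_stats <- aggregate(
  Profit ~ Category + Mode,
  data = df,
  FUN  = function(x) c(sum = sum(x), mean = mean(x))
)

# Flatten the matrix column into ordinary columns Profit.sum and Profit.mean.
profit_flat <- do.call(data.frame, profit_stats)
print(profit_flat)
```

Combinations with no rows (here Z/K) simply do not appear in the result, unlike in the wide tables from xtabs or tapply.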

I prefer to use dplyr (and ggplot2) for most data analysis:

    library(dplyr)
    # Sum of Profit and number of rows for each Category/Mode combination
    group_by(df, Category, Mode) %>%
      summarise(sum = sum(Profit), count = n())

https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html
