Merge rows with same values ​​in multiple columns

I have an excel file with several rows and columns (13232 rows and 18 columns). The last column gives some value. What I want to do is find the rows with the same column data except the last, and sum their last column value. For example: if the input

+---------+---------+---------+---------+ | Column1 | Column2 | Column3 | Column4 | +---------+---------+---------+---------+ | ABC | DEF | GHI | 5 | | XYZ | PQR | LMN | 4 | | ABC | DEF | GHI | 11 | | Test1 | Test2 | Test3 | 12 | | XYZ | PQR | LMN | 54 | +---------+---------+---------+---------+ 

then the conclusion should be

 +---------+---------+---------+---------+ | Column1 | Column2 | Column3 | Column4 | +---------+---------+---------+---------+ | ABC | DEF | GHI | 16 | | XYZ | PQR | LMN | 58 | | Test1 | Test2 | Test3 | 12 | +---------+---------+---------+---------+ 

How can this be achieved in R?

+5
source share
1 answer

You can use aggregate from base R

  aggregate(Column4~., df1, FUN=sum) # Column1 Column2 Column3 Column4 #1 ABC DEF GHI 16 #2 XYZ PQR LMN 58 #3 Test1 Test2 Test3 12 

or

  library(data.table) setDT(df1)[, list(Column4=sum(Column4)), by = c(names(df1)[1:3])] # Column1 Column2 Column3 Column4 #1: ABC DEF GHI 16 #2: XYZ PQR LMN 58 #3: Test1 Test2 Test3 12 

or

  library(sqldf) sqldf('select Column1, Column2, Column3, sum(Column4) as Column4 from df1 group by Column1, Column2, Column3') # Column1 Column2 Column3 Column4 #1 ABC DEF GHI 16 #2 Test1 Test2 Test3 12 #3 XYZ PQR LMN 58 

or

 library(dplyr) df1 %>% group_by(Column1, Column2, Column3) %>% summarize(Column4 = sum(Column4)) # Source: local data frame [3 x 4] # Groups: Column1, Column2 # Column1 Column2 Column3 Column4 # 1 ABC DEF GHI 16 # 2 Test1 Test2 Test3 12 # 3 XYZ PQR LMN 58 

Playable data:

 df1 <- structure(list(Column1 = structure(c(1L, 3L, 1L, 2L, 3L), .Label = c("ABC", "Test1", "XYZ"), class = "factor"), Column2 = structure(c(1L, 2L, 1L, 3L, 2L), .Label = c("DEF", "PQR", "Test2"), class = "factor"), Column3 = structure(c(1L, 2L, 1L, 3L, 2L), .Label = c("GHI", "LMN", "Test3"), class = "factor"), Column4 = c(5L, 4L, 11L, 12L, 54L)), .Names = c("Column1", "Column2", "Column3", "Column4" ), class = "data.frame", row.names = c(NA, -5L)) 
+6
source

All Articles