Reshape data in R: converting a long table to a wide table

Question

I would like to use the reshape2 package in R to change my long table to a wide table.

I have a dataset from a database that looks like this (example):

    id1 | id2 | info  | action_time
    1   | a   | info1 | time1
    1   | a   | info1 | time2
    1   | a   | info1 | time3
    2   | b   | info2 | time4
    2   | b   | info2 | time5

And now I want it to be like this:

    id1 | id2 | info  | action_time 1 | action_time 2 | action_time 3
    1   | a   | info1 | time1         | time2         | time3
    2   | b   | info2 | time4         | time5         |

I tried several times and looked at several examples online that use reshape() or dcast(), but could not find one that matches this case. The number of action_time values differs per identifier, and some identifiers have more than 10, so the converted dataset will contain more than 10 action_time columns.

Can anyone suggest a convenient way to do this? If there is a way to do this in Excel (a pivot table?), that would be awesome. Thanks heaps.

r pivot-table reshape2

Lambo Jun 26 '15 at 2:37

2 answers

Using tidyr

    require(tidyr)

    # replicate data
    df <- structure(list(
      id1 = c(1, 1, 1, 2, 2),
      id2 = structure(c(1L, 1L, 1L, 2L, 2L),
                      .Label = c("a", "b"), class = "factor"),
      info = structure(c(1L, 1L, 1L, 2L, 2L),
                       .Label = c("info1", "info2"), class = "factor"),
      action_time = structure(1:5,
                              .Label = c("time1", "time2", "time3", "time4", "time5"),
                              class = "factor")),
      .Names = c("id1", "id2", "info", "action_time"),
      class = "data.frame", row.names = c(NA, -5L))

    # create additional column on action_time sequence
    action_no <- paste("action_time",
                       unlist(sapply(rle(df$id1)$lengths, function(x) seq(1, x))))
    y <- cbind(df, action_no)

    # spread into final dataframe
    z <- spread(y, action_no, action_time)

Final output

    > z
      id1 id2  info action_time 1 action_time 2 action_time 3
    1   1   a info1         time1         time2         time3
    2   2   b info2         time4         time5          <NA>

Ricky Jun 26 '15 at 3:05

Steven beaupré · Accepted Answer · 2015-06-26T03:30:15+0000

Try:

    library(dplyr)
    library(tidyr)

    df %>%
      group_by(id1) %>%
      mutate(action_no = paste("action_time", row_number())) %>%
      spread(action_no, action_time)

Which gives:

    #Source: local data frame [2 x 6]
    #
    #  id1 id2  info action_time 1 action_time 2 action_time 3
    #1   1   a info1         time1         time2         time3
    #2   2   b info2         time4         time5            NA
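In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). The following is a minimal sketch of the same idea with the newer function, assuming a recent tidyr and dplyr are installed. The data frame is rebuilt with plain character columns so the block runs on its own, and the name `wide` is only illustrative. The action counter is kept numeric and the "action_time " prefix is added through `names_prefix`, which may help keep columns in numeric order when an identifier has more than 10 actions.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question, as plain character columns.
df <- data.frame(
  id1 = c(1, 1, 1, 2, 2),
  id2 = c("a", "a", "a", "b", "b"),
  info = c("info1", "info1", "info1", "info2", "info2"),
  action_time = c("time1", "time2", "time3", "time4", "time5"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(id1) %>%
  mutate(action_no = row_number()) %>%   # 1, 2, 3, ... within each id1
  ungroup() %>%
  pivot_wider(
    names_from = action_no,              # one new column per action number
    values_from = action_time,
    names_prefix = "action_time "        # yields "action_time 1", "action_time 2", ...
  )

print(wide)
```

Identifiers with fewer actions than the longest group get NA in the trailing columns, as in the output above.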

Data

    df <- structure(list(
      id1 = c(1, 1, 1, 2, 2),
      id2 = structure(c(1L, 1L, 1L, 2L, 2L),
                      .Label = c("a", "b"), class = "factor"),
      info = structure(c(1L, 1L, 1L, 2L, 2L),
                       .Label = c("info1", "info2"), class = "factor"),
      action_time = structure(1:5,
                              .Label = c("time1", "time2", "time3", "time4", "time5"),
                              class = "factor")),
      .Names = c("id1", "id2", "info", "action_time"),
      class = "data.frame", row.names = c(NA, -5L))
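Since the question asked about reshape2 specifically, here is a hedged sketch of one way to get the same result with dcast(). It is not taken from either answer. It assumes reshape2 is installed, uses base R's ave() to number the actions within each id1, and renames the new columns afterwards. The names `action_no` and `wide` are illustrative.

```r
library(reshape2)

# Same example data as in the question, as plain character columns.
df <- data.frame(
  id1 = c(1, 1, 1, 2, 2),
  id2 = c("a", "a", "a", "b", "b"),
  info = c("info1", "info1", "info1", "info2", "info2"),
  action_time = c("time1", "time2", "time3", "time4", "time5"),
  stringsAsFactors = FALSE
)

# Number each action within its id1 group: 1, 2, 3, 1, 2.
df$action_no <- ave(seq_along(df$id1), df$id1, FUN = seq_along)

# One row per id1/id2/info combination, one column per action number.
wide <- dcast(df, id1 + id2 + info ~ action_no, value.var = "action_time")

# Prefix the generated columns ("1", "2", "3") after casting.
names(wide)[-(1:3)] <- paste("action_time", names(wide)[-(1:3)])

print(wide)
```

Keeping `action_no` as an integer until after the cast is a deliberate choice: with text labels, a column such as "action_time 10" can sort ahead of "action_time 2".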
