I would like to add columns to a data.table based on the contents of another column. This is my data and the approach I tried:
Params
1: {clientID: 459; time: 1386868908703; version: 6}
2: {clientID: 459; id: 52a9ea8b534b2b0b5000575f; time: 1386868824339; user: 459001}
3: {clientID: 988; time: 1388939739771}
4: {clientID: 459; id: 52a9ec00b73cbf0b210057e9; time: 1386868810519; user: 459001}
5: {clientID: 459; time: 1388090530634}
Code to create this table:
DT = data.table(Params=c("{ clientID : 459; time : 1386868908703; version : 6}","{ clientID : 459; id : 52a9ea8b534b2b0b5000575f; time : 1386868824339; user : 459001}","{ clientID : 988; time : 1388939739771}","{ clientID : 459; id : 52a9ec00b73cbf0b210057e9; time : 1386868810519; user : 459001}","{ clientID : 459; time : 1388090530634}"))
I would like to parse the text in the "Params" column and create new columns from it. For example, I would like a new column named "user" that contains only the number after "user:" in each row of Params, or NA if the row has no "user" entry. The result should look like this:
Params user
1: {clientID: 459; time: 1386868908703; version: 6} NA
2: {clientID: 459; id: 52a9ea8b534b2b0b5000575f; time: 1386868824339; user: 459001} 459001
3: {clientID: 988; time: 1388939739771} NA
4: {clientID: 459; id: 52a9ec00b73cbf0b210057e9; time: 1386868810519; user: 459001} 459001
5: {clientID: 459; time: 1388090530634} NA
I created the following function for parsing (in this case for the user):
myparse <- function(searchterm, s) {
  s <- gsub("{", "", s, fixed = TRUE)
  s <- gsub(" ", "", s, fixed = TRUE)
  s <- gsub("}", "", s, fixed = TRUE)
  s <- strsplit(s, '[;:]')
  s <- unlist(s)
  if (length(s[which(s == searchterm)]) > 0) {s[which(s == searchterm) + 1]} else {NA}
}
Then I call transform() to add the column:
DT <- transform(DT, user = myparse("user", Params))
This works for "time", which appears in every row, but not for "user", which appears in only two rows. The following error is returned:
Error in data.table(list(Params = c("{ clientID : 459; time : 1386868908703; version : 6}", :
argument 2 (nrow 2) cannot be recycled without remainder to match longest nrow (5)
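To illustrate where the length mismatch seems to come from, here is a small check in base R. It copies the Params strings into a plain vector (the name `params` is introduced here only for illustration) and calls myparse on the whole vector at once. Since unlist() flattens the tokens of all rows into one vector, the result has one element per match rather than one per row:

```r
# Same strings as in the Params column, as a plain character vector
params <- c("{ clientID : 459; time : 1386868908703; version : 6}",
            "{ clientID : 459; id : 52a9ea8b534b2b0b5000575f; time : 1386868824339; user : 459001}",
            "{ clientID : 988; time : 1388939739771}",
            "{ clientID : 459; id : 52a9ec00b73cbf0b210057e9; time : 1386868810519; user : 459001}",
            "{ clientID : 459; time : 1388090530634}")

# The parsing function from the post, unchanged in behavior
myparse <- function(searchterm, s) {
  s <- gsub("{", "", s, fixed = TRUE)
  s <- gsub(" ", "", s, fixed = TRUE)
  s <- gsub("}", "", s, fixed = TRUE)
  s <- strsplit(s, '[;:]')
  s <- unlist(s)  # tokens of ALL rows end up in one vector here
  if (length(s[which(s == searchterm)]) > 0) {s[which(s == searchterm) + 1]} else {NA}
}

res_user <- myparse("user", params)
res_time <- myparse("time", params)

length(res_user)  # 2: one value per row that contains "user"
length(res_time)  # 5: happens to equal the number of rows
```

So "time" only works because it occurs exactly once in every row; the function never returns NA for an individual row.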
How can I solve this? Thanks!
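For reference, one possible direction is sketched below. It is only a minimal sketch, not a definitive solution: `extract_param` is a hypothetical helper name, it swaps the split-and-search logic for a regular expression, and it assumes that the search term contains no regex metacharacters and that values never contain ";" or "}". It returns exactly one value per row (NA where the key is missing), so the result can be assigned with data.table's `:=`:

```r
library(data.table)

# Hypothetical helper (not part of the original code): returns one value per
# element of `s`, or NA where `searchterm` does not occur.
extract_param <- function(searchterm, s) {
  # Match "<searchterm> : <value>" directly after "{" or ";" and capture the
  # value. Assumes `searchterm` has no regex metacharacters.
  pattern <- paste0("(?:^|[;{])\\s*", searchterm, "\\s*:\\s*([^;}]*)")
  matches <- regmatches(s, regexec(pattern, s, perl = TRUE))
  # Each element is character(0) for no match, else c(full match, captured value)
  vapply(matches, function(m) {
    if (length(m) >= 2) trimws(m[2]) else NA_character_
  }, character(1))
}

DT <- data.table(Params = c(
  "{ clientID : 459; time : 1386868908703; version : 6}",
  "{ clientID : 459; id : 52a9ea8b534b2b0b5000575f; time : 1386868824339; user : 459001}",
  "{ clientID : 988; time : 1388939739771}",
  "{ clientID : 459; id : 52a9ec00b73cbf0b210057e9; time : 1386868810519; user : 459001}",
  "{ clientID : 459; time : 1388090530634}"))

# Add the columns by reference; both results have length nrow(DT)
DT[, user := extract_param("user", Params)]
DT[, time := extract_param("time", Params)]
```

The new columns are character vectors; wrapping the call in as.numeric() would convert them if numbers are needed.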