How to add a column to a data.table in R based on the text in another column?

I would like to add columns to a data.table based on the text in each row of another column. This is my data and the approach I tried:

                                                                                     Params
1: {clientID: 459; time: 1386868908703; version: 6}
2: {clientID: 459; id: 52a9ea8b534b2b0b5000575f; time: 1386868824339; user: 459001}
3: {clientID: 988; time: 1388939739771}
4: {clientID: 459; id: 52a9ec00b73cbf0b210057e9; time: 1386868810519; user: 459001}
5: {clientID: 459; time: 1388090530634}

Code to create this table:

DT = data.table(Params=c("{ clientID : 459;  time : 1386868908703;  version : 6}","{ clientID : 459;  id : 52a9ea8b534b2b0b5000575f;  time : 1386868824339;  user : 459001}","{ clientID : 988;  time : 1388939739771}","{ clientID : 459;  id : 52a9ec00b73cbf0b210057e9;  time : 1386868810519;  user : 459001}","{ clientID : 459;  time : 1388090530634}"))

I would like to analyze the text in the "Params" column and create new columns based on the text in it. For example, I would like to have a new column named "user" that contains only the number after "user:" in the Params row. The added column should look like this:

                                                                                     Params user
1: {clientID: 459; time: 1386868908703; version: 6} NA
2: {clientID: 459; id: 52a9ea8b534b2b0b5000575f; time: 1386868824339; user: 459001} 459001
3: {clientID: 988; time: 1388939739771} NA
4: {clientID: 459; id: 52a9ec00b73cbf0b210057e9; time: 1386868810519; user: 459001} 459001
5: {clientID: 459; time: 1388090530634} NA

I created the following function for parsing (in this case for the user):

myparse <- function(searchterm, s) {
  s <-gsub("{","",s, fixed = TRUE)
  s <-gsub(" ","",s, fixed = TRUE)
  s <-gsub("}","",s, fixed = TRUE)
  s <-strsplit(s, '[;:]')
  s <-unlist(s)
  if (length(s[which(s==searchterm)])>0) {s[which(s==searchterm)+1]} else {NA}
}

Then I use the following function to add a column:

DT <- transform(DT, user = myparse("user", Params))

This works in the case of "time", which is included in all lines, but does not work in the case of "user", which is included in only two lines. The following error is returned:

Error in data.table(list(Params = c("{ clientID : 459;  time : 1386868908703;  version : 6}",  : 
  argument 2 (nrow 2) cannot be recycled without remainder to match longest nrow (5)

How can I solve this? Thanks!

2 answers

You can use regular expressions for this task:

myparse <- function(searchterm, s) {
  res <- rep(NA_character_, length(s)) # NA vector
  idx <- grepl(searchterm, s) # index for strings including the search term
  pattern <- paste0(".*", searchterm, " : ([^;}]+)[;}].*") # regex pattern
  res[idx] <- sub(pattern, "\\1", s[idx]) # extract target string
  return(res)
}
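
To see what the pattern does, here is a minimal sketch that applies it to two single strings taken from the question's data. The variable names (`with_user`, `without_user`, `hit`, `miss`) are made up for this illustration. It also shows why the `grepl()` index is needed: when the pattern does not match, `sub()` returns its input unchanged instead of NA.

```r
# One string that has a "user" entry and one that does not (from the question's data)
with_user    <- "{ clientID : 459;  id : 52a9ea8b534b2b0b5000575f;  time : 1386868824339;  user : 459001}"
without_user <- "{ clientID : 988;  time : 1388939739771}"

# Same pattern as in myparse(), built for the search term "user"
pattern <- paste0(".*", "user", " : ([^;}]+)[;}].*")

# The pattern matches the whole string, so sub() replaces it with the captured value
hit <- sub(pattern, "\\1", with_user)

# No match here, so sub() hands back the original string untouched
miss <- sub(pattern, "\\1", without_user)

print(hit)
print(identical(miss, without_user))
```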

You can use this function to add new columns, for example, for user:

DT[, user := myparse("user", Params)]

Rows without a user entry get NA in the new column:

DT[, user]
# [1] NA       "459001" NA       "459001" NA
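
As for why the function in the question fails: `unlist()` flattens the pieces of all rows into one long vector, so the function returns one value per match (two for "user") rather than one value per row, and two values cannot be recycled into five rows. With "time" there happen to be five matches, which is why that case appeared to work. A minimal sketch of a per-row variant of the same splitting idea follows; `parse_one` and `params` are hypothetical names, and only three of the question's rows are used to keep it short.

```r
# Parse a single string and return the value following `searchterm`, or NA
parse_one <- function(searchterm, s) {
  s <- gsub("[{} ]", "", s)                  # drop braces and spaces
  parts <- strsplit(s, "[;:]")[[1]]          # key, value, key, value, ...
  pos <- which(parts == searchterm)
  if (length(pos) > 0) parts[pos[1] + 1] else NA_character_
}

params <- c(
  "{ clientID : 459;  time : 1386868908703;  version : 6}",
  "{ clientID : 459;  id : 52a9ea8b534b2b0b5000575f;  time : 1386868824339;  user : 459001}",
  "{ clientID : 988;  time : 1388939739771}"
)

# vapply() calls parse_one() once per string, so the result has one entry per row
user <- vapply(params, parse_one, character(1),
               searchterm = "user", USE.NAMES = FALSE)
print(user)
```

Because the result always has the same length as the input, it should be usable on the right-hand side of `:=` in the same way as the regex version.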

Another option is to convert each string to YAML and parse it with the yaml package:

library(yaml)

DT = data.frame(
    Params=c("{ clientID : 459;  time : 1386868908703;  version : 6}","{ clientID : 459;  id : 52a9ea8b534b2b0b5000575f;  time : 1386868824339;  user : 459001}","{ clientID : 988;  time : 1388939739771}","{ clientID : 459;  id : 52a9ec00b73cbf0b210057e9;  time : 1386868810519;  user : 459001}","{ clientID : 459;  time : 1388090530634}"), 
    stringsAsFactors=FALSE
    )

conv.to.yaml <- function(x){
     gsub(';  ','\n',substr(x, 3, nchar(x)-1))
}

tmp <- lapply( DT$Params, function(x) yaml.load(conv.to.yaml(x)) )  

Then combine the parsed lists into one data frame, with one column per key:

unames <- unique( unlist(sapply( tmp, names) ) )
res <- as.data.frame(  do.call(rbind, lapply(tmp, function(x)x[unames]) ) )
colnames( res ) <- unames
res

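The result is below. Note that the time values are wrong: they exceed the 32-bit integer range and appear to have wrapped around during parsing. Keys missing from a row show up as NULL: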

> res
  clientID       time version                       id   user
1      459 -405527905       6                     NULL   NULL
2      459 -405612269    NULL 52a9ea8b534b2b0b5000575f 459001
3      988 1665303163    NULL                     NULL   NULL
4      459 -405626089    NULL 52a9ec00b73cbf0b210057e9 459001
5      459  816094026    NULL                     NULL   NULL
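
The `time` values in this output do not match the input; for example, 1386868908703 minus 323 times 2^32 is exactly -405527905, which is what a 32-bit integer wraparound would produce. One way around that is to keep every value as text and convert `time` to a double afterwards. The sketch below does this with plain base R string splitting instead of the yaml package; `parse_params`, `params`, `parsed` and `keys` are hypothetical names, and it assumes values never contain `;` or `:`.

```r
# Split one "{ key : value;  key : value }" string into a named character vector
parse_params <- function(x) {
  body <- gsub("[{}]", "", x)
  pairs <- strsplit(strsplit(body, ";", fixed = TRUE)[[1]], ":", fixed = TRUE)
  vals <- trimws(vapply(pairs, `[`, character(1), 2))
  names(vals) <- trimws(vapply(pairs, `[`, character(1), 1))
  vals
}

params <- c(
  "{ clientID : 459;  time : 1386868908703;  version : 6}",
  "{ clientID : 459;  id : 52a9ea8b534b2b0b5000575f;  time : 1386868824339;  user : 459001}",
  "{ clientID : 988;  time : 1388939739771}"
)

parsed <- lapply(params, parse_params)
keys <- unique(unlist(lapply(parsed, names)))

# One row per string, one column per key; keys missing from a row become NA
res <- as.data.frame(
  do.call(rbind, lapply(parsed, function(p) unname(p[keys]))),
  stringsAsFactors = FALSE
)
names(res) <- keys

# Doubles represent these millisecond timestamps exactly, unlike 32-bit integers
res$time <- as.numeric(res$time)
print(res)
```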