Replace the specified character in a string variable with a character from another string variable of equal length

Question

I have a data frame with two string variables that have an equal number of characters. The strings represent student responses to an exam. The first string contains a + sign for each question answered correctly and the student's wrong answer for each question answered incorrectly. The second string contains all the correct answers. I want to replace every + sign in the first string with the correct answer from the second string. This code creates a simplified example data set:

df <- data.frame(v1 = c("+AA+B", "D++CC", "A+BAD"), 
                 v2 = c("DBBAD", "BDCAD","CDCCA"), stringsAsFactors = FALSE)

So the + signs in df$v1 need to be replaced with the letters in df$v2 that are at the same distance from the beginning of the string. Any ideas?

+4

regex r

Braden Dec 17 '13 at 21:31

5 answers

This also seems to be correct:

mapply(function(x, y) paste0(ifelse(x == "+", y, x), collapse = ""), 
                 strsplit(as.character(df$v1), ""), strsplit(as.character(df$v2), ""))
#[1] "DAAAB" "DDCCC" "ADBAD"

+3

alexis_laz Dec 17 '13 at 21:59

Another approach, splitting each column into a matrix of single characters:

## df<-data.frame(v1 = c("+AA+B", "D++CC", "A+BAD"), v2 = c("DBBAD", "BDCAD","CDCCA"))
dats <- lapply(df, function(x) do.call(rbind, strsplit(as.character(x), "")))

dats[[1]][dats[[1]] == "+"] <- dats[[2]][dats[[1]] == "+"]

apply(dats[[1]], 1, paste, collapse = "")
## [1] "DAAAB" "DDCCC" "ADBAD"

Timings for the answers on this page:

Unit: microseconds
     expr     min      lq  median       uq      max neval
 Andrea() 296.693 313.953 321.884 328.4155 2443.051  1000
   Josh() 300.891 314.420 319.551 326.5500 3748.779  1000
  Tyler() 144.148 155.344 159.543 164.2080 2233.593  1000
 Jibler() 174.937 188.932 193.597 198.7290 2269.514  1000
 Alexis() 154.877 167.007 171.672 175.4040 2342.753  1000
 Julius() 394.658 413.317 420.315 429.4120 2549.412  1000
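
Timings like these are only comparable if the candidates return the same result. A minimal sketch of such a check, assuming three of the approaches from this page wrapped as functions; the wrapper names by_mapply, by_matrix and by_regmatches are hypothetical and are not the functions that were actually benchmarked:

```r
df <- data.frame(v1 = c("+AA+B", "D++CC", "A+BAD"),
                 v2 = c("DBBAD", "BDCAD", "CDCCA"),
                 stringsAsFactors = FALSE)

# Hypothetical wrapper: character-by-character ifelse via mapply
by_mapply <- function(d) {
  mapply(function(x, y) paste0(ifelse(x == "+", y, x), collapse = ""),
         strsplit(d$v1, ""), strsplit(d$v2, ""))
}

# Hypothetical wrapper: character matrices with logical indexing
by_matrix <- function(d) {
  m <- lapply(d, function(x) do.call(rbind, strsplit(x, "")))
  hit <- m[[1]] == "+"
  m[[1]][hit] <- m[[2]][hit]
  apply(m[[1]], 1, paste, collapse = "")
}

# Hypothetical wrapper: regmatches replacement
by_regmatches <- function(d) {
  out <- d$v1
  rg <- gregexpr("\\+", out)
  regmatches(out, rg) <- regmatches(d$v2, rg)
  out
}

results <- list(by_mapply(df), by_matrix(df), by_regmatches(df))

# TRUE only if every approach matches the first one element by element
all_agree <- all(vapply(results, function(r) all(r == results[[1]]), logical(1)))
all_agree
```

The matrix version relies on every string having the same length, which the question guarantees; the other two only need each pair of strings to match in length.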

+2

Tyler Rinker Dec 17 '13 at 21:40

Another alternative, using lapply and ifelse.

dats <- lapply(df, function(x) do.call(rbind, strsplit(as.character(x), "")))
apply(with(dats, ifelse(v1 == "+", v2, v1)), 1, paste0, collapse = "")
## [1] "DAAAB" "DDCCC" "ADBAD"

+2

Jilber Urbina Dec 17 '13 at 22:22

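df <- data.frame(v1 = c("+AA+B", "D++CC", "A+BAD"),
                 v2 = c("DBBAD", "BDCAD", "CDCCA"),
                 stringsAsFactors = FALSE)

# Replace each "+" in x with the character at the same position in y
f <- function(x, y) {
  xs <- unlist(strsplit(x, split = ""))
  ys <- unlist(strsplit(y, split = ""))
  paste(ifelse(xs == "+", ys, xs), collapse = "")
}

# Pair the strings row by row; passing df$v2 whole through vapply's ...
# would hand f() every answer key at once and only the first would be used
vapply(seq_along(df$v1), function(i) f(df$v1[i], df$v2[i]),
       FUN.VALUE = character(1))
## [1] "DAAAB" "DDCCC" "ADBAD"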

+1

Andrea Dec 17 '13 at 22:02

Julius Vainora · Accepted Answer · 2013-12-17T22:23:46+0000

When df$v1 and df$v2 are character vectors, we can use

regmatches(df$v1, gregexpr("\\+", df$v1)) <- regmatches(df$v2, gregexpr("\\+", df$v1))

That is,

df <- data.frame(v1 = c("+AA+B", "D++CC", "A+BAD"), 
                 v2 = c("DBBAD", "BDCAD", "CDCCA"), 
                 stringsAsFactors = FALSE)
rg <- gregexpr("\\+", df$v1)
regmatches(df$v1, rg) <- regmatches(df$v2, rg)
df
#      v1    v2
# 1 DAAAB DBBAD
# 2 DDCCC BDCAD
# 3 ADBAD CDCCA

rg contains the positions of "+" in df$v1, and we conveniently use it with regmatches to replace those matches in df$v1 with what is in df$v2 at the same positions.
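
To make the role of rg concrete, here is a small sketch that first lists the match positions and then wraps the replacement in a helper. The helper name fill_plus and its length check are illustrative assumptions, not part of the answer:

```r
df <- data.frame(v1 = c("+AA+B", "D++CC", "A+BAD"),
                 v2 = c("DBBAD", "BDCAD", "CDCCA"),
                 stringsAsFactors = FALSE)

# gregexpr() returns one integer vector of match starts per string;
# as.integer() drops the match.length and other attributes for readability
rg <- gregexpr("\\+", df$v1)
positions <- lapply(rg, as.integer)

# fill_plus() is a hypothetical wrapper around the regmatches() idiom
fill_plus <- function(responses, key) {
  # The idiom only makes sense when each response lines up with its key
  stopifnot(is.character(responses), is.character(key),
            all(nchar(responses) == nchar(key)))
  m <- gregexpr("\\+", responses)
  # Take the characters of key at the "+" positions and write them
  # into responses at those same positions
  regmatches(responses, m) <- regmatches(key, m)
  responses
}

filled <- fill_plus(df$v1, df$v2)
filled
```

Because the same match data is applied to both vectors, the approach depends on the strings being aligned position by position, which is why the sketch checks nchar() up front.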
