The "Right" Way to Do a Row-Wise Replacement

I have a data frame that looks something like this:

dataDemo <- data.frame(POS = 1:4, REF = c("A", "T", "G", "C"),
                       ind1 = c("A", ".", "G", "C"),
                       ind2 = c("A", "C", "C", "."),
                       stringsAsFactors = FALSE)

dataDemo

  POS REF ind1 ind2
1   1   A    A    A
2   2   T    .    C
3   3   G    G    C
4   4   C    C    .

and I would like to replace every "." with the value of REF for that row. Here is how I did it:

for(i in seq_along(dataDemo$REF)){
    dataDemo[i , ][dataDemo[i , ] == '.'] <- dataDemo$REF[i]
}

I would like to know if there is a more "correct" or idiomatic way to do this in R. Usually I try to use the *apply functions when possible, and it looks like this problem should be easy to adapt to that approach and make more readable (and faster), but despite spending a lot of time on it, I have not had much success.

+4
3 answers

In base R, you can find the rows that contain "." and replace those cells with the matching REF values.

# Get row numbers
rownrs <- which(dataDemo==".", arr.ind = TRUE)[,1]

# Replace values
dataDemo[dataDemo=="."] <- dataDemo$REF[rownrs]

# Result
dataDemo
#  POS REF ind1 ind2
#1   1   A    A    A
#2   2   T    T    C
#3   3   G    G    C
#4   4   C    C    C
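
To see why the row numbers line up with the cells being replaced, it can help to inspect the intermediate index. The sketch below rebuilds the question's data frame and prints the positions returned by which(); the names hits and fixed are only illustrative and are not part of the answer above. Both which() and assignment through a logical matrix walk the cells column by column, so the two orders agree.

```r
# Rebuild the example data from the question
dataDemo <- data.frame(POS = 1:4, REF = c("A", "T", "G", "C"),
                       ind1 = c("A", ".", "G", "C"),
                       ind2 = c("A", "C", "C", "."),
                       stringsAsFactors = FALSE)

# Row/column positions of every ".", listed in column-major order
hits <- which(dataDemo == ".", arr.ind = TRUE)
print(hits)

# Work on a copy (hypothetical name) so the original stays untouched
fixed <- dataDemo
fixed[fixed == "."] <- fixed$REF[hits[, "row"]]
print(fixed)
```

Working on a copy is just a convenience for comparing before and after; assigning back into dataDemo directly, as in the answer, behaves the same way.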
+7

With dplyr:

library(dplyr)

dataDemo %>% mutate_each(funs(ifelse(. == '.', REF, as.character(.))), -POS)
#   POS REF ind1 ind2
# 1   1   A    A    A
# 2   2   T    T    C
# 3   3   G    G    C
# 4   4   C    C    C
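
In dplyr 1.0.0 and later, mutate_each() and funs() are deprecated in favour of across(). A minimal sketch of the same replacement, assuming a dplyr version that provides across(); the name fixed is hypothetical, and the column range ind1:ind2 is chosen here to match the example data:

```r
library(dplyr)

# Rebuild the example data from the question
dataDemo <- data.frame(POS = 1:4, REF = c("A", "T", "G", "C"),
                       ind1 = c("A", ".", "G", "C"),
                       ind2 = c("A", "C", "C", "."),
                       stringsAsFactors = FALSE)

# For each of ind1..ind2, take REF where the cell is ".", else keep the cell
fixed <- dataDemo %>%
  mutate(across(ind1:ind2, ~ ifelse(.x == ".", REF, .x)))
print(fixed)
```

Because stringsAsFactors = FALSE keeps the columns as character vectors, no as.character() conversion is needed inside ifelse() here.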
+8

Here is an option using set from data.table, which should be fast.

library(data.table)
setDT(dataDemo)
nm1 <- paste0("ind", 1:2)
for (j in nm1) {
    i1 <- dataDemo[[j]] == "."
    set(dataDemo, i = which(i1), j = j, value = dataDemo$REF[i1])
}

dataDemo
#   POS REF ind1 ind2
#1:   1   A    A    A
#2:   2   T    T    C
#3:   3   G    G    C
#4:   4   C    C    C

EDIT: based on @alexis_laz comments


Or using dplyr

library(dplyr)
dataDemo %>% 
    mutate_each(funs(ifelse(.==".", REF,.)), ind1:ind2)
#    POS REF ind1 ind2
#1   1   A    A    A
#2   2   T    T    C
#3   3   G    G    C
#4   4   C    C    C

Or we can use base R methods to do this in one line.

dataDemo[nm1] <- lapply(dataDemo[nm1], function(x) ifelse(x==".",  dataDemo$REF, x))
+4