Quickly replacing NA with 0 in a large data.frame: error or warning?

I have a big data.frame called "mat" with 49952 observations of 7597 variables, and I'm trying to replace the NAs with zeros. Here is an example of what my data.frame looks like:

    A  B  C  E  F  D  Q  Z ...
 1  1  1  0 NA NA  0 NA NA ...
 2  0  0  1 NA NA  0 NA NA ...
 3  0  0  0 NA NA  1 NA NA ...
 4 NA NA NA NA NA NA NA NA ...
 5  0  1  0  1 NA  0 NA NA ...
 6  1  1  1  0 NA  0 NA NA ...
 7  0  0  1  0 NA  1 NA NA ...
 ...

I need a very fast way to replace them. The result should look like this:

    A  B  C  E  F  D  Q  Z ...
 1  1  1  0  0  0  0  0  0 ...
 2  0  0  1  0  0  0  0  0 ...
 3  0  0  0  0  0  1  0  0 ...
 4  0  0  0  0  0  0  0  0 ...
 5  0  1  0  1  0  0  0  0 ...
 6  1  1  1  0  0  0  0  0 ...
 7  0  0  1  0  0  1  0  0 ...
 ...

I have already tried lapply(mat, function(x){replace(x, is.na(x), 0)}), which didn't work; mat[is.na(mat)] <- 0, which produced an error and may be too slow; and the approach from this link, which didn't work either.
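One possible reason the lapply() attempt "didn't work" (a guess, since the question doesn't say what went wrong): lapply() always returns a plain list, so the data.frame class is lost unless the result is assigned back into the existing frame with mat[] <- .... A minimal sketch on a small hypothetical data.frame named toy:

```r
# Hypothetical toy data.frame standing in for `mat`
toy <- data.frame(A = c(1, NA, 0), B = c(NA, NA, 1))

# lapply() returns a plain list, not a data.frame
res <- lapply(toy, function(x) replace(x, is.na(x), 0))
class(res)   # "list"

# Assigning into toy[] keeps the data.frame class, names and row names
toy[] <- res
class(toy)   # "data.frame"
anyNA(toy)   # FALSE
```

This avoids the rbind.fill() round trip entirely, because the list elements are written straight back into the columns of the original frame.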

@Sotos suggested plyr::rbind.fill(lapply(L, as.data.frame)), but that didn't work: it produces a data.frame with 379485344 observations of 1 variable (49952 × 7597), which I would then have to reshape back. Is there a better way to do this?

The real structure of my data.frame:

 > str(mat)
 'data.frame': 49952 obs. of 7597 variables:
  $ 6794602 : num 1 NA NA NA NA 0 0 0 0 0 ...
  $ 1008667 : num NA 1 0 NA NA 0 0 0 0 0 ...
  $ 8009082 : num NA 0 1 NA NA NA NA NA NA NA ...
  $ 6740421 : num NA NA NA 1 NA 0 0 0 0 0 ...
  $ 6777805 : num NA NA NA NA 1 NA NA NA NA NA ...
  $ 1001682 : num NA NA NA NA NA 0 0 0 0 0 ...
  $ 1001990 : num NA NA NA NA NA 0 0 0 0 0 ...
  $ 1002541 : num NA NA NA NA NA 0 0 0 0 0 ...
  $ 1002790 : num NA NA NA NA NA 0 0 0 0 0 ...

Note:

When I ran mat[is.na(mat)] <- 0, I got a warning:

 > mat[is.na(mat)] <- 0
 Warning messages:
 1: In `[<-.factor`(`*tmp*`, thisvar, value = 0) :
   invalid factor level, NA generated
 2: In `[<-.factor`(`*tmp*`, thisvar, value = 0) :
   invalid factor level, NA generated
 > nlevels(mat)
 [1] 0

The data.frame after running mat[is.na(mat)] <- 0:

 > str(mat)
 'data.frame': 49952 obs. of 7597 variables:
  $ 6794602 : num 1 0 0 0 0 0 0 0 0 0 ...
  $ 1008667 : num 0 1 0 0 0 0 0 0 0 0 ...
  $ 8009082 : num 0 0 1 0 0 0 0 0 0 0 ...
  $ 6740421 : num 0 0 0 1 0 0 0 0 0 0 ...
  $ 6777805 : num 0 0 0 0 1 0 0 0 0 0 ...
  $ 1001682 : num 0 0 0 0 0 0 0 0 0 0 ...
  $ 1001990 : num 0 0 0 0 0 0 0 0 0 0 ...
  $ 1002541 : num 0 0 0 0 0 0 0 0 0 0 ...
  $ 1002790 : num 0 0 0 0 0 0 0 0 0 0 ...

So the questions are:

  • Is there another fast way to replace the NAs?
  • Is the warning a big deal? The data after running mat[is.na(mat)] <- 0 looks the way I want, but there are too many values for me to check them all by hand.
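Rather than eyeballing the values, the result can be checked programmatically. The sketch below uses a small hypothetical data.frame whose column b is a factor without a "0" level; it reproduces the "invalid factor level, NA generated" warning and shows that the NA in that factor column is left in place. The variable names (mat, msgs) and the toy data are illustrative assumptions, not taken from the real dataset.

```r
# Hypothetical toy data: `b` is a factor whose levels ("1", "2") do not include "0"
mat <- data.frame(a = c(1, NA, 0),
                  b = factor(c("1", NA, "2")))

# Run the replacement, recording warnings instead of printing them
msgs <- character(0)
withCallingHandlers(
  mat[is.na(mat)] <- 0,
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

msgs                                # "invalid factor level, NA generated"
sum(is.na(mat))                     # 1: the NA in the factor column survived
names(mat)[sapply(mat, is.factor)]  # "b": the columns that need converting
```

On the real data, sum(is.na(mat)) (or anyNA(mat)) after the replacement, together with the list of factor columns, shows whether any NAs were silently left behind.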
r dataframe na
2 answers

Try the following:

 library(magrittr)  # provides the %>% pipe (dplyr re-exports it as well)
 mat <- mat %>% replace(is.na(.), 0)
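The pipe only passes mat in as the first argument (and as . inside is.na(.)), so the same call can be written in base R without loading any package. A minimal sketch on a small hypothetical data.frame:

```r
# Hypothetical toy data standing in for `mat`
mat <- data.frame(A = c(1, 0, NA), B = c(NA, 1, NA))

# Same as mat %>% replace(is.na(.), 0): replace() assigns 0 at every
# position where the logical matrix is.na(mat) is TRUE
mat <- replace(mat, is.na(mat), 0)
mat
```

Note that this goes through the same data.frame assignment as mat[is.na(mat)] <- 0, so factor columns will trigger the same warning and keep their NAs.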

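If you suspect that some of your columns are factors (the "invalid factor level, NA generated" warning suggests they are: assigning 0 to a factor that has no "0" level yields NA, so the NAs in those columns are not replaced), you can use the following code to detect them and convert them to numeric.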

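 inx <- sapply(mat, inherits, "factor")  # TRUE for every factor column
 # Convert via the labels; as.numeric() alone would return the level codes
 mat[inx] <- lapply(mat[inx], function(x) as.numeric(as.character(x)))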
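The as.character() step matters: as.numeric() on a factor returns its internal level codes, not the values printed in the labels. A small illustration with a hypothetical factor f:

```r
f <- factor(c("10", "5", NA, "10"))   # levels sort as "10", "5"

as.numeric(f)                 # 1 2 NA 1   : the level codes
as.numeric(as.character(f))   # 10 5 NA 10 : the original values
```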

Then try the following.

 mat[] <- lapply(mat, function(x) {x[is.na(x)] <- 0; x})
 mat
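The two steps above can be wrapped into one helper and followed by a check that no NA is left. This is a sketch; the function name na_to_zero and the toy data are hypothetical.

```r
# Hypothetical helper combining the two steps from this answer
na_to_zero <- function(df) {
  # Step 1: convert factor columns to numeric through their labels
  inx <- vapply(df, is.factor, logical(1))
  if (any(inx)) {
    df[inx] <- lapply(df[inx], function(x) as.numeric(as.character(x)))
  }
  # Step 2: replace NAs column by column; df[] keeps the data.frame class
  df[] <- lapply(df, function(x) { x[is.na(x)] <- 0; x })
  df
}

# Toy data with a factor column that lacks a "0" level
mat <- data.frame(a = c(1, NA, 0), b = factor(c("1", NA, "2")))
out <- na_to_zero(mat)
anyNA(out)   # FALSE
```

If a factor column has labels that are not numbers, as.numeric() turns them into NA (with a coercion warning) and step 2 then sets them to 0, so such columns should be inspected before converting.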

And here is the data.

 mat <- structure(list(
     A = c(1L, 0L, 0L, NA, 0L, 1L, 0L),
     B = c(1L, 0L, 0L, NA, 1L, 1L, 0L),
     C = c(0L, 1L, 0L, NA, 0L, 1L, 1L),
     E = c(NA, NA, NA, NA, 1L, 0L, 0L),
     F = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_),
     D = c(0L, 0L, 1L, NA, 0L, 0L, 1L),
     Q = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_),
     Z = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_)),
   .Names = c("A", "B", "C", "E", "F", "D", "Q", "Z"),
   row.names = c("1", "2", "3", "4", "5", "6", "7"),
   class = "data.frame")
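To confirm that the lapply() approach gives exactly the result asked for in the question, the sample data can be compared with the expected table. In this sketch the data is rebuilt compactly with all columns stored as double (the values are the same), and expected is a hypothetical name for the target table transcribed row by row from the question.

```r
# Sample data from this answer, rebuilt compactly (same values)
mat <- data.frame(
  A = c(1, 0, 0, NA, 0, 1, 0),
  B = c(1, 0, 0, NA, 1, 1, 0),
  C = c(0, 1, 0, NA, 0, 1, 1),
  E = c(NA, NA, NA, NA, 1, 0, 0),
  F = NA_real_,
  D = c(0, 0, 1, NA, 0, 0, 1),
  Q = NA_real_,
  Z = NA_real_
)

mat[] <- lapply(mat, function(x) { x[is.na(x)] <- 0; x })

# Expected result from the question, columns A B C E F D Q Z
expected <- matrix(c(
  1, 1, 0, 0, 0, 0, 0, 0,
  0, 0, 1, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 1, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0,
  0, 1, 0, 1, 0, 0, 0, 0,
  1, 1, 1, 0, 0, 0, 0, 0,
  0, 0, 1, 0, 0, 1, 0, 0
), nrow = 7, byrow = TRUE)

all(as.matrix(mat) == expected)   # TRUE
```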
