Replace <NA> in a factor column in R
I want to replace the <NA> values in a factor column with a valid value, but I cannot find a way. This example is for demonstration purposes only; the real data comes from an external CSV file that I have to deal with.
    df <- data.frame(a=sample(0:10, size=10, replace=TRUE), b=sample(20:30, size=10, replace=TRUE))
    df[df$a==0,'a'] <- NA
    df$a <- as.factor(df$a)

It might look like this:

          a  b
    1     1 29
    2     2 23
    3     3 23
    4     3 22
    5     4 28
    6  <NA> 24
    7     2 21
    8     4 25
    9  <NA> 29
    10    3 24

Now I want to replace the <NA> values with a number.

    df[is.na(df$a), 'a'] <- 88
    In `[<-.factor`(`*tmp*`, iseq, value = c(88, 88)) :
      invalid factor level, NA generated

I think I am missing a fundamental R concept about factors. Am I? I do not understand why this does not work. I think "invalid factor level" means that 88 is not a valid level of this factor, right? So do I have to tell the factor column that there is another level?
1) addNA

If fac is a factor, addNA(fac) is the same factor but with NA added as a level. See ?addNA

To relabel the NA level as 88:

    facna <- addNA(fac)
    levels(facna) <- c(levels(fac), 88)

giving:

    > facna
    [1] 1 2 3 3 4 88 2 4 88 3
    Levels: 1 2 3 4 88

1a) This can be written on one line as follows:

    `levels<-`(addNA(fac), c(levels(fac), 88))

2) factor

It can also be done in one line using various arguments of factor, as follows:

    factor(fac, levels = levels(addNA(fac)), labels = c(levels(fac), 88), exclude = NULL)

2a) or equivalently:

    factor(fac, levels = c(levels(fac), NA), labels = c(levels(fac), 88), exclude = NULL)

3) ifelse

Another approach:

    factor(ifelse(is.na(fac), 88, paste(fac)), levels = c(levels(fac), 88))

Note: We used the following as the input fac:

    fac <- structure(c(1L, 2L, 3L, 3L, 4L, NA, 2L, 4L, NA, 3L), .Label = c("1", "2", "3", "4"), class = "factor")

Update: Improved (1) and added (1a).
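For what it is worth, approach (1) can presumably be applied directly to a data frame column as well. The following is only a sketch: the small data frame is made up for illustration (fixed values instead of sample()), and old_levels is just a helper name.

```r
# Hypothetical data frame with fixed values, standing in for the question's df
df <- data.frame(a = c(1, 2, NA, 3, NA), b = c(29, 23, 24, 21, 25))
df$a <- as.factor(df$a)

# Remember the existing levels, turn NA into a real level, then relabel it
old_levels <- levels(df$a)            # "1" "2" "3"
df$a <- addNA(df$a)                   # levels are now "1" "2" "3" NA
levels(df$a) <- c(old_levels, "88")   # the NA level is renamed to "88"

print(df$a)
```

Note that addNA() adds the NA level even when the column contains no missing values, so the relabelled factor always ends up with an "88" level.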
The basic concept of a factor variable is that it can take only certain values, i.e. its levels. A value that is not among the levels is invalid.
You have two options:
If your variable follows this concept, be sure to define all levels when you create it, even those that do not yet have any matching values.
Or make the variable a character variable and work with that.
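A minimal sketch of what these two options might look like in practice. The vector x and the names f, s and g are made up for illustration.

```r
x <- c(1, 2, NA, 3, NA)   # made-up values for illustration

# Option 1: declare every level up front, including "88", which is not used yet
f <- factor(x, levels = c(1, 2, 3, 88))
f[is.na(f)] <- "88"       # accepted, because "88" is a declared level

# Option 2: work with a character vector and convert to a factor only at the end
s <- as.character(x)
s[is.na(s)] <- "88"
g <- factor(s)
```

Both routes end with the same values; the difference is whether the set of allowed levels is fixed in advance (option 1) or derived from the data afterwards (option 2).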
PS: Problems like this often arise from data import. For example, what you show looks like it should be a numeric variable, not a factor variable.
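To illustrate the import point, here is a sketch that assumes the CSV column really is numeric with some empty fields. The CSV content is invented for this example; in practice a file path would take the place of the text argument.

```r
# Hypothetical CSV content; a real call would pass a file name instead
csv_text <- "a,b\n1,29\n2,23\n,24\n3,21\n,25"

# Blank fields in a numeric column are read as NA
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(df)

# Replace the missing values while the column is still numeric ...
df$a[is.na(df$a)] <- 88
# ... and convert to a factor only afterwards, if a factor is needed at all
df$a <- factor(df$a)
```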
The problem is that NA is not a level of this factor:

    > levels(df$a)
    [1] "2" "4" "5" "9" "10"

You cannot change it directly, but the following will do the trick:

    df$a <- as.numeric(as.character(df$a))
    df[is.na(df$a),1] <- 88
    df$a <- as.factor(df$a)
    > df$a
    [1] 9 88 3 9 5 9 88 8 3 9
    Levels: 3 5 8 9 88
    > levels(df$a)
    [1] "3" "5" "8" "9" "88"
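The detour through as.character() in the code above matters: calling as.numeric() on a factor directly returns the internal integer codes, not the numbers the labels spell out. A small sketch with made-up values (f, x and f2 are illustrative names):

```r
f <- factor(c("9", "3", NA, "5"))   # made-up values; levels are "3" "5" "9"

as.numeric(f)                 # internal integer codes: 3 1 NA 2
as.numeric(as.character(f))   # the values the labels spell out: 9 3 NA 5

# The same round trip as above, applied to this small factor
x <- as.numeric(as.character(f))
x[is.na(x)] <- 88
f2 <- as.factor(x)            # levels: "3" "5" "9" "88"
```

This route only makes sense when every level looks like a number; a non-numeric label would become NA in the as.numeric() step.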