Replace <NA> in the factor column in R

I want to replace the <NA> values in a factor column with a valid value, but I cannot find a way. This example is for demonstration purposes only; the real data comes from an external CSV file that I have to deal with.

 df <- data.frame(a=sample(0:10, size=10, replace=TRUE),
                  b=sample(20:30, size=10, replace=TRUE))
 df[df$a==0,'a'] <- NA
 df$a <- as.factor(df$a)

It might look like this:

       a  b
 1     1 29
 2     2 23
 3     3 23
 4     3 22
 5     4 28
 6  <NA> 24
 7     2 21
 8     4 25
 9  <NA> 29
 10    3 24

Now I want to replace the <NA> values with a number.

 df[is.na(df$a), 'a'] <- 88
 In `[<-.factor`(`*tmp*`, iseq, value = c(88, 88)) :
   invalid factor level, NA generated

I think I have missed a fundamental concept of factors in R. Have I? I do not understand why this does not work. I assume "invalid factor level" means that 88 is not a valid level of this factor, right? So do I have to tell the factor column that there is another level?

+6
4 answers

1) addNA. If fac is a factor, addNA(fac) is the same factor but with NA added as a level. See ?addNA.

To relabel that NA level as 88:

 facna <- addNA(fac)
 levels(facna) <- c(levels(fac), 88)

giving:

 > facna
 [1] 1 2 3 3 4 88 2 4 88 3
 Levels: 1 2 3 4 88

1a) This can be written on one line as follows:

 `levels<-`(addNA(fac), c(levels(fac), 88)) 

2) factor. The same can be done in a single call by using the arguments of factor as follows:

 factor(fac, levels = levels(addNA(fac)), labels = c(levels(fac), 88), exclude = NULL) 

2a) or, equivalently:

 factor(fac, levels = c(levels(fac), NA), labels = c(levels(fac), 88), exclude = NULL) 

3) ifelse. Another approach:

 factor(ifelse(is.na(fac), 88, paste(fac)), levels = c(levels(fac), 88)) 

Note: We used the following as the input fac:

 fac <- structure(c(1L, 2L, 3L, 3L, 4L, NA, 2L, 4L, NA, 3L), .Label = c("1", "2", "3", "4"), class = "factor") 

Update: Improved (1) and added (1a).
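As a rough sketch of how approach (1) could be applied to the data frame column from the question rather than to a standalone fac: the data frame below is a fixed stand-in with made-up values (the question uses sample(), so its contents vary), and old_levels and a_new are just illustrative names.

```r
# Deterministic stand-in for the question's data frame (values are made up).
df <- data.frame(a = c(1, 2, 3, 3, 4, NA, 2, 4, NA, 3),
                 b = c(29, 23, 23, 22, 28, 24, 21, 25, 29, 24))
df$a <- as.factor(df$a)

# Approach (1): make NA an explicit level, then relabel that level as "88".
old_levels <- levels(df$a)
a_new <- addNA(df$a)
levels(a_new) <- c(old_levels, "88")
df$a <- a_new

levels(df$a)         # the original levels followed by "88"
any(is.na(df$a))     # no missing values remain in the column
```

The original levels are saved before calling addNA so that the relabelling vector has exactly one entry per level, with "88" taking the place of the NA level at the end.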

+14

The basic concept of a factor variable is that it can take only certain values, i.e. its levels. A value that is not among the levels is invalid.

You have two options:

If you have a variable that follows this concept, be sure to define all levels when you create it, even those that have no corresponding values in the data yet.

Or make the variable a character variable and work with that.
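The two options could look roughly like this. This is only a sketch on a small made-up vector x; the names f and s are illustrative and not from the question.

```r
x <- c(1, 2, NA, 4)

# Option 1: declare every level up front, including "88",
# which does not occur in the data yet.
f <- factor(x, levels = c(1, 2, 4, 88))
f[is.na(f)] <- "88"    # allowed, because "88" is a declared level

# Option 2: work with a character vector and convert to a factor
# only at the end, if a factor is needed at all.
s <- as.character(x)
s[is.na(s)] <- "88"
```

With option 1 the assignment no longer triggers the "invalid factor level" warning, because the replacement value is already one of the levels.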

PS: Often these problems arise due to data import. For example, what you are showing seems to be a numeric variable, not a factor variable.
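If the column really is numeric, one possible way to handle it at import time is sketched below. The CSV text is a made-up stand-in for the external file, and imported is a hypothetical name; the real file may of course need different read.csv arguments.

```r
# Hypothetical CSV content standing in for the external file.
csv_text <- "a,b\n1,29\n2,23\nNA,24\n4,25\n"

# read.csv treats the string "NA" as missing by default.
# stringsAsFactors = FALSE is spelled out so that character columns are
# not turned into factors on R versions before 4.0.0.
imported <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(imported$a)    # read as a numeric (integer) column, not a factor

# Replace the missing values while the column is still numeric,
# and convert to a factor only afterwards, if one is needed.
imported$a[is.na(imported$a)] <- 88L
imported$a <- factor(imported$a)
```

Doing the replacement before the conversion avoids the level problem entirely, since factor() then sees 88 as an ordinary value.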

+2

The problem is that NA is not a level of this factor, and neither is 88:

 > levels(df$a)
 [1] "2" "4" "5" "9" "10"

You cannot assign the new value directly, but the following will do the trick:

 df$a <- as.numeric(as.character(df$a))
 df[is.na(df$a),1] <- 88
 df$a <- as.factor(df$a)
 > df$a
 [1] 9 88 3 9 5 9 88 8 3 9
 Levels: 3 5 8 9 88
 > levels(df$a)
 [1] "3" "5" "8" "9" "88"
+2

Another way to do this:

 # check levels
 levels(df$a)
 # [1] "3" "4" "7" "9" "10"

 # add a new factor level, i.e. 88 in our example
 df$a = factor(df$a, levels=c(levels(df$a), 88))

 # convert all NA to 88
 df$a[is.na(df$a)] = 88

 # check levels again
 levels(df$a)
 # [1] "3" "4" "7" "9" "10" "88"
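If this is needed for several columns, the same two steps could be wrapped in a small helper. replace_na_level is a hypothetical name, not a base R function, and the sketch assumes the input is already a factor.

```r
# Hypothetical helper: add `value` as a level if it is missing,
# then replace every NA in the factor with it.
replace_na_level <- function(f, value) {
  stopifnot(is.factor(f))
  value <- as.character(value)
  if (!(value %in% levels(f))) {
    levels(f) <- c(levels(f), value)   # append the new level at the end
  }
  f[is.na(f)] <- value
  f
}

fac <- factor(c(1, 2, 3, 3, 4, NA, 2, 4, NA, 3))
res <- replace_na_level(fac, 88)
levels(res)    # the original levels followed by "88"
```

The helper returns a new factor and leaves its argument unchanged, so the result has to be assigned back, e.g. df$a <- replace_na_level(df$a, 88).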
0