How to combine two levels in one categorical variable in R

I'm learning R and am having trouble finding the right command.

I have a categorical variable:

levels(job)
[1] "admin."        "blue-collar"   "entrepreneur"  "housemaid"    
[5] "management"    "retired"       "self-employed" "services"     
[9] "student"       "technician"    "unemployed"    "unknown"

Now I want to collapse these levels into fewer groups, for example:

levels(job) 
[1] "class1"  "class2" "class3" "unknown"

where class1 includes "admin.", "entrepreneur" and "self-employed"; class2 includes "blue-collar", "management" and "technician"; class3 includes "housemaid", "student", "retired" and "services"; and unknown includes "unknown" and "unemployed".

Which command can I use for this? Thank you, Jan

4 answers

You can assign levels:

levels(z)[levels(z)%in%c("unemployed","unknown","self-employed")] <- "unknown"

This is documented in the help file; see ?levels.


Stealing from @akrun's answer, you can do this most cleanly with a hash / list:

ha <- list(
  unknown = c("unemployed","unknown","self-employed"),
  class1  = c("admin.","management")
)

# seq_along() is safe even if the list is empty, unlike 1:length(ha)
for (i in seq_along(ha)) levels(z)[levels(z) %in% ha[[i]]] <- names(ha)[i]
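
The loop above can also be written as a single assignment, because `levels<-` for factors accepts a named list directly (this is covered in ?levels). The sketch below applies that to the grouping from the question; the one-value-per-level sample data and the names `groups` and `job2` are illustrative assumptions, not part of the original answer.

```r
# Illustrative data: one observation per original level
v1 <- c("admin.", "blue-collar", "entrepreneur", "housemaid",
        "management", "retired", "self-employed", "services",
        "student", "technician", "unemployed", "unknown")
job <- factor(v1)

# Each name is a new level; each element lists the old levels it absorbs
groups <- list(
  class1  = c("admin.", "entrepreneur", "self-employed"),
  class2  = c("blue-collar", "management", "technician"),
  class3  = c("housemaid", "student", "retired", "services"),
  unknown = c("unknown", "unemployed")
)

job2 <- job              # work on a copy so the original factor is kept
levels(job2) <- groups   # any old level not listed in the list becomes NA

levels(job2)
table(job, job2)         # cross-tabulate to check the mapping
```

Note that, unlike the loop, this form turns any level missing from the list into NA, so every old level should appear somewhere in `groups`.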

One option is to build a named vector of key/value pairs and use it as a lookup table for recoding:

indx <-  setNames(rep(c(paste0('type',1:3), 'unknown'), c(3,3,4,2)), 
      c(levels(job)[c(1,3,7)], levels(job)[c(2,5,10)], 
      levels(job)[c(4,6,8,9)], levels(job)[c(11,12)]))

factor(unname(indx[as.character(job)]))

Data:

v1 <- c('admin.', 'blue-collar', 'entrepreneur', 'housemaid',
'management', 'retired', 'self-employed', 'services', 'student', 
'technician', 'unemployed', 'unknown')
set.seed(24)
job <- factor(sample(v1, 50, replace=TRUE))

Another option is the recode function from the car package.
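
As a rough sketch of how that could look for the grouping in the question, assuming the car package is installed: `car::recode()` takes the variable and a single string of semicolon-separated recode rules. The sample data and the name `job_car` are illustrative assumptions.

```r
# Illustrative data: one observation per original level
v1 <- c("admin.", "blue-collar", "entrepreneur", "housemaid",
        "management", "retired", "self-employed", "services",
        "student", "technician", "unemployed", "unknown")
job <- factor(v1)

# Only run if car is available; install.packages("car") otherwise
if (requireNamespace("car", quietly = TRUE)) {
  # Rules are separated by semicolons; values inside the string use single quotes
  job_car <- car::recode(job,
    "c('admin.','entrepreneur','self-employed')='class1';
     c('blue-collar','management','technician')='class2';
     c('housemaid','student','retired','services')='class3';
     c('unknown','unemployed')='unknown'")
  print(table(job, job_car))   # cross-tabulate to check the mapping
}
```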


A base R approach: convert the variable to character, replace the values, then convert it back with factor().

job <- as.character(job)
job[job %in% c("admin.","entrepreneur","self-employed")] <- "class1"
# ... do the same for the other classes
job <- factor(job)
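
For reference, here is the same approach spelled out for all four groups from the question. This is only a sketch: the sample data and the names `job_chr` and `job_new` are illustrative, and the explicit `levels` argument is an optional addition that fixes the level order.

```r
# Illustrative data: two observations per original level
v1 <- c("admin.", "blue-collar", "entrepreneur", "housemaid",
        "management", "retired", "self-employed", "services",
        "student", "technician", "unemployed", "unknown")
job <- factor(rep(v1, 2))

job_chr <- as.character(job)   # work on plain strings
job_chr[job_chr %in% c("admin.", "entrepreneur", "self-employed")]       <- "class1"
job_chr[job_chr %in% c("blue-collar", "management", "technician")]       <- "class2"
job_chr[job_chr %in% c("housemaid", "student", "retired", "services")]   <- "class3"
job_chr[job_chr %in% c("unknown", "unemployed")]                         <- "unknown"

# Convert back to a factor with an explicit level order
job_new <- factor(job_chr, levels = c("class1", "class2", "class3", "unknown"))
table(job_new)
```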

There is also irec() from the questionr package, which provides an interactive interface for recoding.
