R: ifelse function returns vector position instead of value (string)

I have a very strange problem with the ifelse function: it does not return the factor value (as I want), but something like the position of the factor level.

The dataset can be downloaded here.

What I want

... is to create a new column in df that contains the name of the country if the country is among the 12 most frequent countries (in the "answer" column). Otherwise, it should contain "Other".

What I've done

... is:

  • Create a list with the most common country names using as.data.frame(summary(...)) etc. ## this works
  • The condition part of ifelse matches the df$col value against this list using %in% ## this also works
  • The return value if TRUE should be a factor (country name) in this

but

... R returns something really strange: it returns the position of the factor level (from 1 to 181) for the top 10 countries, and "Other" for the rest (which is correct). This line returns the wrong value:

aDDs$answer, ## then it should be named as aDDs$answer **THIS IS THE PROBLEM** 

Code I use:

 ## create a list with most frequent country names
 temp <- row.names(as.data.frame(summary(aDDs$answer, max=12))) # create a df or something else with the summary output.
 colnames(temp)[1]="freq"
 "India" %in% temp # check if it works (yes)

 ## create new column that filters top results
 aDDs$top <- ifelse(
   aDDs$answer %in% temp, ## condition: match aDDs$answer with row.names in summary df
   aDDs$answer,           ## then it should be named as aDDs$answer **THIS IS THE PROBLEM**
   "Other"                ## else it should be named "Other"
 )
 View(aDDs)

PS. This is a follow-up to this question; it is slightly different, so a separate question seemed appropriate.

+8
r if-statement
3 answers

The answer column is a factor, so ifelse returns a number (the integer code of the factor level) instead of the label.

What you need to do:

 aDDs$answer <- as.character(aDDs$answer) 

and then it works.
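A minimal sketch of this fix on toy data. The data frame `df` and the vector `top` are made up for illustration and only stand in for `aDDs` and `temp` from the question:

```r
# Toy stand-ins for the question's aDDs and temp (hypothetical data)
df <- data.frame(answer = factor(c("India", "USA", "Chile", "India", "Peru")))
top <- c("India", "USA")

# Convert the factor to character so ifelse() picks up labels, not integer codes
df$answer <- as.character(df$answer)

df$top <- ifelse(df$answer %in% top, df$answer, "Other")
df$top
# [1] "India" "USA"   "Other" "India" "Other"
```

If the list of top countries is computed with summary() on the factor, as in the question, that step has to run before the conversion, because summary() of a character vector does not return per-value counts.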

+12

This is because you have a factor:

 ifelse(c(T, F), factor(c("a", "b")), "other")
 #[1] "1"     "other"

Read the warning in help("ifelse") :

The mode of the result may depend on the value of test (see the examples), and the class attribute (see oldClass) of the result is taken from test and may be inappropriate for the values selected from yes and no.

Sometimes it is better to use a construction such as

(tmp <- yes; tmp[!test] <- no[!test]; tmp), possibly extended to handle missing values in test.
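Applied to the problem in the question, that construction could look roughly like the sketch below. The vectors `answer` and `top` are hypothetical toy data, and a single "Other" string is assigned instead of indexing a full-length `no` vector:

```r
# Hypothetical toy data
answer <- factor(c("India", "USA", "Chile", "India"))
top <- c("India", "USA")

test <- answer %in% top

# Start from the "yes" values as character, then overwrite where test is FALSE
tmp <- as.character(answer)
tmp[!test] <- "Other"
tmp
# [1] "India" "USA"   "Other" "India"
```

The conversion to character is still needed here: assigning "Other" into a factor that has no such level would produce NA with a warning.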

+3

Change ifelse as follows

 aDDs$top <- ifelse(
   aDDs$answer %in% temp,              ## condition: match aDDs$answer with row.names in summary df
   levels(aDDs$answer)[aDDs$answer],   ## then: look up the level label instead of the integer code
   "Other"                             ## else it should be named "Other"
 )

Pay attention to the levels function and the square brackets. levels() returns the character vector of level labels, and a factor used as an index is treated as its integer codes. So, in essence, we are asking for the label that corresponds to each element's code.

Demo example:

 topCountries <- as.factor(c("India", "USA", "UK"))
 AllCountries <- as.factor(c("India", "USA", "UK", "China", "Brazil"))
 myData <- data.frame(AllCountries)
 myData

 myData$top <- ifelse(
   myData$AllCountries %in% topCountries,
   levels(myData$AllCountries)[myData$AllCountries],
   "Other"
 )
 myData

The top column in myData will contain "Other" for China and Brazil. For rows where AllCountries is one of {India, USA, UK}, it will contain the corresponding label, that is {India, USA, UK}. Without levels, it would contain "Other" and a factor index for {India, USA, UK}.
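The indexing trick used in this answer can be checked on a small factor. This is only an illustrative sketch; the factor `f` is made up:

```r
f <- factor(c("India", "USA", "UK", "China", "Brazil"))

levels(f)       # the level labels
as.integer(f)   # integer codes: position of each element's label in levels(f)
levels(f)[f]    # the factor is used as integer indices into the labels

# Indexing the labels by the factor gives the same result as as.character()
identical(levels(f)[f], as.character(f))
# [1] TRUE
```

In other words, levels(x)[x] and as.character(x) are two ways of getting the labels back from a factor.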

0