Creating a new variable from a lookup table

I have the following columns in my dataset:

 presult  aresult
 I        single
 I        double
 I        triple
 I        home run
 SS       strikeout

I would like to add a third column, "base", whose value depends on the result in the aresult column.

For example, I would like base to equal 1 for a single, 2 for a double, 3 for a triple, 4 for a home run and 0 for a strikeout.

Normally I would create a new variable as follows:

 dataset$base<-ifelse(dataset$aresult=="single", 1, 0) 

The problem is that I don't know how to encode the remaining values without overwriting the ones that are already set.

+7
r dataframe
4 answers

Define a lookup table:

 lookup <- data.frame(
   base    = c(0, 1, 2, 3, 4),
   aresult = c("strikeout", "single", "double", "triple", "home run")
 )

Then use join from plyr:

 library(plyr)
 dataset <- join(dataset, lookup, by = "aresult")
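If plyr is not installed, the same lookup can probably be done in base R with match(). The following is only a sketch: the sample `dataset` is made up for illustration and simply mirrors the columns in the question.

```r
# Hypothetical sample data mirroring the question's columns
dataset <- data.frame(
  presult = c("I", "I", "I", "I", "SS"),
  aresult = c("single", "double", "triple", "home run", "strikeout"),
  stringsAsFactors = FALSE
)

# Lookup table: one row per result, with the base value it maps to
lookup <- data.frame(
  base    = c(0, 1, 2, 3, 4),
  aresult = c("strikeout", "single", "double", "triple", "home run"),
  stringsAsFactors = FALSE
)

# match() gives, for each row of dataset, the position of its aresult
# in lookup$aresult; indexing lookup$base by that position yields the base
dataset$base <- lookup$base[match(dataset$aresult, lookup$aresult)]
dataset
```

Unlike merge(), this keeps the rows of `dataset` in their original order, and any result missing from the lookup table ends up as NA.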
+14

Here's how to use a named vector as a lookup:

Define test data:

 dat <- data.frame(
   presult = c(rep("I", 4), "SS", "ZZ"),
   aresult = c("single", "double", "triple", "home run", "strikeout", "home run"),
   stringsAsFactors = FALSE
 )

Define a named numeric vector holding the scores:

 score <- c(single=1, double=2, triple=3, `home run`=4, strikeout=0) 

Use vector indexing to look up the score for each result:

 dat$base <- score[dat$aresult]
 dat
   presult   aresult base
 1       I    single    1
 2       I    double    2
 3       I    triple    3
 4       I  home run    4
 5      SS strikeout    0
 6      ZZ  home run    4
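One thing worth checking with this approach is what happens to a result that has no entry in the named vector. The sketch below uses a made-up value, "walk", which does not appear in the question's data, to show that indexing by an unknown name returns NA rather than raising an error.

```r
score <- c(single = 1, double = 2, triple = 3, `home run` = 4, strikeout = 0)

# "walk" is a hypothetical result with no entry in score
aresult <- c("single", "walk", "home run")

# Indexing by name carries the names along; unname() drops them
base <- unname(score[aresult])
base

# TRUE here means at least one result was missing from the lookup
anyNA(base)
```

Running anyNA() on the new column after the lookup is a cheap way to catch misspelled or unexpected results.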

Additional Information:

If you would rather not type out the named vector by hand, for instance when there are many values, build it in two steps:

 score <- c(1:4, 0)
 names(score) <- c("single", "double", "triple", "home run", "strikeout")

(Or read the values and names from existing data. The goal is to build a numeric vector and then assign names to it.)
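As a sketch of reading the values and names from existing data: assuming the mapping already lives in a two-column data frame (here called `lookup`, with the same layout as the lookup table in the first answer), setNames() builds the named vector in one step.

```r
# Assumed to exist already, e.g. read from a file; built inline here
lookup <- data.frame(
  base    = c(0, 1, 2, 3, 4),
  aresult = c("strikeout", "single", "double", "triple", "home run"),
  stringsAsFactors = FALSE
)

# setNames(values, names) returns the values with the names attached
score <- setNames(lookup$base, lookup$aresult)
score[["home run"]]
```

The resulting `score` vector can then be indexed by result name in the same way as a hand-written one.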

+13

An alternative to Dieter's answer:

 dat <- data.frame(
   presult = c(rep("I", 4), "SS", "ZZ"),
   aresult = c("single", "double", "triple", "home run", "strikeout", "home run"),
   stringsAsFactors = FALSE
 )
 dat$base <- as.integer(factor(dat$aresult,
   levels = c("strikeout", "single", "double", "triple", "home run"))) - 1
+2
  dataset$base <- as.integer(as.factor(dataset$aresult)) 

Depending on your data, as.factor() may be omitted, since strings are often read in as factors by default, for example with read.table (this was the default in R versions before 4.0.0).
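Note that the integer codes of a factor follow the order of its levels, which by default is the sorted order of the unique values. The sketch below, on made-up input, compares the default codes with the codes obtained from explicitly ordered levels, so you can check which mapping you are actually getting.

```r
aresult <- c("single", "double", "triple", "home run", "strikeout")

# Default levels are the sorted unique values, so the codes follow
# alphabetical order rather than the number of bases
default_codes <- as.integer(as.factor(aresult))

# Explicit levels in base order; subtracting 1 maps strikeout to 0
ordered_codes <- as.integer(factor(
  aresult,
  levels = c("strikeout", "single", "double", "triple", "home run")
)) - 1L

data.frame(aresult, default_codes, ordered_codes)
```

With default levels, "double" sorts first and so receives code 1, which is why the levels need to be given explicitly if the codes are meant to be base counts.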

+1
