Fill a new column in a data frame with a lookup from a double matrix

I have a dataframe df:

    colour   shape
    'red'    circle
    'blue'   square
    'blue'   circle
    'green'  sphere

And a double matrix m with named rows and columns:

          circle square sphere
    red        1      4      7
    blue       2      5      8
    green      3      6      9

I would like to add a new column to df to get:

    id colour  shape
     1 'red'   circle
     5 'blue'  square
     2 'blue'  circle
     9 'green' sphere

I tried to do this with the following code, but it does not work:

 df$id <- m[df$colour,df$shape] 

I also tried apply() and the like, but had no luck. Can someone tell me the correct approach to this without using a loop?

+8
r
6 answers

I think I can win the short-answer contest here, provided these columns are character vectors rather than factors (factors are what you are more likely to get unless you make a special effort to avoid them). It really only adds cbind to convert the two "character" vectors of the data frame into the two-column matrix expected by the [ method for matrices, which you were very close to using successfully. (And that also seems reasonably expressive.)

    # Construct the data
    d <- data.frame(color=c('red','blue','blue','green'),
                    shape=c('circle','square','circle','sphere'),
                    stringsAsFactors=FALSE)
    m <- matrix(1:9, 3, 3, dimnames=list(c('red','blue','green'),
                                         c('circle','square','sphere')))

    # Code:
    d$id <- with(d, m[cbind(color, shape)])
    d
    #   color  shape id
    # 1   red circle  1
    # 2  blue square  5
    # 3  blue circle  2
    # 4 green sphere  9
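To see why the attempt in the question falls short, the following minimal sketch contrasts indexing with two separate vectors against indexing with a single two-column matrix. It rebuilds the same m as above; the variable names block and pairs are only illustrative.

```r
# Same lookup matrix as in the answer above
m <- matrix(1:9, 3, 3, dimnames = list(c("red", "blue", "green"),
                                       c("circle", "square", "sphere")))
colour <- c("red", "blue", "blue", "green")
shape  <- c("circle", "square", "circle", "sphere")

# Two separate index vectors select every row/column combination,
# so the result is a 4 x 4 block rather than 4 values
block <- m[colour, shape]
dim(block)   # 4 4

# A two-column character matrix selects one element per (row, column) pair
pairs <- m[cbind(colour, shape)]
pairs        # 1 5 2 9

# The wanted values are only the diagonal of the 4 x 4 block
diag(block)
```

Taking the diagonal of the block would also give the right answer, but it builds n x n elements for n rows, so the matrix index is the better choice for long data frames.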
+5

A fairly simple (and fast!) alternative is to use a matrix to index into your matrix:

    # Your data
    d <- data.frame(color=c('red','blue','blue','green'),
                    shape=c('circle','square','circle','sphere'))
    m <- matrix(1:9, 3, 3, dimnames=list(c('red','blue','green'),
                                         c('circle','square','sphere')))

    # Create index matrix - each row is a row/col index
    i <- cbind(match(d$color, rownames(m)), match(d$shape, colnames(m)))

    # Now use it and add as the id column...
    d2 <- cbind(id=m[i], d)

    d2
    #  id color  shape
    #1  1   red circle
    #2  5  blue square
    #3  2  blue circle
    #4  9 green sphere

The match function is used to find the corresponding numerical index for a particular string.
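As a small illustration of what match returns here, the sketch below looks up the question's values against the dimnames of the same m. The value "purple" is a made-up example of a string that is absent from the row names.

```r
m <- matrix(1:9, 3, 3, dimnames = list(c("red", "blue", "green"),
                                       c("circle", "square", "sphere")))

# Position of each colour among the row names of m
rows <- match(c("red", "blue", "blue", "green"), rownames(m))
rows   # 1 2 2 3

# Position of each shape among the column names of m
cols <- match(c("circle", "square", "circle", "sphere"), colnames(m))
cols   # 1 2 1 3

# A value that is not present gives NA rather than an error
missing_row <- match("purple", rownames(m))
missing_row   # NA
```

Because unmatched values come back as NA, it may be worth checking the index matrix with anyNA() before using it.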

Note that in newer versions of R (2.13 and newer, I think) you can use character strings in the index matrix. Unfortunately, the color and shape columns are usually factors, and cbind doesn't handle them the way you want (it uses their integer codes), so you need to coerce them using as.character:

 i <- cbind(as.character(d$color), as.character(d$shape)) 

...but I suspect that using match is more efficient.
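The factor pitfall can be made concrete with a short sketch. It assumes the columns were created with stringsAsFactors = TRUE (the default before R 4.0.0); the names codes, wrong and right are illustrative only.

```r
d <- data.frame(color = c("red", "blue", "blue", "green"),
                shape = c("circle", "square", "circle", "sphere"),
                stringsAsFactors = TRUE)   # assumption: columns are factors
m <- matrix(1:9, 3, 3, dimnames = list(c("red", "blue", "green"),
                                       c("circle", "square", "sphere")))

# cbind() on factors yields their integer codes, which follow the sorted
# factor levels, not the row/column order of m
codes <- cbind(d$color, d$shape)
wrong <- m[codes]

# Coercing to character makes the lookup go through the dimnames instead
right <- m[cbind(as.character(d$color), as.character(d$shape))]
right   # 1 5 2 9

identical(wrong, right)   # FALSE
```

The first lookup runs without any warning, which is what makes this mistake easy to miss.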

EDIT: I measured, and using match is apparently about 20% faster:

    # Make 1 million rows
    d <- d[sample.int(nrow(d), 1e6, TRUE), ]

    system.time({
      i <- cbind(match(d$color, rownames(m)), match(d$shape, colnames(m)))
      d2 <- cbind(id=m[i], d)
    }) # 0.46 secs

    system.time({
      i <- cbind(as.character(d$color), as.character(d$shape))
      d2 <- cbind(id=m[i], d)
    }) # 0.55 secs
+7

Another answer, using the reshape2 and plyr packages (plyr is optional, needed only for join).

    require(plyr)
    require(reshape2)

    Df <- data.frame(colour = c("red", "blue", "blue", "green"),
                     shape = c("circle", "square", "circle", "sphere"))

    Mat <- matrix(1:9, dimnames = list(c("red", "blue", "green"),
                                       c("circle", "square", "sphere")),
                  nrow = 3)

    Df2 <- melt.array(Mat, varnames = c("colour", "shape"))

    result <- join(Df, Df2)

    join(Df, Df2)
    # Joining by: colour, shape
    #   colour  shape value
    # 1    red circle     1
    # 2   blue square     5
    # 3   blue circle     2
    # 4  green sphere     9

Hope this helps.

+2

merge() is your friend here. To use it, we need an appropriate data frame to merge with, containing a stacked (long-format) version of your ID matrix. I create this as newdf with the following code:

    df <- data.frame(matrix(1:9, ncol = 3))
    colnames(df) <- c("circle","square","sphere")
    rownames(df) <- c("red","blue","green")

    newdf <- cbind.data.frame(ID = unlist(df),
                              expand.grid(colour = rownames(df),
                                          shape = colnames(df)))

Result:

    > newdf
            ID colour  shape
    circle1  1    red circle
    circle2  2   blue circle
    circle3  3  green circle
    square1  4    red square
    square2  5   blue square
    square3  6  green square
    sphere1  7    red sphere
    sphere2  8   blue sphere
    sphere3  9  green sphere

Then, with your original data in the object df2, defined using

    df2 <- data.frame(colour = c("red","blue","blue","green"),
                      shape = c("circle","square","circle","sphere"))

use merge():

    > merge(newdf, df2, sort = FALSE)
      colour  shape ID
    1    red circle  1
    2   blue circle  2
    3   blue square  5
    4  green sphere  9

You can save this and reorder the columns if needed:

    > res <- merge(newdf, df2, sort = FALSE)
    > res <- res[,c(3,1,2)]
    > res
      ID colour  shape
    1  1    red circle
    2  2   blue circle
    3  5   blue square
    4  9  green sphere
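Note that in the merged output above the rows no longer follow the order of the original data (blue/circle comes before blue/square). If the original order matters, one possible sketch is to carry a row counter through the merge and sort on it afterwards. The column name ord and the object name long are assumptions made for this illustration, and stringsAsFactors = FALSE is set so both sides of the merge hold character keys.

```r
m <- matrix(1:9, 3, 3, dimnames = list(c("red", "blue", "green"),
                                       c("circle", "square", "sphere")))

# Long-format version of the matrix: one row per colour/shape combination.
# expand.grid varies its first argument fastest, matching as.vector(m).
long <- cbind(expand.grid(colour = rownames(m), shape = colnames(m),
                          stringsAsFactors = FALSE),
              ID = as.vector(m))

df2 <- data.frame(colour = c("red", "blue", "blue", "green"),
                  shape = c("circle", "square", "circle", "sphere"),
                  stringsAsFactors = FALSE)

# Remember the original row order before merging
df2$ord <- seq_len(nrow(df2))

# merge() joins on the shared columns (colour, shape)
res <- merge(df2, long)

# Restore the original order and keep only the wanted columns
res <- res[order(res$ord), c("ID", "colour", "shape")]
rownames(res) <- NULL
res$ID   # 1 5 2 9
```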
+1

You can also convert the matrix m to a vector, and then look up the ID from the colour and shape values:

    df <- data.frame(colour=c("red","blue","blue","green"),
                     shape=c("circle","square","circle","sphere"))
    m <- matrix(1:9, nrow=3, dimnames=list(c("red","blue","green"),
                                           c("circle","square","sphere")))
    mVec <- as.vector(m)

The next step matches the colour in df to the corresponding dimname of the matrix m, then adds an integer offset corresponding to the shape. The result is the index into the vector mVec that holds the corresponding ID.

    df$ID <- mVec[match(df$colour, dimnames(m)[[1]]) +
                  (dim(m)[1] * (match(df$shape, dimnames(m)[[2]]) - 1))]
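The index arithmetic relies on R storing matrices in column-major order, so element (row, col) sits at position row + nrow * (col - 1) of the flattened vector. A minimal worked example for the single pair blue/square, with illustrative variable names:

```r
m <- matrix(1:9, nrow = 3, dimnames = list(c("red", "blue", "green"),
                                           c("circle", "square", "sphere")))

r <- match("blue", rownames(m))     # 2
k <- match("square", colnames(m))   # 2

# Column-major position: skip (k - 1) full columns, then move r rows down
lin <- r + nrow(m) * (k - 1)        # 2 + 3 * 1 = 5

as.vector(m)[lin]   # 5
m[r, k]             # 5, the same element
```

A matrix can also be indexed directly with a single linear index, so m[lin] gives the same value without the as.vector step.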
+1

    # recreating your data
    dat <- read.table(text="colour shape
    'red' circle
    'blue' square
    'blue' circle
    'green' sphere", header=TRUE)

    d2 <- matrix(c(1:9), ncol=3, nrow=3, byrow=TRUE)
    dimnames(d2) <- list(c('circle', 'square', 'sphere'),
                         c("red", "blue", "green"))
    d2 <- as.table(d2)

    # make a list of matches to the row and column names of the lookup matrix
    LIST <- list(match(dat[, 2], rownames(d2)), match(dat[, 1], colnames(d2)))

    # use sapply to index the lookup matrix using the row and col values from LIST
    id <- sapply(seq_along(LIST[[1]]), function(i) d2[LIST[[1]][i], LIST[[2]][i]])

    # put it all back together
    data.frame(id=id, dat)
0