How to join / merge two tables using character values?

Question


I would like to combine two tables based on first name, last name and year, and create a new binary variable indicating whether each row of the first table also appears in the second table.

The first table is a panel data set of some attributes of NBA players during a season:

  firstname <- c("Michael","Michael","Michael","Magic","Magic","Magic","Larry","Larry")
  lastname <- c("Jordan","Jordan","Jordan","Johnson","Johnson","Johnson","Bird","Bird")
  year <- c("1991","1992","1993","1991","1992","1993","1992","1992")
  season <- data.frame(firstname, lastname, year)

    firstname lastname year
  1   Michael   Jordan 1991
  2   Michael   Jordan 1992
  3   Michael   Jordan 1993
  4     Magic  Johnson 1991
  5     Magic  Johnson 1992
  6     Magic  Johnson 1993
  7     Larry     Bird 1992
  8     Larry     Bird 1992

The second data.frame is a panel data set of some attributes of NBA players selected for the All-Star game:

  firstname <- c("Michael","Michael","Michael","Magic","Magic","Magic")
  lastname <- c("Jordan","Jordan","Jordan","Johnson","Johnson","Johnson")
  year <- c("1991","1992","1993","1991","1992","1993")
  ALLSTARS <- data.frame(firstname, lastname, year)

    firstname lastname year
  1   Michael   Jordan 1991
  2   Michael   Jordan 1992
  3   Michael   Jordan 1993
  4     Magic  Johnson 1991
  5     Magic  Johnson 1992
  6     Magic  Johnson 1993

My desired result is as follows:

    firstname lastname year allstars
  1   Michael   Jordan 1991        1
  2   Michael   Jordan 1992        1
  3   Michael   Jordan 1993        1
  4     Magic  Johnson 1991        1
  5     Magic  Johnson 1992        1
  6     Magic  Johnson 1993        1
  7     Larry     Bird 1992        0
  8     Larry     Bird 1992        0

I tried a left join, but I am not sure that it makes sense:

  test <- join(season, ALLSTARS, by = c("lastname","firstname","year"), type = "left", match = "all")

+6

r

user3833190 Jul 9 '15 at 12:19


3 answers

Here's a simple solution using a data.table binary join, which lets you update a column by reference while joining:

  library(data.table)
  setkey(setDT(season), firstname, lastname, year)[ALLSTARS, allstars := 1L]
  season
  #    firstname lastname year allstars
  # 1:     Larry     Bird 1992       NA
  # 2:     Larry     Bird 1992       NA
  # 3:     Magic  Johnson 1991        1
  # 4:     Magic  Johnson 1992        1
  # 5:     Magic  Johnson 1993        1
  # 6:   Michael   Jordan 1991        1
  # 7:   Michael   Jordan 1992        1
  # 8:   Michael   Jordan 1993        1

Or using dplyr:

  library(dplyr)
  ALLSTARS %>%
    mutate(allstars = 1L) %>%
    right_join(., season)
  #   firstname lastname year allstars
  # 1   Michael   Jordan 1991        1
  # 2   Michael   Jordan 1992        1
  # 3   Michael   Jordan 1993        1
  # 4     Magic  Johnson 1991        1
  # 5     Magic  Johnson 1992        1
  # 6     Magic  Johnson 1993        1
  # 7     Larry     Bird 1992       NA
  # 8     Larry     Bird 1992       NA
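Both versions above leave NA, rather than the requested 0, for players without a match. A minimal sketch of one way to get 0 directly, assuming dplyr is installed; the `keys` and `result` names are only illustrative, and the data is rebuilt so the block runs on its own:

```r
library(dplyr)

# Rebuild the example data from the question.
season <- data.frame(
  firstname = c("Michael","Michael","Michael","Magic","Magic","Magic","Larry","Larry"),
  lastname  = c("Jordan","Jordan","Jordan","Johnson","Johnson","Johnson","Bird","Bird"),
  year      = c("1991","1992","1993","1991","1992","1993","1992","1992"),
  stringsAsFactors = FALSE
)
ALLSTARS <- data.frame(
  firstname = c("Michael","Michael","Michael","Magic","Magic","Magic"),
  lastname  = c("Jordan","Jordan","Jordan","Johnson","Johnson","Johnson"),
  year      = c("1991","1992","1993","1991","1992","1993"),
  stringsAsFactors = FALSE
)

# Illustrative name: the columns that identify a player-season.
keys <- c("firstname", "lastname", "year")

result <- season %>%
  # Flag every All-Star row with 1, then attach the flag to matching season rows.
  left_join(mutate(ALLSTARS, allstars = 1L), by = keys) %>%
  # Rows with no match come back as NA; replace those with 0.
  mutate(allstars = coalesce(allstars, 0L))

result
```

Using `left_join()` with `season` on the left keeps the rows in their original order. If `ALLSTARS` could contain duplicate rows, deduplicating it first (for example with `distinct()`) would avoid duplicated rows in the result.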

+4

David Arenburg Jul 9 '15 at 12:22


In base R:

  ALLSTARS$allstars <- 1L
  newdf <- merge(season, ALLSTARS, by=c('firstname', 'lastname', 'year'), all.x=TRUE)
  newdf$allstars[is.na(newdf$allstars)] <- 0L
  newdf

Or, a different approach that I like:

  season$allstars <- (apply(season, 1, function(x) paste(x, collapse='')) %in%
                      apply(ALLSTARS, 1, function(x) paste(x, collapse=''))) + 0L
  #
  #   firstname lastname year allstars
  # 1   Michael   Jordan 1991        1
  # 2   Michael   Jordan 1992        1
  # 3   Michael   Jordan 1993        1
  # 4     Magic  Johnson 1991        1
  # 5     Magic  Johnson 1992        1
  # 6     Magic  Johnson 1993        1
  # 7     Larry     Bird 1992        0
  # 8     Larry     Bird 1992        0
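One caveat with pasting whole rows together: it compares every column, so it stops matching once either table has extra columns, and with `collapse=''` two different rows can produce the same string ("ab" + "c" versus "a" + "bc"). A hedged base R variant restricted to the key columns is sketched below; `make_key` is a hypothetical helper, and the "|" separator is assumed not to occur in the data:

```r
# Rebuild the example data from the question.
season <- data.frame(
  firstname = c("Michael","Michael","Michael","Magic","Magic","Magic","Larry","Larry"),
  lastname  = c("Jordan","Jordan","Jordan","Johnson","Johnson","Johnson","Bird","Bird"),
  year      = c("1991","1992","1993","1991","1992","1993","1992","1992"),
  stringsAsFactors = FALSE
)
ALLSTARS <- data.frame(
  firstname = c("Michael","Michael","Michael","Magic","Magic","Magic"),
  lastname  = c("Jordan","Jordan","Jordan","Johnson","Johnson","Johnson"),
  year      = c("1991","1992","1993","1991","1992","1993"),
  stringsAsFactors = FALSE
)

keys <- c("firstname", "lastname", "year")

# Hypothetical helper: build one string per row from the key columns only.
# The "|" separator is an assumption; pick one that cannot appear in the data.
make_key <- function(df) do.call(paste, c(unname(df[keys]), sep = "|"))

# TRUE/FALSE membership test, converted to 1/0.
season$allstars <- as.integer(make_key(season) %in% make_key(ALLSTARS))

season
```

Because this only tests membership, it never adds or reorders rows in `season`, even when `ALLSTARS` contains duplicates.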

+2

Pierre lafortune Jul 9 '15 at 12:42


Sam firke · Accepted Answer · 2015-07-09T14:57:36+0000

It looks like you are using join() from the plyr package. You were almost there: just preface your command with ALLSTARS$allstars <- 1. Then run your join as written, and finally convert the NA values to 0. So:

  library(plyr)
  ALLSTARS$allstars <- 1
  test <- join(season, ALLSTARS, by = c("lastname","firstname","year"), type = "left", match = "all")
  test$allstars[is.na(test$allstars)] <- 0

Result:

    firstname lastname year allstars
  1   Michael   Jordan 1991        1
  2   Michael   Jordan 1992        1
  3   Michael   Jordan 1993        1
  4     Magic  Johnson 1991        1
  5     Magic  Johnson 1992        1
  6     Magic  Johnson 1993        1
  7     Larry     Bird 1992        0
  8     Larry     Bird 1992        0

Personally, though, I would use left_join or right_join from the dplyr package, as in David's answer, instead of plyr's join(). Also note that in this case you do not really need the by argument of join(), since by default the function joins on all fields with common names, which is what you want here.
