short version: how to replace values in a data frame with a string found in another data frame?
longer version: I am a biologist working with many types of bees. I have a dataset of many thousands of bees. Each row has a unique bee identification number along with all the relevant information about that specimen (capture data, GPS location, etc.). Species information has not been entered yet because identification takes a long time. When identifying, I get boxes of hundreds of bees, all of the same species, and I enter them in a separate data frame. I am trying to write code that updates the original data file with the taxonomic information (family, genus, species, sex, etc.) as I identify bees. Currently, the species columns in the source data file are empty and are read as NA in R. I want R to find each bee's unique identifier and fill in the species information, but I can't figure out how to replace the NA values with a string (for example, "Andrenidae").
Here is a simple example of what I'm trying to do:
rawData <- data.frame(beeID = c(1:20), family = rep(NA, 20))
speciesInfo <- data.frame(beeID = seq(1, 20, 3), family = rep("Andrenidae", 7))
rawData[rawData$beeID == 4, "family"] <- speciesInfo[speciesInfo$beeID == 4, "family"]
This replaces the right cell, but with a number instead of the family name (a string). Eventually I would like to write a small loop that adds all the species data, for example:
for (i in speciesInfo$beeID) {
  rawData[rawData$beeID == i, "family"] <- speciesInfo[speciesInfo$beeID == i, "family"]
}
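A likely cause of the number, assuming an R version older than 4.0.0: `data.frame()` converted `"Andrenidae"` to a factor (`stringsAsFactors = TRUE` was the default before R 4.0.0), and assigning a factor into the logical `NA` column stores the factor's integer code rather than its label. The sketch below uses the question's example data, forces character storage, and replaces the loop with one vectorized `match()` lookup. `idx` and `hit` are illustrative names, not part of any package.

```r
# Example data from the question; stringsAsFactors = FALSE keeps "family"
# as character in every R version (it is already the default from R 4.0.0).
rawData <- data.frame(beeID = 1:20, family = rep(NA, 20))
speciesInfo <- data.frame(beeID = seq(1, 20, 3),
                          family = rep("Andrenidae", 7),
                          stringsAsFactors = FALSE)

# For each row of rawData, the position of its beeID in speciesInfo (NA if absent).
idx <- match(rawData$beeID, speciesInfo$beeID)
hit <- !is.na(idx)

# Overwrite only the matched rows; as.character() guards against factor codes.
rawData$family[hit] <- as.character(speciesInfo$family[idx[hit]])
```

Because `match()` pairs every row at once, this does the same work as the `for` loop without indexing one ID at a time, and unmatched rows keep their `NA`.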
Thanks in advance for any advice!
Greetings
Zach
EDIT:
I just noticed that the first two methods below each add a new column every time they are run, which causes problems when I need to add species information several times (which I usually do). For example:
rawData <- data.frame(beeID = c(1:20), family = rep(NA, 20))
Andrenidae <- data.frame(beeID = seq(1, 20, 3), family = rep("Andrenidae", 7))
Halictidae <- data.frame(beeID = seq(1, 20, 3) + 1, family = rep("Halictidae", 7))

# using join
library(plyr)
rawData <- join(rawData, Andrenidae, by = "beeID", type = "left")
rawData <- join(rawData, Halictidae, by = "beeID", type = "left")

# using merge
rawData <- merge(x = rawData, y = Andrenidae, by = "beeID", all.x = TRUE, all.y = FALSE)
rawData <- merge(x = rawData, y = Halictidae, by = "beeID", all.x = TRUE, all.y = FALSE)
Is there a way to collapse these columns so that I end up with a single family column in one data frame? Or a way to update rawData in place rather than adding a new column every time? Thanks in advance!
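One way to avoid the extra columns is to skip the join entirely and write matched values into the existing columns. The sketch below assumes every species table shares the `beeID` key and uses column names that either already exist in `rawData` or should be created once. `update_by_id` is a hypothetical helper written for this example, not a function from base R or plyr.

```r
# Hypothetical helper (not from any package): copy every non-key column of
# `info` into the matching rows of `raw`, matched on `key`.
update_by_id <- function(raw, info, key = "beeID") {
  idx <- match(raw[[key]], info[[key]])  # row of `info` for each row of `raw`
  hit <- !is.na(idx)
  for (col in setdiff(names(info), key)) {
    if (!col %in% names(raw)) raw[[col]] <- NA      # create the column only once
    vals <- info[[col]]
    if (is.factor(vals)) vals <- as.character(vals) # store labels, not codes
    raw[[col]][hit] <- vals[idx[hit]]               # overwrite matched rows only
  }
  raw
}

rawData <- data.frame(beeID = 1:20, family = rep(NA, 20))
Andrenidae <- data.frame(beeID = seq(1, 20, 3), family = rep("Andrenidae", 7))
Halictidae <- data.frame(beeID = seq(1, 20, 3) + 1, family = rep("Halictidae", 7))

rawData <- update_by_id(rawData, Andrenidae)
rawData <- update_by_id(rawData, Halictidae)
```

`rawData` keeps exactly one `family` column no matter how many tables are applied. Several tables can also be applied in one call with `Reduce(update_by_id, list(Andrenidae, Halictidae), rawData)`. If the same `beeID` appears in two tables, the table applied last wins; that overwrite behavior is a design choice of this sketch.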