Reshaping data in R: is it possible to have two "value variables"?

I am struggling with the reshape2 package, looking for a way to "cast" a data frame with two (or more) columns in "value.var".

Here is an example of what I want to achieve.

    df <- data.frame(
      StudentID     = c("x1", "x10", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"),
      StudentGender = c('F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'M'),
      ExamenYear    = c('2007','2007','2007','2008','2008','2008','2008','2009','2009','2009'),
      Exam          = c('algebra', 'stats', 'bio', 'algebra', 'algebra', 'stats', 'stats', 'algebra', 'bio', 'bio'),
      participated  = c('no','yes','yes','yes','no','yes','yes','yes','yes','yes'),
      passed        = c('no','yes','yes','yes','no','yes','yes','yes','no','yes'),
      stringsAsFactors = FALSE)

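From df, I can create the following data frame:

    library(plyr)      # provides ddply()
    library(reshape2)  # provides melt() and dcast(), used further down

    # Count participants and passes per year and gender
    tx <- ddply(df, c('ExamenYear','StudentGender'), summarize,
                participated = sum(participated == "yes"),
                passed = sum(passed == "yes"))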

In reshape terms, I have two "value variables": participated and passed.

I am looking for a way to combine the following information in a single data frame:

    dcast(tx, formula = ExamenYear ~ StudentGender, value.var = 'participated')
    dcast(tx, formula = ExamenYear ~ StudentGender, value.var = 'passed')

The final table I'm trying to create will look like this:

    tempTab1 <- dcast(tx, formula = ExamenYear ~ StudentGender, value.var = 'participated')
    tempTab2 <- dcast(tx, formula = ExamenYear ~ StudentGender, value.var = 'passed')

    as.data.frame(cbind(ExamenYear = tempTab1[,1],
                        Female_Participated = tempTab1[,2],
                        Female_Passed = tempTab2[,2],
                        Male_Participated = tempTab1[,3],
                        Male_Passed = tempTab2[,3]))

Is it possible to have two "value variables" in the cast function?

1 answer

Since you got this far, why not melt your tx object and use dcast as follows:

    dcast(melt(tx, id.vars=c(1, 2)), ExamenYear ~ StudentGender + variable)
    #   ExamenYear F_participated F_passed M_participated M_passed
    # 1       2007              1        1              1        1
    # 2       2008              1        1              2        2
    # 3       2009             NA       NA              3        2

A more direct approach, however, is probably to melt your data from the start:

    df.m <- melt(df, id.vars=c(1:4))
    dcast(df.m, ExamenYear ~ StudentGender + variable, function(x) sum(x == "yes"))
    #   ExamenYear F_participated F_passed M_participated M_passed
    # 1       2007              1        1              1        1
    # 2       2008              1        1              2        2
    # 3       2009              0        0              3        2
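Note the 2009 row: the first result shows NA for the female columns, while this one shows 0. As far as I understand the reshape2 documentation, the `fill` argument of dcast defaults to the value obtained by applying the aggregation function to a zero-length vector, and no female students appear in 2009. A minimal base R sketch of that empty-group case (the name `count_yes` is made up here for illustration):

```r
# Same aggregation function as in the dcast() call, given a name
# (count_yes is a hypothetical helper, not part of reshape2)
count_yes <- function(x) sum(x == "yes")

count_yes(c("yes", "no", "yes"))  # 2
count_yes(character(0))           # 0: an empty group counts as zero, not NA
```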

Update: Base R approach

While the code required is not as "pretty", this is also not too difficult to do in base R. Here is one approach:

  • Use aggregate() to get tx from your example.

     dfa <- aggregate(cbind(participated, passed) ~ ExamenYear + StudentGender,
                      df, function(x) sum(x == "yes"))
     dfa
     #   ExamenYear StudentGender participated passed
     # 1       2007             F            1      1
     # 2       2008             F            1      1
     # 3       2007             M            1      1
     # 4       2008             M            2      2
     # 5       2009             M            3      2

  • Use reshape() to convert dfa from long to wide.

     reshape(dfa, direction = "wide", idvar="ExamenYear", timevar="StudentGender")
     #   ExamenYear participated.F passed.F participated.M passed.M
     # 1       2007              1        1              1        1
     # 2       2008              1        1              2        2
     # 5       2009             NA       NA              3        2
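
If you want the column names from the question (Female_Participated and so on) and zeros instead of NA, something along these lines should work. This is only a sketch: dfa is typed in by hand from the output shown above so the snippet runs on its own, and the name `wide` is made up for the example.

```r
# dfa as printed above, entered by hand so the sketch is self-contained
dfa <- data.frame(
  ExamenYear    = c("2007", "2008", "2007", "2008", "2009"),
  StudentGender = c("F", "F", "M", "M", "M"),
  participated  = c(1, 1, 1, 2, 3),
  passed        = c(1, 1, 1, 2, 2),
  stringsAsFactors = FALSE)

# Long to wide, one row per year
wide <- reshape(dfa, direction = "wide",
                idvar = "ExamenYear", timevar = "StudentGender")

# Gender/year combinations with no rows come out as NA; treat them as zero counts
wide[is.na(wide)] <- 0

# Select columns by name, then apply the names used in the question
wide <- wide[, c("ExamenYear", "participated.F", "passed.F",
                 "participated.M", "passed.M")]
names(wide) <- c("ExamenYear", "Female_Participated", "Female_Passed",
                 "Male_Participated", "Male_Passed")
rownames(wide) <- NULL
wide
```

Whether a missing combination should really count as 0 depends on your data. Here it seems reasonable, since no female students sat an exam in 2009.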

Source: https://habr.com/ru/post/925465/

