Strange plotting error when grouping with by in data.table

Sorry for the massive data dump, but I couldn't reproduce this on the subsets of the data I tried. The dput of the data (165 observations, not crazy) is in this gist.

I am trying to plot the data in DT by sport, as follows:

  • Create an empty plot with appropriate limits to accommodate all the data
  • Plot the gini column as a scatter plot, with colors varying by sport
  • Overlay the five_yr_ma column as a thicker line with matching colors

This should be simple, and I have done things like it before. Here is what should work:

    # empty plot with proper axes
    DT[ , plot(NA, ylim = range(gini), xlim = range(season),
               xlab = "Season", ylab = "Gini",
               main = "Comparison of Gini Coefficient Across Sports")]

    # pick colors for each sport
    cols <- c(NHL = "black", NBA = "red")

    DT[ , {
      # add points to current plot
      points(season, gini, col = cols[.BY$sport])
      # add lines to current plot
      lines(season, five_yr_ma, col = cols[.BY$sport], lwd = 3)
    }, by = sport]

But this gives me the output / error:

 # Empty data.table (0 rows) of 1 col: sport 

Error: 'x' and 'y' lengths differ, raised from plot.xy()

This is strange. If we skip the grouping and just do it manually, it works fine:

    all_sports[sport == "NBA", {
      points(season, gini, col = "red")
      lines(season, five_yr_ma, col = "red", lwd = 3)
    }]
    all_sports[sport == "NHL", {
      points(season, gini, col = "black")
      lines(season, five_yr_ma, col = "black", lwd = 3)
    }]

[Image: the expected plot]

Moreover, even with grouping, it is not clear why plot.xy would receive arguments of different lengths. If we make the following tweak to force R to record the lengths of the arguments just as they are passed, nothing looks wrong:

    all_sports[ , {
      cat("\n\nPlotting for sport: ", .BY$sport)
      points(x1 <- season, y1 <- gini, col = cols[.BY$sport])
      lines(x2 <- season, y2 <- five_yr_ma, col = cols[.BY$sport], lwd = 3)
      cat("\npoints/season: ", length(x1),
          "\npoints/gini: ", length(y1),
          "\nlines/season: ", length(x2),
          "\nlines/five_yr_ma: ", length(y2))
    }, by = sport]

This outputs:

    # Plotting for sport:  NHL
    # points/season:  98
    # points/gini:  98
    # lines/season:  98
    # lines/five_yr_ma:  98
    # Plotting for sport:  NBA
    # points/season:  67
    # points/gini:  67
    # lines/season:  67
    # lines/five_yr_ma:  67

What could be going on?


Since this does not seem to reproduce on every machine, here is my sessionInfo():

    R version 3.2.4 (2016-03-10)
    Platform: x86_64-pc-linux-gnu (64-bit)
    Running under: Ubuntu 14.04.3 LTS

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] data.table_1.9.7

    loaded via a namespace (and not attached):
    [1] rsconnect_0.4.1.11 tools_3.2.4

1 answer

In fact, as @Arun points out, this appears to be a recurrence of the (as yet unresolved) issue behind the error in this question:

Incorrect group values are used when using plot() in data.table() in RStudio

As @Arun discovered there, it looks like RStudio's native graphics device somehow gets confused by the changing pointers used for the different subgroups created when j is evaluated with by present. This can be worked around by simply copying the relevant parts of .SD each time, for example:

    points(copy(season), copy(gini), col = cols[.BY$sport])
    lines(copy(season), copy(five_yr_ma), col = cols[.BY$sport], lwd = 3)

or

    x <- copy(.SD)
    with(x, {
      # note: the argument is col, not cols
      points(season, gini, col = cols[.BY$sport])
      lines(copy(season), copy(five_yr_ma), col = cols[.BY$sport], lwd = 3)
    })

Both worked for me. Since the subgroups are so small, computational efficiency is not a concern: we can copy away without noticeably affecting performance.
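For anyone who wants to try the workaround without the gist, here is a minimal self-contained sketch. The data are synthetic stand-ins (random gini values, assumed season ranges chosen only to match the 98 and 67 rows reported in the question), and frollmean() requires a newer data.table than the 1.9.7 shown above, so treat it as an illustration rather than a reproduction of the original data:

```r
library(data.table)

# Synthetic stand-in for the question's data; every value here is made up
set.seed(1)
DT <- data.table(
  sport  = rep(c("NHL", "NBA"), times = c(98, 67)),
  season = c(1918:2015, 1949:2015)   # assumed ranges, 98 and 67 seasons
)
DT[, gini := runif(.N, 0.05, 0.25)]
# trailing five-season mean within each sport (NA for the first four seasons)
DT[, five_yr_ma := frollmean(gini, 5), by = sport]

cols <- c(NHL = "black", NBA = "red")

# empty plot with axes wide enough for both sports
plot(NA, xlim = range(DT$season), ylim = range(DT$gini),
     xlab = "Season", ylab = "Gini",
     main = "Comparison of Gini Coefficient Across Sports")

# copy each column before handing it to the graphics device,
# and return the lengths so they can be checked afterwards
lens <- DT[, {
  x  <- copy(season)
  y  <- copy(gini)
  ma <- copy(five_yr_ma)
  points(x, y, col = cols[.BY$sport])
  lines(x, ma, col = cols[.BY$sport], lwd = 3)
  .(n_x = length(x), n_y = length(y), n_ma = length(ma))
}, by = sport]
```

Here lens holds one row per sport, which makes it easy to confirm that the vectors passed to points() and lines() have matching lengths within each group.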

This is #1524 on the data.table GitHub page, and I have filed this bug report with RStudio Support; I will update this if a fix is released.


Source: https://habr.com/ru/post/1212921/
