R 2D data frame splitting with secondary complex calculations

Question


I have a data frame that generally looks like

    df.data <- data.frame(x=sample(1:9, 10, replace=TRUE),
                          y=sample(1:9, 10, replace=TRUE),
                          vx=sample(-1:1, 10, replace=TRUE),
                          vy=sample(-1:1, 10, replace=TRUE))

x and y are positions; vx and vy are the x and y components of a 2D vector. I want to bin the rows of this data frame by the values of x and y, and then do the calculations on vx and vy within each bin. The function below does this, but it loops over every row, which will be too slow for my dataset.

    slowWay <- function(df) {
      df.bin <- data.frame(expand.grid(x=0:3, y=0:3, vx=0, vy=0, count=0))
      for(i in 1:nrow(df)) {
        x.bin <- floor(df[i, ]$x / 3)
        y.bin <- floor(df[i, ]$y / 3)
        print(c(x.bin, y.bin))
        df.bin[df.bin$x == x.bin & df.bin$y == y.bin, ]$vx =
          df.bin[df.bin$x == x.bin & df.bin$y == y.bin, ]$vx + df[i, ]$vx
        df.bin[df.bin$x == x.bin & df.bin$y == y.bin, ]$vy =
          df.bin[df.bin$x == x.bin & df.bin$y == y.bin, ]$vy + df[i, ]$vy
        df.bin[df.bin$x == x.bin & df.bin$y == y.bin, ]$count =
          df.bin[df.bin$x == x.bin & df.bin$y == y.bin, ]$count + 1
      }
      return(df.bin)
    }
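For reference, the bin index in the loop is floor(value / 3). A minimal sketch of what that mapping does for positions 1 to 9 (the range of the sample data), which is why the grid in df.bin runs over 0:3. The names pos and bins are only illustrative:

```r
# Positions in the sample data range over 1..9
pos <- 1:9

# Same bin formula as in slowWay
bins <- floor(pos / 3)

# Show which position lands in which bin
print(rbind(pos = pos, bin = bins))

# The bins are uneven: bin 0 holds positions 1-2, bin 3 holds only 9
print(table(bins))
```

If evenly sized bins matter, the formula would need adjusting (for example floor((value - 1) / 3)), but that changes the results relative to the question's function.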

Is this type of 2D binning possible without a loop?

+4

r dataframe binning

robbie Mar 08 '13 at 21:32


3 answers

This is one way, but you will probably need a couple of extra steps if you want the full grid, including the unpopulated bin combinations:

    > by(df.data[, c("vx", "vy")],                                          # input data
         list(x.bin=floor(df.data$x / 3), y.bin=floor(df.data$y / 3)),      # grouping
         function(df) sapply(df, function(x) c(Sum=sum(x), Count=length(x))))  # calcs
    x.bin: 0
    y.bin: 1
          vx vy
    Sum    0  1
    Count  1  1
    ---------------------------------------------------------------------
    x.bin: 1
    y.bin: 1
          vx vy
    Sum    0  1
    Count  2  2
    ---------------------------------------------------------------------
    x.bin: 2
    y.bin: 1
          vx vy
    Sum   -1 -2
    Count  2  2
    ---------------------------------------------------------------------
    x.bin: 0
    y.bin: 2
          vx vy
    Sum    1  0
    Count  1  1
    ---------------------------------------------------------------------
    x.bin: 1
    y.bin: 2
    NULL
    ---------------------------------------------------------------------
    x.bin: 2
    y.bin: 2
          vx vy
    Sum    2  1
    Count  4  4
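One way the extra step for empty bins could look, as a sketch rather than the answerer's own method: convert the bin indices to factors with explicit levels 0:3 and let xtabs fill the empty cells with zero. The seed and the names binned and result are assumptions made for illustration.

```r
set.seed(42)  # arbitrary seed, only to make the sketch reproducible
df.data <- data.frame(x  = sample(1:9, 10, replace = TRUE),
                      y  = sample(1:9, 10, replace = TRUE),
                      vx = sample(-1:1, 10, replace = TRUE),
                      vy = sample(-1:1, 10, replace = TRUE))

# Bin indices as factors with explicit levels, so empty bins are not dropped
binned <- data.frame(
  x.bin = factor(floor(df.data$x / 3), levels = 0:3),
  y.bin = factor(floor(df.data$y / 3), levels = 0:3),
  vx    = df.data$vx,
  vy    = df.data$vy
)

# A one-sided formula counts rows per cell; as.data.frame gives the long layout
result <- as.data.frame(xtabs(~ x.bin + y.bin, data = binned),
                        responseName = "count")

# A two-sided formula sums the left-hand side per cell (0 for empty cells).
# as.vector() unrolls the 4x4 table in the same cell order as `result`.
result$vx <- as.vector(xtabs(vx ~ x.bin + y.bin, data = binned))
result$vy <- as.vector(xtabs(vy ~ x.bin + y.bin, data = binned))

print(result)
```

The result has one row per bin combination (16 in total), with x.bin varying fastest.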

+1

42- Mar 08 '13 at 22:21


Here is a data.table version:

    library(data.table)
    dt.data <- as.data.table(df.data)                              # Convert to data.table
    dt.data[, c("x.bin","y.bin") := list(floor(x/3), floor(y/3))]  # Add bin columns
    setkey(dt.data, x.bin, y.bin)
    dt.bin <- CJ(x=0:3, y=0:3)                                     # Cross join to create bin combinations
    # Join the bins and data; sum vx/vy and count matching rows.
    # by=.EACHI is needed in data.table 1.9.4 and later; older versions
    # grouped by each row of the join implicitly.
    dt.data.2 <- dt.data[dt.bin, list(vx=sum(vx), vy=sum(vy), count=.N), by=.EACHI]
    dt.data.2[is.na(vx), vx := 0L]                                 # Replace NA with 0
    dt.data.2[is.na(vy), vy := 0L]                                 # Replace NA with 0
    dt.data.2[order(y.bin, x.bin)]                                 # Display the final data.table output
    ##     x.bin y.bin vx vy count
    ##  1:     0     0  0  0     0
    ##  2:     1     0  0  0     0
    ##  3:     2     0  1  1     1
    ##  4:     3     0  0  0     0
    ##  5:     0     1  0  0     0
    ##  6:     1     1  0 -2     3
    ##  7:     2     1  0  0     0
    ##  8:     3     1  0  0     0
    ##  9:     0     2  0  0     1
    ## 10:     1     2  0  0     0
    ## 11:     2     2  0  2     3
    ## 12:     3     2 -1  1     1
    ## 13:     0     3  0  0     0
    ## 14:     1     3  0  0     0
    ## 15:     2     3  0  0     0
    ## 16:     3     3  1 -1     1

+1

dnlbrky Mar 11 '13 at 6:18


Theodore lytras · Accepted Answer · 2013-03-08T23:10:17+0000

Here's another, quicker way to do this, which also includes the unpopulated bin combinations:

    fasterWay <- function(df.data) {
      a1 <- aggregate(df.data[,3:4],
                      list(x=floor(df.data$x/3), y=floor(df.data$y/3)), sum)
      a2 <- aggregate(list(count=rep(NA,nrow(df.data))),
                      list(x=floor(df.data$x/3), y=floor(df.data$y/3)), length)
      result <- merge(expand.grid(y=0:3,x=0:3), merge(a1,a2), by=c("x","y"), all=TRUE)
      result[is.na(result)] <- 0
      result <- result[order(result$y, result$x),]
      rownames(result) <- NULL
      result
    }

This gives me:

       x y vx vy count
    1  0 0  0  0     1
    2  0 1  0  0     0
    3  0 2 -1 -1     1
    4  0 3  0  0     0
    5  1 0 -1 -1     1
    6  1 1  0  0     0
    7  1 2  0  0     0
    8  1 3 -1  0     2
    9  2 0 -1 -1     1
    10 2 1  0  0     0
    11 2 2 -1  1     2
    12 2 3  0  0     1
    13 3 0  0  0     0
    14 3 1  0  0     0
    15 3 2 -1  0     1
    16 3 3  0  0     0
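A quick sanity check one might run on this approach, sketched under the assumption of seeded sample data (the seed and the name res are arbitrary): the counts should add up to the number of input rows, and the binned sums should equal the column totals. The function body is repeated from the answer so the block runs on its own.

```r
set.seed(1)  # arbitrary seed, only to make the check reproducible
df.data <- data.frame(x  = sample(1:9, 10, replace = TRUE),
                      y  = sample(1:9, 10, replace = TRUE),
                      vx = sample(-1:1, 10, replace = TRUE),
                      vy = sample(-1:1, 10, replace = TRUE))

# Same function as in the answer above
fasterWay <- function(df.data) {
  # Sum vx and vy (columns 3 and 4) per (x, y) bin
  a1 <- aggregate(df.data[, 3:4],
                  list(x = floor(df.data$x / 3), y = floor(df.data$y / 3)), sum)
  # Count rows per (x, y) bin by taking the length of a dummy column
  a2 <- aggregate(list(count = rep(NA, nrow(df.data))),
                  list(x = floor(df.data$x / 3), y = floor(df.data$y / 3)), length)
  # Outer-join onto the full 4x4 grid so empty bins appear as rows
  result <- merge(expand.grid(y = 0:3, x = 0:3), merge(a1, a2),
                  by = c("x", "y"), all = TRUE)
  result[is.na(result)] <- 0  # empty bins get zero sums and counts
  result <- result[order(result$y, result$x), ]
  rownames(result) <- NULL
  result
}

res <- fasterWay(df.data)

# Invariants that should hold for any input with positions in 1..9
stopifnot(nrow(res) == 16)                   # full 4x4 grid
stopifnot(sum(res$count) == nrow(df.data))   # every row lands in one bin
stopifnot(sum(res$vx) == sum(df.data$vx))    # sums are preserved
stopifnot(sum(res$vy) == sum(df.data$vy))
print(res)
```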
