How can I prevent rbind() from becoming very slow as the data frame grows?

Question


I have a data frame with only one row. To this, I start appending rows using rbind:

    df  # my data frame with only one row
    for (i in 1:20000) {
      df <- rbind(df, newrow)
    }

This becomes very slow as i grows. Why is this, and how can I make this type of code faster?

+6

performance append r dataframe rbind

Mark Feb 04 '13 at 19:22


2 answers

I tried an example. For what it's worth, it agrees with the user's statement that inserting rows into a data frame is also very slow. I do not quite understand what is going on, since I would have expected the allocation problem to outweigh the cost of copying. Can anyone either replicate this, or explain why the results below (rbind < appending < insertion) would be true in general, or explain why this is not a representative example (for example, the data frame is too small)?

Edit: the first time around I forgot to initialize the object in hell2fun as a data frame, so the code was doing matrix operations rather than data frame operations, and matrix operations are much faster. If I get a chance, I will extend the comparison to data frame vs. matrix. The qualitative statements in the first paragraph still hold, though.

    N <- 1000
    set.seed(101)
    r <- matrix(runif(2*N), ncol=2)

    ## second circle of hell
    hell2fun <- function() {
      df <- as.data.frame(rbind(r[1,]))  ## initialize
      for (i in 2:N) {
        df <- rbind(df, r[i,])
      }
    }

    insertfun <- function() {
      df <- data.frame(x=rep(NA,N), y=rep(NA,N))
      for (i in 1:N) {
        df[i,] <- r[i,]
      }
    }

    rsplit <- as.list(as.data.frame(t(r)))
    rbindfun <- function() {
      do.call(rbind, rsplit)
    }

    library(rbenchmark)
    benchmark(hell2fun(), insertfun(), rbindfun())
    ##          test replications elapsed relative user.self
    ## 1  hell2fun()          100  32.439  484.164    31.778
    ## 2 insertfun()          100  45.486  678.896    42.978
    ## 3  rbindfun()          100   0.067    1.000     0.076

+1

Ben Bolker Feb 04 '13 at 22:22


joran · Accepted Answer · 2013-02-04T19:32:38+0000

You are in the 2nd circle of hell, namely failing to pre-allocate data structures.

Growing objects in this way is a Very Very Bad Thing in R. Either pre-allocate and insert:

 df <- data.frame(x = rep(NA,20000),y = rep(NA,20000))

or restructure your code to avoid this kind of incremental addition of rows. As discussed at the link I cite, the reason for the slowness is that every time you add a row, R needs to find a new contiguous block of memory to fit the data frame in. That means a lot of copying.
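As a rough illustration of "restructure your code", here is a minimal sketch of two ways to avoid calling rbind inside the loop. It assumes each new row can be computed from the loop index; the column names x and y follow the snippet above, while the values i and i^2 and the names df_prealloc, rows and df_bound are placeholders, not anything from the question.

```r
n <- 1000  # smaller than the question's 20000 so the sketch runs quickly

# Option 1: pre-allocate plain vectors, fill them in the loop,
# and build the data frame once at the end.
x <- numeric(n)
y <- numeric(n)
for (i in seq_len(n)) {
  x[i] <- i      # placeholder for the real per-row computation
  y[i] <- i^2
}
df_prealloc <- data.frame(x = x, y = y)

# Option 2: collect one-row data frames in a pre-allocated list,
# then bind them with a single rbind call.
rows <- vector("list", n)
for (i in seq_len(n)) {
  rows[[i]] <- data.frame(x = i, y = i^2)
}
df_bound <- do.call(rbind, rows)
```

Both variants call rbind at most once, so the data frame is not copied on every iteration. Which one is faster for a given workload is best checked with a benchmark rather than assumed.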
