Dynamically add column to xts object

Adding a column to an xts object is simple if you know the column name ahead of time. For example, to add a column named "b":

    library(xts)

    n <- 5
    x <- merge(xts(order.by = as.Date('2015-1-1') + 1:n), a = rnorm(n))
    x$b <- rnorm(n)

Adding a column with a dynamic name (i.e. a column whose name is known only at run time) is more difficult:

    new.col.name <- 'c'              # known only at runtime
    x[, new.col.name] <- rnorm(n)    # this generates an error

One approach is to add a column with a temporary name and then rename it:

    stopifnot(!('tmp' %in% names(x)))
    x$tmp <- rnorm(n)
    names(x)[names(x) == 'tmp'] <- new.col.name

Is there a better way to do this? (Also, does assigning the names of an xts object copy the object? For example, would this approach work well if n were very large?)

2 answers

The easiest and clearest way is to merge the original object with the new column, after converting the new column(s) to a matrix (so that you can set the column name).

    set.seed(21)
    newData <- rnorm(n)
    x1 <- merge(x, matrix(newData, ncol=1, dimnames=list(NULL, new.col.name)))

    # another way to do the same thing
    dim(newData) <- c(nrow(x), 1)
    colnames(newData) <- new.col.name
    x2 <- merge(x, newData)
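The merge approach can be wrapped in a small helper so the column name is just an argument. This is a minimal sketch, not part of xts: `add_named_column` is a hypothetical name, it assumes `values` has one element per row of `x`, and it converts the new column to an xts object on the same index before merging (an assumption made here to keep the merge explicit).

```r
library(xts)

# Hypothetical helper (not part of xts): append `values` to `x` as a new
# column whose name is only known at run time.
add_named_column <- function(x, values, name) {
  stopifnot(is.xts(x),
            length(values) == nrow(x),
            !(name %in% colnames(x)))
  # Build a one-column matrix so the column name can be set up front,
  # then attach the index of `x` so both objects align row for row.
  new_col <- xts(matrix(values, ncol = 1, dimnames = list(NULL, name)),
                 order.by = index(x))
  merge(x, new_col)
}

n <- 5
x <- xts(matrix(rnorm(n), ncol = 1, dimnames = list(NULL, "a")),
         order.by = as.Date("2015-01-01") + 1:n)

new.col.name <- "c"  # known only at run time
x <- add_named_column(x, rnorm(n), new.col.name)
colnames(x)
```

Because the name is set on the matrix before merging, no temporary column and no later rename are needed.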

To answer the second question: yes, assigning names (and colnames) on an xts object creates a copy. You can see this using tracemem and gc output.

    > R -q  # new R session
    R> x <- xts::.xts(1:1e6, 1:1e6)
    R> tracemem(x)
    [1] "<0x2892400>"
    R> gc()
              used (Mb) gc trigger (Mb) max used (Mb)
    Ncells  259260 13.9     592000 31.7   350000 18.7
    Vcells 1445207 11.1    4403055 33.6  3445276 26.3
    R> colnames(x) <- "hi"
    tracemem[0x2892400 -> 0x24c1ad0]:
    tracemem[0x24c1ad0 -> 0x2c62d30]: colnames<-
    tracemem[0x2c62d30 -> 0x3033660]: dimnames<-.xts dimnames<- colnames<-
    tracemem[0x3033660 -> 0x3403f90]: dimnames<-.xts dimnames<- colnames<-
    tracemem[0x3403f90 -> 0x37d48c0]: colnames<- dimnames<-.xts dimnames<- colnames<-
    tracemem[0x37d48c0 -> 0x3033660]: dimnames<-.xts dimnames<- colnames<-
    R> gc()
              used (Mb) gc trigger (Mb) max used (Mb)
    Ncells  259696 13.9     592000 31.7   350000 18.7
    Vcells 1445750 11.1    4403055 33.6  3949359 30.2
    R> print(object.size(x), units="Mb")
    7.6 Mb

You can see that calling colnames<- causes ~4 MB of extra memory to be used ("max used (Mb)" increases by this amount). The entire xts object is ~8 MB, half of which is coredata and the other half is index. Thus, 4 MB of additional memory is used for copying coredata.

If you want to avoid the copy, you can set the dimnames attribute manually. But be careful, because you could do something that would otherwise be caught by the checks in colnames<-.xts.

    > R -q  # new R session
    R> x <- xts::.xts(1:1e6, 1:1e6)
    R> tracemem(x)
    [1] "<0x2cc5330>"
    R> gc()
              used (Mb) gc trigger (Mb) max used (Mb)
    Ncells  256397 13.7     592000 31.7   350000 18.7
    Vcells 1440915 11.0    4397699 33.6  3441761 26.3
    R> attr(x, 'dimnames') <- list(NULL, "hi")
    tracemem[0x2cc5330 -> 0x28f4a00]:
    R> gc()
              used (Mb) gc trigger (Mb) max used (Mb)
    Ncells  256403 13.7     592000 31.7   350000 18.7
    Vcells 1440916 11.0    4397699 33.6  3441761 26.3
    R> print(object.size(x), units="Mb")
    7.6 Mb
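Since setting the attribute directly bypasses whatever colnames<-.xts would check, it may be worth validating the names yourself first. The sketch below is only an illustration: the checks are hand-rolled assumptions (they are not a copy of what colnames<-.xts does), and `new_names` is a hypothetical variable. The checks are done inline rather than in a wrapper function, because modifying an argument inside a function would typically trigger a copy of its own.

```r
library(xts)

x <- xts::.xts(1:10, 1:10)
new_names <- "hi"  # hypothetical: names known only at run time

# Hand-rolled sanity checks (assumptions, not the checks xts performs):
# one non-missing character name per column.
stopifnot(is.character(new_names),
          !anyNA(new_names),
          length(new_names) == NCOL(x))

# Set the attribute directly, as in the session above.
attr(x, "dimnames") <- list(NULL, new_names)
colnames(x)
```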

I believe that there is no good alternative, but the column names are just an attribute, so they are cheap to modify and no copies will be made. (EDIT: uh-oh, I just noticed that I seem to be saying the opposite of Joshua. See the discussion in the comments. It seems that dimnames<-.xts does more than just set the attribute, and involves copying the main data, so be careful.)

You can also use cbind(), which is a synonym for merge.xts, but (AFAIK) it offers no benefit over the x$b method that you showed:

    n <- 5
    x <- merge(xts(order.by = as.Date('2015-1-1') + 1:n), a = rnorm(n))
    x$b <- rnorm(n)
    x = cbind(x, c = rnorm(n))
    colnames(x)[3] = "real name"

I also showed one way to change the column name. If you do not know that the new column is the third one, the general approach is to rename the last column:

    colnames(x)[length(colnames(x))] = "real name"
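Put together with a run-time name, the cbind-then-rename approach might look like the sketch below. It assumes the new column is always appended last, uses ncol(x) as a shorter equivalent of length(colnames(x)), and uses `tmp` as a placeholder name that must not already exist in `x`.

```r
library(xts)

n <- 5
x <- xts(matrix(rnorm(n), ncol = 1, dimnames = list(NULL, "a")),
         order.by = as.Date("2015-01-01") + 1:n)

new.col.name <- "c"  # known only at run time

# Append under a placeholder name, then rename the last column.
stopifnot(!("tmp" %in% colnames(x)))
x <- cbind(x, tmp = rnorm(n))
colnames(x)[ncol(x)] <- new.col.name
colnames(x)
```

As discussed in the other answer, the rename step goes through colnames<-, so it copies the data; for very large objects the merge-with-a-named-matrix approach avoids the separate rename.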
