How to remove a row from a zoo / xts object given a timestamp

I was happily working with this code:

    z = lapply(filename_list, function(fname) {
        read.zoo(file = fname, header = TRUE, sep = ",", tz = "")
    })
    xts(do.call(rbind, z))

until this dirty data appeared at the end of one file:

                            Open     High      Low    Close Volume
    2011-09-20 21:00:00 1.370105 1.370105 1.370105 1.370105      1

and this is at the beginning of the following file:

                            Open     High      Low    Close Volume
    2011-09-20 21:00:00 1.370105 1.371045 1.369685   1.3702   2230

So rbind.zoo complains about the duplicate.

I cannot use something like:

  y <- x[ ! duplicated( index(x) ), ] 

since the duplicates are in different zoo objects inside the list. And I cannot use aggregate, as suggested here, because it is a list of zoo objects, not one large zoo object. And I cannot get one large object because of the duplicates. Catch-22.

So, when the going gets tough, the tough hack together some for loops (excuse the fingerprints all over the code, and the stop() at the end, as this still doesn't work):

    indexes <- do.call("c", unname(lapply(z, index)))
    dups = duplicated(indexes)
    if (any(dups)) {
        duplicate_timestamps = indexes[dups]
        for (tix in 1:length(duplicate_timestamps)) {
            t = duplicate_timestamps[tix]
            print("We have a duplicate:"); print(t)
            for (zix in 1:length(z)) {
                if (t %in% index(z[[zix]])) {
                    print(z[[zix]][t])
                    if (z[[zix]][t]$Volume == 1) {
                        print("-->Deleting this one")
                        z[[zix]][t] = NULL   # <-- PROBLEM
                    }
                }
            }
        }
        stop("There are duplicate bars!!")
    }

The line marked PROBLEM is the sticking point: assigning NULL to a zoo row does not delete it (Error in NextMethod("[<-"): replacement has length zero). OK, so I'll make a filtered copy instead, without the offending element... but I can't work out how:

    > z[[zix]][!t, ]
    Error in Ops.POSIXt(t) : unary '!' not defined for "POSIXt" objects
    > z[[zix]][-t, ]
    Error in `-.POSIXt`(t) : unary '-' is not defined for "POSIXt" objects
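For the record, the working idiom is to subset with a logical vector built from the index, rather than applying `!` or `-` to a POSIXct value directly. A minimal sketch with toy data (the timestamps and values are invented for illustration):

```r
library(zoo)

# Toy series with three hourly bars (illustrative timestamps)
idx <- as.POSIXct("2011-09-20 19:00:00", tz = "UTC") + (0:2) * 3600
z1  <- zoo(c(10, 20, 30), order.by = idx)

t  <- idx[3]                  # the timestamp we want to remove
z1 <- z1[index(z1) != t]      # keep every row whose time differs from t

length(z1)                    # 2 rows remain
```

The comparison `index(z1) != t` yields a logical vector, which zoo's `[` method accepts, so no POSIXt arithmetic is needed.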

P.S. While high-level solutions to my real problem of duplicate rows across a list of zoo objects are very welcome, this question is specifically about how to delete a row from a zoo object given a POSIXt index value.


A small bit of test data:

    list(structure(c(1.36864, 1.367045, 1.370105, 1.36928, 1.37039, 1.370105,
             1.36604, 1.36676, 1.370105, 1.367065, 1.37009, 1.370105,
             5498, 3244, 1), .Dim = c(3L, 5L),
           .Dimnames = list(NULL, c("Open", "High", "Low", "Close", "Volume")),
           index = structure(c(1316512800, 1316516400, 1316520000),
             class = c("POSIXct", "POSIXt"), tzone = ""), class = "zoo"),
         structure(c(1.370105, 1.370115, 1.36913, 1.371045, 1.37023, 1.37075,
             1.369685, 1.36847, 1.367885, 1.3702, 1.36917, 1.37061,
             2230, 2909, 2782), .Dim = c(3L, 5L),
           .Dimnames = list(NULL, c("Open", "High", "Low", "Close", "Volume")),
           index = structure(c(1316520000, 1316523600, 1316527200),
             class = c("POSIXct", "POSIXt"), tzone = ""), class = "zoo"))

UPDATE: Thanks to G. Grothendieck for the row-deletion solution. In the actual code, I followed Joshua's and GSee's advice to get a list of xts objects instead of a list of zoo objects. So my code became:

    z = lapply(filename_list, function(fname) {
        xts(read.zoo(file = fname, header = TRUE, sep = ",", tz = ""))
    })
    x = do.call.rbind(z)

(Note the do.call.rbind call: it is used because rbind.xts has serious memory problems when combining a long list. See https://stackoverflow.com/a/364660/ ...)
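For intuition, here is a hedged base-R sketch of the pairwise, divide-and-conquer idea behind that approach; the actual do.call.rbind from the linked answer may be implemented differently:

```r
# Hedged sketch: combine the list in halves, so no single rbind() call
# appends one tiny piece onto one huge accumulated object.
rbind_halves <- function(lst) {
  while (length(lst) > 1) {
    odd <- seq(1, length(lst), by = 2)
    lst <- lapply(odd, function(i) {
      # Pair element i with i+1 when it exists; carry the odd one forward.
      if (i < length(lst)) rbind(lst[[i]], lst[[i + 1]]) else lst[[i]]
    })
  }
  lst[[1]]
}

# Works on anything rbind-able, e.g. plain matrices:
mats <- lapply(1:5, function(k) matrix(k, nrow = 1, ncol = 2))
res  <- rbind_halves(mats)
```

Each pass halves the list, so the cost of repeatedly copying one ever-growing object is avoided.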

Then, as a post-processing step, I delete the duplicates:

    dups = duplicated(index(x))
    if (any(dups)) {
        duplicate_timestamps = index(x)[dups]
        to_delete = x[(index(x) %in% duplicate_timestamps) & x$Volume <= 1]
        if (nrow(to_delete) > 0) {
            # Next line keeps all rows that are not at a duplicate timestamp,
            # OR are at a duplicate timestamp but have a volume greater than 1.
            print("Will delete the volume=1 entry")
            x = x[!(index(x) %in% duplicate_timestamps) | x$Volume > 1]
        } else {
            stop("Duplicate timestamps, and we cannot easily remove them just based on low volume.")
        }
    }
3 answers

If z1 and z2 are your zoo objects, then this rbind drops any rows of z2 whose times duplicate times already in z1:

 rbind( z1, z2[ ! time(z2) %in% time(z1) ] ) 

Regarding removing points from a zoo object at given times, that is already illustrated above, but in general, if tt is a vector of times to remove:

 z[ ! time(z) %in% tt ] 

or, if we know tt has a single element, then z[ time(z) != tt ] .
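For instance, applied to two toy zoo series that share one timestamp (the names z1/z2 and the times are illustrative, not the question's data):

```r
library(zoo)

# Two series whose endpoints overlap: z2's first time equals z1's last
idx1 <- as.POSIXct("2011-09-20 19:00:00", tz = "UTC") + (0:2) * 3600
idx2 <- idx1[3] + (0:2) * 3600
z1 <- zoo(1:3, order.by = idx1)
z2 <- zoo(4:6, order.by = idx2)

# Drop z2's rows at times already present in z1, then combine
combined <- rbind(z1, z2[!time(z2) %in% time(z1)])
length(combined)   # 5 bars; the overlapping bar from z2 was dropped
```

Note this keeps z1's version of the duplicated bar; swap the roles of z1 and z2 if the later file's bar is the one to keep.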


rbind.xts will allow duplicate index values, so it can work if you convert to xts first.

    x <- lapply(z, as.xts)
    y <- do.call(rbind, x)
    # keep last value of any duplicates
    y <- y[!duplicated(index(y), fromLast = TRUE), ]
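The fromLast = TRUE is what makes the last occurrence win: duplicated() then flags the earlier entries as the duplicates rather than the later ones. The same logic on a plain vector:

```r
idx  <- c(1, 2, 3, 3, 4)                  # the value 3 appears twice
keep <- !duplicated(idx, fromLast = TRUE) # scan from the end, flag earlier dup
keep                                      # TRUE TRUE FALSE TRUE TRUE
```

The third element (the earlier 3) is the one dropped, so subsetting with keep retains the last value at each index.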

I think you'll have better luck if you convert to xts .

    a <- structure(c(1.370105, 1.370105, 1.370105, 1.370105, 1), .Dim = c(1L, 5L),
        index = structure(1316570400, tzone = "", tclass = c("POSIXct", "POSIXt")),
        .indexCLASS = c("POSIXct", "POSIXt"), tclass = c("POSIXct", "POSIXt"),
        .indexTZ = "", tzone = "",
        .Dimnames = list(NULL, c("Open", "High", "Low", "Close", "Volume")),
        class = c("xts", "zoo"))
    b <- structure(c(1.370105, 1.371045, 1.369685, 1.3702, 2230), .Dim = c(1L, 5L),
        index = structure(1316570400, tzone = "", tclass = c("POSIXct", "POSIXt")),
        .indexCLASS = c("POSIXct", "POSIXt"), tclass = c("POSIXct", "POSIXt"),
        .indexTZ = "", tzone = "",
        .Dimnames = list(NULL, c("Open", "High", "Low", "Close", "Volume")),
        class = c("xts", "zoo"))
    (comb <- rbind(a, b))
    #                         Open     High      Low    Close Volume
    # 2011-09-20 21:00:00 1.370105 1.370105 1.370105 1.370105      1
    # 2011-09-20 21:00:00 1.370105 1.371045 1.369685 1.370200   2230
    dupidx <- index(comb)[duplicated(index(comb))]  # indexes of duplicates
    tail(comb[dupidx], 1)                           # last duplicate
    # now rbind the last duplicated row with all non-duplicated data
    rbind(comb[!index(comb) %in% dupidx], tail(comb[dupidx], 1))
