How to create "NA" for missing data in a time series

I have several data files that look like this:

    X code year month day  pp
    1 4515 1953     6   1 0.0
    2 4515 1953     6   2 0.0
    3 4515 1953     6   3 0.0
    4 4515 1953     6   4 0.0
    5 4515 1953     6   5 3.5

Sometimes data is missing, but there is no NA: the rows simply don't exist. I need to insert NA wherever data is missing. I thought I could start by detecting when this happens, by converting the data to a zoo object and checking for strict regularity (I had never used zoo before). I used the following code:

    z.date <- paste(CET$year, CET$month, CET$day, sep="/")
    z <- read.zoo(CET, order.by = z.date)
    reg <- is.regular(z, strict = TRUE)

But the result is always TRUE!

Can someone tell me why this isn't working? Or, better yet, show me a way to insert NA where data is missing (with or without zoo)?

thanks

r time-series missing-data
4 answers

The seq function has some useful features that make it easy to generate a complete sequence of dates. For example, the following code generates a sequence of 15 dates starting on April 25th:

Edit: This function is documented in ?seq.Date

    start = as.Date("2011/04/25")
    full <- seq(start, by='1 day', length=15)
    full
     [1] "2011-04-25" "2011-04-26" "2011-04-27" "2011-04-28" "2011-04-29"
     [6] "2011-04-30" "2011-05-01" "2011-05-02" "2011-05-03" "2011-05-04"
    [11] "2011-05-05" "2011-05-06" "2011-05-07" "2011-05-08" "2011-05-09"

Now use the same principle to generate some data with "missing" rows, by generating a sequence for every second day:

    partial <- data.frame(
      date = seq(start, by='2 day', length=6),
      value = 1:6
    )
    partial
            date value
    1 2011-04-25     1
    2 2011-04-27     2
    3 2011-04-29     3
    4 2011-05-01     4
    5 2011-05-03     5
    6 2011-05-05     6

To answer your question, you can use vector subscripting with the match function to create a dataset with NAs:

    with(partial, value[match(full, date)])
     [1]  1 NA  2 NA  3 NA  4 NA  5 NA  6 NA NA NA NA

To combine this result with the full date sequence:

    data.frame(Date=full, value=with(partial, value[match(full, date)]))
             Date value
    1  2011-04-25     1
    2  2011-04-26    NA
    3  2011-04-27     2
    4  2011-04-28    NA
    5  2011-04-29     3
    6  2011-04-30    NA
    7  2011-05-01     4
    8  2011-05-02    NA
    9  2011-05-03     5
    10 2011-05-04    NA
    11 2011-05-05     6
    12 2011-05-06    NA
    13 2011-05-07    NA
    14 2011-05-08    NA
    15 2011-05-09    NA
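Applied to the layout of the data in the question, the same match() idea might look like the sketch below. It assumes the column names from the question's sample (year, month, day, code, pp); the sample rows are retyped with the third day dropped to simulate a missing row, and the helper name fill_missing_days is hypothetical.

```r
# Sample data from the question, with the 1953-06-03 row removed
CET <- data.frame(X = c(1, 2, 4, 5), code = 4515, year = 1953, month = 6,
                  day = c(1, 2, 4, 5), pp = c(0, 0, 0, 3.5))

# Hypothetical helper: one row per day between the first and last observed
# dates, with NA wherever the original data had no row for that day
fill_missing_days <- function(df) {
  date <- as.Date(with(df, paste(year, month, day, sep = "-")))
  full <- seq(min(date), max(date), by = "1 day")
  idx  <- match(full, date)  # NA for days with no matching row
  data.frame(date = full, code = df$code[idx], pp = df$pp[idx])
}

filled <- fill_missing_days(CET)
filled
```

The day range is taken from the data itself (min to max); if the expected period is known in advance, the from/to dates of seq() can be set explicitly instead.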

In the zoo package, "regular" means that the series is equally spaced, except possibly for some missing entries. The zooreg class in zoo is specifically designed for this type of series. Note that the set of regular series contains all equally spaced series, but is strictly larger.

The is.regular function checks whether the given series is regular, that is, whether it could be made equally spaced by inserting NAs for the missing entries. With strict = TRUE, it instead checks whether the series is already equally spaced.
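A small sketch of the distinction, assuming a daily zoo series built from the question's pp values with one day (1953-06-03) missing: the series is regular but not strictly regular.

```r
library(zoo)

# Four daily observations; 1953-06-03 is missing
d <- as.Date("1953-06-01") + c(0, 1, 3, 4)
z <- zoo(c(0, 0, 0, 3.5), order.by = d)

is.regular(z)                 # TRUE: equally spaced once the gap is filled
is.regular(z, strict = TRUE)  # FALSE: the spacing is 1, 2, 1 days
```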

Regarding your last question, it's a FAQ. See question #13 in the zoo FAQ, which is available from the zoo CRAN page or within R via:

 vignette("zoo-faq") 

FAQ #13 also includes illustrative code.
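For the fill-in itself, one common zoo idiom (a sketch, not necessarily the FAQ's exact code) is to merge the series with a zero-width zoo object built on the complete date sequence; merge() keeps the union of the indexes, so every date the original series lacks gets NA. The example series is the same assumed gap-containing daily series as above.

```r
library(zoo)

d <- as.Date("1953-06-01") + c(0, 1, 3, 4)
z <- zoo(c(0, 0, 0, 3.5), order.by = d)

# Zero-width zoo object that carries only the complete daily index
g <- zoo(, seq(start(z), end(z), by = "day"))

# The union of both indexes; the missing day becomes NA
zfull <- merge(z, g)
zfull
is.regular(zfull, strict = TRUE)  # now equally spaced
```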


First of all, note that z.date is a character vector, not a Date.
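To illustrate, and to locate the gaps before filling them, here is a sketch assuming the question's column names: converting the pasted strings with as.Date() gives a real Date vector, and diff() on it shows where consecutive days are missing. The sample rows are retyped from the question with the third day removed.

```r
# Sample data from the question, with the 1953-06-03 row removed
CET <- data.frame(X = c(1, 2, 4, 5), code = 4515, year = 1953, month = 6,
                  day = c(1, 2, 4, 5), pp = c(0, 0, 0, 3.5))

z.date <- paste(CET$year, CET$month, CET$day, sep = "/")
class(z.date)                      # "character", e.g. "1953/6/1"

d <- as.Date(z.date, format = "%Y/%m/%d")
class(d)                           # "Date"

gaps <- diff(d)                    # difference in days between consecutive rows
d[which(gaps > 1)]                 # last date before each gap: "1953-06-02"
```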

Here's how I would solve your problem using xts (a subclass of zoo).

    library(xts)
    # remove the third obs from sample data
    CET <- CET[-3,]
    # create an actual Date column in CET
    CET$date <- as.Date(with(CET, paste(year, month, day, sep="-")))
    # create an xts object using 'date' column
    x <- xts(CET[,c("code","pp")], CET$date)
    # now merge 'x' with a regular date sequence spanning the start/end of 'x'
    X <- merge(x, timeBasedSeq(paste(start(x), end(x), sep="::")))
    X
    #            code  pp
    # 1953-06-01 4515 0.0
    # 1953-06-02 4515 0.0
    # 1953-06-03   NA  NA
    # 1953-06-04 4515 0.0
    # 1953-06-05 4515 3.5
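A self-contained variant of the same approach, assuming the question's sample data retyped with the third day already missing. It swaps timeBasedSeq() for base seq() and merges with a zero-width xts object, which is another common way to supply the complete index.

```r
library(xts)

CET <- data.frame(X = c(1, 2, 4, 5), code = 4515, year = 1953, month = 6,
                  day = c(1, 2, 4, 5), pp = c(0, 0, 0, 3.5))
CET$date <- as.Date(with(CET, paste(year, month, day, sep = "-")))

x <- xts(CET[, c("code", "pp")], order.by = CET$date)

# Zero-width xts holding every day between the first and last observation
all_days <- xts(, order.by = seq(start(x), end(x), by = "day"))

X <- merge(x, all_days)  # missing days appear as rows of NA
X
```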

I had to deal with a similar problem with a monthly time series. I solved it by directly joining the two data.tables/data.frames on a time variable. A time series is also a kind of dataset, so you can manipulate it like any regular dataset. Here is my solution:

    library(zoo)
    library(data.table)
    someLength <- 102  # number of months from Jan 2008 to Jun 2016, as in the output below
    # the full time horizon that you want to have
    (full <- data.table(yrAndMo = as.yearmon(seq(as.Date('2008-01-01'), by = '1 month', length = someLength))))
    #       yrAndMo
    #   1: Jan 2008
    #   2: Feb 2008
    #   3: Mar 2008
    #   4: Apr 2008
    #   5: May 2008
    #  ---
    #  98: Feb 2016
    #  99: Mar 2016
    # 100: Apr 2016
    # 101: May 2016
    # 102: Jun 2016
    exampleDat  # the actual data you want to append to the full time horizon
    #    yrAndMo someValue
    # 1 Mar 2010      7500
    # 2 Jun 2010      1115
    # 3 Mar 2011      2726
    # 4 Apr 2011      1865
    # 5 Nov 2011      1695
    # 6 Dec 2012     10000
    # 7 Mar 2016      1000
    library(plyr)
    # left join: keep every month in 'full', NA where exampleDat has no row
    join(full, exampleDat, by = 'yrAndMo', type = "left")
    #       yrAndMo someValue
    #   1: Jan 2008        NA
    #   2: Feb 2008        NA
    #   3: Mar 2008        NA
    #   4: Apr 2008        NA
    #   5: May 2008        NA
    #  ---
    #  98: Feb 2016        NA
    #  99: Mar 2016      1000
    # 100: Apr 2016        NA
    # 101: May 2016        NA
    # 102: Jun 2016        NA

After that, you can easily convert the dataset to whatever time-series class you want. I prefer read.zoo.
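The plyr join above is a left join; base R's merge() with all.x = TRUE does the same without extra packages. A minimal sketch, assuming a shortened one-year horizon (2010) and two of the example values shown above; zoo is still used for as.yearmon.

```r
library(zoo)  # for as.yearmon and zoo

# Full monthly horizon (shortened to 2010 for illustration)
full <- data.frame(yrAndMo = as.yearmon(seq(as.Date("2010-01-01"),
                                            by = "1 month", length.out = 12)))

# Observed months only
exampleDat <- data.frame(
  yrAndMo   = as.yearmon(as.Date(c("2010-03-01", "2010-06-01"))),
  someValue = c(7500, 1115)
)

# Left join: keep every month in 'full'; unmatched months get NA.
# merge() sorts the result by the key column by default.
res <- merge(full, exampleDat, by = "yrAndMo", all.x = TRUE)
res

# Then convert to a time series class, e.g. zoo
z <- zoo(res$someValue, order.by = res$yrAndMo)
```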

