Differentiate missing values from the main data in a plot using R

I am creating a dummy xts time series with data missing on 2015-09-02:

    library(xts)
    library(ggplot2)
    library(scales)

    set.seed(123)
    seq <- seq(as.POSIXct("2015-09-01"), as.POSIXct("2015-09-02"), by = "1 hour")
    ob1 <- xts(rnorm(length(seq), 150, 5), seq)
    seq2 <- seq(as.POSIXct("2015-09-03"), as.POSIXct("2015-09-05"), by = "1 hour")
    ob2 <- xts(rnorm(length(seq2), 170, 5), seq2)
    final_ob <- rbind(ob1, ob2)
    plot(final_ob)

    # with ggplot
    df <- data.frame(time = index(final_ob), val = coredata(final_ob))
    ggplot(df, aes(time, val)) +
      geom_line() +
      scale_x_datetime(labels = date_format("%Y-%m-%d"))

The resulting plot looks like this: [image]

The red rectangle marks the date on which data is missing. How should I show on the main chart that no data was available that day?

I think I should show the missing data in a different color, but I do not know how to process the data so that the missing period is reflected in the main chart.

1 answer

Thanks for a great reproducible example. I think you are better off omitting the line in the "missing" part altogether. A straight line there (even in a different color) suggests that data was collected over that interval and happened to fall on the line. If you omit the line over that interval, it is clear that there is no data.

The difficulty is that you want the hourly data connected by lines but no line drawn across the missing-data section, so you need some way to detect that section.

You did not give a criterion for this in your question, so, based on your example, I will say that each line on the chart should consist of data at hourly intervals; if there is a break of more than an hour, a new line should start. You will have to adapt this criterion to your specific problem. All we are doing is splitting your data frame into pieces, each of which is drawn as its own line.

So, first create a variable that indicates which "group" (i.e. line) each data point belongs to:

 df$grp <- factor(c(0, cumsum(diff(df$time) > 1))) 
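One caveat: `diff()` on `POSIXct` values returns a `difftime` whose unit is chosen automatically, so the `> 1` above means "more than one hour" only because the smallest step in this data is one hour. A minimal sketch that makes the unit explicit follows. It uses only base R, and the `time` vector is a hypothetical stand-in for `df$time`, built in UTC to sidestep daylight-saving shifts:

```r
# Hypothetical stand-in for df$time: hourly stamps with a one-day hole.
time <- c(
  seq(as.POSIXct("2015-09-01", tz = "UTC"),
      as.POSIXct("2015-09-02", tz = "UTC"), by = "1 hour"),
  seq(as.POSIXct("2015-09-03", tz = "UTC"),
      as.POSIXct("2015-09-05", tz = "UTC"), by = "1 hour")
)

# Convert the differences to hours explicitly, so the threshold does not
# depend on the unit that diff() happens to pick.
gap_hours <- as.numeric(diff(time), units = "hours")

# A step longer than one hour starts a new group; the first point is group 0.
grp <- factor(c(0, cumsum(gap_hours > 1)))

# Number of points in each group.
table(grp)
```

With real data that has irregular sampling you would probably want a looser threshold than exactly one hour.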

Then you can use the group aesthetic, which geom_line uses to split the data into separate lines:

    ggplot(df, aes(time, val)) +
      geom_line(aes(group = grp)) +  # <-- only change
      scale_x_datetime(labels = date_format("%Y-%m-%d"))
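If you also want the missing interval marked explicitly, as the red rectangle in the question suggests, one possible extension is to shade it behind the lines. This is only a sketch under assumptions, not part of the approach above: the `gaps` data frame and its `start`/`end` columns are hypothetical names, the data is a simplified stand-in for the question's `df`, and a gap is taken to run from the last point before a break to the first point after it.

```r
library(ggplot2)

set.seed(123)
# Hypothetical stand-in for the question's data frame (UTC, hourly).
time <- c(
  seq(as.POSIXct("2015-09-01", tz = "UTC"),
      as.POSIXct("2015-09-02", tz = "UTC"), by = "1 hour"),
  seq(as.POSIXct("2015-09-03", tz = "UTC"),
      as.POSIXct("2015-09-05", tz = "UTC"), by = "1 hour")
)
df <- data.frame(time = time, val = rnorm(length(time), 160, 5))

# Same grouping rule as above: a step of more than one hour starts a new line.
is_break <- as.numeric(diff(df$time), units = "hours") > 1
df$grp <- factor(c(0, cumsum(is_break)))

# One row per gap: last timestamp before the break, first timestamp after it.
breaks <- which(is_break)
gaps <- data.frame(start = df$time[breaks], end = df$time[breaks + 1])

p <- ggplot(df, aes(time, val)) +
  # Shade each gap over the full height of the panel.
  geom_rect(data = gaps,
            aes(xmin = start, xmax = end, ymin = -Inf, ymax = Inf),
            inherit.aes = FALSE, fill = "red", alpha = 0.2) +
  geom_line(aes(group = grp)) +
  scale_x_datetime(date_labels = "%Y-%m-%d")
p
```

The shaded band signals "no data here" without drawing a line through the gap, so it does not imply any values for that interval.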

