How to select hourly values from a dataset?

Question

I need help on this issue:

I have a set of water level values recorded every 30 minutes, but I only need the hourly values. I tried the aggregate() function, but it requires a FUN argument, which would force me to summarize the data by mean or median, and I don't want to apply any statistical function.

Here is a sample of my data frame.

 06/16/2015 02:00:00 0.036068
 06/16/2015 02:30:00 0.008916
 06/16/2015 03:00:00 -0.008622
 06/16/2015 03:30:00 -0.014057
 06/16/2015 04:00:00 -0.011172
 06/16/2015 04:30:00 0.002401
 06/16/2015 05:00:00 0.029632
 06/16/2015 05:30:00 0.061902002
 06/16/2015 06:00:00 0.087366998
 06/16/2015 06:30:00 0.105176002
 06/16/2015 07:00:00 0.1153
 06/16/2015 07:30:00 0.126197994
 06/16/2015 08:00:00 0.144154996

+5

r dataset subset

Fernray May 01, '16 at 15:15


2 answers

 # Read the sample data: one row per reading (date, time, value)
 df <- read.table(text = '06/16/2015 02:00:00 0.036068
 06/16/2015 02:30:00 0.008916
 06/16/2015 03:00:00 -0.008622
 06/16/2015 03:30:00 -0.014057
 06/16/2015 04:00:00 -0.011172
 06/16/2015 04:30:00 0.002401
 06/16/2015 05:00:00 0.029632
 06/16/2015 05:30:00 0.061902002
 06/16/2015 06:00:00 0.087366998
 06/16/2015 06:30:00 0.105176002
 06/16/2015 07:00:00 0.1153
 06/16/2015 07:30:00 0.126197994
 06/16/2015 08:00:00 0.144154996')
 colnames(df) <- c('Date', 'Time', 'Value')
 # Keep rows whose time ends in "00:00" (minutes and seconds are zero)
 index <- substring(df$Time, 4) == "00:00"
 final_df <- df[index, ]

+3

Kunal puri May 01 '16 at 16:25


akrun · Accepted Answer · 2016-05-01T16:23:40+0000

Convert the "RefDateTimeRef" column to POSIXct, extract the minutes and seconds with format, and compare the result with "00:00". This returns a logical vector that we use to subset the rows.

 df1[format(as.POSIXct(df1[,1], format = "%m/%d/%Y %H:%M"), "%M:%S")=="00:00",]
 #     RefDateTimeRef  Data
 #10 04/14/2016 09:00 0.153
 #22 04/14/2016 08:00 0.148
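The sample data in the question also carry seconds, so the format string would need `%S` there. A minimal sketch of the same idea on a few of those rows, assuming illustrative column names `datetime` and `level` (they are not in the original post) and a fixed `tz = "UTC"`:

```r
# A few rows from the question; the column names are assumptions for illustration.
wl <- data.frame(
  datetime = c("06/16/2015 02:00:00", "06/16/2015 02:30:00",
               "06/16/2015 03:00:00", "06/16/2015 03:30:00",
               "06/16/2015 04:00:00"),
  level = c(0.036068, 0.008916, -0.008622, -0.014057, -0.011172),
  stringsAsFactors = FALSE
)

# Parse the timestamps, including seconds; a fixed time zone (assumed here)
# keeps the parsing independent of the local daylight-saving rules.
ts <- as.POSIXct(wl$datetime, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Keep only the rows that fall exactly on the hour; no values are aggregated.
hourly <- wl[format(ts, "%M:%S") == "00:00", ]
hourly
```

Because the rows are only filtered, the original readings are kept unchanged, which is what the question asks for.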

Or using lubridate

 library(lubridate)
 df1[ minute(mdy_hm(df1[,1]))==0,]
 #     RefDateTimeRef  Data
 #10 04/14/2016 09:00 0.153
 #22 04/14/2016 08:00 0.148

Or use sub to remove everything up to and including the hour part, then compare what is left with == to get the logical vector and subset the rows.

 df1[ sub(".*\\s+\\S{2}:", "", df1[,1])=="00",]

NOTE: I would not recommend using sub or substr, as string matching can sometimes lead to incorrect answers.
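One way string matching can go wrong, shown as a hedged sketch: if some timestamps lack zero padding, a fixed-position comparison looks at the wrong characters, while parsing the timestamp first does not depend on positions. The vector `x` below is hypothetical data made up for illustration, and this is only one possible failure mode, not necessarily the one the answer had in mind.

```r
# Hypothetical timestamps: the second one has no zero padding in month or hour.
x <- c("06/16/2015 02:00:00", "6/16/2015 2:00:00", "06/16/2015 02:30:00")

# Fixed-position match: assumes characters 15 to 19 always hold "MM:SS".
by_substr <- substr(x, 15, 19) == "00:00"

# Parse first, then compare minutes and seconds.
ts <- as.POSIXct(x, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
by_parse <- format(ts, "%M:%S") == "00:00"

# Compare which rows each approach would keep.
data.frame(x, by_substr, by_parse)
```

Here the fixed-position match misses the second timestamp even though it falls on the hour, whereas the parsed comparison keeps it.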
