I am trying to get daily average values from an irregular time series from a csv file.
Data in the csv file starts at 13:00 on September 20, 2013 and lasts until 10:57 on January 14, 2014:
Time Values
20/09/2013 13:00 5.133540
20/09/2013 13:01 5.144993
20/09/2013 13:02 5.158208
20/09/2013 13:03 5.170542
20/09/2013 13:04 5.167899 20/09/2013 13:25 5.168780
20/09/2013 13:26 5.179351
...
I import them using
import pandas as pd
data = pd.read_csv('<file name>', parse_dates={'Timestamp':'Time']},index_col='Timestamp')
The result is
Values
Timestamp
2013-09-20 13:00:00 5.133540
2013-09-20 13:01:00 5.144993
2013-09-20 13:02:00 5.158208
2013-09-20 13:03:00 5.170542
2013-09-20 13:04:00 5.167899
2013-09-20 13:25:00 5.168780
2013-09-20 13:26:00 5.179351
...
And then I do
dataDailyAv = data.resample('D', how = 'mean')
The result is
Values
Timestamp
2013-01-10 8.623744
2013-01-11 NaN
2013-01-12 NaN
2013-01-13 NaN
2013-01-14 NaN
...
In other words, the result contains dates that are not displayed in the source data, and for some of these dates (for example, January 10, 2013), a value even appears.
Any ideas on what's going wrong?
Thanks.
Edit: Apparently, something went wrong with parsing the date: 10/01/2013 is interpreted as January 10, 2013 instead of October 1, 2013. This can be solved by editing the date format in the csv file, but is there a way to specify the date format in read_csv?