Pandas DatetimeIndex by MongoDB ISODate

Question

Pandas DatetimeIndex by MongoDB ISODate

I find it difficult to work with time / time zones. I have form JSON source data

{ "Date": "28 Sep 2009 00:00:00", .... }

This data is then loaded into MongoDB, and this string representation of the date is converted to a JavaScript Date object. This conversion to UTC results in the following date

 { "_id": ObjectId("577a788f4439e17afd4e21f7"), "Date": ISODate("2009-09-27T23:00:00Z") }

It “looks” as if the date had really been moved forward to the day, I suppose (possibly wrong) that this is due to the fact that my car is set to Irish standard time .

Then I read this data from MongoDB and used it to create pandas DatetimeIndex

 idx = pd.DatetimeIndex([x['Date'] for x in test_docs], freq='D')

which gives me

which is incorrect, since the time was not correctly converted from UTC to local time. So I implemented the solution given in this answer

 idx = pd.DatetimeIndex([x['Date'] for x in test_docs], freq='D') idx = idx.tz_localize(tz=tz.tzutc()) idx = idx.tz_convert(tz=tz.tzlocal()) frame = DataFrame(test_docs, index=idx) frame = frame.drop('Date', 1)

which gives me the right day ago

Then I normalize DatetimeIndex, so the clock is deleted, which allows me to group all the records per day.

 frame.groupby(idx).sum()

At this point, however, something strange is happening. Dates are ultimately grouped as follows

but this does not reflect the dates in the frame

Can anyone shed some light on where I can go wrong?

Reply to @ptrj

Explicitly use my timezone as a string

 idx = pd.DatetimeIndex([x['Date'] for x in test_docs], freq='D') idx = idx.tz_localize(tz=tz.tzutc()) idx = idx.tz_convert(tz='Europe/Dublin') idx = idx.normalize() frame = DataFrame(test_docs, index=idx) ... ... aggregate = frame.groupby(idx).sum() aggregate.plot()

this does not work for me, it leads to the following schedule

For some reason, groupby is not grouping correctly in 2014, as shown below.

If instead I use

 idx = idx.tz_convert(tz.gettz('Europe/Dublin'))

I get the same problem

Convert to Object

 idx = pd.DatetimeIndex([x['Date'] for x in test_docs], freq='D') idx = idx.tz_localize(tz=tz.tzutc()) idx = idx.tz_convert(tz=tz.tzlocal()) idx = idx.normalize() frame = DataFrame(test_docs, index=idx) aggregate = frame.groupby(idx.astype(object)).sum()

This approach works right for me

+6

python pandas datetime time mongodb

Philip o'brien Jul 05 '16 at 10:57

source share

2 answers

I managed to get around this now by changing my groupby to the next

 frame.groupby([pd.DatetimeIndex([x.date() for x in frame.index])]).sum()

so when i initially tried groupby

 idx = pd.DatetimeIndex([x['Date'] for x in test_docs], freq='D') idx = idx.tz_localize(tz=tz.tzutc()) idx = idx.tz_convert(tz=tz.tzlocal()) frame.groupby(idx).sum()

Now I call the date method for each index element before performing the groupby operation.

I send this as an answer if no one answers, but I hope someone answers and explains what happens, because my "decision" seems too hoarse for my tastes.

0

Philip o'brien Jul 08 '16 at 11:00

source share

ptrj · Accepted Answer · 2016-07-10T20:31:22+0000

I was able to reproduce the error with the following data:

 idx0 = pd.date_range('2011-11-11', periods=4) idx1 = idx0.tz_localize(tz.tzutc()) idx2 = idx1.tz_convert(tz.tzlocal()) df = pd.DataFrame([1, 2, 3, 4]) df.groupby(idx2).sum() Out[20]: 0 1970-01-01 00:00:00-05:00 9 2011-11-10 19:00:00-05:00 1

This is a bug in pandas code, tz.tzlocal() to tz.tzlocal() . This also appears in:

 idx2.tz_localize(None) Out[27]: DatetimeIndex(['2011-11-10 19:00:00', '1970-01-01 00:00:00', '1970-01-01 00:00:00', '1970-01-01 00:00:00'], dtype='datetime64[ns]', freq='D')

You can use any of the following solutions:

use explicit timezone as a string:

 idx2 = idx1.tz_convert(tz='Europe/Dublin') df.groupby(idx2).sum() Out[29]: 0 2011-11-11 00:00:00+00:00 1 2011-11-12 00:00:00+00:00 2 2011-11-13 00:00:00+00:00 3 2011-11-14 00:00:00+00:00 4

or if it does not work:

 idx2 = idx1.tz_convert(tz.gettz('Europe/Dublin'))

convert it to an object:

 df.groupby(idx2.astype(object)).sum() Out[32]: 0 2011-11-10 19:00:00-05:00 1 2011-11-11 19:00:00-05:00 2 2011-11-12 19:00:00-05:00 3 2011-11-13 19:00:00-05:00 4

Basically, converting to something other than DatetimeIndex with tz=tz.local() should work.

EDIT: This bug was only fixed in pandas github. The fix will be available in pandas version 0.19.

Pandas DatetimeIndex by MongoDB ISODate

Reply to @ptrj

Explicitly use my timezone as a string

Convert to Object

More articles: