Unexpected result with .ix indexing: list vs. range

Can someone explain this behavior to me?

    import numpy as np
    import pandas as pd

    dates = pd.date_range('1/1/2000', periods=8)
    df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])

    df.ix['2000-01-01':'2000-01-02', ['A', 'C']]
    ## Output:
                       A         C
    2000-01-01  0.224944 -0.689382
    2000-01-02 -0.824735 -0.805512

    df.ix[['2000-01-01', '2000-01-02'], ['A', 'C']]
    ## Output:
                 A   C
    2000-01-01 NaN NaN
    2000-01-02 NaN NaN

I expected both indexing operations to return the same (first) result.

Then I found that this works:

    from datetime import datetime

    df.loc[[datetime(2000, 1, 1), datetime(2000, 1, 5)], ['A','C']]
    ## Output:
                       A         C
    2000-01-01  0.224944 -0.689382
    2000-01-05 -0.393747  0.462126

I do not know the internals of pandas, or why it implicitly converts strings to dates when given a range but not when given a list. My guess is that a range makes it clear we mean something with an ordinal nature, so pandas perhaps checks the index, sees that it holds dates, and parses the strings as dates.

But then the question arises: why does it do the right thing when we supply a single string?

    df.loc['2000-01-01', ['A','C']]
    ## Output:
    A    0.224944
    C   -0.689382
    Name: 2000-01-01 00:00:00, dtype: float64

Is this a performance issue, i.e. avoiding the cost of converting multiple values when a list is supplied? Or some other design decision?

1 answer

Accessing a DatetimeIndex with strings is kind of a hack (it is there because R does it, but it is easy to find edge cases like this one). That is:

  • It works for slices.
  • It works for single-label access.
  • It may work for some other cases, but I would not count on it.
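The two cases listed as working can be checked against their explicit Timestamp equivalents. The following is a minimal sketch, assuming a recent pandas release in which `.ix` is no longer available (so `.loc` is used instead); the variable names are illustrative only. The list-of-strings case is deliberately left out, since its behavior is the one in question here.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2000-01-01", periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=["A", "B", "C", "D"])

# Slice: string bounds versus explicit Timestamp bounds.
by_str_slice = df.loc["2000-01-01":"2000-01-02", ["A", "C"]]
by_ts_slice = df.loc[pd.Timestamp("2000-01-01"):pd.Timestamp("2000-01-02"), ["A", "C"]]

# Single label: one string versus one Timestamp.
by_str_single = df.loc["2000-01-01", ["A", "C"]]
by_ts_single = df.loc[pd.Timestamp("2000-01-01"), ["A", "C"]]

# Compare the string-based results with the Timestamp-based ones.
print(by_str_slice.equals(by_ts_slice))
print(by_str_single.equals(by_ts_single))
```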

It is much better to use Timestamps instead of strings:

    In [11]: df.ix[pd.Timestamp('2000-01-01'), ['A','C']]
    Out[11]:
    A    0.480959
    C    0.468689
    Name: 2000-01-01 00:00:00, dtype: float64

    In [12]: df.ix[pd.Timestamp('2000-01-01'):pd.Timestamp('2000-01-02'), ['A','C']]
    Out[12]:
                       A         C
    2000-01-01  0.480959  0.468689
    2000-01-02 -0.971965 -0.840954

    In [13]: df.ix[[pd.Timestamp('2000-01-01'), pd.Timestamp('2000-01-02')], ['A', 'C']]
    Out[13]:
                       A         C
    2000-01-01  0.480959  0.468689
    2000-01-02 -0.971965 -0.840954

    In [14]: df.ix[pd.to_datetime(['2000-01-01', '2000-01-02']), ['A', 'C']]
    Out[14]:
                       A         C
    2000-01-01  0.480959  0.468689
    2000-01-02 -0.971965 -0.840954

As your own example already shows, it is a little cleaner (although there is no ambiguity in this case) to use .loc rather than .ix.
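Putting the two recommendations together (explicit timestamps and `.loc`), one possible pattern is to parse a list of date strings up front and only then index. This is a sketch under assumptions: `select_dates` is a hypothetical helper name, not part of pandas, and the data is random as in the question.

```python
import numpy as np
import pandas as pd


def select_dates(frame, labels, columns):
    """Hypothetical helper: parse the labels explicitly, then index with .loc."""
    # pd.to_datetime turns the list of strings into a DatetimeIndex,
    # so the lookup does not rely on implicit string handling.
    stamps = pd.to_datetime(list(labels))
    return frame.loc[stamps, columns]


dates = pd.date_range("2000-01-01", periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=["A", "B", "C", "D"])

subset = select_dates(df, ["2000-01-01", "2000-01-05"], ["A", "C"])
print(subset)
```

Doing the conversion in one place keeps the indexing call itself free of strings, which is the point the answer makes.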
