Pandas.merge inexplicably slow

The following works fine:

times1h = pandas.DatetimeIndex(start='2010-01-01', end='2014-01-01', freq='1h')
times10min = pandas.DatetimeIndex(start='2010-01-01', end='2014-01-01', freq='10T')
wind=pandas.DataFrame({'wind':0}, index=times1h)
power=pandas.DataFrame({'power':0}, index=times10min)
%timeit pandas.merge(wind, power, how='inner', left_index=True, right_index=True)

100 loops, best of 3: 5.2 ms per loop

The following is inexplicably slow. The only changes are that the timestamps of the first DataFrame are no longer unique and that they are stored in a column instead of the index:

times1h = pandas.DatetimeIndex(start='2010-01-01', end='2014-01-01', freq='1h')
times10min = pandas.DatetimeIndex(start='2010-01-01', end='2014-01-01', freq='10T')
wind=pandas.DataFrame({'time':pandas.concat([pandas.Series(times1h), pandas.Series(times1h)]), 'wind':0})
power=pandas.DataFrame({'power':0}, index=times10min)
%timeit pandas.merge(wind, power, how='inner', left_on='time', right_index=True)

1 loops, best of 3: 16.6 s per loop

Why is it so slow? Can I do anything about this?

I am trying to get a set of (x, y) points for fitting a power curve.

I use pandas 0.13.1 because it is included in WinPython :)

1 answer

As Jeff pointed out in the comments on the question, the solution is to upgrade pandas from 0.13.1 to 0.14.1.
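
To confirm which version is installed and to re-time the merge after upgrading, something like the sketch below may help. It assumes a recent pandas release, where `pandas.date_range` is used in place of the `DatetimeIndex(start=..., end=..., freq=...)` constructor from the question. The helper name `build_frames`, the shortened date range, and the `ignore_index=True` argument are illustrative choices, not part of the original post.

```python
import time

import pandas as pd


def build_frames(start="2010-01-01", end="2010-03-01"):
    """Rebuild the two frames from the question (hypothetical helper, shorter range)."""
    times1h = pd.date_range(start=start, end=end, freq="60min")
    times10min = pd.date_range(start=start, end=end, freq="10min")
    # Duplicate the hourly timestamps and keep them in a column, as in the slow case.
    doubled = pd.concat([pd.Series(times1h), pd.Series(times1h)], ignore_index=True)
    wind = pd.DataFrame({"time": doubled, "wind": 0})
    power = pd.DataFrame({"power": 0}, index=times10min)
    return wind, power


print("pandas version:", pd.__version__)

wind, power = build_frames()
t0 = time.perf_counter()
merged = pd.merge(wind, power, how="inner", left_on="time", right_index=True)
elapsed = time.perf_counter() - t0
print("rows: %d, merge took %.4f s" % (len(merged), elapsed))
```

Outside IPython, `time.perf_counter` stands in for `%timeit` here; inside IPython the original `%timeit` line can be used on the same `pd.merge` call.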
