Is pandas much slower than numpy?

The code below suggests that pandas can be much slower than numpy, at least in the specific case of the clip() function. Surprisingly, making the round trip from pandas to numpy and back to pandas, with the calculation done in numpy, is still much faster than doing it in pandas.

Should the pandas function be implemented using this workaround?

In [49]: arr = np.random.randn(1000, 1000)
In [50]: df = pd.DataFrame(arr)
In [51]: %timeit np.clip(arr, 0, None)
100 loops, best of 3: 8.18 ms per loop
In [52]: %timeit df.clip_lower(0)
1 loops, best of 3: 344 ms per loop
In [53]: %timeit pd.DataFrame(np.clip(df.values, 0, None))
100 loops, best of 3: 8.4 ms per loop
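For reference, the round trip described above can be wrapped in a small helper. This is only a sketch: `clip_lower_via_numpy` is a hypothetical name, not a pandas function, and it assumes every column is numeric with a common dtype. Unlike the one-liner in the timing, it passes the original index and columns back in so the labels survive the round trip.

```python
import numpy as np
import pandas as pd


def clip_lower_via_numpy(df, lower):
    """Clip values below `lower` in numpy, then rebuild the DataFrame.

    Hypothetical helper for illustration; assumes an all-numeric frame.
    """
    # df.values exposes the underlying 2-D array, so np.clip runs once over it.
    clipped = np.clip(df.values, lower, None)
    # Reattach the original labels, which a bare pd.DataFrame(...) would drop.
    return pd.DataFrame(clipped, index=df.index, columns=df.columns)


df = pd.DataFrame(np.random.randn(1000, 1000))
result = clip_lower_via_numpy(df, 0)
```

In later pandas versions `clip_lower` was deprecated and then removed in favour of `df.clip(lower=0)`, so that is the call to compare against there.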
python numpy pandas
1 answer

In master / 0.13 (to be released very soon), this is much faster (still a bit slower than native numpy because of alignment, dtype and NaN handling).

In 0.12, the clip was applied to each column separately, so it was a relatively expensive operation.

In [4]: arr = np.random.randn(1000, 1000)
In [5]: df = pd.DataFrame(arr)
In [6]: %timeit np.clip(arr, 0, None)
100 loops, best of 3: 6.62 ms per loop
In [7]: %timeit df.clip_lower(0)
100 loops, best of 3: 12.9 ms per loop
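To make the per-column versus whole-array distinction concrete, here is a rough sketch of the two patterns. It is not the actual pandas 0.12 or 0.13 internals, just an illustration of why one call per column costs more than a single vectorized call: both give the same values, but the first makes one numpy call (plus pandas overhead) for each of the columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(200, 50))

# Column-by-column: one np.clip call per column, then reassemble the frame.
per_column = pd.DataFrame(
    {name: np.clip(df[name].values, 0, None) for name in df.columns},
    index=df.index,
)

# Whole block: a single vectorized call over the underlying 2-D array.
whole_block = pd.DataFrame(
    np.clip(df.values, 0, None), index=df.index, columns=df.columns
)

# Both approaches produce identical values; only the number of calls differs.
same = np.array_equal(per_column.values, whole_block.values)
```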