For performance, you might be better off working with a NumPy array and using np.where -
    a = df.values  # columns A and B as a 2D NumPy array
    df['C'] = np.where(a[:,1]>5,a[:,0],0.1*a[:,0]*a[:,1])
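As a minimal sketch of the approach above, here is the same expression on a tiny frame whose values are made up purely for illustration, so that both branches of np.where are exercised -

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not from the question); B crosses the threshold of 5.
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [6, 2, 9, 5]})

a = df.values  # 2D integer array: column 0 is A, column 1 is B

# Where B > 5 keep A, otherwise use 0.1 * A * B.
df["C"] = np.where(a[:, 1] > 5, a[:, 0], 0.1 * a[:, 0] * a[:, 1])

print(df)
```

Because one branch is a float array, np.where returns a float64 result, so column C is float even in the rows where A is kept unchanged.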
Runtime test
    def numpy_based(df):
        a = df.values  # columns A and B as a 2D NumPy array
        df['C'] = np.where(a[:,1]>5,a[:,0],0.1*a[:,0]*a[:,1])

Timings -
    In [271]: df = pd.DataFrame(np.random.randint(0,9,(10000,2)),columns=['A','B'])

    In [272]: %timeit numpy_based(df)
    1000 loops, best of 3: 380 µs per loop

    In [273]: df = pd.DataFrame(np.random.randint(0,9,(10000,2)),columns=['A','B'])

    In [274]: %timeit df['C'] = df.A.where(df.B.gt(5), df[['A', 'B']].prod(1).mul(.1))
    100 loops, best of 3: 3.39 ms per loop

    In [275]: df = pd.DataFrame(np.random.randint(0,9,(10000,2)),columns=['A','B'])

    In [276]: %timeit df['C'] = np.where(df['B'] > 5, df['A'], 0.1 * df['A'] * df['B'])
    1000 loops, best of 3: 1.12 ms per loop

    In [277]: df = pd.DataFrame(np.random.randint(0,9,(10000,2)),columns=['A','B'])

    In [278]: %timeit df['C'] = np.where(df.B > 5, df.A, df.A.mul(df.B).mul(.1))
    1000 loops, best of 3: 1.19 ms per loop
Closer look
Let's take a closer look at NumPy's number-crunching capability and compare it with pandas in the mix -
Case #1: Working with a NumPy array and using numpy.where -
    In [292]: %timeit np.where(a[:,1]>5,a[:,0],0.1*a[:,0]*a[:,1])
    10000 loops, best of 3: 86.5 µs per loop
Again, assigning the result to a new column, df['C'], is not very expensive either -
    In [300]: %timeit df['C'] = np.where(a[:,1]>5,a[:,0],0.1*a[:,0]*a[:,1])
    1000 loops, best of 3: 323 µs per loop
Case #2: Working with a pandas DataFrame and using its .where method (no NumPy) -
    In [293]: %timeit df.A.where(df.B.gt(5), df[['A', 'B']].prod(1).mul(.1))
    100 loops, best of 3: 3.4 ms per loop
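The argument order of the pandas .where method used in this case differs from that of numpy.where, which is easy to trip over. The following is a small sketch with made-up values (s, cond and other are hypothetical names) showing that both calls pick the same elements -

```python
import numpy as np
import pandas as pd

# Hypothetical values, chosen only to make the selection visible.
s = pd.Series([10, 20, 30])
cond = pd.Series([True, False, True])
other = pd.Series([-1.0, -2.0, -3.0])

# Series.where keeps the caller's values where cond is True, else takes other.
kept = s.where(cond, other)

# np.where(cond, x, y) takes x where cond is True, else y.
same = np.where(cond, s, other)

print(kept.tolist())
print(same.tolist())
```

Here s.where(cond, other) plays the role of np.where(cond, s, other), which is why df.A.where(df.B.gt(5), ...) matches the NumPy versions in the other cases.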
Case #3: Working with a pandas DataFrame (no NumPy array), but using numpy.where -
    In [294]: %timeit np.where(df['B'] > 5, df['A'], 0.1 * df['A'] * df['B'])
    1000 loops, best of 3: 764 µs per loop
Case #4: Working again with a pandas DataFrame (no NumPy array), but using numpy.where with attribute access and .mul -
    In [295]: %timeit np.where(df.B > 5, df.A, df.A.mul(df.B).mul(.1))
    1000 loops, best of 3: 830 µs per loop
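To rerun the four cases outside IPython, something like the script below could be used. It is only a sketch: the function names case1 to case4, the fixed seed and the repeat counts are my own choices, and the absolute numbers will differ by machine and by pandas/NumPy version. It also checks that all four expressions produce the same values before timing them -

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed: an assumption, for repeatability only
df = pd.DataFrame(rng.integers(0, 9, size=(10000, 2)), columns=["A", "B"])
a = df.values  # NumPy view of the data, used by case 1


def case1():
    # NumPy array + numpy.where
    return np.where(a[:, 1] > 5, a[:, 0], 0.1 * a[:, 0] * a[:, 1])


def case2():
    # pandas only, using Series.where
    return df.A.where(df.B.gt(5), df[["A", "B"]].prod(axis=1).mul(0.1))


def case3():
    # pandas Series passed to numpy.where
    return np.where(df["B"] > 5, df["A"], 0.1 * df["A"] * df["B"])


def case4():
    # same as case 3, with attribute access and .mul
    return np.where(df.B > 5, df.A, df.A.mul(df.B).mul(0.1))


cases = (case1, case2, case3, case4)

# Sanity check: every variant should compute the same column.
results = {f.__name__: np.asarray(f(), dtype=float) for f in cases}
reference = results["case1"]
all_equal = all(np.allclose(reference, r) for r in results.values())
print("all variants agree:", all_equal)

# Best of 3 runs of 50 calls each, reported per call.
for f in cases:
    best = min(timeit.repeat(f, number=50, repeat=3)) / 50
    print(f"{f.__name__}: {best * 1e6:.1f} us per call")
```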