Pandas apply function to multiple columns and multiple rows

Question

I have a DataFrame with sequential pixel coordinates in rows, in the columns "xpos" and "ypos", and I want to calculate the angle in degrees of each path between successive pixels. My current solution (below) works fine and is fast enough for the size of my file, but iterating through all the rows doesn't seem to be the pandas way. I know how to apply a function to different columns, and how to apply functions to different rows of a column, but I can't figure out how to combine the two.

Here is my code:

    import math

    import numpy as np
    import pandas as pd

    df = pd.read_csv('fixations_out.csv')

    # calculate the saccade angle
    temp_list = []
    for count, row in df.iterrows():
        x1 = row['xpos']
        y1 = row['ypos']
        try:
            # previous row; .loc raises KeyError for the first row (label -1)
            x2 = df['xpos'].loc[count - 1]
            y2 = df['ypos'].loc[count - 1]
            a = abs(180/math.pi * math.atan((y2-y1)/(x2-x1)))
            temp_list.append(a)
        except KeyError:
            temp_list.append(np.nan)

and then I insert temp_list into df as a new column.

EDIT: after following the hint from the comment, I have:

    df['diff_x'] = df['xpos'].shift() - df['xpos']
    df['diff_y'] = df['ypos'].shift() - df['ypos']

    def calc_angle(x):
        try:
            a = abs(180/math.pi * math.atan((x.diff_y)/(x.diff_x)))
            return a
        except ZeroDivisionError:
            return 0

    df['angle_degrees'] = df.apply(calc_angle, axis=1)

I compared the timings of the three solutions on my df (about 6k rows). Iteration is almost 9 times slower than apply, and about 1500 times slower than the solution that uses neither:

runtime of the solution with iteration, including inserting the new column back into df: 1.51 s

runtime of the solution without iteration, using apply: 0.17 s

runtime of EdChum's accepted answer, using diff() without iteration and without apply: 0.001 s

Suggestion: do not use iteration or apply; always try to use a vectorized calculation ;) It is not only faster, but also more readable.
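For anyone who wants to reproduce a comparison like this, here is a minimal sketch. It assumes synthetic random coordinates in place of fixations_out.csv, and the function names (angle_iterrows, angle_apply, angle_vectorized) are made up for illustration. Absolute timings will differ by machine and pandas version.

```python
import math
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in for fixations_out.csv: 6,000 random pixel coordinates.
rng = np.random.default_rng(0)
df = pd.DataFrame({"xpos": rng.uniform(0, 1920, 6000),
                   "ypos": rng.uniform(0, 1080, 6000)})


def angle_iterrows(df):
    """Row-by-row loop, looking up the previous row by label."""
    out = []
    for count, row in df.iterrows():
        try:
            x2 = df["xpos"].loc[count - 1]
            y2 = df["ypos"].loc[count - 1]
            ratio = (y2 - row["ypos"]) / (x2 - row["xpos"])
            out.append(abs(180 / math.pi * math.atan(ratio)))
        except KeyError:  # first row has no predecessor
            out.append(np.nan)
    return pd.Series(out, index=df.index)


def angle_apply(df):
    """Differences computed column-wise, angle computed row-wise with apply."""
    diffs = pd.DataFrame({"dx": df["xpos"].shift() - df["xpos"],
                          "dy": df["ypos"].shift() - df["ypos"]})
    return diffs.apply(
        lambda r: abs(180 / math.pi * math.atan(r.dy / r.dx)), axis=1)


def angle_vectorized(df):
    """Whole-column arithmetic, no Python-level loop."""
    return np.abs(180 / math.pi
                  * np.arctan(df["ypos"].diff() / df["xpos"].diff()))


for fn in (angle_iterrows, angle_apply, angle_vectorized):
    seconds = timeit.timeit(lambda: fn(df), number=3) / 3
    print(f"{fn.__name__}: {seconds:.4f} s per call")
```

All three functions return the same angles, with NaN for the first row, so only the running time differs.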

+7

python pandas

yemu Jun 13 '14 at 9:32

1 answer

Edchum · Accepted Answer · 2014-06-13T09:59:00+0000

You can do this with the following method. I compared the pandas way to your way, and it is over 1000 times faster, and that is without adding the list back as a new column! This was done on a DataFrame of 10,000 rows.

    In [108]:
    %%timeit
    import numpy as np
    # dy/dx, as in the question; the parentheses keep each difference together
    df['angle'] = np.abs(180/math.pi * np.arctan((df['ypos'].shift() - df['ypos'])/(df['xpos'].shift() - df['xpos'])))

    1000 loops, best of 3: 1.27 ms per loop

    In [100]:
    %%timeit
    temp_list=[]
    for count, row in df.iterrows():
        x1 = row['xpos']
        y1 = row['ypos']
        try:
            # previous row; .loc raises KeyError for the first row (label -1)
            x2 = df['xpos'].loc[count-1]
            y2 = df['ypos'].loc[count-1]
            a = abs(180/math.pi * math.atan((y2-y1)/(x2-x1)))
            temp_list.append(a)
        except KeyError:
            temp_list.append(np.nan)

    1 loops, best of 3: 1.29 s per loop

Also, where possible, avoid using apply, as it operates row by row. If you can find a vectorized method that works on the entire Series or DataFrame, always prefer that.

UPDATE

Seeing as you are just subtracting the previous row, there is a built-in method for this, diff, which results in even faster code:

    In [117]:
    %%timeit
    import numpy as np
    # dy/dx, as in the question
    df['angle'] = np.abs(180/math.pi * np.arctan(df['ypos'].diff(1)/df['xpos'].diff(1)))

    1000 loops, best of 3: 1.01 ms per loop
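To illustrate what diff does, here is a small sketch on a made-up Series (the values are arbitrary). diff(1) is the current row minus the previous row, so it has the opposite sign to shift() minus the current row. Taking the absolute value of the angle hides that difference.

```python
import pandas as pd

s = pd.Series([3.0, 5.0, 4.0, 9.0])  # arbitrary illustrative values

forward = s.diff(1)         # current row minus previous row
manual = s - s.shift()      # the same subtraction written out by hand
reversed_ = s.shift() - s   # previous minus current: opposite sign

print(forward.tolist())
print(reversed_.tolist())
```

In all three results the first element is NaN, because the first row has no previous row to subtract.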

Another update

There is also a built-in method for dividing Series and DataFrames, div. It shaves off a bit more time, and I now get sub-1 ms timings:

    In [9]:
    %%timeit
    import numpy as np
    # dy/dx, as in the question
    df['angle'] = np.abs(180/math.pi * np.arctan(df['ypos'].diff(1).div(df['xpos'].diff(1))))

    1000 loops, best of 3: 951 µs per loop
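As a sanity check, here is a minimal sketch on a hypothetical four-point path (one diagonal step, one horizontal step, one vertical step). It shows that the / operator and div give the same angles. It also includes np.degrees as an alternative to multiplying by 180/math.pi; that spelling is my own assumption, not something used in the timings above.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical path: diagonal step, horizontal step, vertical step.
df = pd.DataFrame({"xpos": [0.0, 1.0, 2.0, 2.0],
                   "ypos": [0.0, 1.0, 1.0, 3.0]})

dx = df["xpos"].diff()
dy = df["ypos"].diff()

angle_operator = np.abs(180 / math.pi * np.arctan(dy / dx))
angle_div = np.abs(180 / math.pi * np.arctan(dy.div(dx)))
angle_degrees = np.degrees(np.arctan(dy.div(dx))).abs()

# First row is NaN, then roughly 45, 0 and 90 degrees.
print(angle_div.tolist())
```

Note that the vertical step (dx equal to 0) does not raise an error here. Floating-point division in pandas yields inf, and np.arctan(inf) is pi/2, so the step comes out as 90 degrees rather than the 0 that calc_angle in the question returns from its except branch.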
