Efficient pandas DataFrame calculation

I need to make my code faster. The problem is very simple, but I cannot find a good way to do the calculation without looping over the entire DataFrame.

I have three DataFrames: A, B and C.

A and B have 3 columns each and the following format:

A (10 rows):

      Canal  Gerencia  grad
    0 'ABC'  'DEF'       23
    etc...

B (25 rows):

      Marca  Formato  grad
    0 'GHI'  'JKL'      43
    etc...

DataFrame C, on the other hand, has 5 columns:

C (5000 rows):

      Marca  Formato  Canal  Gerencia  grad
    0 'GHI'  'JKL'    'ABC'  'DEF'     -102
    etc...

I need a vector with the same length as DataFrame C that adds the grad values from the three tables, for example:

    m = 'GHI'
    f = 'JKL'
    c = 'ABC'
    g = 'DEF'
    res = (C['grad'][C['Marca']==m][C['Formato']==f][C['Canal']==c][C['Gerencia']==g]
           + A['grad'][A['Canal']==c][A['Gerencia']==g]
           + B['grad'][B['Formato']==f][B['Marca']==m])
    >> -36

I tried looping over the rows of C, but it is too slow. I understand that I should avoid looping over a DataFrame, but I don't know how to do it otherwise. My current code is the following (it works, but very slowly):

    res = []
    for row_index, row in C.iterrows():
        vec1 = A['Gerencia']==row['Gerencia']
        vec2 = A['Canal']==row['Canal']
        vec3 = B['Marca']==row['Marca']
        vec4 = B['Formato']==row['Formato']
        grad = row['grad']
        res.append(grad + sum(A['grad'][vec1][vec2]) + sum(B['grad'][vec3][vec4]))

I would really appreciate any help in making this procedure faster. Thanks!

1 answer

IIUC, you need to merge C with A:

 C = pd.merge(C, A, on=['Canal', 'Gerencia']) 

(this will add a column to it) and then merge the result with B :

 C = pd.merge(C, B, on=['Marca', 'Formato']) 

(again adding a column to C)

At this point, check C for the column names; say they are grad_foo, grad_bar and grad_baz. Then just add them:

 C.grad_foo + C.grad_bar + C.grad_baz 
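The two merges and the final sum can be sketched end to end as follows. This is only an illustration under some assumptions that are not in the question: the sample rows are made up, the column names grad_A and grad_B are hypothetical (the grad columns are renamed before merging so the result names are predictable), and each key pair is assumed to appear at most once in A and in B. A left merge is used so that C keeps its length and row order, and missing matches are treated as 0, which mirrors what sum() over an empty selection does in the loop.

```python
import pandas as pd

# Made-up sample data following the layout described in the question
A = pd.DataFrame({"Canal": ["ABC"], "Gerencia": ["DEF"], "grad": [23]})
B = pd.DataFrame({"Marca": ["GHI"], "Formato": ["JKL"], "grad": [43]})
C = pd.DataFrame({
    "Marca": ["GHI", "GHI"],
    "Formato": ["JKL", "JKL"],
    "Canal": ["ABC", "XYZ"],   # second row has no matching row in A
    "Gerencia": ["DEF", "DEF"],
    "grad": [-102, 10],
})

# Rename the lookup columns first (grad_A / grad_B are hypothetical names),
# then left-merge so every row of C survives in its original order.
# validate="many_to_one" raises if a key pair is duplicated in A or B.
merged = (
    C.merge(A.rename(columns={"grad": "grad_A"}),
            on=["Canal", "Gerencia"], how="left", validate="many_to_one")
     .merge(B.rename(columns={"grad": "grad_B"}),
            on=["Marca", "Formato"], how="left", validate="many_to_one")
)

# Rows without a match get NaN; count them as 0 before adding
res = merged["grad"] + merged["grad_A"].fillna(0) + merged["grad_B"].fillna(0)
print(res.tolist())
```

If A or B can contain the same key pair several times, the loop in the question sums all matches; in that case it is probably simpler to aggregate first, for example with A.groupby(['Canal', 'Gerencia'], as_index=False)['grad'].sum(), and merge the aggregated frame instead.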