I need to make my code faster. The problem is very simple, but I cannot find a good way to do the calculation without looping over the entire DataFrame.
I have three DataFrames: A, B and C.
A and B have 3 columns each and the following format:
A (10 lines):
      Canal Gerencia  grad
    0 'ABC'    'DEF'    23
    etc...
B (25 lines):
      Marca Formato  grad
    0 'GHI'   'JKL'    43
    etc...
DataFrame C, on the other hand, has 5 columns:
C (5000 lines):
      Marca Formato Canal Gerencia  grad
    0 'GHI'   'JKL' 'ABC'    'DEF'  -102
    etc...
I need a vector with the same length as DataFrame C that adds up the grad values from the three tables, for example:
    m = 'GHI'
    f = 'JKL'
    c = 'ABC'
    g = 'DEF'
    res = (C['grad'][C['Marca']==m][C['Formato']==f][C['Canal']==c][C['Gerencia']==g]
           + A['grad'][A['Canal']==c][A['Gerencia']==g]
           + B['grad'][B['Formato']==f][B['Marca']==m])
    >> -36
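For reference, the same single-combination lookup can also be written with one combined boolean mask per frame. This is only a sketch: the one-row frames below hold just the example rows shown above, and reducing each selection to a scalar with .sum() is an assumption that sidesteps index alignment between the three frames.

```python
import pandas as pd

# One-row toy frames holding only the example rows from the question.
A = pd.DataFrame({"Canal": ["ABC"], "Gerencia": ["DEF"], "grad": [23]})
B = pd.DataFrame({"Marca": ["GHI"], "Formato": ["JKL"], "grad": [43]})
C = pd.DataFrame({"Marca": ["GHI"], "Formato": ["JKL"],
                  "Canal": ["ABC"], "Gerencia": ["DEF"], "grad": [-102]})

m, f, c, g = "GHI", "JKL", "ABC", "DEF"

# Combine the conditions with & so each frame is filtered only once.
mask_c = ((C["Marca"] == m) & (C["Formato"] == f)
          & (C["Canal"] == c) & (C["Gerencia"] == g))
mask_a = (A["Canal"] == c) & (A["Gerencia"] == g)
mask_b = (B["Marca"] == m) & (B["Formato"] == f)

# .sum() turns each selection into a scalar, so row labels need not match.
res = (C.loc[mask_c, "grad"].sum()
       + A.loc[mask_a, "grad"].sum()
       + B.loc[mask_b, "grad"].sum())
print(res)  # -102 + 23 + 43 = -36
```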
I tried looping over the rows of C, but it is too slow. I understand that I should avoid looping over a DataFrame, but I don't know how to do it. My current code is the following (it works, but very slowly):
    res = []
    for row_index, row in C.iterrows():
        vec1 = A['Gerencia']==row['Gerencia']
        vec2 = A['Canal']==row['Canal']
        vec3 = B['Marca']==row['Marca']
        vec4 = B['Formato']==row['Formato']
        grad = row['grad']
        res.append(grad + sum(A['grad'][vec1][vec2]) + sum(B['grad'][vec3][vec4]))
I would really appreciate any help in making this procedure faster. Thanks!
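One way to avoid the loop, sketched here under assumptions rather than as a definitive solution, is to let pandas match the key columns with two left merges. In the toy data below only the first row of each frame comes from the question; the other rows, and the helper names a_sum, b_sum, grad_A and grad_B, are made up for illustration. Summing grad per key pair first is assumed to mirror the sum(...) calls in the loop when a key pair occurs more than once.

```python
import pandas as pd

# Toy data: the first row of each frame comes from the question,
# the remaining rows are invented for illustration.
A = pd.DataFrame({"Canal": ["ABC", "XYZ"], "Gerencia": ["DEF", "DEF"],
                  "grad": [23, 5]})
B = pd.DataFrame({"Marca": ["GHI", "MNO"], "Formato": ["JKL", "JKL"],
                  "grad": [43, -7]})
C = pd.DataFrame({
    "Marca": ["GHI", "MNO", "GHI"],
    "Formato": ["JKL", "JKL", "JKL"],
    "Canal": ["ABC", "ABC", "XYZ"],
    "Gerencia": ["DEF", "DEF", "DEF"],
    "grad": [-102, 10, 0],
})

# Sum grad per key pair so duplicate keys behave like sum(...) in the loop.
a_sum = (A.groupby(["Canal", "Gerencia"], as_index=False)["grad"].sum()
          .rename(columns={"grad": "grad_A"}))
b_sum = (B.groupby(["Marca", "Formato"], as_index=False)["grad"].sum()
          .rename(columns={"grad": "grad_B"}))

# Left merges keep every row of C, in C's original order.
merged = (C.merge(a_sum, on=["Canal", "Gerencia"], how="left")
           .merge(b_sum, on=["Marca", "Formato"], how="left"))

# Keys missing from A or B contribute 0, like the sum of an empty selection.
res = merged["grad"] + merged["grad_A"].fillna(0) + merged["grad_B"].fillna(0)
print(res.tolist())  # [-36, 26, 48]
```

The result has one entry per row of C. Note that merge returns a fresh 0..n-1 index, so if C carries a custom index, something like res.to_numpy() may be needed before assigning the values back to C.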