Python - Pandas - Dataframe: Conditional conditional column offset

I am trying to do a dataframe transformation that I cannot solve. I tried several approaches from stackoverflow and the pandas documentation: apply, apply (lambda: ...), swaps and unions. There are too many attempts to list here, but not sure which approach is best, or maybe I tried the right approach with the wrong syntax.

Basically, I have a data framework, and I need 1) to shift the columns, 2) the number of columns to be shifted varies and depends on the variable in the data frame, 3) creates columns at the end of the data frame where you need to place the offset and 4) place zeros to newly created intervals.

df1 = pd.DataFrame({'first' : ['John', 'Mary', 'Larry', 'jerry'], '1' : [5.5, 6.0,10,20], '2' : [100, 200, 300, 400], '3' : [150, 100, 240, 110], 'offset' : ([1,0,2,1])}) goal_df = pd.DataFrame({'first' : ['John', 'Mary', 'Larry', 'jerry'], '1' : [0.0, 6.0, 0.0, 0], '2' : [5.5, 200, 0.0, 20], '3' : [100, 100, 10, 400], '4' : [150, 0.0, 300, 110], '5' : [0.0, 0.0, 240, 0.0]}) df1 1 2 3 first offset 5.5 100 150 John 1 6.0 200 100 Mary 0 10.0 300 240 Larry 2 20.0 400 110 jerry 1 goal_df 1 2 3 4 5 first 0 5.5 100 150 0 John 6 200.0 100 0 0 Mary 0 0.0 10 300 240 Larry 0 20.0 400 110 0 jerry 

This dataset will have c. 500 lines and c. 120 columns. The amount of bias will be very close to 0-12. I thought about this with the basic functions of Python, but I also found that a complex and temporary consumer of the program would win in the ultimate goal, which is to remove some tasks performed in Microsoft Excel.

I complain a lot about how Excel is inferior to such large tasks, but it seems that the current offset () function in Excel outperforms this very simple use, but with thousands of formulas, very slowly. I sold my workplace on the advantages of Python over Excel, and this is my first real test, so speed is very important to me because I try to convince my colleagues that Python can gobble up this table much faster than the current limit the file weighs in file size 96 MB in size.

I came pretty close to the melt () function, and then took the old column numbers and added the offset to them. However, I had a lot of problems trying to reform the dataframe with a hinge. No luck with the application or application (lambda)!

Thanks for any help anyone can give!

+5
source share
1 answer

This is not particularly elegant or concise, but should do the trick. It is a little easier for me to shuffle columns in numpy (should also be a little faster), so I first convert from a data array to an array.

 arr = df1.values[:,:-2] # just the numbers offset = df1.values[:,-1] # just the offsets column_pad = 2 arr2 = np.zeros( (arr.shape[0],arr.shape[1]+column_pad) ) 

Here's a key code that simply shifts each line by the amount of the offset.

 for i, j in enumerate(offset): arr2[i,j:3+j] = arr[i] array([[ 0. , 5.5, 100. , 150. , 0. ], [ 6. , 200. , 100. , 0. , 0. ], [ 0. , 0. , 10. , 300. , 240. ], [ 0. , 20. , 400. , 110. , 0. ]]) 

Other than that, just a little manual labor, add space for the columns and place them in the correct order.

 df2 = df1.copy() last_column = 6 for i in map(str,range(3,last_column)): df2[i] = 0 df2 = df2[ map(str,range(1,last_column))+['first','offset']] 

Then load arr2 into df2.

 df2.loc[:,'1':'5'] = arr2 1 2 3 4 5 first offset 0 0 5.5 100 150 0 John 1 1 6 200.0 100 0 0 Mary 0 2 0 0.0 10 300 240 Larry 2 3 0 20.0 400 110 0 jerry 1 
+3
source

All Articles