How to swap column headers with their values in Pandas

I have the following data frame:

    a1  | a2  | a3  | a4
    ---------------------
    Bob | Cat | Dov | Edd
    Cat | Dov | Bob | Edd
    Edd | Cat | Dov | Bob

and I want to convert it to

    Bob | Cat | Dov | Edd
    ---------------------
    a1  | a2  | a3  | a4
    a3  | a1  | a2  | a4
    a4  | a2  | a3  | a1

Note that the number of columns equals the number of unique values, and that the number and order of rows are preserved.

python pandas dataframe
3 answers

1) Approach for this particular case:

A fast implementation is to argsort the values of the data frame row by row and use the resulting indices to reorder the column labels. This relies on every row containing the same set of values, as in the question.

    # index the underlying array: recent pandas versions reject 2-D indexing on an Index
    pd.DataFrame(df.columns.values[np.argsort(df.values)], df.index, np.unique(df.values))

Using np.argsort gives us the data we are looking for:

    df.columns.values[np.argsort(df.values)]
    array([['a1', 'a2', 'a3', 'a4'],
           ['a3', 'a1', 'a2', 'a4'],
           ['a4', 'a2', 'a3', 'a1']], dtype=object)
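For anyone who wants to run the argsort idea end to end, here is a minimal self-contained sketch. The frame construction is an assumption based on the table in the question, and the names `order` and `result` are illustrative only.

```python
import numpy as np
import pandas as pd

# Frame reconstructed from the table in the question (assumed construction).
df = pd.DataFrame(
    [["Bob", "Cat", "Dov", "Edd"],
     ["Cat", "Dov", "Bob", "Edd"],
     ["Edd", "Cat", "Dov", "Bob"]],
    columns=["a1", "a2", "a3", "a4"],
)

# For each row, the positions that would sort its values alphabetically.
order = np.argsort(df.to_numpy(), axis=1)

# Take the header found at each of those positions; the sorted unique
# values become the new headers.
result = pd.DataFrame(
    df.columns.to_numpy()[order],
    index=df.index,
    columns=np.unique(df.to_numpy()),
)
print(result)
```

This only lines up when every row is a permutation of the same values; otherwise the sorted positions of different rows refer to different names.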

2) Slow generalized approach:

A more general approach, at some cost in speed, is to use apply to build, for each row, a dict that maps the values in that row to their corresponding column names.

Then convert the resulting series of dicts to a list and pass it to the DataFrame constructor.

    pd.DataFrame(df.apply(lambda s: dict(zip(s, s.index)), axis=1).tolist())

3) Faster generalized approach:

After getting a list of dictionaries from df.to_dict with orient='records', we loop over them and swap the key and value in each pair.

    # the abbreviation 'r' is not accepted by recent pandas versions; spell out 'records'
    pd.DataFrame([{val: key for key, val in d.items()} for d in df.to_dict('records')])
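To make the key/value swap concrete, here is a small sketch that shows one record before and after inversion. The frame construction is assumed from the question, and `records` and `inverted` are illustrative names.

```python
import pandas as pd

# Frame reconstructed from the table in the question (assumed construction).
df = pd.DataFrame(
    [["Bob", "Cat", "Dov", "Edd"],
     ["Cat", "Dov", "Bob", "Edd"],
     ["Edd", "Cat", "Dov", "Bob"]],
    columns=["a1", "a2", "a3", "a4"],
)

# One dict per row, keyed by column header.
records = df.to_dict("records")
# records[1] == {"a1": "Cat", "a2": "Dov", "a3": "Bob", "a4": "Edd"}

# Swap keys and values so each dict is keyed by the cell value instead.
inverted = [{val: key for key, val in rec.items()} for rec in records]
# inverted[1] == {"Cat": "a1", "Dov": "a2", "Bob": "a3", "Edd": "a4"}

# Passing the original index keeps the row labels.
result = pd.DataFrame(inverted, index=df.index)
print(result[["Bob", "Cat", "Dov", "Edd"]])
```

If a value appeared twice in the same row, the later column would overwrite the earlier one in the inverted dict, so this assumes values are unique within each row.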

Test Case Example:

 df = df.assign(a5=['Foo', 'Bar', 'Baz']) 

Both of these approaches produce a frame with one column per unique value, with NaN wherever a value does not occur in a row.
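As a rough check of the test case, the sketch below applies the dictionary-inversion idea to the extended frame and also shows why the argsort one-liner no longer fits. The frame construction is assumed from the question; `general` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Question's frame plus the extra test column a5 (assumed construction).
df = pd.DataFrame(
    [["Bob", "Cat", "Dov", "Edd"],
     ["Cat", "Dov", "Bob", "Edd"],
     ["Edd", "Cat", "Dov", "Bob"]],
    columns=["a1", "a2", "a3", "a4"],
).assign(a5=["Foo", "Bar", "Baz"])

# Generalized approach: invert each record, then rebuild the frame.
inverted = [{val: key for key, val in rec.items()} for rec in df.to_dict("records")]
general = pd.DataFrame(inverted, index=df.index)
print(general)  # seven columns, missing where a name is absent from a row

# The argsort approach breaks here: each row has 5 cells,
# but there are 7 unique values to use as headers.
try:
    pd.DataFrame(
        df.columns.to_numpy()[np.argsort(df.to_numpy(), axis=1)],
        index=df.index,
        columns=np.unique(df.to_numpy()),
    )
except ValueError as exc:
    print("argsort approach failed:", exc)
```

The column order of `general` can differ between pandas versions, so select columns by name rather than by position.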


@piRSquared EDIT 1

generalized solution

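    import numpy as np
    import pandas as pd

    def nic(df):
        v = df.values
        n, m = v.shape
        # unique values and, for each cell, the position of its value in u
        u, inv = np.unique(v, return_inverse=True)
        c = df.columns.values
        # with object-dtype headers, cells that are never assigned stay None
        r = np.empty((n, len(u)), dtype=c.dtype)
        # scatter each header to (row position, value position);
        # row positions are used so that any index works, and inv is
        # flattened because its shape differs between NumPy versions
        r[np.arange(n).repeat(m), inv.ravel()] = np.tile(c, n)
        return pd.DataFrame(r, df.index, u)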

1. I would like to thank user @piRSquared for coming up with a very fast, generalized NumPy-based solution.
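The core of the NumPy solution is a scatter assignment driven by `np.unique(..., return_inverse=True)`. The toy arrays below are hypothetical and only meant to show that step in isolation.

```python
import numpy as np

# Toy inputs (hypothetical): two rows, two headers, three distinct names.
v = np.array([["Bob", "Cat"], ["Cat", "Dov"]], dtype=object)
c = np.array(["a1", "a2"], dtype=object)
n, m = v.shape

# u holds the sorted unique values; inv gives, per cell, its position in u.
u, inv = np.unique(v, return_inverse=True)
inv = inv.ravel()  # flatten, since the shape of inv differs across NumPy versions

rows = np.arange(n).repeat(m)  # [0, 0, 1, 1]: row position of every cell

# Start from an all-None grid, then drop each header into (row, value slot).
r = np.full((n, len(u)), None, dtype=object)
r[rows, inv] = np.tile(c, n)

print(u)  # ['Bob' 'Cat' 'Dov']
print(r)  # [['a1' 'a2' None]
          #  [None 'a1' 'a2']]
```

Each cell contributes one (row, value slot) pair, so the whole reshaping is a single vectorized assignment with no Python-level loop.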


You can reshape it with stack and unstack, swapping the values and the column index in between:

    df_swap = (df.stack()                     # reshape the data frame to long format
                 .reset_index(level=1)        # set the index (column headers) as a new column
                 .set_index(0, append=True)   # set the values as index
                 .unstack(level=1))           # reshape the data frame to wide format
    df_swap.columns = df_swap.columns.get_level_values(1)   # drop level 0 in the column index
    df_swap
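The same long-then-wide idea can also be written with explicitly named columns and `pivot` in place of the `set_index`/`unstack` pair. This is a swapped-in variant, not the answer's exact code; the names `row`, `header`, and `value` are made up for illustration, and the frame construction is assumed from the question.

```python
import pandas as pd

# Frame reconstructed from the table in the question (assumed construction).
df = pd.DataFrame(
    [["Bob", "Cat", "Dov", "Edd"],
     ["Cat", "Dov", "Bob", "Edd"],
     ["Edd", "Cat", "Dov", "Bob"]],
    columns=["a1", "a2", "a3", "a4"],
)

# Long format: one line per original cell, with named columns.
long = (
    df.stack()
      .rename_axis(["row", "header"])   # name the two index levels
      .reset_index(name="value")        # columns: row, header, value
)

# Wide format again, but now the cell values are the headers.
swapped = long.pivot(index="row", columns="value", values="header")
print(swapped)
```

Naming the intermediate columns avoids relying on the auto-generated labels that `reset_index` would otherwise produce.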


numpy + pandas

    v = df.values
    n, m = v.shape
    i = df.index.values
    c = df.columns.values

    # create series with values that were column values
    # create multi index with first level from existing index
    # and second level from flattened existing values
    # then unstack
    pd.Series(
        np.tile(c, n),
        [i.repeat(m), v.ravel()]
    ).unstack()

      Bob Cat Dov Edd
    0  a1  a2  a3  a4
    1  a3  a1  a2  a4
    2  a4  a2  a3  a1