How to handle column names and create new columns

This is my pandas DataFrame with the original column names:

old_dt_cm1_tt   old_dm_cm1   old_rr_cm2_epf   old_gt
1               3            0                0
2               1            1                5
  • First, I want to extract all the unique cm options (in this case, cm1 and cm2).
  • Then I want to create a new column for each unique cm option, so there should be two new columns in this example.
  • Finally, each new column should store, for every row, the number of non-zero values in the original columns that contain that cm option, i.e.:
old_dt_cm1_tt   old_dm_cm1   old_rr_cm2_epf   old_gt    cm1    cm2    
1               3            0                0         2      0        
2               1            1                5         2      1
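To make the example reproducible, here is a small sketch that builds the sample input and the desired output from the two tables above. The variable name expected is only for illustration.

```python
import pandas as pd

# Input DataFrame from the question.
df = pd.DataFrame({
    'old_dt_cm1_tt': [1, 2],
    'old_dm_cm1': [3, 1],
    'old_rr_cm2_epf': [0, 1],
    'old_gt': [0, 5],
})

# Expected new columns: for each row, the number of non-zero values
# among the original columns whose name contains that cm option.
expected = pd.DataFrame({'cm1': [2, 2], 'cm2': [0, 1]})

print(df.join(expected))
```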

I performed the first step as follows:

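import pandas as pd

cols = pd.DataFrame(list(df.columns))
ind = [c for c in df.columns if 'cm' in c]   # names of the columns that contain 'cm'
df.loc[:, ind].columns                       # .ix was removed in pandas 1.0, so use .loc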
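Step 1 can also return the unique options directly rather than the full column names. A minimal sketch, assuming every option is the letters cm followed by one or more digits (so cm10, cm11, ... are captured whole):

```python
import pandas as pd

df = pd.DataFrame({
    'old_dt_cm1_tt': [1, 2],
    'old_dm_cm1': [3, 1],
    'old_rr_cm2_epf': [0, 1],
    'old_gt': [0, 5],
})

# Pull the 'cm<digits>' token out of each column name; names without
# such a token (e.g. 'old_gt') give NaN and are dropped.
tokens = pd.Series(df.columns).str.extract(r'(cm\d+)', expand=False)
options = list(tokens.dropna().unique())
print(options)  # ['cm1', 'cm2']
```

unique() keeps the order of first appearance, so the options come out in the same order as the columns.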

How can I do steps 2 and 3 automatically? I don't want to hard-code the column names cm1 and cm2, because the original dataset can have many cm options.


One way to do it, starting from the example df:

print(df)
   old_dt_cm1_tt  old_dm_cm1  old_rr_cm2_epf  old_gt
0              1           3               0       0
1              2           1               1       5

First, use filter to keep only the columns whose names contain cm:

df1 = df.filter(regex='cm')

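Then rename those columns to just their cm part (cm1, cm2, cm3, ...) by splitting each name on _ and keeping the token that starts with cm: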

print([cm for c in df1.columns for cm in c.split('_') if cm[:2] == 'cm'])
['cm1', 'cm1', 'cm2']

df1.columns = [cm for c in df1.columns for cm in c.split('_') if cm[:2] == 'cm']
print(df1)
   cm1  cm1  cm2
0    1    3    0
1    2    1    1

Convert df1 to a boolean DataFrame (non-zero becomes True), because when summing, True counts as 1 and False as 0. Then group the columns by name and sum:

df1 = df1.astype(bool)
print(df1)
    cm1   cm1    cm2
0  True  True  False
1  True  True   True

# groupby(..., axis=1) is deprecated in recent pandas: transpose, group the rows by label, transpose back
print(df1.T.groupby(level=0).sum().T)
   cm1  cm2
0    2    0
1    2    1

The unique cm names are the new columns to add to df:

print(list(df1.columns.unique()))
['cm1', 'cm2']

Finally, add the grouped sums to df as the new columns (cm1 and cm2 here):

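counts = df1.T.groupby(level=0).sum().T   # one column per unique cm option, labels sorted
df = df.join(counts)                       # align on the row index, so names and values stay paired
print(df)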
   old_dt_cm1_tt  old_dm_cm1  old_rr_cm2_epf  old_gt  cm1  cm2
0              1           3               0       0    2    0
1              2           1               1       5    2    1
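The steps of this answer can be wrapped into one reusable function. This is a sketch under the same assumption the split relies on: every matching column name has exactly one _-separated token that starts with cm. The function name add_cm_counts is hypothetical.

```python
import pandas as pd


def add_cm_counts(df):
    """Return df with one count column per cm option (hypothetical helper).

    Assumes each column selected by filter(regex='cm') has exactly one
    '_'-separated token starting with 'cm'.
    """
    sub = df.filter(regex='cm')
    # Keep only the cm token of each name, e.g. 'old_rr_cm2_epf' -> 'cm2'.
    sub.columns = [tok for c in sub.columns for tok in c.split('_')
                   if tok.startswith('cm')]
    # Non-zero -> True (counts as 1); group the duplicate labels and add them up.
    counts = sub.ne(0).T.groupby(level=0).sum().T
    return df.join(counts)


df = pd.DataFrame({
    'old_dt_cm1_tt': [1, 2],
    'old_dm_cm1': [3, 1],
    'old_rr_cm2_epf': [0, 1],
    'old_gt': [0, 5],
})
result = add_cm_counts(df)
print(result)
```

ne(0) does the same job as astype(bool) here, and join leaves the original df untouched.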

Alternatively, map each cm column to its cm option with a dict:

col_map = {c:'cm'+c[c.index('cm') + len('cm')] for c in ind}
                                   #   ^ if you are hard coding this in you might as well use 2

For the example columns this gives:

{'old_dm_cm1': 'cm1', 'old_dt_cm1_tt': 'cm1', 'old_rr_cm2_epf': 'cm2'}
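The slice above keeps only one character after cm, so a column such as old_cm10_x (a hypothetical name, not in the question) would be mapped to cm1. A sketch that uses a regular expression to keep all the digits, assuming every name in ind contains cm followed by at least one digit:

```python
import re

# 'old_cm10_x' is a hypothetical extra column added to show a two-digit option.
ind = ['old_dt_cm1_tt', 'old_dm_cm1', 'old_rr_cm2_epf', 'old_cm10_x']

# Capture 'cm' plus all following digits, so cm10, cm11, ... map correctly.
# re.search returns None if a name has no such token, so this raises then.
col_map = {c: re.search(r'cm\d+', c).group() for c in ind}
print(col_map)
```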

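Then loop over the dict: the first time an option is seen, create its column in the DataFrame; after that, add to it: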

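for col, new_col in col_map.items():
    nonzero = (df[col] != 0).astype(int)   # 1 where the value is non-zero, else 0
    if new_col not in df:
        df[new_col] = nonzero              # first column for this option
    else:
        df[new_col] += nonzero             # add this column's counts to the running total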

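Each value contributes 0 if it is zero and 1 otherwise. Because a dict is not guaranteed to keep insertion order before Python 3.7, the new columns can be added in any order; if the order matters, sort the items by option first: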

import operator

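# Sort by the new column name so cm1 is created before cm2, and so on.
for col, new_col in sorted(col_map.items(), key=operator.itemgetter(1)):
    nonzero = (df[col] != 0).astype(int)   # 1 where the value is non-zero, else 0
    if new_col in df:
        df[new_col] += nonzero
    else:
        df[new_col] = nonzero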
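The loop can also be replaced by a single vectorized expression, since DataFrame.groupby accepts a dict that maps index labels to group keys. A sketch, assuming col_map is built as in this answer (here with the regex variant, so multi-digit options also work):

```python
import re

import pandas as pd

df = pd.DataFrame({
    'old_dt_cm1_tt': [1, 2],
    'old_dm_cm1': [3, 1],
    'old_rr_cm2_epf': [0, 1],
    'old_gt': [0, 5],
})
ind = [c for c in df.columns if 'cm' in c]
col_map = {c: re.search(r'cm\d+', c).group() for c in ind}

# Mark non-zero cells, transpose so the original column names become the
# index, group those names through col_map, sum, and transpose back.
counts = df[ind].ne(0).T.groupby(col_map).sum().T
df = df.join(counts)
print(df)
```

groupby sorts the group keys, so the new columns come out as cm1, cm2, ... without a separate sort.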
