Python Pandas: how can I group and assign an identifier to all elements in a group?

Question

I have a DataFrame df:

domain            orgid
csyunshu.com      108299
dshu.com          108299
bbbdshu.com       108299
cwakwakmrg.com    121303
ckonkatsunet.com  121303

I would like to add a new column (or replace the domain column) with numeric identifiers that number the domains within each orgid:

domain            orgid   domainid
csyunshu.com      108299  1
dshu.com          108299  2
bbbdshu.com       108299  3
cwakwakmrg.com    121303  1
ckonkatsunet.com  121303  2

I already tried this line, but it does not give the result I want:

df.groupby('orgid')['domain'].count().reset_index()

Can anyone help?

python pandas indexing group-by

UserYmY Mar 17 '16 at 14:19

2 answers

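You can use LabelEncoder from sklearn.preprocessing. Note that it assigns one integer per distinct domain across the whole frame, starting at 0, rather than restarting the numbering within each orgid. For example:

from sklearn.preprocessing import LabelEncoder
df["domain"] = LabelEncoder().fit_transform(df["domain"])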

Shahnawaz akhtar Oct 31 '16 at 13:28

Edchum · Accepted Answer · 2016-03-17T14:21:58+0000

You can call rank on the groupby object and pass the parameter method='first':

In [61]:
df['domainId'] = df.groupby('orgid')['orgid'].rank(method='first')
df

Out[61]:
             domain   orgid  domainId
0      csyunshu.com  108299         1
1          dshu.com  108299         2
2       bbbdshu.com  108299         3
3    cwakwakmrg.com  121303         1
4  ckonkatsunet.com  121303         2

If you want to overwrite the column you can do:

df['domain'] = df.groupby('orgid')['orgid'].rank(method='first')
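
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and applies the rank approach from this answer. The cast to int is an addition, since rank returns floats. The cumcount() line is an alternative technique that is not part of the original answer, and the column names domainid_rank and domainid_cumcount are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "domain": ["csyunshu.com", "dshu.com", "bbbdshu.com",
               "cwakwakmrg.com", "ckonkatsunet.com"],
    "orgid": [108299, 108299, 108299, 121303, 121303],
})

# Approach from the answer: within each orgid group every value is tied,
# so method='first' breaks ties by order of appearance.
# rank() returns floats; the cast to int is an added step.
df["domainid_rank"] = (
    df.groupby("orgid")["orgid"].rank(method="first").astype(int)
)

# Alternative (not from the answer): cumcount() numbers the rows of each
# group from 0 and returns integers, so add 1 to start at 1.
df["domainid_cumcount"] = df.groupby("orgid").cumcount() + 1

print(df)
```

Both columns should hold the same per-group numbering. cumcount() avoids the float round trip, while rank makes it possible to number by a different column's sort order if that is ever needed.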
