I have a Pandas DataFrame that contains several string values. I want to replace them with integer values in order to calculate similarities. For instance:
    stores[['CNPJ_Store_Code','region','total_facings']].head()
    Out[24]:
        CNPJ_Store_Code      region  total_facings
    1    93209765046613   Geo RS/SC       1.471690
    16   93209765046290   Geo RS/SC       1.385636
    19   93209765044084  Geo PR/SPI       0.217054
    21   93209765044831   Geo RS/SC       0.804633
    23   93209765045218  Geo PR/SPI       0.708165
I want to replace region == 'Geo RS/SC' with 1, region == 'Geo PR/SPI' with 2, and so on.
Clarification: I want the replacement to happen automatically, without first creating a dictionary, since I do not know in advance what my regions will be. Any ideas? I have tried using DictVectorizer, without success.
I am sure there is a reasonable way to do this, but I just can't find it.
Does anyone know a solution?
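To make the desired result concrete, here is a minimal sketch of one possible approach, assuming `pandas.factorize` is acceptable for this purpose. It assigns integer codes in order of first appearance, so no dictionary has to be written by hand. The `region_code` column and the `region_to_code` name are hypothetical, introduced only for illustration, and the frame is rebuilt from the five rows shown above.

```python
import pandas as pd

# Rebuild the five sample rows from the question.
stores = pd.DataFrame(
    {
        "CNPJ_Store_Code": [
            93209765046613,
            93209765046290,
            93209765044084,
            93209765044831,
            93209765045218,
        ],
        "region": ["Geo RS/SC", "Geo RS/SC", "Geo PR/SPI", "Geo RS/SC", "Geo PR/SPI"],
        "total_facings": [1.471690, 1.385636, 0.217054, 0.804633, 0.708165],
    },
    index=[1, 16, 19, 21, 23],
)

# factorize returns zero-based codes plus the unique labels,
# both in order of first appearance.
codes, uniques = pd.factorize(stores["region"])

# Shift by one so the first region seen becomes 1, the next 2, and so on.
# "region_code" is a hypothetical column name.
stores["region_code"] = codes + 1

# The mapping is derived from the data rather than written in advance.
region_to_code = {name: i + 1 for i, name in enumerate(uniques)}

print(stores[["region", "region_code"]])
print(region_to_code)
```

Note that these codes are arbitrary labels: the numeric distance between 1 and 2 carries no meaning, which may matter depending on the similarity measure used.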