Convert currency from $ to numbers in Python pandas

I have the following data in a pandas dataframe:

         state          1st           2nd         3rd
 0  California  $11,593,820  $109,264,246  $8,496,273
 1    New York  $10,861,680   $45,336,041  $6,317,300
 2     Florida   $7,942,848   $69,369,589  $4,697,244
 3       Texas   $7,536,817   $61,830,712  $5,736,941

I want to do some simple analysis (e.g. sum, groupby) on three columns (1st, 2nd, 3rd), but the data type of these three columns is object (or string).

So, I used the following code to convert the data:

 data = data.convert_objects(convert_numeric=True) 

But the conversion is not working, possibly because of the dollar sign. Any suggestions?

3 answers

@EdChum's answer is clever and works well. But since there is more than one way to bake a cake, why not use a regex? For example:

 # A raw string keeps the backslash in the pattern from being read as a Python escape
 df[df.columns[1:]].replace(r'[\$,]', '', regex=True).astype(float)

To me, that is a little more readable.
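As a self-contained sketch of the regex approach, the snippet below rebuilds the question's table by hand, assigns the converted columns back to the frame, and sums them. The names `money_cols` and `totals` are illustrative and are not part of the original answer.

```python
import pandas as pd

# Sample frame with the values copied from the question.
df = pd.DataFrame({
    'state': ['California', 'New York', 'Florida', 'Texas'],
    '1st': ['$11,593,820', '$10,861,680', '$7,942,848', '$7,536,817'],
    '2nd': ['$109,264,246', '$45,336,041', '$69,369,589', '$61,830,712'],
    '3rd': ['$8,496,273', '$6,317,300', '$4,697,244', '$5,736,941'],
})

money_cols = df.columns[1:]  # every column except 'state'

# Strip '$' and ',' with one regex, then cast the cleaned strings to float.
df[money_cols] = df[money_cols].replace(r'[\$,]', '', regex=True).astype(float)

# The columns are numeric now, so aggregation works as usual.
totals = df[money_cols].sum()
print(df.dtypes)
print(totals)
```

Note that the one-liner in the answer returns a new frame; the assignment back to `df[money_cols]` is what makes the change stick.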


You can also use locale, as follows:

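 import locale
 import pandas as pd

 # Use the user's default locale for number parsing
 locale.setlocale(locale.LC_ALL, '')

 # df.1st is not valid Python syntax, so index the column with brackets
 df['1st'] = df['1st'].map(lambda x: locale.atof(x.strip('$')))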

Please note that the above code was tested with Python 3 on Windows. It relies on the active locale treating the comma as the thousands separator.
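Because `locale.atof` only strips commas when the active locale groups digits with them, a more defensive version might check `locale.localeconv()` first. The sketch below is one way to do that; `parse_dollars` is a hypothetical helper name, and the two-row frame is a cut-down copy of the question's data.

```python
import locale

import pandas as pd

# Adopt the user's default locale; keep the current one if that fails.
try:
    locale.setlocale(locale.LC_ALL, '')
except locale.Error:
    pass


def parse_dollars(text):
    """Hypothetical helper: turn a string such as '$11,593,820' into a float."""
    digits = text.strip().lstrip('$')
    if locale.localeconv()['thousands_sep'] == ',':
        # The active locale groups digits with commas, so atof can parse it.
        return locale.atof(digits)
    # Otherwise atof would not strip the commas, so remove them by hand.
    return float(digits.replace(',', ''))


df = pd.DataFrame({'1st': ['$11,593,820', '$10,861,680']})
df['1st'] = df['1st'].map(parse_dollars)
print(df)
```

The fallback branch means the helper behaves the same under the default "C" locale, where `thousands_sep` is an empty string.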


You can use the vectorized str methods to replace the unwanted characters, and then cast the type to int:

 In [81]:
 df[df.columns[1:]] = df[df.columns[1:]].apply(lambda x: x.str.replace('$','')).apply(lambda x: x.str.replace(',','')).astype(np.int64)
 df

 Out[81]:
             state       1st        2nd      3rd
 index
 0      California  11593820  109264246  8496273
 1        New York  10861680   45336041  6317300
 2         Florida   7942848   69369589  4697244
 3           Texas   7536817   61830712  5736941

The dtype change is now confirmed:

 In [82]:
 df.info()

 <class 'pandas.core.frame.DataFrame'>
 Int64Index: 4 entries, 0 to 3
 Data columns (total 4 columns):
 state    4 non-null object
 1st      4 non-null int64
 2nd      4 non-null int64
 3rd      4 non-null int64
 dtypes: int64(3), object(1)
 memory usage: 160.0+ bytes
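For a version that can be pasted and run on its own, the sketch below applies the same vectorized replacement to a two-row copy of the question's data and then computes a per-state total. It passes `regex=False` so that `'$'` is always treated as a literal character rather than a regex anchor; `strip_currency` and the `total` column are illustrative names, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Two rows copied from the question's table.
df = pd.DataFrame({
    'state': ['California', 'New York'],
    '1st': ['$11,593,820', '$10,861,680'],
    '2nd': ['$109,264,246', '$45,336,041'],
    '3rd': ['$8,496,273', '$6,317,300'],
})


def strip_currency(column):
    # regex=False makes '$' a literal character, not an end-of-string anchor.
    return (column.str.replace('$', '', regex=False)
                  .str.replace(',', '', regex=False))


cols = df.columns[1:]  # every column except 'state'
df[cols] = df[cols].apply(strip_currency).astype(np.int64)

# With integer columns, ordinary aggregation works.
df['total'] = df[cols].sum(axis=1)
print(df)
```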

Another way:

 In [108]:
 df[df.columns[1:]] = df[df.columns[1:]].apply(lambda x: x.str[1:].str.split(',').str.join('')).astype(np.int64)
 df

 Out[108]:
             state       1st        2nd      3rd
 index
 0      California  11593820  109264246  8496273
 1        New York  10861680   45336041  6317300
 2         Florida   7942848   69369589  4697244
 3           Texas   7536817   61830712  5736941
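One caveat with the slicing variant: `x.str[1:]` drops the first character unconditionally, so it assumes every value starts with `$`. The sketch below illustrates the difference on a small Series in which the second value is a hypothetical entry without the symbol, and shows `str.lstrip('$')` as one possible alternative.

```python
import pandas as pd

# The second value is hypothetical: same style of number, but with no '$'.
prices = pd.Series(['$7,942,848', '7,536,817'])

# Slicing removes the first character no matter what it is,
# so the value without '$' silently loses its leading digit.
sliced = prices.str[1:].str.split(',').str.join('').astype('int64')

# lstrip removes only leading '$' characters and leaves other values intact.
stripped = prices.str.lstrip('$').str.split(',').str.join('').astype('int64')

print(sliced.tolist())
print(stripped.tolist())
```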