Trying to remove commas and dollar signs using pandas in Python

I'm trying to remove the commas and dollar signs from a column. But when I do this, the table still prints them, so they are still there. Is there another way to remove the commas and dollar signs using a pandas function? I was not able to find anything in the API docs, or maybe I was looking in the wrong place.

    import pandas as pd
    import pandas_datareader.data as web

    players = pd.read_html('http://www.usatoday.com/sports/mlb/salaries/2013/player/p/')
    df1 = pd.DataFrame(players[0])
    df1.drop(df1.columns[[0,3,4, 5, 6]], axis=1, inplace=True)
    df1.columns = ['Player', 'Team', 'Avg_Annual']
    df1['Avg_Annual'] = df1['Avg_Annual'].replace(',', '')
    print (df1.head(10))
3 answers

You need to access the str attribute to replace substrings inside each value; see http://pandas.pydata.org/pandas-docs/stable/text.html

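    # .str.replace works on substrings; regex=False treats '$' as a literal character
    df1['Avg_Annual'] = df1['Avg_Annual'].str.replace(',', '', regex=False)
    df1['Avg_Annual'] = df1['Avg_Annual'].str.replace('$', '', regex=False)
    df1['Avg_Annual'] = df1['Avg_Annual'].astype(int)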
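To see why the original attempt left the commas in place, here is a minimal sketch comparing Series.replace with Series.str.replace. The sample values are made up for illustration and stand in for the scraped salary column.

```python
import pandas as pd

# Hypothetical sample data standing in for the scraped salary column.
salaries = pd.Series(['$1,250,000', '$500,000', '$12,000'])

# Series.replace matches whole cell values by default, so no cell equals ','
# and nothing changes.
unchanged = salaries.replace(',', '')

# Series.str.replace works on substrings inside each value.
# regex=False treats '$' as a literal character rather than a regex anchor.
cleaned = (
    salaries.str.replace(',', '', regex=False)
            .str.replace('$', '', regex=False)
            .astype(int)
)

print(unchanged.tolist())
print(cleaned.tolist())
```

The first print shows the strings untouched; the second shows plain integers.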

Shamelessly stolen from this answer ... but that answer only covers replacing a single character and doesn't show the cool part: since replace accepts a dictionary, you can replace any number of characters at once, in any number of columns.

    # if you want to operate on multiple columns, put them in a list like so:
    cols = ['col1', 'col2', ..., 'colN']

    # pass them to df.replace(), specifying each char and its replacement:
    df[cols] = df[cols].replace({r'\$': '', ',': ''}, regex=True)

@shivsn caught that you need to use regex=True. You already knew about replace, but your attempt didn't show it being used on several columns, or on dollar signs and commas at the same time.
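As a runnable sketch of the dictionary approach, the frame and column names below are hypothetical and only there to make the example self-contained.

```python
import pandas as pd

# Hypothetical frame; the column names are made up for illustration.
df = pd.DataFrame({
    'Player': ['A', 'B'],
    'Salary': ['$1,000,000', '$250,500'],
    'Bonus': ['$10,000', '$0'],
})

cols = ['Salary', 'Bonus']

# With regex=True, each dict key is a pattern matched inside the strings.
# '$' is escaped because it is a regex metacharacter.
df[cols] = df[cols].replace({r'\$': '', ',': ''}, regex=True)

# The values are still strings at this point, so convert them explicitly.
df[cols] = df[cols].astype(int)

print(df)
```

Without regex=True, the same call would only replace cells whose entire value is one of the dictionary keys, which is the behaviour the question ran into.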

This answer simply collects in one place the details I found scattered across other answers, for people like me (e.g. noobs to python and pandas). Hope this will be helpful.


@bernie's answer addresses your specific problem. Here is my take on the general task of loading numerical data into pandas.

Often the data sources are reports created for direct human consumption. Hence the presence of extra formatting such as %, thousands separators, currency symbols, etc. All of this helps readability but causes problems for the default parser. My solution is to cast the column to string, replace these characters one by one, and then cast it back to the appropriate numeric format. Having a boilerplate function that keeps only [0-9.] is tempting, but it causes problems when the thousands separator and the decimal point are swapped, and also in the case of scientific notation. Here is my code, which I wrap in a function and apply as needed.

    df[col] = df[col].astype(str)  # cast to string

    # all the string surgery goes in here
    df[col] = df[col].str.replace('$', '', regex=False)
    df[col] = df[col].str.replace(',', '', regex=False)  # assuming ',' is the thousand separator in your locale
    df[col] = df[col].str.replace('%', '', regex=False)

    df[col] = df[col].astype(float)  # cast back to appropriate type
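One possible way to wrap those steps in a function is sketched below. The name to_number and its strip_chars parameter are hypothetical, not part of pandas, and the sample values are made up.

```python
import pandas as pd

def to_number(series, strip_chars=('$', ',', '%')):
    """Strip formatting characters from a Series and cast it to float.

    Hypothetical helper: assumes ',' is a thousands separator, not a decimal mark.
    """
    text = series.astype(str)  # cast to string so .str methods apply
    for ch in strip_chars:
        # regex=False removes the literal character wherever it appears
        text = text.str.replace(ch, '', regex=False)
    return text.astype(float)  # cast back to a numeric type

# Made-up values: currency, a percentage, scientific notation, and a plain int.
raw = pd.Series(['$1,234.50', '45%', '1.5e3', 7])
numbers = to_number(raw)
print(numbers.tolist())
```

Because only the listed characters are removed, a value in scientific notation such as '1.5e3' still parses, which a keep-only-[0-9.] filter would break.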