@bernie's answer addresses your specific problem. Here is my take on the general problem of loading numerical data into pandas.
Data sources are often reports produced for direct human consumption, so they carry extra formatting: % signs, thousands separators, currency symbols, and so on. All of this helps readability but trips up the default parser. My solution is to cast the column to string, replace these characters one by one, and then cast it back to the appropriate numeric type. A boilerplate function that keeps only [0-9.] is tempting, but it breaks when the thousands separator and the decimal point are swapped (as in some locales), and also on scientific notation. Here is my code, which I wrap in a function and apply as needed.
    df[col] = df[col].astype(str)  # cast to string

    # all the string surgery goes in here
    # note: .str.replace works on substrings; plain Series.replace only matches whole values
    df[col] = df[col].str.replace('$', '', regex=False)
    df[col] = df[col].str.replace(',', '', regex=False)  # assuming ',' is the thousands separator in your locale
    df[col] = df[col].str.replace('%', '', regex=False)

    df[col] = df[col].astype(float)  # cast back to appropriate type
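For illustration, the steps above could be wrapped in a function roughly like this. It is only a sketch: the name `clean_numeric`, its parameters, and the sample data are made up for this example and are not part of pandas. Making the thousands and decimal separators parameters is one way to handle locales where they are swapped, and because the function removes only the listed characters, values in scientific notation survive.

```python
import pandas as pd


def clean_numeric(series, symbols=("$", "%"), thousands=",", decimal="."):
    """Strip display formatting from a column and cast it to float.

    Hypothetical helper: the name and parameters are assumptions for this sketch.
    """
    s = series.astype(str).str.strip()  # cast to string, drop stray whitespace
    for ch in symbols:
        # regex=False so that '$' is a literal character, not a regex anchor
        s = s.str.replace(ch, "", regex=False)
    s = s.str.replace(thousands, "", regex=False)
    if decimal != ".":
        # normalise the decimal mark so float() can parse it
        s = s.str.replace(decimal, ".", regex=False)
    return s.astype(float)  # cast back to a numeric type


# made-up sample data
df = pd.DataFrame({
    "price": ["$1,234.50", "$99", "1e3"],
    "share": ["12.5%", "7%", "0.1%"],
})
df["price"] = clean_numeric(df["price"])
df["share"] = clean_numeric(df["share"])

# swapped separators: '.' for thousands, ',' for decimals
eu = clean_numeric(pd.Series(["1.234,50"]), thousands=".", decimal=",")
```

Note that percentages come back as plain numbers (12.5, not 0.125); divide by 100 afterwards if you need fractions.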
Bigyan