I have a .csv with the following structure:
date_begin,date_end,name,name_code,active_accounts,transaction_amount,transaction_count
1/1/2008,1/31/2008,Name_1,1001,"123,456","$7,890,123.45","67,890"
2/1/2008,2/29/2008,Name_1,1001,"43,210","$987,654.32","109,876"
3/1/2008,3/31/2008,Name_1,1001,"485,079","$1,265,789,433.98","777,888"
...
12/1/2008,12/31/2008,Name_1,1001,"87,543","$432,098,987","87,987"
1/1/2008,1/31/2008,Name_2,1002,"268,456","$890,123.45","97,890"
2/1/2008,2/29/2008,Name_2,1002,"53,210","$987,654.32","109,876"
...
etc
I am trying to read it into pandas using the following code:
import pandas as pd
data = pd.read_csv('my_awesome_csv.csv', parse_dates=[[0,1]], infer_datetime_format=True)
This works fine, except that I would like to control the data type of each column. When I inspect the result in the interpreter, I find that the values in quotation marks are not recognized as numbers, whether they are dollar amounts or plain counts.
In [10]: data.dtypes
Out[10]:
date_begin_date_end    object
name                   object
name_code               int64
active_accounts        object
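To make this reproducible without my actual file, here is a minimal sketch that feeds the first two data rows through `read_csv` from an in-memory string. The `sample` variable is a stand-in I made up for `my_awesome_csv.csv`, and I left out the date parsing since it is not related to the problem.

```python
import io

import pandas as pd

# Hypothetical inline sample standing in for my_awesome_csv.csv
# (header plus the first two data rows from above).
sample = io.StringIO(
    'date_begin,date_end,name,name_code,'
    'active_accounts,transaction_amount,transaction_count\n'
    '1/1/2008,1/31/2008,Name_1,1001,"123,456","$7,890,123.45","67,890"\n'
    '2/1/2008,2/29/2008,Name_1,1001,"43,210","$987,654.32","109,876"\n'
)

# Default parsing: the quoted values keep their commas and dollar signs,
# so pandas leaves those columns as strings instead of numbers.
data = pd.read_csv(sample)
print(data.dtypes)
```

Only `name_code` comes back as an integer column here; the three quoted columns stay as text.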
I went through the pandas CSV documentation but did not find what I was looking for: a way to declare numeric types for amounts that are stored in the CSV as strings with commas and dollar signs. My ultimate goal is to do some arithmetic on the values in these columns.
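The closest thing I can come up with is the sketch below, and I am not sure it is the intended approach. It assumes that the `thousands` option of `read_csv` also applies to quoted fields, and it uses a helper I wrote myself, `parse_dollars` (not part of pandas), passed through `converters` for the dollar column. The `sample` string is again a made-up stand-in for the real file.

```python
import io

import pandas as pd

# Hypothetical inline sample standing in for my_awesome_csv.csv.
sample = io.StringIO(
    'date_begin,date_end,name,name_code,'
    'active_accounts,transaction_amount,transaction_count\n'
    '1/1/2008,1/31/2008,Name_1,1001,"123,456","$7,890,123.45","67,890"\n'
    '2/1/2008,2/29/2008,Name_1,1001,"43,210","$987,654.32","109,876"\n'
)


def parse_dollars(text):
    """Hypothetical helper: turn a string like '$7,890,123.45' into a float."""
    return float(text.replace('$', '').replace(',', ''))


data = pd.read_csv(
    sample,
    thousands=',',  # assumption: strips the commas in "123,456" before parsing
    converters={'transaction_amount': parse_dollars},  # runs on the raw string
)

# The kind of arithmetic I ultimately want to do on these columns.
average_amount = data['transaction_amount'] / data['transaction_count']
print(data.dtypes)
print(average_amount)
```

Using a float for money is a simplification on my part; `decimal.Decimal` inside the converter might be a better fit if exact cents matter.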
Any thoughts?