I assign np.nan to missing values in a DataFrame column, then write the DataFrame to a csv file with to_csv. If I open the file in a text editor, the missing values correctly show up as nothing between the commas. But when I read the csv file back into a DataFrame with read_csv, the missing values come back as the string 'nan' instead of NaN, so isnull() does not work. For instance:
In [13]: df
Out[13]:
   index  value date
0    975  25.35  nan
1    976  26.28  nan
2    977  26.24  nan
3    978  25.76  nan
4    979  26.08  nan

In [14]: df.date.isnull()
Out[14]:
0    False
1    False
2    False
3    False
4    False
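Roughly what I am doing, as a simplified sketch (the real DataFrame and file name differ; I am only showing the steps relevant to the question):

import numpy as np
import pandas as pd

# Simplified reconstruction of my workflow; the real data has more rows and columns.
df = pd.DataFrame({'index': [975, 976, 977, 978, 979],
                   'value': [25.35, 26.28, 26.24, 25.76, 26.08]})
df['date'] = np.nan                              # missing dates assigned as np.nan

df.to_csv('test.csv', index=False)               # NaN is written out as an empty field

df2 = pd.read_csv('test.csv', parse_dates=[2])   # read back, parsing the date column
df2['date'].isnull()                             # in my case this comes back all False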
Am I doing something wrong? Should I assign some other value instead of np.nan to the missing entries so that isnull() can pick them up?
EDIT: Sorry, I forgot to mention that I also set parse_dates = [2] to parse this column. The column contains dates, with some rows missing. I would like the missing rows to come back as NaN.
EDIT: I just found out that the problem is really related to parse_dates. If the date column contains missing values, read_csv will not parse that column; instead it reads the dates as strings and assigns the string "nan" to the null values.
In [21]: data = pd.read_csv('test.csv', parse_dates=[1])

In [22]: data
Out[22]:
   value      date id
0      2  2013-3-1  a
1      3  2013-3-1  b
2      4  2013-3-1  c
3      5       nan  d
4      6  2013-3-1  d

In [23]: data.date[3]
Out[23]: 'nan'
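For reference, I believe the test.csv behind this example looks roughly like this (reconstructed from the output above; the missing date is just an empty field between the commas):

value,date,id
2,2013-3-1,a
3,2013-3-1,b
4,2013-3-1,c
5,,d
6,2013-3-1,d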
pd.to_datetime does not work:
In [12]: data
Out[12]:
   value      date id
0      2  2013-3-1  a
1      3  2013-3-1  b
2      4  2013-3-1  c
3      5       nan  d
4      6  2013-3-1  d

In [13]: data.dtypes
Out[13]:
value     int64
date     object
id       object

In [14]: pd.to_datetime(data['date'])
Out[14]:
0    2013-3-1
1    2013-3-1
2    2013-3-1
3         nan
4    2013-3-1
Name: date
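Since to_datetime leaves the 'nan' strings alone here, the only fallback I can think of is to turn those strings back into real missing values by hand and then parse. A rough sketch of that idea (errors='coerce' is an assumption on my part and may depend on the pandas version):

import numpy as np
import pandas as pd

data = pd.read_csv('test.csv')                       # read without parse_dates first
data['date'] = data['date'].replace('nan', np.nan)   # literal 'nan' strings -> real NaN
data['date'] = pd.to_datetime(data['date'], errors='coerce')  # missing/invalid entries become NaT

But this feels like a workaround rather than a fix.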
Is there a way to get read_csv with parse_dates to work on columns that contain missing values? That is, assign NaN to the missing values and still parse the valid dates?