Removing column values that do not meet the requirement

I have a pandas DataFrame with a 'date_of_birth' column. Values take the form 1977-10-24T00:00:00.000Z, for example.

I want to capture the year, so I tried the following:

X['date_of_birth'] = X['date_of_birth'].apply(lambda x: int(str(x)[:4]))

This works if the first 4 characters are guaranteed to be digits, but in my dataset they are not, because some dates are malformed or garbage. Is there a way to tweak my lambda without using regex? If not, how can I write this with a regex?

1 answer

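Use to_datetime with errors='coerce' to convert the column to datetime dtype; values that cannot be parsed become NaT. You can then drop those rows with dropna, and get the year with dt.year: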

In [58]:
import pandas as pd
df = pd.DataFrame({'date':['1977-10-24T00:00:00.000Z', 'duff', '200', '2016-01-01']})
df['mod_dates'] = pd.to_datetime(df['date'], errors='coerce')
df

Out[58]:
                       date  mod_dates
0  1977-10-24T00:00:00.000Z 1977-10-24
1                      duff        NaT
2                       200        NaT
3                2016-01-01 2016-01-01

In [59]:
df.dropna()

Out[59]:
                       date  mod_dates
0  1977-10-24T00:00:00.000Z 1977-10-24
3                2016-01-01 2016-01-01

In [60]:
df['mod_dates'].dt.year

Out[60]:
0    1977.0
1       NaN
2       NaN
3    2016.0
Name: mod_dates, dtype: float64
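If you would rather keep the per-value apply approach from the question, here is a minimal plain-Python sketch of both variants the question asks about. The helper names year_no_regex and year_regex are hypothetical. Both assume the year is the first four characters of the value, and the regex version additionally assumes a '-' follows it. Instead of raising, invalid values map to None, so those rows can be dropped afterwards.

```python
import re
from typing import Optional

# Hypothetical helper: no regex, just a guarded version of the original lambda.
def year_no_regex(value) -> Optional[int]:
    prefix = str(value)[:4]
    # isdecimal() rejects signs, spaces and underscores, which int() would
    # otherwise accept, so only four real digits get through.
    if len(prefix) == 4 and prefix.isdecimal():
        return int(prefix)
    return None

# Hypothetical helper: regex version, requiring "YYYY-" at the start.
YEAR_RE = re.compile(r"(\d{4})-")

def year_regex(value) -> Optional[int]:
    match = YEAR_RE.match(str(value))  # match() only tries the start of the string
    return int(match.group(1)) if match else None

samples = ['1977-10-24T00:00:00.000Z', 'duff', '200', '2016-01-01']
print([year_no_regex(v) for v in samples])
print([year_regex(v) for v in samples])
```

On a DataFrame this would be used as X['date_of_birth'].apply(year_no_regex) (or year_regex), followed by dropna(subset=['date_of_birth']) to remove the rows that did not yield a year. The vectorized to_datetime route above is usually faster on large frames; these helpers just show what the lambda tweak could look like.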