I have a dataframe with hundreds of millions of rows, and I want to efficiently convert its datetime values to Unix timestamps (seconds since the epoch). How can I do this?
Example df:
    import datetime as dt
    import pandas as pd

    df = pd.DataFrame(index=pd.date_range(start=dt.datetime(2016, 1, 1, 0, 0, 1),
                                          end=dt.datetime(2016, 1, 2, 0, 0, 1), freq='h'))\
           .reset_index().rename(columns={'index': 'datetime'})
    df.head()

                 datetime
    0 2016-01-01 00:00:01
    1 2016-01-01 01:00:01
    2 2016-01-01 02:00:01
    3 2016-01-01 03:00:01
    4 2016-01-01 04:00:01
Currently I convert each datetime to a timestamp row by row with .apply(), but with hundreds of millions of rows this takes a very long time (several hours):
    df['ts'] = df[['datetime']].apply(lambda x: x.iloc[0].timestamp(), axis=1).astype(int)
    df.head()

                 datetime          ts
    0 2016-01-01 00:00:01  1451602801
    1 2016-01-01 01:00:01  1451606401
    2 2016-01-01 02:00:01  1451610001
    3 2016-01-01 03:00:01  1451613601
    4 2016-01-01 04:00:01  1451617201
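For comparison, here is a vectorized sketch (not benchmarked on data of this size) that avoids the per-row Python call: subtract the Unix epoch from the whole datetime64 column, then floor-divide the resulting timedeltas by one second. The pandas time-series user guide shows this same pattern under "From timestamps to epoch". It assumes a naive column read as UTC, so the first row comes out as 1451606401, exactly 3600 s more than the 1451602801 shown above.

```python
import pandas as pd

# Same sample data as above, built with pd.date_range.
df = pd.DataFrame({'datetime': pd.date_range('2016-01-01 00:00:01',
                                             '2016-01-02 00:00:01', freq='h')})

# Vectorized: datetime64 minus the epoch gives a timedelta64 Series;
# floor-dividing by one second yields whole seconds as int64.
# Naive datetimes are effectively interpreted as UTC here.
epoch = pd.Timestamp('1970-01-01')
df['ts'] = (df['datetime'] - epoch) // pd.Timedelta('1s')

print(df.head())
```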
The above result is what I want.
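The values above match reading each naive datetime as local wall-clock time at UTC+1. If that is the intent, one sketch is to attach a time zone with Series.dt.tz_localize, convert to UTC, and then do the epoch arithmetic. 'Europe/Berlin' is only an assumed example of a UTC+1 zone; substitute the zone the data was actually recorded in (around DST transitions, see the ambiguous/nonexistent parameters of tz_localize).

```python
import pandas as pd

df = pd.DataFrame({'datetime': pd.date_range('2016-01-01 00:00:01',
                                             '2016-01-02 00:00:01', freq='h')})

# Assumption: the naive values are local times in a UTC+1 zone.
# 'Europe/Berlin' is a hypothetical choice; use the real source zone.
local = df['datetime'].dt.tz_localize('Europe/Berlin')

# Convert to UTC, then count whole seconds since the UTC Unix epoch.
epoch_utc = pd.Timestamp('1970-01-01', tz='UTC')
df['ts'] = (local.dt.tz_convert('UTC') - epoch_utc) // pd.Timedelta('1s')

print(df.head())
```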
If I try to use the .dt accessor of pandas.Series, I get an error:
df['ts'] = df['datetime'].dt.timestamp
AttributeError: 'DatetimeProperties' object has no attribute 'timestamp'
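Series.dt indeed has no timestamp attribute. Another vectorized route, sketched here under the assumption of a naive (time-zone-free) column, goes through NumPy: take the underlying datetime64 array, cast it to second resolution, then reinterpret it as int64, which counts seconds since 1970-01-01 with the values read as UTC. Casting to 'datetime64[s]' first keeps the result independent of whether the column is stored in ns, us or another unit.

```python
import numpy as np
import pandas as pd

s = pd.Series(pd.date_range('2016-01-01 00:00:01', periods=5, freq='h'))

# Underlying datetime64 values as a NumPy array (naive, no time zone).
values = s.to_numpy()

# Cast to whole-second resolution, then reinterpret as int64:
# each number is seconds since 1970-01-01, reading the values as UTC.
ts = values.astype('datetime64[s]').astype(np.int64)
print(ts)
```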
By contrast, extracting parts of the datetimes with the .dt accessor, for example the date, is much faster than using .apply():
    df['date'] = df['datetime'].dt.date
    df.head()

                 datetime          ts        date
    0 2016-01-01 00:00:01  1451602801  2016-01-01
    1 2016-01-01 01:00:01  1451606401  2016-01-01
    2 2016-01-01 02:00:01  1451610001  2016-01-01
    3 2016-01-01 03:00:01  1451613601  2016-01-01
    4 2016-01-01 04:00:01  1451617201  2016-01-01
I want something similar for timestamps.
But I don't really understand the official documentation: it has a section called "Converting to timestamps", but I don't see any timestamps there; it only talks about converting to datetime with pd.to_datetime(), not to a timestamp.
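In the pandas docs, "timestamp" mostly means a pandas.Timestamp (a datetime64 value) rather than an integer count of seconds, which is why that section is about pd.to_datetime. Given unit='s', the same function goes in the opposite direction to the one asked about here, turning epoch seconds into datetime64 values, which makes it a convenient way to check a conversion. A short sketch:

```python
import pandas as pd

# Unix seconds, e.g. UTC-based values for the first two sample rows.
ts = pd.Series([1451606401, 1451610001])

# unit='s' says the numbers are seconds since 1970-01-01 (UTC);
# the result is a naive datetime64 Series.
dt_values = pd.to_datetime(ts, unit='s')
print(dt_values)
```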
The pandas.Timestamp constructor doesn't work either; it raises the error below:
df['ts2'] = pd.Timestamp(df['datetime'])
TypeError: Unable to convert input to timestamp
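pd.Timestamp builds a single scalar, so it cannot take a whole Series; the .apply() version works only because it calls Timestamp.timestamp() once per row, which is also what makes it slow. A small sketch of the scalar behaviour; note that recent pandas computes .timestamp() on a naive Timestamp as if it were UTC, which can differ from the local-time results shown in the question.

```python
import pandas as pd

# pd.Timestamp takes one datetime-like scalar, not a Series.
one = pd.Timestamp('2016-01-01 00:00:01')

# Float POSIX seconds; for a naive Timestamp pandas treats the value as UTC.
seconds = one.timestamp()
print(seconds)
```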
pandas.Series.to_timestamp also does something completely different from what I want:
    df['ts3'] = df['datetime'].to_timestamp
    df.head()

                 datetime          ts                                             ts3
    0 2016-01-01 00:00:01  1451602801  <bound method Series.to_timestamp of 0 2016...
    1 2016-01-01 01:00:01  1451606401  <bound method Series.to_timestamp of 0 2016...
    2 2016-01-01 02:00:01  1451610001  <bound method Series.to_timestamp of 0 2016...
    3 2016-01-01 03:00:01  1451613601  <bound method Series.to_timestamp of 0 2016...
    4 2016-01-01 04:00:01  1451617201  <bound method Series.to_timestamp of 0 2016...
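Two things happen in that line: to_timestamp is never called (there are no parentheses), so every cell stores the bound method, and even when called, Series.to_timestamp has another purpose. It converts a PeriodIndex on the Series' index into a DatetimeIndex (start of each period by default) and does not produce epoch numbers. A small sketch of its intended use:

```python
import pandas as pd

# A Series indexed by monthly periods.
s = pd.Series([10, 20], index=pd.period_range('2016-01', periods=2, freq='M'))

# to_timestamp() changes the index from periods to datetimes
# (start of each period); the values themselves are untouched.
converted = s.to_timestamp()
print(converted.index)
```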
Thanks!