Python pandas convert datetime to timestamp via dt accessor

I have a DataFrame with hundreds of millions of rows, and I want to efficiently convert the date and time to a (Unix) timestamp. How can I do this?

My df example:

 import datetime as dt
 import pandas as pd

 df = pd.DataFrame(index=pd.date_range(start=dt.datetime(2016, 1, 1, 0, 0, 1),
                                       end=dt.datetime(2016, 1, 2, 0, 0, 1),
                                       freq='h'))\
        .reset_index().rename(columns={'index': 'datetime'})
 df.head()

              datetime
 0 2016-01-01 00:00:01
 1 2016-01-01 01:00:01
 2 2016-01-01 02:00:01
 3 2016-01-01 03:00:01
 4 2016-01-01 04:00:01

Currently I convert each datetime to a timestamp value by value with .apply(), but with hundreds of millions of rows this takes a very long time (several hours):

 df['ts'] = df[['datetime']].apply(lambda x: x.iloc[0].timestamp(), axis=1).astype(int)
 df.head()

              datetime          ts
 0 2016-01-01 00:00:01  1451602801
 1 2016-01-01 01:00:01  1451606401
 2 2016-01-01 02:00:01  1451610001
 3 2016-01-01 03:00:01  1451613601
 4 2016-01-01 04:00:01  1451617201

The above result is what I want.

If I try to use the .dt accessor of pandas.Series, I get an error:

 df['ts'] = df['datetime'].dt.timestamp 

AttributeError: 'DatetimeProperties' object has no attribute 'timestamp'

By contrast, extracting parts of the datetime with the .dt accessor, for example the date, is much faster than using .apply():

 df['date'] = df['datetime'].dt.date
 df.head()

              datetime          ts        date
 0 2016-01-01 00:00:01  1451602801  2016-01-01
 1 2016-01-01 01:00:01  1451606401  2016-01-01
 2 2016-01-01 02:00:01  1451610001  2016-01-01
 3 2016-01-01 03:00:01  1451613601  2016-01-01
 4 2016-01-01 04:00:01  1451617201  2016-01-01

I want something similar for timestamps.

But I don't really understand the official documentation: it has a section on "Converting to timestamps", but I don't see any timestamps there; it only covers converting to datetime with pd.to_datetime(), not to a timestamp.
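In pandas, "timestamp" usually refers to a pd.Timestamp object (the pandas datetime scalar), not a Unix epoch number, which may explain the confusion. The pandas time series user guide does describe the other direction, from Timestamps to epoch seconds: subtract the epoch and floor-divide by a one-second Timedelta. A minimal sketch of that idea applied to a column like the one above, assuming the naive datetimes represent UTC:

```python
import datetime as dt

import pandas as pd

# Same hourly column as in the question (naive datetimes, assumed to be UTC).
df = pd.DataFrame({
    "datetime": pd.date_range(start=dt.datetime(2016, 1, 1, 0, 0, 1),
                              end=dt.datetime(2016, 1, 2, 0, 0, 1),
                              freq="h")
})

# Subtracting the epoch yields a timedelta64 column; floor-dividing by one
# second turns each duration into whole seconds since 1970-01-01 (int64).
epoch = pd.Timestamp("1970-01-01")
df["ts"] = (df["datetime"] - epoch) // pd.Timedelta("1s")

print(df.head())
```

Because this divides by a Timedelta rather than by a hard-coded 10 ** 9, it does not depend on the column's internal resolution.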

The pandas.Timestamp constructor does not work either (it raises the error below):

 df['ts2'] = pd.Timestamp(df['datetime']) 

TypeError: Unable to convert input to timestamp

pandas.Series.to_timestamp also does something completely different from what I want:

 df['ts3'] = df['datetime'].to_timestamp
 df.head()

              datetime          ts                                             ts3
 0 2016-01-01 00:00:01  1451602801  <bound method Series.to_timestamp of 0 2016...
 1 2016-01-01 01:00:01  1451606401  <bound method Series.to_timestamp of 0 2016...
 2 2016-01-01 02:00:01  1451610001  <bound method Series.to_timestamp of 0 2016...
 3 2016-01-01 03:00:01  1451613601  <bound method Series.to_timestamp of 0 2016...
 4 2016-01-01 04:00:01  1451617201  <bound method Series.to_timestamp of 0 2016...

Thanks!

python pandas datetime timestamp

4 answers

I think you need to first convert the column to a NumPy array with values and cast it to int64. The result is in nanoseconds, so you then need to divide by 10 ** 9:

 import numpy as np

 df['ts'] = df.datetime.values.astype(np.int64) // 10 ** 9
 print(df)

               datetime          ts
 0  2016-01-01 00:00:01  1451606401
 1  2016-01-01 01:00:01  1451610001
 2  2016-01-01 02:00:01  1451613601
 3  2016-01-01 03:00:01  1451617201
 4  2016-01-01 04:00:01  1451620801
 5  2016-01-01 05:00:01  1451624401
 6  2016-01-01 06:00:01  1451628001
 7  2016-01-01 07:00:01  1451631601
 8  2016-01-01 08:00:01  1451635201
 9  2016-01-01 09:00:01  1451638801
 10 2016-01-01 10:00:01  1451642401
 11 2016-01-01 11:00:01  1451646001
 12 2016-01-01 12:00:01  1451649601
 13 2016-01-01 13:00:01  1451653201
 14 2016-01-01 14:00:01  1451656801
 15 2016-01-01 15:00:01  1451660401
 16 2016-01-01 16:00:01  1451664001
 17 2016-01-01 17:00:01  1451667601
 18 2016-01-01 18:00:01  1451671201
 19 2016-01-01 19:00:01  1451674801
 20 2016-01-01 20:00:01  1451678401
 21 2016-01-01 21:00:01  1451682001
 22 2016-01-01 22:00:01  1451685601
 23 2016-01-01 23:00:01  1451689201
 24 2016-01-02 00:00:01  1451692801
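A small sanity check comparing the vectorized result with the slow per-row .timestamp() call from the question. This is a sketch assuming nanosecond-resolution naive datetimes (the default for pd.date_range). Current pandas interprets a naive Timestamp as UTC in .timestamp(), so both columns should agree; the values in the question are 3600 seconds smaller, which suggests the pandas version used there interpreted naive datetimes as local time (UTC+1).

```python
import datetime as dt

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "datetime": pd.date_range(start=dt.datetime(2016, 1, 1, 0, 0, 1),
                              periods=25, freq="h")
})

# Vectorized: datetime64[ns] -> int64 nanoseconds -> whole seconds.
df["ts"] = df["datetime"].values.astype(np.int64) // 10 ** 9

# Slow reference: one Python call per row, as in the question's apply().
df["ts_apply"] = df["datetime"].apply(lambda t: t.timestamp()).astype(np.int64)

print((df["ts"] == df["ts_apply"]).all())
```

On a small frame both are instant; the difference only shows up at the scale described in the question, where the per-row version makes one Python call per element.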

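to_timestamp is used to convert a PeriodIndex (periods) to a DatetimeIndex, not to epoch numbers. Also note that in the question it is referenced without parentheses, so the new column only holds the bound method object.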
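To illustrate what to_timestamp does, a minimal sketch with a made-up Series indexed by monthly periods: the PeriodIndex is converted to a DatetimeIndex (by default at the start of each period), while the values themselves are left untouched.

```python
import pandas as pd

# A Series indexed by monthly periods (PeriodIndex); the values are arbitrary.
s = pd.Series([10, 20, 30], index=pd.period_range("2016-01", periods=3, freq="M"))

# to_timestamp() converts the index from periods to datetimes; the data is unchanged.
converted = s.to_timestamp()

print(type(s.index).__name__)          # PeriodIndex
print(type(converted.index).__name__)  # DatetimeIndex
print(converted.index[0])              # 2016-01-01 00:00:00
```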


There is also another method, using the "hidden" DatetimeIndex attribute asi8, which returns the underlying integer timestamps (nanoseconds since the epoch for a datetime64[ns] column):

 pd.DatetimeIndex(df.datetime).asi8

Wes McKinney suggested this in a related Stack Overflow question.
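A minimal sketch of the asi8 route, assuming a nanosecond-resolution column like the one in the question. Since asi8 gives the raw int64 nanoseconds, the result still has to be divided by 10 ** 9 to get seconds, as in the other answers.

```python
import datetime as dt

import pandas as pd

df = pd.DataFrame({
    "datetime": pd.date_range(start=dt.datetime(2016, 1, 1, 0, 0, 1),
                              periods=25, freq="h")
})

# asi8 exposes the underlying int64 representation (nanoseconds for datetime64[ns]).
ns = pd.DatetimeIndex(df["datetime"]).asi8

# Integer-divide to get whole seconds since 1970-01-01 (naive values treated as UTC).
df["ts"] = ns // 10 ** 9
print(df.head())
```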


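If you don't want to use NumPy, you can use pure pandas conversions. Current pandas versions reject a datetime64 column in pd.to_timedelta, so subtract the epoch first to get timedeltas, then take their total seconds:

 df['ts'] = (df['datetime'] - pd.Timestamp('1970-01-01')).dt.total_seconds().astype(int)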

I think you shouldn't use apply at all; astype is enough:

 df['ts'] = df.datetime.astype('int64') // 10**9 