Get week start date (Monday) from a date column in Python (pandas)?

I saw a lot of posts about how to do this with a single date string, but I'm trying to do it for a DataFrame column and have had no luck so far. My current method: get the day of the week from the 'myday' column, then shift back by that many days to get Monday.

df['myday'] is a column of dates.

    mydays = pd.DatetimeIndex(df['myday']).weekday
    df['week_start'] = pd.DatetimeIndex(df['myday']) - pd.DateOffset(days=mydays)

But I get TypeError: unsupported type for component timedelta days: numpy.ndarray

How can I get the start date of a week from a df column?

+14
python date numpy pandas
4 answers

This fails because pd.DateOffset expects a single integer for days (and you are passing it an array). A DateOffset can only shift the whole date column by one and the same offset.
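To illustrate the point about a single offset, here is a minimal sketch. The sample dates are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data (2015-01-07 is a Wednesday, 2015-01-11 a Sunday)
df = pd.DataFrame({'myday': pd.to_datetime(['2015-01-07', '2015-01-11'])})

# A scalar DateOffset is applied identically to every row
shifted = df['myday'] - pd.DateOffset(days=1)
print(shifted)
```

Every row moves back by exactly one day, which is why this approach cannot express a different shift per row.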

Try this:

    import datetime as dt
    import pandas as pd

    # Convert 'myday' so that it contains dates as datetime objects
    df['myday'] = pd.to_datetime(df['myday'])
    # 'daysoffset' will contain the weekday as an integer (Monday is 0)
    df['daysoffset'] = df['myday'].apply(lambda x: x.weekday())
    # Apply a timedelta subtraction row by row (axis=1)
    df['week_start'] = df.apply(lambda x: x['myday'] - dt.timedelta(days=int(x['daysoffset'])), axis=1)

I have not actually tested this code (no sample data was provided), but it should work for what you described.

However, you could also look at pandas resampling (DataFrame.resample), which may provide a better solution, depending on what exactly you are looking for.

+2

Another alternative:

 df['week_start'] = df['myday'].dt.to_period('W').apply(lambda r: r.start_time) 

This will set 'week_start' to midnight on the Monday on or before the time in 'myday'.

+23

Although the solutions from @knightofni and @Paul work, I try to avoid apply in pandas because it is usually quite slow compared to array-based methods. Instead, we can keep the weekday-based approach and simply convert the day of the week to a NumPy timedelta64[D].

 df['week_start'] = df['myday'] - df['myday'].dt.weekday.astype('timedelta64[D]') 

Using my test data with 60,000 dates, I got the following timings for the other two suggested answers and for the cast method.

    %timeit df.apply(lambda x: x['myday'] - datetime.timedelta(days=x['myday'].weekday()), axis=1)
    >>> 1 loop, best of 3: 7.43 s per loop

    %timeit df['myday'].dt.to_period('W').apply(lambda r: r.start_time)
    >>> 1 loop, best of 3: 2.38 s per loop

    %timeit df['myday'] - df['myday'].dt.weekday.astype('timedelta64[D]')
    >>> 100 loops, best of 3: 12.3 ms per loop

That is almost 200 times faster than the period-based method, and roughly 600 times faster than the row-wise apply, on my dataset.
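Before comparing speed, it can be worth confirming that the three approaches agree. The sketch below does that on made-up date-only data. It uses pd.to_timedelta for the vectorized variant rather than the astype cast, on the assumption that it is accepted across pandas versions; newer releases may reject the day-resolution cast.

```python
import datetime
import pandas as pd

# Hypothetical test data: 14 consecutive days starting 2015-01-01
df = pd.DataFrame({'myday': pd.date_range('2015-01-01', periods=14, freq='D')})

# Element-wise apply, subtracting the weekday from each timestamp
by_apply = df['myday'].apply(lambda d: d - datetime.timedelta(days=d.weekday()))
# Weekly periods ending on Sunday, so each starts on a Monday
by_period = df['myday'].dt.to_period('W').dt.start_time
# Fully vectorized subtraction of one timedelta per row
by_vector = df['myday'] - pd.to_timedelta(df['myday'].dt.weekday, unit='D')

print((by_apply == by_vector).all(), (by_period == by_vector).all())
```

The methods only agree exactly when the dates carry no time of day, since the period-based variant always returns midnight.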

+16

(Just adding to n8yoder's answer.)

Using .astype('timedelta64[D]') does not seem very readable to me. I found an alternative that uses only pandas functionality:

 df['myday'] - pd.to_timedelta(arg=df['myday'].dt.weekday, unit='D') 
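One detail worth showing: the subtraction keeps whatever time of day is stored in 'myday'. The sketch below, on hypothetical sample data, contrasts that with normalizing to midnight first. The column name 'week_start_midnight' is just an illustrative choice.

```python
import pandas as pd

# Hypothetical sample data with a time component
df = pd.DataFrame({'myday': pd.to_datetime(
    ['2015-01-07 15:30', '2015-01-11 08:00'])})

# One timedelta per row: the number of days since Monday
offset = pd.to_timedelta(df['myday'].dt.weekday, unit='D')

# Plain subtraction keeps the time of day
df['week_start'] = df['myday'] - offset
# Normalizing first gives Monday at midnight
df['week_start_midnight'] = df['myday'].dt.normalize() - offset
print(df)
```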
+3