Here is one way:
Slice out the relevant columns ( ['Client', 'Month'] ) from the input dataframe into a NumPy array. This is mostly a performance-focused idea, because later on we will use NumPy functions that are optimized for working with NumPy arrays.
Convert the data of the two columns ['Client', 'Month'] into a single 1D array that is their linear-index equivalent, treating the elements from the two columns as pairs. We can think of the elements from 'Client' as row indices and those from 'Month' as column indices, so this is like going from 2D to 1D. The question is then what shape the 2D grid should have to perform such a mapping. To cover all pairs, one safe choice is a grid whose size along each dimension is one more than the maximum of the corresponding column, because of Python's 0-based indexing. Thus, we get the linear indices.
Then we tag each linear index based on its uniqueness among the others. I think these tags would correspond to the keys obtained with groupby. We also need the count of each group / unique key along the entire length of that 1D array. Finally, indexing into the counts with those tags maps the respective count back onto each element (see the small example after these steps).
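To make the mapping concrete, here is a minimal sketch with made-up toy values (the 'Client'/'Month' numbers below are hypothetical, chosen only for illustration):

    import numpy as np

    # Hypothetical toy columns: 'Client' acts as row index, 'Month' as column index
    client = np.array([1, 1, 2, 1])
    month  = np.array([3, 3, 5, 3])
    pairs  = np.column_stack([client, month])

    # Grid shape: one more than the max along each column (0-based indexing)
    dims = pairs.max(0) + 1                      # -> [3, 6]

    # Each (client, month) pair becomes one linear index on that grid
    lidx = np.ravel_multi_index(pairs.T, dims)   # -> [ 9,  9, 17,  9]

    # Tag each index by its unique group and count the group sizes
    unq, tags, counts = np.unique(lidx, return_inverse=True, return_counts=True)
    print(counts[tags])                          # -> [3 3 1 3]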
That’s the whole idea! Here's the implementation -
    import numpy as np

    # Save relevant columns as a NumPy array for performing
    # NumPy operations afterwards
    arr_slice = df[['Client', 'Month']].values

    # Get the linear-index equivalent of the column pairs
    lidx = np.ravel_multi_index(arr_slice.T, arr_slice.max(0) + 1)

    # Tag each linear index by its unique group and get the group counts
    unq, unqtags, counts = np.unique(lidx, return_inverse=True, return_counts=True)

    # Index into the counts with the tags to assign each row its group count
    df["Nbcontrats"] = counts[unqtags]
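Note that np.ravel_multi_index expects non-negative integer indices that fit inside the given grid shape, so this assumes 'Client' and 'Month' hold non-negative integers; for arbitrary labels you would first have to convert them into integer codes (e.g. with pd.factorize).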
Runtime test
1) Define functions:
    def original_app(df):
        df["Nbcontrats"] = df.groupby(['Client', 'Month'])['Contrat'].transform(len)

    def vectorized_app(df):
        arr_slice = df[['Client', 'Month']].values
        lidx = np.ravel_multi_index(arr_slice.T, arr_slice.max(0) + 1)
        unq, unqtags, counts = np.unique(lidx, return_inverse=True, return_counts=True)
        df["Nbcontrats"] = counts[unqtags]
2) Confirm the results:
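As a sketch, one way to check that both functions produce the same column (the random data below is made up for illustration):

    import numpy as np
    import pandas as pd

    # Hypothetical test frame with random non-negative integer columns
    np.random.seed(0)
    df = pd.DataFrame(np.random.randint(0, 100, (10000, 3)),
                      columns=['Client', 'Month', 'Contrat'])
    df1 = df.copy()

    original_app(df)
    vectorized_app(df1)

    # Both approaches should assign identical group counts
    assert df['Nbcontrats'].equals(df1['Nbcontrats'])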
3) Finally, the timings:
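Again as a sketch, the comparison can be timed with IPython's %timeit on the same hypothetical frame; the actual speedup will depend on the number of rows and of (Client, Month) groups:

    # Fresh copies so each timed run starts from data without the result column
    %timeit original_app(df.copy())
    %timeit vectorized_app(df.copy())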