Is searchsorted faster than get_loc for finding the location of a label in a DataFrame index?

I need to find the integer location of a label in a Pandas index. I know I can use the get_loc method, but then I came across searchsorted. I'm wondering whether the latter should be used to improve speed, since I need to look up thousands of labels.

1 answer

It depends on your use case. Using @ayhan's example:

With get_loc, there is a large up-front cost for building a hash table on the first lookup.

    In [22]: idx = pd.Index(['R{0:07d}'.format(i) for i in range(10**7)])

    In [23]: to_search = np.random.choice(idx, 10**5, replace=False)

    In [24]: %time idx.get_loc(to_search[0])
    Wall time: 1.57 s

But subsequent lookups may be faster (this is not guaranteed; it depends on the data).

    In [9]: %%time
       ...: for i in to_search:
       ...:     idx.get_loc(i)
    Wall time: 200 ms

    In [10]: %%time
        ...: for i in to_search:
        ...:     np.searchsorted(idx, i)
    Wall time: 486 ms
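For anyone who wants to repeat the comparison outside IPython, here is a minimal sketch as a standalone script. The sizes are scaled down so that it finishes quickly, the labels are picked by position with a seeded generator, and `idx.searchsorted` is called directly instead of `np.searchsorted(idx, ...)`; those choices are my assumptions, not part of the session above. Timings will differ by machine and pandas version, so no numbers are quoted.

```python
import time

import numpy as np
import pandas as pd

# Zero-padded labels sort lexicographically, so this index is monotonic.
idx = pd.Index(['R{0:07d}'.format(i) for i in range(10**5)])

# Pick known positions so both methods can be checked against them.
rng = np.random.default_rng(0)
positions = rng.choice(len(idx), 10**3, replace=False)
to_search = idx[positions]

# The first get_loc call includes the cost of building the hash table.
start = time.perf_counter()
idx.get_loc(to_search[0])
first_cost = time.perf_counter() - start

# Later get_loc calls reuse the hash table.
start = time.perf_counter()
locs_hash = [idx.get_loc(label) for label in to_search]
hash_cost = time.perf_counter() - start

# searchsorted does a binary search per label and needs no hash table.
start = time.perf_counter()
locs_sorted = [int(idx.searchsorted(label)) for label in to_search]
sorted_cost = time.perf_counter() - start

print(f"first get_loc:      {first_cost:.4f} s")
print(f"get_loc loop:       {hash_cost:.4f} s")
print(f"searchsorted loop:  {sorted_cost:.4f} s")
```

Both loops should return the same positions here because the index is sorted and unique; only the cost profile differs.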

In addition, as Jeff noted, get_loc is guaranteed to always work, whereas searchsorted requires a monotonic index (and does not check for one).
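To illustrate the monotonicity point, here is a small sketch. `lookup_position` is a hypothetical helper of mine, not a pandas function: it uses binary search only when the index reports itself as sorted, and falls back to get_loc otherwise. It assumes unique labels.

```python
import pandas as pd


def lookup_position(index, label):
    """Hypothetical helper: binary search when safe, hash lookup otherwise.

    Assumes unique labels; with duplicates, get_loc may return a slice
    or a boolean mask instead of an integer.
    """
    if index.is_monotonic_increasing:
        pos = int(index.searchsorted(label))
        # searchsorted returns an insertion point and never raises for a
        # missing label, so confirm that the label is really there.
        if pos < len(index) and index[pos] == label:
            return pos
        raise KeyError(label)
    return index.get_loc(label)


sorted_idx = pd.Index(['a', 'b', 'c', 'd'])
shuffled_idx = pd.Index(['c', 'a', 'd', 'b'])

# On the unsorted index, searchsorted raises no error, but the position
# it returns need not be where the label actually is.
raw = int(shuffled_idx.searchsorted('a'))
print(raw, shuffled_idx.get_loc('a'))

print(lookup_position(sorted_idx, 'c'))
print(lookup_position(shuffled_idx, 'a'))
```

The `is_monotonic_increasing` check is cheap compared with a wrong answer, which is the trade-off Jeff's remark points at.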
