It depends on your use case. The timings below use @ayhan's example.
With get_loc, there is a large one-time cost for building a hash table on the first search.
    In [22]: idx = pd.Index(['R{0:07d}'.format(i) for i in range(10**7)])

    In [23]: to_search = np.random.choice(idx, 10**5, replace=False)

    In [24]: %time idx.get_loc(to_search[0])
    Wall time: 1.57 s
But subsequent searches may be faster (this is not guaranteed; it depends on the data):
    In [9]: %%time
       ...: for i in to_search:
       ...:     idx.get_loc(i)
    Wall time: 200 ms

    In [10]: %%time
        ...: for i in to_search:
        ...:     np.searchsorted(idx, i)
    Wall time: 486 ms
In addition, as Jeff noted, get_loc is guaranteed to always work, whereas searchsorted requires a monotonic (sorted) index and does not check for it.
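To make that last point concrete, here is a minimal sketch with a tiny unsorted index. The integer labels are made up for illustration and are not from the benchmark above. It shows get_loc returning the true position while searchsorted quietly returns a different one.

```python
import numpy as np
import pandas as pd

# Hypothetical, deliberately unsorted labels (illustration only)
idx = pd.Index([30, 10, 40, 20])

# get_loc is a hash lookup, so the order of the labels does not matter
pos_hash = idx.get_loc(10)

# searchsorted bisects as if the data were sorted and does not verify it,
# so on unsorted data it can return a position that does not hold the label
pos_bisect = int(np.searchsorted(idx.to_numpy(), 10))

print(pos_hash, pos_bisect, idx[pos_bisect] == 10)

# One possible guard: only trust searchsorted when the index is monotonic
sorted_idx = idx.sort_values()
if sorted_idx.is_monotonic_increasing:
    pos_sorted = int(np.searchsorted(sorted_idx.to_numpy(), 10))
    print(pos_sorted == sorted_idx.get_loc(10))
```

Checking is_monotonic_increasing before relying on searchsorted is just one way to guard against this; whether the check is worth its cost depends on how the index is built.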