Well, this is basically a template-matching problem, which comes up a lot in image processing. This post lists two approaches: one based on pure NumPy and one based on OpenCV (cv2).
Approach No. 1. Using NumPy, we can create a 2D array of sliding indices across the entire length of the input array, so that each row is a sliding window of elements. Next, compare each row against the input sequence; broadcasting makes this a vectorized comparison. We then look for rows that are all True: those are the perfect matches, and their row numbers are the starting indices of the matches. Finally, from those starting indices, create a range of indices extending over the length of the sequence to get the desired output. The implementation would be:
def search_sequence_numpy(arr, seq):
    """Find sequence in an array using NumPy only.

    Parameters
    ----------
    arr : input 1D array
    seq : input 1D array

    Output
    ------
    Output : 1D array of indices in the input array that satisfy the
    matching of input sequence in the input array.
    In case of no match, an empty list is returned.
    """
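The listing above shows only the signature and docstring. A minimal sketch of a body that follows the steps described for Approach No. 1 might look like the following. The variable names, the early return for an empty or over-long `seq`, and the use of `np.unique` to merge overlapping matches are my assumptions, not necessarily what the original code did.

```python
import numpy as np

def search_sequence_numpy(arr, seq):
    """Sketch: indices of arr covered by any full occurrence of seq."""
    arr, seq = np.asarray(arr), np.asarray(seq)
    n_arr, n_seq = arr.size, seq.size
    if n_seq == 0 or n_seq > n_arr:
        return []  # assumption: nothing can match in these cases

    # 2D array of sliding indices: row i holds i, i+1, ..., i+n_seq-1.
    window_idx = np.arange(n_arr - n_seq + 1)[:, None] + np.arange(n_seq)

    # Broadcasting compares every window against seq; a row that is
    # all True marks the starting index of a match.
    starts = np.flatnonzero((arr[window_idx] == seq).all(axis=1))
    if starts.size == 0:
        return []  # no match: empty list, as the docstring states

    # Extend each start index over the length of the sequence and
    # drop duplicates produced by overlapping matches.
    return np.unique(starts[:, None] + np.arange(n_seq))
```

Note that the intermediate index array has shape `(len(arr) - len(seq) + 1, len(seq))`, so memory use grows with the length of the sequence being searched for.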
Approach No. 2. OpenCV (cv2) has a built-in function for template matching: cv2.matchTemplate. Using it, we get the starting indices of the matches. The remaining steps are the same as in the previous approach. Here is the implementation with cv2:
from cv2 import matchTemplate as cv2m

def search_sequence_cv2(arr, seq):
    """Find sequence in an array using cv2."""
Trial run
In [512]: arr = np.array([2, 0, 0, 0, 0, 1, 0, 1, 0, 0])

In [513]: seq = np.array([0,0])

In [514]: search_sequence_numpy(arr,seq)
Out[514]: array([1, 2, 3, 4, 8, 9])

In [515]: search_sequence_cv2(arr,seq)
Out[515]: array([1, 2, 3, 4, 8, 9])
Runtime test
In [477]: arr = np.random.randint(0,9,(100000))
     ...: seq = np.array([3,6,8,4])
     ...:

In [478]: np.allclose(search_sequence_numpy(arr,seq),search_sequence_cv2(arr,seq))
Out[478]: True

In [479]: %timeit search_sequence_numpy(arr,seq)
100 loops, best of 3: 11.8 ms per loop

In [480]: %timeit search_sequence_cv2(arr,seq)
10 loops, best of 3: 20.6 ms per loop
On this test, the pure NumPy approach seems to be both the safest and the fastest (11.8 ms vs. 20.6 ms per loop)!