Search for sequences in a NumPy array

Let's say I have the following array:

array([2, 0, 0, 1, 0, 1, 0, 0]) 

How to get indices in which I meet a sequence of values: [0,0] ? Thus, the expected result for such a case would be: [1,2,6,7] .

Edit:

1) Note that [0,0] is just a sequence. It can be [0,0,0] or [4,6,8,9] or [5,2,0] , just now.

2) If my array was changed to: array([2, 0, 0, 0, 0, 1, 0, 1, 0, 0]) , the expected result with the same sequence [0,0] would be [1,2,3,4,8,9] .

I am looking for some NumPy shortcuts.

+1
source share
3 answers

Well, this is basically a template-matching problem with a template-matching problem occurs when processing images. This post lists two approaches: based on Pure NumPy and based on OpenCV (cv2).

Approach No. 1. Using NumPy, you can create a 2D array of moving indexes along the entire length of the input array. Thus, each row will be a sliding window of the elements. Then map each line to the input sequence, which will result in broadcasting for the vectorized solution. We look for all True strings pointing to those that are perfect matches and thus will be the starting index of matches. Finally, using these indexes, create a range of indexes that extends to the length of the sequence to obtain the desired result. The implementation will be -

 def search_sequence_numpy(arr,seq): """ Find sequence in an array using NumPy only. Parameters ---------- arr : input 1D array seq : input 1D array Output ------ Output : 1D Array of indices in the input array that satisfy the matching of input sequence in the input array. In case of no match, an empty list is returned. """ # Store sizes of input array and sequence Na, Nseq = arr.size, seq.size # Range of sequence r_seq = np.arange(Nseq) # Create a 2D array of sliding indices across the entire length of input array. # Match up with the input sequence & get the matching starting indices. M = (arr[np.arange(Na-Nseq+1)[:,None] + r_seq] == seq).all(1) # Get the range of those indices as final output if M.any() >0: return np.where(np.convolve(M,np.ones((Nseq),dtype=int))>0)[0] else: return [] # No match found 

Approach No. 2. In OpenCV (cv2) we have a built-in function for template-matching : cv2.matchTemplate . Using this, we would get initial match indices. The remaining steps will be the same as for the previous approach. Here is the implementation with cv2 :

 from cv2 import matchTemplate as cv2m def search_sequence_cv2(arr,seq): """ Find sequence in an array using cv2. """ # Run a template match with input sequence as the template across # the entire length of the input array and get scores. S = cv2m(arr.astype('uint8'),seq.astype('uint8'),cv2.TM_SQDIFF) # Now, with floating point array cases, the matching scores might not be # exactly zeros, but would be very small numbers as compared to others. # So, for that use a very small to be used to threshold the scorees # against and decide for matches. thresh = 1e-5 # Would depend on elements in seq. So, be careful setting this. # Find the matching indices idx = np.where(S.ravel() < thresh)[0] # Get the range of those indices as final output if len(idx)>0: return np.unique((idx[:,None] + np.arange(seq.size)).ravel()) else: return [] # No match found 

Trial run

 In [512]: arr = np.array([2, 0, 0, 0, 0, 1, 0, 1, 0, 0]) In [513]: seq = np.array([0,0]) In [514]: search_sequence_numpy(arr,seq) Out[514]: array([1, 2, 3, 4, 8, 9]) In [515]: search_sequence_cv2(arr,seq) Out[515]: array([1, 2, 3, 4, 8, 9]) 

Runtime test

 In [477]: arr = np.random.randint(0,9,(100000)) ...: seq = np.array([3,6,8,4]) ...: In [478]: np.allclose(search_sequence_numpy(arr,seq),search_sequence_cv2(arr,seq)) Out[478]: True In [479]: %timeit search_sequence_numpy(arr,seq) 100 loops, best of 3: 11.8 ms per loop In [480]: %timeit search_sequence_cv2(arr,seq) 10 loops, best of 3: 20.6 ms per loop 

Pure NumPy seems to be the safest and fastest!

+10
source

I think it should fit your requirements.

 def search(seq, lst): ls, ll = len(seq), len(lst) r = range(ls) res = [] i = 0 while i<ll-ls: if lst[i:i+ls]!=seq: i = i+1 else: res = res + [i+j for j in r] # concatenate list of indices and new indices i = i+ls # advance index after the match return res 
0
source

Here is one way to do this with a gateway.

 def find_seq(a_list, val): seqs = [] for i, item in enumerate(a_list[:-1]): if item == val and a_list[i + 1] == val: #or you could append a tuple: seqs.append((i, i+1)) seqs.append(i) seqs.append(i + 1) return seqs print(find_seq([2, 0, 0, 1, 0, 1, 0, 0], 0)) 

Result:

 [1, 2, 6, 7] 
-1
source

All Articles