Multiple cross-correlations - vectorization

I have a large number of cross-correlations to calculate, and I'm looking for the fastest way to do it. I assume that vectorizing the problem will help rather than doing it with loops.

I have a three-dimensional array, allData, of shape 64x256x913 (electrodes x time points x trials). For each trial, I want to calculate the maximum cross-correlation between the time courses of every pair of electrodes.

In particular: for each trial, I want to take each pair of electrodes and calculate the maximum cross-correlation value for that pair. This results in 4096 (64 * 64) maximum cross-correlation values per row/vector. Doing this for each trial and stacking the rows on top of each other gives the final 2D array of shape 913 x 4096 containing the maximum cross-correlation values.

This is a lot of calculations, so I want to find the fastest way to do it. I mocked up some pseudocode using lists as containers, which might explain the problem a little better. There may be some logical errors, but in any case the code does not run on my machine, because there is so much to calculate that Python just hangs. Here it is:

import numpy as np

# allData is a 64x256x913 array (electrodes x time points x trials)
all_experiment_trials = []
for trial in range(allData.shape[2]):
    all_trial_electrodes = []
    for electrode in range(allData.shape[0]):
        for other_electrode in range(allData.shape[0]):
            if electrode == other_electrode:
                continue
            single_xcorr = max(np.correlate(allData[electrode, :, trial],
                                            allData[other_electrode, :, trial], "full"))
            all_trial_electrodes.append(single_xcorr)
    all_experiment_trials.append(all_trial_electrodes)

Obviously, loops are very slow for this kind of thing. Is there a vectorized solution for this using numpy arrays?

I checked things like correlate2d() and the like, but I don't think they really fit my case, since I'm not multiplying two matrices together.

1 answer

Here is one vectorized approach based on np.einsum -

def vectorized_approach(allData):
    # Get shape
    M, N, R = allData.shape
    # Valid mask based on condition: "if electrode == other_electrode"
    valid_mask = np.mod(np.arange(M * M), M + 1) != 0
    # Elementwise multiplications across all electrode pairs along axis=0,
    # and then summation along the time axis
    out = np.einsum('ijkl,ijkl->lij', allData[:, None, :, :], allData[None, :, :, :])
    # Use valid mask to skip same-electrode columns in the final output
    return out.reshape(R, -1)[:, valid_mask]
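A quick usage sketch on data shaped like the question's (random data standing in for the real recordings). Note that the mask drops the 64 same-electrode pairs, so each row has 64*64 - 64 = 4032 columns rather than the 4096 mentioned in the question:

```python
import numpy as np

def vectorized_approach(allData):
    # Mask that drops the diagonal (electrode paired with itself)
    M, N, R = allData.shape
    valid_mask = np.mod(np.arange(M * M), M + 1) != 0
    # Zero-lag products for every electrode pair, summed over time
    out = np.einsum('ijkl,ijkl->lij', allData[:, None, :, :], allData[None, :, :, :])
    return out.reshape(R, -1)[:, valid_mask]

# Hypothetical data matching the question's dimensions:
# 64 electrodes x 256 time points x 913 trials
allData = np.random.rand(64, 256, 913)

out = vectorized_approach(allData)
print(out.shape)  # -> (913, 4032): one row per trial, one column per ordered pair
```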

Runtime test and verification of results -

In [10]: allData = np.random.rand(20,80,200)

In [11]: def org_approach(allData):
    ...:     all_experiment_trials = []
    ...:     for trial in range(allData.shape[2]):
    ...:         all_trial_electrodes = []
    ...:         for electrode in range(allData.shape[0]):
    ...:             for other_electrode in range(allData.shape[0]):
    ...:                 if electrode == other_electrode:
    ...:                     pass
    ...:                 else:
    ...:                     single_xcorr = max(np.correlate(allData[electrode,:,trial], allData[other_electrode,:,trial]))
    ...:                     all_trial_electrodes.append(single_xcorr)
    ...:         all_experiment_trials.append(all_trial_electrodes)
    ...:     return all_experiment_trials
    ...:

In [12]: %timeit org_approach(allData)
1 loops, best of 3: 1.04 s per loop

In [13]: %timeit vectorized_approach(allData)
100 loops, best of 3: 15.1 ms per loop

In [14]: np.allclose(vectorized_approach(allData), np.asarray(org_approach(allData)))
Out[14]: True
