Best way to shuffle the contents of each column in NumPy

What is the best way to efficiently shuffle the contents of each column of a NumPy array independently?

I have something like:

    >>> arr = np.arange(16).reshape((4, 4))
    >>> arr
    array([[ 0,  1,  2,  3],
           [ 4,  5,  6,  7],
           [ 8,  9, 10, 11],
           [12, 13, 14, 15]])
    >>> # Shuffle each column independently to obtain something like
    array([[ 8,  5, 10,  7],
           [12,  1,  6,  3],
           [ 4,  9, 14, 11],
           [ 0, 13,  2, 15]])
2 answers

If your array is multi-dimensional, np.random.permutation shuffles along the first axis (rows) by default:

    >>> np.random.permutation(arr)
    array([[ 4,  5,  6,  7],
           [ 8,  9, 10, 11],
           [ 0,  1,  2,  3],
           [12, 13, 14, 15]])

However, this shuffles whole rows, so every column ends up with the same (random) ordering.
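To make that concrete, here is a small check, assuming the same 4x4 `arr` as in the question. It uses a seeded `np.random.default_rng` generator (my choice for reproducibility; the answer itself uses the legacy `np.random.permutation`) and confirms that rows are moved as whole units.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only so the run is repeatable
arr = np.arange(16).reshape((4, 4))

# Shuffles along axis 0 and returns a copy; `arr` itself is unchanged.
shuffled = rng.permutation(arr)

# Every row of the result is an intact row of the original,
# so all columns were reordered by the same permutation.
original_rows = {tuple(row) for row in arr}
rows_intact = all(tuple(row) in original_rows for row in shuffled)
print(rows_intact)  # True
```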

The easiest way to shuffle each column independently is to loop over the columns and use np.random.shuffle to shuffle each one in place:

    for i in range(arr.shape[1]):
        np.random.shuffle(arr[:, i])

Which gives, for example:

    array([[12,  1, 14, 11],
           [ 4,  9, 10,  7],
           [ 8,  5,  6, 15],
           [ 0, 13,  2,  3]])

This method can be useful if you have a very large array that you do not want to copy, because each column is permuted in place. On the other hand, even simple Python loops can be very slow, and there are faster NumPy approaches, such as the one in @jme's answer.
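As a rough sketch of the in-place behaviour, the loop can be wrapped in a helper. The name `shuffle_columns_inplace` and the optional `rng` argument are my own, and the sketch assumes the newer `np.random.default_rng` generator rather than the legacy `np.random.shuffle` used above.

```python
import numpy as np

def shuffle_columns_inplace(a, rng=None):
    """Shuffle each column of the 2-D array `a` independently, in place."""
    rng = np.random.default_rng() if rng is None else rng
    for j in range(a.shape[1]):
        # a[:, j] is a view, so shuffling it modifies `a` itself.
        rng.shuffle(a[:, j])
    return a

arr = np.arange(16).reshape((4, 4))
before = arr.copy()
result = shuffle_columns_inplace(arr, np.random.default_rng(0))

print(result is arr)  # True: the same array object was modified
# True: each column still holds exactly the values it started with
print(np.array_equal(np.sort(arr, axis=0), before))
```

Only one column-sized view is touched at a time, which is why this approach avoids allocating a second full-size array.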


Here is another way to do this:

    def permute_columns(x):
        ix_i = np.random.sample(x.shape).argsort(axis=0)
        ix_j = np.tile(np.arange(x.shape[1]), (x.shape[0], 1))
        return x[ix_i, ix_j]

Quick test:

    >>> x = np.arange(16).reshape(4, 4)
    >>> permute_columns(x)
    array([[ 8,  9,  2,  3],
           [ 0,  5, 10, 11],
           [ 4, 13, 14,  7],
           [12,  1,  6, 15]])

The idea is to create a bunch of random numbers, and then argsort them inside each column independently. This leads to a random permutation of the indices of each column.
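The same idea can be written a little more compactly with `np.take_along_axis`, which removes the need to build the column-index array by hand. This is a variant sketch, not the code from the answer: the name `permute_columns_take` and the `rng` parameter are assumptions of mine, and it relies on `np.take_along_axis` (NumPy 1.15 or later).

```python
import numpy as np

def permute_columns_take(x, rng=None):
    """Return a copy of `x` with each column shuffled independently."""
    rng = np.random.default_rng() if rng is None else rng
    # One random key per element; argsort down each column turns the keys
    # into a random permutation of the row indices for that column.
    ix = rng.random(x.shape).argsort(axis=0)
    return np.take_along_axis(x, ix, axis=0), ix

x = np.arange(16).reshape(4, 4)
y, ix = permute_columns_take(x, np.random.default_rng(0))

# Each column of `ix` is a permutation of 0..3 ...
print(np.array_equal(np.sort(ix, axis=0), np.tile(np.arange(4)[:, None], (1, 4))))  # True
# ... so each column of `y` holds the same values as the matching column of `x`.
print(np.array_equal(np.sort(y, axis=0), x))  # True
```

The index array is returned here only so the permutation property can be inspected; in real use you would return just the shuffled array.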

Note that this has suboptimal asymptotic time complexity, since sorting takes O(nm log m) for an array of size m × n, whereas shuffling each column directly takes O(nm). But since Python for loops are pretty slow, you actually get better performance for all but very tall matrices.
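If you want to see where the crossover falls on your own machine, a small timing harness along these lines may help. The helper name `compare` and the two test shapes are arbitrary choices of mine, and the timings will vary with hardware and NumPy version, so no expected numbers are given.

```python
import timeit
import numpy as np

def shuffle_loop(a):
    # Python-level loop, one in-place shuffle per column.
    for j in range(a.shape[1]):
        np.random.shuffle(a[:, j])
    return a

def permute_columns(x):
    # Vectorized argsort approach from the answer above.
    ix_i = np.random.sample(x.shape).argsort(axis=0)
    ix_j = np.tile(np.arange(x.shape[1]), (x.shape[0], 1))
    return x[ix_i, ix_j]

def compare(shape, number=3):
    """Return average seconds per call for (loop, argsort) on an array of `shape`."""
    a = np.arange(shape[0] * shape[1]).reshape(shape)
    t_loop = timeit.timeit(lambda: shuffle_loop(a), number=number) / number
    t_sort = timeit.timeit(lambda: permute_columns(a), number=number) / number
    return t_loop, t_sort

# A wide matrix (many short columns) and a tall one (few long columns).
for shape in [(100, 1000), (100_000, 10)]:
    t_loop, t_sort = compare(shape)
    print(shape, f"loop: {t_loop:.4f}s  argsort: {t_sort:.4f}s")
```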
