NumPy: get a random set of rows from a 2D array

I have a very large 2D array that looks something like this:

a= [[a1, b1, c1], [a2, b2, c2], ..., [an, bn, cn]] 

Using numpy, is there an easy way to get a new 2D array containing, say, 2 random rows from the original array a (without replacement)?

For example:

 b= [[a4, b4, c4], [a99, b99, c99]] 
+118
python numpy
Jan 10 '13 at 16:30
7 answers
    >>> A = np.random.randint(5, size=(10,3))
    >>> A
    array([[1, 3, 0],
           [3, 2, 0],
           [0, 2, 1],
           [1, 1, 4],
           [3, 2, 2],
           [0, 1, 0],
           [1, 3, 1],
           [0, 4, 1],
           [2, 4, 2],
           [3, 3, 1]])
    >>> idx = np.random.randint(10, size=2)
    >>> idx
    array([7, 6])
    >>> A[idx,:]
    array([[0, 4, 1],
           [1, 3, 1]])

Combining this for the general case:

 A[np.random.randint(A.shape[0], size=2), :] 

Without replacement (numpy 1.7.0+):

 A[np.random.choice(A.shape[0], 2, replace=False), :] 

I do not believe there is a good way to generate a random list without replacement before 1.7. Perhaps you can write a small function that ensures the two values are not equal.
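For NumPy versions before 1.7, such a small function could look something like the sketch below. The name `sample_distinct_indices` is hypothetical, and the redraw loop is only one way to guarantee distinct indices; it is sensible mainly when the number of rows requested is small compared with the size of the array.

```python
import numpy as np

def sample_distinct_indices(n, k):
    """Draw k distinct indices from range(n) using only np.random.randint.

    Hypothetical helper: redraws until all k indices differ.
    """
    if k > n:
        raise ValueError("cannot draw more distinct indices than rows")
    idx = np.random.randint(n, size=k)
    # np.unique drops duplicates, so a shorter result means a collision
    while len(np.unique(idx)) < k:
        idx = np.random.randint(n, size=k)
    return idx

A = np.random.randint(5, size=(10, 3))
idx = sample_distinct_indices(A.shape[0], 2)
sample = A[idx, :]  # two different rows of A
```

On NumPy 1.7 or later, `np.random.choice(..., replace=False)` does the same job without the loop.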

+148
Jan 10 '13 at 16:35

This is an old post, but this is what works best for me:

 A[np.random.choice(A.shape[0], num_rows_2_sample, replace=False)] 

Change replace=False to replace=True to get the same thing, but with replacement.
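On newer NumPy releases (1.17 and later, as far as I know), the same idea can be written with the `Generator` API, which also makes it easy to fix a seed for reproducible samples. A minimal sketch; the seed, the made-up array `A`, and the value of `num_rows_2_sample` are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
A = np.arange(30).reshape(10, 3)  # made-up data: all rows are different
num_rows_2_sample = 2

# Sample row indices without replacement, then index the array
idx = rng.choice(A.shape[0], size=num_rows_2_sample, replace=False)
sample = A[idx]

# Generator.choice can also pick rows directly along axis 0
sample_direct = rng.choice(A, size=num_rows_2_sample, replace=False, axis=0)
```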

+37
Jan 07 '15 at 8:37

Another option is to create a random mask if you just want to downsample your data by a certain factor. Let's say I want to downsample to 25% of my original dataset, which is currently stored in the array data_arr:

    # generate a random boolean mask the length of the data
    # use p=0.75 for False and p=0.25 for True
    mask = numpy.random.choice([False, True], len(data_arr), p=[0.75, 0.25])

Now you can call data_arr[mask] and get back ~25% of the rows, randomly selected.
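Because each row is kept independently with probability 0.25, the number of rows returned varies from run to run. A small sketch of that behaviour with a made-up `data_arr`, plus one possible way to get exactly 25% of the rows if that matters:

```python
import numpy as np

data_arr = np.arange(4000).reshape(1000, 4)  # made-up data for illustration

# Random mask: each row is kept independently with probability 0.25
mask = np.random.choice([False, True], len(data_arr), p=[0.75, 0.25])
approx_sample = data_arr[mask]
print(approx_sample.shape[0])  # close to 250, but it changes between runs

# For exactly 25% of the rows, sample indices without replacement instead
n_keep = len(data_arr) // 4
exact_sample = data_arr[np.random.choice(len(data_arr), n_keep, replace=False)]
```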

+23
Aug 03 '15 at 18:58

If you want the same rows, just a random sample of them, then

    import random

    # random.sample expects a sequence, so pass the rows as a list
    new_array = random.sample(list(old_array), x)

Here x must be an int giving the number of rows you want to select at random.
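Note that `random.sample` returns a plain Python list, so the result has to be converted back if a NumPy array is needed. A sketch with a made-up `old_array`; handing over a list of rows sidesteps the sequence type check that `random.sample` applies to its population in Python 3:

```python
import random
import numpy as np

old_array = np.arange(30).reshape(10, 3)  # made-up data for illustration
x = 2  # number of rows to sample

# list(old_array) is a list of 1D row arrays, which random.sample accepts
rows = random.sample(list(old_array), x)

# Stack the sampled rows back into a 2D array
new_array = np.array(rows)
```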

+5
May 16 '17 at 22:55

This answer is similar to the one provided by Hezi Rasheff, but simplified so that newer Python users can follow what is happening (I have noticed that many new data science students pick random samples in the strangest ways because they don't know what they are doing in Python).

You can get some random indexes from your array using:

 indices = np.random.choice(A.shape[0], amount_of_samples, replace=False) 

Then you can index your numpy array with them (fancy indexing) to get the samples at these indices:

 A[indices] 

This will give you the specified number of random samples from your data.
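One practical reason to keep the indices in their own variable is that the same rows can then be pulled from several aligned arrays. A minimal sketch, assuming a hypothetical `labels` array with one entry per row of `A`:

```python
import numpy as np

A = np.arange(30).reshape(10, 3)  # made-up data: row i starts with 3 * i
labels = np.arange(10)            # hypothetical labels, one per row of A
amount_of_samples = 4

indices = np.random.choice(A.shape[0], amount_of_samples, replace=False)

A_sample = A[indices]            # the sampled rows
labels_sample = labels[indices]  # the matching labels, in the same order
```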

+4
Dec 20 '18 at 10:35

I see a permutation has been proposed. In fact, this can be done in one line:

    >>> A = np.random.randint(5, size=(10,3))
    >>> np.random.permutation(A)[:2]
    array([[0, 3, 0],
           [3, 1, 2]])
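`np.random.permutation(A)` returns a shuffled copy of the whole array, which may be wasteful for the very large array in the question. A possible variation, offered only as a sketch, is to permute the row indices and then pick the rows:

```python
import numpy as np

A = np.random.randint(5, size=(10, 3))

# Permute the row indices rather than the rows themselves
idx = np.random.permutation(A.shape[0])[:2]
sample = A[idx]  # two distinct rows; only these rows are copied
```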
+3
Oct 19 '18 at 21:35

If you want to generate several random subsets of rows, for example when doing RANSAC:

    num_pop = 10
    num_samples = 2
    pop_in_sample = 3
    rows_to_sample = np.random.random([num_pop, 5])
    random_numbers = np.random.random([num_samples, num_pop])
    samples = np.argsort(random_numbers, axis=1)[:, :pop_in_sample]
    # will be shape [num_samples, pop_in_sample, 5]
    row_subsets = rows_to_sample[samples, :]
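The argsort trick works because sorting a row of independent random numbers yields a random permutation of the indices, so the first few entries of each row are distinct within a subset. Wrapped as a function (the name `random_row_subsets` and its parameters are hypothetical), it might look like this:

```python
import numpy as np

def random_row_subsets(rows, num_samples, subset_size):
    """Return num_samples random subsets of subset_size distinct rows each.

    Hypothetical helper; the first result has shape
    (num_samples, subset_size, n_cols), the second holds the row indices.
    """
    num_pop = rows.shape[0]
    # Each row of random numbers, once argsorted, is a random permutation
    random_numbers = np.random.random((num_samples, num_pop))
    samples = np.argsort(random_numbers, axis=1)[:, :subset_size]
    return rows[samples, :], samples

rows_to_sample = np.random.random((10, 5))
row_subsets, samples = random_row_subsets(rows_to_sample, num_samples=2, subset_size=3)
print(row_subsets.shape)  # (2, 3, 5)
```

Rows never repeat inside one subset, but the same row can appear in different subsets, which is usually what RANSAC-style resampling wants.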
+1
Oct 23 '18 at 11:24