How to generate random pairs of numbers in Python, including pairs with one entry being the same and excluding pairs with both entries being the same?

Question

I'm using Python with NumPy and want to generate pairs of random numbers. I want to exclude repeated pairs, where both entries match those of a pair already drawn, but still allow pairs that share only one entry with another pair. I tried to use

import numpy
numpy.random.choice(a,(m,n),replace=False)

for this, but it completely excludes any tuples that share an entry, i.e.

import numpy
numpy.random.choice(2, (2, 1), replace=False)  # a=2, m=2, n=1

gives me only (1,0) and (0,1), not (1,1), (0,0), (1,0) and (0,1).

I want to do this because I need to draw a sample of random tuples with large a and large n (as used above) without getting exactly the same tuple more than once. It should also be reasonably efficient. Is there an existing implementation for this?

python numpy random

Dave Jun 17 '15 at 11:36

3 answers

James Mills · Answer 1 · 2015-06-17T11:47:17+0000

Random Unique Coordinate Generator:

from random import randint

def gencoordinates(m, n):
    seen = set()

    x, y = randint(m, n), randint(m, n)

    while True:
        seen.add((x, y))
        yield (x, y)
        x, y = randint(m, n), randint(m, n)
        while (x, y) in seen:
            x, y = randint(m, n), randint(m, n)

Output:

>>> g = gencoordinates(1, 100)
>>> next(g)
(42, 98)
>>> next(g)
(9, 5)
>>> next(g)
(89, 29)
>>> next(g)
(67, 56)
>>> next(g)
(63, 65)
>>> next(g)
(92, 66)
>>> next(g)
(11, 46)
>>> next(g)
(68, 21)
>>> next(g)
(85, 6)
>>> next(g)
(95, 97)
>>> next(g)
(20, 6)
>>> next(g)
(20, 86)

Note that, by chance, an x coordinate was repeated (20 appears in both (20, 6) and (20, 86)). Only whole pairs are kept unique.
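
To draw a fixed number of pairs from a generator like this, itertools.islice can be used. The following is a minimal sketch, not part of the original answer: the generator is restated in a more compact but equivalent form so the block runs on its own, and the sample size k = 1000 is an arbitrary choice. Note that k must not exceed the number of possible pairs, (n - m + 1)**2, because this generator never stops on its own and would loop forever once every pair has been produced.

```python
from itertools import islice
from random import randint


def gencoordinates(m, n):
    """Yield unique (x, y) pairs with m <= x, y <= n; never terminates."""
    seen = set()
    while True:
        x, y = randint(m, n), randint(m, n)
        if (x, y) in seen:
            continue  # this whole pair was already produced, draw again
        seen.add((x, y))
        yield (x, y)


k = 1000  # hypothetical sample size; must be <= (n - m + 1) ** 2
pairs = list(islice(gencoordinates(1, 100), k))
```

Individual x or y values may repeat across pairs, but no (x, y) pair appears twice.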

ali_m · Answer 2 · 2015-10-19T11:58:38+0000

Say your x and y coordinates are integers from 0 to n. For small n, a simple approach is to generate the set of all possible xy-coordinates with np.mgrid, reshape it into an (nx * ny, 2) array, and then sample random rows from it:

import numpy as np

nx, ny = 100, 200
xy = np.mgrid[:nx,:ny].reshape(2, -1).T
sample = xy.take(np.random.choice(xy.shape[0], 100, replace=False), axis=0)

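If nx and/or ny are very large, building the array of all possible coordinates becomes expensive. In that case it may be better to use a generator that keeps track of the coordinates already produced.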

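Following @morningsun's suggestion, another approach is to sample from the nx * ny flat indices and convert them directly to x, y coordinates, which avoids building the whole array of nx * ny possible x, y pairs.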
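
One way to avoid building all nx * ny pairs is to sample flat indices and decode them with integer arithmetic. The block below is a sketch under assumed values (nx, ny and the sample size are arbitrary): a flat index i in range(nx * ny) corresponds to the pair (i // ny, i % ny), which is the same mapping np.unravel_index applies for a row-major shape (nx, ny).

```python
import numpy as np

nx, ny = 100, 200  # assumed coordinate ranges: 0 <= x < nx, 0 <= y < ny
nsamp = 50         # assumed sample size

# Draw distinct flat indices, so every decoded (x, y) pair is distinct too.
idx = np.random.choice(nx * ny, nsamp, replace=False)

# Decode each flat index into a pair using row-major (C) order.
x, y = np.divmod(idx, ny)
sample = np.column_stack((x, y))

# np.unravel_index performs the same decoding.
ux, uy = np.unravel_index(idx, (nx, ny))
same = bool(np.array_equal(x, ux) and np.array_equal(y, uy))
```

Because the flat indices are distinct, the decoded pairs are distinct, while individual x or y values are free to repeat.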

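For comparison, here is the first approach generalized to N-dimensional arrays, followed by a version that samples flat indices instead: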

def sample_comb1(dims, nsamp):
    perm = np.indices(dims).reshape(len(dims), -1).T
    idx = np.random.choice(perm.shape[0], nsamp, replace=False)
    return perm.take(idx, axis=0)

def sample_comb2(dims, nsamp):
    idx = np.random.choice(np.prod(dims), nsamp, replace=False)
    return np.vstack(np.unravel_index(idx, dims)).T

In practice the timings are close, with the second version slightly faster:

In [1]: %timeit sample_comb1((100, 200), 100)
100 loops, best of 3: 2.59 ms per loop

In [2]: %timeit sample_comb2((100, 200), 100)
100 loops, best of 3: 2.4 ms per loop

In [3]: %timeit sample_comb1((1000, 2000), 100)
1 loops, best of 3: 341 ms per loop

In [4]: %timeit sample_comb2((1000, 2000), 100)
1 loops, best of 3: 319 ms per loop

If you have scikit-learn installed, sklearn.utils.random.sample_without_replacement offers a much faster way to draw the random indices without replacement:

from sklearn.utils.random import sample_without_replacement

def sample_comb3(dims, nsamp):
    idx = sample_without_replacement(np.prod(dims), nsamp)
    return np.vstack(np.unravel_index(idx, dims)).T

In [5]: %timeit sample_comb3((1000, 2000), 100)
The slowest run took 4.49 times longer than the fastest. This could mean that an
intermediate result is being cached 
10000 loops, best of 3: 53.2 µs per loop
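
If scikit-learn is not available, the standard library offers a comparable route. The sketch below is an alternative to the code above, not part of the original answer: random.sample accepts a range object, so the nx * ny population is never materialized, and divmod decodes each flat index into a pair. The function name sample_pairs is made up for this illustration, and no timing is claimed for it.

```python
import random


def sample_pairs(nx, ny, nsamp):
    """Return nsamp distinct (x, y) pairs with 0 <= x < nx and 0 <= y < ny."""
    # Sampling from a range object draws distinct flat indices without
    # building a list of all nx * ny candidates.
    flat = random.sample(range(nx * ny), nsamp)
    # divmod(i, ny) decodes flat index i into (i // ny, i % ny).
    return [divmod(i, ny) for i in flat]


pairs = sample_pairs(1000, 2000, 100)
```

The result is a list of tuples rather than a NumPy array; np.array(pairs) converts it if an (nsamp, 2) array is needed.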

borgr · Answer 3 · 2017-07-29T09:29:20+0000

@James Mills's answer is great, but to avoid an endless loop when more pairs are requested than exist, I suggest the following (it also removes some repetition):

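from random import randint

def gencoordinates(m, n):
    seen = set()
    x, y = randint(m, n), randint(m, n)
    # stop once all (n + 1 - m)**2 possible pairs have been yielded
    while len(seen) < (n + 1 - m)**2:
        # redraw until the pair has not been produced before
        while (x, y) in seen:
            x, y = randint(m, n), randint(m, n)
        seen.add((x, y))
        yield (x, y)
    return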

Note that an invalid range of values (for example m > n) is not handled here and will still propagate as an error.
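
A short check of the termination behaviour, as a sketch: the generator is restated with the import it needs so the block is self-contained, and the range 0 to 2 is an arbitrary small example. With three possible values per coordinate there are nine pairs, and the generator stops after yielding all of them. The explicit range check is an addition for illustration, not part of the answer.

```python
from random import randint


def gencoordinates(m, n):
    """Yield every (x, y) pair with m <= x, y <= n exactly once, in random order."""
    if m > n:
        # Added check (not in the answer above): reject an empty range.
        # In a generator this is raised on the first next() call.
        raise ValueError("m must not exceed n")
    seen = set()
    total = (n + 1 - m) ** 2  # number of distinct pairs
    while len(seen) < total:
        x, y = randint(m, n), randint(m, n)
        if (x, y) in seen:
            continue  # already produced, draw again
        seen.add((x, y))
        yield (x, y)


all_pairs = list(gencoordinates(0, 2))  # terminates after 9 pairs
```

The rejection loop needs more and more redraws as the set fills up, so for samples close to the full grid, shuffling a list of all pairs may be the better choice.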
