How to generate random pairs of numbers in Python, including pairs that share one entry and excluding pairs with both entries the same?

I'm using Python with numpy. I want to generate pairs of random numbers. I want to exclude repeated pairs in which both entries match an earlier pair, but keep pairs that share only one entry with another pair. I tried to use

import numpy
numpy.random.choice(a,(m,n),replace=False) 

for this, but it completely excludes any tuples that share an entry, i.e.

import numpy
numpy.random.choice(2, (2, 1), replace=False)  # a=2, m=2, n=1

gives me only (1,0) and (0,1), not (1,1), (0,0), (1,0) and (0,1).

I want to do this because I want to draw a sample of random tuples with large a and large n (as used above), without getting exactly the same tuple more than once. It should also be reasonably efficient. Is there an existing implementation for this?

+4
3 answers

Random Unique Coordinate Generator:

from random import randint

def gencoordinates(m, n):
    seen = set()

    x, y = randint(m, n), randint(m, n)

    while True:
        seen.add((x, y))
        yield (x, y)
        x, y = randint(m, n), randint(m, n)
        while (x, y) in seen:
            x, y = randint(m, n), randint(m, n)

Output:

>>> g = gencoordinates(1, 100)
>>> next(g)
(42, 98)
>>> next(g)
(9, 5)
>>> next(g)
(89, 29)
>>> next(g)
(67, 56)
>>> next(g)
(63, 65)
>>> next(g)
(92, 66)
>>> next(g)
(11, 46)
>>> next(g)
(68, 21)
>>> next(g)
(85, 6)
>>> next(g)
(95, 97)
>>> next(g)
(20, 6)
>>> next(g)
(20, 86)

As you can see, an x coordinate happened to repeat (20 appears in both (20, 6) and (20, 86)), but no complete pair is repeated.
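Because the generator never stops on its own, one way to draw a fixed number of unique pairs is to cut it off with itertools.islice. This is only a sketch: gencoordinates is repeated from above so the snippet runs on its own, and the helper name take_unique is hypothetical.

```python
from itertools import islice
from random import randint

def gencoordinates(m, n):
    """Yield unique (x, y) pairs with m <= x, y <= n (same logic as above)."""
    seen = set()
    x, y = randint(m, n), randint(m, n)
    while True:
        seen.add((x, y))
        yield (x, y)
        x, y = randint(m, n), randint(m, n)
        while (x, y) in seen:
            x, y = randint(m, n), randint(m, n)

def take_unique(m, n, k):
    # Hypothetical helper: collect the first k pairs from the generator.
    # k must not exceed (n - m + 1) ** 2, otherwise the generator
    # keeps searching forever for a pair that does not exist.
    return list(islice(gencoordinates(m, n), k))

pairs = take_unique(1, 100, 12)
```

Each pair in the result is distinct, while individual x or y values may still repeat across pairs.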

+9

Say your x and y coordinates are integers from 0 to n. For small n, a simple approach is to generate the set of all possible x, y coordinates with np.mgrid, reshape it into an (nx * ny, 2) array, then select random rows from it:

import numpy as np

nx, ny = 100, 200
# Every (x, y) combination as one row of an (nx * ny, 2) array
xy = np.mgrid[:nx,:ny].reshape(2, -1).T
sample = xy.take(np.random.choice(xy.shape[0], 100, replace=False), axis=0)
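To see what the reshaped grid looks like, here is a small sketch on a 2 x 3 grid (the sizes are chosen purely for illustration). Each row is one (x, y) combination, so sampling row indices without replacement cannot return the same pair twice.

```python
import numpy as np

nx, ny = 2, 3
# np.mgrid gives an array of shape (2, nx, ny); flattening the grid axes
# and transposing yields one (x, y) pair per row.
xy = np.mgrid[:nx, :ny].reshape(2, -1).T
# xy -> [[0, 0], [0, 1], [0, 2], [1, 0], [1, 1], [1, 2]]

# Draw 4 distinct rows.
rows = np.random.choice(xy.shape[0], 4, replace=False)
sample = xy.take(rows, axis=0)
```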

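Note that this builds the full array of nx * ny coordinates, so it becomes expensive in memory when nx and/or ny are very large; in that case a generator-based approach is a better fit.

As @morningsun suggested, you can instead sample from the nx * ny flat indices and convert them directly to x, y coordinates, which avoids constructing all nx * ny x, y combinations.

For comparison, here are N-dimensional versions of the original approach and of the index-based one: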

def sample_comb1(dims, nsamp):
    # Build every index combination, then pick nsamp distinct rows
    perm = np.indices(dims).reshape(len(dims), -1).T
    idx = np.random.choice(perm.shape[0], nsamp, replace=False)
    return perm.take(idx, axis=0)

def sample_comb2(dims, nsamp):
    # Pick nsamp distinct flat indices, then convert them to coordinates
    idx = np.random.choice(np.prod(dims), nsamp, replace=False)
    return np.vstack(np.unravel_index(idx, dims)).T

In practice the two perform similarly, with the second slightly faster:

In [1]: %timeit sample_comb1((100, 200), 100)
100 loops, best of 3: 2.59 ms per loop

In [2]: %timeit sample_comb2((100, 200), 100)
100 loops, best of 3: 2.4 ms per loop

In [3]: %timeit sample_comb1((1000, 2000), 100)
1 loops, best of 3: 341 ms per loop

In [4]: %timeit sample_comb2((1000, 2000), 100)
1 loops, best of 3: 319 ms per loop
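The index-based version relies on np.unravel_index to turn a flat index into one coordinate per dimension. A small sketch of that mapping, with dims and the flat indices chosen only for illustration, followed by a uniqueness check on sample_comb2 (repeated here so the snippet runs on its own):

```python
import numpy as np

dims = (3, 4)
flat = np.array([0, 5, 11])
# For a (3, 4) grid, flat index i maps to (i // 4, i % 4).
coords = np.vstack(np.unravel_index(flat, dims)).T
# coords -> [[0, 0], [1, 1], [2, 3]]

def sample_comb2(dims, nsamp):
    # Distinct flat indices map to distinct coordinate tuples.
    idx = np.random.choice(np.prod(dims), nsamp, replace=False)
    return np.vstack(np.unravel_index(idx, dims)).T

sample = sample_comb2((100, 200), 100)
n_unique = len({tuple(row) for row in sample.tolist()})  # equals 100
```

Because the mapping from flat index to coordinates is one-to-one, sampling the indices without replacement is enough to guarantee unique tuples.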


If you have scikit-learn installed, sklearn.utils.random.sample_without_replacement is a much faster way to draw the indices:

from sklearn.utils.random import sample_without_replacement

def sample_comb3(dims, nsamp):
    idx = sample_without_replacement(np.prod(dims), nsamp)
    return np.vstack(np.unravel_index(idx, dims)).T

In [5]: %timeit sample_comb3((1000, 2000), 100)
The slowest run took 4.49 times longer than the fastest. This could mean that an
intermediate result is being cached 
10000 loops, best of 3: 53.2 µs per loop
+5

@James Miles' answer is great, but to avoid an endless loop when accidentally requesting more coordinates than exist, I suggest the following (it also removes some repetition):

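from random import randint

def gencoordinates(m, n):
    seen = set()
    x, y = randint(m, n), randint(m, n)
    # Stop once every one of the (n + 1 - m)**2 possible pairs has been yielded
    while len(seen) < (n + 1 - m)**2:
        # Resample until the pair has not been produced before
        while (x, y) in seen:
            x, y = randint(m, n), randint(m, n)
        seen.add((x, y))
        yield (x, y)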

Note that an invalid range of values (such as m > n) is still not checked here and will propagate as an error from randint.
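A quick sketch of both behaviours, using the 0 to 1 range from the question as the example; the generator is repeated so the snippet runs on its own. Exhausting it returns all four pairs and then stops, while an empty range raises ValueError from randint on the first next() call.

```python
from random import randint

def gencoordinates(m, n):
    seen = set()
    x, y = randint(m, n), randint(m, n)
    while len(seen) < (n + 1 - m) ** 2:
        while (x, y) in seen:
            x, y = randint(m, n), randint(m, n)
        seen.add((x, y))
        yield (x, y)

# The generator terminates after all (1 + 1 - 0)**2 = 4 pairs.
all_pairs = list(gencoordinates(0, 1))
# sorted(all_pairs) -> [(0, 0), (0, 1), (1, 0), (1, 1)]

# An empty range is not caught by the generator itself.
try:
    next(gencoordinates(5, 1))
    raised = False
except ValueError:
    raised = True
```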

+1