Lazy selective random sampling in Python

Python question. I create a large array of objects, but I only need a small random sample of it. Generating the objects takes some time, so I wonder whether it is possible to skip the objects that are not needed and explicitly create only those that were selected.

In other words, now I have

    a = createHugeArray()
    s = random.sample(a, int(len(a) * 0.001))

which is pretty wasteful. I would prefer something lazier, like

    a = createArrayGenerator()
    s = random.sample(a, int(len(a) * 0.001))

I do not know whether this works. The documentation for random.sample is not very clear, although it mentions xrange as being very fast, which makes me think this might work. Converting the array creation to a generator would take a bit of work (my knowledge of generators is very rusty), so I would like to know in advance whether it works. :)

The alternative I see is to take a random sample from an xrange and generate only the objects whose indices were selected. This is not very clean, because the generated indices are arbitrary and otherwise unneeded, and I would need some fairly hacky logic to support this in my createHugeArray method.

For bonus points: how does random.sample work? In particular, how does it work if it does not know the size of the population in advance, as is the case for generators such as xrange?

+6
python random lazy-evaluation sampling
4 answers

There does not seem to be a way around figuring out how indices map to your permutations. If you did not know this, how would you create a random object from your array? You can either use the trick with xrange() that you proposed yourself, or implement a class that defines the __getitem__() and __len__() methods and pass an instance of that class as the population argument to random.sample().
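A minimal sketch of the second option, written for Python 3 (where range plays the role of xrange). The names LazyPopulation and make_object are made up for illustration; make_object stands in for whatever expensive construction the question has in mind. Python 3's random.sample checks that the population is a sequence, so the class derives from collections.abc.Sequence instead of only defining the two methods:

```python
import random
from collections.abc import Sequence


class LazyPopulation(Sequence):
    """Sequence that builds element i only when population[i] is requested."""

    def __init__(self, size, factory):
        self._size = size
        self._factory = factory  # hypothetical callable: index -> object
        self.created = 0         # counts how many objects were actually built

    def __len__(self):
        return self._size

    def __getitem__(self, index):
        # Only plain non-negative integer indices are supported in this sketch.
        if not 0 <= index < self._size:
            raise IndexError(index)
        self.created += 1
        return self._factory(index)


def make_object(index):
    # Stand-in for the expensive object creation from the question.
    return ("object", index)


population = LazyPopulation(100_000, make_object)
sample = random.sample(population, len(population) // 1000)
```

With these sizes only the sampled indices reach make_object, so population.created ends up equal to the sample size rather than to the population size.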

Some additional comments:

  • Converting createHugeArray() to a generator will not buy you anything: random.sample() will simply stop working. It needs an object that supports len().

  • So it does need to know the number of elements in the population from the very beginning.

  • The implementation contains two different algorithms and selects the one that will use less memory. For a relatively small k (as in this case), it simply stores the indices already selected in a set and makes a new random choice whenever it hits one of them.
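The small-k strategy described in the last bullet can be sketched roughly as follows. This is a simplified illustration in Python 3, not the actual library code, and sample_indices is a made-up name:

```python
import random


def sample_indices(n, k):
    """Pick k distinct indices from range(n) by retrying on collisions."""
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    selected = set()   # indices that have already been chosen
    result = []
    while len(result) < k:
        j = random.randrange(n)
        if j in selected:
            continue   # hit an index we already have: draw again
        selected.add(j)
        result.append(j)
    return result


indices = sample_indices(1_000_000, 10)
```

Memory use depends on k, not on n, which is why this variant suits a small sample from a huge population.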

Edit: A completely different approach would be to iterate over all permutations once and decide for each permutation whether it should be included. If the total number of permutations is n and you would like to choose k of them, you could write

    selected = []
    for i in xrange(n):
        perm = nextPermutation()
        # keep this one with probability (still needed) / (still to come)
        if random.random() < float(k - len(selected)) / (n - i):
            selected.append(perm)

This would select exactly k permutations at random.
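A self-contained Python 3 version of the same idea, as a sketch. selection_sample is a hypothetical helper, and a generator expression stands in for nextPermutation(). Note that every item is still produced once; only the selected ones are kept:

```python
import random


def selection_sample(items, n, k):
    """Keep exactly k of the first n items of an iterable, in a single pass."""
    selected = []
    for i, item in enumerate(items):
        if i >= n:
            break
        # The chance is (items still needed) / (items still to come).
        # It reaches 1 when every remaining item is needed and 0 once
        # k items have been kept, so the result has exactly k items.
        if random.random() < (k - len(selected)) / (n - i):
            selected.append(item)
    return selected


picked = selection_sample((x * x for x in range(1000)), 1000, 10)
```

The selected items come out in their original order, and the iterable must yield at least n items for the count to be exact.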

+2

You can create a list of array indices with sample and then generate the objects from the result. Start with a function that builds one object from its index:

    def get_object(index):
        return MyClass(index)

or something like that. Then use sample to generate the indices you need and call this function with those indices:

    objs = map(get_object, random.sample(xrange(length), int(0.001 * length)))

This is a bit indirect, since the sampling step only picks from the list of possible array indices.
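For reference, a runnable Python 3 sketch of this answer. MyClass here is only a placeholder for the real object type, and range replaces xrange; a list comprehension is used because map returns a lazy iterator in Python 3:

```python
import random


class MyClass:
    """Placeholder for the expensive object from the question."""

    def __init__(self, index):
        self.index = index


def get_object(index):
    return MyClass(index)


length = 100_000
# Sampling from range() never materialises the 100,000 candidates.
chosen = random.sample(range(length), length // 1000)
objs = [get_object(i) for i in chosen]
```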

0

Explaining how random.sample works,

random.sample(container, k) returns k values chosen at random from the container. Since a generator can be iterated over just like lists, tuples, and the keys or values of dicts, it will go through the container and then take these random elements.

E.g. random.sample(xrange(111), 4) returns something like [33, 52, 110, 1] for k = 4, that is, 4 random numbers from xrange(111), which covers 0 to 110.

0

I assume that the createHugeArray() function contains a piece of code that is repeated once for each object created. I also assume that the objects are generated from some initial value or seed, in which case createHugeArray() looks something like this:

    def createHugeArray(list_of_seeds):
        huge_array = []
        for i in list_of_seeds:
            my_object = makeObject(i)
            huge_array.append(my_object)
        return huge_array

(I used lists rather than arrays, but you get the idea.)

To perform the random sampling before actually creating the objects, just add a line that generates a random number, and create the object only if that number is below a certain threshold. Say you want only one object in a thousand: random.randint(0, 999) gives a number from 0 to 999, so generate an object only if you get zero. Adding that to the code above:

    import random

    def createHugeArray(list_of_seeds):
        huge_array = []
        for i in list_of_seeds:
            # roll a 1000-sided die and build the object only on a zero
            die_roll = random.randint(0, 999)
            if die_roll == 0:
                my_object = makeObject(i)
                huge_array.append(my_object)
        return huge_array

Of course, if my assumption about how your code works is wrong, this is useless to you; in that case, sorry, and good luck :-)
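One property of the per-item die roll is that the sample size is not fixed: each object is kept independently, so the result only averages one in a thousand. A small Python 3 sketch to see this, where bernoulli_sample and make_object are hypothetical names and make_object stands in for makeObject:

```python
import random


def bernoulli_sample(seeds, make_object, one_in=1000):
    """Build an object for each seed with probability 1/one_in."""
    return [make_object(s) for s in seeds if random.randrange(one_in) == 0]


def make_object(seed):
    # Stand-in for the expensive construction.
    return ("object", seed)


# Repeating the run shows the sample size varying around 10.
sizes = [len(bernoulli_sample(range(10_000), make_object)) for _ in range(5)]
```

If an exact sample size matters, sampling the indices up front avoids this variation.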

0
