Python question. I build a large array of objects, but I only need a small random sample of it. Generating each object takes some time, so I wonder whether it is possible to skip the objects that are not needed and explicitly create only those that end up in the sample.
In other words, right now I have

    a = createHugeArray()
    s = random.sample(a, int(len(a) * 0.001))

which is pretty wasteful. I would prefer something lazier, like

    a = createArrayGenerator()
    s = random.sample(a, int(len(a) * 0.001))
I do not know if this works. The documentation for random.sample is not very clear on this, although it mentions that sampling from an xrange is very fast, which makes me think it might work. Converting the array creation to a generator will take a bit of work (my knowledge of generators is very rusty), so I want to know in advance whether this works. :)
The alternative I see is to take a random sample of an xrange and generate only those objects whose index was actually selected. This is not very clean, because the generated indexes are arbitrary and otherwise unneeded, and I would need some fairly hacky logic to support this in my createHugeArray method.
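The index-based alternative described above could be sketched roughly as follows. This is only an illustration under assumptions: `create_object` and `lazy_sample` are hypothetical names, and the sketch assumes each object can be built independently from its index. It uses Python 3's `range`, which plays the role of `xrange` here.

```python
import random


def create_object(i):
    # Hypothetical stand-in for the expensive per-object construction.
    # Assumes object i can be built from its index alone.
    return {"index": i, "value": i * i}


def lazy_sample(population_size, fraction, rng=random):
    # Sample size must be an integer for random.sample.
    k = int(population_size * fraction)
    # range is a lazy sequence with a known length, so sampling from it
    # does not materialize all indices.
    indices = rng.sample(range(population_size), k)
    # Only the selected objects are ever created.
    return [create_object(i) for i in indices]


sample = lazy_sample(1_000_000, 0.001)
```

Only 1000 of the million possible objects are constructed here. The cost is that the creation logic has to be addressable by index, which is the "hacky" part mentioned above.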
For bonus points: how does random.sample work? In particular, how does it work if it does not know the size of the population in advance, as, for example, with generators such as xrange?
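For context on the bonus question: random.sample expects a population with a known length (an xrange has one), so it cannot be handed a plain generator. When the size really is unknown, one commonly used technique is reservoir sampling. The sketch below is not what random.sample does internally, and `reservoir_sample` is a hypothetical helper name. It also still generates every item, so it saves memory rather than creation time.

```python
import random


def reservoir_sample(iterable, k, rng=random):
    # Keep the first k items, then replace entries with decreasing
    # probability so every item ends up in the result with equal chance.
    reservoir = []
    for n, item in enumerate(iterable):
        if n < k:
            reservoir.append(item)
        else:
            # randint is inclusive on both ends: j is uniform over 0..n.
            j = rng.randint(0, n)
            if j < k:
                reservoir[j] = item
    return reservoir


# Works on a generator whose length is never asked for.
picked = reservoir_sample((x * x for x in range(10000)), 10)
```

If the iterable yields fewer than k items, the function simply returns all of them.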
python random lazy-evaluation sampling
Verhoevenv