For the problem I'm currently working on, I would like to make a fairly uniform random choice from the power set of a given set. Unfortunately, this runs straight into statistics, which I never studied at all (something I need to fix now that I'm getting into real programming), so I wanted to run my solution past some people who know this stuff.
If a given set has size n, then there are C(n, k) = n! / [k! (n - k)!] subsets of size k, and the total size N of the power set is the sum of C(n, k) over k from 0 to n. (This also equals 2^n, but I don't think that's useful here. I could obviously be wrong.)
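As a quick sanity check on that counting (a sketch only; n = 5 is an arbitrary example), Python's math.comb gives the binomial coefficients directly:

```python
import math

n = 5  # arbitrary example size

# C(n, k) = n! / (k! * (n - k)!) for each subset size k
counts = [math.comb(n, k) for k in range(n + 1)]

# The power set size N is the sum over all k, which equals 2**n
N = sum(counts)
print(counts)  # [1, 5, 10, 10, 5, 1]
print(N == 2 ** n)  # True
```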
So my plan is to split [0, 1] into intervals:
[0, C(n,0)/N]
(C(n,0)/N, [C(n,0) + C(n,1)]/N]
([C(n,0) + C(n,1)]/N, [C(n,0) + C(n,1) + C(n,2)]/N]
...
([N - C(n,n)]/N, 1]
Algorithmically, interval j is constructed by taking the upper endpoint of the previous interval as its (open) lower endpoint, then adding C(n, j)/N to it to get its upper endpoint. Hope this is clear.
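The construction above amounts to a cumulative sum of C(n, j)/N. A minimal sketch (the function name is mine):

```python
import math

def interval_boundaries(n):
    """Return the upper endpoints of the intervals for sizes j = 0..n.

    Interval j is (boundaries[j-1], boundaries[j]], with an implicit
    lower endpoint of 0 for j = 0; the final endpoint is exactly 1.
    """
    N = 2 ** n
    boundaries = []
    total = 0
    for j in range(n + 1):
        total += math.comb(n, j)  # running sum of C(n, 0) + ... + C(n, j)
        boundaries.append(total / N)
    return boundaries

print(interval_boundaries(3))  # [0.125, 0.5, 0.875, 1.0]
```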
Then I can determine the size of a random subset by choosing a uniform float in [0, 1] and taking the index of the interval it falls into. From there I can choose a random subset of that size.
I am fairly sure (on simple intuitive grounds) that my scheme picks the subset size with probability proportional to the number of subsets of that size (which is clearly not uniform on the set {0, 1, ..., n} of possible sizes).
I am using Python's random.sample from the standard library to get a subset of a given size, so I'm confident that step is uniform.
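Putting the two steps together, a sketch of the whole scheme might look like this (the function name and structure are my own, not from any library):

```python
import bisect
import math
import random

def random_subset(items):
    """Pick a subset size in proportion to the number of subsets of that
    size, then sample uniformly among subsets of that size."""
    n = len(items)
    N = 2 ** n
    # Cumulative interval endpoints: endpoint j is [C(n,0)+...+C(n,j)] / N
    boundaries = []
    total = 0
    for j in range(n + 1):
        total += math.comb(n, j)
        boundaries.append(total / N)
    u = random.random()                     # uniform float in [0, 1)
    k = bisect.bisect_right(boundaries, u)  # index of the interval u falls in
    return random.sample(items, k)

print(random_subset(['a', 'b', 'c', 'd']))
```

For example, with n = 3 the endpoints are [1/8, 4/8, 7/8, 1], so a draw below 1/8 yields the empty subset (1 of the 8 subsets), a draw in (1/8, 4/8] yields a singleton (3 of 8), and so on.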
So my question is: if I put the two together as described, is the resulting choice of a random subset of random size uniform over all subsets? If the answer takes a lot of work, I'm happy to accept pointers on how this can be proved and do the work myself. Also, if there is a better way to do this, I would of course be glad to hear it.