Set S of n numbers - have a subset with the probability that each element of S found in it is equal to

I have this problem

You are given a set S of n numbers. You must select a subset S 'of k numbers from S so that the probability of each element of S entering S' is equal (i.e., Each was selected with probability k / n). You can make only one pass by numbers. What if n is unknown?

and I even have a solution: http://www.algorithm.cs.sunysb.edu/algowiki/index.php/TADM2E_2.43

However: I do not understand the text of the problem at all .

I have been given the set S of n numbers. Good. I need to select a subset (maybe 2 ^ n subsets) of k numbers, so the probability that every element of S occurring in S 'will be equal ... the obvious answer to me would be to simply grab an empty S' set: each an element from S would have the probability of being in S '.

If this is unacceptable (and it should have been pointed out), I suggest that I should count on the most repeating element (appearing T times) in S and make every other element exactly equal to T instances in S '(it should still be a subset if the elements are contained in S).

I do not understand the decision of the priority queue, as well as the probability of k / n. Can someone help me with this?

+5
source share
1 answer

This is a very famous problem with the resulting technique called Reservoir Sampling - a very useful algorithm for processing large amounts of data. The previous link can give you the exact tuning, motivation and explanation of the decision.

+6
source

All Articles