How can I distribute a small amount of data at random throughout a much larger amount of data?
For example, I have several thousand rows of "real" data, and I want to insert a dozen or two rows of control data at random positions throughout the "real" data.
I am not asking how to use random number generators; I know how to generate random numbers. This is a statistical question: how can I ensure that the control data is inserted at random and, at the same time, scattered fairly evenly across the file?
If I simply rely on generating random numbers, there is a possibility (albeit a very small one) that all of my control data, or at least clumps of it, will be inserted into a rather narrow section of the "real" data. What is the best way to prevent this?
To put it another way, I want to insert control data throughout my real data without a third party being able to work out which rows are control and which are real.
Update: I have made this a "community wiki", so if anyone wants to edit my question so that it makes more sense, go right ahead.
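Update: Let me try an example (I do not want this to be language or platform dependent, as this is not a coding question; it is a statistical question).
- 3000 rows of "real" data (the number varies from run to run).
- 20 rows of "control" data.
The 20 "control" rows should end up roughly one per 150 "real" rows (3000/20 = 150), but not at exact intervals, so that their position alone does not give them away.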
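To make the example concrete, here is a minimal sketch in Python (chosen only for illustration; the question itself is language-independent) of one possible reading of the requirement: stratified placement. The combined output is cut into as many equal bands as there are control rows, and each control row gets a uniformly random position inside its own band. The function names `stratified_positions` and `interleave` and the sample row labels are hypothetical, and this is just one candidate scheme, not necessarily the best one.

```python
import random


def stratified_positions(n_real, n_control, rng=None):
    """Return one random output index per band, one band per control row.

    The output has n_real + n_control slots, split into n_control
    contiguous, non-overlapping bands of (almost) equal width.
    """
    rng = rng or random.Random()
    total = n_real + n_control
    positions = []
    for i in range(n_control):
        lo = i * total // n_control          # first index of band i
        hi = (i + 1) * total // n_control    # one past the last index
        positions.append(rng.randrange(lo, hi))
    return positions


def interleave(real_rows, control_rows, rng=None):
    """Merge control rows into real rows at stratified random positions.

    The real rows keep their relative order; the control rows are
    shuffled first so their original order is not preserved.
    """
    rng = rng or random.Random()
    controls = list(control_rows)
    rng.shuffle(controls)
    slots = set(stratified_positions(len(real_rows), len(controls), rng))
    real_iter = iter(real_rows)
    control_iter = iter(controls)
    total = len(real_rows) + len(controls)
    return [
        next(control_iter) if i in slots else next(real_iter)
        for i in range(total)
    ]


# Hypothetical data matching the example: 3000 real rows, 20 control rows.
real = [f"real-{i}" for i in range(3000)]
control = [f"control-{i}" for i in range(20)]
mixed = interleave(real, control)
```

With these numbers each band is 151 slots wide, so two neighbouring control rows can be anywhere from adjacent to about 300 rows apart. The trade-off is that every band holds exactly one control row, so someone who knows the scheme and has identified one control row can rule out the rest of that band; shrinking the number of bands, or placing only some of the control rows this way, would loosen that guarantee at the cost of evenness.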