Random sampling from a dataset while maintaining the original probability distribution

I have a set of about 2,000 numbers collected from a measurement. I want to sample from this data set about 10 times in each test, while preserving the probability distribution both overall and within each test (as far as possible). For example, in each test I want some small values, some typical values, and some large values, with a mean and variance approximately equal to those of the original distribution. Across all tests combined, I also want the overall mean and variance of the samples to be close to those of the original distribution.

Since my data follow a long-tailed distribution, the amount of data in each range of values is not the same:

Fig. 1. Probability density of the ~2,000 data points.

I am using Java. At the moment I draw a uniformly distributed random index into the data array and return the item at that position:

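    import java.util.Random;

    public int getRandomData() {
        int[] data = {1231, 414, 222, 4211, 41, 203, 123, 432 /* ... */};
        int length = data.length;
        Random r = new Random();
        // Pick a uniformly random index and return the item at that position.
        int randomInt = r.nextInt(length);
        return data[randomInt];
    }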

I am not sure this works the way I want, because the data are stored in the order in which they were measured, and consecutive values are strongly correlated.

2 answers

It works the way you want. The order of the data does not matter.
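
As a rough check of this, the sketch below draws from the same values stored in two different orders (as given and sorted) and compares the sample means. The class name `OrderCheck`, the helper `meanOfDraws`, the seed and the draw count are illustrative assumptions, and the array holds only the eight values visible in the question.

```java
import java.util.Arrays;
import java.util.Random;

class OrderCheck {
    // Mean of `draws` values picked with uniformly random indices (with replacement).
    static double meanOfDraws(int[] data, int draws, Random rng) {
        double sum = 0;
        for (int i = 0; i < draws; i++) {
            sum += data[rng.nextInt(data.length)];
        }
        return sum / draws;
    }

    public static void main(String[] args) {
        int[] measured = {1231, 414, 222, 4211, 41, 203, 123, 432};
        int[] sorted = measured.clone();
        Arrays.sort(sorted);

        Random rng = new Random(1); // fixed seed only so that runs are repeatable
        System.out.println("true mean:      " + Arrays.stream(measured).average().getAsDouble());
        System.out.println("measured order: " + meanOfDraws(measured, 100_000, rng));
        System.out.println("sorted order:   " + meanOfDraws(sorted, 100_000, rng));
    }
}
```

Both sample means should land close to the true mean, because every index is equally likely no matter where a value sits in the array.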


Random sampling preserves the probability distribution.
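
To see what this means for small tests, the following sketch draws many tests of 10 values each and compares the pooled mean and variance with those of the full data set. Individual tests of 10 will still fluctuate, especially with a long tail; it is the pooled statistics that settle near the originals. The data set here is synthetic (a long-tailed stand-in generated in code, not the asker's data), and `SamplingCheck`, `drawTest`, `mean` and `variance` are hypothetical names.

```java
import java.util.Random;

class SamplingCheck {
    // One test: n values picked with uniformly random indices (with replacement).
    static double[] drawTest(double[] data, int n, Random rng) {
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = data[rng.nextInt(data.length)];
        }
        return out;
    }

    static double mean(double[] x) {
        double s = 0;
        for (double v : x) s += v;
        return s / x.length;
    }

    // Population variance (divides by n).
    static double variance(double[] x) {
        double m = mean(x), s = 0;
        for (double v : x) s += (v - m) * (v - m);
        return s / x.length;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        // Synthetic long-tailed stand-in for the 2000 measurements (assumption).
        double[] data = new double[2000];
        for (int i = 0; i < data.length; i++) {
            data[i] = Math.exp(rng.nextGaussian());
        }

        int tests = 2000, perTest = 10;
        double[] pooled = new double[tests * perTest];
        for (int t = 0; t < tests; t++) {
            double[] test = drawTest(data, perTest, rng);
            System.arraycopy(test, 0, pooled, t * perTest, perTest);
        }
        System.out.printf("data:   mean=%.3f var=%.3f%n", mean(data), variance(data));
        System.out.printf("pooled: mean=%.3f var=%.3f%n", mean(pooled), variance(pooled));
    }
}
```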

