How to simulate mixture data with a different dependency structure between each pair of variables?

I would like to simulate mixture data, for example 3D data, with two different mixture components between each pair of variables.

That is, I want to simulate V1 and V2 so that the dependency between them is a mixture of two different normal components, and then V2 and V3 so that the dependency between them is a mixture of two more normal components. So I will have 3D data in which the relationship between the first and second variables is a mixture of two normals, and the relationship between the second and third variables is a mixture of two other components.

Another way to explain my question:

Suppose I would like to generate mixture data as follows:

1- 0.3 normal(0.5, 1) + 0.7 normal(2, 4)  # so here I get bivariate data from a mixture of two different normals (two mixture components); the mixture weights sum to 1

Then I would like to generate the next variable as follows:

2- 0.5 normal(2, 4) + 0.5 normal(2, 6)  # normal(2, 4) is the second component from the first simulation

So here I get simulated 3D data, where V1 and V2 are generated by two different mixture components, and V2 and V3 are generated by two other mixture components.

Here is how to generate mixture data in R (I believe this does not generate bivariate data):

    N <- 100000
    # Sample N random uniforms U
    U <- runif(N)
    # Variable to store the samples from the mixture distribution
    rand.samples <- rep(NA, N)
    # Sampling from the mixture
    for (i in 1:N) {
      if (U[i] < .3) {
        rand.samples[i] <- rnorm(1, 1, 3)
      } else {
        rand.samples[i] <- rnorm(1, 2, 5)
      }
    }

So, if we can generate bivariate mixture data (two variables), how can this be extended to 4 or 5 variables, where V1 and V2 are generated from two different normals (the dependency structure between them is a mixture of two normals), and V3 is then generated from another, different normal together with V2? That is, when we look at V2 ~ V3, we find that the dependency structure between them is a mixture of two normals, and so on.

1 answer

I'm not sure I understood the question correctly, but I will try. You have 3 distributions: D1, D2 and D3. From these three distributions, you would like to create variables that each use 2 of the 3, but not the same pair.

Since I don't know how the distributions should be combined, I used flags drawn from a binomial distribution (a vector of length 200 containing 0s and 1s) to decide which distribution each value is taken from (you can change this if it is not what you want).

    D1 <- rnorm(200, 2, 1)
    D2 <- rnorm(200, 3, 1)
    D3 <- rnorm(200, 1.5, 2)

To create a mixed distribution, we can use the rbinom function to create a vector of 1s and 0s according to the chosen probability. This way the result contains some values from both distributions.

    # flag == 1 picks the first distribution, flag == 0 the second
    var_1_flag <- rbinom(200, size = 1, prob = 0.3)
    var_1 <- var_1_flag * D1 + (1 - var_1_flag) * D2

    var_2_flag <- rbinom(200, size = 1, prob = 0.7)
    var_2 <- var_2_flag * D2 + (1 - var_2_flag) * D3

    var_3_flag <- rbinom(200, size = 1, prob = 0.6)
    var_3 <- var_3_flag * D1 + (1 - var_3_flag) * D3

To find out which values come from which distribution, you can do the following:

    var_1[var_1_flag == 1]  # values in the mixed distribution that come from the first distribution (D1)
    var_1[var_1_flag == 0]  # values in the mixed distribution that come from the second distribution (D2)
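As a quick sanity check of the flag approach, here is a minimal sketch that rebuilds var_1 and splits it by flag. The seed and the names `n`, `from_D1` and `from_D2` are my own additions for illustration; they are not part of the answer above.

```r
set.seed(42)  # assumption: seed added only so the sketch is reproducible
n <- 200

D1 <- rnorm(n, 2, 1)
D2 <- rnorm(n, 3, 1)

# flag == 1 means "take the value from D1", flag == 0 means "take it from D2"
var_1_flag <- rbinom(n, size = 1, prob = 0.3)
var_1 <- var_1_flag * D1 + (1 - var_1_flag) * D2

# Compare the flag against 1 or 0: indexing with the raw 0/1 vector would be
# treated as positions, not as a logical mask
from_D1 <- var_1[var_1_flag == 1]
from_D2 <- var_1[var_1_flag == 0]

length(from_D1) + length(from_D2) == n  # every value belongs to exactly one component
mean(var_1_flag)                        # observed share of D1, should be near 0.3
```

With only 200 observations the observed share can differ noticeably from 0.3; it gets closer as the sample size grows.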

Since I find this a bit tedious, and I assume that you might want to change the parameters, you can use the function below to get the same kind of result:

    create_distr <- function(observations, mean1, sd1, mean2, sd2, flag_prob) {
      # 1 = draw from the first normal, 0 = draw from the second normal
      flag <- rbinom(observations, size = 1, prob = flag_prob)
      my_distribution <- flag * rnorm(observations, mean1, sd1) +
        (1 - flag) * rnorm(observations, mean2, sd2)
      my_distribution  # return the mixture visibly
    }

    var_1 <- create_distr(200, 2, 1, 3, 1, 0.5)
    var_2 <- create_distr(200, 3, 1, 1.5, 2, 0.7)
    var_3 <- create_distr(200, 2, 1, 1.5, 2, 0.6)
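One limitation of a function like this is that the flag is discarded, so you can no longer tell which component each value came from. A possible variant, sketched below under the hypothetical name `create_distr_labeled` (my own, not from the answer), returns the component label next to each value.

```r
# Hypothetical helper: same two-component normal mixture as above,
# but the component label is kept alongside each simulated value.
create_distr_labeled <- function(observations, mean1, sd1, mean2, sd2, flag_prob) {
  flag <- rbinom(observations, size = 1, prob = flag_prob)
  value <- ifelse(flag == 1,
                  rnorm(observations, mean1, sd1),
                  rnorm(observations, mean2, sd2))
  data.frame(value = value,
             component = ifelse(flag == 1, 1L, 2L))
}

set.seed(42)  # assumption: seed added only for reproducibility
labeled_1 <- create_distr_labeled(200, 2, 1, 3, 1, 0.5)
head(labeled_1)
table(labeled_1$component)  # how many values came from each component
```

Note that each call draws fresh normals, so two variables created this way do not share any underlying values, unlike var_1 and var_2 built from the common D2 vector.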

If you want to have more than two components (distributions) in the mixture, you can extend the code you provided as follows:

    N <- 100000
    # Sample N random uniforms U
    U <- runif(N)
    # Variable to store the samples from the mixture distribution
    rand.samples <- rep(NA, N)
    for (i in 1:N) {
      if (U[i] < 0.3) {
        rand.samples[i] <- rnorm(1, 1, 3)
      } else if (U[i] < 0.5) {
        rand.samples[i] <- rnorm(1, 2, 5)
      } else if (U[i] < 0.8) {
        rand.samples[i] <- rnorm(1, 5, 2)
      } else {
        rand.samples[i] <- rt(1, 2)
      }
    }

This way, each element is drawn from exactly one of the distributions. If you want the same result without looping over the elements one at a time, you can do the following:

    N <- 100000
    # Sample N random uniforms U
    U <- runif(N)
    # Draw N candidates from every component
    D1 <- rnorm(N, 1, 3)
    D2 <- rnorm(N, 2, 5)
    D3 <- rnorm(N, 5, 2)
    D4 <- rt(N, 2)
    # Keep, for each element, the candidate of the component selected by U
    # (note: the result is grouped by component, not in the original order)
    rand.samples <- c(D1[U < 0.3],
                      D2[U >= 0.3 & U < 0.5],
                      D3[U >= 0.5 & U < 0.8],
                      D4[U >= 0.8])

This corresponds to 0.3 * normal(1, 3) + 0.2 * normal(2, 5) + 0.3 * normal(5, 2) + 0.2 * Student's t (2 degrees of freedom).
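An alternative way to sample from the same four-component mixture, sketched here as my own variation rather than as part of the answer, is to draw the component labels directly with `sample()` and then fill in each position from its component. This keeps the elements in their original order and only draws as many values as are needed. The names `weights`, `comp`, `means`, `sds` and `is_normal` are assumptions introduced for the sketch.

```r
set.seed(1)  # assumption: seed added only for reproducibility
N <- 100000

# Mixture weights for normal(1, 3), normal(2, 5), normal(5, 2) and t with 2 df
weights <- c(0.3, 0.2, 0.3, 0.2)
comp <- sample(1:4, size = N, replace = TRUE, prob = weights)

means <- c(1, 2, 5)
sds   <- c(3, 5, 2)

rand.samples <- numeric(N)
is_normal <- comp <= 3

# rnorm() accepts vectors of means and sds, one pair per draw
rand.samples[is_normal] <- rnorm(sum(is_normal),
                                 mean = means[comp[is_normal]],
                                 sd   = sds[comp[is_normal]])
rand.samples[!is_normal] <- rt(sum(!is_normal), df = 2)

# Observed component shares, which should be close to the weights
round(prop.table(table(comp)), 2)
```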

If you want to create two mixtures, and have the second one keep the same values from the normal distribution, you can do the following:

    mixture_1 <- c(D1[U < 0.3], D2[U >= 0.3])
    mixture_2 <- c(D1[U < 0.3], D3[U >= 0.3])

This will use the same elements from normal(1, 3) in both mixtures. The trick is not to recompute rnorm(N, 1, 3) every time you use it. In both cases, approximately 30% of the sample comes from the first normal (D1) and approximately 70% from the second distribution. For example:

    set.seed(1)
    N <- 100000
    U <- runif(N)
    prop.table(table(U < 0.3))
    #  FALSE   TRUE
    # 0.6985 0.3015

About 30% of the values in the vector U are below 0.3.
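To check that two mixtures built this way really share their normal(1, 3) draws, here is a minimal sketch. It uses `ifelse()` instead of `c()` so that both mixtures keep the original element order and line up row by row; that substitution, the seed and the name `shared` are my own assumptions.

```r
set.seed(1)  # assumption: seed added only for reproducibility
N <- 100000
U <- runif(N)

# Draw each component once and reuse the vectors
D1 <- rnorm(N, 1, 3)
D2 <- rnorm(N, 2, 5)
D3 <- rnorm(N, 5, 2)

# Element i comes from D1 when U[i] < 0.3 in BOTH mixtures
mixture_1 <- ifelse(U < 0.3, D1, D2)
mixture_2 <- ifelse(U < 0.3, D1, D3)

shared <- U < 0.3
all(mixture_1[shared] == mixture_2[shared])   # TRUE: same D1 values in both
cor(mixture_1[!shared], mixture_2[!shared])   # near 0: D2 and D3 are independent

# Side by side, the two mixtures form a two-column data set
sim <- data.frame(V1 = mixture_1, V2 = mixture_2)
head(sim)
```

In this sketch the two columns are identical on the shared component and independent elsewhere, so it illustrates the reuse trick rather than a general way to impose a chosen dependency between variables.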
