How to randomize (or permute) a data frame row-wise and column-wise?

I have a dataframe (df1) like this.

    f1 f2 f3 f4 f5
 d1  1  0  1  1  1
 d2  1  0  0  1  0
 d3  0  0  0  1  1
 d4  0  1  0  0  1

d1 ... d4 are the row names, and f1 ... f5 are the column names.

If I do sample(df1), I get a new data frame with the same count of 1s as df1. So the count of 1s is conserved for the whole data frame, but not for each row or each column.

Is it possible to perform randomization by row or column?

I want to randomize df1 column-wise, i.e. the number of 1s in each column remains the same, and each column needs to be changed at least once. For example, I may get a randomized df2 like this (note that the count of 1s in each column remains the same, but the count of 1s in each row is different):

    f1 f2 f3 f4 f5
 d1  1  0  0  0  1
 d2  0  1  0  1  1
 d3  1  0  0  1  1
 d4  0  0  1  1  0

Likewise, I also want to randomize df1 row-wise, i.e. the number of 1s in each row remains the same, and each row needs to be changed (but the number of changed entries may differ between rows). For example, a randomized df3 could look like this:

    f1 f2 f3 f4 f5
 d1  0  1  1  1  1  <- two entries are different
 d2  0  0  1  0  1  <- four entries are different
 d3  1  0  0  0  1  <- two entries are different
 d4  0  0  1  0  1  <- two entries are different

PS. Many thanks to Gavin Simpson, Joris Meys and Chase for their answers to my previous question on randomizing two columns.

+80
random r permutation
Jun 21 '11 at 8:17
8 answers

Given an R data.frame:

 > df1
   a b c
 1 1 1 0
 2 1 0 0
 3 0 1 0
 4 0 0 0

Shuffle row-wise:

 > df2 <- df1[sample(nrow(df1)),]
 > df2
   a b c
 3 0 1 0
 4 0 0 0
 2 1 0 0
 1 1 1 0

By default, sample() randomly reorders the elements passed as its first argument (given a single number n, as here, it permutes 1:n), so the default size is the length of that argument. Passing replace=FALSE (the default) to sample(...) ensures that sampling is done without replacement, which accomplishes a row-wise shuffle.
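As a small illustration of those defaults, here is a sketch in base R. The seed and the toy vector x are arbitrary choices for the example, not part of the answer.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# A single number n is treated as 1:n, so this is a random permutation of 1:4
idx <- sample(4)

# With the default size and replace = FALSE, every element appears exactly once
x <- c("a", "b", "c", "d")
shuffled <- sample(x)

# replace = TRUE samples with replacement, so values may repeat or go missing
with_repl <- sample(x, replace = TRUE)
```

Because each index is used exactly once, indexing with sample(nrow(df1)) reorders whole rows and leaves the values inside each row untouched.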

Shuffle column-wise:

 > df3 <- df1[,sample(ncol(df1))]
 > df3
   c a b
 1 0 1 1
 2 0 1 0
 3 0 0 1
 4 0 0 0
+215
Jul 16 '12 at 11:35

This is another way to shuffle the data.frame, using the dplyr package:

row-wise:

 df2 <- slice(df1, sample(1:n())) 

or

 df2 <- sample_frac(df1, 1L) 

column-wise:

 df2 <- select(df1, one_of(sample(names(df1)))) 
+14
Mar 01 '18 at 19:34

Take a look at permatswap() in the vegan package. Here is an example that maintains both row and column totals, but you can relax that and fix only one of the row or column sums.

 library(vegan)  # provides permatswap()
 mat <- matrix(c(1,1,0,0,0,0,0,1,1,0,0,0,1,1,1,0,1,0,1,1), ncol = 5)
 set.seed(4)
 out <- permatswap(mat, times = 99, burnin = 20000, thin = 500, mtype = "prab")

This gives:

 R> out$perm[[1]]
      [,1] [,2] [,3] [,4] [,5]
 [1,]    1    0    1    1    1
 [2,]    0    1    0    1    0
 [3,]    0    0    0    1    1
 [4,]    1    0    0    0    1
 R> out$perm[[2]]
      [,1] [,2] [,3] [,4] [,5]
 [1,]    1    1    0    1    1
 [2,]    0    0    0    1    1
 [3,]    1    0    0    1    0
 [4,]    0    0    1    0    1

To explain the call:

 out <- permatswap(mat, times = 99, burnin = 20000, thin = 500, mtype = "prab") 
  • times - the number of randomized matrices you want, here 99
  • burnin - the number of swaps made before we start taking random samples. This lets the starting matrix become thoroughly randomized before we take each of our randomized matrices
  • thin - only take a random draw every thin swaps
  • mtype = "prab" - treat the matrix as presence/absence, i.e. binary 0/1 data.
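To make burnin and thin more concrete, here is a minimal base R sketch of a swap-based sampler. It is not vegan's implementation: the function names swap_once and draw_swapped are made up for illustration, and counting every attempted swap as one step is a simplifying assumption.

```r
# Try one 2x2 "checkerboard" swap; row and column sums are unchanged either way
swap_once <- function(m) {
  rows <- sample(nrow(m), 2)
  cols <- sample(ncol(m), 2)
  sub <- m[rows, cols]
  # Swappable only if the 2x2 block looks like (1 0 / 0 1) or (0 1 / 1 0)
  if (sub[1, 1] == sub[2, 2] && sub[1, 2] == sub[2, 1] && sub[1, 1] != sub[1, 2]) {
    m[rows, cols] <- 1 - sub
  }
  m
}

# Hypothetical driver mirroring the times / burnin / thin arguments
draw_swapped <- function(m, times, burnin, thin) {
  for (i in seq_len(burnin)) m <- swap_once(m)   # discard the first `burnin` steps
  out <- vector("list", times)
  for (k in seq_len(times)) {
    for (i in seq_len(thin)) m <- swap_once(m)   # `thin` steps between kept draws
    out[[k]] <- m
  }
  out
}

mat <- matrix(c(1,1,0,0,0,0,0,1,1,0,0,0,1,1,1,0,1,0,1,1), ncol = 5)
set.seed(4)
perms <- draw_swapped(mat, times = 5, burnin = 200, thin = 50)
```

Each kept matrix has the same row and column sums as mat, because a checkerboard swap moves a 1 within a row and within a column at the same time.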

A couple of notes: this does not guarantee that every column or row has been randomized, but if burnin is long enough there should be a good chance of that having happened. Also, you could draw more random matrices than you need and discard the ones that don't meet all your requirements.

Your requirement to have different numbers of changes per row is also not covered here. Again, you could sample more matrices than you want and then discard the ones that don't meet this requirement too.
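The "draw more than you need, then discard" idea can be sketched in base R. As an assumption for illustration, this uses independent per-column shuffles (which fix column sums only) in place of permatswap, and a hypothetical helper all_cols_changed that encodes the "every column must change" requirement.

```r
mat <- matrix(c(1,1,0,0,0,0,0,1,1,0,0,0,1,1,1,0,1,0,1,1), ncol = 5)

# Hypothetical check: TRUE if every column differs from the original somewhere
all_cols_changed <- function(candidate, original) {
  all(colSums(candidate != original) > 0)
}

set.seed(4)
# Draw many candidates; each column is shuffled on its own, so column sums stay fixed
candidates <- replicate(200, apply(mat, 2, sample), simplify = FALSE)

# Keep only the candidates that meet the requirement
accepted <- Filter(function(m) all_cols_changed(m, mat), candidates)
```

The same filtering step would work on a list of matrices from any other generator, with a different predicate for a different requirement.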

+10
Jun 21 '11

You can also use the randomizeMatrix function in the R package picante.

Example:

 > library(picante)  # provides randomizeMatrix()
 > test <- matrix(c(1,1,0,1,0,1,0,0,1,0,0,1,0,1,0,0),nrow=4,ncol=4)
 > test
      [,1] [,2] [,3] [,4]
 [1,]    1    0    1    0
 [2,]    1    1    0    1
 [3,]    0    0    0    0
 [4,]    1    0    1    0
 > randomizeMatrix(test,null.model = "frequency",iterations = 1000)
      [,1] [,2] [,3] [,4]
 [1,]    0    1    0    1
 [2,]    1    0    0    0
 [3,]    1    0    1    0
 [4,]    1    0    1    0
 > randomizeMatrix(test,null.model = "richness",iterations = 1000)
      [,1] [,2] [,3] [,4]
 [1,]    1    0    0    1
 [2,]    1    1    0    1
 [3,]    0    0    0    0
 [4,]    1    0    1    0

The option null.model = "frequency" maintains column sums, and "richness" maintains row sums. Though mainly used for randomizing species presence/absence datasets in community ecology, it works well here.

This function has other null model options as well; see page 36 of the picante documentation for details.

+6
Sep 11 '12 at 21:32

Of course, you can sample each row:

 sapply (1:4, function (row) df1[row,]<<-sample(df1[row,])) 

This will shuffle the rows themselves, so the number of 1s in each row doesn't change. With small changes it also works for columns, but that is an exercise for the reader :-P
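For comparison, a minimal sketch of the same row-wise idea without the global assignment. The helper name shuffle_within_rows is hypothetical, and the redraw loop is one possible reading of the question's "each row must be changed" requirement, not part of this answer.

```r
# The question's df1, rebuilt for a self-contained example
df1 <- data.frame(
  f1 = c(1, 1, 0, 0), f2 = c(0, 0, 0, 1), f3 = c(1, 0, 0, 0),
  f4 = c(1, 1, 1, 0), f5 = c(1, 0, 1, 1),
  row.names = c("d1", "d2", "d3", "d4")
)

# Hypothetical helper: permute the entries within each row of a data frame
shuffle_within_rows <- function(df) {
  for (i in seq_len(nrow(df))) {
    old <- unlist(df[i, ], use.names = FALSE)
    new <- sample(old)
    # Redraw until the row differs, unless all entries are equal (cannot change)
    while (length(unique(old)) > 1 && identical(new, old)) new <- sample(old)
    df[i, ] <- new
  }
  df
}

set.seed(1)
df3 <- shuffle_within_rows(df1)
```

Row sums are preserved because each row is only permuted. Note that sample() treats a single number n as 1:n, so this sketch assumes the data frame has more than one column.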

+4
Jun 21 '11

You can also "sample" the same number of rows as your data frame has, with something like this:

 nr <- dim(M)[1]
 random_M <- M[sample.int(nr), ]
+1
Nov 25 '18 at 4:17

Random samples and permutations in a data frame: if the data is in matrix form, convert it to a data.frame, then use the sample function from the base package:

 indexes <- sample(1:nrow(df1), size = 1 * nrow(df1))

0
Feb 18 '18 at 4:27

If the goal is to randomly shuffle each column, some of the answers above do not work, because the columns are shuffled jointly (this preserves the correlations between columns). Others require installing a package. Yet there is a one-liner:

 df2 = lapply(df1, function(x) { sample(x) }) 
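Note that lapply() returns a plain list rather than a data frame. Here is a small sketch of one way to keep the data.frame class and row names. The toy df1 below is an assumption that mirrors the a/b/c example from an earlier answer.

```r
df1 <- data.frame(a = c(1, 1, 0, 0), b = c(1, 0, 1, 0), c = c(0, 0, 0, 0))

set.seed(1)  # arbitrary seed, only for reproducibility
df2 <- df1
# Assigning into df2[] replaces the columns but keeps the data.frame structure
df2[] <- lapply(df1, sample)
```

Each column is permuted independently, so column sums are unchanged while row sums generally are not.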
0
Sep 06 '19 at 16:44