How to randomize (or permute) a data frame row-wise and column-wise?

Question

I have a dataframe (df1) like this.

    f1 f2 f3 f4 f5
 d1  1  0  1  1  1
 d2  1  0  0  1  0
 d3  0  0  0  1  1
 d4  0  1  0  0  1

d1 ... d4 are the row names and f1 ... f5 are the column names.

If I call sample(df1), I get a new data frame with the same number of 1s as df1. So the count of 1s is conserved for the whole data frame, but not for each row or each column.

Is it possible to perform randomization by row or column?

I want to randomize df1 column-wise, i.e. the number of 1s in each column remains unchanged, and each column needs to be changed at least once. For example, I may have a randomized df2 like this (note that the number of 1s in each column stays the same, but the number of 1s in each row is different):

    f1 f2 f3 f4 f5
 d1  1  0  0  0  1
 d2  0  1  0  1  1
 d3  1  0  0  1  1
 d4  0  0  1  1  0

Similarly, I also want to randomize df1 row-wise, i.e. the number of 1s in each row remains unchanged, and each row must be changed (but the number of changed entries can differ between rows). For example, a randomized df3 might look like this:

    f1 f2 f3 f4 f5
 d1  0  1  1  1  1   <- two entries are different
 d2  0  0  1  0  1   <- four entries are different
 d3  1  0  0  0  1   <- two entries are different
 d4  0  0  1  0  1   <- two entries are different

PS. Many thanks to Gavin Simpson, Joris Meys and Chase for their answers to my previous question on randomizing two columns.

random r permutation

a83 Jun 21 '11 at 8:17

8 answers

pms · Answer 1 · 2012-07-16 11:35

Given this R data.frame:

 > df1
   a b c
 1 1 1 0
 2 1 0 0
 3 0 1 0
 4 0 0 0

Shuffle row-wise:

 > df2 <- df1[sample(nrow(df1)),]
 > df2
   a b c
 3 0 1 0
 4 0 0 0
 2 1 0 0
 1 1 1 0

By default, sample() randomly reorders the elements passed as its first argument, so the default size is the length of that argument. Given a single number n, as here, it returns a random permutation of 1:n. Because replace=FALSE is the default, the sampling is done without replacement, which gives a row-wise shuffle.

Shuffle columns:

 > df3 <- df1[,sample(ncol(df1))]
 > df3
   c a b
 1 0 1 1
 2 0 1 0
 3 0 0 1
 4 0 0 0
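As a quick check of what these two shuffles preserve, here is a minimal sketch that rebuilds the df1 from this answer. The seed and the names cs_ok and rs_ok are arbitrary choices for illustration. Note that whole rows (or whole columns) move together, so the entries inside a row or column are never rearranged; this is a reordering rather than the per-row or per-column randomization the question describes.

```r
# df1 as printed in this answer
df1 <- data.frame(a = c(1, 1, 0, 0),
                  b = c(1, 0, 1, 0),
                  c = c(0, 0, 0, 0))

set.seed(1)  # arbitrary seed, only so the run is reproducible

df2 <- df1[sample(nrow(df1)), ]  # rows reordered, each row kept intact
df3 <- df1[, sample(ncol(df1))]  # columns reordered, each column kept intact

# A row shuffle cannot change column sums,
# and a column shuffle cannot change row sums.
cs_ok <- all(colSums(df2) == colSums(df1))
rs_ok <- all(rowSums(df3) == rowSums(df1))
print(c(cs_ok, rs_ok))
```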

Enrique Pérez Herrero · Answer 2 · 2018-03-01 19:34

This is another way to shuffle a data.frame, using the dplyr package:

row-wise:

 df2 <- slice(df1, sample(1:n()))

or

 df2 <- sample_frac(df1, 1L)

column-wise:

 df2 <- select(df1, one_of(sample(names(df1))))

Gavin Simpson · Answer 3 · 2011-06-21 08:39

Take a look at permatswap() in the vegan package. Here is an example that maintains both row and column totals, but you can relax that and fix only one of the row or column sums.

 library(vegan)
 mat <- matrix(c(1,1,0,0,0,0,0,1,1,0,0,0,1,1,1,0,1,0,1,1), ncol = 5)
 set.seed(4)
 out <- permatswap(mat, times = 99, burnin = 20000, thin = 500, mtype = "prab")

This gives:

 R> out$perm[[1]]
      [,1] [,2] [,3] [,4] [,5]
 [1,]    1    0    1    1    1
 [2,]    0    1    0    1    0
 [3,]    0    0    0    1    1
 [4,]    1    0    0    0    1
 R> out$perm[[2]]
      [,1] [,2] [,3] [,4] [,5]
 [1,]    1    1    0    1    1
 [2,]    0    0    0    1    1
 [3,]    1    0    0    1    0
 [4,]    0    0    1    0    1

To explain the call:

 out <- permatswap(mat, times = 99, burnin = 20000, thin = 500, mtype = "prab")

times is the number of randomized matrices you want, here 99.
burnin is the number of swaps made before we start taking random samples. This lets the matrix we sample from become thoroughly randomized before we take each of our randomized matrices.
thin says to take a random draw only every thin swaps.
mtype = "prab" says to treat the matrix as presence/absence, i.e. binary 0/1 data.

A few notes: this does not guarantee that any particular column or row has been randomized, but if burnin is long enough there should be a good chance that it has. In addition, you can draw more random matrices than you need and discard those that do not meet all your requirements.

Your requirement for a different number of changes per row is also not covered here. Again, you can draw more matrices than you need and then discard those that do not meet this requirement.
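The "draw more than you need, then discard" idea can be sketched in base R. To stay self-contained, this sketch does not call permatswap(); it swaps in a simpler generator that shuffles each column independently, so only the column sums are held fixed (row sums are not). The helper name every_column_changed, the seed, and the number of candidates are assumptions made for illustration.

```r
# Same matrix as in the permatswap() example above
mat <- matrix(c(1,1,0,0,0,0,0,1,1,0,0,0,1,1,1,0,1,0,1,1), ncol = 5)

set.seed(42)  # arbitrary seed, only so the run is reproducible

# Candidate matrices: every column shuffled on its own,
# which keeps each column sum but not the row sums.
candidates <- replicate(200, apply(mat, 2, sample), simplify = FALSE)

# Hypothetical helper: TRUE when every column differs from the original
every_column_changed <- function(m) all(colSums(m != mat) > 0)

# Keep only the candidates that satisfy the requirement
accepted <- Filter(every_column_changed, candidates)
length(accepted)
```

The same Filter() step should work on a list of matrices from any generator, as long as the predicate encodes the requirement you care about.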

Anne Heloise Theo · Answer 4 · 2012-09-11 21:32

You can also use the randomizeMatrix function from the R package picante.

Example:

 > test <- matrix(c(1,1,0,1,0,1,0,0,1,0,0,1,0,1,0,0),nrow=4,ncol=4)
 > test
      [,1] [,2] [,3] [,4]
 [1,]    1    0    1    0
 [2,]    1    1    0    1
 [3,]    0    0    0    0
 [4,]    1    0    1    0
 > randomizeMatrix(test,null.model = "frequency",iterations = 1000)
      [,1] [,2] [,3] [,4]
 [1,]    0    1    0    1
 [2,]    1    0    0    0
 [3,]    1    0    1    0
 [4,]    1    0    1    0
 > randomizeMatrix(test,null.model = "richness",iterations = 1000)
      [,1] [,2] [,3] [,4]
 [1,]    1    0    0    1
 [2,]    1    1    0    1
 [3,]    0    0    0    0
 [4,]    1    0    1    0

The option null.model = "frequency" maintains the column sums, and "richness" maintains the row sums. Although the function is mainly used to randomize species presence/absence datasets in community ecology, it works well here.

This function has other null model options as well; see the picante documentation (page 36) for details.

binfalse · Answer 5 · 2011-06-21 08:37

Of course you can sample each row:

 sapply (1:4, function (row) df1[row,]<<-sample(df1[row,]))

will shuffle the rows themselves, so the number of 1s in each row doesn't change. With small changes it also works for columns, but that is an exercise for the reader :-P
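For comparison, here is a sketch of the same row-wise shuffle without the global assignment, using the df1 from the question. It also retries each row until at least one entry differs, since the question asks for every row to change. The helper name shuffle_row and the seed are assumptions for illustration; a row whose entries are all equal cannot change and is returned as-is.

```r
# df1 as given in the question
df1 <- data.frame(f1 = c(1, 1, 0, 0),
                  f2 = c(0, 0, 0, 1),
                  f3 = c(1, 0, 0, 0),
                  f4 = c(1, 1, 1, 0),
                  f5 = c(1, 0, 1, 1),
                  row.names = c("d1", "d2", "d3", "d4"))

# Hypothetical helper: reshuffle one row until it differs from the original
shuffle_row <- function(r) {
  r <- unname(r)
  if (length(unique(r)) < 2) return(r)  # constant row: nothing can change
  repeat {
    s <- sample(r)
    if (any(s != r)) return(s)
  }
}

set.seed(7)  # arbitrary seed, only so the run is reproducible

# apply() returns each shuffled row as a column, so transpose back
shuffled <- t(apply(as.matrix(df1), 1, shuffle_row))
colnames(shuffled) <- names(df1)  # restore the column names
df3 <- as.data.frame(shuffled)

rowSums(df3)  # same as rowSums(df1)
```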

Marcos Pedrosa · Answer 6 · 2018-11-25 04:17

You can also “sample” the same number of elements in your data frame like this:

 nr<-dim(M)[1]
 random_M = M[sample.int(nr),]

thrinadhn · Answer 7 · 2018-02-18 04:27

Random samples and permutations in a data frame: if the data is in matrix form, convert it to a data.frame, then use the sample function from the base package:

 indexes = sample(1:nrow(df1), size = 1 * nrow(df1))

rimorob · Answer 8 · 2019-09-06 16:44

If the goal is to randomly shuffle each column, some of the answers above do not work, because the columns are shuffled together (which preserves the correlations between columns). Others require installing a package. However, there is a one-liner:

 df2 = lapply(df1, function(x) { sample(x) })
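One detail worth noting: lapply() returns a plain list, not a data.frame. A small variation, sketched here with the df1 from the question, assigns the shuffled columns back through `[]` so the result keeps the data.frame class and the row names. The seed is an arbitrary choice, and nothing here guarantees that every column actually ends up changed.

```r
# df1 as given in the question
df1 <- data.frame(f1 = c(1, 1, 0, 0),
                  f2 = c(0, 0, 0, 1),
                  f3 = c(1, 0, 0, 0),
                  f4 = c(1, 1, 1, 0),
                  f5 = c(1, 0, 1, 1),
                  row.names = c("d1", "d2", "d3", "d4"))

set.seed(3)  # arbitrary seed, only so the run is reproducible

df2 <- df1
df2[] <- lapply(df1, sample)  # each column shuffled independently

is.data.frame(df2)            # still a data.frame, unlike the bare lapply() result
colSums(df2)                  # same as colSums(df1)
```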
