R: create a list of data frames for all factor-level combinations and all merged-level combinations

I work with cancer stage data. Suppose this is the dataset, a data frame:

    cancertype       stage
    TCGA-67-6215-01  1
    TCGA-67-6216-01  1
    TCGA-67-6217-01  2
    TCGA-69-7760-01  2
    TCGA-69-7761-01  1
    TCGA-69-7763-01  1
    TCGA-69-7764-01  1
    TCGA-69-7765-01  4
    TCGA-69-7980-01  1
    TCGA-71-6725-01  1
    TCGA-73-4658-01  1
    TCGA-73-4659-01  3
    TCGA-73-4662-01  1
    TCGA-73-4675-01  3

What I want is a list in which each item is a data frame. There are 4 levels, for the 4 possible cancer stages. There must be a data frame for each combination of 2 levels, of 3 levels, and so on up to the number of levels. But there must also be a data frame for each combination of merged levels. I mean:

    list(
      dataframe of stage 1 and 2
      dataframe of stage 1 and 3
      dataframe of stage 1 and 4
      dataframe of stage 2 and 3
      ...etc
      dataframe of stage 1,2 and 3
      dataframe of stage 2,3 and 4
      ...
      dataframe of stage 1,2 and 3,4
      dataframe of stage 1,3 and 2,4
      dataframe of stage 1,2,3 and 4
      dataframe of stage 1,2,4 and 3
      ...etc. I think this should give you the idea.
    )

Here, when I say stage 1,2,4, I mean that those stages have all been merged into one level.

Basically, I am looking for the best t-test comparison, so I am setting up the samples I will need for it. It would be nice to cover every possible combo and every merged combo.

Where I am so far: I can build all the unmerged comparisons, of which there are 11. That is, 6 combos of two stages, 4 combos of three stages, and 1 combination of four stages.

    stage  # dataframe of stage data as factors
    stage_split <- split(stage, stage[, 1])
    allcombos <- c(combn(stage_split, 2, simplify = FALSE),
                   combn(stage_split, 3, simplify = FALSE),
                   combn(stage_split, 4, simplify = FALSE))
    allcombos_cmbnd <- lapply(allcombos, function(x) Reduce(rbind, x))
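For reference, the count of 11 can be checked with a small self-contained sketch. It assumes the stage column from the table above typed in as a plain numeric vector; the names `lv`, `subsets` and `rows` are only illustrative and are not part of the code above.

```r
# Stage column from the example data, as a plain vector (assumption:
# the real data may hold it as a factor inside a data frame).
stage <- c(1, 1, 2, 2, 1, 1, 1, 4, 1, 1, 1, 3, 1, 3)
lv <- sort(unique(stage))  # the four stage labels

# All subsets of 2, 3 and 4 stages, as vectors of stage labels
subsets <- unlist(lapply(2:4, function(k) combn(lv, k, simplify = FALSE)),
                  recursive = FALSE)
length(subsets)  # 6 + 4 + 1 = 11

# Row indices belonging to each subset, instead of full data frame copies
rows <- lapply(subsets, function(s) which(stage %in% s))
```

Keeping row indices rather than `rbind`-ed copies is just one option; either form carries the same information.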

How can I generate the additional data frames for all possible merges and then add them to this list? Maybe there is an elegant way to do this starting from the first data frame. One way is to iterate over this list of 11 and generate the merges for every combo of 3 or more stages. I could do that, but I am hoping for something more elegant that also scales to more levels. Nothing I have found so far explains how to create all the level combinations in your data together with all the merged combinations of your levels.

Thanks for any help

1 answer

When you merge stages, you are partitioning sets of size 3 or 4. The partitions package implements set partitions in setparts. Here I focus on the merging part, because it sounds like you've already sorted out the unmerged groupings.
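To make "set partition" concrete without installing anything, here is a minimal base-R sketch that enumerates them as block-label vectors, one partition per column. This is not the partitions package: `set_partitions` is a hypothetical helper written for illustration, and its labels and column order need not match what setparts returns.

```r
# Enumerate all set partitions of n items as "restricted growth" vectors:
# item 1 is in block 1, and every later item joins an existing block or
# opens the next new one.
set_partitions <- function(n) {
  out <- list()
  grow <- function(prefix) {
    if (length(prefix) == n) {
      out[[length(out) + 1]] <<- prefix  # store a finished partition
      return(invisible(NULL))
    }
    for (b in seq_len(max(prefix) + 1)) grow(c(prefix, b))
  }
  grow(1L)
  do.call(cbind, out)  # one partition per column
}

p3 <- set_partitions(3)
ncol(p3)                 # 5 partitions of a 3-element set
ncol(set_partitions(4))  # 15 partitions of a 4-element set
```

Equal labels in a column mean those items are merged, which is the same reading used for the list names further down.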

    ## For unmerged, get groupings with something like this
    combos <- unlist(lapply(2:4, function(x)
        combn(unique(dat$stage), x, simplify=FALSE)), recursive=FALSE)

    ## For merged groupings, use set partitioning
    library(partitions)
    dats <- unlist(sapply(3:4, function(p) {
        parts <- setparts(p)  # set partitions of size p
        lst <- lapply(split(parts, col(parts)), function(idx) {
            if (p==3) {  # with sets of 3, need to exclude one of the stages
                subLst <- lapply(1:4, function(exclude) {
                    tmp <- dat$stage
                    tmp[dat$stage==exclude] <- NA
                    ids <- seq(4)[-exclude]
                    for (i in 1:3) tmp[dat$stage==ids[i]] <- idx[i]
                    data.frame(dat$cancertype, stage=tmp)
                })
                names(subLst) <- paste(1:4)
                subLst
            } else {  # sets of 4, no need to exclude
                tmp <- dat$stage
                for (i in 1:length(idx)) tmp[dat$stage==i] <- idx[i]
                data.frame(dat$cancertype, stage=tmp)
            }
        })
        names(lst) <- lapply(split(parts, col(parts)), paste, collapse=".")
        lst
    }), recursive=FALSE)

dats is now a list of data.frames in which stage has been regrouped according to each partition. For the partitions of size 3, one of the stages has to be dropped. So those entries in dats are themselves lists of length four, where each element corresponds to removing one stage from consideration (the lists are ordered, so the first component removes stage 1, the second removes stage 2, etc.). Let's look at one of the size-3 partitions:

    dats[4]
    # $`2.1.1`
    # $`2.1.1`$`1`
    #     dat.cancertype stage
    # 1  TCGA-67-6215-01    NA
    # 2  TCGA-67-6216-01    NA
    # 3  TCGA-67-6217-01     2
    # 4  TCGA-69-7760-01     2
    # 5  TCGA-69-7761-01    NA
    # 6  TCGA-69-7763-01    NA
    # 7  TCGA-69-7764-01    NA
    # 8  TCGA-69-7765-01     1
    # 9  TCGA-69-7980-01    NA
    # 10 TCGA-71-6725-01    NA
    # 11 TCGA-73-4658-01    NA
    # 12 TCGA-73-4659-01     1
    # 13 TCGA-73-4662-01    NA
    # 14 TCGA-73-4675-01     1
    #
    # $`2.1.1`$`2`
    #     dat.cancertype stage
    # 1  TCGA-67-6215-01     2
    # 2  TCGA-67-6216-01     2
    # 3  TCGA-67-6217-01    NA
    # 4  TCGA-69-7760-01    NA
    # 5  TCGA-69-7761-01     2
    # 6  TCGA-69-7763-01     2
    # 7  TCGA-69-7764-01     2
    # 8  TCGA-69-7765-01     1
    # 9  TCGA-69-7980-01     2
    # 10 TCGA-71-6725-01     2
    # 11 TCGA-73-4658-01     2
    # 12 TCGA-73-4659-01     1
    # 13 TCGA-73-4662-01     2
    # 14 TCGA-73-4675-01     1

The naming convention here is group1.group2.group3$excludedGroup, and identical numbers mean that the groups have been merged. Thus 2.1.1$1 means that stage 1 was excluded (the $1; its rows are actually just set to NA) and that, among the three remaining stages, the second and third (stages 3 and 4) were merged. This is a bit confusing and probably needs a better naming scheme. You can access this data using dats[['2.1.1']][['1']]. There are two more data frames in this sublist that are not shown (the ones excluding stages 3 and 4).
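One possible direction for a more readable naming scheme is sketched below. `describe_grouping` is a hypothetical helper, not part of the code above; it assumes you pass the partition labels together with the stages that were kept, in the same order.

```r
# Hypothetical helper: turn partition labels into a readable name.
# idx    : block label for each kept stage (equal labels = merged)
# stages : the kept stages, in the same order as idx
describe_grouping <- function(idx, stages) {
  blocks <- split(stages, idx)  # stages sharing a label end up together
  paste(vapply(blocks, paste, character(1), collapse = "+"),
        collapse = " vs ")
}

describe_grouping(c(2, 1, 1), stages = c(2, 3, 4))  # "3+4 vs 2"
describe_grouping(c(2, 3, 1, 1), stages = 1:4)      # "3+4 vs 1 vs 2"
```

Blocks are listed in order of their label, so the merged block labelled 1 comes first; sorting blocks by their smallest stage instead would be an equally reasonable choice.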

The size-4 partitions are simpler, since nothing has to be excluded. For instance,

    dats[19]
    # $`2.3.1.1`
    #     dat.cancertype stage
    # 1  TCGA-67-6215-01     2
    # 2  TCGA-67-6216-01     2
    # 3  TCGA-67-6217-01     3
    # 4  TCGA-69-7760-01     3
    # 5  TCGA-69-7761-01     2
    # 6  TCGA-69-7763-01     2
    # 7  TCGA-69-7764-01     2
    # 8  TCGA-69-7765-01     1
    # 9  TCGA-69-7980-01     2
    # 10 TCGA-71-6725-01     2
    # 11 TCGA-73-4658-01     2
    # 12 TCGA-73-4659-01     1
    # 13 TCGA-73-4662-01     2
    # 14 TCGA-73-4675-01     1

The naming here is "group1.group2.group3.group4". In this example, stages 3 and 4 were merged (both == 1).

There is redundancy here: you can use either the size-3 partitions with exclusions or the size-4 partitions, and run several comparisons on each data.frame. For example, from the data shown above, equivalent tests can be performed using dats[['2.3.1.1']] alone, or using dats[['2.1.1']][['1']] and dats[['2.1.1']][['2']] together.

To simplify things, instead of storing all these data.frames in a list, you could store just the indexes, or do the calculations directly in a loop.
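A minimal sketch of the "store just the indexes" idea, assuming the stage column from the question as a plain vector. The three groupings and their names are hand-picked examples, not the full enumeration, and the t-test itself is left out because the question does not show a measurement column.

```r
# Stage column from the example data (assumption: plain numeric vector)
stage <- c(1, 1, 2, 2, 1, 1, 1, 4, 1, 1, 1, 3, 1, 3)

# Each grouping is only a relabelling of the four stages:
# position = original stage, value = new group, NA = stage dropped.
groupings <- list(
  "1 vs 2"     = c(1, 2, NA, NA),
  "2 vs 3+4"   = c(NA, 1, 2, 2),
  "1+2 vs 3+4" = c(1, 1, 2, 2)
)

# Apply each relabelling on the fly; here we only count group sizes,
# but a t.test() on a measurement column would go in the same place.
sizes <- lapply(groupings, function(g) table(g[stage]))
sizes[["1+2 vs 3+4"]]
```

Each grouping costs four numbers instead of a full copy of the data, and `g[stage]` rebuilds the regrouped column whenever it is needed.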
