When you merge stages together, you are looking at set partitions of sizes 3 or 4. The partitions package implements set partitions in its setparts function. Here I focus on the merging part, because it sounds like you've already worked out the ungrouped comparisons.
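As a rough illustration of what setparts returns, here is a minimal sketch. It assumes the partitions package is installed; the variable names (p3, p4, labels3) are mine and only for illustration.

```r
# Sketch only: requires install.packages("partitions") beforehand.
library(partitions)

p3 <- setparts(3)  # all set partitions of a 3-element set, one per column
p4 <- setparts(4)  # all set partitions of a 4-element set, one per column

dim(p3)
dim(p4)

# Collapse each column into a dot-separated label; equal numbers within a
# label mark elements that fall into the same block of the partition.
labels3 <- apply(unclass(p3), 2, paste, collapse = ".")
labels3
```

Each column assigns a block number to every element, which is where names of the form group1.group2.group3 come from.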
dats is now a list of data.frames with stage grouped according to the set partitions. When dealing with partitions of size 3, one of the stages has to be removed. Thus, those entries in dats are lists of length four, where each element corresponds to the removal of one stage from consideration (the lists are ordered, so the first component removes stage 1, the second removes stage 2, etc.). Let's look at one of the size 3 partitions:
dats[4]
# $`2.1.1`
# $`2.1.1`$`1`
#     dat.cancertype stage
# 1  TCGA-67-6215-01    NA
# 2  TCGA-67-6216-01    NA
# 3  TCGA-67-6217-01     2
# 4  TCGA-69-7760-01     2
# 5  TCGA-69-7761-01    NA
# 6  TCGA-69-7763-01    NA
# 7  TCGA-69-7764-01    NA
# 8  TCGA-69-7765-01     1
# 9  TCGA-69-7980-01    NA
# 10 TCGA-71-6725-01    NA
# 11 TCGA-73-4658-01    NA
# 12 TCGA-73-4659-01     1
# 13 TCGA-73-4662-01    NA
# 14 TCGA-73-4675-01     1
#
# $`2.1.1`$`2`
#     dat.cancertype stage
# 1  TCGA-67-6215-01     2
# 2  TCGA-67-6216-01     2
# 3  TCGA-67-6217-01    NA
# 4  TCGA-69-7760-01    NA
# 5  TCGA-69-7761-01     2
# 6  TCGA-69-7763-01     2
# 7  TCGA-69-7764-01     2
# 8  TCGA-69-7765-01     1
# 9  TCGA-69-7980-01     2
# 10 TCGA-71-6725-01     2
# 11 TCGA-73-4658-01     2
# 12 TCGA-73-4659-01     1
# 13 TCGA-73-4662-01     2
# 14 TCGA-73-4675-01     1
The naming convention here is group1.group2.group3$excludedGroup, and identical numbers mean that the groups have been merged. Thus, 2.1.1$1 means that the first stage was excluded ($1; it is actually just converted to NA), and among the remaining stages the second and third groups were merged. This is a bit confusing and probably calls for a better naming scheme. Concretely, 2.1.1$1 means that stage 1 is excluded (set to NA) and stages 3 and 4 are merged. I could access this data frame with dats[['2.1.1']][['1']]. The list contains two more data frames that are not shown (the ones that exclude stages 3 and 4).
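To make the exclude-then-merge convention concrete, here is a minimal base R sketch. It is not the code that produced dats; relabel_stages and the toy data frame are hypothetical, and it assumes stages are coded 1 to 4.

```r
# Hypothetical helper: set one stage to NA and relabel the remaining
# stages with the group numbers of a size-3 partition such as c(2, 1, 1).
relabel_stages <- function(stage, excluded, groups, all_stages = 1:4) {
  kept <- setdiff(all_stages, excluded)       # stages still in play
  stopifnot(length(kept) == length(groups))   # one group label per kept stage
  # match() gives NA for the excluded stage, so indexing by it yields NA.
  groups[match(stage, kept)]
}

# Toy data, invented for illustration only.
toy <- data.frame(id = paste0("s", 1:6), stage = c(1, 2, 3, 4, 1, 3))

# Equivalent in spirit to the entry named 2.1.1$1:
# stage 1 excluded, stage 2 kept on its own, stages 3 and 4 merged.
res <- relabel_stages(toy$stage, excluded = 1, groups = c(2, 1, 1))
toy$stage_grouped <- res
toy
```

Rows that were stage 1 become NA, stage 2 keeps its own label, and stages 3 and 4 share a label, which mirrors the pattern in the output shown above.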
The partitions of size 4 are simpler, since nothing is excluded. For instance,
dats[19]
The naming here is group1.group2.group3.group4. In this example, stages 3 and 4 were merged (both == 1).
There is some redundancy here: you can either use the partitions of size 3 with their exclusion sets, or use the partitions of size 4 and make several comparisons on each data.frame. For example, from the data shown above, equivalent tests could be performed using dats[['2.3.1.1']] alone, or using dats[['2.1.1']][['1']] and dats[['2.1.1']][['2']] in combination.
To simplify things, instead of storing all of these data.frames in a list, you could store just the indices, or do the computations in a loop.
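One way the loop variant might look is sketched below in base R. The data, the three hard-coded partitions, and the per-group mean are placeholders of my own; swap in whatever statistic or test you actually need.

```r
# Toy data, invented for illustration: a stage (1-4) and a measurement per sample.
stage <- c(1, 2, 3, 4, 1, 3, 2, 4)
value <- c(5.1, 6.3, 7.0, 8.2, 4.9, 7.4, 6.0, 8.8)

# A few size-4 partitions as label vectors: position i is the group of stage i.
parts <- list(
  "1.2.3.3" = c(1, 2, 3, 3),
  "2.3.1.1" = c(2, 3, 1, 1),
  "1.1.2.2" = c(1, 1, 2, 2)
)

# Relabel on the fly inside the loop rather than storing one data.frame
# per partition; only the summary is kept.
group_means <- lapply(parts, function(g) {
  merged <- g[stage]            # map each sample's stage to its merged group
  tapply(value, merged, mean)   # placeholder computation per merged group
})

group_means[["1.1.2.2"]]
```

Only the partition vectors and the results are kept in memory, which avoids duplicating the full data once per partition.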