Randomly assign values to a data frame / matrix of group sizes so that several criteria are satisfied

This follows on from a previous question I asked, but adds an additional level of complexity, hence a new question.

In the example below there are two sets of groups (39 and 380). I need to assign 889 people to 39 groups of 2 to 7 people and 380 groups of 2 to 6 people.

However, there is a limit on the total number of people who may belong to certain sets of groups. In the example below, the maximum value allowed for each row is in column X6.

Using the example below: if in row 2 there were 6 groups assigned in column X2 and 120 groups assigned in column X4, then the total number of people would be 18 (6 * 3) + 240 (120 * 2) = 258, which would be fine, as it is below 324.

So, for each row, I need a value of X1 * X2 + X3 * X4 (to create column X5) that is less than or equal to X6, where the sum of X2 is 39, the sum of X4 is 380 and the sum of X5 is 889. Ideally, any solution would be as random as possible (so that repeating it gives a different solution, if possible) and would still work when the values differ from 889, 39 and 380.

Thanks!

    DF <- data.frame(matrix(0, nrow = 7, ncol = 6))
    DF[,1] <- c(2:7,"Sum")
    DF[7,2] <- 39
    DF[2:6,3] <- 2:6
    DF[7,4] <- 380
    DF[7,5] <- 889
    DF[1:6,6] <- c(359, 324, 134, 31, 5, 2)
    DF[1,3:4] <- NA
    DF[7,3] <- NA
    DF[7,6] <- NA
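As a quick check of the row-2 arithmetic described above, the sketch below computes X5 for a single row and compares it with the X6 limit. The helper name `row_total` is made up for illustration and is not part of the original code.

```r
# Hypothetical helper: people in one row = X1 * X2 + X3 * X4
row_total <- function(x1, x2, x3, x4) {
  x1 * x2 + x3 * x4
}

# Row 2 of the example: 6 groups of 3 people and 120 groups of 2 people
x5_row2 <- row_total(x1 = 3, x2 = 6, x3 = 2, x4 = 120)
x6_row2 <- 324  # limit for row 2, taken from column X6

x5_row2             # 258
x5_row2 <= x6_row2  # TRUE
```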

EDIT

The wording of my problem may be unclear. Here is an example of the code that I am currently using and how it does not meet the criteria set above.

    homeType=rep(c("a", "b"), times=c(39, 380))
    H <- vector(mode="list", length(homeType))
    for(i in seq(H)){
      H[[i]]$type <- homeType[i]
      H[[i]]$n <- 0
    }

    # Place people in houses up to max number of people
    npeople <- 889
    for(i in seq(npeople)){
      placed_in_house <- FALSE
      while(!placed_in_house){
        house_num <- sample(length(H), 1)
        if(H[[house_num]]$type == "a"){
          if(H[[house_num]]$n < 7){
            H[[house_num]]$n <- H[[house_num]]$n + 1
            placed_in_house <- TRUE
          }
        }
        if(H[[house_num]]$type == "b"){
          if(H[[house_num]]$n < 6){
            H[[house_num]]$n <- H[[house_num]]$n + 1
            placed_in_house <- TRUE
          }
        }
      }
    }

    # move people around to get up to min number of people
    for(i in seq(H)){
      while(H[[i]]$n < 2){
        knock_on_door <- sample(length(H), 1)
        if( H[[knock_on_door]]$n > 2){
          H[[i]]$n <- H[[i]]$n + 1 # house i takes 1 person
          H[[knock_on_door]]$n <- H[[knock_on_door]]$n - 1 # house knock_on_door loses 1 person
        }
      }
    }

    Ha <- H[which(lapply(H, function(x){x$type}) == "a")]
    Hb <- H[which(lapply(H, function(x){x$type}) == "b")]
    Ha_T <- data.frame(t(table(data.frame(matrix(unlist(Ha), nrow=length(Ha), byrow=T)))))
    Hb_T <- data.frame(t(table(data.frame(matrix(unlist(Hb), nrow=length(Hb), byrow=T)))))

    DF_1 <- data.frame(matrix(0, nrow = 7, ncol = 6))
    DF_1[,1] <- c(2:7,"Sum")
    DF_1[7,2] <- 39
    DF_1[2:6,3] <- 2:6
    DF_1[7,4] <- 380
    DF_1[7,5] <- 889
    DF_1[1:6,6] <- c(359, 324, 134, 31, 5, 2)
    for(i in 1:nrow(Ha_T)){DF_1[as.numeric(as.character(Ha_T[i,1]))-1,2] <- Ha_T[i,3]}
    for(i in 1:nrow(Hb_T)){DF_1[as.numeric(as.character(Hb_T[i,1])),4] <- Hb_T[i,3]}
    DF_1$X5[1:6] <- (as.numeric(as.character(DF_1$X1[1:6]))*DF_1$X2[1:6])+(as.numeric(as.character(DF_1$X3[1:6]))*DF_1$X4[1:6])
    DF_1$X7 <- DF_1$X2+DF_1$X4
    DF_1[1,3:4] <- NA
    DF_1[7,3] <- NA
    DF_1[7,6] <- NA

Using this example, the problem is row 2 in DF_1. The value in column X7 (X2 + X4) is greater than the allowed number given in column X6. What I need is a solution in which the values in X7 are less than or equal to the values in X6, while the sums of columns X2, X4 and X5 (X1 * X2 + X3 * X4) are 39, 380 and 889 respectively (although these numbers change depending on the data used).
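The criteria in the previous paragraph can be written down as a small checker. This is only a sketch: the function name `check_allocation` and the toy table are made up for illustration, and the table is assumed to hold one row per size class with numeric columns X1 to X4 and X6, without the "Sum" row and with NA cells replaced by 0.

```r
# Hypothetical checker for the revised criteria
check_allocation <- function(tab, n_a, n_b, n_people) {
  x7 <- tab$X2 + tab$X4                    # groups per row
  x5 <- tab$X1 * tab$X2 + tab$X3 * tab$X4  # people per row
  c(rows_ok   = all(x7 <= tab$X6),         # X7 <= X6 in every row
    sum_a_ok  = sum(tab$X2) == n_a,        # total number of "a" groups
    sum_b_ok  = sum(tab$X4) == n_b,        # total number of "b" groups
    people_ok = sum(x5) == n_people)       # total number of people
}

# Toy data with made-up numbers, only to show the call
toy <- data.frame(X1 = 2:3, X2 = c(1, 1), X3 = c(0, 2), X4 = c(0, 2), X6 = c(1, 3))
check_allocation(toy, n_a = 2, n_b = 2, n_people = 9)
```

Any candidate solution for which all four entries are TRUE meets the revised description.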

2 answers

The problem as initially described in the question cannot be solved, since no set of values can satisfy all of these constraints at once:

"So, for each row, I get the value X1 * X2 + X3 * X4 (to make column X5) that is less than or equal to X6, with the sum of X2 being 39, the sum of X4 is 380, and the total amount of X5 is 889."
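One possible way to see this (the reason is not spelled out above, so treat it as an interpretation): if X5 must not exceed X6 in any row, then the total of X5 cannot exceed the total of X6. The limits in X6 add up to 855, which is less than the required total of 889. The variable names below are illustrative.

```r
x6 <- c(359, 324, 134, 31, 5, 2)  # per-row limits from column X6
required_total <- 889             # required sum of column X5

sum(x6)                    # 855
sum(x6) >= required_total  # FALSE
```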

However, after the problem was restated in the comments, the revised version can be solved as follows.

Update: comment-based solution to the problem

As explained in the comments:

"I actually don't fill out the number of houses in full. I just assign the number of children in the house. 'a' is from 2 to 7, and 'b' is from 2 to 6, since 'a' households will also include 1 adult and 'b' households 2. For this area, I know how many households have from 2 to 8 people (419), and how many have 2, 3, 4, 5, 6, 7 or 8 people (359, 324, 134, 31, 5, 2). I also know the total number of households with 1 (39) or 2 (380) adults and how many children there are (889 in my example)."

Based on this updated information, we can loop as follows: 1) count how many more houses of each type can still be allocated under the criteria, 2) randomly select one of the house types that can still be allocated without violating any of the rules, and 3) repeat until all 889 children are in houses. Note that I use more descriptive column names here to make the logic easier to follow:

    library(data.table)

    DT <- data.table(HS1 = 2:7,  # type 1 house size
                     NH1 = 0,    # number of type 1 houses with children
                     HS2 = 1:6,  # type 2 house size
                     NH2 = 0,    # number of type 2 houses with children
                     C = 0,      # number of children in houses
                     MaxNH = c(359, 324, 134, 31, 5, 2)) # maximum number of type1+type 2 houses

    NR = DT[,.N]
    set.seed(1234)

    repeat {
      # reset the counts so that every retry starts from an empty allocation
      DT[, c("NH1", "NH2", "C") := list(0, 0, 0)]
      while (DT[, sum(C) < 889]) {
        DT[, MaxH1 := (MaxNH - NH1 - NH2)]
        DT[, MaxH2 := (MaxNH - NH1 - NH2)]
        DT[1, MaxH2 := 0]
        DT[MaxH1 > 39 - sum(NH1), MaxH1 := 39 - sum(NH1)]
        DT[MaxH2 > 380 - sum(NH2), MaxH2 := 380 - sum(NH2)]
        if (DT[, sum(NH1)] >= 39) DT[, MaxH1 := 0]
        if (DT[, sum(NH2)] >= 380) DT[, MaxH2 := 0]
        if (DT[, all(MaxH1==0) & all(MaxH2==0)]) {
          # check if it is not possible to assign anyone else to a group
          print("No solution found. Check constraints or try again")
          break
        }
        # If you wish to preferentially fill a particular type of house,
        # then change the probability weights in the next line accordingly
        newgroup = sample(2*NR, 1, prob = DT[, c(MaxH1, MaxH2)])
        if (newgroup > NR) DT[rep(1:NR, 2)[newgroup], NH2 := NH2+1]
        else DT[rep(1:NR, 2)[newgroup], NH1 := NH1+1]
        DT[, C := HS1*NH1 + HS2*NH2]
      }
      if (DT[, sum(C)==889]) break
    }

    DT[,1:6, with=F]
    #    HS1 NH1 HS2 NH2   C MaxNH
    # 1:   2   7   1   0  14   359
    # 2:   3   7   2 218 457   324
    # 3:   4  14   3  76 284   134
    # 4:   5   9   4  14 101    31
    # 5:   6   2   5   3  27     5
    # 6:   7   0   6   1   6     2

    colSums(DT[, .(NH1, NH2, C)])
    # NH1 NH2   C
    #  39 312 889
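As a sanity check, the printed result above can be tested against the stated rules in base R. This is only a sketch: the values are copied by hand from the output shown, and the data frame name `res` is arbitrary.

```r
# Values copied from the printed result above
res <- data.frame(
  HS1 = 2:7, NH1 = c(7, 7, 14, 9, 2, 0),
  HS2 = 1:6, NH2 = c(0, 218, 76, 14, 3, 1),
  MaxNH = c(359, 324, 134, 31, 5, 2)
)
res$C <- res$HS1 * res$NH1 + res$HS2 * res$NH2  # children per row

all(res$NH1 + res$NH2 <= res$MaxNH)  # TRUE: no row exceeds its house limit
sum(res$NH1)                         # 39
sum(res$NH2) <= 380                  # TRUE: 312 type 2 houses have children
sum(res$C)                           # 889
```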

This code adds a check of whether the generated data meet the criteria. After each iteration it pauses so that the user can decide whether to keep trying. For me, the selection process never produced fewer than 348 b-houses of 2 people each, so the result always violated the second condition (no more than 324 houses). Should house types a and b be offset in df?

    homeType <- rep(c("a", "b"), times=c(39, 380)) # as defined in the question
    df <- data.frame(a=2:7, afreq=0, b=c(0,2:6), bfreq=0, housed=0, houses=500, correct=c(359, 324, 134, 31, 5, 2))
    H <- data.frame(type=homeType, n=0) # using df instead of lists, easier for me
    npeople <- 889

    while(any(df$houses > df$correct)){
      H <- data.frame(type=homeType, n=0)
      # This code is yours, changed to df
      for(i in 1:npeople){
        placed_in_house <- FALSE
        while(!placed_in_house){
          house_num <- sample(nrow(H), 1)
          if(H$type[house_num] == "a"){
            if(H$n[house_num] < 7){
              H$n[house_num] <- H$n[house_num] + 1
              placed_in_house <- TRUE
            }
          }
          if(H$type[house_num] == "b"){
            if(H$n[house_num] < 6){
              H$n[house_num] <- H$n[house_num] + 1
              placed_in_house <- TRUE
            }
          }
        }
      }

      # Subsets of houses with lack of people and possible sources
      # This is iterative to randomize the full dataset
      Hempty <- which(H$n < 2)
      Hfull <- which(H$n >= 2)
      k <- 1 # effort counter
      while(length(Hempty) > 0){
        for(hempty in Hempty){
          knock_on_door <- sample(Hfull, 1)
          H$n[knock_on_door] <- H$n[knock_on_door] - 1 # moves from a full house
          H$n[hempty] <- H$n[hempty] + 1 # moves into an empty house
        }
        Hempty <- which(H$n < 2)
        Hfull <- which(H$n >= 2)
        print(paste("Iteration:", k, ", remaining empty houses:", length(Hempty)))
        k <- k + 1
      }

      # Frequencies how many houses house how many people
      freqs <- data.frame(table(H))
      df$afreq[match(freqs$n[freqs$type == "a"], df$a)] <- freqs$Freq[freqs$type == "a"]
      df$bfreq[match(freqs$n[freqs$type == "b"], df$b)] <- freqs$Freq[freqs$type == "b"]
      df$housed <- df[,1]*df[,2] + df[,3]*df[,4]
      df$houses <- df$afreq + df$bfreq

      # Check what is wrong with the occupancy and let user have a say
      print(df)
      if(any(df$houses > df$correct)){
        readline("There are more houses with a number of occupants than permitted. Hit [enter]")
      }
    }
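A rough bound suggests why the check above may never pass with these inputs. Assuming, as in the question's code, that every house holds at least 2 people, the 419 houses need 838 people as a minimum, which leaves 51 people to spread around. At most 51 houses can therefore hold more than 2 people, so at least 380 - 51 = 329 b-houses hold exactly 2, and that already exceeds the limit of 324. The variable names below are illustrative only.

```r
n_a <- 39; n_b <- 380; n_people <- 889
min_per_house <- 2  # assumption taken from the question's code
limit_b2 <- 324     # limit for the row that contains the 2-person b-houses

spare <- n_people - min_per_house * (n_a + n_b)  # people left after the minimum
min_b_with_two <- n_b - spare                    # lower bound on 2-person b-houses

spare                      # 51
min_b_with_two             # 329
min_b_with_two > limit_b2  # TRUE
```

If this reasoning holds, no amount of resampling will satisfy that row unless the layout of the a and b sizes in df (or the minimum of 2 per house) is changed.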