Extension of TheGreatContini's idea:
First try
Let's look at this as searching for combinations in AxB, where A and B are sets of rows. These combinations must satisfy the OR condition (every column must have a 1 in at least one of the two rows), and we also assume that the Hamming weight of a is not less than that of b (to avoid some duplicates).
Now split A into A0 (rows whose first column is 0) and A1 (rows whose first column is 1). Do the same for B. This reduces the problem to three smaller subproblems: A0xB1, A1xB1 and A1xB0 (A0xB0 can never satisfy the condition, since neither row has a 1 in the first column). If A and B are the same set, then A0xB1 and A1xB0 are symmetric, so we only need to handle one of them. These three subproblems not only contain fewer combinations than the original, we have also fully checked the first column and can ignore it from now on.
To solve these subproblems, we apply the same approach recursively, now on column 2, 3, ... At some point we will either have checked all the columns, or #A and #B will both be 1.
Depending on the implementation, stopping the recursion earlier may be more efficient. At that point we can do an exhaustive check of the remaining combinations. Remember that if we have already checked k columns, a check only costs m - k per combination. A sketch of the recursion is given below.
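To make this concrete, here is a minimal sketch of the splitting, assuming rows are boolean arrays of length m and the OR condition described above; the names solve and checkRemaining are mine, not taken from any existing code.

    import java.util.ArrayList;
    import java.util.List;

    public class SplitSearch {

        // Recursively split A and B on the current column. Assumes the
        // condition is that the OR of the two rows has a 1 in every column.
        static void solve(List<boolean[]> a, List<boolean[]> b, int column,
                          int m, List<boolean[][]> results) {
            if (a.isEmpty() || b.isEmpty()) return;
            // Stop when all columns are checked or both sets are single rows;
            // only columns >= column still need an exhaustive check.
            if (column == m || (a.size() == 1 && b.size() == 1)) {
                checkRemaining(a, b, column, m, results);
                return;
            }
            List<boolean[]> a0 = new ArrayList<>(), a1 = new ArrayList<>();
            for (boolean[] row : a) (row[column] ? a1 : a0).add(row);
            List<boolean[]> b0 = new ArrayList<>(), b1 = new ArrayList<>();
            for (boolean[] row : b) (row[column] ? b1 : b0).add(row);
            // A0xB0 is dropped: neither side can supply a 1 in this column.
            solve(a0, b1, column + 1, m, results);
            solve(a1, b1, column + 1, m, results);
            solve(a1, b0, column + 1, m, results);
        }

        // Exhaustive check, costing m - firstUnchecked per combination.
        static void checkRemaining(List<boolean[]> a, List<boolean[]> b,
                                   int firstUnchecked, int m,
                                   List<boolean[][]> results) {
            for (boolean[] ra : a)
                for (boolean[] rb : b) {
                    boolean ok = true;
                    for (int c = firstUnchecked; c < m && ok; c++)
                        ok = ra[c] || rb[c];
                    if (ok) results.add(new boolean[][]{ ra, rb });
                }
        }
    }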
Better Column Selection
As TheGreatContini suggested, instead of always splitting on the first column we can pick the column that leads to the smallest subproblems. Finding the exact best column at every step is quite expensive, but the column weights can be computed once at the beginning and then used as a heuristic for a good column order. We can then reorder the columns, run the algorithm as usual, and reorder them back afterwards.
The exact best column is the one for which the number of zeros in A times the number of zeros in B is maximal, since A0xB0 is exactly the part that gets discarded.
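A sketch of this heuristic, assuming the same boolean-array rows as above (bestColumn is a name I made up): count the zeros per column once, then pick the column whose discarded A0xB0 block is largest.

    import java.util.List;

    // Pick the column maximizing zerosA[c] * zerosB[c]; that product is the
    // size of the A0xB0 block that the split discards.
    static int bestColumn(List<boolean[]> a, List<boolean[]> b, int m) {
        long[] zerosA = new long[m], zerosB = new long[m];
        for (boolean[] row : a)
            for (int c = 0; c < m; c++) if (!row[c]) zerosA[c]++;
        for (boolean[] row : b)
            for (int c = 0; c < m; c++) if (!row[c]) zerosB[c]++;
        int bestCol = 0;
        long best = -1;
        for (int c = 0; c < m; c++) {
            long discarded = zerosA[c] * zerosB[c];
            if (discarded > best) { best = discarded; bestCol = c; }
        }
        return bestCol;
    }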
Hamming Weight Pruning
We know that the sum of the Hamming weights of a and b must be at least m. And since we assumed that a holds the larger weight, we can remove all rows of A whose Hamming weight is less than m / 2 (the speedup this gives may be negligible, I'm not sure). Computing all the weights for this pruning costs O(m * n).
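As a sketch (pruneLight is my name; it assumes the same boolean-array rows): weight(a) >= weight(b) and weight(a) + weight(b) >= m together force weight(a) >= m / 2, so any lighter row of A can be dropped.

    import java.util.ArrayList;
    import java.util.List;

    // Drop every row of A whose Hamming weight is below m/2; such a row can
    // never reach a combined weight of m together with a lighter partner.
    static List<boolean[]> pruneLight(List<boolean[]> a, int m) {
        List<boolean[]> kept = new ArrayList<>();
        for (boolean[] row : a) {
            int weight = 0;
            for (boolean bit : row) if (bit) weight++;
            if (2 * weight >= m) kept.add(row);  // weight >= m/2
        }
        return kept;
    }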
Efficient Splitting
If we keep the rows sorted, the splitting can be done much faster with binary search. It also allows an efficient representation of the sets in memory: each set is just a minimum and maximum index into the sorted list. Sorting takes O(n * m * log(n)); after that, each split takes O(log(n)).
Here is a sketch that should give the right idea (a row is represented here as a boolean[]).
    import java.util.List;

    private List<boolean[]> rows;  // kept sorted lexicographically

    // Binary search in rows[start..end] for the index of the first row with
    // a 1 in the given column, or -1 if the whole range has a 0 there.
    // Works because sorted order puts all 0-rows before all 1-rows.
    public int findFirstOne(int column, int start, int end) {
        if (rows.get(start)[column]) return start;  // range starts with a 1
        if (!rows.get(end)[column]) return -1;      // no 1 in this range
        while (start < end) {
            int mid = (start + end) / 2;
            if (!rows.get(mid)[column]) {
                start = mid + 1;  // first 1 lies to the right of mid
            } else {
                end = mid;        // mid might itself be the first 1
            }
        }
        return start;
    }
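For example (hypothetical usage, matching the splitting step above), a set represented as the index range [start, end] splits in O(log(n)) like this:

    int firstOne = findFirstOne(column, start, end);
    // 0-part: [start, firstOne - 1]  (the whole range if firstOne == -1)
    // 1-part: [firstOne, end]        (empty if firstOne == -1)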
Complexity
In the following calculations the effect of the column-selection heuristic is ignored, since it does not improve the worst case. In the average case, however, it can give a significant improvement by shrinking the search space as early as possible and thereby speeding up the later checks.
The running time of the algorithm is bounded by O(n² * m). However, the worst examples I have found all run in O(n * log(n) * m).
First, sorting the rows of the matrix costs O(n * log(n) * m) and, optionally, sorting the columns costs O(n * m + m * log(m)).
Then, the creation of the subproblems. First an overestimate: we split at most m times (once per column), and the cost of an entire level of subproblems at depth i can be overestimated as log(n) * 3^i (the cost of one split times the number of subproblems at that depth). Summing over all levels gives log(n) * (3 + 3^2 + ... + 3^m) = O(log(n) * 3^m).
However, 3^i <= n² / 2 must also hold, since that is the maximum possible number of combinations, so for large m the total is instead bounded by O(n² * log(n) * m) (at most n² / 2 subproblems per level, over m levels). I'm struggling to find an example that actually behaves this badly.
I think it is reasonable to assume that many of the subproblems become trivial very early, which would bring this down towards O(log(n) * m * n) (if anyone wants to check this, I'm not sure about it).