If you modify the Horowitz-Sahni algorithm in the right way, it is hardly any slower than the original Horowitz-Sahni. Recall that Horowitz-Sahni works with two lists of subset sums: sums of subsets in the left half of the original list, and sums of subsets in the right half. Call these two lists of sums L and R. To obtain the subsets that sum to some fixed target A, you can sort R and then look up a number in R that matches each number in L using a binary search. However, the algorithm is asymmetric only to save a constant factor in space and time. For this problem it is a good idea to sort both L and R.
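For reference, here is a minimal sketch of that standard single-target version, not the code from this answer: it sorts only R and binary-searches it for each entry of L. The helper names (subset_sums, find_exact) are mine, invented purely for illustration.

    from bisect import bisect_left
    from itertools import combinations

    def subset_sums(xs):
        # All (sum, subset) pairs for the 2^len(xs) subsets of xs.
        return [(sum(c), list(c))
                for k in range(len(xs) + 1)
                for c in combinations(xs, k)]

    def find_exact(terms, A):
        half = len(terms) // 2
        L = subset_sums(terms[:half])
        R = sorted(subset_sums(terms[half:]))      # sort only R
        Rsums = [s for (s, _) in R]
        hits = []
        for (Lsum, S) in L:
            i = bisect_left(Rsums, A - Lsum)       # binary search in R
            while i < len(Rsums) and Rsums[i] == A - Lsum:
                hits.append(S + R[i][1])
                i += 1
        return hits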
In my code below, I also reverse L. Then you can keep two pointers into R, updated for each entry in L: a pointer to the last entry in R that is too low, and a pointer to the first entry in R that is too high. When you advance to the next entry in L, each pointer can either move forward or stay put, but neither ever has to move backward. Thus, the second stage of the Horowitz-Sahni algorithm takes only linear time in the data generated in the first stage, plus linear time in the length of the output. Up to a constant factor, you cannot do better than that (once you have committed to this meet-in-the-middle algorithm).
Here is the Python code with an example input:
    # Input
    terms = [29371, 108810, 124019, 267363, 298330, 368607, 438140,
             453243, 515250, 575143, 695146, 840979, 868052, 999760]
    (A, B) = (500000, 600000)

    # Subset iterator stolen from Sage
    def subsets(X):
        yield []
        pairs = []
        for x in X:
            pairs.append((2**len(pairs), x))
            for w in range(2**(len(pairs) - 1), 2**len(pairs)):
                yield [x for m, x in pairs if m & w]

    # Modified Horowitz-Sahni with toolow and toohigh indices
    L = sorted([(sum(S), S) for S in subsets(terms[:len(terms)//2])])
    R = sorted([(sum(S), S) for S in subsets(terms[len(terms)//2:])])
    (toolow, toohigh) = (-1, 0)
    for (Lsum, S) in reversed(L):
        while toolow < len(R) - 1 and R[toolow+1][0] < A - Lsum:
            toolow += 1
        while toohigh < len(R) and R[toohigh][0] <= B - Lsum:
            toohigh += 1
        for n in range(toolow+1, toohigh):
            print('+'.join(map(str, S + R[n][1])), '=', sum(S + R[n][1]))
Moron (I think he should change his username) raises the reasonable issue of optimizing the algorithm a little further by skipping one of the sorts. Actually, because each of the lists L and R is a list of sums of subsets, you can do a combined generate-and-sort of each one in linear time (that is, linear in the lengths of the lists)! L is the union of two lists of sums: those that include the first term, terms[0], and those that do not. So really you should just make one of these halves in sorted form, add a constant, and then merge the two sorted lists. If you apply this idea recursively, you save a logarithmic factor in the time to make a sorted L, i.e., a factor of N in the original variable of the problem. This gives a good reason to sort both lists as you generate them. If you only sort one list, you have some binary searches that could reintroduce that factor of N; at best you would have to optimize them somehow.
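Here is a sketch of that combined generate-and-merge, assuming you only need the sums themselves (the function name sorted_subset_sums is mine). Each pass merges the current sorted list with a copy shifted by the new term, so the total work is linear in the length of the final list.

    from heapq import merge

    def sorted_subset_sums(xs):
        # Sorted subset sums of xs, built by merging instead of sorting.
        sums = [0]                                  # the empty subset
        for x in xs:
            # Subsets that skip x keep their sums; those that take x shift by x.
            sums = list(merge(sums, (s + x for s in sums)))
        return sums

    # sorted_subset_sums([1, 2, 4]) == [0, 1, 2, 3, 4, 5, 6, 7]

This version only carries the sums; carrying the subsets along as well is exactly the question taken up in the next paragraph.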
At first glance, a factor of O(N) could still be there for a different reason: if you want not just the subset sums but also the subsets that make the sums, then it looks like O(N) time and space to store each subset in L and in R. However, there is a data-sharing trick that also gets rid of that factor of O(N). The first step of the trick is to store each subset of the left or right half as a linked list of bits (1 if a term is included, 0 if it is not). Then, when the list L is doubled in size as in the previous paragraph, the two linked lists for a subset and its partner can be shared, except at the head:
    0
    |
    v
    1 -> 1 -> 0 -> ...
    ^
    |
    1
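A sketch of that sharing, using nested Python tuples as the linked-list cells (entirely my own illustration, not code from the algorithm above): when a half grows by one term, every old bit-list gets two new heads, and both new lists share the old tail unchanged.

    # A subset is a linked list of bits: (bit, rest) pairs, with None as the empty list.
    def double(bitlists):
        out = []
        for tail in bitlists:
            out.append((0, tail))   # new term excluded; tail is shared, not copied
            out.append((1, tail))   # new term included; same tail object
        return out

    lists = [None]                  # subsets of zero terms: just the empty bit-list
    for _ in range(3):              # grow this half to three terms
        lists = double(lists)
    # lists now holds 8 bit-lists, built with O(1) new cells per subset per doubling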
Actually, this linked-list trick is an artifact of the cost model and is never truly useful. In order to have pointers with O(1) cost in a RAM model, you have to define data words with O(log(memory)) bits. But if you have data words of that size, you might as well store each subset as a bit vector in a single word rather than with this pointer structure. I.e., if you need less than a gigaword of memory, you can store each subset in a 32-bit word. If you need more than a gigaword, then you have a 64-bit architecture or an emulation of it (or maybe 48 bits), and you can still store each subset in one word. If you patch the RAM cost model to account for word size, then this factor of N was never really there anyway.
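Here is a sketch of the one-word representation (my own illustration, not part of the code above): each subset of a half is a single integer bitmask, which stores and sorts just like the sums do, and the actual terms can be unpacked from the mask whenever a hit is printed.

    def masked_sums(half):
        # (sum, mask) pairs for every subset of `half`;
        # each mask fits in one machine word for any realistic N.
        out = [(0, 0)]
        for i, x in enumerate(half):
            out += [(s + x, m | (1 << i)) for (s, m) in out]
        return out

    def unpack(mask, half):
        # Recover the terms of a subset from its bitmask.
        return [x for i, x in enumerate(half) if mask & (1 << i)]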
So, interestingly, the time complexity of the original Horowitz-Sahni algorithm is not O(N*2^(N/2)); it is O(2^(N/2)). Similarly, the time complexity of this problem is O(K + 2^(N/2)), where K is the length of the output.