Subset Sum Problem

Question

Subset Sum Problem

I have a problem with counting, which is a continuation of this question. I'm actually not a mathematician, so it’s very difficult for me to understand this subset sum problem , which was proposed as a resolution.

I have 4 ArrayList that stores data: alId, alTransaction, alNumber, alPrice

Type | Deal | Number | Price
8 | Buy | 95.00000000 | 305.00000000
8 | Buy | 126.00000000 | 305.00000000
8 | Buy | 93.00000000 | 306.00000000
8 | Transfer | 221.00000000 | 305.00000000
8 | Translation in | 221.00000000 | 305.00000000
8 | Sell | 93.00000000 | 360.00000000
8 | Sell | 95.00000000 | 360.00000000
8 | Sell | 126.00000000 | 360.00000000
8 | Buy | 276.00000000 | 380.00000000

In the end, I'm trying to get what is left for the client and what remains to be left in 3 lists of arrays:
- alNewHowMuch (matches alNumber),
- alNewPrice (corresponds to alPrice),
- alNewInID (refers to alID)

  ArrayList alNewHowMuch = new ArrayList(); ArrayList alNewPrice = new ArrayList(); ArrayList alNewInID = new ArrayList(); for (int i = 0; i < alTransaction.Count; i++) { string transaction = (string) alTransaction[i]; string id = (string) alID[i]; decimal price = (decimal) alPrice[i]; decimal number = (decimal) alNumber[i]; switch (transaction) { case "Transfer out": case "Sell": int index = alNewHowMuch.IndexOf(number); if (index != -1) { alNewHowMuch.RemoveAt(index); alNewPrice.RemoveAt(index); alNewInID.RemoveAt(index); } else { ArrayList alTemp = new ArrayList(); decimal sum = 0; for (int j = 0; j < alNewHowMuch.Count; j ++) { string tempid = (string) alNewInID[j]; decimal tempPrice = (decimal) alNewPrice[j]; decimal tempNumbers = (decimal) alNewHowMuch[j]; if (id == tempid && tempPrice == price) { alTemp.Add(j); sum = sum + tempNumbers; } } if (sum == number) { for (int j = alTemp.Count - 1; j >= 0; j --) { int tempIndex = (int) alTemp[j]; alNewHowMuch.RemoveAt(tempIndex); alNewPrice.RemoveAt(tempIndex); alNewInID.RemoveAt(tempIndex); } } } break; case "Transfer In": case "Buy": alNewHowMuch.Add(number); alNewPrice.Add(price); alNewInID.Add(id); break; } }

Basically, I add and remove things from the array depending on the type of transaction, transaction ID and numbers. I add numbers to ArrayList, for example 156, 340 (when it is TransferIn or Buy), etc., and then I delete them, doing it as 156, 340 (when it transfers, sells). My solution works for this without a problem. The problem is that for some old data, employees entered an amount equal to 1500 instead of 500 + 400 + 100 + 500. How would I change it so that when there is Sell/TransferOut or Buy/Transfer In and there is no match inside ArrayList, it should try adding some elements from this ArrayList and find the elements that are combined in the aggregate.

Inside my code, I tried to solve this problem by simply adding up everything when there is no match (index == 1)

  int index = alNewHowMuch.IndexOf(number); if (index != -1) { alNewHowMuch.RemoveAt(index); alNewPrice.RemoveAt(index); alNewInID.RemoveAt(index); } else { ArrayList alTemp = new ArrayList(); decimal sum = 0; for (int j = 0; j < alNewHowMuch.Count; j ++) { string tempid = (string) alNewInID[j]; decimal tempPrice = (decimal) alNewPrice[j]; decimal tempNumbers = (decimal) alNewHowMuch[j]; if (id == tempid && tempPrice == price) { alTemp.Add(j); sum = sum + tempNumbers; } } if (sum == number) { for (int j = alTemp.Count - 1; j >= 0; j --) { int tempIndex = (int) alTemp[j]; alNewHowMuch.RemoveAt(tempIndex); alNewPrice.RemoveAt(tempIndex); alNewInID.RemoveAt(tempIndex); } } }

But it works only under certain conditions and is not suitable for the rest.

Edit: Since some of you were so surprised (and blinded) by my names of Polish variables, I translated them all into English for simplicity and visibility. Hope this helps me get help :-)

+5

c # algorithm sql-server .net-3.5

Madboy Apr 25 '10 at 13:50

source share

2 answers

Here is my algorithm. It works in O(2^(n/2)) and resolves SubsetSum(1000, list-of-1000-ones) in 20 milliseconds. See Comments at the end of the IVlad post.

 using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Diagnostics; namespace SubsetSum { class Program { static void Main(string[] args) { var ns = new List<int>(); for (int i = 0; i < 1000; i++) ns.Add(1); var s1 = Stopwatch.StartNew(); bool result = SubsetSum(ns, 1000); s1.Stop(); Console.WriteLine(result); Console.WriteLine(s1.Elapsed); Console.Read(); } static bool SubsetSum(ist<int> nums, int targetL) { var left = new List<int> { 0 }; var right = new List<int> { 0 }; foreach (var n in nums) { if (left.Count < right.Count) left = Insert(n, left); else right = Insert(n, right); } int lefti = 0, righti = right.Count - 1; while (lefti < left.Count && righti >= 0) { int s = left[lefti] + right[righti]; if (s < target) lefti++; else if (s > target) righti--; else return true; } return false; } static List<int> Insert(int num, List<int> nums) { var result = new List<int>(); int lefti = 0, left = nums[0]+num; for (var righti = 0; righti < nums.Count; righti++) { int right = nums[righti]; while (left < right) { result.Add(left); left = nums[++lefti] + num; } if (right != left) result.Add(right); } while (lefti < nums.Count) result.Add(nums[lefti++] + num); return result; } } }

And here is an improved version that cuts out sets:

 static bool SubsetSum(List<int> nums, int target) { var remainingSum = nums.Sum(); var left = new List<int> { 0 }; var right = new List<int> { 0 }; foreach (var n in nums) { if (left.Count == 0 || right.Count == 0) return false; remainingSum -= n; if (left.Count < right.Count) left = Insert(n, left, target - remainingSum - right.Last(), target); else right = Insert(n, right, target - remainingSum - left.Last(), target); } int lefti = 0, righti = right.Count - 1; while (lefti < left.Count && righti >= 0) { int s = left[lefti] + right[righti]; if (s < target) lefti++; else if (s > target) righti--; else return true; } return false; } static List<int> Insert(int num, List<int> nums, int min, int max) { var result = new List<int>(); int lefti = 0, left = nums[0]+num; for (var righti = 0; righti < nums.Count; righti++) { int right = nums[righti]; while (left < right) { if (min <= left && left <= max) result.Add(left); left = nums[++lefti] + num; } if (right != left && min <= right && right <= max) result.Add(right); } while (lefti < nums.Count) { left = nums[lefti++] + num; if (min <= left && left <= max) result.Add(left); } return result; }

This last one solved the problem with 100,000 units in 5 milliseconds (but this is the best example of the algorithm, with real data it will be slower).

For your use, this algorithm is probably fast enough (and I don't see any obvious improvements). If you introduce 10,000 products with a random price from 0 to 20, and your goal is to total up to 500, this will be solved in 0.04 seconds on my laptop.

Edit: I just read on Wikipedia that the most famous algorithm is O(2^(n/2)*n) . This is O(2^(n/2)) . Am I getting a Turing Award?

+5

Jules May 03, '10 at 22:53

source share

Ivlad · Accepted Answer · 2010-04-26T10:24:39+0000

How you should do this depends on a number of important things: how many numbers will you have and how large will they be? In addition, as I understand it, your data can change (add / remove numbers, etc.), right ?. How often do you need to do these queries?

I will introduce two solutions. I suggest you use the second, as I suspect that this is better for what you need, and it is much easier to understand.

Solution 1 - Dynamic Programming

Let S[i] = true if we can make sum i and false otherwise.

 S[0] = true // we can always make sum 0: just don't choose any number S[i] = false for all i != 0 for each number i in your input for s = MaxSum downto i if ( S[s - i] == true ) S[s] = true; // if we can make the sum s - i, we can also make the sum s by adding i to the sum s - i.

To get the actual numbers that make up your sum, you must save another vector P[i] = the last number that was used to make sum i . You would update this according to the if condition above.

The complexity of this time is O(numberOfNumbers * maxSumOfAllNumbers) , which is pretty bad, especially since you have to re-run this algorithm whenever your data changes. It also slows down even for one run, while your numbers can be very large, and there can be many. In fact, "a lot" is misleading. If you have 100 numbers, and each number can be up to 10,000, you will do approximately 100 * 10,000 = 1,000,000 operations each time your data changes.

This is a good decision to know, but not very useful in practice, or at least not in your case, I think.

It is a bit of C # for the approach I suggest:

  class Program { static void Main(string[] args) { List<int> testList = new List<int>(); for (int i = 0; i < 1000; ++i) { testList.Add(1); } Console.WriteLine(SubsetSum.Find(testList, 1000)); foreach (int index in SubsetSum.GetLastResult(1000)) { Console.WriteLine(index); } } } static class SubsetSum { private static Dictionary<int, bool> memo; private static Dictionary<int, KeyValuePair<int, int>> prev; static SubsetSum() { memo = new Dictionary<int, bool>(); prev = new Dictionary<int, KeyValuePair<int, int>>(); } public static bool Find(List<int> inputArray, int sum) { memo.Clear(); prev.Clear(); memo[0] = true; prev[0] = new KeyValuePair<int,int>(-1, 0); for (int i = 0; i < inputArray.Count; ++i) { int num = inputArray[i]; for (int s = sum; s >= num; --s) { if (memo.ContainsKey(s - num) && memo[s - num] == true) { memo[s] = true; if (!prev.ContainsKey(s)) { prev[s] = new KeyValuePair<int,int>(i, num); } } } } return memo.ContainsKey(sum) && memo[sum]; } public static IEnumerable<int> GetLastResult(int sum) { while (prev[sum].Key != -1) { yield return prev[sum].Key; sum -= prev[sum].Value; } } }

Perhaps you need to do some error checking and possibly keep the last amount in the class so that you cannot call GetLastResult with a different amount than the last amount with the Find number. Anyway, this is an idea.

Solution 2 - a randomized algorithm

Now it’s easier. Save the two lists: usedNums and unusedNums . Also save the usedSum variable, which at any time contains the sum of all numbers in the usedNums list.

Whenever you need to insert a number into your set, add it to one of the two lists (it doesn’t matter, but do it randomly so that there is a relatively uniform distribution). Update usedSum accordingly.

Whenever you need to remove a number from your set, find out which of the two lists it is in. You can do this with a linear search if you have little (this time, it means more than 10,000, maybe even 100,000 on a fast computer and assuming that you do not perform this operation often and quickly. In any case, linear Search can be optimized if you need it.). After you find the number, remove it from the list. Update usedSum accordingly.

Whenever you need to find if there are numbers in your set that sum with S , use this algorithm:

 while S != usedSum if S > usedSum // our current usedSum is too small move a random number from unusedNums to usedNums and update usedSum else // our current usedSum is too big move a random number from usedNums to unusedNums and update usedSum

At the end of the algorithm, the list usedNums will give you numbers whose sum is S

This algorithm should be good for what you need, I think. It handles changes to the data set very well and works well for a large number of counters. It also does not depend on how large the numbers are, which is very useful if you have large numbers.

Please write if you have any questions.

Subset Sum Problem

More articles: