The smallest subset of an array whose sum is not less than the key

Given an array of non-negative integers, we need to find the subset of smallest length such that the sum of its elements is not less than K, where K is another integer provided as input.

Is it possible to solve this with O(n) time complexity?

My current thinking runs along the following lines: we could sort the array in O(n log n), then iterate over the sorted array starting from the largest number, keeping a running sum until that sum becomes >= K.

However, this has a worst-case running time of O(n (log n + 1)) = O(n log n).

So if anyone could share ideas on how to do this in O(n) time, I would really appreciate it.

Note: the subset's elements need not be contiguous in the source array in this context.

+6
6 answers

There is a linear-time algorithm for finding the K largest numbers - http://en.wikipedia.org/wiki/Selection_algorithm . Of course, what you want here is just enough large numbers to sum to at least K.

In the standard selection algorithm, you pick a random pivot and then count how many numbers fall on each side of it. You then accept or reject one half and keep working on the other. Each pivot step looks at every number in the current half once, so the cost of each step is linear, but the amount of data considered at each stage shrinks quickly enough that the total cost remains linear.

The cost of the pivot step is still linear if you take the sum of all the numbers above the pivot instead of their count. Using this sum, you can work out whether accepting all of those numbers, together with any previously selected numbers, gives you a set summing to at least K. If so, you can discard the other numbers and use the numbers above the pivot for the next pass. If not, you accept all the numbers above the pivot and use the numbers below it for the next pass. As in the selection algorithm, the pivot itself and any ties with it give you a few special cases and a chance to find the exact answer early.

So I think you can do this in (randomized) linear time using a modified version of the selection algorithm, in which you look at the sum of the numbers above the pivot rather than the count of the numbers above the pivot.
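A minimal Python sketch of that idea (function and variable names are mine; it uses a random pivot, so the linear bound is expected-case rather than worst-case, unlike a true median-of-medians pivot):

```python
import random

def min_subset_geq(nums, K):
    """Smallest set of numbers from nums summing to at least K.

    Returns None if even the whole array falls short. Works like
    quickselect, except that we compare the remaining target against
    the SUM of the numbers above the pivot rather than their count.
    """
    if sum(nums) < K:
        return None
    chosen = []        # numbers definitely in the answer
    need = K           # sum still required
    pool = list(nums)
    while pool:
        pivot = random.choice(pool)
        above = [x for x in pool if x > pivot]
        equal = [x for x in pool if x == pivot]
        s_above = sum(above)
        if s_above >= need:
            # the answer lies entirely among the numbers above the pivot
            pool = above
        elif s_above + pivot * len(equal) >= need:
            # take all of `above`, then just enough pivot-valued numbers
            remaining = need - s_above       # > 0 here, so pivot > 0
            take = -(-remaining // pivot)    # ceil division
            return chosen + above + equal[:take]
        else:
            # take everything >= pivot and keep searching below it
            chosen += above + equal
            need -= s_above + pivot * len(equal)
            pool = [x for x in pool if x < pivot]
    return chosen      # reached only once need has dropped to <= 0
```

Each pass discards either everything at or below the pivot or everything above it, so the expected total work is linear, as with ordinary quickselect.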

+4

This looks like a problem for dynamic programming. As you read the array, build another array containing the cumulative sum up to each index, so that entry i of this array holds the sum of elements 1..i.

Now it is easy to see that the sum of the values at indices p..q is SUM(q) - SUM(p-1) (with the special case SUM(0) = 0). Obviously I am using 1-based indexes here. This lookup is O(1), so now you only need an O(n) algorithm for finding the best (p,q).
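As a concrete Python sketch (the helper name is mine), the cumulative-sum array with the SUM(0) = 0 sentinel can be built in one pass:

```python
def prefix_sums(vals):
    """Build SUM where SUM[i] = vals[0] + ... + vals[i-1], so SUM[0] = 0.

    With 1-based element indexing, the sum of elements p..q is then
    SUM[q] - SUM[p-1], an O(1) lookup.
    """
    sums = [0]  # the SUM(0) = 0 special case
    for v in vals:
        sums.append(sums[-1] + v)
    return sums
```

For example, prefix_sums([2, 4, 1, 3]) gives [0, 2, 6, 7, 10], and the sum of elements 2..4 is SUM[4] - SUM[1] = 10 - 2 = 8.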

A simple solution is to track p and q and move them along the array. You start by advancing q; then you alternately contract at p and expand at q, like a caterpillar crawling along your array.

To expand q :

    p <- 1
    q <- 1
    while SUM(q) - SUM(p-1) < K
        q <- q + 1
    end while

Now q is at the position where the sum of the subarray has just reached (or equalled) K. The length of the subarray is q - p + 1.

After the q loop, you check whether the length of this subarray is less than your current best. Then you advance p one step (so that you don't accidentally miss the optimal solution) and loop again.

You don't actually need to build the SUM array: you can maintain the sum of the current subarray as you go. You just need to switch to using the "real" p instead of the position one before it.

    subsum <- VAL(1)
    p <- 1
    q <- 1
    while q <= N
        -- Expand
        while q < N and subsum < K
            q <- q + 1
            subsum <- subsum + VAL(q)
        end while
        -- Check the length against our current best
        len <- q - p + 1
        if len < bestlen
            ...
        end if
        -- Contract
        subsum <- subsum - VAL(p)
        p <- p + 1
    end while

Notes:

j_random_hacker suggested: it would help to explain why it is acceptable to consider only the O(n) distinct subarrays that this algorithm examines, rather than all O(n^2) possible subarrays.

The philosophy of dynamic programming:

  • do not follow a solution path that leads to a suboptimal result; and
  • use knowledge of previous solutions to compute a new solution.

In this case, a candidate solution (some (p,q) such that p <= q) is evaluated by summing its elements. Since these elements are positive integers, we know that for any candidate (p,q), the candidate (p,q+1) has a strictly larger sum.

So we know that if (p,q) is a minimal solution, then (p,q+1) is not. We end the inner search as soon as we have a candidate, and check whether that candidate is better than any seen so far. This means that for each p we only need to check one candidate. As a result both p and q only ever increase, and therefore the search is linear.

The other part (using previous solutions) comes from recognizing that sum(p,q+1) = sum(p,q) + X(q+1), and similarly sum(p+1,q) = sum(p,q) - X(p). Therefore we do not need to sum all the elements between p and q at each step; we only add or subtract one value whenever we advance one of the search pointers.

Hope this helps.

+4

The OP clarified in comments that the problem is to find a subset, NOT necessarily a contiguous sequence (the term "subarray" was admittedly a poor choice). In that case I believe the method given by mcdowella is correct, and it amounts to the following steps:

Starting with N elements, find the MEDIAN element (i.e., the (N/2)-th element of the sorted array that you never actually build). This is achieved using the median-of-medians algorithm, proven to be O(n); see the wiki reference already given, repeated here: Selection algorithm, section on the median-of-medians algorithm.

With the median element in hand, linearly scan the complete set and partition it into "lower" and "higher" halves, meanwhile summing, counting and tracking whatever you need for each of the two "halves". This step is also O(N).

When the scan finishes, if the "upper half" sums to more than the target (K), forget everything about the lower half and repeat the procedure on the upper half, whose size is (approximately) N/2. If, on the other hand, the "upper half" sums to less than K, add the whole upper half to the final result, subtract its sum from K, and repeat the procedure with the lower half.

Overall you process sets of sizes N, N/2, N/4, N/8, etc., each in O(M) for its respective size M, so the whole thing is linear in N, because N + N/2 + N/4 + N/8 + ... stays below 2N.

+3

Here is a solution that should be fast enough. I believe it is close to linear.

    def solve(A, k):
        assert sum(A) >= k
        max_ = max(A)
        min_ = min(A)
        n = len(A)
        if sum(A) - min_ < k:
            return A
        if max_ == min_:
            # all values equal: take just enough of them
            # (without this the recursion below would never terminate)
            count = -(-k // max_)  # ceil(k / max_)
            return A[:count]
        bucket_size = (max_ - min_) // n + 1
        buckets = [[] for _ in range(n)]
        for item in A:
            buckets[(item - min_) // bucket_size].append(item)
        solution = []
        while True:
            bucket = buckets.pop()  # the bucket holding the largest values
            sum_ = sum(bucket)
            if sum_ >= k:
                # we don't need everything from this bucket
                return solution + solve(bucket, k)
            k -= sum_
            solution += bucket

    print(solve([5, 2, 7, 52, 30, 12, 18], 100))  # [52, 30, 18]
+1

I believe the term "subarray" refers to a contiguous part of the array (as here, in another problem given as an example).

With that reading, there is a simple O(n) algorithm for finding the minimum length:

Set two indices (left, right) at the first element and move them towards the end of the array. Keep track of the sum between the indices: advance right if the sum is too small (or if the pointers are equal), and advance left if the sum is large enough.

0

A subarray must be contiguous by definition of an array.

Use two pointers (start, end). Initialize both to the beginning of the array. Keep track of the current sum between (start, end) and move end right one step at a time. Each time you move the end pointer, sum = sum + array[end].

And when sum >= target, start moving start to the right, continuing to track the sum as sum = sum - array[start].

While moving start to the right, keep checking that the sum is still not less than the target. We also track the length as length = end - start + 1, and minLength = min(minLength, length).

Once both pointers have moved as far right as possible, we just return minLength.

The general idea is to first find a "window" that satisfies the condition (sum >= target), then slide the window right one element at a time, maintaining the minimum window size each time the window moves.
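The two-pointer scheme described above can be sketched in Python (names are mine; it returns 0 when no window reaches the target):

```python
def min_window_length(array, target):
    """Length of the shortest contiguous subarray whose sum >= target.

    O(n): each of the two pointers only ever moves forward, at most
    n steps each. Returns 0 if the whole array sums to less than target.
    """
    min_length = 0
    window_sum = 0
    start = 0
    for end in range(len(array)):
        window_sum += array[end]        # extend the window rightward
        while window_sum >= target:     # shrink while the window is valid
            length = end - start + 1
            if min_length == 0 or length < min_length:
                min_length = length
            window_sum -= array[start]
            start += 1
    return min_length
```

For example, min_window_length([2, 3, 1, 2, 4, 3], 7) returns 2, for the window [4, 3].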

0
