How fast can one “find the max in an array” possibly get?

This question comes out of the discussion in this other question: Parallelize an already linear time algorithm. This is not homework.

You are given an array of N numbers and a machine with P processors and CREW shared memory (Concurrent Read, Exclusive Write memory).

What is the tightest upper bound on the running time of the fastest algorithm for finding the largest number in the array? [And, of course: what is that algorithm?]

I do not mean the total amount of work done [which can never be less than O(N)].

+7
5 answers

I think it is O(N/P') + O(Log2(P')), where P' = min{N, P}. Each of the P' processors finds the max of N/P' elements, followed by Log2(P') rounds of pairwise merging done in parallel: the first P'/2 merges are done by the even-numbered processors, the next P'/4 by processors at positions divisible by 4, then by 8, and so on.

Edit: P' is introduced to cover the case where you have significantly more processors than elements to search through.
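A minimal sketch of this scheme in Go (the language, the function name and the zero-based indexing of the merge rounds are my choices purely for illustration; the original answer gives no code):

```go
package main

import (
	"fmt"
	"sync"
)

// parallelMax finds the maximum of a using at most p workers:
// each worker scans a chunk of about n/p elements, then the
// partial results are combined in log2(p) pairwise merge rounds.
func parallelMax(a []int, p int) int {
	n := len(a)
	if p > n {
		p = n // P' = min{N, P}
	}
	partial := make([]int, p)

	// Phase 1: O(N/P') - each worker finds the max of its own chunk.
	var wg sync.WaitGroup
	for i := 0; i < p; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			lo, hi := i*n/p, (i+1)*n/p
			m := a[lo]
			for _, v := range a[lo+1 : hi] {
				if v > m {
					m = v
				}
			}
			partial[i] = m
		}(i)
	}
	wg.Wait()

	// Phase 2: O(Log2(P')) - pairwise merges; in the round with stride s,
	// the worker at position i merges partial[i] and partial[i+s].
	for s := 1; s < p; s *= 2 {
		var round sync.WaitGroup
		for i := 0; i+s < p; i += 2 * s {
			round.Add(1)
			go func(i int) {
				defer round.Done()
				if partial[i+s] > partial[i] {
					partial[i] = partial[i+s]
				}
			}(i)
		}
		round.Wait()
	}
	return partial[0]
}

func main() {
	fmt.Println(parallelMax([]int{3, 1, 4, 1, 5, 9, 2, 6}, 4)) // prints 9
}
```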

+8

Cook, Dwork and Reischuk showed that any CREW algorithm for finding the maximum of n elements must run in Omega(lg n) time, even with an unlimited number of processors and unlimited memory. If I remember correctly, an algorithm with a matching upper bound appears in their paper:

Stephen Cook, Cynthia Dwork, and Rüdiger Reischuk. Upper and lower time bounds for parallel random access machines without simultaneous writes. SIAM Journal on Computing, 15(1):87-97, 1986.

+5

The tight bound is the following:

If p <= n/log n, you can do it in O(n/p); otherwise it is O(log n), i.e. for p > n/log n you gain nothing over p = n/log n.

Proof of the lower bound:

Claim 1: you can never be faster than Ω(n/p), since p processors can give a speedup of at most p.

Claim 2: you can never be faster than Ω(log n) because of the CREW model (see the paper cited in the answer above): even checking whether a 0-1 array contains at least one 1 requires Ω(log n) time.

Proof of the upper bound:

Claim 3: you can find the maximum using n/log n processors in O(log n) time.

Proof: it is easy to find the maximum with n processors in log n time, but in that algorithm most processors are idle most of the time; their number can be reduced to n/log n (see, for example, Papadimitriou's Computational Complexity book).


Now, if you are given fewer than n/log n processors, you can assign the work of K of those processors to 1 processor; this divides the number of processors required by K and multiplies the running time by K.

Let K = (n/log n)/p; the previous algorithm then runs in time O(K log n) = O(n/p) and requires n/(K log n) = p processors.
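As a concrete check (the numbers here are chosen purely for illustration): with n = 2^16 and p = 64, the original algorithm would use n/log n = 65536/16 = 4096 processors; taking K = 4096/64 = 64 simulates it on the 64 available processors in K * log n = 64 * 16 = 1024 = n/p steps.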


Edit: I just realized that when p <= n/log n, dasblinkenlight's algorithm has the same asymptotic running time:

n/p + log p <= n/p + log(n/log n) <= n/p + log n <= n/p + n/p = 2n/p = O(n/p)

(the third step uses log n <= n/p, which follows from p <= n/log n)

so you can just use that algorithm, which has complexity O(n/p) when p <= n/log n and O(log n) otherwise.

+4

I suspect this is O(N/P) + O(P):

  • dividing the work among the P processors costs O(P)
  • combining the results produced by the P processors also costs O(P)
  • an ideal parallel search over N elements by P processors takes O(N/P) time

My naive algorithm would be (see the sketch after this list):

  • write element 0 into a CREW cell labelled "result"
  • run P completely independent searches, each over 1/P-th of the N elements
  • at the end of each search, use a CAS spin loop to replace "result" with that search's partial max if it is larger. (Depending on your definition of CREW, you may not need the spin loop.)
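A rough sketch of this scheme in Go (Go and sync/atomic for the CAS loop are my choices, purely to emulate the idea on ordinary shared memory; the answer itself does not prescribe either):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// casMax has each of p workers scan its own slice of a, then publish
// its partial max into the shared "result" cell with a CAS spin loop.
func casMax(a []int64, p int) int64 {
	var result atomic.Int64
	result.Store(a[0]) // step 1: seed the shared cell with element 0

	n := len(a)
	if p > n {
		p = n
	}

	var wg sync.WaitGroup
	for i := 0; i < p; i++ { // step 2: p independent searches
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			lo, hi := i*n/p, (i+1)*n/p
			m := a[lo]
			for _, v := range a[lo+1 : hi] {
				if v > m {
					m = v
				}
			}
			// step 3: CAS spin loop - overwrite only if our partial max is larger.
			for {
				cur := result.Load()
				if m <= cur || result.CompareAndSwap(cur, m) {
					break
				}
			}
		}(i)
	}
	wg.Wait()
	return result.Load()
}

func main() {
	fmt.Println(casMax([]int64{3, 1, 4, 1, 5, 9, 2, 6}, 4)) // prints 9
}
```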
+1

For P = N^2, this is O(1).

All processors initialize a boolean array: CannotBeMax[i] = FALSE.

Processor (i, j) sets CannotBeMax[i] = (A[i] < A[j]).

The max is the A[i] for which CannotBeMax[i] == FALSE.

Note that all concurrent writes attempt to write identical values, so the answer is consistent no matter which one succeeds.
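A toy simulation of the idea in Go (the language and the function name are my choices for illustration only; one goroutine stands in for each of the N^2 processors (i, j), and atomic booleans stand in for the model's concurrent writes of identical values):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// constantTimeMax simulates the N^2-processor scheme: "processor" (i, j)
// marks i as "cannot be the max" whenever A[i] < A[j]. On the parallel
// machine all N^2 comparisons happen in a single step.
func constantTimeMax(a []int) int {
	n := len(a)
	cannotBeMax := make([]atomic.Bool, n) // all initialized to FALSE

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		for j := 0; j < n; j++ {
			wg.Add(1)
			go func(i, j int) { // one goroutine per (i, j) pair
				defer wg.Done()
				if a[i] < a[j] {
					cannotBeMax[i].Store(true) // all writers store the same value
				}
			}(i, j)
		}
	}
	wg.Wait()

	for i := range cannotBeMax {
		if !cannotBeMax[i].Load() {
			return a[i]
		}
	}
	return a[0] // unreachable for non-empty input
}

func main() {
	fmt.Println(constantTimeMax([]int{3, 1, 4, 1, 5, 9, 2, 6})) // prints 9
}
```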

+1
