How fast can one “find the max in an array” possibly get?

This question comes out of the discussion in this other question: Parallelize an already linear time algorithm. This is not homework.

You are given an array of N numbers and a machine with P processors and CREW shared memory (Concurrent Read, Exclusive Write memory).

What is the tightest upper bound on the running time of the fastest algorithm for finding the largest number in the array? [And, of course: what is that algorithm?]

I do not mean the total amount of work done [which can never be less than O(N)].

+7
5 answers

I think it is O(N/P') + O(Log2(P')), where P' = min{N, P}. Each of the P' processors finds the max of N/P' elements, followed by Log2(P') rounds of pairwise merging done in parallel: the first P'/2 merges are done by the even-numbered processors, the next P'/4 by processors at positions divisible by 4, then by 8, and so on.

Edit: P' is introduced to cover the case where you have significantly more processors than elements to search through.
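A minimal sketch of this scheme in Go (the language, the function name and the zero-based indexing of the merge rounds are my choices purely for illustration; the original answer gives no code):

```go
package main

import (
	"fmt"
	"sync"
)

// parallelMax finds the maximum of a using at most p workers:
// each worker scans a chunk of about n/p elements, then the
// partial results are combined in log2(p) pairwise merge rounds.
func parallelMax(a []int, p int) int {
	n := len(a)
	if p > n {
		p = n // P' = min{N, P}
	}
	partial := make([]int, p)

	// Phase 1: O(N/P') - each worker finds the max of its own chunk.
	var wg sync.WaitGroup
	for i := 0; i < p; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			lo, hi := i*n/p, (i+1)*n/p
			m := a[lo]
			for _, v := range a[lo+1 : hi] {
				if v > m {
					m = v
				}
			}
			partial[i] = m
		}(i)
	}
	wg.Wait()

	// Phase 2: O(Log2(P')) - pairwise merges; in the round with stride s,
	// the worker at position i merges partial[i] and partial[i+s].
	for s := 1; s < p; s *= 2 {
		var round sync.WaitGroup
		for i := 0; i+s < p; i += 2 * s {
			round.Add(1)
			go func(i int) {
				defer round.Done()
				if partial[i+s] > partial[i] {
					partial[i] = partial[i+s]
				}
			}(i)
		}
		round.Wait()
	}
	return partial[0]
}

func main() {
	fmt.Println(parallelMax([]int{3, 1, 4, 1, 5, 9, 2, 6}, 4)) // prints 9
}
```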

+8

Cook, Dwork and Reischuk showed that any CREW algorithm for finding the maximum of n elements must run in Omega(lg n) time, even with an unlimited number of processors and unlimited memory. If I remember correctly, an algorithm with a matching upper bound appears in their paper:

Stephen Cook, Cynthia Dwork, and Rüdiger Reischuk. Upper and lower time bounds for parallel random access machines without simultaneous writes. SIAM Journal on Computing, 15(1):87-97, 1986.

+5

The tight bound is the following:

If p <= n/log n, you can do it in O(n/p); otherwise it is O(log n), i.e. for p > n/log n you gain nothing over p = n/log n.

Proof of the lower bound:

Claim 1: you can never be faster than Ω(n/p), since p processors can give a speedup of at most p.

Claim 2: you can never be faster than Ω(log n) because of the CREW model (see the paper cited in the answer above): even checking whether a 0-1 array contains at least one 1 requires Ω(log n) time.

Proof of the upper bound:

Claim 3: you can find the maximum using n/log n processors in O(log n) time.

Proof: it is easy to find the maximum with n processors in log n time, but in that algorithm most processors are idle most of the time; their number can be reduced to n/log n (see, for example, Papadimitriou's Computational Complexity book).


Now, if you are given fewer than n/log n processors, you can assign the work of K of those processors to 1 processor; this divides the number of processors required by K and multiplies the running time by K.

Let K = (n/log n)/p; the previous algorithm then runs in time O(K log n) = O(n/p) and requires n/(K log n) = p processors.
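As a concrete check (the numbers here are chosen purely for illustration): with n = 2^16 and p = 64, the original algorithm would use n/log n = 65536/16 = 4096 processors; taking K = 4096/64 = 64 simulates it on the 64 available processors in K * log n = 64 * 16 = 1024 = n/p steps.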


Edit: I just realized that when p <= n/log n, dasblinkenlight's algorithm has the same asymptotic running time:

n/p + log p <= n/p + log(n/log n) <= n/p + log n <= n/p + n/p = 2n/p = O(n/p)

(the third step uses log n <= n/p, which follows from p <= n/log n)

so you can just use that algorithm, which has complexity O(n/p) when p <= n/log n and O(log n) otherwise.

+4

I suspect this is O(N/P) + O(P):

  • dividing the work among the P processors costs O(P)
  • combining the results produced by the P processors also costs O(P)
  • an ideal parallel search over N elements by P processors takes O(N/P) time

My naive algorithm would be (see the sketch after this list):

  • write element 0 into a CREW cell labelled "result"
  • run P completely independent searches, each over 1/P-th of the N elements
  • at the end of each search, use a CAS spin loop to replace "result" with that search's partial max if it is larger. (Depending on your definition of CREW, you may not need the spin loop.)
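A rough sketch of this scheme in Go (Go and sync/atomic for the CAS loop are my choices, purely to emulate the idea on ordinary shared memory; the answer itself does not prescribe either):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// casMax has each of p workers scan its own slice of a, then publish
// its partial max into the shared "result" cell with a CAS spin loop.
func casMax(a []int64, p int) int64 {
	var result atomic.Int64
	result.Store(a[0]) // step 1: seed the shared cell with element 0

	n := len(a)
	if p > n {
		p = n
	}

	var wg sync.WaitGroup
	for i := 0; i < p; i++ { // step 2: p independent searches
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			lo, hi := i*n/p, (i+1)*n/p
			m := a[lo]
			for _, v := range a[lo+1 : hi] {
				if v > m {
					m = v
				}
			}
			// step 3: CAS spin loop - overwrite only if our partial max is larger.
			for {
				cur := result.Load()
				if m <= cur || result.CompareAndSwap(cur, m) {
					break
				}
			}
		}(i)
	}
	wg.Wait()
	return result.Load()
}

func main() {
	fmt.Println(casMax([]int64{3, 1, 4, 1, 5, 9, 2, 6}, 4)) // prints 9
}
```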
+1

For P = N^2, this is O(1).

All processors initialize a boolean array: CannotBeMax[i] = FALSE.

Processor (i, j) sets CannotBeMax[i] = (A[i] < A[j]).

The max is the A[i] for which CannotBeMax[i] == FALSE.

Note that all concurrent writes attempt to write identical values, so the answer is consistent no matter which one succeeds.
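A toy simulation of the idea in Go (the language and the function name are my choices for illustration only; one goroutine stands in for each of the N^2 processors (i, j), and atomic booleans stand in for the model's concurrent writes of identical values):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// constantTimeMax simulates the N^2-processor scheme: "processor" (i, j)
// marks i as "cannot be the max" whenever A[i] < A[j]. On the parallel
// machine all N^2 comparisons happen in a single step.
func constantTimeMax(a []int) int {
	n := len(a)
	cannotBeMax := make([]atomic.Bool, n) // all initialized to FALSE

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		for j := 0; j < n; j++ {
			wg.Add(1)
			go func(i, j int) { // one goroutine per (i, j) pair
				defer wg.Done()
				if a[i] < a[j] {
					cannotBeMax[i].Store(true) // all writers store the same value
				}
			}(i, j)
		}
	}
	wg.Wait()

	for i := range cannotBeMax {
		if !cannotBeMax[i].Load() {
			return a[i]
		}
	}
	return a[0] // unreachable for non-empty input
}

func main() {
	fmt.Println(constantTimeMax([]int{3, 1, 4, 1, 5, 9, 2, 6})) // prints 9
}
```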

+1
