Parallel implementation of the reduction algorithm

I studied the implementations of the [inject, fold, whatever you want to call it] reduction function in Objective-C using blocks and wondered if there were any methods for parallelizing computations that use the associative function (for example, the sum of a set of integers )?

i.e. is it possible to parallelize or improve something like this on NSArray:

- (id)reduceWithBlock:(id (^)(id memo, id obj))block andAccumulator:(id)accumulator { id acc = [[accumulator copy] autorelease]; for (id obj in self) { acc = block(acc, obj); } return acc; } 

Using central central dispatch?

EDIT: I made a second attempt, breaking the array into smaller pieces and reducing them in separate send queues, but in my testing there is no noticeable performance increase: (here)

+4
source share
2 answers

You can use dispatch_apply with Dispatch Global Queue to parallelize it, but your code seems to be less efficient at the same time. Because the storage object requires exclusive access and is used tightly by the unit, this way it will cause a gigantic lock for the battery object.

For example, this code is almost non-competitive, although when using dispatch_apply with Dispatch Global Queue.

 dispatch_semaphore_t sema = dispatch_semaphore_create(1); dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0); dispatch_apply([array count], queue, ^(size_t index) { dispatch_semaphore_wait(sema, DISPATCH_TIME_FOREVER); acc = block(acc, [array objectAtIndex:index]); dispatch_semaphore_signal(sema); }); dispatch_release(sema); 

For efficient parallelization, you need to separate the block and the implementation of the battery.

Edition:

(I did not check your code algorithm.)

 dispatch_queue_t result_queue = dispatch_queue_create(NULL, NULL); 

You are using a sequential queue. A sequential queue executes one block at a time. So it could be

 dispatch_queue_t result_queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0); 

or

 dispatch_queue_t result_queue = dispatch_queue_create(NULL, DISPATCH_QUEUE_CONCURRENT); /* DISPATCH_QUEUE_CONCURRENT is only available OS X 10.7/iOS 4.3 or later. */ 
+6
source

I implemented a parallel separation and conquest algorithm that works with associative functions here . Unfortunately, I could not get from this noticeable acceleration, so at the moment I'm sticking to a simple serial version. I believe that my basic case needs to be optimized - I read somewhere that the inequality n >= p^2 should be fulfilled, where n is the number of jobs and p is the number of processors.

Obviously, a lot of time is wasted when splitting and recursing the array if someone has suggestions that they really like.

+1
source

All Articles