Reduce peak memory usage with @autoreleasepool

I am working on an iPad application with a synchronization process that uses web services and Core Data in a tight loop. To reduce the memory footprint, in accordance with Apple's recommendation, I periodically create and drain an NSAutoreleasePool. This currently works great, and there are no memory problems in the current application. However, I plan to migrate to ARC, where NSAutoreleasePool is no longer valid, and I would like to maintain the same performance. I created some examples and timed them, and I wonder: what is the best way under ARC to achieve the same performance while keeping the code readable?

For testing purposes, I came up with 3 scenarios, each of which creates a string containing a number from 1 to 10,000,000. I ran each example 3 times to determine how long it took, as a 64-bit Mac application built with the Apple LLVM 3.0 compiler (without gdb, -O0) and Xcode 4.2. I also ran each example under Instruments to see its peak memory usage.

Each of the examples below is run inside the following code block:

 int main(int argc, const char *argv[]) {
     @autoreleasepool {
         NSDate *now = [NSDate date];

         // Code example...

         NSTimeInterval interval = [now timeIntervalSinceNow];
         printf("Duration: %f\n", interval);
     }
 }

NSAutoreleasePool batch [Original pre-ARC] (peak memory: ~116 KB)

 static const NSUInteger BATCH_SIZE = 1500;

 NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];

 for (uint32_t count = 0; count < MAX_ALLOCATIONS; count++) {
     NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
     [text class];
     if ((count + 1) % BATCH_SIZE == 0) {
         [pool drain];
         pool = [[NSAutoreleasePool alloc] init];
     }
 }

 [pool drain];

Execution time:
10.928158
10.912849
11.084716


Outer @autoreleasepool (peak memory: ~382 MB)

 @autoreleasepool {
     for (uint32_t count = 0; count < MAX_ALLOCATIONS; count++) {
         NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
         [text class];
     }
 }

Execution time:
11.489350
11.310462
11.344662


Inner @autoreleasepool (peak memory: ~61.2 KB)

 for (uint32_t count = 0; count < MAX_ALLOCATIONS; count++) {
     @autoreleasepool {
         NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
         [text class];
     }
 }

Execution time:
14.031112
14.284014
14.099625


@autoreleasepool with goto (peak memory: ~115 KB)

 static const NSUInteger BATCH_SIZE = 1500;

 uint32_t count = 0;

 next_batch:
 @autoreleasepool {
     for (; count < MAX_ALLOCATIONS; count++) {
         NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
         [text class];
         if ((count + 1) % BATCH_SIZE == 0) {
             count++; // Increment count manually
             goto next_batch;
         }
     }
 }

Execution time:
10.908756
10.960189
11.018382

The goto version came closest in performance, but it uses goto. Any thoughts?

Update:

Note: goto is a normal exit from an @autoreleasepool block, as indicated in the documentation, so no memory will be leaked.

On entry, a new autorelease pool is pushed. On a normal exit (break, return, goto, fall-through, and so on) the autorelease pool is popped. For compatibility with existing code, if an exit is due to an exception, the autorelease pool is not popped.
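To illustrate that guarantee, here is a hedged sketch (the helper function and its arguments are hypothetical, not from the original post): the early return is a "normal exit" from the @autoreleasepool block, so the pool is still popped and the temporary strings are released before the function returns.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returning from inside @autoreleasepool is a
// normal exit, so the pool is popped first and nothing is leaked.
static NSUInteger lengthOfFirstMultipleOfSeven(uint32_t max) {
    for (uint32_t count = 0; count < max; count++) {
        @autoreleasepool {
            NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
            if ((count + 1U) % 7 == 0) {
                // Use the autoreleased object before the pool is popped;
                // the returned NSUInteger is a plain scalar, so it
                // safely outlives the pool.
                return [text length];
            }
        }
    }
    return NSNotFound;
}
```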

2 answers

The following should achieve the same result as the goto answer, without the goto:

 for (NSUInteger count = 0; count < MAX_ALLOCATIONS;) {
     @autoreleasepool {
         for (NSUInteger j = 0; j < BATCH_SIZE && count < MAX_ALLOCATIONS; j++, count++) {
             NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
             [text class];
         }
     }
 }

Note that ARC applies significant optimizations that are not enabled at -O0. If you intend to measure performance under ARC, you must test with optimizations enabled. Otherwise, you will be measuring your hand-tuned retain/release placement against ARC in its naive mode.

Repeat the tests with optimization enabled and see what happens.
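For reference, building such a benchmark from the command line with ARC and optimization enabled might look like the following (the file name arc-perf.m is hypothetical; -fobjc-arc and -Os are standard clang flags):

```shell
# Build with ARC and -Os, then run.
# "arc-perf.m" is a hypothetical source file name.
clang -fobjc-arc -Os -framework Foundation arc-perf.m -o arc-perf
./arc-perf
```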

Update: I was curious, so I ran it myself. Here are the results of a Release (-Os) run with 7,000,000 allocations.

 arc-perf[43645:f803] outer: 8.1259
 arc-perf[43645:f803] outer: 8.2089
 arc-perf[43645:f803] outer: 9.1104
 arc-perf[43645:f803] inner: 8.4817
 arc-perf[43645:f803] inner: 8.3687
 arc-perf[43645:f803] inner: 8.5470
 arc-perf[43645:f803] withGoto: 7.6133
 arc-perf[43645:f803] withGoto: 7.7465
 arc-perf[43645:f803] withGoto: 7.7007
 arc-perf[43645:f803] non-ARC: 7.3443
 arc-perf[43645:f803] non-ARC: 7.3188
 arc-perf[43645:f803] non-ARC: 7.3098

And peak memory usage (measured with only 100,000 allocations, because Instruments was taking forever):

 Outer:    2.55 MB
 Inner:    723 KB
 withGoto: ~747 KB
 Non-ARC:  ~748 KB

These results surprise me a little. Well, the peak memory results don't; they are exactly what you would expect. But the timing difference between inner and withGoto, even with optimizations on, is larger than I expected.

Of course, this is a somewhat pathological micro-benchmark that is unlikely to model the real-world performance of any application. The takeaway here is that ARC can indeed impose some overhead, but you should always measure your actual application before making assumptions.

(Also: I tested @ipmcc's answer using nested for loops; it behaved almost identically to the goto version.)

