There is overhead, and it can be significant in rare cases (for example, in micro-benchmarks), regardless of the optimizations that are in place (and there are many). The normal case, though, is optimized for uncontended manipulation of an object's reference count.
So the question is: if reference counting is so bad for threading, how does Objective-C do it?
There are multiple locks in play and, effectively, a retain/release on any given object picks a lock pseudo-randomly (but always the same lock for that object). This reduces lock contention without requiring one lock per object.
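Conceptually this is plain lock striping. The sketch below is not the actual runtime code; the table size, the bit-mixing hash, and the sketch_retain function are invented for illustration, but they show how hashing an object's address into a fixed table of locks gives each object a stable lock while keeping the table small:

```objc
#include <os/lock.h>
#include <stdint.h>
#include <stddef.h>

#define LOCK_COUNT 64   // hypothetical table size, chosen for illustration

// A zero-initialized os_unfair_lock is a valid unlocked lock, so a
// static array starts out ready to use.
static os_unfair_lock ref_locks[LOCK_COUNT];

// Hash the object's address to a slot. The same address always maps to
// the same slot, so a given object is always protected by the same lock,
// while unrelated objects usually land on different locks.
static os_unfair_lock *lock_for_object(const void *obj) {
    uintptr_t h = (uintptr_t)obj;
    h ^= h >> 12;                    // mix bits so nearby addresses spread out
    return &ref_locks[h % LOCK_COUNT];
}

// Hypothetical retain path: take the object's lock, bump a count kept in
// some side table, unlock.
static void sketch_retain(const void *obj, size_t *side_table_count) {
    os_unfair_lock *lock = lock_for_object(obj);
    os_unfair_lock_lock(lock);
    (*side_table_count)++;
    os_unfair_lock_unlock(lock);
}
```

Two threads retaining different objects almost never hash to the same slot, so the common case is an uncontended lock.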
(And, as Catfish_man said, some classes implement their own reference-counting scheme so they can use class-specific locking primitives to avoid contention and/or optimize for their specific needs.)
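For example, a class might bypass the shared machinery entirely and keep its own count in an instance variable with a single atomic. The class below is purely my illustration (it is not from any actual framework), and it only works under manual reference counting, since ARC forbids overriding retain/release:

```objc
// Compile with -fno-objc-arc: a hypothetical class that keeps an inline,
// atomic reference count instead of using the runtime's locked side tables.
#import <Foundation/Foundation.h>
#import <stdatomic.h>

@interface FastNode : NSObject {
    atomic_int_fast32_t _refs;
}
@end

@implementation FastNode

- (instancetype)init {
    if ((self = [super init])) {
        atomic_init(&_refs, 1);   // the reference returned by alloc/init
    }
    return self;
}

- (instancetype)retain {
    atomic_fetch_add_explicit(&_refs, 1, memory_order_relaxed);
    return self;
}

- (oneway void)release {
    // acq_rel ordering so the deallocating thread sees all writes made by
    // threads that released their references earlier.
    if (atomic_fetch_sub_explicit(&_refs, 1, memory_order_acq_rel) == 1) {
        [self dealloc];
    }
}

- (NSUInteger)retainCount {
    return (NSUInteger)atomic_load_explicit(&_refs, memory_order_relaxed);
}

@end
```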
The actual implementation details are more complex.
Is Objective-C reference counting actually technically unsafe with threads?
No; it is safe with regard to threads.
In reality, typical code calls retain and release quite infrequently compared to other operations. Thus, even if there were significant overhead on those code paths, it would be amortized across all the other operations in the app (where, say, pushing pixels to the screen is really expensive by comparison).
If an object is shared across threads (generally a bad idea), then the locking overhead needed to protect access to and manipulation of its data will usually be vastly greater than the retain/release overhead, simply because retains and releases happen so infrequently.
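To make that concrete, here is a hypothetical shared object (the class and names are mine, not from the original discussion): every access to its data has to take the lock, while retain/release traffic only occurs when a reference is stored or handed to another owner.

```objc
#import <Foundation/Foundation.h>

// Hypothetical thread-shared object: the lock is paid on every access.
@interface SharedCounter : NSObject {
    NSLock    *_lock;
    NSInteger  _value;
}
- (void)increment;
@end

@implementation SharedCounter
- (instancetype)init {
    if ((self = [super init])) {
        _lock = [[NSLock alloc] init];
    }
    return self;
}
- (void)increment {
    [_lock lock];        // taken on every single call, from any thread
    _value += 1;
    [_lock unlock];
}
@end
```

If several threads each call -increment a million times, that is millions of lock/unlock pairs, versus a handful of retains when each thread first stores its reference to the counter.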
As for the overhead of Python's GIL, I would wager that it has more to do with how often the reference count is incremented and decremented as part of normal interpreter operation.