How many ABA tag bits are needed in non-blocking structures?

One popular solution to the ABA problem in non-blocking structures is to pair each pointer with an additional monotonically increasing tag.

struct aba { void *ptr; uint32_t tag; }; 

However, this approach has a problem: it is very slow and has serious cache problems. I get a 2x speedup if I drop the tag field. But is that unsafe?

So, my next attempt, for 64-bit platforms, stuffs the tag bits into the ptr field.

 struct aba { uintptr_t __ptr; }; uint32_t get_tag(struct aba aba) { return aba.__ptr >> 48U; } 
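For reference, here is a minimal sketch of that packing in both directions. It assumes user-space addresses fit in the low 48 bits with the upper 16 bits zero, which is typical on current x86-64 systems but not guaranteed everywhere. The names PTR_BITS, aba_pack, aba_ptr and aba_tag are made up for illustration and are not part of the code above.

```c
#include <assert.h>
#include <stdint.h>

#define PTR_BITS 48U                               /* assumed usable address bits */
#define PTR_MASK ((UINT64_C(1) << PTR_BITS) - 1)

struct aba { uint64_t bits; };

/* Pack a pointer and a 16-bit tag into one 64-bit word. */
static struct aba aba_pack(void *ptr, uint16_t tag)
{
    uint64_t p = (uint64_t)(uintptr_t)ptr;
    assert((p & ~PTR_MASK) == 0);                  /* upper 16 bits must be free */
    return (struct aba){ p | ((uint64_t)tag << PTR_BITS) };
}

/* Recover the pointer by masking off the tag bits. */
static void *aba_ptr(struct aba a)
{
    return (void *)(uintptr_t)(a.bits & PTR_MASK);
}

/* Recover the tag from the upper 16 bits. */
static uint16_t aba_tag(struct aba a)
{
    return (uint16_t)(a.bits >> PTR_BITS);
}
```

The whole word fits a single 64-bit CAS, which is where the speedup over the two-field struct would come from. The tag wraps after 65536 increments.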

But someone told me that a 16-bit tag is unsafe. My new plan is to exploit the cache-line alignment of the pointers to stuff in more tag bits, but I want to know whether this will work.

If it does not work, my next plan is to use the Linux MAP_32BIT mmap flag for the allocated data, so that I only need 32 bits of pointer space.

How many bits do I need for an ABA tag in lock-free data structures?

+8
multithreading lock-free thread-synchronization
3 answers

The number of tag bits that is safe in practice can be estimated from the preemption time and the frequency of pointer modifications.

As a reminder, the ABA problem occurs when a thread reads the value it wants to change with compare-and-swap, gets preempted, and when it resumes, the current value of the pointer happens to equal what the thread read before. The compare-and-swap can therefore succeed despite modifications that other threads may have made to the data structure during the preemption.

The idea of adding a monotonically increasing tag is to make each pointer modification unique. For this to work, the increments must produce unique tag values for as long as the modifying thread may be preempted; that is, for guaranteed correctness, the tag must not wrap around during the entire preemption period.

Assume that preemption lasts for a single OS scheduling interval, which typically ranges from several tens to hundreds of milliseconds. CAS latency on modern systems ranges from tens to hundreds of nanoseconds. So a rough worst-case estimate is that millions of pointer modifications can happen while a thread is preempted, and the tag therefore needs 20+ bits so that it does not wrap around.

In practice, a more accurate estimate can be made for a particular real use case from the known frequency of CAS operations. The worst-case preemption time also needs to be estimated more carefully; for example, a low-priority thread preempted by a higher-priority task can end up with a much longer preemption time.
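The arithmetic behind that estimate can be sketched as a small helper. This is only an illustration: the function name tag_bits_needed and the example inputs (100 ms preemption, 100 ns or 10 ns per CAS) are assumptions picked from the ranges above, not measurements.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest number of tag bits b such that 2^b exceeds the number of
   pointer modifications that can happen during one preemption. */
static unsigned tag_bits_needed(uint64_t preempt_ns, uint64_t cas_ns)
{
    assert(cas_ns > 0);                        /* avoid division by zero */
    uint64_t mods = preempt_ns / cas_ns;       /* worst-case modification count */
    unsigned bits = 0;
    while (bits < 64 && (UINT64_C(1) << bits) <= mods)
        bits++;
    return bits;
}
```

With a 100 ms preemption and 100 ns per CAS, this gives 1,000,000 modifications and 20 bits; with 10 ns per CAS it gives 24 bits. Longer preemptions push the number higher.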

+3

According to the paper

http://web.cecs.pdx.edu/~walpole/class/cs510/papers/11.pdf Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects (IEEE Transactions on Parallel and Distributed Systems, VOL. ...), the tag bits

must be sized to make wraparound impossible in real lock-free scenarios (I read this as: if you have N threads and each of them can access the structure, you should have N + 1 different states for the tag):

6.1.1 IBM ABA-Prevention Tags

The earliest and simplest lock-free method for node reuse is the tag (update counter) method introduced with the documentation of CAS on the IBM System 370 [11]. It requires associating a tag with each location that is the target of ABA-prone comparison operations. By incrementing the tag whenever the value of the associated location is written, comparison operations (for example, CAS) can determine whether the location has been written since it was last accessed by the same thread, thereby preventing the ABA problem. The method requires that the tag contain enough bits to make full wraparound impossible during the execution of any single lock-free attempt. This method is very efficient and allows the immediate reuse of retired nodes.

+2

Depending on the data structure, you may be able to steal some extra bits from the pointers. For example, if the objects are 64 bytes in size and always aligned to 64 bytes, the lower 6 bits of each pointer can be used for the tag (but this is probably what you already had in mind for your new plan).
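A minimal sketch of stealing the low bits, assuming every object really is 64-byte aligned. The names ALIGN_BITS, low_pack, low_ptr and low_tag are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

#define ALIGN_BITS 6U                                    /* log2(64): assumed alignment */
#define LOW_MASK   ((uintptr_t)((1U << ALIGN_BITS) - 1))

/* Store the low 6 bits of the tag in the pointer's alignment bits. */
static uintptr_t low_pack(void *ptr, unsigned tag)
{
    uintptr_t p = (uintptr_t)ptr;
    assert((p & LOW_MASK) == 0);                         /* pointer must be 64-byte aligned */
    return p | ((uintptr_t)tag & LOW_MASK);
}

/* Clear the tag bits to get the real pointer back. */
static void *low_ptr(uintptr_t word)
{
    return (void *)(word & ~LOW_MASK);
}

/* Extract the 6-bit tag. */
static unsigned low_tag(uintptr_t word)
{
    return (unsigned)(word & LOW_MASK);
}
```

On their own, 6 bits give only 64 tag values. Added to the 16 upper bits from the question, that would make 22 bits in total.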

Another option would be to use an index to your objects instead of pointers.

For contiguous objects this would, of course, simply be an index into an array or vector. For lists or trees with objects allocated on the heap, you can use your own allocator and use an index into the allocated blocks.

For about 17M objects you only need 24 bits, leaving 40 bits for the tag.

Obtaining the address requires a (small and fast) additional calculation, but if the alignment is a power of 2, only a shift and an addition are needed.
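Here is a rough sketch of the index-plus-tag layout using C11 atomics. Everything in it is an assumption made for illustration: the 24/40 bit split, the 64-byte object size (OBJ_SHIFT), the pool_base pointer from a hypothetical custom allocator, and the ref_* function names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define IDX_BITS  24U                              /* 2^24 = 16,777,216 objects */
#define IDX_MASK  ((UINT64_C(1) << IDX_BITS) - 1)
#define OBJ_SHIFT 6U                               /* assumed 64-byte objects */

/* Combine a 24-bit index with a tag; tag bits above 40 shift out. */
static uint64_t ref_make(uint32_t idx, uint64_t tag)
{
    assert(idx <= IDX_MASK);
    return (tag << IDX_BITS) | idx;
}

static uint32_t ref_index(uint64_t r) { return (uint32_t)(r & IDX_MASK); }
static uint64_t ref_tag(uint64_t r)   { return r >> IDX_BITS; }

/* Address = pool base + (index << shift): one shift and one addition. */
static void *ref_addr(void *pool_base, uint64_t r)
{
    return (char *)pool_base + ((uintptr_t)ref_index(r) << OBJ_SHIFT);
}

/* Swing *head to new_idx only if it still equals expected; the tag is
   bumped on every successful update, so a stale value fails the CAS. */
static bool ref_cas(_Atomic uint64_t *head, uint64_t expected, uint32_t new_idx)
{
    uint64_t desired = ref_make(new_idx, ref_tag(expected) + 1);
    return atomic_compare_exchange_strong(head, &expected, desired);
}
```

A thread would load *head, look up the object with ref_addr, and then call ref_cas with the value it loaded. If another thread changed the head in the meantime, even back to the same index, the tag differs and the CAS fails, unless the 40-bit tag has wrapped all the way around.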

+1