Limiting the number of cached objects across multiple threads

I have an application that retrieves and caches the results of a client request and sends the results to the client from the cache.

I have a limit on the number of elements that can be cached at any given time, and tracking this limit has significantly reduced application performance when processing a large number of simultaneous requests. Is there a better way to enforce the limit that doesn't lock as often, so that performance improves?

Edit: I went with the CAS approach and it seems to work very well.

+4
2 answers

First, instead of using locks, use atomic decrements and compare-and-swap to manipulate your counter. The exact syntax depends on your compiler; with GCC you can do something like:

    long remaining_cache_slots;

    void release() {
        __sync_add_and_fetch(&remaining_cache_slots, 1);
    }

    // Returns false if we've hit our cache limit
    bool acquire() {
        long prev_value, new_value;
        do {
            prev_value = remaining_cache_slots;
            if (prev_value <= 0) return false;
            new_value = prev_value - 1;
        } while (!__sync_bool_compare_and_swap(&remaining_cache_slots, prev_value, new_value));
        return true;
    }

This should significantly shrink your window of contention. However, you'll still be bouncing that cache line between cores, which at a high request rate can seriously hurt performance.
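For reference only (this is not part of the original answer): the same counter can also be written portably with C++11 std::atomic, assuming a C++11 compiler is available; the initial value and memory orderings below are my own choices.

    #include <atomic>

    // Assumed starting value; in practice, initialize this to your cache limit.
    std::atomic<long> remaining_cache_slots{1000};

    void release() {
        // Give a slot back atomically.
        remaining_cache_slots.fetch_add(1, std::memory_order_release);
    }

    // Returns false if we've hit our cache limit.
    bool acquire() {
        long prev = remaining_cache_slots.load(std::memory_order_relaxed);
        do {
            if (prev <= 0) return false;
            // On failure, compare_exchange_weak reloads prev with the current
            // value, so the loop retries with fresh data.
        } while (!remaining_cache_slots.compare_exchange_weak(
                     prev, prev - 1,
                     std::memory_order_acquire, std::memory_order_relaxed));
        return true;
    }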

If you're willing to accept a certain amount of waste (i.e., allowing the number of cached results, or rather of pending responses, to fall somewhat below the limit), you have other options. One is to make the cache thread-local, if that's possible in your design. Another is to have each thread reserve a pool of cache tokens.
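A minimal sketch of the first (thread-local) option, assuming a fixed set of worker threads; CACHE_LIMIT, NTHREADS, and the even split are illustrative assumptions, not part of the answer:

    // Each worker thread gets an equal, private share of the global limit,
    // so acquire/release never touch shared state at all.
    #define CACHE_LIMIT 1000
    #define NTHREADS    8

    __thread long my_cache_slots = CACHE_LIMIT / NTHREADS;

    bool acquire() {
        if (my_cache_slots <= 0) return false;  // this thread's share is used up
        my_cache_slots--;
        return true;
    }

    void release() {
        my_cache_slots++;
    }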

What I mean by reserving a pool of cache tokens is that each thread reserves, in advance, the right to insert N entries into the cache. When a thread removes an entry from the cache, it adds the token back to its own set; if it runs out of tokens, it acquires more from a global pool, and if it accumulates too many, it returns some of them. The code might look something like this:

    #include <algorithm>  // for std::min

    long global_token_pool;
    __thread long thread_local_token_pool = 0;

    // Release 10 tokens to the global pool when we go over 20.
    // The maximum waste for this scheme is 20 * nthreads.
    #define THREAD_TOKEN_POOL_HIGHWATER 20
    #define THREAD_TOKEN_POOL_RELEASECT 10

    // If we run out, acquire 5 tokens from the global pool
    #define THREAD_TOKEN_POOL_ACQUIRECT 5

    void release() {
        thread_local_token_pool++;
        if (thread_local_token_pool > THREAD_TOKEN_POOL_HIGHWATER) {
            thread_local_token_pool -= THREAD_TOKEN_POOL_RELEASECT;
            __sync_fetch_and_add(&global_token_pool, THREAD_TOKEN_POOL_RELEASECT);
        }
    }

    bool acquire() {
        if (thread_local_token_pool > 0) {
            thread_local_token_pool--;
            return true;
        }

        long prev_val, new_val, acquired;
        do {
            prev_val = global_token_pool;
            acquired = std::min<long>(THREAD_TOKEN_POOL_ACQUIRECT, prev_val);
            if (acquired <= 0) return false;
            new_val = prev_val - acquired;
        } while (!__sync_bool_compare_and_swap(&global_token_pool, prev_val, new_val));

        thread_local_token_pool = acquired - 1;
        return true;
    }

Batching requests against the shared pool this way reduces how often threads touch shared data, and therefore the amount of contention and cache-line bouncing. As mentioned above, though, it makes your limit a little less precise, so it needs careful tuning to strike the right balance.
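To make the shape of the calling code concrete, here is a hypothetical sketch of how acquire/release might be wired into the request path; Result, handle_overflow, cache_insert, and cache_remove are made-up names used only for illustration:

    // Called when a result arrives and we want to cache it.
    void on_result_ready(Result *result) {
        if (!acquire()) {
            // No cache slots left: drop, block, or send directly instead.
            handle_overflow(result);
            return;
        }
        cache_insert(result);   // hypothetical helper: store the result in the cache
    }

    // Called after a cached result has been sent to the client.
    void on_result_sent(Result *result) {
        cache_remove(result);   // hypothetical helper: evict the entry we just sent
        release();              // return the slot to the pool
    }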

+3

In SendResults, try updating totalResultsCached only once, after all the results have been processed. This minimizes how often the lock has to be acquired and released.

    void SendResults( int resultsToSend, Request *request )
    {
        for (int i = 0; i < resultsToSend; ++i) {
            send(request->remove());
        }

        lock totalResultsCached
        totalResultsCached -= resultsToSend;
        unlock totalResultsCached
    }

If resultsToSend is usually 1, then my suggestion won't make much difference.

Also, once the cache limit is reached, some additional requests may be dropped in ResultCallback, because SendResults no longer updates totalResultsCached immediately after each request is sent.

+1
