C++ Design: How to Cache Last-Used Items

We have a C++ application whose performance we are trying to improve. We determined that data fetching is time-consuming and want to cache the data. We cannot store all the data in memory because it is huge, so we want to keep up to 1000 items in memory. The items can be indexed by a long key. However, when the cache grows past 1000 items, we want to remove the item that has not been accessed for the longest time, since we assume some kind of locality of reference; that is, we assume that items in the cache that were accessed recently are likely to be accessed again.

Can you suggest a way to implement it?

My initial implementation was to have a map<long, CacheEntry> to store the cache and to add an accessStamp member to CacheEntry, set from an incrementing counter whenever an entry is created or accessed. When the cache is full and a new entry is needed, the code scans the entire cache map, finds the entry with the smallest accessStamp, and removes it. The problem with this is that once the cache is full, every insertion requires a full scan of the cache.
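For illustration, a minimal sketch of that initial implementation (Data is a placeholder for the extracted payload; all names here are assumptions):

    #include <map>

    struct CacheEntry {
        Data data;                  // placeholder for the extracted data
        unsigned long accessStamp;  // updated from the counter on each access
    };

    std::map<long, CacheEntry> cache;
    unsigned long counter = 0;

    void insert(long key, const Data& data) {
        if (cache.size() >= 1000) {
            // Full scan to find the least recently used entry.
            std::map<long, CacheEntry>::iterator oldest = cache.begin();
            for (std::map<long, CacheEntry>::iterator it = cache.begin();
                 it != cache.end(); ++it) {
                if (it->second.accessStamp < oldest->second.accessStamp)
                    oldest = it;
            }
            cache.erase(oldest);
        }
        CacheEntry entry = { data, ++counter };
        cache[key] = entry;
    }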

Another idea was to keep a list of CacheEntries in addition to the cache map, and on each access to move the accessed entry to the front of the list; but the problem then was how to quickly find that entry in the list.

Can you suggest a better approach?

Thanks,
splintor

+5
c++ design caching
Dec 20 '09 at 12:59
10 answers

Have a map<long, CacheEntry>, but instead of an access timestamp, put two links to other CacheEntry objects in CacheEntry, so that the entries form a doubly linked list. Whenever an entry is accessed, move it to the head of the list (a constant-time operation). This way you can both find a cache entry easily, since it is reachable from the map, and remove the least recently used entry, since it is at the tail of the list (my preference is to make doubly linked lists circular, so that a pointer to the head is enough to get fast access to the tail as well). Also remember to store the key you used in the map in the CacheEntry, so that you can delete the entry from the map when it is evicted from the cache.
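A minimal sketch of this scheme (non-circular for brevity; Data is a placeholder type and all names are assumptions, not the answerer's code). Note that std::map nodes are stable in memory, so the prev/next pointers stay valid across other insertions:

    #include <cstddef>
    #include <map>

    struct CacheEntry {
        long key;          // kept so eviction can erase the map entry
        Data data;         // placeholder for the cached payload
        CacheEntry* prev;
        CacheEntry* next;
    };

    class LruCache {
        std::map<long, CacheEntry> map_;
        CacheEntry* head_;              // most recently used
        CacheEntry* tail_;              // least recently used
        std::size_t maxSize_;

        void unlink(CacheEntry* e) {
            if (e->prev) e->prev->next = e->next; else head_ = e->next;
            if (e->next) e->next->prev = e->prev; else tail_ = e->prev;
        }
        void pushFront(CacheEntry* e) {
            e->prev = 0;
            e->next = head_;
            if (head_) head_->prev = e; else tail_ = e;
            head_ = e;
        }

    public:
        explicit LruCache(std::size_t maxSize)
            : head_(0), tail_(0), maxSize_(maxSize) {}

        Data* find(long key) {
            std::map<long, CacheEntry>::iterator it = map_.find(key);
            if (it == map_.end()) return 0;
            unlink(&it->second);          // move to the head: O(1)
            pushFront(&it->second);
            return &it->second.data;
        }

        // Assumes the key is not already cached (call find() first).
        void insert(long key, const Data& data) {
            if (map_.size() >= maxSize_) {
                CacheEntry* lru = tail_;  // evict the least recently used
                unlink(lru);
                map_.erase(lru->key);
            }
            CacheEntry& e = map_[key];
            e.key = key;
            e.data = data;
            pushFront(&e);
        }
    };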

+14
Dec 20 '09 at 13:27

Scanning a map of 1000 elements will take very little time, and the scan will only be performed when the item is not in the cache, which, if your locality-of-reference ideas are correct, should be a small proportion of the time. Of course, if your ideas are wrong, the cache is probably a waste of time.

+10
Dec 20 '09 at 13:04

Update: I got it now ...

It should be fast enough. Warning: some pseudocode ahead.

    #include <algorithm>
    #include <map>
    #include <vector>

    // accesses contains a list of ids. The most recently used id is at
    // front(), the least recently used id is at back().
    std::vector<long> accesses;
    std::map<long, CachedItem*> cache;

    CachedItem* get(long id) {
        std::map<long, CachedItem*>::iterator it = cache.find(id);
        if (it != cache.end()) {
            // In cache: move id to the front of accesses.
            std::vector<long>::iterator pos =
                std::find(accesses.begin(), accesses.end(), id);
            if (pos != accesses.begin()) {
                accesses.erase(pos);
                accesses.insert(accesses.begin(), id);
            }
            return it->second;
        }
        // Not in cache: fetch and add it.
        CachedItem* item = noncached_fetch(id);
        accesses.insert(accesses.begin(), id);
        cache[id] = item;
        if (accesses.size() > 1000) {
            // Evict the least recently used item.
            cache.erase(accesses.back());
            accesses.pop_back();
        }
        return item;
    }

The inserts and erases may be a little expensive, but they may also not be too bad given the locality (few cache misses). Anyway, if they become a big problem, you could switch to std::list.

+2
Dec 20 '09 at 13:12

An alternative implementation that might make "aging" the elements easier, at the cost of lower search efficiency, would be to keep the CacheEntry elements in a std::list (or use a std::list of std::pair<long, CacheEntry>). New items get added at the front of the list, so they "migrate" towards the tail of the list as they age. When you check whether an item is already in the cache, you scan the list (admittedly an O(n) operation, versus the O(log n) lookup in a map). If you find it, you remove it from its current location and re-insert it at the head of the list. If the list length exceeds 1000 items, you remove the required number of items from the tail of the list to trim it back below 1000 items.
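A minimal sketch of this variant, assuming a noncached_fetch() helper that loads a missing item (both names are illustrative):

    #include <list>
    #include <utility>

    typedef std::list<std::pair<long, CacheEntry> > CacheList;
    CacheList cache;

    CacheEntry& get(long key) {
        // O(n) scan, as noted above.
        for (CacheList::iterator it = cache.begin(); it != cache.end(); ++it) {
            if (it->first == key) {
                // Hit: splice the node back to the head of the list, O(1).
                cache.splice(cache.begin(), cache, it);
                return cache.front().second;
            }
        }
        // Miss: fetch, add at the head, trim the tail if over capacity.
        cache.push_front(std::make_pair(key, noncached_fetch(key)));
        while (cache.size() > 1000)
            cache.pop_back();
        return cache.front().second;
    }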

+2
Dec 20 '09 at 13:26

My approach would be to have a hash table for fast lookup of the stored objects, plus a linked list that maintains the order of last use.

When an object is requested:

1) Look it up in the hash table.

2a) If it is found (the hash table value holds a pointer to the object's node in the linked list), move that node to the head of the linked list.

2b) If it is not found, remove the last node from the linked list and delete its entry from the hash table; then put the new object into the hash table and at the head of the linked list. (A sketch of this follows the example below.)

For example, let's say we have a cache for only 3 objects.

Sequence of requests: 1, 3, 2, 1, 4.

1) Hash table: [1] Linked list: [1]

2) Hash table: [1, 3] Linked list: [3, 1]

3) Hash table: [1,2,3] Linked list: [2,3,1]

4) Hash table: [1,2,3] Linked list: [1,2,3]

5) Hash table: [1,2,4] Linked list: [4,1,2] => 3 out
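A minimal sketch of this scheme, storing list iterators as the hash table values so that step 2a is O(1). Here std::unordered_map stands in for the hash table (std::tr1::unordered_map on older compilers) and Object is a placeholder type; splicing a std::list node does not invalidate iterators, so the table entries stay valid:

    #include <cstddef>
    #include <list>
    #include <unordered_map>

    class LruCache {
        typedef std::list<std::pair<long, Object> > List;
        typedef std::unordered_map<long, List::iterator> Table;
        List order_;            // front = most recently used
        Table table_;           // key -> node in order_
        std::size_t capacity_;

    public:
        explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

        Object* find(long key) {
            Table::iterator it = table_.find(key);
            if (it == table_.end()) return 0;
            // 2a) Hit: move the node to the head of the list, O(1).
            order_.splice(order_.begin(), order_, it->second);
            return &order_.front().second;
        }

        void insert(long key, const Object& obj) {
            if (table_.size() >= capacity_) {
                // 2b) Full: drop the least recently used object.
                table_.erase(order_.back().first);
                order_.pop_back();
            }
            order_.push_front(std::make_pair(key, obj));
            table_[key] = order_.begin();
        }
    };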

+2
09 Oct '10 at 9:32

Create a std::priority_queue<std::map<long, CacheEntry>::iterator> with a comparator that orders by the access stamp. For an insert, first pop the last item off the queue and erase it from the map. Then insert the new item into the map, and finally push its iterator onto the queue.

+1
Dec 20 '09 at 13:18

As a simpler alternative, you could create a map that grows indefinitely and clears itself every 10 minutes or so (adjust the interval for your expected traffic).

You can also record some very interesting statistics this way.
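A minimal sketch of the idea, assuming a noncached_fetch() that loads missing items (the interval and all names are illustrative):

    #include <ctime>
    #include <map>

    std::map<long, CacheEntry> cache;
    std::time_t lastClear = std::time(0);

    CacheEntry& get(long key) {
        if (std::time(0) - lastClear > 600) {   // wipe every 10 minutes
            cache.clear();                      // a good place to log statistics
            lastClear = std::time(0);
        }
        std::map<long, CacheEntry>::iterator it = cache.find(key);
        if (it == cache.end())
            it = cache.insert(std::make_pair(key, noncached_fetch(key))).first;
        return it->second;
    }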

0
Dec 20 '09 at 13:17

I agree with Neil; scanning 1000 elements takes no time.

But if you still want to avoid the scan, you could use the additional list you propose, and, to avoid scanning the whole list each time, store in your map not just the CacheEntry but also a pointer to the list node that corresponds to that entry.

0
Dec 20 '09 at 13:21

I think this is a good candidate for treaps. The priority would be the time (virtual or otherwise), in ascending order (oldest at the root), with the long as the key.

There is also the second chance algorithm, which is good for caches. Although you lose the ability to search, it won't have much impact if you only have 1000 elements.

The naive method would be to have a map coupled with a priority queue, wrapped in a class. You use the map for searching and the priority queue for deletions (first pop from the queue, grab the item, and then erase the key from the map).
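Not from the answer itself, but a minimal sketch of the second-chance (clock) policy it mentions, with CacheEntry as a placeholder payload type (std::tr1::unordered_map on older compilers):

    #include <cstddef>
    #include <unordered_map>
    #include <vector>

    struct Slot {
        long key;
        CacheEntry value;
        bool referenced;    // the "second chance" bit
    };

    class ClockCache {
        std::vector<Slot> slots_;                      // circular buffer
        std::unordered_map<long, std::size_t> index_;  // key -> slot position
        std::size_t hand_;                             // the clock hand
        std::size_t capacity_;

    public:
        explicit ClockCache(std::size_t capacity)
            : hand_(0), capacity_(capacity) {}

        CacheEntry* find(long key) {
            std::unordered_map<long, std::size_t>::iterator it = index_.find(key);
            if (it == index_.end()) return 0;
            slots_[it->second].referenced = true;  // grant a second chance
            return &slots_[it->second].value;
        }

        void insert(long key, const CacheEntry& value) {
            if (slots_.size() < capacity_) {       // still filling up
                index_[key] = slots_.size();
                Slot s = { key, value, true };
                slots_.push_back(s);
                return;
            }
            // Sweep the hand, clearing bits, until a victim is found.
            while (slots_[hand_].referenced) {
                slots_[hand_].referenced = false;
                hand_ = (hand_ + 1) % capacity_;
            }
            index_.erase(slots_[hand_].key);
            index_[key] = hand_;
            Slot s = { key, value, true };
            slots_[hand_] = s;
            hand_ = (hand_ + 1) % capacity_;
        }
    };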

0
Dec 20 '09 at 13:54

Another option might be to use boost::multi_index. It is designed to separate the index from the data, and so allows multiple indexes over the same data.

I'm not sure it would actually be faster than scanning through 1000 items, though. It might also use more memory than it's worth, or slow down searching and/or insertion/deletion.
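A minimal sketch along the lines of the MRU-list example in the Boost.MultiIndex documentation, assuming Boost is available (Item, CacheEntry, and the 1000 limit are assumptions):

    #include <boost/multi_index_container.hpp>
    #include <boost/multi_index/hashed_index.hpp>
    #include <boost/multi_index/member.hpp>
    #include <boost/multi_index/sequenced_index.hpp>

    using namespace boost::multi_index;

    struct Item {
        long key;
        CacheEntry value;
    };

    typedef multi_index_container<
        Item,
        indexed_by<
            sequenced<>,                                    // index 0: usage order
            hashed_unique<member<Item, long, &Item::key> >  // index 1: by key
        >
    > Cache;

    Cache cache;

    const CacheEntry* find(long key) {
        Cache::nth_index<1>::type& byKey = cache.get<1>();
        Cache::nth_index<1>::type::iterator it = byKey.find(key);
        if (it == byKey.end()) return 0;
        // Move the hit to the front of the sequenced index, O(1).
        cache.relocate(cache.begin(), cache.project<0>(it));
        return &cache.begin()->value;
    }

    void insert(const Item& item) {
        cache.push_front(item);   // insertion fails if the key already exists
        if (cache.size() > 1000)
            cache.pop_back();     // drop the least recently used item
    }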

0
Dec 20 '09 at 14:40


