A random access container that doesn't fit in memory?

I have an array of objects (say, images) that is too large to fit in memory (e.g. 40 GB). But my code should be able to randomly access these objects at runtime.

What is the best way to do this?

From my code's point of view it should not matter whether some of the data is on disk or temporarily held in memory; the container must provide transparent access:

container.getObject(1242)->process();
container.getObject(479431)->process();

But how do I implement this container? Should it just send queries to a database? If so, which one would be the best option? (If a database, it should be free and not require complicated administration; maybe Berkeley DB or SQLite?)

Should I just implement it myself, caching objects after access and clearing memory when it is full? Or are there good libraries (C++) for this?

The requirements for the container are that it minimize disk access (some elements are accessed much more often than others by my code, so they should be kept in memory) and that it provide fast access.

UPDATE: I have found that STXXL does not work for my problem, because the objects stored in the container have dynamic sizes; that is, my code can update them (increase or decrease the size of some objects) at runtime. STXXL cannot handle this:

STXXL containers assume that the data types that they store are plain old data types (PODs). http://algo2.iti.kit.edu/dementiev/stxxl/report/node8.html

Could you comment on other solutions? What about using a database, and if so, which one?

c++ database memory data-structures random-access
5 answers

Use STXXL:

The core of STXXL is an implementation of the C++ standard template library (STL) for external-memory (out-of-core) computation, i.e. STXXL implements containers and algorithms that can process huge volumes of data that only fit on disk. While STL compatibility supports ease of use and compatibility with existing applications, another design priority is high performance.


You could look at memory-mapped files: map the file into your address space and access the objects through the mapping; the operating system takes care of paging data in and out for you.


I would use a basic cache. Given that working-set size, you will probably get the best results with a set-associative cache with x cache lines (where x is whatever best suits your access pattern). Just implement in software what every modern processor already has in hardware; that should, IMHO, give you the best results. You can optimize further if you can make the access pattern somewhat more linear.


One solution is to use a structure like a B-tree, with an index and "pages" of arrays or vectors. The idea is that the index determines which page must be loaded into memory to access a given element.

If the page size is small, you can keep several pages in memory at once. A caching policy based on frequency of use (or some other rule) will reduce the number of page loads.


I have seen very clever code that overloads operator[]() to access the disk on the fly and transparently load the needed data from the disk/database.

