A random access container that doesn't fit in memory?

I have an array of objects (say, images) that is too large to fit in memory (e.g. 40 GB). But my code should be able to randomly access these objects at runtime.

What is the best way to do this?

From my code's point of view it should not matter whether some of the data is on disk or temporarily held in memory; the container must provide transparent access:

container.getObject(1242)->process();
container.getObject(479431)->process();

But how do I implement this container? Should it just send queries to a database? If so, which one would be the best option? (If a database, it should be free and not require complicated administration; maybe Berkeley DB or SQLite?)

Should I just implement it myself, caching objects after access and clearing memory when it is full? Or are there good libraries (C++) for this?

The requirements for the container are that it minimize disk access (some elements are accessed much more often than others by my code, so they should be kept in memory) and that it provide fast access.

UPDATE: I have found that STXXL does not work for my problem, because the objects stored in the container have dynamic sizes; that is, my code can update them (increase or decrease the size of some objects) at runtime. STXXL cannot handle this:

STXXL containers assume that the data types that they store are plain old data types (PODs). http://algo2.iti.kit.edu/dementiev/stxxl/report/node8.html

Could you comment on other solutions? What about using a database, and if so, which one?

c++ database memory data-structures random-access
5 answers

Use STXXL:

The core of STXXL is an implementation of the C++ standard template library (STL) for external-memory (out-of-core) computation, i.e. STXXL implements containers and algorithms that can process huge volumes of data that only fit on disk. While STL compatibility supports ease of use and compatibility with existing applications, another design priority is high performance.


You could look at memory-mapped files: map the file into your address space and access the objects through the mapping; the operating system takes care of paging data in and out for you.


I would use a basic cache. Given that working-set size, you will probably get the best results with a set-associative cache with x cache lines (where x is whatever best suits your access pattern). Just implement in software what every modern processor already has in hardware; that should, IMHO, give you the best results. You can optimize further if you can make the access pattern somewhat more linear.


One solution is to use a structure like a B-tree, with an index and "pages" of arrays or vectors. The idea is that the index determines which page must be loaded into memory to access a given element.

If the page size is small, you can keep several pages in memory at once. A caching policy based on frequency of use (or some other rule) will reduce the number of page loads.


I have seen very clever code that overloads operator[]() to access the disk on the fly and transparently load the needed data from the disk/database.

