Using a large Python memory variable

Let's say that at runtime a dict variable grows very large, up to millions of key-value pairs.

Is this variable stored in RAM, effectively using up all available memory and slowing down the rest of the system?

I know that asking the interpreter to display the entire dict is a bad idea, but would it be okay to access one key at a time?

+7
variables python memory ram
4 answers

Yes, the dict will be stored in the process memory. So if it gets large enough that there isn't enough room in the system's RAM, you can expect a significant slowdown as the system starts swapping memory to and from disk.

Others have said that a few million items shouldn't pose a problem; I'm not so sure. The dict overhead itself costs a fair amount of memory (before counting the memory taken by the keys and values). For Python 2.6 or later, sys.getsizeof gives some useful information about how much RAM various Python structures take. Some quick results, from Python 2.6 on a 64-bit OS X machine:

>>> from sys import getsizeof
>>> getsizeof(dict((n, 0) for n in range(5462)))/5462.
144.03368729403149
>>> getsizeof(dict((n, 0) for n in range(5461)))/5461.
36.053470060428495

So the dict overhead varies between 36 and 144 bytes per item on this machine (the exact value depending on how full the dictionary's internal hash table is; here 5461 = 2**14 // 3 is one of the thresholds at which the internal hash table is enlarged). And that is before adding the overhead for the dict items themselves; if they are all short strings (6 characters or less, say), that still adds >= 80 bytes per item (possibly less if many different keys share the same value).

So it would not take many millions of dict items to exhaust RAM on a typical machine.
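As a rough back-of-the-envelope check (my own arithmetic, using the worst-case figures from the measurements above): ten million items at 144 bytes of dict overhead plus roughly 80 bytes for a short string key and value come to about 2 GiB:

# Rough estimate only; the real numbers depend on the Python build,
# the hash table's fill level, and the actual keys and values.
items = 10 * 10**6            # ten million entries
slot_overhead = 144           # worst-case bytes per slot, measured above
string_cost = 80              # assumed cost of a short string key plus value
total = items * (slot_overhead + string_cost)
print("%.1f GiB" % (total / 2.0**30))  # about 2.1 GiB on these assumptions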

+8
source share

The main concern with millions of items is not the dictionary itself, but how much space each of those items takes up. Still, unless you are doing something strange, they should probably fit.

If you have a dict with millions of keys, though, you are probably doing something wrong. You should do one or both of:

  • Figure out what data structure you should actually be using, because a single dict is probably not the right answer. Exactly what it should be depends on what you are doing.

  • Use a database. Your Python should ship with the sqlite3 module, which is enough to get started (see the sketch below).
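
As a minimal illustration of the sqlite3 route (my own sketch, not part of the original answer; the file, table, and column names are invented for the example), a disk-backed key-value table keeps the data out of RAM while still allowing fast single-key lookups:

import sqlite3

# A throwaway on-disk key-value table; "kv.db", "kv", "k", and "v"
# are arbitrary names chosen for this example.
conn = sqlite3.connect("kv.db")
conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
conn.executemany("INSERT OR REPLACE INTO kv VALUES (?, ?)",
                 (("key%d" % i, "value%d" % i) for i in range(1000)))
conn.commit()

# Look up one key at a time; the PRIMARY KEY index makes this fast.
row = conn.execute("SELECT v FROM kv WHERE k = ?", ("key42",)).fetchone()
print(row[0])
conn.close()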

+5

Yes, a Python dict is stored in RAM. A few million keys are not a problem for modern computers, however. If you need more and more data and RAM is running out, consider using a real database. Options include a relational database such as SQLite (built into Python, incidentally) or a key-value store such as Redis.
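
For the key-value-store option, here is a minimal sketch (my addition, not from the answer) using the third-party redis-py client, assuming a Redis server is already running locally on the default port:

import redis  # third-party package: pip install redis

# Assumes a Redis server listening on localhost:6379.
r = redis.Redis(host="localhost", port=6379)
r.set("user:42", "alice")   # store a single key-value pair
print(r.get("user:42"))     # fetch one key at a time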

Asking the interpreter to display millions of items makes little sense, but accessing a single key should be very efficient.
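
To make the "single lookup stays fast" claim concrete, here is a small sketch (my own, not part of the answer) using the standard timeit module to time one-key access in a dict with a million entries:

import timeit

# Time a single-key lookup in a dict with a million entries.
setup = "d = dict((i, i) for i in range(10**6))"
t = timeit.timeit("d[123456]", setup=setup, number=10**6)
# With number=10**6 runs, total seconds == microseconds per lookup.
print("%.3f microseconds per lookup" % t)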

+4

For all I know, Python uses the best available hashing algorithms, so you are probably going to get the best possible memory efficiency and performance. Whether the whole thing is stored in RAM or moved to a swap file is up to your OS and depends on the amount of RAM you have. What I'd say is best is to just try it:

# Build a dict with ten million integer keys.
# (xrange is Python 2; use range on Python 3.)
a = {}
for i in xrange(10 * 10**6):
    a[i] = i

How does it look when you run it? It takes about 350 MB on my system, which should be quite manageable.
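
If you want to measure this yourself rather than watch a process monitor, one option (my addition; Unix-only) is the peak-RSS counter in the standard resource module:

import resource

a = {}
for i in range(10 * 10**6):  # use xrange on Python 2 to avoid a temporary list
    a[i] = i

# ru_maxrss is the peak resident set size: kilobytes on Linux,
# bytes on OS X, so scale accordingly.
peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print("peak RSS: %d" % peak)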

+1
