Yes, the dict will be stored in the process memory. So if it grows large enough that it no longer fits in the system's RAM, you can expect a massive slowdown as the system starts swapping memory to and from disk.
Others have said that a few million items shouldn't pose a problem; I'm not so sure, since the dict overhead itself (before counting the memory taken by the keys and values) is significant. For Python 2.6 or later, sys.getsizeof gives some useful information about how much RAM various Python structures take up. Some quick results, from Python 2.6 on a 64-bit OS X machine:
>>> from sys import getsizeof
>>> getsizeof(dict((n, 0) for n in range(5462)))/5462.
144.03368729403149
>>> getsizeof(dict((n, 0) for n in range(5461)))/5461.
36.053470060428495
So the dict overhead varies between 36 bytes per item and 144 bytes per item on this machine (the exact value depends on how full the dictionary's internal hash table is; here 5461 = 2**14 // 3 is one of the thresholds at which the internal hash table is enlarged). And that's before adding the overhead for the dict items themselves; if they're all short strings (6 characters or less, say), then that still adds >= 80 bytes per item (possibly less if many different keys share the same value).
So it wouldn't take that many millions of dict items to exhaust RAM on a typical machine.
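If you want a rough number for your own data, a small sketch like the one below sums getsizeof over the dict and its immediate keys and values. This is not from the original answer: rough_dict_memory and the sample data are illustrative names, the figures will differ between Python versions and builds, and the estimate ignores object sharing and does not recurse into nested containers.

from sys import getsizeof

def rough_dict_memory(d):
    # Rough estimate: the dict's own footprint plus the immediate
    # keys and values. Overcounts shared objects (e.g. small ints)
    # and does not follow nested containers.
    total = getsizeof(d)
    for key, value in d.items():
        total += getsizeof(key) + getsizeof(value)
    return total

# Illustrative sample: 100,000 short string keys mapped to small ints.
sample = dict(("k%06d" % i, 0) for i in range(100000))
print(rough_dict_memory(sample) / len(sample), "bytes per item (approx.)")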
Mark Dickinson