To answer the question about dict vs list , you will need to provide accurate information about the number of elements, the number of missing elements, etc., so we could accurately estimate the memory usage with two structure data and try to predict and / or check their effectiveness.
In the general case, the <key-value dict pairs for N require a bit more memory than list with N values:
dict should follow the keysdict does not exceed 2/3. When this happens, the allocated memory doubles (this requires a dict ) to perform O (1) operations with time amortization.
However, there is an alternative to this data structure that should provide very good performance: blist . The blist package provides an interface that corresponds to the list interface, only it is implemented using B-trees. It can efficiently handle sparse lists. Most operations take O(1) or O(log n) , so they are quite efficient.
For example, you can first create a sparse blist to do:
from blist import blist seq = blist([None]) seq *= 2**30
And then you can set only indexes that matter:
for i, value in zip(indices, values): seq[i] = value
Full documentation here .
Note that blist provides other efficient operations, such as:
- Combining two
blist takes O(log n) time - Taking
[i:j] slice takes O(log n) time - Inserting an element with a given index accepts
O(log n) operations - The appearance of an element (from each position) performs
O(log n) operations
Since you gave a few digits, here is how they compare with dict s:
>>> from blist import blist >>> b = blist([None]) >>> b *= 5000 >>> for i in range(100):b[i] = i ... >>> b.__sizeof__() 2660 >>> d = dict() >>> for i in range(100):d[i] = i ... >>> d.__sizeof__() 6216 >>> b = blist([None]) >>> b *= 750 >>> for i in range(30):b[i] = i ... >>> b.__sizeof__() 1580 >>> d = dict() >>> for i in range(30):d[i] = i ... >>> d.__sizeof__() 1608
In both cases, a blist takes up less memory (in the first case, it takes 1/3 of the memory equivalent to dict ). Note that memory taken with blist also depends on whether the indices are contiguous (contiguous are better). However, even using random indexes:
>>> b = blist([None]) >>> b *= 5000 >>> import random >>> for i in range(100):b[random.randint(0, 4999)] = i ... >>> b.__sizeof__() 2916
He is still much better than a dict .
Even search times are better:
In [1]: from blist import blist ...: import random ...: In [2]: b = blist([None]) In [3]: b *= 5000 In [4]: for i in range(100):b[random.randint(0, 4999)] = i In [5]: %timeit b[0] 10000000 loops, best of 3: 50.7 ns per loop In [6]: d = dict() In [7]: for i in range(100):d[random.randint(0, 4999)] = i In [10]: %timeit d[1024]
Note that a list takes about 47 ns to search for an index on my machine, so blist really very close to list in terms of search performance in small lists like what you have.