Best way to handle a large list of dictionaries in Python

I am doing a statistical test that uses 10,000 permutations as a zero distribution.

Each permutation is a dictionary of 10,000 key-value pairs. Each key represents a gene; each value represents the set of patients associated with that gene. The dictionaries are programmatically generated and can be written to and read from a file.
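For concreteness, a single permutation might look like the sketch below (the gene names, patient IDs, and the file name permutations.jsonl are made up for illustration; the real data is generated programmatically):

import json

# One permutation: gene -> set of patient IDs (illustrative values only)
permutation = {
    "GENE_A": {"patient_01", "patient_07"},
    "GENE_B": {"patient_03"},
    # ... roughly 10,000 genes in total
}

# Sets are not JSON-serializable, so convert them to sorted lists
# before appending one permutation per line to a file.
with open("permutations.jsonl", "a") as f:
    serializable = {gene: sorted(patients) for gene, patients in permutation.items()}
    f.write(json.dumps(serializable) + "\n")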

I want to be able to iterate over these permutations to perform my statistical test; however, keeping this large list in memory slows my program down.

Is there a way to keep these dictionaries on disk and load each permutation only when I iterate over it?

Thanks!

1 answer

This is a common computational problem: you need the speed of data held in memory, but you do not have enough memory to hold it all. You have at least the following options:

  • Buy additional RAM (obviously)
  • Let the data swap. This leaves it to the operating system to decide which data is kept on disk and which in memory.
  • Do not load everything into memory at once

Since you are iterating over your dataset, one solution might be to load the data lazily:

def get_data(filename):
    # Read the file one line at a time, yielding each serialized
    # permutation instead of loading the whole file into memory.
    with open(filename) as f:
        while True:
            line = f.readline()
            if line:
                yield line
            else:
                break

for item in get_data('my_genes.dat'):
    gather_statistics(deserialize(item))
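The deserialize step depends on how the permutations were written. Assuming each line is a JSON object mapping genes to lists of patient IDs (an assumption, matching the write-out sketch in the question), it could be as simple as:

import json

def deserialize(line):
    # Parse one JSON-encoded permutation and restore each patient list to a set.
    raw = json.loads(line)
    return {gene: set(patients) for gene, patients in raw.items()}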

Another option is to split your data across multiple files, or to store it in a database, so that you can process it n items at a time.
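A minimal sketch of the database approach, assuming the serialized permutations are stored one per row in an SQLite table named permutations with a single data column (the database path, table, and column names are hypothetical):

import json
import sqlite3

def iter_permutations(db_path, batch_size=100):
    # Fetch permutations from disk in batches of `batch_size`,
    # so only a small slice of the data is in memory at any time.
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.execute("SELECT data FROM permutations")
        while True:
            rows = cur.fetchmany(batch_size)
            if not rows:
                break
            for (data,) in rows:
                yield json.loads(data)
    finally:
        conn.close()

# Usage: run the test on each stored permutation without loading all 10,000 at once.
# for perm in iter_permutations("permutations.db", batch_size=100):
#     gather_statistics(perm)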
