This is a general computing problem: you want the speed of data held in memory, but you don't have enough memory to hold it all. You have at least the following options:
- Buy additional RAM (obviously)
- Let the process swap. This leaves it to the OS to decide which data to keep in memory and which to page out to disk.
- Do not load everything into memory at once
Since you are iterating over your dataset, one solution might be to load the data lazily:
```python
def get_data(filename):
    # Generator: yields one line at a time instead of
    # reading the whole file into memory.
    with open(filename) as f:
        while True:
            line = f.readline()
            if line:
                yield line
            else:
                break

for item in get_data('my_genes.dat'):
    gather_statistics(deserialize(item))
```
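Because `get_data` is a generator, only the current line needs to be in memory at any point during the iteration.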
Another option is to split your data into multiple files, or store it in a database, so that you can process it n items at a time.
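As a minimal sketch of that batching idea, reusing the `get_data` generator above (with `deserialize` and `gather_statistics` standing in for your own functions), `itertools.islice` can pull n items at a time; the `in_batches` helper here is hypothetical, not from any particular library:

```python
from itertools import islice

def in_batches(iterable, n):
    """Yield successive lists of up to n items from any iterable."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, n))  # take the next n items, fewer at the end
        if not batch:
            break
        yield batch

# Process the file 1000 records at a time; deserialize and
# gather_statistics are placeholders from the example above.
for batch in in_batches(get_data('my_genes.dat'), 1000):
    for item in batch:
        gather_statistics(deserialize(item))
```

This keeps at most one batch in memory at once, and the batch size n gives you a knob to trade memory use against per-item overhead.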