I've seen a lot of questions similar to this one, but nothing that really matched. Most of the other questions were about speed. What I'm experiencing is that a single JSON dictionary sitting in a 1.1 GB file on my local disk takes up all 16 gigabytes of my memory when I try to load it with anything along the lines of:
    import json

    f = open(some_file, "rb")
    new_dictionary = json.load(f)
This happens regardless of which JSON library I use (I tried ujson, json, and yajl), and regardless of whether I read the file as a stream of bytes or not. It makes absolutely no sense to me. What's with the crazy memory usage, and how do I get around it?
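For reference, a minimal sketch of the kind of load-and-compare loop I mean, assuming a Unix-like system; the file name is a placeholder, and resource.getrusage is just one convenient way to see peak resident memory (ru_maxrss is reported in kilobytes on Linux):

    import json
    import resource

    def load_and_report(path, loader, name):
        # Parse the whole file into one Python dict, then report peak memory.
        with open(path, "rb") as f:
            data = loader(f)
        peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        print("%s: %d top-level keys, peak RSS ~%d MB"
              % (name, len(data), peak_kb // 1024))
        return data

    load_and_report("big_mapping.json", json.load, "stdlib json")
    # Same call with ujson.load or simplejson.load, if those are installed.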
In case it helps, the dictionary is just a bunch of nested dictionaries, all holding ints that point to other ints. A sample looks like this:
{"0":{"3":82,"4":503,"15":456},"956":{"56":823,"678":50673,"35":1232}...}
UPDATE: When I run this with simplejson, it actually only takes up 8 gigs. I have no idea why that one takes so much less than the others.
UPDATE 2: So, I did some more research. I loaded my dictionary with simplejson and tried converting all of the keys to ints (following Liori's suggestion that strings might take up more space). Memory stayed the same at 8 gigs. Then I tried Winston Ewert's suggestion of running gc.collect(). Memory still stayed at 8 gigs. Finally, annoyed and curious, I pickled my new data structure, exited Python, and reloaded it. Lo and behold, it still takes 8 gigs. I guess Python just needs that much room for a big 2d dictionary. Disappointing, to be sure, but at least now I know it's not a JSON problem as long as I use simplejson to load it.
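For concreteness, the experiment looked roughly like this; a sketch that assumes the data is exactly the nested int-to-int mapping shown above, with placeholder file names:

    import gc
    import pickle
    import simplejson

    # Load the raw JSON: all keys come back as strings.
    with open("big_mapping.json", "rb") as f:
        raw = simplejson.load(f)

    # Convert every outer and inner key to int (Liori's suggestion).
    converted = {
        int(outer_key): {int(k): v for k, v in inner.items()}
        for outer_key, inner in raw.items()
    }
    del raw
    gc.collect()  # Winston Ewert's suggestion: force a collection.

    # Round-trip through pickle to see if a freshly loaded structure is smaller.
    with open("big_mapping.pickle", "wb") as f:
        pickle.dump(converted, f, protocol=pickle.HIGHEST_PROTOCOL)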
Eli