I run a cron job on my Amazon EC2 micro instance every 12 hours. It downloads a 118 MB file and parses it using the json library. This, of course, causes the instance to run out of memory. My instance has 416 MB of free memory, but when I run the script it drops to 6 MB and then the OS kills the process.
I'm wondering what my options are. Is it possible to parse this efficiently with Ruby, or do I need to drop down to a lower-level language like C? I could get a bigger Amazon instance, but I really want to know whether this can be done in Ruby.
UPDATE: I looked at yajl-ruby. It can yield JSON objects as it parses, but the problem is that if your JSON file contains only one root object, it is forced to parse the ENTIRE file. My JSON looks like this:
--Root
    -Obj 1
    -Obj 2
    -Obj 3
So if I do:
parser.parse(file) do |hash|
Since I have only one root object, it will parse the ENTIRE JSON. If Obj 1/2/3 were root objects it would work, because yajl would hand them to me one by one, but my JSON isn't structured that way, so parsing it eats 500 MB of memory...
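To illustrate what I mean, roughly (the file names here are just placeholders):

require 'yajl'

# If the file were a stream of independent root objects, e.g.
#   {"id":1}
#   {"id":2}
#   {"id":3}
# yajl-ruby would hand each completed object to the block as it goes,
# so only one small object is in memory at a time:
File.open('stream.json', 'r') do |io|
  Yajl::Parser.new.parse(io) do |obj|
    puts obj.inspect   # called once per root object
  end
end

# But with a single root object wrapping everything (my case), the block
# only fires once, after the ENTIRE tree has already been built in memory:
File.open('feed.json', 'r') do |io|
  Yajl::Parser.new.parse(io) do |root|
    puts root.keys.inspect   # too late, the memory is already spent
  end
end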
UPDATE #2: Here's a smaller (7 MB) version of the large 118 MB file:
Gone
It parses fine; I didn't just chop some bytes off the file, so you can see the structure as a whole. The array I'm looking for is:
events = json['resultsPage']['results']['event']
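In other words, with the plain json library the lookup itself is trivial on the 7 MB sample, something like this (assuming the sample is saved locally as sample.json), but the exact same call on the 118 MB file is what runs the instance out of memory:

require 'json'

# Works fine on the 7 MB sample; the identical code on the 118 MB file
# is what gets the process killed on the micro instance.
json   = JSON.parse(File.read('sample.json'))
events = json['resultsPage']['results']['event']
puts events.length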
thanks
0xSina