950 MB should not be too large for most modern machines to hold in memory. I have done this many times in Python programs, and my machine has 4 GB of physical memory; I can imagine doing the same with less.
That said, you definitely don't want to waste memory if you can avoid it. As the previous answer mentioned, process the file line by line and accumulate the result as you go, which is the right way to do this.
If you do not read the entire file into memory at once, you only need to worry about how much memory the accumulated result occupies, not the file itself. You can process files much larger than the one you mention, as long as the result you keep in memory does not grow too large. If it did, you would want to start saving partial results to files of their own (sketched below), but this problem does not seem to require that.
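If you ever did reach that point, one common approach is to flush the in-memory counts to a numbered temporary file whenever the dictionary grows past some threshold, then merge the partial files afterwards with the same accumulation loop. The sketch below only illustrates the idea; the threshold and the partial_*.txt names are invented for this example.

# Illustration only: spill partial counts to disk when the dict gets big.
# MAX_KEYS and the partial file names are arbitrary choices for this sketch.
MAX_KEYS = 1_000_000

def flush_partial(result, index):
    # Write the current counts to a numbered file, then empty the dict.
    with open('partial_%d.txt' % index, 'w') as out:
        for word, count in result.items():
            out.write('%s %d\n' % (word, count))
    result.clear()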
Here is probably the simplest solution to your problem:
# Assumes each line holds a word and an integer count separated by whitespace.
with open('myfile.txt') as f:
    result = {}
    for line in f:
        word, count = line.split()
        result[word] = int(count) + result.get(word, 0)

for word, count in result.items():
    print(word, count)
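As a side note, the standard library's collections.Counter does the same accumulation and defaults missing keys to zero, so the .get() call goes away; a minimal equivalent sketch:

from collections import Counter

result = Counter()
with open('myfile.txt') as f:
    for line in f:
        word, count = line.split()
        result[word] += int(count)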
If you are running Linux or another UNIX-like OS, use top to monitor memory usage while the program is running.
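If you would rather check from inside the program instead of watching top, the standard library's resource module (UNIX-only) can report the process's peak memory use; a minimal sketch, keeping in mind that the units of ru_maxrss are platform-dependent (kilobytes on Linux, bytes on macOS):

import resource

# Peak resident set size of the current process so far.
peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print('peak RSS:', peak)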