I wrote about 50 classes that I use to connect to and work with websites, using mechanize and threading. They all work concurrently, but they do not depend on one another. That means: 1 class - 1 website - 1 thread. It's not a particularly elegant solution, especially for managing the code, since a lot of code is repeated in every class (but not enough to turn it into one class taking arguments, as some sites require additional processing of the retrieved data in the middle of methods - like "login" - while others may not need it at all). As I said, it's not elegant, but it works. Needless to say, I welcome all recommendations on how to write this better without using one class per site. Adding additional functionality, or doing general code maintenance across every class, is a challenge.
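One common way to cut that duplication is a base class that owns the shared flow and exposes hook methods that individual sites override only where they differ (e.g. for the extra processing in the middle of "login"). This is only a sketch of the pattern; the class and method names are hypothetical, not from the original code, and the real fetch/parse logic would go where the placeholder strings are.

```python
class SiteWorker:
    """Shared flow for all sites; subclasses override only the hooks they need."""

    def __init__(self, url):
        self.url = url

    def run(self):
        data = self.fetch()
        data = self.post_fetch(data)   # site-specific hook, no-op by default
        return self.parse(data)

    def fetch(self):
        # Shared connection logic (mechanize, requests, ...) would live here.
        return "raw page from %s" % self.url

    def post_fetch(self, data):
        # Default: no extra processing between fetch and parse.
        return data

    def parse(self, data):
        # Shared parsing logic; placeholder transformation here.
        return data.upper()


class SiteNeedingLogin(SiteWorker):
    def post_fetch(self, data):
        # Only this site needs extra work in the middle of the flow.
        return data + " [after login step]"


print(SiteWorker("http://example.com").run())
print(SiteNeedingLogin("http://example.com").run())
```

With this shape, adding a feature to the shared flow touches one class instead of fifty, while per-site quirks stay isolated in small overrides.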
However, I found out that each thread takes about 8 MB of memory, so with 50 running threads we are looking at about 400 MB of usage. If it ran on my own system I would have no problem with that, but since it runs on a VPS with 1 GB of memory, this is starting to be a problem. Can you tell me how to reduce the memory usage, or is there another way to work with multiple sites concurrently?
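The ~8 MB per thread matches the default thread stack size on many Linux systems: each thread reserves a stack of that size up front. Python lets you shrink that reservation with `threading.stack_size()` before starting threads. The 64 KB value below is an assumption for illustration; the minimum CPython accepts is 32 KB, and you should pick the smallest size your actual code runs with, since deep call stacks can overflow a stack that is too small.

```python
import threading

# Reduce the per-thread stack reservation (must be set *before* the
# threads are started; value must be at least 32 KB in CPython).
threading.stack_size(64 * 1024)  # 64 KB instead of the ~8 MB default

results = []

def worker(i):
    # Trivial work; a tiny stack is enough here.
    results.append(i)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results))
```

With a smaller stack size, the same number of threads reserves far less address space, which should also raise the "biggest number of threads" ceiling the test program hits.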
I used this quick Python test program to check whether it is the data stored in my application's variables that is using the memory, or something else. As you can see from the following code, it does nothing but run the sleep() function, yet each thread still uses 8 MB of memory.
    from thread import start_new_thread
    from time import sleep

    def sleeper():
        try:
            while 1:
                sleep(10000)
        except:
            if running:
                raise

    def test():
        global running
        n = 0
        running = True
        try:
            while 1:
                start_new_thread(sleeper, ())
                n += 1
                if not (n % 50):
                    print n
        except Exception, e:
            running = False
            print 'Exception raised:', e
        print 'Biggest number of threads:', n

    if __name__ == '__main__':
        test()
When I run this, the output is:
    50
    100
    150
    Exception raised: can't start new thread
    Biggest number of threads: 188
And by removing the running = False line, I can measure the free memory using the free -m command in the shell:
                 total       used       free     shared    buffers     cached
    Mem:          1536       1533          2          0          0          0
    -/+ buffers/cache:       1533          2
    Swap:            0          0          0
The actual calculation by which I know it takes about 8 MB per thread is simple: divide the difference between the memory used before and during the test above by the maximum number of threads it managed to start.
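Concretely, with the numbers above (1533 MB used during the test and 188 threads started; the pre-test "used" figure is not shown, so the 29 MB baseline below is an assumed value purely for illustration):

```python
used_before = 29     # MB, assumed baseline -- not shown in the output above
used_during = 1533   # MB, the "used" column of free -m during the test
max_threads = 188    # the biggest number of threads the test reached

mb_per_thread = (used_during - used_before) / max_threads
print(mb_per_thread)  # roughly 8 MB per thread
```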
This is probably only reserved (virtual) memory, because looking at top, the Python process uses only about 0.6% of the memory.