How can I reduce memory usage in threaded Python code?

I wrote about 50 classes that I use to connect to and work with websites, using mechanize and threading. They all run concurrently, but they do not depend on each other, so it works out to one class, one site, one thread. This is not a particularly elegant solution, especially for managing the code, since a lot of code is repeated in each class (but not enough to turn it into a single class taking arguments, because some sites require additional processing of the extracted data in the middle of methods, like "login", while others may not). As I said, it is not elegant, but it works. Needless to say, I welcome any recommendations on how to write this better without using one class per site. Adding extra features or doing general code maintenance across each class is a challenge.

However, I found out that each thread takes about 8 MB of memory, so with 50 running threads we are looking at about 400 MB of usage. If it ran on my own system, that would not be a problem, but since it runs on a VPS with 1 GB of memory, this is starting to become one. Can you tell me how to reduce memory usage, or is there another way to work with multiple sites at the same time?

I used this quick Python test program to check whether the memory is being used by the data stored in my application's variables, or by something else. As you can see from the code below, it only runs the sleep() function, yet each thread still uses 8 MB of memory.

    from thread import start_new_thread
    from time import sleep

    def sleeper():
        try:
            while 1:
                sleep(10000)
        except:
            if running:
                raise

    def test():
        global running
        n = 0
        running = True
        try:
            while 1:
                start_new_thread(sleeper, ())
                n += 1
                if not (n % 50):
                    print n
        except Exception, e:
            running = False
            print 'Exception raised:', e
        print 'Biggest number of threads:', n

    if __name__ == '__main__':
        test()

When I run this, the output is:

    50
    100
    150
    Exception raised: can't start new thread
    Biggest number of threads: 188

And by removing the running = False line so the threads keep sleeping, I can measure the free memory with the free -m command in the shell:

                 total       used       free     shared    buffers     cached
    Mem:          1536       1533          2          0          0          0
    -/+ buffers/cache:       1533          2
    Swap:            0          0          0

The actual calculation by which I arrived at roughly 8 MB per thread is simple: take the difference between the memory used before and during the test above, and divide it by the maximum number of threads it managed to start.
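That arithmetic can be reconstructed as follows. The post does not state the baseline memory usage before the test, so the 30 MB figure here is an assumed value for illustration:

```python
# Reconstructing the per-thread estimate. `used_before` is an
# assumption; the post only gives the usage during the test.
used_during = 1533            # MB, from `free -m` during the test
used_before = 30              # MB, assumed baseline (not given in the post)
max_threads = 188             # threads started before the failure
per_thread = (used_during - used_before) / max_threads
print(round(per_thread))      # roughly 8 MB per thread
```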

This is probably just reserved (virtual) memory, because according to top the Python process uses only about 0.6% of memory.
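That ~8 MB figure matches the default thread stack reservation on many Linux builds, which suggests one direct mitigation: shrink the stack size before spawning threads. A minimal sketch in Python 3 syntax (the original code uses the Python 2 thread module, where the equivalent call is thread.stack_size):

```python
import threading
import time

# Reduce the per-thread stack reservation before creating any threads.
# The default on many Linux builds is 8 MB (matching the ~8 MB per
# thread observed above); 256 KB is usually plenty for I/O-bound code.
threading.stack_size(256 * 1024)

def sleeper():
    # stand-in for the real per-site work
    time.sleep(0.1)

threads = [threading.Thread(target=sleeper) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("started and joined", len(threads), "threads")
```

The value must meet a platform-dependent minimum (32 KB on most systems), and setting it too low will crash threads that recurse deeply, so it is worth testing against the real workload.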

+7
4 answers
+4

Using "one thread per request" is okay and easy for many use cases. However, this will require a lot of resources (as you did).

The better approach is to use asynchronous I/O, but unfortunately it is considerably more complicated.

Some tips in this direction:
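As one concrete illustration of the asynchronous direction: in modern Python (which postdates the Python 2 code in the question), asyncio lets one thread service many sites by giving each site a coroutine instead of a thread. The handle_site function below is a hypothetical stand-in for the real fetch/login/parse logic:

```python
import asyncio

# One coroutine per site replaces one thread per site; all of them
# share a single OS thread and a single stack.
async def handle_site(name):
    await asyncio.sleep(0.01)   # stand-in for waiting on network I/O
    return name + ": done"

async def main():
    sites = ["site%d" % i for i in range(50)]
    # All 50 "sites" are in flight concurrently in one thread.
    return await asyncio.gather(*(handle_site(s) for s in sites))

results = asyncio.run(main())
print(len(results), "sites handled in one thread")
```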

+2

The solution is to replace code like this:

1) Do something.
2) Wait for something to happen.
3) Do something else.

With this code:

1) Do something.
2) Arrange it so that when something happens, something else will be done.
3) Done.

Elsewhere, you have several threads that do this:

1) Wait until something happens.
2) Handle what happened.
3) Go to step 1.

In the first case, if you are waiting for 50 things to happen, you have 50 threads sitting around waiting for 50 things. In the second case, you have one thread waiting that will do whichever of those 50 things needs to be done.

So, do not use a thread to wait until one thing happens. Instead, arrange it so that when this happens, some other thread will do whatever needs to be done next.
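The pattern described above can be sketched with a single dispatcher thread pulling events off a queue; the callbacks and payloads here are placeholders for whatever "handle what happened" means in the real application:

```python
import queue
import threading

# One worker thread waits for events and dispatches them, instead of
# 50 threads each blocked waiting for one event apiece.
events = queue.Queue()

def dispatcher():
    while True:
        callback, payload = events.get()   # step 1: wait for something
        if callback is None:               # sentinel: shut down
            break
        callback(payload)                  # step 2: handle it, then loop

results = []
worker = threading.Thread(target=dispatcher)
worker.start()

# Elsewhere, when something happens, arrange the follow-up work:
for i in range(5):
    events.put((results.append, i))
events.put((None, None))
worker.join()
print(results)   # prints [0, 1, 2, 3, 4]
```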

+1

I'm not a Python expert, but perhaps you could use a few thread pools that cap the total number of active threads and hand each "request" to a thread as soon as it finishes its previous one. The request does not have to be a full thread object; just enough data to complete the request.

You could also structure it so that thread pool A, with N threads, pings the websites, and once the data is retrieved, hands it off to thread pool B, with Y threads, to crunch the data.
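A minimal sketch of that two-pool pipeline using the standard library; the pool sizes (4 and 2) and the fetch/crunch functions are made-up placeholders for the real network and processing code:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(site):          # stand-in for the network round-trip (pool A)
    return "data-from-" + site

def crunch(data):         # stand-in for the data processing (pool B)
    return data.upper()

sites = ["a.example", "b.example", "c.example"]

# Pool A "pings websites"; pool B crunches whatever pool A fetched.
with ThreadPoolExecutor(max_workers=4) as pool_a, \
     ThreadPoolExecutor(max_workers=2) as pool_b:
    fetched = list(pool_a.map(fetch, sites))
    crunched = list(pool_b.map(crunch, fetched))
print(crunched)
```

With fixed-size pools, total thread count (and hence stack memory) stays bounded no matter how many sites are queued.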

0
