I don't think there is a real problem here, except that you don't understand how memory allocation works.
When Python needs more memory, it requests it from the OS. When it is done with that memory, it usually does not return it to the OS; instead, it holds on to it for subsequent objects.
So when you load the first 10 MB MP3, your memory use goes from, say, 3 MB to 13 MB. Then you free that memory, but you are still at 13 MB. Then you load the second 10 MB MP3, but it reuses the same memory, so you are still at 13 MB. And so on.
In your code, you create a thread for each download. If you have 5 threads running at a time, each using 10 MB, that obviously means you are using 50 MB. And that 50 MB will not be released. But if you wait for them to finish and then start another 5 downloads, they will reuse the same 50 MB.
Since your code does not limit the number of threads in any way, there is nothing (other than CPU speed and context-switching costs) to stop you from running hundreds of threads, each using 10 MB, which means gigabytes of RAM. But simply switching to a thread pool, or refusing to let the user start more downloads while too many are already in progress, etc., solves that.
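For example, here is a minimal sketch of the thread-pool approach, where the download_mp3 function and the URL list are hypothetical stand-ins for whatever your real code does:

```python
from concurrent.futures import ThreadPoolExecutor

import requests

def download_mp3(url, dest):
    # Hypothetical download function standing in for your real code.
    r = requests.get(url)
    r.raise_for_status()
    with open(dest, "wb") as f:
        f.write(r.content)

# Hypothetical work list.
downloads = [
    ("https://example.com/a.mp3", "a.mp3"),
    ("https://example.com/b.mp3", "b.mp3"),
]

# At most 5 downloads are in flight at once, so peak memory stays
# around 5 x the largest file no matter how many jobs are queued.
with ThreadPoolExecutor(max_workers=5) as pool:
    futures = [pool.submit(download_mp3, url, dest) for url, dest in downloads]
    for future in futures:
        future.result()  # re-raise any download errors
```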
So usually this is not a problem. But when it is, there are two ways to deal with it:
Create a child process (e.g., via the multiprocessing module) to do the memory-hungry work. On any modern OS, when a process exits, its memory is reclaimed. The problem here is that allocating and freeing 10 MB over and over will actually slow your system down, not speed it up, and the process-startup costs (especially on Windows) will make it even worse. So you probably want a much larger batch of jobs to farm out to each child process.
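A minimal sketch of that first option, again with hypothetical URLs and a hypothetical download_batch worker; passing maxtasksperchild=1 is one way to make each worker exit (and thus give its memory back to the OS) after a single batch:

```python
import multiprocessing

import requests

def download_batch(batch):
    # All the memory-hungry work happens in a child process; when
    # that process exits, the OS reclaims everything it allocated.
    for url, dest in batch:
        r = requests.get(url)
        r.raise_for_status()
        with open(dest, "wb") as f:
            f.write(r.content)

if __name__ == "__main__":
    # Hypothetical batches; make each one large enough to amortize
    # the process-startup cost, which is especially high on Windows.
    batches = [
        [("https://example.com/a.mp3", "a.mp3"),
         ("https://example.com/b.mp3", "b.mp3")],
        [("https://example.com/c.mp3", "c.mp3")],
    ]
    # Each worker handles one batch and then exits, so its memory
    # actually goes back to the OS instead of being held for reuse.
    with multiprocessing.Pool(processes=2, maxtasksperchild=1) as pool:
        pool.map(download_batch, batches)
```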
Do not read the whole thing into memory at once; use a streaming API instead of a whole-file API. With requests, that means passing stream=True in the initial request, and then usually calling r.raw.read(8192), r.iter_content(), or r.iter_lines() in a loop instead of accessing r.content.
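A minimal sketch of the streaming version, with a hypothetical URL and filename:

```python
import requests

url = "https://example.com/big.mp3"  # hypothetical URL

# stream=True tells requests not to download the body up front.
with requests.get(url, stream=True) as r:
    r.raise_for_status()
    with open("big.mp3", "wb") as f:
        # Only ~8 KB is in memory at a time, instead of the whole
        # file that accessing r.content would buffer at once.
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)
```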