Python3 urllib.request will not close connections immediately

I have the following code that runs a continuous loop to fetch content from a website:

    from http.cookiejar import CookieJar
    from urllib import request

    cj = CookieJar()
    cp = request.HTTPCookieProcessor(cj)
    hh = request.HTTPHandler()
    opener = request.build_opener(cp, hh)

    while True:
        # build url
        req = request.Request(url=url)
        p = opener.open(req)
        c = p.read()
        # process c
        p.close()
        # check for abort condition, or continue

The content is read correctly, but for some reason the TCP connections are not closed. I am monitoring the number of active connections from the dd-wrt router interface, and it grows constantly. If the script keeps running, it will exhaust the router's limit of 4096 connections. When that happens, the script simply enters a wait state: the router will not allow new connections, but the timeout has not yet been hit. After a couple of minutes, those connections are closed and the script resumes.

I was able to observe the status of these hanging connections from the router. They are all in the same state: TIME_WAIT.

I expect this script to use no more than 1 TCP connection at a time. What am I doing wrong?

I am using Python 3.4.2 on Mac OS X 10.10.

python urllib macos
1 answer

After some research, I discovered the cause of this problem: it is by design of the TCP protocol. In a nutshell, when you close a connection, it is not dropped immediately; it enters the TIME_WAIT state and only times out after about 4 minutes. Contrary to what I expected, the connection does not disappear right away.
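
For anyone who wants to verify this locally, here is a minimal sketch (the URL is a placeholder, and it assumes the netstat command is available, as it is on OS X and most Linux systems):

    import subprocess
    from urllib import request

    # Make a handful of requests, closing each response, as the original loop does.
    for _ in range(5):
        with request.urlopen("http://example.com/") as p:  # placeholder URL
            p.read()

    # Count local sockets left in TIME_WAIT; the -an flags work on OS X and Linux.
    out = subprocess.check_output(["netstat", "-an"]).decode("ascii", "replace")
    print("sockets in TIME_WAIT:",
          sum(1 for line in out.splitlines() if "TIME_WAIT" in line))

Each closed request should leave one more socket in TIME_WAIT until the kernel's timeout expires.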

Also, according to this question, it is impossible to forcibly remove such a connection (short of restarting the network stack).

In my particular case, as this question indicates, the best option is to use a persistent connection, aka HTTP keep-alive. Since I am requesting the same server, this will work.
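
A minimal sketch of that approach with the standard library's http.client, which keeps one TCP socket open across requests as long as the server honours HTTP/1.1 keep-alive (host and path are placeholders):

    from http.client import HTTPConnection

    conn = HTTPConnection("example.com")  # placeholder host
    try:
        while True:
            conn.request("GET", "/some/path")  # placeholder path
            resp = conn.getresponse()
            c = resp.read()  # the body must be read fully before the next request
            # process c
            # check for abort condition, or continue
            break  # placeholder abort condition for this sketch
    finally:
        conn.close()  # the single TCP connection is closed exactly once, at the end

Note that this trades away the cookie handling that HTTPCookieProcessor provided, and that if the server drops the connection after its keep-alive timeout, the next request() raises an exception; a real loop would catch it, call conn.close(), and retry, since HTTPConnection reopens the socket automatically on the next request after a close.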
