I have a Python script that goes out and makes a number of HTTP requests (via httplib and urllib) to many different domains.
We have a huge number of domains to process and we need to get through them as quickly as possible. Since individual HTTP requests can be slow (e.g. there may be no web site answering on the domain), I run several copies of the script at any one time, each feeding off the list of domains in the database.
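To give an idea of the structure, each copy of the script is basically a loop like the sketch below. This is a simplified illustration only: the table/column names and connection details are placeholders, check_domain() stands in for the real httplib/urllib work, and the real scripts also coordinate which domains each copy takes.

    import MySQLdb

    def check_domain(domain):
        # Placeholder for the httplib/urllib checks (see the examples further down).
        return "ok"

    def main():
        # Placeholder connection details.
        db = MySQLdb.connect(host="localhost", user="mysql", passwd="xxx", db="domains")
        try:
            while True:
                cur = db.cursor()
                # Grab the next unprocessed domain (placeholder query/schema).
                cur.execute("SELECT id, name FROM domains WHERE processed = 0 LIMIT 1")
                row = cur.fetchone()
                cur.close()
                if row is None:
                    break
                domain_id, domain = row
                status = check_domain(domain)
                cur = db.cursor()
                cur.execute("UPDATE domains SET processed = 1, status = %s WHERE id = %s",
                            (status, domain_id))
                db.commit()
                cur.close()
        finally:
            db.close()

    if __name__ == "__main__":
        main()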
The problem is that over a period of time (anywhere from a few hours up to 24 hours) all the scripts start to slow down, and ps -al shows that most of them are sleeping.
The servers are very powerful (8 cores, 72 GB RAM, 6 TB RAID 6, etc., 80 MB 2:1) and are never maxed out, i.e. free -m shows:
    -/+ buffers/cache:      61157      11337
    Swap:                    4510        195       4315
top shows the CPUs sitting between 80-90% idle.
sar -d shows an average %util of 5.3%.
More interestingly, iptraf starts off at around 50-60 MB/s and drops to 8-10 MB/s after about 4 hours.
I am currently running around 500 copies of the script on each server (2 servers), and both show the same problem.
ps -al shows that most of the Python processes are sleeping, and I don't understand why. For example:
    0 S 0 28668 2987 0 80 0 - 71003 sk_wai pts/2 00:00:03 python
    0 S 0 28669 2987 0 80 0 - 71619 inet_s pts/2 00:00:31 python
    0 S 0 28670 2987 0 80 0 - 70947 sk_wai pts/2 00:00:07 python
    0 S 0 28671 2987 0 80 0 - 71609 poll_s pts/2 00:00:29 python
    0 S 0 28672 2987 0 80 0 - 71944 poll_s pts/2 00:00:31 python
    0 S 0 28673 2987 0 80 0 - 71606 poll_s pts/2 00:00:26 python
    0 S 0 28674 2987 0 80 0 - 71425 poll_s pts/2 00:00:20 python
    0 S 0 28675 2987 0 80 0 - 70964 sk_wai pts/2 00:00:01 python
    0 S 0 28676 2987 0 80 0 - 71205 inet_s pts/2 00:00:19 python
    0 S 0 28677 2987 0 80 0 - 71610 inet_s pts/2 00:00:21 python
    0 S 0 28678 2987 0 80 0 - 71491 inet_s pts/2 00:00:22 python
There is no sleep state used anywhere in the script, so I can't understand why ps -al shows most of them sleeping, and why they should make fewer and fewer requests over time when CPU, memory, disk access and bandwidth are all available in abundance.
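For reference on those states: sk_wai and inet_s look like socket wait channels, i.e. the processes are blocked inside socket calls rather than in any sleep of mine. As far as I know, the only way to bound such waits from Python is a socket timeout, along these lines (just a sketch of the standard-library calls, not a claim about what my code currently sets):

    import socket
    import httplib

    # Make every socket operation (connect, read, ...) give up after 30 seconds
    # and raise socket.timeout instead of blocking indefinitely.
    socket.setdefaulttimeout(30)

    # httplib also accepts a per-connection timeout (Python 2.6+).
    conn = httplib.HTTPConnection("example.com", 80, timeout=30)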
If anyone could help, I would be very grateful.
EDIT:
The code is massive, because I use exceptions to capture diagnostics about each domain, i.e. the reasons why I can't connect. I will post the code somewhere if needed, but the fundamental calls via httplib and urllib are straight off the Python examples.
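To show what I mean, the core of each check is roughly the following - a cut-down sketch of the httplib/urllib-style calls, with the kind of exception handling I use to record why a domain can't be reached (the real code distinguishes many more cases):

    import socket
    import httplib
    import urllib2

    def diagnose(domain):
        # Try to fetch the front page and classify the failure mode so it can
        # be stored against the domain. Simplified sketch only.
        try:
            conn = httplib.HTTPConnection(domain, 80, timeout=30)
            try:
                conn.request("HEAD", "/")
                status = conn.getresponse().status
            finally:
                conn.close()
            return "ok", status
        except socket.timeout:
            return "timeout", None
        except socket.gaierror as e:
            return "dns failure", str(e)
        except socket.error as e:
            return "connection error", str(e)
        except httplib.HTTPException as e:
            return "http error", str(e)

    def fetch_with_urllib(domain):
        # The urllib-style equivalent (urllib2 accepts a timeout in Python 2.6+).
        return urllib2.urlopen("http://" + domain + "/", timeout=30).read()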
Additional Information:
Both

    quota -u mysql
    quota -u root

come back with nothing.
ulimit -n comes back with 1024. I have changed limits.conf to give the mysql user a soft and hard limit of 16000, and I can run up to 2000 scripts so far, but the problem is still there.
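For anyone checking this: the per-process limit can also be read from inside Python itself, which is how I'd verify that the running scripts actually picked up the new limits.conf values after logging back in (a tiny sketch, nothing specific to my code):

    import resource

    # Soft and hard limits on open file descriptors as seen by this process;
    # these should reflect limits.conf once the user has logged in again.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print "RLIMIT_NOFILE: soft=%d hard=%d" % (soft, hard)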
SOME PROGRESS
Well, so I have changed all of the limits for the user and ensured that all sockets are closed (they weren't), and although things are better, I am still getting a slowdown, just not as bad.
Interestingly enough, I have also noticed some sort of memory leak: the scripts use more and more memory the longer they run, but I'm not sure what is causing it. I store the output in a string and then print it to the terminal after each iteration, and I also clear the string at the end - but could the memory build-up be the terminal storing all of the output?
Edit: seemingly not - I ran 30 scripts with no output to the terminal and the leak is still the same. I'm not using anything clever (just strings, httplib and urllib) - I wonder if there is some issue with the MySQL Python connector...?
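In case it helps anyone reproduce this, the way I'd watch the growth per iteration from inside the script is something like the sketch below (ru_maxrss is the peak resident set size, in kB on Linux - illustrative only, this isn't lifted from my actual code):

    import resource

    def peak_rss_kb():
        # Peak resident set size of this process (kB on Linux).
        return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

    last = peak_rss_kb()
    for iteration in range(1000):
        # ... one pass of the domain-checking loop would go here ...
        now = peak_rss_kb()
        if now > last:
            print "iteration %d: peak RSS grew by %d kB (now %d kB)" % (
                iteration, now - last, now)
        last = now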