I have a setup consisting of various tools written in Python, used in a multi-user environment.
When I log in for the first time and run one of the commands, it takes about 6 seconds just to print a few lines of help. If I issue the same command again right away, it takes 0.1 s. After a couple of minutes it is back to 6 seconds (which suggests a short-lived cache).
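For reference, this is roughly how I reproduce the cold/warm difference (just a sketch; the command and its arguments are placeholders for the real tool):

    import os
    import subprocess
    import time

    # Hypothetical invocation; substitute the real tool and arguments.
    CMD = ["python", "tool", "--help"]

    def timed_run():
        start = time.time()
        with open(os.devnull, "w") as devnull:
            subprocess.call(CMD, stdout=devnull, stderr=devnull)
        return time.time() - start

    print("first run : %.2f s" % timed_run())   # cold: ~6 s here
    print("second run: %.2f s" % timed_run())   # warm: ~0.1 s here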
The system sits on GPFS, so disk bandwidth should be fine, although lookups may be slow because of the sheer number of files involved.
    strace -e open python tool 2>&1 | wc -l
shows that 2154 open() calls are issued when the tool starts.
    strace -e open python tool 2>&1 | grep ENOENT | wc -l
shows that 1945 of them fail with ENOENT, i.e. missing files (a pretty bad hit/miss ratio if you ask me :-)).
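As far as I understand, most of those ENOENTs come from the import machinery probing every sys.path entry for several candidate file names per module, so the misses scale roughly as path entries times modules. A trivial snippet (nothing tool-specific) to see the size of that search space:

    import sys

    # Every "import foo" probes each sys.path entry for foo/, foo.py,
    # foo.pyc, compiled extensions, etc. before giving up, so a long path
    # multiplied by many modules easily produces thousands of failed
    # open()/stat() calls.
    print("sys.path has %d entries:" % len(sys.path))
    for entry in sys.path:
        print("  %s" % entry)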
I suspect that the excessive time it takes to load the tool is spent querying GPFS about all these files, and that the answers are cached for the next call (at the OS or GPFS level), although I do not know how to verify/prove this. I have no root access on the system and can only write to GPFS and /tmp.
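One way I could imagine verifying this without root access: save the strace output to a file, pull the paths out of it, and time plain stat() calls on them right after login versus immediately again. Rough sketch (replay_stats.py and strace.log are made-up names):

    import os
    import re
    import sys
    import time

    # Usage: python replay_stats.py strace.log
    # Re-issues a stat() for every path mentioned in a saved strace log and
    # reports the total time, so cold (right after login) vs. warm metadata
    # lookups can be compared directly.
    paths = re.findall(r'open\("([^"]+)"', open(sys.argv[1]).read())

    start = time.time()
    for p in paths:
        try:
            os.stat(p)
        except OSError:
            pass  # ENOENT is expected for most of them
    elapsed = time.time() - start

    print("%d lookups in %.2f s (%.1f ms each)"
          % (len(paths), elapsed, 1000.0 * elapsed / max(len(paths), 1)))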
Can this Python hunt for missing files be improved somehow?
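One idea I have (not sure it is the right answer) is to bundle the pure-Python packages into a single zip file on the path, since zipimport reads the archive's table of contents once instead of probing many files per import. A rough sketch with made-up paths:

    import os
    import zipfile

    # Hypothetical paths: bundle an existing site-packages tree into one
    # zip archive that zipimport can read with a single table-of-contents
    # lookup instead of many per-file probes.
    SRC = "/gpfs/project/site-packages"
    DEST = "/gpfs/project/deps.zip"

    zf = zipfile.ZipFile(DEST, "w", zipfile.ZIP_DEFLATED)
    for root, dirs, files in os.walk(SRC):
        for name in files:
            if name.endswith((".py", ".pyc")):
                full = os.path.join(root, name)
                # Store paths relative to SRC so "import package" still
                # works once DEST is on PYTHONPATH / sys.path.
                zf.write(full, os.path.relpath(full, SRC))
    zf.close()

The tools would then prepend deps.zip to PYTHONPATH (or sys.path), so the interpreter consults the zip's index instead of hitting GPFS for every candidate file name; compiled extension modules would still have to live outside the zip, of course.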
Any idea how to test this in a simple way? (Reinstalling everything into /tmp is not easy, since there are many packages; virtualenv will not help me either (I think), since it just symlinks the files on the GPFS system.)
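The closest thing to a simple test I can think of is copying the already-installed tree (not reinstalling it) to /tmp and pointing PYTHONPATH at the copy; a sketch with made-up paths:

    import os
    import shutil
    import subprocess

    # Hypothetical paths: copy the already-installed tree to node-local
    # /tmp and point PYTHONPATH at the copy, without reinstalling anything.
    SRC = "/gpfs/project/site-packages"
    DEST = "/tmp/site-packages-copy"

    if not os.path.isdir(DEST):
        shutil.copytree(SRC, DEST)

    env = dict(os.environ)
    env["PYTHONPATH"] = DEST + os.pathsep + env.get("PYTHONPATH", "")
    subprocess.call(["python", "tool", "--help"], env=env)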
Of course, one option would be a daemon that forks, but that is far from "simple" and would be a last resort.
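For completeness, that daemon would look roughly like this (very rough sketch, made-up socket path): a long-lived process imports everything once and forks a child per request, so the import cost is paid a single time.

    import os
    import signal
    import socket

    SOCK = "/tmp/tool-daemon.sock"   # made-up socket path
    # import heavy_packages_here     # pay the import cost exactly once

    signal.signal(signal.SIGCHLD, signal.SIG_IGN)   # avoid zombie children
    if os.path.exists(SOCK):
        os.unlink(SOCK)
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(SOCK)
    server.listen(5)

    while True:
        conn, _ = server.accept()
        if os.fork() == 0:                  # child: modules already loaded
            args = conn.recv(4096).decode().split("\0")
            # run_tool(args)                # placeholder for the real entry point
            conn.sendall(b"done")
            conn.close()
            os._exit(0)
        conn.close()                        # parent: keep accepting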
Thanks for reading.
estani