Detect file descriptor leaks in Python?

My program seems to be leaking file descriptors. How do I find out where?

My program uses file descriptors in several different places: reading output from child processes, calling the ImageMagick API via ctypes (which opens files), and copying files.

It crashes in shutil.copyfile, but I'm sure that is not where the leak is.

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Python25\Lib\site-packages\magpy\magpy.py", line 874, in main
        magpy.run_all()
      File "C:\Python25\Lib\site-packages\magpy\magpy.py", line 656, in run_all
        [operation.operate() for operation in operations]
      File "C:\Python25\Lib\site-packages\magpy\magpy.py", line 417, in operate
        output_file = self.place_image(output_file)
      File "C:\Python25\Lib\site-packages\magpy\magpy.py", line 336, in place_image
        shutil.copyfile(str(input_file), str(self.full_filename))
      File "C:\Python25\Lib\shutil.py", line 47, in copyfile
        fdst = open(dst, 'wb')
    IOError: [Errno 24] Too many open files: 'C:\\Documents and Settings\\stuart.axon\\Desktop\\calzone\\output\\wwtbam4\\Nokia_NCD\\nl\\icon_42x42_V000.png'
    Press any key to continue . . .
5 answers

I had similar problems running out of file descriptors during subprocess.Popen() calls. I used the following script to debug what was happening:

    import os
    import stat

    _fd_types = (
        ('REG', stat.S_ISREG),
        ('FIFO', stat.S_ISFIFO),
        ('DIR', stat.S_ISDIR),
        ('CHR', stat.S_ISCHR),
        ('BLK', stat.S_ISBLK),
        ('LNK', stat.S_ISLNK),
        ('SOCK', stat.S_ISSOCK)
    )

    def fd_table_status():
        result = []
        for fd in range(100):
            try:
                s = os.fstat(fd)
            except:
                continue
            for fd_type, func in _fd_types:
                if func(s.st_mode):
                    break
            else:
                fd_type = str(s.st_mode)
            result.append((fd, fd_type))
        return result

    def fd_table_status_logify(fd_table_result):
        return ('Open file handles: ' +
                ', '.join(['{0}: {1}'.format(*i) for i in fd_table_result]))

    def fd_table_status_str():
        return fd_table_status_logify(fd_table_status())

    if __name__ == '__main__':
        print fd_table_status_str()

You can import this module and call fd_table_status_str() to log the state of the file descriptor table at different points in your code.
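
For example, here is a minimal sketch of taking snapshots around a suspect call (assuming the snippet above is saved as fd_debug.py; leaky() is just a stand-in for your own code):

    import logging
    from fd_debug import fd_table_status_str  # the module shown above, saved as fd_debug.py

    logging.basicConfig(level=logging.DEBUG)

    def leaky():
        # deliberately "forgets" to close the handle, so the leak shows up below
        open('example.txt', 'w')

    logging.debug(fd_table_status_str())  # snapshot before
    leaky()
    logging.debug(fd_table_status_str())  # snapshot after: one extra REG entry

Comparing the two log lines tells you which block of code grew the table.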

Also, make sure that subprocess.Popen instances are destroyed. Keeping references to Popen instances on Windows prevents them from being garbage collected, and as long as the instances stay alive, their pipes are not closed. More details here.
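
A hedged sketch of what that looks like in practice (the command is only an example; the point is to drain the pipes with communicate() and not hold on to the Popen object):

    import subprocess

    def run_tool(args):
        proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        out, err = proc.communicate()  # reads both pipes to EOF and reaps the child
        return proc.returncode, out, err

    # no reference to the Popen instance survives the call, so its pipes
    # can be closed and collected promptly
    rc, out, err = run_tool(['convert', '-version'])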


Look at the output of ls -l /proc/$pid/fd/ (substituting your process's PID, of course) to see which files are open [or, on Win32, use Process Explorer to view the list of open files]; then work out where in your code you open them, and make sure close() is called. (Yes, the garbage collector will eventually close things, but it is not always fast enough to avoid running out of fds.)
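
As a sketch, the usual way to guarantee close() runs is the with statement (on Python 2.5 it needs the __future__ import), which replaces manual try/finally bookkeeping:

    from __future__ import with_statement  # only needed on Python 2.5

    def copy_first_line(src, dst):
        # both handles are closed even if an exception is raised in between
        with open(src, 'rb') as fsrc:
            with open(dst, 'wb') as fdst:
                fdst.write(fsrc.readline())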

Checking for circular references that might interfere with garbage collection is also good practice. (The cycle collector will eventually get rid of them, but it may not run often enough to avoid running out of file descriptors; that one bit me personally.)
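
If you suspect cycles are delaying collection, the gc module lets you force a cycle collection at a known point while debugging (a sketch, not something to leave in production):

    import gc

    # run the cycle collector right now instead of "eventually";
    # the return value is the number of unreachable objects found
    print gc.collect()

    # objects the collector found but could not free (e.g. cycles with __del__)
    print gc.garbage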


Use Process Explorer: select your process, then View -> Lower Pane View -> Handles, and look for anything that seems out of place. A large number of the same or similar files open usually points to the problem.


lsof -p <process_id> works well on several UNIX-like systems, including FreeBSD.
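
If it helps, you can also dump that table from inside the process while debugging (a sketch; it assumes lsof is on the PATH):

    import os
    import subprocess

    # print the open-file table of the current process to stdout
    subprocess.call(['lsof', '-p', str(os.getpid())])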


Although the OP has a Windows system, I'm sure many people landing here (myself included) are looking for this on other platforms (the question isn't even tagged Windows).

Google hosts the psutil package, which has a get_open_files() method. It looks like a great interface, but it doesn't seem to have been maintained for a couple of years. I ended up writing my own implementation for a Python 2 project on Linux. I use it with unittest to make sure my functions clean up their open files.

    import os

    # calling this **synchronously** will accurately relay open files on Linux
    def get_open_files(pid):
        # directory spawned by Python process, containing its file descriptors
        path = "/proc/%d/fd" % pid
        # list the abspaths belonging to that directory
        links = ["%s/%s" % (path, f) for f in os.listdir(path)]
        # filter out the bad ones returned by os.listdir()
        valid_links = filter(lambda f: os.path.exists(f), links)
        # these links are fd integers, so map them to their actual file devices
        devices = map(lambda f: os.readlink(f), valid_links)
        # remove any ones that are stdin, stdout, stderr, etc.
        return filter(lambda f: "/dev/pts" not in f, devices)
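
A sketch of the unittest usage mentioned above (function_under_test() is a placeholder for whatever you are checking, and get_open_files() is the function defined above):

    import os
    import unittest

    class FdLeakTest(unittest.TestCase):
        def test_no_leaked_files(self):
            before = set(get_open_files(os.getpid()))
            function_under_test()  # placeholder: the code being checked for leaks
            after = set(get_open_files(os.getpid()))
            self.assertEqual(after - before, set())

    if __name__ == '__main__':
        unittest.main()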