How to improve your search with os.walk and fnmatch

I use os.walk and fnmatch with filters to search my PC's hard drive for all image files. This works, but it is very slow: it takes about 9 minutes to find roughly 70,000 images.

Any ideas on optimizing this code to work faster? Any other suggestions?

I am using Python 2.7.2, by the way.

    import fnmatch
    import os

    images = ['*.jpg', '*.jpeg', '*.png', '*.tif', '*.tiff']
    matches = []
    for root, dirnames, filenames in os.walk("C:\\"):
        for extension in images:
            for filename in fnmatch.filter(filenames, extension):
                matches.append(os.path.join(root, filename))
3 answers

I am not one of those regular-expression maniacs who reach for the re hammer to solve every problem, but in my tests this actually ran about twice as fast as your fnmatch version:

    import os
    import re

    matches = []
    img_re = re.compile(r'.+\.(jpg|png|jpeg|tif|tiff)$', re.IGNORECASE)
    for root, dirnames, filenames in os.walk(r"C:\windows"):
        matches.extend(os.path.join(root, name)
                       for name in filenames if img_re.match(name))
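As a quick sanity check that the compiled regex selects the same names as the fnmatch patterns, here is a minimal comparison on an invented list of filenames (the names are hypothetical; lower-casing before fnmatchcase mimics Windows' case-insensitive matching on other platforms):

```python
import fnmatch
import re

# Hypothetical sample of filenames standing in for one os.walk() listing
names = ['photo.JPG', 'scan.tiff', 'notes.txt', 'archive.png.bak', 'pic.jpeg']

img_re = re.compile(r'.+\.(jpg|png|jpeg|tif|tiff)$', re.IGNORECASE)
patterns = ['*.jpg', '*.jpeg', '*.png', '*.tif', '*.tiff']

by_regex = [n for n in names if img_re.match(n)]

# Lower-case first so the comparison is case-insensitive on every OS,
# the way fnmatch.fnmatch behaves on Windows
by_fnmatch = [n for n in names
              if any(fnmatch.fnmatchcase(n.lower(), p) for p in patterns)]

print(by_regex)    # ['photo.JPG', 'scan.tiff', 'pic.jpeg']
print(by_fnmatch)  # same list
```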

Your Python looks fine to me.

You can experiment with

    for root, dirnames, filenames in os.walk("C:\\"):
        for extension in extensions:
            matches.extend(os.path.join(root, filename)
                           for filename in fnmatch.filter(filenames, extension))

If that makes no difference (and I suspect it won't), then your hard drive is the bottleneck in this process (remember: disk == slow, and you are iterating over and listing the files of every directory on your system).

If the hard drive is the bottleneck, timing a few dir /s ... commands should confirm it: they should not be extravagantly faster than the Python solution.
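For what it's worth, the extend-with-a-generator variant above builds exactly the same list as the original append loop; here is a minimal in-memory check (the root path and directory listing are invented):

```python
import fnmatch
import os

# Invented stand-ins for one directory visited by os.walk()
root = 'C:\\pics'
filenames = ['a.jpg', 'b.txt', 'c.png', 'd.jpeg']
extensions = ['*.jpg', '*.jpeg', '*.png']

# Original approach: append one path at a time
appended = []
for extension in extensions:
    for filename in fnmatch.filter(filenames, extension):
        appended.append(os.path.join(root, filename))

# Variant: extend with a generator expression
extended = []
for extension in extensions:
    extended.extend(os.path.join(root, filename)
                    for filename in fnmatch.filter(filenames, extension))

print(appended == extended)  # True
```

The win is mostly stylistic; either way the loop is dominated by disk I/O, not list growth.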

    import os

    extns = ('.jpg', '.jpeg', '.png', '.tif', '.tiff')
    matches = []
    for root, dirnames, fns in os.walk("C:\\"):
        matches.extend(os.path.join(root, fn)
                       for fn in fns if fn.lower().endswith(extns))
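The trick that makes this single test work is that str.endswith accepts a tuple of suffixes (since Python 2.5), so one call covers every extension; a small illustration with made-up names:

```python
extns = ('.jpg', '.jpeg', '.png', '.tif', '.tiff')

# str.endswith accepts a tuple of suffixes, so one call checks them all;
# lower() makes the test case-insensitive
print('Photo.JPG'.lower().endswith(extns))   # True
print('notes.txt'.lower().endswith(extns))   # False
```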

Source: https://habr.com/ru/post/1413254/
