How can I search subfolders using glob.glob?

I want to go through a series of subfolders in a folder, find the text files in them, and print a few lines from each file. I am using this:

    configfiles = glob.glob('C:/Users/sam/Desktop/file1/*.txt')

But this cannot reach the subfolders. Does anyone know how I can use the same command to access subfolders as well?

+85
python filesystems glob fnmatch
Feb 10 '13 at 13:27
10 answers

In Python 3.5 and later, use the new recursive **/ functionality:

 configfiles = glob.glob('C:/Users/sam/Desktop/file1/**/*.txt', recursive=True) 

When recursive is set, ** followed by a path separator matches zero or more subdirectories.
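To see the "zero or more subdirectories" rule in action, here is a minimal sketch that builds a throwaway tree and globs it recursively. The helper names (make_sample_tree, recursive_txt) and the file names are made up for illustration; glob.escape() is only there so that special characters in the temporary path are not read as wildcards.

```python
import glob
import os
import tempfile


def make_sample_tree(root):
    """Create root/top.txt, root/a/one.txt and root/a/b/two.txt (hypothetical names)."""
    os.makedirs(os.path.join(root, "a", "b"))
    for parts in (("top.txt",), ("a", "one.txt"), ("a", "b", "two.txt")):
        with open(os.path.join(root, *parts), "w") as fh:
            fh.write("sample\n")


def recursive_txt(root):
    """Return all .txt files under root, as sorted paths relative to root."""
    pattern = os.path.join(glob.escape(root), "**", "*.txt")
    found = glob.glob(pattern, recursive=True)
    return sorted(os.path.relpath(p, root) for p in found)


with tempfile.TemporaryDirectory() as tmp:
    make_sample_tree(tmp)
    demo_result = recursive_txt(tmp)

# top.txt is included too, because ** may match zero directories
for rel in demo_result:
    print(rel)
```

The file directly in the root folder is returned alongside the nested ones, which is the "zero" part of the rule.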

In earlier versions of Python, glob.glob() could not recursively list files in subdirectories.

In this case, I would use os.walk() in combination with fnmatch.filter():

    import os
    import fnmatch

    path = 'C:/Users/sam/Desktop/file1'
    configfiles = [os.path.join(dirpath, f)
                   for dirpath, dirnames, files in os.walk(path)
                   for f in fnmatch.filter(files, '*.txt')]

This walks your directories recursively and returns the full paths of all matching .txt files (absolute here, because path is absolute). In this particular case fnmatch.filter() is arguably overkill; you can also use .endswith():

    import os

    path = 'C:/Users/sam/Desktop/file1'
    configfiles = [os.path.join(dirpath, f)
                   for dirpath, dirnames, files in os.walk(path)
                   for f in files if f.endswith('.txt')]
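
Since the question also asks for a few lines of each text file, the os.walk() approach could be extended along these lines. This is only a sketch: head_of_txt_files, n_lines and the sample file notes.txt are hypothetical names, and UTF-8 with replacement of undecodable bytes is an assumption about the files' encoding.

```python
import itertools
import os
import tempfile


def head_of_txt_files(root, n_lines=3):
    """Map each .txt path under root to a list of its first n_lines lines."""
    heads = {}
    for dirpath, dirnames, files in os.walk(root):
        for name in files:
            if name.endswith(".txt"):
                full_path = os.path.join(dirpath, name)
                # Assumed encoding; undecodable bytes are replaced, not fatal
                with open(full_path, encoding="utf-8", errors="replace") as fh:
                    heads[full_path] = [
                        line.rstrip("\n") for line in itertools.islice(fh, n_lines)
                    ]
    return heads


# Demo on a throwaway tree with one text file in a subfolder
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    sample = os.path.join(tmp, "sub", "notes.txt")
    with open(sample, "w", encoding="utf-8") as fh:
        fh.write("one\ntwo\nthree\nfour\n")
    demo_heads = head_of_txt_files(tmp, n_lines=2)
    demo_lines = demo_heads[sample]

print(demo_lines)
```

itertools.islice() stops reading after n_lines lines, so large files are not loaded in full.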
+128
Feb 10 '13 at

To find files in direct subdirectories:

 configfiles = glob.glob(r'C:\Users\sam\Desktop\*\*.txt') 

For a recursive version that traverses all subdirectories, use ** and pass recursive=True (available since Python 3.5):

 configfiles = glob.glob(r'C:\Users\sam\Desktop\**\*.txt', recursive=True) 

Both function calls return lists. You can use glob.iglob() to return paths one by one. Or use pathlib:

    from pathlib import Path

    path = Path(r'C:\Users\sam\Desktop')
    txt_files_only_subdirs = path.glob('*/*.txt')
    txt_files_all_recursively = path.rglob('*.txt')  # including the current dir

Both methods return iterators (you can get the paths one by one).
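
The difference between the two pathlib calls is easiest to see on a small tree. The sketch below uses a temporary directory with made-up file names, and converts the results to sorted, root-relative POSIX-style strings purely to make them easy to compare.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub" / "deep").mkdir(parents=True)
    for rel in ("top.txt", "sub/one.txt", "sub/deep/two.txt"):
        (root / rel).write_text("sample\n")

    # glob('*/*.txt'): exactly one directory level below root
    only_subdirs = sorted(
        p.relative_to(root).as_posix() for p in root.glob("*/*.txt")
    )
    # rglob('*.txt'): root itself plus every level below it
    all_levels = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*.txt")
    )

print(only_subdirs)
print(all_levels)
```

Both iterators are consumed inside the with block, before the temporary directory is removed.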

+17
Feb 10 '13 at

The glob2 package supports recursive wildcards and is reasonably fast:

    import timeit

    code = '''
    import glob2
    glob2.glob("files/*/**")
    '''
    timeit.timeit(code, number=1)

On my laptop it takes about 2 seconds to match more than 60,000 file paths.

+17
Mar 13 '14 at 19:10

You can use Formic with Python 2.6:

    import formic
    fileset = formic.FileSet(include="**/*.txt", directory="C:/Users/sam/Desktop/")

Disclosure - I am the author of this package.

+8
Feb 12 '13 at 1:12

Here is an adapted version that provides glob.glob-like functionality without using glob2.

    import fnmatch
    import os

    def find_files(directory, pattern='*'):
        if not os.path.exists(directory):
            raise ValueError("Directory not found {}".format(directory))

        matches = []
        for root, dirnames, filenames in os.walk(directory):
            for filename in filenames:
                full_path = os.path.join(root, filename)
                # Match the pattern against the full path, not just the file name
                if fnmatch.fnmatch(full_path, pattern):
                    matches.append(full_path)
        return matches

So if you have the following dir structure

    tests/files
    ├── a0
    │   ├── a0.txt
    │   ├── a0.yaml
    │   └── b0
    │       ├── b0.yaml
    │       └── b00.yaml
    └── a1

You can do something like this

    files = utils.find_files('tests/files', '**/b0/b*.yaml')
    # ['tests/files/a0/b0/b0.yaml', 'tests/files/a0/b0/b00.yaml']

Note that the fnmatch pattern is matched against the whole path, not just the file name. Because * in fnmatch also matches path separators, ** behaves here like a single *.
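
How fnmatch treats a full path can be checked without touching the file system. A small sketch, using the example path from the directory tree above; the variable names are made up for illustration:

```python
import fnmatch

path = "tests/files/a0/b0/b0.yaml"

# fnmatch does not treat '/' specially, so '**' and '*' both span directories
double_star = fnmatch.fnmatch(path, "**/b0/b*.yaml")
single_star = fnmatch.fnmatch(path, "*/b0/b*.yaml")

# The pattern must match the whole string from its start, so a bare
# file-name pattern fails against a full path
name_only = fnmatch.fnmatch(path, "b*.yaml")

print(double_star, single_star, name_only)
```

This is why patterns given to a full-path matcher need a leading wildcard for the directory part.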

+3
Mar 26 '15 at 2:20

    configfiles = glob.glob('C:/Users/sam/Desktop/**/*.txt')

This does not work in all cases (without recursive=True, the built-in glob treats ** like a single *); use glob2 instead:

    configfiles = glob2.glob('C:/Users/sam/Desktop/**/*.txt')
+2
May 26 '16 at 9:31

If you can install the glob2 package ...

    import glob2

    filenames = glob2.glob("C:\\top_directory\\**\\*.ext")  # where ext is a specific file extension
    folders = glob2.glob("C:\\top_directory\\**\\")

All file names and folders:

 all_ff = glob2.glob("C:\\top_directory\\**\\**") 
+2
Jul 26 '16 at 15:05

If you are using Python 3.4+, you can use the pathlib module. The Path.glob() method supports the ** pattern, which means "this directory and all subdirectories, recursively." It returns a generator giving Path objects for all relevant files.

    from pathlib import Path

    configfiles = Path("C:/Users/sam/Desktop/file1/").glob("**/*.txt")
+2
Jun 29 '17 at 23:07

As Martijn pointed out, glob can only do this through the ** operator introduced in Python 3.5. Since the OP explicitly asked for the glob module, the following returns a lazily evaluated iterator that behaves similarly:

    import glob
    import itertools
    import os

    configfiles = itertools.chain.from_iterable(
        glob.iglob(os.path.join(root, '*.txt'))
        for root, dirs, files in os.walk('C:/Users/sam/Desktop/file1/')
    )

Note that with this approach you can iterate over configfiles only once. If you need a real list of config files that can be used in several operations, create one explicitly with list(configfiles).
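
The single-pass behaviour is easy to demonstrate. Below is a minimal sketch on a temporary tree; iter_txt and the file names are hypothetical, and glob.escape() is an addition so that special characters in directory names are not read as wildcards.

```python
import glob
import itertools
import os
import tempfile


def iter_txt(root):
    """Lazily yield .txt paths under root, one glob.iglob() call per directory."""
    return itertools.chain.from_iterable(
        glob.iglob(os.path.join(glob.escape(dirpath), "*.txt"))
        for dirpath, dirnames, files in os.walk(root)
    )


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for parts in (("top.txt",), ("sub", "one.txt")):
        with open(os.path.join(tmp, *parts), "w") as fh:
            fh.write("sample\n")

    configfiles = iter_txt(tmp)
    first_pass = len(list(configfiles))   # consumes the iterator
    second_pass = len(list(configfiles))  # nothing is left to yield

    # Materialise once if the paths are needed several times
    as_list = list(iter_txt(tmp))

print(first_pass, second_pass, len(as_list))
```

The second pass finds nothing because the chained iterator is already exhausted, while the list can be reused freely.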

0
Dec 05 '15 at 23:45

There is a lot of confusion on this topic. Let me see if I can clarify it (Python 3.7):

  1. glob.glob('*.txt') : matches all files ending in '.txt' in the current directory
  2. glob.glob('*/*.txt') : matches all files ending in '.txt' in direct subdirectories only, but not in the current directory
  3. glob.glob('**/*.txt') : same as 2 (without recursive=True, ** behaves like *)
  4. glob.glob('*.txt', recursive=True) : same as 1
  5. glob.glob('*/*.txt', recursive=True) : same as 2
  6. glob.glob('**/*.txt', recursive=True) : matches all files ending in '.txt' in the current directory and in all subdirectories

Therefore, it is best to always specify recursive=True when the pattern contains **.
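
The six cases can be checked empirically rather than taken on trust. The sketch below runs each pattern inside a temporary tree; PATTERNS, run_cases and the file names are made up for illustration, and results are normalised to forward slashes so they read the same on every platform.

```python
import glob
import os
import tempfile

# (pattern, recursive) pairs, in the same order as the numbered list
PATTERNS = [
    ("*.txt", False),
    ("*/*.txt", False),
    ("**/*.txt", False),
    ("*.txt", True),
    ("*/*.txt", True),
    ("**/*.txt", True),
]


def run_cases(root):
    """Run every case with root as the current directory; return sorted results."""
    previous = os.getcwd()
    os.chdir(root)
    try:
        return [
            sorted(p.replace(os.sep, "/") for p in glob.glob(pattern, recursive=rec))
            for pattern, rec in PATTERNS
        ]
    finally:
        # Always restore the working directory, even if a glob call fails
        os.chdir(previous)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub", "deep"))
    for parts in (("top.txt",), ("sub", "one.txt"), ("sub", "deep", "two.txt")):
        with open(os.path.join(tmp, *parts), "w") as fh:
            fh.write("sample\n")
    results = run_cases(tmp)

for (pattern, rec), found in zip(PATTERNS, results):
    print(f"{pattern!r:12} recursive={rec!s:5} -> {found}")
```

On this tree, only the last case reaches the file two levels down; the patterns with a single directory component return just the file in the direct subdirectory.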

0
Jul 14 '19 at 6:13