Recursively matching file names with a glob argument

I am trying to get a list of files matching the glob pattern given in a command-line argument (sys.argv[1]) recursively, using glob.glob and os.walk. The problem is that bash (and many other shells) automatically expand glob patterns into file names.

How do standard Unix programs (e.g. grep -R) do this? I understand that they are not written in Python, but if this happens at the shell level, that shouldn't matter, should it? Is there any way for the script to tell the shell not to auto-expand glob patterns? It seems that set -f disables globbing, but I'm not sure how to run it early enough, so to speak.

I have seen "Using Glob() to search for files recursively in Python?", but it does not cover actually getting glob patterns from command-line arguments.

Thanks!

Edit:

The grep-like Perl script ack accepts a Perl regular expression as one of its arguments. Thus, ack .* prints every line of every file. But .* should expand to all hidden files in the directory. I tried reading the script, but I don't know Perl; how does it do that?

+4
3 answers

The shell performs glob expansion before it even gets to invoking the command. Programs like grep do nothing to prevent globbing: they cannot. You, as the caller of these programs, must tell the shell that you want to pass special characters such as * and ? through to the program, and prevent the shell from interpreting them. You do this by putting them in quotes:

 grep -E 'ba(na)* split' *.txt 

(look for ba split, bana split, etc. in all files called <something>.txt). In this case, either single quotes or double quotes will do the trick. Between single quotes, the shell expands nothing. Between double quotes, $, ` and \ are still interpreted. You can also protect a single character from shell expansion by preceding it with a backslash. Wildcards are not the only characters that need protecting; for example, the space in the pattern above is inside the quotes, so it is part of the argument to grep rather than an argument separator. Alternative ways of writing the above snippet include

 grep -E "ba(na)* split" *.txt
 grep -E ba\(na\)\*\ split *.txt
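To see that the three spellings really hand grep the same arguments, here is a minimal Python sketch. It uses shlex.split from the standard library, which follows POSIX-style quoting rules for word splitting and quote removal only; it does not perform glob expansion, so *.txt stays literal here, whereas a real shell would go on to expand it.

```python
import shlex

# The three spellings from the answer: single quotes, double quotes, backslashes.
commands = [
    "grep -E 'ba(na)* split' *.txt",
    'grep -E "ba(na)* split" *.txt',
    r"grep -E ba\(na\)\*\ split *.txt",
]

# shlex.split only tokenizes and strips quoting; it never touches the file system.
parsed = [shlex.split(command) for command in commands]

for argv in parsed:
    print(argv)
```

In each case the pattern, space included, arrives as a single argument.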

In most shells, if an argument contains wildcards but the pattern does not match any file, the pattern is left unchanged and passed to the underlying command. So a command like

 grep b[an]*a *.txt 

has a different effect depending on which files are present on the system. If the current directory does not contain a file whose name begins with b, the command looks for the pattern b[an]*a in files whose name matches *.txt. If the current directory contains files named baclava, bnm.txt and hello.txt, the command expands to grep baclava bnm.txt hello.txt, so it searches for the pattern baclava in the two files bnm.txt and hello.txt. Needless to say, it is a bad idea to rely on this in scripts; on the command line, it can sometimes save typing, but it is risky.
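The "expand if something matches, otherwise pass the pattern through" rule can be imitated in a few lines of Python. This is only a rough sketch: expand_like_shell and the file names are hypothetical, the directory listing is passed in as a plain list, and fnmatch only approximates shell pattern syntax.

```python
import fnmatch


def expand_like_shell(words, names):
    """Imitate the shell's default rule: replace a word by the sorted names
    it matches, or keep the word unchanged if it matches nothing."""
    argv = []
    for word in words:
        matches = sorted(
            name
            for name in names
            if fnmatch.fnmatchcase(name, word)
            # Like the shell, only match dot files when the pattern starts with a dot.
            and (word.startswith(".") or not name.startswith("."))
        )
        argv.extend(matches or [word])
    return argv


command = ["grep", "b[an]*a", "*.txt"]

# Hypothetical directory with a file that happens to match the unquoted pattern.
print(expand_like_shell(command, ["banana", "notes.txt", "todo.txt"]))

# Same command in a directory where nothing matches b[an]*a.
print(expand_like_shell(command, ["notes.txt", "todo.txt"]))
```

The first call turns the intended regular expression into the file name banana; the second passes it through untouched, which is exactly why the unquoted form is unreliable.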

When you run ack .* in a directory that contains no dot file, the shell runs ack . .. instead. The ack command then prints all non-empty lines (the pattern . matches any one character) in all files under .. (the parent of the current directory), recursively. Contrast this with ack '.*', which searches for the pattern .* (which matches anything) in the current directory and its subdirectories (because of how ack behaves when you pass no file name argument).

+6

As for grep, it simply accepts a list of file names and does not perform glob expansion itself. If you really need to pass a pattern as an argument, it must be quoted on the command line with single quotes. But before you do this, consider letting the shell do the job it was designed for.

+1

Yes, with set -f you're on the right track.

It looks like you're going to have to call your Python program through a wrapper.

Whenever you use a shell to issue a command, it scans the command line and processes wildcards, command substitution, and a whole bunch of other things.

So you must turn off globbing before starting the program on the command line:

 set -f
 echo *
 *
 myprogram *.txt

will pass the string '*.txt' to your program. Then you can do the globbing inside the program to retrieve your files.
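Once the pattern reaches the program unexpanded, the recursive matching the question asks about could look roughly like the sketch below. It combines os.walk with fnmatch.filter; find_matching and the throwaway directory tree are hypothetical names used only for illustration.

```python
import fnmatch
import os
import tempfile


def find_matching(root, pattern):
    """Return the paths of all files under root whose name matches pattern."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # fnmatch.filter keeps only the names that match the glob pattern.
        for name in fnmatch.filter(filenames, pattern):
            found.append(os.path.join(dirpath, name))
    return sorted(found)


# Demonstration on a temporary tree, so the sketch has no outside dependencies.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub", "deeper"))
    for rel in ("a.txt", "b.log",
                os.path.join("sub", "c.txt"),
                os.path.join("sub", "deeper", "d.txt")):
        open(os.path.join(tmp, rel), "w").close()

    demo = [os.path.relpath(path, tmp) for path in find_matching(tmp, "*.txt")]
    print(demo)
```

In a real script you would call find_matching(".", sys.argv[1]) and invoke it with the pattern protected from the shell, for example with set -f in effect or with the pattern in single quotes. Note that fnmatch, unlike the shell, lets * match names starting with a dot.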

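Or you can do almost the same thing by creating a shell script

 #!/bin/bash
 set -f
 myProgram "${@}"

where ${@} stands for the arguments you pass in when you start the script, either from the command line, from crontab, or via exec(...) from another process. The double quotes keep each argument intact. Note that set -f inside the script only affects commands the script itself runs: arguments typed at an interactive shell are expanded before the script starts, unless they are quoted or set -f is already in effect in that shell.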
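The exec(...) case is worth a small illustration: when one process starts another from an argument list, no shell is involved, so nothing expands the pattern. A minimal sketch, assuming a Python parent process; the one-line child program is hypothetical and simply echoes its first argument back.

```python
import subprocess
import sys

# Hypothetical child program: print the first argument exactly as received.
child_source = "import sys; print(sys.argv[1])"

# Passing a list (and not using shell=True) bypasses the shell entirely,
# so "*.txt" reaches the child as a literal string.
result = subprocess.run(
    [sys.executable, "-c", child_source, "*.txt"],
    capture_output=True,
    text=True,
    check=True,
)

received = result.stdout.strip()
print(received)
```

With shell=True and a single command string, the pattern would go through the shell and be subject to expansion again.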

Hope this helps.

+1