How to skip a directory in awk?

Let's say I have the following file and directory structure:

 $ tree
 .
 ├── a
 ├── b
 └── dir
     └── c

 1 directory, 3 files

That is, two files a and b, plus a directory dir that contains another file, c.

I want to process all the files using awk (GNU Awk 4.1.1, to be precise), so I am doing something like this:

 $ gawk '{print FILENAME; nextfile}' * */*
 a
 b
 awk: cmd. line:1: warning: command line argument `dir' is a directory: skipped
 dir/c

Everything is fine, but * also expands to the dir directory and awk tries to process it.

So, I wonder: is there any native awk way to check whether a given element is a file and, if it is not, skip it? That is, without using system() for it.

I managed it by calling system() in BEGINFILE:

 $ gawk 'BEGINFILE{print FILENAME; if (system(" [ ! -d " FILENAME " ]")) {print FILENAME, "is a dir, skipping"; nextfile}} ENDFILE{print FILENAME, FNR}' * */*
 a
 a 10
 a.wk
 a.wk 3
 b
 b 10
 dir
 dir is a dir, skipping
 dir/c
 dir/c 10

Please also note that if (system(" [ ! -d " FILENAME " ]")) {print FILENAME, "is a dir, skipping"; nextfile} works counterintuitively: one would expect it to return 1 when true, but system() returns the command's exit code.
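To make that inversion concrete, here is a small sketch, assuming a POSIX-style shell with mktemp and any awk. The scratch directory and the variable names are made up for the demonstration. Since system() hands back the exit status, a successful test yields 0, which awk treats as false.

```shell
# Hypothetical scratch area, used only for this demonstration.
tmp=$(mktemp -d)
mkdir "$tmp/dir"
: > "$tmp/file"

# "[ -d path ]" exits with status 0 for a directory and 1 otherwise.
# system() returns that exit status unchanged, so "success" is 0 (false in awk).
status_dir=$(awk -v p="$tmp/dir" 'BEGIN { print system("[ -d \"" p "\" ]") }')
status_file=$(awk -v p="$tmp/file" 'BEGIN { print system("[ -d \"" p "\" ]") }')

echo "directory: $status_dir"   # prints "directory: 0"
echo "file:      $status_file"  # prints "file:      1"

rm -rf "$tmp"
```

This is why the command above negates the test with "!": the directory case then exits with 1, which awk sees as true. The sketch also wraps the path in double quotes so that names containing spaces survive the trip through the shell.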

I read in A.5 Extensions in gawk Not in POSIX awk:

And then the linked page says:

4.11 Directories on the command line

According to the POSIX standard, files named on the awk command line must be text files; it is a fatal error if they are not. Most versions of awk treat a directory on the command line as a fatal error.

By default, gawk produces a warning for a directory on the command line, but otherwise ignores it. This makes it easier to use shell wildcards with your awk program:

 $ gawk -f whizprog.awk *    # Directories could kill this program

If either of the --posix or --traditional options is given, then gawk reverts to treating a directory on the command line as a fatal error.

See Extension Sample Readdir for a way to treat directories as usable data from an awk program.

And indeed that is the case: the same command as before does not work with --posix:

 $ gawk --posix 'BEGINFILE{print FILENAME; if (system(" [ ! -d " FILENAME " ]")) {print FILENAME, "is a dir, skipping"; nextfile}} ENDFILE{print FILENAME, NR}' * */*
 gawk: cmd. line:1: fatal: cannot open file `dir' for reading (Is a directory)

I checked section 16.7.6 Reading Directories, which is linked above, and it talks about readdir:

The readdir extension adds an input parser for directories. Usage is as follows:

@load "readdir"

But I'm not sure how to call it, nor how to use it from the command line.
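Going by the manual's description, a command-line call would presumably look something like the sketch below. It assumes a gawk build that ships the sample extensions (the -l option loads an extension, just as @load does inside a program), and the scratch directory and variable names are hypothetical. The manual describes each directory entry as a record of the form inode/name/type, so with FS set to "/" the entry name is $2.

```shell
# Hypothetical scratch directory containing a single file "c".
tmp=$(mktemp -d)
mkdir "$tmp/dir"
: > "$tmp/dir/c"

# Probe first: not every gawk build has the readdir extension installed.
if gawk -l readdir 'BEGIN { exit 0 }' 2>/dev/null; then
    # With readdir loaded, a directory named as input yields one record per entry.
    # The "." and ".." entries are filtered out explicitly.
    names=$(gawk -l readdir 'BEGIN { FS = "/" } $2 != "." && $2 != ".." { print $2 }' "$tmp/dir")
    echo "entries: $names"
else
    names=""
    echo "readdir extension not available in this gawk"
fi

rm -rf "$tmp"
```

Note that this reads directories as data rather than skipping them, so it addresses a different need than the BEGINFILE check above.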

+6
2 answers

If you want to safeguard your script against other people mistakenly passing it a directory (or anything else that is not a readable text file), you can do this:

 $ ls -F tmp
 bar  dir/  foo

 $ cat tmp/foo
 line 1

 $ cat tmp/bar
 line 1
 line 2

 $ cat tmp/dir
 cat: tmp/dir: Is a directory

 $ cat tst.awk
 BEGIN {
     for (i=1;i<ARGC;i++) {
         if ( (getline line < ARGV[i]) <= 0 ) {
             print "Skipping:", ARGV[i], ERRNO
             delete ARGV[i]
         }
         close(ARGV[i])
     }
 }
 { print FILENAME, $0 }

 $ awk -f tst.awk tmp/*
 Skipping: tmp/dir Is a directory
 tmp/bar line 1
 tmp/bar line 2
 tmp/foo line 1

 $ awk --posix -f tst.awk tmp/*
 Skipping: tmp/dir
 tmp/bar line 1
 tmp/bar line 2
 tmp/foo line 1

Per POSIX, getline returns -1 when it fails while trying to retrieve a record from a file (for example, the file is unreadable, does not exist, or is a directory). You only need GNU awk if you care which of those failures it was, which it reports through ERRNO.
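As a rough illustration of those return values, the sketch below probes three cases with any awk. The scratch directory, the file names, and the probe variable are hypothetical and exist only for the demonstration.

```shell
# Hypothetical scratch files: one with content, one empty, one missing.
tmp=$(mktemp -d)
printf 'line 1\n' > "$tmp/foo"
: > "$tmp/empty"

# "getline var < file" yields 1 when a record was read, 0 at end of file,
# and -1 when the file could not be opened or read.
probe='BEGIN { print (getline line < ARGV[1]) }'
rc_text=$(awk "$probe" "$tmp/foo")
rc_empty=$(awk "$probe" "$tmp/empty")
rc_missing=$(awk "$probe" "$tmp/no-such-file")

echo "text file:    $rc_text"     # prints "text file:    1"
echo "empty file:   $rc_empty"    # prints "empty file:   0"
echo "missing file: $rc_missing"  # prints "missing file: -1"

rm -rf "$tmp"
```

Because tst.awk tests for <= 0, it also drops empty files from ARGV, which is harmless since they would contribute no records anyway.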

+2

I would simply avoid passing directories to awk, since even POSIX says that all the files named on the command line must be text files.

You can use find to traverse the directory:

 find PATH -type f -exec awk 'program' {} + 
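A concrete run of that pattern might look like the following sketch, which rebuilds a layout like the one in the question inside a scratch directory. The directory, the one-line file contents, and the variable name are assumptions made for the demonstration.

```shell
# Hypothetical scratch tree: files a and b, plus dir/ containing c.
tmp=$(mktemp -d)
mkdir "$tmp/dir"
printf 'x\n' > "$tmp/a"
printf 'x\n' > "$tmp/b"
printf 'x\n' > "$tmp/dir/c"

# -type f keeps only regular files, so awk never receives "dir" itself.
# "{} +" passes many file names to a single awk invocation.
# find's output order is unspecified, hence the sort.
seen=$(cd "$tmp" && find . -type f -exec awk 'FNR == 1 { print FILENAME }' {} + | LC_ALL=C sort)
echo "$seen"

rm -rf "$tmp"
```

One caveat: with a very large number of files, find may split them across several awk invocations, so BEGIN and END blocks run once per batch and NR restarts for each one.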
+5
