Let's say I have the following file and directory structure:
$ tree
.
├── a
├── b
└── dir
    └── c

1 directory, 3 files
That is, two files a and b together with a directory dir, which contains another file c.
I want to process all files using awk (GNU Awk 4.1.1, to be precise), so I am doing something like this:
$ gawk '{print FILENAME; nextfile}' * */*
Everything is fine, but * also expands to the dir directory and awk tries to process it.
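To make the expansion visible, here is a minimal sketch that rebuilds the layout in a scratch directory and prints every word the shell hands to the command. The scratch directory from mktemp and the variable names demo and expanded are only assumptions for this illustration; they are not part of my real setup.

```shell
# Hypothetical scratch directory reproducing the layout from the question.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir dir
touch a b dir/c

# Print each word the shell passes to the command, one per line.
# The directory itself shows up next to the regular files.
expanded=$(printf '%s\n' * */*)
printf '%s\n' "$expanded"
```

The third word printed is dir itself, which is the argument awk then tries to open as an input file.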
So, I wonder: is there any native awk way to check whether a given argument is a file and, if it is not, skip it? That is, without using system() for it.
I made it work by calling the external system() in BEGINFILE:
$ gawk 'BEGINFILE{print FILENAME; if (system(" [ ! -d " FILENAME " ]")) {print FILENAME, "is a dir, skipping"; nextfile}} ENDFILE{print FILENAME, FNR}' * */*
Please also note that if (system(" [ ! -d " FILENAME " ]")) {print FILENAME, "is a dir, skipping"; nextfile} works counterintuitively: one would expect system() to return 1 when the test is true, but it returns the command's exit code, which is 0 on success.
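A small sketch of that exit-code behaviour, in case it helps. The scratch directory, the variable names and the fallback to plain awk when gawk is not installed are assumptions made only for this example; the path is also passed with -v and quoted inside the command, which my one-liner above does not do.

```shell
# Use gawk when available; falling back to plain awk is an assumption of this sketch.
AWK=$(command -v gawk || command -v awk)

# Hypothetical scratch layout: one directory and one regular file.
demo=$(mktemp -d)
mkdir "$demo/dir"
touch "$demo/a"

# system() returns the exit status of the command:
# 0 when the [ -d ... ] test succeeds, non-zero when it fails.
status_dir=$("$AWK" -v p="$demo/dir" 'BEGIN { print system("[ -d \"" p "\" ]") }')
status_file=$("$AWK" -v p="$demo/a" 'BEGIN { print system("[ -d \"" p "\" ]") }')

echo "dir: $status_dir, file: $status_file"
```

So in an awk condition the result has to be read the other way round: if (system(...)) is taken when the shell test fails, which is why my command above negates the test with ! -d.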
I read in A.5 Extensions in gawk Not in POSIX awk:
And then the linked page says:
4.11 Directories on the command line
According to the POSIX standard, files named on the awk command line must be text files; it is a fatal error if they are not. Most versions of awk treat a directory on the command line as a fatal error.
By default, gawk produces a warning for a directory on the command line, but otherwise ignores it. This makes it easier to use shell wildcards with your awk program:
$ gawk -f whizprog.awk * Directories could kill this program
If either of the --posix or --traditional options is given, then gawk reverts to treating a directory on the command line as a fatal error.
See Extension Sample Readdir for a way to treat directories as usable data from an awk program.
And indeed that is the case: the same command as before fails with --posix:
$ gawk --posix 'BEGINFILE{print FILENAME; if (system(" [ ! -d " FILENAME " ]")) {print FILENAME, "is a dir, skipping"; nextfile}} ENDFILE{print FILENAME, NR}' * */*
gawk: cmd. line:1: fatal: cannot open file `dir' for reading (Is a directory)
I checked section 16.7.6 Reading Directories, which is linked above, and it talks about readdir:
The readdir extension adds an input parser for directories. Usage is as follows:
@load "readdir"
But I'm not sure how to call it, nor how to use it from the command line.