How to iterate over null-separated results in a non-bash shell

OK, so I have a script that handles the null-separated output of find, and I can handle this easily in bash, for example:

    #!/bin/bash
    find "$1" -print0 | while read -rd '' path; do
        echo "$path"
    done

A pretty stupid example, as it converts the results back to newlines anyway, but it should give you an idea of what I'm looking for. This basic method works great and avoids potential problems with file names that may contain newlines, whatever file system they live on.

However, I need to do the same in a plain (non-bash) shell, which means I lose read -d support. So, without resorting to features specific to bash (or other shells), is there a way to process null-separated results like the above?

If not, what is the best way to protect against newlines in the results? I thought I could use find's -exec option to replace newlines in file names with some escaped value, but I'm not sure how to find and replace newlines (I can't use tr, for example) or what replacement to use, which is why null characters are the best option when available.

+8
Tags: bash, shell
6 answers

See “How can I find and safely handle file names containing newlines, spaces, or both?”.

You can, for example, use find -exec :

 find [...] -exec <command> {} \; 

or xargs -0 :

 find [...] -print0 | xargs -r0 <command> 
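As a quick sanity check of the xargs -0 pipeline above (the file names here are synthetic, invented just for the demo), you can confirm that a null-delimited stream keeps a name containing a space and even an embedded newline together as one argument:

```shell
# Build a null-delimited list of two awkward names and let xargs -0
# report how many arguments the child shell receives.
count=$(printf 'a b.txt\0c\nd.txt\0' | xargs -0 sh -c 'echo $#' argv0)
echo "$count"   # 2: neither the space nor the newline split anything
```

The argv0 word only fills $0 of the child shell; the two file names arrive intact as $1 and $2.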

Note that in the example above you still need to set IFS, or leading/trailing whitespace will be trimmed:

    while IFS= read -rd '' file; do
        do_something_with "${file}"
    done

You are right, it's a real bummer that this read invocation only works correctly in bash. I usually don't worry much about possible newlines in file names and just make sure portable code won't break if they occur (as opposed to ignoring the problem and having the script blow up), which in my opinion is sufficient for most scenarios, e.g.

    while IFS= read -r file; do
        [ -e "${file}" ] || continue # skip over truncated filenames due to newlines
        do_something_file "${file}"
    done < <(find [...])

or use globbing (if possible), which behaves correctly:

    for file in *.foo; do
        [ -e "${file}" ] || continue # or use nullglob
        do_something_file "${file}"
    done
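A tiny self-contained sketch of the globbing pattern (the scratch directory and file names are invented for illustration): even a name with a literal newline is matched as a single entry, and the existence guard covers the non-matching-glob case.

```shell
# Create two .foo files, one with a newline in its name, then count
# them with a glob plus the [ -e ] guard from the answer above.
dir=$(mktemp -d)
nl='
'
: > "$dir/a.foo"
: > "$dir/b${nl}c.foo"
count=0
for file in "$dir"/*.foo; do
    [ -e "$file" ] || continue   # guard: glob may not match anything
    count=$((count + 1))
done
echo "$count"   # 2
rm -rf "$dir"
```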
+7

Adding to @Adrian Frühwirth's excellent answer:

Here is a strictly POSIX-compliant solution, both in terms of the shell code and the utilities and options it uses:

    find . -exec sh -c 'for f in "$@"; do echo "$f"; done' - {} +

This avoids both find -print0 and read -d .
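A quick way to convince yourself this really survives awkward names (scratch directory and file name invented for the demo): create a file whose name contains a literal newline and count the arguments the embedded sh receives.

```shell
# One file whose name contains a real newline character.
dir=$(mktemp -d)
nl='
'
: > "$dir/has${nl}newline.txt"
# The POSIX-only find -exec sh -c idiom sees exactly one argument,
# not two broken halves; '-' fills $0 of the embedded shell.
nfiles=$(find "$dir" -type f -exec sh -c 'echo $#' - {} +)
echo "$nfiles"   # 1
rm -rf "$dir"
```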

(There is a hypothetical chance that your shell code will be invoked more than once, namely when there are so many input file names that they don't fit on a single command line.
getconf ARG_MAX reports your platform's maximum command-line length for invoking external utilities, but note that in practice the limit is lower; see http://www.in-ulm.de/~mascheck/various/argmax/ )

+6

Regarding the question title, “Iterating over null-separated results in a non-bash shell”: so far, most answers have offered solutions specific to find . -print0, actually bypassing iteration over a list of null-separated strings (e.g. find . -exec ... or shell globbing).

The files /proc/<pid>/environ and /proc/<pid>/cmdline are good (Linux) examples that genuinely require iterating over a list of null-terminated strings. The only solution that works correctly in a POSIX shell (e.g. dash), AFAIK, uses xargs -0 (or similar tools like parallel -0), as mentioned in the answers by Adrian Frühwirth and FatalError:

    #!/bin/sh
    xargs -0 sh -c 'for i; do printf "%s\n" "$i"; done' my_cmd </proc/1/environ

The above example requires that it be run as root. It also works for strings containing newlines and other special characters.
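If you want to try the same loop without root or /proc, you can feed it a synthetic null-delimited stream instead (the variable names FOO and BAZ are invented for the demo):

```shell
# Two fake environment entries, null-terminated, piped through the
# same xargs -0 loop as above; argv0 fills $0 of the child shell.
out=$(printf 'FOO=bar\0BAZ=qux\0' \
    | xargs -0 sh -c 'for i; do printf "%s\n" "$i"; done' argv0)
echo "$out"
```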

+3

One thing you can do is use the xargs -0 option to pass arguments to another shell, for example:

    $ find . -print0 | xargs -0 sh -c 'for f in "$@"; do echo "$f"; done' sh
+2

Adrian Frühwirth's answer is definitely the most correct and complete, but for those interested in this issue, I just wanted to share the code I ended up using:

    NL=$'\n'

    read_path() {
        path=
        IFS=
        while [ -z "$path" ]; do
            read -r path || return $?
            while [ ! -e "$path" ]; do
                read -r path_next || { path=; return $?; }
                [ "${path_next:0:6}" != '~:/\/:' -o ! -e "$path_next" ] \
                    && path="$path$NL$path_next" \
                    || path="$path_next"
            done
        done
    }

This works when you run find as follows:

 find . -exec printf '~:/\/:%s\n' {} \; | while read_path; do echo "$path"; done 

Since the string added at the beginning of each result should never appear in actual file names (if there is a simpler string, let me know!), it should be safe to use it when deciding whether to join results into a single line.
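One portability note: the ${path_next:0:6} substring slice used above is itself a bashism. A POSIX way to test for the sentinel prefix is a case pattern; a minimal sketch (the helper name has_sentinel and the sample paths are invented):

```shell
# Check whether a line begins with the '~:/\/:' sentinel using a
# POSIX case pattern instead of bash's ${var:0:6} substring slice.
has_sentinel() {
    case $1 in
        '~:/\/:'*) return 0 ;;
        *)         return 1 ;;
    esac
}
has_sentinel '~:/\/:some/path' && a=yes || a=no
has_sentinel 'plain/path'      && b=yes || b=no
echo "$a $b"   # yes no
```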

I am going to use this in combination with a test for -print0 and read -d support, so that I can use those for simplicity where possible, but the above should be safe, or at least has worked in all the environments I have tested so far, and it seems to do the job when I cannot use a nicer method; for example, when I cannot use globbing because I need more specific results from find or ls.

0

1. Use zsh

The simplest solution is to use zsh: a non-bash shell that supports reading null-separated values with read -d "" (since version 4.2, released in 2004), and the only major shell that can store null bytes in variables. Moreover, in zsh the last component of a pipeline does not run in a subshell, so variables set there are not lost. We can simply write:

    #!/usr/bin/env zsh
    find . -print0 | while IFS="" read -r -d "" file; do
        echo "$file"
    done

With zsh we can also easily avoid the null-delimiter problem altogether (at least for the plain find . case) by using setopt globdots, which makes globs match hidden files, and **, which recurses into subdirectories. This works in almost all versions of zsh, even those older than 4.2:

    #!/usr/bin/env zsh
    setopt globdots
    for file in **/*; do
        echo "$file"
    done

2. Use the POSIX shell and od

2.1 Use pipes

A generic POSIX-compliant solution for iterating over null-separated values has to transform the input so that no information is lost and the null bytes are converted into something easier to handle. We can use od to print the octal values of all input bytes and then easily convert the data back with printf:

    #!/usr/bin/env sh
    find . -print0 | od -An -vto1 | xargs printf ' %s' \
        | sed 's/ 000/@/g' | tr @ '\n' \
        | while IFS="" read -r file; do
            file=`printf '\134%s' $file`
            file=`printf "$file@"`
            file="${file%@}"
            echo "$file"
        done
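To see the encoding round-trip in isolation, here is a minimal sketch on a two-byte synthetic input (no find involved; the variable names are invented): encode "hi" plus its terminating NUL, then decode the octal record back with the same printf trick.

```shell
# Encode: od prints one octal value per byte, xargs joins them with
# single spaces, sed turns each ' 000' (NUL) into '@', tr makes '@'
# a record separator.
rec=$(printf 'hi\0' | od -An -vto1 | xargs printf ' %s' \
    | sed 's/ 000/@/g' | tr @ '\n')
# Decode: prefix each octal value with a backslash (\134), then let
# printf interpret the \NNN escapes; '@' guards trailing newlines.
esc=$(printf '\134%s' $rec)   # intentionally unquoted: one escape per byte
decoded=$(printf "$esc@")
decoded=${decoded%@}
echo "$decoded"   # hi
```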

2.2 Use a variable to store intermediate results

Note that the while loop runs in a subshell (at least in shells other than zsh and the original, non-public-domain Korn shell), which means that variables set inside the loop will not be visible in the rest of the code. If that is unacceptable, the while loop can run in the main shell and its input can be stored in a variable:

    #!/usr/bin/env sh
    VAR=`find . -print0 | od -An -vto1 | xargs printf ' %s' \
        | sed 's/ 000/@/g' | tr @ '\n'`
    while IFS="" read -r file; do
        file=`printf '\134%s' $file`
        file=`printf "$file@"`
        file="${file%@}"
        echo "$file"
    done <<EOF
    $VAR
    EOF

2.3 Use a temporary file to store intermediate results

If the output from find is very long, the shell may be unable to store it all in a variable and the script may fail. Moreover, most shells use temporary files to implement heredocs anyway, so instead of going through a variable we can write to a temporary file explicitly and avoid the problems of storing intermediate results in variables:

    #!/usr/bin/env sh
    TMPFILE="/tmp/$$_`awk 'BEGIN{srand(); print rand()}'`"
    find . -print0 | od -An -vto1 | xargs printf ' %s' \
        | sed 's/ 000/@/g' | tr @ '\n' >"$TMPFILE"
    while IFS="" read -r file; do
        file=`printf '\134%s' $file`
        file=`printf "$file@"`
        file="${file%@}"
        echo "$file"
    done <"$TMPFILE"
    rm -f "$TMPFILE"

2.4 Use named pipes

We can use named pipes to solve the two problems mentioned above: now reading and writing can be performed in parallel, and we do not need to store intermediate results in variables. Please note, however, that this may not work in Cygwin.

    #!/usr/bin/env sh
    TMPFILE="/tmp/$$_`awk 'BEGIN{srand(); print rand()}'`"
    mknod "$TMPFILE" p
    {
        exec 3>"$TMPFILE"
        find . -print0 | od -An -vto1 | xargs printf ' %s' \
            | sed 's/ 000/@/g' | tr @ '\n' >&3
    } &
    while IFS="" read -r file; do
        file=`printf '\134%s' $file`
        file=`printf "$file@"`
        file="${file%@}"
        echo "$file"
    done <"$TMPFILE"
    rm -f "$TMPFILE"
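As an aside, mkfifo(1) is the standard POSIX spelling of "create a named pipe" (mknod ... p is the traditional form used above). A minimal sketch of the reader/writer rendezvous, with invented paths and data, independent of the find pipeline:

```shell
# Writer runs in the background; its open() blocks until the reader's
# redirection opens the FIFO, so the two proceed in parallel.
dir=$(mktemp -d)
fifo="$dir/pipe"
mkfifo "$fifo"
printf 'one\ntwo\n' > "$fifo" &
got=$(while IFS= read -r line; do echo "seen:$line"; done < "$fifo")
wait
echo "$got"
rm -rf "$dir"
```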

3. Modify the above solutions to work with the original Bourne shell.

The above solutions should work in any POSIX shell, but they fail in the original Bourne shell, which is the default /bin/sh in Solaris 10 and earlier. That shell does not support ${var%pattern} substitution, so trailing newlines in file names have to be restored in a different way, for example:

    #!/usr/bin/env sh
    TMPFILE="/tmp/$$_`awk 'BEGIN{srand(); print rand()}'`"
    mknod "$TMPFILE" p
    {
        exec 3>"$TMPFILE"
        find . -print0 | od -An -vto1 | xargs printf ' %s' \
            | sed 's/ 000/@/g' | tr @ '\n' >&3
    } &
    while read -r file; do
        trailing_nl=""
        for char in $file; do
            if [ X"$char" = X"012" ]; then
                trailing_nl="${trailing_nl}
    "
            else
                trailing_nl=""
            fi
        done
        file=`printf '\134%s' $file`
        file=`printf "$file"`
        file="$file$trailing_nl"
        echo "$file"
    done <"$TMPFILE"
    rm -f "$TMPFILE"

4. Use a non-zero separator

As stated in the comments, Haravikk's answer is not entirely correct. Here is a modified version of his code that handles all kinds of strange cases, such as paths starting with ~:/\/: or file names ending with line feeds. Note that this only works for relative path names; a similar trick can be done for absolute paths by prefixing them with /./, but read_path() would need to be changed to handle that. This method is based on Rich's sh (POSIX shell) tricks.

    #!/usr/bin/env sh
    read_path() {
        path=
        IFS= read -r path || return $?
        read -r path_next || return 0
        if [ X"$path" = X"././" ]; then
            path="./"
            read -r path_next || return 0
            return
        fi
        path="./$path"
        while [ X"$path_next" != X"././" ]; do
            path=`printf '%s\n%s' "$path" "$path_next"`
            read -r path_next || return 0
        done
    }

    find ././ | sed 's,^\./\./,&\n,' | while read_path; do
        echo "$path"
    done
0
