Best way to modify a file when using pipes?

I often have shell-scripting tasks in which I use this pattern:

cat file | some_script > file 

This is unsafe: cat may not have read the entire file before some_script starts writing to it. I really don't want to write the result to a temporary file (it's slow, and I don't want the added complexity of coming up with a unique name).

Perhaps there is a standard shell command that will buffer the entire stream until EOF is reached? Something like:

 cat file | bufferUntilEOF | script > file 

Ideas?

+8
bash shell pipe
8 answers

You are looking for sponge.
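
sponge comes from the moreutils package: it soaks up all of its standard input before opening and writing the named file, which is exactly the bufferUntilEOF behavior described in the question. Typical use:

 some_script < file | sponge file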

+4

Using a temporary file is the right solution here. When you use > redirection, it is handled by the shell, and no matter how many commands are in your pipeline, the shell is free to truncate and overwrite the output file before any of those commands runs (it opens the output file while setting up the pipeline).
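
You can see the truncation with a throwaway experiment (demo.txt is just an illustrative name; strictly speaking it is a race, but the file almost always ends up empty):

 printf 'hello\n' > demo.txt
 cat demo.txt | tr a-z A-Z > demo.txt
 cat demo.txt   # usually prints nothing: the shell truncated demo.txt before cat could read it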

+4

Like many others, I like to use temporary files. I use the shell's process ID as part of the temporary name, so that if multiple instances of the script run at the same time they won't collide. Finally, I only overwrite the original file if the script succeeds, using short-circuit evaluation with the && operator; it's a little terse, but very handy for simple command lines. Putting it all together, it looks like this:

 some_script < file > smscrpt.$$ && mv smscrpt.$$ file 

This leaves the temporary file behind if the command fails. If you want to clean up on failure as well, you can change it to:

 some_script < file > smscrpt.$$ && mv smscrpt.$$ file || rm smscrpt.$$ 

By the way, I also got rid of the useless use of cat and replaced it with input redirection.
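
If you would rather not manage the cleanup by hand, a variant of the same idea (just a sketch, assuming bash with mktemp available) uses a trap so the temporary file is removed no matter how the script exits:

 #!/bin/bash
 tmp=$(mktemp) || exit 1      # unique temporary file
 trap 'rm -f "$tmp"' EXIT     # cleaned up on any exit; a no-op after a successful mv
 some_script < file > "$tmp" && mv "$tmp" file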

+3

Using mktemp(1) or tempfile(1) saves you the trouble of coming up with a unique file name.
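
For example (a minimal sketch; the generated name is random):

 TMP=$(mktemp)                # creates a unique file such as /tmp/tmp.Xq3rZ9fLkT and prints its name
 some_script < file > "$TMP" && mv "$TMP" file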

+2

Using a temporary file is IMO better than trying to buffer the data in the pipeline.

Buffering the entire stream almost defeats the purpose of using a pipeline in the first place.

+1

I think the best way is to use a temporary file. However, if you want another approach, you can use something like awk to buffer the input in memory before your application starts receiving it. The following script buffers all of the input into the lines array before printing it to the next consumer in the pipeline.

 { lines[NR] = $0; } END { for (line_no=1; line_no<=NR; ++line_no) { print lines[line_no]; } } 

You can collapse it into a single line if you want:

 cat file | awk '{lines[NR]=$0;} END {for(i=1;i<=NR;++i) print lines[i];}' > file

Be aware, though, that the one-liner still has the original race: the shell truncates file while setting up the final redirection, before cat has read it, so the buffering inside awk does not by itself make writing back to the same file safe.

With all this, I still recommend using a temporary file for output and then overwriting the original file.

+1

In response to the OP's question about getting sponge behavior without external dependencies, and building on @D.Shawley's answer: you can get a sponge-like effect with only a gawk dependency, which is not uncommon on Unix or Unix-like systems:

 cat foo | gawk -voutfn=foo '{lines[NR]=$0;} END {if(NR>0){print lines[1]>outfn;} for(i=2;i<=NR;++i) print lines[i] >> outfn;}' 

The NR>0 check controls the truncation of the output file: when there is at least one line, the first print uses >, which opens outfn and truncates it, and the remaining lines are appended with >>. With empty input, outfn is never touched.
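
For readability, here is the same gawk program spelled out with comments; the behavior is identical to the one-liner above:

 cat foo | gawk -v outfn=foo '
     { lines[NR] = $0 }                 # buffer every line of stdin in memory
     END {
         if (NR > 0)
             print lines[1] > outfn     # ">" opens outfn, truncating it
         for (i = 2; i <= NR; ++i)
             print lines[i] >> outfn    # append the remaining buffered lines
     }'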

To use this in a shell script, change -voutfn=foo to -voutfn="$1", or whatever syntax your shell uses for filename arguments. For example:

 #!/bin/bash
 cat "$1" | gawk -voutfn="$1" '{lines[NR]=$0;} END {if(NR>0){print lines[1]>outfn;} for(i=2;i<=NR;++i) print lines[i] >> outfn;}'

Please note that, unlike the real sponge, this approach is limited by available RAM; sponge actually buffers in a temporary file when necessary.

+1

I think you want to use mktemp. Something like this will work:

 FILE=example-input.txt
 TMP=$(mktemp)
 some_script <"$FILE" >"$TMP"
 mv "$TMP" "$FILE"
0
