Bash for a loop with numerical names

I am currently working on a math project and just bumping into a brick wall with programming in bash.

I currently have a directory containing 800 text files, and what I want to do is cat the first 80 files (from _01 to _80) into a new file saved in another location, then the next 80 files (from _81 to _160), and so on.

All the files in the directory are named ath_01, ath_02, ath_03, etc.

Can anyone help?

So far, I have:

    #!/bin/bash
    for file in /dir/*
    do
        echo ${file}
    done

Which simply lists my files. I know that I need to use something like cat file1 file2 > newfile.txt, but the numbered suffixes _01, _02, etc. are what confuse me.

Would it help if I changed the file names to use something other than an underscore, like ath.01, etc.?

Greetings

+4
4 answers

Since you know in advance how many files you have and how they are numbered, it may be easier to "unroll the loop", so to speak, and use copy-and-paste with a little tweaking to write a script that uses brace expansion.

    #!/bin/bash
    cat ath_{001..080} > file1.txt
    cat ath_{081..160} > file2.txt
    cat ath_{161..240} > file3.txt
    cat ath_{241..320} > file4.txt
    cat ath_{321..400} > file5.txt
    cat ath_{401..480} > file6.txt
    cat ath_{481..560} > file7.txt
    cat ath_{561..640} > file8.txt
    cat ath_{641..720} > file9.txt
    cat ath_{721..800} > file10.txt
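Brace expansion generates the word list before cat ever runs, so no loop is needed at all; a quick sanity check (note that zero-padded ranges such as {001..080} need bash 4 or later, so an unpadded range is shown here for portability):

```shell
#!/bin/bash
# Quick check: the shell expands the braces into a word list
# before the command runs.
echo ath_{1..3}
# ath_1 ath_2 ath_3
```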

Alternatively, use nested for loops and the seq command:

    N=800
    B=80
    for n in $( seq 1 $B $N ); do
        for i in $( seq $n $((n + B - 1)) ); do
            cat ath_$i
        done > file$((n/B + 1)).txt
    done

The outer loop iterates n over 1, 81, 161, etc. The inner loop iterates i from 1 to 80, then from 81 to 160, etc. The body of the inner loop simply dumps the contents of the i-th file to standard output, but the aggregated output of the whole inner loop is redirected to file1.txt, then file2.txt, and so on.
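One caveat: the question's files have zero-padded names (ath_01, ath_02, ...), which plain ath_$i will not match. A minimal sketch of the same nested-loop idea, wrapped in a function for reuse and assuming three-digit padding (ath_001 ... ath_800) as in the first snippet:

```shell
#!/bin/bash
# Sketch: same nested-loop structure, but building zero-padded
# names with printf. Assumes names like ath_001 ... ath_800.
concat_batches() {
    local N=$1 B=$2 n i
    for n in $(seq 1 "$B" "$N"); do
        for i in $(seq "$n" $((n + B - 1))); do
            cat "$(printf 'ath_%03d' "$i")"
        done > "file$((n / B + 1)).txt"
    done
}
```

Calling `concat_batches 800 80` in the source directory then produces file1.txt through file10.txt.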

+5

You can try something like this:

 cat "$file" >> "concat_$(( (10#${file#/dir/ath_} - 1) / 80 ))"
  • with ${file#/dir/ath_} you remove the prefix /dir/ath_ from the file name
  • the 10# prefix forces base 10, so a zero-padded suffix such as 08 is not misread as an octal number
  • with $(( (suffix - 1) / 80 )) you get integer division, so files _01 to _80 go to concat_0, _81 to _160 to concat_1, and so on

Also change the loop to

 for file in /dir/ath_* 

so that you only pick up the files you need.
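Putting the two fragments together, a complete sketch might look like the following, wrapped in a function so the directory is a parameter rather than the hard-coded /dir:

```shell
#!/bin/bash
# Sketch combining the loop and the append line above.
# Assumes zero-padded names like ath_001 ... ath_800 in $1.
concat_by_suffix() {
    local dir=$1 file suffix
    for file in "$dir"/ath_*; do
        suffix=${file#"$dir"/ath_}
        # 10# forces base 10 so "008" is not read as octal;
        # (suffix-1)/80 puts files 1-80 in concat_0, 81-160 in concat_1, ...
        cat "$file" >> "concat_$(( (10#$suffix - 1) / 80 ))"
    done
}
```

Run it from the directory where the concat_N output files should be created, passing the source directory as the argument.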

+4

If you need groups of 80 files, you should do everything you can to make the names sort correctly; that is why leading zeros are so often used. Assuming you have only one underscore in the file names, and no newlines in the names, then:

    SOURCE="/path/to/dir"
    TARGET="/path/to/other/directory"
    (
    cd "$SOURCE" || exit 1
    ls |
    sort -t _ -k2,2n |
    awk -v target="$TARGET" \
        '{  file[n++] = $1
            if (n >= 80)
            {
                printf "cat"
                for (i = 0; i < 80; i++) printf(" %s", file[i])
                printf(" >%s/%s.%.2d\n", target, "newfile", ++number)
                n = 0
            }
        }
        END {
            if (n > 0)
            {
                printf "cat"
                for (i = 0; i < n; i++) printf(" %s", file[i])
                printf(" >%s/%s.%.2d\n", target, "newfile", ++number)
            }
        }' |
    sh -x
    )

Two directories are specified (where the files live and where the concatenated output should go). The command changes into the source directory (where the 800 files are). It lists the names (you can use a glob pattern there if you need to) and sorts them numerically. The output is piped into awk, which generates a shell script on the fly. It collects 80 names at a time, then generates a cat command that copies those files into a single destination file such as newfile.01; adjust the printf() calls to suit your naming and numbering conventions. The generated shell commands are then passed to a shell for execution.
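To see what the sort -t _ -k2,2n stage buys you when the suffixes are not zero-padded, compare plain lexicographic order with the numeric sort on field 2 (a small illustration, separate from the script above):

```shell
#!/bin/bash
# Lexicographic sort puts ath_10 before ath_2:
printf '%s\n' ath_2 ath_10 ath_1 | sort
# ath_1
# ath_10
# ath_2

# Numeric sort on the second _-separated field fixes the order:
printf '%s\n' ath_2 ath_10 ath_1 | sort -t _ -k2,2n
# ath_1
# ath_2
# ath_10
```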

During testing, replace sh -x with nothing, or with sh -vn or something similar. Only add a live shell once you are sure the generated script will do what you want. Remember that the generated script runs with the source directory as its current directory.

Superficially, the xargs command would be convenient to use; the complication is coordinating the output file number. There may be a way to use its -n 80 option to group 80 files at a time together with some fancy way of generating the invocation number, but I'm not aware of one.
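The grouping behaviour of xargs -n is easy to see in isolation (a toy example, unrelated to the actual file names):

```shell
#!/bin/bash
# xargs -n 3 invokes echo once per group of (at most) three arguments.
printf '%s\n' a b c d e f g | xargs -n 3 echo
# a b c
# d e f
# g
```

This shows why batching is trivial with xargs but numbering the batches is not: each invocation has no idea which group it is.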

Another option is to use xargs -n to execute a shell script that can work out the correct output file number from what has already been generated. That would be cleaner in many ways:

    SOURCE="/path/to/dir"
    TARGET="/path/to/other/directory"
    (
    cd "$SOURCE" || exit 1
    ls | sort -t _ -k2,2n | xargs -n 80 cpfiles "$TARGET"
    )

where cpfiles looks something like this:

    TARGET="$1"
    shift
    if [ $# -gt 0 ]
    then
        old=$(ls -r newfile.?? | sed -n -e 's/newfile\.//p; 1q')
        new=$(printf "%.2d" $(( 10#${old:-0} + 1 )))
        cat "$@" > "$TARGET/newfile.$new"
    fi

The test for zero arguments avoids the problem of xargs executing the command once with no arguments at all. On the whole, I prefer this solution to the awk one.

+3

Here's a macro version of @chepner's first solution, using GNU Make as the template language:

    SHELL := /bin/bash
    N = 800
    B = 80
    fileNums = $(shell seq 1 $$((${N}/${B})) )
    files = ${fileNums:%=file%.txt}

    all: ${files}

    file%.txt : start = $(shell echo $$(( ($*-1)*${B}+1 )) )
    file%.txt : end = $(shell echo $$(( $* * ${B} )) )
    file%.txt:
    	cat ath_{${start}..${end}} > $@

To use it:

    $ make -n all
    cat ath_{1..80} > file1.txt
    cat ath_{81..160} > file2.txt
    cat ath_{161..240} > file3.txt
    cat ath_{241..320} > file4.txt
    cat ath_{321..400} > file5.txt
    cat ath_{401..480} > file6.txt
    cat ath_{481..560} > file7.txt
    cat ath_{561..640} > file8.txt
    cat ath_{641..720} > file9.txt
    cat ath_{721..800} > file10.txt
+1
