Here is one way using GNU awk. Run as:
awk -f script.awk data.txt
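The contents of script.awk:
# A line such as ">1BN5.txt" names the file to filter in ./F1
/^>/ {
    file = substr($1,2)
    next
}

# Every other line holds a value to look for in column 6 of that file
{
    a[file][$1]
}

END {
    for (i in a) {
        while ( ( getline line < ("./F1/" i) ) > 0 ) {
            split(line,b)
            for (j in a[i]) {
                if (b[6]==j) {
                    print line > ("./F1/" i ".new")
                }
            }
        }
        # Replace the original file with the filtered copy
        system(sprintf("mv ./F1/%s.new ./F1/%s", i, i))
    }
}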
Alternatively, here is a single line:
awk '/^>/ { file = substr($1,2); next } { a[file][$1] } END { for (i in a) { while ( ( getline line < ("./F1/" i) ) > 0 ) { split(line,b); for (j in a[i]) if (b[6]==j) print line > ("./F1/" i ".new") } system(sprintf("mv ./F1/%s.new ./F1/%s", i, i)) } }' data.txt
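To see what the first two rules collect before the END block runs, here is a small sketch that prints each file/value pair instead of storing it. The sample input is made up; it only assumes that data.txt has header lines such as >1BN5.txt, each followed by one value per line.

```shell
# Hypothetical sample of data.txt: a ">" header names a file in ./F1,
# and each following line holds one value to look for in column 6.
pairs=$(printf '>1BN5.txt\n207\n208\n>1B24.txt\n88\n' |
  awk '/^>/ { file = substr($1, 2); next }   # strip the leading ">"
       { print file, $1 }')                  # the pair that a[file][$1] would record
printf '%s\n' "$pairs"
```

Each printed line corresponds to one a[file][$1] element in the real script.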
If your awk is older than GNU Awk 4.0.0 (the release that introduced arrays of arrays), you can try the following instead. Run as:
awk -f script.awk data.txt
The contents of script.awk:
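# A line such as ">1BN5.txt" names the file to filter in ./F1
/^>/ {
    file = substr($1,2)
    next
}

# Collect the values for each file in one SUBSEP-separated string
{
    a[file]=( a[file] ? a[file] SUBSEP : "") $1
}

END {
    for (i in a) {
        split(a[i],b,SUBSEP)
        while ( ( getline line < ("./F1/" i) ) > 0 ) {
            split(line,c)
            for (j in b) {
                if (c[6]==b[j]) {
                    print line > ("./F1/" i ".new")
                }
            }
        }
        # Replace the original file with the filtered copy
        system(sprintf("mv ./F1/%s.new ./F1/%s", i, i))
    }
}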
Alternatively, here is a single line:
awk '/^>/ { file = substr($1,2); next } { a[file]=( a[file] ? a[file] SUBSEP : "") $1 } END { for (i in a) { split(a[i],b,SUBSEP); while ( ( getline line < ("./F1/" i) ) > 0 ) { split(line,c); for (j in b) if (c[6]==b[j]) print line > ("./F1/" i ".new") } system(sprintf("mv ./F1/%s.new ./F1/%s", i, i)) } }' data.txt
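The trick in this version is to stand in for the nested array with one SUBSEP-joined string per file, which split() turns back into a list when needed. A minimal sketch of just that step, run on made-up input:

```shell
# Portable sketch: keep several values per file in one SUBSEP-joined string,
# then split() it back into an array when it is needed.
keys=$(printf '>1BN5.txt\n207\n208\n211\n' |
  awk '/^>/ { file = substr($1, 2); next }
       { a[file] = (a[file] ? a[file] SUBSEP : "") $1 }
       END {
           n = split(a["1BN5.txt"], b, SUBSEP)   # b[1..n] holds the values in input order
           for (k = 1; k <= n; k++) printf "%s%s", b[k], (k < n ? " " : "\n")
       }')
printf '%s\n' "$keys"
```

Unlike the gawk version, this keeps duplicate values, which is harmless here apart from a matching line being printed once per duplicate.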
Note that this script does exactly what you describe. It expects files such as 1BN5.txt and 1B24.txt to be in the F1 folder in the current working directory. It will also overwrite your original files. If that is not the desired behavior, drop the system() call; the filtered output then stays in the .new files. HTH.
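If you would rather try it without touching anything first, something along these lines should work. It is only a sketch: it runs in a scratch directory, the two ATOM rows and the value 207 are hypothetical sample data, and the mv step is left out so the result lands in a .new file.

```shell
# Trial run in a scratch directory, so no real file is touched.
# The two rows and the value 207 are hypothetical sample data.
tmp=$(mktemp -d) && cd "$tmp" && mkdir F1
printf 'ATOM 1 CA SER A 207 0.0 0.0 0.0 1.00 0.00 C\nATOM 2 CA LEU A 300 0.0 0.0 0.0 1.00 0.00 C\n' > F1/1BN5.txt
printf '>1BN5.txt\n207\n' > data.txt
awk '/^>/ { file = substr($1, 2); next }
     { a[file] = (a[file] ? a[file] SUBSEP : "") $1 }
     END {
         for (i in a) {
             split(a[i], b, SUBSEP)
             src = "./F1/" i
             while ((getline line < src) > 0) {
                 split(line, c)
                 for (j in b) if (c[6] == b[j]) print line > (src ".new")
             }
             close(src); close(src ".new")   # no mv: the original stays as it was
         }
     }' data.txt
kept=$(cat F1/1BN5.txt.new)
printf '%s\n' "$kept"
```

Once F1/1BN5.txt.new looks right, you can rename it by hand or put the system() call back.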
Results:
Contents of F1/1BN5.txt:
ATOM 421 CA SER A 207 68.627 -29.819 8.533 1.00 50.79 C
ATOM 421 CA SER A 207 68.627 -29.819 8.533 1.00 50.79 C
ATOM 615 H LEU B 208 3.361 -5.394 -6.021 1.00 10.00 H
ATOM 616 HA LEU B 211 2.930 -4.494 -3.302 1.00 10.00 H
Contents of F1/1B24.txt:
ATOM 631 CG MET B 88 -0.828 -0.688 -7.575 1.00 10.00 C
ATOM 632 SD MET B 88 -2.380 -0.156 -6.830 1.00 10.00 S
ATOM 643 N ALA B 92 -1.541 -4.371 -5.366 1.00 10.00 N