I have time series files 0000.vx.dat, 0000.vy.dat, 0000.vz.dat; ...; 0077.vx.dat, 0077.vy.dat, 0077.vz.dat. Each file is a space-delimited 2D matrix. I would like to take each triplet of files and combine them all into a coordinate-based data format, that is:
[timestep + 1] [i] [j] [vx (i, j)] [vy (i, j)] [vz (i, j)]
Each file number corresponds to a specific time step. Given the amount of data in this time series (~4 GB), bash wasn't cutting it, so it seemed time to turn to awk, specifically mawk. It was pretty stupid to try this in bash, but here is my ill-fated attempt:
    for x in $(seq 1 78)
    do
        tfx=${tf[$x]} # an array of padded zeros
        for y in $(seq 1 1568)
        do
            for z in $(seq 1 1344)
            do
                echo $x $y $z $(awk -vi=$z -vj=$y "FNR == i {print j}" $tfx.vx.dat) $(awk -vi=$z -vj=$y "FNR == i {print j}" $tfx.vy.dat) $(awk -vi=$z -vj=$y "FNR == i {print j}" $tfx.vz.dat) >> $file
            done
        done
    done
edit: Thanks, ruakh, for pointing out that I had left j in shell-variable form with a $ in front! This is just a snippet of the original script, but I think it captures the guts of it!
Suffice it to say that this would have taken about six months, due to all the bash overhead associated with an O(MxN) algorithm, subshells, pipes, and so on. I am looking for something that takes a day at most. Each file is about 18 MB, so this should not be a problem.

I would be happy to do this one time step at a time in awk, provided that I get one output file per time step. I think I could just concatenate them all afterwards without much trouble. It is important, however, that the time step number be the first item in each coordinate line. I could achieve this with awk's -v argument (see above) inside a bash loop.

What I don't know is how to look up specific matrix elements in three separate files and combine them into one output. This is the main obstacle that I would like to overcome. I was hoping mawk could provide a good balance between effort and computational speed. If this seems too big for an awk script, I could go for something lower level, and I would appreciate any answerers letting me know that I should just go to C.
Thank you in advance! I really like awk, but I'm afraid I'm a novice.
The three files 0000.vx.dat, 0000.vy.dat and 0000.vz.dat would read as follows (except much larger, with the real dimensions):
0000.vx.dat:
    1 2 3
    4 5 6
    7 8 9
0000.vy.dat:
    10 11 12
    13 14 15
    16 17 18
0000.vz.dat:
    19 20 21
    22 23 24
    25 26 27
I would like to be able to enter:
awk -vt=1 -f stackoverflow.awk 0000.vx.dat 0000.vy.dat 0000.vz.dat
and get the following output:
    1 1 1 1 10 19
    1 1 2 2 11 20
    1 1 3 3 12 21
    1 2 1 4 13 22
    1 2 2 5 14 23
    1 2 3 6 15 24
    1 3 1 7 16 25
    1 3 2 8 17 26
    1 3 3 9 18 27
edit: Thanks shellter for suggesting clear input and output!
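To make the desired behaviour concrete, here is a minimal sketch of one way the lookup could work. It is not a tested solution for the full data set. It assumes all three files of a triplet have identical dimensions and are whitespace-delimited. The function name combine_triplet is made up for illustration. The idea is to read the three files in lockstep, one row at a time, with getline, so that nothing is ever re-scanned and only one row per file is held in memory.

```shell
# combine_triplet T VX VY VZ   (hypothetical helper name)
# Prints "T i j vx(i,j) vy(i,j) vz(i,j)" for every element, row by row.
# Assumes the three files have the same number of rows and columns.
combine_triplet() {
    awk -v t="$1" '
    BEGIN {
        vx = ARGV[1]; vy = ARGV[2]; vz = ARGV[3]
        i = 0
        # Pull one row from each file per iteration; stop when any file ends.
        while ((getline lx < vx) > 0 &&
               (getline ly < vy) > 0 &&
               (getline lz < vz) > 0) {
            i++
            # Split each row on whitespace into per-file column arrays.
            n = split(lx, ax, " ")
            split(ly, ay, " ")
            split(lz, az, " ")
            for (j = 1; j <= n; j++)
                print t, i, j, ax[j], ay[j], az[j]
        }
    }' "$2" "$3" "$4"
}

# Possible driver loop over all time steps (left commented out here):
# for n in $(seq 0 77); do
#     tf=$(printf "%04d" "$n")
#     combine_triplet "$((n + 1))" "$tf.vx.dat" "$tf.vy.dat" "$tf.vz.dat"
# done > combined.dat
```

Because all the work happens in the BEGIN block, awk never processes the file arguments in its main loop. They are only used as names for getline. Each input file is read exactly once, so the cost should scale with the total number of matrix elements rather than with a fresh awk process per element.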