Convert column format to matrix using awk

I have a gridded data file in column format:

ifile.txt

    x    y    value
    20.5 20.5 -4.1
    21.5 20.5 -6.2
    22.5 20.5  0.0
    20.5 21.5  1.2
    21.5 21.5  4.3
    22.5 21.5  6.0
    20.5 22.5  7.0
    21.5 22.5 10.4
    22.5 22.5 16.7

I would like to convert it to matrix format:

ofile.txt

          20.5  21.5  22.5
    20.5  -4.1   1.2   7.0
    21.5  -6.2   4.3  10.4
    22.5   0.0   6.0  16.7

Here the top row (20.5 21.5 22.5) holds the y values, the first column holds the x values, and the inner cells hold the corresponding grid values.

I found a similar question, "Convert a file with three columns to matrix format", but its script does not work in my case.

The script:

    awk '{ h[$1,$2] = h[$2,$1] = $3 }
      END {
        for(i=1; i<=$1; i++) {
            for(j=1; j<=$2; j++)
                printf h[i,j] OFS
            printf "\n"
        }
    }' ifile
linux shell awk
4 answers

The following awk script handles:

  • a matrix of any size
  • row and column indices that are unrelated to each other, so it tracks them separately
  • missing entries: if a given row/column combination does not appear in the input, its value defaults to zero

This is done as follows:

    awk '
    BEGIN{PROCINFO["sorted_in"] = "@ind_num_asc"}
    (NR==1){next}
    {row[$1]=1;col[$2]=1;val[$1" "$2]=$3}
    END {
        printf "%8s",""; for (j in col) { printf "%8.3f",j }; printf "\n"
        for (i in row) {
            printf "%8.3f",i; for (j in col) { printf "%8.3f",val[i" "j] }; printf "\n"
        }
    }' <file>

How it works:

  • PROCINFO["sorted_in"] = "@ind_num_asc" makes every for (i in array) loop visit the indices in ascending numeric order.
  • (NR==1){next} skips the first line (the header).
  • {row[$1]=1;col[$2]=1;val[$1" "$2]=$3} processes each line, recording its row index, its column index, and the associated value.
  • The END block does all the printing.

This outputs:

              20.500  21.500  22.500
      20.500  -4.100   1.200   7.000
      21.500  -6.200   4.300  10.400
      22.500   0.000   6.000  16.700

Note: using PROCINFO is a gawk feature.
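If gawk is not available, the same idea can be sketched with POSIX awk features only. The version below is an illustration rather than part of the original answer: the function name to_matrix and the small insertion sort are my own assumptions, and the sort is only sensible for grids with a modest number of distinct indices.

```shell
# to_matrix FILE  (hypothetical helper name, not from the answer above)
# Prints FILE (header line + "x y value" rows) as a matrix using POSIX awk only.
to_matrix() {
    awk '
    # insertion sort of a[1..n] into ascending numeric order
    function isort(a, n,    i, j, t) {
        for (i = 2; i <= n; i++) {
            t = a[i]
            for (j = i - 1; j >= 1 && a[j] + 0 > t + 0; j--) a[j + 1] = a[j]
            a[j + 1] = t
        }
    }
    NR == 1 { next }                       # skip the header line
    {
        # remember each distinct row (x) and column (y) index once
        if (!($1 in rseen)) { rseen[$1] = 1; rows[++nr] = $1 }
        if (!($2 in cseen)) { cseen[$2] = 1; cols[++nc] = $2 }
        val[$1, $2] = $3
    }
    END {
        isort(rows, nr); isort(cols, nc)
        printf "%8s", ""
        for (j = 1; j <= nc; j++) printf "%8.3f", cols[j]
        printf "\n"
        for (i = 1; i <= nr; i++) {
            printf "%8.3f", rows[i]
            # a missing (x, y) pair is an empty string, which %f prints as 0.000
            for (j = 1; j <= nc; j++) printf "%8.3f", val[rows[i], cols[j]]
            printf "\n"
        }
    }' "$1"
}
```

Calling to_matrix ifile.txt should give the same layout as the gawk script, since the printf formats are unchanged.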

However, if you can make a few assumptions, it can be much shorter:

  • the file contains all possible entries, with no missing values
  • you do not want the row and column indices printed
  • the indices are sorted in column-major order

You can use the following short versions:

 sort -g <file> | awk '($1+0!=$1){next} ($1!=o)&&(NR!=1){printf "\n"} {printf "%8.3f",$3; o=$1 }' 

which outputs

      -4.100   1.200   7.000
      -6.200   4.300  10.400
       0.000   6.000  16.700

or, for the transposed matrix:

 awk '(NR==1){next} ($2!=o)&&(NR!=2){printf "\n"} {printf "%8.3f",$3; o=$2 }' <file> 

which outputs

      -4.100  -6.200   0.000
       1.200   4.300   6.000
       7.000  10.400  16.700
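The short versions silently produce a ragged matrix if the first assumption (a complete grid) does not hold. A quick way to check it beforehand could look like the sketch below. The function name is_full_grid is hypothetical, and the check covers only completeness and duplicates, not the ordering of the lines.

```shell
# is_full_grid FILE  (hypothetical helper name)
# Succeeds (exit status 0) when every (x, y) combination occurs exactly once.
is_full_grid() {
    awk '
    NR == 1 { next }                              # skip the header line
    {
        if (($1, $2) in seen) dup = 1             # same (x, y) pair seen twice
        seen[$1, $2] = 1
        if (!($1 in xs)) { xs[$1] = 1; nx++ }     # count distinct x values
        if (!($2 in ys)) { ys[$2] = 1; ny++ }     # count distinct y values
        n++
    }
    END {
        ok = (n > 0 && !dup && n == nx * ny)      # full grid: one entry per pair
        exit(ok ? 0 : 1)
    }' "$1"
}
```

It can be used as a guard, for example: is_full_grid ifile.txt && sort -g ifile.txt | awk '...'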

Perl solution:

    #!/usr/bin/perl -an
    $h{ $F[0] }{ $F[1] } = $F[2] unless 1 == $.;
    END {
        @s = sort { $a <=> $b } keys %h;
        print ' ' x 5;
        printf '%5.1f' x @s, @s;
        print "\n";
        for my $u (@s) {
            print "$u ";
            printf '%5.1f', $h{$u}{$_} for @s;
            print "\n";
        }
    }

  • -n reads the input line by line
  • -a splits each line on whitespace into the @F array
  • See the Perl documentation for sort, print, printf and keys.

awk solution:

    sort -n ifile.txt | awk 'BEGIN{header="\t"}
      NR>1{
        if((NR-1)%3==1){
          header=header sprintf("%4.1f\t",$1);
          matrix=matrix sprintf("%4.1f\t",$1)
        }
        matrix= matrix sprintf("%4.1f\t",$3);
        if((NR-1)%3==0 && NR!=10) matrix=matrix "\n"
      }
      END{print header; print matrix}';
            20.5    21.5    22.5
    20.5    -4.1     1.2     7.0
    21.5    -6.2     4.3    10.4
    22.5     0.0     6.0    16.7

Explanation:

  • sort -n ifile.txt sorts the file numerically.
  • The header variable stores everything needed to build the header line. It is initialized to header="\t" and extended by header=header sprintf("%4.1f\t",$1) on the lines where (NR-1)%3==1.
  • The matrix is built the same way in the matrix variable: matrix=matrix sprintf("%4.1f\t",$1) creates the first column, matrix= matrix sprintf("%4.1f\t",$3) fills in the values, and if((NR-1)%3==0 && NR!=10)matrix=matrix "\n" adds the end of line after each row.
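The constants 3 and 10 above tie the script to a 3x3 grid. As a rough sketch of how the same accumulate-into-strings approach could be made independent of the grid size, a new row can be started whenever $1 changes. Several things here are my own assumptions rather than part of the answer: the name grid_to_matrix, stripping the header with tail instead of relying on its sort position, sorting explicitly on both keys, and taking the header labels from $2 (the y values) of the first row. A complete grid is assumed.

```shell
# grid_to_matrix FILE  (hypothetical helper name)
# Builds the header and the matrix as strings, for any complete grid.
grid_to_matrix() {
    tail -n +2 "$1" | sort -k1,1n -k2,2n | awk '
    NR == 1 || $1 != prev {                 # first line of a new x row
        if (NR > 1) matrix = matrix "\n"    # close the previous row
        matrix = matrix sprintf("%4.1f", $1)
        prev = $1
        rowno++
    }
    rowno == 1 { header = header sprintf("\t%4.1f", $2) }   # y labels, taken from the first row
    { matrix = matrix sprintf("\t%4.1f", $3) }              # append the cell value
    END { print header; print matrix }'
}
```

Because the row break is detected from the data, the input does not need to be square.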

I adjusted my old GNU awk solution for your current input:

matrixize.awk script:

    #!/bin/awk -f

    BEGIN { PROCINFO["sorted_in"]="@ind_num_asc"; OFS="\t" }
    NR==1{ next }
    {
        b[$1];    # accumulating unique indices
        ($1 != $2)? a[$1][$2] = $3 : a[$2][$1] = $3;    # set `diagonal` relation between different indices
    }
    END {
        h = "";
        for (i in b) {
            h = h OFS i    # form header columns
        }
        print h;    # print header column values
        for (i in b) {
            row = i;    # index column
            # iterating through the row values (for each intersection point)
            for (j in a[i]) {
                row = row OFS a[i][j]
            }
            print row
        }
    }

Usage:

 awk -f matrixize.awk yourfile 

Output:

            20.5    21.5    22.5
    20.5    -4.1    1.2     7.0
    21.5    -6.2    4.3     10.4
    22.5    0.0     6.0     16.7