Convert column format to matrix using awk

Question


I have a gridded data file in column format:

ifile.txt

    x y value
    20.5 20.5 -4.1
    21.5 20.5 -6.2
    22.5 20.5 0.0
    20.5 21.5 1.2
    21.5 21.5 4.3
    22.5 21.5 6.0
    20.5 22.5 7.0
    21.5 22.5 10.4
    22.5 22.5 16.7

I would like to convert it to matrix format:

ofile.txt

         20.5 21.5 22.5
    20.5 -4.1  1.2  7.0
    21.5 -6.2  4.3 10.4
    22.5  0.0  6.0 16.7

Here the top row (20.5 21.5 22.5) gives the y values, the leftmost column gives the x values, and the inner entries are the corresponding grid values.

I found a similar question, "Convert a file with three columns to matrix format", but the script from it does not work in my case.

Script:

    awk '{ h[$1,$2] = h[$2,$1] = $3 }
      END {
        for(i=1; i<=$1; i++) {
          for(j=1; j<=$2; j++)
            printf h[i,j] OFS
          printf "\n"
        }
      }' ifile

+8

linux shell awk

Kay Feb 14 '18 at 8:26


4 answers

Perl Solution:

    #!/usr/bin/perl -an
    $h{ $F[0] }{ $F[1] } = $F[2] unless 1 == $.;
    END {
        @s = sort { $a <=> $b } keys %h;
        print ' ' x 5;
        printf '%5.1f' x @s, @s;
        print "\n";
        for my $u (@s) {
            print "$u ";
            printf '%5.1f', $h{$u}{$_} for @s;
            print "\n";
        }
    }

-n reads the input line by line.
-a splits each line on whitespace into the array @F.
See sort, print, printf, and keys in the Perl documentation.

+3

choroba Feb 14 '18 at 8:44


awk solution:

    sort -n ifile.txt | awk '
    BEGIN { header = "\t" }
    NR > 1 {
        if ((NR-1) % 3 == 1) {
            header = header sprintf("%4.1f\t", $1)
            matrix = matrix sprintf("%4.1f\t", $1)
        }
        matrix = matrix sprintf("%4.1f\t", $3)
        if ((NR-1) % 3 == 0 && NR != 10) matrix = matrix "\n"
    }
    END { print header; print matrix }'

Output:

            20.5    21.5    22.5
    20.5    -4.1     1.2     7.0
    21.5    -6.2     4.3    10.4
    22.5     0.0     6.0    16.7

Explanations:

sort -n ifile.txt sorts the file numerically.
The header variable accumulates the header line: it is initialized with header="\t" and extended by header=header sprintf("%4.1f\t",$1) on each line where (NR-1)%3==1, that is, on the first line of each group of three.
The matrix variable is built the same way: matrix=matrix sprintf("%4.1f\t",$1) adds the first column (the row label), matrix=matrix sprintf("%4.1f\t",$3) appends each value, and if((NR-1)%3==0 && NR!=10)matrix=matrix "\n" ends every row except the last with a newline.
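The constants 3 and 10 tie this one-liner to a 3x3 grid. As a rough sketch of how the same accumulation could take the grid width as a parameter, the version below passes it in through an awk variable. The function name column_to_matrix, the variables n, k and sample are illustrative and not part of the original answer. Like the original, it assumes a complete grid whose x and y values coincide; unlike the original, it sorts on both columns explicitly and pins the locale with LC_ALL=C.

```shell
# Sample input from the question, held in a shell variable for the demo.
sample='x y value
20.5 20.5 -4.1
21.5 20.5 -6.2
22.5 20.5 0.0
20.5 21.5 1.2
21.5 21.5 4.3
22.5 21.5 6.0
20.5 22.5 7.0
21.5 22.5 10.4
22.5 22.5 16.7'

# Hypothetical helper: reads the column format on stdin, $1 is the grid width.
column_to_matrix() {
  LC_ALL=C sort -k1,1n -k2,2n | awk -v n="$1" '
    BEGIN { header = "\t" }
    NR > 1 {                          # the non-numeric header line sorts first
      k = NR - 1                      # running number of the data line
      if ((k - 1) % n == 0) {         # first line of a group of n: new x value
        header = header sprintf("%4.1f\t", $1)
        matrix = matrix sprintf("%4.1f\t", $1)
      }
      matrix = matrix sprintf("%4.1f\t", $3)
      if (k % n == 0) matrix = matrix "\n"   # last line of the group ends the row
    }
    END { print header; printf "%s", matrix }'
}

printf '%s\n' "$sample" | column_to_matrix 3
```

For the sample data this should print the same tab-separated matrix as the one-liner above.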

+3

Allan Feb 14 '18 at 9:05


Corrected my old GNU awk solution for your current input:

matrixize.awk script:

    #!/bin/awk -f
    BEGIN { PROCINFO["sorted_in"]="@ind_num_asc"; OFS="\t" }
    NR==1{ next }
    {
        b[$1];    # accumulating unique indices
        ($1 != $2)? a[$1][$2] = $3 : a[$2][$1] = $3;    # set `diagonal` relation between different indices
    }
    END {
        h = "";
        for (i in b) {
            h = h OFS i    # form header columns
        }
        print h;    # print header column values
        for (i in b) {
            row = i;    # index column
            # iterating through the row values (for each intersection point)
            for (j in a[i]) {
                row = row OFS a[i][j]
            }
            print row
        }
    }

Usage:

    awk -f matrixize.awk yourfile

Output:

            20.5    21.5    22.5
    20.5    -4.1    1.2     7.0
    21.5    -6.2    4.3     10.4
    22.5    0.0     6.0     16.7

+3

Romanperekhrest Feb 14 '18 at 9:29


kvantour · Accepted Answer · 2018-02-14T09:22:29+0000

The following awk script handles:

a matrix of any size;
row and column indices that are unrelated to each other, since it tracks them separately;
missing entries: if a given row/column index pair does not appear in the input, its value defaults to zero.

This is done as follows:

    awk '
    BEGIN{PROCINFO["sorted_in"] = "@ind_num_asc"}
    (NR==1){next}
    {row[$1]=1;col[$2]=1;val[$1" "$2]=$3}
    END { printf "%8s",""; for (j in col) { printf "%8.3f",j }; printf "\n"
          for (i in row) {
            printf "%8.3f",i; for (j in col) { printf "%8.3f",val[i" "j] }; printf "\n"
          }
        }' <file>

How it works:

PROCINFO["sorted_in"] = "@ind_num_asc" makes every for (... in ...) loop traverse array indices in ascending numerical order.
(NR==1){next} skips the first (header) line.
{row[$1]=1;col[$2]=1;val[$1" "$2]=$3} processes each data line, recording its row index, its column index, and the associated value.
The END block does all the printing.

This outputs:

              20.500  21.500  22.500
      20.500  -4.100   1.200   7.000
      21.500  -6.200   4.300  10.400
      22.500   0.000   6.000  16.700
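The zero default for missing entries follows from how awk treats array elements that were never assigned: they evaluate to the empty string, which is 0 in a numeric context. A minimal sketch of that behaviour, using the same "row col" key layout as the script (the function name missing_default is only for illustration):

```shell
# Hypothetical demo: one entry is assigned, the other is never set.
missing_default() {
  awk 'BEGIN {
    val["20.5 20.5"] = -4.1
    # The second key was never stored, so %8.3f formats it as 0.000.
    printf "%8.3f%8.3f\n", val["20.5 20.5"], val["20.5 21.5"]
  }'
}

missing_default    # prints "  -4.100   0.000"
```

A grid cell that is absent from the input is therefore indistinguishable in the output from a cell whose value really is zero.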

Note: using PROCINFO is a gawk feature.
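Where gawk is not available, one possible workaround is to remember the indices in ordinary numbered arrays and sort them by hand. The sketch below swaps PROCINFO["sorted_in"] for a small insertion sort written in POSIX awk; the names to_matrix, sort_num, r, c, nr, nc and sample are assumptions made for this example and do not come from the answer.

```shell
# Sample input from the question, held in a shell variable for the demo.
sample='x y value
20.5 20.5 -4.1
21.5 20.5 -6.2
22.5 20.5 0.0
20.5 21.5 1.2
21.5 21.5 4.3
22.5 21.5 6.0
20.5 22.5 7.0
21.5 22.5 10.4
22.5 22.5 16.7'

# Hypothetical helper: reads the column format on stdin, prints the matrix.
to_matrix() {
  awk '
    # Insertion sort of a[1..n] in ascending numerical order.
    function sort_num(a, n,    i, j, t) {
      for (i = 2; i <= n; i++) {
        t = a[i]
        for (j = i - 1; j >= 1 && a[j] + 0 > t + 0; j--) a[j + 1] = a[j]
        a[j + 1] = t
      }
    }
    NR == 1 { next }                                   # skip the header line
    {
      if (!($1 in row)) { row[$1] = 1; r[++nr] = $1 }  # new row index
      if (!($2 in col)) { col[$2] = 1; c[++nc] = $2 }  # new column index
      val[$1 " " $2] = $3
    }
    END {
      sort_num(r, nr); sort_num(c, nc)
      printf "%8s", ""
      for (j = 1; j <= nc; j++) printf "%8.3f", c[j]
      printf "\n"
      for (i = 1; i <= nr; i++) {
        printf "%8.3f", r[i]
        for (j = 1; j <= nc; j++) printf "%8.3f", val[r[i] " " c[j]]
        printf "\n"
      }
    }'
}

printf '%s\n' "$sample" | to_matrix
```

The insertion sort is quadratic in the number of distinct indices, which should be acceptable for small grids; for large ones, sorting the input with sort before awk is probably the better trade-off.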

However, if you make a couple of assumptions, you can make it much shorter:

the file contains all possible entries, with no missing values;
you do not want to display the row and column indices;
the indices are sorted in column-major order.

You can use the following short versions:

    sort -g <file> | awk '($1+0!=$1){next} ($1!=o)&&(o!=""){printf "\n"} {printf "%8.3f",$3; o=$1 }'

which outputs

      -4.100   1.200   7.000
      -6.200   4.300  10.400
       0.000   6.000  16.700

or, for the transposed matrix:

    awk '(NR==1){next} ($2!=o)&&(NR!=2){printf "\n"} {printf "%8.3f",$3; o=$2 }' <file>

which outputs

      -4.100  -6.200   0.000
       1.200   4.300   6.000
       7.000  10.400  16.700
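Both short versions rely on the same idea: remember the previous index and start a new output row whenever it changes. An annotated sketch of the transposed variant, with a final newline added so the last row is terminated (the names values_only, prev and sample are illustrative):

```shell
# Sample input from the question, held in a shell variable for the demo.
sample='x y value
20.5 20.5 -4.1
21.5 20.5 -6.2
22.5 20.5 0.0
20.5 21.5 1.2
21.5 21.5 4.3
22.5 21.5 6.0
20.5 22.5 7.0
21.5 22.5 10.4
22.5 22.5 16.7'

# Hypothetical helper: prints only the values, one output row per y value.
values_only() {
  awk '
    NR == 1 { next }                        # skip the header line
    NR > 2 && $2 != prev { printf "\n" }    # y changed: start a new output row
    { printf "%8.3f", $3; prev = $2 }       # append the value, remember its y
    END { if (NR > 1) printf "\n" }         # terminate the last row
  '
}

printf '%s\n' "$sample" | values_only
```

As with the one-liner, this only works when lines that share a y value are adjacent in the input.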
