How to convert a list to a table in bash

I would like to convert a list of elements (key/value pairs) to a table format. The solution could be a bash script, awk, sed, or something else.

Suppose I have a long list, for example:

    date and time: 2013-02-21 18:18 PM
    file size: 1283483 bytes
    key1: value
    key2: value

    date and time: 2013-02-21 18:19 PM
    file size: 1283493 bytes
    key2: value
    ...

I would like to convert it to a table format with a tab or another delimiter, so that it looks like this:

    date and time         file size      key1   key2
    2013-02-21 18:18 PM   1283483 bytes  value  value
    2013-02-21 18:19 PM   1283493 bytes         value
    ...

or like this:

    date and time|file size|key1|key2
    2013-02-21 18:18 PM|1283483 bytes|value|value
    2013-02-21 18:19 PM|1283493 bytes||value
    ...

I looked at solutions such as "An efficient way to transpose a file in Bash", but it looks like my case is different. The awk solution there partially works for me: it outputs all the rows as one long list of columns, but I need the columns aligned to a unique list of keys.

    awk -F': ' '
    { for (i=1; i<=NF; i++) { a[NR,i] = $i } }
    NF>p { p = NF }
    END {
        for (j=1; j<=p; j++) {
            str = a[1,j]
            for (i=2; i<=NR; i++) {
                str = str " " a[i,j]
            }
            print str
        }
    }' filename
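For reference, here is that transpose snippet run on a tiny two-line input. It simply swaps rows and columns, so keys land in the first output row and values in the second; it never merges records under a shared header, which is why it does not fit this problem:

```shell
# Demo of the transpose approach on a two-line input: each input line
# becomes a column, so the output is keys on one row, values on the next.
printf 'key1: a\nkey2: b\n' | awk -F': ' '
    { for (i = 1; i <= NF; i++) a[NR, i] = $i }
    NF > p { p = NF }
    END {
        for (j = 1; j <= p; j++) {
            str = a[1, j]
            for (i = 2; i <= NR; i++) str = str " " a[i, j]
            print str
        }
    }
'
# prints:
# key1 key2
# a b
```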

UPDATE

Thanks to everyone who provided solutions. Some of them look very promising, but I think my versions of the tools may be outdated and I get some syntax errors. I also see now that I did not start with very clear requirements. Kudos to sputnick for being the first to offer a solution before I had spelled out all the requirements. I had had a long day when I wrote the question, so it was not very clear.

My goal is to develop a very general solution for parsing multiple lists of items into a column format. I don't think the solution needs to support more than 255 columns. Column names will not be known in advance, so the solution should work for anyone, not just for me. Two things are known: the separator between key/value pairs (": ") and the separator between lists (an empty line). It would be nice to have these in variables so that others can reuse the script.

Looking at the proposed solutions, I understand that a good approach is to make two passes over the input file: the first pass collects all the column names, sorts them if necessary, and prints the header; the second captures the column values and prints the rows.

4 answers

Here's a pure awk solution:

    # split lines on ": " and use "|" for output field separator
    BEGIN { FS = ": "; i = 0; h = 0; ofs = "|" }

    # empty line - increment item count and skip it
    /^\s*$/ { i++ ; next }

    # normal line - add the item to the object and the header to the header list
    # and keep track of first-seen order of headers
    {
        current[i, $1] = $2
        if (!($1 in headers)) { headers_ordered[h++] = $1 }
        headers[$1]
    }

    END {
        h--

        # print headers
        for (k = 0; k <= h; k++) {
            printf "%s", headers_ordered[k]
            if (k != h) { printf "%s", ofs }
        }
        print ""

        # print the items for each object
        for (j = 0; j <= i; j++) {
            for (k = 0; k <= h; k++) {
                printf "%s", current[j, headers_ordered[k]]
                if (k != h) { printf "%s", ofs }
            }
            print ""
        }
    }

Example input (note that the last item must be followed by a newline):

    foo: bar
    foo2: bar2
    foo1: bar

    foo: bar3
    foo3: bar3
    foo2: bar3

Example output:

    foo|foo2|foo1|foo3
    bar|bar2|bar|
    bar3|bar3||bar3

Note: you will probably have to change this if your data has ": " embedded in it.
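As a sketch of one such change (an assumption about the data, not part of the answer above): split each line only on the first ": " with index() and substr(), so later occurrences stay inside the value:

```shell
# Split on the first ": " only; anything after it, including further
# ": " sequences, is kept intact as the value.
printf 'note: time: 18:18 is fine\n' | awk '
    {
        p = index($0, ": ")           # position of the first ": "
        if (p == 0) next              # skip lines without a separator
        key = substr($0, 1, p - 1)
        val = substr($0, p + 2)
        print key "=" val
    }
'
# prints: note=time: 18:18 is fine
```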


Here is one way using GNU awk. Run it like:

 awk -f script.awk file 

The content of script.awk :

    BEGIN {
        # change this to OFS="\t" for tab-delimited output
        OFS="|"
        # treat each record as a set of lines
        RS=""
        FS="\n"
    }

    {
        # keep a count of the records
        ++i

        # loop through each line in the record
        for (j=1;j<=NF;j++) {

            # split each line in two
            split($j,a,": ")

            # just holders for the first two lines in the record
            if (j==1) { date = a[1] }
            if (j==2) { size = a[1] }

            # keep a tally of the unique key names
            if (j>=3) { !x[a[1]] }

            # save the data in a multidimensional array:
            # record number -> key = value
            b[i][a[1]]=a[2]
        }
    }

    END {
        # sort the unique keys
        m = asorti(x,y)

        # add the two strings to a numerically indexed array
        c[1] = date
        c[2] = size

        # set a variable to continue from
        f=2

        # loop through the sorted array of unique keys
        for (j=1;j<=m;j++) {

            # build the header line by adding the sorted keys
            r = (r ? r : date OFS size) OFS y[j]

            # continue to add the sorted keys to the numerically indexed array
            c[++f] = y[j]
        }

        # print the header and empty it
        print r
        r = ""

        # loop through the records ('i' is the number of records)
        for (j=1;j<=i;j++) {

            # loop through the subrecords ('f' is the number of unique keys)
            for (k=1;k<=f;k++) {

                # build the output line
                r = (r ? r OFS : "") b[j][c[k]]
            }

            # print and empty it ready for the next record
            print r
            r = ""
        }
    }

Here are the contents of a test file called file:

    date and time: 2013-02-21 18:18 PM
    file size: 1283483 bytes
    key1: value1
    key2: value2

    date and time: 2013-02-21 18:19 PM
    file size: 1283493 bytes
    key2: value2
    key1: value1
    key3: value3

    date and time: 2013-02-21 18:20 PM
    file size: 1283494 bytes
    key3: value3
    key4: value4

    date and time: 2013-02-21 18:21 PM
    file size: 1283495 bytes
    key5: value5
    key6: value6

Results:

    2013-02-21 18:18 PM|1283483 bytes|value1|value2||||
    2013-02-21 18:19 PM|1283493 bytes|value1|value2|value3|||
    2013-02-21 18:20 PM|1283494 bytes|||value3|value4||
    2013-02-21 18:21 PM|1283495 bytes|||||value5|value6
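Note that this script relies on gawk-only features: true multidimensional arrays (`b[i][key]`, gawk 4.0 or later) and `asorti()`. Since outdated tools were mentioned as a concern, the same idea can be sketched in portable POSIX awk using SUBSEP indexing; keys come out in first-seen order here instead of sorted, because `asorti()` is unavailable:

```shell
# Portable sketch: emulate b[i][key] with b[i, key] (SUBSEP indexing),
# which any POSIX awk supports. Paragraph mode (RS="") splits records on
# blank lines; FS="\n" makes each line of a record one field.
printf 'k1: a\nk2: b\n\nk2: c\n' | awk '
    BEGIN { RS = ""; FS = "\n"; OFS = "|" }
    {
        ++i
        for (j = 1; j <= NF; j++) {
            split($j, a, ": ")
            if (!(a[1] in seen)) { keys[++n] = a[1]; seen[a[1]] }
            b[i, a[1]] = a[2]                 # record number, key -> value
        }
    }
    END {
        for (k = 1; k <= n; k++)              # header
            printf "%s%s", keys[k], (k < n ? OFS : "\n")
        for (j = 1; j <= i; j++)              # one row per record
            for (k = 1; k <= n; k++)
                printf "%s%s", b[j, keys[k]], (k < n ? OFS : "\n")
    }
'
# prints:
# k1|k2
# a|b
# |c
```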

This does not make any assumptions about the structure of the columns, so it does not try to sort them; however, all fields are printed in the same order for all records:

    use strict;
    use warnings;

    my (@db, %f, %fields);
    my $counter = 1;

    while (<>) {
        my ($field, $value) = (/([^:]*):\s*(.*)\s*$/);
        if (not defined $field) {       # blank line: finish the current record
            push @db, { %f };
            %f = ();
        } else {
            $f{$field} = $value;
            $fields{$field} = $counter++ if not defined $fields{$field};
        }
    }
    push @db, \%f;                      # the final record

    #my @fields = sort keys %fields;    # alphabetical order
    my @fields = sort { $fields{$a} <=> $fields{$b} } keys %fields;  # first-seen order

    # print header
    print join("|", @fields), "\n";

    # print rows
    for my $row (@db) {
        print join("|", map { $row->{$_} ? $row->{$_} : "" } @fields), "\n";
    }

A Perl solution:

    use strict;
    use warnings;

    # read the file paragraph by paragraph
    $/ = "\n\n";

    print "date and time|file size|key1|key2\n";

    # parse the whole file with the magic diamond operator
    while (<>) {
        if (/^date and time:\s+(.*)/m) {
            print "$1|";
        }
        if (/^file size:(.*)/m) {
            print "$1|";
        }
        if (/^key1:(.*)/m) {
            print "$1|";
        } else {
            print "|";
        }
        if (/^key2:(.*)/m) {
            print "$1\n";
        } else {
            print "\n";
        }
    }

Usage:

 perl script.pl file 

Output

    date and time|file size|key1|key2
    2013-02-21 18:18 PM| 1283483 bytes| value| value
    2013-02-21 18:19 PM| 1283493 bytes|| value
