Matching CSV fields by name using awk

Question

Suppose I have a CSV file with the headers of the following form:

    Field1,Field2
    3,262000
    4,449000
    5,650000
    6,853000
    7,1061000
    8,1263000
    9,1473000
    10,1683000
    11,1893000

I would like to write an awk script that takes a comma-delimited list of target field names, splits it into an array, and prints only the columns whose names I specify.

This is what I have tried so far. I have verified that the head array contains the expected headers and that the targets array contains the targets passed on the command line.

    BEGIN {
        FS = ","
        split(target, targets, ",")
    }
    NR == 1 {
        for (i = 1; i <= NF; i++)
            head[i] = $i
    }
    NR != 1 {
        for (i = 1; i <= NF; i++) {
            if (head[i] in targets) {
                print $i
            }
        }
    }

When I invoke this script with the command

    awk -v target=Field1 -f GetCol.awk Debug.csv

nothing is printed.

+4

awk csv gawk

merlin2011 Apr 18 '13 at 20:10

3 answers

My two cents:

    BEGIN {
        OFS = FS = ","
        split(target, fields, FS)        # FS is already set; don't hard-code the comma here
        for (i in fields)                # Distinct variable name to avoid headaches
            field_idx[fields[i]] = i     # Reverse lookup
    }
    NR == 1 {                            # Process header
        for (i = 1; i <= NF; i++)        # For each field header
            head[i] = $i                 # Add to hash for comparison with target
        next                             # Skip to next line
    }
    {                                    # No need to invert the condition (next was used)
        sep = ""                         # No separator before the first printed field
        for (i = 1; i <= NF; i++)        # For each field
            if (head[i] in field_idx) {  # Test whether the current field is a target field
                printf "%s%s", sep, $i   # Print the column if matched
                sep = OFS                # Set separator to OFS
            }
        printf "\n"                      # Print newline character
    }
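One way to try the script above is sketched below. The two-row sample is an excerpt of the data in the question, and the temporary file and the helper name select_cols are assumptions made only for this demo. The awk program is the same logic as above, condensed so the block is self-contained.

```shell
# Sketch only: sample file and helper name are assumptions for this demo.
csv=$(mktemp)
printf '%s\n' 'Field1,Field2' '3,262000' '4,449000' > "$csv"

# usage: select_cols TARGETS FILE   (hypothetical helper)
select_cols() {
    awk -v target="$1" '
        BEGIN {
            OFS = FS = ","
            split(target, fields, FS)
            for (i in fields) field_idx[fields[i]] = i   # names become indices
        }
        NR == 1 { for (i = 1; i <= NF; i++) head[i] = $i; next }
        {
            sep = ""
            for (i = 1; i <= NF; i++)
                if (head[i] in field_idx) { printf "%s%s", sep, $i; sep = OFS }
            printf "\n"
        }
    ' "$2"
}

one=$(select_cols Field2 "$csv")            # a single named column
both=$(select_cols Field2,Field1 "$csv")    # two columns, requested in reverse
printf '%s\n' "$one" "$both"
rm -f "$csv"
```

Because the main loop walks the columns of each record from left to right, the second call prints Field1 before Field2 even though Field2 was requested first: output follows file order, not request order.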

+5

Chris Seymour Apr 18 '13 at 20:41

An extension of @Sudo_O's solution (thanks) that

reads from standard input and takes the field names as command-line arguments,
prints fields in the requested order (possibly several times each),
prints a placeholder when a requested field is not found, and
warns on standard error about duplicate field names in the header.

    #!/usr/bin/awk -f
    # Process standard input outputting named columns provided as arguments.
    #
    # For example, given foo.dat containing
    #     a  b  c  c
    #     1a 1b 1c 1C
    #     2a 2b 2c 2C
    #     3a 3b 3c 3C
    # Running
    #     cat foo.dat | ./namedcols c b a a d
    # will output
    #     1c 1b 1a 1a d
    #     2c 2b 2a 2a d
    #     3c 3b 3a 3a d
    # and will warn on standard error that it
    #     Ignored duplicate 'c' in column 4
    # Notice that the requested but missing column d contains "d".
    #
    # Using awk's -F feature it is possible to parse comma-separated data:
    #     cat foo.csv | ./namedcols -F, c b a a d
    BEGIN {
        for (i=1; i<ARGC; ++i)
            desired[i] = ARGV[i]
        delete ARGV
    }
    NR==1 {
        for (i=1; i<=NF; i++)
            if ($i in names)
                printf "Ignored duplicate '%s' in column %d\n", $i, i | "cat 1>&2"
            else
                names[$i] = i
        next
    }
    {
        for (i=1; i<ARGC; ++i)
            printf "%s%s",                                          \
                   (i==1 ? "" : OFS),                               \
                   ((ndx = names[name = desired[i]])>0 ? $ndx: name)
        printf RS
    }

+1

Rhys Ulerich Nov 25 '13 at 16:26

merlin2011 · Accepted Answer · 2013-04-18T20:25:48+0000

I figured this out and am posting an answer in case others run into the same problem.

The cause is the in keyword, which I used to test array membership. It checks whether the operand on the left is one of the indices of the array on the right, not one of its values. Since split stores the target names as values under the indices 1, 2, ..., the test head[i] in targets never succeeds. The fix is to create a reverse lookup array, as shown below.

    BEGIN {
        OFS = FS = ","
        split(target, t_targets, ",")
        for (i in t_targets)
            targets[t_targets[i]] = i
    }
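The index-versus-value behaviour of in can be checked in isolation. The following is a minimal sketch; the array name lookup is illustrative and not part of the original script.

```shell
# Sketch: the awk "in" operator tests indices (keys), not values.
out=$(awk 'BEGIN {
    n = split("Field1,Field2", targets, ",")   # targets[1]="Field1", targets[2]="Field2"
    print ("Field1" in targets)                # 0: "Field1" is a value, not an index
    print (1 in targets)                       # 1: 1 is an index
    for (i = 1; i <= n; i++)
        lookup[targets[i]] = i                 # reverse lookup: names become indices
    print ("Field1" in lookup)                 # 1: the name is now an index
}')
printf '%s\n' "$out"
```

This prints 0, 1 and 1 on separate lines: the membership test only succeeds once the field names are used as indices.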
