Linux awk: merge two files

I have the script below to merge two files.

    awk -F"\t" '
        {key = $1}
        !(key in result) {result[key] = $0; next;}
        { for (i=2; i <= NF; i++) result[key] = result[key] FS $i }
        END {
            PROCINFO["sorted_in"] = "@ind_str_asc"    # if using GNU awk
            for (key in result) print result[key]
        }
    ' $1 $2 > $3

The first column is the key in both input files ($1 and $2). But if the second file ($2) has a key that the first file ($1) does not have, the script still prints that line, just without any columns from the first file.

I want to merge only the keys that appear in the first file ($1). How can I merge the two files that way?

For instance,

File 1

    Key    Column1  Column2  Column3
    Test1  500      400      200
    Test2  499      400      200
    Test5  600      200      150
    Test6  600      199      150
    Test7  599      199      100

File 2

    Key    Column4  Column5
    Test1  Good     Good
    Test2  Good     Good
    Test3  Good     Good
    Test4  Good     Good
    Test5  Good     Good
    Test6  Good     Good
    Test7  Good     Good

Current output:

    Key    Column1  Column2  Column3  Column4  Column5
    Test1  500      400      200      Good     Good
    Test2  499      400      200      Good     Good
    Test5  600      200      150      Good     Good
    Test6  600      199      150      Good     Good
    Test7  599      199      100      Good     Good
    Test3  Good     Good
    Test4  Good     Good

Expected output:

    Key    Column1  Column2  Column3  Column4  Column5
    Test1  500      400      200      Good     Good
    Test2  499      400      200      Good     Good
    Test5  600      200      150      Good     Good
    Test6  600      199      150      Good     Good
    Test7  599      199      100      Good     Good

Thanks!

3 answers

You are doing this the hard way. What you are describing is a join operation, and there is a standard UNIX tool with a very obvious name for it:

    $ join file1 file2 | column -t
    Key    Column1  Column2  Column3  Column4  Column5
    Test1  500      400      200      Good     Good
    Test2  499      400      200      Good     Good
    Test5  600      200      150      Good     Good
    Test6  600      199      150      Good     Good
    Test7  599      199      100      Good     Good
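One thing the join command above relies on: `join` expects both inputs to be sorted on the join field, in the same collation order it uses itself. The sample files happen to be sorted already; for arbitrary input, a minimal sketch like the following sorts first. It assumes tab-separated files (as the question's `-F"\t"` suggests) and recreates the sample data in a scratch directory; the file names and the `merged` variable are illustrative.

```shell
#!/bin/sh
set -eu

# Recreate the question's sample data (tab-separated) in a scratch directory.
dir=$(mktemp -d)
printf '%s\t%s\t%s\t%s\n' \
  Key Column1 Column2 Column3 \
  Test1 500 400 200 \
  Test2 499 400 200 \
  Test5 600 200 150 \
  Test6 600 199 150 \
  Test7 599 199 100 > "$dir/file1"
printf '%s\t%s\t%s\n' \
  Key Column4 Column5 \
  Test1 Good Good \
  Test2 Good Good \
  Test3 Good Good \
  Test4 Good Good \
  Test5 Good Good \
  Test6 Good Good \
  Test7 Good Good > "$dir/file2"

# sort and join must agree on collation, so pin both to the C locale.
LC_ALL=C
export LC_ALL
tab=$(printf '\t')

# Sort each file on its first tab-delimited field.
sort -t "$tab" -k1,1 "$dir/file1" > "$dir/file1.sorted"
sort -t "$tab" -k1,1 "$dir/file2" > "$dir/file2.sorted"

# -t "$tab": split input on tabs and write tabs between output fields.
# Keys present in only one file (Test3, Test4) are dropped by default.
merged=$(join -t "$tab" "$dir/file1.sorted" "$dir/file2.sorted")

printf '%s\n' "$merged"
rm -r "$dir"
```

With `-t`, the merged result stays tab-separated instead of being re-joined with single spaces, so it can be fed back into `awk -F"\t"` unchanged.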

Or, if you insist on awk:

    $ awk 'NR==FNR{m[$1]=$2" "$3; next} {print $0, m[$1]}' file2 file1 | column -t
    Key    Column1  Column2  Column3  Column4  Column5
    Test1  500      400      200      Good     Good
    Test2  499      400      200      Good     Good
    Test5  600      200      150      Good     Good
    Test6  600      199      150      Good     Good
    Test7  599      199      100      Good     Good
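The awk one-liner above hard-codes `$2" "$3`, so it only carries over two columns from file2. A minimal sketch of the same two-pass idea that copies every non-key column, assuming tab-separated input; keys missing from file2 are printed from file1 unchanged, which keeps it a merge on file1's keys. The scratch directory and `merged` variable are illustrative.

```shell
#!/bin/sh
set -eu

# Recreate the question's sample data (tab-separated) in a scratch directory.
dir=$(mktemp -d)
printf '%s\t%s\t%s\t%s\n' \
  Key Column1 Column2 Column3 \
  Test1 500 400 200 \
  Test2 499 400 200 \
  Test5 600 200 150 \
  Test6 600 199 150 \
  Test7 599 199 100 > "$dir/file1"
printf '%s\t%s\t%s\n' \
  Key Column4 Column5 \
  Test1 Good Good \
  Test2 Good Good \
  Test3 Good Good \
  Test4 Good Good \
  Test5 Good Good \
  Test6 Good Good \
  Test7 Good Good > "$dir/file2"

merged=$(awk '
  BEGIN { FS = OFS = "\t" }
  NR == FNR {
    # Reading file2: keep every column after the key, each prefixed by OFS.
    rest = ""
    for (i = 2; i <= NF; i++) rest = rest OFS $i
    m[$1] = rest
    next
  }
  # Reading file1: print the line plus whatever file2 had for this key
  # (nothing is appended when the key is missing from file2).
  { print $0 m[$1] }
' "$dir/file2" "$dir/file1")

printf '%s\n' "$merged"
rm -r "$dir"
```

Because file1 is read second and printed line by line, the output keeps file1's original row order.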

Add a condition when saving to the array:

    {key = $1}
    !(key in result) && NR == FNR {result[key] = $0; next;}
    (key in result) {
        for (i=2; i <= NF; i++) {
            result[key] = result[key] FS $i
        }
    }
    END {
        PROCINFO["sorted_in"] = "@ind_str_asc"    # if using GNU awk
        for (key in result) print result[key]
    }

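NR == FNR is true only while awk reads the first file, so only keys from the first file are stored in result. The (key in result) pattern then makes sure the key already exists before the for loop appends the second file's columns, so keys that appear only in the second file are skipped.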
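A caveat on the `NR == FNR` idiom: it is only a reliable "first file" test when the first file is non-empty. If file1 is empty, NR still equals FNR on every line of file2, so file2's lines get stored as if they came from file1. A sketch of the same program using `FILENAME == ARGV[1]` instead, which avoids that edge case; the `merge` shell function, the scratch files, and the variable names are illustrative.

```shell
#!/bin/sh
set -eu

# Recreate the question's sample data (tab-separated) plus an empty file.
dir=$(mktemp -d)
printf '%s\t%s\t%s\t%s\n' \
  Key Column1 Column2 Column3 \
  Test1 500 400 200 \
  Test2 499 400 200 \
  Test5 600 200 150 \
  Test6 600 199 150 \
  Test7 599 199 100 > "$dir/file1"
printf '%s\t%s\t%s\n' \
  Key Column4 Column5 \
  Test1 Good Good \
  Test2 Good Good \
  Test3 Good Good \
  Test4 Good Good \
  Test5 Good Good \
  Test6 Good Good \
  Test7 Good Good > "$dir/file2"
: > "$dir/empty"

# Same rules as the answer, but the first-file test compares FILENAME
# with the first command-line argument instead of NR with FNR.
merge() {
  awk '
    BEGIN { FS = OFS = "\t" }
    { key = $1 }
    FILENAME == ARGV[1] && !(key in result) { result[key] = $0; next }
    (key in result) { for (i = 2; i <= NF; i++) result[key] = result[key] OFS $i }
    END { for (key in result) print result[key] }
  ' "$1" "$2"
}

merged=$(merge "$dir/file1" "$dir/file2")
# With an empty first file nothing is stored, so nothing is printed.
from_empty=$(merge "$dir/empty" "$dir/file2")

printf '%s\n' "$merged"
rm -r "$dir"
```

The END loop is left as plain `for (key in result)`, so outside GNU awk the row order is unspecified; pipe through `sort` if a stable order matters.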


You can try the following command:

    awk '
        BEGIN { FS = OFS = "\t" }
        {key = $1}
        FNR == NR {result[key] = $0; next;}
        (key in result) { for (i=2; i <= NF; i++) result[key] = result[key] FS $i }
        END {
            PROCINFO["sorted_in"] = "@ind_str_asc"    # if using GNU awk
            for (key in result) print result[key]
        }
    ' file1 file2

I changed these checks: FNR == NR stores lines in result only from the first file, and (key in result), which applies to the second file, appends columns only for keys that were already seen in the first file.

This gives:

    Key    Column1  Column2  Column3  Column4  Column5
    Test1  500      400      200      Good     Good
    Test2  499      400      200      Good     Good
    Test5  600      200      150      Good     Good
    Test6  600      199      150      Good     Good
    Test7  599      199      100      Good     Good
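`PROCINFO["sorted_in"]` is a GNU awk extension. In other awks (mawk, BWK awk, BusyBox awk) the assignment is harmless, but `for (key in result)` then visits keys in an unspecified order. A minimal sketch that instead prints keys in the order they first appear in file1, which should work in any POSIX awk; the `order` array and `n` counter are names introduced here, and the scratch files are illustrative.

```shell
#!/bin/sh
set -eu

# Recreate the question's sample data (tab-separated) in a scratch directory.
dir=$(mktemp -d)
printf '%s\t%s\t%s\t%s\n' \
  Key Column1 Column2 Column3 \
  Test1 500 400 200 \
  Test2 499 400 200 \
  Test5 600 200 150 \
  Test6 600 199 150 \
  Test7 599 199 100 > "$dir/file1"
printf '%s\t%s\t%s\n' \
  Key Column4 Column5 \
  Test1 Good Good \
  Test2 Good Good \
  Test3 Good Good \
  Test4 Good Good \
  Test5 Good Good \
  Test6 Good Good \
  Test7 Good Good > "$dir/file2"

merged=$(awk '
  BEGIN { FS = OFS = "\t" }
  { key = $1 }
  FNR == NR {
    # First file: remember each key the first time it is seen, in input order.
    if (!(key in result)) order[++n] = key
    result[key] = $0
    next
  }
  # Second file: append columns only for keys already seen in the first file.
  (key in result) { for (i = 2; i <= NF; i++) result[key] = result[key] OFS $i }
  # Walk the keys in first-file order instead of relying on PROCINFO.
  END { for (j = 1; j <= n; j++) print result[order[j]] }
' "$dir/file1" "$dir/file2")

printf '%s\n' "$merged"
rm -r "$dir"
```

For this sample the result matches the GNU-sorted output above, because file1 is already in ascending key order; for unsorted input it follows file1's order rather than string order.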
