Comparing files with awk

Hi, I have two similar files (both with 3 columns). I would like to check whether these two files contain the same elements, possibly listed in a different order. To start with, I would like to compare only the first column.

file1.txt

    "aba" 0 0
    "abc" 0 1
    "abd" 1 1
    "xxx" 0 0

file2.txt

    "xyz" 0 0
    "aba" 0 0
    "xxx" 0 0
    "abc" 1 1

How can I do this with awk? I looked around, but found only complicated examples. What if I want to include the other two columns in the comparison? The result should give me the number of matching elements.

2 answers

To print common items in both files:

    $ awk 'NR==FNR{a[$1];next}$1 in a{print $1}' file1 file2
    "aba"
    "xxx"
    "abc"

Explanation:

NR and FNR are awk variables that hold the total number of records read so far and the number of records read from the current file, respectively (by default, a record is a line).

    NR==FNR {        # Only true while reading the first file
        a[$1]        # Build an associative array keyed on the first column
        next         # Skip the remaining blocks and move on to the next line
    }
    $1 in a {        # Is column one of the current (second-file) line a key in the array?
        print $1     # If so, print it
    }
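To see why NR==FNR holds only for the first file, it can help to print both counters for every record. The following is a minimal sketch that recreates the question's sample data in a scratch directory; the `dir` and `trace` shell variables are names chosen here for illustration only.

```shell
# Recreate the sample files from the question in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' '"aba" 0 0' '"abc" 0 1' '"abd" 1 1' '"xxx" 0 0' > "$dir/file1.txt"
printf '%s\n' '"xyz" 0 0' '"aba" 0 0' '"xxx" 0 0' '"abc" 1 1' > "$dir/file2.txt"

# Print NR, FNR and the first field for every record of both files.
trace=$(awk '{ print NR, FNR, $1 }' "$dir/file1.txt" "$dir/file2.txt")
printf '%s\n' "$trace"
```

For the first four records the two counters are equal; on the fifth record (the first line of the second file) FNR restarts at 1 while NR keeps counting. One caveat: if the first file is empty, NR==FNR is also true while reading the second file, so the idiom assumes a non-empty first file.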

If you want to compare whole lines, use $0:

    $ awk 'NR==FNR{a[$0];next}$0 in a{print $0}' file1 file2
    "aba" 0 0
    "xxx" 0 0

Or a specific set of columns:

    $ awk 'NR==FNR{a[$1,$2,$3];next}($1,$2,$3) in a{print $1,$2,$3}' file1 file2
    "aba" 0 0
    "xxx" 0 0
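Since the question asks whether both files contain the same elements, the same idiom can be turned around to list the lines that appear in only one of the files. This is a rough sketch rather than a definitive solution: the `seen` array, the `report` variable and the "only in ..." labels are choices made here, and it compares whole lines exactly, so differences in whitespace count as differences.

```shell
# Recreate the sample files from the question in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' '"aba" 0 0' '"abc" 0 1' '"abd" 1 1' '"xxx" 0 0' > "$dir/file1.txt"
printf '%s\n' '"xyz" 0 0' '"aba" 0 0' '"xxx" 0 0' '"abc" 1 1' > "$dir/file2.txt"

report=$(awk '
    FNR == NR { a[$0]; next }            # remember every line of the first file
    {
        if ($0 in a) seen[$0] = 1        # line occurs in both files
        else print "only in file2: " $0  # line is missing from the first file
    }
    END {
        # Lines of the first file that never showed up in the second one.
        for (k in a) if (!(k in seen)) print "only in file1: " k
    }
' "$dir/file1.txt" "$dir/file2.txt" | LC_ALL=C sort)
printf '%s\n' "$report"
```

An empty report would mean that both files contain the same set of lines. The output is piped through sort because awk does not guarantee any particular order for `for (k in a)`.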

To print the number of matching items, here is one way using awk:

 awk 'FNR==NR { a[$1]; next } $1 in a { c++ } END { print c }' file1.txt file2.txt 

Results using input:

 3 
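One detail worth keeping in mind: when nothing matches, c is never assigned, so `print c` emits an empty line rather than 0. A small variation, sketched here with two hypothetical files (`left.txt` and `right.txt`, not part of the question) that share no first-column values, forces a numeric result with `c+0`:

```shell
# Two hypothetical files with no first-column values in common.
dir=$(mktemp -d)
printf '%s\n' '"aba" 0 0' '"abc" 0 1' > "$dir/left.txt"
printf '%s\n' '"xyz" 0 0' '"qqq" 1 1' > "$dir/right.txt"

# c+0 converts the unset counter to the number 0 instead of an empty string.
count=$(awk 'FNR==NR { a[$1]; next } $1 in a { c++ } END { print c+0 }' \
    "$dir/left.txt" "$dir/right.txt")
printf '%s\n' "$count"
```

With these inputs the command prints 0, which is easier to consume in a script than an empty line.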

If you want to add additional columns (for example, columns one, two, and three), use a pseudo-multidimensional array :

 awk 'FNR==NR { a[$1,$2,$3]; next } ($1,$2,$3) in a { c++ } END { print c }' file1.txt file2.txt 

Results using input:

 2 
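As a brief illustration of what "pseudo-multidimensional" means here: awk joins the comma-separated subscripts into a single string key, using the built-in SUBSEP variable as the separator. The sketch below stores one such key and splits it back into its parts; the `part` array and the `keys` variable are names picked for this example.

```shell
keys=$(awk 'BEGIN {
    a["aba", 0, 0]                      # stored under the key "aba" SUBSEP "0" SUBSEP "0"
    for (k in a) {
        n = split(k, part, SUBSEP)      # recover the individual subscripts
        print n, part[1], part[2], part[3]
    }
}')
printf '%s\n' "$keys"
```

This prints `3 aba 0 0`, showing that the array has a single flat key made of three components. It also implies that a field which itself contains the SUBSEP character could produce ambiguous keys, although that is unlikely with ordinary text data.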