First get the values that appear in the third column of both files. Then filter the rows from both files whose third column is one of those values.
If the columns are separated by a single character, you can use cut to extract a column. For columns that may be separated by any number of spaces, use awk. One way to get the common values of column 3 is to extract that column from each file, sort it with duplicates removed, and compare the two lists with comm. Using bash/ksh/zsh process substitutions:
comm -12 <(awk '{print $3}' file1 | sort -u) <(awk '{print $3}' file2 | sort -u)
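If the third column is instead delimited by a single known character, a cut-based variant of the same comparison would do (a sketch assuming tab-separated fields, cut's default; pass -d to use another delimiter):

comm -12 <(cut -f3 file1 | sort -u) <(cut -f3 file2 | sort -u)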
Now turn these values into grep patterns and filter both files with them.
comm -12 <(awk '{print $3}' file1 | sort -u) <(awk '{print $3}' file2 | sort -u) |
sed -e 's/[][.\|?*+^$]/\\&/g' \
    -e 's/.*/^[^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+&[[:space:]]/' |
grep -E -f - file1 file2
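To see what the generated patterns look like, stop the pipeline before the grep stage. For a hypothetical common value foo.bar, the sed stage would emit:

^[^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+foo\.bar[[:space:]]

That is a pattern matching any line whose third whitespace-separated field is foo.bar; the first sed expression escapes the dot so that it is matched literally.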
The method above should work even with huge files. But at 500 thousand lines, you do not have huge files: they fit comfortably in memory, and a simple Perl solution will be fine. Read both files, record each third-column value, then print the lines whose value was seen in both files.
perl -ne '
    push @lines, $_;
    $c = (split)[2];
    $seen{$c}{$ARGV} = 1;
    END {
        foreach (@lines) {
            $c = (split)[2];
            print if keys %{$seen{$c}} == 2;
        }
    }' file1 file2
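The same logic reads more clearly as a standalone script. This is a sketch equivalent to the one-liner above, not part of the original answer; the field index 2 selects the third whitespace-separated column.

#!/usr/bin/env perl
use strict;
use warnings;

my (@lines, %seen);

# Pass 1: keep every line and record, per input file, which
# third-column values occur ($ARGV holds the current file name).
while (<>) {
    push @lines, $_;
    my $c = (split)[2];
    $seen{$c}{$ARGV} = 1;
}

# Pass 2: print the lines whose third-column value appeared
# in both files (i.e. was recorded under two file names).
for (@lines) {
    my $c = (split)[2];
    print if keys %{ $seen{$c} } == 2;
}

Run it as perl common3.pl file1 file2 (common3.pl is a hypothetical file name).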