Match the values in the first column of two files and append the corresponding rows in the new file

Question

Match the values in the first column of two files and append the corresponding rows in the new file

I need to find a match for the row in column 1 ($ 1) in file1.txt with the row in column 1 ($ 1) in file2.txt. Then I want to join the lines where there was a match in the new file.

cat file1.txt 1050008 5.156725968 8.404038296 124.9198605 3.23E-21 2.33E-17 38.57865782 3310747 5.631470026 8.581936875 124.6039122 3.34E-21 2.33E-17 38.55204806 5910451 4.900364671 8.455329195 124.5720603 3.35E-21 2.33E-17 38.54935989 730156 5.565210738 8.48792701 122.2168789 4.28E-21 2.33E-17 38.34773989 cat file2.txt 4230037 ILMN Controls ILMN_Controls ERCC-00071 ILMN_333646 ERCC-00071 ERCC-00071 1050008 ILMN Controls ILMN_Controls ERCC-00009 ILMN_333584 ERCC-00009 ERCC-00009 5260356 ILMN Controls ILMN_Controls ERCC-00053 ILMN_333628 ERCC-00053 ERCC-00053 3310747 ILMN Controls ILMN_Controls ERCC-00144 ILMN_333719 ERCC-00144 ERCC-00144 5910451 ILMN Controls ILMN_Controls ERCC-00003 ILMN_333578 ERCC-00003 ERCC-00003 1710435 ILMN Controls ILMN_Controls ERCC-00138 ILMN_333713 ERCC-00138 ERCC-00138 1400612 ILMN Controls ILMN_Controls ERCC-00084 ILMN_333659 ERCC-00084 ERCC-00084 730156 ILMN Controls ILMN_Controls ERCC-00017 ILMN_333592 ERCC-00017 ERCC-00017

I want the output file to look like this:

 out.txt 1050008 5.156725968 8.404038296 124.9198605 3.23E-21 2.33E-17 38.57865782 1050008 ILMN Controls ILMN_Controls ERCC-00009 ILMN_333584 ERCC-00009 ERCC-00009 3310747 5.631470026 8.581936875 124.6039122 3.34E-21 2.33E-17 38.55204806 3310747 ILMN Controls ILMN_Controls ERCC-00144 ILMN_333719 ERCC-00144 ERCC-00144 5910451 4.900364671 8.455329195 124.5720603 3.35E-21 2.33E-17 38.54935989 5910451 ILMN Controls ILMN_Controls ERCC-00003 ILMN_333578 ERCC-00003 ERCC-00003 730156 5.565210738 8.48792701 122.2168789 4.28E-21 2.33E-17 38.34773989 730156 ILMN Controls ILMN_Controls ERCC-00017 ILMN_333592 ERCC-00017 ERCC-00017

Files are tab delimited and have missing values in some columns.

There are 31 columns in file2.txt and> 47000 lines, and I'm trying to do this in bash (OSX)

If you have a solution, I would really appreciate it if you could briefly explain the steps, as I am very new to this.

+4

bash grep awk match

user1911823 Jan 7 '13 at 12:28

source share

2 answers

If you don't mind that the output is ordered by the first column, you can use this call to the join command:

 join <(sort file1.txt) <(sort file2.txt) >out.txt

+8

cmh Jan 7 '13 at 12:32

source share

Dimitre radoulov · Accepted Answer · 2013-01-07T12:32:48+0000

 awk 'BEGIN { FS = OFS = "\t" } NR == FNR { # while reading the 1st file # store its records in the array f f[$1] = $0 next } $1 in f { # when match is found # print all values print f[$1], $0 }' file1 file2

Match the values ​​in the first column of two files and append the corresponding rows in the new file

More articles:

Match the values in the first column of two files and append the corresponding rows in the new file