Match the values ​​in the first column of two files and append the corresponding rows in the new file

I need to find a match for the row in column 1 ($ 1) in file1.txt with the row in column 1 ($ 1) in file2.txt. Then I want to join the lines where there was a match in the new file.

cat file1.txt 1050008 5.156725968 8.404038296 124.9198605 3.23E-21 2.33E-17 38.57865782 3310747 5.631470026 8.581936875 124.6039122 3.34E-21 2.33E-17 38.55204806 5910451 4.900364671 8.455329195 124.5720603 3.35E-21 2.33E-17 38.54935989 730156 5.565210738 8.48792701 122.2168789 4.28E-21 2.33E-17 38.34773989 cat file2.txt 4230037 ILMN Controls ILMN_Controls ERCC-00071 ILMN_333646 ERCC-00071 ERCC-00071 1050008 ILMN Controls ILMN_Controls ERCC-00009 ILMN_333584 ERCC-00009 ERCC-00009 5260356 ILMN Controls ILMN_Controls ERCC-00053 ILMN_333628 ERCC-00053 ERCC-00053 3310747 ILMN Controls ILMN_Controls ERCC-00144 ILMN_333719 ERCC-00144 ERCC-00144 5910451 ILMN Controls ILMN_Controls ERCC-00003 ILMN_333578 ERCC-00003 ERCC-00003 1710435 ILMN Controls ILMN_Controls ERCC-00138 ILMN_333713 ERCC-00138 ERCC-00138 1400612 ILMN Controls ILMN_Controls ERCC-00084 ILMN_333659 ERCC-00084 ERCC-00084 730156 ILMN Controls ILMN_Controls ERCC-00017 ILMN_333592 ERCC-00017 ERCC-00017 

I want the output file to look like this:

 out.txt 1050008 5.156725968 8.404038296 124.9198605 3.23E-21 2.33E-17 38.57865782 1050008 ILMN Controls ILMN_Controls ERCC-00009 ILMN_333584 ERCC-00009 ERCC-00009 3310747 5.631470026 8.581936875 124.6039122 3.34E-21 2.33E-17 38.55204806 3310747 ILMN Controls ILMN_Controls ERCC-00144 ILMN_333719 ERCC-00144 ERCC-00144 5910451 4.900364671 8.455329195 124.5720603 3.35E-21 2.33E-17 38.54935989 5910451 ILMN Controls ILMN_Controls ERCC-00003 ILMN_333578 ERCC-00003 ERCC-00003 730156 5.565210738 8.48792701 122.2168789 4.28E-21 2.33E-17 38.34773989 730156 ILMN Controls ILMN_Controls ERCC-00017 ILMN_333592 ERCC-00017 ERCC-00017 

Files are tab delimited and have missing values ​​in some columns.

There are 31 columns in file2.txt and> 47000 lines, and I'm trying to do this in bash (OSX)

If you have a solution, I would really appreciate it if you could briefly explain the steps, as I am very new to this.

+4
source share
2 answers
 awk 'BEGIN { FS = OFS = "\t" } NR == FNR { # while reading the 1st file # store its records in the array f f[$1] = $0 next } $1 in f { # when match is found # print all values print f[$1], $0 }' file1 file2 
+10
source

If you don't mind that the output is ordered by the first column, you can use this call to the join command:

 join <(sort file1.txt) <(sort file2.txt) >out.txt 
+8
source

All Articles