Unix: merge files based on column value

Question

I have two files that look like this:

File 1 (2 columns):

ID1 123
ID2 234
ID3 232
ID4 344
...

File 2 (> 1 million columns):

ID2 A C ...
ID3 G T ...
ID1 C T ...
ID4 A C ... 
...

I want to insert the value from column 2 of file 1 into file 2 as a new second column, matching rows by identifier. The merged file should look like this:

ID2 234 A C ...
ID3 232 G T ...
ID1 123 C T ...
ID4 344 A C ... 
...

The result should be exactly file 2 (same row order), with one extra column in second position. The identifiers are the values in the first column of both files. File 1 has more lines than file 2: every ID in file 2 also appears in file 1, but not every ID in file 1 appears in file 2.

Does anyone know how to do this in unix / bash? Many thanks!

merge unix bash

Abdel Mar 21 '12 at 14:32

kev · Accepted Answer · 2012-03-21T14:34:50+0000

$ join <(sort file1) <(sort file2)
ID1 123 C T ...
ID2 234 A C ...
ID3 232 G T ...
ID4 344 A C ...

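The command above returns the rows sorted by ID. To keep the row order of file2, number its lines with cat -n, join on the ID, sort numerically on the line number, and then cut the line number out again: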

$ join -1 1 -2 2 <(sort file1) <(cat -n file2 | sort -k2,2) | sort -k3,3n | cut -d' ' -f1-2,4-
ID2 234 A C ...
ID3 232 G T ...
ID1 123 C T ...
ID4 344 A C ...
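
As an alternative to the join pipelines above, here is a minimal awk sketch that skips sorting altogether. It is not part of the original answer. It assumes whitespace-separated fields, that file1 fits in memory as a lookup table, and that collapsing runs of whitespace in file2 to single spaces is acceptable. The sample rows are taken from the question with the "..." columns left out; the ID5 row and the names workdir, val and merged are made up for illustration.

```shell
#!/usr/bin/env bash
# Sample data modelled on the question; ID5 is a hypothetical extra row
# standing in for the IDs that exist only in file1.
workdir=$(mktemp -d)
printf '%s\n' 'ID1 123' 'ID2 234' 'ID3 232' 'ID4 344' 'ID5 999' > "$workdir/file1"
printf '%s\n' 'ID2 A C' 'ID3 G T' 'ID1 C T' 'ID4 A C' > "$workdir/file2"

# While reading file1 (NR == FNR): remember the value for each ID.
# While reading file2: rewrite field 1 as "ID value" and print the row.
# Rows are emitted in the order they are read, so file2's order is kept.
merged=$(awk 'NR == FNR { val[$1] = $2; next }
              { $1 = $1 OFS val[$1]; print }' "$workdir/file1" "$workdir/file2")

printf '%s\n' "$merged"
rm -r "$workdir"
```

Because file2 is read once from top to bottom, no line numbering or re-sorting is needed. An ID that is missing from file1 would get an empty value here rather than being dropped, which differs from what join does by default.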
