Joining files in bash like a SAS merge

I would like to join two files in bash on a common column. I want to keep both the paired and the unpaired lines from both files. Unfortunately, with join I could keep the unpairable lines from only one file, for example with join -1 1 -2 2 -a1 -t" ".
I would also like to keep every pairing for repeated keys (in the join column) from both files. That is, if the first file is

x id1 ab
x id1 cd
x id1 df
x id2 cx
x id3 fv

and the second file is

id1 df cf
id1 ds dg
id2 cv df
id2 as ds
id3 cf cg

the resulting file should be:

x id1 ab df cf
x id1 ab ds dg
x id1 cd df cf
x id1 cd ds dg
x id1 df df cf
x id1 df ds dg
x id2 cx cv df
x id2 cx as ds
x id3 fv cf cg

So far I have always used SAS for this kind of join, after sorting both files on the common column:

data x;
merge file1 file2;
by common_column;
run;

It works fine, but:
1. Since I use Ubuntu most of the time, I have to switch to Windows to import the data into SAS.
2. Most importantly, SAS can truncate data records that are too long.

This is why I would prefer to join my files in bash, but I don't know the corresponding command.
Can someone help me or direct me to the appropriate resource?

2 answers

According to the join help, -a FILENUM also prints the unpairable lines from file FILENUM (1 or 2). So just add -a1 -a2 to your command line and you should be set. For example:

    # cat a
    1 blah
    2 foo
    # cat b
    2 bar
    3 baz
    # join -1 1 -2 1 -t" " a b
    2 foo bar
    # join -1 1 -2 1 -t" " -a1 a b
    1 blah
    2 foo bar
    # join -1 1 -2 1 -t" " -a2 a b
    2 foo bar
    3 baz
    # join -1 1 -2 1 -t" " -a1 -a2 a b
    1 blah
    2 foo bar
    3 baz

Is this what you were looking for?

Edit:

Since you provided more detailed information, here is how to produce the desired result. My file a is your first file and my file b is your second file; I had to change -1 1 -2 2 to -1 2 -2 1 to join on the id. I also added a field list (-o) to format the output; note that "0" in it stands for the join field:

    # join -1 2 -2 1 -o 1.1,0,1.3,2.2,2.3 a b
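A self-contained sketch of the same command, assuming the question's data is saved in files named file1.txt and file2.txt (hypothetical names). join expects both inputs to be sorted on the join field, so the sketch sorts each file on its key first; sort -s keeps lines with equal keys in their original order, which is what makes the output match the requested order, and LC_ALL=C makes sort and join agree on collation. The field list is 1.1,0,1.3,2.2,2.3 because the question's first file has only three columns.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the desired many-to-many join from the question.
# Assumption: the file names file1.txt / file2.txt are hypothetical.
set -euo pipefail
export LC_ALL=C   # sort and join must use the same collation

printf '%s\n' 'x id1 ab' 'x id1 cd' 'x id1 df' 'x id2 cx' 'x id3 fv' > file1.txt
printf '%s\n' 'id1 df cf' 'id1 ds dg' 'id2 cv df' 'id2 as ds' 'id3 cf cg' > file2.txt

# join requires both inputs sorted on the join field.
# -s (stable) keeps lines with equal keys in their input order.
sort -s -k2,2 file1.txt > file1.sorted
sort -s -k1,1 file2.txt > file2.sorted

# Join field 2 of file 1 with field 1 of file 2.
# In -o, "0" is the join field; 1.N / 2.N are field N of file 1 / file 2.
join -1 2 -2 1 -o 1.1,0,1.3,2.2,2.3 file1.sorted file2.sorted > joined.txt
cat joined.txt
```

For every key, join pairs each matching line of the first file with each matching line of the second, so id1 (three lines on the left, two on the right) yields six output lines, just like the desired result.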

produces what you asked for. Add -a1 -a2 to keep the unpairable lines from both files, and you get two more lines (you can guess my test data from them):

    x id4 ut
     id5  ui oi

This is hard to read, since every missing field is simply empty. So replace missing fields with "-" using -e-, which gives:

    # join -1 2 -2 1 -a1 -a2 -e- -o 1.1,0,1.3,2.2,2.3 a b
    x id1 ab df cf
    x id1 ab ds dg
    x id1 cd df cf
    x id1 cd ds dg
    x id1 df df cf
    x id1 df ds dg
    x id2 cx cv df
    x id2 cx as ds
    x id3 fv cf cg
    x id4 ut - -
    - id5 - ui oi
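A smaller runnable sketch of the same full outer join. The files left.txt and right.txt and the unpaired rows "x id4 ut" and "id5 ui oi" are hypothetical test data, reconstructed only for illustration; both files are already sorted on their join fields.

```shell
#!/usr/bin/env bash
# Sketch of a full outer join with join(1).
# Assumption: left.txt / right.txt and their contents are hypothetical test data.
set -euo pipefail
export LC_ALL=C

# Both inputs are already sorted on their join fields.
printf '%s\n' 'x id1 ab' 'x id1 cd' 'x id4 ut' > left.txt
printf '%s\n' 'id1 df cf' 'id5 ui oi' > right.txt

# -a1 -a2: also print unpairable lines from file 1 and from file 2.
# -e-: print "-" for every field in the -o list that is missing.
join -1 2 -2 1 -a1 -a2 -e- -o 1.1,0,1.3,2.2,2.3 left.txt right.txt > outer.txt
cat outer.txt
```

Note that -e only fills fields named in an explicit -o list; for an unpaired line from the second file, field "0" still prints that file's key (id5 here).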

When the join command is not powerful enough, I usually use sqlite for this kind of operation in the shell.

You can easily import flat files into tables and then run an SQL SELECT with the appropriate JOIN.

Note that with sqlite you can create an index to make the join even faster.

    sqlite3 <<EOF
    CREATE TABLE mytable1 (....); -- define your table here
    CREATE TABLE mytable2 (....); -- define your table here
    -- define the input field separator here if needed
    .separator ","
    .import input_file1.txt mytable1
    .import input_file2.txt mytable2
    SELECT ... JOIN ...;
    EOF
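A concrete version of that outline, as a sketch that assumes the sqlite3 command-line shell is installed; the table and column names (t1, t2, c1, c2, c3) and the small input files are hypothetical. It imports two space-separated files, adds an index on the join column, and orders by rowid so rows come out in input order. Dot-commands such as .separator are kept on their own lines, since the shell would otherwise treat a trailing "-- comment" as extra arguments.

```shell
#!/usr/bin/env bash
# Sketch: do the join in sqlite3 instead of join(1).
# Assumptions: sqlite3 is installed; file, table and column names are hypothetical.
set -euo pipefail

printf '%s\n' 'x id1 ab' 'x id1 cd' 'x id2 cx' > file1.txt
printf '%s\n' 'id1 df cf' 'id1 ds dg' 'id2 cv df' > file2.txt

# No database file argument: sqlite3 works on a temporary in-memory database.
sqlite3 > sqlite_out.txt <<'EOF'
CREATE TABLE t1 (c1 TEXT, id TEXT, c3 TEXT);
CREATE TABLE t2 (id TEXT, c2 TEXT, c3 TEXT);
.separator " "
.import file1.txt t1
.import file2.txt t2
-- an index on the join column can speed up large joins
CREATE INDEX t2_id ON t2(id);
SELECT t1.c1, t1.id, t1.c3, t2.c2, t2.c3
FROM t1 JOIN t2 ON t1.id = t2.id
ORDER BY t1.rowid, t2.rowid;
EOF
cat sqlite_out.txt
```

Unlike join, SQL does not need pre-sorted input. To keep unpaired rows, LEFT JOIN works in any SQLite version, and FULL OUTER JOIN is available from SQLite 3.39.0 onward.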

sqlite is free and cross-platform. Very convenient.
