How to produce cartesian products in bash?

I want to create a file like this (the Cartesian product [1-3] x [1-5]):

 1 1
 1 2
 1 3
 1 4
 1 5
 2 1
 2 2
 2 3
 2 4
 2 5
 3 1
 3 2
 3 3
 3 4
 3 5

I can do this with a nested loop, for example:

 for i in $(seq 3)
 do
     for j in $(seq 5)
     do
         echo $i $j
     done
 done

Is there any solution without using loops?

3 answers

Combine two brace expansions!

 $ printf "%s\n" {1..3}" "{1..5}
 1 1
 1 2
 1 3
 1 4
 1 5
 2 1
 2 2
 2 3
 2 4
 2 5
 3 1
 3 2
 3 3
 3 4
 3 5

This works by using one brace expansion:

 $ echo {1..5}
 1 2 3 4 5

and then combined with another:

 $ echo {1..5}+{a,b,c}
 1+a 1+b 1+c 2+a 2+b 2+c 3+a 3+b 3+c 4+a 4+b 4+c 5+a 5+b 5+c
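Applied to the question, the same idea can write the product straight to a file. The following is only a minimal sketch: the `outfile` variable is a hypothetical name, and it assumes bash. It also shows one limitation worth knowing: brace expansion runs before variable expansion, so the range bounds have to be literal.

```shell
#!/usr/bin/env bash
# outfile is a hypothetical name; mktemp just gives us a scratch file.
outfile=$(mktemp)

# Two brace expansions glued together by a quoted space give every pair;
# printf prints one pair per line.
printf '%s\n' {1..3}" "{1..5} > "$outfile"
cat "$outfile"

# Brace expansion happens before variable expansion, so a variable bound
# is not expanded as a range: the braces are left as literal text.
n=3
literal=$(echo {1..$n})
echo "$literal"
```

When the bounds are only known at run time, `seq` combined with one of the `join` approaches is one possible way around this.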

The best way to get a Cartesian product in bash is by far, as @fedorqui pointed out, to use brace expansion. However, if your input is not easy to generate that way (i.e. if {1..3} and {1..5} do not suffice), you can simply use join.

For example, if you want the Cartesian product of two regular files, say "a.txt" and "b.txt", you can do the following. First, the two files:

 $ echo -en {a..c}"\tx\n" | sed 's/^/1\t/' > a.txt
 $ cat a.txt
 1    a    x
 1    b    x
 1    c    x
 $ echo -en "foo\nbar\n" | sed 's/^/1\t/' > b.txt
 $ cat b.txt
 1    foo
 1    bar

Note that the sed command is used to prepend an identifier to each line. The identifier must be the same for all lines, and for all files, so that join pairs every line with every other line and gives you the Cartesian product, instead of leaving out some of the resulting lines. The join then looks like this:

 $ join -j 1 -t $'\t' a.txt b.txt | cut -d $'\t' -f 2-
 a    x    foo
 a    x    bar
 b    x    foo
 b    x    bar
 c    x    foo
 c    x    bar

After joining both files, cut is used to remove the identifier column that was added earlier.
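The three steps (prepend a constant key, join on it, cut it away) can be wrapped so that the input files are left untouched. This is only a sketch under a few assumptions: `cartesian_files` is a hypothetical helper name, the inputs contain no key column of their own, and bash is used for process substitution and `$'\t'`.

```shell
#!/usr/bin/env bash
# cartesian_files is a hypothetical helper name, not part of the answer above.
cartesian_files() {
    # Prepend the constant key "1" plus a tab to every line of both inputs,
    # join on that key (field 1), then cut the key column away again.
    join -t $'\t' -j 1 \
        <(sed $'s/^/1\t/' "$1") \
        <(sed $'s/^/1\t/' "$2") |
        cut -f 2-
}

# Demo with two throwaway files.
tmpdir=$(mktemp -d)
printf '%s\n' a b c > "$tmpdir/left.txt"
printf '%s\n' foo bar > "$tmpdir/right.txt"
cartesian_files "$tmpdir/left.txt" "$tmpdir/right.txt"
rm -r "$tmpdir"
```

Each output line holds one line from the first file and one from the second, separated by a tab.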


A shorter (but hacky) version of Rubens's answer:

 join -j 999999 -o 1.1,2.1 file1 file2 

Since field 999999 most likely does not exist, it is treated as equal for every line of both files, and therefore join has to produce the Cartesian product. It uses O(N + M) memory and produces 100..200 Mbps output on my machine.
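For the product asked for in the question, this could look as follows. It is a sketch that assumes GNU join and seq; `rows.txt` and `cols.txt` are hypothetical file names. Note that `-o 1.1,2.1` prints only the first field of each line, which is enough here because every line holds a single number.

```shell
#!/usr/bin/env bash
# rows.txt and cols.txt are hypothetical file names in a scratch directory.
workdir=$(mktemp -d)
seq 3 > "$workdir/rows.txt"
seq 5 > "$workdir/cols.txt"

# Field 999999 is missing on every line, so all lines compare equal
# and every row is paired with every column.
product=$(join -j 999999 -o 1.1,2.1 "$workdir/rows.txt" "$workdir/cols.txt")
printf '%s\n' "$product"
rm -r "$workdir"
```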

I don't like the shell brace expansion method, for example echo {1..100}x{1..100}, for large datasets, because it uses O(N * M) memory and, used carelessly, can bring the machine to its knees. It is also hard to stop, because Ctrl+C does not interrupt brace expansion, which is performed by the shell itself.
