At 66%, the reducers begin the actual reduce phase (0-33% is the shuffle, 33-66% is the sort). In a Hive join, the reducer computes a Cartesian product between the two data sets for each join key.
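To make the "Cartesian product per key" point concrete, here is a minimal, hypothetical sketch of a reduce-side equi-join in plain Python. It is not Hive's implementation. It assumes each input is a list of `(key, value)` pairs, and the function name `reduce_side_join` is made up for illustration. Grouping by key stands in for the shuffle/sort phases, and each group then emits every left/right pairing. `None` keys are skipped to mimic SQL inner-join semantics, where NULL never equals NULL.

```python
from collections import defaultdict
from itertools import product


def reduce_side_join(left, right):
    """Hypothetical reduce-side inner join of two lists of (key, value) pairs."""
    # Stand-in for shuffle + sort: collect left and right values under each key.
    groups = defaultdict(lambda: ([], []))
    for key, value in left:
        if key is not None:  # NULL keys never match in an SQL inner join
            groups[key][0].append(value)
    for key, value in right:
        if key is not None:
            groups[key][1].append(value)

    # "Reduce": for each key, emit the Cartesian product of its two groups.
    out = []
    for key, (lefts, rights) in groups.items():
        for l, r in product(lefts, rights):
            out.append((key, l, r))
    return out


left = [("abc", 1), ("abc", 2), ("x", 3)]
right = [("abc", "a"), ("abc", "b"), ("abc", "c"), (None, "n")]
print(len(reduce_side_join(left, right)))  # 2 * 3 rows for "abc", none for "x"
```

A key that appears m times on one side and n times on the other yields m * n output rows from a single reducer. This is why one hot key can stall the job near the end of the reduce phase.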
My guess is that at least one foreign key appears very frequently in all of the data sets. Watch out for NULL and default values.
For example, imagine that in a join the key "abc" appears ten times in each of six tables (10^6). That is a million output records for this one key. If "abc" appears 1000 times in each of three tables and twice in each of the other three, you get 8 billion records (1000^3 * 2^3). You can see how quickly it gets out of hand. I suspect there is at least one key that produces a huge number of output records.
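The arithmetic above generalizes as follows: when several tables are inner-joined on the same key, the output row count for that key is the product of its frequency in each table. The Python sketch below applies that rule to find the keys that blow up. It rests on a few assumptions. Each table is modeled as a plain list of join-key values, all joins use the same key column, and `join_output_per_key` is a hypothetical helper rather than a Hive API. `None` is excluded to mimic SQL NULL semantics. Default values such as `""` or `0` still match each other, so they are counted and often show up as the hot keys.

```python
from collections import Counter
from math import prod


def join_output_per_key(tables):
    """Estimated inner-join output rows per key for tables joined on one key.

    Each table is a list of key values; None is treated like SQL NULL (never matches).
    """
    counts = [Counter(k for k in table if k is not None) for table in tables]
    # A key only produces output if it is present in every table.
    shared = set(counts[0]).intersection(*counts[1:])
    return {k: prod(c[k] for c in counts) for k in shared}


# The two scenarios from the text.
six_tables = [["abc"] * 10 for _ in range(6)]
print(join_output_per_key(six_tables))  # {'abc': 1000000}

mixed = [["abc"] * 1000 for _ in range(3)] + [["abc"] * 2 for _ in range(3)]
print(join_output_per_key(mixed))  # {'abc': 8000000000}
```

Sorting the result by value, for example `sorted(result.items(), key=lambda kv: kv[1], reverse=True)`, puts the keys most likely to overwhelm a reducer at the top.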
This is a common anti-pattern that should be avoided even in RDBMSs outside of Hive. Chaining multiple inner joins across many-to-many relationships can get you into a lot of trouble.