TL DR: use GROUP EACH BY in your inner query.
Internal request
Select a, max(x) as b from Tbl1 group by a
not paralyzed. You can calculate partial results in parallel, but in order to know the global max(x) for each value of a ', you only need to run this run in one place. When you use the JOIN EACH in an external query, you instruct the query mechanism that you need to parallelize when executing the JOIN . However, you do not have a parallelizable source, so the request fails.
There are two ways to fix this: first, use an explicitly parallel version of the internal query - just use GROUP EACH BY instead of GROUP BY . This can be done in parallel, because the first step is to sort the base table in field a so that you can calculate the global maximum in field a in a parallel workstation. If the inner query is paralellizable, then the outer query can be executed in parallel. It will look like this:
Select Table1.a,Table1.b,Table2.c From (Select a, max(x) as b from Tbl1 group EACH by a
The second option is to use JOIN instead of JOIN EACH for an external request. This can lead to size restrictions depending on the size of table2 (since to perform not every join, the table on the right should be "small".).
This is admittedly suboptimal; You must understand how the request mechanism works in order for your requests to be executed. However, we (on BigQuery and DREMEL teams) are working hard to create “just work” queries, so you don’t have to deal with this kind of problem. We have made some progress in this, over the past few months some of these types of errors have been removed (for example, the inner join also failed until recently), but we still have ways.
Thank you for bringing this to our attention. I filed an internal error so that we could improve this work in the future.
source share