BigQuery SQL running totals

Any idea how to calculate a running total in BigQuery SQL?

id   value   running total
--   -----   -------------
1    1       1
2    2       3
3    4       7
4    7       14
5    9       23
6    12      35
7    13      48
8    16      64
9    22      86
10   42      128
11   57      185
12   58      243
13   59      302
14   60      362

This is not a problem on traditional SQL servers, using either a correlated scalar subquery:

 SELECT a.id, a.value,
        (SELECT SUM(b.value)
         FROM RunTotalTestData b
         WHERE b.id <= a.id)
 FROM RunTotalTestData a
 ORDER BY a.id;

or a join:

 SELECT a.id, a.value, SUM(b.value)
 FROM RunTotalTestData a, RunTotalTestData b
 WHERE b.id <= a.id
 GROUP BY a.id, a.value
 ORDER BY a.id;

But I could not find a way to make it work in BigQuery ...

3 answers

You have probably figured this out already, but here is one way, though not the most efficient.

In BigQuery, a JOIN can only be performed using equality comparisons, i.e. b.id <= a.id cannot be used as the join condition.

https://developers.google.com/bigquery/docs/query-reference#joins

It's pretty lame if you ask me. But there is a workaround: use an equality comparison on some dummy value to get the Cartesian product, and then use WHERE for the <= condition. This is insanely suboptimal, but if your tables are small, it will work.

 SELECT a.id, SUM(b.value) AS rt
 FROM RunTotalTestData a
 JOIN RunTotalTestData b ON a.dummy = b.dummy
 WHERE b.id <= a.id
 GROUP BY a.id
 ORDER BY a.id

You can also limit the time range manually (foo and bar stand for the timestamp bounds):

 SELECT a.id, SUM(b.value) AS rt
 FROM (
   SELECT id, timestamp, dummy
   FROM RunTotalTestData
   WHERE timestamp >= foo AND timestamp < bar
 ) AS a
 JOIN (
   SELECT id, timestamp, value, dummy
   FROM RunTotalTestData
   WHERE timestamp >= foo AND timestamp < bar
 ) AS b ON a.dummy = b.dummy
 WHERE b.id <= a.id
 GROUP BY a.id
 ORDER BY a.id

Update:

You do not need a dedicated dummy column. You can just use

 SELECT 1 AS one

as a constant column in each subquery and join on that.

As for billing, the joined table counts toward the data processed.
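The constant-key join from this answer can be checked locally against the sample data in the question. The sketch below is only an illustration: it uses SQLite through Python's sqlite3 module as a stand-in (an assumption; it is not BigQuery and says nothing about BigQuery's performance or billing), and the function name running_totals_via_join is made up for the example. Note that the sum is taken over b.value, the rows at or before each a.id.

```python
import sqlite3

# Sample data from the question: (id, value) pairs.
ROWS = [(1, 1), (2, 2), (3, 4), (4, 7), (5, 9), (6, 12), (7, 13),
        (8, 16), (9, 22), (10, 42), (11, 57), (12, 58), (13, 59), (14, 60)]

# Each side of the join gets a constant column `one`, so the ON clause is a
# pure equality; the inequality is applied afterwards in WHERE.
QUERY = """
SELECT a.id, a.value, SUM(b.value) AS rt
FROM (SELECT id, value, 1 AS one FROM RunTotalTestData) AS a
JOIN (SELECT id, value, 1 AS one FROM RunTotalTestData) AS b
  ON a.one = b.one
WHERE b.id <= a.id
GROUP BY a.id, a.value
ORDER BY a.id
"""


def running_totals_via_join(rows):
    """Return (id, value, running_total) tuples using the constant-key join."""
    # SQLite is an assumption used for local checking, not the BigQuery engine.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE RunTotalTestData (id INTEGER, value INTEGER)")
        conn.executemany("INSERT INTO RunTotalTestData VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in running_totals_via_join(ROWS):
        print(row)
```

The join produces every (a, b) pair before filtering, so the intermediate result grows quadratically with the table size, which is why this only suits small tables.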

Update 2013: You can use SUM() OVER() to calculate running totals.

In your example:

 SELECT id, value, SUM(value) OVER(ORDER BY id) FROM [your.table] 

Working example:

 SELECT word, word_count,
        SUM(word_count) OVER(ORDER BY word)
 FROM [publicdata:samples.shakespeare]
 WHERE corpus = 'hamlet' AND word > 'a'
 LIMIT 30;
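To see what SUM() OVER(ORDER BY ...) produces on the data from the question, here is a small local sketch. It uses SQLite through Python's sqlite3 module purely as a stand-in (an assumption, not BigQuery itself, and it needs SQLite 3.25 or newer for window functions). The table name comes from the question; the function name running_totals_via_window is made up for the example.

```python
import sqlite3

# Sample data from the question: (id, value) pairs.
ROWS = [(1, 1), (2, 2), (3, 4), (4, 7), (5, 9), (6, 12), (7, 13),
        (8, 16), (9, 22), (10, 42), (11, 57), (12, 58), (13, 59), (14, 60)]

# With only ORDER BY in the window, the sum covers all rows up to and
# including the current one, which is the running total.
WINDOW_QUERY = """
SELECT id, value, SUM(value) OVER (ORDER BY id) AS running_total
FROM RunTotalTestData
ORDER BY id
"""


def running_totals_via_window(rows):
    """Return (id, value, running_total) tuples using a window function."""
    # Assumes the bundled SQLite supports window functions (3.25+).
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE RunTotalTestData (id INTEGER, value INTEGER)")
        conn.executemany("INSERT INTO RunTotalTestData VALUES (?, ?)", rows)
        return conn.execute(WINDOW_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in running_totals_via_window(ROWS):
        print(row)
```

Unlike the join-based workaround, this reads the table once and needs no self-join.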

The problem with the second query is that BigQuery will UNION the 2 tables listed in the FROM expression.

I am not sure about the first, but it is possible that BigQuery does not like subselects in the SELECT expression, only in the FROM expression. So you would need to move the subquery into the FROM expression and JOIN the results.

Alternatively, you can try our JDBC driver: the Starschema BigQuery JDBC driver.

Just load it into Squirrel SQL, RazorSQL, or any other tool that supports JDBC drivers, and make sure you enable the Query Transformer by setting:

transformQuery = true

either in the properties or in the JDBC URL; all the details can be found on the project page. After that, try running the second query: it will be converted into a BigQuery-compatible join.
