Thanks for sharing the @alan dataset! Let's see how it looks:

This is an interesting table: it has 3 columns and 3 rows (a small but regular SQL table). The interesting part is that the third column may contain nested records. The first line has nothing (null), the second line has only 1 value, and the third line has 5 different values.
Things get interesting when you start counting in a column:
SELECT COUNT(*) FROM [realself-main:rs_public.test_count] 3
It makes sense, the data set has 3 rows.
SELECT COUNT(dr_id) FROM [realself-main:rs_public.test_count] 3
It also makes sense, there are 3 dr_id.
SELECT COUNT(cover_photos.is_published) FROM [realself-main:rs_public.test_count] 6
Now everything has become more interesting. This is 6 because there are 6 values ββfor cover_photos.is_published (zero does not count).
SELECT COUNT(cover_photos.is_published), COUNT(dr_id) FROM [realself-main:rs_public.test_count] 6 3
That still makes sense: 6 cover_photos.is_published, 3 dr_id.
SELECT COUNT(*) FROM ( SELECT cover_photos.is_published, dr_id FROM [realself-main:rs_public.test_count] ) 3
This is also interesting: if we execute a subquery, COUNT (*) looks at the number of rows returned. 3 rows were returned. That still makes sense.
But then:
SELECT COUNT(*), COUNT(cover_photos.is_published) FROM ( SELECT cover_photos.is_published, dr_id FROM [realself-main:rs_public.test_count] ) 7 6
7 and 6. Seven? Why 7?
Well, BigQuery had to choose a smoothing strategy for the subquery. Look at the table that I inserted there, you see how it has 7 rows? These are seven counted lines.
Let's look at them explicitly:
SELECT dr_id, cover_photos.is_published FROM ( SELECT cover_photos.is_published, dr_id FROM [realself-main:rs_public.test_count] )

Cm? These are seven lines. When selecting rows with nested records (a good feature for BigQuery), BigQuery sometimes has to smooth data to process certain queries. The first 2 lines were flattened into exactly 2 lines (one with the lid_photos.is_published as null), and the third line was smoothed to 5 lines, one for each of its cover_photos.is_published.
The moral of this story is to be careful when working with embedded data: some queries will smooth it out in ways that are unexpected for the user, but which make a lot of sense to the computer when it tries to solve it.
Release deeper upon request:
Look at the difference between these two queries:
SELECT COUNT(*) FROM ( SELECT * FROM ( SELECT * FROM [realself-main:rs_public.test_count] WHERE is_published ) ) SELECT COUNT(*) FROM ( SELECT * FROM ( SELECT * FROM [realself-main:rs_public.test_count] ) ) WHERE is_published
Before looking at the results, you can guess what results each query will give you? No, you can not. Both queries are ambiguous, so to get BigQuery's answer you will need to make some guesses and optimizations.
The result for the first query is 7, and for the second - 3. Go and try.
Who cares? Well, looking at the results of these queries, I can say that in the second BigQuery I saw that the only column that interests you is is_published, so it optimizes the tree, so only this column is read. But BigQuery complicates the optimization of the first query - so it guesses: βMaybe they really want to,β and βmeans I need to flatten the table before passing it to the next level.β It aligns the table, so a later query contains 7 rows.
Is any of these results guaranteed? No - queries are ambiguous. How to reduce ambiguity? Instead of using "SELECT *", tell BigQuery which columns you want to look for, so you don't need to guess.