I just ran into an interesting issue with BigQuery.
Essentially, there is a batch job that recreates a table in BigQuery (to clear out the old data) and then immediately starts feeding a new data set into it via the streaming interface.
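For context, the job does roughly the following (a minimal sketch, assuming the Python google-cloud-bigquery client; the project, dataset, table name, and schema are placeholders, not our real ones):

```python
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.my_dataset.my_table"  # placeholder name

# Recreate the table to drop the old data.
client.delete_table(table_id, not_found_ok=True)
schema = [
    bigquery.SchemaField("id", "INTEGER"),
    bigquery.SchemaField("payload", "STRING"),
]
client.create_table(bigquery.Table(table_id, schema=schema))

# Immediately start streaming the new data set, in batches.
rows = [{"id": i, "payload": f"row-{i}"} for i in range(4000)]
for start in range(0, len(rows), 500):
    errors = client.insert_rows_json(table_id, rows[start:start + 500])
    if errors:  # raise if the streaming call reports per-row errors
        raise RuntimeError(errors)
```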
This worked fine for quite some time. Recently, it started losing data.
A small test case confirmed the problem: if the data stream starts immediately after the (successful!) recreation of the table, parts of the data set are lost. That is, of the 4,000 records submitted, only 2,100-3,500 actually end up in the table.
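The verification step at the end of the test is just a row count against the recreated table (again a sketch, reusing the placeholder names from above):

```python
# Count how many of the streamed rows are actually visible in the table.
query = f"SELECT COUNT(*) AS n FROM `{table_id}`"
n = next(iter(client.query(query).result())).n
print(f"submitted {len(rows)}, visible in table: {n}")
```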
I suspect that the table-creation call may return success before the table operations (deletion and creation) have fully propagated throughout the environment, so the first parts of the data set end up being fed to old replicas of the table (just speculating here).
To confirm this, I added a delay between creating the table and starting the data feed. Indeed, if the delay is less than 120 seconds, parts of the data set are lost; if it is more than 120 seconds, everything seems to work fine.
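The workaround that currently makes the test pass is nothing more than a sleep between the create call and the first insert (sketch, same assumptions as above; the 120 seconds is the empirical threshold from the test, not a documented value):

```python
import time

client.create_table(bigquery.Table(table_id, schema=schema))
time.sleep(120)  # below ~120 s the test loses rows; above it, all 4,000 arrive

for start in range(0, len(rows), 500):
    errors = client.insert_rows_json(table_id, rows[start:start + 500])
    if errors:
        raise RuntimeError(errors)
```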
Previously no such delay was required. We are using BigQuery in the US location. Am I missing something obvious here?
Edit: to clarify, the table itself does not change, only its data does; the table is deleted and recreated with the same name and schema, and the new rows are then fed in via the BigQuery streaming append interface.