Amazon RDS Sync with Google BigQuery

People, the company I work for has some MySQL databases on AWS (Amazon RDS). We are creating POCs with BigQuery, and now I am learning how to replicate databases on BigQuery (existing registers and new ones in the future). My doubts:

  • How to replicate MySQL tables and rows in BigQuery. Is there any tool for this (I'm reading about the Amazon Database Migration Service)? Should I copy to Google Cloud SQL and export to BigQuery?
  • How to copy future registers? Is it possible to create a task inside MySQL to send new registers after a predefined number? For example, after entering 1000 new lines (or time), some event “fires”, and new registers are copied to Cloud SQL / BigQuery?

My initial idea is to reset the original database, load it into another and use the script to listen to new registers and send them to the new database.

Did I explain this correctly? It's clear?

+6
source share
2 answers

In addition to what Vadim said, you can try:

  • mysqldump for CSV files on s3 (I believe RDS allows this)
  • run the "gsutil" Google Cloud Storage utility to copy data from s3 to GCS
  • run "bq load file.csv" to load the file in BigQuery

I am interested to hear your experience, so feel free to ping me in private.

+2
source

You will need to use one of the ETL tools that integrate with mySQL and BigQuery to complete the initial data transfer and copy the subsequent changes to BigQuery. Take a look at the list of available tools [1]

You can also implement your own tool by developing a process that will extract data from mySQL into a CSV file and then load this file into BigQuery using data import [2]

[1] https://cloud.google.com/bigquery/third-party-tools

[2] https://cloud.google.com/bigquery/loading-data-into-bigquery

+1
source

All Articles