Bigtable backups and redundancy

Google Cloud Bigtable looks fantastic, but I have some questions about backups and redundancy.

Are there any options for backing up data to protect against human errors?

Clusters currently run in a single zone - are there ways to mitigate the unavailability of that zone?


One way to back up data that is available today is to run the export MapReduce job described here:

https://cloud.google.com/bigtable/docs/exporting-importing#export-bigtable
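
The linked instructions drive this export through Hadoop; as a rough illustration of the same idea, here is a minimal Java driver for the stock HBase Export job, assuming a Hadoop environment configured with the Cloud Bigtable HBase client and a GCS connector. The table name and bucket path are placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.Export;
import org.apache.hadoop.mapreduce.Job;

public class BigtableExportSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder table name and GCS destination; adjust to your project.
    String table = "my-table";
    String outputDir = "gs://my-backup-bucket/bigtable-export/my-table";

    // HBaseConfiguration picks up hbase-site.xml, which should point the client
    // at the Bigtable cluster through the Cloud Bigtable HBase client.
    Configuration conf = HBaseConfiguration.create();

    // Export ships with HBase: a map-only job that scans the table and writes
    // the rows out as SequenceFiles under the output directory.
    Job job = Export.createSubmittableJob(conf, new String[] { table, outputDir });
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```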

You are right that today a Bigtable cluster's availability is tied to the availability of the zone it runs in. If higher availability is a concern, you can look at various ways of replicating your writes (for example, via Kafka), but that adds other complexity to the system you are building, such as managing consistency between the clusters. (What happens if there is a bug in your software and you skip replicating some writes?)
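
To make that consistency caveat concrete, here is a minimal sketch of the kind of write replication alluded to above: each mutation is written to the primary cluster and also published to a Kafka topic for a secondary cluster (or a recovery process) to consume. The topic name, table name, and the pre-serialized mutation payload are assumptions, and the fire-and-forget send is exactly the place where entries can get silently skipped.

```java
import java.util.Properties;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

/** Hypothetical wrapper that writes to the primary cluster and mirrors each write to Kafka. */
public class MirroringWriter implements AutoCloseable {
  private final Table primary;
  private final KafkaProducer<byte[], byte[]> producer;
  private final String topic;

  public MirroringWriter(Connection primaryConn, String tableName,
                         String bootstrapServers, String topic) throws Exception {
    this.primary = primaryConn.getTable(TableName.valueOf(tableName));
    Properties props = new Properties();
    props.put("bootstrap.servers", bootstrapServers);
    props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
    this.producer = new KafkaProducer<>(props);
    this.topic = topic;
  }

  /** serializedMutation is assumed to be the mutation encoded by the application. */
  public void put(Put put, byte[] serializedMutation) throws Exception {
    primary.put(put);  // synchronous write to the primary Bigtable cluster
    // Fire-and-forget mirror: if this send fails after the primary write succeeded,
    // the replica silently misses the entry -- the consistency gap mentioned above.
    producer.send(new ProducerRecord<>(topic, put.getRow(), serializedMutation));
  }

  @Override
  public void close() throws Exception {
    producer.close();
    primary.close();
  }
}
```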

Using a different system, such as Cloud Datastore, avoids this problem because it is not a single-zone system, but it comes with other trade-offs to consider.


It seems that replication is not available at this stage, so, given that read access to the Write Ahead Log (or whatever Bigtable calls its transaction log) is not provided, I see the following options:

  • In Google we trust. Rely on their expertise in availability and recovery. One of the selling points of hosted Bigtable for HBase developers is the lower administrative overhead: no need to worry about backups and restores.

  • Deploy a secondary Bigtable cluster in another AZ and send it a copy of each mutation asynchronously, with more aggressive write buffering on the client, since low latency is not a priority. You could even deploy a regular HBase cluster instead of a second Bigtable cluster, though the extent to which the Google Bigtable HBase client and the Apache HBase client can coexist in the same runtime remains to be seen.

  • Copy mutations to a local file, uploaded on a schedule to the GCS storage class of your choice: Standard or DRA. Replay the files during recovery (see the upload sketch after this list).

  • A variation of option 3: deploy a Kafka cluster spread across multiple availability zones. Write a producer and send mutations to Kafka - its throughput should be higher than Bigtable/HBase can handle anyway. Track offsets, and replay the mutations by consuming the messages back from Kafka during recovery (see the replay sketch after this list).
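
As a rough illustration of the file-upload option, here is a minimal sketch that pushes a rotated mutation-log file to Cloud Storage with the Java GCS client. The file path, bucket name, and log format are assumptions, and a scheduled gsutil upload would achieve the same thing.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

/** Uploads one rotated mutation-log file to GCS; run it from cron or a scheduler. */
public class MutationLogUploader {
  public static void main(String[] args) throws Exception {
    // Placeholder paths: a rotated log of serialized mutations and the target bucket
    // (created with the Standard or DRA storage class).
    Path logFile = Paths.get("/var/log/myapp/mutations-0001.log");
    String bucket = "my-bigtable-mutation-backups";
    String objectName = "mutation-logs/" + logFile.getFileName();

    Storage storage = StorageOptions.getDefaultInstance().getService();
    BlobInfo blobInfo = BlobInfo.newBuilder(BlobId.of(bucket, objectName)).build();
    storage.create(blobInfo, Files.readAllBytes(logFile));
    // During recovery, download the objects in order and replay the mutations
    // against a fresh cluster.
  }
}
```

And for the Kafka variant, a minimal recovery-side sketch: consume the mutation topic with auto-commit disabled, re-apply each record as a Put through the HBase client, and commit offsets only after the writes land. The topic name, table name, column family, and record encoding (row key as the message key, cell bytes as the value) are assumptions made to keep the example short.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class MutationReplayer {
  public static void replay(Connection bigtableConn, String bootstrapServers) throws Exception {
    Properties props = new Properties();
    props.put("bootstrap.servers", bootstrapServers);
    props.put("group.id", "bigtable-recovery");   // Kafka stores the committed offset per group
    props.put("enable.auto.commit", "false");     // commit only after the writes succeed
    props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

    try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
         Table table = bigtableConn.getTable(TableName.valueOf("my-table"))) {
      consumer.subscribe(Collections.singletonList("bigtable-mutations"));
      while (true) {
        ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(5));
        if (records.isEmpty()) {
          break;  // assume we have caught up; a real tool would compare against end offsets
        }
        for (ConsumerRecord<byte[], byte[]> record : records) {
          // Assumed encoding: message key = row key, value = cell bytes for one column.
          Put put = new Put(record.key());
          put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("payload"), record.value());
          table.put(put);
        }
        consumer.commitSync();  // advance the stored offset only after the mutations are applied
      }
    }
  }
}
```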

Another thought... if history is any lesson here, AWS did not have a Multi-AZ option from the start either. It took them some time to develop it.
