Recommended App Engine datastore backup strategies

Right now I am using remote_api and appcfg.py download_data to take a snapshot of my database every night. It takes a long time (6 hours) and is expensive. Short of rolling my own diff-based backup (which I would be too scared to attempt), what is the best option to make sure my data is safe from failure?

PS: I understand that Google's data is probably safer than mine. But what if one day I accidentally write a program that deletes it all?


I think you have pretty much enumerated all of your options:

  • Trust Google not to lose your data, and hope that you do not accidentally ask them to destroy it.
  • Do a full backup with download_data, perhaps less often than once a day if it's too expensive.
  • Create your own incremental backup solution.

Option 3 is actually an interesting idea. You would need a modification timestamp on all entities, and it won't catch deleted entities, but otherwise it's quite practical to do with remote_api and cursors.
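To make the incremental idea concrete, here is a minimal sketch in plain Python (not the App Engine API): each pass fetches only entities modified since the last recorded timestamp, so unchanged data is never re-downloaded. `Entity` and `backup_pass` are illustrative names invented for this sketch, not part of any library.

```python
from dataclasses import dataclass

@dataclass
class Entity:
    key: str
    updated_at: float  # last-modified time, maintained by the application
    data: str

def backup_pass(entities, last_run):
    """Return entities changed since last_run, plus the new watermark."""
    changed = sorted(
        (e for e in entities if e.updated_at > last_run),
        key=lambda e: e.updated_at,
    )
    new_watermark = changed[-1].updated_at if changed else last_run
    return changed, new_watermark

store = [Entity('a', 1.0, 'x'), Entity('b', 5.0, 'y'), Entity('c', 3.0, 'z')]
changed, mark = backup_pass(store, last_run=2.0)
print([e.key for e in changed])  # only entities modified after the watermark
```

Persisting the watermark between runs plays the same role as the cursor file in the downloader below: each run resumes from where the previous one stopped instead of starting over.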

Edit

Here is a simple incremental downloader for use with remote_api. Again, the caveats are that it will not notice deleted entities, and it assumes all entities store their last modification time in a property called updated_at. Use it at your own risk.

import os
import hashlib
import gzip

from google.appengine.api import app_identity
from google.appengine.ext.db.metadata import Kind
from google.appengine.api.datastore import Query
from google.appengine.datastore.datastore_query import Cursor

INDEX = 'updated_at'   # property holding each entity's last-modified time
BATCH = 50             # entities fetched per query
DEPTH = 3              # directory fan-out levels derived from the key hash

path = ['backups', app_identity.get_application_id()]
for kind in Kind.all():
    kind = kind.kind_name
    if kind.startswith('__'):
        continue  # skip built-in metadata kinds
    while True:
        print 'Fetching %d %s entities' % (BATCH, kind)
        # Resume from the cursor saved by the previous run, if any.
        path.extend([kind, 'cursor.txt'])
        try:
            cursor = open(os.path.join(*path)).read()
            cursor = Cursor.from_websafe_string(cursor)
        except IOError:
            cursor = None
        path.pop()
        query = Query(kind, cursor=cursor)
        query.Order(INDEX)
        entities = query.Get(BATCH)
        for entity in entities:
            # Shard files into DEPTH levels of directories keyed on the
            # SHA-1 of the entity key, to keep directories small.
            hash = hashlib.sha1(str(entity.key())).hexdigest()
            for i in range(DEPTH):
                path.append(hash[i])
            try:
                os.makedirs(os.path.join(*path))
            except OSError:
                pass
            path.append('%s.xml.gz' % entity.key())
            print 'Writing', os.path.join(*path)
            file = gzip.open(os.path.join(*path), 'wb')
            file.write(entity.ToXml())
            file.close()
            path = path[:-1-DEPTH]
        if entities:
            # Save the cursor so the next run picks up where this one stopped.
            path.append('cursor.txt')
            file = open(os.path.join(*path), 'w')
            file.write(query.GetCursor().to_websafe_string())
            file.close()
            path.pop()
        path.pop()
        if len(entities) < BATCH:
            break
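For illustration, the directory layout the script produces can be sketched in isolation (plain Python 3 here; `backup_path` is a hypothetical helper written for this example, not part of the script above):

```python
import hashlib
import os

DEPTH = 3  # same fan-out depth as the downloader above

def backup_path(root, kind, key):
    # The first DEPTH hex characters of the key's SHA-1 become nested
    # directories, so no single directory accumulates too many files.
    digest = hashlib.sha1(key.encode('utf-8')).hexdigest()
    parts = [root, kind] + list(digest[:DEPTH]) + ['%s.xml.gz' % key]
    return os.path.join(*parts)

# Produces something like backups/Person/<h>/<h>/<h>/Person-42.xml.gz,
# where each <h> is one hex digit of the key's hash.
print(backup_path('backups', 'Person', 'Person-42'))
```

With DEPTH = 3 and hex digits, entities spread across up to 16^3 = 4096 leaf directories per kind, which keeps filesystem operations fast even for large datastores.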
